Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476191
Title: | Byte frequency analysis with multi circular indexing for file type classification |
Authors: | Hongshen Xie (P59449) |
Supervisor: | Azizi Abdullah, Dr. |
Keywords: | Frequency analysis |
Issue Date: | 15-May-2014 |
Description: | Digital forensics is concerned with recovering and investigating the contents of digital devices such as computers and storage devices. Examining information and extracting evidence from these devices is not an easy task. In data recovery, for example, success depends heavily on how well a method can interpret the content of a document: the better the system understands that content, the more effectively it can recover the desired documents. One challenging issue in recovering documents is determining the file type from an incomplete document structure. One possible solution is statistical analysis such as byte frequency analysis, which computes a global descriptor that gives the statistical distribution of byte values in file fragments. However, one possible limitation of this method is that the global histogram it creates as the input vector for a machine learning classifier, such as a support vector machine, is insensitive to small changes in the content of file fragments. In addition, it does not include any spatial information and is liable to false positives, especially on large datasets. Therefore, as our contribution, byte frequency analysis with a circular scheme representation is proposed. It incorporates spatial information by dividing a set of file fragments into several blocks using a fixed partitioning scheme. Each block is then represented by its own lower-level byte frequency analysis descriptor. Finally, all block features are combined into one large input vector for a machine learning classifier. We performed experiments on 10 different file categories at three different resolutions (level 0, level 1, and level 2) and on combinations of these resolutions. The results show that the proposed method slightly outperforms the single byte frequency analysis distribution. Degree: Master/Sarjana |
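The block-wise descriptor outlined in the description can be illustrated with a minimal Python sketch. It is not the thesis implementation: the record does not specify the circular indexing scheme, so the sketch assumes that level L splits a fragment into 2^L contiguous, near-equal blocks, and the names `byte_histogram`, `split_blocks`, and `multilevel_bfa` are hypothetical. Each block gets a normalized 256-bin byte histogram, and all histograms are concatenated into one feature vector that could be passed to a classifier such as a support vector machine.

```python
from collections import Counter


def byte_histogram(block: bytes) -> list[float]:
    """Normalized 256-bin byte frequency histogram of one block."""
    total = len(block)
    if total == 0:
        return [0.0] * 256
    counts = Counter(block)  # iterating bytes yields ints in 0..255
    return [counts.get(value, 0) / total for value in range(256)]


def split_blocks(fragment: bytes, n_blocks: int) -> list[bytes]:
    """Split a fragment into contiguous, near-equal blocks.

    ASSUMPTION: plain contiguous partitioning stands in for the
    thesis's circular scheme, which the record does not describe.
    """
    size = len(fragment)
    bounds = [size * i // n_blocks for i in range(n_blocks + 1)]
    return [fragment[bounds[i]:bounds[i + 1]] for i in range(n_blocks)]


def multilevel_bfa(fragment: bytes, levels=(0, 1, 2)) -> list[float]:
    """Concatenate per-block histograms over the requested levels.

    ASSUMPTION: level L uses 2**L blocks, so level 0 is the single
    global histogram of ordinary byte frequency analysis.
    """
    features: list[float] = []
    for level in levels:
        for block in split_blocks(fragment, 2 ** level):
            features.extend(byte_histogram(block))
    return features


if __name__ == "__main__":
    # Two fragments with identical global byte counts but swapped halves.
    a = b"A" * 8 + b"B" * 8
    b = b"B" * 8 + b"A" * 8
    print(multilevel_bfa(a, (0,)) == multilevel_bfa(b, (0,)))
    print(multilevel_bfa(a, (1,)) == multilevel_bfa(b, (1,)))
```

In the usage example the two fragments share the same global histogram, so level 0 alone cannot tell them apart, while the level 1 features differ because each half is described separately. This is the kind of spatial information the description says the global descriptor lacks.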
Pages: | 98 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Byte frequency analysis with multi circular indexing for file type classification.pdf (Restricted Access) | Partial | 13.87 MB | Adobe PDF |