FRET and libfret - helping understand file formats
FRET is a command line tool which analyses one or more files in order to identify structures within them. It is designed as a developer's tool for analysing files whose internal structure the developer does not know. It complements currently available tools such as hex editors and binary diff tools.
The program's functionality is based around a library called libfret. libfret has a clear API and is designed to be easily integrated with other tools. libfret can analyse any buffer(s) containing data.
There is no magic here. Each file or buffer is first scanned using a heuristic algorithm which attempts to identify the data structures within the buffer. If it does identify a structure, it also assigns a risk to this structure - the risk is a statistical measure of how probable it is that the structure is not genuine but just a random occurrence within the file. When all the buffers have been scanned individually, they are also compared to each other with the aim of identifying structures common to more than one buffer. Finally, all of the detected structures are analysed and rationalised and a list of detected structures is output.
FRET and libfret are not designed to be fast - they should perform a slow, methodical analysis of the target files. The most important design goal was to create an architecture which allows the painless integration of new functionality.
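As an illustration of scanning a buffer and attaching a risk to each detection, here is a minimal Python sketch of one possible heuristic: finding fill sequences (runs of a single repeated byte). It is not libfret code. The names `Gram`, `fill_risk` and `scan_fill_sequences` are hypothetical, and the risk formula is an assumption, since the text above does not say how libfret computes risk. The formula used here is a simple union bound on the chance that a run of that length appears somewhere in a buffer of uniformly random bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gram:
    """A detected structure. Hypothetical fields, not libfret's real layout."""
    kind: str
    offset: int
    length: int
    risk: float  # estimated chance that the structure is a random coincidence


def fill_risk(run_length: int, buffer_length: int) -> float:
    """Assumed risk model: union bound for a run in uniformly random bytes."""
    # Number of positions at which a run of this length could start.
    starts = buffer_length - run_length + 1
    # After the first byte, each further byte matches with probability 1/256.
    return min(1.0, starts * 256.0 ** -(run_length - 1))


def scan_fill_sequences(buf: bytes, min_length: int = 4) -> List[Gram]:
    """Report every maximal run of one repeated byte of at least min_length."""
    grams = []
    start = 0
    while start < len(buf):
        end = start + 1
        # Extend the run while the next byte equals the first byte of the run.
        while end < len(buf) and buf[end] == buf[start]:
            end += 1
        run_length = end - start
        if run_length >= min_length:
            grams.append(
                Gram("fill", start, run_length, fill_risk(run_length, len(buf)))
            )
        start = end
    return grams
```

Calling `scan_fill_sequences(b"AB" + b"\x00" * 8 + b"CD")` reports a single fill Gram at offset 2 with length 8 and a very small risk. Shorter runs receive a higher risk, which matches the idea that short patterns are more likely to be coincidental.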
The libfret API uses a specific terminology. The most important terms are:
- Buffer. An array of bytes, of finite length, which stores the data to be analysed.
- Gram. A data structure that is detected in one or more Buffers. A Gram found in a single Buffer belongs to that Buffer; a Gram found in more than one Buffer is called a generic Gram and is not associated with any specific Buffer.
- Scan. A single heuristic algorithm that is used to examine files or Buffer Grams for new Grams.
- Phase. Every Scan belongs to one of 6 Phases. All of the Scans in the same Phase operate on the same type of source data. For example, Scans in one Phase operate only on raw Buffer data, whereas Scans in another Phase may operate only on Buffer Grams.
The analysis of a Buffer or Buffers is divided into 6 Phases. The libfret API allows single or multiple Buffers to be analysed. If only a single Buffer is analysed, then only Scans from Phases that do not require multiple Buffers can be used. Improved results can be obtained by analysing a larger number of Buffers.
- Phase 1: Pre-process the target Buffers. Undo any compression or obfuscation of the Buffers.
- Phase 2: Scan each Buffer individually, identifying Grams within the Buffer. Identifies Grams such as text strings and fill sequences.
- Phase 3: Compare detected Grams to raw data for the same Buffer. Identifies Grams such as offsets within a Buffer to other data in the same Buffer.
- Phase 4: Compare the raw data in multiple Buffers, identifying Grams that are present in more than one Buffer. Identifies patterns of data that are common to multiple Buffers.
- Phase 5: Compare detected Grams for each Buffer against the detected structures of other Buffers. Identifies Grams that are generic and present in more than one Buffer.
- Phase 6: Analyse the detected structures (Grams) for all of the Buffers and rationalise/clean-up the results.
If more information is required, please see the FRET project website.
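The rule that a single-Buffer analysis can only use Phases that do not need multiple Buffers can be sketched in a few lines of Python. This is an illustration only and does not reflect the real libfret API: the `Phase` member names and the `runnable_phases` helper are hypothetical. The choice of Phases 4 and 5 as the multi-Buffer Phases is inferred from the descriptions above, both of which compare one Buffer against others.

```python
from enum import IntEnum
from typing import List


class Phase(IntEnum):
    """The six Phases, numbered as in the list above (names are hypothetical)."""
    PREPROCESS = 1         # undo compression or obfuscation
    SCAN_BUFFER = 2        # find Grams in each Buffer individually
    GRAMS_VS_OWN_DATA = 3  # compare Grams to raw data of the same Buffer
    COMPARE_BUFFERS = 4    # compare raw data across Buffers
    COMPARE_GRAMS = 5      # compare Grams across Buffers
    RATIONALISE = 6        # clean up the combined results


# Assumption: only these two Phases need more than one Buffer.
MULTI_BUFFER_PHASES = frozenset({Phase.COMPARE_BUFFERS, Phase.COMPARE_GRAMS})


def runnable_phases(buffer_count: int) -> List[Phase]:
    """Return, in order, the Phases usable for the given number of Buffers."""
    if buffer_count < 1:
        raise ValueError("at least one Buffer is required")
    return [
        phase
        for phase in Phase
        if buffer_count > 1 or phase not in MULTI_BUFFER_PHASES
    ]
```

With one Buffer, `runnable_phases(1)` yields Phases 1, 2, 3 and 6. With two or more Buffers all six Phases are returned, which is where the improved results from analysing more Buffers would come from.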
Thanks to Damian Ivereigh for his excellent Red-Black Tree implementation, libredblack. Multiple Red-Black Trees are used for internal Gram storage.
Thanks to Landon Curt Noll for making the FNV (Fowler-Noll-Vo) hash source code available in the Public Domain.
The project source code is released under the GNU General Public License. For more information see the file COPYING that is distributed with the source code.
- See also: http://www.fsf.org for more licence information.
Michael McCarthy is a professional Software Engineer. He can be contacted at <michael.mccarthy--AT--ieee.org>.
Generated on Thu Jan 19 18:59:19 2006 for FRET