FRET helping understand file formats
 

user manual version 0.0.6

introduction

FRET allows you to examine files in several ways. The most basic way to use FRET is to pass it the names of one or more files on the command line. It also provides several command-line options that let you specify the type of analysis to perform.


installation

FRET is currently distributed as a source tarball, i.e. fret-*.tar.gz. A packaged binary prepared by a third party may also be available. The tarball can be installed using the standard GNU process; see the INSTALL file for more information.


querying the command line options

[bob@localhost working]$ fret --help
Usage : fret [OPTION] [INPUT FILES]
Analyse structure and layout of a file or files.

-v --version prints version information then exit.
-h --help displays this help message then exits.
-l --list List all of the availables scans.
-g --generic Prints only generic grams and not the grams of individual buffers.
-m --min LENGTH Minimum number of bytes that a Gram may have. Default is 8 bytes.
-p --phase PHASE Only scans from this phase will be run. The phase is entered as an integer. Currently Phases 2-4 are supported.
-s --scan SCAN Specify a scan to be run on the file(s). The scans one word textual name must be given in full.


querying the version

[bob@localhost working]$ fret --version
fret 0.0.6
Copyright (C) 2005 Michael McCarthy
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.


analysing files

FRET can analyse a single file or multiple files. When analysing a single file, FRET parses the file looking for data structures and prints a table describing the structures it found. In this way, FRET can be used to detect and extract ASCII strings or other structures in a file.

FRET provides much more powerful results when multiple files are compared. When multiple files are provided as input, FRET can compare them to identify structures that are common to more than one file. Before printing the results of an analysis, FRET will attempt to amalgamate structures that have been detected in more than one file. Structures that are common to more than one file are called generic structures and they are differentiated from file specific results by the word "generic" in the filename column.

Below is an example of analysing and comparing two files using FRET.
[bob@localhost working]$ fret test/fileA.txt test/fileB.txt

Gram ID  Type        Risk       Offset  Length  Freq  Filename
17       fill        2.939e-39  0       16      1     test/fileA.txt
213      offset_cur  3.750e-01  16      2       1     test/fileA.txt
39       ascii       4.010e-08  48      16      1     test/fileA.txt
55       fill        2.939e-39  0       16      1     test/fileB.txt
341      fill        2.939e-39  16      16      1     test/fileB.txt
15       ascii       5.343e-06  48      16      1     test/fileB.txt
408      fill        2.939e-39  0       16      2     generic

explanation of Type

Each detected structure in a file (called a Gram in FRET terminology) is assigned a type based on the method (called a Scan in FRET terminology) that detected it. The types of Gram that can be detected are explained below:

Gram Type       Description
TEXT_ASCII      ASCII-encoded string of 1-byte characters.
FILL_BYTE       Sequence of identical bytes.
FILL_SHORT      Sequence of bytes that repeats every 2 bytes.
FILL_LONG       Sequence of bytes that repeats every 4 bytes.
OFFSET_BEGIN    Offset, in bytes, to an existing Gram within the file, measured from the start of the file. The offset can be stored in 1, 2 or 4 bytes.
OFFSET_CURRENT  Offset, in bytes, to an existing Gram within the file, measured from the location of the Gram. The offset can be stored in 1, 2 or 4 bytes.
MATCH_EXACT     Pattern of bytes that occurs in this file and has been detected in at least one other file.
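
To make the three fill types concrete, here is a minimal Python sketch of what "repeats every N bytes" means. It is an illustration only, not FRET's actual Scan code; the function names repeats_with_period and classify_fill are hypothetical, and the rule of requiring at least two full periods is an assumption.

```python
def repeats_with_period(data: bytes, period: int) -> bool:
    """Return True if every byte equals the byte `period` positions earlier."""
    # Assumption: require at least two full periods before calling it a repeat.
    if period <= 0 or len(data) < 2 * period:
        return False
    return all(data[i] == data[i - period] for i in range(period, len(data)))


def classify_fill(data: bytes):
    """Return the name of the smallest matching fill type, or None."""
    # Try the shortest period first, so a run of identical bytes is reported
    # as FILL_BYTE rather than FILL_SHORT or FILL_LONG.
    for period, name in ((1, "FILL_BYTE"), (2, "FILL_SHORT"), (4, "FILL_LONG")):
        if repeats_with_period(data, period):
            return name
    return None
```

For example, classify_fill(b"\x00" * 16) gives "FILL_BYTE", while classify_fill(b"\xab\xcd" * 8) gives "FILL_SHORT".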

performance

Most of the analysis methods used by FRET have analysis durations that grow linearly with the number of files and the file size. However, the scanGrind algorithm that matches Grams between files has an analysis duration that grows quadratically with file size, i.e. as the square of the file size. Increasing the number of analysed files still shows the standard linear growth in analysis duration.

Users are therefore strongly advised not to compare large files against each other, as the analysis duration will be extreme. For example, comparing two 100kB files takes about 5 minutes on a 3GHz machine, whereas comparing two 1MB files will take about 9 hours.
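
The figures above follow from the square-law growth, and the arithmetic can be sketched in a few lines of Python. This is a rough estimate only: the function name estimate_minutes is hypothetical, the 5 minute reference point is taken from the example above, and kB and MB are assumed to be binary units (1024 bytes and 1024 kB).

```python
def estimate_minutes(size_bytes, ref_size_bytes=100 * 1024, ref_minutes=5.0):
    """Scale a reference timing by the square of the file-size ratio."""
    # Square-law growth: 10 times the file size costs about 100 times the time.
    return ref_minutes * (size_bytes / ref_size_bytes) ** 2


# Two 1MB files: the size ratio is 10.24, so the estimate is 5 * 10.24**2 minutes.
hours_for_1mb = estimate_minutes(1024 * 1024) / 60  # roughly 8.7 hours
```

Actual timings will depend on the machine and on the contents of the files, so treat the result as an order-of-magnitude guide.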

To reduce analysis times, you may extract the interesting section of a file, e.g. the header, and analyse only that. Another option is to convert your house into a large grid supercomputer, but this is more expensive.
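
One way to extract such a section is to copy the first few kilobytes of each file to a new file and pass those copies to FRET. The Python sketch below is a helper written for this manual, not part of FRET; the name extract_prefix and the 4096 byte default are assumptions.

```python
from pathlib import Path


def extract_prefix(src, dst, num_bytes=4096):
    """Copy the first `num_bytes` bytes of `src` to `dst`; return bytes written."""
    with open(src, "rb") as infile:
        chunk = infile.read(num_bytes)  # shorter than num_bytes for small files
    Path(dst).write_bytes(chunk)
    return len(chunk)
```

For example, calling extract_prefix("test/fileA.txt", "fileA.head") and doing the same for the second file produces two small files that can then be given to fret in place of the originals. The output name fileA.head is only an example.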


explanation of Risk

Each detected structure in a file (called a Gram in FRET terminology) is assigned a Risk. This is the probability that the structure occurred randomly within the file and is not a valid structure. The higher the Risk value, the greater the probability that the Gram is a "false positive". This value can be used to eliminate or rank results.
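
As a sketch of how Risk might be used to eliminate and rank results, the Python below parses FRET's table output and keeps only low-risk Grams. It assumes the seven-column layout shown in the example output earlier in this manual and filenames without spaces; the function names parse_grams and low_risk are hypothetical and are not part of FRET.

```python
def parse_grams(output: str):
    """Parse FRET's result table into a list of dicts, one per Gram."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly 7 fields and start with a numeric Gram ID;
        # the header line and blank lines are skipped.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        gram_id, gram_type, risk, offset, length, freq, filename = fields
        rows.append({
            "id": int(gram_id),
            "type": gram_type,
            "risk": float(risk),
            "offset": int(offset),
            "length": int(length),
            "freq": int(freq),
            "filename": filename,
        })
    return rows


def low_risk(rows, max_risk):
    """Keep Grams whose Risk is at most `max_risk`, lowest Risk first."""
    kept = [row for row in rows if row["risk"] <= max_risk]
    return sorted(kept, key=lambda row: row["risk"])
```

The threshold is a judgment call for each data set. A strict value such as 1e-6 would discard the offset_cur Gram in the earlier example, whose Risk is 3.750e-01.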


explanation of Freq

Each detected Gram in a file is assigned a Frequency. This is the number of occurrences of this Gram that have been detected. For file-specific Grams this will always be 1, but for generic Grams this value should be greater than 1, since generic Grams must exist in more than one file.