Digital Forensic Science. Vassil Roussev
• Intermediary images, I_n. Zero or more images recorded between the baseline and final images; I_n is the nth image acquired.
• Common baseline is a single image that is a common ancestor to multiple final images.
• Image delta, B – A, is the difference between two images, typically between the baseline image and the final image.
• The differencing strategy defines the rules for identifying and reporting the differences between two or more images.
• Feature, f, is a piece of data that is either directly extracted from the image (e.g., a file name or size) or computed from the content (e.g., a cryptographic hash).
• Feature in image, (A, f). Features are found in images; in this case, feature f is found in image A.
• Feature name, NAME (A, f). Every feature may have zero, one, or multiple names. For example, for a file content feature, we could use any of the file names and aliases under which it may be known in the host filesystem.
• Feature location, Loc(f), describes the address ranges from which the content of the particular feature can be extracted. The locations may be either physical or logical, depending on the type of image acquired.
• A feature extraction function, F(), performs the extraction/computation of a feature based on its location and content.
• Feature set, F(A), consists of the features extracted from an image A, using the extraction function F().
• The feature set delta, F(B) – F(A), contains the differences between the feature sets extracted from two images; the delta is not necessarily symmetric.
• Transformation sequence, R, consists of the sequence of operations that, when applied to A, produce B. For example, the Unix diff program can generate a patch file that can be used to transform a text file in this fashion. In general, R is not unique and there can be an infinite number of transformations that can turn A into B.
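The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that an image is modeled as a Python dict mapping a file path (the feature location) to the file's bytes, that the extraction function F() computes a SHA-256 hash of each file's content, and that a feature set maps each hash to the set of locations where that content was found. The helper names `extract_features` and `delta` are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical image model: path (feature location) -> file content.
Image = dict[str, bytes]

def extract_features(image: Image) -> dict[str, set[str]]:
    """Feature extraction function F(): map each content hash (the feature)
    to every location at which that content appears in the image."""
    features: dict[str, set[str]] = defaultdict(set)
    for path, content in image.items():
        digest = hashlib.sha256(content).hexdigest()
        features[digest].add(path)
    return dict(features)

def delta(fb: dict[str, set[str]], fa: dict[str, set[str]]) -> set[str]:
    """Feature set delta F(B) - F(A): features present in B but absent from A."""
    return set(fb) - set(fa)

A = {"/etc/motd": b"hello", "/tmp/x": b"old"}
B = {"/etc/motd": b"hello", "/home/u/new": b"new"}

FA, FB = extract_features(A), extract_features(B)
added = delta(FB, FA)    # contains only the hash of b"new"
removed = delta(FA, FB)  # contains only the hash of b"old"
```

Because `added` and `removed` are different sets, the sketch also shows why F(B) – F(A) and F(A) – F(B) must be tracked separately.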
Generalized Differential Analysis
As per [75], each feature has three pieces of metadata:
• Location: A mandatory attribute describing the address of the feature; each feature must have at least one location associated with it.
• Name: A human-readable identifier for the feature; this is an optional attribute.
• Timestamp(s) and other metadata: Features may have one, or more, timestamps associated with them, such as times of creation, modification, last access, etc. In many cases, other pieces of metadata (key-value pairs) are also present.
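This metadata model maps naturally onto a small record type. The following is a minimal sketch, assuming that locations and names are plain strings and that timestamps are stored as seconds since the epoch; the class and field names are hypothetical. `names` is a list because a feature may have zero, one, or multiple names, and the only enforced constraint is the mandatory location.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """A feature carrying location, name, and timestamp metadata.
    Class and field names are hypothetical, chosen for illustration."""
    value: str                                       # the feature, e.g., a content hash
    locations: list[str]                             # mandatory: one or more addresses
    names: list[str] = field(default_factory=list)   # optional: zero, one, or many names
    timestamps: dict[str, float] = field(default_factory=dict)  # e.g., {"mtime": ...}
    metadata: dict[str, str] = field(default_factory=dict)      # other key-value pairs

    def __post_init__(self) -> None:
        # Enforce the only mandatory attribute: at least one location.
        if not self.locations:
            raise ValueError("a feature must have at least one location")

f = Feature(value="abc123", locations=["/docs/report.txt"],
            names=["report.txt"], timestamps={"mtime": 1_700_000_000.0})
```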
Given this framework, differential analysis is performed not on the data images A and B, but on their corresponding feature sets, F(A) and F(B). The goal is to identify the operations that transform F(A) into F(B). These operations are termed change primitives; they seek to explain, or reproduce, the feature set changes.
In the general case, such changes are not unique as the observation points may fail to reflect the effects of individual operations which are subsequently overridden (e.g., any access to a file will override the value of the last access time attribute). A simple set of change inference rules is defined (Table 3.1) and formalized (Table 3.2) in order to bring consistency to the process. The rules are correct in that they transform F(A) into F(B) but do not necessarily describe the actual operations that took place. This is a fundamental handicap for any differential method; however, in the absence of complete operational history, it is the best that can be accomplished.
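The plain-English rules of Table 3.1 can be sketched as a function that compares two feature sets and emits change primitives. This is an illustrative simplification, not the formal rule set of Table 3.2: it assumes each feature set records, per feature, a set of locations and a set of names, and it only reports a MOVE when a single copy changed location. The `FeatureSet` container and `infer_changes` function are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureSet:
    """Hypothetical container: per feature, its locations and its names."""
    locs: dict[str, set[str]] = field(default_factory=dict)
    names: dict[str, set[str]] = field(default_factory=dict)

def infer_changes(fa: FeatureSet, fb: FeatureSet) -> list[tuple]:
    """Explain F(A) -> F(B) using the plain-English change detection rules."""
    changes: list[tuple] = []
    for f in sorted(fa.locs.keys() | fb.locs.keys()):
        la, lb = fa.locs.get(f, set()), fb.locs.get(f, set())
        if not la:                                   # did not exist, now it does
            changes.append(("CREATE", f))
            continue
        if not lb:                                   # existed before, now it does not
            changes.append(("DELETE", f))
            continue
        if len(la) == len(lb) == 1 and la != lb:     # sole copy is in a new location
            changes.append(("MOVE", f, min(la), min(lb)))
        elif len(lb) > len(la):                      # more copies exist
            changes.append(("COPY", f, sorted(lb - la)))
        elif len(lb) < len(la):                      # fewer copies: something got deleted
            changes.append(("DELETE", f, sorted(la - lb)))
        # Aliasing: names can be added or deleted independently of the content.
        na, nb = fa.names.get(f, set()), fb.names.get(f, set())
        changes += [("ADDNAME", f, n) for n in sorted(nb - na)]
        changes += [("DELNAME", f, n) for n in sorted(na - nb)]
    return changes

A = FeatureSet(locs={"h1": {"/a"}, "h2": {"/b"}, "h3": {"/c"}}, names={"h1": {"a"}})
B = FeatureSet(locs={"h1": {"/a", "/a2"}, "h2": {"/d"}, "h4": {"/e"}},
               names={"h1": {"a", "alias"}})
changes = infer_changes(A, B)
```

As the text cautions, the result is one correct way to turn F(A) into F(B), not necessarily what happened: the reported MOVE of `h2` could equally have been a copy followed by a deletion.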
Table 3.1: Change detection rules in plain English ([75], Table 1)
If something did not exist and now it does, it was created
If it is in a new location, it was moved
If it did exist before and now it does not, it was deleted
If more copies of it exist, it was copied
If fewer copies of it exist, something got deleted
Aliasing means names can be added or deleted
Table 3.2: Abstract rules for transforming A → B (A into B) based on observed changes to features, f, feature locations Loc (A, f), and feature names NAME (A, f). Note: The RENAME primitive is not strictly needed (as it can be modeled as ADDNAME followed by DELNAME), but it is useful to convey higher-level semantics ([75], Table 2).
If A and B are from the same system and T_A < T_B, it would appear that all new features in the feature set delta F(B) – F(A) should be timestamped after T_A. In other words, if any of these new features predate T_A or postdate T_B, this would rightfully be considered an inconsistency. An investigation should detect such anomalies and provide a sound explanation based on knowledge of how the target system operates. There is a range of possible explanations, such as:
Tampering. This is the easiest and most obvious explanation, although it is not necessarily the most likely one; common examples include the planting of new files with old timestamps and system clock manipulation.
System operation. The full effects of the underlying operation, even one as simple as copying a file, are not always obvious and require careful consideration. For example, the Unix cp command sets the creation time of the new copy to the time of the operation but will keep the original modification time if the -p option is used.
Time tracking errors. It has been shown [127, 167] that operating systems can introduce inconsistencies during normal operation due to rounding and implementation errors. It is worth noting that, in many cases, the accuracy of a recorded timestamp is of little importance to the operation of the system; therefore, perfection should not be assumed blindly.
Tool error is always a possibility; like all software, forensic tools have bugs and these can manifest themselves in unexpected ways.
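The timestamp consistency check described above can be sketched as a filter over the new features in F(B) – F(A). The sketch assumes each new feature carries a single timestamp in seconds; the `NewFeature` record, the `flag_inconsistent` function, and its `tolerance` parameter are hypothetical. The tolerance widens the [T_A, T_B] window to absorb small time tracking errors such as rounding.

```python
from dataclasses import dataclass

@dataclass
class NewFeature:
    """Hypothetical record for a feature found in F(B) - F(A)."""
    name: str
    timestamp: float  # e.g., modification time, in seconds

def flag_inconsistent(new_features: list[NewFeature], t_a: float, t_b: float,
                      tolerance: float = 0.0) -> list[NewFeature]:
    """Return new features whose timestamps fall outside [t_a, t_b],
    widened by `tolerance` on both sides. Each hit is an anomaly that
    needs an explanation, not proof of tampering."""
    lo, hi = t_a - tolerance, t_b + tolerance
    return [f for f in new_features if not (lo <= f.timestamp <= hi)]

t_a, t_b = 1_000.0, 2_000.0
new_in_b = [NewFeature("log.txt", 1_500.0),
            NewFeature("planted.doc", 500.0),    # predates T_A
            NewFeature("future.tmp", 2_500.0)]   # postdates T_B
suspicious = flag_inconsistent(new_in_b, t_a, t_b)
```

Each flagged item still has to be matched against the explanations above; for instance, a file copied with cp -p between the two acquisitions would be flagged because its preserved modification time predates T_A, which is a system operation effect rather than tampering.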
One important practical concern is how to report the extracted feature changes. Performing a comprehensive differential analysis of, for example, two hard disk snapshots is likely to produce an enormous number of individual results that can overwhelm the investigator. It is critical that differencing tools provide the means to improve the quality and relevance of the results. This can be accomplished in a number of ways: (a) filtering of irrelevant information; (b) aggregation of the results to highlight both the common and the exceptional cases; (c) progressive disclosure of information, where users can start at the aggregate level and use queries and hierarchies to drill down to the needed level of detail; and (d) timelining, which provides a chronological ordering of the (relevant) events.
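The four reporting techniques can be combined in a few lines. This is a sketch under stated assumptions: the `Change` record, the `report` and `drill_down` functions, and the default list of ignored path prefixes are all hypothetical, and aggregation is done simply by operation and top-level directory.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Change:
    """Hypothetical change record produced by a differencing tool."""
    op: str          # e.g., "CREATE", "DELETE", "MOVE"
    path: str
    timestamp: float

def report(changes: list[Change], ignore_prefixes=("/tmp/", "/var/cache/")):
    # (a) Filtering: drop changes under locations assumed to be irrelevant.
    relevant = [c for c in changes if not c.path.startswith(ignore_prefixes)]
    # (b) Aggregation: count each operation per top-level directory.
    summary = Counter((c.op, "/" + c.path.strip("/").split("/")[0]) for c in relevant)
    # (d) Timelining: chronological order of the remaining events.
    timeline = sorted(relevant, key=lambda c: c.timestamp)
    return summary, timeline

def drill_down(timeline: list[Change], top_dir: str) -> list[Change]:
    # (c) Progressive disclosure: from an aggregate bucket down to its events.
    return [c for c in timeline if c.path == top_dir or c.path.startswith(top_dir + "/")]

changes = [Change("CREATE", "/home/u/notes.txt", 30.0),
           Change("DELETE", "/tmp/session.lock", 10.0),
           Change("CREATE", "/home/u/draft.txt", 20.0),
           Change("MOVE", "/etc/hosts", 5.0)]
summary, timeline = report(changes)
home_events = drill_down(timeline, "/home")
```

The investigator starts from `summary` (two creations under /home, one move under /etc) and only expands the buckets that look relevant.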
3.3.2 COMPUTER HISTORY MODEL
Differential analysis offers a relatively simple view of forensic inference, by focusing on the beginning and end state of the data, and by expressing the difference in terms of a very small set of primitive operations. The computer history model (CHM) [27]—one