Multimedia Security, Volume 1. William Puech
1.4.2.2. Double demosaicing detection
Another method directly detects the CFA pattern used in the image (Kirchner 2010). To do so, the image is remosaiced and demosaiced in each of the four possible positions with a simple algorithm such as bilinear interpolation. The reasoning is that demosaicing should produce an image closer to the original when the image is remosaiced in the correct position. The residuals are then compared to determine which position of the CFA was used. Since CFA artifacts are generally more visible in the green channel, the position of the sampled green pixels is decided first, before the red and blue channels are used to decide between the two remaining positions, a paradigm that has been used in most publications since then. The use of the bilinear algorithm limits this method in the same way as those of Popescu and Farid (2005) and González Fernández et al. (2018): the bilinear algorithm is linear and processes the color channels independently, two properties that most modern demosaicing algorithms do not share. However, the method itself does not depend on the choice of algorithm, and could therefore provide very good results should the originally used demosaicing algorithm be known.
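The remosaic-and-compare idea can be sketched in a few lines. The following is a minimal illustration rather than the implementation of Kirchner (2010): it handles a single channel sampled once per 2 × 2 block (as red or blue would be), uses plain Python lists for readability, and the names `bilinear_reinterpolate`, `residual` and `estimate_offset` are hypothetical.

```python
# The four candidate positions of a sample inside a 2 x 2 CFA block.
OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def bilinear_reinterpolate(channel, oy, ox):
    """Keep only the pixels at offset (oy, ox) and rebuild the others.

    Each discarded pixel becomes the mean of the kept pixels found in its
    3 x 3 neighbourhood, which is bilinear interpolation on this lattice.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (y - oy) % 2 == 0 and (x - ox) % 2 == 0:
                out[y][x] = float(channel[y][x])  # "sampled" pixel, kept
                continue
            samples = [
                channel[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy - oy) % 2 == 0 and (xx - ox) % 2 == 0
            ]
            out[y][x] = sum(samples) / len(samples)
    return out


def residual(channel, oy, ox):
    """Mean squared difference between the channel and its re-demosaiced copy."""
    rebuilt = bilinear_reinterpolate(channel, oy, ox)
    h, w = len(channel), len(channel[0])
    total = sum(
        (rebuilt[y][x] - channel[y][x]) ** 2 for y in range(h) for x in range(w)
    )
    return total / (h * w)


def estimate_offset(channel):
    """Return the candidate position with the smallest residual."""
    return min(OFFSETS, key=lambda offset: residual(channel, *offset))
```

On a channel that was itself produced by bilinear interpolation, the residual at the true position is zero. With any other demosaicing algorithm the minimum is only approximate, which is the limitation discussed above.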
1.4.2.3. Direct detection of the grid by intermediate values
To break away from any specific algorithm, Choi et al. (2011) highlight that pixels are more likely to take locally extreme values in the channel in which they are sampled and, conversely, to take intermediate values where they are interpolated. They therefore count the number of intermediate values in each of the four positions to decide which position is the correct one. The underlying assumption holds for most algorithms, which is why this method achieves good classification scores. However, the probability bias can be reversed when an algorithm makes heavy use of the high frequencies of other channels, which can lead to confident but incorrect detections in certain regions of the image.
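The counting step can be illustrated as follows. This is a simplified sketch and not the exact procedure of Choi et al. (2011): it works on one channel sampled once per 2 × 2 block, defines "intermediate" as strictly between the minimum and maximum of the eight neighbours, and compares fractions rather than raw counts so that image borders do not bias the result. All function names are hypothetical.

```python
# The four candidate positions of a sample inside a 2 x 2 CFA block.
OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def is_intermediate(channel, y, x):
    """True if the pixel lies strictly between its 8 neighbours' min and max."""
    neighbours = [
        channel[yy][xx]
        for yy in (y - 1, y, y + 1)
        for xx in (x - 1, x, x + 1)
        if (yy, xx) != (y, x)
    ]
    return min(neighbours) < channel[y][x] < max(neighbours)


def intermediate_fraction(channel):
    """Fraction of intermediate pixels for each 2 x 2 position (interior only)."""
    h, w = len(channel), len(channel[0])
    hits = {offset: 0 for offset in OFFSETS}
    totals = {offset: 0 for offset in OFFSETS}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            offset = (y % 2, x % 2)
            totals[offset] += 1
            hits[offset] += is_intermediate(channel, y, x)
    return {offset: hits[offset] / totals[offset] for offset in OFFSETS}


def estimate_sampled_offset(channel):
    """The sampled position is expected to show the fewest intermediate values."""
    fractions = intermediate_fraction(channel)
    return min(fractions, key=fractions.get)
```

Because no interpolation model is assumed, the same code applies to any demosaicing algorithm, at the price of the possible bias reversal described above.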
1.4.2.4. Detecting the variance of the color difference
Shin et al. (2017) attempt to avoid the assumption that color channels are processed independently. Instead of working on each channel separately, as had been done until then, they work on the difference between the green and red channels, as well as between the green and blue channels. This reflects more accurately the operations performed by many demosaicing algorithms, which first interpolate the green channel and then use its information to interpolate the red and blue channels. They compute the variance of these differences in the four possible patterns on the two computed maps, and identify the correct pattern as the one featuring the highest variance, as expected of the original pattern, whose pixels are all sampled rather than interpolated. Although the dependence between color channels is hard-coded, the color difference is actually used in many current algorithms, and this method represents a first step toward a full understanding of demosaicing artifacts.
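A minimal sketch of the variance comparison is given below. It is an illustration under simplifying assumptions, not the full method of Shin et al. (2017): it treats one difference map at a time (green minus red, or green minus blue) and returns the 2 × 2 position with the highest variance as the presumed sampled position of the non-green channel. The function names are hypothetical.

```python
from statistics import pvariance

# The four candidate positions of a sample inside a 2 x 2 CFA block.
OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def difference_map(green, other):
    """Pixel-wise difference between the green channel and another channel."""
    return [
        [g - o for g, o in zip(green_row, other_row)]
        for green_row, other_row in zip(green, other)
    ]


def variance_by_offset(diff):
    """Variance of the difference map restricted to each 2 x 2 position."""
    values = {offset: [] for offset in OFFSETS}
    for y, row in enumerate(diff):
        for x, d in enumerate(row):
            values[(y % 2, x % 2)].append(d)
    return {offset: pvariance(v) for offset, v in values.items()}


def estimate_sampled_offset(green, other):
    """Interpolation averages values and lowers variance, so pick the maximum."""
    variances = variance_by_offset(difference_map(green, other))
    return max(variances, key=variances.get)
```

Calling `estimate_sampled_offset(green, red)` and `estimate_sampled_offset(green, blue)` gives one hypothesis per map. Combining the two into a single pattern decision is left out of this sketch.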
1.4.2.5. Detection by neural networks of the relative position of blocks
More recently, Bammey et al. (2020) proposed to train a self-supervised convolutional neural network (CNN) to detect the modulo-(2, 2) position of the blocks in the image. As CNNs are invariant to translation, they need to rely on image information to detect this position. Demosaicing artifacts, and to some extent JPEG artifacts, are the only relevant information a network can use to this end. As a result, training a network to detect this position implicitly makes it analyze demosaicing artifacts, which leads to a local detection of the position of the Bayer matrix. Erroneous outputs of the network are caused by inconsistencies in the image's mosaic, and can thus be seen as traces of forgery.
This method obtains better results than previous works, and can help further analyze the forgery, as different kinds of forgeries cause different artifacts. For instance, a copy-move causes a locally consistent shift in the network's output, whereas inpainting – usually performed by cloning multiple small patches onto the target area – may show each cloned patch detected with a different pattern. Other manipulations, such as blurring, or the copy-move of an image that features no mosaic – for instance due to downsampling – may locally remove the mosaic, in which case the output of the network is noise-like in the forged region. Even better results can be achieved with internal learning, that is, by retraining the network directly on the images under study. This lets the network adapt to different post-processing operations, most importantly to JPEG compression.
However, this method is more computationally intense than the other presented algorithms, especially when internal learning is needed. This makes it less practical to use when many images are to be analyzed.
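The way a copy-move appears as a locally consistent shift can be illustrated without any network. The sketch below is an assumption-laden toy, not the post-processing of Bammey et al. (2020): it takes a hypothetical map of predicted modulo-(2, 2) positions (what the argmax of such a network might output), converts each prediction into the grid shift it implies, and flags the pixels that disagree with the dominant shift.

```python
from collections import Counter


def implied_shift(y, x, predicted):
    """Grid shift implied by predicting position `predicted` at pixel (y, x)."""
    py, px = predicted
    return ((py - y) % 2, (px - x) % 2)


def flag_inconsistent(pred_map):
    """Return the dominant grid shift and a mask of pixels that contradict it.

    `pred_map[y][x]` is a hypothetical per-pixel prediction (row mod 2,
    column mod 2). In an untouched image, even a cropped one, every pixel
    implies the same shift.
    """
    shifts = [
        [implied_shift(y, x, predicted) for x, predicted in enumerate(row)]
        for y, row in enumerate(pred_map)
    ]
    counts = Counter(shift for row in shifts for shift in row)
    main_shift, _ = counts.most_common(1)[0]
    mask = [[shift != main_shift for shift in row] for row in shifts]
    return main_shift, mask
```

A region pasted from a source location with a different parity shows up as a connected area sharing one alternative shift, whereas a region with no mosaic would yield scattered, noise-like flags.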
1.4.3. Limits of demosaicing detection
Recent methods such as those of Choi et al. (2011), Shin et al. (2017) or Bammey et al. (2020) analyze the mosaic of images well enough for practical applications. It is now possible to detect the position of the Bayer matrix, even locally. Detecting the presence of demosaicing artifacts is generally easy, although their absence is not necessarily a sign of falsification, because most modern demosaicing algorithms leave little to no artifacts in easy-to-interpolate regions. However, the range of images that can be analyzed remains limited. Demosaicing artifacts are 2-periodic and reside in the highest frequencies. As a result, they are entirely lost when the image is downsampled by a factor of at least 2. More generally, resizing an image also rescales its demosaicing artifacts; even when these are not lost, detection methods need to be adapted to the new frequencies of the artifacts. JPEG compression is an even more important limitation. As compression mainly reduces the precision of the high-frequency components of an image, demosaicing artifacts are easily lost in compressed images. To date, even the best methods struggle to analyze CFA artifacts at a relatively high quality factor of 95. The internal learning presented in Bammey et al. (2020) provides some robustness to JPEG compression; nevertheless, demosaicing artifact detection remains limited to high-quality images that are uncompressed or barely compressed, and at full resolution. It thus complements the detection of JPEG compression, which we will now present.
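The loss of 2-periodic traces under downsampling can be checked on a toy example. The sketch below assumes a 2 × 2 box filter as the downsampling method and models the artifact as a purely additive pattern that depends only on the position modulo 2. Both are simplifications chosen for illustration, and the function names are hypothetical.

```python
# The four positions inside a 2 x 2 block.
OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def offset_means(image):
    """Mean pixel value at each of the four modulo-(2, 2) positions."""
    sums = {offset: 0.0 for offset in OFFSETS}
    counts = {offset: 0 for offset in OFFSETS}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            sums[(y % 2, x % 2)] += value
            counts[(y % 2, x % 2)] += 1
    return {offset: sums[offset] / counts[offset] for offset in OFFSETS}


def periodicity_strength(image):
    """Spread between the four position means: a crude 2-periodicity measure."""
    means = offset_means(image).values()
    return max(means) - min(means)


def downsample_by_2(image):
    """Average each non-overlapping 2 x 2 block (height and width assumed even)."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]
```

Every output pixel averages exactly one period of the pattern, so the periodic component collapses to a constant and the measure drops to zero. Plain decimation has the same effect, since it keeps a single position of each 2 × 2 block.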
1.5. JPEG compression, its traces and the detection of its alterations
In this section, we seek to determine the compression history of an image. We will focus on the JPEG algorithm, which is nowadays the most common method for storing images. Most cameras use this format, but others exist, such as HEIF, used in particular in Apple products since 2017. HEIF is also a lossy compression algorithm and therefore leaves traces; nevertheless, these traces differ from the ones produced by JPEG. As we will see, analyzing the JPEG coding of an image makes it possible to detect local manipulations. To do so, the methods take advantage of the structured loss of information caused by this step in the processing chain.
1.5.1. The JPEG compression algorithm
In JPEG encoding, the division of the image into 8 × 8 blocks and the application of a quantization step lead to the appearance of discontinuities at the edges of these blocks in the decompressed image.
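The origin of these discontinuities can be reproduced with a stripped-down model of the encoder. The sketch below is not a JPEG codec: it assumes a single grayscale channel whose dimensions are multiples of 8, replaces the quantization matrix with one uniform step `step` for all coefficients, and omits entropy coding. The function names are hypothetical.

```python
import math

N = 8  # JPEG block size

# Orthonormal DCT-II basis: C[u][x] is frequency u evaluated at sample x.
C = [
    [
        (math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        for x in range(N)
    ]
    for u in range(N)
]


def matmul(a, b):
    """Product of two N x N matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def compress_block(block, step):
    """2D DCT, uniform quantization of every coefficient, inverse 2D DCT."""
    coeffs = matmul(matmul(C, block), transpose(C))
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    return matmul(matmul(transpose(C), quantized), C)


def compress_image(image, step):
    """Process each 8 x 8 block independently, as JPEG does."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, N):
        for bx in range(0, w, N):
            block = [row[bx:bx + N] for row in image[by:by + N]]
            rebuilt = compress_block(block, step)
            for y in range(N):
                out[by + y][bx:bx + N] = rebuilt[y]
    return out


def mean_horizontal_jump(image, at_block_edge):
    """Mean absolute difference between horizontal neighbours, on or off block edges."""
    jumps = [
        abs(row[x] - row[x - 1])
        for row in image
        for x in range(1, len(row))
        if (x % N == 0) == at_block_edge
    ]
    return sum(jumps) / len(jumps)
```

On a smooth horizontal ramp compressed with a coarse step, `mean_horizontal_jump(..., True)` comes out much larger than `mean_horizontal_jump(..., False)`: each block is quantized without regard to its neighbours, so the error concentrates at block borders.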
Figure 1.9 shows the blocking effect that appears after JPEG compression. Contrast enhancement allows us to clearly see the 8 × 8 blocks. The greatest loss of information occurs during the quantization step, explored in more detail in section 1.2.4. The blocking effect is due to this quantization, which depends on the Q parameter and is applied to every 8 × 8 block. Therefore, standard JPEG compression leaves two characteristic traces: the division into non-overlapping 8 × 8 blocks and the quantization of the DCT coefficients according to a quantization matrix. In other words, the two features to be detected from the