Due to improvements in digital technologies, image forgeries are increasingly believable and easy to produce. In some situations, including forensic settings, it is necessary to detect and localize such manipulations. This paper presents a robust technique for identifying forgeries in uncompressed images.
Cameras introduce specific statistical correlations in images. One such identifiable pattern is caused by color filter arrays (CFA). Typically, each pixel sensor measures the value of only one color channel. The most common mosaic arrangement is the Bayer filter, shown in Figure 1. To produce red, green, and blue (R, G, B) values at each pixel, demosaicing algorithms interpolate the missing color channel values from combinations of neighboring pixels' color channels. The authors of  suggest that it is sufficient to treat the color channels independently and to assume that a linear combination is used for interpolation. While we adopt the linear model in this paper, we show that accounting for the dependence between color channels produces more accurate results.
In the seminal paper by Farid , the assumption of linearity allows an expectation-maximization (EM) algorithm to be used. The EM algorithm predicts whether each pixel is likely to be a linear combination of its neighboring pixels in two dimensions. Periodicities in this likelihood map appear as strong peaks in the Fourier magnitude spectrum. In tampered regions, these peaks appear in different locations, or not at all. A follow-up paper  extends this technique to look specifically for the periodicities introduced by CFA demosaicing algorithms. In tampered regions, the correlations introduced by the CFA are obscured. This paper attempts to reproduce the results of  and provides refinements that improve the efficacy of forgery detection.
An EM algorithm performs unsupervised clustering of a data set. Its power derives from iterative alternation between an expectation (E) step and a maximization (M) step. In the E step, the likelihood that each data point belongs to each model is computed given the current model parameters. In the M step, the model parameters are re-estimated to maximize the expected likelihood, given the probability that each data point belongs to each model.
As described in , our aim in detecting image forgeries is to determine the likelihood that each pixel in the image belongs to the sampled or the unsampled cluster. Pixels in the sampled cluster possess independent values measured directly by the camera's sensor. Pixels in the unsampled cluster, by contrast, are approximately a structured linear combination of their neighbors' values.
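As a concrete illustration of the E/M alternation, the following is a deliberately simplified single-channel sketch. The four-neighbor linear model, the flat density for the sampled cluster, and all parameter values here are illustrative assumptions, not the exact model used in our implementation:

```python
import numpy as np

def em_unsampled_map(channel, iters=10, p0=1.0 / 256):
    """Toy EM sketch: each pixel is either a linear combination of its
    four axial neighbors (unsampled) or drawn from a flat density (sampled)."""
    f = np.asarray(channel, dtype=float)
    pad = np.pad(f, 1, mode='edge')
    # neighbor stack: up, down, left, right
    X = np.stack([pad[:-2, 1:-1], pad[2:, 1:-1],
                  pad[1:-1, :-2], pad[1:-1, 2:]], axis=-1).reshape(-1, 4)
    y = f.ravel()
    alpha = np.full(4, 0.25)       # interpolation weights
    sigma = y.std() + 1e-6         # residual spread of the linear model
    for _ in range(iters):
        r = y - X @ alpha
        # E step: posterior that each pixel follows the linear model
        like = np.exp(-r**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        w = like / (like + p0)
        # M step: weighted least squares for alpha, weighted std for sigma
        sw = np.sqrt(w)
        alpha = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        sigma = np.sqrt(np.sum(w * r**2) / (np.sum(w) + 1e-12)) + 1e-6
    return w.reshape(f.shape)      # high w => likely interpolated
```

Iterating the two steps until the weights stabilize yields the per-pixel likelihood map used throughout the rest of the paper.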
In an attempt to capture the effect of multiple sampling methods within an image, we developed a multi-model extension to the EM approach proposed in . Our generalized algorithm can classify pixels into one of K classes, whereas the authors of  chose the specific case K = 2. For example, the multi-model approach can distinguish pixels belonging to the green color filter demosaicing pattern from pixels belonging to the red pattern and from unsampled pixels. Experimentally, however, we determined that the multi-model approach is much more computationally intensive and can be accurately approximated by the simpler K = 2 case. In the remainder of this paper, we therefore assume that pixels are classified only as sampled or unsampled.
In this paper, a raw image of dimension [n x m x 3] is passed into the EM algorithm, and the result is a likelihood map of dimension [n x m x 3] giving the probability that each pixel in the image is unsampled. An example is shown in Figure 2. As mentioned in the Introduction, our method does not assume independence between the three color channels. As a result, although our method is more computationally intensive than that proposed in , it more accurately represents actual demosaicing algorithms. Although difficult to see in the spatial representation, definitive periodicities exist in likelihood maps such as the one in Figure 2. However, due to details of the underlying scene and noise effects, other artifacts are also visible in the EM likelihood map.
Discrete Fourier Transform
To uncover the underlying periodicities in the likelihood map, we utilize the two-dimensional discrete Fourier transform (DFT), computed with the fast Fourier transform (FFT). The formula for the two-dimensional Fourier transform is
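For an M x N likelihood map L(x, y), it takes the standard form

```latex
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} L(x, y)\,
          e^{-j 2\pi \left( \frac{u x}{M} + \frac{v y}{N} \right)}
```

where u = 0, ..., M-1 and v = 0, ..., N-1 index the two frequency dimensions.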
Projecting the likelihood map onto sinusoidal basis functions is intended to make the periodicities more apparent.
Different CFA demosaicing algorithms introduce telltale periodicities as shown in . In the frequency domain, these periodicities appear as strong peaks in the magnitude spectrum. Typically, these peaks appear at predictable locations such as when one or both dimensions are at the Nyquist frequency. Figure 3 below shows an example of peaks occurring at Nyquist frequencies in the Fourier domain. Just as in , we have upsampled the likelihood map by a factor of two before taking the Fourier transform. The effect is that the frequency content shrinks inward toward zero to make the peaks more visible for the reader. This operation is performed for visualization purposes only.
While the two-dimensional Fourier transform is a powerful tool for exposing periodicities in a two-dimensional map, it provides no information about where in the underlying map the frequency content originates. Therefore, we develop a raster-scanning window approach to give the tampering-localization process spatial resolution. The window is scanned across the image to analyze the periodicities of the likelihood map at different locations. We parameterize this square window by its side length (window_size) and the amount by which successive windows shift in each dimension (hop_size).
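The raster scan itself can be sketched in a few lines (a generic sliding-window generator; the names window_size and hop_size match the parameters above):

```python
import numpy as np

def raster_windows(likelihood_map, window_size, hop_size):
    """Yield (row, col, patch) for each raster-scanner position.
    Each patch is a window_size x window_size view into the map."""
    h, w = likelihood_map.shape[:2]
    for r in range(0, h - window_size + 1, hop_size):
        for c in range(0, w - window_size + 1, hop_size):
            yield r, c, likelihood_map[r:r + window_size, c:c + window_size]
```

Overlapping windows (hop_size < window_size) trade extra computation for finer localization of any tampered region.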
At each window location, we extract a portion of the likelihood map of dimension window_size by window_size. We then follow the method described in  to compute the magnitude of the Fourier transform of the patch. First, we apply a radially-symmetric Hann windowing function to decrease the effects of edge artifacts on the patch. This windowing function is described by
and 1 otherwise, where r is the radial distance from the center of the patch, defined by
The two-dimensional Fourier transform of the windowed patch is then computed and resized to fft_size by fft_size. A radially symmetric high-pass filter is applied in the frequency domain to attenuate the lowest-frequency components of the transform. These components are superfluous and not descriptive of the periodic nature of the patch. The DC component, in particular, is an artifact of the data values not being symmetric about zero. The high-pass filter is built from the normalized two-dimensional Gaussian, described by
where the Gaussian is centered in the middle of the magnitude spectrum image and σ is the standard deviation of the two-dimensional Gaussian.
Lastly, we perform gamma correction with a factor of 2 on the high-pass-filtered spectrum and take its magnitude. Figure 4 shows the raster scanner at one location in the image and the corresponding Fourier magnitude spectrum for each color channel of the window. In Figure 4, the green box denotes the location of the tampered region, and the blue box denotes the current location of the raster scanner.
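The per-patch processing chain can be sketched as follows. The plain radial Hann taper, the one-minus-Gaussian high-pass construction, and the default fft_size, sigma, and gamma values are assumptions for illustration, not the exact definitions used in our implementation:

```python
import numpy as np

def patch_spectrum(patch, fft_size=128, sigma=2.0, gamma=2.0):
    """Sketch of the chain: radial Hann-style taper, zero-padded 2-D FFT,
    Gaussian high-pass, then gamma correction of the magnitude."""
    n = patch.shape[0]
    yy, xx = np.mgrid[:n, :n]
    # radial distance from the patch center, normalized to ~[0, 1]
    r = np.hypot(yy - (n - 1) / 2, xx - (n - 1) / 2) / ((n - 1) / 2)
    taper = np.where(r < 1, 0.5 * (1 + np.cos(np.pi * np.clip(r, 0, 1))), 0.0)
    spec = np.fft.fftshift(np.fft.fft2(patch * taper, s=(fft_size, fft_size)))
    fy, fx = np.mgrid[:fft_size, :fft_size]
    d2 = (fy - fft_size / 2) ** 2 + (fx - fft_size / 2) ** 2
    lowpass = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian centered at DC
    highpassed = np.abs(spec) * (1 - lowpass)    # attenuate low frequencies
    return highpassed ** (1 / gamma)             # compress dynamic range
```

The gamma step only rescales the magnitudes for display and metric computation; it does not change where the peaks lie.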
The results of the forgery detection algorithm are highly sensitive to the window_size and hop_size parameters of the raster-scanning window. The hop_size parameter directly affects the spatial resolution of the detection algorithm: decreasing hop_size improves the results, at the expense of additional computation.
The window_size parameter, however, must be chosen carefully to balance the trade-off between spatial and frequency resolution. A window_size that is too large makes it difficult to localize tampered regions within the window. Conversely, a smaller window_size improves spatial resolution but suffers from stronger windowing effects in the spatial domain; as a result, the Fourier transform of a small window lacks finer frequency detail.
It is often possible for a human to visually detect the difference between tampered and untampered regions by examining the magnitudes of the Fourier transforms. However, it is our goal in this paper to automate this detection process. To this end, we seek to identify features in the magnitude Fourier spectrum that distinguish a region as tampered or untampered.
It should be noted that CFA demosaicing algorithms introduce energy in only a subset of the frequency components of the magnitude spectrum. It is these frequency elements that we consider in determining the authenticity of a patch. For computational reasons, it is desirable to reduce the dimensionality of the magnitude Fourier spectra before attempting classification.
Principal Component Analysis (PCA) is considered in an attempt to automatically identify the most important dimensions of the magnitude Fourier spectrum. In general, PCA maps a set of potentially correlated variables into a set of linearly uncorrelated variables, allowing the dimensionality of the data to be dramatically reduced by keeping only the leading components.
Theoretically, this procedure would have the desired effect of decomposing the magnitude Fourier spectrum into its most salient components. However, it was experimentally determined that the additional computational cost required to run the PCA algorithm on the full dimensional data outweighed its benefits. Instead, we develop a novel approach inspired by , where various metrics on the magnitude Fourier spectrum are designed to extract prominent features. These metrics involve peak-based, variance-based, and template-matching methods. For example, Figure 5 shows the template used in the template-matching metric. The intuition behind this particular metric is to expose patches whose Fourier transforms contain a large portion of their energy around the Nyquist frequencies.
In practice, these metrics are computed very efficiently, and they map the high-dimensional magnitude Fourier spectrum into a much lower-dimensional representation. Using the metric illustrated in Figure 5, the dimensionality of the data is reduced from [fft_size by fft_size] to 1, a reduction of more than three orders of magnitude.
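As an illustration of such a metric, the sketch below measures the fraction of spectral energy in small neighborhoods around the Nyquist locations of an fftshifted magnitude spectrum. The specific peak locations and neighborhood radius are illustrative assumptions, not our exact template:

```python
import numpy as np

def nyquist_peak_metric(mag_spec, radius=2):
    """Fraction of total spectral energy near the Nyquist locations of
    a square, fftshifted magnitude spectrum (higher => stronger peaks)."""
    n = mag_spec.shape[0]
    c = n // 2
    # after fftshift, the Nyquist frequency sits at index 0 along each axis
    peaks = [(0, c), (c, 0), (0, 0)]
    yy, xx = np.mgrid[:n, :n]
    mask = np.zeros_like(mag_spec, dtype=bool)
    for py, px in peaks:
        mask |= (np.abs(yy - py) <= radius) & (np.abs(xx - px) <= radius)
    return mag_spec[mask].sum() / (mag_spec.sum() + 1e-12)
```

An untampered CFA-interpolated patch concentrates energy at these locations, so a low score flags the patch as suspicious.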
Having computed metrics that compactly represent the training patches, a predictive mechanism is needed to classify a query patch as tampered or untampered. One popular binary classifier is the Support Vector Machine (SVM). After training on a representative data set, an SVM builds a model that classifies new data into one of two categories. Figure 6 illustrates a trained SVM in a two-dimensional space, using two of our metrics to classify query patches. Green dots indicate patches labeled untampered; red dots indicate patches labeled tampered. Black circles surround the data points used as support vectors in computing the linear classifying hyperplane.
Support Vector Machines are designed to find an optimal separating hyperplane between two clusters of data. However, this is not always the desired behavior. In tampering detection, for instance, false positives should be penalized more heavily than false negatives, and such penalty relationships are difficult to incorporate into an SVM framework.
We develop an alternate framework in one dimension, using our highest-performing metric, shown in Figure 5. This framework allows direct tuning of the allowed false positive rate. In , the authors recommend allowing a 1% false positive rate. To achieve the desired rate, we design a thresholding algorithm that maximizes the number of fake patches correctly classified while keeping the misclassification rate of real patches below the specified limit. Figure 7 shows an example of a one-dimensional spread of metric scores, their classification labels, and the resulting threshold.
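A minimal sketch of such a thresholding rule, assuming higher metric scores indicate tampering (the quantile-based construction is one simple way to enforce the false-positive cap, not necessarily our exact procedure):

```python
import numpy as np

def pick_threshold(real_scores, fake_scores, max_fpr=0.01):
    """Choose the threshold classifying the most fake patches as fake
    while misclassifying at most max_fpr of the real patches."""
    real = np.sort(np.asarray(real_scores, dtype=float))
    # smallest threshold keeping the real false-positive rate <= max_fpr
    k = int(np.ceil(len(real) * (1 - max_fpr)))
    thresh = real[min(k, len(real) - 1)]
    tpr = np.mean(np.asarray(fake_scores, dtype=float) > thresh)
    return thresh, tpr
```

The returned true-positive rate shows how many fake patches survive the constraint; loosening max_fpr trades real-patch safety for fake-patch recall.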
One potential problem during training occurs when the raster scanner extracts patches from the likelihood map that correspond partially to untampered regions and partially to tampered regions. Such a patch possesses some characteristics of both types of region, so assigning it a binary classification label is not advisable. Instead, for robustness, we develop a continuous-valued labeling strategy based on the proportion of each training patch that overlaps a tampered region. For example, a patch whose window overlapped one quarter of a tampered region and three quarters of a real region would receive a continuous label three times as close to "real" as to "fake." This continuum of labels appears as a continuum of red and green colors in Figure 7.
We use an image data set provided by the authors of  to train and test the algorithm described in this paper. The results below correspond to training the thresholding algorithm on 90% of the data set and then querying the trained algorithm with the remaining 10%. The modifications between the example real and fake images, too small to be apparent to the human eye, are shown in Figure 8; these pixel-level differences are amplified by a gamma factor of 8 for visibility.
The efficacy of the method can be visualized with a confusion matrix, which tabulates the numbers of true positives, false positives, true negatives, and false negatives. We define a true positive as a tampered image declared tampered, a false negative as a tampered image declared untampered, and so on. Figure 9 shows the confusion matrix computed from the testing results.
Figure 9. The confusion matrix
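The counts in such a matrix can be tallied with a simple helper (a generic sketch, with "positive" meaning tampered; not tied to our data):

```python
import numpy as np

def confusion_matrix(truth, pred):
    """2x2 confusion matrix. Rows: actual (tampered, untampered);
    columns: predicted (tampered, untampered)."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.sum(truth & pred)      # tampered, declared tampered
    fn = np.sum(truth & ~pred)     # tampered, declared untampered
    fp = np.sum(~truth & pred)     # untampered, declared tampered
    tn = np.sum(~truth & ~pred)    # untampered, declared untampered
    return np.array([[tp, fn], [fp, tn]])
```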
Note that both the false positive and false negative rates are smaller than the true positive and true negative rates, an indication of acceptable algorithm performance. Furthermore, the false positive limitation manifests itself as a very small value in the confusion matrix. We should point out that the false positive rate requirement was set per patch, not per image, and an image is declared tampered if just one of its patches is declared fake. Therefore, the 10% image-level false positive rate shown in the confusion matrix does not contradict the 1% per-patch condition posited in the thresholding section. Because no limitation was placed on the false negative rate, that rate is roughly twice as high, which we consider acceptable.
Our algorithm performs well when the tampered region has dimensions of approximately window_size by window_size. In general, the smaller the tampered region relative to the raster-scanning window, the worse the algorithm performs. Figures 10 and 11 show a case where our algorithm correctly identifies a tampered region in an image and correctly identifies no tampered regions in the untampered counterpart image. Notice the similarity between Figures 8 and 10: these two figures derive from the same tampered image, so it is expected that the detected tampered region in Figure 10 lies spatially within the green rectangle of Figure 8.
Figures 12 and 13, on the other hand, show a case where a small tampered region cannot be detected by our algorithm. In these figures, the actual tampered regions are denoted by green rectangles, and our algorithm's tampering detections are denoted by black regions.
After running our proposed algorithm on the test images supplied after the presentation, a few key results deserve comment. First, our algorithm suffers when applied to JPEG-compressed images. The artifacts introduced by the quantization and macro-blocking steps of the compression procedure greatly modify the periodicities introduced by the CFA demosaicing algorithm. Furthermore, the JPEG quantization step can be viewed as lowpass filtering in the frequency domain. Because our algorithm looks for magnitude spectrum peaks at specific locations, lowpass filtering directly affects the efficacy of our peak-finding metrics.
Furthermore, many of the images in the test suite had modifications near the image borders. To avoid edge artifacts, our algorithm does not train on bordering patches; thus, tampered regions at image borders are difficult for it to recognize. Finally, as mentioned earlier, tampered regions smaller than window_size by window_size are challenging for our algorithm to identify. One modification to accommodate smaller tampered regions would be to employ a multi-scale approach on top of our algorithm.
Our method is rooted in the assumption that cameras employ color filter array demosaicing algorithms, and that these algorithms introduce specific, detectable statistical correlations into the pixel values of an image. Tampering after image capture will often disrupt these correlation patterns. Using the methods outlined in this paper, we have had success in detecting various types of image tampering.
This technique is not a panacea, and we recommend employing it in conjunction with other techniques for exposing digital forgeries. In fact, in , the authors mention a specific vulnerability of this family of methods: an attacker could resample a tampered image onto a known color filter array lattice and perform demosaicing to conceal the forgery. However, such an attacker would need considerable knowledge of the minutiae of CFA demosaicing algorithms to produce an undetectable cover-up.
Future research should focus on extending the metric thresholding method developed in this paper to higher-dimensional spaces. We found experimentally that adding further high-performing, reasonably orthogonal metrics allowed an SVM to outperform our proposed approach. However, the lack of direct control over the false positive rate kept the SVM from being a useful technique here. If the false positive rate could be controlled in an SVM setting, we expect the classification results would improve considerably.
Also, to improve on the thresholding mechanism adopted in this paper, future work should consider constructing a Receiver Operating Characteristic (ROC) curve to determine the optimal classification parameters. Various metrics and their corresponding thresholds should be evaluated on the test data, and the metric and threshold that yield the largest area under the ROC curve should be chosen.
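As a starting point, the area under the ROC curve can be computed directly from the two score populations via the rank-sum identity (a generic sketch, independent of any particular metric):

```python
import numpy as np

def roc_auc(real_scores, fake_scores):
    """Area under the ROC curve via the Mann-Whitney identity:
    AUC = P(random fake score > random real score), ties counted half."""
    real = np.asarray(real_scores, dtype=float)[:, None]
    fake = np.asarray(fake_scores, dtype=float)[None, :]
    return np.mean((fake > real) + 0.5 * (fake == real))
```

Comparing this AUC across candidate metrics gives a principled way to select the metric and threshold, as proposed above.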
References - Resources and related work
 Farid, H. Image Forgery Detection: A Survey. IEEE Signal Processing Magazine. March 2009.
 Farid, H., Popescu, A.C. Statistical Tools for Digital Forensics. 6th International Workshop on Information Hiding. Toronto, Canada. 2004.
 Farid, H., Popescu, A.C. Exposing Digital Forgeries in Color Filter Array Interpolated Images. IEEE Transactions on Signal Processing. Vol. 53, Issue 10. October 2005.
 Farid, H., Popescu, A.C. Exposing Digital Forgeries by Detecting Traces of Resampling. IEEE Transactions on Signal Processing. Vol. 53, Issue 2. February 2005.
 DigitalFilms. Photo. digitalfilms.wordpress.com. 16 March 2013.
Appendix I - Code
Appendix II - Work partition
Drew Schmitt: FFT, Metrics, SVM, Thresholding
Kurt Werner: Raster Scanner, FFT, PCA, Metrics