The empirically obtained image distortion maps are compared to the
predicted visible difference calculated using (1) the widely used
root mean square error (point-by-point RMS) computed in uncalibrated
RGB values, (2) the point-by-point CIELAB $\Delta E$ values (CIE,
1994), and (3) S-CIELAB $\Delta E$, a spatial extension of the CIELAB
$\Delta E$ metric. The uncalibrated RMS metric did not predict the
perceptual image distortion data well. The point-by-point CIELAB
$\Delta E$ metric provided better predictions, and the S-CIELAB
$\Delta E$ metric, which incorporates the spatial color sensitivity of the eye,
gave the most accurate predictions.
None of the metrics provided an excellent fit to the data. Image
areas with poor predictions were concentrated in regions containing
large negative local contrast. When these areas were excluded from our
data analysis, both S-CIELAB and CIELAB predictions had much better
agreement with the perceptual data. This suggests that the next step
in improving color image fidelity metrics is to redefine color
difference formulas such as CIELAB $\Delta E$ in terms of local
contrast.
Metrics for predicting the visibility of color changes of large
uniform targets have been used widely to describe tolerances for color
reproduction of large samples in the paint and dye industry. The
CIELAB metric is a standard that specifies how to transform physical
image measurements into perceptual differences. The metric was derived
from perceptual measurements of color discrimination of large uniform
targets. Though not perfect, the metric has been in use for twenty
years, and it has served as a satisfactory tool for measuring
perceptual difference between large uniform patches of color. A
modification of the $\Delta E$ formula was released by the CIE in 1994
based on new experimental data. The new formula was found to predict
color difference slightly better than the old formula
[1]. Hence, in this paper we will use the CIE94 formula to
calculate $\Delta E$ values.
The CIELAB metric is not well suited to measuring image fidelity. Many studies have found that color discrimination and appearance depend on the spatial pattern of the image [2,3,4,5,6,7]. For example, the human visual system is less sensitive to color differences in fine details than in large patches, yet the CIELAB color metric predicts the same perceptual difference for the two cases, since there is no spatial variable in the CIELAB transformation.
In a spatial extension of CIELAB, named S-CIELAB [8], the
spatial-color sensitivity of the human eye is included in the metric.
The S-CIELAB metric incorporates the different spatial sensitivities
of the three opponent color channels by adding a spatial
pre-processing step before the standard CIELAB $\Delta E$
calculation.
This spatial extension is designed to account for human spatial-color
sensitivity and thus improve the performance of the CIELAB $\Delta E$
metric for patterned targets.
To test how well the S-CIELAB metric predicts image fidelity of natural images, we measured perceptual image distortion for a set of color images. These images were displayed on a CRT monitor, and reproduced using either (a) a simple halftoning algorithm or (b) a simple image compression algorithm [9]. In this paper we compare how well several color metrics predict the perceptual image fidelity.
The image fidelity metrics evaluated in this paper are (1) the
widely used root mean square error (point-by-point RMS) computed in
uncalibrated RGB values, (2) the point-by-point CIELAB $\Delta E$
values [10], and (3) S-CIELAB $\Delta E$, a spatial
extension of CIELAB.
$$\mathrm{RMS} = \sqrt{\frac{\Delta R^2 + \Delta G^2 + \Delta B^2}{3}} \tag{1}$$
where $\Delta R$, $\Delta G$, and $\Delta B$ represent the differences
in R, G, and B values between the original color image and the
reproduction.
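A point-by-point RMS error map of this kind can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the code used in the study: the function name `rms_error_map` is hypothetical, and the sketch assumes the frame buffer values are scaled to the range 0 to 1 and that the mean is taken over the three color channels.

```python
import numpy as np

def rms_error_map(original, reproduction):
    """Point-by-point RMS error between two RGB images.

    Both inputs are arrays of shape (H, W, 3) holding uncalibrated
    frame buffer values (assumed here to be scaled to [0, 1]).
    Returns an (H, W) map with one RMS value per pixel.
    """
    diff = np.asarray(original, dtype=float) - np.asarray(reproduction, dtype=float)
    # Mean of the squared R, G, B differences at each pixel, then square root.
    return np.sqrt(np.mean(diff ** 2, axis=-1))
```

Because the inputs are raw frame buffer values, the result depends on the display on which the images are shown.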
The RMS metric does not include any information about the device used to present the images. Therefore, the RMS value computed using the above equation is uncalibrated. Using uncalibrated image values to measure perceptual difference is poor practice, because the displayed image can differ depending on the display hardware. Still, because the RMS measure is commonly used in this fashion, we included it in this analysis.
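The CIELAB coordinates are computed from the CIE XYZ values as
$$L^* = 116\, f(Y/Y_n) - 16, \qquad a^* = 500\,[f(X/X_n) - f(Y/Y_n)], \qquad b^* = 200\,[f(Y/Y_n) - f(Z/Z_n)],$$
where $f(t) = t^{1/3}$ for $t > 0.008856$ and $f(t) = 7.787\,t + 16/116$ otherwise,
and $X_n$, $Y_n$, $Z_n$ define the white point.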
The CIE X, Y, Z values contain information about the level of light
absorption by each of the three types of cone photoreceptors. The XYZ
values are close to being linear transformations of the light
absorption levels of the long, middle, and short wavelength sensitive
cones in the human retina. The non-linear $L^*a^*b^*$
transformation takes into account the finding that the perceptual
difference between two colors is nonlinear and better predicted by the
contrast difference between the two colors [12]. This
nonlinearity can be quite large in some regions of the color space,
resulting in strongly elongated iso-sensitivity contours in the XYZ
space. The $L^*a^*b^*$
transformation attempts to make the
iso-sensitivity contours circular in the $L^*a^*b^*$
space.
The CIELAB space was intended to be a perceptually uniform color space, so that equal distances in the color space represent equal perceived differences in appearance. Color difference is defined as the Euclidean distance between two colors in this color space:
$$\Delta E = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2} \tag{2}$$
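As an illustration, the standard CIELAB transformation and the Euclidean color difference can be sketched in Python with NumPy. This is a minimal sketch rather than the code used in the study; the function names `xyz_to_lab` and `delta_e` are hypothetical, and the constants are those of the standard CIE 1976 definition.

```python
import numpy as np

def xyz_to_lab(xyz, white):
    """Convert CIE XYZ values (..., 3) to CIELAB, given the white point (3,)."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    # Cube root above the CIE cutoff, linear segment below it.
    f = np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return np.stack([L, a, b], axis=-1)

def delta_e(lab1, lab2):
    """Euclidean distance between two CIELAB colors (or images of colors)."""
    diff = np.asarray(lab1, dtype=float) - np.asarray(lab2, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```

Applied to two XYZ images of shape (H, W, 3), `delta_e(xyz_to_lab(x1, white), xyz_to_lab(x2, white))` gives a point-by-point color difference map.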
The CIELAB transformation is relatively simple, but it does not result
in a perfectly uniform perceptual space [13]. The goal of
the transformation was to make iso-sensitivity contours circular and
about the same size everywhere in the color space, and the CIELAB
transformation achieves this goal only approximately. In addition,
visual environmental factors such as the ambient illumination and
background color also affect color discrimination. In 1994, the
$\Delta E$ formula was modified based on new experimental data, and in
an effort to allow easy parametric correction of the formula. The new
color difference formula is calculated as a weighted distance between
two colors in the lightness, chroma, and hue space ($L^*$, $C^*$, $H^*$):
$$\Delta E_{94} = \sqrt{\left(\frac{\Delta L^*}{k_L S_L}\right)^2 + \left(\frac{\Delta C^*}{k_C S_C}\right)^2 + \left(\frac{\Delta H^*}{k_H S_H}\right)^2}$$
The symbols $\Delta L^*$, $\Delta C^*$, and $\Delta H^*$
represent the differences between the two colors to be
compared along the lightness, chroma, and hue dimensions; $S_L$,
$S_C$, and $S_H$ represent weighting factors calculated from the
chroma coordinates of the "standard" of the two colors compared; and
$k_L$, $k_C$, and $k_H$ are parameters specific to experimental
conditions [1]. In this paper, we will use the CIE94
formula to compute CIELAB $\Delta E$ values, and the result will be
denoted as $\Delta E_{94}$.
We set the $k_L$, $k_C$, and $k_H$ values
to 1. The values $S_C$ and $S_H$ are calculated from the chroma
coordinates $C^*$ of the colors in the original (undistorted)
images.
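The weighted distance can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `delta_e94` is hypothetical, and the weighting functions $S_L = 1$, $S_C = 1 + 0.045\,C^*$, and $S_H = 1 + 0.015\,C^*$ are the standard CIE94 choices, which the text above does not spell out and which are therefore an assumption here. The chroma of the first argument (the original image) is used as the "standard".

```python
import numpy as np

def delta_e94(lab_std, lab_test, kL=1.0, kC=1.0, kH=1.0):
    """CIE94 color difference between CIELAB colors of shape (..., 3).

    lab_std is the "standard" (here: the original image); its chroma
    sets the weights S_C and S_H. The weighting constants are the
    standard CIE94 values (an assumption of this sketch).
    """
    lab_std = np.asarray(lab_std, dtype=float)
    lab_test = np.asarray(lab_test, dtype=float)
    dL = lab_std[..., 0] - lab_test[..., 0]
    da = lab_std[..., 1] - lab_test[..., 1]
    db = lab_std[..., 2] - lab_test[..., 2]
    C_std = np.hypot(lab_std[..., 1], lab_std[..., 2])
    C_test = np.hypot(lab_test[..., 1], lab_test[..., 2])
    dC = C_std - C_test
    # Squared hue difference; clip tiny negative values caused by rounding.
    dH2 = np.maximum(da ** 2 + db ** 2 - dC ** 2, 0.0)
    SL = 1.0
    SC = 1.0 + 0.045 * C_std
    SH = 1.0 + 0.015 * C_std
    return np.sqrt((dL / (kL * SL)) ** 2
                   + (dC / (kC * SC)) ** 2
                   + dH2 / (kH * SH) ** 2)
```

For a neutral standard (zero chroma) the weights all equal 1 and the result reduces to the Euclidean CIELAB distance.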
The $\Delta E$ calculations require specification of a white point.
For the calculations in this study, the white point is set to be the
white point of the display used in the Image Distortion Map
experiment. This is not the best way to choose a white point, because
the white point is supposed to depend on individual images, not the
display device. However, since the display white was always visible on
the screen when subjects did the Image Distortion Map experiment (as a
white button on the gray background), this choice of white point was
considered acceptable.
To make the spatial extension to CIELAB modular, the extension is
added as a spatial pre-processing of the images before computing
CIELAB differences. The purpose of the extension is to remove the
image components that cannot be seen by the eye. S-CIELAB consists of
three processing steps. First, the original and distorted images,
which are represented in a device-dependent space, are converted into
a device-independent representation consisting of one luminance and
two chrominance color components for each image
[2,3]. Second, each component image is passed
through a spatial filter that is selected according to the spatial
sensitivity of the human eye for that color component. Third, the
filtered images are transformed into the CIE XYZ format so that the
CIELAB color difference formula can be applied to give an S-CIELAB
$\Delta E$ map, which tells us where the visible distortions are
in the image and how large they are.
The interpretation of the S-CIELAB $\Delta E$ values is the same as the
interpretation of the standard CIELAB $\Delta E$ values, i.e., a
distortion of 1 $\Delta E$ unit is at threshold visibility under
optimum viewing conditions. Under the less-controlled viewing conditions
encountered in practice, distortions with $\Delta E$ values around 2 or below
are generally not visible.
Because of the design of the spatial filters, the S-CIELAB predictions are the same as the CIELAB predictions for large uniform targets. S-CIELAB will typically predict lower visibility of color differences for textured regions, however. Qualitatively, these predictions are consistent with measurements of human spatial-color sensitivities.
Other factors that affect perceptual fidelity measurements include, but are not limited to, the adaptation of the eye to ambient illumination [14,15,16,17,18], the contrast masking effects of spatial patterns [19,20,21], and higher level cognitive processes such as memory and attention [22]. Other color image metrics exist that take into account one or more of these effects, such as DCTune [23]. For this paper, however, we limit our scope to the three metrics described above, because they are all simple metrics with readily implemented calculation tools, and they generate predictions in the form of pixel-by-pixel distortion maps, which is consistent with the format of the data we use.
To test the accuracy of color image fidelity models, it is necessary to have a data set of experimental measurements establishing where and how subjects perceive image reproduction errors on real images. In this paper we use a set of measurements of perceived reproduction errors between six natural images and reproductions of these images created using (a) digital halftoning (void and cluster), and (b) image compression (JPEG-DCT).
In this experiment, subjects identified regions in halftoned or compressed images that appeared to be different from the original image. They were asked to mark all image regions that had visible distortions with a digital marker stamp available in different sizes, the smallest of which was circular with a diameter of 10 pixels, or 0.4 degrees of visual angle. Subjects were encouraged to use the smallest stamp size whenever they could (it was also the default stamp size at the beginning of each image presentation). They were instructed to keep marking regions with visible differences until all differences had been covered by marks. Because of the fixed stamp shape and sizes, most of the time it was impossible to avoid stamping some regions in which the two images appeared identical. The smallest marker size limits the spatial resolution of the measured image distortion map, and we account for this in the data analysis.
A total of 24 subjects performed the task for all image pairs. The error marks produced by the subjects were pooled for each pair of images. From these pooled data we calculated, for each pixel in a halftoned or compressed image, the probability of a mark covering that pixel; we call the resulting maps image distortion maps. Figure 1 shows the image distortion map for an original and its reproduction. The probability that a pixel is marked is represented by the gray level: light regions correspond to frequently marked areas (high visible difference) and dark regions correspond to infrequently marked areas (low visible difference). More details about the method and procedures of this experiment can be found in another paper [9].
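The pooling step can be sketched as a per-pixel average over binary mark masks. This is a minimal sketch under assumed data conventions: the function name `image_distortion_map` and the representation of each presentation as a boolean mask are hypothetical, not details taken from the experiment software.

```python
import numpy as np

def image_distortion_map(mark_masks):
    """Estimate the per-pixel probability of being marked.

    mark_masks has shape (n_presentations, H, W); an entry is True
    where a mark covered that pixel in that presentation.
    Returns an (H, W) array of probabilities in [0, 1].
    """
    masks = np.asarray(mark_masks, dtype=float)
    # Fraction of presentations in which each pixel was covered by a mark.
    return masks.mean(axis=0)
```

Displaying the returned array as a grayscale image gives light regions where marks were frequent and dark regions where they were rare.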
This experiment provides a large set of perceptual distortion data on real images in a calibrated format. This data set has several benefits when used to evaluate image fidelity metrics. First, the data were collected for images, not simple spatial frequency patterns, so the evaluation results generalize better to practical situations. Second, people can identify where the distortions are in an image very well, so marking regions of distortion is a more natural task than rating the fidelity of an image as a whole. Third, the image distortion maps contain a large amount of perceptual data. Instead of single-number ratings, we have a full map of empirical distortion measures for each image. These maps provide a large amount of information that we can use to evaluate the accuracy of theoretical image fidelity metrics.
From the monitor calibration data, we computed the CIE XYZ representations of each image as shown on the CRT display. These XYZ values were used to compute the point-by-point CIELAB and S-CIELAB error values. Point-by-point RMS errors were computed from the frame buffer values and needed no calibration information.
The empirical image distortion maps cannot be compared directly to the distortion values calculated from metrics. Some image regions have very small or no visible differences between the original and the reproduction, but lie close to regions of large perceptual error. Because of the fixed stamp sizes and shape, these regions may be covered by marks intended for the nearby high distortion regions. Data from such regions in the image distortion map do not accurately reflect the perceptual difference between the two images at those locations. We therefore exclude these points before comparing the data from the image distortion maps to the metric predictions.
To determine which pixel locations are likely to have been marked due to proximity to pixels with large perceptual error, we compare the metric values at each pixel location with the largest metric value in its 10-pixel diameter circular surround (the size of the smallest marker). For a target pixel location, we search for nearby pixels with significantly larger predicted perceptual error. If such a pixel exists, then the probability of the target pixel's location being marked is likely to be affected more by perceptual distortion levels of its neighbors than its own distortion level. We exclude such target points from the data analysis.
Specifically, let $M_{i,j}$ represent the perceptual distortion value predicted by a particular metric at pixel location $(i,j)$, and let $M_c$ represent a criterion value for this metric. We then select the points on the image distortion map that satisfy the following:
$$M_{i,j} \ge \max_{(k,l):\,(k-i)^2 + (l-j)^2 \le 5^2} M_{k,l} - M_c \tag{3}$$
Pixel locations whose distortion value is significantly smaller than
that of any neighbor in the 5-pixel-radius circular neighborhood
(according to the criterion $M_c$) are not used in the data analysis;
the probability of these points being marked is likely to be affected
by a neighboring point with much larger perceptual distortion.
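The selection rule described above can be sketched with plain NumPy. This is an illustrative reading, not the authors' implementation: the function name `select_pixels` is hypothetical, and the sketch assumes that "significantly smaller" means smaller by more than the criterion value as an additive difference.

```python
import numpy as np

def select_pixels(metric_map, criterion, radius=5):
    """Return a boolean mask of pixels kept for the data analysis.

    A pixel is kept when its metric value is within `criterion` of the
    largest metric value in the circular neighborhood of the given
    radius (additive comparison: an assumption of this sketch).
    """
    m = np.asarray(metric_map, dtype=float)
    h, w = m.shape
    # Pad with -inf so pixels outside the image never win the maximum.
    padded = np.pad(m, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full_like(m, -np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx > radius * radius:
                continue  # offset lies outside the circular neighborhood
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            local_max = np.maximum(local_max, shifted)
    return m >= local_max - criterion
```

A call such as `select_pixels(scielab_map, 0.5)` would return the mask of retained pixels for a metric map, where `scielab_map` stands for any (H, W) array of metric values.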
Ideally, the selection criterion $M_c$ should be determined by the
"true" perceptual distortion values in the image. However, such data
are not available to us. Therefore, we use the theoretical perceptual
distortion values instead to select $M_c$. The criterion values $M_c$
were chosen separately for each metric, so that the evaluation of
each metric does not depend on the predictions of other metrics. For
S-CIELAB $\Delta E$
values, we used $M_c = 0.5$. For CIE94 $\Delta E$,
we used $M_c = 1.5$. For the RMS metric, we used $M_c = 0.02$.
The selection of these criterion values is somewhat arbitrary; the
only constraint is that the proportion of image points
selected using these criteria should not be higher than a few
percent (which is roughly the percentage of points that can be
independently marked given the size and shape of the markers). The
above criterion values were selected to keep approximately the same
number of image points in the data analysis for the three
metrics. Varying the criterion values up and down within the above
constraint did not change the analysis results significantly.
Because the data selection method depends on perceptual distortion measures by different metrics, we obtain a different selection of image pixels for different metrics. Assuming that the metric predicts perceptual distortion accurately, the selected image pixel locations should have relatively independent probabilities of being marked by the subjects. Thus, we can treat subjects' marks at these selected pixel locations as independent binary choices.
Figure 2 shows the image points selected for data
analysis for a perceptual error map predicted by the CIELAB $\Delta E$
metric. The selections were computed for all image pairs used
in the experiment and for all metrics tested.
For all three metrics, approximately 3-6% of the image pixel locations were selected for data analysis. This still leaves us with a large number of data points that we can use to evaluate the metrics. Each image was presented to the subjects 35-40 times, and between 3500 and 11000 pixels of each image distortion map (depending on the size of the image) were selected for data analysis.
For all three metrics, the data selection improves the correlation
between the data and metric predictions. An example is shown in
Figure 3. Each point in panel (A) shows a selected
image pixel. The horizontal axis measures the S-CIELAB value of that
pixel, and the vertical axis measures the probability that a subject
marked that pixel. Panel (B) shows the same distribution but this
time for points rejected from the analysis. Notice that in panel (A)
there are no points with small $\Delta E$ values and large
probability of being marked. Panel (B) contains many points in this
category, and these were rejected in the selection process. We
suspect that such points are accurate reproductions that are marked
only because of their proximity to points with large error.
We have examined the agreement between metrics and data by treating the metrics as signal detectors that try to predict the subjects' responses given the image data. In the experiment, subjects made binary decisions about whether distortions were visible at various image locations. Where the human observer decides there is image distortion, a good metric should predict distortion; and where the human observer does not see distortion, the metric should not. By setting a threshold level for each metric so that any distortion measure above the threshold is predicted as visible, we can calculate the hit rate (HR) and false alarm rate (FAR) of each metric as a signal detector of the perceptual data. Let $M_{i,j}$ represent the distortion measure by a metric at image location $(i,j)$, $M_c$ represent a threshold level for this metric, and $D_{i,j}$ represent the binary decision each subject made about the visibility of distortion at image point $(i,j)$ (visible = 1, invisible = 0). Then the hit rate and false alarm rate are defined as:
$$\mathrm{HR} = P(M_{i,j} > M_c \mid D_{i,j} = 1), \qquad \mathrm{FAR} = P(M_{i,j} > M_c \mid D_{i,j} = 0)$$
where $P(X)$ represents the probability of the event $X$ over all image points and all subjects.
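The hit rate and false alarm rate computation can be sketched as below. This is a minimal sketch under assumed data conventions, not the analysis code used in the study: the function names are hypothetical, and the decisions are assumed to be stored as an array of 0/1 values whose shape broadcasts against the metric map (for example, one layer per subject).

```python
import numpy as np

def hit_and_false_alarm_rates(metric_values, decisions, threshold):
    """Hit rate and false alarm rate of a metric at one threshold.

    metric_values: metric output at the selected pixels.
    decisions: 0/1 subject responses; may carry an extra leading
    subject axis, in which case the metric values are broadcast.
    """
    m, d = np.broadcast_arrays(np.asarray(metric_values, dtype=float),
                               np.asarray(decisions))
    m = m.ravel()
    d = d.ravel().astype(bool)
    predicted_visible = m > threshold
    # Conditional proportions; NaN when a response class is empty.
    hr = predicted_visible[d].mean() if d.any() else np.nan
    far = predicted_visible[~d].mean() if (~d).any() else np.nan
    return hr, far

def roc_curve(metric_values, decisions, thresholds):
    """Rows of (hit rate, false alarm rate), one per threshold."""
    return np.array([hit_and_false_alarm_rates(metric_values, decisions, t)
                     for t in thresholds])
```

Sweeping the threshold from high to low traces the curve from (0, 0) toward (1, 1); plotting the first column against the second gives the ROC curve.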
A good metric should have high hit rates and low false alarm rates.
In a system with noise, it is easy to see that the hit rate of 1 and
false alarm rate of 0 will not happen at the same time. If the metric
outputs are monotonically related to the probability of detection,
both the hit and false alarm rate will increase as threshold levels
decrease. A good metric will have much larger hit rates than false
alarm rates at all threshold levels. Therefore, plotting the hit rate
as a function of false alarm rate (the ROC curve, [24])
for each metric describes how sensitive each metric is in predicting
the data. This method does not require us to assume a particular
function for converting metric output to probability of being
marked. We will plot the ROC curves for the RMS, CIELAB $\Delta E$,
and S-CIELAB $\Delta E$
metrics as a means of comparing
them with the experimental data.
The same kind of ROC curve can be used to describe the reliability of the data as well. We treated the mean image distortion map as another "metric" that is used to predict individual subjects' responses, and calculated an ROC curve for the mean distortion map. This ROC curve reflects how well the mean distortion map predicts individual subjects' data, and therefore reflects the reliability of the image distortion data. It will be used as a reference for how well a theoretical metric predicts the perceptual data.
Images with halftone errors and JPEG errors are analyzed separately, because they represent different types of distortions. The halftone distortions generally are a random texture noise, while the JPEG distortions generally include blurring and blocking artifacts. First we look at images with halftone distortions.
In Figure 4, we plotted the ROC curves for RMS, CIELAB $\Delta E$,
and S-CIELAB $\Delta E$, using hit rates and
false alarm rates computed from data on the halftone image pairs, at
many threshold levels for each metric. A metric that accurately
predicts the subjective decisions should have hit rates as large as
possible and false alarm rates as small as possible, i.e., an ROC
curve that bows away from the diagonal line as much as possible. The
ROC curve labeled "Data" is generated using the mean image distortion
map as the predictor. This marks the best a metric can do in terms of
prediction accuracy.
From Figure 4, we can see that for halftone images, the RMS error measured on device RGB values can predict the data to some degree (dashed line), in that the hit rates are always larger than the false alarm rates. For the particular display we used in the experiment, the RMS error calculated on uncalibrated RGB values is somewhat consistent with the data.
The RMS metric is not a perceptual metric, so one would not expect it to predict perceptual distortions at all. Why did it work to some degree in this instance? Part of the reason lies in the fact that the device-dependent RGB values sent to the display frame buffer are related to the actual intensity of light emitted from a CRT display by (approximately) a power function. One of the early observations about the perception of light is that the perceived brightness of a light signal is related to its physical luminance by a non-linear function (Stevens' power law). This perceptual non-linear function is also approximately a power function of the form:
$$B = a I^{0.4} \tag{4}$$
where $B$ represents perceived brightness, $I$ represents the physical intensity of the light, and $a$ is a scale factor [25].
For the CRT display used in our experiment, the intensities of the RGB phosphor emissions are linearly related to the RGB frame buffer values raised to a power of 2.5. The value 2.5 is called the gamma value of the display. Because the display non-linearity and the brightness non-linearity nearly cancel, the RGB frame buffer values were approximately linearly related to perceptual brightness for the particular CRT display we used. The device RMS measure thus coincidentally took into account the non-linear nature of brightness perception, and therefore predicted the perceptual data to some degree.
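The cancellation can be written out explicitly. In the short derivation below, $V$ denotes a frame buffer value and $c$ an arbitrary display scale factor; both symbols are introduced here for illustration and do not appear elsewhere in the paper.

$$\begin{aligned}
I &= c\,V^{2.5} && \text{(display gamma of 2.5)} \\
B &= a\,I^{0.4} = a\,c^{0.4}\,V^{2.5 \times 0.4} = a\,c^{0.4}\,V && \text{(brightness power law with exponent 0.4)}
\end{aligned}$$

Since $2.5 \times 0.4 = 1$, brightness is proportional to the frame buffer value on this display. A display with a different gamma would leave a residual exponent and break the proportionality.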
To confirm this reasoning, we calculated the RMS prediction of image distortions again using a linear display gamma value. If the same images were displayed on another CRT monitor with a linear gamma value, different RGB frame buffer values would be necessary to generate the same physical image seen by the subjects. We can calculate the RMS error between the original images and the halftoned images on this hypothetical display, and look at how well it predicts the data. In Figure 5, the ROC curves for the linear RMS error measure and the device-dependent RMS metric are plotted together. For the linear RMS measure, the hit rates are not higher than the false alarm rates at any threshold level, indicating that it does not predict the perceptual data at all.
The ROC curves in Figure 5 tell us that using device RMS to measure perceptual image fidelity may or may not work, depending on the characteristics of the display device. The RMS results for this data set are probably the best one can get using a device-dependent RMS metric, because the frame buffer RGB values for the display we used in the experiment are almost exactly linear with perceptual brightness. For other display devices, using the RMS measure will give similar or worse results.
Next we look at the CIELAB $\Delta E$ results. Figure 4
shows the ROC curve for the CIELAB $\Delta E$ metric (dotted
line). As expected, the CIELAB $\Delta E$ metric is an
improvement over the RMS metric. The CIELAB $\Delta E$ metric not
only accounts for the non-linear nature of brightness perception
(through a cube root transform of the XYZ values), but also accounts
for non-uniform perceptual discrimination thresholds in different
color directions. In addition, CIELAB $\Delta E$ is a calibrated
metric, calculated from XYZ values, which are device
independent. Therefore, when more advanced color metrics are not
available, CIELAB $\Delta E$ should be used instead of RMS.
As a side note, we also calculated standard CIELAB $\Delta E$
predictions for the data. The standard $\Delta E$ predicted the data
slightly worse than the $\Delta E_{94}$ color difference metric. This
suggests that the CIE94 color difference formula is indeed an
improvement over the older formula.
The S-CIELAB $\Delta E$ metric extends CIELAB to include spatial
sensitivity. As shown in Figure 4, its ROC curve (thin
solid line) bows further away from the diagonal, indicating a further
improvement over CIELAB $\Delta E$ in the accuracy of predicting the
perceptual data. This result is consistent with an earlier
experimental test of S-CIELAB [26].
Still, the ROC curve for the S-CIELAB $\Delta E$ metric is quite far
from the upper limit of how well a metric can predict the
data. There are many perceptual factors that are not yet included in
the S-CIELAB metric. What is the next factor to include in an image
fidelity metric to achieve a significant improvement in accuracy of
measurement?
The data analysis so far indicated that non-linearity of brightness perception, non-uniform color discrimination along different color directions, and different spatial sensitivities for different color components all influence perceptual fidelity in real images. Factors that are still not incorporated in S-CIELAB include adaptation, contrast masking, attention, etc. Which of these factors have significant influence on the perceptual fidelity of images over and above what was already included in S-CIELAB?
Closer inspection of the predicted image distortion maps suggested that most of the regions where the S-CIELAB predictions were poor fell in areas with large negative local contrast. To confirm this observation, we separated image points with negative local contrast from image points with zero or positive local contrast, and looked at the agreement between S-CIELAB predictions and the data for these two groups of image points separately. For each original-reproduction pair, the S-CIELAB spatial filtering was applied first, which resulted in S-CIELAB opponent representations of both the original and the halftoned images. Because most of the image variance is in the luminance channel [27], only the luminance images were used to calculate local contrast. The local contrast of an image was computed as the difference between the image luminance and the local mean luminance, scaled by the local mean luminance [28]. Specifically,
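$$C_{i,j} = \frac{L_{i,j} - \bar{L}_{i,j}}{\bar{L}_{i,j}} \tag{5}$$
where $L_{i,j}$ is the luminance at pixel $(i,j)$ and $\bar{L}_{i,j}$ is the local mean luminance around that pixel.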
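A local contrast computation of this kind can be sketched as follows. This is only an illustration: the text does not specify how the local mean is formed, so the square averaging window, its default size of 9 pixels, the edge padding, and the function name `local_contrast` are all assumptions of the sketch.

```python
import numpy as np

def local_contrast(luminance, window=9):
    """Local luminance contrast: (L - local mean) / local mean.

    The local mean is a box average over a window x window square
    (window must be odd); the window shape and size are assumptions.
    Luminance values are assumed to be strictly positive.
    """
    lum = np.asarray(luminance, dtype=float)
    h, w = lum.shape
    pad = window // 2
    # Repeat edge values so border pixels also get a full window.
    padded = np.pad(lum, pad, mode="edge")
    local_sum = np.zeros_like(lum)
    for dy in range(window):
        for dx in range(window):
            local_sum += padded[dy:dy + h, dx:dx + w]
    local_mean = local_sum / (window * window)
    return (lum - local_mean) / local_mean
```

Given contrast images `c_orig` and `c_repro` for an original and its reproduction (hypothetical names), the points with negative contrast in both are `(c_orig < 0) & (c_repro < 0)`.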
After we calculated the local luminance contrast image for both the
original image and the halftone image, we found the image
locations at which both the original and the halftoned image had
negative local contrast. These image locations were excluded from
the data analysis. For the remaining image points, we examined the
agreement between the S-CIELAB $\Delta E$ predictions and the
empirical image distortion maps by plotting an ROC curve using the
same method as described in the previous section. This ROC curve is
shown in Figure 6, along with the original S-CIELAB $\Delta E$
ROC curve. The new ROC curve, computed by excluding image
points with negative local luminance contrast in both the original and
the halftone images and labeled "S-CIELAB positive", bows further away
from the diagonal than the S-CIELAB curve, indicating better agreement
between the metric's predictions and the data. When image points with
negative local luminance contrast are excluded, the S-CIELAB predictions are much
closer to the predictions generated using the mean image distortion
maps themselves (curve labeled "Data"). On the same graph, the
S-CIELAB ROC curve for the image points with negative local luminance
contrast is also plotted, labeled "S-CIELAB negative". The S-CIELAB
predictions for the negative contrast points are not very good.
Why does S-CIELAB make much worse predictions for image locations with
negative local luminance contrast? This seems to be a limitation of
the CIELAB $\Delta E$ metric. The CIELAB $\Delta E$ measure is defined
in terms of absolute levels of cone absorption. It is not defined in
terms of contrast. When the two colors to be discriminated are on a
bright background, the perceptual difference between the two colors is not
as visible as when the background is very close to their own level of
brightness (the crispening effect [29]). Therefore, the $\Delta E$
metric tends to over-estimate perceptual color differences
in image regions with large negative contrast. When we excluded those
regions from the analysis, the S-CIELAB $\Delta E$ metric values
correlated with the data much better.
If the above argument is valid, excluding negative contrast points
should improve the CIELAB $\Delta E$ predictions as well. This is
indeed the case. In Figure 7, the dashed line labeled
"CIELAB positive" represents the performance of the CIELAB $\Delta E$
metric when negative contrast image points are excluded. The area
under this ROC curve is much larger than that under the CIELAB $\Delta E$
curve including negative contrast points. Its agreement with the empirical
image distortion maps is even better than that of the original S-CIELAB $\Delta E$
metric, and only a little worse than that of the S-CIELAB $\Delta E$
metric with negative contrast points excluded.
From the above analysis, it seems that redefining the CIELAB $\Delta E$
value on the basis of local contrast will provide significant
improvements. This improvement can be implemented as a separate
extension to CIELAB that translates negative local contrasts to
positive local contrasts, or in terms of a completely new perceptual
color space that is based on contrast. Some existing color image
fidelity metrics, such as DCTune [23], already work in
contrast space, but they do not use the CIELAB $\Delta E$
metric. Evaluation of such metrics using the image distortion data
will be helpful in revealing which factors make a metric work and which
do not. If possible, a computationally simple extension that
is based on CIELAB $\Delta E$ is still desirable, since the CIELAB $\Delta E$
unit is familiar and meaningful to the color reproduction
industry.
For this data set, none of the RMS, CIE94, and S-CIELAB metrics made satisfactory predictions of the marked errors in the JPEG-DCT reproductions. Figure 8 shows the ROC curves for RMS, CIELAB, and S-CIELAB error measures, using hit rate and false alarm rate computed on JPEG image data.
This inconsistency between the metric error measures and the data can be a result of either poor data quality or poor metric predictions.
From close examination of the image distortion maps for the JPEG images and from subjects' feedback on the experimental task, it seems that the JPEG data may not be as reliable as the halftone data. First, many subjects reported difficulty in performing the task for JPEG images, since they saw general blurring in the JPEG images and could not pinpoint particular locations of visible distortion. Second, there seems to be a large difference in the visibility of JPEG distortion between experienced observers and naive observers. Many of the subjects commented that the JPEG distorted images looked just fine except for maybe a tiny bit of blurring, whereas experienced observers were able to identify block artifacts at many locations. Overall, since most of the subjects were naive observers, they clicked on very few places in most of the JPEG-DCT images. The proportion of mark coverage is below 3% for 2 of the 6 JPEG images, and between 17% and 34% for the rest. Therefore, there are good reasons to believe that the image distortion data on the JPEG-DCT images are not reliable, and thus the failure of a metric to predict these data may not indicate an inability of the metric to predict JPEG-DCT distortions in general.
Limitations in the metrics can also contribute to the inconsistency between the metric error measures and the JPEG-DCT image distortion data. Given the nature of JPEG artifacts and the characteristics of the RMS, CIELAB, and S-CIELAB color difference metrics, it can be expected that none of them should do well in predicting visibility of JPEG-DCT distortions. The JPEG-DCT artifacts arise from (1) the coarse quantization of high frequency components, and (2) the block processing structure of the algorithm. In the case of quantization, the errors are typically correlated with lines or edges in the images, and therefore hidden by the effect of orientation selective masking and contrast masking [19,30,31,32]. The RMS, CIELAB, or S-CIELAB metrics do not include effects of contrast masking or orientation selective masking. These metrics should not be expected to make accurate predictions about visibility of JPEG artifacts.
Subjects identified visible reproduction errors in a collection of halftone and JPEG-DCT reproductions. The responses were summarized as image distortion maps. Using these maps, three image distortion metrics were evaluated: RMS, CIELAB, and S-CIELAB. For images with halftone distortions, the RMS metric made the least accurate predictions, and the S-CIELAB predictions agreed significantly better with the image distortion data.
The RMS metric was calculated on RGB frame buffer values. Depending on
the gamma curve of the particular display device, the RMS error value may
correspond somewhat to perceptual difference, or may not correspond to
perceptual difference at all. The CIELAB metric accounts for the color
sensitivity of the human eye but does not include spatial sensitivity
mechanisms; it is therefore better than the RMS metric in predicting
perceptual distortions, but does not explain all the variance in the
data. The S-CIELAB metric incorporates human spatial-color
sensitivity. As expected, it provides improved measurement of image
distortion visibility over CIELAB $\Delta E$.
An interesting
observation is that when we excluded image regions of negative local
luminance contrast from the data analysis, the predictions of both
S-CIELAB and CIELAB $\Delta E$ were much more consistent with the
perceptual image distortions. This suggests that defining CIELAB $\Delta E$
in terms of contrast is a good next step for improvement of
image fidelity metrics.
The RMS, CIELAB, and S-CIELAB metrics all failed to predict the image distortion maps measured with JPEG-DCT reproductions in this experiment. Likely causes are the lack of orientation selectivity and masking mechanisms in these metrics, and also the possibly low reliability of the distortion data on JPEG images. Further experimental and theoretical work is needed to better evaluate metric predictions of JPEG-DCT distortions against empirical data.
This study was supported by a grant from the Hewlett Packard Company.
The translation was initiated by Xuemei Zhang on 1998-10-07