So how fair is the DxOMark Camera Sensor Score?
This is a difficult, but important question. Probably every image quality scientist[i] in the world would have a somewhat different personal preference for a benchmark like this. But my impression is that the benchmark is pretty useful: I analyzed the model and the data, but didn’t find any serious flaws. Reassuringly, results like Figure 1 also appear to be pretty consistent with traditional hands-on reviews: camera models that were stronger [or weaker] than state-of-the-art at the time they were introduced (such as the Canon 40D [or 50D]) show up as expected in Figure 1. And, as mentioned at the start of the article, having a pretty solid metric by an independent party is better than never-ending discussions about what the ultimate benchmark might look like.
The following critical notes, suggestions and open issues are relatively subtle, because the entire topic is a bit subtle:
Low ISO bias
If you compare the DxOMark data in Figure 7 for a number of prominent cameras, you get a more balanced impression of which camera to buy than by just looking at the overall DxOMark Sensor score. If you focus on the latter, you would strongly prefer the Nikon D800 with its excellent low ISO dynamic range. But this emphasizes one aspect of the sensor (essentially the ability to do single-shot HDR) that provides a capability we never had in the past. It is a feature which we may need only infrequently – and one that some types of users may never see (e.g. if you shoot JPEG).
However, at sufficiently high ISO, other models win. High ISO shooting may be more relevant for many users than HDR capability at low ISO.
One can therefore ask whether DxOMark hasn’t overstressed low ISO noise[ii]. This may explain why some reviewers arrive at different conclusions about the image quality of the Canon 5D3 (or 1Dx) compared to the Nikon D800 (or D4).
To DxOMark’s credit, the user does get three detailed scores to choose from. So you can focus on “Dynamic Range” if you need single-shot HDR-like capability, and on “Low-Light ISO” if you often need to boost your ISO setting.
Comparing different sensor sizes
As pointed out by Falk Lumo[iii], the fact that larger sensors tend to have higher DxOMark scores than smaller sensors is not a guarantee that bigger is better – even if you check out the actual DxOMark Sensor scores before selecting a camera.
Say you are considering an APS-C model like the Fujifilm X-E1 versus a full frame camera. Falk Lumo’s point is that the intrinsic definition of the DxOMark metric (as well as most other benchmarks) assumes that you would compare, say, Fujifilm’s 35mm f/1.4 lens to an “equivalent” 50mm f/1.4 lens on full frame. This reduces the depth of field on full frame while decreasing the noise. But if we had kept the depth of field[iv] constant (by picking a 50mm f/2.0 lens on full frame), the noise would have stayed the same. So one can argue that DxOMark (and any other comparison across formats at a constant ISO setting) makes larger formats look good by assuming that we increase the total influx of light falling on the larger sensor by picking lenses with increasingly large diameters (i.e. a constant f-number).
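To make Falk Lumo’s point concrete, here is a minimal sketch (in Python, with my own illustrative numbers rather than anything from DxO) of how the total captured light scales with the entrance pupil when field of view and shutter speed are held constant:

```python
# A minimal sketch of the equivalence argument: for the same field of view and
# shutter speed, the total light reaching the sensor scales with the area of
# the entrance pupil (focal length divided by f-number).

def relative_total_light(focal_length_mm, f_number):
    entrance_pupil_mm = focal_length_mm / f_number
    return entrance_pupil_mm ** 2  # proportional to entrance-pupil area

apsc_35_f14 = relative_total_light(35, 1.4)  # Fujifilm 35mm f/1.4 on APS-C
ff_50_f14 = relative_total_light(50, 1.4)    # "equivalent" 50mm f/1.4 on full frame
ff_50_f20 = relative_total_light(50, 2.0)    # 50mm f/2.0: same depth of field as the APS-C setup

print(ff_50_f14 / apsc_35_f14)  # ~2.0: twice the light, so roughly 1 stop less noise
print(ff_50_f20 / apsc_35_f14)  # ~1.0: same light, comparable noise despite the bigger sensor
```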
Complexity of interpreting the numbers
Complexity is a fact of life in the high-tech industry. To DxO's credit, DxOMark allows you to use just a single overall score to compare camera body image quality. They alternatively allow you to zoom in (and get 3 numbers instead of one) or zoom in all the way (for graphs with the actual measured data). But despite, or possibly due to, all this data, it is difficult to translate a conclusion like “A is 20 points better than B” into what exactly you would expect to observe in actual photos. Because I initially had trouble translating the numbers into “What type of images would this difference show up in?”, I added some photos to this essay now that I believe I have more or less figured it out.
The undocumented Master Formula
DxO does not document how the final DxOMark Camera Sensor score is computed from the individual Dynamic Range, Color Sensitivity and Low-Light ISO scores. I feel it should be provided, as the overall score gets a lot of attention. DxO countered that, with the formula, a manufacturer could attempt to optimize the overall score. But I still don’t see the benefit of leaving this formula undocumented: if DxO believes the master formula is a reasonable approximation of what photographers are looking for in a camera, they should document it with a note that it is a compromise between completeness and ease of use.
Here is my own attempt at a recipe to compute the overall score from the three subscores. Start off with the DxOMark Camera Sensor score for the Leica M8 as an arbitrary reference point. This gives you 58 points. Next, add 4.3 points for every extra bit of Color Depth that a camera has compared to the M8. Then add 3.4 points for every extra stop of Dynamic Range that the camera has compared to the M8. And finally add 4.4 points for every factor of 2 improvement in Low-Light ISO relative to the M8. The formula version[v] seems to predict the published scores to within 1 or 2 points.
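Expressed as code, my fitted recipe (see endnote [v] for the details) looks roughly like this; the coefficients are my own fit, not DxO’s published formula:

```python
import math

def estimate_dxomark_score(color_depth_bits, dynamic_range_ev, low_light_iso):
    """My fitted approximation of the overall DxOMark Sensor score (see endnote [v]).
    The reference values (21.1 bits, 11.3 EV, ISO 663) are the Leica M8's subscores."""
    return (59
            + 4.3 * (color_depth_bits - 21.1)         # points per extra bit of Color Depth
            + 3.4 * (dynamic_range_ev - 11.3)         # points per extra stop of Dynamic Range
            + 4.4 * math.log2(low_light_iso / 663.0)  # points per doubling of Low-Light ISO
            - 0.2)                                    # small fitted offset

# Example: the Leica M8's own subscores reproduce its published score to within a point.
print(estimate_dxomark_score(21.1, 11.3, 663))  # ~58.8
```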
Use case names
The Landscape/Sport/Portrait terms can easily confuse people who take them literally. I am tempted to interpret the 3 metrics as Dynamic Range (as DxO does), Luminance Noise (instead of Low-Light ISO), and Chroma Noise (instead of Color Sensitivity). Those are quantities you find more often in reviews.
Print versus Screen mode
To compare DxOMark Camera Sensor scores between cameras with different resolutions, you need to look at the “Print” results. The overall DxOMark Camera Sensor score is “Print” level only, which is fine. For the next level of detail, a viewer gets to choose between Print and Screen. This is less fine: Screen is not normally useful for end users (it can be useful for debugging your own calculations). The lowest level of data is presented in “Screen” mode only, but is not labeled as such. I would prefer to see all data labeled Print/Screen or, better yet, Normal/100%. “Normal” would stress that this is what matters. And “100%” is similar to pixel peeping: you look at the noise at the 100% crop level and lose the overview of what it means at the image level.
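For what it is worth, here is a minimal sketch of what such a Print-style normalization amounts to, assuming an 8 MPixel reference size (my assumption; DxO may normalize somewhat differently):

```python
import math

REFERENCE_MPIX = 8.0  # assumed "Print" reference size (roughly an A4-sized print)

def print_dynamic_range_ev(screen_dr_ev, sensor_mpix):
    # Downsampling averages pixels; noise drops with the square root of the
    # pixel-count ratio, which adds half a stop of DR per doubling of resolution.
    return screen_dr_ev + 0.5 * math.log2(sensor_mpix / REFERENCE_MPIX)

# Example: a 36 MPixel sensor gains about 1.1 EV of "Print" dynamic range
# over its per-pixel "Screen" figure.
print(print_dynamic_range_ev(12.0, 36.0))  # ~13.1
```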
Why measure Color Depth at low ISO?
High-ISO chroma noise seems more relevant for photographers than low-ISO chroma noise. I doubt people are actually able to see color noise at such low ISO: it's hard enough to spot regular noise at low ISO, and chroma noise is even more elusive. I suspect that the choice to use low-ISO Color Depth is an artifact of originally trying to define a metric that matched studio portrait conditions. But I am not convinced that a studio portrait photographer typically has problems with visible chroma noise in the first place.
Metric measurable per ISO setting
It might have been simpler to have a single "perceived image quality" metric that could be measured at different ISO levels. This is particularly relevant because some cameras excel in high ISO conditions (which requires a low noise floor) while others excel in low ISO conditions (which requires a physically large sensor). Showing a high-level graph with a single figure of merit per ISO setting might have simplified interpreting the results.
Sensor size visualization
DxOMark’s online graphs allow you to plot scores with MPixels along the horizontal axis. It would be nice to have an extra setting to show sensor size instead of MPixels. This would (just like many of the graphs in this article) cluster comparable products together. Representing sensor size as color would also help because photographers tend to consider different sensor sizes (unlike MPixel ratings) as different product categories.
About Peter van den Hamer
Peter van den Hamer is a physicist by training who has been working in the Netherlands as a scientist/architect in various large high-tech and electronics companies for 25 years. Apart from merely writing about technical aspects of photography, he also does some actual photography and has exhibited work at local art galleries. His most recent exhibition, at the local town hall, ended in the loss of the displayed signed and numbered prints when the building went up in flames in a bizarre[vi] incident.
[i] Although I have a science and semiconductor industry background, I am not an image quality expert.
[ii] Both Dynamic Range and Color Sensitivity are measured at low ISO. This possibly gives a 2:1 bias towards the low ISO side and stresses differences there that may not normally be visible. As illustrated in Figure 7, top-notch low ISO performance is no guarantee of top-notch high ISO performance.
[iii] For details and references, see the section on Falk Lumo’s Equivalence Theorem above.
[iv] There are more aspects than Depth of Field, but this is the easiest one.
[v] Expressed as a linear formula: DxOMark_Sensor_Score = 59 + 4.3*(ColorDepth-21.1) + 3.4*(DynamicRange-11.3) + 4.4*log2(ISO/663) - 0.2. The three middle terms can either add or subtract points, depending on whether the camera did better or worse than the Leica M8. Expanding the formula removes the choice of the Leica M8 as a reference. Camera scores predicted by this formula differ from the published DxOMark Sensor scores by a standard deviation of 0.7 points. The formula tells us that the three subbenchmarks have roughly equal importance, and that a factor of 2 improvement in each subbenchmark would increase the overall score by 12.1 points. My guess is that the actual formula is non-linear and may (under some conditions) use coefficients of 5/5/5 rather than 4.3/3.4/4.4.
[vi]