How we test smartphone audio recording

The DXOMARK audio testing laboratory
- The anechoic box
- The listening room
Recording use cases
The testing protocol for recording and playback
“Perceptual” and “objective” measurements
The scores

People are using their smartphones to create ever more video content. Smartphone camera image quality has greatly improved over the years, but what about the sound quality of the microphones? DXOMARK has developed a unique set of rigorous test protocols that mirrors the experience of people who use their smartphones to record audio input.

Inside our specially-commissioned laboratory at our headquarters in Paris, France, our audio engineers use a series of multimedia and audio samples that represent the full range of a user’s experience when recording and playing back the sound on speakers. We use these real-world examples, called use cases, as the basis for evaluating the audio input of each and every device under test to ensure fair comparisons.

Some of our testing protocols use purely objective measurements—in other words, the testing equipment provides clearly quantifiable results—for example, microphone output undergoes objective measurements based on spectrographic evaluation. Having said that, no equipment can match the interpretative capabilities of the human ear, so some of the testing protocols involve perceptual measurement to some degree (we will explain this in more detail later).

Smartphone audio input relies on microphones, and the number and location of microphones on a device depend on the smartphone in question. For example, it is very common to see two or three microphones on a smartphone: two for use with the rear (or main) camera, and one for use with the front (or selfie) camera.

Because of the very limited real estate on a smartphone, the microphones are minuscule and have to perform exceptionally well to capture the sound at the quality the user expects. They are omni-directional, picking up sound from all around the device. This means the device often must deal with poor environmental acoustics, as sound waves bounce off walls, ceilings, and floors, and arrive a fraction of a second later than sound received directly from the source. So instead of a single clean signal, the microphone receives the direct sound mixed with delayed reflections, and the device has to compensate for the resulting distortions.
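The effect of a single reflection can be sketched numerically. The snippet below is an illustration only, not part of DXOMARK's protocol: it assumes one reflection that arrives 5 ms after the direct sound at half its amplitude (both values are hypothetical) and computes how much each frequency is boosted or cut when the two are summed at the microphone.

```python
import cmath
import math

def comb_gain_db(freq_hz, delay_s, reflection_gain):
    """Level change in dB when a direct sound is summed with one delayed copy."""
    # Frequency response of y(t) = x(t) + g * x(t - delay)
    response = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(response))

# Hypothetical room: the reflection arrives 5 ms late at half the amplitude.
DELAY_S = 0.005
REFLECTION_GAIN = 0.5

for freq_hz in (100, 200, 300, 400):
    gain_db = comb_gain_db(freq_hz, DELAY_S, REFLECTION_GAIN)
    print(f"{freq_hz} Hz: {gain_db:+.2f} dB")
```

Under these assumptions, 100 Hz and 300 Hz are cut by about 6 dB while 200 Hz and 400 Hz are boosted by about 3.5 dB. Even one reflection therefore colors the sound unevenly across frequencies, and a real room adds many reflections at once.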

One thing a smartphone has going for it that many standalone microphones do not is the huge computing power at its disposal. Many smartphones, particularly flagship and high-end models, can detect the circumstances and use case in which sound is being recorded. DXOMARK tests these capabilities. Each test generates its own score and is weighted differently.

The DXOMARK audio testing laboratory

All our audio testing is undertaken in DXOMARK’s carefully-designed audio testing laboratory, which houses an anechoic box and the listening room.

The anechoic box

An anechoic box is a space designed to absorb sound waves almost completely. The box is insulated against external noise; moreover, it is lined with fiberglass wedges that cover the entire ceiling, floor, and walls so that the energy of a sound wave dissipates, which virtually eliminates echoes. Anechoic chambers are among the quietest places on Earth.

In our protocol, we place a device inside the box and test how it handles high sound pressure levels: we raise the speaker volume and measure the level of distortion in the recorded output.
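One common objective way to quantify distortion is total harmonic distortion (THD): play a pure tone and compare the energy that appears at its harmonics with the energy at the tone itself. The article does not say which distortion metric DXOMARK uses, so the sketch below is only an assumed illustration. The test signal, sample rate, and harmonic amplitudes are invented for the example.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of one sinusoidal component, measured with a single-bin DFT.

    Exact only when the signal contains a whole number of cycles of freq_hz.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def thd_percent(samples, fundamental_hz, sample_rate, n_harmonics=5):
    """THD: RMS sum of harmonics 2..n_harmonics+1 relative to the fundamental."""
    fundamental = tone_amplitude(samples, fundamental_hz, sample_rate)
    harmonics = [tone_amplitude(samples, k * fundamental_hz, sample_rate)
                 for k in range(2, n_harmonics + 2)]
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# Hypothetical "recording": a 100 Hz tone with a little 2nd and 3rd harmonic.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
    + 0.03 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
    + 0.04 * math.sin(2 * math.pi * 300 * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)  # one second of audio
]

thd = thd_percent(signal, 100, SAMPLE_RATE)
print(f"THD: {thd:.2f} %")
```

For this synthetic signal the harmonics have amplitudes 0.03 and 0.04, so the result is 5 %. In a real measurement the harmonics would come from the microphone and processing chain being overdriven rather than from the test signal.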

Sound-absorbing baffles inside the anechoic box

Microphones within the box

Speakers in the box for recreating use cases

The listening room

The listening room is larger, and houses an array of speakers that we place around a human subject or a device under test.

We recreate use case environments in the listening room.

Objective measurements are often based on spectrogram analysis.

The speakers in the listening room recreate various environments by playing real-world recordings that mimic a variety of conditions. We faithfully recreate the use case environment (such as a busy street) in the listening room with recordings that we made using an array of microphones arranged in a manner identical to the speakers in the listening room.

DXOMARK uses both the anechoic box and the listening room to test the recording capabilities of a device’s microphones.

Recording use cases

DXOMARK employs different use cases to test each device. As explained above, we have commissioned audio samples that represent a variety of situations in which the device could be used to record.

We place the device under test in the listening room located in the lab, and then play back the audio samples through an array of speakers surrounding the device to recreate the use case scenario.

One such use case is “live video recording in a busy outdoor environment,” such as a park or a busy street. The controlled conditions recreate the situation of a user trying to record a two-way conversation. The purpose of the test is to see how well the microphones can pick up the individual voices and minimize the ambient background noise.

Another common use case we recreate in the laboratory is the selfie video. In this controlled test, we record the voices of two subjects in front of the selfie camera; as with the live video test described above, we evaluate the device to see how well it can pick out the subjects’ voices from the ambient surroundings.

Recording a street performance

Making a selfie video recording with audio

One of the more challenging environments to test is the recording of music at a live concert where volume and bass are very high. It is often hard for smartphones to faithfully replicate the same level of volume and to record the bass sounds without significant distortion.

An equally challenging environment is the classical concert, but for very different reasons. Here there is a greater emphasis on recreating the spatial environment of the concert. For the listener, it is important to be able to pick out the timbre of individual instruments and to localize them within the soundscape, and this is fundamentally what DXOMARK tests for.

Recording a meeting environment offers its own unique challenges. Here the object is to record several voices from several different directions. DXOMARK engineers evaluate how well the device makes it possible to pick out individual voices and to suppress ambient noise.

Users recording a rock concert with their smartphones.

Recording a meeting has its own challenges.

The testing protocol for recording and playback

DXOMARK’s protocol for audio recording tests for the following attributes—timbre, spatial, dynamics, volume, artifacts, and background. Depending on the use case, some of these attributes will carry more weight than others.

Timbre

Within the timbre of a sound, DXOMARK measures the bass, midrange, and treble frequencies and their tonal balance. DXOMARK is specifically looking for the overall balance of frequencies within the device’s recording. A good balance would typically consist of an even distribution of frequencies.
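A simple way to picture tonal balance is to split a recording's energy into bass, midrange, and treble bands and compare their shares. The sketch below does this with a plain DFT. The band edges, the sample rate, and the three-tone test signal are assumptions chosen for illustration; the article does not state where DXOMARK draws these boundaries or how it computes balance.

```python
import cmath
import math

# Hypothetical band edges in Hz (not taken from the article).
BANDS_HZ = {"bass": (20, 250), "midrange": (250, 4000), "treble": (4000, 8000)}

def band_energy_shares(samples, sample_rate):
    """Fraction of spectral energy that falls in each band of BANDS_HZ."""
    n = len(samples)
    energy = {name: 0.0 for name in BANDS_HZ}
    for k in range(1, n // 2):
        freq_hz = k * sample_rate / n
        # DFT bin k, computed directly (slow but dependency-free).
        x = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))
        for name, (low, high) in BANDS_HZ.items():
            if low <= freq_hz < high:
                energy[name] += abs(x) ** 2
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}

# Hypothetical bass-heavy recording: a strong 100 Hz tone plus weaker
# 1 kHz and 6 kHz tones.
SAMPLE_RATE = 16000
signal = [
    1.0 * math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 6000 * i / SAMPLE_RATE)
    for i in range(800)
]

shares = band_energy_shares(signal, SAMPLE_RATE)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Here about two thirds of the energy sits in the bass band and one sixth in each of the other two, which is the kind of uneven distribution that would read as an unbalanced timbre.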

Spatial

The engineers also measure the spatial attributes of a recording and how the device uses the microphones and sensors to accurately reflect the sense of space (wideness and distance) in a recording, and to pick out individual sounds within the soundscape.

Poor spatial localizability

Appropriate spatial localizability

The spatial sub-attribute localizability measures a device’s ability to create the impression that specific sounds are coming from specific locations within the overall soundscape.
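One physical cue behind localization in a stereo recording is the small time difference between the two channels. As a hedged illustration only (the article does not describe how localizability is measured, and the evaluation may well be perceptual), the sketch below estimates that inter-channel delay by cross-correlation. The noise signal and the 5-sample delay are invented for the example.

```python
import random

def best_lag(left, right, max_lag):
    """Lag in samples at which `right` best matches `left` (cross-correlation)."""
    def correlation(lag):
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)

# Hypothetical stereo capture: the right channel hears the same noise
# 5 samples later than the left channel.
rng = random.Random(0)
left = [rng.uniform(-1, 1) for _ in range(500)]
DELAY_SAMPLES = 5
right = [0.0] * DELAY_SAMPLES + left[:-DELAY_SAMPLES]

lag = best_lag(left, right, max_lag=20)
print(f"Right channel lags left by {lag} samples")
```

A positive lag means the sound reached the left microphone first, which suggests a source on the left side of the device. A recording that preserves such differences gives the listener more to work with when placing sounds in the soundscape.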

Poor spatial directivity

Appropriate spatial directivity

Spatial directivity measures how the device handles sound levels based on where the sounds originate, and on the use case. For example, when recording a video, voices both in front of the device and behind it need to be recorded at the appropriate level so that the playback can reflect the location and levels of the original sources.

Volume

Volume is a measure of the loudness or the intensity of a sound—in particular, the maximum amplitude capabilities of a recording from the source. DXOMARK measures the ability of the device to record at the appropriate volume whatever the input acoustic level is.
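Recorded level is commonly expressed as an RMS value in decibels relative to digital full scale (dBFS). The sketch below is a minimal example of that measurement, not DXOMARK's procedure: it assumes samples normalized to the range -1.0 to 1.0, and the two test tones are hypothetical.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)

SAMPLE_RATE = 8000

def sine(amplitude, freq_hz=100, seconds=1.0):
    """Synthetic test tone standing in for a recording."""
    count = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(count)]

loud_level = rms_dbfs(sine(1.0))   # full-scale tone
quiet_level = rms_dbfs(sine(0.1))  # tone at one tenth of full scale
print(f"loud: {loud_level:.2f} dBFS, quiet: {quiet_level:.2f} dBFS")
```

With this convention a full-scale sine reads about -3 dBFS and the quieter tone about -23 dBFS. Comparing such readings across quiet and loud sources is one way to check whether a device keeps its recordings at a usable level.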

Dynamics

DXOMARK reviews the dynamics of a device’s recording capabilities, such as the sound envelope. It is the sound wave’s envelope that helps establish a sound’s unique individual quality, and the envelope has significant influence on how we interpret sound. For example, a sound may have a particular punchiness that the envelope helps define.
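To make the idea of an envelope concrete, the sketch below traces one with a simple attack/release follower, a standard audio technique. It is an assumed illustration rather than DXOMARK's method, and the attack time, release time, and tone burst are hypothetical values.

```python
import math

def envelope(samples, sample_rate, attack_s=0.001, release_s=0.020):
    """Track the amplitude envelope with separate attack and release smoothing."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    level = 0.0
    out = []
    for s in samples:
        x = abs(s)
        # Rise quickly when the signal exceeds the envelope, fall slowly otherwise.
        coeff = attack if x > level else release
        level = coeff * level + (1 - coeff) * x
        out.append(level)
    return out

# Hypothetical "punchy" sound: a 50 ms burst of a 440 Hz tone, then 50 ms of silence.
SAMPLE_RATE = 8000
burst = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(400)]
signal = burst + [0.0] * 400

env = envelope(signal, SAMPLE_RATE)
print(f"peak during burst: {max(env[:400]):.2f}, level at end: {env[-1]:.3f}")
```

The envelope rises within a few milliseconds of the burst starting and decays once it stops. A recording chain that smears this rise, for example through slow automatic gain control, would make the same sound feel less punchy.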

Artifacts

Artifacts are anomalous sounds that are not present in the original sound source, but are heard in audio playback. Users generally find such sounds annoying.

Occlusion caused by hand position.

Hand position does not always create occlusion artifacts.

The sub-attribute occlusion pertains to the effect of the positioning of the user’s hand when recording from the device. Users may often be unaware that the way they hold their device could impede the recording, and many devices try to compensate for this.

Background

These are the ambient sounds in a recording; they are sometimes desirable, but can be detrimental depending on the use case and the purpose of the recording. For example, when recording a selfie, the user would want the background noise to be very low so as not to overwhelm their voice.

Background artifacts: the ambient sound is too loud for the use case.

Background artifacts: the ambient sound is at an appropriate level.

A smartphone’s recording capabilities are often directional, which means that its microphones are designed to pick up sounds from a specific location; some smartphones have sensors that can widen the directional range if needed. The narrower the soundscape, the more directional the playback or recording. In the case of foreground directivity, such as when recording a selfie, DXOMARK evaluates how clear the subject’s voice is and how well the background noises are muted.


Smartphone makers want their devices to record “clean” background sounds.

Artifacts are a sub-attribute of the Background attribute; they are best described as anomalous sounds picked up during recording that seem to come from the background and which will be audible to listeners during playback.

“Perceptual” and “objective” measurements

When we talk about perceptual testing, we are talking about using the human ear and brain as the main measurement tools. Because our sound experts have worked for years in the fields of audio engineering and audio industrial design in a wide array of industries, they are able to discern different aspects of audio and sound to a level far superior to untrained listeners; still, they evaluate device performance with the average consumer in mind. Furthermore, DXOMARK has created specific protocols to ensure that any perceptual measurement is consistent over time, so that the same test carried out months later on the same device should give the same results.

Our objective tests, on the other hand, rely on sensors and measurement tools, such as spectrogram analysis or a sound-level meter, to record the results. Whether a test is perceptual, objective, or a combination of both, we quantify its results using proprietary protocols.
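A spectrogram shows how the frequency content of a recording changes over time. The sketch below is a minimal, dependency-free version built from a short-time Fourier transform. It is an illustration under assumed settings (frame size, Hann window, a two-tone test signal), not a description of DXOMARK's tools.

```python
import cmath
import math

def spectrogram(samples, frame_size):
    """Magnitude spectrogram: one list of bin magnitudes per non-overlapping frame."""
    # Hann window reduces leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * i / frame_size)
                    for i, x in enumerate(frame)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames

# Hypothetical recording: 500 Hz for 64 ms, then 2000 Hz for 64 ms.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
signal = (
    [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(512)]
    + [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(512)]
)

frames = spectrogram(signal, FRAME_SIZE)
# Strongest frequency in each frame, in Hz.
peaks = [
    max(range(len(mags)), key=mags.__getitem__) * SAMPLE_RATE / FRAME_SIZE
    for mags in frames
]
print(peaks)
```

The peak frequency moves from 500 Hz to 2000 Hz halfway through the signal. Reading a spectrogram in the same way makes it possible to spot when unwanted components appear in a recording and at which frequencies.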

All of this means that our perceptual tests are no less scientific than the objective measurement tests. We carefully record the measurements and we repeat the tests several times to ensure that the results are recorded accurately and impartially.

The scores

It would be very difficult for the average user to properly evaluate the audio quality of a device’s recording capability without owning the device for several weeks and having had the opportunity to put it through most, if not all, use cases. Additionally, what most of us never get the opportunity to do is to compare two separate devices side-by-side. This is the fundamental value of an impartial scoring system—users will be able to see in a quantifiable manner the differences between competing devices.

Within the testing protocol, the attributes and use cases have their own scores. Furthermore, each attribute has several sub-attributes that we also evaluate and score. We weight these scores individually and subsequently use a complex algorithm to calculate our headline score from them.
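The general idea of combining weighted sub-scores can be sketched with a plain weighted average. This is a deliberately simple stand-in: the actual DXOMARK algorithm is proprietary and described only as complex, and every sub-score and weight below is invented for illustration.

```python
def weighted_score(sub_scores, weights):
    """Weighted average of sub-scores; a stand-in for the real scoring algorithm."""
    total_weight = sum(weights[name] for name in sub_scores)
    weighted_sum = sum(score * weights[name] for name, score in sub_scores.items())
    return weighted_sum / total_weight

# Hypothetical attribute sub-scores for one device.
sub_scores = {
    "timbre": 80, "dynamics": 70, "spatial": 60,
    "volume": 65, "artifacts": 75, "background": 50,
}
# Hypothetical weights: timbre counts double in this made-up example.
weights = {
    "timbre": 2.0, "dynamics": 1.0, "spatial": 1.0,
    "volume": 1.0, "artifacts": 1.0, "background": 1.0,
}

recording_score = weighted_score(sub_scores, weights)
print(f"Recording score: {recording_score:.1f}")
```

Changing the weights changes how much each attribute moves the headline number, which is why use-case-specific weighting matters when comparing devices.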

A number of attribute sub-scores feed into the Recording score, which in turn is a sub-score of the DXOMARK Audio Overall score.

DXOMARK’s reviews will drill down into every aspect of the device’s audio capabilities and will consider the use cases and the various attributes. This means that anyone who reads our reviews will be able to investigate the specific sub-scores that are relevant to their own use cases.

Finally, unlike some benchmarking systems, DXOMARK’s audio scoring will be an open scale. This means that the headline score (and the sub-scores) will be a number that will constantly climb as the technology and the performance improve. This open scale will allow you to objectively compare the results of current flagship smartphones with any future devices.

For more information about our Audio testing, read the article on how we test speakers.