DXOMARK, a leader in image-quality testing as well as audio, display, and battery evaluation, is bringing its camera and audio teams together to evaluate video conferencing in an effort to improve the overall experience. The teams will apply their respective expertise to scientifically assess the image and audio quality of different devices, such as laptops, webcams, and tablets, during a video conference.
These days we take for granted that we can make a call with our smartphone and actually see the person or people we are talking to. But how many times have we uttered the following phrases when joining a meeting via desktop computer or laptop while working from home: “Can you hear me?” “Can you see me?” “You’re frozen.” “Your microphone is on mute.”
The Covid lockdowns around the world during the past two years, which forced many people to work from a home office, highlighted just how much we rely on video calls to communicate with colleagues, family, and friends.
Depending on the calling situation, the quality of the audio and the image can vary enormously. Sometimes a smartphone can deliver smoother video and clearer audio on a loud street than a laptop can in a busy open office. While the quality of the hardware, such as the screen, camera, microphone, and speakers, plays a big role in the video conference experience, so does the software that makes it all work.
Studies show that 55% of a message is conveyed through body language and facial expressions, 37% through tone of voice, and only 7% through the words used (source: Forbes). Problems like choppy video or poor audio aren't just annoying; they can hinder effective communication. A poor video conference experience can have serious consequences for business, too.
In a 2020 survey of 2,025 full-time workers in the United States on the state of remote work, 57% thought the video quality of a video conference made working from home a challenge, while 56% thought audio quality made working from home a challenge (source: OwlLabs). So it’s little wonder that both end users and manufacturers alike are looking more closely at the different elements of video conferencing.
A recent informal DXOMARK survey found that nearly three-quarters of respondents use video conferencing on a daily basis. Laptops and smartphones are by far the most used devices for a video call (as opposed to tablets and smart displays). Yet a third of the respondents said that they often faced poor video or poor audio quality during the experience. Among the main problems cited were frequent audio/video lags.
Being able to hold a reliable video conference anywhere, whether from home or on the go, has become a necessity.
Challenges and limitations
Video conferencing uses the camera, speakers, and microphone systems of a device to allow individuals or groups of people to communicate with each other from different locations. Whether the experience is good or mediocre depends on the integration of these elements as well as on their tuning.
Manufacturers of laptops, tablets, and webcams often face size and cost limitations when designing their products, and as a result, the cameras often have narrow apertures, low-quality lens systems, and small sensors. A too-simple lens system, for example, can introduce aberrations in the image quality such as strong color fringing or distortion. The miniature aperture of the lens feeds little light to the small sensor, limiting its information capacity, which affects the dynamic range, texture-to-noise compromise, and color depth and accuracy.
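As a rough, first-order illustration of why a narrow aperture and a small sensor starve the image pipeline of light, the amount of light each pixel collects scales with the pixel's area and inversely with the square of the lens's f-number. The figures below are purely illustrative, not measured specifications of any device:

```python
def relative_light_per_pixel(f_number: float, sensor_width_mm: float,
                             sensor_height_mm: float, megapixels: float) -> float:
    """Relative light collected per pixel, up to a constant factor.

    Illuminance on the sensor scales with 1 / f_number**2, and each
    pixel's share of it scales with its area (sensor area / pixel count).
    """
    pixel_area_mm2 = (sensor_width_mm * sensor_height_mm) / (megapixels * 1e6)
    return pixel_area_mm2 / f_number ** 2

# Hypothetical figures for illustration: a smartphone-class front
# camera vs. a typical low-cost webcam module.
phone = relative_light_per_pixel(f_number=2.0, sensor_width_mm=5.6,
                                 sensor_height_mm=4.2, megapixels=12)
webcam = relative_light_per_pixel(f_number=2.8, sensor_width_mm=2.3,
                                  sensor_height_mm=1.7, megapixels=2)
```

With these (assumed) numbers the phone camera gathers roughly twice the light per pixel, which translates directly into a better signal-to-noise ratio and, in turn, into the dynamic-range and texture-to-noise advantages described above.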
On the audio side, the number of microphones, their placement and sensitivity, and the various playback configurations require the audio pipeline to be compatible with a wide range of setups and implementations. Audio processing during a video conference call must also cope with double talk (when two people speak at the same time), echo cancellation (removing the far-end voice, played through a user's speakers, from that user's microphone capture so it isn't sent back), and ambient noise reduction (suppressing keyboard typing or a noisy environment).
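The echo cancellation mentioned above is classically built around an adaptive filter that learns the acoustic path from the speakers to the microphone and subtracts the predicted echo. A minimal sketch, using the standard NLMS (normalized least-mean-squares) algorithm rather than any particular product's implementation:

```python
import random

def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-8):
    """Subtract the predicted echo of `far_end` from the `mic` capture.

    Returns the residual (the near-end estimate) sample by sample.
    """
    w = [0.0] * taps           # adaptive filter coefficients
    buf = [0.0] * taps         # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))     # predicted echo
        e = d - y                                      # residual after cancellation
        norm = sum(xi * xi for xi in buf) + eps        # input energy (NLMS step scaling)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        residual.append(e)
    return residual

# Demo: the mic picks up only an echo of the far-end signal
# (attenuated by 0.5, delayed by 3 samples); once the filter
# converges, the residual should be driven toward zero.
random.seed(0)
far_end = [random.uniform(-1, 1) for _ in range(2000)]
mic = [0.0] * 3 + [0.5 * x for x in far_end[:-3]]
residual = nlms_echo_cancel(far_end, mic)
```

Double talk is exactly what makes this hard in practice: when the near-end user speaks at the same time, the residual contains real speech, and a naive filter update would mistake it for echo and diverge, which is why production cancellers pair the filter with double-talk detection.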
Hardware limitations are often compounded by communication issues with video recording software. Different operating systems may handle camera drivers and APIs differently, resulting in different color profiles. Video and audio codec (encoding and decoding) performance in each application is also key to the end-to-end experience, because an efficient codec can preserve much of the overall quality even over a low-bandwidth network connection.
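The bandwidth constraint typically shows up as a rate-control decision: the app picks the largest resolution and frame rate whose estimated bitrate fits the available network. A simplified sketch of that logic, where the bits-per-pixel figure is a ballpark for a modern H.264-class encoder and the resolution ladder is invented for illustration, not taken from any real conferencing app:

```python
# (label, width, height, frames per second), largest first.
LADDER = [
    ("1080p", 1920, 1080, 30),
    ("720p", 1280, 720, 30),
    ("360p", 640, 360, 30),
    ("180p", 320, 180, 15),
]

def estimated_bitrate_bps(width, height, fps, bits_per_pixel=0.07):
    """Crude bitrate estimate: pixels per second times bits per pixel."""
    return width * height * fps * bits_per_pixel

def pick_resolution(available_bps, ladder=LADDER):
    """Pick the largest rung whose estimated bitrate fits the bandwidth."""
    for label, w, h, fps in ladder:
        if estimated_bitrate_bps(w, h, fps) <= available_bps:
            return label
    return ladder[-1][0]   # nothing fits: fall back to the smallest rung
```

Under these assumptions, a 2 Mbit/s connection lands on 720p while a 500 kbit/s connection drops to 360p, which is why two participants on the same call can legitimately see very different picture quality.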
In some cases, operating systems such as Android and iOS have recommended settings for the camera and microphone of the device, but those settings may not be the ones preferred by the developer of a given conferencing app. That is why these applications will interpret and process camera input differently, producing different video and audio outputs.
When hardware, signal processing, and software are designed together, they can compensate for each other's faults and eliminate many of these limitations. That's already the case with cellphones, where the large players manage to reach impressive image and audio performance with systems that are only a few millimeters thick.
But even products from the same manufacturer that are optimized to be used with the same software can have vastly different renderings.
Video and audio quality
The image quality during a video conference depends on mostly three sources: the optical/camera system, the image signal processing (ISP), and the software. Integration and tuning of these sources is important.
In the following example, three videos were taken in a typical indoor scene with the MacBook Pro (laptop, macOS), the Logitech Brio Ultra 4K (webcam, on Windows), and the Lenovo IdeaPad Flex 5 (laptop, Chrome OS).
The shots were taken at maximum resolution on the native app of each operating system: QuickTime for the MacBook Pro, the Windows Camera app for the Logitech Brio, and the Chrome OS Camera app for the Lenovo IdeaPad. The vastly different rendering of the three images is immediately apparent. The MacBook camera produced the most natural skin tones and white balance of the three; the Logitech Brio had accurate exposure and high detail, but its white balance was off; and the Lenovo IdeaPad struggled with color, contrast, and target exposure.
In another video example, the laptops were used in a challenging yet common backlit scene.
All three devices strongly clip the background, with the Lenovo even struggling to expose the target properly. The three devices have somewhat inaccurate skin tones, and all the devices present noise on the subject, with the Lenovo IdeaPad Flex 5 showing the most noise.
What our experts perceived in audio testing was that the MacBook Pro managed a natural-sounding audio rendering, but it lacked volume and only moderately attenuated background noise. The Logitech Brio achieved a high recorded volume but was affected by distortion artifacts as well as a tonal balance centered around the mid and high-mid frequencies (the frequency range of the human voice). The Lenovo IdeaPad Flex 5's audio was similar to the MacBook Pro's in terms of volume and tonal balance, but it was affected by a very noticeable pink noise throughout the recording.
On the technical side, one of the most difficult things video conference software developers face is the large differences between hardware and signal processing pipelines. Two people using the same conferencing app on different devices and on different networks can get very different results, as seen in the above examples.
Part of this is because each device manufacturer makes different choices about form, cost, and performance when designing a product.
But there are some interesting observations even when devices share the same ecosystem. The following series of video stills is from a challenging backlit situation, and the devices are all within the same ecosystem: all designed and tuned by Apple. Here, the Apple iPhone 12 Pro Max (2020), the iPad Pro (2021), and the MacBook Pro (2021) have vastly different specifications. The iPhone has a 4K-capable front-facing camera, the iPad has a 1080p front-facing camera, and the MacBook Pro has a 720p camera.
The comparison shows that the iPhone's video has a wider dynamic range and higher detail rendering, but with lower face exposure, whereas the iPad Pro shows more saturated colors and a brighter subject, which is perhaps the best tradeoff for this use case. Meanwhile, the MacBook Pro clearly shows more noise and lower contrast, and slightly underexposes the subject.
In terms of audio, the MacBook Pro shows a nice attenuation of background noise and a realistic perceived distance; the iPhone recording results in voices perceived as more distant; and the iPad Pro sounds even more distant and has a quite noticeable sibilance issue (“s” sounds are unnaturally amplified and saturated). All of these recordings showed a very low recorded volume.
Although the devices were made to work within the same ecosystem, the gaps in quality that our experts perceived on these and other devices can result in vastly different experiences for two users on different devices, even before any connection or software issues come into play.
The integration and tuning of an entire video conference system can be a very resource-consuming task for product makers, and some manufacturers do more testing than others when releasing a product, precisely to catch potential problems.
But as video calling, both professional and personal, becomes an even more integral part of everyday life, it is important that manufacturers deliver the right combination of hardware and software to provide a quality experience to their consumers.
Through its specially designed testing of laptops, webcams, and tablets, DXOMARK will continue scrutinizing devices used for video conferencing in order to evaluate image and audio quality as well as the consumer experience. Stay tuned for more articles and insights about video conferencing from DXOMARK.