2000 to 2021: The evolution of smartphone audio playback

Before 2010
Early 2010s
Late 2010s until today

When it comes to audiovisual content, sound has always been the poor cousin, and that is as true today as it was a century ago. Long before they grew louder and better-sounding, our smartphones grew bigger and brighter, just as our computers, TV sets, and movie theaters did before them. Sooner or later, however, audio catches up.

Over the past few years, speaker quality has become crucial for sharing social media content, listening to music, watching videos, gaming, and making video calls. In 2020, the need for amplified audio content reached another level during the Covid-19 outbreak: people spent much more time at home, which added a multitasking dimension to their everyday lives while removing the social obligation to use headphones. (You will find consistent consumer data supporting that statement in this white paper, courtesy of Cirrus Logic.)

This article traces the evolution of smartphone audio playback, from the days of thin and distorted mono sound to the increasingly immersive and enjoyable audio experience available today… and in the near future.

Before 2010: The dawn of smartphone audio

Prior to 2010, very few phones delivered stereo playback: a typical device had one earpiece for phone calls, one loudspeaker for amplified audio, and that was it. Playback volume was thus fairly low overall, varied with the battery level, and was almost completely devoid of bass.

Infographic by Cirrus Logic

As the digital music market awakened to its first hours of glory, new mobile audio use cases appeared, bringing some much-needed awareness of our smartphones’ sound capabilities: People would rip their CDs, download music (legally or not), and transfer it all onto their phones. In 2008, iTunes officially became the number one music retailer in the US, boasting the largest music catalog and no fewer than five billion songs sold from its store. This was the context in which the highly popular Nokia N95 and Apple iPhone 3GS were launched, the former in March 2007, the latter in June 2009.

The Nokia N95 (left) and the iPhone 3GS.

Back when many people still felt content with a landline, Nokia managed the incredible coup of selling over a million N95 handsets in the UK alone by the end of the phone’s release year, and 10 million in total. Besides being one of the early stereo phones, the N95 delivered an impressive stereo width in portrait mode, an ability that remains extremely rare to this day.

One of the N95’s stereo speakers

That said, both speakers were extremely easy to occlude when the phone was handheld (in other words, most of the time). The overall rendering was also still extremely thin and nasal, as playback was centered on the human vocal range, which goes to show that mobile phones were still primarily considered communication devices rather than on-the-go personal entertainment centers.

Frequency response: iPhone 3GS (pink) vs Nokia N95 (green)

Total harmonic distortion: iPhone 3GS (pink) vs Nokia N95 (green)

Further, the N95’s distortion was off the charts, with acute spectral artifacts taking over all frequencies. As for the iPhone 3GS, while it certainly delivered a wider frequency range and less audible distortion, the iPhone line would not go stereo until the iPhone 7 arrived in 2016.

The iPhone 7 (white device), shown here with the iPhone 3GS, was the first in the series to be capable of stereo playback.

In those years, output power requirements were low enough for Class AB amplifiers, which were the norm in smartphones. As the name suggests, they offered a compromise between Class A and Class B, delivering better audio fidelity than Class B amplifiers and higher efficiency than Class A amplifiers. But battery life was still dragged down by the energy required to amplify audio signals, which is why equipment manufacturers started considering Class D amplifiers. However, because of firmly rooted fears that Class D would sacrifice audio quality, it would take a few more years for Class D amplification to be widely adopted.
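To put rough numbers on the efficiency concern described above, here is a minimal Python sketch based on the textbook result for an ideal Class B output stage driving a sine wave, whose efficiency is (π/4) × (peak output voltage / supply rail voltage). The function name and the 3.7 V rail are illustrative assumptions, not figures from any particular phone, and a real Class AB stage sits somewhat below these values because of its quiescent bias current.

```python
import math


def ideal_class_b_efficiency(v_peak, v_supply):
    """Efficiency of an ideal Class B push-pull stage for a sine output.

    v_peak   -- peak amplitude of the output sine wave (volts)
    v_supply -- magnitude of each supply rail (volts); hypothetical value
    """
    if not 0 < v_peak <= v_supply:
        raise ValueError("v_peak must be in (0, v_supply]")
    # P_out = v_peak^2 / (2R) and P_supply = 2 * v_supply * v_peak / (pi * R),
    # so the load resistance R cancels out of the ratio.
    return math.pi / 4 * v_peak / v_supply


if __name__ == "__main__":
    rail = 3.7  # assumed rail voltage, for illustration only
    for fraction in (1.0, 0.5, 0.1):
        eff = ideal_class_b_efficiency(fraction * rail, rail)
        print(f"output swing {fraction:.0%} of rail -> efficiency {eff:.1%}")
```

Even in this idealized model, efficiency is capped at π/4 (about 78.5%) at full swing and falls in proportion to the output level, so quiet playback wastes most of the power drawn. An ideal switching (Class D) stage, by contrast, dissipates nothing in its output devices, which is what made it attractive for battery-powered devices.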

Navigating the music player on the iPhone 3GS.

In addition, smartphones offered no thermal or excursion protection for their speakers. In other words, a loud song with lots of bass, or a sine wave played for long enough, could simply dislodge, burn, or even melt the speaker! In practice, to shield the speakers (and, collaterally, their users) from harm, manufacturers would not drive them beyond their rated power, which also explains why playback volume was so low. That’s when the idea of safely overdriving those tiny speakers with boosted amplifiers emerged.

Early 2010s: Louder, slimmer… and more efficient

In the early 2010s, comprehensive speaker protection became part of even the most basic audio signal chains, which allowed manufacturers to drive speakers to their limits without causing thermal or excursion damage. Playback therefore became considerably louder. At the same time, speakers became more compact and, thanks to the wide adoption of Class D amplifiers, much lighter on the battery: two key gains in an industry where size and power optimization were the most competitive areas. Those years also saw the innovative use of multiple magnets and new materials, which allowed speakers to be pushed to higher temperatures and larger excursions. For the user, this meant louder volumes and a wider frequency response that no longer made all singers sound like they had a stuffy nose.
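The thermal side of speaker protection can be illustrated with a toy model. The Python sketch below assumes a first-order thermal model of the voice coil and caps the delivered power so that the predicted temperature never exceeds a limit. Every name and parameter value here (thermal resistance, time constant, temperature limit) is a hypothetical placeholder rather than data from a real speaker or a vendor algorithm, and excursion protection is not modeled at all.

```python
def thermally_limited_power(requested_w, dt_s=0.01, r_th=50.0, tau_s=2.0,
                            t_amb=25.0, t_max=100.0):
    """Cap power so a first-order coil temperature model stays at or below t_max.

    requested_w -- requested electrical power per time step (watts)
    r_th        -- assumed thermal resistance (kelvin per watt)
    tau_s       -- assumed thermal time constant (seconds)
    Returns (delivered power per step, coil temperature per step).
    """
    alpha = dt_s / tau_s
    temp = t_amb
    delivered, temps = [], []
    for p_req in requested_w:
        # Model: T_next = T + alpha * (t_amb + r_th * P - T).
        # Solve T_next <= t_max for the largest admissible P.
        p_allowed = ((t_max - temp) / alpha + temp - t_amb) / r_th
        p = min(p_req, max(0.0, p_allowed))
        temp += alpha * (t_amb + r_th * p - temp)
        delivered.append(p)
        temps.append(temp)
    return delivered, temps


if __name__ == "__main__":
    # Ten seconds of a sustained 3 W request; with these assumed values the
    # coil could only sustain 1.5 W indefinitely.
    power, temps = thermally_limited_power([3.0] * 1000)
    print(f"first step: {power[0]:.2f} W, last step: {power[-1]:.2f} W")
    print(f"peak temperature: {max(temps):.1f} degrees C")
```

The point of the sketch is the behavior rather than the numbers: short bursts pass through at full power while the coil is still cool, and only sustained loud content is pulled back to the level the speaker can tolerate continuously.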

Infographic by Cirrus Logic

While stereo playback became more and more expected from flagship devices such as the acoustically acclaimed HTC One, audio enhancement algorithms were embedded in even the most affordable phones. Distortion was better tamed and dynamic range expanded. Besides the spread of stereo designs, the most notable evolution in those years was certainly “bass virtualization”: because such tiny speakers were particularly ill-suited to reproducing low-end frequencies, the trick was (and still is) to add distortion in order to generate overtones, which let users hear, or rather think they heard, frequencies below 100 Hz.
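The overtone trick relies on the “missing fundamental” effect: when the ear hears harmonics at two and three times a bass frequency, it tends to infer the bass pitch itself. As a minimal sketch of the principle only, the Python snippet below pushes a 60 Hz tone through a simple polynomial nonlinearity and measures what comes out at 60, 120, and 180 Hz. The polynomial, the sample rate, and the function names are illustrative assumptions, not the method used by any commercial bass-enhancement algorithm.

```python
import math

SAMPLE_RATE = 4800  # Hz, chosen so every test tone fits a whole number of cycles
BASS_HZ = 60.0      # hypothetical bass note a tiny speaker cannot reproduce


def tone_amplitude(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of the sinusoidal component of `signal` at freq_hz (single-bin DFT)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def add_overtones(signal):
    """Deliberate distortion: x^2 creates the 2nd harmonic, x^3 the 3rd."""
    return [x * x + x ** 3 for x in signal]


# One second of a unit-amplitude 60 Hz sine wave.
bass = [math.sin(2 * math.pi * BASS_HZ * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
virtual = add_overtones(bass)

for label, sig in (("original", bass), ("distorted", virtual)):
    levels = {f: round(tone_amplitude(sig, f), 3) for f in (60.0, 120.0, 180.0)}
    print(label, levels)
```

The original tone has energy only at 60 Hz, while the distorted version gains components at 120 Hz and 180 Hz, frequencies a small speaker has a far better chance of reproducing. A practical implementation would typically also filter out what the speaker cannot play and blend the overtones with the original signal at a carefully tuned level.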

Late 2010s until today: current “state of play”

Since the late 2010s, the social media boom, the advent of streaming apps, and the arrival of mobile gaming have created an exponential rise in demand for immersive audio, which quickly translated into a boost in playback quality. On the hardware side, speakers now reach better acoustic sensitivity and higher excursion, and stereo configurations are common in nearly all high-end smartphones. In addition, Class D amps quickly transitioned to Class H to achieve even greater efficiency while maintaining low distortion.

Infographic by Cirrus Logic

On the software side, a new class of audio algorithms emerged to manage the audio power drawn from the battery. Meanwhile, audio-enhancing algorithms such as Nokia’s OZO technology, Dolby Atmos, Xperi DTS, and Dirac strove to deliver better tonal balance, stronger bass reproduction, higher dynamic range, and/or more immersive spatial reproduction. This leads us to two of the most recent top-scoring phones in our Audio ranking, the Apple iPhone 12 Pro Max and the BlackShark 4 Pro. Let’s compare their frequency response and distortion to those of the Nokia N95 and the iPhone 3GS.

Frequency response: iPhone 12 Pro Max (pink) vs iPhone 3GS (green)

Frequency response: BlackShark 4 Pro (pink) vs Nokia N95 (green)

The graphs above illustrate how dramatically bass response and tonal balance have improved since the Nokia N95 and the iPhone 3GS were released. While the green curves are focused on midrange frequencies and drop to -60 dB below 300 Hz, the pink ones exhibit stronger bass presence, deeper low-end extension, and a much more harmonious tonal balance. The BlackShark 4 Pro in particular delivers an impressive response across all frequency ranges, with one of the best tonal balances we’ve measured to date on a smartphone, along with an outstanding low-end extension in every use case.

Total harmonic distortion: iPhone 12 Pro Max (pink) vs iPhone 3GS (green)

Total harmonic distortion: Black Shark 4 Pro (pink) vs Nokia N95 (green)

Improvements in the area of sonic artifacts are even more compelling: Compared to today’s measurements, total harmonic distortion from the days of yore appears wildly out of proportion, with percentages that often approach and sometimes even reach 100%, meaning that at that frequency, the sound is 100% distorted.
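For readers unfamiliar with the metric, a common definition of total harmonic distortion is the ratio of the combined RMS level of the harmonics to the level of the fundamental. The Python sketch below assumes that definition (the article does not specify which variant its measurements use) and works on hypothetical amplitude values, not on data from the graphs above.

```python
import math


def thd_percent(fundamental, harmonics):
    """THD relative to the fundamental, in percent.

    fundamental -- amplitude of the test tone at the output
    harmonics   -- amplitudes of the 2nd, 3rd, ... harmonics (same units)
    """
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    return 100.0 * math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Hypothetical readings, not measurements from the article.
print(f"{thd_percent(1.0, [0.03, 0.04]):.1f}%")  # mild distortion
print(f"{thd_percent(1.0, [0.6, 0.8]):.1f}%")    # harmonics as strong as the tone
```

Under this definition, a reading of 100% means the harmonics together carry as much RMS level as the tone that was actually played.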

Apple iPhone 3GS directivity plot

Apple iPhone 12 Pro Max directivity plot

The directivity plots shown above measure the sound pressure level at all points on a circle surrounding the smartphone. While the iPhone 3GS (left) delivers a nearly omnidirectional sound (the curves form approximate circles), which is typical of a mono rendition, the iPhone 12 Pro Max’s stereo configuration (right) shows up as clear differences depending on the listening angle. Note, however, that the dark pink curve is fairly circular; this is explained by the nature of bass frequencies, which are intrinsically omnidirectional because of their long wavelengths. But enough with the theory, let’s have a listen, shall we?

In short, frequency response, distortion, spatial reproduction, and volume have all improved so dramatically that current smartphones’ playback abilities don’t have much in common with those of their ancestors. However, while audio has certainly benefitted from many advances over the past 15 years, it still lags behind. At a time when smartphones boast an ever-increasing number of cameras, and as HDR transforms our viewing experience even on the go, two-channel audio reproduction (prehistoric though it now is in the scheme of smartphone history) still remains unsurpassed, and audio experts still haven’t found a way to truly reproduce bass within such narrow spaces.

In the near future, we can thus look (or listen) forward to increasingly immersive sound reproduction from our smartphones. In that vein, the Xiaomi Mi Mix Fold delivers a stereo field in both landscape and portrait mode, a possibility that had never been implemented until now. But let’s push this even further and imagine ourselves watching a movie on our smartphone, with its speakers producing deep bass, 3D sound, and wide dynamics, backed by intelligent audio processing and sensor technologies that seamlessly adapt to our preferences. We look forward to meeting again soon in these columns to talk about the brand-new, innovative audio experiences, applications, and services that are about to revolutionize the way we consume audio on the go, all technologies that might very well be just around the digital corner!

This DXOMARK article was prepared with contributions from Cirrus Logic and Nokia Technologies.