Hear, hear: Immersive audio is on the rise
Next-generation audio (NGA) is the new buzzword in the broadcast world. While immersive/3D audio has been mandatory in the movie sector for years, the broadcast world is still busy putting everything in place. A lot of pioneering work has been done to grace the 4K picture with a worthy rendition of the soundscape, and so turn viewing experiences into events in their own right.
Unlike movie soundtracks, however, which are produced for a well-known, controlled, and calibrated listening environment, 3D/immersive audio presentations can be consumed in a variety of ways—hence the ‘next-generation audio’ moniker. What does this mean for audio engineers and the way they monitor their productions?
The main challenges of next-generation audio are the ‘bonus features’ afforded by the new approach: the audio engineer needs to bear in mind the freedom and flexibility granted to end consumers of the audio content. This is due to the object-based nature of MPEG-H or Dolby Atmos material, which lets end consumers individualise the streams they receive by changing the levels of the ambience, the commentator, and so on.

Audio engineers, or ‘sound supervisors’ as they are increasingly called, must take a step back and work with assumptions. They cannot ‘play to the gallery’, because they do not know what the gallery looks like. With NGA object-based productions, mixing for a single, known listening scenario is no longer possible. Sound supervisors create their three-dimensional space from a given number of objects, and every bundle of objects constitutes a ‘presentation.’ End consumers are at liberty to listen to these 3D/immersive presentations over binaural headphones, speakers, soundbars, up-firing speakers, and so on. The decoder in the consumer’s home, which is required to receive NGA streams, effectively mixes in tandem with the audio engineer, so mixing for one specific setup is almost pointless.

Enter the rendering principle: the audio objects supplied to end consumers carry co-ordinates rather than channel or speaker references. This allows the decoder in the consumer’s home to render the immersive audio content to whatever diffusion system is available, by translating the panning information set by the sound supervisor to the real-life speaker setup in that home.
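The rendering principle can be sketched in a few lines of code. This is an illustrative toy only: real NGA renderers (MPEG-H, Dolby Atmos) use standardised metadata and far more sophisticated panning laws, and every name below is hypothetical. The point is that the object carries a position, not a speaker assignment, and the same object lands differently on different layouts.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # positional metadata, not a speaker reference
    gain: float = 1.0

def angular_distance(a: float, b: float) -> float:
    """Smallest absolute angle between two azimuths, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def render_gains(obj: AudioObject, speakers_deg: list) -> list:
    """Distribute the object's gain over the two speakers nearest its
    azimuth, using a crude constant-power pairwise pan."""
    order = sorted(range(len(speakers_deg)),
                   key=lambda i: angular_distance(speakers_deg[i], obj.azimuth_deg))
    a, b = order[0], order[1]
    da = angular_distance(speakers_deg[a], obj.azimuth_deg)
    db = angular_distance(speakers_deg[b], obj.azimuth_deg)
    t = 0.0 if da + db == 0 else da / (da + db)  # 0 = fully on speaker a
    gains = [0.0] * len(speakers_deg)
    gains[a] = obj.gain * math.cos(t * math.pi / 2.0)
    gains[b] = obj.gain * math.sin(t * math.pi / 2.0)
    return gains

# The same metadata renders differently on each layout:
commentary = AudioObject("commentary", azimuth_deg=0.0)
stereo = render_gains(commentary, [-30.0, 30.0])          # split between L and R
with_centre = render_gains(commentary, [-30.0, 30.0, 0.0])  # snaps to the centre speaker
```

On a two-speaker layout the commentary is phantom-panned between left and right; add a centre speaker and the renderer feeds it directly, all from the same co-ordinates.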
The individualisation component of object-based delivery, i.e. the possibility for end consumers to change the level of the ambience, the commentator, and other objects, nevertheless forces sound supervisors to check whether their mix works in a variety of listening scenarios, even though none of them may actually correspond to the rendition at home. ‘Typical’ reference setups are the best option a sound supervisor has for monitoring the audio content. Some audio engineers use four reference setups: 5.1.4, 5.1, stereo, and binaural headphones. These are checked against the various presentations/mixes, for a total of at least 16 combinations an A1 needs to check regularly.
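The monitoring workload grows multiplicatively, which a short enumeration makes plain. The presentation names below are hypothetical; the actual set depends on the production.

```python
from itertools import product

# Hypothetical presentation names; the real set depends on the production.
presentations = ["default", "commentary up", "ambience up", "commentary off"]
reference_setups = ["5.1.4", "5.1", "stereo", "binaural"]

# Every presentation must be auditioned on every reference setup.
checklist = list(product(presentations, reference_setups))
print(len(checklist))  # 4 presentations x 4 setups = 16 checks
```

Add one more presentation and the checklist grows by four, one audition per reference setup.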
So far, so immersive
The presentations and streams discussed above are currently being rolled out. MPEG-H already allows Korean viewers to select the streams they are interested in and to change their levels, while Britain-based Sky and BT Sport telecast in Dolby Atmos. The individualisation of 3D/immersive audio can be a double-edged sword, though: any added flexibility the sound supervisor provides may lead to situations where end-user tweaks blur the audio content beyond recognition. Sound supervisors of NGA productions therefore favour a slightly conservative approach: knowing that they cannot control what viewers at home do with the presentations they receive, they limit the options.
The production as such is relatively straightforward and similar to 5.1 scenarios, except for the added dimension (height/elevation), which requires additional busses on the mixing console. New considerations include how the additional options are presented to end consumers, how to monitor the presentations, and which individualisation options to make available to the general public.
The most important consideration for how sound supervisors go about their immersive job is the convenience with which they can monitor the various presentations and formats (stereo, surround, 3D) right from their console. Lawo consoles have for years allowed audio engineers to control all relevant parameters without ever moving away from the desk.
There is no time to tweak settings on two, or even three, devices during a live production. Thanks to their integration with Dolby, MPEG-H and other formats, as well as their open Ember+ control protocol, mc² consoles are an important step in the right direction and have proved their worth in trailblazing forays into immersive sound.
Dynamics processing is extremely important in a surround/immersive audio scenario. For key signals such as speech, music, and field-of-play (FOP) noises, the human ear prefers levels to stay within roughly +7 to −10 dB of the reference loudness (measured in LUFS). The dynamic range can, and probably must, therefore be restricted to provide a satisfactory listening experience at lower playback levels, typically around 70 dBA at home as opposed to 110 dBA at the venue.
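As a crude static illustration of that window, the sketch below clamps a momentary level into a +7/−10 dB range around a reference loudness. A real dynamics processor works with attack and release time constants rather than a hard clamp; the −23 LUFS default is an assumption borrowed from the common EBU R 128 broadcast target.

```python
# Toy model: clamp a momentary level (in dB) into a +7/-10 dB window
# around the reference loudness. Real dynamics processors are far more
# refined; -23 LUFS is assumed here per the EBU R 128 broadcast target.

def constrain_level(level_db: float, ref_db: float = -23.0,
                    up_db: float = 7.0, down_db: float = 10.0) -> float:
    return max(ref_db - down_db, min(ref_db + up_db, level_db))

constrain_level(-5.0)   # loud peak pulled down to -16 (ref + 7)
constrain_level(-40.0)  # quiet passage lifted to -33 (ref - 10)
constrain_level(-20.0)  # already inside the window: unchanged
```

The narrower the window, the better the mix survives quiet living-room playback, at the cost of the dynamics the venue audience enjoys.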
Most sound engineers are confident that 3D/immersive audio will establish itself much faster than 5.1 did, not least thanks to important sidekicks such as VR and AR games, which have been applying 3D viewing and listening for years. Most of today’s children are already familiar with binaural listening, even if they would be unable to describe what it is. Head tracking is available on most gaming consoles, and smartphones are perfectly able to decode such information.
Audio engineers can easily create binaural mixes that serve as immersive sound renditions—and most people will be hooked almost instantly and never want to return to a stereo mix. Headphones will probably play an important part in establishing immersive audio. And lest we forget: sound bars score high on the wife acceptance factor scale….