Media libraries and streaming services have dramatically changed the way we consume media. We want to decide what to watch and when. We also want to choose our own playback system: sometimes a movie on the TV with a 5.1 surround system, then a documentary on the laptop, or even a series on the cell phone, preferably binaurally over headphones.
With so many options, we sometimes wish we could adapt the content to our own preferences. Of course, we can't change the plot of a movie. But what about raising the volume of the dialog without making everything else louder? Adjustments like this are indeed possible. To offer them across the board, and at data rates that are practical for streaming, the MPEG-H Audio system was developed with significant involvement from the Fraunhofer Institute for Integrated Circuits IIS.
State-of-the-art audio technology for individual experiences
MPEG-H Audio is a Next Generation Audio (NGA) codec that delivers immersive sound and can also play back personalized audio content seamlessly on every class of device. It enables the production, transmission, and playback of any combination of channel-, object-, and scene-based audio. In this system, audio objects are individual sound elements whose horizontal and vertical position is described by positional metadata. Because the audio and its associated metadata are transmitted together, the content can be adapted flexibly to the respective playback system. This makes it easy to produce accessible content and to adapt it to personal taste.
The options this requires are created in a new step of the production process. During authoring, metadata such as the available commentary languages or the permitted interaction options are defined and, on export, saved together with the audio data in a single file. As a result, all personalization and selection options travel in the same data stream during transmission, while bit rates remain low. On the end device, this metadata then enables interaction.
MPEG-H Audio can be authored in modern DAWs: the system is fully integrated into Nuendo 13 and is also supported by Pro Tools.
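To make the idea of position-based metadata more concrete, here is a minimal Python sketch of how a player might place a single audio object on a plain stereo system. It uses a textbook constant-power pan law, not the MPEG-H Audio renderer; the `AudioObject` class, its field names, and the azimuth sign convention (positive = left) are hypothetical assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound element plus its positional metadata (hypothetical field names)."""
    name: str
    azimuth_deg: float    # 0 = front centre, positive = towards the left (assumed convention)
    elevation_deg: float  # carried along, but ignored by this stereo-only sketch


def stereo_gains(obj: AudioObject, speaker_half_angle_deg: float = 30.0) -> tuple[float, float]:
    """Constant-power pan of one object between loudspeakers at +/- speaker_half_angle_deg.

    Returns (left_gain, right_gain) with left**2 + right**2 == 1, so the
    perceived loudness stays roughly constant wherever the object sits.
    """
    # Objects outside the stereo base are pinned to the nearest loudspeaker.
    az = max(-speaker_half_angle_deg, min(speaker_half_angle_deg, obj.azimuth_deg))
    # Map azimuth from [-half, +half] (right .. left) to a pan angle in [0, pi/2].
    pan = (az + speaker_half_angle_deg) / (2 * speaker_half_angle_deg) * (math.pi / 2)
    return math.sin(pan), math.cos(pan)


if __name__ == "__main__":
    for obj in (AudioObject("commentator", 0.0, 0.0), AudioObject("crowd_left", 30.0, 20.0)):
        left, right = stereo_gains(obj)
        print(f"{obj.name}: L={left:.3f} R={right:.3f}")
```

Because the gains are computed at playback time from metadata, a different renderer could place the same object on a 5.1+4H layout or on headphones without remixing the content.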
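As an illustration of how authoring metadata can bound what a viewer may change, the sketch below clamps a requested dialog boost to a range fixed during authoring and then scales only the dialog object. The dialog becomes louder relative to the background while the background itself is untouched. The `GainInteraction` structure, its fields, and the dB values are hypothetical; they are not MPEG-H bitstream syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GainInteraction:
    """Hypothetical authoring metadata: how far a viewer may change a group's level (dB)."""
    min_db: float
    max_db: float


def apply_user_gain(requested_db: float, limits: GainInteraction) -> float:
    """Clamp the viewer's requested gain to the range set during authoring."""
    return max(limits.min_db, min(limits.max_db, requested_db))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def mix_sample(dialog: float, background: float, dialog_gain_db: float) -> float:
    """Mix one sample: only the dialog object is scaled, the background is left as is."""
    return dialog * db_to_linear(dialog_gain_db) + background


if __name__ == "__main__":
    limits = GainInteraction(min_db=-6.0, max_db=9.0)  # illustrative authoring range
    gain_db = apply_user_gain(12.0, limits)            # viewer asks for +12 dB, gets +9 dB
    print(gain_db, mix_sample(0.1, 0.2, gain_db))
```

Keeping the limits in the transmitted metadata lets the producer decide how far personalization may go, while the end device only has to enforce them.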
Brazil modernizes its digital infrastructure
Viewers in Brazil have already had a first taste of this. The country's entire TV infrastructure is currently being modernized in a complex process. For the new digital broadcasting system, TV 3.0, a set of criteria was defined first. Alongside realistic, immersive sound that can be reproduced on different types of devices, these included a high degree of user-friendliness, interactivity, and customizability. An easily accessible audio description that can be positioned individually in the sound field, together with a balance between dialog and background sound that viewers can set to their own preferences, is meant to make the broadcast program more accessible to people with visual or hearing impairments. After an extensive evaluation by independent test laboratories, MPEG-H Audio prevailed over the other submissions and in 2021 became the mandatory audio standard for the new system.
A program being produced in MPEG-H Audio at Globo in Recife, Brazil.
Since this decision, Brazilian broadcasters have been testing and implementing the new audio system. Globo, the largest media group in Latin America, tested it in a demanding live scenario: its popular multi-venue live broadcast of the São João do Nordeste festivities. The venues in northeastern Brazil sent their audio signals to Globo's local hub in Recife, where the TV production took place. From the incoming signals, the Globo team created an immersive 5.1+4H live mix. Commentators, audience, and audio description were added as separate audio objects, which gave TV viewers advanced customization options. A particularly novel approach was the real-time creation of the audio description: the video and audio signals were sent to São Paulo, where the audio description was produced live and then returned to Recife to be integrated into the live stream.
The Globo team produced the audio with four presets for viewers to choose from: a standard preset, a version with more intelligible dialog, one with audio description, and one with more live atmosphere. Finally, the video and the MPEG-H Audio signal were encoded and output simultaneously in two formats: a conventional broadcast signal and a live streaming service using the Common Media Application Format (CMAF) for HLS and DASH.
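One way to picture these four presets is as named tables of per-group gains that the player applies when the viewer picks one. The Python sketch below assumes hypothetical group names for the objects mentioned in the text (an immersive bed, commentary, audience, audio description) and purely illustrative dB values; these are not the settings Globo used, and the structure is not the MPEG-H preset syntax.

```python
from dataclasses import dataclass

# Hypothetical group names for the objects described in the text.
GROUPS = ("bed", "commentary", "audience", "audio_description")


@dataclass
class Preset:
    """A named mix: gain offsets in dB per group; groups left out are muted."""
    name: str
    gains_db: dict[str, float]


# Illustrative values only; not the settings used in the Globo production.
PRESETS = {
    p.name: p
    for p in (
        Preset("standard", {"bed": 0.0, "commentary": 0.0, "audience": 0.0}),
        Preset("dialog_plus", {"bed": -6.0, "commentary": 3.0, "audience": -6.0}),
        Preset("audio_description",
               {"bed": 0.0, "commentary": 0.0, "audience": 0.0, "audio_description": 0.0}),
        Preset("more_atmosphere", {"bed": 3.0, "commentary": -3.0, "audience": 4.0}),
    )
}


def group_gains(preset_name: str) -> dict[str, float]:
    """Linear gain for every group under the chosen preset (0.0 means muted)."""
    gains_db = PRESETS[preset_name].gains_db
    return {
        group: 10 ** (gains_db[group] / 20) if group in gains_db else 0.0
        for group in GROUPS
    }


if __name__ == "__main__":
    for name in PRESETS:
        print(name, {g: round(v, 2) for g, v in group_gains(name).items()})
```

Muting every group a preset does not list keeps the audio description out of all presets except the one that explicitly asks for it.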
MPEG-H Audio fully meets the industry's requirements for a modern, future-proof audio system, as several practical trials and the extensive evaluation process in Brazil have shown. The sweeping changes now reshaping media consumption worldwide require investment in many areas. That investment will pay off many times over if the television of the future offers meaningful innovations in sound as well as high-resolution video, both in artistic possibilities and in personalization and inclusion.
Daniela Rieger
Daniela Rieger specializes in audiovisual media and 3D audio and has been working at Fraunhofer IIS since 2020. As a sound engineer, she works on topics such as MPEG-H Audio and immersive music. Her master's thesis on object-based music production won 2nd prize in the ARD/ZDF "Women + Media Technology" award in 2021. She is also involved in research on AI-based technologies for accessible audio content. Daniela Rieger has been Vice President of the Association of German Sound Engineers (VDT) since 2022.
Article by Daniela Rieger
Article translations are machine translated and proofread.