AOO: low-latency peer-to-peer audio streaming and messaging

Have you ever wanted to jam over the internet with your favorite DAW or audio programming language? Or stream sound from your computer to mobile devices? Or build your own wireless microphones or speakers? Then the AOO library may be exactly what you are looking for!

Christof Ressi

AOO (audio over OSC) is a library for peer-to-peer audio streaming and messaging with emphasis on low latency, robustness, flexibility, and ease of use. AOO networks consist of several endpoints, so-called sources and sinks, which can be connected and disconnected ad hoc. The AOO connection server allows peers to address each other by name and facilitates communication over the public internet.

The reference implementation of the AOO protocol is written in C++. The library has a simple C and C++ API and can be easily embedded in host applications or audio plugins. AOO is relatively light on CPU and memory usage and is therefore suitable for embedded devices, such as the ESP32. At the time of writing, AOO includes the following components:

  • C++ library (with C API)
  • Pure Data external
  • aooserver command line program

These are in active development:

  • SuperCollider extension
  • aoosend and aooreceive command line programs

Finally, the following components are desirable, but either planned for the future or best left to third-party developers:

  • Max/MSP external
  • audio plugins (VST3, AU, etc.)
  • programming language bindings (Rust, Java, Python, etc.)

AOO is open-source and released under the permissive BSD license. The official repository is located at AOO (audio over OSC). Follow the README for build and installation instructions. See the doc folder for additional documentation and the examples folder for simple examples in different programming languages. Release notes and pre-built binaries are also available. For bug reports or feature requests, please use the issue tracker.

History

The vision of AOO was first presented in 2009 by Winfried Ritsch together with a proof-of-concept implementation for embedded devices. In his paper "Towards message based audio systems" (Ritsch 2014), he sets out the following requirements:

  • audio signal intercommunication between distributed audio systems
  • arbitrary ad hoc connections
  • various audio formats, sample rates
  • synchronization and lowest latency possible
  • audio-data on demand only

In 2010 AOO was implemented by Wolfgang Jäger as a library with externals for Pure Data, but its practical use remained limited.

In 2020 Christof Ressi started a full revision (AOO 2.0) that deviates substantially from the original design. At the time of writing (2024), AOO 2.0 is still beta software, but the first stable release is planned for the same year.

Use cases

AOO 2.0 was initially developed in February 2020 for the art project Reenactment of Sonic Projections by Bill Fontana. The concept was to set up audio live streams from various places in the city of Graz to the Kunsthaus museum. Winfried Ritsch was responsible for the technical realization and developed the DIY Audio Stream Box for this purpose. The streams were transmitted with the AOO protocol, using an independent wireless network operated by radio hams.

Stream boxes for Reenactment of Sonic Projections. 

AOO has been subsequently used in the Virtual Rehearsal Room (VRR) project which allowed music students of the PPCM program at the University of Arts Graz to keep rehearsing remotely during the early Covid pandemic. At the time there were no software solutions that met our requirements (easy setup, cross-platform, good audio quality, stereo panning, reverb, etc.), so we quickly built a prototype in Pure Data based on the AOO library. The students played in their private rooms all across Europe, often under less-than-ideal conditions: noisy environments, cheap audio equipment, bad internet connection, etc. Nevertheless, at the end of the semester, we hosted two online concerts for which the VRR was expanded to a Virtual Concert Hall (VCH) – with convincing results!

Since then AOO has been used in several art projects. Notable examples are the performances TRACK and BAD BLOCK by the French theater company La Boîte à sel. In TRACK a performer/comedian/beatboxer creates live samples which are played back on small speakers in toy train wagons. The trains move along extended tracks, carrying the sounds across the whole performance space. BAD BLOCK uses up to 104 portable wireless streaming boxes, allowing the performer and the audience to tactilely interact with the sound sources. In both performances, AOO is responsible for streaming the sounds from the computer to the playback devices.

AOO is also used in the IEM Computer Music Ensemble, in the TPF research project, and in the online jamming app SonoBus. I am aware of at least two companies that use the AOO library in their products.

Finally, AOO can be used to implement wireless speakers or microphones. This can be very handy for multichannel sound installations where it would be impractical or even impossible to run audio cables, in particular when the devices are supposed to move around. As mentioned at the beginning, AOO can be used on embedded devices such as the ESP32. A proof-of-concept with the Olimex ESP32-ADF board showed that it is easily possible to stream from one computer to 18 speaker devices over Wi-Fi with a total latency of less than 50 milliseconds. The final release of AOO will contain several examples for the ESP32 platform.

Features

AOO supports up to 256 audio channels and puts no restriction on the block size and sample rate. If sources and sinks run at different sample rates and/or block sizes, the streams are automatically reblocked and/or resampled accordingly. The stream format can be set independently from the hardware buffer size and sample rate. This is important because the host audio settings are not necessarily well-suited for streaming. For example, if you run Pure Data with a block size of 64 samples @ 96 kHz, you can set the stream format to 256 samples @ 48 kHz to reduce the required bandwidth. AOO is also capable of dynamic resampling to compensate for clock speed deviations between machines (see Adriaensen 2005).
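To make the bandwidth example concrete, here is a rough back-of-the-envelope calculation. It is only a sketch under simplifying assumptions that are not taken from AOO itself: uncompressed 16-bit stereo PCM, one audio block per UDP packet, and 28 bytes of IPv4 and UDP header per packet. AOO's own packet header is ignored, and the function name is made up for this illustration.

```python
# Rough bandwidth estimate for an uncompressed PCM stream.
# Assumptions (not taken from AOO): one audio block per UDP packet and
# 28 bytes of IPv4 + UDP header per packet; AOO's own header is ignored.

IP_UDP_HEADER_BYTES = 20 + 8  # IPv4 header without options + UDP header


def stream_bandwidth(sample_rate, block_size, channels, bytes_per_sample):
    """Return (packets per second, total bytes per second)."""
    packets_per_second = sample_rate / block_size
    payload = sample_rate * channels * bytes_per_sample
    overhead = packets_per_second * IP_UDP_HEADER_BYTES
    return packets_per_second, payload + overhead


# Host settings from the example (64 samples @ 96 kHz) versus the
# suggested stream format (256 samples @ 48 kHz), both 16-bit stereo.
host_format = stream_bandwidth(96000, 64, channels=2, bytes_per_sample=2)
stream_format = stream_bandwidth(48000, 256, channels=2, bytes_per_sample=2)
print(host_format, stream_format)
```

Under these assumptions the packet rate drops from 1500 to 187.5 packets per second and the total data rate from roughly 426 kB/s to roughly 197 kB/s. The lower sample rate halves the payload, and the larger block size cuts the per-packet header overhead.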

Automatic reblocking and resampling between computers.
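The dynamic resampling mentioned above relies on a good estimate of how fast each machine's audio clock actually runs. The following sketch implements the second-order delay-locked loop (DLL) described in the cited paper by Adriaensen. It is an illustration of that general technique, not AOO's actual code; the class and function names, the 1 Hz loop bandwidth, and the simulated clock are assumptions made for this example.

```python
import math


class DelayLockedLoop:
    """Second-order DLL that smooths block timestamps (after Adriaensen 2005)."""

    def __init__(self, nominal_period, bandwidth_hz, start_time):
        omega = 2.0 * math.pi * bandwidth_hz * nominal_period
        self.b = math.sqrt(2.0) * omega        # first-order coefficient
        self.c = omega * omega                 # second-order coefficient
        self.period = nominal_period           # current estimate of the block period
        self.t0 = start_time                   # filtered time of the current block
        self.t1 = start_time + nominal_period  # predicted time of the next block

    def update(self, now):
        """Feed the raw system time of a new block; return its filtered time."""
        error = now - self.t1
        self.t0 = self.t1
        self.t1 += self.b * error + self.period
        self.period += self.c * error
        return self.t0


def estimate_period(true_period, nominal_period, blocks=20000, bandwidth_hz=1.0):
    """Simulate a jitter-free clock and return the period the DLL settles on."""
    dll = DelayLockedLoop(nominal_period, bandwidth_hz, start_time=0.0)
    for n in range(1, blocks + 1):
        dll.update(n * true_period)
    return dll.period


nominal = 64 / 48000                                 # 64 samples at a nominal 48 kHz
measured = estimate_period(nominal * 1.001, nominal)  # clock that runs 0.1 % slow
print(nominal / measured)                             # estimated rate relative to nominal
```

The ratio between the nominal and the estimated period can then drive a resampler, so that a sink consumes audio at the rate at which the source actually produces it.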

AOO supports several audio codecs. It currently comes with raw PCM (with various bit depths) for the best audio quality and Opus for optimized transmission over the internet or low bandwidth networks. Opus has been chosen because it offers good audio quality even at very low bit rates, adds very little latency, and is capable of packet loss concealment, which makes it ideal for low-latency audio streaming applications. PCM and Opus seem to serve most use cases, but it is possible to add further codecs via the codec plugin API. The QOA codec is currently being evaluated as a possible third built-in option and middle ground between PCM and Opus.

AOO sinks utilize a jitter buffer to compensate for network jitter and packet reordering at the cost of additional latency. The buffer latency can be set dynamically. In case of packet loss, the sink can ask the source to resend the missing data, given that the latency is large enough for the resent packets to arrive in time. For real-time use, however, you would rather keep the latency as small as possible and instead rely on the packet loss concealment algorithm employed by the Opus codec. In case of jitter buffer overruns or underruns, the sink outputs corresponding events which may be handled by the application. It is also possible to receive diagnostic events about dropped, reordered, or resent packets. This can be used, for example, to dynamically adjust the bit rate in the Opus codec.

Operating principle of the jitter buffer.
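The basic idea of a jitter buffer can be sketched in a few lines. This is a deliberately simplified model, not AOO's implementation: the class name, the sequence numbering starting at zero, and the latency measured in whole blocks are all assumptions for illustration.

```python
class JitterBuffer:
    """Reorders incoming blocks by sequence number (simplified, hypothetical)."""

    def __init__(self, latency_blocks):
        self.latency_blocks = latency_blocks  # blocks to collect before playback
        self.blocks = {}                      # sequence number -> audio block
        self.next_seq = 0                     # next block the audio thread expects
        self.started = False

    def push(self, seq, block):
        """Store a received block; return False if it arrived too late."""
        if seq < self.next_seq:
            return False
        self.blocks[seq] = block
        return True

    def missing(self):
        """Sequence numbers a sink could ask the source to resend."""
        if not self.blocks:
            return []
        newest = max(self.blocks)
        return [s for s in range(self.next_seq, newest) if s not in self.blocks]

    def pop(self):
        """Called once per audio block; None means 'still filling' or 'lost'."""
        if not self.started:
            if len(self.blocks) < self.latency_blocks:
                return None
            self.started = True
        block = self.blocks.pop(self.next_seq, None)
        self.next_seq += 1
        return block
```

A larger `latency_blocks` value leaves more time for reordered or resent packets to arrive before `pop()` reaches their slot. A smaller value lowers the latency, but a gap is then more likely to reach the output, where it has to be concealed by the codec.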

Each AOO source can send to several sinks. Conversely, sinks can receive from multiple sources, in which case audio signals on the same channel will be summed. As a consequence, sources and sinks can form any kind of network topology.

Example for AOO network topologies.
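The summing behaviour of a sink that receives from several sources can be pictured as follows. This is a minimal sketch with a made-up function name and plain Python lists instead of real audio buffers.

```python
def mix_sources(source_blocks, num_channels, block_size):
    """Sum one block per source, channel by channel, into the sink's output."""
    out = [[0.0] * block_size for _ in range(num_channels)]
    for block in source_blocks:                         # one block per source
        for ch, samples in enumerate(block[:num_channels]):
            for i, value in enumerate(samples[:block_size]):
                out[ch][i] += value                     # same channel: signals add up
    return out


mono_source = [[0.25, 0.5]]                  # one channel, two samples
stereo_source = [[0.5, 0.25], [1.0, -1.0]]   # two channels, two samples
print(mix_sources([mono_source, stereo_source], num_channels=2, block_size=2))
```

Here the mono source only contributes to the first channel, while the second channel carries the stereo source alone.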

Streams can be initiated from both directions: the source can decide to start a stream to one or more particular sinks; conversely, the sink can “invite” one or more sources, which in turn may accept or decline the invitation. Sinks can also “uninvite” sources, i.e. ask them to stop sending.

Initiating AOO streams from both sides.

Streams can be started and stopped freely, without latency, and with sample-accurate timing. The start message may contain additional metadata, e.g. to indicate musical properties or describe the channel layout. Possible data types include binary, text, MIDI, OSC, JSON, XML, FUDI, or numeric arrays.

In combination, these features facilitate various interesting use cases:

  • AOO networks are dynamic and can freely change their topology over time.
  • Streams can be (temporarily) stopped when there is no signal, saving network bandwidth and CPU time.
  • AOO enables an event-based audio approach where a sequence of short sections can be transmitted as a series of distinct streams, possibly with metadata to describe the musical or sonic properties.

AOO streams can also embed messages, so-called stream messages, using any of the before-mentioned data types. For example, these can be used to include panning information, send MIDI messages, or transmit parameter modulation data. Stream messages contain sample-accurate timestamps; the original relative timing between messages will be preserved down to a single sample, even after repeated resampling and reblocking.

AOO stream messages with reblocking and resampling.
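To illustrate what a sample-accurate timestamp means once the sink runs at a different sample rate and block size, the following sketch maps a message time from the source's sample clock to a block index and sample offset at the sink. The function name and the rounding to the nearest sink sample are assumptions for this example; the article does not describe how AOO does this internally.

```python
from fractions import Fraction


def locate_message(src_sample, src_rate, dst_rate, dst_block_size):
    """Map a message time in source samples to (block index, offset) at the sink."""
    # Exact rational arithmetic avoids floating-point drift over long streams.
    dst_sample = round(Fraction(src_sample * dst_rate, src_rate))
    return divmod(dst_sample, dst_block_size)


# Two messages sent 10 samples apart in a 48 kHz stream, received by a
# sink running at 96 kHz with a block size of 256 samples.
first = locate_message(100, 48000, 96000, 256)
second = locate_message(110, 48000, 96000, 256)
print(first, second)
```

Both messages land in the first sink block, 20 samples apart, which corresponds to the same time interval as 10 samples at 48 kHz.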

While AOO sources and sinks can be addressed directly by their IP address, it is often more convenient to refer to each other by name. When AOO clients connect to an AOO server and join a particular group, they receive information about all the current group members and are notified whenever a new user joins the group or an existing user leaves the group. Groups and users may carry additional metadata. For example, this can be used by an online jamming app to show information about a particular group (location, musical style) and user (location, musical instrument). The concept of server groups is inspired by the OSCgroups project.

AOO server architecture.
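The group mechanism can be modelled with a small in-memory sketch. All names here are hypothetical and unrelated to the real AOO server API; the point is only to show how a newcomer receives the current member list while existing members are notified about joins and leaves.

```python
class GroupServer:
    """In-memory stand-in for a server that tracks named groups (hypothetical)."""

    def __init__(self):
        self.groups = {}  # group name -> {user name: (address, callback)}

    def join(self, group, user, address, on_event):
        """Add a user and return the addresses of all current members."""
        members = self.groups.setdefault(group, {})
        for _, notify in members.values():
            notify("join", user, address)      # tell existing members
        current = {name: addr for name, (addr, _) in members.items()}
        members[user] = (address, on_event)
        return current

    def leave(self, group, user):
        """Remove a user and notify the remaining members."""
        members = self.groups.get(group, {})
        address, _ = members.pop(user)
        for _, notify in members.values():
            notify("leave", user, address)
```

With the peer addresses in hand, clients can refer to each other by group and user name while the audio itself still travels directly between the peers.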

Generally, AOO supports both IPv4 and IPv6. IPv4 address exhaustion led to the widespread deployment of NAT, which seriously inhibits peer-to-peer connections over IPv4: clients typically do not know their public IPv4 address and cannot be reached directly from outside. There are various NAT traversal strategies, most notably “UDP hole punching”, which AOO uses to connect IPv4-only clients over the internet. This works well unless one or more clients reside behind a symmetric NAT; in this case, the traffic can be relayed over dedicated relay servers or the AOO server itself. With IPv6, every device has its own globally unique and routable IP address, which renders NAT unnecessary. However, since firewalls are supposed to block unsolicited incoming traffic by default, some sort of “hole punching” is still required.

UDP hole punching.
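The packet exchange at the heart of UDP hole punching can be sketched with plain sockets. The example runs on the loopback interface, where no NAT is involved, so it only demonstrates the send-and-listen pattern. The function names are made up, and AOO's actual handshake is not shown in this article.

```python
import socket


def open_peer():
    """Create a UDP socket on an ephemeral loopback port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(1.0)
    return sock


def punch(sock, remote_addr, attempts=5):
    """Send to the remote address until a packet from it arrives."""
    for _ in range(attempts):
        # Behind a NAT, this outgoing packet creates the mapping that lets
        # the remote peer's packets back in.
        sock.sendto(b"punch", remote_addr)
        try:
            _, addr = sock.recvfrom(64)
        except socket.timeout:
            continue  # nothing yet, try again
        if addr == remote_addr:
            return True
    return False


def demo():
    """Let two local peers punch towards each other; True if both succeed."""
    a, b = open_peer(), open_peer()
    try:
        # In a real session each peer would learn the other's public
        # address from the server; here we read it off the socket.
        addr_a, addr_b = a.getsockname(), b.getsockname()
        b.sendto(b"punch", addr_a)  # B's first packet towards A
        return punch(a, addr_b) and punch(b, addr_a)
    finally:
        a.close()
        b.close()
```

Behind real NATs the first packets in either direction may be dropped, which is why both sides keep sending for a while. With a symmetric NAT the public port differs per destination, so the address learned from the server is of no use to the other peer and relaying becomes necessary.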

AOO clients can also exchange messages of any data type and with optional timestamps to preserve the relative timing. These messages can be sent to individual peers or to whole groups and can be delivered reliably, if needed. For this purpose, AOO implements its own simple re-transmission and acknowledgment strategy. In fact, for stateful messages reliability is very important. Imagine you want to toggle a switch or advance to the next scene: if the network drops or reorders even a single message, the result can be catastrophic! One drawback of reliable messaging is that dropped packets will block subsequent messages until the missing packet has been resent. This is also known as head-of-line blocking. Continuous data streams, such as data from an IMU sensor, typically tolerate packet loss and reordering and may therefore be sent unreliably to avoid undesired delays. Reliable and unreliable messages are sent and received independently from each other, so that the latter will never be blocked by the former.

Reliable and unreliable client messages.
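Head-of-line blocking is easy to see in a toy receiver that delivers messages strictly in order. This is a hypothetical sketch, not AOO's re-transmission and acknowledgment code.

```python
class ReliableReceiver:
    """Delivers messages strictly in sequence order (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we are allowed to deliver
        self.pending = {}   # out-of-order messages waiting for a gap to close

    def receive(self, seq, message):
        """Accept one packet; return the messages that can now be delivered."""
        if seq >= self.next_seq:          # ignore duplicates of delivered packets
            self.pending[seq] = message
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


receiver = ReliableReceiver()
print(receiver.receive(0, "scene 1"))  # delivered immediately
print(receiver.receive(2, "scene 3"))  # held back: message 1 is missing
print(receiver.receive(1, "scene 2"))  # the resent packet releases both
```

The third message is complete on arrival but has to wait for the resent second one. An unreliable channel would simply hand each packet to the application as it arrives, which suits continuous sensor data.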

Conclusion

AOO is a new audio streaming and messaging solution with many possible applications in the realm of art and beyond. Its rich feature set enables interesting and unconventional use cases and helps to rethink or challenge traditional concepts of audio streaming.

Finally, I would like to thank the following people and institutions for their support, ideas, and invaluable feedback: Winfried Ritsch, IOhannes m zmölnig, Roman Haefeli, José-Miguel Fernández, Jesse Chappell, IEM, IRCAM, and others.

  • Adriaensen, Fons. 2005. Using a DLL to filter time. Linux Audio Conference.
  • Ritsch, Winfried. 2014. Towards message-based audio systems. Linux Audio Conference.

Christof Ressi

Christof Ressi is an Austrian composer, performer, arranger, and software developer. His work fluctuates between New Music, jazz, free improvisation, experimental electronics, and computer music and often features interactive live electronics and multimedia. Together with clarinet player Szilard Benes he performs under the name ressi/benes. He is a regular contributor to open-source projects such as Pure Data and SuperCollider and publishes his software under open-source licenses.
