The human voice as a sensor-based electronic live instrument

On the development of experimental interfaces for the world's most complex instrument and the gestural and haptic control of artificial voices in live performances.

Ulla Rauter

The human voice is probably the oldest, most complex and most versatile musical instrument. And the most difficult to imitate, whether by analog or digital means.

The first successful attempts to create a speaking voice detached from the body date back to the 18th century.

Wolfgang von Kempelen developed the first functioning speaking machine that could produce at least some intelligible speech sounds.

Von Kempelen attempted to imitate the human vocal organs using wood, rubber, leather and ivory. The apparatus was played with the hands and forearm, and its intelligibility depended heavily on the virtuosity of the player.

Wolfgang von Kempelen, speech synthesizer, 1791, detail

The talking machines of the 19th century also resembled musical instruments, both visually and in how they were played: the speaking machine Euphonia, for example, was operated via a keyboard and foot pedals and, despite being staged with a woman's head, is strongly reminiscent of an organ.

The first electrically generated speaking voice - the Voder - sent a source signal through a set of variable filters controlled by finger keys. A wrist bar switched between voiced and unvoiced sounds, while a foot pedal controlled the fundamental frequency.

Although these first speaking machines were not designed as musical instruments but for scientific or technical purposes, they were played like musical instruments.

Their operation was intensely physical and auditory-haptic: the player's sensorimotor system was directly coupled to the phonetics and acoustics of the artificial voice. The movements and fine motor skills needed to play these instruments had to be learned and trained, often over years of practice. My self-built interfaces for electronic voice, described below, likewise focus on the physical playing technique and its connection to vocal and speech sounds.

The act of painting voices

My first work on artificial voices began with spectrograms:

Spectrograms are visualizations of sounds: the frequencies fanned out along a time axis - a kind of "fingerprint" of the sound that reveals its spectral properties.

Spectrogram of the spoken word "eins"
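As a minimal sketch of the idea (assuming NumPy; the frame size and hop are arbitrary choices, not values from my works), a magnitude spectrogram can be computed by slicing the signal into windowed frames and taking the FFT of each:

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_size=1024, hop=256):
    """Magnitude spectrogram: frequency bins (rows) fanned out over time (columns)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (freq bins, time frames)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return spec, freqs

# A steady 440 Hz tone appears as a single bright horizontal line.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec, freqs = spectrogram(tone, sr)
peak_bin = spec.mean(axis=1).argmax()   # frequency of the line, within one bin of 440 Hz
```

A spoken word produces the kind of layered, time-varying pattern shown in the figure above.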

In earlier works, I had already translated images into sounds by sonifying them from left to right along a "scan line" with sine tone generators. The realization that any image can be read as a spectrogram and sonified gave me the idea of creating specific sounds from hand-drawn spectrograms - and I was most interested in human voices.
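The scan-line principle can be sketched roughly as follows (a NumPy toy, assuming brightness values in 0..1 and a linear frequency layout; the frequency mapping of the original works is not specified):

```python
import numpy as np

def sonify_image(image, sample_rate=16000, duration=2.0,
                 f_low=100.0, f_high=4000.0):
    """Scan an image left to right; each pixel row drives one sine tone generator.

    image: 2D array, row 0 = highest frequency, values in [0, 1] = brightness.
    """
    n_rows, n_cols = image.shape
    n_samples = int(sample_rate * duration)
    t = np.arange(n_samples) / sample_rate
    # Map the scan-line position to an image column over the duration.
    col = np.minimum((np.arange(n_samples) * n_cols) // n_samples, n_cols - 1)
    freqs = np.linspace(f_high, f_low, n_rows)   # top row = high frequency
    audio = np.zeros(n_samples)
    for row, f in enumerate(freqs):
        audio += image[row, col] * np.sin(2 * np.pi * f * t)
    peak = np.abs(audio).max()
    return audio / peak if peak > 0 else audio

# A single bright horizontal row yields one steady sine tone across the scan.
img = np.zeros((16, 64))
img[4, :] = 1.0
audio = sonify_image(img)
```

Reading the image column by column is exactly the inverse of the spectrogram "fingerprint": any picture, drawn or found, becomes a score.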

Hand-drawn voice archives

In the work "Sound on Drawing", I paint voice spectrograms on the wall with UV paint. Depending on the context, these are excerpts from interviews, poetic texts or statements on language and identity. The space becomes an archive of inaudible voices which, under certain conditions, can be made audible again using the inverse FFT (Fast Fourier Transform).

„Ton auf Zeichnung" ("Sound on Drawing") in the exhibition "FINE SOUND – keine medienkunst", das weiße Haus, Vienna
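The inverse-FFT step can be illustrated with a small NumPy sketch. A drawn spectrogram supplies only magnitudes, so phases have to be invented - here randomly, a crude one-shot version of what iterative methods such as Griffin–Lim refine:

```python
import numpy as np

def resynthesize(magnitudes, frame_size=1024, hop=256, seed=0):
    """Turn a magnitude-only 'drawing' (freq bins x time frames) back into audio.

    A painted spectrogram carries no phase information, so random phases are
    assigned and the inverse FFT of each frame is overlap-added.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = magnitudes.shape            # n_bins == frame_size // 2 + 1
    phases = rng.uniform(0, 2 * np.pi, size=(n_bins, n_frames))
    frames = np.fft.irfft(magnitudes * np.exp(1j * phases), n=frame_size, axis=0)
    window = np.hanning(frame_size)
    audio = np.zeros(frame_size + hop * (n_frames - 1))
    for i in range(n_frames):
        audio[i * hop : i * hop + frame_size] += frames[:, i] * window
    return audio

# A single bright horizontal stroke becomes a steady, noisy-phased tone.
mags = np.zeros((513, 40))
mags[50, :] = 1.0
audio = resynthesize(mags)
```

The random phase is one reason resynthesized drawings sound slightly rough and ghostly compared to the recorded voice.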

This act of drawing voices is intense and revealing, but as a process it is laborious and slow. In the follow-up project "Sound Calligraphy", I realized the idea of drawing voices in real time in a live performance and sonifying them simultaneously - in other words, "speaking" through drawing. As part of the PEEK project "Digital Synesthesia", I developed a calligraphic drawing system in which written signs and spectrally coded sounds merge. By successively reducing the spectrograms of speech sounds, I found the most minimal form for each sound that still contains enough information to be recognized as a concrete speech sound when sonified, yet can be drawn calligraphically with just a few strokes.

Sound calligraphy of the spoken word "eins"

In the performance, I draw the voice calligraphies on paper while a camera films them live and they are sonified in a loop. Word by word, a text emerges that lets the hand-painted voice speak about its own disembodied presence and identity.

Sound calligraphy: performance in the exhibition DIGITAL SYNESTHESIA, Angewandte Innovation Lab, Vienna

This artificially generated speaking voice is barely intelligible at first, but with each additional stroke it becomes easier to perceive fragments of speech in what is heard.

Video: Klangkalligraphie

The hand and the voice - "speaking and singing" performance instruments

With my self-built interfaces, I am generally interested in controlling and playing digital and electronic sounds with the body and, through the use of sensors, creating a meaningful connection between bodily action and sound - similar to a classical musical instrument. By integrating internal bodily processes as control signals, the body becomes even more directly involved in the musical interface.

In recent years I have developed a series of interfaces for electronic voice. The aim was to work with artificially generated or transformed voices in my live musical performances: I designed these instruments on the one hand to electronically expand my own voice and detach it from the body, and on the other to control artificial voices with gestures and body movements, as a kind of digital ventriloquism.

With the voice interfaces, the specific challenge is to match the complexity of the voice with the input data and to control its most important parameters individually.

The first live instrument I developed for the voice was the Fingertip Vocoder: an interface worn on the fingers - initially wired, later connected to the computer via USB. Brass fingertip rings function as sensor switches and control a Pure Data patch that multiplies the performer's voice at certain intervals to create polyphonic sounds - the Fingertip Vocoder makes it possible to sing a choral piece solo. The chords are generated by combinations of fingertip touches, which have to be learned and internalized, much like fingerings on the piano.

Fingertip Vocoder: assignment of the individual fingertip rings

Fingertip Vocoder

Audio file
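The chord logic can be sketched in a few lines of Python. The interval assignment below is hypothetical - the actual ring-to-interval mapping is the one shown in the figure above, which I am not reproducing here:

```python
# Hypothetical assignment of fingertip rings to transposition intervals,
# in semitones above the sung note (illustrative values only).
RING_INTERVALS = {
    "index": 4,     # major third
    "middle": 7,    # perfect fifth
    "ring": 12,     # octave
    "little": 16,   # major tenth
}

def chord_frequencies(sung_hz, active_rings):
    """Return the voice frequencies for a combination of closed ring switches.

    The original voice is always present; each active ring adds one
    transposed copy, so a choral chord can be 'sung' solo.
    """
    semitones = [0] + sorted(RING_INTERVALS[r] for r in active_rings)
    return [sung_hz * 2 ** (st / 12) for st in semitones]

# Singing an A3 (220 Hz) with index + middle rings closed yields a major triad.
triad = chord_frequencies(220.0, {"index", "middle"})
```

As with piano fingerings, what has to be internalized is not the arithmetic but the combinations: which touch patterns produce which harmonies.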

I extended the range of functions of the Fingertip Vocoder by adding a Leap Motion Controller, an infrared sensor that detects the position of the hands and fingers in space. In addition to activating the individual voice transpositions, further parameters can thus be controlled: the left-right movement of the hand, for example, can move the individual voices smoothly between the left and right channels and distribute them in the room.
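One plausible mapping for this - a sketch, not my actual patch - is an equal-power crossfade driven by the palm's x coordinate (the tracked range below is an assumption; the Leap sensor reports palm position in millimetres, but any normalized range behaves the same):

```python
import math

def pan_gains(hand_x, x_min=-200.0, x_max=200.0):
    """Equal-power stereo panning from the hand's left-right position.

    hand_x: palm x coordinate (here assumed in millimetres, centre = 0).
    Returns (left_gain, right_gain) for one transposed voice.
    """
    # Normalize to 0..1 and clamp at the edges of the tracked zone.
    pos = min(1.0, max(0.0, (hand_x - x_min) / (x_max - x_min)))
    angle = pos * math.pi / 2
    return math.cos(angle), math.sin(angle)

left, right = pan_gains(0.0)   # hand centred: both channels equal
```

The equal-power law keeps the perceived loudness constant while a voice travels across the stereo field, which matters when several transposed voices are distributed in the room at once.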

I also began to work spectrally with live recorded or sung as well as artificially generated voice material. I navigate through the audio track as if through an imaginary three-dimensional shape in space - the movement of the hand controls, so to speak, a cursor that scans the sound file. The vocal sounds can be "frozen" at any point and remain in the room as a continuous tone or noise - depending on which sound is being spoken or sung.
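One simple way to "freeze" the sound at the cursor position - a sketch in NumPy, not the actual performance patch, which works spectrally - is to loop a short windowed grain from that point with 50% overlap:

```python
import numpy as np

def freeze_grain(audio, cursor, sample_rate=16000, grain_ms=50, seconds=1.0):
    """Sustain the sound at one cursor position by looping a windowed grain.

    cursor: position in the file, 0..1 (e.g. driven by the hand in space).
    A vowel freezes into a continuous tone, a consonant into a noise band.
    """
    grain_len = int(sample_rate * grain_ms / 1000)
    start = int(cursor * max(len(audio) - grain_len, 0))
    grain = audio[start : start + grain_len] * np.hanning(grain_len)
    n_out = int(sample_rate * seconds)
    hop = grain_len // 2
    # Overlap-add the same grain at 50% hop so the loop has no hard edges.
    out = np.zeros(n_out + grain_len)
    i = 0
    while i * hop + grain_len <= len(out):
        out[i * hop : i * hop + grain_len] += grain
        i += 1
    return out[:n_out]

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frozen = freeze_grain(tone, cursor=0.5)
```

Whether the frozen sound is a tone or a noise depends entirely on what the cursor happens to be touching - a sung vowel or a spoken fricative.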

The musicality inherent in spoken language itself becomes tangible.

Video: Ulla Rauter, Christina Ruf // STIMME UND ELEKTRONIK, Alte Schmiede 2020

Light scans through the body of the voice

In the "Talking Hands" project, I replaced the fingertip rings with precise fingertip tracking via the Leap sensor and designed a net-like LED interface running over the hands and fingers that makes visible which sound track is currently active and where in the "vocal space" I am currently moving.

Talking Hands

For the first prototype of "Talking Hands", I combined gestures and movements with the speech synthesizer DECtalk. Finger and hand gestures activate sounds that DECtalk renders as speech and that are processed in real time with Max/MSP. The movement of the hand in space controls intonation and volume, so that dynamic (melodic) speech is possible, at least to some extent.

Audio file

However, I was interested in working with an artificial voice that sounded more natural and in exploring it spectrally with gestures via the interface. For "Talking Hands", I generated a voice on the Resemble AI platform, modeled on my own speaking voice. It can be integrated into a live workflow via Python, and I use it as the basis for spectral play with the sound in Max/MSP: the hand navigates through the time track while simultaneously wandering through the vertical layers of the voice - different areas of the overtone spectrum are emphasized or alienated, and the timbre changes through subtle finger movements and rotations of the hand.

Video: Talking Hands | Live Performance

Digital choirs and the voice as a string instrument

In my current performance "All of my voices", I start with minimal vocal input and use it to generate an artificial choir that gradually unfolds in space - initiated and conducted by the Fingertip Vocoder. In the second part of the performance, I play the "Stimmbogen" (voice bow) - another electronic interface with which I transform the human voice into a string instrument by mapping it as a soundtrack onto a monochord. In the Stimmbogen, the voice merges with the string and unfolds its potential as a manually played live instrument: the haptic and intuitive range of expression of this interface for the electronic voice extends from percussive consonants to bowed vowels.

Video: All of my voices | signale graz 2024

Further development

Technically, the potential for further development of my voice interfaces lies in the progress of AI-based voice synthesis.

The sound quality of AI voices is now very high, but their higher latency compared to classic voice generators such as DECtalk can be a problem in real-time live applications. That is why I am also interested in using a vocal tract model whose parameters - such as the shape and position of the lips or tongue - are controlled directly by sensor values. A virtual vocal tract could be manipulated intuitively and in real time via an interface, making it possible to create non-verbal and experimental sounds.
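How such sensor-to-vocal-tract control might look can be hinted at with a toy source-filter model in NumPy. The formant values and bandwidths below are generic textbook figures, not parameters of any existing interface:

```python
import numpy as np

def resonator(signal, freq, bandwidth, sample_rate):
    """Two-pole resonant filter: one 'formant' of a crude vocal tract model."""
    r = np.exp(-np.pi * bandwidth / sample_rate)
    theta = 2 * np.pi * freq / sample_rate
    a1, a2 = -2 * r * np.cos(theta), r * r
    gain = 1 - r   # rough amplitude normalization
    out = np.zeros_like(signal)
    for n in range(len(signal)):
        out[n] = gain * signal[n] - a1 * out[n - 1] - a2 * out[n - 2]
    return out

def vowel(f1, f2, pitch=110, sample_rate=16000, seconds=0.5):
    """Map two control values (formant frequencies F1/F2) to a vowel-like sound.

    A hypothetical stand-in for a real articulatory model: an impulse-train
    'glottis' sent through two formant resonators in series.
    """
    n = int(sample_rate * seconds)
    source = np.zeros(n)
    source[:: sample_rate // pitch] = 1.0   # glottal impulse train
    return resonator(resonator(source, f1, 80, sample_rate), f2, 90, sample_rate)

# E.g. hand opening could drive F1 and tongue position F2; /a/ sits near (700, 1200) Hz.
a_sound = vowel(700, 1200)
```

Because the filter runs sample by sample, sensor values could in principle update the formants continuously, which is exactly the kind of low-latency, gesture-to-articulation coupling the AI voices currently lack.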

Musically, I am working on combining the original functions of the "Fingertip Vocoder" - the extension of the natural voice through transposition - with the spectral sound processes of the "Talking Hands". The aim is to create an even smoother transition between my own voice and the sensor-controlled synthetic sound shapes and vocal artifacts.

As a live instrument, the sensor-based voice continually opens up new fields of development and musical scope for me.

Ulla Rauter

Ulla Rauter works as a media artist and musician at the interface of sound art and visual art - her works include performative sculptures, musical performances and self-built instruments. Her sensor-based, electronic musical instruments are shown internationally in performances and exhibitions. Central themes in her artistic work are silence as a material and place of longing and the human voice as the source material for translation and transformation processes. Ulla Rauter is co-founder and organizer of the annual Klangmanifeste audio show. She has been teaching at the Department of Digital Art at the University of Applied Arts Vienna since 2013.

Original language: German
Article translations are machine translated and proofread.