Soon We Will Be Able to Design Custom Sounds with Voice and Gesture

The first thing an architect or graphic designer will do at the start of a project is to produce some preliminary sketches — just to rough out their ideas on paper, perhaps augmented with computer-aided design software. But sound designers don’t have similar tools. A consortium of European researchers is seeking to change that by developing a suite of sketching tools for sound, based on voice and gestures.

“If you are an architect and want to sketch a house, you can simply draw it on a sketchpad,” the researchers wrote in a summary of their work. “But what do you do if you are a sound designer and want to rapidly sketch the sound of a new motorbike?” The usual tools (synthesizers, samplers, and sequencers, for instance) are complicated and require considerable training to use. They’re just not as simple, quick, and intuitive as a sketchpad.

Sound is difficult to describe in words, which is why most of us resort to a combination of gesture and vocal mimicry when, say, trying to convey to someone else that a car goes vrooom. The human voice is like a built-in sound synthesizer.


“People can recognize fairly well what a person imitates,” Guillaume Lemaitre, a researcher at Ircam in Paris, France, told Gizmodo via email. “So our dream tool would be a synthesizer that we could directly interact with, [using] our voice and gestures, just as what we do naturally when we talk to someone. Ideally, this synthesizer would understand the imitations the same way a person would do, and create sounds accordingly.”

That’s the goal of SkAT-VG (Sketching Audio Technologies using Vocalizations and Gestures), a three-year interdisciplinary collaboration among four partners. Ircam is responsible for perception psychology, gesture analysis, signal processing, and machine learning. The Royal Institute of Technology (KTH) in Stockholm, Sweden, is handling the phonetics, while Iuav University of Venice, Italy, focuses on sound design and sound synthesis. And Genesis, a company based in Aix-en-Provence, France, that conducts sound studies and develops audio technologies for sound design, is in charge of user studies and prototype integration.


The first step is gaining a better understanding of how people use mimicry and gesture to communicate different sounds. So Lemaitre and his Ircam colleagues rounded up 50 volunteers and had them listen to recorded sounds, then imitate them. There were mechanical sounds (like tapping and scraping), sounds of common objects (cars, blenders, and saws), and computer sounds, like video game sound effects. All the participants were filmed with a GoPro camera and tracked with a Kinect body-tracking sensor, with accelerometers attached to their wrists. The researchers also documented the process in a video.


Lemaitre admits that they had some misconceptions going into the study. For instance, “We initially thought that people would draw the trajectory of some acoustical features — like pitch or the intensity — with their hands in the air, like raising your hand to imitate pitch going up,” he said. But this proved not to be the case. Instead, gestures were used more for emphasis, in a metaphorical fashion stereotypically associated with Italian characters in film and television. “They seemed to be more like symbols that indicate certain overall properties of the sounds,” Lemaitre said.

Based on that, he and his colleagues concluded that gestures would not be particularly useful as a means of precisely controlling the behavior of a synthesizer in real time, as the consortium members originally thought would be possible. Vocal imitations are far more effective for that purpose. “Voice can reproduce accurately higher tempos than gestures, and is more precise than gestures when reproducing complex rhythmic patterns,” according to Lemaitre’s summary.


The next step is to build actual prototypes of the sketching tools, based on what’s been learned so far, and test how well they work in real-world conditions. Lemaitre said the consortium will hold a special event this spring in the south of France, specifically for sound designers, giving them the task of creating specific sounds with the prototype tools and evaluating the pros and cons of the prototypes.

Practical uses aside, Lemaitre thinks studies of vocal imitations and gestures might also prove beneficial for neuroscientists interested in auditory perception and cognition. Studies like the one above could improve our understanding of how sounds are encoded in memory.



Rocchesso, D., Lemaitre, G., Susini, P., Ternström, S., & Boussard, P. (2015) “Sketching Sound with Voice and Gesture,” Interactions 22(1): 38-41.


[Via Acoustical Society of America]

Image: View Apart/Shutterstock