Researchers Use Deep Learning to Convert Mono Sounds to 3D

Your ears are a magnificent feat of evolutionary optimization, shaped over millions of years to give you fantastic sound-locating superpowers. The shape of your ears and your brain’s auditory processing let you determine which direction a sound is coming from, and how far away it is, with remarkable accuracy. But that effect has been difficult to reproduce in recordings. Now, a new deep learning technique can turn mono sound recordings into three-dimensional representations.
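One of the cues behind that ability is the interaural time difference (ITD): a sound off to one side reaches the nearer ear slightly before the farther one. A standard textbook approximation, Woodworth’s spherical-head model (not something the researchers describe; it is included here only to illustrate the cue), treats the head as a sphere of radius a, with c the speed of sound and θ the source azimuth in radians (0 straight ahead):

```latex
\mathrm{ITD}(\theta) \approx \frac{a}{c}\,(\theta + \sin\theta), \qquad |\theta| \le \tfrac{\pi}{2}
% Example, assuming a = 0.0875 m and c = 343 m/s, source directly to one side (\theta = \pi/2):
% \mathrm{ITD} \approx \frac{0.0875}{343}\,(1.5708 + 1) \approx 6.6 \times 10^{-4}\ \mathrm{s} \approx 0.66\ \mathrm{ms}
```

Even the largest delay is well under a millisecond, which hints at why an ordinary pair of microphones spaced differently from human ears captures these cues poorly.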

This technique was developed by Ruohan Gao at the University of Texas and Kristen Grauman at Facebook Research. In the linked article, the author claims that “effectively imitating 3D sound has always eluded researchers.” That’s not strictly true; synthesizing 3D sound is entirely possible in many situations. What’s difficult is recording and reproducing 3D sound in the real world. Stereo microphones don’t faithfully mimic human ears, so they don’t hear the way we do.

The system created by Gao and Grauman tackles that problem with a special recording setup and deep learning. Sounds are recorded with a pair of microphones embedded in synthetic reproductions of human ears, which hear much like your actual ears do. Those binaural recordings are then used to train a deep learning system that converts monaural sound to binaural sound. The trained system watches the video that accompanies a recording and attempts to locate the sources of the sound, and the mono audio is then adjusted to match the predicted source positions. The resulting “2.5D” recording isn’t perfect, but it’s a step forward in reproducing 3D sound.
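To get a feel for what “adjusting mono audio to match a source position” means, here is a minimal sketch of the classical, non-learned way to do it: delay and attenuate the channel for the ear farther from the source. This is explicitly not Gao and Grauman’s method. Their network learns the mapping from video and training data, whereas here the azimuth is supplied by hand. The function names, the 0.0875 m head radius, the Woodworth delay model, and the 6 dB maximum level difference are all illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed average head radius (illustrative)
MAX_ILD_DB = 6.0        # assumed max level drop at the far ear (illustrative)


def woodworth_itd(azimuth_rad: float) -> float:
    """Interaural time difference in seconds (Woodworth spherical-head model).

    0 rad is straight ahead; positive angles are to the listener's right.
    The approximation is only meaningful for |azimuth| <= pi/2.
    """
    theta = abs(azimuth_rad)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))


def mono_to_binaural(mono: np.ndarray, sample_rate: int, azimuth_rad: float) -> np.ndarray:
    """Hypothetical crude spatializer: returns shape (len(mono), 2) as (left, right).

    The near ear gets the unmodified signal; the far ear gets a delayed,
    attenuated copy. No learning and no head-related transfer functions.
    """
    near = mono.astype(float)
    # Delay in whole samples, capped so the output keeps the input length.
    delay = min(int(round(woodworth_itd(azimuth_rad) * sample_rate)), len(near))
    # Level difference grows with how far the source is off to one side.
    far_gain = 10 ** (-MAX_ILD_DB * abs(np.sin(azimuth_rad)) / 20)
    far = np.concatenate([np.zeros(delay), near[: len(near) - delay]]) * far_gain
    if azimuth_rad >= 0:  # source on the right: the right ear is nearer
        left, right = far, near
    else:                 # source on the left: the left ear is nearer
        left, right = near, far
    return np.stack([left, right], axis=1)
```

For example, a one-second 440 Hz tone at 44.1 kHz placed directly to the right (`azimuth_rad = np.pi / 2`) comes back with the right channel unchanged and the left channel delayed by about 29 samples and roughly halved in amplitude. Cues this simple ignore the ear’s shape entirely, which is part of why approaches like the one above learn from recordings made with synthetic ears instead.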


Researchers Use Deep Learning to Convert Mono Sounds to 3D was originally published in Hackster Blog on Medium.

Original article: Researchers Use Deep Learning to Convert Mono Sounds to 3D
Author: Cameron Coward