This binaural microphone can teach us some really important lessons about the way humans hear sound. The principles in this post are at the foundation of immersive audio in films, video games, and virtual reality.
Inside each of these artificial ears is a condenser microphone. I’m going to record the left and right microphones and play them back through your headphones so that you hear exactly what these ears are hearing.
To get the most out of the demonstrations in the video above, you’ll want to wear headphones!
How Does Binaural Audio Work?
There are three fundamental ways that humans determine the location of a sound source. These are called sound localization cues.
The first is an interaural level difference, or ILD. If a sound is louder in the left ear than it is in the right ear, you will naturally perceive the sound to be originating from the left side.
The second localization cue is an interaural timing difference, or ITD. When a sound originates from the right side, for example, the sound will reach the right ear slightly before it reaches the left ear.
Your auditory system is highly sensitive to these cues, allowing you to determine where a sound is coming from even if you can’t see it.
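To make the two cues concrete, here is a minimal Python sketch that measures them from a pair of left and right recordings. The function names are made up for this post, and the approach (RMS ratio for level, brute-force cross-correlation for timing) is just one simple way to estimate these values, not how the auditory system or any particular tool does it.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_ild_db(left, right):
    """Level difference in dB. Positive means the left channel is louder.
    Assumes neither channel is completely silent."""
    return 20 * math.log10(rms(left) / rms(right))

def estimate_itd_samples(left, right, max_lag):
    """Lag in samples that best lines up the two channels.
    Positive means the right channel lags, so the sound hit the left ear first."""
    def correlation(lag):
        total = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                total += left[n] * right[m]
        return total
    return max(range(-max_lag, max_lag + 1), key=correlation)

# Toy example: a click that reaches the right ear 3 samples later at half the level.
left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
ild_db = estimate_ild_db(left, right)
itd = estimate_itd_samples(left, right, max_lag=5)
```

Dividing the lag by the sample rate converts it to seconds. Both numbers point the same way here: louder and earlier on the left.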
You can harness the power of ILDs and ITDs to trick the listener’s mind and create a more immersive experience with a bit of panning and delay while mixing in post-production. You can also capture ILDs and ITDs with regular microphones using stereo microphone techniques.
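As a rough illustration of the panning-and-delay idea, the sketch below takes a mono signal and makes the far ear both quieter (ILD) and later (ITD). The function name and the default values of 6 dB and 400 microseconds are assumptions picked for the demo, not recommended settings.

```python
def pan_with_ild_itd(mono, sample_rate, level_diff_db=6.0, delay_us=400.0, toward="left"):
    """Return (left, right) sample lists with the far ear quieter and later."""
    # ILD: turn the dB difference into a linear gain for the far ear.
    gain_far = 10 ** (-level_diff_db / 20)
    # ITD: turn the delay into a whole number of samples.
    delay_samples = round(delay_us * 1e-6 * sample_rate)
    # Pad so both channels end up the same length.
    near = list(mono) + [0.0] * delay_samples
    far = [0.0] * delay_samples + [s * gain_far for s in mono]
    return (near, far) if toward == "left" else (far, near)

# A single click, placed toward the left.
sample_rate = 48000
click = [1.0] + [0.0] * 99
left, right = pan_with_ild_itd(click, sample_rate, toward="left")
```

In a DAW you would get the same result with a level pan plus a short delay on one channel. The code just spells out what those two controls are doing to the samples.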
What makes this microphone unique and powerful is that it adds a third sound localization cue: the HRTF.
HRTF stands for Head-Related Transfer Function. That name sounds complicated, but a transfer function is just the effect that a component has on the signal. In this case, we are talking about the effect that the listener’s head has on the signal.
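If it helps to see a transfer function as numbers, here is a small sketch that computes how much a filter boosts or cuts a given frequency. The two-tap filter is a hypothetical stand-in and nothing like a measured HRTF. It is only there to show that "the effect a component has on the signal" can be read off as a gain per frequency.

```python
import cmath
import math

def magnitude_response(impulse_response, frequency_hz, sample_rate):
    """Gain that the filter applies to a sine wave at frequency_hz."""
    omega = 2 * math.pi * frequency_hz / sample_rate
    total = sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(impulse_response))
    return abs(total)

# Hypothetical two-tap averaging filter (NOT a real HRTF).
toy_filter = [0.5, 0.5]
sample_rate = 48000
low_gain = magnitude_response(toy_filter, 100, sample_rate)     # close to 1: lows pass through
high_gain = magnitude_response(toy_filter, 20000, sample_rate)  # well below 1: highs are reduced
```

A real HRTF is the same kind of object, just with a much longer impulse response and a different one for each ear and each direction.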
Our subconscious awareness of the effects that the head, outer ears, and shoulders have on sounds around us opens the door to more precise localization.
On their own, interaural level differences and interaural timing differences can be ambiguous. While these cues help the listener localize a sound from left to right, they don’t do much to help the listener judge height or tell whether something is behind or in front.
For instance, imagine a sound that arrives at both ears at the same time and is the same level in each ear. That sound could be directly in front of the listener, directly above, or directly behind.
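The ambiguity is easy to see with a deliberately simplified model of the timing difference, where the extra path to the far ear is just the ear spacing times the sine of the angle. The ear spacing and speed of sound below are assumed round numbers, and the model ignores the head itself, which is exactly why it cannot tell front from back.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
EAR_SPACING_M = 0.18        # assumed: illustrative distance between the ears

def itd_seconds(azimuth_deg):
    """Simplified ITD. 0 = straight ahead, 90 = directly right, 180 = directly behind.
    Positive means the sound reaches the right ear first."""
    return EAR_SPACING_M / SPEED_OF_SOUND_M_S * math.sin(math.radians(azimuth_deg))

front_right = itd_seconds(30)   # 30 degrees to the right, in front
back_right = itd_seconds(150)   # 30 degrees to the right, behind
dead_ahead = itd_seconds(0)
dead_behind = itd_seconds(180)
```

In this model, the front-right and back-right positions produce the same timing difference, and straight ahead and straight behind both produce essentially none. Timing alone can’t separate them.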
If a sound comes from the left side, it will not only be louder overall in the left ear, but the head will also absorb or reflect some of the high frequencies before they reach the right ear. That will result in a slightly darker sound quality in the right ear.
The shape of the pinnae also plays into this, filtering sound differently depending on the angle at which the sound arrives. Therefore, sounds from behind the listener will undergo a slightly different transfer function than sounds from in front of the listener.
Binaural Microphones
Several microphones have been designed with these principles in mind.
One example is the Neumann KU 100 microphone, which simulates the average size, density, and shape of a human head. Other microphones even add an artificial torso to capture the cues that the shoulders and chest provide in localizing a sound source.
As you can imagine, these are highly specialized microphones and are therefore prohibitively expensive. That’s why this 3Dio microphone is so exciting: it offers remarkably realistic binaural recording at a much more practical price point.
Rather than building out the full head and torso, 3Dio has chosen to use a simple bar that separates the artificial ears to the appropriate distance.
They have a few versions of this mic depending on your budget. The FS Pro II has a DPA omni microphone in each ear and XLR outputs, maintaining professional-level recording quality. I use the FS XLR, which also has XLR outputs but uses slightly less expensive FS microphone capsules. There is also a less expensive version that has a 3.5mm output for connecting to phones and tablets.
There are also microphone kits that are designed to be inserted into your own ears for recording. However, these have several disadvantages. First, you will need to remain perfectly still and quiet during the recording, as any sound or movement will be permanently printed to the recording. And while using your own ears might make for a recording that sounds perfect to your ears, it may also make your recording less compatible with other listeners due to the unique shape of your head and ears.
Binaural Audio In Films & Video Games
Panning and delay are fairly straightforward to apply while mixing in post-production, but harnessing the power of HRTFs is slightly more complicated if sounds aren’t originally recorded with a binaural microphone. However, tools for implementing HRTFs in audio experiences are becoming increasingly common.
As a mixing engineer, you can use binaural panning plugins that take a mono or stereo audio input and output a binaural rendering. This expands your panning options from the left-to-right spread of stereo to a full 360-degree soundscape that includes front, back, above, and below.
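Under the hood, a binaural panner can be thought of as a lookup followed by filtering: pick the pair of head-related impulse responses closest to the requested direction, then run the mono signal through each one. The sketch below shows that flow. The table of impulse responses is entirely made up for illustration (real sets are measured and far longer), and the names are hypothetical rather than any plugin’s actual API.

```python
# Hypothetical hand-made impulse responses keyed by azimuth in degrees
# (0 = front, 90 = right, 180 = back, 270 = left). Each entry is (left_ear, right_ear).
TOY_HRIRS = {
    0:   ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    90:  ([0.0, 0.0, 0.4], [1.0, 0.0, 0.0]),  # far (left) ear is later and quieter
    180: ([0.8, 0.2, 0.0], [0.8, 0.2, 0.0]),
    270: ([1.0, 0.0, 0.0], [0.0, 0.0, 0.4]),  # far (right) ear is later and quieter
}

def convolve(signal, impulse_response):
    """Apply a filter to a signal by direct convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def angular_distance(a_deg, b_deg):
    """Shortest distance between two angles, wrapping around 360."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)

def render_binaural(mono, azimuth_deg):
    """Return (left, right) for a mono signal placed at azimuth_deg."""
    nearest = min(TOY_HRIRS, key=lambda az: angular_distance(az, azimuth_deg))
    hrir_left, hrir_right = TOY_HRIRS[nearest]
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

left, right = render_binaural([1.0, 0.5], azimuth_deg=80)
```

Snapping to the nearest measured direction keeps the example short. A smoother result would need some way of blending between neighboring directions, which is left out here.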
One example of binaural technology on the listener’s end is the binaural rendering process for Dolby Atmos in headphones. This will adapt a Dolby Atmos mix into a binaural experience so that the surround elements are preserved.
In fact, several formats that use binaural rendering, including Dolby Atmos, now offer the option to create a custom HRTF instead of relying on a generic one. Dolby Atmos has an app that captures 50,000 points of the user’s head, ears, and shoulders to generate an HRTF that is uniquely tailored to each listener, so the mix or audio experience translates that much better for that individual.
The capabilities of this technology are endless and I’m looking forward to seeing how far it will go in the future! One idea that is particularly exciting for video games, VR, and AR is to track the direction the listener faces and adapt the audio experience in real time.
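The core of the head-tracking idea is a small piece of arithmetic: the direction you render a sound from is its direction in the room minus the direction the listener is facing. Here is a minimal sketch that only handles turning left and right (yaw). The function name and angle convention are assumptions for this example.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of the source relative to the listener's nose, in degrees.
    Angles are measured clockwise: 0 = front, 90 = right, 180 = back, 270 = left.
    head_yaw_deg is positive when the listener turns to the right."""
    return (source_azimuth_deg - head_yaw_deg) % 360

# A sound fixed at the listener's right. The listener then turns to face it.
before_turn = relative_azimuth(90, 0)   # still on the right
after_turn = relative_azimuth(90, 90)   # now straight ahead

# A sound fixed in front. The listener turns right, so it moves to their left.
front_after_turn = relative_azimuth(0, 90)
```

Feeding this relative angle into the binaural renderer on every update is what keeps the sound anchored in the virtual world while the listener’s head moves.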