Sunday 28 March 2010

3D Audio, and Binaural Recording

Binaural recording: NIH 'Virtual Human' head cross-section, Neumann KU100 'dummy head' binaural microphone (inverted image), Sound Professionals in-ear microphone (left ear)

One of the dafter things they teach in physics classes is that because humans only have two ears, we can only hear location by comparing the loudnesses of a sound in both ears, and that because of this we can only hear "lefty-rightiness", unless we start tilting our heads.

It's wrong, of course: Physics people often suck at biology, and (non-physicist) humans are actually pretty good at pinpointing the direction of sound-sources, without having to tilt our heads like sparrows, or do any other special location-finding moves.

And we don't just perceive sound with our ears. It's difficult to locate the direction of a backfiring car when it happens in the street (because the sound often reflects off buildings before it reaches us) ... but if it happens in the open, we can directionalise by identifying the patch of skin that we felt the sound on (usually chest, back, shoulder or upper arm), and a perpendicular line from that "impact" patch then points to the sound-source.
For loud low-frequency sounds, we can also feel sounds through the pressure-sensors in our joints.

But back to the ears ... while it's obviously true that we only have two of them, it's not true that we can't use them to hear height or depth or distance information. Human ears aren't just a couple of disembodied audio sensors floating in mid-air: they're embedded in your head, and your head's acoustics mangle and colour incoming sounds differently depending on direction, especially when the sound has to pass through your head to get to the other ear. The back of your skull is continuous bone, whereas the front is hollow, with eyeballs and eyesockets and nasal and sinus cavities, and with Eustachian tubes linking your throat and eardrums from the inside. You have a flexible jointed spine at the back and a soft hollow cartilaginous windpipe leading to a mouth cavity at the front, and as sounds pass through all these different materials to reach both ears, they get a subtle but distinctive set of differential frequency responses and phase shifts that "fingerprint" them based on their direction and proximity.

To make the colouration even more specific, we also have two useful flappy things attached to the sides of our heads, with cartilaginous swirls that help to introduce more colourations to sounds depending on where they're coming from. Converting all these effects back into direction and distance information probably requires a lot of computation, but it's something that we learn to do instinctively when we're infants, and we do it so automatically that (like judging an object's distance by waggling our eye-focusing muscles) we're often not aware that we're doing it.

The insurance industry knows that people who lose an external ear or two often find it more difficult to directionalise sound. Even with two undamaged eardrums, simple tasks like crossing the road can become more dangerous. If you've lost an ear, you might find it more difficult working on a building site or as a traffic cop, even if your "conventional" hearing is technically fine.

Binaural, or 3D sound recording:

We're good enough at this to be able to hear multiple sound sources and pinpoint all their directions and distances simultaneously. So, with the right custom hardware, a studio engineer can mimic these effects to make the listener "hear" the different sound-sources as coming from specific directions, as long as the listener is wearing headphones.

There are three main ways of doing this:

1: "Dummy head" recording

This literally involves building a "fake head" from a mixture of different acoustic materials to reproduce the sound-transmission properties of a real human head and neck, and embedding a couple of microphone inserts where the eardrums would be. Dummy head recording works, but building the heads is a specialist job, and they're priced accordingly. Neumann sell a dummy head with mic inserts called the KU100, but if you want one, it'll cost you around six thousand pounds.
Some studios have been known to re-record multitrack audio into 3D by surrounding a dummy head with positionable speakers, bunging it into an anechoic chamber and then routing different mono tracks to different speakers to create the effect of a 3D soundfield. But this is a bit fiddly.

2: 3D Digital Signal Processing

After DSP chips came down in price, the odd company started using them to build specialist DSP-based soundfield editors. So for instance, the Roland RSS-10 was a box that let you feed in "mono" audio tracks and choose where they ought to appear in the soundfield. You could even add an outboard control panel with alpha dials that let you sweep and swing positions around in real time.
Some cheap PC soundcards and onboard audio chips have systems that nominally let you position sounds in 3D, but the few I've tried have been a bit crap: their algorithms probably don't have the detail or processing power to do this properly.
At "only" a couple of thousand quid, the Roland RSS-10 was a cheaper, more controllable option for studio 3D mixing than using a dummy head in a sound booth, and Pink Floyd supposedly bought a stack of them. There's also a company called QSound that do this sort of thing: QSound's algorithms are supposed to be based more on theoretical models, Roland's more on reverse-engineering actual audio.
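To give a feel for what the very simplest sort of positioning algorithm looks like, here's a rough Python sketch. It is not Roland's or QSound's method (I don't know the details of either), and it only handles the basic "lefty-rightiness" cues: the far ear hears the sound slightly later and slightly quieter. It does none of the head and outer-ear colouration that gives height and front/back information, which may be roughly why bare-bones implementations sound unconvincing. The function name, the 8.75cm head radius, the 6dB maximum level difference and the sine-shaped level curve are all assumptions picked for illustration; the delay uses Woodworth's spherical-head approximation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def position_mono(samples, azimuth_deg, sample_rate=44100,
                  head_radius_m=0.0875, max_level_diff_db=6.0):
    """Crudely place a mono signal left or right using delay and level cues.

    azimuth_deg: -90 (hard left) .. +90 (hard right), 0 = straight ahead.
    Returns (left, right) lists of equal length.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between -90 and +90")
    theta = math.radians(abs(azimuth_deg))

    # Woodworth's spherical-head approximation of the interaural time difference.
    itd_seconds = (head_radius_m / SPEED_OF_SOUND) * (theta + math.sin(theta))
    delay = round(itd_seconds * sample_rate)  # whole samples of delay at the far ear

    # Assumed level difference: grows with sin(azimuth) up to max_level_diff_db.
    far_gain = 10.0 ** (-(max_level_diff_db * math.sin(theta)) / 20.0)

    near = list(samples) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in samples]
    if azimuth_deg >= 0:
        return far, near   # source on the right: the left ear is the far ear
    return near, far       # source on the left: the right ear is the far ear
```

Calling `position_mono(track, 45)` on a list of float samples gives you a left and a right channel to try on headphones. The sound will lean to the right, but don't expect it to leave the inside of your head.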

3: "Human head" recording

There's now a third option: a microphone manufacturer called Sound Professionals had the idea that, instead of using a dummy human head, why not use a real human head?
This doesn't require surgery: you just pop the special microphones into your ears (making sure that you have them the right way round), and the mics record the 3D positioning colouration created by your own head's acoustics.
The special microphones cost a lot less than a Neumann KU100, and they're a lot easier to use for field recording than hauling about a dummy head – it's just like wearing a pair of "earbud"-style earphones. The pair that I bought required a mic socket with DC power, but I'm guessing that most field recorders probably provide that (they certainly worked fine with a Sony MZ-N10 minidisc recorder).
Spend a day wandering around town wearing a pair of these, and when you listen to the playback afterwards with your eyes closed, it's spooky. You hear //everything//. Birds tweet above your head, supermarket trolley wheels squeak at floor level, car exhausts grumble past the backs of your ankles as you cross a road, supermarket doors --swisssh-- apart on either side of you as you enter.
"Human head" recording isn't quite free from problems. The main one is that you can't put on a pair of headphones to monitor what you're recording in real time, because that's where the microphones are: you either have to record "blind" or have a second person doing the monitoring, and you can't talk to that person or turn your head to look at them (or clear your throat) without messing up the recording. If you move your head, the sound sources in the recording swing around in sympathy. Imagine trying to record an entire symphony orchestra performance while staring determinedly at a fixed point for an hour or two. Tricky.
The other thing to remember is that although the results might sound spectacular to you (because it was your head that was used for the recording), it's difficult to judge, objectively, whether other people are likely to hear the recorded effect quite so strongly. For commercial work you'd also want to find some way of checking whether your "human dummy" has a reasonably "standard" head. And someone with nice clear sinuses is likely to make a better recording than someone with a cold, or with wax-clogged ears.
Another complication is that most people don't seem to have heard of "in-ear" microphones for 3D human head recording, so they can be difficult to source: I had to order mine from Canada. 

Media

For recording and replaying the results: the effect is based on high-frequency stereo colourations and phase differences, and these are exactly the sort of thing that MP3 compression tends to strip out (or that gets mangled on analogue cassette tape), so it's probably best to record binaural material as high-quality uncompressed WAV files. If you find by experiment that your recorder can still capture the effect using a high-quality compressed setting, then fine. The effect is captured nicely on 44.1kHz CD audio and, at a pinch, it even records onto high-quality vinyl. The Eurythmics album track "I Love You Like a Ball and Chain" had a 3D instrumental break in which sound sources rotate around the listener's head, off-axis, and if you look at the vinyl LP, the cutting engineer has wide-spaced the tracks for that section of recording to make absolutely sure that it'd be cut with maximum quality.
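If you end up with left and right channels as lists of numbers and want them on disk uncompressed, something like the following Python sketch should do it, using only the standard-library `wave` and `struct` modules. It assumes 16-bit samples at 44.1kHz (i.e. CD-style audio) and float input in the range -1.0 to 1.0. The function names are made up for this example.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100   # CD-audio rate
CHANNELS = 2          # left and right
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit (an assumption)


def write_stereo_wav(path, left, right):
    """Write two equal-length float sequences (-1.0..1.0) as 16-bit PCM."""
    if len(left) != len(right):
        raise ValueError("left and right must be the same length")
    frames = bytearray()
    for l, r in zip(left, right):
        for value in (l, r):  # WAV interleaves the channels: L, R, L, R ...
            clipped = max(-1.0, min(1.0, value))
            frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))


def bytes_per_minute():
    """Raw audio data per minute at these settings (header not included)."""
    return SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH * 60


if __name__ == "__main__":
    # One second of a 440Hz tone, identical in both ears, as a smoke test.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
    write_stereo_wav("test_tone.wav", tone, tone)
```

At these settings the arithmetic comes out at 44100 × 2 × 2 × 60 = 10,584,000 bytes per minute, so roughly 10.6MB a minute, which is the price of not letting a compressor decide what you can't hear.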

Sample recordings

I'd upload some examples, but my own test recordings are on minidisc, and I no longer have a player to do the transfer. Bah. :(
However, there's some 3D material on the web. The "Virtual Barber Shop" demo is a decent introduction to the effect, and there are some more gimmicky things online, like QSound's London Tour demo (with fake 3D positioning and a very fake British accent!). When I was looking into this a few years back, the nice people at Tower Records directed me to their spoken word section where they stocked a slightly odd "adult" CD that included a spectacular 3D recording of, uh, what I suppose you might refer to as an adult "multi-player game". Ahem. This one actually makes you jump, as voices appear without warning from some very disconcerting and alarming places. I'm guessing that the actors all got together on a big bed with a dummy head and then improvised the recording. There are also a couple of 3D audio sites by binaural.com and Duen Hsi Yen that might be worth checking out.
So, the subject of 3D audio isn't a con. Even if the 3D settings on your PC soundcard don't seem to do much, "pro" 3D audio is very real - with the right gear, the thing works just fine. It's also fun.

2 comments:

ErkDemon (Eric Baird) said...

The crescent-shaped thing on the right of the banner picture is an in-ear microphone.

Kall Binaural Audio said...

Great overview of binaural recording techniques!