Sunday, 28 March 2010

3D Audio, and Binaural Recording

Binaural recording: NIH 'Virtual Human' head cross-section, Neumann KU100 'dummy head' binaural microphone (inverted image), Sound Professionals in-ear microphone (left ear)

One of the dafter things they teach in physics classes is that because humans only have two ears, we can only hear location by comparing the loudnesses of a sound in both ears, and that because of this we can only hear "lefty-rightiness", unless we start tilting our heads.

It's wrong, of course: Physics people often suck at biology, and (non-physicist) humans are actually pretty good at pinpointing the direction of sound-sources, without having to tilt our heads like sparrows, or do any other special location-finding moves.

And we don't just perceive sound with our ears. It's difficult to locate the direction of a backfiring car when it happens in the street (because the sound often reflects off buildings before it reaches us) ... but if it happens in the open, we can directionalise by identifying the patch of skin that we felt the sound on (usually chest, back, shoulder or upper arm), and a perpendicular line from that "impact" patch then points to the sound-source.
For loud low-frequency sounds, we can also feel sounds through the pressure-sensors in our joints.

But back to the ears ... while it's obviously true that we only have two of them, it's not true that we can't use them to hear height or depth or distance information. Human ears aren't just a couple of disembodied audio sensors floating in mid-air, they're embedded in your head, and your head's acoustics mangle and colour incoming sounds differently depending on direction, especially when the sound has to pass through your head to get to the other ear. The back of your skull is continuous bone, whereas the front is hollow, with eyeballs and eye-sockets and nasal and sinus cavities, with Eustachian tubes linking your throat and eardrums from the inside. You have a flexible jointed spine at the back and a soft hollow cartilaginous windpipe leading to a mouth cavity at the front, and as sounds pass through all these different materials to reach both ears, they get a subtle but distinctive set of differential frequency responses and phase shifts that "fingerprint" them based on their direction and proximity.

To make the colouration even more specific, we also have two useful flappy things attached to the sides of our heads, with cartilaginous swirls that help to introduce more colourations to sounds depending on where they're coming from. Converting all these effects back into direction and distance information probably requires a lot of computation, but it's something that we learn to do instinctively when we're infants, and we do it so automatically that – like judging an object's distance by waggling our eye-focusing muscles – we're often not aware that we're doing it.
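The simplest of these direction cues is just arrival time: a sound from one side reaches the far ear slightly later than the near one. Woodworth's classic spherical-head approximation puts a number on it; here's a minimal sketch (the 8.75cm head radius is a commonly assumed average, not anything specific to this article):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly room temperature
HEAD_RADIUS = 0.0875    # m, a commonly assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the extra time a
    distant sound takes to reach the far ear, for a source at the given
    azimuth (0 degrees = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source directly to one side arrives about 0.66 ms later at the far ear.
print(round(interaural_time_difference(90) * 1000, 2), "ms")
```

That sub-millisecond delay, plus the level and spectral colourations described above, is the raw material the brain works from; the pinna and skull colourations are what break the front/back and up/down ambiguities that timing alone can't resolve.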

The insurance industry knows that people who lose an external ear or two often find it more difficult to directionalise sound. Even with two undamaged eardrums, simple tasks like crossing the road can become more dangerous. If you've lost an ear, you might find it more difficult working on a building site or as a traffic cop, even if your "conventional" hearing is technically fine.

Binaural, or 3D sound recording:

We're good enough at this to be able to hear multiple sound sources and pinpoint all their directions and distances simultaneously, so with the right custom hardware, a studio engineer can mimic these effects to make the listener "hear" the different sound-sources as coming from specific directions, as long as they're wearing headphones.

There are three main ways of doing this:

1: "Dummy head" recording

This literally involves building a "fake head" from a mixture of different acoustic materials to reproduce the sound-transmission properties of a real human head and neck, and embedding a couple of microphone inserts where the eardrums would be. Dummy head recording works, but building the heads is a specialist job, and they're priced accordingly. Neumann sell a dummy head with mic inserts called the KU100, but if you want one, it'll cost you around six thousand pounds.
Some studios have been known to re-record multitrack audio into 3D by surrounding a dummy head with positionable speakers, bunging it into an anechoic chamber and then routing different mono tracks to different speakers to create the effect of a 3D soundfield. But this is a bit fiddly.

2: 3D Digital Signal Processing

After DSP chips came down in price, the odd company started using them to build specialist DSP-based soundfield editors. The Roland RSS-10, for instance, was a box that let you feed in "mono" audio tracks and choose where they ought to appear in the soundfield. You could even add an outboard control panel with alpha dials that let you sweep and swing positions around in real time.
Some cheap PC soundcards and onboard audio chips have systems that nominally let you position sounds in 3D, but the few I've tried have been a bit crap: their algorithms probably don't have the detail or processing power to do this properly.
At "only" a couple of thousand quid, the Roland RSS-10 was a cheaper, more controllable option for studio 3D mixing than using a dummy head in a sound booth, and Pink Floyd supposedly bought a stack of them. There's also a company called QSound that does this sort of thing: QSound's algorithms are supposed to be based more on theoretical models, Roland's more on reverse-engineering actual audio.
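To get a feel for what these boxes are doing, here's a deliberately crude sketch of DSP positioning: delay and attenuate the far-ear channel of a mono signal according to its azimuth. This only models the two simplest cues (interaural time and level differences); I don't know the internals of the RSS-10 or QSound's processors, and real systems convolve with full head-related transfer functions rather than anything this simple. The 0.4 level-drop factor is an arbitrary illustrative value:

```python
import math

SAMPLE_RATE = 44100
SPEED_OF_SOUND = 343.0
HEAD_RADIUS = 0.0875  # m, assumed average head radius

def position_mono(samples, azimuth_deg):
    """Toy stereo positioner: returns (left, right) sample pairs with the
    far-ear channel delayed (interaural time difference, Woodworth's
    spherical-head formula) and attenuated (interaural level difference)."""
    theta = math.radians(abs(azimuth_deg))
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
    delay = int(round(itd * SAMPLE_RATE))   # far-ear delay in whole samples
    gain = 1.0 - 0.4 * math.sin(theta)      # arbitrary far-ear level drop
    near = list(samples) + [0.0] * delay
    far = [0.0] * delay + [s * gain for s in samples]
    # positive azimuth = source on the right, so the left ear is the far one
    if azimuth_deg >= 0:
        return list(zip(far, near))
    return list(zip(near, far))
```

Sweeping the azimuth value over time is essentially what the RSS-10's alpha dials let an engineer do by hand, though without the pinna colourations a scheme this basic mostly just pans left-right rather than producing true height or front/back placement.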

3: "Human head" recording

There's now a third option: a microphone manufacturer called Sound Professionals had the idea that, instead of using a dummy human head, why not use a real human head?
This doesn't require surgery, you just pop the special microphones into your ears (making sure that you have them the right way round), and the mics record the 3D positioning colouration created by your own head's acoustics.
The special microphones cost a lot less than a Neumann KU100, and they're a lot easier to use for field recording than hauling about a dummy head – it's just like wearing a pair of "earbud"-style earphones. The pair that I bought required a mic socket with DC power, but I'm guessing that most field recorders probably provide that (they certainly worked fine with a Sony MZ-N10 minidisc recorder).
Spend a day wandering around town wearing a pair of these, and when you listen to the playback afterwards with your eyes closed, it's spooky. You hear //everything//. Birds tweet above your head, supermarket trolley wheels squeak at floor level, car exhausts grumble past the backs of your ankles as you cross a road, supermarket doors --swisssh-- apart on either side of you as you enter.
"Human head" recording isn't quite free from problems. The main one is that you can't put on a pair of headphones to monitor what you're recording, real-time, because that's where the microphones are: you either have to record “blind” or have a second person doing the monitoring, and you can't talk to that person or turn your head to look at them (or clear your throat) without messing up the recording. If you move your head, the sound sources in the recording swing around in sympathy. Imagine trying to record an entire symphony orchestra performance while staring determinedly at a fixed point for an hour or two. Tricky.
The other thing to remember is that although the results might sound spectacular to you (because it was your head that was used for the recording), it's difficult to judge, objectively, whether other people are likely to hear the recorded effect quite so strongly. For commercial work you'd also want to find some way of checking whether your “human dummy” has a reasonably "standard" head. And someone with nice clear sinuses is likely to make a better recording than someone with a cold, or with wax-clogged ears.
Another complication is that most people don't seem to have heard of "in-ear" microphones for 3D human head recording, so they can be difficult to source: I had to order mine from Canada. 

Media

For recording and replaying the results: since the effect is based on high-frequency stereo colourations and phase differences, and since these are exactly the sort of thing that MP3 compression tends to strip out (or that gets mangled on analogue cassette tape), it's probably best to record binaural material as high-quality uncompressed WAV files. If you find by experiment that your recorder can still capture the effect using a high-quality compressed setting, then fine. The effect's captured nicely on 44.1kHz CD audio, and at a pinch it even records onto high-quality vinyl: the Eurythmics album track "Love you like a Ball and Chain" had a 3D instrumental break in which sound sources rotate around the listener's head, off-axis. If you look at the vinyl LP, the cutting engineer has wide-spaced the grooves for that section of the recording to make absolutely sure that it'd be cut with maximum quality.

Sample recordings

I'd upload some examples, but my own test recordings are on minidisc, and I no longer have a player to do the transfer. Bah. :(
However, there's some 3D material on the web. The "Virtual Barber Shop" demo is a decent introduction to the effect, and there are some more gimmicky things online, like QSound's London Tour demo (with fake 3D positioning and a very fake British accent!). When I was looking into this a few years back, the nice people at Tower Records directed me to their spoken word section where they stocked a slightly odd "adult" CD that included a spectacular 3D recording of, uh, what I suppose you might refer to as an adult "multi-player game". Ahem. This one actually makes you jump, as voices appear without warning from some very disconcerting and alarming places. I'm guessing that the actors all got together on a big bed with a dummy head and then improvised the recording. There's also a couple of 3D audio sites by binaural.com and Duen Hsi Yen that might be worth checking out.
So, the subject of 3D audio isn't a con. Even if the 3D settings on your PC soundcard don't seem to do much, "pro" 3D audio is very real - with the right gear, the thing works just fine. It's also fun.

Friday, 19 March 2010

Virtual Lego


Someone's finally come up with the "killer application" for VR and computer-augmented reality.

It's buying Lego.

You walk into a participating Lego shop, pick up a box of Lego, and walk over to the big screen. A video camera shows you your image. You hold out the box in front of you, horizontally, as if you're holding a tray.

The software sees the box, recognises which product it belongs to, and calculates the exact position of the box corners in three dimensions.

It then retrieves a 3D computer model of the assembled Lego model from its database, and projects a virtual reality image of the completed masterpiece onto the screen as if the completed Lego masterpiece is sitting on top of the box clutched in your little sticky hands.

You rotate the box, and on the screen, the 3D model rotates. Tilt the box and it tilts. Move the box around and you get to see the final Lego construction from different angles, complete with perspective effects.

Oh, and the computer-generated Lego image is also animated. If it's a garage, the little Lego cars scoot about, if it's a building, the little Lego people are wandering about doing their own thing, "Sims"-style, and if it's a tipper truck, the truck drives about the top of the box, tipping stuff.

It's very, very cool.

Sunday, 14 March 2010

The Caltech Snowflake Site

thumbnail link image to CalTech's snowflake site, www.snowcrystals.com
While I was finishing off yesterday's snowflake post, I came across Caltech's excellent snowflake site at www.snowcrystals.com (Kenneth G. Libbrecht).

Lots of photos, lots of useful information. Caltech even have their own snowflake creation machine, that, instead of electrostatically levitating the snowflakes as they grow, or using a vertical blower, applies an electric field to grow narrow ice-spikes, and then lets the snowflakes form at the spikes' tips (which means that the central mount is probably rigidly aligned to the resulting flake with atomic precision, and doesn't seem to affect the growing process).

If you're in the UK, and you've mocked train companies for blaming their electrical locomotive failures on "the wrong kind of snow", well, it turns out that snow crystallisation has a slightly crazy dependency on both temperature and airborne water content, forming a range of very different shapes, from the classic branched hexagon "Christmas card" forms, to hexagonal plates or long hexagonal tubes (snowflake chart).

The CalTech site explains the wide variety of snowflake forms by this temperature-dependence: the idea being that snowflakes form symmetrically because the conditions across the flake are the same at any given time, and that the extreme variety of shapes is a function of the varying environmental conditions that the whole snowflake experiences as it falls through different regions of sky. It might go through a "spiky dendrite" phase, then change temperature and start trying to grow plates, and then go back to "dendrite" mode, and the exact amount of time spent in these different phases then dictates the shape that emerges.

If the identical patterning of the arms is purely a result of the identical (varying) growing conditions across the whole flake, then we don't require any additional mechanism for regulating symmetry. In that case, we'll expect individual snowflakes to accumulate diverging asymmetries as they grow, due to gradients of temperature or water availability or light or airflow across the flake. This'd seem to make the formation of extremely regular crystals a bit unlikely.
But the CalTech site argues that actually, most natural snowflakes are pretty irregular, and that people generally overestimate the degree of symmetry because the artsy folks who photograph them (presumably including CalTech!) give a misleading impression by carefully selecting out the "best" (most regular) flakes to photograph and publish.

That explanation seems to be a bit at odds with the current suggestion of how triangular snowflakes form, though: if triangular snowflakes grow because airflow over the flake creates an asymmetrical growing environment, breaking the hex pattern, then without an additional internal symmetry-regulating mechanism there'd be no obvious reason why the resulting aerodynamically-disfigured flake should have 120-degree rotational symmetry. Airflow and a moisture gradient flowing across the flake in one direction might allow a bilateral left-right symmetry for the two sides of the flake that are experiencing the same growing conditions ... but it doesn't explain why the conditions at the leading point of the falling tri-flake (falling point-first) should be identical to those at the two trailing side-points, or why points on the sides of those two trailing spurs should be equivalent, when the airflow is hitting them at different angles. If triangular flakes are due to sideways airflow, then the flake seems to be fighting to retain some sort of symmetry despite significant asymmetrical disruptive forces that ought to be destroying it. That'd increase the odds of there being a significant internal symmetry mechanism in play.

Of course, it may be that our explanation of triangular snowflakes is simply wrong, that airflow isn't disrupting the hex pattern, and that instead chemical contamination (or some other factor) is causing the alternative triangular crystal structure. But that'd still mean that something in our current understanding of snowflakes is wrong or incomplete. Even if yesterday's wacky suggestion about the quantum mirage effect is misguided, we'd still not know why snowflake formation is so sensitive to environmental conditions, or what the (non-aerodynamic) explanation of triangular snowflakes might be.


So again, more research needed.


The Caltech site's debunking of "mysterious" causes of snowflake symmetry is in the "Myths and Nonsense" section at http://www.its.caltech.edu/~atomic/snowcrystals/myths/myths.htm. The page says that there aren't any special forces at work here regulating symmetry, that most snowflakes are asymmetrical and "rather ugly", and that the published examples (including the ones on the site) are atypical, because "not many people are interested in looking at the irregular ones". In other words, if you look through the published work, you get a misleading impression due to publication bias. Well, yes ... quite possibly. But since the idea of what counts as "significant" symmetry might be a bit subjective, and since the datasets aren't available for us to look at, it's difficult to take this as a definitive answer until there's been actual experimental testing done.

Water is weird stuff, and it keeps catching us out. I remember when people used to debunk ice spikes as an obvious example of pseudoscience, and now those are understood, studied, and have their own page on the Caltech site. A lot of "crazy" ideas about water do turn out to be just as dumb as they first appear, but a few turn out to be correct. The trouble is, it's not always immediately obvious which are which.

Saturday, 13 March 2010

Snowflake Engineering, Quantum Mirages and Matter-Replicators

Julia Set
One of the most impressive things about snowflakes is that we still don't really understand how they work.

We understand how conventional crystals grow – normal crystals assemble into large, faceted, regular-looking forms because the flat facets attract new atoms more weakly than the rougher, "uncompleted" parts of the structure, which provide more friendly neighbours for a new atom to bond with. So if you have an "incomplete" conventional crystal, it'll preferentially attract atoms to the sites needed to fill in the gaps, to produce a nice large-faceted shape that tries to maximise the size of its facets, as far as it can, bearing in mind the original random initial distribution of seed crystals.

But snowflakes do something different. Their range of forms makes their growth appear pretty chaotic, but they also manage to be deeply symmetrical. It'd seem that the point of greatest attraction on a region of snowflake doesn't just depend on the atoms that are nearby, but also on the arrangement of atoms on a completely different part of the crystal, which might be some way away, and facing in a different direction, on a different spur. The sixfold symmetry of a snowflake suggests that when you add an atom to the point of one of the six spurs, the other five points become more attractive ... add an atom to the side of a spur, and we're dealing with twelve separate sites (twenty-four if the atom is off the plane). Add an atom to a side-branch, and a copy of the electrical-field image of that single atom is transmitted and reflected and multiplied and refocused at potentially tens of corresponding sites on the crystal surface. And that's for every atom in the crystal.

This would be beyond fibre-optics, and beyond conventional holography. It'd be multi-focus holography, and the holographically-controlled assembly of matter at atomic scales to match a source pattern – making multiple copies without destroying the original. It'd be using holographic projection to assemble multiple macroscopic structures that are atom-perfect copies of an original. And that idea should make the hairs on the back of your neck start to stand up.

The closest thing I've seen in print to this is the quantum mirage effect described in Nature, 3 Feb 2000. Researchers assembled an elliptical quantum corral of atoms on a substrate, and placed another atom at one of the ellipse's two focal points. When they examined the second focal point, they found that the atom's external field properties seemed to be projected and refocused there, giving a partial "ghost" of the source atom. You could interact with the ghost even though it wasn't there. Presumably your actions on the "ghost particle" copy would be transmitted back to the source, which'd be recreating the ghost behaviour by a process of electrical ventriloquism, using the elliptical reflecting wall to "throw" its voice to the ghost location.

Something similar may be happening in a perfectly-symmetrical monocrystalline snowflake as it grows. Maybe the crystal's regular structure happens not just to split the image of the atom into multiples, but to refocus them with phase coherence at all the key symmetry points. Maybe we could try adding a few metal atoms to one part of a snowflake crystal and seeing if matching atoms are preferentially attracted to the other corresponding sites.



A possible clue is the phenomenon of triangular-symmetry snowflakes.
It's been suggested that these form in nature when an asymmetrical snowflake falls corner-first, with the airflow disrupting regular hexagonal crystal formation (see also Wired). But since the remaining triangular symmetry is still so strong, this hints that perhaps the strongest linkage between crystal sites is in triples, with a secondary slightly weaker triplet attraction producing the hex.

Okay, so I suppose there might be problems in attempting to use giant snowflake crystals as matter-photocopiers ... in snowflake formation, every copied pattern forms an extension of the crystal, but if you use the crystal to try to copy other things, the "irregular" matter being copied is liable to disrupt the focusing. You might only be able to copy layers an atom or two thick (at least, to start with).

But a giant atom-perfect monocrystalline snowflake would be an awfully fun thing to play with if you had a chip-fabrication lab with goodies like force-sensing tunnelling microscopes.

And to me, that was the one thing that could have justified building the International Space Station. The ability to build a giant, heavy-duty zero-gravity snowflake, hopefully one big and chunky enough to withstand eventually being brought back to Earth immersed in liquid helium for further study (what does Bose-Einstein condensate do when it's in contact with a hex crystal?). That had to be worth a few billion in research money, and would have given the public something pretty to look at when it came time to tell them what the money had bought. We haven't done it yet, but maybe ...

Friday, 5 March 2010

Kylie Minogue and the Gorilla Experiment

Kylie, gorilla
To a large extent, we see and hear what we expect to see and hear. As newborns we're hit with a tidal wave of experiential data, a screaming torrent of raw sensory information that we have to learn how to deal with, and our brain's main coping strategy is to scrunch itself up until it's found ways of shutting out most of the din.

As infants, we initially lose neurons at an alarming rate until the remaining pathways can mimic (and to some extent synchronise with and predict) external datapatterns. We construct progressively more complex predictive mental models for how the outside world works, and increasingly live within our own models. We experience what we expect to experience, unless there's such a glaring mismatch that it can't be ignored.

It's a matter of data-reduction and enhanced reaction-times. We coast along, our experience being steered by sensory data but not dictated by it. If you're sitting on a chair, you don't suddenly jolt every few seconds and exclaim, "Chair!" – once the chair's been accepted you assume that it's still there until you're told otherwise. This internal secondary reality also compensates for the significant processing delays that happen in our brains – so that we think that we experience the world in real-time – by starting to react unconsciously to our internal models' predictions, before we're consciously aware of what we've seen. We live our lives from moment to moment in a state of continual anticipation.

Sometimes random data tickles our expectation-engine – when a black bin-bag blowing in the wind in the corner of an alley momentarily triggers an expectation of seeing a black cat, we don't just interpret the movement as possibly belonging to a cat, we actually see and remember the cat (until we look a second time and realise that it's just a refuse bag, and the rogue memory gets shredded).

These models act as perception filters and error-correction filters for what our brains allow us to register as reality. Information that's not compatible with the model (or not relevant) simply doesn't register on our consciousnesses, it gets stripped out as anomalous data and jettisoned before we have a chance to become fully aware of it.

The usual example for this is the basketball experiment, conducted by Daniel Simons and Christopher Chabris in the 1990s, but unfortunately, if I explain what the experiment is, it'll spoil it for you. If you don't already know about it, don't read anything else about it until you've watched this video and tried to count just the number of basketball passes made by the people in the white shirts. Then read the analysis.



The Gorilla Effect is now considered a classic, but what most psychologists might not realise is that in 1991, someone had already done a large-scale version of the experiment, using the UK's music broadcasting networks.

In '91, Kylie Minogue was still widely seen as a squeaky-clean pop songstress, freshly out of Neighbours, warbling heavily-processed Stock Aitken and Waterman lyrics over generic (and slightly cheesy) SAW chunka-chunka backing tracks. And that's when someone on the Minogue team decided to slip the f-word into one of the singles, three times, to see who noticed. Nobody did.

The single was called "Shocked" and charted at number 6.

" Shocked by the power, ooh-ohh, shocked by the power of love.
You got me fucked to my very foundations, shocked by the power, shocked by the power ..."

Whattt???

Uncharacteristically for SAW lyrics, “fucked to my very foundations” was actually a pretty great line for a pop song. Alliterative an' everything. I'd have been proud of it. And maybe that's why someone decided to leave it in.

Whether it was an ad-lib, like Atomic Kitten's alternative “You can lick my hole again” soundcheck version of their single, I don't know. But that's the version of "Shocked" that actually got broadcast, over and over again, on TV and on the radio. In a country that was obsessed with the F-word being used on music programmes, in which the Sex Pistols had made their careers by effing on Bill Grundy's show, and Jools Holland was suspended for accidentally letting it slip on a live trailer for "The Tube" in 1987, and every Madonna single was eagerly being pored over by the UK press for possible naughty words or double-entendres that people could declare themselves outraged by, la Minogue got away with repeatedly standing up on Top of the Pops [a bit after ~7pm], and apparently singing her little heart out about how she was "fucked to my very foundations", three or four times per appearance, without anyone hearing it.

If you get hold of the more recent "Ultimate Kylie" compilation, the audio's different. They've either changed the recording or used a different version in which The Kylie is definitely singing "rrucked", with a pronounced "rr" rather than "fucked", with an "ff". But go back to contemporary broadcast recordings of the single (thanks, YouTube!), and yep – it's different.

The "Kylie" version of the gorilla experiment might be one of the biggest mass-media psychological experiments ever to take place, but unless you can get hold of contemporary recordings of radio and TV broadcasts, you might be forgiven for thinking that it never happened.