Sounds, Location and Perception

What is a Sound?

Reading various writers on the subject of sounds and their identity, I note what appears to be a level of confusion about what we mean by the word ‘sound’. What is a sound? Where is a sound? What is its ontological status? The various writers (see the bibliography below, but a very good introductory essay can be found at : ) speak of four main theories: the Proximal Theory, that sounds are where we are; the Medial Theory, that sounds are in the medium (usually air); the Distal Theory, that sounds are at the site of their generation; and finally an Aspatial Theory, that space is not relevant to sound.

What is bothering me is the lack of a human dimension in the discourse. I don’t think it is necessary to complete a theory about things apart from persons. Such a theory is too remote from us to have the relevance which the topic demands, and all talk of sound is predicated on a human auditor. We want to know what these things are in relation to ourselves. To discourse on trees falling in forests (which is the unstated question that lies behind all thoughts about the ontology of sounds) is too abstract to be of use. I take a general philosophical stance that things signify in relation to the entity to whom they have significance. In short, ourselves. (I call this Robust Pragmatism informed by Naivety, though I would welcome a professional philosopher’s comments to help make this work out more rigorously.) I’m happy to call upon support from such a professional:

‘…sounds directly perceived are sensations of some sort produced in the observer when the sound waves strike the ear.’ (Maclaclan, 1989)

I want to say that a sound is only a sound when experienced by an auditor who has the apparatus to detect it and the intellect to interpret the incoming waves in a consistent manner. If we consider sounds to be just waves in the air of between 20 Hz and 20 kHz (an ambitious range for the over-twenties in reality), then what of waveforms that lie above and below? Are they still to be called ‘sounds’? There is also an implicit assumption about duration: that the alternating compression and rarefaction of the air repeats. Were it to happen just once, would we really hear a sound? To paint an extreme picture, consider going up a mountain in a car until your ears pop, then down the other side. Technically you have just experienced an ultra-low-frequency wave (one cycle per half hour, say: about 0.00056 Hz) of one rarefaction followed by one compression, on top of many other wave events. Could we justifiably be said to have ‘heard’ a sound at this frequency? According to some theories it might be so.
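The arithmetic behind the mountain-drive example can be checked with a few lines of Python. This is only a sketch: the function names are mine, and the 20 Hz to 20 kHz limits are the nominal textbook band mentioned above, not a measured threshold for any particular listener.

```python
# Rough arithmetic for the mountain-drive example: one pressure cycle per half hour.
AUDIBLE_LOW_HZ = 20.0       # nominal lower limit of human hearing (assumed)
AUDIBLE_HIGH_HZ = 20_000.0  # nominal upper limit (assumed; lower for most adults)


def frequency_from_period(period_seconds: float) -> float:
    """Frequency in hertz of an event that repeats once every period_seconds."""
    return 1.0 / period_seconds


def in_nominal_audible_band(frequency_hz: float) -> bool:
    """True if the frequency lies inside the nominal 20 Hz to 20 kHz band."""
    return AUDIBLE_LOW_HZ <= frequency_hz <= AUDIBLE_HIGH_HZ


mountain_drive_hz = frequency_from_period(30 * 60)  # half an hour, in seconds
print(f"{mountain_drive_hz:.5f} Hz")                 # about 0.00056 Hz
print(in_nominal_audible_band(mountain_drive_hz))    # False: far below the band
print(in_nominal_audible_band(50_000.0))             # False: far above the band
```

Both the half-hour pressure cycle and a 50 kHz wave fall outside the band, which is the point of the question: whether either deserves the name ‘sound’ is not settled by the physics alone.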

I want also to say that a sound contains information that has meaning to the receiver. Let me start with cats or some simple animal: the information it receives may only be fight or flight, come for food and so on, but it is a sound to the cat because it elicits a response that is causally linked to the wave stimulation it receives. I know that sometimes the cat appears not to hear (but that’s between me and my cat). What can we really know of an animal’s experience of sound other than its response or lack of it? If I communicate with a friend and process my words electronically to be in the 40 kHz to 60 kHz range, they will not hear them. Can I fairly be said to have communicated simply because I created a wave in the air? I would suggest not.

Some writers have spoken about objects vibrating and still ‘sounding’ even in a vacuum. I would contend that I could still ‘hear’ a sound in a vacuum if, for example, I were to bounce a laser beam from the object’s surface and create a sound from the reflection. It is still the transmission of a wave, using only a bit of translation equipment, just as air waves are translated by the ears.

Location of a sound

Much gets written about the location of a sound. Where is this sound in relation to us? Bear in mind that the locational information about the source of the sound is not encoded in the waves of energy being emitted by the object. The source knows nothing of the space in which it takes place, and yet writers speak as if it did, as if the information about its location were part of the wave. Let’s go on a journey with a wave:

1. A sound is generated. Let’s keep it as simple as a finger snap. (The reason for this is that it is a point source; I could enlarge for many pages on the problems of musical instruments, which are rarely point sources but rather complex arrays of sources.) A sphere of sound waves (pulses of compression and rarefaction) radiates from the source at a speed of roughly 340 m/s at room temperature, not forgetting that this is an approximate number depending on air composition and density, pressure and temperature plus other subtle factors. As soon as the wave has travelled an infinitesimally small distance it will have changed, albeit undetectably at this point: within a nanosecond it will have been subject to the air’s variable elasticity and have been modified in tone and volume. This process of tonal and volume change over distance is continuous right up to the point of audition at the eardrum, at which point these factors cease to change. If we go too far away from the sound source, the energy of the waves (spreading and diminishing according to the inverse square law) falls to the level of the random Brownian motion of the air and is therefore as undetectable and lost as a homeopathic remedy.
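To put rough numbers on the speed of the wave and on how quickly it fades, here is a minimal sketch. It assumes the common ideal-gas approximation for dry air and an idealised point source in free space (no reflections, no absorption by the air); the function names are illustrative only.

```python
import math


def speed_of_sound_m_per_s(temperature_celsius: float) -> float:
    """Ideal-gas approximation for dry air; ignores humidity and composition."""
    return 331.3 * math.sqrt(1.0 + temperature_celsius / 273.15)


def level_change_db(near_m: float, far_m: float) -> float:
    """Level change in dB when moving from near_m to far_m away from a point
    source in free field, following the inverse square law for intensity."""
    return -20.0 * math.log10(far_m / near_m)


for celsius in (0.0, 20.0, 30.0):
    print(celsius, round(speed_of_sound_m_per_s(celsius), 1))

# Each doubling of distance costs about 6 dB.
print(round(level_change_db(1.0, 2.0), 2))
print(round(level_change_db(1.0, 100.0), 2))
```

The inverse square law only accounts for the spreading of the sphere. The tonal change described above (the air absorbing some frequencies more than others) is a separate effect and is not modelled here.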

2. The sound travels in air, constantly transforming, and can be said to be accruing information as it travels. It does not know how far it needs to go to meet an auditor, but it collects tonal transformation and volume modification as it goes. Within a very short time it will also encounter a surface, often the floor or ceiling first, and as soon as it does so it generates a reflection of itself (losing a tiny bit of energy to the surface in the process). Another part of the expanding sphere of sound waves will encounter another surface, possibly a wall, and another reflection will be created, and so on until all the primary surfaces have been encountered. By the time the furthest surface has been met, secondary reflections from the first surfaces will be following the main sound and will also be travelling to the next surface they encounter, and so on. We soon get to tertiary and quaternary layers until we end up with something we clearly call reverb. (I will leave the special case of echo aside for now, but the keen reader can quickly fish out some determining factors and add them to this account, as well as commentary on early reflections.) The totality of reflections and complexities being added to the wave also gives us information about the building it is in. To a musician, nothing describes a building as accurately as the sound it makes. The sound still has no regard for the listener, who may interpose her ears at any point or time in this journey, but it is still accruing information about the distance from its source and the surfaces it has encountered.
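The timing of the very first reflection can be sketched with simple geometry. The example below uses the mirror-image trick for a flat floor: the reflected path has the same length as a straight line from a mirror image of the source below the floor. The heights, the distance and the 343 m/s speed are assumptions chosen for illustration, not measurements of any real room.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def direct_path_m(distance_m: float, source_height_m: float,
                  listener_height_m: float) -> float:
    """Straight-line distance from source to listener."""
    return math.hypot(distance_m, source_height_m - listener_height_m)


def floor_reflection_path_m(distance_m: float, source_height_m: float,
                            listener_height_m: float) -> float:
    """Path length of the single bounce off a flat floor, found by
    mirroring the source below the floor."""
    return math.hypot(distance_m, source_height_m + listener_height_m)


def reflection_delay_ms(distance_m: float, source_height_m: float,
                        listener_height_m: float) -> float:
    """How long after the direct sound the floor reflection arrives."""
    extra_m = (floor_reflection_path_m(distance_m, source_height_m, listener_height_m)
               - direct_path_m(distance_m, source_height_m, listener_height_m))
    return 1000.0 * extra_m / SPEED_OF_SOUND_M_PER_S


# A finger snap 1.5 m above the floor, heard 4 m away at the same height.
print(direct_path_m(4.0, 1.5, 1.5))                   # 4.0 m
print(floor_reflection_path_m(4.0, 1.5, 1.5))         # 5.0 m
print(round(reflection_delay_ms(4.0, 1.5, 1.5), 1))   # about 2.9 ms
```

Repeating the same mirroring for walls and ceiling, and then for reflections of reflections, gives the thickening layers that end up as reverb.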

3. Finally it will encounter the listening body, but let me start with the simplified case of a single monophonic microphone, which will be a dumb witness to the sound as it arrives. The apparatus will record the attenuated and tonally modified sound, and from that a later auditor of the recording might infer some things about its genesis (such as what sort of object made the sound and how it was excited) but certainly not whence it came. At best they might name the source to the extent of being able to repeat the sound: ‘It’s a finger snap.’ But the sound at this point has also accrued its multiple reflections, and the characteristics of these might lead an auditor to make some assumptions based on their experience of listening and their experience of spaces. They might quickly know that it was recorded in a church or a studio or a domestic environment. In a special set of circumstances (a trained engineer working with established recording venues) they might even be able to name the space: ‘That’s St. Paul’s Cathedral’ or ‘That was recorded in Air Studio’.

I won’t go into other philosophical inferences that can be made from the sound, such as that it was made by a person with a hand and fingers and that there was intent. Such inferences go back to the big bang (and beyond! I hear a philosopher wanting to rush in and fill our existential vacuum).

Let me add a second microphone to the set-up so that we can consider the direction of the sound through the use of stereo recording (remember at all times that ‘stereo’ just means ‘solid’, but we will proceed with the standard assumptions about what the word means). If we are to set up a stereo pair, however, how will we orientate it? An engineer will have set it up to be more or less tangential to the circle of sound waves arising, i.e. by pointing the microphones in the general direction of the sound source, and so will be adding information to the recorded sound by this simple act (but not to the sound in the room, of course). A truer recording of the source might be made without regard for this, simply picking up, in stereo, what happened in the room. But this leads to difficulties: in some orientations, such as side-on to the source, the pair will not yield directional information easily.

This ‘adding information’ to the sound is either witting or unwitting and is manifest in a vast array of interlacing frequencies and amplitudes that in theory could be reduced to sine waves (but which never seems to work very well when synthesising sounds – we are seldom fooled).

Here, though, we have a clue as to what sound and location are all about. The act of placing a stereo microphone reveals that the information added by the selection of its location and its orientation is exactly what we as listeners add when we go to listen to something.

Our left and right ears intercept the sound wave and ‘hear’ what is going on. It is only at the point when the sound wave reaches the outer ear that all the location information is added. The stereo microphone, correctly aligned, will have picked up a couple of important clues as to location. The interaural level difference (ILD), which is the difference in volume between the left and right ears, will be captured more or less faithfully by a correctly rigged microphone pair. The interaural time difference (ITD), the difference in arrival time at the two ears, will also be captured. It is an important directional clue, but a moment’s reflection will show that ITD and ILD are not sufficient to tell front from back, still less up from down. ITD and ILD give as much radial information as can be captured in a microphone.
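The front-to-back ambiguity is easy to demonstrate with a deliberately crude model: two pickup points a fixed distance apart, a distant source, and nothing in between (no head, no outer ear). The 0.18 m spacing and the 343 m/s speed are assumed round figures, and the function names are mine.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C
EAR_SPACING_M = 0.18            # assumed spacing; real heads and mic pairs vary


def itd_seconds(azimuth_degrees: float) -> float:
    """Arrival-time difference between the two pickup points for a distant source.

    Azimuth is measured clockwise from straight ahead: 0 is in front,
    90 is directly to the right, 180 is directly behind.
    """
    path_difference_m = EAR_SPACING_M * math.sin(math.radians(azimuth_degrees))
    return path_difference_m / SPEED_OF_SOUND_M_PER_S


def ild_db(near_amplitude: float, far_amplitude: float) -> float:
    """Level difference in dB between the nearer and the farther pickup point."""
    return 20.0 * math.log10(near_amplitude / far_amplitude)


front_right = itd_seconds(30.0)   # 30 degrees to the right, in front
back_right = itd_seconds(150.0)   # the mirror position, behind
print(f"{front_right * 1e6:.0f} microseconds")   # about 262
print(f"{back_right * 1e6:.0f} microseconds")    # the same value
print(math.isclose(front_right, back_right))     # True: front and back look alike
print(round(ild_db(1.0, 0.5), 1))                # halving the amplitude is about 6 dB
```

A source in front and its mirror image behind produce the same time difference in this model, so the pair of numbers alone cannot separate them. Something more is needed, which is where the head and ears come in.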

Note that it is only possible to speak of these things at the microphone, not in the air in front of it: it is the presence of the equipment and its orientation that creates the point of measurement.

4. The microphones have still not captured the full spatial location, however. They will give us clues as to right and left, and the sound itself will carry distance information, but now we have to look at the function of the ears and the head. The ears and head form a complex system, which has been measured and calibrated in order to understand the function of all its parts. The spatial information we depend on to locate sounds in space combines the distance and direction information that is available at the microphone with what the physical structure and orientation of the head adds about where we are in relation to the sound source. Each part of the ear modifies the sound in tiny, subtle ways which are individual to each of us. We all learned how to use the shape of ear and head we were born with to translate information about sounds and their location (note how a newborn child rotates its head to learn the link between the two).

Head-related transfer functions (HRTFs) tell us how the head itself and its orientation affect how we hear. The trouble is that our unique personal HRTF cannot be codified into a recording, though there are some generic data sets that might be applied to a recording to recreate a sense of location.
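‘Applying’ such a data set usually means convolving the recording with a pair of impulse responses, one per ear. The sketch below shows only the mechanics. The two impulse responses are toy values I have invented for illustration (a source to the left arriving sooner and louder at the left ear); they are not measured HRTF data, and a real data set would contain hundreds of samples per ear per direction.

```python
def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Plain time-domain convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


# Toy impulse responses, invented for illustration only (NOT measured data).
TOY_HRIR_LEFT = [1.0, 0.3]             # arrives immediately, full level
TOY_HRIR_RIGHT = [0.0, 0.0, 0.6, 0.2]  # arrives two samples later, quieter


def render_binaural(mono: list[float]) -> tuple[list[float], list[float]]:
    """Turn a mono recording into a left/right pair using the toy responses."""
    return convolve(mono, TOY_HRIR_LEFT), convolve(mono, TOY_HRIR_RIGHT)


click = [1.0]  # a single-sample click standing in for the finger snap
left, right = render_binaural(click)
print(left)    # [1.0, 0.3]
print(right)   # [0.0, 0.0, 0.6, 0.2]
```

The delay and level difference between the two outputs are the ITD and ILD again. The finer shape of each response is where a real, personal HRTF would carry the front, back, up and down cues that a generic set can only approximate.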

5. We move our heads. When listening we ‘cock an ear’, as it were. By moving our heads we add to the information layer and include more about phase relationships in the sound (not dealt with here, but again sufficient material for another day). We create small shifts in volume and tone that give us more clues as to location, based on our primitive sense of locating danger, and we employ all these almost like an analytic instrument to locate the sound.

6. Brain processing takes place. Bregman’s book on Auditory Scene Analysis is too large to summarise here, but it describes and analyses how we sort out incoming data streams in our minds and separate out what belongs in which group of auditory experience. Hence we can pay attention to one sound sequence whilst another is sounding, and we are not confused by a flute melody and simultaneous chatter from a child, for example.

One can envisage scenarios where all the above criteria are in place but we do not hear the sound as a sound, because the cognitive faculties are not correctly aligned or trained to respond and ‘tell the mind’ that such and such is a sound with content and meaning and that we have to respond to it (even ignoring it is a conscious response to it).


When we talk of what a sound is, I suggest it is the net effect of all the above factors and until it has been interpreted by a conscious entity it is only vibrations in the air with the potential to yield information in the right circumstances.

To speak of where a ‘sound’ is located seems like a senseless question unless we recognise that it is only a sound when the brain has processed the incoming impulses from the ear, and that it is only an information-rich sound to which we can respond when it has been matched with our experience of sounds and we know what it means to us.

Defining sound as being ‘in the source object’ does not seem right because of the transformative journey the waves must take. Vibrations with the capacity to excite a transmission medium, shaped by the material of the object, might be a start. Medial wave propagation has some useful features as long as we acknowledge that it is a time-dependent thing (as is all sound, of course). We might take an infinity of time slices to define it (I don’t have time here for differential calculus, but other researchers have), but it seems to me to be not the sound but an (unreliable!) transmission medium. And the proximal theory leaves us with hearing what is at the ear, but it still needs the intercession of intelligence to make an air vibration, no matter how sophisticated, into a sound that has a meaning.

I will have to write up all the steps between my assertions and show all my citations (work underway!)

But here are a few key sources:

Blauert, J. (1997) Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA: MIT Press.

Bregman, A. S. (1994 [1990]) Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.

Nudds, M. and O’Callaghan, C. (eds.) (2009) Sounds and Perception: New Philosophical Essays. Oxford: Oxford University Press.

Maclaclan, D.C.L. (1989) Philosophy of Perception. Englewood Cliffs, NJ: Prentice Hall.

O’Callaghan, C. (2007) Sounds: A Philosophical Theory. Oxford: Oxford University Press.

The bibliographies in these books alone will give you a lifetime of reading.

mjkm August 2013