Thinking about Music

All content on this site is copyright mjkmercer

Category: Recording and Technology

Recognising Sounds – Knowing What We Are Hearing

Note: see my essay ‘Sounds, Location and Perception’, which is a prelude to this and covers the journey of a wave and its transformations from source to the ear.

Once the ear has received the complex array of waves (now collected at the ear canal as a pulsating tube of air) it falls to the rest of the auditory system to ‘work out’ the content and its meanings, and to identify which components go together to form each discrete auditory stream. The apparatus now involves the brain and its experience of the world in this almost instantaneous sifting: first to detect threat (which would make us jump before we knew why), then to look for messages with meaning for us that we need to respond to, by focusing and making a decision to do so, all while still hearing the background sounds. This is Auditory Scene Analysis.

The Oncoming Wave

As I sit writing this, I am aware of the following:

There is the slightest whisper of short strokes of my pen, like silk on paper; the paper emits a tiny hollowness as the nib touches the surface and behaves like a membrane (I’m resting on a pad, not a desk). There is also the soft breath of my computer fan – so continuous that I usually don’t hear it. Outside (I know it is outside because of my location perception apparatus) a bird unwinds its minimalist song. I cannot identify what sort of bird it is. If I could, the experience would be different: its sound stream would be that of an ‘x’ and would form an impression complete with a label, and my familiarity might downgrade the event to the level of the computer fan. But being a musician I enjoy the tune, and it is something I have heard before and recognise. This is a special sort of sound stream in that it could be interrupted and I would still know what was coming next. Consider the inevitability of each note of a wood pigeon’s call. I’ll come back to this point about known sounds from knowledge templates versus unknown sounds, which become categorized by extrapolation from templates of similar sounds.


Theory of Forms and Sound Recognition Templates


I need to digress briefly into Plato’s Theory of Forms – so here is an early philosophy warning.

Sounds have form that exists in time and space. The sound of a car going by is such an event: unique, yet totally familiar. Were we to record it and look at the waveform in great detail, we would know – in keeping with our knowledge of things like snowflakes – that we will never see exactly the same waveform again, and yet there will be plenty of cars passing by. Each event has sufficient characteristics in common for us to be able to recognise the sound and label it, because we are able to access the idea of a car sound in our minds and instantaneously realise that it conforms to that general form.

The sound recognition templates (forms) that we have in our minds can only be broad in scope. We cannot hope to match a sound identically – it might appear at a different distance, be a different car, in a different surrounding acoustic; we might have the window open or closed. So many factors guarantee that we will almost never hear the same sound twice.

We recognise the sound of a violin easily, and based on our wide experience of violins we are able to get a close match to the templates we have. Were we to experience an er-hu (Chinese two-string violin) having never heard one before, our cognitive processes would go into overdrive looking for a match. The mind would offer the template of a violin to our understanding, but it would also inform us that it was not a veridical match – that something was different about the sound that defied quick labelling. We might choose then to focus on the sound, to describe to our inner process a wider experience of it. On being told what it is, we then understand what was heard and what the name is, and we file away a template for the future. (Imagine if we described an er-hu as being a little like a crude violin – how that might change our experience of ‘World Music’.) However, this is only one instance of an er-hu so far. If, in future, we heard a very similar sound it might not match the template that was formed from one instance alone. Perhaps we hear the er-hu in a different context – say in a Chinese orchestra – and the mind struggles because it has a hunch about what it is hearing; we might look at the programme and find the word. Then we can re-categorize the template with two instances attached to it – a gathering experience which, in time, will form a life of experience and allow us to say ‘yes, we know what an er-hu is’ without having to go through the room marked ‘violin samples’.

This is like Plato’s Forms or, to be more up to date and put a similar spin on it (perhaps these different theories of forms aspire to an ideal theory of forms too?), I’ll cite Wittgenstein’s ‘family resemblances’. (In the next draft of this I’ll find the necessary quotes and references to keep us all connected to the world of ‘stuff already said by others’.)


Auditory Scene Analysis


I’m still in my study, listening… The bird continues to make its sound. But how do I know it is one bird and not two? It might well be two, of course – I cannot know each bird so intimately as to be able to distinguish individual sounds (though the bird probably can). But I can extrapolate from the incoming sound wave a single thread of sound. This is where I either recount everything Albert Bregman wrote or just send you off to read his work (see bibliography below), but to save time here is a rapidly and easily digested summary from Wiki…


What do we learn from this? That the auditory system has powerful tools to sort, label and understand streams even when they appear mixed. There is intelligence at work, and it may give you a small thrill of pleasure to know that even a powerful computer struggles to do this – but hold on to your humour, because they are coming…


The Sum of all Sounds


More sounds are layered onto each other in my room: vehicles, voices, violinists (daughter at practice) and so on. All that I can hear is to some extent familiar but in another aspect unique. Similar though they may be, I have never heard them in this context (this mix of other sounds), at this distance, in this room reverberation, or with this physical make-up (my ears are particularly acute today), and so on.


To sum up the listening experience of this moment:

1          The low hum of a large vehicle

2          The bird sound

3          The sound of the pen on paper

4          The fan in the computer

5          Murmuring female voices somewhere – indiscernible words

6          Slight creak from my old faithful chair

7          Distant violin music being played

8          Clock softly ‘chucking’ (It’s a deeper sound than a tick)

…I shall pause and sit still in contemplative listening for a moment….


9          A very distant jet somewhere high above

10        A dog barked.


In the moment of the dog barking, all the other sounds became masked, but I knew they had not stopped, and sure enough they re-appeared a moment later when I widened my attention again. Masking, and the illusion of continuity behind the masking, is another piece of lovely hearing perception theory; I am very keen on Brian Moore’s book (see bibliography).


My point: all this is happening early on a quiet morning, before the day has really got going. All these sounds are there to be picked out of the incoming stream and labelled. And I know what each one was. Being of a musical disposition, if I heard something I could not label I would normally feel compelled to investigate – especially if I could hear musical potential.


So all these sounds conform to the templates I have for them, and they have sorted themselves out into labelled streams. I do not confuse the hum of the lorry with the distant music – though I could imagine that happening in some circumstances. As a composer I could make all sorts of things come together in the studio through careful mixing, but at the moment the sounds are from different directions and different distances, and they are therefore not confused.

This single ‘mix’ of the moment ‘now’ (surely grounds for a Cagean aleatory composition?) presents a single vibrating pipe of air to the eardrum, and the mind sorts it out for us and labels each experience with words or familiar feelings. It also works to identify the location of each source, and so on. The interesting thing is that I seem only to be able to give my attention to one at a time, or I can choose to let the whole unlabelled sound wash over me as if it were a single source – and this ability is important because of the implications for music.


Musical Implications


My ears have to work out how to hear an orchestra. I can, of course, hear the different parts that make up the sound I hear. But is that really so?  What I actually hear is groups. I cannot, for example, discern individual violins (unless one is particularly loud or bad) but I can pick out the flute. But then the flute is joined in unison by the oboe forming a new single sounding texture. The two combine because they are part of the same auditory scene, their timing events are identical and their pitches change in unison. A good example of this blending of different sounds is the way in which organists create sounds by mixing and layering different sets of pipes in ‘registrations’.  We hear a single event or single event texture. We hear a single violin line – unless suddenly the leader plays a melody above the texture.

This auditory scene analysis is critical to our understanding of how we hear music and how therefore we are going to record it.


When we listen to music we hear the whole thing, or we choose to attend to parts. (Listening to a fugue on the piano is a supreme example of this, and the best advice anybody ever gave me for listening to a fugue is to ‘go with the flow’.)


The factors that separate sounds out for us – such that we do not wrongly mix them up – are:


  • Pitch differences and  similarities
  • Timing differences and similarities
  • Following or not following a pattern (sequence of timed events like ticking clock)
  • Timbre
  • Location or apparent source
  • Event (part of the show/not part of the show)
  • Visual information about the sources


These factors help us to match different parts of the sound to templates to be able to recognise each as a separate element.
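To make the first two cues above concrete, here is a toy sketch – purely illustrative, with tolerances and frequencies that are my own assumptions rather than anything from the perception literature – of how partial tones that start together and sit in one harmonic series fuse into a single perceived stream, while an unrelated later sound forms its own:

```python
# Hypothetical partials detected in a mixture: (frequency_hz, onset_s).
partials = [
    (440.0, 0.00), (880.0, 0.00), (1320.0, 0.01),   # harmonics with a shared onset
    (523.3, 0.50), (1046.6, 0.50),                  # a later, unrelated source
]

def group_streams(partials, onset_tol=0.02, harmonic_tol=0.03):
    """Toy auditory-scene grouping: partials that start together and sit near
    integer multiples of a common fundamental are heard as one source."""
    streams = []
    for freq, onset in sorted(partials, key=lambda p: p[1]):
        for s in streams:
            f0, t0 = s[0]
            ratio = freq / f0
            # Same onset (within tolerance) and near an integer harmonic ratio?
            if abs(onset - t0) < onset_tol and abs(ratio - round(ratio)) < harmonic_tol:
                s.append((freq, onset))
                break
        else:
            streams.append([(freq, onset)])  # no match: a new perceived source
    return streams

streams = group_streams(partials)
print(len(streams))  # the five partials fuse into two perceived sources
```

Real auditory scene analysis weighs many more cues at once (timbre, location, continuity), but the principle – shared timing and shared harmonic structure pull components into one stream – is the one Bregman describes.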

Sorting out Source Location

I wrote previously about how a sound accrues information as it travels to the ear concerning the source location in relation to the listener. The listener also has to process that data at the same time as carrying out the scene analysis.

In the same way that we have templates to recognise sound types and patterns, I suggest that we have a template or model of the world that helps us sort out source locations. I am not sure if I should confound things by suggesting that each sound template that guides our recognition holds all the possible variants of location within it – that seems an inefficient way for the brain to do things. I am going to suggest (and try to research further) that localisation (spatial) processing is from a different set of templates to those we use to identify the sound itself. Some of those templates for location might be linked to the fight-or-flight sound identifiers and cause rapid alarm in us primitive beings, who still want to jump and run if we hear something like a wasp approaching.


As I mentioned earlier, at the eardrum there is only a vibrating column of air. (I will keep it simple, but I am aware that there is other information available through vibrations reaching the back of the eardrum through the head, through bone conduction and so on.) The intelligent ear has a means of assessing how far away a sound is. Sound changes with distance: in level, reverberation and tone. When we hear a sound thus modified we know immediately that we are hearing, say, a trumpet at a distance rather than one close up and processed. (This we can register from a microphone, and thus filter out extraneous information (for now) about its general direction.)


But what if we were to do just that – take a close-up trumpet sound and drop it to the rear of a mix? This happens all the time in the studio, and good engineers know it is not just volume and reverb but careful adjustment of the EQ that gives the desired result. The level of verisimilitude seems closely linked to the engineer’s understanding of sound propagation. In my experience, engineers brought up in the ‘hands-on’ school of mixing – or ‘do it the same way as Bob’, as apprentices soon find out – are lost when it comes to working in these more subtle ways. The rock techniques for moving a sound to the rear will not work very well when trying to create a realistic soundstage that contains reproducible distance information in particular and location information in general.
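As a rough sketch of what ‘not just volume and reverb but EQ’ amounts to, the three distance cues can be mocked up in code. Every constant here – the cutoff curve standing in for air absorption, the decay time of the fake ‘room’, the wet/dry law – is an illustrative guess, not a studio-calibrated value:

```python
import numpy as np

def distance_cues(x, fs, distance_m, wet_ref=0.1):
    """Crude 'move the trumpet back' sketch: level, tone and reverb change
    together with distance (distance_m relative to a 1 m close-miked source)."""
    # 1. Level: inverse-distance attenuation (-6 dB per doubling of distance).
    y = x / max(distance_m, 1.0)
    # 2. Tone: air absorption approximated by a one-pole low-pass whose
    #    cutoff falls as the source recedes (numbers illustrative only).
    fc = 12000.0 / max(distance_m, 1.0) ** 0.5
    a = 1.0 - np.exp(-2.0 * np.pi * fc / fs)
    lp = np.empty_like(y)
    acc = 0.0
    for n, v in enumerate(y):
        acc += a * (v - acc)
        lp[n] = acc
    # 3. Reverb: wet/dry ratio rises with distance; the 'room' here is just
    #    an exponentially decaying noise burst standing in for a real IR.
    rng = np.random.default_rng(0)
    t = np.arange(int(0.3 * fs)) / fs
    ir = rng.standard_normal(t.size) * np.exp(-t / 0.1)
    ir /= np.sqrt(np.sum(ir ** 2))
    wet = np.convolve(lp, ir)[: lp.size]
    mix = min(wet_ref * distance_m, 1.0)
    return (1.0 - mix) * lp + mix * wet

fs = 44100
trumpet = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # stand-in source
close = distance_cues(trumpet, fs, 1.0)
far = distance_cues(trumpet, fs, 8.0)
```

The point of coupling all three in one function is the one made above: adjusting volume and reverb alone, without the tonal roll-off, does not produce a believable sense of distance.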


So we have in us an innate ability to assess how far away a sound is. This must be informed by knowledge about where we are. We will know that we are in a cathedral or outdoors, for example. Experience tells us something about what to expect and how a sound will behave in these environments.


Recreating The Experience


The problem for recording engineers is to recreate what the ear has heard convincingly. There is not space here for a general review of stereo and  multi-channel techniques (I’ll write it soon though).

There is much that can be improved in the stereo recording and mixing process by understanding how sound gets to the sentient mind, and much that we can design as a solution to improve that. It is vital to understand that:


At the point where the microphone receives the sound, most of the location information will not be recorded. It will give some distance information and an approximate direction, but it will not pick up what the eardrum receives. Were we to insert microphone capsules in the ear and record the sound that gets to the drum we might have more information to work with, but because of head-related transfer functions the sound will present uniquely to the individual whose head is being measured. It was in the hope of getting round all these problems that binaural recording was invented – placing microphones in a dummy head to mimic the way our own heads work.


More soon.


MjkM August 2013

Training Sound Engineers to read music

I have been campaigning (unsuccessfully) for many years for improvements to the training that sound engineers get. I have argued that to spend three years at university in a studio learning all about rock guitars and drums and so on is fine, but why not learn a lot more about music? It seems a terrible disappointment to me that after three years the graduates seek work with, in some cases, no knowledge of music at all. As a bare minimum they should be able to play one instrument to a reasonably credible level.

I have good reasons for saying this…

1   It’s not that hard to learn enough about music to be able to follow a score and know what is going on. You don’t have to be a sight-reader – you are not the artist – but if the player (who can read music) says ‘let’s go back to the –’ (whatever musical term you like here: the rit., the Bm chord, the piano entry and so on), the engineer with little understanding of music will be lost. Why would you not want to be able to speak the same language as the musicians? I have often given a single 90-page book to young people wanting a career in music and told them just to get familiar with its content and practise following a score (a Beethoven sonata will do).

2    You need the work. You need to stand above the other applicants, and a flexibly skilled person is far more likely to get the job. Studios are not full of rock bands day in, day out – unless you are lucky enough to work at a major specialist studio (and most of those jobs are filled by long-term staff who will only move on by dying or going deaf – and some still hang on after that!). The real world of studio work is a mixture of rock, folk, pop, classical, jazz, schools music, karaoke and so on.

3     You will never be a great engineer if you don’t understand the thing you are working with. How could you?

4     If you don’t play an instrument how will you know when somebody is playing well or not – or even – will you notice if it is in tune and in time? Some engineers, I grant, can do this without being able to play but they still lack any empathy with their guests.

Let me nail the point for you. When I interview staff to work in my studio it is a prerequisite that they can read music and play – and that is just for the non-classical side. It was my frustration at finding too many people trained on a production-line basis, with theory diagrams and no real listening experience, or with only one string to their bow (‘I do drum and bass’), that led me to wonder what on earth they were doing for three years. Really – the technical knowledge of how to use the equipment can be taught in a single term. Reading music can also be taught in a single term. Love of music in general – a lifetime.

Other things that are not taught well everywhere:

Understanding and appreciation of  all musics

Understanding and appreciation of all instruments

Listening training on somewhere in the region of 1,000 critical tracks across a wide spectrum

Attending a wide range of live concerts to see what the real thing is like

Learning to manage projects

Learning to get the best out of people and encourage musicians


There are some universities that understand what it takes, but too many institutions are keener to claim successful graduations than to train their students for the real world. I could go on…



Electronic music and the idea of Performance

Something that has always bothered me about performances of electronic music is the lack of a human touch. It might be that this was a desired outcome – certainly some composers would agree that the mechanical nature of the sound is within the aesthetic realm they wish to explore. It was whilst working with Csound [1] the other day that I got thinking. I had earlier spent quite some time working on the performance of a guitar piece, fiddling with slurs and fingerings until it sounded right, then trying out a different plucking position to see if the sounds might blend a little better, and it struck me that there is no equivalent refining process in electronic music. (I am sure I am wrong about this, so please flood me with counter-ideas – a conservatoire of electronic performance, maybe?)

There might be a number of reasons for this.

Firstly, in the early days there were few musicians working in the field – it was lab coats and degrees in physics and maths, and it showed. Much of the output was clever but didn’t move anybody – it was not an emotional experience – and this is still true to some extent today. We are fascinated by what we hear, but not moved.

Then, there is usually one and only one performance, which may become a CD and is never recorded again. Some of the recording artefacts are arbitrary and unrepeatable; often the composer (who is also the performer) has moved on to new ideas; perhaps the set-up (swinging microphones) will always produce something so different through the mechanical set-up alone that the human touch is not relevant to the realisation. (Institute for excellence in microphone swinging?)

Nobody gets the chance to go back and look again at how the piece might be better realised. Some of the great electro-acoustic pieces are fixed, in that what was done was done and is complete – sealed in its time-capsule recording forever. Whether we are discussing Pierre Schaeffer’s ‘Étude aux chemins de fer’ or a more recent work such as ‘Mutations’ by Jean-Claude Risset, or, even more recently, the wonderful work of Robert Normandeau (google and enjoy), we are talking about works whose realisation is also a closed door on their future. I’m sure not many scores exist for future generations, and few instructions other than the CD itself.

The other reason that little is said about performance is that there is not much ‘hands-on’ work (except where live acoustic instruments are processed). Working in Csound, or in an electronic sound lab processing a recording, you get the feeling of something much more like either computer programming or hand-knitting an item of winter wear. There is little or no discussion over the expression of a particular sequence (melodic) or the balancing of a chord (harmonic). Perhaps this is also because it takes months to programme something in Csound: the idea of slightly stressing and sharpening a leading note in a violin line is from another world. In the electronic work the composer is more likely to require that a filter be opened in the last five seconds of the piece.

Where I am going with this is that they are different worlds. Performance of acoustic instruments is to some extent a black art; some never get it right, some train for twenty years to become the great performer we hear. But in electronic music, excellence is more concerned with mathematical innovation and sound mixing than with the idea of performance itself. And I mean to include live performance in this. Performances I have been involved with were more concerned with just getting the technology to work as specified; no musician had rehearsed the piece, nobody had worked out a series of options that would serve the sound better, nobody was interested in the man/machine interface.

There has been a lot of work on MIDI interfaces for performance to address this lack of human touch, but there is no literature on how to realise any particular piece, what refinements might be available, and so on. There is little in the way of musicological analysis that might be the starting point for interpretive discourse. Perhaps it is time I wrote one.

Maybe this situation comes about because of the fundamental unrepeatability of the event. There is no wrong reading of the score, no better interpretation, no comparative score from a similar composer that would inform its realisation. For this reason it is, to me, a different world. But this is not to denigrate it – I love the things it offers, and I am listening to Robert Normandeau as I write. But it is with very different ears that I will later assay the new recording of the Britten cello suites I have just bought.

I don’t think I have said all I want to on this topic yet… keep an eye out for a sequel.

[1] Csound is the lowest-level sound programming language for electronic music – it’s free software, but a ten-year study to master, and each composition takes an eternity to realise simply because every event has to be specified in incredible detail to get it right. It’s a bit like writing down everything involved in making a cup of tea – down to the three-dimensional mathematical trajectory of your hand as it lifts the kettle – an exercise in Zen mastery.
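To give a flavour of what ‘every event has to be specified’ means without writing actual Csound, here is a miniature event-list renderer in Python. The score format and envelope shape are my own stand-ins for Csound’s score and orchestra: every note is a tuple of start time, duration, frequency and amplitude, and nothing sounds unless it is written down:

```python
import numpy as np

fs = 44100  # sample rate

# A Csound-style 'score': every event spelled out as (start_s, dur_s, freq_hz, amp).
score = [
    (0.0, 0.5, 440.0, 0.3),
    (0.5, 0.5, 554.4, 0.3),
    (1.0, 1.0, 659.3, 0.3),
]

def render(score, fs):
    """Render the event list to a mono sample buffer."""
    end = max(start + dur for start, dur, _, _ in score)
    out = np.zeros(int(end * fs))
    for start, dur, freq, amp in score:
        n = int(dur * fs)
        t = np.arange(n) / fs
        # Fixed 10 ms attack then exponential decay - the whole 'expression'
        # of the note lives in these hard-coded numbers.
        env = np.minimum(t / 0.01, 1.0) * np.exp(-3.0 * t / dur)
        out[int(start * fs) : int(start * fs) + n] += amp * env * np.sin(2 * np.pi * freq * t)
    return out

audio = render(score, fs)
```

Everything a violinist would do by feel – the attack, the decay, the stress on a note – has to appear here as an explicit number, which is exactly the point the footnote makes.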

Space in Music – locating sounds


The spaces in which we perform, record or listen to music confer a layer of ‘meaning’, or invite interpretation of the music. That meaning might be created wittingly, in that one of the people involved in the process of music making made a decision or gave an instruction regarding the perceived space and arrangement of the sounds; or seemingly unwittingly, where a particular environment such as a church has imposed its acoustic on the sound. I will, however, argue that such a thing is part of the cultural heritage of our music making, and is based on decisions made centuries ago when it was decided – for example – that churches should host choral music, the invention and performance of which is not so easily imaginable in a field or a small building, though just about tolerable in a castle, which would then lack the cultural milieu in which to make music. You need a strong sense of worship or the numinous to kick-start choral disciplines.

It is interesting that many textbooks, theses and commentaries on perceptual space in music approach it from the point of view of psychoacoustics or physics, or even perceptual psychology (see Brian Moore 1997). This text approaches the same considerations from the musician’s point of view – more specifically, from the points of view of the composer, the performer, the producer/engineer and the listener. I am interested in the implications of the decisions made by each of the participants in the music reproduction process.

We can begin with intentionality on the part of the composer ranging from the acknowledgement of and utilisation of acoustic reverberation in the composition of choral works for performance in cathedral spaces, through to Wagner’s demanding ‘space’ requirements for staging  his works – in particular, the Ring Cycle and to the modern era in which sound recording has been the largest catalyst for innovation and development in music.

To what extent does a composer consider the space involved in his composition? In most cases it is an arbitrary arrangement born of necessity: the seating of the orchestra and the acoustics of the environment in which it is performed. Coupled with this are the ‘rule book’ ways in which recordings are made. The composer can more or less assume that the recording will be from the point of view of ‘the best seat in the house’.

But might composers be freer to specify more? Might there be a whole language of spatial articulation and implication, were it to be made available? Examples might involve the difference in meaning between a song sung in a dry acoustic and one performed in a vast stone space – the song might be the same, but the meanings differ.

From the performer’s point of view, the spaces in which we perform, and the implied space in which a recording takes place, tell the listener how to ‘read’ the piece. Most obviously, a singer-songwriter recorded in a small dry space will have an intimate feel, whereas the same performer recorded in a church will have a wider, more broadcast aspect (terminology to be determined within the text below). Performers require feedback from the space around them: singers require reverb, otherwise they cannot hear themselves. The right acoustic can enliven the performance of an individual or an orchestra. The singer might well move within a space to be heard better; pianos and instruments may be moved on a stage to favour a particular acoustic. Performers seat themselves in space and choose where to sit based primarily on tradition. Composers communicate through the score to the performers – often with very specific staging instructions (John Culshaw, Ring Resounding, 1967).

Producers make interpretive decisions concerning the recording, often in conjunction with conductors and musicians, and in some serendipitous cases with the composers themselves. The decisions they make concern the illusion of the space in which the sound takes place, and the location of instruments. This decision is more important than might at first seem obvious, and has a direct bearing on hearing music correctly (a good example being the loss of identity of a chord that gets spread too widely in space). Composers in the modern world are more involved in the recording process than ever before, as they have knowledge, training and awareness of the techniques available, and have thoughts about the controls they would exercise on the sound. Very few, however, make notations or demands on the recording process in their score. This possibility has been available for some time, and one of my key outputs is to suggest a means of communication between composer and producer to ensure that the right recording gets made. The score always contains instructions to performers where necessary, but rarely is there information on – for example – the size of room acoustic that would suit the piece best.

I know that many view the presence of a composer in the studio as a blessing at times and a nightmare at others so I have tried to separate out what a composer might intend in contrast to what the producer might think.

And the listener? How do we cater for a listener who these days consumes their music on rather lower orders of equipment than in the heyday of hi-fi? It is true that the equipment has become more reliable and standard, but little attention is paid today to seating location with respect to loudspeaker placement – indeed, most homes do not permit such considerations. Most people’s enjoyment of music over loudspeakers is over a spread acoustic with little precision in location, or it is through the extensive use of earpieces, which in their own way limit the tonal range and experience of the music.


Space and the reproduction of space have developed since the invention of sound recording. In the early days of monaural recording, space and reverberation were nevertheless available to some degree; however, the technology of the time demanded close proximity to the microphone to permit direct mechanical transmission from the instrument being played to the soft wax recording the sound. The subtlety of any acoustic present was largely ignored.

It was only later in the development of sound recording that sufficient fidelity existed for the acoustic to  be captured and heard.

The other dimension taken for granted today is the location in space of the sound source. A monophonic loudspeaker might be able to reproduce a sense of depth through the three prime depth indicators – roll-off of high-end tone, lowered volume and increased reverberation – but it was only with the advent of stereophonic reproduction that the location of the sound became a consideration.

Spatial location in concert music has of course been a matter of convention – placing violins on the left of the conductor and so on – and early listeners to monophonic recordings may well have transcribed their concert-going experiences onto their listening.

It seems strange to us now to hear stories of singers miming to recordings of themselves and fooling audiences in the early days of reproduction. Perhaps we do project our mental image of the sound onto the recording. Certainly, we listen now at a lower level of high fidelity than we did in the 60s and 70s – sales of high-end equipment are to a very few, generally classically oriented, listeners.

For the purposes of this essay I will consider sound recording from the era of the high-fidelity stereophonic recording in the 1950s to be the beginning, but I will not ignore the fact that space and sound are present in earlier forms of recording.

Space and location in live performance cannot, of course, be ignored, and consideration of those features will be included – particularly with regard to how we demand our recordings to be made. (Imagine, if you will, Allegri’s Miserere recorded in a dead studio environment and you will immediately see how much space and soundscape have become embedded in the performance process and the listener’s expectation.) Liturgical sounds belong in ecclesiastical spaces, we might say; concert sounds belong in places that sound like concert halls; and electroacoustic music belongs in a fictive space created internally by computer.

But spatial listening begins with our environment and our listening equipment. We were born with two ears placed to give precise spatial location, which, the evolutionary biologists will presumably argue and demonstrate, gives us a competitive advantage in the world of self-preservation and the fight for food. Further discussion of this is not in my scope.

The early dawn of man as we recognise him does, however, contain some items of relevance to us. Recent theorists (Cross et al 2012) have posited that cave dwellers were aware of, and placed significance on, acoustic properties in caves, and that particularly ‘rich’ spots held such significance for them that this is where they left their marks: cave paintings. Such assertions cannot be proved, but there are clear indications and correlations to show that sensitivity to location and its sound was a more important facet of early man’s selection of places to be than was previously thought.

Clearly the sound of the environment 'meant' something to them; what it might have been we can only guess. To go into caves now ourselves and experience the sound in them, by intoning long vowels for example, leads us to discover echo, reverberation, high and low spots, sweet and sour spots and so on. Recent reports indicate that we echo-locate in subtle ways, and that many blind people have learned to listen and steer around objects, reporting lampposts as 'a dark shadow past my ear', for example.

We also understand from physicists that the higher the frequency, the more specific the location of the sound. Low bass frequencies therefore do not require stereo reproduction and may be left to the work of a single specialised loudspeaker somewhere in the room. The higher up the frequency bands we travel, the more 'on centre' we have to be to experience a strong sense of location in listening, and the highest of instruments (unless the reverberation is out of control in the recording) are easy to 'point to'.
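This is why playback systems can route everything below a crossover frequency to a single subwoofer. A minimal sketch of that idea in Python (the function name and the one-pole crossover are my own simplifications, not how a commercial bass-management filter is built):

```python
import math

def mono_bass_split(left, right, sample_rate=48_000, crossover_hz=120.0):
    """Split a stereo signal: frequencies below the crossover are summed
    to a single mono 'subwoofer' feed, the rest keeps its stereo placement.
    A sketch using a one-pole low-pass, not a production crossover."""
    a = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)  # filter coefficient
    lp_l = lp_r = 0.0
    sub, high_l, high_r = [], [], []
    for l, r in zip(left, right):
        lp_l = (1 - a) * l + a * lp_l      # low-passed left channel
        lp_r = (1 - a) * r + a * lp_r      # low-passed right channel
        sub.append(0.5 * (lp_l + lp_r))    # mono bass: location cues are weak here
        high_l.append(l - lp_l)            # residual highs stay in stereo
        high_r.append(r - lp_r)
    return sub, high_l, high_r
```

Note that each output sample of a channel is its input minus the low-passed part plus the (shared) bass, so an identical left and right signal reconstructs exactly.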

Sound without a space in which to manifest is a largely unthinkable construct. Even if we take electroacoustic sounds generated in a computer, until we are able to 'enjoy' cochlear-implant music the sounds depend upon air for their transmission, and it is in that medium that all spatial cues are generated.

Key features involved in spatial location:

  • Tone balance (EQ)
  • Reverberation
  • Left/right location
  • Up and down (which has no meaning in stereo listening environments but is still a human perception, largely ignored by musicians)
  • Back and front – again, with the exception of experiments with quadraphonic and theatre sound, largely ignored by most musicians and composers (except, e.g., Berlioz)
  • Relative volumes.
  • Proportion of mechanical noise of the instrument (bow noises, breathing, key noises etc.)
  • Signal to noise – as distance increases, our perception of the noises in between increases. This is particularly true of a recording in which a more distant sound gets mixed higher than intended for musical reasons: offstage instruments, for example (the trumpets in Wagner's Lohengrin, the oboe in Berlioz's Symphonie fantastique, etc.)

Some of these might be necessarily linked to the reproductive fidelity of an experience, i.e. to experiencing as if you were at the concert hall with your eyes closed. In other cases certain decisions might be defined or directed by composers and performers in order to say something else, a simple example being the movement of a singer across the speakers in an operatic scene.

The area that concerns this work most, though, is our interpretation of those acoustic triggers and how they inform our enjoyment and interpretation of the music.

I will cover the four steps from composition through performance and recording to listening, discussing where these considerations matter, with particular reference to interviews with exponents of each element of the process.

I will be interested in what composers say about how they envisage their work being presented (if they think about it at all), or whether they simply leave it to the performers. How do performers treat these things? We know in the studio that singers with headphones on need a little reverb to give them some audio feedback, but is that the same thing as an operatic soprano needing a large acoustic in which to perform, or is that more psychological?

Space and the soundscape are more than just a place in which music can take place: they confer meaning.

There is a temptation, largely in musicological circles, to focus on music as being the interaction between score and performance. "Until recently, musicology and music theory have had little or nothing to say about space in music, for a combination of reasons connected with their focus on the score, their comparative lack of interest in recordings and their intense focus on pitch and rhythm to the exclusion of almost everything else." (Clarke, 'Music, Space and Subjectivity', in Born, Music, Sound and Space). This would seem to ignore a prime function of music: that of enjoyment and delight through our erotic relationship with it. Roger Scruton (1997) went so far as to argue that listeners focus on the sounds themselves, space being refined away by intense listening, so that space plays only an attenuated part in music. Were he to be presented with a large choir singing in a dead studio environment, in which the notes of the chords do not get the chance to blend as they would in the environment the composer had in mind for the performance, he might rethink his statement. In extremis he might find pleasure in focusing on the score itself and leave us poor mortals to suffer the vagaries of the realised piece. The realisation of the piece is an essential part of its manifestation: without the performance or recording, the piece is just a score.

Space separates the elements of the music; in the early days of recording it was 'flattened', the orchestra organising itself around the early recording devices. Spatial proximity then becomes a major factor in our appreciation of the music.

To the listener who suggests that music is not a pleasurable activity I would argue that today, particularly with contemporary composition in which the realistic record of the piece is the recording itself, the study of music-making might now encompass a realisation of the importance, and indeed the transformative effect, of the modern recording studio and its associated techniques.

Since the early stereo recordings of the Ring cycle produced by John Culshaw, the studio realisation has brought a new language to the reading of the score: an ability to present an aural stage for the listener rather than simply to record a semi-staged event in the studio. Such control gave rise to special effects hinted at in Wagner's score, perhaps envisaged for future days when they might be realised. The forces demanded for the thunder were always unlikely to be coordinated with nature to bring a realistic sound, so Wagner's thunder effect of a sheet of steel was used, later improved in the studio through the use of separate rooms with controlled acoustics to bring the effect to life, i.e. a higher degree of realism.

Becoming inured

The space in which music is performed or recorded is something which, if agreeable to the listener, will not be registered for more than a few moments while they orientate themselves to the sound world being presented. For example, on playing a recording of a Bach Mass they will quickly establish the ecclesiastical setting (whether real or fictive) and 'settle' to that sound. The acoustic environment will not be registered (though it will be heard, of course) unless:

It becomes apparent that there has been a shift for some reason. A good example is takes edited together from a couple of performances during which the audience density changed, and with it a slight shade of the reverberation.

Or it becomes apparent that the acoustic is not supporting the music well. A soloist may make an appearance that seems too distant and blurry, at which point we become aware of the acoustic setting again and possibly have some thoughts about it.

So now it becomes something to which we take exception and wish to criticise.

Interestingly, we seldom recall the acoustic environment of a recording unless it was either badly chosen or had some special effect, or a reason for a forefront presence. An example is Paul Horn's famous recording of solo flute in the Taj Mahal, where the whole recording is about the building in which he played: arpeggios left to blend into chords that came bouncing back to the listener fully formed. The same performance in a dry acoustic would be generally unplayable and would certainly not be an enjoyable experience.

I cannot, for example, recall the acoustics in any particular way of an all-time favourite album, Joni Mitchell's Blue. I know it will be an average performance-space acoustic, in proportion to the need, and if I go back and listen with the intention of hearing the acoustics specifically I will not be surprised. I wasn't: it was as I thought, professionally invisible to the process and hence correct.

By contrast I can recall very well some of the Deutsche Grammophon string quartet recordings from the seventies, which were recorded, to my ear, in large halls with too big an acoustic, losing detail in the recording. Generally, the faster the music, the more detail gets lost in the overlapping of sounds.

The only other circumstance in which the acoustics are recalled is when one has had a professional interest in the recording or in the selection of the location for the performance.

As a guitarist I have a 'professional' interest in recordings of guitar and can recall the acoustic of most, possibly because it is such a difficult instrument to site that I look for the better recordings, which to my mind are from small churches or larger chambers appropriate to the flow of the music. There is an argument that Alhambra palaces are fine for Spanish repertoire that might have been composed, or performed, in such places, with its technique of leaving notes to ring as long as possible and paying attention only to the rhythm at the front of the note. The same is not true for Northern European repertoire: the details in a Bach lute suite would be lost if all notes sounded as long as possible in this way, and it therefore demands a less lively acoustic, to make sure that the player's damping of a string is not frustrated by an acoustic that competes with it.

Discussions about how the performance space will affect both the players' ability to hear themselves and the eventual listener's ability to discern detail often involve long conversations with the players, and sometimes the composer. (A compromise that can be made is to use the noises in close proximity to the instrument, slightly mixed into the larger acoustic, to give presence to the front edge of the note while the larger acoustic carries the duration and blending of the sound.)

I have no idea about the acoustic in Alfred Brendel's piano recordings, much loved though they are. They must be agreeable and not 'present' to the level where a comment becomes necessary. To a certain extent the recordings fit the canon of piano recordings: we are used to hearing pianos played in small concert halls or large rooms, and the wise producer will usually arrange that the recording sounds the same way. We are only disconcerted when something else has taken place, such as a recording in a vast auditorium with the microphones too far off the piano. As a matter of interest, it takes between four and eight microphones to get a good balanced sound from a fine Steinway sited on the stage of a large concert hall; most are fairly close, and a distant microphone is mixed in to set the final level of the acoustic space present. The music itself will dictate how it is to be mixed. (This is a major theme in this thesis.)

This inurement is true for all recordings and performances in which the space is only a setting. In the seventies and eighties there was a fashion for dead studio spaces; studios these days are more commonly designed to have a rich character of their own (see Air Studios, for example). The dead studios depended on high-end reverberation units to supply the necessary spatial acoustic. This was not the case for classical recording, which has always taken place in a live space, though artificial reverberation is often added later in post-production to sweeten the overall presentation.

Where space is a dimension being deliberately exploited for effect, the instrument placing is not natural but set in an imagined space. Albums such as Dark Side of the Moon are prime examples of a highly controlled and largely illusory acoustic. This is also evident in 'art music' and electroacoustic music.

Early electroacoustic works such as 'Gesang der Jünglinge', and later works such as Steve Reich's 'Different Trains', depend upon the sampler and the modern studio for the realisation of the piece. Here the sound world is 'constructed' to suit the composer's intention, but still leans towards a naturalness.

XXX is clearly something in which we are expected to be disturbed or alerted by the 'hopping' location of the sounds.

There are only a few reasons why a composer might want to specify the spatial elements of a piece:

1 To subvert the normal run of things – perhaps to cause the listener to listen afresh

2 To suggest dramatic content (particularly in programme music) such as  emotional distance, movement across the stage, softness, etc.

3 As an effect to delight the ear – more as an extra ‘entertainment’

Note: you don't find reference to the 'entertainment' value of music in many books on musicology. This might suggest that music has far more purposes; true, but not to mention its prime component seems slipshod.

Such composers feel more in control of the elements of their work and thus use spatial  controls to the full.

Awareness of the sound world

Generally we are not aware of the spatial characteristics of a recording unless they are wrong. We may be alerted to its nature when something changes: a movement of the acoustic, or of an instrument in the sound field. This happens more in rock and pop than in concert music, but it is a mainstay of the electroacoustic composer, whose whole audio world is about setting the ears alight. (find quote)

We might listen to a song such as 'Strawberry Fields Forever' by the Beatles as an indication of what can be done when we 'play' the acoustic field as a part of the song. What it shows us is that there are levels of meaning unfolding with the song that are reflected in the acoustic treatments; to single out just the spatialisation would be a mistake. Engineers and producers usually work by feel, and in the moment, on such ideas, and they are seldom programmed by the songwriter. The recording is a relatively early demonstration in the popular domain of what can be accomplished; the electroacoustic musicians ten years earlier had paved the way.


A sound source might be made to move within the performance space (as opposed to the recorded space). There are few occasions when this might occur naturally (i.e. without an instruction from the composer): live or staged opera being one instance, marching bands being another (Charles Ives).

Other movements in the soundscape will be more or less a special effect, for example the 'dramatic' fly chase and swatting in Pink Floyd's Ummagumma (which track?), in which we are entertained by the realism of a stereo recording depicting movement in space. In the same album there is a track, 'Grantchester Meadows', in which a bird flies across the loudspeakers. These are not, in themselves, musical events.

Jimi Hendrix's Electric Ladyland shows us rapid panning effects that break the sound field into a fictive space giving no real image of a performance space. The space is in your head, and to try to extrapolate a real space would be a mistake; we are expected to live with the uncertainties. It is to real space what a roller coaster is to ordinary travel.

It is hard today to separate acoustic decisions made simply to sound good from those that are to some extent involuntary or arbitrary. Many composers leave such questions in the hands of the production team at the studio or the performers. (Popular music producers constantly search for novelty, and as new sounds or techniques were invented they were readily lapped up by the industry: the phasing in 'Rainbow Chaser' and Thunderclap Newman's 'Something in the Air', the Beach Boys' use of the theremin, the very early synthesiser on Abbey Road, and so on.)

One can find few examples of 'meaning' attached to spatial phenomena; perhaps a sense of remoteness, with offstage or distant instruments, is the most common cliché for such emotions. In contemporary works space is treated as a firm dimension of the sound, considered as a part of the composition process.

Spatialisation of sound, by 'placing' sounds in a sound field, gives greater clarity and the ability to focus on individual melodic lines.

Seating and spatialisation in a String Quartet.

We expect a quartet to be seated in one particular way, and with this image we are comfortable:

V1 V2 Vla Vc

Actually, and more accurately, they will be seated:

V2 Vla

V1 Vc


This semicircle gives them eye contact and the ability to take non-musical cues from each other. It also focuses the sound into a central sweet spot, and this of course is where the wise recordist will place the principal microphone. In recent years we have seen a few examples of the following variation:


V1 Vla

This arrangement brings the lower register more to the centre and assists in the spatialisation and separation of parts between violin and viola. It is possible that this seating position is in recognition of the role of recording and the basic tenet of recording that bass belongs in the middle. This principle (more or less adhered to in 100% of cases) began as a means of dividing equally the hard work that a loudspeaker undergoes between the two speakers of a stereo system. (There are early stereo pop albums in which the image is subverted: the placement of sounds in Beatles recordings is somewhat counter-intuitive, or even bizarre, at times; an example of subverting a principle almost before it had become established.)

It is true that modern loudspeakers no longer need such kind consideration, but the listener's ear and expectations are set as they are: to expect the treble balance to favour the left, whether in the quartet or right up to the full symphony orchestra. It might seem that the natural playing position of the violin demands that the violins sit to the left of the conductor, but that is more a matter of historic custom. I can think of no instance when the violins have appeared en masse to the right, with the embarrassing exception of a record company that put out an opera with the left and right channels swapped by mistake (a hastily withdrawn and re-released Peter Grimes).

We 'read' sound from left to right with respect to treble and bass (which may seem counter-intuitive given the placement of the treble on the right of a piano).

Of course a composer would be free to specify a different seating for a quartet, and I can think of many reasons one might offer:

VC Vla

V1 V2

Such a seating might suggest that dialogue effects across the space between the first and second violins are being made prominent, confining the lower parts to the centre. This might still give good eye contact between the players.

Recording String Quartets

I have reviewed a number of CDs recently where my overall comment has been that the recording was 'too wide and too close'. Perfectly good musicians playing well and creating a masterful rendition of a piece, only to have it ruined by the recording quality. When I use the expression 'recording quality' I am not referring to the technology of the microphones, pre-amps, recorder and so on, all of which work brilliantly these days, but there are some basics that seem to me to be not right. I will also hazard that they are not right because all training in recording technology is focused on rock and pop technique (I hope I am wrong about this). What we hear in these recordings of classical repertoire is firstly that the microphones are too close and that there are too many of them. This is very much a rock technique where, to get detail and to reject noise, the engineer moves in to capture the sound. But not being fluent with classical sound and form, the engineer does not really know how a 'cello should sound in a recording, so getting close in seems right (about two feet away from the bridge). Similarly, the other instruments in a quartet will have close microphones at about the same distance. The wise engineer will also have placed a stereo pair a couple of metres away.

It's what happens next that brings about the poor sound. The engineer has captured a sound that is too close (you can tell it's too close if you hear the musicians breathing too much, or finger noises, or too much bow sound), coupled with an out-of-balance sound of the instrument. Let me dive down a little.

Too Close

There is a distance for capturing the whole sound of an instrument without it feeling as if you were on top of the player. You need a little instrument and ambient noise to help the articulation of the rhythm (hearing a tiny tap of the keys on a flute helps the rhythm become clear, for example). You need a little instrument noise for realism: one of the criticisms of samples is that they have no body noises, no extraneous human artefacts. There is a distance that makes sense; it also balances the sound at the top of the instrument with the sound emanating from the bottom. Theorists have suggested that the microphone's minimum distance in these cases should be at the apex of an equilateral triangle whose base is the major dimension of the instrument. This is fine if there are no directional projections of sound; the piano, for example, requires a lot more finesse.
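The equilateral-triangle rule of thumb is easy to compute: the apex sits at √3/2 times the base length away. A quick sketch (the function name is my own; treat this as the rough guide described above, not a standard):

```python
import math

def min_mic_distance(instrument_length_m: float) -> float:
    """Minimum microphone distance suggested by the equilateral-triangle
    rule of thumb: the mic sits at the apex of an equilateral triangle
    whose base is the instrument's major dimension."""
    # apex height of an equilateral triangle of side L is L * sqrt(3) / 2
    return instrument_length_m * math.sqrt(3) / 2
```

For a 'cello of roughly 1.2 m this suggests a microphone about a metre away, noticeably further back than the two feet criticised above.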

One of the reasons for getting in close is to be able to exclude the sounds of the other instruments (I am sure some engineers would like to put their artists in booths to achieve this). The players, however, being friendly and in need of eye contact, like to sit quite close together, which makes sound separation quite tricky; hence the microphones go in close.

But why do they need separate channels for each instrument? Because they feel they might need to do some 'mixing' and balancing later? Most quartets balance the sound themselves as they play; they have been trained to. There are some top names I have heard recorded in this multi-miked way, and it is a little insulting even to dream of touching the fader unless in consultation with the players.

Another reason for close miking might be to suppress the acoustic of the recording venue (too wild? too boxy?) with a view to adding artificial reverb later. We all do it, but usually just a touch in mastering, to put the varnish layer on the recording, not as a prime component of the sound.

All of this leads me to say: first make sure the venue you have chosen is good for the job. Generally, studios (unless very good indeed) lack the sound to make the players blossom and blend as they would in concert. Secondly, recognise that the sound of a quartet does not exist in close proximity to the players; the sound blends in the air somewhere about three metres away, and the use of a good stereo pair should suffice. Now, if you need spot mikes on each instrument, they are for the gentlest touches, to bring out maybe a weak viola sound. But here is a golden rule: never move the faders in the middle of the piece. I can hear you doing it.

I'll say that again: if you 'push' the cello part in a section to help it along, I can hear the increase in volume, and it makes the whole thing sound comical. Less well trained ears might not know what has happened, but they will experience the image wandering. Don't forget that panning is more correctly called amplitude panning and works by dropping one volume and raising another, so if you raise the volume of violin 1 the sound will move over to the left and will be evident.
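Amplitude panning can be sketched in a few lines. Under a constant-power pan law (a common choice; the function below is my own illustration) the left and right gains trade off against each other, which is exactly why riding one instrument's fader shifts its apparent position:

```python
import math

def pan_gains(position: float) -> tuple[float, float]:
    """Constant-power amplitude panning.
    position runs from -1.0 (hard left) through 0.0 (centre)
    to +1.0 (hard right). Returns (left_gain, right_gain)."""
    angle = (position + 1.0) * math.pi / 4  # map -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)
```

At centre both channels carry equal level; raise one channel's gain relative to the other and the image slides towards it, so a mid-piece fader move on a spot mike is heard as the instrument wandering across the stage.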

Too Wide

Here, though, is the biggest sin of all, and the greatest evidence of a pop engineer working with material he or she does not understand. Just because the violin is on the left and the pan control suggests left by twisting it all the way to the left does not mean you get a stereo image. Back to the beginning: stereo does not mean anything other than 'solid' (look it up). Just because the quartet sits left to right and you have knobs that suggest left and right does not mean it is a good idea to put violin 1 far off to the left of the stage and the 'cello far out to the right, about fifty feet from his fellows. This is to make the sound too wide and break down the ensemble, by which I mean that sounds which go together, such as a chord between the players, no longer resonate together but are spread across the sound field. One recording I heard recently, in which melodic lines are handed from violin to viola, then 'cello, and back (it was a Beethoven quartet), had me feeling like a tennis referee, so twisted was the sound, and so broken down that the identity of a few of the chords was in doubt. When a major triad is formed by two instruments on the left and the seventh degree of the scale by the 'cello on the right, you run the risk of hearing a split ensemble: it is too wide for the ear to put back together.


Well, I have re-mastered a few recordings that were worth the time and trouble, narrowing the stereo field to one in which it feels as if the quartet actually sat together. But better, and more generally: we need to train more people in classical music recording and not pretend that it is the same as rock. It isn't, by a very wide margin. The ability to read music is a prerequisite, but few engineering courses require it. A knowledge and love of classical music is also required. You cannot move from rock to Bach with the same set of skills.
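Narrowing an over-wide recording is usually done with mid/side processing: decompose the two channels into what they share (mid) and what separates them (side), then scale the side signal down. A minimal sketch of the idea, assuming plain sample lists rather than real audio I/O (the function name is mine):

```python
def narrow_stereo(left, right, width=0.5):
    """Narrow a stereo image via mid/side scaling.
    width=1.0 leaves the image unchanged; width=0.0 collapses to mono."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)    # content common to both channels
        side = 0.5 * (l - r)   # content that separates them
        out_l.append(mid + width * side)
        out_r.append(mid - width * side)
    return out_l, out_r
```

A width somewhere between 0.4 and 0.7 is often enough to pull a fifty-foot-wide quartet back into a plausible semicircle without losing the left-to-right placement entirely.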

Am I ranting? I hope not. Ranters don't solve anything, and I hope I have suggested enough above to give pointers on what is wrong with so many new recordings. If you are a player about to record, get to know the engineer and their limits; if they don't know what you are talking about, then get another one, or be prepared to work very hard alongside them, telling them how to do it.