Surround Sound Post Production

G.L. Giustiniani - Honours Bachelor of Recording Arts - Middlesex University - SAE Amsterdam

Mission Statement and Product

This edutainment package has been designed for multimedia students who are interested in taking part in a creative learning experience about multi channel (5.1) surround sound production for movies, videos, or video-games, through an audio mixing and film post production session on a digital console with the software Digidesign Pro Tools. The practical workshop can only take place in a studio offering the required facilities in terms of surround mixing console and correctly configured monitoring system.
The course requires some previous exposure to music production, using professional or semi-professional studio equipment, and a basic working knowledge of Pro Tools. The practical session mainly focuses on synchronizing sound and images, on editing, and on panning sound sources in the 5.1 panorama, studying the resulting acoustical and visual effects. The original audio and video material must be provided by the students, who will explain why they believe it can work successfully in their post production project, justifying their choices and aims.
Ideal candidates for this workshop are Audio, Film or Multimedia students, as well as music students or composers approaching the soundtrack market, in an indicative age range from 18 to 40 years old. The course, lasting approximately one week, could be part of the SAE study curriculum, or a seminar for students coming from other music and multimedia educational institutions. Students will have the chance to produce a video-clip featuring multi-channel audio to be presented to an audience, experimenting with their acquired knowledge of audio and video perception, synchronization, and psychoacoustics. It can be a short film, a video-game presentation, a movie trailer, live concert footage, or a song video-clip.
The package consists of the following elements, which are discussed in the next sections of this document:
  • an illustrated tutorial, with practical exercises, covering technical and creative aspects related to post production, editing, soundtracks, story-telling, psychoacoustics, audiovisual perception, moving sources, spatial and acoustical elements;
  • listening and viewing sessions of film, video-clip or track excerpts proposed by the student, to be studied as examples and viewed as possible inspirations for soundtrack rendering in a surround field;
  • a quick reference guide to be used during the practical workshop on Pro Tools, illustrated with screen shots;
  • a bibliography to be used for further research and investigation, listing all the sources cited in this document, together with a useful glossary of technical terms used in post production (BBC 2008).
At the end of the course, the students will be asked to attend a feedback session with a supervisor, to evaluate the quality of the teaching and to measure the success of the edutainment experience. All the material will be available to the students in printed handouts and on digital slides. In addition, a library of soundtracks and movies on CDs and DVDs will be available for consultation.

Learning Goals and Outcome

Surround sound listening experiences have nowadays become quite common thanks to the digital multi channel audio devices in DVD home theatre installations and cinemas, but the production process itself and the analysis of the related psychoacoustic aspects are still rather overlooked in most of the curricula offered by audio and multimedia educational programs.
The outcome of this workshop is the production of a digital 5.1 video clip, obtained from a multi-track recording session on Pro Tools and a pre-edited sequence of images, both to be provided originally by the students, where they have been actively involved as musicians, engineers, directors, producers, or assistants. The product delivered by the students in this workshop is of experimental nature, based on personal interpretation rather than technical skills. There will be guidelines provided, but it is not expected that the students follow any predefined rule in their production, the main focus being on stimulating their creativity in the creation of spatial musical landscapes to form a soundtrack.
The learning goals are partly of a technical nature, such as understanding audio dynamics, frequency range, room and environment acoustics, reverberation and delay. However, many of these concepts should already be part of the students' background, to be used during the workshop as tools to develop their own creativity in a multi-sensory field. The principal goals are, in fact, developing listening and perception skills, imagination, the ability to tell a story through the combination of images and sound, and the ability to drive emotions by synchronizing certain sounds or effects with the selected scenes.
Storytelling is the earliest form of entertainment ever created. It is an art that has evolved over the years in terms of the media used, but it has always been based on the same principles. A good storyteller can evoke visions and emotions, stimulate the listener's fantasy and imagination, and capture the listener using only words and speech (Chan 1987). In the same way, a soundtrack designer needs to be a storyteller, using music and sound to reinforce the plot narrated by the scenes, ideally up to the point that, even without images, the story will be remembered or evoked just by listening to the film score.
The process that the students should follow is not intended to be a technical approach to proper audio-video synchronization. It is, instead, an audiovisual composition, capable of expressing and inciting emotions, just like a song, a film, a painting, or any other piece of art would do. The way the sound sources are placed in the surround mix, also using the Pro Tools automation to achieve dynamic and moving sound effects, might relate to the engineer's personal taste or own experience as a spectator at live concerts or movies, or might be of a completely unrealistic nature. In either case, students have to justify their choices and describe the possible effects on the audience.
The clips produced will be assessed by a commission of professional producers active in the music and multimedia industry, and the best selected work will be promoted for commercial production and release. In order to take part in the competition, the project must be delivered according to the following specifications:
  • the film produced must be of a total length between 5 and 25 minutes, including all titles, and can be of any nature, such as documentary, musical, dramatic, fantastic, comedy, animated, etc.;
  • access must be provided to the original tracks used in the post production workshop, such as Pro Tools session files, containing music, sound effects, dialog, ambience, and to the original video-clip file (QuickTime movie);
  • students must supply all material on DVD format by a specified date, properly labeled and packaged, specifying required credits and copyright statements.
Criteria of assessment will be mainly based on the originality, creativity, interest and distinctiveness in distributing the different sound sources in the 5.1 panorama, in relation to the musical composition and to the images, and coherent with the kind of experience which is intended to be delivered to the audience.
Being able to provide their own material is an essential prerequisite of this course, because of the level of commitment and involvement it implies. Because they are encouraged to use and present material they have created themselves in a production contest, students can effectively build their motivation towards the learning goals.

Prerequisites for the following tutorial are: basic knowledge of music and sound theory, some experience of audio mixing and editing in Pro Tools, and some experience of video editing in Final Cut Pro or a similar software package.
After each paragraph there will be practical exercises to be completed in order to verify the active learning process during that module. At the end of this tutorial, students will have covered the fundamental topics in the surround sound and post production practice, and will be able to apply this knowledge in the post production project using their provided tracks to be creatively assembled in Pro Tools.

Surround Sound

The 5.1 sound system is a digital audio encoding and decoding standard, originally created for movies, which carries six (5+1) separate channels: five full-range channels plus one band-limited low frequency effects channel. The bus-to-channel allocation generally adopted follows the SMPTE/ITU recommendation:
  1. front left (L)
  2. front right (R)
  3. front centre (C)
  4. centre low frequency effects (LFE)
  5. rear left surround (LS)
  6. rear right surround (RS)
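
As a minimal sketch of what this allocation means in practice, the Python snippet below stores the channel order listed above and interleaves six per-channel sample lists into frames in that order. The function name `interleave` and the dictionary layout are assumptions made for illustration only; they are not part of Pro Tools or of any file format specification.

```python
# Channel order as listed above (SMPTE/ITU allocation).
CHANNELS_5_1 = ["L", "R", "C", "LFE", "LS", "RS"]


def interleave(channels):
    """Interleave per-channel sample lists into one frame-ordered list.

    `channels` is a dict mapping each label in CHANNELS_5_1 to a list of
    samples. All six lists must have the same length (hypothetical layout,
    chosen only for this illustration).
    """
    lengths = {len(channels[name]) for name in CHANNELS_5_1}
    if len(lengths) != 1:
        raise ValueError("all six channels must have the same length")
    # zip() walks the six lists in parallel, yielding one frame at a time.
    frames = zip(*(channels[name] for name in CHANNELS_5_1))
    return [sample for frame in frames for sample in frame]
```

Each group of six consecutive values in the result is one frame, ordered L, R, C, LFE, LS, RS.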

The first successful 5.1 productions were released in the early nineties, with films such as “Dick Tracy”, “Batman Returns”, and “Jurassic Park”. However, experiments in distributing sound across multiple channels had been under way since the sixties.
Compared with the stereo standard, the 5.1 system completely changed the way movies are experienced, providing the listener with a more involving and exciting experience, with sound sources being generated from multiple directions, and travelling between multiple points according to the dynamics of the scene. Films benefited immensely from having an even more powerful tool to infiltrate viewers' minds, through more complex layers of multi-directional sound interlaced with the narrative (Pellerin 2001). The “sweet spot” is defined as the ideal central location in the hall where the sound waves reproduced by the multiple speakers reach the listener evenly, minimizing phase incoherence and reflection issues.
Further surround sound derivations multiplied the number of speakers dedicated to the LS and RS channels, placed around the audience at the back. With appropriate phase and delay controls, this configuration provides a more balanced and even distribution of the sonic panorama, even for listeners located in odd spots of the hall. Music released in surround sound format is fairly limited in quantity, while DVDs and movies are nowadays mostly released in 5.1 surround format. Older films, produced in the fifties, sixties, or seventies, are also regularly having their audio remastered and being re-released in 5.1 format.

Exercise 1

Pick a DVD release in 5.1 surround format by your favourite band or musician which you have never seen before, watch at least one track on a proper 5.1 reproduction system and comment on the following aspects:
  • How many different sound sources (instruments, vocals, ambience, crowd, etc.) are recognizable?
  • How does the viewing experience differ from just listening to the same song on a standard CD?

Exercise 2

Pick one of your favourite old movies you are familiar with, which has been reissued on DVD in 5.1 surround format, watch at least one scene on a proper 5.1 reproduction system and comment on the following aspects:
  • How does the new surround sound score relate to your original impression of the movie?
  • What changes have been made and how do they affect the viewing experience?

A movie soundtrack plays an essential part in engaging the viewer. The mix of images and surrounding sounds involves the spectator in a new reality, which is that of the story being unfolded. The film editor and the sound designer share considerable responsibility for a film's storytelling, contributing to deliver the message through visual and sonic stimuli. The composition of music, noises, and sound effects is crucial to set the mood of a scene. If a scene is building tension, the score is probably playing low frequency sound effects, because listeners tend to react to these frequencies with a sense of alarm, fear, alertness and expectation that something threatening is about to happen. Dramatic scenes are likely to be based on minor chord progressions delivered with sustain and presence, often by high-pitched instruments, to enhance the sense of sadness and pain. On the other hand, happy scenes tend to feature scores with major scale progressions and sparkling sounds in the mid-high frequency range, rich in even harmonics for the sense of warmth and fulfilment they convey.
A psychedelic, dreamy atmosphere will probably require long notes with volume swells, mutating sounds rich in electronic effects, complex harmonies, maybe dissonant chords and unusual progressions. In any case, sound and vision should be part of the same world, complementing each other in a coherent unity, no matter what techniques are used.
These are just a few elements of creative soundtrack composition, intended to give inspiration for the creative process to be developed during this course. In fact, this workshop is intended to promote creativity rather than technique. The technical skills shown in the delivered work will be judged only in relation to the amount of creativity and interest in the composition.

Exercise 3

Pick a movie you have always wanted to watch but have not seen yet, mute the audio completely and watch two different scenes, one involving the main character interacting with others, the other mainly showing landscapes, and comment on the following aspects:
  • What mood can you perceive in the main character and in the interacting secondary characters?
  • What sound effects or ambience noises would you suggest to score the scene?
  • What sensations do the landscape shown and its elements evoke?
  • What genre of music would you suggest to score the scene?
  • Watch the same scenes with the audio turned on and match the featured soundtrack with your previous answers.

Exercise 4

Pick one of your favourite movie soundtracks, in any audio-only format (CD, SACD, DVD Audio), listen to it and comment on the following:
  • What pictures or images does the soundtrack evoke?
  • Can you recall the scenes of the movie related to the music?
  • Does the soundtrack seem to tell the same story as the movie?

Surround Mix and Perception

We will focus on the mix and will not cover recording techniques, which are outside the scope of this tutorial. The project will obviously benefit if the initial material contains additional tracks specifically recorded for surround use, like multiple distant ambient microphones capturing space and room acoustics on the different sound sources. If, instead, all the available tracks have been recorded only through closely placed microphones, the project will require a tailored use of reverb and delay to build an artificial surround ambience where required (Thornton 2006).
When mixing a song there are several aspects involved in the creation of the final product. The mixing engineer acts like a musical director, deciding which parts to emphasize, to attenuate, or to apply effects to, and of course the time sequence in which these alterations occur. The mixdown of a track is an independent creative process and can drastically change the character and feel of the track itself. We call the process “mixdown” because, most of the time, we are down-mixing the original multiple tracks into two resulting tracks. When, instead, we produce an artificial surround mix from original stereo material, we are attempting an up-mixing process.
Mixing in surround is an even more challenging creative step, because there are no real rules on how to pan the sounds on the additional available channels. It is a new canvas that allows ideas to be developed more freely and with a wider range of experimentation (Pellerin 2001). On a digital automated console, complex and daring mixing ideas can actually be implemented with ease of operation, the only limits being the ability to conceive them. Layers of sound are controlled by additional layers of automation, recorded on volume variations, panning, equalization, effects sends and returns.
In post production, the new technology also facilitates quick and unplanned picture changes, additional visual effects, and editing or structural changes, saving project time and costs. Pro Tools is ideal for additional sound design work: creating special effects and ambience noises, layering sounds, and tailoring and bouncing multiple bits and pieces into a consolidated track.
On a traditional analog desk, the only way to deliver a complex and dynamic mix was indeed to employ several assistant engineers to put their hands on the console in a carefully synchronized and concerted performance. In the early days of surround sound productions, the rear channels were used sparingly and with caution to render a realistic reproduction, mostly only panning ambience reverberation or low frequency resonance to give depth. In fact, especially when signals have been recorded in multichannel format in the first place through microphone arrays, the rear speakers serve the important function of reproducing the correct spatial image.
Commonly, the choices made in a surround mix depend on how far the engineer wants to detach from reality and create unrealistic or surreal sonic scenarios. A traditionally heavy instrument like a piano moving quickly from front to rear, for example, is obviously unrealistic, as are two lead vocalists panned one to the front and one to the back.
It is important, though, to state that “unrealistic” does not in any sense mean “wrong”. If the engineer decides to provide the listener with an “on stage” experience, for example, he or she will distribute the musical instruments all around the centre. The more the distribution moves from an amphitheater-like shape towards a full circle, the less realistic the audience's impression will be, because such a layout cannot be reproduced in real life. Knowing this, while mixing a piece of music, the more we want the listener to detach from reality, the more we have to spread the sound sources throughout the surround field, later returning to a narrower stereo field when we want to bring the dream to an end, for example when approaching the end of the song.
Mixing sound effects or ambient noises in surround, instead, can generally emphasize the sense of reality by correctly placing a sound source in a three-dimensional perspective, simulating a real life situation where the perception of sounds is affected by spatial location and movement. As an example, in a scene where a car overtakes the viewer from behind and gets smaller on the horizon, the engine noise is panned first to the rear channels, behind the viewer, reaches its maximum intensity at the centre of the sound field as the car passes, and fades away in the front channels. Because of the Doppler effect, the perceived pitch of the engine is higher than the emitted pitch while the car approaches and lower while it drives away, by an amount that depends on the vehicle speed. While the emitter is moving closer to the listener, it is still generating the original sound wave, but each wavefront is emitted from a nearer position, so the wavelength reaching the listener is shortened and the perceived frequency increases. As the car drives away, the wavelength reaching the listener is stretched, and the perceived frequency decreases instead (Gardner 1999).
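
The Doppler shift described above can be estimated with the standard formula for a moving source and a stationary listener. The sketch below is only an illustration: the function name `doppler_frequency` and the assumed speed of sound of 343 m/s (air at about 20 °C) are not taken from the cited sources, and the car is assumed to move directly towards or away from the listener.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def doppler_frequency(source_hz, speed_ms, approaching):
    """Perceived frequency for a moving source and a stationary listener.

    source_hz   -- frequency emitted by the source, in Hz
    speed_ms    -- speed of the source along the line to the listener, in m/s
    approaching -- True while the source moves towards the listener
    """
    if not 0 <= speed_ms < SPEED_OF_SOUND:
        raise ValueError("speed must be between 0 and the speed of sound")
    if approaching:
        # Wavefronts are compressed: shorter wavelength, higher pitch.
        return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - speed_ms)
    # Wavefronts are stretched: longer wavelength, lower pitch.
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND + speed_ms)
```

For an engine tone of 100 Hz and a car travelling at 25 m/s (90 km/h), this gives roughly 107.9 Hz while approaching and 93.2 Hz while driving away, a shift that can be imitated in the mix with a small pitch automation.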

Exercise 5

Pick one of your favourite music tracks (recorded in stereo) you are familiar with, listen to it in the unconventional ways described below, then comment on and explain your reactions:
  • Listen to the track in mono.
  • Listen to the track in stereo, but with your back to the speakers.
  • If you were the producer, what variations or additions would you propose for its release in 5.1 surround?

Exercise 6

Choose your favourite classic track from the VH1 broadcast series “The Making of ...”, or from the “Classic Tracks” articles available in the magazine 'Sound On Sound', examine carefully the illustrated production process and comment on the following:
  • Would the artist or producer have done it differently today? Why?
  • Would the track benefit from a remaster for the current market? How?
  • Would the track be suitable and marketable for a 5.1 release? Why?

In any listening experience we have to consider the chain of events that triggers the human brain to interpret the messages, and to react in different ways. Knowing how our audience comes to accept a product helps us to target it, to make decisions, to define our variability boundaries, and potentially to make the product successful (Zacharov 2003). The sound stimuli which we have designed to be delivered by our product will first cause a primary sensation in our listeners, subject to a fair degree of physiological variability, possibly due to sensitivity, hearing loss, or fatigue. Perception is the next stage, where our listeners use their cognitive faculties to process the sonic information. Aspects like association, memory, previous experiences, training, or even the lack of these elements, are crucial in determining how the listener's brain interprets the stimuli. Finally a response is formulated, expressing the listener's level of acceptance, likes or dislikes, possibly including a physical or verbal reaction when the listener decides to make his or her response known externally.
Our challenge is to find a balance of elements capable of driving a positive reaction, hopefully in a large audience, so that our product is successfully received. Specifically, the goal of a successful surround sound production is to establish an accurate balance between the two main aspects related to the human binaural perception mechanism:
  • envelopment, referring to the perception of sound being all around the listener, with no definable point source;
  • localization, referring to the ability to identify where a particular sound is coming from, in order to reach the subsequent state of cognition of the source.
In post production, this balance must be found in relation to the images and to the message and sensation that need to be communicated in the different circumstances. The advantage of a multichannel system is the better separation of the different sources, minimizing the masking effects caused by predominant loudness levels or by frequencies which are very close to each other (Stuart 1999). It should in fact be remembered that the fewer loudspeakers are used to reproduce a performance, the more reciprocal masking will occur between the sound sources, especially if they are close in frequency range, like, for example, female vocals with guitars, or bass with kick drum.
Masking effects are put to effective use in lossy digital compression algorithms such as the common MP3 format and Dolby Digital AC-3. In order to reduce the file size, these algorithms analyze the track and discard, or encode with less precision, the components of sufficiently low amplitude that lie close in frequency to, or immediately follow, a louder signal, which would likely mask them for an average listener.
When mixing multichannel sound for picture, it is important to consider that the main attention of the viewer is on the action on screen, not on the sound score, so our goal is to match the sound to the action in the most supportive way (Dahmer, Massey 2004). When working on a concert video, when a lead musician is prominent in the scene, a good practice is to raise the level of that instrument, and pan it mainly to the centre channel. Similarly, in feature film production, dialog will need to be clearly heard above the background noise or the accompanying music, which will be reduced in level. Equalization will also need to be applied to correct the presence of the midrange content, where human hearing is most sensitive.
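
The masking principle behind lossy compression mentioned above can be illustrated with a deliberately simplified toy model. In the sketch below a sound is reduced to a list of (frequency, level) components, and a component is dropped when a much louder component lies close to it in frequency. The function name `drop_masked` and both thresholds are assumptions chosen for illustration. Real codecs such as MP3 or AC-3 rely on far more elaborate psychoacoustic models and do not work this way internally.

```python
def drop_masked(components, max_distance_hz=100.0, min_margin_db=20.0):
    """Return the components that are not masked by a louder neighbour.

    components -- list of (frequency_hz, level_db) tuples
    A component counts as masked when another component lies within
    max_distance_hz of it and is at least min_margin_db louder.
    Both thresholds are illustrative assumptions, not codec values.
    """
    kept = []
    for freq, level in components:
        masked = any(
            abs(other_freq - freq) <= max_distance_hz
            and other_level - level >= min_margin_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept
```

With the default thresholds, a quiet 1050 Hz component at 50 dB next to a 1000 Hz component at 80 dB is discarded, while an equally quiet component at 3000 Hz is kept because no louder neighbour is close enough to mask it.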

Exercise 7

Choose one of your favourite MP3 (lossy compressed) tracks that you listen to on your iPod, but that you have never listened to on CD (uncompressed audio sampled at 44.1 kHz). Get the CD and listen to the track on a good quality desktop CD player, then comment on the following:
  • How different is the overall listening experience?
  • Do you hear any new elements composing the track you did not notice before? If yes, what elements, instruments, effects?

Exercise 8

Sit in the middle of a large, empty classroom, close your eyes and block one ear, preferably using a sealing earplug. Ask a colleague to enter the room in total silence, sit in a random spot, and talk to you in a normal tone. Try to guess the exact spot where he or she is sitting, then open your eyes, verify, and comment.


[BBC] (2008). Post Production Glossary.
Borgia, C. (2008). 'Steps in Sound Post Production' In: The Digital Filmmaker.
Chan, A. G. (1987). 'The Art of the Storyteller' In: The Leader.
Dahmer, C.; Massey, H. (2004). The Recording Academy's Producers & Engineers Wing. Recommendations For Surround Sound Production.
Dunkleberger, A. (2002). The Basics of Screenwriting. Screenplay Structure and Visual Storytelling.
Gardner, W.G. (1999). 3D Audio and Acoustic Environment Modeling.
Glixman, J. (2006). 'Create a Balanced Surround Sound Mix' In: Studio Monthly.
Huber, D.M.; Runstein, R.E. (2005). Modern Recording Techniques. Sixth Edition. Burlington, Massachusetts: Focal Press.
Pellerin, D. (2001). 'Mixing in 5.1 Surround' In: Playback.
Price, S. (2002). 'Pro Tools Notes' In: Sound On Sound.
Stuart, J. R. (1999). The Psychoacoustics of Multichannel Audio.
Thornton, M. (2006). 'Surround Sound In Pro Tools TDM & LE' In: Sound On Sound.
Zacharov, N. (2003). 'Tutorial Seminar. Listening Tests in Practice.' In: AES 115th Convention.
