Emotive VR − 5. Sound design

For me jumping from one role to another on a daily basis was interesting but not necessarily optimal for concentration. With all the technical challenges I feel there was not enough time for actual sound design work.

However, there were several questions and themes that needed special attention:

- What is the overall sonic mood of the films and the whole experience?

- The films are located in one room. Should the room have its own sonic atmosphere to support the story? Are there sounds coming from outside?

- How to create the soundscape of 1891 Vienna (heard through the window and inside Karl's head when he is under hypnosis)?

- When Freud hypnotises his patient Karl, what does Freud/Karl/spectator hear?

- Freud has a neurosis about trains. How to use train sounds in the story?

- How to manipulate characters' own voices when the spectator is in a first-person perspective?

- What happens to sounds when cutting between subjective and objective perspectives?

- How big a role should Karl's dyskinetic hand movements and their distinctive sound have?

- Is there a way or need to enrich the surrounding 3D space with sound design?

- Should we hear Freud's heartbeat and breathing? Should they react to the spectator's EEG data? If yes, how to realise that technically?

- How to create the audio-only prologue and epilogue scenes?

- How to create the soundscape for the introductions with neurones, synthesiser and voice-over?

Sonic mood and consistency

The Emotive VR experience consists of several elements: two introductions, calibration, opening credits, prologue, choice sequence, two 360 films, epilogue and end credits. I am usually very much aware of the consistency of mood and quality throughout art pieces. In this project the overall structure was formed quite late, so there was not really a possibility to intentionally develop a uniform sonic aura for the whole experience. However, when the project started to get its structure, I began to identify some consistency across the different elements.

Music has a significant impact in creating the mood, and I am quite satisfied with the way the music in this project manages to enhance the overall contemplative and dismal atmosphere. The musical style also stays rather consistent throughout the experience. The only exceptions are the introduction sequences, whose background ambient synthesisers are light years away from the musical universe of the rest of the experience. The reason for using the synth pads was to establish a neutral feeling for the calibration process, almost a contrast to the actual story that is to follow.

Theremin-driven hypnosis sounds have their impact on the overall sonic mood, too. As mentioned already, I have also used theremin for the neurone sounds in the introductions as well as in the opening music, choice sequence music and end credits. Thus theremin is doing its part to glue the experience together.

Apart from music, hypnosis sounds and some of the bizarre sound effects in the hallucination scenes, my purpose was to create "cinema realistic" sound design through clear and clean dialogue with simple but carefully designed ambiences and sound effects. In this project the dialogue is the most important narrative element, so I wanted it to sound clear and professional without (too many) mistakes and anomalies that may disturb the immersion. Furthermore, I think that neutral-sounding dialogue and realistic soundscape help to ground the overall atmosphere before the situation in Karl's room gets crazy with everyone seeing and hearing hallucinations.

Sounds in the room

The film sequences are set in one room. In the story the room belongs to Karl's family in Vienna. It is the place where this young man, suffering from destructive thoughts, is locked up. My first idea was to create a lively world outside of the room in order to emphasise the prison-like seclusion Karl is experiencing. In the end I never had time to test distant voices or piano playing heard through the walls. As nice an idea as that would have been, the narrative did not really require it. There is, however, a window in the room which lets in sounds from the street. Karl's neurosis is about hurting random people, and therefore the street sounds were already scripted and were one of the first things I started to create. The main considerations were how much the pedestrians should talk, or whether it would be mainly footsteps and horses, and how loud and muffled everything should be in the mix.

The only other sounds coming from outside the room are footsteps of Freud and the servant Fritz heard from the hall next door. Inside the room there are no sound sources apart from the characters and their hallucinations.

Street of 1891 Vienna

To create the street sounds I went to the medieval centre of Le Mans and recorded people walking on the cobblestone streets. There were no airplanes, not much traffic noise apart from a constant urban hum, and surprisingly few other noises such as HVAC machines. However, people were talking a lot (and none of them in German). When later editing the street sounds together with some library sounds I had recorded earlier in other places, I mainly used footsteps but included some indistinct talking, too. Later I found that the talking did not really work, so I re-edited the street atmo to contain only footsteps.

The perspective of the footsteps was also critical. When coming through the window no closeup sounds worked, which was obvious. They did work, however, when Karl is hypnotised and gets immersed in the sounds as he imagines walking among the people.

To hint that we are in the 19th century I added, as the script suggested, some horse steps and carriage sounds from sound fx libraries.

Hypnosis and theremin

For the hypnosis sound I first experimented with glassy sounds, as the director had told me about the glass harp, which was popular in the late 1700s and associated with hypnosis at the time. However, after some tests we decided to go for something else. I'm a thereminist myself, so I thought that the theremin could maybe work sonically with its magical and eerie sound, but also thematically, as it was invented while Freud was still active in his career and, at least to me, it represents the era of new scientific ideas, inventions and change in thinking. Those topics are present in this Freud story, too.

Conveniently the ESBA-TALM school had a theremin similar to my own, so it was easy to start recording material. When playing lyrical melodies with the theremin it usually sounds best when using intensive vibrato, as with string instruments. The otherwise flat sound opens up a lot and becomes interesting and lively, almost singing. However, for this project and its themes vibrato didn't work at all. It sounded too much like 1950s UFO movies. Thus I went for straight notes and slow glissandos. For the introduction and Karl's interactive music I needed to create the sounds of neurones. I wanted to use the theremin again to create a link to the hypnosis, although with the neurones the glissandos are much faster, almost like shooting stars.

The director and I agreed that some level of granularity was needed for the hypnosis sound, but I found it difficult to create that. I am not very experienced in synthesising or manipulating sounds, so it took some time to find different methods. I tried granular synthesis with Argotlunar by Michael Ourednik and some other plugins, and morphing the theremin with various other sounds using MORPH SC by Zynaptiq, Reformer by Krotos and Absynth by Native Instruments, but ended up using just Reaper's own Two-Voice Harmonizer with some delays and reverb. I am not entirely happy with the results, but taking into account the limited time and my experience in sound design, they work OK in my opinion.

Freud and trains

According to the director, Freud had a neurosis about trains: he was afraid of them. Because the film was about Freud facing his own problems through the use of hypnosis with his patients, the director wanted to use train sounds to create that crazy and stressed feeling. That's what I did, and I ended up using some old-school steam train and rail sounds when Freud first starts to connect mentally with Karl and his neurosis. The trains appear a second time in a crazy hallucination scene, and this time I spatialised them so that they come and go around the listener. I used several library sounds and SoundParticles by Nuno Fonseca to create the Ambisonic background layer of passing trains. I complemented that with individual spot sounds spatialised with IEM Room Encoder. Both SoundParticles and IEM Room Encoder are, by the way, able to create very nice Doppler effects.

At some point I tried to morph theremin with train sounds to create the basic hypnosis effect, but that didn't work out.

Voice echoing inside head

Interestingly, the director wanted to edit the films so that the camera/spectator constantly jumps between first-person and third-person vantage points. The first-person perspective was called "subjective" and the third-person perspective "objective". When the character is talking and we are in the first-person position, I wanted to make the character's voice sound as if it were echoing inside the spectator's head. That was easy to do and it worked very nicely. Some test viewers said that it helped them to understand the rapid perspective changes.

For the effect I used some notch EQ, strong multiband compression with a boost in the low and high frequencies, and a short reverb using Reaper's ReaVerbate plugin with a 175 Hz lowpass filter.

Changes in sonic perspective

The jumps between 1st and 3rd person perspectives would in theory mean that the sonic perspective changes, too. My concern was first that it would probably sound unnatural when all the sounds around you shift their relative position. However that turned out not to be any problem, probably due to the fact that the spectator quickly learns to watch this kind of 360 film style using experience from the established film montage and sound editing tradition where perspectives tend to shift often. Or at least that was the case with me. Also there are not many sound sources inside the room that would "move"; usually just two characters and the window with street sounds.

Only in one or two cuts, when dialogue continues over the edit from subjective to objective or vice versa, did I have to offset the shift of sonic perspective a little to make the transition smooth.

Lack of 360

Although Emotive VR is a 360° experience, nearly all the action in the films happens in front of the spectator. The surrounding space, the room of Karl van Schroeh, is important in creating immersion, but narratively the directions behind, above or below the spectator are not used (unless someone decides to follow the story facing away from the characters). Only in a few places does one need to turn one's head left or right in order to follow a protagonist walking across the room or to switch gaze between multiple characters.

This of course contributes to an intense experience where the spectator can focus on the dialogue or action happening in front of her instead of needing to corkscrew around while trying to spot pieces of story sprinkled here and there. The field of view of the standard HTC Vive headset is still quite narrow, and thus observing the world around requires moving the whole head instead of just the eyes. Hence this static approach to 360 narration may be justified.

From the sound design perspective, however, it would have been nice to be able to utilise the 3D space a bit more. Hearing and localising sounds coming from outside of one's field of view does not require turning one's head (if a good binaural decoder is used), which is useful for building the surroundings with sounds. With the street sounds coming through the window that is exactly what we did. They are a part of the story, and they are the only sonic element which has a clear narrative function while being independent of the direction of action and the spectator's gaze.

Yet the three-dimensional nature of hearing could also be utilised for taking the story forward with individual events happening outside of the field of view. In my opinion that would bring a level of surprise and realism to the story world, as not everything always happens in front of you. For example, the Karl film starts with a first-person view: the spectator is sitting in the middle of the room looking at a door. Even though one can start exploring the room by turning around, the door represents the established "front direction". When the approaching footsteps of Freud and Fritz start to echo from behind the door, their perceived direction is still "front" regardless of where the spectator is gazing at that moment. A more interesting solution, in my opinion, would have been to position the spectator looking at the window and listening to the street sounds. The approaching footsteps from behind left would then have aroused the spectator's attention and introduced the door as a new element. Having said that, a decision like that would have changed the emphasis of Karl's character and the situation; the director probably wanted Karl (and the spectator) to be nervously waiting for Doctor Freud to visit him and therefore already gazing at the door.

Later, when the hallucination sequences start, the 3D space is utilised to some extent with spatialised sound effects passing the spectator from all directions. However their positions and movements are arbitrary and carry no narrative purpose.

In the audio-only prologue and epilogue scenes Freud is walking around the spectator while talking and smoking a cigar. Although the sequences make use of the whole (horizontal) 3D space, the spatialised movement is again arbitrary in terms of narration: it is important that he is walking, as that suggests that he is nervous and it also creates the sense of room and space, but his actual route around the spectator is insignificant.

To me it feels that in this project the use of three-dimensionality and spatial directions was not given much attention while scripting the story. I was always free to suggest new sonic ideas and experiment with them, but it was very difficult and often pointless to try to add new elements into an already carefully scripted and filmed story.

The lack of 360 elements does not harm the immersion and it might even help in concentrating on the characters and dialogue. Nonetheless, when making a VR experience with 360 films, it would have been interesting to try all the possibilities of the medium.

Heartbeat and breathing

The director and I talked a lot about adding the characters' heartbeat and breathing sounds to emphasise their nervousness. The idea was also to make the bodily sounds react to the spectator's excitement level derived from the EEG system.

When using a first-person perspective there is, in my opinion, a problem when playing the character's physical noises in the spectator's headphones; if the goal is to embody the spectator in the character, hearing the character making sounds can break the illusion and immersion. For me hearing the character breathing (or talking) feels like I am just a passenger inside another body. In this project the first-person characters talk for you, so according to this view we are already "passengers". Cutting between 1st person and 3rd person perspectives does not help the embodiment further. So maybe hearing the physical noises would not eventually have changed the level of immersion.

The real reason we dropped the heartbeat and breathing out was the lack of EEG data. Originally two emotional values were supposed to be measured: valence (positive−negative) and excitement (low−high). The team at the University of Nantes, however, could only develop reliable algorithms for valence, so they did not want to try guessing spectators' excitement levels, which was, I think, a good decision. As the breathing and heartbeat sounds would have been easy to link with the excitement, and now that we did not have that data, I suggested that we skip it.

There would also have been spatial sound design challenges tied to those sounds. If we heard the heartbeat pumping whenever we are in the 1st person view, should we not hear it when jumping to a 3rd person perspective? However, according to the director's vision, even the 3rd person view shows the scene as experienced by the chosen character, not by an objective beholder. Shouldn't we then still hear the heartbeat, at least to maintain the liaison between the character and the spectator? That would actually have been a nice concept to experiment with. But then, where should the sound source have been? Still inside the spectator's head, or spatialised where the character is? If the latter, that would have meant some interesting technical challenges:

The heartbeat and breathing should be interactive, reacting to the EEG data. I had already created them in Wwise and mapped them to the expected excitement data value. The heartbeat was just a loop of a heart beating, and the EEG parameter changed the playback speed. Not the best solution, but it worked fine for demo purposes. I also tried to set it up so that the parameter changes the delay between the heartbeat samples, but for some reason I did not manage to get that working properly.
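
The second approach, keeping each heartbeat sample at its natural speed and varying only the pause between samples, can be sketched outside Wwise in a few lines of Python. This is only an illustration of the timing logic: the excitement range of 0 to 1, the heart rate range and the sample length are assumptions of mine, not values from the project.

```python
def excitement_to_bpm(excitement, bpm_rest=60.0, bpm_max=140.0):
    """Map an excitement value in [0, 1] linearly to a heart rate.

    The range 60-140 bpm is an assumed, illustrative choice.
    """
    e = min(max(excitement, 0.0), 1.0)  # clamp out-of-range input values
    return bpm_rest + e * (bpm_max - bpm_rest)


def gap_between_beats(excitement, sample_length_s=0.35):
    """Silence in seconds to leave after one heartbeat sample.

    The sample always plays at its original speed and pitch; only the gap
    shrinks as excitement rises. The gap is never negative.
    """
    period_s = 60.0 / excitement_to_bpm(excitement)
    return max(period_s - sample_length_s, 0.0)
```

Compared with changing the playback speed of a loop, this keeps the timbre of the heartbeat unchanged while the tempo follows the parameter.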

I assigned two output buses for the physical sounds: Headlocked Stereo Bus for the 1st person perspective and Ambisonic Bus for the 3rd person view. Now it became difficult: how to change between the buses when the perspective changes in the film, and how to spatialise the 3rd person sound? As we dropped the heartbeat and breathing concept I did not tackle these questions further, but I did come up with some ideas:

One solution to switch between the buses would have been to insert markers in the wav files and use them in Wwise to switch between the buses (using mutes/fades). I have never used wav markers in Wwise, but I would guess that should work. If not, another way would be to write a script in Unity with timecodes synced to the video and game calls to Wwise for the bus changes.
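
The timecode idea boils down to a lookup in a list of cuts. The sketch below shows that logic in Python rather than as a Unity script, and it calls no Wwise or Unity API: the cut times are made up, and in a real project the returned bus name would drive whatever game call or mute/fade does the actual switching.

```python
import bisect

# Hypothetical cut list: (video time in seconds, perspective after the cut).
CUTS = [(0.0, "first_person"), (12.5, "third_person"), (20.0, "first_person")]

# Bus names as used in the text above.
BUS_FOR_PERSPECTIVE = {
    "first_person": "Headlocked Stereo Bus",
    "third_person": "Ambisonic Bus",
}


def active_bus(video_time_s, cuts=CUTS):
    """Return the bus that should be audible at the given video time."""
    times = [t for t, _ in cuts]
    # Index of the last cut at or before video_time_s.
    i = bisect.bisect_right(times, video_time_s) - 1
    i = max(i, 0)  # before the first cut, fall back to the first entry
    return BUS_FOR_PERSPECTIVE[cuts[i][1]]
```

Polling this once per frame with the current video time and reacting only when the result changes would be one simple way to trigger the fades.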

Spatialising the 3rd person perspective sounds would have been a bit more complex. I could not think of any other way than creating a Unity object that follows the character in the film and linking the sound to that object.

Foleys and spot effects

I did not produce many foley recordings for the films, mainly due to the tight time schedule, but I did spend quite some time remaking Karl's dyskinetic hand sounds, which are an important part of his character. Whenever Karl is sitting he keeps playing with his hands, and that makes a distinctive sound. It was of course captured by the lavalier he was wearing, but as his clothes were very noisy the sound was always accompanied by irritating clothing rustles. I also cut everything but the characters' dialogue out of the lavalier tracks to remove cross-talk, so the hands needed to be recreated anyway.

Martin had recorded wild hand foleys with Karl's actor on location, but they were performed at the wrong tempo and were unusable with the image. Hence I had to perform the hand foleys myself.

As I did not have much experience in foley recording, the session was fun but at the same time rather difficult, resulting in a lot of editing afterwards. The sound studio was very noisy with the ventilation hum, so heavy denoising was needed once again.

I also did some other foleys, especially clothing sounds and footsteps for places that were lacking sound or where I wanted to boost the existing sounds. I also used some wild location foley recordings done by Martin.

In addition to performed foleys I created some spot effects using my own recordings including door locking and chandelier tingling.

Choice sequence

The choice sequence has three sound elements:

- Background music

- Freud's voice

- Freud's and Karl's foleys

The music is an 8-bar loop playing from Wwise in headlocked stereo. It starts in the previous sequence, the prologue, and continues over the scene change. On top of the music there is Freud's voice which is also coming from Wwise.

When gazing at Karl or Freud one can hear them making some sounds. Unlike all the other sounds in the project these foleys are played back from Unity. This was asked by the students in Nantes as they were programming the Unity scene with the gazing and choosing functionalities. For them it was easier to fade the sounds up and down using Unity's sound engine.

I provided them with two looping sound clips, one for Karl's hand and one for Freud's mouth sounds. I edited Karl's foleys in sync with the picture, but I do not know if the synchronisation will remain in the final project. Freud's coughing and other sounds are not synchronised as he did not actually make any sounds when the choice sequence was filmed. I edited the little mouth sounds using left-over material of location recordings.

Prologue and epilogue

A big sound design challenge was the prologue and epilogue sequences. They are audio-only scenes with monologue, foleys and some sound effects spatialised in a 3D space.

The sequences are set in Freud's apartment in 1938 Vienna. There are German troops on the streets heard through a window, and Freud is inside the room, talking about the start of his career and his mixed experiences with the young patient Karl. He is smoking a cigar, walking around the spectator and waiting for his imminent departure into exile.

Originally these two scenes were to be recorded in the first week after the film shoot in January. The idea was to use the same location, the château in Nantes, and record the scenes "live" so that Freud's actor William Flaherty would walk around the SoundField microphone while talking and smoking the cigar. The performance would have been more or less natural. In addition to the Ambisonic microphone the monologue could have been recorded with a boom and even with a lavalier.

However, after the shoot was finished the production team decided that there was no time for the prologue and epilogue recordings. They had to be recorded separately afterwards.

That happened at the end of March, and the task was given to me. The director and I discussed whether it would be better to record the monologue in a studio environment and create the foleys separately, but I liked the idea of recording the scenes as they were originally meant to be executed. I was sure that a more authentic performance could be achieved. However, I wanted the room to have a wooden floor and as little background noise (traffic, ventilation, etc.) as possible. Footsteps on a wooden floor would give an idea of an old apartment, maybe in Vienna, and low background noise would just make everything easier. Strangely enough, neither the producer nor the director could find any place fitting those requirements in either Nantes or Le Mans. As Freud's actor was based in Nantes, we finally ended up recording the scenes in one of the office rooms of the production company's building. It did not have a wooden floor, it was very small and there was ventilation and traffic noise. But there we were.

I placed the SoundField Ambisonic microphone in the middle of the small room and rigged a Samsung 360 camera on top of that. My idea was to use the camera for reference when spatialising the sounds. I also had a boom microphone rig (Neumann KMR 81) to capture the monologue in closeup. I didn't set up a wireless lavalier, although that would have been a good backup (and actually a very good alternative to the boom in this situation, but I will talk about that in a minute).

Without the wooden floor there was no point in the actor wearing shoes, as I would have captured just the wrong kind of footsteps. Hence I suggested that William walk in his socks. The small room caused problems, as the actor had to walk around the Ambisonic mic at a very close distance. That didn't sound good when listening afterwards. There should have been at least one or even two meters between the mic and the actor to create a natural spatial feeling. The ventilation and traffic noises also caused some extra work, as I had to "iZotope" all the material. With the boom mic that was not a problem, but with the 4-track Ambisonic material it was quite slow, as I had to process the tracks separately in RX and then put them back together.

With three of us in the room, the director, the actor and me, we spent some two hours recording the two scenes with multiple takes. The actor was walking around the mic while speaking his lines and fake-smoking the cigar. I was booming and trying not to make any sounds with my movements. That was not successful. When listening to the Ambisonic material afterwards I could clearly hear myself moving around the mic. A lavalier instead of the boom would maybe have been a better choice.

The 360 camera turned out to be a problem as well, as it overheated after a couple of minutes of recording. As we were on a tight schedule, we just had to let it cool down and keep recording without it. Instead of the 360 camera the director used her mobile phone to make videos of the recordings for reference, but all she managed to capture was a few still images. So the idea of using video as a reference for spatialising could not be tested.

After the recording session I sent the boom mic material to the director, and she chose the best takes. The next week, back in Le Mans, we listened to the material together and edited a working version from multiple takes. The priority was the performance, which meant that the movements and positioning of the actor were not always consistent from one take to another. I said that it was OK and that I would try to adjust the rotation of the Ambisonic material to glue the takes together. However, due to the huge differences in relative distance to the mic, it turned out to be very difficult to match the positions and movements.
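
For reference, rotating first-order Ambisonic material around the vertical axis only remixes the two horizontal channels, which is why it can correct the direction of a take but not its distance to the microphone. Below is a minimal Python sketch of that rotation for a single sample frame. It assumes B-format channels named W, X, Y and Z with X pointing to the front and Y to the left; channel order and normalisation differ between formats, so treat it as an illustration rather than as what any particular plugin does internally.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate one first-order B-format sample frame around the vertical axis.

    Assumes X points to the front and Y to the left, so a positive angle
    turns the sound field counter-clockwise (towards the left) seen from above.
    W (omni) and Z (height) are unaffected by a yaw rotation.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z
```

A source encoded straight ahead (X = 1, Y = 0) ends up on the left (X close to 0, Y close to 1) after a rotation of 90 degrees, while its level in W stays the same.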

As there were also the "ghost" sounds of myself moving with the boom, which were difficult to remove, I decided to abandon the Ambisonic tracks for good.

From that point on I created the scenes entirely from the boom mic material. That gave me the liberty to recreate Freud's movements in the room. I believe the performance was still better done in a "real" location compared to a studio, but I would still have preferred the original approach with a proper location.

The sonic elements in the scenes are Freud's voice, his clothing sounds and footsteps, a clock ticking in the room, and street sounds coming through a window. Freud's voice was supposed to sound older than in the films, as the films are set in 1891 and the prologue and epilogue happen in 1938. I was told that there are plugins that can change a person's voice to sound older, but I had never used one. The plugin was apparently T.R.A.X. by IRCAM. I downloaded the trial version and tested it, but couldn't create believable results. So we decided to use the actor's voice as it was.

The clothing and footstep foleys were recorded separately, as we couldn't capture them in Nantes. Compared to the difficulties the producer and director had with finding a location with a wooden floor and little background noise, I was surprised how easy it was for me as a foreigner to find a place in Le Mans that met those two requirements. It was the home of an ex-student now working as a staff member at the ESBA-TALM school. He kindly let me use his home, and actually performed the foleys! They worked very well. Although the room was not very big, we could easily have recorded the prologue and epilogue there with the original plan...

The street sounds were the most important element to set the scene in a city apartment in 1938 Vienna. I used the same street atmo I had created for the 1891 film scenes, but added a dog, a distant church bell and some cars. With the cars it was difficult not to give too modern a feeling. For the German military presence I used some library files of marching soldiers. At first I had some trucks there too, but they didn't work too well. Trams would have been a nice addition, too, but in the end I didn't have time to research and find suitable sounds.

For the spatialisation in Reaper I used the IEM Room Encoder plugin. I did test the just-released dearVR VST plugin with a trial license, but nevertheless decided to give the IEM plugin a chance. It sounded OK, although the elevation accuracy was blurry. That of course also depended on the binaural decoders at hand. When creating the scenes in Reaper I used the IEM Binaural Decoder out of all the numerous alternatives. When playing back the scenes from Wwise, the Auro-Headphones binauraliser interpreted the signal in its own way.

Later it was decided that the spectator should see a computer-generated smoke cloud following Freud's movements. As the cloud was created in Unity by the Nantes team, I was asked to deliver the position data to them. I didn't find any way of exporting the automation data from Reaper, which would have been convenient. I had already started to write the values of the automation points manually into a text document, but as that was tedious, I decided to take screen capture videos of the IEM Room Encoder window with the X and Y coordinates changing while the scenes progressed. I sent the videos to Nantes, but I haven't since heard what happened to the smoke effect.

Had I known about the smoke effect well in advance, I could have created the whole scenes straight in Unity using moving objects. That would have allowed me to use the dearVR spatialiser in Unity with a much more authentic spatial image.
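
Whatever the export route, the position data itself is small: a list of time-stamped X and Y values. As an illustration only, here is a Python sketch of a hand-off format the receiving side could read, plus linear interpolation between the points. The column names and the keyframe layout are assumptions of mine, not a format that Reaper, the IEM plugins or Unity define.

```python
import csv
import io


def keyframes_to_csv(keyframes):
    """Serialise (time_s, x, y) keyframes as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time_s", "x", "y"])
    writer.writerows(keyframes)
    return buf.getvalue()


def position_at(keyframes, t):
    """Linearly interpolate the (x, y) position at time t.

    Expects keyframes as tuples sorted by strictly increasing time.
    Before the first or after the last keyframe the nearest position is held.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1:]
    for (t0, x0, y0), (t1, x1, y1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            f = (t - t0) / (t1 - t0)
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return keyframes[-1][1:]
```

A few dozen such points per scene would probably have been enough for a slowly moving smoke cloud.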

EDIT: Being unsatisfied with the quality of spatialisation (especially the lack of externalisation) I later recreated the scenes in Unity using animated objects and dearVR spatialiser.

The prologue and epilogue took me a lot of time to make, but in the end I am quite satisfied with how they came out. I like the sombre feeling in them, which matches the overall mood of the whole experience. I just hope the smoke effects in Unity will fit and be in dialogue with the scenes!

Introduction #1 and #2

At the beginning of the Émotive VR experience there are two introductions. The first one welcomes the spectator to the experience and briefly describes the EEG calibration to be conducted. The second introduction is played after the calibration and gives more details about the EEG measurements.

The introduction has a simple voice-over, but the director wanted some "neutral" background sounds to create a relaxed atmosphere in preparation for the upcoming calibration. Visually there is a 360 animation of neurones flashing from node to node inside a brain.

The voice-over was recorded at the ESBA-TALM school in their sound studio, which is actually just an editing suite with some acoustic damping on the walls and ceiling, but not much sound isolation. There is no vocal booth or separate recording room. A student with theatre experience and a nice voice read the text, and I recorded with an AKG C4500 studio microphone. The mic was new to me, and afterwards I noticed that there was a strange metallic texture in the sound. Maybe another mic would have worked better. With some EQ I managed to smooth the sound a little bit, but some funny tone remained.

There was quite loud ventilation in the room, which forced me to once more use RX for denoising.

The background music is something I am not very happy with. It consists of two quite generic synth pads spatialised in Wwise so that the sound sources are on opposite sides of the listener. When turning one's head one should sense a little bit of spatialisation within the musical bed. As I have already mentioned, this introduction "music" is not consistent with the rest of the musical material in the experience, but it has a different purpose: to neutralise the atmosphere before the actual story begins. Synthesisers might also fit well with the computer-generated 360 animation of flashing neurones, maybe better than Schönberg-influenced theremin, piano and string music.

For the neurones I created several short theremin glissandos that I exported to Wwise, randomised them in terms of order and pitch, and randomly spatialised them in 3D. Only weeks after creating these neurone effects Christophe made the animation which doesn't really match the rhythm of my theremin sounds. However, it will serve its purpose.
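
Outside Wwise, the same idea of randomising which clip plays, at what pitch and from which direction can be sketched in Python as below. The clip names, the pitch range and the dictionary layout are hypothetical; this does not reproduce how Wwise implements its randomisation.

```python
import random


def neurone_event(clips, rng, max_detune_cents=200.0):
    """Pick one glissando clip with a random pitch offset and direction.

    Note: drawing elevation uniformly in degrees clusters events towards the
    poles; that is accepted here for the sake of a short sketch.
    """
    return {
        "clip": rng.choice(clips),
        "pitch_cents": rng.uniform(-max_detune_cents, max_detune_cents),
        "azimuth_deg": rng.uniform(-180.0, 180.0),
        "elevation_deg": rng.uniform(-90.0, 90.0),
    }


# Usage: a seeded generator makes a test run repeatable.
rng = random.Random(1)
events = [neurone_event(["gliss_01.wav", "gliss_02.wav"], rng) for _ in range(5)]
```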
