Keywords: 360 film, VR, interactive, music, 3D audio, film sound
Roles: sound editor, sound designer, composer
Émotive VR is a French 360º film project where a spectator's emotions are measured with EEG sensors and used to control musical and visual elements in real time. I've been involved in the project since January 2018 as the sound editor, sound designer and composer, through an internship programme from Aalto University, Helsinki.
In this post I will share my experiences and go through the workflows I used to create the sound and music of Émotive VR. As the project is very complex and I've been jumping between multiple roles, this report will also be quite lengthy (and it isn't actually finished yet; I will write more as soon as I have a bit more time).
I will start with an overall description of the project, and then dig deeper into several areas and topics. I will also try to give some reflection on my own work and how things could be made better next time.
Freud, la dernière hypnose
In 2014 the French filmmaker Marie-Laure Cazin realised Cinéma émotif, an interactive film project where the storyline of her film "Mademoiselle Paradis" changed according to spectators' emotions. A couple of selected audience members wore EEG headsets, and their valence (positive–negative) and excitement levels were measured as they watched the film. This data was then used to change the course of the film in real time, together with some audio effects on the soundtrack.
Émotive VR is a similar project, however this time the spectator will be inside a 360º film called “Freud – La dernière hypnose”. The film takes the audience into Sigmund Freud’s last hypnosis session with his young patient Karl. The story is based on Jean-Paul Sartre’s text Scénario Freud.
Unlike in Cinéma émotif, this time the emotions won't change the storyline; instead, they will affect the music and some visual elements in certain parts of the film. The spectator is placed in either Freud's or Karl's subjective position, although there are intercuts to objective (third-person or "ghost") vantage points, too.
The topics of hypnosis, psychoanalysis, self-knowledge, and psychology in general are extremely interesting in the context of VR, and exploring their relationships in terms of sound design would have been an intriguing challenge for me. Unfortunately, it turned out that I didn't have any extra time or energy to research or explore the topic; my challenges lay more in learning all the new tools and workflows and trying to finish all the sound and music elements before the end of my contract. That goal (of course) wasn't achieved; I'm still working on the project remotely, and will later be consulting with the French team about the Unity/Wwise audio integration.
The core components of Émotive VR are two 10–11 minute cinematic, monoscopic 360º videos with actors and dialogue. These linear film sequences are complemented with spatialised sound design and interactive music. The videos run in the Unity game engine, which receives control parameters from the EEG system. Audio in 2nd order Ambisonics and headlocked stereo runs separately but (hopefully) in sync in the Wwise audio engine. The VR platform is HTC Vive / SteamVR.
Émotive VR is a joint research project by the École supérieure des beaux-arts TALM (ESBA-TALM), Le Laboratoire des Sciences du Numérique (LS2N) of the University of Nantes / Polytech Nantes, RFI Ouest Industries Créatives, the production company Le Crabe Fantôme in Nantes, and the VR production company DVgroup in Paris.
My physical location was Le Mans, at the ESBA-TALM art school. I worked closely with the other intern, Christophe Rey, whose job was editing and visual design. From time to time we travelled to Nantes to meet the Polytech research group working on the EEG algorithms and Unity.
Structure of the experience
I will now go briefly through the structure of the experience. Later I will come back to details of how they were realised.
- Set up
The trip starts with the spectator sitting down on a swivel chair. An EEG headset (by Emotiv) is installed, after which an HTC Vive and headphones are put on. Software related to the EEG measurements is running, and the first scene of the Émotive VR Unity project is started.
- Introduction #1
A web of neural connections surrounds the viewer in all directions. Short, high-pitched "sounds of neurons" whistle around. A steady synth pad grounds the soundscape, with the intention of creating a relaxed feeling. After a few seconds a female voice welcomes the spectator to the experience and explains the EEG measurements and the calibration that will take place next.
Audio/music workflow in a nutshell:
- VO recorded into Pro Tools, cleaned with iZotope RX, edited, slightly EQ'd and compressed, exported to Wwise
- Neuron sounds played with a theremin, recorded into Pro Tools, edited, exported to Wwise, played back from a random container with randomised pitch and randomised positions in 3D space
- Synth pad played with Pro Tools' Xpand!2, two separate tracks exported to Wwise, spatialised in 3D
- 2nd order Ambisonic bus used in Wwise with the Auro-3D spatialiser
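To illustrate what the random container does conceptually, here is a minimal Python sketch (not actual Wwise code) of picking a random neuron sample with a randomised pitch offset and a random position on a sphere around the listener; the function name and parameters are my own for illustration:

```python
import math
import random

def random_neuron_voice(samples, pitch_range_cents=400, radius=2.0):
    """Pick a random neuron sample with a randomised pitch offset
    and a random position on a sphere around the listener,
    mimicking a Wwise random container with pitch/position randomisers."""
    sample = random.choice(samples)
    cents = random.uniform(-pitch_range_cents, pitch_range_cents)
    playback_rate = 2 ** (cents / 1200)          # pitch offset as a rate multiplier
    azimuth = random.uniform(0, 2 * math.pi)     # anywhere around the listener
    elevation = random.uniform(-math.pi / 2, math.pi / 2)
    position = (radius * math.cos(elevation) * math.cos(azimuth),
                radius * math.cos(elevation) * math.sin(azimuth),
                radius * math.sin(elevation))
    return sample, playback_rate, position
```

In Wwise itself the same behaviour comes from the container's pitch randomiser and 3D position automation, so this is only a model of the idea.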
- Calibration
A series of IAPS images is shown to the spectator in order to calibrate the EEG sensor data for measuring valence (positive–negative). There's no audio in this sequence.
- Introduction #2
Similar to the first introduction, but this time the voice explains more about when and how the EEG measurements are used during the experience. She also welcomes the spectator to 1938 Vienna, where the frame story starts.
- Opening credits
Audio/music workflow in a nutshell:
- Piano and theremin theme, or "Freud theme", influenced by Arnold Schönberg's early works, played with Pro Tools' Mini Grand and a real theremin
- String theme, or "Karl's theme", a simple chromatic progression using a string quartet, played in Pro Tools using the Spitfire Chamber Strings sample library
- Exported to Wwise as a ready-made mix, using a headlocked stereo bus
- Prologue
This is an audio-only prologue scene where Sigmund Freud reflects on his experiences with hypnosis in the early days of his career. The spectator can hear street sounds coming from a window, a clock ticking in the room, and Freud lighting a cigar, walking around on a wooden floor and talking. Visually there's only a computer-generated trace of smoke following Freud's movements.
Audio workflow in a nutshell:
- Monologue recorded on location in Nantes with a boom and an Ambisonic microphone
- After many tests I decided to use only the boom mic; it's edited and spatialised in Reaper using the IEM RoomEncoder and 2nd order Ambisonics
- Foleys for footsteps and clothes recorded in Le Mans, spatialised in Reaper
- Street sounds constructed from my own field recordings
- Played back from Wwise with the Auro-3D spatialiser
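What an Ambisonic encoder like the RoomEncoder does with the dry boom signal can be sketched in Python: for a source at a given azimuth and elevation, it multiplies the mono signal by a set of spherical-harmonic gains, nine of them at 2nd order. The sketch below assumes the common ACN channel ordering with SN3D normalisation (the plugin's actual conventions and room modelling are more involved):

```python
import math

def sn3d_gains_2nd_order(azimuth_deg, elevation_deg):
    """Return the 9 ACN/SN3D spherical-harmonic gains (2nd order
    Ambisonics) for a point source at the given direction. A mono
    signal multiplied by these gains yields the nine Ambisonic channels."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        1.0,                                                       # W (ACN 0)
        math.sin(az) * math.cos(el),                               # Y (ACN 1)
        math.sin(el),                                              # Z (ACN 2)
        math.cos(az) * math.cos(el),                               # X (ACN 3)
        math.sqrt(3) / 2 * math.sin(2 * az) * math.cos(el) ** 2,   # V (ACN 4)
        math.sqrt(3) / 2 * math.sin(az) * math.sin(2 * el),        # T (ACN 5)
        (3 * math.sin(el) ** 2 - 1) / 2,                           # R (ACN 6)
        math.sqrt(3) / 2 * math.cos(az) * math.sin(2 * el),        # S (ACN 7)
        math.sqrt(3) / 2 * math.cos(2 * az) * math.cos(el) ** 2,   # U (ACN 8)
    ]
```

For a source straight ahead at ear level, only the W, X, R and U components are non-zero, which is a handy sanity check when debugging an Ambisonic chain.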
- Choice
The spectator finds herself in a late 19th century room with paintings on the wall, a fireplace, a bed, books on a shelf… We're now inside a 360 film. There are two characters in the room: Karl, the patient, is sitting and making dyskinetic movements with his hands; Freud is standing, now younger and in the early years of his career. The spectator is encouraged to make a choice between the two characters by gazing at one of them for a while. When gazing at Karl the spectator hears his hand foleys and fragmented breathing. When gazing at Freud the spectator hears him making little noises with his mouth: coughs, sighs, etc. There will be a visual timer to indicate when the selection is confirmed. In the background we hear a short loop of music with theremin and piano.
Audio/music workflow in a nutshell:
- Karl's and Freud's foleys and sounds edited from the wild location recordings, synced to the picture in Reaper
- Foleys to be integrated and spatialised into the Unity project using game objects and Unity's own audio engine (by the Polytech Nantes students)
- Music created in Pro Tools with theremin and the Mini Grand software instrument, exported to Wwise
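The gaze-based selection with a timer boils down to a dwell timer: the choice fires only once the gaze has rested on the same character long enough. A minimal Python sketch of that logic (the class and its parameters are my own illustration, not the project's Unity code):

```python
class GazeSelector:
    """Dwell-time gaze selection: returns a target only after the
    gaze has stayed on it continuously for `dwell_time` seconds."""

    def __init__(self, dwell_time=3.0):
        self.dwell_time = dwell_time
        self.target = None
        self.elapsed = 0.0

    def update(self, gazed_target, dt):
        """Call once per frame with the currently gazed target
        (or None) and the frame time; returns the confirmed choice
        or None while still waiting."""
        if gazed_target != self.target:
            # Gaze moved: restart the timer on the new target.
            self.target = gazed_target
            self.elapsed = 0.0
        elif gazed_target is not None:
            self.elapsed += dt
            if self.elapsed >= self.dwell_time:
                return gazed_target
        return None
```

The `elapsed / dwell_time` ratio is also exactly what a visual countdown ring would display.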
- The 360 films from Freud's and Karl's perspectives
If the spectator chooses Freud, an 11-minute 360 film starts from the perspective of Freud. If the choice goes to Karl, the same story is shown, but this time from his perspective. Almost all the audio (dialogue, ambiences, foleys, etc.) is spatialised into 2nd order Ambisonics. However, when the vantage point is "subjective", i.e. we're inside Freud's or Karl's head, we hear his voice in headlocked stereo, echoing slightly.

There's no music in the films until Freud starts his second hypnosis with Karl and they both start hallucinating. That's when the interactive part starts: the valence level of the spectator affects the music. In Freud's version the music is a simple piano progression inspired by Schönberg. The valence affects the sonic textures by adding strings and piano patterns when going to the positive side, and adding re-harmonising effects and low tones when going to the negative. In Karl's version the music is string-based and quite different from Freud's. The valence level again affects the arrangement and some rhythmic elements, and the re-harmonising effect is also used for negative feedback.

I will talk later about the challenges of making interactive music, but I have to mention here that it was a real problem trying to maintain the desired feeling and tone of the scenes while at the same time crafting enough variety into the music to allow clear feedback from the emotional data.

I'm still polishing the final versions of the interactive music tracks, and will share them here as soon as they are ready. However, without the 360 film around you and without all the dialogue and sound effects they probably won't sound very interesting…
Audio/music workflow in a nutshell:
- Dialogue recorded on location with wireless lavaliers
- SoundField ambisonic microphone running simultaneously to capture the room
- Dialogue edited and cleaned using Reaper and iZotope RX
- Foleys recorded and edited in Reaper
- Sound effects created with theremin and other sound sources, with several plugins in Reaper
- Spatialising done and synced with the 360 video using the FB360 toolkit
- After several picture re-edits, audio re-synced with Vordio
- 2nd order Ambisonic files and headlocked stereo files exported to Wwise
- Videos running in the Unity video player, synced with audio by starting the video first and the Wwise sound event 0.5 seconds later
- Music created in Pro Tools using theremin, software synths and sample libraries, with stems exported to Wwise for building the interactive music
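The valence-driven arrangement described above behaves like an RTPC crossfade over music layers. A simplified Python model of that mapping, with layer names I've made up to match the description (the actual Wwise setup uses RTPC curves on the stem buses):

```python
def valence_layer_gains(valence):
    """Map a valence value in [-1, 1] to linear gains for three
    music layer groups: a base progression that always plays,
    positive-side layers (strings, piano patterns) and
    negative-side layers (re-harmoniser, low tones)."""
    v = max(-1.0, min(1.0, valence))   # clamp out-of-range sensor values
    return {
        "base": 1.0,                   # core progression, always audible
        "positive": max(0.0, v),       # fades in as valence rises
        "negative": max(0.0, -v),      # fades in as valence falls
    }
```

A smoothing filter on the incoming valence (or Wwise's own RTPC interpolation time) is essential in practice, since raw EEG-derived values jitter far too much to drive volume directly.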
- Choice again
After one of the films has finished the spectator is taken back to the choice sequence, where she can choose to watch the other version (or the same again) or exit to the epilogue.
- Epilogue
The epilogue is again the old Freud, walking and talking in his 1938 Vienna apartment.
- End credits
The visual realisation of the end credits is still open, but one idea is to give the spectator a report of her emotional data over the whole experience. One form of feedback will be the end credit music changing according to the recorded valence level (sped up, of course, to fit the length of the credits).
In addition to being interactive, the music is also spatialised so that each instrument group is positioned around the spectator in 3D space.
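Replaying the whole session's valence history within a short credits sequence means resampling the recorded curve to the credits' length. A sketch of that, assuming one recorded value per second and simple linear interpolation:

```python
def resample_valence(recorded, credits_len):
    """Linearly resample a recorded valence curve (e.g. one value per
    second of the experience) to credits_len control values, so the
    whole emotional history replays within the end credits."""
    n = len(recorded)
    if n == 1 or credits_len == 1:
        return [recorded[0]] * credits_len
    out = []
    for i in range(credits_len):
        pos = i * (n - 1) / (credits_len - 1)   # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(recorded[lo] * (1 - frac) + recorded[hi] * frac)
    return out
```

The resampled values can then drive the same valence-to-music mapping used during the film, just on a compressed timeline.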
Next I will talk about different areas in more detail. I’ll start with the location sound recording as that was the first thing I encountered in the project. After that some words on starting the post-production, but about the rest of the areas I will write a bit later. Thanks for reading this far!
Location sound recording for a 360 film
The 360 video sequences were shot over three days with a professional film crew in a small château in Nantes. The actors were alone in one of the rooms decorated to match the film’s narrative, and the crew were behind the wall in an adjacent room.
The camera rig was a custom-built array of eight GoPros. An additional small Samsung Gear 360 camera was used to provide real-time monitoring for the director.
As no cables, mics or crew members were allowed to be visible in any direction (except below the camera, an area that was refilmed after each camera position to mask the tripod and other camera equipment), booming the actors was not an option. Therefore the dialogue was recorded only with wireless lavalier mics. Not so surprisingly, there were a lot of problems with clothing noise, but otherwise the dialogue came through clean and nice for the most part. The production sound mixer Martin Gracineu used Wisycom transmitters and receivers, Sanken COS-11D capsules, and a Sound Devices 633 recorder.
After each scene Martin recorded wild foleys with a "real" microphone (a Schoeps CMC 6 with an MK41 supercardioid capsule). The Schoeps was also used for some off-screen dialogue tracks, as well as for one special-effects scene where only a small portion of the 360 camera image was to be used, so the mic boom and the rest of the crew didn't ruin the shot.
My task was to record the scenes in Ambisonics using a SoundField ST450 MKII microphone. The original idea was to use Ambisonics only for the two audio-only scenes. In these prologue and epilogue sequences Freud walks around the spectator in darkness while delivering his monologue. The boom mic would capture the voice, and the ambisonic mic would capture the spatial ambience. However, as we had the SoundField mic at our disposal, we decided to try it for all the scenes, just in case. Hence, for each shot I carefully rigged the mic and audio bag under the GoPro array, in the limited space between battery packs, cables and the Samsung camera.
The ambisonic material turned out to be useful, providing a nice sense of perspective and space with authentic room reverberation. Even though editing the ambisonic tracks in addition to the normal dialogue tracks, and rotating them to match the picture azimuth, gave me some extra work, I would prefer this approach over trying to reproduce the same with plugins.
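The azimuth rotation mentioned above is a simple operation on B-format: W and Z are invariant under yaw, while X and Y rotate like a 2D vector. A first-order Python sketch (sign conventions vary between toolchains, so treat this as the principle rather than a drop-in DSP block; 2nd order adds analogous rotations for the higher channels):

```python
import math

def rotate_bformat_yaw(w, x, y, z, angle_deg):
    """Rotate one first-order B-format sample frame around the
    vertical axis by angle_deg. W (omni) and Z (vertical) are
    unaffected by yaw; X and Y rotate together."""
    a = math.radians(angle_deg)
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return w, xr, yr, z
```

In practice I did this with a rotator plugin on the ambisonic track, dialling in the offset between the mic's front and the stitched video's forward direction.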
The four ambisonic B-format channels from the ST450 control unit were recorded into a Sonosax SX-R4+. It has built-in WiFi for remote control and metering using a web browser on a computer or phone. That turned out to be very useful, as we had to be in another room, and running in and out of the set between takes would have been impractical. However, the wireless connection was extremely unreliable through the wall, and for one or two scenes I was forced to operate the recorder manually. There was also no way to monitor the audio over the WiFi, so I had to "instrument fly" and let the recorder roll without listening. A wireless IEM system would have solved the problem, but we didn't have an extra set available. Still, as the mic was stationary the whole time and I had set the gains quite conservatively (the Sonosax, by the way, has no limiters), there was not much to monitor.
In the end we didn't have time to record the prologue and epilogue, so those recordings were done separately at the end of March.
Pre-stitching and editing the videos
After the shooting was completed, the material from the eight GoPros needed to be stitched together to create the 360 videos. During the following weeks my fellow intern Christophe, together with another intern at DVgroup in Paris, pre-stitched the material using Kolor's Autopano software. Audio from the internal mic of one of the GoPros was selected as the guide audio track for the video clips.
However, no audio from the production sound recorders was synced to the stitched videos. Editing was done with the GoPro sound, so, for example, any dialogue recorded off-set was not audible. Only after the editing had started, when the director knew which takes would be used, was I asked to deliver mixdowns of the dialogue tracks for each stitched video clip, to be synced on the Premiere Pro timeline. I prepared the mixdowns in Reaper, syncing them with the slate, and exported them as mono files. (Later I realised that we lost some valuable metadata in the process that could have been useful when the sound editing started.)
This gave Christophe some extra work, as he had to take extra steps to attach external audio to already-edited video clips. It was also my mistake not to deliver him multichannel audio files at this point, but only the mono mixdown. Even though the mixdown served the picture edit fine, it caused me extra work afterwards, as I needed to sync the individual mic tracks manually in the DAW when starting the dialogue edit.
Although it wouldn't have solved my syncing problems in sound editing, taking the mixdown track from the field recorder and aligning it with the video in Autopano would have been a quick way to get at least clean audio for the picture edit.
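The alignment itself, whether done against a slate or automatically, amounts to finding the lag at which the external recording best matches the guide track. A brute-force Python sketch of cross-correlation alignment (real tools use FFT-based correlation on envelope or filtered signals, but the principle is the same):

```python
def best_offset(reference, clip, max_lag):
    """Estimate the lag (in samples) at which `clip` best aligns
    with `reference` by brute-force cross-correlation over
    lags in [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, c in enumerate(clip):
            j = i + lag
            if 0 <= j < len(reference):
                score += reference[j] * c   # dot product at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With 48 kHz material you would run this on downsampled envelopes first and refine around the coarse match; otherwise the search is far too slow.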
During these first weeks, while Marie-Laure and Christophe were editing, I spent my days digging deeper into 3D audio workflows, software and plugins, and I also started to create the music material. Christophe and I also spent a few days installing SteamVR and the HTC Vive on a Mac Pro and checking that the VR headset and 360 videos worked on all the computers and in the required software. I will talk more about these issues later, as they took on a concrete role further into the production.
Sound editing begins
At the beginning of March the rough edits were finally done. The easiest way for me to start the dialogue and sound editing would have been to take an AAF or OMF export from Premiere and open it in Reaper. That way I would have had all the raw dialogue audio synced and trimmed to the picture edits while preserving the original clip lengths, so I could easily manipulate and replace any audio, even going back to the slate frame if needed.
But Reaper does not understand AAF or OMF, which is unbelievable! The AATranslator converter software should do the trick, but it's not a cheap option. Vordio can convert a Premiere XML export into a Reaper project, and while it's not free either, that's the solution I went for.
However, as I already mentioned, there were no original mic tracks in the video edit project, only the mixdown. So I still needed to sync the individual tracks manually before starting the dialogue editing. This must be streamlined for the next project!
To be continued…
More stuff coming soon. Stay tuned!