Audio spatialisation (positioning sounds in the virtual 3D space) should probably be one of the last steps in a 360 film post-production workflow, something that is started only after the final picture edits and azimuths are locked. Nevertheless, I started testing spatialisation as soon as I got hold of material from the location shooting. The reason for starting prematurely was that I wanted to master the workflows and software before receiving the actual edited film material (let alone the final picture edits, which were completed not long before my time was up in Le Mans).
DVgroup suggested that I use FB360 Spatial Workstation for the spatialisation work. This plugin and software package by Facebook seems to be the most widely used spatialisation package at the moment. Although it is free, it has quite a complete set of features, including support for 2nd order Ambisonics, cross-platform compatibility, graphical spatialisation on top of the video image, and real-time VR monitoring using an HMD (Head-Mounted Display), in this case an HTC Vive. In reality, however, FB360 turned out to be buggy and limited in some features, picky with video formats and quite processor-demanding. The built-in binauraliser and its room modelling do not sound very convincing either, but that was not a big issue for me as the final audio would be played from Wwise using its binaural encoder plugin.
I tried to find an alternative to FB360 with a similar feature set, but it was difficult. Right at the beginning of the project, in February, I contacted the French company Aspic, who were about to release a beta version of their CinematicVR package for developers in a month or so. Unfortunately the release was delayed, and it came out only at the end of June. I don't know how good the software package is, but it would have been nice to try it out in this project. Later I learned that it is Windows-only, and as I was working mainly on a Mac that would have been a bit of a problem, though not a deal breaker.
Another solution, though one without real-time VR monitoring, would have been the AudioEase 360pan suite. AudioEase is famous for their Altiverb convolution reverb plugin used by professional film sound editors, so I would assume the quality of 360pan is very good, too. However, the price was too much for the budget, and I wanted to test real-time monitoring with the Vive headset, so I proceeded with FB360.
The project has two audio-only scenes, the prologue and the epilogue. For those I was free to try any spatialiser solution, as there was no need to sync anything to video. I could have animated the audio objects in Unity and used a binaural spatialiser there, which in retrospect would have been the best option, but I ended up creating the scenes in Reaper using VST plugins and exporting the audio in 2nd order Ambisonics, as I did with the film sequences. EDIT: Later I did just that: I recreated the scenes in Unity using animated audio objects and the dearVR spatialiser plugin.
For Unity, the best spatialiser I have tested so far is dearVR. It sounds far more realistic than any other spatialiser I have experience with. Conveniently, dearVR released their VST plugin during my internship, so I immediately downloaded a demo, and it sounded amazing. However, the price was a bit too much for this project, so I started to look for other options.
An interesting choice would have been the SPARTA collection by the Aalto University Acoustic Laboratory. Some of the plugins in the collection use the VBAP method developed by Ville Pulkki's research groups over the years. However, I got hold of the plugin set only after my internship, so I could not utilise it for this project.
After some tests with several plugins I ended up using the IEM Plug-in Suite by the Institut für Elektronische Musik und Akustik in Graz, Austria. It sounds quite nice, although the reflections of the room modelling feel a bit metallic to my ears, and it turned out to have some small, irritating bugs, but nothing too serious.
In Reaper I had three kinds of tracks: 2D tracks, 3D tracks and Ambisonic tracks. They were routed to either a head-locked stereo bus or a 3D master bus carrying 2nd order Ambisonics.
2D tracks were either mono or stereo and they contained head-locked audio:
- subjective dialogue (voice of the first-person character as if heard inside the spectator's head)
- subjective hallucination effects
- linear, non-spatialised music (only one short segment; all the other music was interactive and created in Wwise)
2D tracks were routed to a stereo bus named "HEADLOCKED". That stem was exported to Wwise as a stereo wav file and played back with no spatialisation.
3D tracks contained all the audio that was supposed to be spatialised in 3D. The source material was mainly mono, but the tracks needed 9 channels to carry the 2nd order Ambisonic signal produced by the spatialiser. The 3D tracks contained:
- dialogue
- foleys
- spot effects
Each 3D track had an FB360 Spatialiser plugin inserted on it. The plugin enabled me to position the audio source in the 3D space using the video image as a reference, and with automation I could move the position dynamically. In most of the tracks I also had an EQ and a compressor before the spatialiser, and some tracks had other effects, too.
The 3D tracks were set to 10 channels, of which 9 were used for the AmbiX format (Reaper doesn't support 9-channel tracks; the closest option is 10). These tracks were first routed to a "3D ROOM REV" bus where I added a small amount of room reverb to all of the 3D tracks. I used IEM's FDNReverb, which is capable of handling multichannel audio. Although a somewhat "unorthodox" method, combined with the real room sound coming from the SoundField mic tracks it gave quite nice results.
From the 3D ROOM REV bus the audio was routed to the 3D MASTER. This track had only a loudness meter on it (FB360 Mix Loudness). The exported 9-channel wav file of this stem was imported into Wwise and played back in sync with the video.
Ambisonic tracks contained 1st order Ambisonic audio (4 channels) from the SoundField microphone. They were edited to match the dialogue tracks.
The Ambisonic tracks were also routed to the 3D MASTER. They were originally in 1st order FuMa format but were converted to AmbiX (ACN channel order with SN3D normalisation) with the ATK plugin by the Ambisonic Toolkit community. Unlike in FuMa, in ACN the channel order stays the same when harmonic components are added or removed, so there was no problem routing the 4-channel 1st order material to the 9-channel 2nd order 3D MASTER track.
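For the curious, the first-order conversion itself is just a channel reorder plus one gain change. Here is a minimal sketch of the standard FuMa-to-AmbiX mapping; this is not the ATK plugin's internal code, just the textbook matrix, and it assumes the audio block is a NumPy array with one row per channel:

```python
import numpy as np

# First-order FuMa is ordered W, X, Y, Z with W attenuated by 1/sqrt(2);
# AmbiX uses ACN order (W, Y, Z, X) with SN3D weights, so at first order
# the only gain change needed is restoring W.
FUMA_TO_AMBIX = np.array([
    [np.sqrt(2), 0, 0, 0],  # ACN 0: W
    [0,          0, 1, 0],  # ACN 1: Y
    [0,          0, 0, 1],  # ACN 2: Z
    [0,          1, 0, 0],  # ACN 3: X
])

def fuma_to_ambix(block: np.ndarray) -> np.ndarray:
    """Convert a (4, n_samples) first-order FuMa block to AmbiX."""
    return FUMA_TO_AMBIX @ block
```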
For monitoring, the 3D MASTER was routed to a bus named BINAURAL CONTROL, which housed the FB360 Control plugin. It is a binaural spatialiser that reacts to the head position. The head position data comes from the FB360 Video Player, which plays back the video in sync with Reaper. Without a VR headset one can "move the head" with the mouse; with a connected HMD the data comes from the headset. I will talk more about that later.
In my opinion the FB360 Control plugin's binaural modelling sounds flat, but I used it only for monitoring purposes. The actual binaural modelling happens inside Wwise.
The biggest headaches during the project were caused by the FB360 Spatial Workstation. For a free toolkit it has great features, but there are also many problems and bugs. The biggest issue is the video engine: the FB360 Video Player works only with the DNxHR codec, while Reaper, on the other hand, worked nicely only with the DNxHD codec. At least that was the case with my computer and setup. Therefore I needed to convert all the 360 videos to both formats. For DNxHD I used MPEG Streamclip, and for DNxHR I used ffmpeg in the Mac Terminal. I created an Apple Automator script to make the conversion process easier, but these conversions still always took quite a lot of time.
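As an illustration of what the DNxHR pass boiled down to, here is a minimal batch-conversion sketch in Python wrapped around the same kind of ffmpeg call; the folder names and the dnxhr_hq profile are only placeholders, not the exact settings I used:

```python
import pathlib
import subprocess

SRC = pathlib.Path("360_masters")      # hypothetical input folder
DST = pathlib.Path("dnxhr_for_fb360")  # hypothetical output folder
DST.mkdir(exist_ok=True)

for src in SRC.glob("*.mov"):
    subprocess.run(
        [
            "ffmpeg", "-i", str(src),
            "-c:v", "dnxhd", "-profile:v", "dnxhr_hq",  # DNxHR via ffmpeg's dnxhd encoder
            "-pix_fmt", "yuv422p",
            "-c:a", "pcm_s16le",                        # keep a plain PCM audio track
            str(DST / src.name),
        ],
        check=True,
    )
```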
However, even with the right codec the videos ran poorly: often the video just did not play at all, and when it did it took a couple of seconds after hitting play before the video was synced.
When the project grew bigger and the number of plugins increased, my laptop (12" MacBook Pro, late 2013, with 8 GB of RAM) just could not handle the workload. I spent almost two days just transferring the project to two other computers at ESBA-TALM, but that did not help much: the studio computer was a Mac Mini with a similar capacity to my laptop, and the studio was not often free for my use. The other machine was a Mac Pro, which was capable enough to run the project, but it was occupied by Christophe for picture editing, stitching and 3D modelling.
Using the FB360 suite also more or less requires multiple screens, as the plugin windows are quite big and one has to see the Reaper timeline, the video window and the spatialiser window simultaneously. So I could not do much with the laptop alone, but always needed to connect it to an external screen, preferably two. That might have affected the performance, too.
As mentioned, one of the key reasons I chose FB360 was the ability to do real-time monitoring with a VR headset. In the end, however, I did not really get the opportunity to utilise that feature.
The setup was a bit complicated. The FB360 VR monitoring works only on Windows, but I was editing everything on a Mac. Transferring all of my material to a Windows machine would have been very difficult and time-consuming, and there were no suitable machines available for me to use. However, FB360 allows a connection between a Mac and a Windows machine over a local network, so I was able to connect a Vive set to a Windows laptop, load the video into the FB360 Video Player on that machine and run it in sync with the Reaper project on my Mac. This worked perfectly the one time I tested it, but later, when I really needed the feature, the FB360 Video Player refused to play back the videos. After several reinstallations and Google searches I gave up.
We also had only one HTC Vive system at our disposal, and it was almost always in use, either for editing or by other people for their own projects, so even if I had managed to get real-time monitoring working, it is unlikely the headset would have been available.
The ability to mix and adjust audio spatialisation while wearing an HMD would have been very useful, almost essential, in this kind of project, but unfortunately it did not work out this time. Having said that, I have to admit that during the one test I managed to conduct, the process felt quite cumbersome: make an adjustment in Reaper, put on the headset and headphones, find the spacebar to start playback, audition the sequence, stop Reaper, take off the headset and headphones, make a readjustment, and repeat everything. There should definitely be mixing and playback controls inside VR so that one can make adjustments without taking the HMD off. I have not tested it, but I believe the Aspic system has something like that, and as far as I know dearVR offers a similar setup.
Starting the spatialisation early without waiting for the final edits was of course a risk, but schedule-wise it was necessary as the final edits came very late. The problem was that the azimuths, or yaws, of the 360 shots (the horizontal direction of the gaze) were changed many times over the different edit versions. In a "traditional" 360 film with only one or a few camera positions that would not have been a big issue, but in this project the multiple camera angles and constantly changing azimuths meant that I had to keep re-spatialising the 3D tracks and rotating the Ambisonic audio to match the updated azimuths.
In an ideal project the spatialisation (and sound editing in general) would start only after the picture edit is locked, but in real life that is not always possible. Maybe video editing software manufacturers should add a feature that embeds azimuth reference data in the video clips: when the azimuth is changed in the editing software and the new picture edit is opened in the DAW, the updated azimuth data would be available and could be applied automatically to the audio spatialisation.
For me it was also difficult to rotate the Ambisonic audio to match the visual azimuth; I had nothing to trust but my ears, which worked OK but was slow. During the shoot I always placed the microphone carefully so that it faced the same direction as the main camera or the 0 degree line, but after several azimuth changes during the video editing and stitching that 0 degree line was not always known anymore. A simple solution would have been to take a still photo in the direction of the Ambisonic microphone's 0 degree line; that reference image could then have been checked against the video image when making the final rotation adjustments. Or one could use some kind of shared compass/coordinate system for both the camera and audio rigs while setting them up.
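The rotation itself is trivial at first order; the hard part was only knowing the correct angle. As an illustration, here is a minimal sketch of a yaw rotation of a first-order AmbiX (ACN/SN3D) signal, roughly what an Ambisonic rotator plugin does in the horizontal plane (the block layout is an assumption, and the sign convention for positive yaw varies between tools):

```python
import numpy as np

def rotate_yaw_foa(block: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Rotate a (4, n_samples) first-order AmbiX block around the vertical axis.

    W (ACN 0) and Z (ACN 2) are unaffected by a yaw rotation;
    only Y (ACN 1) and X (ACN 3) mix with each other.
    """
    a = np.radians(yaw_deg)
    w, y, z, x = block
    out = np.empty_like(block)
    out[0] = w
    out[1] = y * np.cos(a) + x * np.sin(a)  # rotated Y
    out[2] = z
    out[3] = x * np.cos(a) - y * np.sin(a)  # rotated X
    return out
```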