Audio augmented reality (AAR) isn't simply an auditory medium; it's a multi-sensory, embodied medium that merges virtual sounds with the real world. Some AAR systems go even further, removing real-world sounds from one's personal perception or replacing them. While it is often associated with audio walks, situational awareness, and all kinds of informative applications, AAR is also a powerful medium for art, entertainment, and storytelling in its own right.
In this book, published by Routledge in May 2025—and apparently the first to cover the subject comprehensively—I examine AAR from conceptual, historical, technical, and creative angles with a strong emphasis on narrative design. I frame AAR as an overarching concept, encompassing various definitions for meaningful blends of virtual and real sounds. I propose a continuum where audio augmentation features are gradually introduced to an application, with no definite point at which something becomes AAR.
I advocate for an interpretation where the content of an AAR experience doesn't necessarily require digital mediation. The idea of embedding artificial sounds in reality reaches back to prehistory, from reverberant cave walls to talking automata, long before the term was coined in the early 1990s as a subset of (computer-generated) augmented reality. What unites the different forms of AAR is a common goal: making the technology invisible, so that virtual and real sounds are simply there, out in the world to be heard.
The book also challenges the assumption that AAR must operate in the tangible, physical world. I argue that augmentation can target other observable environments such as a televised scene, a telephone call, or even one's personal 'headphone reality,' where virtual content blended into the listening experience can alter the perceived reality of the broadcast itself. Alongside this, I propose that the 'reality' being augmented can itself vary in realness, from live human actors to animatronics to video projections, each still capable of hosting meaningful audio augmentation.
The book surveys a wide range of AAR experiences and applications. For instance, in Chapter 5, examples from the 1990s to the present are categorised by the focus of augmentation—exhibitions, reinforced environments, and reskinned environments—drawing in part on Azuma's (2015) categories. Chapter 8 turns the ears towards narrative design, and Chapter 9 proposes a framework of narrative techniques characteristic of AAR as a medium—a vocabulary that, to my knowledge, has not been attempted before. Some of the many techniques and concepts identified and discussed are:
| Technique | Description |
| --- | --- |
| Object congruence | Fully plausible contextual alignment of an object and its sound |
| Incongruence | A contextual mismatch between an object and its sound relative to expectations |
| Addition | Introduction of supplementary sounds that manipulate the perception of a sound-producing object |
| Alternative match | Use of audio that provides a different yet still congruent perspective on an object |
| Extension | Space beyond the observable field created and suggested by sound |
| Revelation | The source of an obstructed sound is revealed |
| Acousmatic sounds | Sounds without a physical counterpart |
| Attractor sounds | Objects beckoning the user with distinctive sounds |
| Spatial scale and offset | Spatial coordinate system and scale manipulated in various ways |
| Near field | Utilising sounds very close to the head |
| Removal and replacement | Removing or replacing sounds in the user's perception |
The book is available from Routledge and from major online bookstores such as Bookshop.org, Kobo, and eBooks.com. You can also look for it in your university library, and if it's not there yet, why not file a request?
From the book: Visitors waiting for their audio tour to start at Maison Gainsbourg in Paris. Photograph by the author
Since writing the book, both the field and my own knowledge have evolved. Here, I will be revisiting some chapters with updates and additions that didn't make it into the original book. This is a (slow) work in progress, so bear with me!
The book starts with a future vision of everyday, ubiquitous personal AAR. In the story, set in 2045, people use earbuds capable of manipulating their auditory perception by adding and replacing sounds around them.
It's hard to predict the future, but so far AR glasses seem more promising than earbuds as personal AAR devices. This is discussed later in the book, and one obvious reason is that forward-looking sensors are easier to fit into eyewear than into ear-mounted devices, which can also be obscured by caps, hair, and so on.
In the more distant future, however, I think that compact and almost unnoticeable ear-mounted devices will be the winning platform. That is, provided that AAR keeps developing in such a ubiquitous direction. Something to stay alive long enough to witness!
The book lists possible application domains for AAR (see, e.g., Yang, Barde & Billinghurst, 2022):
1) situational awareness
2) navigation
3) presentation and display
4) narratives in education, training, entertainment, art, and social applications
5) non-narrative art
6) telepresence
7) healthcare
At least one important domain is missing from this list, though it's mentioned a couple of times in the book:
8) auditory enhancement
If situational awareness is about adding contextual information, 'auditory enhancement' is more about improving and filtering the sounds already present: auditory equivalents of sunglasses, spectacles, binoculars, or microscopes. Examples include isolating and enhancing speech in noisy environments (something modern hearing aids already do) as well as real-time translation from one language to another. And potentially much more, including applications we can barely imagine yet.
In this subchapter, I discuss sense of presence and immersion, but without clearly distinguishing these concepts from each other. Only after returning the manuscript did I come across a paper by Hyunkook Lee (2025) that proposes a unified model distinguishing immersion, presence, and involvement. His framework nicely defines physical presence, social/self presence, and involvement as the core dimensions of an immersive experience, arising from different types of engagement. I recommend reading it!
For more about the potential dangers augmented reality may pose to society and democracy, see Morten Bay. His 2023 article lays out some genuinely unsettling scenarios, and his new book Mediating Plureality (2025) is currently on my bedside table.
. . . . . . .
More updates and comments coming up...