Voices in Ruins

In Voices in Ruins, movements of visitors “make waves” of historical sounds in a gallery space that acts as an acoustic medium. Software sound synthesis is controlled by a computer vision system that detects position changes of people and objects. The underlying audio construct is a semantic network of historical speech recordings, accompanied by voice analysis residuals and radio transmission signal processing. A number of historical speeches, preserved as sound files, have been analyzed in terms of signals and residuals. Oceans of artifacts play as metaphor within the residuals, environmental backgrounds, ambiences, and traces of transmitting signals. Visitors can separate and re-composite speech signals from residuals as if they were practicing archeology. We engineer the synthesis system to afford multiple layers of acoustics, responding to local or global activities in the room through techniques such as spatialization and simulated occlusion. An 8-channel sound diffusion system differentiates sound sources at multiple regions of the space. In addition, headphone-based listening stations provide coordinated “perspective views” of the auditory environment.

Project Description

Voices in Ruins is a sound installation presented at the Krannert Art Museum, University of Illinois at Urbana-Champaign, January 19 – March 20, 2000, and at the David Dorsky Gallery, New York, NY, May 9 – July 24, 2000.

Artistic goal: Bringing visitors’ attention to their own acts of assuming a disposition towards a displayed audio signal, ranging from the role of distant observer to that of intimate observer.

Technological goal: Embedding a sensory data* retrieval system in a movement-sensitive space and a surround sound system.

*sensory data: a finite number of historical speech excerpts in audio form

Scenario

Voices in Ruins uses video sensing and real-time sound synthesis technology. Computer vision technology mediates this interface, gathering and evaluating luminance changes in physical space and communicating the data to synthesis engines, databases, and the composed environment. The video system detects position changes of people or objects in the gallery; these changes are applied to control sound. The sound is assembled and processed by computation “on the fly”, not merely reproduced from pre-recorded material. In the current installation, the gallery contains mostly open space with focused spot lighting and almost no other lighting. A small number of acoustic partitions have been created at the periphery of the open space: occluding walls, corridors, and windows from one listening space to another. Movable objects in the space are also measured and applied to the direct manipulation of sounds. The computer vision system enables natural and unencumbered movements by gallery visitors to “make waves” in the space as an acoustic medium. In one version of the installation these waves are visualized in a computer graphics display adjacent to the main gallery.
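The text does not specify the vision algorithm. As one plausible sketch, position changes can be detected by thresholding luminance differences between consecutive camera frames and summarizing them per floor region; each region’s activity then becomes a control value for the synthesis engines. All names and parameters below are illustrative assumptions, not the installation’s actual implementation:

```python
import numpy as np

def region_motion(prev_frame, curr_frame, grid=(4, 4), threshold=12.0):
    """Fraction of changed pixels per region of the camera image,
    computed from luminance differences between consecutive grayscale
    frames (pixel values 0-255). Each region's value (0.0-1.0) can be
    mapped to a sound-synthesis control parameter."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    changed = diff > threshold          # per-pixel "luminance changed" mask
    rows = np.array_split(changed, grid[0], axis=0)
    return np.array([[cell.mean() for cell in np.array_split(r, grid[1], axis=1)]
                     for r in rows])
```

A region value near 1.0 indicates large luminance change there (for example, a visitor walking through), while 0.0 indicates stillness; a controller can treat the per-region values as local activity and their overall mean as global activity in the room.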

Thesis

We prototype a sensory information retrieval system. In this project we focus on historical speech, in which under-explored information may rise above the common practice of the information age. In this common practice, written texts are the dominant form of information, the “word”. We observe that the logo-centric focus on texts makes only part of the information available and discards the rest. Examples of overlooked or discarded information include the vocalization techniques of speakers, that is, how a speaker distributes his or her vocal energy (longer or shorter vowels, harsh or round consonants, inflections, units of articulation, rhythm and pace); the timbres of voices; how the speakers address their audiences; the ambient space (outdoor or indoor speech); and the speakers’ views toward the recording apparatus and how they addressed it. Needless to say, this information, when available, offers in-depth access to the thought processes of historical figures and the contexts in which they exercised their voices. We propose this art project as a prototype of sensory information retrieval technology.

What is in the sounds?

A number of historical speeches in sound files have been extensively analyzed in terms of signals and residuals. Oceans of artifacts play as metaphor with the oceans of residuals, air waves, evidence, and traces of transmitting signals. Visitors can separate and re-composite speech signals from residuals as if they were practicing archeology.
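The text does not name the analysis method used to split speech into signal and residual. One standard technique for exactly this decomposition is linear prediction (LPC): an all-pole model captures the predictable (vocal-tract) part, and inverse filtering leaves a residual carrying excitation, noise, and recording artifacts; driving the model with the residual recovers the original, mirroring the “separate and re-composite” idea. The sketch below assumes that technique (autocorrelation method with the Levinson-Durbin recursion); it is an illustration, not the project’s actual code:

```python
import numpy as np

def lpc(x, order):
    """Linear prediction coefficients a[0..order] (a[0] = 1) via the
    autocorrelation method and the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] x[n-k]: the signal with its
    predictable part removed."""
    return np.convolve(x, a)[:len(x)]

def resynthesize(e, a):
    """Drive the all-pole model 1/A(z) with the residual to recover
    the signal: the 're-composite' step."""
    x = np.zeros_like(e)
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * x[n - k]
        x[n] = acc
    return x
```

Because inverse filtering and all-pole resynthesis are exact inverses (given the same coefficients and zero initial conditions), visitors’ “archeology” can in principle move freely between the separated layers and the recombined voice.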

The underlying audio construct is a semantic network of historical speech recordings, accompanied by voice analysis residuals and radio transmission signal processing. Sound synthesis and processing include physical and spectral models, analysis-resynthesis techniques, and extensive dynamic modeling of noise and residuals and of the acoustic environment that hosts them. We engineer the synthesis system to afford multiple layers of acoustics, responding to local or global activities in the room through techniques such as spatialization and simulated occlusion. An 8-channel sound diffusion system provides differentiation of sound sources at multiple regions of the space, and de-correlation techniques are applied to create a diffuse sound bed for non-localized sounds.
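The paper does not give implementation details for these techniques. The sketch below illustrates, under our own assumptions, minimal forms of the three named here: equal-power pairwise panning over an 8-speaker ring (spatialization), a one-pole lowpass as a crude occlusion model, and FFT-domain phase randomization as one common de-correlation method. All function names and parameters are illustrative:

```python
import numpy as np

def ring_gains(azimuth, n_channels=8):
    """Equal-power pairwise panning gains for a source at `azimuth`
    (radians) over n_channels speakers evenly spaced on a ring."""
    speaker_az = np.arange(n_channels) * 2 * np.pi / n_channels
    spacing = 2 * np.pi / n_channels
    # shortest angular distance from source to each speaker
    d = np.abs(np.angle(np.exp(1j * (azimuth - speaker_az))))
    frac = np.clip(1.0 - d / spacing, 0.0, 1.0)   # linear pan within the pair
    return np.sin(frac * np.pi / 2)               # equal-power shaping

def occlude(x, coeff=0.1):
    """One-pole lowpass as a crude simulated-occlusion filter;
    smaller coeff (0..1) means heavier occlusion (darker sound)."""
    y = np.empty_like(x)
    state = 0.0
    for n, s in enumerate(x):
        state += coeff * (s - state)
        y[n] = state
    return y

def decorrelate(x, seed=0):
    """Randomize spectral phase while preserving the magnitude
    spectrum: a simple de-correlation for a diffuse sound bed."""
    X = np.fft.rfft(x)
    rng = np.random.default_rng(seed)
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, len(X)))
    phase[0] = 1.0   # keep DC real
    phase[-1] = 1.0  # keep the Nyquist bin real for even lengths
    return np.fft.irfft(X * phase, n=len(x))
```

Feeding the same source through `decorrelate` with a different seed per channel yields perceptually similar but mutually uncorrelated signals, which spread across the 8 speakers as a non-localized bed; the panning gains sum to unit power regardless of azimuth, so localized sources keep constant loudness as they move around the ring.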