Sound Synthesis for Virtual Reality and Computer Games

by Cornelis Pieter van den Doel

Ph.D., University of California at Santa Cruz, 1984
M.Sc., University of British Columbia, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
November 1998

© Cornelis Pieter van den Doel, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada

Abstract

The synthesis of audio in real-time computer simulations is investigated. A physics based parameterized vibration model for physical objects is constructed, and a real-time synthesis algorithm is developed which allows the synthesis of the sound made by such objects under any kind of interaction force. Methods for obtaining the parameters of such models are investigated. We study mathematical models with simple geometries, parameter fitting to measured data, and empirical models. Models for interaction forces occurring during contacts between rigid bodies such as impact and sliding interactions are developed, as well as models for the driving forces for combustion engines and avalanches. Studies were conducted of several objects which were successfully modeled with these techniques. Several computer programs were written for the testing of models, for the construction of models, and for the demonstration of the level of realism that can be achieved with this type of synthesis. It is concluded that this type of synthesis can generate realistic, interactive audio using only a small fraction of available CPU power on modern personal computers.
Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements

1 Audio in Virtual Reality
  1.1 Goals and Background
  1.2 Example of an Audio Simulation
  1.3 Contributions of this Thesis

2 Background
  2.1 Sound Perception
  2.2 Representation and Rendering of Computer Sound
  2.3 Environment Modeling
  2.4 Computer Music

3 Physics of Sound
  3.1 Sound Propagation
  3.2 Fundamental Equations for Gases
  3.3 Linearized Equations for Acoustic Waves
  3.4 Interaction of Acoustic Waves with Solids
  3.5 Sound Generation
  3.6 Vibration Theory

4 Contact Sounds
  4.1 Overview
  4.2 Vibration Modes from Shape
  4.3 Mode Amplitudes from Impact Location
  4.4 Sound Sources from Vibrating Shapes
  4.5 Sounds and Material Properties
  4.6 The Sound Map
  4.7 Real-time Synthesis

5 Creating Vibration Models
  5.1 Computing Model Parameters Analytically
  5.2 Fitting Model Parameters to Empirical Data
  5.3 Designing Model Parameters by Hand

6 Interaction Force Models
  6.1 Impact Forces
  6.2 Continuous Contact Forces
  6.3 Live Data Streams
  6.4 Engine Forces

7 Implementations
  7.1 General System Architecture
    7.1.1 SynthesisEngine
    7.1.2 Controller
  7.2 Authoring Tools
    7.2.1 Off-line Model Tester
    7.2.2 Vibration Model Generators
  7.3 Demonstration Programs
    7.3.1 Sonic Explorer 1.0
    7.3.2 Sonic Explorer 2.0
    7.3.3 Sonified Objects
    7.3.4 Avalanche Sounds
    7.3.5 Location Dependent Sounds of Mathematical Shapes
    7.3.6 Virtual Bell Tower
    7.3.7 Real-time Model Tester

8 Conclusions and Extensions
  8.1 Summary of Results
  8.2 Extensions and Future Work
    8.2.1 Faster Synthesis
    8.2.2 Integrate Waveguide Models
    8.2.3 Improve Parameter Fitting
    8.2.4 Improve Interaction Models
    8.2.5 Non-linear Models

Bibliography

Appendices
  Appendix A File Format for Vibration Models
  Appendix B Parameter Fitting Data
  Appendix C Audio Synthesis Techniques for Music

List of Tables

1.1 Graphics-Audio analogies
7.1 Performance of synthesis algorithm at a sampling rate of 22,050 Hz on a 233 MHz Pentium PC with MMX using a C implementation and a Java implementation

List of Figures

1.1 The stages of modeling needed to compute audio
1.2 Interactive environment: a hammer hitting a bar
1.3 Interactive environment: a hammer hitting a bar at different locations
4.1 First nine eigenfunctions of a square membrane
4.2 Excited frequencies of a square membrane struck near the corner
4.3 Excited frequencies of a square membrane struck near the middle of a side
4.4 Excited frequencies of a square membrane struck near the center
4.5 Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck metal vase. Reconstructed with a Fourier window of 4096
4.6 Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck hockey stick. Reconstructed with a Fourier window of 1024
4.7 Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck computer tower box. Reconstructed with a Fourier window of 1024
4.8 Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck sword (sword I). Reconstructed with a Fourier window of 2048
5.1 Sonogram of recorded (left) and reconstructed (right) impulse response of vase. The window size of the Fourier transform was 4096 and the sampling rate was 44100 Hz
5.2 Identified modes for vase. Shown are the logarithmic amplitudes of the frequency peaks and their linear fits
5.3 Identified modes for vase. Shown are the frequencies corresponding to Figure 5.2
5.4 Identified modes for vase. The frequency response is shown. The bottom figure shows a subset of the frequency range of the upper figure
5.5 Frequency response of vase for three regions
6.1 Sample driving four stroke one cylinder engine
6.2 Sample driving four stroke one cylinder engine with fan
6.3 Sample driving four cylinder engine
7.1 System architecture for applications with real-time audio synthesis
7.2 Off-line model tester
7.3 A room modeled with the Sonic Explorer
7.4 A xylophone modeled with the Sonic Explorer
7.5 A vase modeled with the Sonic Explorer
7.6 Plastic swords with embedded contact mikes used to create metal sword sounds
7.7 Applet to generate avalanche sounds
7.8 Applet to create location dependent sounds for a variety of objects
7.9 Applet for ringing a bell tower
7.10 Applet for ringing a bell tower
7.11 Real-time model tester. Main control panel
7.12 Real-time model tester. Interaction windows to scrape and hit objects
7.13 Real-time model tester. Engine model editor
8.1 Pendulum is non-linear when colliding with the walls
B.1 Top of vase with a window of 1024
B.2 Top of vase with a window of 2048
B.3 Top of vase with a window of 4096
B.4 Top of vase with a window of 8192
B.5 Top of vase with a window of 1024: Material constant
B.6 Top of vase with a window of 2048: Material constant
B.7 Top of vase with a window of 8192: Material constant
B.8 Santur with a window of 1024
B.9 Santur with a window of 2048
B.10 Santur with a window of 4096
B.11 Santur with a window of 8192
B.12 Piano with a window of 1024
B.13 Piano with a window of 2048
B.14 Piano with a window of 4096
B.15 Piano with a window of 8192
B.16 Computer tower with a window of 512
B.17 Computer tower with a window of 1024
B.18 Computer tower with a window of 2048
B.19 Computer tower with a window of 4096
B.20 Computer tower with a window of 512: Material constant
B.21 Computer tower with a window of 2048: Material constant
B.22 Computer tower with a window of 4096: Material constant
B.23 Bell with a window of 1024: Material constant
B.24 Bell with a window of 2048: Material constant
B.25 Bell with a window of 4096: Material constant
B.26 Bell with a window of 8192: Material constant

Acknowledgements

I would like to thank my thesis supervisor, Dinesh K. Pai, for his active participation in this research and his support. I appreciate feedback and comments from the other members of my thesis committee, Alain Fournier and Uri Ascher. Thanks to IRIS, ASI, and IVL Technologies Ltd., for financial support.
Cornelis Pieter van den Doel
The University of British Columbia
November 1998

Chapter 1

Audio in Virtual Reality

The research described in this thesis was conducted to create a methodology for the rendering of audio in virtual reality environments such as simulations and computer games. The focus is on a topic which has not been investigated very much at the time of writing, namely the computation (synthesis) and rendering of "natural" sounds produced by physical objects. Most research in sound synthesis has concentrated on "computer music," where the primary interest is to create interesting sounds, or to reproduce musical sounds, rather than to faithfully reproduce existing naturally occurring sounds.

1.1 Goals and Background

Audio techniques [37] used in games and virtual reality (VR) are more inspired by movie and television sound effects than based on a physical simulation approach. Unfortunately, because of the interactivity of games and VR, these techniques are of limited use in such environments.

A common technique to generate audio in a game or simulation is to play back a prerecorded sound (a "sample") when a certain event, such as one solid object colliding with another, or a player of a video game killing the final monster, takes place. For "effects" such as a triumphant fanfare after winning the game this is appropriate, but for more subtle interactive sounds caused by physical objects it is quite tedious to have the same sounds repeated over and over again. Other types of environmental sounds, such as those caused by objects sliding and rolling, are continuous and are driven by the momentary state of the physical objects that cause them. They can not be prerecorded and some type of synthesis is necessary.

Synthesis is a form of modeling or simulation, so it is useful to compare audio synthesis with simulation in a more general context. A lot of research in "reality simulation", i.e., the creation of a sensory illusion with the computer, has focused on computer graphics rather than on computer sound. As a result, many techniques and software packages exist to display and animate a scene using computer graphics at many levels of realism, but similar techniques to recreate the sounds of the scene are still underdeveloped. The study of creating sound with a computer is analogous to computer graphics, just as the study of interpreting sound with a computer is analogous to computer vision. However, the analogy between sound and graphics is not straightforward, because vision and hearing are very different sensory modes. Nevertheless, it is useful to try to draw analogies between the two. In Table 1.1 we have indicated some possible connections one can make at various levels. The correspondences indicated in the table are intended to stimulate the mind, rather than to show actual connections between sound and graphics. For each of the correspondences one can find arguments against and for the analogy.

    Sound                 | Graphics
    ----------------------|--------------------------
    Sample                | Pixel
    Sampled sound         | Digital picture
    Stereo sound          | Perspective
    Spatial sound         | Stereo display
    FM synthesis          | Color-map animation
    Sound emission        | Shading
    Vibration analysis    | Surface modeling
    Distance attenuation  | Haze or fog
    Impact sounds         | Flashes
    Scraping, rolling     | Animation
    Material              | Color
    Room acoustics        | Raytracing and radiosity

    Table 1.1: Graphics-Audio analogies.

The modeling and simulation of rigid bodies with contacts has been
investigated widely [78, 79, 47, 42, 10, 9, 22] recently, and this type of simulation has a lot in common with the simulation of audio. In both cases the task is to update the state of the objects/audio-buffers in a manner consistent with the (simplified) physical laws governing the simulation. A rigid body simulation would also be an ideal environment to integrate with an audio simulation, as the information about body contact forces, collisions, and contacts, which is usually directly available from the simulation, can be used to drive the audio synthesis.

What are the physical factors determining the generation of audio? Sound is most commonly created by accelerated bodies and their vibrating surfaces; heat and turbulence can also act as sources of sound. These bodies interact with the medium (air, usually), and generate pressure waves that propagate through the air, and scatter from other objects. The sound is scattered at the ear and passes through the ear canal and is detected at the eardrum.

It is often possible to regard the sound production as occurring in several stages that can be modeled independently. A force such as an impact is applied to a material object, which vibrates. The object emits sound waves, which propagate through the environment and are detected by the human ears. We have depicted this audio "pipeline" in Figure 1.1.

    [Figure 1.1: The stages of modeling needed to compute audio: impact, vibration, emission, propagation, listener.]

To produce realistic simulated sounds with a computer we need to understand the physics behind the sound generating phenomena as well as the nature of human perception of sound. A simplified model of the complete physics of the sound generation should be created, so that the associated sounds can be computed and rendered in real-time. The computation should be based on physical laws, not on "hacked" algorithms. This provides the possibility of systematic refinement, based on knowledge of physical laws.

1.2 Example of an Audio Simulation

When we strike an object such as a metal bar, we hear a brief transient or click followed by a more or less sustained sound, which then decays. The click or onset has some role in identifying the sound. For example, try listening to a recording of a flute played backwards. The sound is no longer as clearly recognizable as a flute, even though the sustained part of the sound is unchanged. See also [15, 51]. Nevertheless, a lot of information about the nature of the object is present in the sustained part. To obtain this, we need to compute the vibrations of an object when it is struck, and compute the resulting sound emitted.

The sound made by a struck object depends on many factors, of which we consider the following (a code sketch relating these factors to synthesis parameters follows this list):

1. The shape of the object. The propagation of vibrations in the object depends on its geometry. This is why a struck gong sounds very different from a struck piano string, for example. Shape and material together determine a characteristic frequency spectrum.

2. The location of the impact. The timbre of the sound, i.e., the amplitudes of the frequency components, depends on where the object is struck. For example, a table struck at the edges makes a different sound than when struck at the center.

3. The material of the struck object. The harder the material, the brighter the sound. We also relate the material to the decay rate of the frequency components of the sound through the internal friction parameter, see below.

4. The force of the impact. Typically, the amplitude of the emitted sound is proportional to the square root of the energy of the impact.
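These four factors map naturally onto the parameters of the modal model developed later in the thesis: shape fixes the mode frequencies, impact location the mode gains, material the decay rates, and impact force the overall amplitude. The sketch below is only a minimal illustration of that mapping, not the real-time synthesis algorithm developed later; the class name, the example modes, and all numeric values are invented for the illustration.

```java
// Minimal sketch: a struck-object sound as a sum of damped sinusoids.
// Mode frequencies (shape), gains (strike location), dampings (material),
// and the sqrt(energy) amplitude law are illustrative values only.
public class ModalImpactSketch {
    public static float[] synthesize(double[] freqs, double[] gains,
                                     double[] dampings, double impactEnergy,
                                     int sampleRate, double seconds) {
        int n = (int) (sampleRate * seconds);
        float[] out = new float[n];
        double amplitude = Math.sqrt(impactEnergy); // force sets overall level
        for (int m = 0; m < freqs.length; m++) {
            double w = 2.0 * Math.PI * freqs[m] / sampleRate;
            for (int i = 0; i < n; i++) {
                double t = (double) i / sampleRate;
                out[i] += amplitude * gains[m]
                        * Math.exp(-dampings[m] * t) * Math.sin(w * i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three modes of a hypothetical metal bar struck near one end.
        double[] freqs = {440.0, 1215.0, 2386.0};   // Hz, shape-dependent
        double[] gains = {1.0, 0.6, 0.3};           // location-dependent
        double[] dampings = {3.0, 6.0, 12.0};       // 1/s, material-dependent
        float[] s = synthesize(freqs, gains, dampings, 1.0, 22050, 1.0);
        System.out.println("first sample after strike: " + s[1]);
    }
}
```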
All these factors give important cues about the nature of an object and about what is happening to an object in the simulation.

As an illustrative example, consider an environment where a user interacts with a metal bar by hitting it with a (virtual) hammer. When the user hits the metal bar, we want to synthesize the appropriate sound. Obviously, whatever processing we do after the impact will have to be done very fast, or unrealistic latency will appear. (Typically, for musicians, 20 ms latency is considered acceptable, 3 ms would be ideal.¹)

¹Andy Schloss, private communication.

    [Figure 1.2: Interactive environment: a hammer hitting a bar.]

A simple approach is to record a sample of an actual bar being struck, and load this sample in memory. When a strike event occurs the sample is then sent to the audio hardware. But in this approach the sound will always be the same, no matter how hard or soft the user strikes the bar. One solution would be to record a set of samples for various levels of impact. At the impact event the sample that corresponds best with the actual force will be rendered. Another solution would be to change the amplitude of the sample in real-time before rendering it, to correspond to the volume of the sound for a particular strike force. This is generally such a low cost operation that it can be done in software even on very modest equipment.

This is a very simple example of manipulating a sound by processing it with a parameterized "effect", in this case the volume. A single "prototype" sound is changed in real-time to create many sounds. This is also the simplest example of parameterized synthesis. Synthesis by physical modeling is presently a very active topic of research in computer music [20, 67], but the name is somewhat misleading. The focus of most musical instruments research is not so much on the physical modeling, but on parameterizing sound with physical events. The parameters of interest to musicians for a simulated wind instrument are breath pressure, finger holes, speed of closure of finger holes, mouth position (for flute), and tongue position (for ney). A "physical model" of a wind instrument in this context is a synthesizer that can create sounds that are parameterized by this set of physical characteristics.

In this case the parameter is the force of the impact, which is translated into audio volume. One may argue whether this deserves the name "synthesis", but it should be clear that there is a continuum between "samples with effects" and "synthesis". In this case we are synthesizing the volume, using a sample for the basic waveform, as in the sketch below.
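A minimal sketch of this volume-only synthesis follows; the square-root law is the one stated above, while the 16-bit buffer format and the reference energy are assumptions of the example, not specifications from the thesis.

```java
// Sketch: "sample with effects" -- reuse one prototype recording and
// synthesize only the volume from the strike force.
public class ScaledPlayback {
    /** Returns a gain-scaled copy of the prototype sample. */
    static short[] scaledForImpact(short[] prototype, double impactEnergy,
                                   double referenceEnergy) {
        // Amplitude is taken proportional to sqrt(energy), as noted above.
        double gain = Math.sqrt(impactEnergy / referenceEnergy);
        short[] out = new short[prototype.length];
        for (int i = 0; i < prototype.length; i++) {
            double v = prototype[i] * gain;
            // Clamp to the 16-bit range to avoid wrap-around distortion.
            out[i] = (short) Math.max(-32768.0, Math.min(32767.0, v));
        }
        return out;
    }
}
```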
When using more complicated effects such as 3D spatialization and reverberation effects, we add more synthesis, but still use some recorded data. However, by doing this we have sacrificed some level of realism, because we assume that the only effect of the force of the impact is a change in volume. This may be true in a first approximation, but non-linear effects cause the "timbre" of the sound to change also with the strike force. So we have sacrificed some level of detail by reusing a sample.

Let us continue with the example of the virtual bar. What if we allow the bar to be moved or allow the user to listen from different locations? This could be incorporated in the synthesis by filtering the output of our synthesizer through a set of HRTF filters (see Section 2.3). This can be done with off the shelf hardware or software. It can be thought of as an independent "multiplexer". We now have three parameters: volume, azimuth, and elevation. (We are ignoring room acoustics here.)

What if there were multiple bars in the scene? If the bars differ only in length, their sound spectra are approximately related through a simple scale factor in time, i.e., we can change the "pitch" of the stored sample depending on the length of the struck bar. Many digital sampler synthesizers emulate musical instruments based on this principle. Typically, a group of tones will be represented by one sample, which is played back at various rates. Because the timbral characteristics of instruments vary a lot over the entire range of the instruments, the recorded sample can only be transposed (scaled) to a certain extent, before it starts to sound "unnatural".
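Played back at rate r, a sample's spectrum scales by the same factor r. A minimal linear-interpolation resampler is sketched below; it is an illustration of the rate-scaling idea, not code from the thesis.

```java
// Sketch: transpose a recorded sample by resampling with linear
// interpolation. rate > 1 raises the pitch, rate < 1 lowers it; large
// factors start to sound "unnatural", as noted above.
public class Resampler {
    static float[] resample(float[] in, double rate) {
        int outLen = (int) (in.length / rate);
        float[] out = new float[outLen];
        for (int i = 0; i < outLen; i++) {
            double pos = i * rate;              // fractional read position
            int j = (int) pos;
            double frac = pos - j;
            float a = in[j];
            float b = (j + 1 < in.length) ? in[j + 1] : in[j];
            out[i] = (float) ((1.0 - frac) * a + frac * b);
        }
        return out;
    }
}
```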
So far, we have been successful in synthesizing the sounds of different sizes of bars, at different spatial locations, which can be struck with any force. All these sounds can be obtained from a single recorded sample. What if we strike the bar in the middle, and then near an end, as depicted in Figure 1.3?

    [Figure 1.3: Interactive environment: a hammer hitting a bar at different locations.]

The timbre of the sound should change, depending on the strike point, because this will excite different resonance modes in different proportions. The same effect is well known in music; for example a violin will sound much more nasal when bowed near the bridge (which supports the strings). A linear model (developed in detail in this thesis) predicts that the intensities of the partials (constituent sinusoidal waves) of the sound change as the strike point is changed. However, their frequencies remain the same, because the same vibration modes are excited. To deal with this, we can record some samples of the sounds corresponding to various impact locations, and play back the closest one, or do a linear interpolation between two.²

²This could be called "sound morphing". It can be done in the time domain because the spectral content of the two sounds is the same.

But what if we now want to have different hammers, made out of metal, wood, and rubber? The sound will be different when hitting the bars with these different sticks. Should we now also pre-record the samples of the bar being struck by different sticks? If we want to have 10 different materials, 10 impact locations, and 10 different length bars, we would need 1000 samples! This clearly pushes straightforward sample playback to its limits.

The sample based approach does not require much modeling, and allows the use of recorded sounds, which will always (or at least for a very long time) be better than synthesized sounds. On the other hand, since recorded samples are not available to everyone, and are difficult to obtain, it restricts a designer of a virtual reality environment. A synthesis approach has the advantage that no external data is needed, but it also requires detailed physical modeling. Especially for complicated structures, not enough data may be available to model the vibrations accurately. However, in such a case one can use an experimental approach, and use recorded sounds to extract the relevant synthesis parameters empirically from real objects.

If we also want realistic continuous sounds when we scrape the bar with a stick, with velocity, direction, and speed under real-time user control, we can no longer use sample playback directly. One can imagine looping a sample at different speed/volume depending on the interaction, and this has been tried with limited success, but this does not take into account the resonances of the touched object. For these types of sound we need to synthesize the sounds in real-time.

1.3 Contributions of this Thesis

The work presented in this thesis is the first coherent effort to develop a physics based real-time audio synthesis methodology for environmental sounds caused by interacting solid objects. We develop a real-time synthesis technique specifically tailored to the synthesis of sounds of solid objects under external forces. We address model rendering (audio synthesis), model authoring, interaction modeling, and software and system design issues. Using the techniques developed here, it is possible to generate realistic, interactive audio in simulation or games applications using only a small fraction of available CPU power on modern personal computers.

The remainder of this thesis is organized as follows. In Chapter 2 we discuss general background material relevant to audio synthesis and environmental-sound simulation, and we describe related work in this field.

In Chapter 3 we summarize the physical laws which govern acoustics and which are the foundation upon which audio synthesis by physical modeling is built.

In Chapter 4 we derive a physics based parameterized vibration model for physical objects from the linearized vibration equations for solid bodies. A real-time synthesis algorithm is developed which allows the synthesis of the sound of such objects under any kind of interaction force.

In Chapter 5 we investigate methods for obtaining the parameters of the vibration models from mathematical models of simple geometries, by automatic parameter fitting to measured data, or by empirical means. Studies were conducted of several objects which were successfully modeled with these techniques.

In Chapter 6, models for interaction forces during contacts between rigid bodies (impulsive forces, and sliding interactions) are developed, as well as models for the driving forces for combustion engines and avalanches.

In Chapter 7 we describe the software tools that were built to implement the methodology developed. An object-oriented application programming interface (API) is developed to interface the audio synthesis to user code, and implementations in C++ and in Java are described. Several computer programs were developed for the testing of models, for the construction of models, and for the demonstration of the level of realism that can be achieved with this type of synthesis.

In Chapter 8 we present our conclusions and outline directions for future research. Some technical details such as the numerical results of parameter fitting, and file formats are relegated to the appendices.

It is hard to describe audio verbally or graphically, and many of our results should be heard in order to be appreciated. We have made several audio examples as well as some interactive applications available online on [4]. This material is also available on the accompanying CD.

Chapter 2

Background

In this Chapter we will discuss some general topics that are relevant to sound simulation, and describe related work in this field. The topics considered are

• Sound perception;
• Digital representation of sound;
• Environment modeling;
• Computer Music.
2.1 Sound Perception

To simulate sounds, we need to know some things about how humans perceive sounds, since we do not want to waste effort on aspects of sound that are not important perceptually. Knowledge of what is perceptually important may focus sonic modeling on relevant phenomena.

The human ear can detect frequencies in the range 20-20,000 Hz and is most sensitive in the 500-6,000 Hz region. The threshold of hearing is around 1 dB (see Section 3.3 for a definition of dB), the threshold of pain is at about 140 dB. A normal conversation at a distance of 1 meter is about 50 dB. The sensitivity of the ear at various levels of loudness can be depicted in sensitivity curves, see for example [73].

When a sound is heard, various attributes are usually associated with it by the human brain. For example one may hear it as a liquid sound, or as a distant animal call, or as a metallic percussive sound coming from a specific direction. An attempt at classifying sounds into categories (like liquid, metallic, etc.) was made in [30]. The study of how sounds are processed by humans and grouped into auditory streams is called auditory scene analysis, and there exists a standard work on the subject [15].

Considerable recent attention has focused on the localization of sound in three dimensional space, i.e., on how we perceive sounds as coming from a specific direction. A good review of the subject is [11]. In [94] psychophysical experiments are described which are used to investigate how inter-aural time delays provide cues for left-right localization. A good review of psychological aspects of sound localization by humans can be found in [50].

For wavelengths greater than the size of the human head, the horizontal position cues come from a phase difference in the waves at the ears. These waves don't "see" the head and are therefore unaffected by it. Smaller waves are scattered by the head and by the pinna. The primary cue for position at the onset of sound with small wavelengths comes from the inter-aural delay time, and the manner in which the sound is altered (filtered) by the head and the pinna. The brain has learned to correlate these scattering effects with the position of the source. The vertical position of a sound can also be perceived, though not as accurately. Cues for this come entirely from the filtering of the sound through the head and pinna. Percussive sounds are often perceived as having a distinct "onset", and a sustained part. Localization cues obtained at the onset tend to be important. The reverberations of the room also have a role in sound localization. Without reverberations it is harder to localize sounds.

Different frequencies decay (attenuate) at different rates in the atmosphere. This is why sounds coming from a great distance can be heard as such, as they sound "muffled". Note that the propagation of sound through the atmosphere is strongly influenced by the weather, giving a distinct aural "feel" to various kinds of weather. Blind people report that they can "feel" the presence of objects like a wall, as a pressure on the face. I can reproduce this sensation also to some extent by moving my head close to a wall in the dark. What actually happens is that small sounds like that of your breathing reflect off nearby surfaces, which is detected by your ears but somehow perceived as a kinesthetic sensation.
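The inter-aural delay cue just described can be quantified with the classical rigid-sphere ("Woodworth") approximation; the formula and the head radius below are textbook values, not measurements from this thesis.

```java
// Sketch: inter-aural time delay (ITD) for a distant source, using the
// classical rigid-sphere approximation ITD = (a/c)(sin(theta) + theta).
// a = head radius, c = speed of sound, theta = azimuth from straight ahead.
public class Itd {
    static double itdSeconds(double azimuthRadians) {
        double a = 0.0875;  // m, typical head radius
        double c = 343.0;   // m/s, speed of sound in air
        return (a / c) * (Math.sin(azimuthRadians) + azimuthRadians);
    }

    public static void main(String[] args) {
        // A source 90 degrees to the side: about 0.66 ms delay.
        System.out.printf("ITD at 90 deg: %.3f ms%n",
                          1000 * itdSeconds(Math.PI / 2));
    }
}
```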
2.2 Representation and Rendering of Computer Sound

Various techniques exist to record and reproduce sound. For example, the sound can be stored as a physical image of the vibrations in a gramophone, or as a pattern of magnetization on a tape. These are analogue techniques, as they map pressure variations directly to some other physical characteristics. It has even been speculated that sound can leave imprints by accident, for example on drying paint, and might be reconstructed [93].

A different method of storing sound is by digital sampling. The signal is sampled at some rate and encoded as a set of numbers, which are typically stored on some electronic device. Digitally stored sounds can be manipulated algorithmically and recordings can be duplicated without any loss in quality. A good introduction to digital audio techniques is [72].

The signal at each ear will be represented by a scalar function of time which, apart from a multiplicative factor, specifies completely the perception at the ear. Two such signals would give a complete description of a sound as perceived by a human. There are also phenomena which are usually classified as "sound" that are perceived not directly through the ears. For example, the deepest tones of a large church organ are "felt" in the chest cavity. The signal at the ear could represent the pressure at the entrance to the ear canal, the deviation from equilibrium of the eardrum, or perhaps other relevant parameters. How this function is precisely to be translated into a physical rendering of the sound, with the aid of speakers, headphones, or other means, will depend on the setup of the speakers or on the headphone characteristics. But it should be clear that in principle any sound can be created in this manner.

With loudspeakers we have the complication of "leaks" between the speakers. A simple minded setup with a speaker for the left ear and a speaker for the right ear will not work, as the right ear will also pick up the signal from the left speaker. But one could for example cancel out the "leak" from the left speaker to the right ear with an additional signal from the right speaker that had the opposite phase. Unfortunately, this requires a knowledge of the position of the listener's ear. Various techniques, sometimes called "Surround Sound", "Dolby Surround", or "3D sound", exist to partially overcome this problem.

The signal function is represented by a digital sample. The values of the function are stored at time intervals of 1/SR, where SR is the sampling rate. The values can be given as floats or as 16 or 8 bit signed integers.
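Converting between these representations is a simple scaling with clipping; the sketch below assumes float samples normalized to the range [-1, 1], which is a convention of the example rather than of the thesis.

```java
// Sketch: quantize normalized float samples ([-1, 1]) to 16-bit signed
// integers, the common CD-style representation discussed above.
public class Quantize {
    static short[] toPcm16(float[] samples) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            double v = samples[i] * 32767.0;
            out[i] = (short) Math.max(-32768.0, Math.min(32767.0, v));
        }
        return out;
    }
}
```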
wave This format can use a variety of encoding techniques, but most commonly is used with linear encoding. It is used on P C ' s . A I F C The format used by SGI. A linear encoding with a very complicated header structure with descriptions of looping etc. This is an elaboration of the older A I F F format. I F F The format used by Amiga. Most computers have some hardware to play a stereo sample in real time i.e., to output an electric signal that can be used with speakers or headphones to  16  !  generate sound. The sample will typically be located at some memory location of the computer. Depending on the available hardware, it is possible to dynamically manipulate a sample and change certain aspects of it in real time. Many things can be done in software, but often one uses special D S P hardware to do the signal processing in real time. Examples of such processors are the readily available "reverb" units, which add simulated room acoustics and other effects to an input signal in real time. Also emerging are "spatialization" engines, which modify a signal to position it at a specified location in space. With increasing processor speeds there is a tendency to assign more and more D S P tasks to the main C P U .  2.3  Environment Modeling  Suppose we have a virtual reality environment in which we want to compute the sound perceived by an observer. That is, we want to generate a signal that drives speakers or headphones that will lead to a correct perception of the sound. In principle one should model the sound generating objects, compute the resulting sound field at the eardrums of the simulated observer, and generate a signal at the headphones or speakers such that the eardrum of the human subject is exposed to the sound field computed. To make this a somewhat manageable task, it is usually assumed that the sources of the sound are sufficiently far from the observation point that we can approximate the acoustic wave by a plane wave. We can then replace the sound source with a suitable point source. We can then divide the simulation in three parts: source modeling, sound propagation (reverberation), and sound reception. By "reception" we mean the scattering of the sound waves around the head and 17  ears, which needs to be simulated if we want the sound to appear to come from definite spatial locations. The vibrational modeling to compute the point source is discussed in Sections 3.5 and 3.6. However, we can also record a sample of a real sound. This is an approach often taken, as more is known about modeling the propagation (reverberation) and reception (the scattering around the head and ears) than about modeling the sound production. Reverberation modeling involves computing reflections of the sound from surfaces along the path from the source to the observer. For this we need to have a good model of scattering of sound from materials. Some of the relevant physics will be discussed in Section 3. Off-line computation of acoustical properties of performance halls, in the context of graphical visualization techniques was investigated in [74]. When the sources and observers are not stationary, the Doppler shift changes the rate at which the wave arrives at the ear, which can be simulated by changing the effective sampling rate of the stored sample. The filtering of the sound at the ears and the head, which is direction dependent, can be summarized by a set of direction dependent filters that simulate the ears. 
These are called Head Related Transfer Functions (HRTF) and have received much attention recently. The basic idea is to measure the sound inside the ear for a standard source (usually white noise) positioned at various locations with respect to the subject. The relation between the source and the measured sound is then extracted as a finite impulse response filter (FIR) for every direction. These are the H R T F filters. On playback, a given "dry" sound can then be spatialized by convolving it with the appropriate H R T F . A review of the scientific and technological issues of auditory displays can  18  be found in [26]. Takaia and Hahn introduced the concept of sound rendering for computer animation [80]. They associated a characteristic sound with each object which could then be rendered after filtering the sound to model the environmental effects. Recently [31] they proposed "Timbre Trees," which, like shade trees, provide a scene description language for sounds. In [17] complex sounds such as breaking events are: analyzed and decomposed into individual components which can then be reconstructed with adjustable parameters to obtain: a form of parameterized synthesis.  2.4  Computer Music  Computer music has been around for a long time, but has traditionally been more concerned with imitating the sounds of musical instruments, or with the creation of totally new sounds, than with the reproduction of everyday sounds. Surprisingly little can be learned from computer music that can be used in the synthesis of simple sounds such as emitted by an object of known material and geometry being struck or scraped. Several synthesis techniques have been developed or are currently being investigated, which have some relevance to the synthesis of natural sounds: A comprehensive overview of musical synthesis techniques is given by Herbert Janfien, on the W W W site [7]. As this material is not published anywhere else and may be removed from this W W W site, we include a compilation of relevant material from that source with the author's permission in Appendix C .  '  In [36] audio synthesis is discussed from a more theoretical point of view and synthesis techniques are analysed using various criteria. Many of the synthesis techniques described there are based on specific al19  gorithms or specific hardware configurations to create periodic oscillations. Some methods are general enough that they could be used to generate sound for some of our purposes. One such technique is additive  synthesis,  which builds a complex sound from  sinusoidal components with controllable amplitudes.  Unfortunately,, there is no  obvious way to compute the amplitudes of the sinusoidal components of contact sounds, except for struck objects, where they decay exponentially. Sample  playbackxs  i!  currently the only technique used for environmental sounds  in games and virtual reality. It has the capability of producing "photo-realistic" sounds, but with little or no interactivity. For discrete sounds such as impacts this is an excellent method. Granular  synthesis  :  [82] superimposes many short fragments of recorded sam-  ples in a controllable stochastic manner. It has some applications in the generation of crowd sounds in computer games and seems to be well suited for other sounds which involve a large number of events, such as rain. Waveguides  [67] provide models of one dimensional structures such as vibrat-  ing strings and air columns. 
Because such structures have a harmonic spectrum of resonance frequencies, waveguides (a.k.a. comb filters) provide a very efficient means of synthesis and are used for musical instruments. The disadvantage of waveguides is that one has no control over the bandwidths and amplitudes of the resonances. This synthesis method is not suited for the sound of vibrating solid bodies, which usually have a non-harmonically spaced frequency spectrum. Modal  synthesis  [52] has been used for percussive instruments with a non-  harmonic frequency spectrum such as marimba and bells. Because each resonance requires additional computation, this synthesis technique is not well suited for har-  20  monic musical instruments which usually need many resonances. However it is very well suited for our purposes, and in the following chapter we shall show that this synthesis technique can be derived as an implementation of a real-time solution to the linearized vibration equations for solid bodies. Several research groups are actively pursuing the physical modeling approach to music: • The Center for Computer Music and Acoustics, C C R M A , at Stanford is developing physical models of musical instruments. They use mainly wave-guides for their modeling, but they also have done some work on modal synthesis. Musical instruments are modeled as filter banks to which an appropriate stimulus is applied. A n overview of work at C C R M A can be found on their W W W site [1]. • The Center for New Music and Audio Technologies, C N M A T , at Berkeley is developing a synthesis technique based on additive synthesis. A sound is described in frequency space where a number of dominant partials are identified with a given time evolution. A n efficient algorithm based on the F F T allows this description to be inverted in real time for sound synthesis [24]. A n overview of the research at C N M A T can be found at the W W W site [3]. • The Institut de Recherche et Coordination Acoustique/Musique, I R C A M , in their Modalys program has investigated synthesis techniques using modal models. Their interest is primarily in the synthesis of musical sounds and in the specification of control algorithms, rather than in real-time techniques. Their web site is [6] and their scientific publications are available on-line at [2].  21  Chapter 3  Physics of Sound In this chapter the laws governing the propagation of pressure waves in gases and the interaction of solids with these waves will be derived from fundamental physical laws. This allows a clear understanding of the nature of the approximations involved in the derivation, and thus shows under what conditions these simplified laws are valid and where they might break down. We will not consider any conditions under which the simplest linear wave model of sound in air breaks down in this thesis, but this is a possible direction for future research. The material in this chapter indicates how the construction of more complicated equations for pressure wave propagation might proceed. It provides enough background material to allow the reader to follow such a derivation. Material in this chapter can be found in many standard works on acoustics, see for example [13, 54, 45, 81]. We follow the notation of [81] here. In the last section of this chapter we will give a brief introduction to the theory of vibrating solids.  22  3.1  Sound Propagation  For sound propagation through fluids and gases we shall use the continuum model for fluids.  
Such a model will be applicable when the Knudsen  number  is small  enough. The Knudsen number is defined by  Kn = A / / ,  (3.1)  where A is the mean free path, which is the average distance that a molecule travels between collisions with other molecules and / is the length scale of the phenomenon of interest, such as the wavelength of sound. For example, a sound wave with a frequency of 20,000 Hz (at the limit of the audible frequency range) has a wavelength of 1.7 cm. The mean free path in air is 5 . 9 5 1 0 m . The Knudsen number is 3.5 X 10~ , so the continuum model is 6  _8  applicable in this case.  3.2  Fundamental Equations for Gases  We shall present the fundamental equations that govern the propagation of sound. Derivations and more detailed discussions of the thermodynamic background can be found in [81]. The local state of a simple fluid (whose composition is uniform) can be specified by two quantities. For example, at low pressures all gases obey the ideal gas law, p = RpT  (3.2)  where R is the gas constant, T the absolute temperature, p the mass density of the gas, and p the pressure. The local state of the gas can be described by p and - 23  T, or by p and p, etc. For gases in motion, we can formulate the law of conservation of mass in terms of the mass density field p(x,t)  and the velocity field u(x,t)  as  (3.3) To derive equations of motion for the gas we must specify a model for the forces and stresses acting on an infinitesimal volume of the gas. External forces on the gas are described by a vector field F, which represents the force per unit volume acting on the gas. Stresses are described by the symmetric stress tensor cr of rank two. The component <r,-j is interpreted as the i-th component of the force per unit area acting on an infinitesimal surface element perpendicular to a vector in the j direction. For a Newtonian fluid, the fluid is governed by the Navier-Stokes equation plus thermodynamic equations. Most gases and simple fluids can be accurately described by the Newtonian fluid model. Fluids with complicated molecular structure can not, but are not considered here. For a Newtonian fluid, the stress tensor is given by  o~ij —  P&ij  where S{j is the unit tensor. The deviatoric The quantity P is called the mechanical  ~\-  (3.4)  dij,  stress tensor dij pressure.  is traceless (i.e., da = 0).  For small velocity gradients, the  deviatoric stress tensor is given by d^ = 2p(eij  -  -ASij)  (3.5)  where p, is the coefficient of viscosity (also called shear viscosity) of the gas and e;j 24  is the rate-of-strain tensor, given by l.dui e  duj  - = 2 f e  ^  +  ( 3  -  6 )  The local rate of expansion A is defined as A = e„-.  (3.7)  The difference between the mechanical pressure P and the thermodynamic pressure p is related to the local rate of expansion through p-P  where p is the expansion  coefficient  v  = pA  (3.8)  v  of  viscosity.  For reference, we will now give the complete set of equations for a Newtonian gas assuming that temperature variations are small and that p is constant. v  1. Continuity: |  +  ^  )  = 0.  (3.9)  2. Navier-Stokes:  +  = ^  (3.10)  + A/*+  3. Energy: VS Pi  VT_0TVP p  P i  p  1 8 pdx*  Pi  BT y  dx ' 1  &  p  K  '  4. State: p = p(p,S).  25  (3.12)  Here we have defined  £=4+ Vt  dt  dx'  v  (3.13) ;  The viscous dissipation function $ is defined by  d> = ^ ( ^ - I A  2  ) .  (3.14)  P  The specific heat is denoted by c and A; is the thermal conductivity of the gas. 
The p  coefficient of thermal expansion, (3, is defined by  with  v* = 1/p.  (3.16)  For the important case of an ideal fluid, the stress force is assumed to be perpendicular to an infinitesimal surface element, and to have the same magnitude independent of the direction of the normal to the surface element. In this case the equations for the gas can be written as  and  where c is the local speed of sound. Thermodynamic considerations lead to c  (Ip) '  '^  3.3  Linearized Equations for Acoustic Waves  5  n e  S  r a v  2  =  i t y vector is given by g\  For acoustic waves in an ideal gas we linearize Equations 3.17 and 3.18. We consider small fluctuations of pressure p and density p around the background values p and 0  26  po. If the length scale for variations in velocity is much smaller than Cg/g, with Co the velocity of sound and g the absolute value of the gravity vector, the linearized equations are "  1  + ^  P  = 0,  Po{x)cl(x) dt dx du  1 dp  %  + - ^ 7  dt  po ox  (3.19,  l  ,  = 0.  1  n  .  3.20  For uniform gases we can rewrite the linearized Equations 3.19 and 3.20 in vorticityu),  a different form. First, observe that the  .dv? dx hi  is conserved. e^  k  k  v  is the antisymmetric rank-3 tensor (with e  123  ;  = 1). If we assume  that the gas is at rest initially, we will have UJ = 0. For velocity fields with zero vorticity, we can write  «' = |4 ox  1  where (j) is the velocity  potential.  (3^)  The acoustics equations can now be formulated in  terms of a single wave equation for <f>, - ^ with  =  + c W=0 2  (3.23)  The pressure fluctuation p can also be obtained from the velocity  potential by P=~Po^-  (3-24)  The energy density of the acoustic field is given by E=E  + E  k  p  (3.25)  where the kinetic energy is given by  E  k  = ipouV 27  (3.26)  and the potential energy is given by 1 2p c'o 0  In terms of the velocity potential c/>, this becomes  1 d<p dcf>  ^ - ^ c V c V  ( 3  -  2 8 )  and  The  energy-flux  vector  (the energy flow per unit area) is given by  la  P  =  (  U  3  -  3  0  )  and its time average (3.31)  I=<q > a  is called the  acoustic  Its dimension is power per area, e.g. w a t t / m . It  intensity.  2  is customary to express the acoustic intensity in  decibels  (dB). For this we define a  reference level I  ref  =  10  - 1 2  watt/m . 2  (3.32)  It corresponds to the lowest intensity at which a sine wave of 1000 Hz can be heard by the average human. The intensity level in decibels is given by  IL = 1 0 1 o g ( / / / ) . 10  re/  (3.33)  We note that the linearized equations for acoustic waves can not deal with dissipation. For this we must use the Newtonian fluid model, described in Section 3.2. For sound waves in air, the decay rate is proportional to the frequency. The amplitude decay over 1000 wavelengths (340m at 1000 Hz) is only 1-2% in air, so it can be neglected in many applications. 28  3.4  Interaction of Acoustic Waves with Solids  Sound waves are reflected (scattered) by solids. Vibrating solids generate sound waves. Both phenomena require an understanding of the fluid dynamics at the boundary between the two media. When a sound wave hits a surface, the surface will move in response. If the surface is part of the boundary of a solid there will be waves passing through the solid, and the solid will emit sound from its entire boundary. However, in many situations the propagation of sound inside the solid can be ignored and we have > to deal only with reflection and absorption at the surface. 
How much of the wave is absorbed and how much is reflected depends on the material properties of the' surface. Since in general we do not know the dynamical equations of the surface, some sort of phenomenological model is necessary. The  specific  acoustic  impedance  of a  boundary is used to characterize a surface as follows. We assume, that the velocity of the boundary depends on the pressure variation of the air at the boundary'with some time lag. For a monochromatic incident wave we assume the relation (3.34)  u =p/z , n  a  where u is the surface normal component of the velocity of the surface and p is n  the pressure fluctuation at the boundary. The specific impedance z depends on a  the frequency of the incoming wave and is complex in general (inducing a phase difference, i.e., a time lag, between the surface response and the pressure). Some classical topics that can be solved analytically are the sound field in a piston driven tube and transmission of waves through tubes as a function of the tube diameter. An example of the latter is the human vocal tract. The vowel sounds 29  used in human speech are produced by changing the shape of the vocal tract with the throat and tongue, thereby changing the transmission of the sound radiated by the vocal chords, see [21]. The wave equation (Equation 3.23) has been studied thoroughly. The study of spherical waves leads to an exact solution for the sound field of a pulsating sphere. The sound field of a bursting balloon can be computed analytically. This is an important topic, because one may consider a small pulsating sphere as a "pointsource" of sound leading to a field theory approach of sound. Such a point source plays a similar role as the point charge and point mass do in electromagnetism and gravity. Just as a finite body can be considered as an infinite collection of point masses, so a complex sound source can be considered as an infinite collection of point sound sources.  The advantage of this approach is its similarity to the well  studied fields of electromagnetism and gravity.  3.5  Sound Generation  Vibrating bodies generate sound waves through boundary conditions imposed on the wave equation (Equation 3.23) at the surface of a solid body. For an ideal fluid the boundary condition is u.n = U ij.n with n the surface normal and U  b  (3.35)  the velocity of the surface. However, real gases  "stick" to the surface, in which case we have  u = U. b  (3.36)  For an ideal gas, the condition given in Equation 3.35 has to be used, as viscous effects are not taken into account in the ideal gas model. 30  A particularly useful formulation of the equations governing sound generation is the formulation in terms of Green's functions. A n exact computation of the far field generated by a pulsating sphere leads to the following simple source potential for a point source (i.e., a source of volume) that is pulsating (with frequency.u)  with k = u/c  0  and r the distance from the source (which is located at the origin).  The quantity Q  0  represents the volume flow out of a small sphere enclosing the  source. The intensity of the acoustic wave described by Equation 3.37 is (3-38) and the total acoustic power is = ^ M .  n  (3.39)  The solution 3.37 can be viewed as the solution of the inhomogeneous wave equation + c V cf> = clQ(x,t)  --^  2  (3.40)  2  0  which should be compared with Equation 3.23. The general solution to Equation 3.40 can be given as 1  =  r Oix  /  t - \ ~ '\) x  x  (3.41)  jf—!-dV(x).  
\X — X \  47T JV  A n important case is a surface distribution of the source Q. For a plane, the result is ik\x-x' \-iwt \x-x'\  e  ^  t  )  =  -Yj  s  f  r fa  Tn  where | ^ is just the normal velocity of the surface. 31  d  M  x  )  '  "  (3  42)  Unfortunately, Equation 3.42 can not be used directly to compute the sound emitted by a vibrating polyhedron. The problem is that the sound emitted by one facet of the polyhedron is dependent on the overall shape of the polyhedron. Some special problems can be analyzed analytically. A n example is the sound field of an oscillating piston in an infinite wall. The problem of interest for simulating the sound field of vibrating solids leads to an integral equation for <f>. This equation, known in slightly modified form as the  Kirchhoff-Helmholtz  equation [46] plays a fundamental role for an analysis of the  emission of sound by vibrating bodies. It is derived in a particularly clear way in [81]. The end of [27] is devoted to a discussion of problems and numerical techniques associated with solving the Kirchoff-Helmholtz equation. equation is that it breaks down if there is resonancein  One problem with the  the system. A t certain driving  frequencies the amplitudes of the waves become infinite (as we have not incorporated dissipation in our model) and the solution of the integral equation diverges. resonance, dissipation must be modeled.  At  In this case the amplitudes often grow  large enough that nonlinear effects become important. In such a case, adequate physical models are often not available.  There is a section in [81] on nonlinear  effects. Nonlinear acoustics is an important field of research, see for example [12]. The  study of nonlinear waves (see for example [90]) is anextensive research field. We  will now state the Kirchoff-Helmholtz equation. The free-field Green's  function (which is essentially the sound field of a harmonic point volume source with frequency u) is defined by Jk\x\ G{xM  = j-r,-r  (3-43)  where k — OJ/CO. We  consider a region V bounded by surface S, which does not have to be  32  connected. We can think of S as consisting of an outer boundary like the walls.of a room and some boundaries of objects in the room which produce sound. Let n(x) denote the outward normal (i.e., pointing into the region V) at point x on S. We assume that every point y on S is oscillating harmonically with frequency to and velocity v(y). The velocity potential is written as <j> ( )e -%L.  <f>(x t) = r  (3.44)  iut  u x  t  The time independent potential (f> satisfies u  &,(*) = J  p ( y ) ^ G ( x - y) - Mx)n  k  s  {y)-  d S(y) 2  dy  k  (3.45)  where n(y) is the outward normal at point y on S, and d S(y) is an infinitesimal 2  surface element on S. The boundary condition for an ideal gas, Equation 3.35, gives us the normal derivative of <f> on S, which appears in the right side of Equation 3.45,  n (y)^f  L  k  = n(y).v(y)  (3-46)  with v(y) the velocity of S at y. We can consider this a given quantity (computed from a model of the physics of the vibrations of the surface S). The value of (f>ui{y) on S is related to the pressure fluctuation Pu(x) (in the frequency domain) through PUJ(X)  = -ipou<j>u{x)  (3.47)  which follows from Equations 3.24 and 3.44. If the pressure on the boundary was known, Equation 3.45 would give us the acoustic field on V . By restricting x on S in Equation 3.24, and dividing S in n elements, we formally obtain a system of n equations for the n values of p on S. 
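As a small numerical check on the 1/r form of Equations 3.37 and 3.43 and on the decibel scale of Equations 3.32-3.33, the sketch below verifies the familiar 6 dB drop per doubling of distance; the code names and values are only illustrative.

```java
// Sketch: |G| = 1/(4 pi r) falls off as 1/r, so intensity (~ amplitude
// squared) falls off as 1/r^2, i.e. about 6 dB per doubling of distance.
public class PointSource {
    static double greenMagnitude(double r) { return 1.0 / (4 * Math.PI * r); }

    static double intensityLevelDb(double intensity) {
        final double I_REF = 1e-12;          // watt/m^2, Equation 3.32
        return 10.0 * Math.log10(intensity / I_REF);
    }

    public static void main(String[] args) {
        double i1 = greenMagnitude(1.0) * greenMagnitude(1.0);
        double i2 = greenMagnitude(2.0) * greenMagnitude(2.0);
        // The difference is 10*log10(4) ~ 6.02 dB; I_REF cancels out.
        System.out.printf("drop from 1 m to 2 m: %.2f dB%n",
                intensityLevelDb(i1) - intensityLevelDb(i2));
    }
}
```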
However, there are some problems with this approach. Observe that if x lies on S in Equation 3.45, the Green's function defined in Equation 3.43 is not defined for x = 0. In this case one should take the principal values of the integrals at the singular points. This entails a regularization of the Green's function, by defining G_ε(x) = 0 for |x| < ε and letting ε → 0 at the end.

If the volume V is finite, there may be no unique solution for the pressure on the surface, for some values of the frequency. The physical reason is that at resonance the acoustic field grows without limit. In this case a separate analysis is required to identify the resonance frequencies. We refer to [27] for more details on numerical approaches to the Kirchhoff-Helmholtz equation.

3.6 Vibration Theory

Vibrating bodies are sources of acoustic waves and produce sound through the mechanism described in Section 3.5. In this section, we describe the equations governing their behaviour. Suppose the configuration of a vibrating system can be described by a vector q(t) of n real numbers. At equilibrium, without motion, we assume q = 0. A typical example is a set of rigid bodies connected through (damped) springs. A solid body such as a bar can be thought of as an infinite collection of infinitesimal rigid bodies connected through generalized springs. They can be approximately described by a discrete set of n coordinates q, provided n is large enough. Another route to this conclusion is to note that the solution of the partial differential equation for the vibration of a bar can be written as a discrete sum of basis functions (a generalization of a Fourier series), which can be interpreted as the q in the above.

The vibrating system, for sufficiently small amplitudes q, is described by the linear equation of motion

    M q̈ + C q̇ + K q = F,     (3.48)

where the dots denote differentiation with respect to time and F(t) is the external force on the system. The matrices M, C, K, and the vector F can be found for a given system by various means, depending on the system. A case of particular interest to us is a system of many Finite Elements [58] that approximates a given solid. The continuous system is divided into a number of elements, each with appropriate elasticity properties and degrees of freedom. The elements are then connected to approximate the continuous object. The degrees of freedom corresponding to the finite elements obey an equation of the same structure as Equation 3.48, in the linear approximation. The last chapter in [66] discusses Finite Element methods in vibration analysis.

We solve Equation 3.48 by writing

    y = (q, q̇)ᵀ.     (3.49)

The equation of motion 3.48 becomes

    ẏ = By + r     (3.50)

with

    B = [ 0, I ; −M⁻¹K, −M⁻¹C ]     (3.51)

and

    r = (0, M⁻¹F)ᵀ.     (3.52)

The solutions to Equation 3.50 with F = 0 are

    y = a e^{μt},     (3.53)

where the eigenvectors a (of dimension 2n) and the eigenvalues μ satisfy

    (B − μI) a = 0.     (3.54)

There are 2n eigenvalues, each representing an oscillation with frequency μ/(2π). The reason we have two solutions for every degree of freedom is that there is also a phase associated with this degree of freedom.
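For a concrete instance of Equations 3.48 through 3.54, consider a single mass-spring-damper (n = 1). The sketch below is our own illustration, not code from the thesis, and all names and numbers in it are invented; it computes the eigenvalues of B = [0, 1; −k/m, −c/m] from the characteristic polynomial and reports the oscillation frequency Im(μ)/(2π) and the decay rate −Re(μ).

    // Minimal sketch (hypothetical): eigenvalues of the state matrix B for a
    // single damped oscillator m*q'' + c*q' + k*q = 0, per Equations 3.48-3.54.
    public class ModalEigenvalues {
        public static void main(String[] args) {
            double m = 0.01, c = 0.5, k = 4.0e6;  // assumed mass, damping, stiffness
            // Characteristic polynomial of B: mu^2 + (c/m)*mu + (k/m) = 0
            double a1 = c / m, a0 = k / m;
            double disc = a1 * a1 / 4.0 - a0;
            if (disc < 0) {
                // Underdamped: complex pair mu = -a1/2 +/- i*sqrt(-disc).
                double decay = a1 / 2.0;                            // -Re(mu), in 1/s
                double freqHz = Math.sqrt(-disc) / (2.0 * Math.PI); // Im(mu)/(2*pi)
                System.out.printf("f = %.1f Hz, decay = %.1f 1/s%n", freqHz, decay);
            } else {
                System.out.println("overdamped: no oscillation");
            }
        }
    }

With the values above the mode lies near 3.2 kHz with a decay rate of 25 per second; sweeping c shows how damping trades ring time against sharpness of the resonance.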
In the presence of an external force F the solution to Equation 3.50 can be written as

    y = T(t) y₀ + ∫₀^t T(t − τ) r(τ) dτ,     (3.55)

where the state transition matrix T is given by

    T(t) = Φ T₁(t) Φ⁻¹,     (3.56)

where

    T₁(t) = diag(e^{μ₁t}, ..., e^{μ_{2n}t})     (3.57)

and the 2n × 2n modal matrix Φ is given by

    Φ = [A₁ A₂ ... A_{2n}]     (3.58)

with A_i the i-th eigenvector a, i.e., a solution of Equation 3.54. y₀ in Equation 3.55 is given by the initial conditions on the system.

We conclude that a vibrating system can be characterized by a discrete set of eigenvalues which correspond to the natural frequencies and their associated decay rates. When the system is excited by an external force of finite duration some of these frequencies will be excited. The relative amplitudes after the application of the force depend on the nature of the excitation. When an object is struck, the applied force is of very short duration and contains many frequencies. Very shortly after the strike, many frequencies of the system will be excited, but most decay very rapidly, leaving only a few. The object will be perceived as having a definite "pitch" if one frequency decays much more slowly than the others, as is the case for a tuning fork, for example, or if the object has a harmonic spectrum, as is the case for many melodic percussive musical instruments such as the piano. The sound emitted when an object is struck is perceived as containing a "click" of very short duration, which is a mixture of many frequencies, and a sustained part, which contains only a few frequencies.

Continuous systems such as bars and plates lead to partial differential equations. For some simple systems exact solutions can be found. The classical problems that can be tackled analytically are the vibrating string, the rectangular and the circular membrane and plate, and bars, under various boundary conditions.

Chapter 4

Contact Sounds

In this chapter we investigate the computation and rendering of sounds emitted by solid bodies in contact. We will construct a parameterized vibration model that lends itself to real-time synthesis. Some of the material in this chapter has been published previously by Dinesh Pai and the author in [84].

The computation of the sound emitted by an object to which a time dependent driving force is applied can be done in principle as follows.

1. Formulate the equations of motion of the object under external forces. In general, this will be a partial differential equation.

2. Solve the resulting system of equations in the presence of the driving force.

3. Determine the surfaces that are exposed to air, where sound will be emitted. Using the theory described in Section 3.5, determine the acoustic field at relevant locations. Usually this will be at the ears of the (virtual) observer. Note that we also have to take bounding surfaces such as walls into account. We also have to model the scattering of sound at the pinna (the outer ear), and at the head and shoulders.

To make this into a manageable task, we can make a number of simplifying assumptions:

• Often the observer will be "far" from the source, so we only have to compute the distant field.

• Often we can bypass the entire computation of the acoustic field and just replace the sound source with a point source. This point source oscillates in a manner which is obtained from some combination of the vibrations of the surfaces of the object which constitutes a reasonable approximation to the full emission theory.

• Reflections of sound at the walls add important realism. Commercially available digital reverb processors are available for this.

• Filters that model the scattering at the pinna (HRTF) have been measured widely and are available on the Internet. There are commercial "spatializers" available that can model the scattering of sound at the pinna in real time. At the time of writing, low cost soundcards for PCs come equipped with hardware support for 3D sound, and the free DirectSound3D API has built-in spatialization support.

• The reverberation and HRTF contributions to the sound are, under certain conditions, independent of the emission computation and can therefore be dealt with separately and independently.

4.1 Overview

Based on material and shape properties, we do a pre-computation of the relevant characteristic frequencies of each object in Section 4.2. In Section 4.3 we then divide the boundary of the object into small regions and determine the amplitudes of the excitation modes if an impulsive force is applied to a point in this region. This is similar to the tessellation of a surface for graphics rendering. The whole procedure is analogous to assigning a color to a surface and rendering it with some shading model. In Section 4.4, we normalize the energies of the vibrations associated with the different impact points to some constant value, and scale them proportional to the impact energy when rendered. In Section 4.5 we discuss the material properties and their effect on sound. The decay rate of each mode is assumed to be determined by the internal friction parameter, which is an approximate material property [91, 43]. In effect, the decay rate of a component is assumed to be proportional to the frequency, with the constant determined by the internal friction parameter. After the preprocessing, a sound parameter map is attached to an object, which allows us to render sounds resulting from forces on the body. We discuss the structure of this map and a possible approach to reduce its storage requirements in Section 4.6. In Section 4.7 we will construct an algorithm to synthesize the sound under any type of interaction in real-time.

4.2 Vibration Modes from Shape

We now introduce the framework for modeling vibrating objects. We will illustrate it with a rectangular membrane, but the framework is quite general; we have used it to generate sounds of strings, bars, plates and other objects. The framework is based on the well developed models in the literature on vibration and acoustics, for example [54]; for the calculus involved we refer to [75].

The vibration of the object is described by a function μ(x,t), which represents the deviation from equilibrium of the surface, defined on some region S, which defines the shape of the object. We assume that μ obeys a wave equation of the form

    (A − (1/c²) ∂²/∂t²) μ(x,t) = F(x,t)     (4.1)

with c a constant (related to the speed of sound in the material), and A a self-adjoint differential operator, under the boundary conditions on ∂S.

Example: For the rectangular membrane we consider a rectangle [0, L_x] × [0, L_y] spanned by a membrane under uniform tension. For this case the operator A is given by

    A = ∂²/∂x² + ∂²/∂y².

The boundary conditions are that μ(x,y,t) is fixed on the boundary of the membrane, i.e., the membrane is attached to the rectangular frame.

We will take the following initial value conditions.
    μ(x,0) = y₀(x),

i.e., the surface is initially in configuration y₀(x), and

    ∂μ(x,0)/∂t = v₀(x),

where v₀(x) is the initial velocity of the surface.

The solution to Equation 4.1, in the absence of external forces, is written as

    μ(x,t) = Σ_{n=1}^∞ (a_n sin(ω_n c t) + b_n cos(ω_n c t)) Ψ_n(x),     (4.2)

where a_n and b_n are arbitrary real numbers, to be determined by the initial value conditions. The ω_n are related to the eigenvalues of the operator A under the appropriate boundary conditions (which we specify below), and the functions Ψ_n(x) are the corresponding eigenfunctions. That is, we have

    (A + ω_n²) Ψ_n(x) = 0.     (4.3)

The spectrum of a self-adjoint operator A is discrete, and the eigenfunctions are orthogonal. Their norm is written as

    α_n = ∫_S Ψ_n²(x) dᵏx.

Example: For the rectangular membrane, the eigenfunctions and eigenvalues are most naturally labeled by two positive integers n_x and n_y and are given by

    Ψ_{n_x n_y}(x,y) = sin(π n_x x / L_x) sin(π n_y y / L_y)

and

    ω_{n_x n_y} = π √((n_x/L_x)² + (n_y/L_y)²).

In Figure 4.1, we show the first 9 eigenfunctions on a square membrane.

Figure 4.1: First nine eigenfunctions of a square membrane.

As Equation 4.3 is linear, we can normalize the eigenfunctions Ψ_n(x) such that α_n is independent of n, which often simplifies some of the algebra. Using the orthogonality of the eigenfunctions we can find the coefficients in the expansion given in Equation 4.2 as

    a_n = (1/(c α_n ω_n)) ∫_S v₀(x) Ψ_n(x) dᵏx     (4.4)

and

    b_n = (1/α_n) ∫_S y₀(x) Ψ_n(x) dᵏx.     (4.5)

The time averaged energy of the vibration is given by

    E = constant × ⟨ ∫_S ρ(x) (∂μ(x,t)/∂t)² dᵏx ⟩,     (4.6)

where ρ(x) is the mass density of the vibrating object. The ⟨⟩ indicates an average over time. If the mass is distributed uniformly, we have

    E = constant × Σ_{n=1}^∞ α_n ω_n² (a_n² + b_n²).     (4.7)

4.3 Mode Amplitudes from Impact Location

Next we compute the vibrations resulting from an impact at some point p, when the body is initially at rest. The initial value conditions are taken to be

    y₀(x) = 0     (4.8)

and

    v₀(x) = δ(x − p),     (4.9)

with δ(x) the k-dimensional Dirac delta function.

We note that Equation 4.9 is not strictly correct as an initial value condition. The reason is that the expression for the energy, given in Equation 4.6, involves the square of the time derivative of μ(x,t). But the integral of the square of the Dirac delta function is infinite. One symptom of this is that the infinite sum appearing in Equation 4.2 does not converge. A mathematically more correct method would replace the delta function in the initial value conditions by some strongly peaked external force function, representing the impact on a small region of the object over a finite region and over a small but finite extension in time. However, this would complicate things quite a bit, and we would gain little in terms of more realistic sounds. Therefore we shall just assume an appropriate frequency cutoff in the infinite sums appearing in Equations 4.7 and 4.2. Typically, we will only use the frequencies in the audible range. For more details and a more rigorous treatment of this problem for the special cases of the ideal string and the circular membrane, see [54].

Using Equations 4.8 and 4.9, and substituting them in Equations 4.4 and 4.5, we obtain the amplitudes of the vibration modes as a function of the impact location as

    a_n = Ψ_n(p) / (c α_n ω_n)     (4.10)

and b_n = 0. The energy of the vibration is determined by the impact strength. It will be used to scale the amplitudes of Equation 4.10.
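Equations 4.3 and 4.10 are straightforward to evaluate for the square membrane example. The sketch below is our own illustrative code, not part of the thesis software (the class and variable names are invented and the wave speed is an arbitrary assumption); it lists the lowest modes with their impact amplitudes a_n = Ψ_n(p)/(c α_n ω_n), and moving the strike point p toward the center reproduces the "duller" spectra discussed with Figures 4.2 to 4.4 below.

    import java.util.*;

    // Hypothetical sketch: modal frequencies and impact amplitudes of a unit
    // square membrane, per Equations 4.3 and 4.10.
    public class MembraneModes {
        public static void main(String[] args) {
            double c = 340.0;            // wave speed in the membrane (assumed value)
            double px = 0.1, py = 0.1;   // strike point; try (0.5, 0.4) for a duller sound
            int nMax = 20;
            List<double[]> modes = new ArrayList<>();  // pairs {frequency in Hz, |a_n|}
            for (int nx = 1; nx <= nMax; nx++) {
                for (int ny = 1; ny <= nMax; ny++) {
                    double omega = Math.PI * Math.hypot(nx, ny);  // eigenvalue for L_x = L_y = 1
                    double psiAtP = Math.sin(Math.PI * nx * px) * Math.sin(Math.PI * ny * py);
                    double alpha = 0.25;                 // norm of the sine product on the unit square
                    double a = psiAtP / (c * alpha * omega);      // Equation 4.10
                    modes.add(new double[] { omega * c / (2 * Math.PI), Math.abs(a) });
                }
            }
            modes.sort((u, v) -> Double.compare(u[0], v[0]));     // ascending frequency
            for (int i = 0; i < 9; i++)
                System.out.printf("f = %8.1f Hz   a = %.5f%n", modes.get(i)[0], modes.get(i)[1]);
        }
    }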
The energy is given by

    E = constant × Σ_{n=1}^{n_f} α_n ω_n² a_n²,

where n_f is determined by the frequency cutoff mentioned above.

Example: In Figures 4.2 to 4.4 we show the amplitudes a_n, graphed against the frequency of the modes (i.e., ω_n), for a square membrane struck at the points (0.1, 0.1), (0.1, 0.4), and (0.5, 0.4), using a cartesian coordinate system with the origin (0,0) in the corner and the opposite corner at (1,1). We have taken the lowest frequency to be 500 Hz and taken the first 400 modes into account. We can see clearly that the higher frequencies become relatively more excited for strike points near the boundary of the membrane. In other words, the membrane sounds dull when struck near the center, and bright (or sharp) when struck near the rim.

Figure 4.2: Excited frequencies of a square membrane struck near the corner (amplitude vs. frequency, 0 - 10,000 Hz).

Figure 4.3: Excited frequencies of a square membrane struck near the middle of a side.

Figure 4.4: Excited frequencies of a square membrane struck near the center.

The method outlined above is very general, and allows the computation of the vibrations under impact of any object governed by a differential equation of the form given in Equation 4.1. The frequency spectrum ω_n and the eigenfunctions Ψ_n(x) can be computed analytically in a number of cases. In general one has to resort to numerical methods. For membranes, the problem reduces to the solution of the Laplace equation on a given domain, which is a well studied problem. We mention the method of particular solutions [28], which we have adapted for the example of the L-shaped membrane, described below in Section 5.1. For plates, the operator A is fourth order, and a more general finite element method can be used. See for example [38].

4.4 Sound Sources from Vibrating Shapes

After obtaining the frequency spectrum and the eigenfunctions, using methods described in Section 4.3, we have to model the relation between the vibration of the object and the sound emitted. In general the sound field around a vibrating body is very complicated and non-uniform. However, it is clear that the sound emitted can be described as a sum of monochromatic components with frequencies ω_n c and amplitudes a_n^s, which will depend on the location of the observer with respect to the object, as well as on the environment.

As a first approximation, we will identify the coefficients a_n^s with the vibration amplitudes a_n, scaled with the inverse of the distance to the observer, as the amplitude decays inversely proportional to the distance. This is not strictly correct, but we argue that it is reasonable as follows. Consider a vibrating plate. At some point above the plate, waves emerging from all locations on the plate arrive at this point. Some will be in phase, and some will be out of phase. This interference will depend very sensitively on the location of the observation point. However, in most real situations, the sound will not only arrive directly from the source, but also from reflections from the walls and other objects in the room. The total effect of this is to average out the phase differences, making the sound field less sensitive to the location of the listener.
As a heuristic, we assume that the intensity (i.e., the energy) of the sound emitted in frequency ω_n, I_n, is given by

    I_n = E_n = constant × Ψ_n²(p),

where E_n = α_n ω_n² a_n² is the energy in the n-th mode. This seems reasonable, as it integrates the intensity of the vibration, but not the phase. This means we can identify the a_n^s, which are the amplitudes of the heard sound, with the a_n given in Equation 4.10, omitting the factor α_n. Note that since we assumed that the eigenfunctions are normalized so that the α_n are independent of n, this does not matter. Finally, we obtain the amplitudes a_n^s as

    a_n^s = √(E_impact) Ψ_n(p) / (ω_n √(Ω(p)) d),     (4.11)

with d the distance from the sound source, E_impact the energy of the impact, and

    Ω(p) = Σ_{i=1}^{n_f} Ψ_i²(p),

with n_f a suitable frequency cutoff. Of course, the a_n^s are only defined up to a multiplicative constant (corresponding to the volume setting of the audio hardware). For a more detailed treatment of the radiation of vibrating plates, we refer to books on vibration analysis [65, 66, 27].

4.5 Sounds and Material Properties

When the object is struck, each frequency mode is excited with an initial amplitude a_i, which depends on where the object is struck. The relative magnitudes of the amplitudes a_i determine the "timbre" of the sound. Each mode is assumed to decay exponentially, with decay time

    τ_i = 1 / (π f_i tan φ),     (4.12)

where φ is the internal friction parameter. The internal friction parameter is roughly invariant over object shape, and depends on the material only. In [91] a method was proposed to identify the material type from the sound emitted by a struck object, by extracting the internal friction parameter of the material via Equation 4.12. Such a model is also used in [80] to simulate object sounds. Some experiments were reported in [43], where it was concluded that a rough characterization of material was indeed possible. However, the internal friction parameter is only approximately invariant over object shape. See also [87].

To emulate external damping of the object, we add an overall decay factor of e^{−t/τ₀}. This also allows us to adjust the length of the emitted sound, while maintaining its "material character", which is determined by φ. So we assume the sound wave p_s(t) to be given for t > 0 (for t < 0 it is zero) by

    p_s(t) = e^{−t/τ₀} Σ_i a_i^s e^{−π f_i t tan φ} sin(2π f_i t),     (4.13)

with the amplitudes a_i^s given in Equation 4.11, and

    f_i = c ω_i / (2π),

with the ω_i determined by Equation 4.1.

Although this simple one-parameter characterization of material works perceptually reasonably well, there is no advantage to restricting the damping coefficients in any way. When dealing with model parameters acquired from measurements we will allow the damping coefficients to take on values independent of the frequencies. In this case we will write the impulse response as

    p_s(t) = Im( Σ_{i=1}^n a_i e^{iΩ_i t} ),     (4.14)

with Ω_i = ω_i + i d_i, where the d_i are the dampings.

Figure 4.5: Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck metal vase. Reconstructed with a Fourier window of 4096.

We have investigated the extent to which the ratio of frequency and damping is constant in real objects, and found that this relation is only very approximately valid. In Figure 4.5 we plot the ratio f/d for the first 100 modes of a metal vase reconstructed with the techniques developed in Chapter 5.
The average ratio f/d is 477, characterizing metal, but still varies considerably, with a standard deviation of 262. In Figure 4.6 we plot the ratio f/d for the first 100 modes of a wooden hockey stick. The average ratio f/d is about 35, which, as an order of magnitude, seems a good characterization of wood. The standard deviation is 22. In Figure 4.7 we plot the ratio f/d for the first 100 modes of a metallic computer tower box. This is an extremely complex object with rattling parts inside, and this seems to be reflected in the great variance in the ratio for this object. In Figure 4.8 we plot the ratio f/d for the first 100 modes of a metal sword. The average ratio f/d is 1005. The standard deviation of 949 is very high.

Figure 4.6: Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck hockey stick. Reconstructed with a Fourier window of 1024.

Figure 4.7: Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck computer tower box. Reconstructed with a Fourier window of 1024.

Figure 4.8: Ratio of frequency/damping for the first 100 modes sorted by amplitude (left) and frequency (right) of a struck sword (sword I). Reconstructed with a Fourier window of 2048.

4.6 The Sound Map

In the preprocessing stage we first compute the frequency and damping spectrum, and then the excitation spectrum a_i under a suitably normalized impact (i.e., with fixed energy) for a number of locations on the surface. This gives us all the information needed to compute the sound under any kind of external force during the real time simulation. This is somewhat analogous to texture mapping in computer graphics. An interpolation of the timbre spectrum a_i^s between pre-computed locations is also obvious to implement.

One may ask how many points on the surface need to be computed. In general, the timbre of the sound changes non-uniformly over the surface. For example, a string sounds "dull" when plucked near the center, and becomes dramatically brighter when excited near the endpoints. In this case, one would need a denser set of points near the ends.

Given two sounds S₁ and S₂, a measure d(S₁,S₂) is needed that tells us how different sound S₁ "sounds" from sound S₂, such that if d(S₁,S₂) < d₀, with d₀ a threshold (depending on the individual), S₁ and S₂ can not be distinguished. Perception of timbre is a complex subject, see for example [15, 51] for a discussion, so we can not expect to be able to formulate such a sound-distance measure easily and accurately. A complication is that a very important distinguishing factor between pitched sounds is the perceived pitch, which is the same in our case, so our timbre measure depends on subtle and poorly understood aspects of audio psychology.

As an initial proposal we take the sonic distance d(S₁,S₂) between two sounds to be

    d²(S₁,S₂) = Σ_i S(f_i) ( H(log(E_i¹/E₀)) − H(log(E_i²/E₀)) )²,

where E_i^r denotes the energy contribution of the i-th mode of sound r (= 1,2), i.e., E_i^r = (a_i^s)² f_i². We take the logarithm of the energy, as the human ear is sensitive to the logarithm of intensity (measured in decibels). The function H(x) is zero for
x < 0, and x otherwise. The constant E₀ represents the lowest sound level that can be heard, so the term H(log(E_i^r/E₀)) vanishes if E_i^r < E₀. The function S(f) models the sensitivity of the ear to frequency. Without loss of generality we take 0 ≤ S(f) ≤ 1. The function S(f) has to be determined by psychoacoustic experiments. We have ignored the "masking" effect, which changes the sensitivity curve of the ear in the presence of other stimuli. One could also argue that the threshold energy E₀ depends on frequency. We leave a refinement of the measure d as a topic for future research.

In addition to encoding the location dependency of the interaction on the surface, one can also extend this by taking into account the direction of the interaction at a given point. It is also possible to have different coupling coefficients at different locations and orientations around the object, thereby encoding the spatial field of sound around the object.

4.7 Real-time Synthesis

The vibration modeling developed thus far in this chapter generalizes to more general interactions besides impulsive forces. Because the model is linear, any interaction force can be represented as the sum of an infinite number of "impulses", and the resulting equations lead to an efficient real-time algorithm for the synthesis of the sound of an object under any kind of external force. We shall now derive the algorithm, show how it can be implemented most efficiently, and then show that it can be viewed as a discretization of a continuous time spring-damper system.

According to our linear model, the sound produced by an impulsive force of magnitude F at time s can be described by the imaginary part of the complex waveform

    y(t) = Σ_n a_n e^{iΩ_n(t−s)} H(t−s) F,     (4.15)

where the sum is over the complex eigenfrequencies Ω_n (the imaginary part determines the damping of a mode). H(t) = 0 for t < 0 and H(t) = 1 for t ≥ 0. A continuous stimulus force F(t) can be represented formally as an infinite sum of infinitesimal impulses

    F(t) = ∫₀^∞ δ(t − s) F(s) ds,

where δ(t) is the Dirac delta distribution, assuming the force is zero for negative times. Using the principle of linearity, the output of the model driven by this force can be written as a sum of infinitesimal contributions from each of these impulses:

    y(t) = ∫₀^∞ ds Σ_n a_n e^{iΩ_n(t−s)} H(t−s) F(s).

Discretizing this equation in time, with sampling rate S_R, gives

    y(k) = Σ_{l=0}^{k} Σ_n a_n e^{iΩ_n(t_k − t_l)} F(l),

with y(k) = y(t_k) and t_k = k/S_R. This can be rewritten as a recursion by defining the functions y_n(k), one for each partial. The complex signal is written as a sum of modal contributions

    y(k) = Σ_n y_n(k).

For the partials y_n(k) we have y_n(0) = a_n F(0) and the recursion relation

    y_n(k) = e^{iΩ_n/S_R} y_n(k−1) + a_n F(k)     (4.16)

determines the audio signal Im(y). As |e^{iΩ_n/S_R}| < 1, the recursion relation is always stable. Equation 4.16 requires 5 multiplications per sample point, which can be reduced to 3, as we will now derive.

To simplify the notation, let us drop the subscripts n labeling the mode, and define a two-component vector

    y = (Re(y), Im(y))ᵀ.

The recursion 4.16 can now be written as

    y(k) = A y(k−1) + (aF(k), 0)ᵀ     (4.17)

where

    A = [ c_r, −c_i ; c_i, c_r ]

with

    c_r = e^{−d/S_R} cos(ω/S_R),
    c_i = e^{−d/S_R} sin(ω/S_R),
    d = Im(Ω), and ω = Re(Ω).

Let us change variables to z by writing y = Qz. In terms of z, the recursion 4.17 reads

    z(k) = B z(k−1) + Q⁻¹ (aF(k), 0)ᵀ,

where B = Q⁻¹AQ. Since B has the same eigenvalues as A, this does not affect the stability of the recursion. There are several (equivalent) choices for Q that reduce the number of multiplications to 3, and we choose

    Q = [ 1, √(1 − c_i²) ; 0, c_i ].

Defining

    c₊ = c_r + √(1 − c_i²)

and

    c₋ = c_r − √(1 − c_i²),

we arrive at the following equation for B:

    B = [ c₋, −1 ; 1, c₊ ].

In terms of the real variables u and v, with z = (u, v)ᵀ, the recursion becomes

    u(t) = c₋ u(t−1) − v(t−1) + a F(t),
    v(t) = u(t−1) + c₊ v(t−1).

We now have only 3 multiplications per sample point (as a c_i can be pre-computed), but because the physical quantity of interest, Im(y), is related to v by

    Im(y(t)) = c_i v(t),

it appears that we need another multiply per sample point to obtain this. We can avoid this by multiplying a with c_i in the preprocessing phase. Assuming that we have initially silence, i.e., u(t) = v(t) = 0 for t < 0, the linearity of the system guarantees that multiplying the input signal by a factor c_i will multiply the output signal by this factor also. We therefore arrive at the following synthesis recursion for the individual modal contribution v(t):

    u(k) = c₋ u(k−1) − v(k−1) + ã F(k),
    v(k) = u(k−1) + c₊ v(k−1),     (4.18)

with ã = c_i a.

The intuitive picture of a set of spring-damper systems driven by an external force is verified by considering the behaviour of u and v for large S_R. In that case, a continuous time differential equation should emerge, which describes a spring-damper system under an external force. By substitution of

    u(k−1) = v(k) − c₊ v(k−1)

in Equation 4.18 we obtain a second order recursion for v(k):

    v(k+1) − (c₊ + c₋) v(k) + (1 + c₋c₊) v(k−1) = a F(k).

We now make the connection to the continuous time system by expanding v to second order in 1/S_R. If we denote the continuous time limit of v by V, with V(t) = v(t S_R) wherever v is defined, we obtain (dropping the arguments t)

    v(k+1) = V + V̇/S_R + V̈/(2S_R²),
    v(k−1) = V − V̇/S_R + V̈/(2S_R²),
    c₊ = 2 − d/S_R − ω²/S_R² + d²/(2S_R²),
    c₋ = −d/S_R + d²/(2S_R²).

With some algebra we obtain

    V̈ + 2dV̇ + (ω² + d²)V = a S_R² F,

which is precisely the equation for a spring-damper system driven by an external force a S_R² F.

In some applications we have found that the quantity u + v sometimes produces a better sound, especially for car engine sounds. The quantity w = u + v satisfies in the continuum limit

    Ẅ + 2dẆ + (ω² + d²)W = a S_R (dF − Ḟ).

This means that by using w instead of v for the audio signal, we can obtain the same sound (up to a volume factor) as obtained by using v with input signal (dF − Ḟ). Because of the derivative of F, this generally gives a brighter driving signal.

To synthesize the sound in real-time, we repeatedly compute an audio buffer of length T. The synthesis algorithm fetches the values of the coefficient arrays c₊, c₋, and a, as well as the external force F for the time interval T. Equation 4.18 is then used to sequentially add contributions of the modes v_n until all modes have been added or until a certain deadline has been passed. If the modes are sorted in decreasing order of importance, this allows for a graceful degradation in the quality of the synthesized sound when the time available for audio synthesis is not constant. This algorithm is explained in more detail in Section 7.1.
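The recursion 4.18 is compact in code. The sketch below is our own illustration, not the thesis implementation (class and variable names are invented); it precomputes c₋, c₊, and ã = c_i a exactly as derived above, and then adds the modal contributions to an output buffer, three multiplications per mode and sample.

    // Hypothetical sketch of the 3-multiply modal recursion of Equation 4.18.
    public class ModalSynth {
        final int nModes;
        final double[] cMinus, cPlus, gain;  // c-, c+, and a~ = c_i * a per mode
        final double[] u, v;                 // recursion state per mode

        // f[i] in Hz, d[i] in 1/s, a[i] coupling amplitudes, sr = sampling rate
        ModalSynth(double[] f, double[] d, double[] a, double sr) {
            nModes = f.length;
            cMinus = new double[nModes]; cPlus = new double[nModes];
            gain = new double[nModes]; u = new double[nModes]; v = new double[nModes];
            for (int i = 0; i < nModes; i++) {
                double w = 2 * Math.PI * f[i] / sr;         // omega / S_R
                double cr = Math.exp(-d[i] / sr) * Math.cos(w);
                double ci = Math.exp(-d[i] / sr) * Math.sin(w);
                double s = Math.sqrt(1 - ci * ci);
                cMinus[i] = cr - s;
                cPlus[i] = cr + s;
                gain[i] = ci * a[i];                        // fold c_i into the coupling
            }
        }

        // F: driving force samples for this buffer; out: audio buffer to fill.
        void synthesize(double[] F, double[] out) {
            java.util.Arrays.fill(out, 0.0);
            for (int i = 0; i < nModes; i++) {              // loop could stop at a deadline
                double ui = u[i], vi = v[i];
                for (int k = 0; k < out.length; k++) {
                    double uNew = cMinus[i] * ui - vi + gain[i] * F[k];
                    vi = ui + cPlus[i] * vi;                // 3 multiplies per mode and sample
                    ui = uNew;
                    out[k] += vi;
                }
                u[i] = ui; v[i] = vi;
            }
        }
    }

Because the per-mode loop is independent of the others, modes sorted by importance can simply be dropped from the end of the loop when a deadline approaches, which is the graceful degradation described above.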
Chapter 5

Creating Vibration Models

The linear vibration model presented in Chapter 4 parameterizes an object's acoustic response by a set of frequencies, dampings, and a set of amplitude functions on the surface that determine the coupling between an external force and the modes at that location. The amplitudes will be sampled on a discrete set of locations, and when we are not interested in the location dependency of the sound, the coefficients a will be a set of numbers. The number of parameters needed depends on the nature of the model. Dense sounds like those of drums or complex musical instruments will require a large number of modes, whereas sounds with less dense spectra, such as bars and plates, require many fewer.

The model parameters are stored in an ASCII file of a format that we denote by sy, which is used by all our implementations. The file format is designed to be human readable, and also contains some extra information which allows the user to change overall characteristics of the model (such as the frequency scale of the object) easily. This file format is explained in Appendix A.

The model parameters can be acquired from mathematical modeling of the material and shape, or by parameter identification methods, or by "manual" means.

5.1 Computing Model Parameters Analytically

For simple shapes and simple material properties it is possible to write down an explicit partial differential equation (PDE) for the vibration of the object, and for very simple shapes it is even possible to solve the resulting PDE analytically for some boundary conditions. For more complicated shapes and materials, a finite element model [38, 58] could be used to compute the vibration modes and frequencies. However, in many cases not enough information about the object's geometry and material characteristics is available to do a sensible computation from first principles. We therefore have not pursued the finite element modeling approach in this work.

We have explicitly computed the model parameters for a number of simple geometries which allow an analytic solution of the vibration equations. These shapes already provide a fairly rich set of objects for use in simulations and are very useful if the goal is to provide generic interactive audio rather than the sound of specific physical objects (which are usually too complicated to consider modeling from first principles). We have obtained analytic model parameters for the following shapes, which can be generated using the software described in Chapter 7.

1. The taut string. This is the simplest example of a vibrating system. The eigenfunctions are simple sine functions. The sound becomes brighter for impacts near the ends of the string. The frequency spectrum is harmonic, i.e., all frequencies are integer multiples of the lowest (fundamental) frequency. The amplitudes a_n are inversely proportional to n, for large n, in contrast to a plucked string, considered in [80], where they decay as 1/n². This is one factor accounting for the difference between a piano and a guitar sound, for example. A derivation of the eigenfunctions and eigenfrequencies can be found in [54], Chapter III. The nonlinear behaviour of the string (making it less suitable for this type of modeling) was investigated in [16].

2. The rigid bar. For the rigid bar the operator A appearing in Equation 4.1 is given by

       A = −∂⁴/∂x⁴.

   As A is a fourth order operator, we need to specify 4 boundary conditions. We have computed a clamped-clamped bar, i.e., a bar rigidly attached at both ends, and a clamped-free bar (a numerical sketch of the clamped-free spectrum follows after this list). The boundary conditions are

       μ(0) = μ(1) = (∂μ/∂x)|_{x=0} = (∂μ/∂x)|_{x=1} = 0     (5.1)

   for the clamped-clamped case and

       μ(0) = (∂μ/∂x)|_{x=0} = (∂²μ/∂x²)|_{x=1} = (∂³μ/∂x³)|_{x=1} = 0     (5.2)

   for the clamped-free bar, which is assumed to be clamped at x = 0 and free at x = 1. The frequency spectrum is less dense than for the string, and not harmonic, due to the different nature of the restoring forces in a bar. The sparse spectrum makes this shape very suitable for efficient modeling with the synthesis methods considered here. For details, see [54], Chapter IV.

3. The rectangular membrane. This is the simplest two-dimensional geometry. This shape has been used as an illustrative example in Chapter 4. The sound spectrum is extremely dense, giving a rich complex sound. Because of this density, it does not lend itself well to the synthesis algorithm described in this thesis, as many modes are needed to obtain a good sound. For details on the rectangular membrane, see [54], Chapter V, Section 18.

4. The circular membrane. This corresponds to the vibrations of a drum, ignoring the effects of the surrounding air on the drum membrane. The eigenfunctions are Bessel functions, and the eigenfrequencies can be computed as the zeros of Bessel functions. The sound is also very dense, and is therefore not well suited for our type of synthesis for real-time applications. The rectangular and the circular membrane can be thought of as idealized models for drums, which fall outside the scope of this work. For details on the circular membrane, see [54], Chapter V, Section 19.

5. The circular plate. This is one of the few cases where the two-dimensional plate equations can be separated, which allows an analytic solution. The eigenfunctions are a combination of Bessel functions and modified Bessel functions. We have considered a plate clamped rigidly at the boundary. The spectrum is much less dense than for the circular membrane. This is due, as for the bar, to the larger restoring forces in a plate, compared to a membrane. Therefore this model lends itself very well to our synthesis method. For details on the circular plate, see [54], Chapter V, Section 21.

6. The simply supported rectangular plate. This is another of the few cases where the two-dimensional plate equations can be separated, which allows an analytic solution. The eigenfunctions are products of sine functions. The spectrum is much less dense than for the rectangular membrane. This is due, as for the bar, to the larger restoring forces in a plate, compared to a membrane. Therefore this model lends itself very well to our synthesis method. For details on the rectangular plate, see [55], page 389.

7. The L shaped membrane. A membrane supported by a domain consisting of three unit squares in the shape of an L does not allow an analytic solution of the wave equation. This problem has received some attention in the literature, as the resulting boundary value problem requires some refined numerical methods. We have computed the eigenfunctions and the spectrum with an adaptation of the method of particular solutions, see [28], which is available for the L shaped membrane from within MATLAB. As an aside, we note that the first eigenfunction features prominently on the cover of the MATLAB reference guide [8].
5.2 Fitting Model Parameters to Empirical Data

An experimental approach to obtaining sound model parameters is to record sounds of real objects and fit the model parameters to the recorded actual sounds. Because of the linearity of the model, all we need is the response of the object to an impulsive force. By striking an object and recording the sound, we can fit the recorded waveform with a function of the form given in Equation 4.15. We can think of this as designing a digital filter of a specific type with a given impulse response (the recording). Because recordings have a lot of noise in them (we want a method that is applicable for "home users"), and objects can't be excited with a true impulse, we need a robust parameter estimation method.

The extraction of sinusoidal signals from time-series data has attracted a lot of attention in the statistics and signal processing literature. For a general introduction see [61]. For more recent work, see [44, 56, 14, 70, 69].

Experimentation in MATLAB with the Prony method [23] and other filter design methods, such as the maximum entropy method [71] used in Linear Predictive Coding, gave very unsatisfactory results. We tried constructing IIR filters from recorded sounds of several objects and then compared the reconstructed impulse responses to the originals. In most cases the reconstructed sounds were very distorted, and often the damping coefficients were completely wrong (too large). We believe this is because the actual sounds are only partially described by the linear model, and non-linearities and external noise are known to cause problems with these methods. Therefore we have not pursued this approach any further. Similar difficulties were reported in [41] when trying to construct an IIR filter for the impulse response of the body of a guitar. It was found that autoregressive (AR) modeling using the autocorrelation method of linear prediction (LP) [48], as well as the pole-zero (ARMA) model using Prony's method [57], require extremely high order filters in order to predict the decay rates reasonably well. The general consensus seems to be that for complex noisy data and rough models (as we are considering here), methods based on spectral analysis are more appropriate.

Better results were obtained with a more robust approach using windowed Fourier transforms. We construct a spectrogram from the recorded audio signal, and use this to determine the dominant frequencies, their decay rates, and their amplitudes. We will now describe the algorithm. The input consists of a recorded impulse response of a real object. The recordings were made at a sampling rate of 44,100 Hz and encoded as a 16 bit wav file. A brief fragment of silence before the strike is (optionally) used to determine the noise level. The analysis consists of the following steps:

To be computed: Arrays f, d, and a corresponding to the frequencies, dampings, and coupling amplitudes of a modal model.

Perform the windowed DFT:

1. Set the following constants:
   f_s = sampling rate (44100)
   N = size of DFT window (1024 - 8192)
   M = desired number of modes (100)
   λ = signal to noise ratio (10)
   Q = window overlap factor (4)

2. Load the sample y(i) from disk. Store as a floating point array of length L. The total duration of the sample is L/f_s.

3. Compute the overlapping windowed discrete Fourier transforms W_k(i), i = 0, ..., N/2 − 1. W_k is obtained by windowing the vector y(i) over the interval [kN/Q, kN/Q + N − 1] with a Hanning window [25, 72, 61] and taking the DFT. The index k which labels the DFTs lies in the range k = 0, ..., ⌊(L − N)Q/N − 1⌋.

Identify the part of the signal used for analysis:

4. Compute the intensities

       A_k = Σ_{i=0}^{N/2−1} |W_k(i)|².

5. Find the k_max for which A_k is maximal, and define k₀ = k_max + 1. This is the start of the impulse response.

6. Analyze the amplitudes A_k on the "silence" fragment k = 0, ..., k_max − Q and compute the average Ā and the standard deviation σ(A). Find the smallest k, k > k₀, such that A_k < Ā + λσ(A), and call this value k_end. This is the end of the "signal". The constant λ, together with the background noise level, determines this.

7. Define

       V_k(i) = log|W_{k₀+k}(i)|,

   where k = 0, ..., K − 1, with K = k_end − k₀. This is the part of the signal we will use to extract the model parameters.

Estimate the frequency modes:

8. Define an array of N/2 − 1 "bins" B(i), i = 0, ..., N/2 − 1, each corresponding to a frequency of f_s(i + 1)/N. Initialize them to zero.

9. For k = 0, ..., K − 1, find the set of indices i, call them {i_max}, for which V_k(i) is a local maximum, i.e., V_k(i) > V_k(i + 1) and V_k(i) > V_k(i − 1). (There is a different set {i_max} for each value of k.) Select the M indices i_max¹, ..., i_max^M with maximal values of V_k(i), and add 1 to B(i_max^j) for each of these indices. We say these bins have been voted for as candidates for a resonance mode. There are K voting rounds.

10. Select the M bins B(i) with the most votes, call them I₀, ..., I_{M−1}, and obtain the estimated frequencies

        f_i = f_s(I_i + 1)/N, for i = 0, ..., M − 1.

Estimate the damping coefficients of the modes:

11. For each f_i, fit V_k(I_i) as a function of k, k = 0, ..., K − 1, with a linear function −α_i k + β_i, using a least squares algorithm. The damping coefficients d_i are identified as

        d_i = Q f_s α_i / N, for i = 0, ..., M − 1.

Estimate the coupling amplitudes:

12. The amplitudes a_i are obtained as

        a_i = e^{β_i} / ρ_i,

    where ρ_i is a normalization factor depending on the decay over one analysis window, d_i N / f_s, which compensates for the decay of the mode within the window.

13. Normalize the amplitudes a_i so that max(a_i) = 1, and sort f_i, d_i, and a_i by the value of a_i.

The sample y(i), i = 0, ..., L − 1 is divided into a number of (overlapping) windows of size N. The window size N is typically N = 1024, 2048, 4096, or 8192, i.e., about 25 − 200 ms. The windows overlap, and an overlap of about 75%, corresponding to Q = 4, was found to work well by trial and error. For each window a discrete Fourier transform W(j), j = 0, ..., N − 1 is computed, using a Hanning window [25, 72, 61], in step 3. The value of N determines the spectral resolution as Δf = f_s/N, where f_s is the sampling rate. For N = 2048 this gives a resolution of about 20 Hz. For a typical pitch of about 400 Hz this corresponds to a perceived pitch error of up to 12 log₂((400 + 20)/400) ≈ 0.85 semitones, or 85 cents. This is clearly audible, but in the types of sound we are concerned with, absolute pitch is not an important factor. If so desired, an overall fine-tuning of the pitch can be made later, either manually or by using a specific pitch detection algorithm [68, 62, 60]. See Figure 5.1 for an example of the recording of the sound of a metal vase struck at the top, and its reconstruction using the parameter fitting described here. Displayed is the sonogram based on N = 4096 of the actual recording on the left, and its reconstruction using 40 partials on the right.
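The voting and fitting steps (8 through 11) are compact in code. The sketch below is our own paraphrase of those steps with invented names, not the thesis implementation; it takes the log-magnitude spectrogram V[k][i] and returns the M most-voted bins, with a least-squares slope fit per bin from which d_i = Q f_s α_i / N. The slope is clamped so that only positive dampings are produced, for the stability reasons discussed below.

    // Hypothetical sketch of steps 8-11: vote for spectral peaks across time
    // slices, then fit a line to the log magnitude of each winning bin.
    public class PeakVoting {
        // V[k][i]: log|W_{k0+k}(i)| for K time slices and N/2-1 frequency bins.
        static int[] vote(double[][] V, int M) {
            int K = V.length, nBins = V[0].length;
            int[] votes = new int[nBins];
            for (int k = 0; k < K; k++) {
                java.util.List<Integer> peaks = new java.util.ArrayList<>();
                for (int i = 1; i < nBins - 1; i++)        // local maxima of this slice
                    if (V[k][i] > V[k][i - 1] && V[k][i] > V[k][i + 1]) peaks.add(i);
                final int kk = k;
                peaks.sort((p, q) -> Double.compare(V[kk][q], V[kk][p])); // largest first
                for (int j = 0; j < Math.min(M, peaks.size()); j++) votes[peaks.get(j)]++;
            }
            Integer[] idx = new Integer[nBins];            // pick the M most-voted bins
            for (int i = 0; i < nBins; i++) idx[i] = i;
            java.util.Arrays.sort(idx, (p, q) -> votes[q] - votes[p]);
            int[] top = new int[M];
            for (int i = 0; i < M; i++) top[i] = idx[i];
            return top;
        }

        // Least-squares slope of V[k][bin] vs k, mapped to a damping d = Q*fs*alpha/N.
        static double damping(double[][] V, int bin, double fs, int N, int Q) {
            int K = V.length;
            double sk = 0, sv = 0, skk = 0, skv = 0;
            for (int k = 0; k < K; k++) {
                sk += k; sv += V[k][bin]; skk += (double) k * k; skv += k * V[k][bin];
            }
            double slope = (K * skv - sk * sv) / (K * skk - sk * sk);
            double alpha = Math.max(0, -slope);  // clamp: negative damping is a noise artifact
            return Q * fs * alpha / N;
        }
    }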
The best window size for a given sound was determined experimentally, by reconstructing the impulse response and comparing it "by ear" with the original. There seems to be no obvious way to automate this process. Often reconstructions with different window sizes would differ audibly, but it was not possible to say which one was "closer" to the original sound. Until we have a better understanding of timbre perception, the best approach seems to be to provide interactive tools, with humans making perceptual judgements.

For each window the norm of the Fourier transform is summed over all frequencies, giving the average intensity of the signal as A_k = Σ_{i=0}^{N/2−1} |W_k(i)|² in step 4. The window W_{k₀} after the window with maximum intensity A_k is chosen as the start of the impulse response in step 5. Note that if the signal starts somewhere inside an FFT window, this window may or may not register as the maximum intensity window. In order to avoid artifacts from this, we throw away the first window and start the analysis from the next one, which is guaranteed to contain only signal.

The beginning of the sample (before the impulse response) is analyzed in step 6 for the average level (the background noise) and the standard deviation. This section ends with the window indexed by k_max − Q, because the window indexed by k_max contains the beginning of the impulse response, and we therefore have to go back Q windows to obtain the previous non-overlapping window. This information is used to determine the end of the impulse response, which is set at the point where the signal amplitude falls below the background level plus some reasonable number (empirically set to 10) times the background standard deviation. This corresponds roughly to what would be obtained by a visual inspection and cutting of the signal when it "looks like noise".

Figure 5.1: Sonogram of recorded (left) and reconstructed (right) impulse response of vase (0 − 6400 Hz shown). The window size of the Fourier transform was 4096 and the sampling rate was 44100 Hz.

The logarithms of the absolute values of the windows containing a signal are identified in step 7 to extract frequencies, amplitudes, and dampings. For each time slice we find the frequency bins which are local maxima in step 9, and for the M largest peaks we increase the counts of those bins by one (they are initialized to zero, of course). After doing this for all time slices, we keep the M bins with the most votes, and these are the frequencies identified in step 10. For each of those bins we fit V_k(i) as a function of k with a linear function, and we extract the damping parameters in step 11 and the amplitudes in step 12.

In Figure 5.2 we show the first 100 frequency peak functions and their fits. The plots are sorted left to right and top to bottom by the number of votes. We have set the vertical scale for each subplot separately. Note that the strongest peaks in amplitude don't necessarily get the most votes. We do it like this in order not to miss weak frequencies, because perceptually the strongest are not always the most audible ones. The corresponding frequencies are depicted in Figure 5.3. The resulting vibration model can be visualized by plotting its frequency response in Figure 5.4.
Note that the highest peaks do not necessarily correspond to the largest coupling coefficients, as the damping also plays a role in the spectral response. Note that some modes seem to behave very linearly whereas others are almost completely random. In a refinement of the parameter fitting one could reject modes that did not produce an acceptable fit with a linear function. However, it is by no means clear that this would improve the model.

Figure 5.2: Identified frequency peaks and their linear fits.

Figure 5.3: Identified modes for vase. Shown are the frequencies, in Hz, corresponding to Figure 5.2.

Figure 5.4: Identified modes for vase. The frequency response is shown. The bottom figure shows a subset of the frequency range of the upper figure.

Some of the linear fits to the modes (such as the mode in the lower left corner, one from the bottom, in Figure 5.2) seem to be wrong. This is an artifact of the least squares algorithm we used, which only works correctly if the best fit has negative slope, which corresponds to a positive damping coefficient. Negative damping modes are artifacts caused by trying to fit noise. In all cases their amplitudes were so small that they do not contribute to the resulting sound. We preferred to have the algorithm produce positive damping always, even when a least-squares fit would result in a negative damping, as an undetected negative damping coefficient causes instability in the synthesis algorithm, and must be removed before using the data for synthesis. This is straightforward, but the result of forgetting this can be rather unpleasant (it results in a horribly loud squeak which may damage the ear). A very weak but positively damped spurious mode is harmless, on the other hand.

Some modes may appear to be oscillating, which can occur easily if two frequencies are approximately degenerate. Degeneracy in the spectrum is closely related to geometrical symmetries, with the exact degeneracy being broken by higher order effects, and occurs frequently. Beating between the frequency doublet causes apparent oscillations in a frequency. Such behaviour has been observed in bells [35], kettledrums [53], and in timpani [77]. Such oscillating behaviour could be detected, and one could attempt to fit such spectral lines with two frequencies. For the purpose of this research we have not found a compelling reason to pursue this further.

Various other refinements of the fitting method are possible, but have not been pursued in the context of this research. For example, one could work with a spline interpolation of the windowed discrete Fourier transforms, to obtain a more precise estimation of the frequencies. This would necessitate some form of frequency tracking, as we don't have discrete bins. For this an adaptation of the McAulay-Quatieri algorithm [49] for frequency tracking could be used. For rough models of "common" objects little seems to be gained by this, as an accurate determination of the frequencies is not so important (unlike in musical instruments). Polynomial interpolated Hanning windowed Fourier transforms were used successfully in [77] to identify the spectral lines for timpani sounds.

After acquiring the model parameters we usually have to make some manual adjustments. Often background noise such as the hum of a refrigerator will show up as a spurious mode with zero decay constant. Often these modes are easily recognized by a visual inspection (or by some pruning code) and can be removed easily. They are also very audible, so when importing the constructed model into the model tester described in Chapter 7, these spurious modes will be discovered.

In Appendix B we show results for the vase at different values of the window size, as well as results for three other objects: a santur, a piano, and a metal computer tower. The piano and the santur parameters did not produce a good vibration model. This was expected, as musical instruments are much more complicated than "common" objects. In particular the santur is a very complicated instrument. It consists of a set of strings on a wooden frame, three strings per note. The strings are struck with wooden hammers. Because of the coupling between the frame and the strings, if a single string is struck, the other strings resonate. The sound is therefore extremely complex and thousands of modes would be needed in order to approximate even the linear response. We have included the analysis of this instrument to indicate where our method breaks down. The piano string, whose data was obtained by playing a note on a piano, produces a reasonable model, though hardly musically satisfying. The vase and the computer tower produced very realistic sounding vibration models for a modest number of modes, around 5 − 10. The original sounds and the reconstructed impulse responses can be accessed on-line on [4], and on the accompanying CD.

Our reconstruction assumed the recorded sound is the impulse response of the system. However, in reality the impulsive force will have some finite duration. One could account for this if the exact force profile of the strike was known, by correcting the amplitudes a obtained by the algorithm with a factor which depends on the exact force profile of the interaction, as follows. The impulse response y₀(t) and the response y_F(t) to a finite duration force F(t) (non-zero only for 0 < t < τ) are related by

    y_F(t) = ∫₀^τ y₀(t − s) F(s) ds.

If the impulse response y₀(t) is assumed to be of the form

    y₀(t) = Σ_{k=1}^n a_k⁰ e^{iΩ_k t},

and the response y_F(t) is assumed to be of the same form with different coefficients a_k^F, we can relate a_k^F and a_k⁰ by

    a_k^F = γ_k a_k⁰,

where the correction coefficients γ_k are given by

    γ_k = √( (∫₀^τ e^{d_k t} sin(ω_k t) F(t) dt)² + (∫₀^τ e^{d_k t} cos(ω_k t) F(t) dt)² ).

We can now use the algorithm described previously, and divide the coupling coefficients by the correction coefficients γ_k to obtain the impulse response. For this the objects would need to be hit with a hammer equipped with a force sensor. We attempted to use the initial part of the recorded impulse response as a profile for the contact force; however, this did not improve the results and we conclude that this is not an acceptable approximation. The resulting reconstructed sounds can be heard for the vase and the santur on the web page and on the CD.
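Given a force profile recorded with such a sensor-equipped hammer, the correction coefficients γ_k can be evaluated by direct quadrature. The sketch below is our own illustration of the formula above (the names and the simple rectangle-rule integration are our choices, not the thesis's).

    // Hypothetical sketch: correction coefficient gamma_k of a mode (omega_k, d_k)
    // for a sampled finite-duration force profile F, per the formula above.
    public class ForceCorrection {
        // F: force samples over the contact, dt: sample spacing in seconds,
        // omega: mode frequency in rad/s, d: damping in 1/s.
        static double gamma(double[] F, double dt, double omega, double d) {
            double sinSum = 0, cosSum = 0;
            for (int i = 0; i < F.length; i++) {   // rectangle-rule quadrature
                double t = i * dt;
                double w = Math.exp(d * t) * F[i] * dt;
                sinSum += w * Math.sin(omega * t);
                cosSum += w * Math.cos(omega * t);
            }
            return Math.hypot(sinSum, cosSum);
        }
    }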
Figure 5.5: Frequency response of vase for three regions (0 − 6000 Hz shown).

We also have tested the models by applying various input forces to the models, taking into account different numbers of vibration modes, and a selection of these sounds can also be accessed on the web page and CD. In Figure 5.5 we show the frequency response, as reconstructed by our method, for three different locations on the vase: at the top, middle, and bottom. As can be seen, many frequencies and dampings (which show up as the widths of the resonances) are shared, but the amplitudes differ, as expected.

5.3 Designing Model Parameters by Hand

In addition to computing or measuring the parameters, there are also circumstances where a less automated approach is desirable. For example, we have been able to synthesize the sound of avalanches by a set of random resonances with frequencies and dampings within a specified range. The avalanche models can be generated on-line and heard on the Java applet on [4], and on the accompanying CD.

Also, the sound of "generic" objects such as unspecified structures of a certain type of material can be generated by choosing a set of resonances and controlling their distribution by stochastic methods. For example, on the web page referred to above there is an object denoted by "pseudo string" which consists of a set of resonances which are spaced almost harmonically, as in the ideal string model, but with some random perturbation. To generate the sounds of combustion engines, described in more detail in Chapter 6, a stochastic model for the resonances is also found to be very effective.

In an application described in Chapter 7 we constructed a xylophone object, which was obtained by adding resonances with frequencies corresponding to the seven white piano keys (using just intonation) and adding a bar-like spectrum for each on top, with coupling and damping coefficients adjusted by hand until it "sounded right". The xylophone was examined in great detail in [18], with particular attention to the mallet-bar interaction.

We have also tried to extract the model parameters from a recorded sound of a church bell "by hand". For this we used a software package to analyze the sound, do Fourier transforms, plot spectrograms, etc., in order to identify important frequencies. After a time consuming analysis, it was found that the first 20 or so modes identified were also found by our parameter fitting software, and the weaker modes were not audible. This provides a nice validation of the parameter identification method. Although the "manual" approach did not give us a better model, in general one can take more extensive measurements of vibrating structures under a variety of interactions to arrive at a modal model. For example, steel drums [63], harpsichords [64], and guitar bodies [41] have been investigated using a variety of measurements.

Chapter 6

Interaction Force Models

To create sounds we need an interaction force model as well as a vibration model. In this chapter we consider four types of interaction models:

• Impact forces
• Continuous contact forces (sliding and rolling)
• Engine forces
• Live data streams

6.1 Impact Forces

When two solid bodies collide, large forces are applied for a short period of time. The precise details of the contact force will depend on the shape of the contact areas as well as on the elastic properties of the involved materials.
Chapter 6

Interaction Force Models

To create sounds we need an interaction force model as well as a vibration model. In this chapter we consider four types of interaction models:

• Impact forces
• Continuous contact forces (sliding and rolling)
• Engine forces
• Live data streams

6.1 Impact Forces

When two solid bodies collide, large forces are applied for a short period of time. The precise details of the contact force will depend on the shape of the contact areas as well as on the elastic properties of the materials involved. For example, a rubber ball colliding with a concrete floor will experience a contact force which increases faster than linearly with the compression of the ball, because the contact area also increases during the collision. A generic model of contact forces based on the Hertz model and the radii of curvature of the surfaces in contact was considered in [39]. A Hertzian model was also used to create a detailed model of the interaction forces between the mallet and the bars of a xylophone [18].

To create the simplest model of a collision force to drive the audio synthesis, we assume that the two most important distinguishing characteristics of an impact on an object are the energy transfer in the strike and the "hardness" of the contact. A psychophysical study of perceived mallet hardness on xylophones [29] showed that this is indeed a very perceptible parameter of an acoustic event. The hardness translates directly into the duration of the force, and the energy transfer translates directly into the magnitude of the force profile.

We have experimented with a number of force profiles and found that the exact details of the shape are relatively unimportant. A phenomenological model of a finite duration impact has been constructed and implemented. The model approximates the contact force by a function of the form given in Equation 6.1, for $0 < t < T$, with $T$ the total duration of the contact. This function has the qualitatively correct form for a contact force: it increases slowly in the beginning, representing a gradual increase in contact area, and then rises rapidly, representing the elastic compression of the materials. An implementation of this contact profile can be found in the interactive demos on the accompanying CD and on [4], and is found to be quite effective. The sounds of soft contacts (with large $T$) are recognizable as such, which shows that this model can produce this perception. We have also adopted this contact force model in a real-time synthesis program described in Chapter 7, as well as in other testing modules.

The spatial extent of the contact area will have an influence on the resulting sound. This effect can be incorporated easily in the linear model by averaging the amplitudes $a_k$ over a region, which will have only a small effect on the sound for small contact areas. As this is not an important effect, it will be ignored.

More complex interactions during contact, such as damping effects, may be important in some cases. For example, the hammers in a piano are covered with felt, so during the contact between the hammer and the string they damp the higher modes of vibration. How this actually occurs is quite complicated [32, 33, 34, 76] and potentially important, especially for high quality musical instrument modeling. Another example is a ringing sword: when the sword is sliding over a surface, the continuous contact provides extra damping compared to a freely ringing sword. This can be taken into account by adjusting the damping parameters appropriately during continuous contact.
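As a concrete stand-in for such a profile, the fragment below uses a raised-cosine bump of duration T: zero outside the contact, with a smooth rise and fall inside. This is a hypothetical illustration sharing the qualitative features discussed above (finite duration T encoding hardness, magnitude Fmax encoding energy transfer); it is not necessarily the exact functional form of Equation 6.1.

    #include <cmath>

    // Finite-duration impact force: harder contacts use smaller T.
    double impactForce(double t, double T, double Fmax)
    {
        if (t < 0.0 || t > T)
            return 0.0;                                  // no contact
        return Fmax * 0.5 * (1.0 - cos(2.0 * M_PI * t / T));
    }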
6.2 Continuous Contact Forces

An important ingredient in synthesizing realistic scraping and rolling sounds is a surface interaction model. A lot of research has been conducted on models of contact interactions between solids [78, 79, 47, 42, 10, 9], but it usually focuses on predicting forces at a coarser time scale than needed for our purposes. A recent exception is [83]. Nevertheless, a rigid body simulator would be able to provide information about the contact force magnitudes and the friction forces at the contact areas, which may be used as inputs for a high sampling rate contact force model. This model should be able to generate contact forces for a specific type of contact, depending on the contact force and the sliding speed.

The roughness profile of both surfaces will determine the effective force stimulus to the object and therefore has an important effect on the sound. Initial experiments indicated that bandwidth-limited white noise and Gaussian noise (in the time domain) are reasonably satisfactory, but they lack a surface property parameter.

We have also experimented with scaling (fractal) noise. This is noise whose spectral content behaves as $f^{-a}$ (at least over a sizable region), for some constant $a$. Such noise (of which white noise is a simple example, with $a = 0$) sounds the same when played back at any speed, and is extremely common in nature [89]. Especially $1/f$ noise [85, 88] occurs frequently, and is a field of study by itself. The exponent $a$ can be used as a roughness parameter, with $a = 0$ representing a very rough surface and $a = 1$ a smoother surface.

The most satisfactory solution we found is to use a looping digital sample with pitch shifting and volume control to adjust the speed and force of the contact. With a small set of short samples representing a variety of textures, a great variety of contact surface profiles can be imposed upon the vibration models. Surface profiles were created by simply scraping a real object with a contact microphone. See the accompanying CD or [4] for a palette of scraping sounds and their use.

The physical picture behind this contact model is that the sample encodes the shape of the surface, and the object scraping it follows this surface profile exactly. In reality, microscopic deformations and the breaking of contacts will complicate the mechanism whereby the interaction force is generated. These interactions are so complex that it seems hopeless to try to model the physics behind them in real time. We therefore believe the pitch-shifting sample playback approach is the best way to model these types of contacts. If it is necessary to take into account that the contact force does not simply scale in time at different speeds, one can keep a number of samples for different contact speeds and cross-fade between them. In essence, this is the technique presently used for wave table synthesis of musical instrument sounds, but applied here to the contact force. Viewed in this way, our synthesis can be thought of as an "effect" added to a recorded sample, which in this case represents an interaction force instead of a sound.
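In code, the looping-sample model amounts to reading a circular buffer at a rate proportional to the sliding speed and scaling the result by the normal force. Below is a minimal sketch with hypothetical names; linear interpolation between adjacent samples provides the pitch shift.

    // profile[0..n-1] holds the recorded surface sample; pos is the current
    // (fractional) read position, advanced by speed samples per output sample.
    double contactForce(const double *profile, int n,
                        double &pos, double speed, double force)
    {
        int i = (int)pos;
        double frac = pos - i;
        double s = (1.0 - frac) * profile[i % n]
                 + frac * profile[(i + 1) % n];   // linear interpolation
        pos += speed;                             // pitch shift
        if (pos >= n)
            pos -= n;                             // loop around
        return force * s;                         // scale by normal force
    }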
6.3 Live Data Streams

An interesting interface to this type of synthesizer is a sensor that measures real interaction forces. We have implemented this very simply (and cheaply) with a contact microphone (designed to be attached to a guitar body) attached to a real object. When the object is touched and scraped, the audio signal is sent to our real-time simulator, where it is interpreted as a force on whatever vibration model is currently loaded. We can then scrape some interface object and transfer the measured signal to the audio synthesis to create the impression of touching a virtual object. Another type of application is to use the output of an electric guitar as the driving force for some virtual guitar body. In Chapter 7 we will describe some applications written around this idea.

6.4 Engine Forces

Engine sounds are very difficult to achieve in computer games, as they are continuous sounds. Racing games are very popular, and a properly modeled car sound that responds in a realistic way to input parameters would greatly enhance the audio in such games. It is not obvious that combustion engines can be modeled with our techniques, as the sources of sound are explosions and very complicated gaseous phenomena. Rather surprisingly, we found that very good sounding interactive models can be made by driving a resonance object (a lumped model of everything that is vibrating) with a rather simple-minded model of a combustion engine.

The first model we created is a four-stroke engine. The driving force is obtained by constructing a looping audio sample divided into four regions which represent the four stages of the engine. The sample depicted in Figure 6.1 contains an intake stroke, a compression stroke (silence), a combustion stroke, and an exhaust stroke.

Figure 6.1: Sample driving a four stroke, one cylinder engine.

The intake stroke was modeled as white noise enveloped with a bell curve. The exhaust stroke is modeled as white noise, rapidly decaying in time, inspired by a high pressure gas mixture being released when the valve opens. The combustion stroke consists of an enveloped burst of $1/f$ noise. It was found, after trying various $1/f^a$ noises, that this gives the most realistic sound. The reason is probably that the combustion takes place inside the cylinder, so the shock wave is transmitted to the mass of metal of the engine block, which acts as a low pass filter.

The sample is looped at an adjustable rate, corresponding to the running speed of the engine, but we keep track of the four stages in the sample and allow the volumes of the intake, combustion, and exhaust stages to be set in real time independently of each other. This gives extra dimensions of control to map, for example, to driving uphill, driving with a large load, etc. This simple driving force model can generate a variety of engine sounds by coupling it to various vibration models. Models with relatively high frequency resonances and high damping give a "lawn mower" effect, whereas a low frequency object gives a very convincing motorcycle sound. The best motorcycle sound was obtained using a mathematical model of a rectangular plate with dimensions 7 by 1 and lowest resonance at 50 Hz. About 7 modes gives an optimal result.

A slightly different sound is produced by adding a pitched background sound to the sample at all four stages. This simulates the sound of the fan, and perhaps other rotating parts. For the pitched sound, a short noisy note played on a 70 cm long pipe with holes (a.k.a. a ney) was used, which is very satisfactory. The driving sample is depicted in Figure 6.2.

Figure 6.2: Sample driving a four stroke, one cylinder engine with fan.
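A driving sample of this kind can be assembled offline in a few lines. The sketch below builds one engine cycle from the four stages: white noise stands in for the intake and exhaust, and a one-pole low-passed noise burst crudely approximates the $1/f$ combustion noise (proper $1/f$ noise requires more careful spectral shaping). All names and envelope constants are hypothetical.

    #include <cstdlib>
    #include <cmath>

    static double noise() { return 2.0 * rand() / RAND_MAX - 1.0; }

    // Build one four-stroke cycle of n samples, one quarter per stage.
    void buildEngineCycle(double *buf, int n)
    {
        int q = n / 4;
        double lp = 0.0;                 // one-pole lowpass state
        for (int i = 0; i < q; i++) {
            double x = (double)i / q;
            // intake: noise under a bell curve
            buf[i] = noise() * exp(-25.0 * (x - 0.5) * (x - 0.5));
            // compression: silence
            buf[q + i] = 0.0;
            // combustion: decaying burst of low-passed noise
            lp += 0.05 * (noise() - lp);
            buf[2 * q + i] = 4.0 * lp * exp(-5.0 * x);
            // exhaust: rapidly decaying noise
            buf[3 * q + i] = noise() * exp(-10.0 * x);
        }
    }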
A race car engine sound was obtained by creating a four cylinder version. We assume the four cylinders fire $\pi/2$ out of phase and simply add the samples from the one cylinder engine four times, with a relative phase shift of $\pi/2$. The resulting sample is depicted in Figure 6.3.

Figure 6.3: Sample driving a four cylinder engine.

If we just use this driving force as is, the result is not very good. The reason is that in a real engine the four cylinders are attached to the intake manifold at different locations and therefore sound different. To incorporate this we adjust the volumes of the four one-cylinder samples individually, and when they are all set differently the resulting sound is very convincing.

In a real game application one would probably spend more time on the actual modeling of the car engine than we have done here. But our simple models show that a convincing result can already be obtained using extremely simple models. To create richer engine sounds, different cylinders may be coupled to different exhaust manifolds or muffler pipes. It is interesting that Harley-Davidson motorcycles are designed specifically to generate their characteristic sound: sub-optimal engine performance is tolerated just to get it. An interesting article on the audio aspects of the Harley-Davidson motorcycle by Joel C. Moser appeared as the 1996 winner of the Steuber essay competition of the College of Engineering at the University of Wisconsin-Madison. The article is available on the WWW at [5].

Chapter 7

Implementations

The theory presented in the previous chapters was tested in practice in order to evaluate the quality of interactive audio created with this methodology in several applications.

In this chapter we will first describe the general architecture of applications with real-time audio synthesis and describe the software components for audio synthesis that have been implemented in C++ and in Java. Following this, we will discuss authoring tools that we created, and finally we will describe several demonstration programs that have been constructed to illustrate what can be done with the audio synthesis.

7.1 General System Architecture

In order to apply the theory developed here to the creation of interactive audio in software applications (such as simulations), we need to implement software components which have to be integrated with the application. The low level audio synthesis and the control algorithms should be designed to be as orthogonal as possible.

Figure 7.1: System architecture for applications with real-time audio synthesis. (The figure shows a real-time application in which a Controller computes the coefficient arrays c_plus[], c_minus[], a[], and the force[] buffer from vibration, geometry, and interaction models, and feeds them to a SynthesisEngine, configured by sampling rate, buffer size, and CPU load, which renders through the OS audio API; off-line authoring tools based on mathematical shapes and parameter fitting populate a database of vibration models.)

We therefore have divided the runtime system into two main components:

• SynthesisEngine. This component represents a real-time thread or process that continuously writes computed audio buffers to the audio rendering hardware. This object implements the core synthesis algorithm at the lowest level.

• Controller. This controls the SynthesisEngine by feeding it the necessary parameters. This object encapsulates composite models of objects and interactions and computes the filter coefficients and input buffers from higher level parameters such as sliding speed, throttle settings, etc.

The SynthesisEngine communicates with the audio hardware by repeatedly submitting a computed audio buffer of duration T using the audio API of the operating system. The operating system provides an abstraction of the audio hardware in the form of a queue on which audio buffers can be placed. The SynthesisEngine must submit buffers at a rate of 1/T, or the queue will underflow, resulting in spurious noises.
The buffer size T determines the latency of the synthesis algorithm (the delay between an interaction and when it is heard) and the control overhead. We discuss this in more detail in Section 7.1.1. Before computing the next buffer, the SynthesisEngine obtains information from the application about the object whose sound it should render and about the force applied to it. This information is obtained through the Controller class, which is responsible for making the coefficients $c_+$, $c_-$, $a$, and the force $F$ (see Equation 4.18) available for the duration T of the buffer.

The SynthesisEngine then computes the modal contributions, as defined by the parameters obtained from the Controller, one mode at a time. After each mode is added, it checks whether it has time to compute another mode. If its allotted time has run out, or if all modes have been computed, it places the computed audio buffer on the output queue and starts computing the next buffer. The time allowed for the computation of a single buffer should be less than T, in order to ensure an uninterrupted audio stream. By allowing significantly less time than T for the completion of the buffer, one can effectively force the synthesis algorithm to use only a small fraction of the available computational resources of the system. If more resources become available (for example when a computationally intensive task in a game is finished), more modes can be added, and the quality of the synthesized sounds will adjust dynamically to the load on the system.

For a buffer length of T, the latency of the synthesis will be between T and 2T. The force $F(t)$ becomes available after a time interval T, and the synthesis algorithm has to finish its computation of the audio buffer within another time interval T, because after that the next force buffer has to be processed. In practice, the audio synthesis will often be required to use only a small fraction of the computational resources of the host machine, and will be required to finish in a time much shorter than T.

To achieve minimum latency, we should choose T as small as possible. However, there is additional overhead for each buffer, as the model parameters may change. For large values of T this overhead can be made as small as desired, at the expense of more latency. Let us denote the computational cost of a multiplication by C(mult) and the computational cost of an addition by C(add), and let $S_R$ be the sampling rate and $n$ the number of modes. The computational load per second of the synthesis will be

$$L_{\mathrm{synth}} = (3C(\mathrm{mult}) + 4C(\mathrm{add}))\, n S_R. \qquad (7.1)$$

The control overhead is assumed to have a computational cost of C(control) per mode, and the control parameters are recomputed every T seconds. The control overhead per second will be

$$L_{\mathrm{control}} = n\, C(\mathrm{control})/T. \qquad (7.2)$$

As an illustrative example, let us consider an object whose overall frequency scale is modified in real time. In this case the frequencies have to be multiplied by a scale factor, which costs $nC(\mathrm{mult})$. Then the coefficients $c_+$ and $c_-$ appearing in Equation 4.18 have to be recomputed, giving a total cost per mode of

$$C(\mathrm{control}) = C(\mathrm{exp}) + 2C(\mathrm{trig}) + 2C(\mathrm{div}) + 4C(\mathrm{mult}) + 2C(\mathrm{add}) + C(\mathrm{sqrt}),$$

where C(exp) is the cost of an exponentiation, C(trig) is the cost of computing a trigonometric function, C(div) is the cost of a division, and C(sqrt) is the cost of taking a square root. The condition under which the control cost can be neglected, $L_{\mathrm{control}} \ll L_{\mathrm{synth}}$, reads

$$C(\mathrm{control})/T \ll (3C(\mathrm{mult}) + 4C(\mathrm{add}))\, S_R. \qquad (7.3)$$

In practice a buffer length of about 50 ms, corresponding to about 1000 samples at 22,050 Hz, was used. The interface to the audio hardware on the computer introduces additional latency and requires a certain minimum buffer length for the audio queues; in practice this also determines the buffer length used in the synthesis implementation. In this case the condition given by Equation 7.3 becomes

$$C(\mathrm{control}) \ll 3308\, C(\mathrm{mult}) + 4410\, C(\mathrm{add}),$$

which is easily satisfied.
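Equation 7.1 also gives a quick way to estimate the mode budget for a given CPU allowance. The fragment below performs this arithmetic; the per-operation costs are hypothetical and machine dependent.

    // Number of modes that fit in a given fraction of the CPU, from Eq. 7.1.
    // costMult and costAdd are per-operation costs in seconds.
    int modeBudget(double cpuFraction, double srate,
                   double costMult, double costAdd)
    {
        double loadPerMode = (3.0 * costMult + 4.0 * costAdd) * srate;
        return (int)(cpuFraction / loadPerMode);
    }

For example, with costMult = costAdd = 1.4e-8 seconds, modeBudget(1.0, 22050, 1.4e-8, 1.4e-8) yields about 460 modes, the same order of magnitude as the measured figures reported in Section 7.1.1.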
C(control)/T < (3C(mult) + 4 C ( a d d ) ) 5  ; R  (7.3)  In practice a buffer length of about 50 ms, corresponding to about 1000 samples at 22, 050 Hz was used. The interface to the audio hardware on the computer introduces additional latency and requires a certain minimum buffer length for the audio queues^ In practice this also determines the buffer length used in the synthesis implementation. In this case the condition given by Equation 7.3 becomes C(control) < (3308C(mult) + 4410C(add)), which is easily satisfied. The runtime components are implemented as C + + and/or Java classes. The application writer will create a properly configured SynthesisEngine object, which will start the low level synthesis thread, but not interact with it directly. The SynthesisEngine is given access to a Controller object, which encapsulates the task of translating physical parameters such as resonance frequencies and excitation forces into appropriate control parameters for the low level synthesis algorithm. The rela- • tion between the components and the application is depicted in Figure 7.1. We will now describe the run-time components in more detail.  7.1.1  SynthesisEngine  The SynthesisEngine implements the audio synthesis algorithm defined by Equa- • tion 4.18. The class definition in simplified form is as follows: class SynthesisEngine { 94  private:  <  int samplingRate; int bufferSize; // =T*samplingRate double cpuFraction;  '  SynthesisEngineController *sec; public: // Define controller object int setSynthesisEngineController(SynthesisEngineController *sec); // Set output buffer size int setBufferSize(int bufsz); // Set fraction of CPU load to use int setCPUFraction(double  fraction);  // sampling rate i n Hertz , int setSamplingRate(int  samplingRate);  // Create synthesis thread int run(); // Destroy synthesis thread int stopO ; };  When a SynthesisEngine object is created and its run() method is called, thread is started which implements the following pseudo code: buf[bufsz]; // Audio buffer to be computed uPrev[] = vPrev[] = 0 ; // Zero'd arrays while(not_stopped)  {  Obtain arrays a[] , c _ [ ] , and c+[] from Controller;  95  // See Equation 4.18 for these coefficients Obtain array F[bufsz] from Controller; N = length of c[] array, i e . , number of modes; buf[] =0; time_allowed = fraction * bufsz / srate; start_time = getTimeO; for(i=0;i<N && (getTime()-start_time)<time_allowed;i++) { // Add contribution of mode i to buf [] and // recover the state variables from the previous time s l i c e u_prev = uPrev[i]; v_new = vPrev[i]; // Load parameters for this mode i n register ccplus = c_ [ i ] ; ccminus = c+ [ i ] ; aa = a[i] ; // Add this mode to output buffer for(k=0;k<bufsz;k++) { u_new = ccminus * u_prev - v_new + aa * F[k]; v_new = ccplus * v_new + u_prev; u_prev = u_new; buf [k] += u_new; > // Save state variables for next buffer uPrev[i] = u_new; vPrev[i] = v_new;  96  }  // Computed as many modes as time allows, now send to audio hardware. submitBufferToAudioHardware(buf); }  The inner loop adds the contributions from the modes one by one, until either all modes are computed or the allotted time has run out. If other processes are running at high priority on the machine, less modes will be added in the time allotted than if the synthesis thread has more resources available. In this case there will be a graceful degradation of the audio, provided that there is enough time to computed the "essential" modes of the object.  
1  When the computed buffer is  submitted to the audio hardware, it will be put on a playback queue and the function submitBuf ferToAudioHardwareO is assumed to block if the queue is full, thereby  1  taking care of the timing of the synthesis loop. This function needs to be called at a rate of samplingRate/buf f erSize to keep the audio stream moving at the correct rate. How submitBuf ferToAudioHardwareO is implemented depends on the operating system on which the application is built. The operating system's audio A P I will also impose restrictions on latency, in addition to those associated with the value of the buffer size used in the actual synthesis algorithm. We have implemented the SynthesisEngine on top of the SGI IRIX A L audio A P I and on top of the Microsoft DirectSound A P I . On A L , we were able to achieve a total latency of about 50ms at a sampling rate of 22,050 Hz. Using DirectSound, the lowest latency we could achieve was about 100 ms. It is to be expected that the latter figures will come down when the DirectSound implementation matures. 'We assume that the granularity of the thread scheduling on the processor is fine enough.  97  Max. modes C P U load for 10 modes  C 450 2.2%  Java (Compiled) 250 4%  Java (JIT) 150 6.7%  Table 7.1: Performance of synthesis algorithm at a sampling rate of 22,050 Hz on a 233 Mhz Pentium P C with M M X using a C implementation and a Java implementation. The latency of the IRIX implementation is caused by the minimum length ofithe input queue buffer. If necessary, this buffer can be bypassed and the latency can be made as small as desired. As the achieved latency is quite satisfactory, we have not pursued this possibility. Bypassing this buffer also requires the process to run with root privileges, which severely limits its deployability. In order to test the speed of the algorithm, we timed the inner synthesis' loop on a Pentium 233 M M X P C . The results are summarized in Table 7.1. A t the sampling rate of 22,050 Hz, using a C implementation of the synthesis loop, we found that we can synthesize a maximum of 450 modes. This means we can compute about 5 modes utilizing 1% of the computational resources on average. We also benchmarked the algorithm in Java.  2  With the Symantec J I T compiler  version 3.00.029(i) we were able to synthesize a maximum of 150 modes, about three times slower. Compiling the Java code into an executable allowed us to increase the performance to 250 modes, which is slightly better than twice as slow as the C implementation. We have not implemented the algorithm in assembly language, which could provide a significant performance boost. 2  This estimate assumes that there are no exceptional events such as page faults, context switches,  etc.  98  7.1.2  Controller  The Controller class is an abstract base class and is subclassed by the application writer to encapsulate concrete audio models of objects. The Controller will usually load data from a database of vibration models created with various authoring tools described in more detail below. It translates parameters such as throttle setting or contact sliding speed into the low level parameters that the SynthesisEngine needs. In bare-bones form the class is given by: class SynthesisEngineController { public: // Callback function to get current f i l t e r parameters . 
v i r t u a l RTSFilter *callBack( int *location_index,const double *force,int srate.int buflen) = 0; >;  The function callBackO is called by the SynthesisEngine thread and returns a pointer to a struct, RTSFilter (real Time Synthesis Filter) which has the following form: ttdefine MAX_SPECTRUM 100 // Max. of 100 modes #define MAX_L0C 10  // Max. of 10 locations  typedef struct { int nf; /* Number of components */ int np; /* Number of points on object*/ double c_plus[MAX_SPECTRUM]; double c_minus[MAX_SPECTRUM]; double a[MAX_L0C][MAX_SPECTRUM]; 99  } RTSFilter; This struct contains the parameters necessary to set the coefficients in Equation 4.18, for a set of "locations" on the object. A "location" is just an integer 0 < i < np, which indexes the arrays a[ ][}. The translation to this linear array of "locations" from geometrical concepts used in the application will be done at a higher level in the code layers. The floating point array force [ ] represents the force applied to the object.  The parameter buflen indicates the number of samples to be returned in  the force buffer and is set by the SynthesisEngine thread. Similarly the parameter srate is used to inform the application of the current sampling rate used by the SynthesisEngine. The member function SynthesisEngineController::callBack() is overridden by the user (application programmer) of the SynthesisEngine class and also returns the location index of the vibration model in the pointer variable int *location_index.  Example: MouseTouchableObject The first example (used in the demonstration program described in Section 7.3.7) uses this base class to define a controller allowing the user to scrape virtual objects with the mouse: class MouseTouchableObject : public SynthesisEngineController { private: double m_scrapeSpeed; double m_scrapeForce; double *m_heightProfile;  100  SonicObject *sob; public: int setScrapeSpeed(double); int  setScrapeForce(double);  int setSonicObject(SonicObject  *sob);  int setSurfaceProfile(double *heightProfile,int buflen); };  // Implementation of callBackO RTSFilter *MouseTouchableObject:: callBack(int *location_index,const double *force,int srate.int buflen) { *location_index = 0 ;  // Only one location  // Maintain a circular pointer m_p i n the buffer m_heightProfile. Copy // a section of length buflen*m_scrapeSpeed into tmpbuf [], and advance // m_p  by this amount. Resample tmpbuf []  // scale with  a factor m_scrapeForce and  to contain buflen samples, copy i n force [].  This i s a  // pitch-shifted segment of the height p r o f i l e buffer, return sob->getRTSFilter(srate); // return the f i l t e r parameters }  The interface metaphor used in this class is a surface profile, whose shape is stored in the array mJieightProf i l e . This surface is touched by an object with force m_scrapeForce and speed m_scrapeSpeed, and the resulting excitation force can be obtained by looping through the circular array mJieightProf i l e at speed m_scrapeSpeed, resulting in a pitch shift of the buffer.  101  The SonicObject class is a wrapper around a parameter set for a vibration model of an object. It contains data members for frequencies, dampings, and amplitudes, and can be loaded and saved from file. It stores its data in the sy file format explained in Appendix A . The member function getRTSFilter(int srate) computes the actual coefficient arrays c_, c , and a (see Chapter 4) from the higher +  level model parameters.  
(The result is cached and recomputed only when the model parameters change.) It also provides member functions to change the material properties by rescaling frequencies, changing the material constant, etc.

Example: AudioSignalProcessor

The second example (used in the demonstration programs described in Section 7.3.7 and in Section 7.3.3) uses the base class to define a controller that obtains the input force from the input channel of the audio hardware. In this manner, the SynthesisEngine can be viewed as an audio effects processor.

    class AudioSignalProcessor : public SynthesisEngineController {
    private:
        SonicObject *sob;
    public:
        int setSonicObject(SonicObject *sob);
    };

    // Implementation of callBack()
    RTSFilter *AudioSignalProcessor::
    callBack(int *location_index, const double *force, int srate, int buflen)
    {
        *location_index = 0; // Only one location
        getInputBufferFromAudioHardware(force, buflen); // Get input from audio
        return sob->getRTSFilter(srate); // return the filter parameters
    }

The function getInputBufferFromAudioHardware(double *, int) uses the operating system's audio API to obtain a buffer representing the input audio signal. This can originate, for example, from a microphone.
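Putting the pieces together, the effects-processor configuration amounts to only a few lines. In the sketch below, the load() call is hypothetical shorthand for reading an sy file (Appendix A); the other calls use the interfaces shown above.

    SonicObject body;
    body.load("guitar.sy");                  // hypothetical sy-file loader
    AudioSignalProcessor proc;
    proc.setSonicObject(&body);

    SynthesisEngine engine;
    engine.setSynthesisEngineController(&proc);
    engine.run();   // microphone (or guitar) input now drives the model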
These modes arise from constant background noise and lead to frequencies with zero decay rate. They are  104  immediately audible, and can also be recognized by inspecting the s y file. Usually one or two window sizes give results that are considerably better than the others and those would be selected. As expected, the results were best for objects with a rather simple spectrum. None of the sounds were "photo-realistic", except perhaps the hockey stick. But in an interactive setting the sounds were convincing. A tool to create vibration models of mathematical shapes as described in Section 5.1 was implemented in C . The shape, material, size, and other relevant parameters are given as command line arguments and the module produces a sequence of s y files.  7.3 7.3.1  Demonstration Programs Sonic Explorer 1.0  We have constructed a test bed application in C + + for SGI, called the "Sonic Explorer", which demonstrates the enhancement interactive audio can bring to a three dimensional graphics environment.  Figure 7.3: A room modeled with the Sonic Explorer 105  The first version of the Sonic Explorer presents to the user a three dimensional model of a room with various objects as depicted in Figure 7.3. The user can view the room from any point by moving a virtual camera. The purpose of this program is to evaluate the quality and usefulness of simple impact sounds in a real environment. This demo does not use real-time synthesis, it is intended to test' impact sounds only. Several objects in the room were modeled by vibration models of mathematical shapes. The impulse responses of these objects were computed on a grid of locations using 10 points in each dimension and the resulting audio samples were stored on disk. The user can move a virtual drumstick around and hit objects at various locations and the corresponding audio sample is identified and played back. The material of the objects was modeled by the method described in Section 4 . 5 .  :  Even though the interactivity is very limited in the demo, the effect of changing sound as objects are struck at different locations and the material of the objects are conveyed quite well with these simple models.  7.3.2  Sonic Explorer 2.0  The second version of the Sonic Explorer implements a real-time synthesis engine as outlined in Section 7.1 using the IRIX audio library. Geometrical models of a vase (see Figure 7.5) and of a xylophone (see Figure 7.4) were created using the Open Inventor toolkit. A vibration model of the vase was constructed using the parameter fitting module discussed in Section 7.2.2. Data from three regions on the vase the top, middle, and bottom, was used and integrated manually into an sy file with three locations. The vibration models for the eight xylophone bars were constructed manually from the solutions of the free bar equations and the  106  107  base frequencies were tuned to form a "white key" scale of pitches tuned in just temperament. A special interaction device, called the Sonic Probe was constructed to allow the user to touch these objects by tapping and scraping.  The probe utilizes a  graphics tablet to move the mouse pointer. When the mouse pointer is on a the vase or on a xylophone bar the appropriate action is taken by the controller, as described in Section 7.1.2, and the SynthesisEngine will model the corresponding vibration model. The stimulus force on the virtual object is obtained from a contact microphone embedded in the pen. 
When the user touches the graphics tablet, the local interaction caused by tapping or scraping is picked up by the contact mike and used as the input signal for the audio synthesis, as described in Section 7.1.2. A small cover of rough wood was optionally placed over the graphics tablet. The surface of the wood is rougher than the smooth plastic of the tablet, and in this way we obtain a more interesting signal. In this manner we effectively map the surface structure of the wood to the object being modeled.

An additional feature is that the vase can be resized in the vertical direction, stretching or shrinking it, and the frequencies of the modes are scaled according to the length of the vase. Even though this is not the actual way the modes would change if the geometry of a vase were altered in this manner, it illustrates the concept quite well.

7.3.3 Sonified Objects

This application uses the idea of picking up real interaction forces with contact microphones and mapping them onto virtual sonic objects in a non-graphical context. The SynthesisEngine was extended to process two independent input streams, apply the virtual forces to two independent objects, and render the audio to different speakers. We attached contact microphones to various objects and obtained interesting effects by feeding these data streams to various vibration models. Two plastic toy swords were equipped with contact mikes, and two different metal sword audio models were used to render the sounds. See Figure 7.6. Amusing effects could be obtained by changing the model to church bells or other inappropriate sounds.

Figure 7.6: Plastic swords with embedded contact mikes used to create metal sword sounds.

We have also used this setup (with one data stream) to create a digital effect box for an electric guitar. Very interesting effects were obtained by hooking the guitar up to vibration models of plates, objects with randomized modal structures, bells, etc.

7.3.4 Avalanche Sounds

An implementation of the synthesis algorithm was made in Java. Unfortunately, Java 1.1 does not provide any method to render computed sound in real time; it does not even provide a method to render computed sounds by any means whatsoever. Fortunately an "unofficial" audio API (the "sun.audio" library) does provide a minimal interface allowing the playback of computed arrays of 8-bit audio data at a sampling rate of 8,000 Hz. We have created a small package of classes to allow the creation of sonic objects and forces. The force can then be "applied" to the sonic object, and the resulting audio can be saved in a buffer and rendered later. An advantage of this is that the sounds are created locally and need not be transmitted over the network; viewed in this way, the synthesis is the ultimate compression method. The class hierarchy and the complete documentation for the Java synthesis classes are available on the accompanying CD and on [4].

The classes were used to create parameterized avalanche sounds in the applet depicted in Figure 7.7. An avalanche sound is generated by applying white noise to a vibration model containing an adjustable number of modes with frequencies randomly generated within a selectable frequency band. These are called the "rumbling frequencies". The stimulus force consists of modulated white noise whose time envelope can be controlled by four set-points. The avalanche sound is computed when the button "Compute" is pressed and played by pressing "Trigger". By repeatedly pressing the trigger button, overlayed sections of the pre-computed sounds can generate a certain amount of interactivity. It is possible to re-compute the sounds while they are playing, and every time a new set of random modes will be created, providing never-repeating sounds.

Figure 7.7: Applet to generate avalanche sounds.
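The four set-points correspond to a simple piecewise-linear gain applied to the white noise stimulus. A minimal sketch with hypothetical names follows; times are in seconds, and the applet's default set-points are 1.2, 3.0, 6.0, and 10.0.

    // Piecewise-linear stimulus envelope: rise to full level at tFull, hold
    // until tFade, fade to zero at tEnd. (The fourth set-point, "time to end
    // all sound", only bounds how long the excited modes ring out.)
    double avalancheEnvelope(double t, double tFull, double tFade, double tEnd)
    {
        if (t < 0.0 || t >= tEnd) return 0.0;        // stimulus is over
        if (t < tFull) return t / tFull;             // rise
        if (t < tFade) return 1.0;                   // full level
        return 1.0 - (t - tFade) / (tEnd - tFade);   // fade out
    }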
By repeatedly pressing the trigger button overlayed sections of the pre-computed sounds can generate a certain amount of interactivity. It is possible to re-compute the sounds while they are playing and  110  Trigger  i Trigger avalanche (press repeatedly to layer) I Compute soundeffect (slightly indeterministic)  Compute  Minimum rumbling frequency  100.0  Maximum rumbling frequency  150.0  Rumbling damping factor  0.87  Number of modes (sound quality)  25  Time to full level  1.2  Time to start fade  3.0  Time to end avalanche  6.0  Time to end all sound  10.0  Figure 7.7: Applet to generate avalanche sounds. every time a new set of random modes will be created, providing never repeating sounds. By setting the maximum frequency to higher values other interesting sounds reminiscent of wind or even eerie musical effects can be created.  7.3.5  Location Dependent Sounds of Mathematical Shapes  If the ideas presented here are to be used in a virtual reality environment, better tools will need to be provided to the user to specify the sonic attributes of objects. A graphical design tool that allows a user to define the types of sounds made by an object interactively would be very useful.  For example, a user will describe  an object as a circular metallic plate with a base frequency of 415 Hz. The user should then be allowed to poke and scratch the virtual object and adjust the model parameters until the object behaves as desired. A prototype has been implemented as a Java Applet, which allows a small class of objects to be created. See Figure 7.8. The object selected is depicted graphically and the user can hit it and scrape it  111  Object  Force C C  Hit 2  r  Bart  C  Ptusk  (~  Bat?  P  CtreularPhK*  (~  HRI  Scrapft(whftft note*)  Grid EKB  <• S-f*|:ft (0*J?SI*t> rwis*> f~" P$€l»dOSW«fi  Duration Scrape time PtuckBmg Hammtf hardness  Figure 7.8: Applet to create location dependent sounds for a variety of objects. and adjust the model parameters. Besides as a prototype of a sonic design tool, the program is also useful to test contact models. For example, the finite duration contact model described in Section 6.1 is currently implemented and this allows an immediate evaluation of the model. By repeatedly hitting an object, possibly at different locations, some interactive control can be exercised. Every mouse click generates an audio thread and many can be overlayed, through which dynamic effects can be created.  7.3.6  V i r t u a l B e l l Tower  A third Java applet was created to show an application which is both entertaining, small, and musically interesting. The applet maintains models of eight bells (a bell tower) and allows the user to select among different bell models, retune the bells using various presets or custom, play them interactively with the mouse, change the hardness of the mallet used to strike the bells, and sequence "change ringing" patterns. English bell ringing [40, 92, 19] is a complex mathematical art-form based  112  on ringing tuned bells in a steady rhythm in such a way that the bells ring all, or a subset of, the possible permutations of the bells. Typically six, eight, ten or twelve bells are used, but as few as three or as many as sixteen tower bells can also be rung to "changes". A mathematical notation system to describe a "method" exists. This "place notation" system can be viewed as an algorithm for generating permutations. The applet implements a place notation parser to allow the interested virtual change ringer to try out methods. 
A collection of ancient traditional methods can also be selected from the pull-down menu. Apart from the synthesized bell sounds, it is also possible to use wave table playback. For this purpose, eight recordings of actual church bells have to be loaded. Though the quality of the sound is somewhat better, all interactivity is lost, as the bells can not be tuned, or struck with different mallets etc.  Under Netscape 4,  the digital samples are not loaded from the server until the samples are actually used. The very noticeable delay while the audio samples are loaded illustrates the advantage of creating the sounds locally rather than downloading them over the network.  7.3.7  Real-time Model Tester  This demonstration program runs on Windows 95/98 using the DirectSound audio A P I . It is intended as a test-bed for object models, interaction models, and A P I design, as well as a demonstration program of the versatility of the audio synthesis methodology. The main control window is depicted in Figure 7.11. The check boxes allow the selection of various operation modes. In the basic mode, when all boxes are unchecked, except possibly the "3D"  113  .*Jtnfc.«.. .AjarMK.lt. . 4j*n—.4 *jf ^ ^ ^ P u J P ^ ^ ^ ^ W 4 P ^ ^ ^ ^^^M^^^ ^ ^ * W M p " ^ ^ ^ * W J p ^ ^  liMirHiiRi ^^^K^**^ ^^^^4p*""  12345878 12346678 21438587 12483857 21648375 26143857 62413375 62148735 26417853 624  d  Start  Faster Slower  C  Belli  (•  Bell2B  C  Bell 2  f  Sample  Ed* pitch Irregular  c  B A  E D C Method  Modes (1-30)  Sharp  20  187.5  K  G F  strike  200.0  scRaMbLe 150.0  I* I* K K  133.33  Apply  125.0 Ready Dull  Regular  X.38.X. 14.X. 1258.X.36 X . 14.X.58.X. 18 .X.78 . X . 16 .X.58 . X . 14.X.: Cambridge Surprise Major  Tuning  Major  iCamtiridge Surprise Major Grandsire Doubles Plain Bob Minor Little Bob Minor Reverse Canterbury Pleasure Place Doubles Norwich Surprise Minor Birmingham Carter Triples  Figure 7.9:  Help  Applet for ringing a bell tower.  114  3  .jjuiraujL.  «kK».  12345678 12345678 21436587 12463857 21848375 28143857 82418375 82148735 28417853 824  - —^  F aster  Start  Slower  I Irregular  - -  A  - -  G  -  Method  J  Belli  (•  Bell2B  C  Bell 2  f  Sample  :1 —  strike  C B  Is/  Sharp  20  187.5  F  133.33  E  125.0  D  112.5  C  100.0  scRaMbLe  Appfc  Ready Dull  X.38.X. 14.X. 1258.X.38 X . 14.X.58 X . 16 X.78.X. 16 .X.58 X . 14.X.; Cambridge Surprise Major  Tuning  Major  Major  For You the Bells Toll (If you press STA Dorian  Phrygian Lyd i a n Mixo-Lydian  Figure 7.10: Applet for ringing a bell tower.  115  Help  Modes (1-30)  166.66  Regular  .&J*M».4.  JL*«k^.  C  Edit pitch  ~ :  ±J  jLa<>tJL  w  1  jL*  S0  IP S y n t h e s i s E n g i n e Object-  Scrape  randobjectl .txt randstringl .txt ;rectplate1 .txt beil.sy desklamp.sy guitar.sy vase.sy computertower. < stick, sy  -Hit  Vol  Restart Stop F  F  r  i f  Force S p e e d Frequency  Contact Model  Min  Force Elast i Damping  10.0 Modes  r  Object  Car  Joystick _ Contact Model  Aspect  30  100  Realtime Modifiers Freq. Damp.  Max 10K . j -  white.wav 1 overt. 5.wav 1 overf.wav  r  Mike 3D  25  -  r  mmrnrn wood.wav metal.wav J plastic.wav sandpaper.wav grid.wav  Fr  50 50  8060  1.07  T'  1  7.05  Figure 7.11: Real-time model tester. Main control panel.  116  Fwrpl  Slow Scrape..  .Fast Scrape  Figure 7.12: Real-time model tester. Interaction windows to scrape and hit objects.  Car Engine Parameter Intake  Combustion  Exhaust  - i  Beatupness  Figure 7.13: Real-time model tester. Engine model editor.  
7.3.7 Real-time Model Tester

This demonstration program runs on Windows 95/98 using the DirectSound audio API. It is intended as a test-bed for object models, interaction models, and API design, as well as a demonstration of the versatility of the audio synthesis methodology. The main control window is depicted in Figure 7.11.

Figure 7.11: Real-time model tester. Main control panel. (The object list includes sy and txt models such as bell.sy, guitar.sy, vase.sy, and rectplate1.txt; the contact model list includes wood.wav, metal.wav, plastic.wav, sandpaper.wav, and noise samples.)

The check boxes allow the selection of various operation modes. In the basic mode, when all boxes are unchecked, except possibly the "3D" box, the program allows the user to interact with several virtual objects. The object is selected from the "object" list, or can be loaded from disk as an "sy" file (see Appendix A) or as a "txt" file. The sy file is expected to contain a vibration model of an object with one location. If a "txt" file is selected, the object is specified by its shape, base frequency, damping constant, maximum frequency, number of modes, and its aspect ratio (if applicable). We currently have implemented the ideal string, the rectangular plate, and a "random" object which creates randomized resonances and dampings. An example of a "txt" file specifying a rectangular plate is:

    rectplate
    base_frequency: 50.000000
    damping: 5.559740
    maximum_frequency: 8060.745000
    number_of_modes: 13
    npoints_x((must_be_1_for_now)or_npoints_in_1_dimension): 1
    npoints_y((must_be_1_for_now)unused_in_1_dimension): 1
    aspect_ratio_rectangle(unused_for_others): 7.052800

After a model is loaded from a "txt" or "sy" file, the user can alter its properties with the sliders labeled "Frequency", "Damping", "Modes", and "Aspect". This requires a restart of the synthesis by pressing the button in the upper right. The three sliders labeled "Realtime Modifiers" can be adjusted during synthesis: they allow the frequencies and the dampings to be scaled uniformly, and the slider labeled "Fwrpl" applies a more complicated transformation to the vibration model. This transformation is intended to demonstrate the capability of real-time "sound morphing" with this synthesis method.

After the vibration model is selected, the user can interact with it in various manners. If the "Mike" option is selected, the audio input channel is interpreted as the applied force, as in the demo described in Section 7.3.3. A problem with DirectSound causes very high latency of up to half a second, making this mode very awkward. Nevertheless, we have used it to test vibration models stimulated by real-time interactions.

If the "Mike" option is not checked, the user can stimulate the object in various other manners. If the "Car" and "Joystick" modes are not selected, the user interacts with the object by selecting a contact model from the second list box or from disk. The contact model is stored as a wav file and represents the surface texture of the virtual object, according to the model explained in Chapter 6. Mathematical models using scaling noise, as well as recorded dry scraping sounds, have been used. Once the object and the contact model are selected, the user can scrape the object by moving the mouse over the upper interaction window depicted in Figure 7.12.

Figure 7.12: Real-time model tester. Interaction windows to scrape and hit objects.

The position of the mouse on the interaction rectangle determines the scraping normal force (top to bottom) and the scraping speed (left to right). The range of the force and speed can be adjusted with the "Scrape" sliders. If the "3D" option is selected, the horizontal position of the mouse is mapped to left-right spatialization of the resulting sound. The second interaction window depicted in Figure 7.12 allows the user to hit the object by clicking the mouse at various points on the window. The vertical position is mapped to force, while the horizontal position is mapped to the hardness of the virtual hammer, using the contact model for hitting described in Chapter 6.

Sounds of a four stroke combustion engine are obtained by selecting the "Car" option.
The interaction with the engine models is similar to the interaction with the scraping models, except that one now controls the engine speed through the upper window. For this purpose, several wav files representing models of combustion engines, as discussed in Chapter 6, can be selected. If the mouse pointer leaves the window, the engine sound continues at the current rpm. If the "Joystick" option is selected, the user can control the rpm and the spatial position of the sound with the joystick. In this case the idling speed and the maximum rpm can also be adjusted in real time with the joystick buttons. In "Car" mode, a third interaction window is present, depicted in Figure 7.13, which allows the user to adjust the volumes of the four stages of the four stroke engine model independently, to create different sounds interactively.

Figure 7.13: Real-time model tester. Engine model editor.

The "Realtime Modifiers" allow the user to change the resonance properties of the engine and exhaust system while running the engine interactively with the joystick. Very convincing sounds of motorcycles, race cars, lawn mowers, and chain saws can be created using relatively simple vibration and interaction models.

Chapter 8

Conclusions and Extensions

8.1 Summary of Results

In this thesis we have constructed a methodology to synthesize audio in real time in simulation or game environments. We have shown how the physics of vibrating objects (using linear approximations) leads to a parameterized vibration model suitable for real-time synthesis of interactive audio. The synthesis algorithm maps the eigenvalues of the associated PDE to resonances and maps the mode shapes of physical objects to coupling coefficients. We believe the connection between the physics and the synthesis method is important, as it allows a refinement of the method if so desired. In contrast to synthesis methods such as FM, our method provides a clear path to extension and improvement: include more physics.

We have investigated mathematical models of simple geometries and computed the vibration models directly. A one-parameter model for the material was found to convey the perception of wood versus metal versus glass quite well.

A parameter fitting algorithm was implemented, which reconstructs a vibration model from a recording of a struck object. A large number of vibration models of objects have been reconstructed, and the method performs satisfactorily.

We have investigated interaction models for collisions, continuous contact, engines, and more abstract forces such as avalanches. It was concluded that a simple one-parameter model of impact suffices to convey the perception of the "hardness" of a virtual mallet. For continuous contacts, we conclude that looped wave-table playback of a small sample (representing knowledge about the surface contact) is the best solution. It conveys the perception of "roughness" and of contact speed quite well, and it has a solid physical justification as a first order approximation of the contact force.

The synthesis algorithm was implemented on several platforms, and its performance in terms of speed, latency, and quality was analyzed in some detail, both experimentally and theoretically. It was found that the synthesis is efficient enough to allow real-time synthesis of reasonably complex objects using just a few percent of CPU cycles on modern personal computers.
We have designed and implemented an API for synthesis around the idea of a SynthesisEngine (which implements the low level synthesis) and a SynthesisController (which implements models of interaction). The API was implemented in C++ and Java and was used to create several demonstration programs, some of which can be found on the accompanying CD or WWW site [4].

From the performance and quality of the sound obtained in the demonstration programs, we conclude that this type of audio synthesis can be used to significantly enhance the feeling of immersion in interactive environments. Current wave-table technologies are not able to provide the level of interactivity required for continuous sounds. The synthesis method presented here is the most efficient and realistic method to obtain contact sounds of interacting objects. The method also produces very good engine and rumbling sounds.

8.2 Extensions and Future Work

Several directions for future work present themselves naturally. Some are obvious and straightforward, some are more challenging.

8.2.1 Faster Synthesis

The performance of the synthesis algorithm can be improved in several ways. A significant performance boost can be expected from implementing the inner loop of the synthesis algorithm in assembly language instead of C.

Another possible improvement is to implement the algorithm at multiple resolutions. For a resonance mode of 100 Hz, a sampling rate of 200 Hz should be high enough to accurately compute the contribution of this mode to the total vibration. It is perhaps more efficient to compute each resonance at the minimum sampling rate for that specific resonance and "up-sample" the result when adding all the modes together. It remains to be seen whether the overhead of down-sampling the interaction force once for every mode and up-sampling the resulting modal contribution is computationally cheaper than a straightforward computation at a fixed sampling rate.

8.2.2 Integrate Waveguide Models

For linear structures such as strings and air-tubes (exhaust pipes!), waveguide models [67] provide a more efficient method to model resonance properties. The reason is that the resonances in those structures are linearly spaced and much denser than in solid structures. These systems can be modeled with the modal techniques pursued in this thesis, but this will be computationally expensive, because each resonance takes a fixed percentage of the CPU. Waveguides provide a large set of harmonic resonances for the price of a single mode. Though the amplitudes and dampings cannot be individually controlled as in modal synthesis, the performance gain is considerable. Extending the entire methodology presented here to incorporate waveguide models of vibrating structures is completely straightforward.
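The efficiency argument is easy to see in the simplest waveguide-style model, the Karplus-Strong plucked string: a delay line of L samples with a damped two-point average in its feedback loop produces an entire harmonic series with fundamental near srate/L, at a per-sample cost comparable to that of a single mode. The sketch below is a standard textbook fragment, not an implementation from this thesis.

    #include <vector>
    #include <cstdlib>

    // Minimal Karplus-Strong string: fundamental is approximately srate / L.
    class WaveguideString {
        std::vector<double> delay;
        size_t pos;
    public:
        WaveguideString(size_t L) : delay(L, 0.0), pos(0) {}
        void pluck() {                        // excite with a noise burst
            for (size_t i = 0; i < delay.size(); i++)
                delay[i] = 2.0 * rand() / RAND_MAX - 1.0;
        }
        double tick() {                       // one output sample
            size_t next = (pos + 1) % delay.size();
            double out = delay[pos];
            // damped two-point average in the feedback path
            delay[pos] = 0.996 * 0.5 * (delay[pos] + delay[next]);
            pos = next;
            return out;
        }
    };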
8.2.3 Improve Parameter Fitting

How can the linear models be improved? The parameter fitting module produces a set of vibration models which depend on several parameters, such as the window size of the windowed Fourier transform, but there is no systematic or automatic way to choose the "best" one. As discussed in Section 4.6, this requires a measure on perceptual audio space that would allow us to determine whether sound A is a better approximation to sound C than sound B. This appears to be a very difficult problem, which probably requires more understanding of audio perception. A less ambitious project would be to formally evaluate the quality of the models by psycho-acoustic experiments. It would be interesting to see if a more objective classification of physical objects with respect to their suitability for modeling with modal vibration models can be obtained. On the technical level, as discussed in Section 5.2, several improvements and refinements of the parameter fitting algorithm could be attempted.

8.2.4 Improve Interaction Models

The interaction models presented in Chapter 6 are all "first-cut" models and could potentially be improved. The one-parameter impulse model could be refined to include properties relating to the detailed shape of the force profile.

The continuous contact model essentially assumes that the surface profile of the object is followed exactly during contact. In reality there are deformations, hysteresis effects, and contact is frequently broken when one object slides across another. Unfortunately the actual physics of contact is prohibitively complex, and it is not clear how to improve the model in an obvious and simple manner. We suggest investigating a model in which one object frequently "goes ballistic", i.e., at higher contact speeds the objects break contact frequently. Perhaps a simple inelastic collision model would be computationally tractable enough to refine the contact models driving scraping and sliding sounds.

The engine excitation models were constructed in an ad-hoc manner, using simple-minded ideas about explosions and hissing sounds in a four stroke cylinder. Many different types of engines, such as diesel engines, two stroke engines, and multi-cylinder configurations with accurately modeled firing sequences, could be modeled and investigated.

8.2.5 Non-linear Models

The resonance models are strictly linear. Though non-linear effects can be obtained by coupling vibration models with feedback loops, objects in the real world deviate from linear behaviour to various degrees. If you hit an object harder and harder, the resulting sounds are not related by just a volume factor. This is partly due to the interaction, which changes non-linearly, and partly due to the actual vibration equation of the object, which is linear only in a first order approximation.

A general theory of non-linear vibration does not exist. In general, non-linear phenomena are extremely complicated, and no universal unified method of analysis exists [90]. Some work has been done on non-linear extensions of harmonic oscillators [59], but a physical motivation seems to be lacking.

An interesting observation is that many objects, such as glasses, behave approximately linearly when struck, but then suddenly change their behaviour completely: they break. The interesting thing to note is that after the breaking event we again have an approximately linear system of glass fragments that falls within the scope of the present method. After the breaking event we have to model many objects and their collisions in order to correctly synthesize the sound of the glass fragments falling. Presumably, some stochastic method of generating the impulsive forces and the distribution of the sizes of the fragments could drive the synthesis. If we consider the breaking characteristic of the glass as part of the glass model, we can view this as a piecewise linear simulation: before the glass breaks we use one linear model, and after the glass breaks we have a (different) linear model. See also [30, 17, 86] for other work on this topic.

This type of piecewise linear behaviour occurs very frequently in reality.
8.2.3 Improve Parameter Fitting

How can the linear models be improved? The parameter fitting module produces a set of vibration models which depend on several parameters, such as the window size for the windowed Fourier transform, but there is no systematic or automatic way to choose the "best" one. As discussed in Section 4.6, this requires a measure on perceptual audio space that would allow us to determine if sound A is a better approximation to sound C than sound B. This appears to be a very difficult problem, which probably requires more understanding of audio perception. A less ambitious project would be to formally evaluate the quality of the models by psycho-acoustic experiments.

It would be interesting to see if a more objective classification of physical objects with respect to their suitability to be modeled with modal vibration models can be obtained. On the technical level, as discussed in Section 5.2, several improvements and refinements of the parameter fitting algorithm could be attempted.

8.2.4 Improve Interaction Models

The interaction models presented in Chapter 6 are all "first-cut" models and could potentially be improved. The one-parameter impulse model could be refined to include properties relating to the detailed shape of the force profile. The continuous contact model essentially assumes that the surface profile of the object is followed exactly during contact. In reality there are deformations and hysteresis effects, and contact is frequently broken when one object slides across another. Unfortunately the actual physics of contact is prohibitively complex, and it is not clear how to improve the model in an obvious and simple manner. We suggest investigating a model where one object frequently "goes ballistic", i.e., at higher contact speeds the objects break contact frequently. Perhaps a simple inelastic collision model would be computationally tractable enough to refine the contact models driving scraping and sliding sounds.

The engine excitation models were constructed in an ad-hoc manner, using simple-minded ideas about explosions and hissing sounds in a four stroke cylinder. Many different types of engines, such as diesel engines, two stroke engines, and multi-cylinder configurations with accurately modeled firing sequences, could be modeled and investigated.

8.2.5 Non-linear Models

The resonance models are strictly linear. Though non-linear effects can be obtained by coupling vibration models with feedback loops, objects in the real world deviate from linear behaviour to various degrees. If you hit an object harder and harder, the resulting sounds are not related by just a volume factor. This is partly due to the interaction, which changes non-linearly, and partly due to the actual vibration equation of the object, which is linear only in first order approximation.

A general theory of non-linear vibration does not exist. In general, non-linear phenomena are extremely complicated and no universal unified method for analysis exists [90]. Some work has been done on non-linear extensions to harmonic oscillators [59], but a physical motivation seems to be lacking.

An interesting observation is that many objects, such as glasses, behave approximately linearly when struck, but then suddenly change their behaviour completely: they break. The interesting thing to note is that after the breaking event we again have an approximately linear system of glass fragments that falls within the scope of the present method. After the breaking event we will have to model many objects and their collisions in order to correctly synthesize the sound of the glass fragments falling. Presumably, some stochastic method of generating the impulsive forces and the distribution of the sizes of the fragments could drive the synthesis. If we consider the breaking characteristic of the glass as part of the glass model, we can view this as a piecewise linear simulation. Before the glass breaks we use a linear model, and after the glass breaks we have a (different) linear model. See also [30, 17, 86] for other work on this topic.

This type of piecewise linear behaviour occurs very frequently in reality. For example, an old rattly car consists of many parts which collide against each other as a result of the oscillations of the different parts. This line of reasoning would lead to a generalization of the modal audio synthesis model where different components are allowed to collide and impulsive forces are generated internally by collisions.

A simple example of a non-linear model of this kind is a constrained pendulum, depicted in Figure 8.1. The pendulum behaves approximately linearly as long as the amplitude of the oscillation is small, but as soon as the pendulum starts colliding with the walls, it is only piecewise linear. Such a constrained pendulum could perhaps be used as a generalization of the spring-damper systems on which the audio synthesis described in this thesis is built.

[Figure 8.1: Pendulum is non-linear when colliding with the walls.]
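The piecewise linear behaviour of this example is easy to make concrete. The sketch below (all constants invented for illustration) integrates a linear spring-damper oscillator and reflects the state off the walls with a restitution factor; each wall hit is the point at which an impulsive force would be fed to the vibration models.

    /* Sketch of the constrained pendulum of Figure 8.1: linear
       spring-damper dynamics, made piecewise linear by wall collisions. */
    #include <stdio.h>

    int main(void) {
        double x = 0.0, v = 2.0;            /* position and velocity */
        const double w2   = 400.0;          /* omega^2 of the linear regime */
        const double damp = 0.5;            /* viscous damping */
        const double xmax = 0.05;           /* wall position */
        const double e    = 0.8;            /* restitution at the walls */
        const double dt   = 1.0 / 22050.0;  /* one audio sample per step */

        for (int i = 0; i < 22050; i++) {   /* one second of motion */
            v += (-w2 * x - damp * v) * dt; /* linear regime */
            x += v * dt;
            if (x >  xmax) { x =  xmax; v = -e * v; } /* collision: here an   */
            if (x < -xmax) { x = -xmax; v = -e * v; } /* impulse would be fired */
            printf("%g\n", x);
        }
        return 0;
    }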
Bibliography

[1] http://ccrma-www.stanford.edu.

[2] http://mediatheque.ircam.fr/articles/index-e.html.

[3] http://www.cnmat.berkeley.edu.

[4] http://www.cs.ubc.ca/nest/lci/thesis/kvdoel.

[5] http://www.engr.wisc.edu/epd/steuber/motorcycle.html.

[6] http://www.ircam.fr.

[7] http://www.neuroinformatik.ruhr-uni-bochum.de/ini/people/heja/syprog/nodel.html.

[8] MATLAB Reference Guide. The MathWorks, Inc., 1992.

[9] D. Baraff. Dynamic simulation of non-penetrating flexible bodies. Computer Graphics, 26(2):303-308, 1992.

[10] D. Baraff. Dynamic Simulation of Non-Penetrating Rigid Bodies. PhD thesis, Cornell University, 1992.

[11] Durand R. Begault. 3-D Sound for Virtual Reality and Multimedia. Academic Press, London, 1994.

[12] R. T. Beyer. Nonlinear Acoustics. Department of the Navy, Sea Systems Command, Washington, D.C., 1974.

[13] W. G. Bickley and A. Talbot. An Introduction to the Theory of Vibrating Systems. Oxford University Press, London, 1961.

[14] S. Braun and Y. M. Ram. Determination of structural modes via the Prony model: System order and noise induced poles. J. Acoust. Soc. Am., 81(5):1447-1459, 1987.

[15] Albert S. Bregman. Auditory Scene Analysis. MIT Press, London, 1990.

[16] G. F. Carrier. On the non-linear vibration problem of the elastic string. Q. Appl. Math., 3:157-165, 1945.

[17] Michael Anthony Casey. Auditory Group Theory with Applications to Statistical Basis Methods for Structured Audio. PhD thesis, Massachusetts Institute of Technology, 1998.

[18] Antoine Chaigne and Vincent Doutaut. Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars. J. Acoust. Soc. Am., 101(1):539-557, 1997.

[19] A. W. T. Cleaver. The theory of change ringing: an introduction. J. Hannon and Co., Oxford, 1976.

[20] Perry R. Cook. Physically informed sonic modeling (PhISM): Percussive synthesis. In Proceedings of the International Computer Music Conference, pages 228-231, Hong Kong, 1996.

[21] Perry Raymond Cook. Identification of Control Parameters in an Articulatory Vocal Tract Model, with Applications to the Synthesis of Singing. PhD thesis, Stanford University, 1991.

[22] J. Cremer and A. J. Stewart. The architecture of Newton, a general-purpose dynamics simulator. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1806-1811, 1989.

[23] Baron Gaspard Riche de Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'alkool, à différentes températures. Journal de l'École Polytechnique, 1(22):24-76, 1795.

[24] P. Depalle and X. Rodet. Synthèse additive par FFT inverse. Technical report, IRCAM, Paris, 1990.

[25] James F. Doyle. An FFT-Based Spectral Analysis Methodology. Springer-Verlag, New York, 1989.

[26] N. I. Durlach and A. S. Mavor, editors. Virtual Reality: Scientific and technological challenges. National Academy Press, Washington, D.C., 1995.

[27] Frank Fahy. Sound and Structural Vibration: Radiation, Transmission and Response. Academic Press, London, 1985.

[28] L. Fox, P. Henrici, and C. Moler. Approximations and bounds for eigenvalues of elliptic operators. SIAM J. Numer. Anal., 4:89-102, 1967.

[29] Daniel J. Fried. Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. J. Acoust. Soc. Am., 87(1):311-321, 1990.

[30] W. W. Gaver. What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5(1):1-29, 1993.

[31] James K. Hahn, Joe Geigel, Jong Won Lee, Larry Gritz, Tapio Takala, and Suneil Mishra. An integrated approach to motion and sound. Journal of Visualization and Computer Animation, 6(2):109-123, 1992.

[32] Donald E. Hall. Piano string excitation in the case of small hammer mass. J. Acoust. Soc. Am., 79(1):141-147, 1986.

[33] Donald E. Hall. Piano string excitation II: General solution for a hard narrow hammer. J. Acoust. Soc. Am., 81(2):535-545, 1987.

[34] Donald E. Hall. Piano string excitation III: General solution for a soft narrow hammer. J. Acoust. Soc. Am., 81(2):547-555, 1987.

[35] H. L. F. Helmholtz. On the Sensations of Tone. Dover, New York, 1954.

[36] David Jaffe. Ten criteria for evaluating synthesis and processing techniques. Computer Music Journal, 19(1):76-87, 1995.

[37] David Javelosa. Sound and Music for Multimedia. Henry Holt & Company, Inc., New York, 1997.

[38] Claes Johnson. Numerical solutions of partial differential equations by the finite element method. Cambridge University Press, Cambridge, 1987.

[39] K. L. Johnson. Contact Mechanics. Cambridge University Press, Cambridge, 1985.

[40] Ron Johnston and Graham Allsopp. An Atlas of Bells. Blackwell Reference, Cambridge, Mass., 1990.

[41] Matti Karjalainen and Julius Smith. Body modeling techniques for string instrument synthesis. In Proceedings of the International Computer Music Conference, pages 232-239, Hong Kong, 1996.

[42] J. B. Keller. Impact with friction. Journal of Applied Mechanics, 53(1):1-4, 1986.

[43] Eric Krotkov and Roberta Klatzky. Robotic perception of material: Experiments with shape-invariant acoustic measures of material type. In Preprints of the Fourth International Symposium on Experimental Robotics, ISER '95, Stanford, California, 1995.

[44] Debasis Kundu. Estimating the parameters of undamped exponential signals. Technometrics, 35(2):215-218, 1993.

[45] Horace Lamb. The Dynamical Theory of Sound. Edward Arnold, London, 1910.

[46] J. W. Strutt, Lord Rayleigh. The Theory of Sound. Dover, New York, 1896.

[47] Per Lötstedt. Numerical simulation of time-dependent contact and friction problems in rigid body mechanics. SIAM J. Sci. Stat. Comput., 5(2):370-393, 1984.

[48] J. D. Markel and A. H. Gray. Linear Prediction of Speech. Springer-Verlag, New York, 1976.

[49] Robert J. McAulay and Thomas F. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-34(4):744-754, 1986.
[50] John C. Middlebrooks and David M. Green. Sound localization by human listeners. Annu. Rev. Psychol., 42:135-159, 1991.

[51] Brian C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, London, 1986.

[52] J. D. Morrison and J.-M. Adrien. MOSAIC: A framework for modal synthesis. Computer Music Journal, 17(1), 1993.

[53] P. M. Morse and K. U. Ingard. Theoretical Acoustics. Princeton University Press, Princeton, NJ, 1986.

[54] Philip Morse. Vibration and Sound. American Institute of Physics for the Acoustical Society of America, fourth edition, 1976.

[55] D. E. Newland. Mechanical Vibration Analysis and Computation. Longman Scientific and Technical, New York, 1989.

[56] M. R. Osborne and G. K. Smyth. A modified Prony algorithm for fitting functions defined by difference equations. SIAM J. Sci. Stat. Comput., 12(2):362-382, 1991.

[57] T. W. Parks and C. S. Burrus. Digital Filter Design. John Wiley and Sons, Inc., New York, 1987.

[58] Maurice Petyt. Introduction to Finite Element Vibration Analysis. Cambridge University Press, Cambridge, 1990.

[59] John R. Pierce and Scott A. van Duyne. A passive nonlinear digital filter design which facilitates physics-based sound synthesis of highly nonlinear musical instruments. J. Acoust. Soc. Am., 101(2):1120-1122, 1997.

[60] L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Trans. on Acoustics, Speech and Signal Processing, 24(5):399-418, 1976.

[61] Richard A. Roberts and Clifford T. Mullis. Digital Signal Processing. Addison-Wesley, Reading, Massachusetts, 1987.

[62] M. J. Ross, H. L. Schaffer, A. Cohen, R. Freudberg, and H. J. Manley. Average magnitude difference function pitch extractor. IEEE Trans. on Acoustics, Speech and Signal Processing, 22(5):353-362, 1974.

[63] Thomas D. Rossing, D. Scott Hampton, and Uwe J. Hansen. Music from oil drums: the acoustics of the steel pan. Physics Today, pages 24-29, March 1996.

[64] William R. Savage, Edward L. Kottick, Thomas J. Hendrickson, and Kenneth D. Marshall. Air and structural modes of the harpsichord. J. Acoust. Soc. Am., 91(4):2180-2189, 1992.

[65] A. A. Shabana. Theory of Vibration, Volume I: An Introduction. Springer-Verlag, London, 1991.

[66] A. A. Shabana. Theory of Vibration, Volume II: Discrete and Continuous Systems. Springer-Verlag, London, 1991.

[67] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):75-87, 1992.

[68] M. M. Sondhi. New methods of pitch extraction. IEEE Trans. on Audio and Electroacoustics, 16(2):262-266, 1968.

[69] William M. Steedly, Chinghui J. Ying, and Randolph L. Moses. Statistical analysis of TLS-based Prony techniques. Automatica, Special Issue on Statistical Signal Processing and Control, 1994.

[70] William M. Steedly, Chinghui J. Ying, and Randolph L. Moses. A modified TLS-Prony method using data decimation. IEEE Transactions on Signal Processing, 42(9):2292-2303, 1995.

[71] Ken Steiglitz. Classic maximum entropy. In Maximum Entropy and Bayesian Methods. Kluwer Academic, Norwell, MA, 1989.

[72] Ken Steiglitz. A Digital Signal Processing Primer with Applications to Digital Audio and Computer Music. Addison-Wesley, New York, 1996.

[73] R. W. B. Stephens. Acoustics and Vibrational Physics. Edward Arnold (Publishers) Ltd., London, 1966.
[74] Adam Stettner and Donald P. Greenberg. Computer graphics visualization for acoustic simulation. Proc. SIGGRAPH '89, Computer Graphics, 23(3):195-206, 1989.

[75] Gilbert Strang. Introduction to Applied Mathematics. Wellesley-Cambridge Press, 1986.

[76] Anatoli Stulov. Hysteretic model of the grand piano hammer felt. J. Acoust. Soc. Am., 97(4):2577-2585, 1995.

[77] Donald L. Sullivan. Accurate frequency tracking of timpani spectral lines. J. Acoust. Soc. Am., 101(1):530-538, 1997.

[78] Suresh Goyal, Elliot N. Pinson, and Frank W. Sinden. Simulation of dynamics of interacting rigid bodies including friction I: General problem and contact model. Engineering with Computers, 10:162-174, 1994.

[79] Suresh Goyal, Elliot N. Pinson, and Frank W. Sinden. Simulation of dynamics of interacting rigid bodies including friction II: Software system design and implementation. Engineering with Computers, 10:175-195, 1994.

[80] Tapio Takala and James Hahn. Sound rendering. Proc. SIGGRAPH '92, ACM Computer Graphics, 26(2):211-220, 1992.

[81] Samuel Temkin. Elements of Acoustics. John Wiley and Sons, Inc., New York, 1981.

[82] Barry Truax. Real-time granular synthesis with a digital signal processor. Computer Music Journal, 12(2):14-26, 1988.

[83] C. Ullrich and Dinesh K. Pai. Contact response maps for real time dynamic simulation. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1019-1025, Leuven, May 1998.

[84] Kees van den Doel and Dinesh K. Pai. The sounds of physical shapes. Presence, 7(4):382-395, 1998.

[85] Richard F. Voss and John Clarke. 1/f noise in music: Music from 1/f noise. J. Acoust. Soc. Am., 63(1):258-263, 1978.

[86] R. M. Warren and R. R. Verbrugge. Auditory perception of breaking and bouncing events: A case study in ecological acoustics. Journal of Experimental Psychology: Human Perception and Performance, 10:704-712, 1984.

[87] C. A. Wert. Internal friction in solids. Journal of Applied Physics, 60(6):1888-1895, 1986.

[88] Bruce J. West and Michael Shlesinger. On the ubiquity of 1/f noise. International Journal of Modern Physics B, 3(6):795-819, 1989.

[89] Bruce J. West and Michael Shlesinger. The noise in natural phenomena. American Scientist, 78:40-45, 1990.

[90] G. B. Whitham. Linear and Nonlinear Waves. Wiley, New York, 1974.

[91] Richard P. Wildes and Whitman A. Richards. Recovering material properties from sound. In Whitman Richards, editor, Natural Computation. The MIT Press, Cambridge, Massachusetts, 1988.

[92] Wilfred G. Wilson. Change ringing: the art and science of change ringing on church and hand bells. Faber and Faber, London, 1965.

[93] Richard Woodbridge. Acoustic recordings from antiquity. In Proceedings of the IEEE, pages 1465-1466, 1965.

[94] P. M. Zurek. The precedence effect and its possible role in the avoidance of interaural ambiguities. J. Acoust. Soc. Am., 67(3):952-964, 1980.
These "locations" may correspond to actual geometric locations on the contact surface of the objects, or to directional parameters. The fields frequencyjscale:, damping_scale:, and amplitude_scale: multiply the following frequency, damping, and amplitude parameters by a constant scale factor. These fields allow a quick manual edit by shifting all frequencies, increasing the damping, or boosting the coupling strength of the model. After the frequencies: header field, the frequencies are listed in Hertz, after the  135  damping: header the damping coefficients (the factors d in the impulse response a e  (-dt+2irift)  Q  f  a  n  individual mode), and after the amplitudes[point][freq]:  the  amplitudes; first the amplitudes of the first "location", then the second location, etc. The end of file is indicated by the string E N D . Below we give a short example of an sy-file for an ideal string with coupling data for three locations. nactive_freq: 10 n_freq: 10 n_points:  -  3 frequency_scale: 1.000000 damping_scale: 1.000000 amplitude_scale: 1.000000 frequencies: 440.000000 880.000000 1320.000000 1760.000000 2200.000000 2640.000000 3080.000000 3520.000000 3960.000000  136  4400.000000 dampings:  1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 amplitudes[point][freq] :  0.258819 0.250000 0.235702 0.216506 0.193185 0.166667 0.137989 0.108253 0.078567 0.050000 0.707107 0.500000 0.235702 0.000000 0.141421  137  0.166667 0.101015 0.000000 0.078567 0.100000 0.965926 0.250000 0.235702 0.216506 0.051764 0.166667 0.036974 0.108253 0.078567 0.050000 END  Appendix B  Parameter F i t t i n g D a t a In this section we present the results of the models obtained by parameter fitting for several objects. The corresponding sounds can be heard on the accompanying C D or on [4]. Each figure shows the linear fits overtime to the identified frequency peaks (in d B ) , the estimated frequencies in Hz, and the frequency response of the resulting model on two different scales. The sampling rate was 44100 Hz and we show the results of various size Fourier windows. The material constant plots show the ratio of frequency and damping for the first 100 modes (sorted by amplitude), which should be constant according to the simplest material model.  139  \  ^ N_  W  \  V  \ x~  "S*  \  S  v  \  \ >—  W  ^_ \  S>»  \  V-  S**  ^  V^,  3316  1507  1120  172  4436  474  4006  2412  3790  2584  7537  4737  4565  4264  1249  1981  61 IS  10724  9733  6288  5814  3058  8742 '  7192  5383  4996  3445  15590  12575  11628  9087  8269  7321  6891  6632  1134  3402  2196  13652  12489  11757  10422  8053  7020  5512  3015  2239  18217  17786  17485  16710  16150  13135  12877  12705  12317  9647  9259  8355  7B81  5642  3833  2627  18906  14815  13351  13264  12059  11413  10982  10465  10034  9432  9130  8096  7752  684B  5943  5657  5211  5082  4608  43  , 15805  15719  15418  15159  14427  14384  14126  14040  13480  13006  11B43  11542  11370  11283  11240  1000  2000  3000  4000  Figure B . l : Top of vase with a window of 1024  140  5000  7000  6000  ^  ^  \  v^.  
Appendix B

Parameter Fitting Data

In this section we present the results of the models obtained by parameter fitting for several objects. The corresponding sounds can be heard on the accompanying CD or on [4]. Each figure shows the linear fits over time to the identified frequency peaks (in dB), the estimated frequencies in Hz, and the frequency response of the resulting model on two different scales. The sampling rate was 44,100 Hz and we show the results for various sizes of the Fourier window. The material constant plots show the ratio of frequency and damping for the first 100 modes (sorted by amplitude), which should be constant according to the simplest material model.

[Figure B.1: Top of vase with a window of 1024]
[Figure B.2: Top of vase with a window of 2048]
[Figure B.3: Top of vase with a window of 4096]
[Figure B.4: Top of vase with a window of 8192]
[Figure B.5: Top of vase with a window of 1024: Material constant]
[Figure B.6: Top of vase with a window of 2048: Material constant]
[Figure B.8: Santur with a window of 1024]
[Figure B.9: Santur with a window of 2048]
[Figure B.10: Santur with a window of 4096]
[Figure B.11: Santur with a window of 8192]
[Figure B.12: Piano with a window of 1024]
[Figure B.13: Piano with a window of 2048]
[Figure B.14: Piano with a window of 4096]
[Figure B.15: Piano with a window of 8192]
[Figure B.16: Computer tower with a window of 512]
[Figure B.17: Computer tower with a window of 1024]
[Figure B.18: Computer tower with a window of 2048]
[Figure B.19: Computer tower with a window of 4096]
[Figure B.20: Computer tower with a window of 512: Material constant]
[Figure B.21: Computer tower with a window of 2048: Material constant]
[Figure B.22: Computer tower: Material constant]
[Figure B.23: Bell with a window of 1024: Material constant]
[Figure B.24: Bell with a window of 2048: Material constant]
[Figure B.25: Bell with a window of 4096: Material constant]
[Figure B.26: Bell with a window of 8192: Material constant]

Appendix C

Audio Synthesis Techniques for Music
A comprehensive overview of musical audio synthesis techniques is given by Herbert Janßen on the WWW site [7], from which the following material is compiled.

• Additive Synthesis, Fourier Synthesis

Any sound, however complex it may be, can be described as a mixture of a number of sine wave components with different phases and amplitudes. These are the partials of a sound, which are also called harmonics if their frequencies are integer multiples of the fundamental frequency. The method of generating a complex sound spectrum as the sum of (many) simple sine waves is called Fourier synthesis, after Joseph Fourier, who found its mathematical basis. The more general term additive synthesis can also be used if the waveforms added are not sine waves.

Ideally, a lot of sine oscillators are needed for Fourier synthesis. How many depends on the required range and brightness: a bright bass note, think of a slap bass, may need more than a hundred, while a high pitched harmonic sound will probably need only a dozen. For dynamical sounds and expressive play on an additive synth very many parameters are needed: ideally, each oscillator should have its own amplitude envelope, pitch envelope, velocity sensitivity and modulation routing. Although this may sound like the hardware is the limiting factor, the usability is even more so. Most of the many parameters have only little influence on the sound, and generally it is very hard to estimate what the spectrum of a desired sound looks like. Thus the simulation of acoustic instruments seems to be impossible without appropriate analysis hardware and software.
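A minimal sketch of the scheme in C: the output is a plain sum of sine partials, here with exponentially decaying amplitude envelopes standing in for the per-oscillator envelopes described above. All names and the envelope shape are illustrative assumptions.

    /* Sketch of additive (Fourier) synthesis: a sum of sine partials,
       each with its own amplitude and a simple exponential envelope. */
    #include <math.h>

    #define SR 22050.0   /* assumed sampling rate in Hz */

    void additive(const double *freq, const double *amp,
                  const double *decay, int npartials,
                  float *out, int nsamples) {
        for (int i = 0; i < nsamples; i++) {
            double t = i / SR, s = 0.0;
            for (int k = 0; k < npartials; k++)
                s += amp[k] * exp(-decay[k] * t)
                            * sin(2.0 * M_PI * freq[k] * t);
            out[i] = (float)s;
        }
    }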
This is possible because typical analog synths offer a limited number of control parameters, but those parameters are highly effective. Second, rather simple analog circuitry can perform the functions needed for subtractive synthesis rather well: the resulting sound will be artificial but with slight variations and instability. Thus, the sound quality is lively in a way similar to acoustic instruments. On the other hand most digital synths compromise in sound quality to achieve the high number of voices many buyers seem to be fond of today. 164  Modular analog synths have the additional advantage that there is no distinction between audio and control signals. Everything is just a control voltage which results in a vast number of patching possibilities. These beasts are very rare and pricy nowadays, but are probably still the best way to learn about synthesizing sounds, and can be used as a musical instrument of exceptional power. • Sample Playback ( P C M , A W M , A W M 2 , A I , ...) This is a form of subtractive synthesis that is also called P C M (Pulse Code Modulation), A W M (Advanced Wave Memory), A W M 2 (Advanced Wave.Memory Version 2) A I (?) by manufacturers. Usually all those terms refer to basically the same thing: A n audio signal e.g. a miked acoustic instrument or an electrical or electronical instrument is sampled (digitized) and the recording is stored in R A M or R O M . If a device is able to sample and store the result in R A M or to disk, it is called a sampler. A device that can playback samples (from R A M , R O M or disk) at different pitches is called a sample playback synth. Most samplers and sample-based synths use subtractive synthesis although there are some samplers and synths that offer only very limited processing. The term P C M refers to the coding technique which is used in virtually all digital instruments. The terms A W M and A W M 2 are Yamaha marketing slang for 16-bit sample playback and 16-bit sample playback with filters. Korg uses the term A I synthesis for their M l , which is just another sample playback synth, synthesis wise. Sample playback is what has made synths realistic sounding. On the other  165  hand sampling per se offers less options for expressive play than almost any other synthesis scheme. Samples that have a lot of inherent character or are easily recognized as an acoustic instrument are hard to shape, so filters and other processing options will merely adjust the timbre of the sample. There are however unique possibilities in sample synthesis, but these are often not implemented in commercial synths. I'm talking about modulation of the sample playback parameters itself: extreme transposition of multisamples, modulation of sample start and sample loop length, multiple sample loops and more. Among others, various Ensoniq and Emu synths and samplers are capable of some of these. On many synths (including the SYs) even transposition (the changing of sample playback rate) can only be achieved with the trick of a constantly biased pitch envelope.  • Frequency Modulation ( F M , A F M ) Frequency Modulation is usually abbreviated F M or A F M (for Advanced Frequency Modulation). This is the family of synthesis methods that brought a breakthrough for commercial digital instruments in the eighties. Basically it means that you control the frequency of an audio oscillator by the frequency of another audio oscillator. 
The interesting aspect sound-wise is that you can generate a very wide variety of spectra plus many transient sound characteristics with F M (and not only the never ending variations of electric pianos and bells). F M was "invented" by John M . Chowning at Stanford and used in the academic computer music scene long before Yamaha marketed it. The commercial Yamaha implementation introduced some restrictions, but also some useful ex-  166  tensions like feedback. F M exists in many different flavours: some analog synths resp. digital/analog hybrids are able to do a very basic F M . But F M relies mainly on the frequency ratios of the oscillators involved, and therefore requires very high tuning stability. Also F M becomes a versatile synthesis technique only if you have multiple oscillators with multiple envelopes to control their amplitude which results in a big number of components/modules needed in the analog realm. Maybe this is why it was and is not as popular with analog synths. Yamahas digital F M implementations use custom chips to reduce cost. In case of digital F M , there are also many variants: depending on the number of oscillators (minimum is 2, most synths use 4 or 6, I recall to have heard of 10 in some Yamaha organs), whether there is a real envelope per oscillator (some very simple Yamaha sound chips like the one used in the old Atari STs miss them) and of course how variable the routing between the oscillators is (number of "algorithms, modulation and feedback paths). • R C M (Real-Time Convolution and Modulation) Synthesis This is another marketing term by Yamaha and is mainly an extension to F M . The background is that you use a whole AWM2-element as modulator input for an AFM-operator, which also means that you can apply the filter on the sample before you put it through the FM-process. The former fact may be used to motivate the term modulation, the latter the term convolution (one possible algorithm for a filter is the convolution of the signal with a kernel). In my opinion the term R C M is ill defined and misleading. In lack of a better term, I will use R C M to denote the capability to use feed the A W M section  167  into the A F M section and vice versa. Phase Distortion (PD) and Interactive Phase Distortion (iPD) The terms phase distortion and interactive phase distortion were used by Casio for their synths (CZ and V Z series). Actually the two methods seem to be very different.  The phase distortion  synths (the C Z series) offer eight basic waveforms (saw, pulse, resonance, etc) . 1  Each of them can be morphed continuously from and to a sine wave via an eight-stage envelope, thus emulating the use of a low-pass filter on a basic analog synthesizer. Another two envelopes are used for pitch and amplitude and for two-oscillator patches a ring modulator can be used. Wave Shaping Wave shaping refers to a sound manipulation (not generation) technique which applies a (nonlinear) function on the original signal (i.e. the output of an oscillator). This scheme is similar in principle to analog distortion in a guitar amp or fuzz unit, but offers much more sound variation possibilities including resonance-like effects.  Wave shaping can be used as an advanced synthesis  method in a way similar to F M . L A (Linear Arithmetic) Synthesis This buzzword was used by Roland to describe their approach to digital sound synthesis in the eighties. It is based on the observation that the attack transient of a sound is its most important part with respect to human perception. 
Therefore the LA-synths (D-50, M T 32 and others, but most notably not the D-70) used a combination of sampled attack transients and simple digital oscillators with only sawtooth and pulse waveforms to generate the sustained part of the sound. 168  Ring Modulation, Amplitude Modulation Ring modulation and amplitude modulation are not complete synthesis methods, but rather processing techniques that are quite common on advanced analog and digital synths. Sometimes these features are wrongly named or used when the actual implementation is quite different. Manufacturer specific terminology for similar schemes includes the terms cross modulation and F X M (frequency cross modulation). Ring modulation is the multiplication of two signals. The output of a ring modulator will contain the sum and the difference of all available input frequency pairs. Amplitude modulation is the multiplication of two signals, where one signal is always positive. Vector Synthesis The first synth to implement this paradigm was the SCI Prophet V S . The V S can mix four oscillators with different waveforms in real-time via a joystick controller and a multistage envelope. While this is a really simple concept, it is effective for expressive play and nice evolving sounds. The Korg Wavestations and the Yamaha SY22, TG33 and SY35 are other "vectorized" synths. The Yamahas can mix up to two F M and two sample elements, while the Wavestations mix up to four sample based wave sequencing oscillators. In principle most synths can do real-time vector synthesis, when fed with M I D I joystick data to cross-fade oscillators. If you like to try that you can rewire a P C game joystick to fit your synths pedal jacks. 169  Wave Table Synthesis This term is used for two completely different things: Many sound card companies call their R A M based sample playback capabilities like this (because the samples are stored in a table in R A M ) . For the P P G Wave and the Waldorf Microwave and Wave synths this term is used to describe the ability to produce a sound by sequencing through a table of different waveforms during the duration of a single note. For the wave tables and waves there is a preset R O M area as well as a user loadable R A M area provided. Which entry of the wave table is selected may be controlled by an envelope, L F O or any other modulation source in real-time. Also these synths can interpolate between subsequent waveforms in the wave table thus smoothing the timbral change. The waveforms are single cycle ones, so realistic acoustic emulations are out of reach for this technique, but the vastly improved •' modulation capabilities, compared to sample playback, more than make up for this.  Wave Sequencing This term means that a sequence of different sample segments can be used to generate a sound. Korg implemented this on their famous Wavestation synths. The Wavestations oscillators can sequence through programmable patterns of samples. Each of the patterns consists of a number of individually tunable sample snippets and each sample in the sequence is assigned its own level and duration. Typical for the Wavestation (and rather easy to program) are "rhythmic" wave sequences in which an oscillator steps through a number of samples in a predefined periodic rhythm. The Wavestations also combine this  170  with vector synthesis capabilities. Granular Synthesis Granular Synthesis means sequencing through very many very short sound (sample) snippets. 
• Physical Modeling Synthesis

Physical modeling (PM) is a whole class of synthesis methods that do not synthesize sound based on an abstract mathematical description, like the Fourier transform for additive synthesis, or by classical signal processing means like filtering for subtractive synthesis, but rather try to model the diverse instruments themselves: e.g. the bow, string and resonance corpus of a cello, or the plucking finger, string and body of an acoustical guitar. There are many different physical modeling algorithms, including relatively simple ones like Karplus-Strong synthesis and rather complex ones like the waveguide approach [67], which uses multiple delay lines (comb filters) to model strings or air columns.

The biggest advantage of physical modeling is the real-time control it offers. While other synthesis methods offer some algorithm specific and rather arbitrary control parameters like filter cutoff or modulation index, physical modeling enables the use of control parameters that are more musical and have a more complex influence on the timbre and phrasing. Examples of such parameters are embouchure or tonguing. Another advantage of PM is that the sound generation is context sensitive: a note on a clarinet model will sound different if it is played with legato binding to its predecessor or with a little pause in between. The dependency is much more complex than with the traditional synthesizer portamento or glide function. Another example: the pitch bend of a clarinet patch will not just linearly shift the frequency of the note, but the synth will respond in a way similar to a real clarinet, i.e., it will shift the frequency and timbre for a while but then jump to the octave.

PM has its disadvantages though: instrument models have to be designed with great care and a lot of knowledge of both instrument acoustics and the necessary math. On the available PM instruments by Korg and Yamaha, editing is only possible via macro parameters of the otherwise hard coded instrument models, probably the only practicable way to provide sound programming to the "normal" user.

• Karplus-Strong Synthesis

This synthesis method uses a percussive sound, like a noise burst or a single pulse, which excites a delay unit with feedback. If the feedback is high enough (90-99%) an exponentially decaying sound with definite pitch will result. It is the delay time that determines the pitch in this case; to be exact, the delay time equals the period of the resulting periodic wave. This synthesis method is particularly well suited for emulating plucked strings and other percussive harmonic sounds. To make the decay more realistic, one can include a low-pass filter in the feedback path, so that higher harmonics are damped faster.

Karplus-Strong synthesis is actually a very simple form of physical modeling and shares the most important physical modeling advantage: since the percussive sound acts as an exciter for the delay loop that produces the harmonic sound, one can change the plucking/fingering of the modelled string by changing the percussive sound, and change the string itself by controlling the delay line parameters.
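The method fits in a dozen lines; the sketch below initializes the delay line with a noise burst and uses a two-point average as the low-pass loss filter described above, so the pitch is approximately SR divided by the delay length. Parameter names are illustrative.

    /* Sketch of Karplus-Strong synthesis: a noise burst circulating in
       a delay line with an averaging low-pass in the feedback path. */
    #include <stdlib.h>

    void karplus_strong(float *out, int n, int period, float feedback) {
        float *delay = malloc(period * sizeof(float));
        for (int i = 0; i < period; i++)      /* the "pluck": noise burst */
            delay[i] = 2.0f * rand() / (float)RAND_MAX - 1.0f;
        int pos = 0;
        for (int i = 0; i < n; i++) {
            int next = (pos + 1) % period;
            float y = delay[pos];
            out[i] = y;
            /* averaging damps high harmonics faster than low ones */
            delay[pos] = feedback * 0.5f * (y + delay[next]);
            pos = next;
        }
        free(delay);
    }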
• Modal Synthesis

This synthesis technique uses a bank of resonators with individually adjustable frequencies and dampings, which is driven by some input signal. Non-harmonic percussive instruments such as bells and marimbas have been successfully modeled with this technique [20]. For general musical instrument synthesis this technique has the disadvantage that a resonator is needed for every partial, and the partials are spaced linearly for most musical instruments, with the exception of non-harmonic percussive instruments like bells. The many resonators needed lead to a large computational cost, which can be avoided by using waveguides or comb filters, which by themselves already have a complete harmonic spectrum and are therefore better building blocks.

