UBC Theses and Dissertations

Automatic measurement and modelling of contact sounds Richmond, Joshua Lee 2000

Automatic Measurement and Modelling of Contact Sounds

by Joshua L. Richmond
B.A.Sc., University of Waterloo, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
August 2000
© Joshua L. Richmond, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

Abstract

Sound plays an important role in our everyday interactions with the environment. Sound models enable virtual objects to produce realistic sounds. The manual creation of sound models from real objects is tedious and inaccurate. A brief review of sound models is presented, with details of a sound model for contact sounds.

This thesis documents the development of a system for the automatic acquisition of sound models. The system is composed of four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. A description of each module and its requirements is included. Implementations of each module are tested and explained. Results of typical data collections are discussed.
Sound models for a calibration object, brass vase, plastic speaker and toy drum are constructed using the system. Comparisons of the sound models to the original recordings are displayed for each object.

Under ideal circumstances the system produces accurate sound models. Environmental noise, however, decreases the accuracy of the estimation technique. An evaluation of the parameter estimation algorithm confirms this observation.

Many opportunities exist for future work on this system. Ideas for improvements and future investigations are suggested.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Contact Sounds
  2.1 Overview
  2.2 Related Work
  2.3 A Contact Sound Model
  2.4 Empirical Parameter Estimation
    2.4.1 Performance Evaluation
3 A System for Automatic Measurement of Contact Sounds
  3.1 Objectives
  3.2 System Requirements
  3.3 System Overview
4 A Sound Acquisition Device
  4.1 Overview
  4.2 Related Work
  4.3 Requirements
  4.4 The Active Measurement Facility
  4.5 Sound Effector
  4.6 Sound Capture Hardware
  4.7 Sound Effector Software
5 An Asynchronous Data Server
  5.1 Overview
  5.2 Requirements
    5.2.1 Generic Data Server Requirements
    5.2.2 Sound Server Requirements
  5.3 Server Architecture
    5.3.1 Server Execution
    5.3.2 Data Flow
    5.3.3 Specialisation to Sound Data
  5.4 Implementation Details
    5.4.1 Performance Evaluation
    5.4.2 Limitations
6 Building a Prototypical Model
  6.1 Overview
  6.2 Related Work
  6.3 Spectrogram Averaging
  6.4 Performance Evaluation
7 An Adaptive Sampling Algorithm
  7.1 Overview
  7.2 Related Work
  7.3 Requirements
  7.4 Surface Representation
  7.5 Acoustic Distance Metrics
    7.5.1 Frequency-Independent Damping Coefficient
    7.5.2 Frequency Similarity
  7.6 Perceptual Thresholds
8 Sample Data Collections
  8.1 Overview
  8.2 Tuning Fork
    8.2.1 Estimation Results
  8.3 Brass Vase
    8.3.1 Estimation Results
    8.3.2 Refinement Results
  8.4 Plastic Speaker
    8.4.1 Estimation Results
    8.4.2 Refinement Results
  8.5 Toy Drum
    8.5.1 Estimation Results
    8.5.2 Refinement Results
9 Conclusions
  9.1 Overview
  9.2 Future Work
Bibliography
Appendix A Sound Effector Specifications
  A.1 Mounting Bracket
  A.2 Control circuit
Appendix B Effect of White Noise on Spectrograms
Appendix C Details of Unique Frequency-Mapping Algorithm

List of Tables

5.1 Results of sound server performance evaluation
7.1 Regression of similarity on frequency and decay difference
7.2 Examples of calculated similarity factors
8.1 Summary of setup parameters for test objects
8.2 Summary of refinement results

List of Figures

2.1 Attributes of contact sounds
2.2 Parameter Estimation Algorithm
2.3 Mean errors of the parameter estimation algorithm on synthetic data
3.1 Sound Measurement-to-Production Pipeline
3.2 General procedure for sound model creation
3.3 A test object on the ACME test station
4.1 The Sound Effector
4.2 Digital Output Connection server architecture
5.1 Data server architecture
5.2 Data server state diagram
5.3 Control loop for RUNNING state
5.4 Control loop for CAPTURING state
5.5 Server data flow
5.6 One AudioPacket
5.7 Spectrogram of sound containing "pops"
6.1 Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data
6.2 Standard deviation of error for estimated damping parameters
7.1 Adaptive sampling algorithm
7.2 Result of adaptive sampling
7.3 Loop scheme refinement masks for subdivision
7.4 Algorithm to find unique frequency mapping between two models
8.1 Setup for acquiring model of tuning fork
8.2 Recorded spectrogram
8.3 Results of tuning fork collection
8.4 A photo of the brass vase and the subdivision surface which represents it
8.5 Setup for acquiring sound model of brass vase
8.6 Results of brass vase experiment
8.7 Detail of narrow-band low-frequency noise
8.8 Effect of noise on low-frequency modes
8.9 Refinement results of brass vase experiment
8.10 A photo of the plastic speaker and the subdivision surface which represents it
8.11 Results of plastic speaker experiment
8.12 Refinement results of plastic speaker experiment
8.13 A photo of the toy drum and the subdivision surface which represents it
8.14 Pivot of the drum's metal bars
8.15 Results of toy drum experiment (metal)
8.16 Results of toy drum experiment (plastic)
8.17 Refinement results of toy drum
A.1 Bending schedule for sound effector mounting bracket
A.2 Schematic for solenoid control circuit
B.1 Effect of white noise on spectrograms
C.1 Algorithm to find unique frequency mapping between two models
C.2 Difference matrix
C.3 Order array
C.4 Index array
C.5 Matched array
C.6 Mapping array
C.7 FindUniqueMapping Pseudo-code

Acknowledgements

"Great discoveries and improvements invariably involve the cooperation of many minds." - Alexander Graham Bell

This thesis is the result of the encouragement and support of dozens of my friends. Without the contributions listed below, this thesis may have never proceeded beyond the course project from which it began.

I owe ACME members past and present a huge thank you! In particular, Jochen Lang and John Lloyd contributed the bulk of the ACME software within which my experiments resided. Thank you for the seemingly endless discussions regarding the sensor class, mysterious aborts and assorted odd ACME behaviours. Derek DiFilippo, Paul Kry and Doug James contributed many focused ideas when mine were fuzzy. Thanks for all the insight into sound, surfaces and life!

My supervisor, Dinesh Pai, played an enormous role in the development of this thesis. His thoughtful suggestions and advice steered me out of dark corners whenever I got lost. Thank you for being a fantastic supervisor and friend!

Many other friends supplied advice and support when I needed it most over these past two years. Jim Green and Jacob Ofir were constant barometers of my progress. Bonnie Taylor, Kristen Playford and Andrea Bunt kept me going through tough times. Kelly Cronin gave me a good reason to finish. And of course, the rest of the faculty, staff and students of the department always made me feel welcome.

This thesis would not have been possible without the generous financial contributions of NSERC, BC Advanced Systems Institute and IRIS.

JOSHUA L. RICHMOND
The University of British Columbia
August 2000

Dedicated to my family: Gary, Carol and Erica Richmond. Thank you for your unwavering encouragement and support.
You are terrific examples of success in work, love and life.

Chapter 1

Introduction

When radio dramatists began adding sound effects to their productions in the 1920's, they demonstrated the important role of sound in our perception of everyday events and environments [20]. Without visual aid, listeners could easily identify unspoken actions, e.g., a character's entrance into a room by the sound of a door opening, followed by the sound of footsteps. Even with the advent of motion pictures in 1896, it was quickly realized that the rattles, pings and knocks of events were needed to cement the realism of the viewing experience (footnote 1). What the entertainment industry has learned is that sound is a critical component of our everyday lives.

Footnote 1: The emergence of sound in film in 1926 was delayed only by the technical difficulties of synchronising recorded sound and film. In fact, Thomas Edison viewed moving pictures as an accessory to the phonograph! [29]

The ubiquity of sound in our lives is not surprising when the physics of contact is considered. When contact occurs between two objects, the energy of the impact is transferred to each object. This energy propagates through the objects, producing vibrations of their surfaces. The frequency, amplitude and decay of these vibrations depend, in part, on the shape and material of the object [31]. These surface vibrations create fluctuations in the air pressure around each object and are perceived as sound. Such "contact sounds" provide a listener with much information: location of contact, contact force, material composition of the object as well as its shape, size and surface texture. Humans can use these audio cues to recognise events and discriminate between materials [10, 12, 16].

Given the importance of sound to real-life interactions, it is clear that sound is a required component of any successful virtual environment or simulation.
One approach to including sound in virtual environments is synthesis from physics-based models [30]. A sound model of an object can synthesise appropriate contact sounds, given a contact force and contact location, by direct computation. By "sound model" we mean a mathematical representation that can be used for simulation of the object's sound. This is in contrast to an approach where recorded samples are simply replayed, or modulated to account for various forces and locations. An advantage of the model-based approach is that one model can be used to generate sounds for any number of different interactions (e.g., scraping, pinging, rolling) by substituting the appropriate force profile.

The parameters of a sound model can be derived analytically for objects with simple geometry and material composition [30]. Most everyday objects, however, have complex geometry and material composition which complicate analytical solutions. For such objects, model parameters can be estimated from empirical measurements [30]. Recent work on robotic perception has determined that sound models can also be used to automatically identify materials [8, 17].

A sound model parameterised over the surface of an object can be viewed as an acoustic map of the object, analogous to a texture map in graphics. To create such a sound model of an object from empirical measurements requires recording sounds at hundreds of locations over the object's surface. To populate even a simple virtual environment with sonified objects could therefore require thousands of measurements. The system described in this thesis automates the sound measurement procedure. To our knowledge, this system is the first to automatically create complete sound models of objects. This sound model can be registered to other models of an object (e.g., surface, deformation) to represent a complete reality-based model. Parts of this work were published previously: [25], [26].
A review of sound models and measurement techniques is presented in Chapter 2. The requirements of a sound measurement system and an overview of our system are included in Chapter 3. The components of the system are subsequently described: the sound acquisition device (Chapter 4), asynchronous data server (Chapter 5), prototypical model generation (Chapter 6) and adaptive sampling algorithm (Chapter 7). Results of typical data collections are presented in Chapter 8, and conclusions on the work are stated in Chapter 9.

Chapter 2

Contact Sounds

2.1 Overview

Contact sounds are the sounds produced by our everyday interactions with the environment. We use the phrase "everyday" in the context of Gaver's definition of everyday listening: the perception of events from the sounds they make [9]. Just as he contrasts everyday listening to musical listening, we distinguish contact sounds from musical sounds. Musical sounds are themselves the intended result of some action; contact sounds are the unintentional consequence of an action. In general, musical sounds are harmonic while contact sounds are inharmonic. These distinctions are not strict, but serve to separate the domain of our task from the modelling of musical instruments.

As an illustrative example of contact sounds, imagine the sound produced by setting your coffee mug onto your desk, or pushing it across the desk. The sound of the mug striking the desk, and the scraping sound of it being pushed, are both contact sounds. The ease with which these sounds are brought to mind emphasises the importance of sound to our everyday experiences; though you may have never consciously attended to the sounds in real life, they can be easily recalled.

Much useful information is conveyed by contact sounds. Gaver was the first to enumerate this information in [9].
He classified the information into three categories: material, interaction and configuration. These categories are expanded in Figure 2.1. Such information is beneficial to many applications. For instance, it is useful feedback in teleoperative systems. As mentioned previously, the addition of contact sounds to virtual environments and simulations enhances the realism of the environment. Durst and Krotkov also demonstrated the use of contact sound models for automatic material classification [8]. Furthermore, recent work has related the parameters of a contact sound model to human perception of material [16, 12].

[Figure 2.1 diagram. Material: restoring force, density, damping, internal structure. Interaction: type, force. Configuration: shape, size, resonating cavities.]

Figure 2.1: Attributes of contact sounds. Gaver identified three categories of information conveyed by contact sounds: material, interaction and configuration. Each category is composed of more primitive dimensions as shown. Adapted from [9].

The system described by this thesis automatically acquires the measurements required to create contact sound models. By "contact sound model", we mean a mathematical representation that can be used to synthesise the sound produced by contact between two objects. The model is parameterised over the surface of an object, much like a texture map in graphics.

This chapter is an overview of the mathematical model we use to represent contact sounds. A brief description of the model and its parameters is given. The algorithm for estimating the model's parameters from recorded samples is also explained.

2.2 Related Work

Much research has been conducted in the field of sound modelling. In particular, many people have modelled musical instruments. Physical modelling is possible for a variety of musical instruments [1].
As one example, Chaigne and Doutaut modelled the acoustic response of wooden xylophone bars using a one-dimensional Euler-Bernoulli equation with the addition of two damping terms and a restoring force [3]. This formulation is derived from the geometry of the xylophone bars, with empirical values for the damping parameters. Another example is Cook and Trueman, who used principal component analysis, Infinite Impulse Response (IIR) filter estimation and warped linear prediction methods to model the directional impulse response of stringed instruments from empirical data [4].

Gaver introduced the modelling of everyday sounds [9]. His model of metal and wood bars is based on the wave equation with an exponential damping term for each frequency mode. The fundamental mode, its initial amplitude and damping factor are determined empirically. Partial modes are calculated as ratios of the fundamental mode.

Recently, a physics-based model for contact sounds of everyday objects was developed by van den Doel [30, 31]. The model of van den Doel was selected for use in our work, and will be described in the subsequent sections of this chapter.

The model of van den Doel is appealing for several reasons. First, it is similar in structure to models used by Gaver [9], Hermes [12] and Klatzky et al. [16] in their perception experiments. Such perceptual studies may yield results useful to evaluating the performance of our system. Furthermore, these studies provide information applicable to our adaptive sampling algorithm as described in Chapter 7. Secondly, the model is suitable for synthesising sounds in real-time, a necessary component of interactive simulations. Finally, it was selected for its effectiveness at representing contact sounds of everyday objects.

It should be noted that the measurement system described herein is not restricted to producing one specific sound model.
Any sound model whose parameters can be estimated from recordings of acoustic impulse responses may be created using this system.

2.3 A Contact Sound Model

Formulating a general sound model is complex because it depends on many parameters: material, geometry, mass distribution, etc. This section follows the development of the contact sound model presented in [30, 31].

If we assume linear-elastic behaviour (footnote 1), the vibration of an object's surface is characterised by the function $\mu(z,t)$. Here $\mu$ represents the deviation of the surface from equilibrium at point $z$ and time $t$. $\mu$ obeys a wave equation of the form in Equation 2.1, where $A$ is a self-adjoint differential operator and $c$ is a constant related to the speed of sound in the material [31].

\[ \left( A - \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \right) \mu(z,t) = F(z,t) \tag{2.1} \]

Footnote 1: A reasonable assumption for light contacts.

In the absence of external forces, Equation 2.1 can be solved by the expression

\[ \mu(z,t) = \sum_{i=1}^{\infty} \left( a_i \sin(\omega_i c t) + b_i \cos(\omega_i c t) \right) \Psi_i(z), \tag{2.2} \]

where $a_i$ and $b_i$ are determined by boundary conditions, $\omega_i$ are related to the eigenvalues of operator $A$ and $\Psi_i(z)$ are the corresponding eigenfunctions.

When the object is sufficiently far from the listener, the sound pressure due to this solution can be approximated by the impulse response function in Equation 2.3. The exponential term is added to model material damping. Complete details of this derivation are provided in [30].

\[ p(x,t) = \sum_{i=1}^{N_f} a_{x,i} \, e^{-d_{x,i} t} \sin(\omega_{x,i} t) \tag{2.3} \]

This model expresses the sound pressure $p$ at time $t$ as the sum of $N_f$ frequency modes. Here $x$ is the location of the force impulse. Each mode $i$ in the model has a frequency $\omega_{x,i}$, initial amplitude $a_{x,i}$ and exponential damping factor $d_{x,i}$. The damping factor models material damping due to internal friction; in the literature [32], this is parameterised by an internal friction parameter $\phi$ as expressed in Equation 2.4.
\[ d_{x,i} = 2\,\omega_{x,i} \tan(\phi) \tag{2.4} \]

For each location $x$ on the surface of an object, the parameters $\omega_{x,i}$, $a_{x,i}$ and $d_{x,i}$ must be estimated for all $N_f$ modes.

Because the model is an impulse response model, sounds produced by any linear force interaction can be synthesised by a simple convolution of the force and the model.

2.4 Empirical Parameter Estimation

The parameters for the sound model as it is expressed in Equation 2.1 can be determined analytically by Equation 2.2 for objects of simple geometry and material composition [30]. Of course, to solve this expression for arbitrary everyday objects is not feasible. The alternative is to estimate the parameters of a similar model (i.e., Equation 2.3) from empirical measurements. That is, the object is struck at position $x$, the resulting sound recorded, and the parameters estimated from the recording. Examples of such techniques are [4, 8, 30].

Since we are using the model of van den Doel, we will use the parameter estimation algorithm described in the same work. An overview of the algorithm is described in this section. Figure 2.2 lists the steps of the algorithm; a brief explanation of each step follows. Readers requiring more detail are referred to [30].

1. Compute the windowed DFT spectrogram of the recorded sample.
2. Identify the signal.
3. Estimate the frequency modes.
4. Estimate the damping parameters.
5. Estimate the initial amplitudes.

Figure 2.2: Parameter Estimation Algorithm.

In the first step, a spectrogram is computed for the entire recorded sample. The spectrogram is computed by calculating the discrete Fourier transform (DFT) on fixed-width (in time) segments of the recording. The segments are selected using overlapping Hanning windows. For details on Hanning windows and the DFT, a good introduction is [28].

The signal must now be isolated from the background noise (Step two).
The segment of the spectrogram with maximum intensity is the start of the signal. This peak corresponds to the onset of the impact. The end of the signal is the first segment $k$ whose intensity $A_k$ falls below $\bar{A} + \lambda\,\sigma(A)$, where $\bar{A}$ is the average intensity of the region before the signal's start, $\sigma(A)$ is the standard deviation of that region and $\lambda$ is a constant with a typical value of 10.

To estimate the frequency modes (Step three), a histogram is created. Each segment (in time) of the spectrogram within the signal region casts votes for the $N_f$ frequencies with greatest amplitude within that segment. The $N_f$ frequencies that obtain the most votes over all the segments are selected as the dominant frequency modes ($\omega_{x,i}$) of the model at that location.

To estimate the damping parameters (Step four), the log of the spectrogram for each mode is fit to the linear function $-\alpha_i k + \beta_i$, where $k$ is an index into the segments of the spectrogram, and the signal starts at segment $k = 0$. The damping coefficients are then calculated by Equation 2.5,

\[ d_{x,i} = Q f_s \alpha_i / N, \tag{2.5} \]

where $Q$ is the window overlap factor of the DFT, $f_s$ is the sampling frequency and $N$ is the size of the DFT window.

The initial amplitude of each frequency mode (Step five) can then be calculated by Equation 2.6,

[...] (2.6)

with $\rho_{x,i} = d_{x,i} N / f_s$.

As stated, the algorithm assumes the recording is the response to an impulsive impact. The algorithm can be used with responses to other forces by deconvolving the signal with the force profile.

Again, the measurement system described by this thesis could use any suitable parameter estimation technique, and is not restricted to the method outlined above. This technique was chosen because it is designed specifically for the selected sound model.

2.4.1 Performance Evaluation

An evaluation of the estimation algorithm was performed using synthetic data to determine its robustness to noise.
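As a point of reference, Steps 1, 3 and 4 of the estimation algorithm in Section 2.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `spectrogram` and `estimate_modes` are hypothetical, the recording is assumed to begin at the impact (so Step 2 is skipped), votes are restricted to local spectral peaks so that a strong mode's neighbouring bins do not crowd out weaker modes, and $\omega = 2\pi f$ with $f$ in Hz. The damping estimate applies Equation 2.5 to the slope of a least-squares line through the log magnitudes.

```python
import cmath
import math

def spectrogram(x, n_win, hop):
    """Step 1: magnitude DFT of overlapping Hann-windowed segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_win) for i in range(n_win)]
    twiddle = [cmath.exp(-2j * math.pi * r / n_win) for r in range(n_win)]
    frames = []
    for start in range(0, len(x) - n_win + 1, hop):
        seg = [x[start + i] * window[i] for i in range(n_win)]
        frames.append([abs(sum(seg[i] * twiddle[(k * i) % n_win] for i in range(n_win)))
                       for k in range(n_win // 2)])
    return frames

def estimate_modes(x, fs, n_modes, n_win=128, overlap=4):
    """Steps 3 and 4: vote for dominant bins, then fit the log-magnitude decay."""
    hop = n_win // overlap
    frames = spectrogram(x, n_win, hop)
    votes = [0] * (n_win // 2)
    for mags in frames:
        # Assumption of this sketch: only local spectral peaks may receive votes.
        peaks = [k for k in range(1, len(mags) - 1)
                 if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
        for k in sorted(peaks, key=lambda k: mags[k], reverse=True)[:n_modes]:
            votes[k] += 1
    top = sorted(range(len(votes)), key=lambda k: votes[k], reverse=True)[:n_modes]
    modes = []
    for k in sorted(top):
        # Least-squares fit of log|X_m[k]| = -alpha * m + beta over frame index m.
        ys = [math.log(mags[k]) for mags in frames]
        m_mean = (len(ys) - 1) / 2
        y_mean = sum(ys) / len(ys)
        slope = (sum((m - m_mean) * (y - y_mean) for m, y in enumerate(ys))
                 / sum((m - m_mean) ** 2 for m in range(len(ys))))
        damping = overlap * fs * (-slope) / n_win  # Equation 2.5 with alpha = -slope
        modes.append((k * fs / n_win, damping))
    return modes

# Noise-free two-mode test signal built from Equation 2.3.
fs = 8000
true_modes = [(625.0, 15.0, 1.0), (1875.0, 25.0, 0.8)]  # (Hz, damping, amplitude)
signal = [sum(a * math.exp(-d * n / fs) * math.sin(2 * math.pi * f * n / fs)
              for f, d, a in true_modes) for n in range(int(0.2 * fs))]
estimated = estimate_modes(signal, fs, n_modes=2)
```

On this clean signal the two bin-centred frequencies and their damping factors are recovered closely; the frequency resolution is limited to the DFT bin width `fs / n_win`, which is one reason real recordings need longer windows than this toy example uses.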
Since the real samples will be recorded in a relatively noisy environment, this evaluation is necessary to estimate the expected performance of the algorithm with real data.

To evaluate the estimation algorithm, a test signal is synthesised using the model in Equation 2.3. Fifty frequency modes are randomly selected from a Gaussian distribution ($\mu$ = 7 kHz, $\sigma$ = 4 kHz). For each frequency mode, damping factors ($d_i$) and initial amplitudes ($a_i$) are randomly selected from uniform distributions over the ranges [10, 30] and [0, 1] respectively. The initial amplitudes are scaled so that [...] 1.0.

Noise is added to this test signal at a specified signal-to-noise ratio. Most of the noise in the ACME environment originates from fans used to cool equipment. A quick spectral analysis of the room revealed the highest concentration of noise in a band from 0 to 200 Hz. Ambient white noise was present, though at lower energy levels. This room noise is approximated in our simulation by low-pass Gaussian noise, band-limited at 200 Hz by a fourth-order filter. Broadband white noise is added at 1/100th of the normalised amplitude of the low-frequency noise.

One hundred trials of the evaluation were executed. For each trial, a test signal was synthesised with noise added at eight signal-to-noise levels: ∞ (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5 (footnote 2). The noisy signal was then processed by the estimation algorithm outlined in Section 2.4. The entire test was implemented in Matlab.

The estimated model parameters are compared by the following metrics. Each estimated frequency mode $\hat{f}_i$ is assumed to be the estimate of the generated frequency mode $f_j$ that minimises the difference $|\hat{f}_i - f_j|$ over $j = 1 \ldots 50$. A logarithmic error ratio (Equation 2.7) was computed for each estimated model parameter. The means of these error ratios are plotted in Figure 2.3.

\[ E_f = \log \frac{\hat{f}_i}{f_j}, \qquad E_a = \log \frac{\hat{a}_i}{a_j}, \qquad E_d = \log \frac{\hat{d}_i}{d_j} \tag{2.7} \]

where a hat denotes an estimated value and $j$ is chosen to minimise $|\hat{f}_i - f_j|$ over $j = 1 \ldots 50$.
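The bookkeeping of this evaluation can be sketched as follows. The code is an illustrative assumption rather than the Matlab test described above: the helper names `draw_modes`, `add_noise` and `log_error_ratios` are hypothetical, non-positive frequency draws are simply rejected (the text does not say how they were handled), only white Gaussian noise is added (the 200 Hz low-pass component is omitted for brevity), and the error ratio is taken to be the natural log of estimate over generated value. The signal-to-noise scaling follows the stated definition, maximum signal amplitude divided by maximum noise amplitude.

```python
import math
import random

def draw_modes(n_modes, rng):
    """Draw (frequency Hz, damping, amplitude) triples as in the evaluation setup."""
    modes = []
    while len(modes) < n_modes:
        f = rng.gauss(7000.0, 4000.0)
        if f <= 0.0:  # assumption: non-positive frequency draws are rejected
            continue
        modes.append((f, rng.uniform(10.0, 30.0), rng.uniform(0.0, 1.0)))
    return modes

def add_noise(signal, snr, rng):
    """Add white Gaussian noise so that max|signal| / max|noise| equals snr."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    gain = max(abs(s) for s in signal) / (snr * max(abs(n) for n in noise))
    return [s + gain * n for s, n in zip(signal, noise)]

def log_error_ratios(estimated, generated):
    """Match each estimated mode to the nearest generated frequency, then
    return (E_f, E_d, E_a) as natural-log ratios of estimate to generated value."""
    ratios = []
    for f_est, d_est, a_est in estimated:
        f_gen, d_gen, a_gen = min(generated, key=lambda m: abs(f_est - m[0]))
        ratios.append((math.log(f_est / f_gen),
                       math.log(d_est / d_gen),
                       math.log(a_est / a_gen)))
    return ratios

# Small hand-made example: two generated modes and two (imperfect) estimates.
generated = [(1000.0, 20.0, 0.5), (3000.0, 12.0, 0.25)]
estimated = [(3030.0, 12.0, 0.5), (1000.0, 10.0, 0.5)]
errors = log_error_ratios(estimated, generated)

rng = random.Random(0)
clean = [math.sin(2 * math.pi * 1000.0 * n / 44100.0) for n in range(441)]
noisy = add_noise(clean, snr=10.0, rng=rng)
modes = draw_modes(50, rng)
```

With this convention a perfect estimate gives an error of zero, and over- and under-estimates by the same factor give errors of equal magnitude and opposite sign, so averaging signed errors over trials can hide spread; the standard deviation is worth tracking alongside the mean.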
As illustrated by Figure 2.3, the parameter estimation algorithm is extremely sensitive to noise. For estimates of initial amplitude and frequency, the error increases consistently with increasing levels of noise (Figure 2.3 (a,b)). The trend is not as consistent for the damping parameter estimate (Figure 2.3 (c)). However, as the dashed line in Figure 2.3 (c) indicates, the variance of estimation error increases dramatically with increasing noise. Most importantly, it should be noted that frequency estimates have an error ratio of almost 30 at even modest levels of noise (SNR = 100).

Footnote 2: The signal-to-noise ratio is calculated as the maximum signal amplitude divided by the maximum noise amplitude. This yields an appropriate measure of signal-to-noise for our experiments since the signal decays to zero amplitude.

[Figure 2.3 panels: (a) Mean initial amplitude errors ($E_a$); (b) Mean frequency errors ($E_f$); (c) Mean damping errors ($E_d$). Abscissa of each panel: noise/signal, from 0 to 0.2.]

Figure 2.3: Mean errors of the parameter estimation algorithm on synthetic data. The mean error over 100 trials is plotted for eight signal-to-noise ratios: ∞, 100, 50, 30, 20, 15, 10 and 5. For convenience, the inverse ratio (noise/signal) is used as the abscissa. Sub-figure (c) also plots the standard deviation of error in the damping parameter estimate. See Equation 2.7 for definitions of $E_f$, $E_a$ and $E_d$.

Chapter 3

A System for Automatic Measurement of Contact Sounds

3.1 Objectives

The goal of this work is to produce a system which automatically acquires contact sound measurements over the surface of an arbitrary object for the purpose of creating a sound model. This system is one component of a larger project focused on creating complete reality-based models.
A reality-based model is one whose parameters are calculated from empirical measurements of real objects. A complete reality-based model is inherently multi-modal, with sound comprising only one part. Other modes could include surface texture, shape and deformation.

As part of the reality-based modelling project, our sound measurement system shares its development platform: the Active Measurement (ACME) facility at the University of British Columbia (UBC). The ACME facility is a fifteen degree-of-freedom (DOF) robot designed to acquire measurements for reality-based model creation [21]. The facility is equipped with a 5-DOF gantry, 6-DOF robot arm on a linear stage and a 3-DOF test stage. These actuators are used to position sensors and make measurements of objects mounted on the test station. Sensors include a 3-CCD colour camera, Triclops [22] trinocular vision system and a 6-DOF force/torque sensor.

Figure 3.1: The sound measurement-to-production pipeline. Measurements at many locations on an object's surface (x, y, z) are acquired to create a model at each location. Force and position input from a simulation are used to synthesise sounds produced by interactions with the object. A mixer algorithm interpolates between sound models at different locations. Environmental effects such as reverb and spatialisation enhance the realism of the synthesised sound. Our scope is restricted to measurement and modelling (shaded box).

The work that is the subject of this thesis includes the design and implementation of a contact sound measurement system for ACME.
As described in Section 3.3, the sound measurement system includes all the hardware and software needed to meet the design requirements listed in Section 3.2.

The scope of the project is restricted to a system for the measurement of contact sounds. Related research, including sound synthesis, interpolation between sound models and modelling of the environment is outside the scope of this thesis. Figure 3.1 places our project in the context of the sound measurement-to-production pipeline.

3.2 System Requirements

To be successful, a sound measurement system must be able to deliver low-inertia, near-impulsive impacts over the surface of an arbitrary object. Models generated by the system must be registered to the surface of the object for integration with other modes of the complete reality-based model. The force profile of each impact must be known so that the parameters of a sound model can be estimated as described in Section 2.4.

The system should acquire models automatically. Enough samples should be collected to adequately represent the variation in sound over the entire outer surface of the object.

Since the system will be a component of ACME, all devices used by the system must be controllable under the ACME framework. That is, an interface to every device must exist in the ACME Java architecture and be operable from a remote workstation.

It is assumed that a surface representation of the test object is provided to the sound measurement system by the ACME trinocular camera. This aspect of the system is not an objective of the current research and is not discussed herein.

More detailed requirements of each component of the sound measurement system are presented in the respective chapters.

3.3 System Overview

A sound measurement system was designed and implemented to meet the requirements listed in Section 3.2.
The system can be viewed as four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. Each of these modules will be discussed in the following four chapters.

Chapter 4 outlines the requirements and implementation of the sound acquisition device. This includes the design of a sound effector for the ACME robot arm, the software to control it and the selection and location of microphones. The sound effector is a solenoid that mounts on the end of the Puma 260 robot arm and delivers near-impulsive impacts to objects under computer control.

A description of the asynchronous data server is provided in Chapter 5. The data server architecture is adaptable to many types of data. Its design and specialisation to sound data are discussed.

One advantage of using a robotic system is the ability to record multiple samples at the same surface location. A collection of samples can be used to generate a prototypical model which best represents the sound at that location. One algorithm for generating prototypical models is explained in Chapter 6.

The final module is an adaptive sampling algorithm for selecting points on the object's surface at which to sample. The sampling algorithm uses the object's surface representation to create a mesh of sample locations. It is adaptive because it uses differences in sound models to adjust the granularity of the sampling mesh. The surface representation and adaptive algorithm are described in Chapter 7.

The general procedure for creating a complete sound model using this system is diagrammed in Figure 3.2. First, a test object is positioned at the center of the test station (Figure 3.3). Typically the object is attached such that it does not move from light contact, yet remains free to vibrate. A surface model of the object is then acquired using the trinocular stereo camera.
From this surface model, a coarse sampling mesh is created using the surface's vertices as sample locations. Next, the sound effector is positioned 5 mm away from each sample location. This is done with a guarded move: the sound effector approaches the sample location until contact is sensed, then is retracted 5 mm. Once in position, the sound effector is actuated multiple times, the resultant sound samples are recorded, and models are computed. From the multiple samples, a prototypical model is produced for each sample location. If two prototypical models at adjoining vertices of the surface mesh differ by too much, the sampling mesh is refined, and a new model is acquired at a position between the two coarse locations. This procedure continues until no further refinements are required.

The complete sound model produced by this procedure is easily registered to other modes of the object model since it creates sound models at vertices of the object's surface model.

Figure 3.2: General procedure for sound model creation. [Flowchart steps: start; move sound effector back 5 mm; actuate sound effector and record sample; create sound model; repeat M times; create prototypical model; compute acoustic distance to neighbouring sample locations.]

Figure 3.3: A test object on the ACME test station.

Results of some typical data collections are presented in Chapter 8. The accompanying CD contains audio tracks and an application comparing recorded sounds to sounds synthesised from a model.

Chapter 4
A Sound Acquisition Device

4.1 Overview

In order to estimate the parameters of an object's sound model, a recording of the object being struck with a known force is required. The process of estimating parameters from the recording is described in Section 2.4. Since our goal is to create sound models automatically, we require a device which is capable of striking arbitrary objects in a manner suitable for the estimation process.
A device is also required to record the resultant sounds. More detailed requirements of these devices are listed in Section 4.3.

This chapter describes a device that was designed to strike objects for the purpose of sound measurement. The device is an end effector for the Puma 260 robot arm which is part of the Active Measurement facility at UBC. The Active Measurement facility is described in Section 4.4. Details of the end effector and sound capturing hardware are presented in Sections 4.5 and 4.6. Section 4.7 describes the software required to control the end effector. The software used to record the sound samples is the topic of Chapter 5.

4.2 Related Work

Although this is the first work to automatically create sound models over the entire surface of an object, other people have investigated sound model creation from recordings of real objects. The mechanisms by which these recordings were produced are varied.

Cook and Trueman recorded directional impulses from a number of acoustic instruments by striking the instruments' strings with a Modal Shop Model 086C80 miniature force hammer [4]. The sound was recorded using a twelve-microphone icosahedral grid assembly. Recordings were stored using two Tascam DA-88 digital audio tape recorders. It was not mentioned how precisely the impact locations were registered in space, since registration was not important to the study. Impacts were performed manually, and the force profile of each impact was used to filter "bad" impacts. The commercial force hammer used by Cook and Trueman is not suited to our task because it cannot effect an impact under its own power. While it could be mounted at the end of a robot arm, the swinging of a hammer is difficult to control.

Durst and Krotkov created impulse response models of different materials by dropping an aluminum cane with a plastic tip onto each material [8].
The cane was dropped from a constant height through a cylindrical guide [17]. Recordings were made using an omni-directional condenser microphone connected to a PC-based A/D board. This method is clearly not suitable for our task.

van den Doel created sound models from impacts with a non-instrumented hammer [30]. Again, the hammer mechanism is not suited to our application; this experiment is mentioned because satisfactory results were obtained with only a crude approximation of the impact force.

Huang uses a tapping device to position and orient objects in a plane [13]. His requirements of the device are similar to ours, except he does not require the device to operate off the horizontal plane. The mechanism he proposes is similar to a pinball plunger; a spring-loaded rod is released by an electric latch, then reloaded automatically [W. Huang, personal communication, April 28, 2000]. Unfortunately, his device may not operate correctly with any vertical inclination.

4.3 Requirements

The striking device must deliver low-inertia impacts to objects at any position on their surface. This requires operation through a ±90° vertical range. The force of impact should not be so strong as to move the object, yet strong enough to produce a recordable sound. The quantity of force is dependent on the material being measured.

The force must be near-impulsive. A true impulse is not realisable, but may be approximated by a force that is well localised in space and time. As demonstrated by van den Doel, such approximate impulses yield satisfactory sound models [30].

To permit future experiments, the device should be usable for scraping objects. This will permit acquisition of sounds for granular synthesis, another model for sound synthesis.

For integration with other models produced using the Active Measurement facility, the locations of impact must be registered to a common frame-of-reference.

4.4 The Active Measurement Facility

As mentioned in the first chapter, our system is a component of a larger project at UBC: the Active Measurement facility (ACME). The ACME facility provides a rich environment for our measurement system. Currently, ACME consists of three main subsystems: the field measurement system (FMS), test station and contact measurement system (CMS) [21]. These subsystems are used as a base platform for the hardware required by the sound measurement system. The roles of each subsystem are expanded below.

The field measurement system is a five degree-of-freedom (DOF) robot consisting of a 3-DOF gantry and a pan/tilt unit. Currently a 3-CCD colour camera and Triclops trinocular stereo camera are attached to the pan/tilt unit. The FMS is used to acquire measurements at a distance from the test object (i.e., in "the field"). A microphone is added to the FMS for making sound measurements.

The test station is a 3-DOF robot which can position (x, y) and orient objects being measured. Its accuracy is ±0.00025" and ±10 arc-min [21].

The contact measurement system consists of a 6-DOF Puma 260 robot arm with an ATI force/torque sensor mounted at its tip. The CMS can move an end effector into contact with the test object in a working volume around the test station. To prepare the CMS for sound measurement, the three fans of the Puma control box were replaced by whisper fans. The Puma arm cannot be used to directly strike the objects because its inertia prevents light, impulsive impacts. Instead, a special end effector is attached. The end effector is described in Section 4.5.

Each of these subsystems is controlled remotely using Java-based control software. For more details on this architecture, refer to [21].
Using the ACME facility as a development platform provides registration of sound samples to other models (e.g., deformation models) since all ACME sensors and actuators share a common frame-of-reference. The sound model is specifically registered to the surface model of the test object.

4.5 Sound Effector

We refer to the end effector designed to strike objects for sound measurement as the sound effector (see Figure 4.1). This device is centered on an electric push-solenoid (Ledex STA model 195025-227). The solenoid is augmented with a return-spring and mounting bracket. A retaining mechanism is designed into the mounting bracket to provide some rigidity in the de-energised state of the solenoid. This rigidity enables the sound effector to be used for scraping objects. The mounting bracket is constructed from aluminum plate.

Figure 4.1: The sound effector is a spring-return push solenoid mounted on an aluminum bracket. A condenser microphone is attached to the bottom of the bracket.

The current design is minimal so as to reduce the weight of the effector. For example, given that a threaded rod is available on the robot's interface plate, the design incorporates the rod for both attachment purposes and as the aforementioned retention device. The total torque applied to the rod by the effector is approximately 0.0653 N-m (total weight: 106 g), within acceptable limits of the force/torque sensor (500 g). A blueprint of the mounting bracket and details of its construction are included in Appendix A.

An interface circuit is required to activate the solenoid from software. The circuit schematic is included in Appendix A. The interface circuit is connected to a digital output of a Precision MicroDynamics MC8 board. The board provides a 5 V DC output controllable from its on-board SHARC DSP or the host computer. Currently the MC8 board is also used to run a PID control loop for the FMS and test station.
A description of the software used to control the sound effector is presented in Section 4.7.

4.6 Sound Capture Hardware

The sound capture hardware consists of two condenser microphones and a PC sound card. The microphones are Optimus omni-directional lapel microphones with a flat frequency response from 70 to 16 000 Hz [14]. One microphone is attached to the bottom of the sound effector's mounting bracket (Figure 4.1); the other to the pan/tilt unit of the FMS. This placement enables near and far-field recording of impact sounds. The microphone on the FMS can be moved to any location around the object for experiments evaluating directional impulse responses.

A Creative Sound Blaster Live! card is used to record the sounds digitally. This card is commercially available and can sample at up to 48 kHz [5]. Using a PC sound card facilitates easy and affordable upgrades as technology improves. One disadvantage of the card is that only two channels of sound may be recorded simultaneously. Furthermore, both channels must use the line-in connection, since the microphone connection is single channelled. This restricts our system to using two microphones, both of which must be pre-amplified to line levels. If more input channels are required, a high-end sound card could be purchased.

4.7 Sound Effector Software

Activation of the sound effector requires a unit step signal from the digital output of the MC8 board. The step width determines the stroke distance of the solenoid. Initially, the step function was to be generated in the interface electronics. A software solution, however, enables us to change the width of the unit step, and hence stroke length, at runtime. To control this output using ACME requires a Java interface compliant with the ACME Device interface.
This section describes the design of the ACME DigitalOutputConnectionServer (DOCS), an interface between ACME and the digital outputs of the MC8 board. This interface is used by the sound effector, but can also be used by other devices requiring similar control (e.g., lights).

Figure 4.2: Digital Output Connection server architecture.

The DigitalOutputConnectionServer has three main components (Figure 4.2): a ConnectionServer, OutputServer and one or more DigitalOutputDevices (e.g., sound effector, or spotlight). The MC8 board has 32 digital output lines available for external devices [23]. The ConnectionServer manages the allocation of each output line to a specific DigitalOutputDevice. The OutputServer is a C program (with a Java native interface) which controls the initialisation, timing and output of the signal to the MC8 board. The DigitalOutputDevice is an abstract implementation of the ACME Device interface. Each device using the DigitalOutputConnectionServer is controlled by a class extending DigitalOutputDevice. Each device must lock one channel of the MC8 for its exclusive use by registering with the ConnectionServer at initialisation.

The OutputServer and ConnectionServer run as separate processes from the ACME server. Although the OutputServer and ConnectionServer reside on the Solaris computer hosting the MC8 board, devices can be controlled from ACME experiments running on any computer because the DigitalOutputDevices communicate to the ConnectionServer using the Java Remote Method Invocation (RMI) interface.
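The exclusive allocation of output lines described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the ACME implementation, which is in Java and C; the method names `register`, `owner` and `pulse`, the line numbers, the 5 ms pulse width and the list standing in for the OutputServer are all assumptions made for illustration.

```python
class ConnectionServer:
    """Tracks which device holds each digital output line (hypothetical sketch)."""

    def __init__(self, num_lines=32):
        # The text states that the MC8 board offers 32 digital output lines.
        self._num_lines = num_lines
        self._owners = {}

    def register(self, device_name, line):
        """Lock one output line for the exclusive use of one device."""
        if not 0 <= line < self._num_lines:
            raise ValueError(f"no such output line: {line}")
        if line in self._owners:
            raise RuntimeError(f"line {line} is held by {self._owners[line]}")
        self._owners[line] = device_name

    def owner(self, line):
        """Return the name of the device holding a line, or None if it is free."""
        return self._owners.get(line)


class DigitalOutputDevice:
    """A device that registers for one line at initialisation."""

    def __init__(self, name, line, connection_server, output_log):
        connection_server.register(name, line)
        self.name = name
        self.line = line
        # A plain list stands in for the OutputServer, which would time the step.
        self._output_log = output_log

    def pulse(self, width_ms):
        """Request a unit step of the given width on this device's line."""
        self._output_log.append((self.line, width_ms))


connection_server = ConnectionServer()
output_log = []
effector = DigitalOutputDevice("sound effector", 0, connection_server, output_log)
spotlight = DigitalOutputDevice("spotlight", 1, connection_server, output_log)
effector.pulse(width_ms=5)  # arbitrary width, chosen only for the example
```

In this sketch a second device asking for line 0 would be refused with an error, which mirrors the requirement that each device lock its channel for exclusive use.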
As mentioned above, timing of the unit step is controlled by the OutputServer. This C program provides resolution better than a millisecond, limited only by the Solaris operating system. Isolating the timing from the ACME experiment ensures consistent timing of the output signal.

Chapter 5
An Asynchronous Data Server

5.1 Overview

The next module of the system is the software used to record sounds produced by striking the test object. Previously, no generic architecture existed within ACME for capturing streaming data. As with the DigitalOutputConnectionServer (see Section 4.7), a generic data server was designed, and then specialised to sound data for this research. The generic data server framework became the Sensor class of the ACME project.

This chapter describes the design and implementation of an asynchronous data server for ACME, including its specialisation to sound data. The next section lists the requirements of such software. Section 5.3 describes the architecture of the data server. Implementation details are discussed in Section 5.4.

5.2 Requirements

This section outlines the requirements of a generic data server (Section 5.2.1) and its specialisation to sound data (Section 5.2.2).

5.2.1 Generic Data Server Requirements

One data server must exist for each sensor device. If multiple sensor devices of the same data type exist, each sensor must have its own data server.

Each data server will run as a separate process. This division enables distribution of the server processes over multiple computers. If the sensor hardware resides on a computer that is not the main ACME host, the sensor process should also reside on that computer. Since data collection can be an intensive operation, distribution increases the data servers' ability for real-time collection by allocating more processing resources.
Distribution also reduces the amount of data which must be streamed over the network in real-time. If platform-dependent software is required for a data server, it must not prevent the ACME experiment from accessing the data from a different operating system.

The data server must be asynchronous. That is, the data server process controls the starting and stopping of data collection independently of the main ACME server. This autonomy eliminates the need for the data server to stream data to the ACME server in real-time. Since sensor data can be large (e.g., image data from cameras) or frequent (e.g., 44.1 kHz for sound), it is impractical to transmit each frame of data to the ACME server for real-time monitoring.

The criteria for starting and stopping data collection must be definable by the ACME experiment. A method must be provided to the ACME experiment that indicates when data is being collected and when it has finished.

5.2.2 Sound Server Requirements

The sound server must capture data at a minimum of two sampling rates: 44 100 Hz and 22 050 Hz. If the capturing hardware supports higher sampling rates, the sound server should accommodate them by user-definable properties. The format of the data may be 8 or 16-bit, and one or more channels (to the limitations of the capturing hardware). Since PC sound cards are readily available and of sufficient quality for our purposes, the sound server will capture data from a commercial sound card.

5.3 Server Architecture

The data server architecture has four main components: sensor hardware, a SensorServer, SensorDevice, and a data stream connecting the SensorServer and SensorDevice (Figure 5.1).

The sensor hardware component is an abstraction of both the physical hardware and device drivers. Typically, a native interface is created to allow the device to communicate with other Java components.
Also, although incoming data is typically buffered at the hardware level, the hardware cannot be guaranteed not to drop frames of data if it is read too slowly.

The SensorServer is a process which resides on the computer hosting the sensor hardware. This process queries the sensor hardware at a prescribed rate, starts and stops data capture, and buffers the incoming data.

The SensorDevice is a remote interface to the SensorServer. It communicates with the SensorServer via the Java Remote Method Invocation (RMI) protocol. The SensorDevice resides in the ACME server and is the ACME experiment's link to the SensorServer.

A data stream connects the SensorServer and SensorDevice. The stream uses socket communication to pass data from the SensorServer to the SensorDevice. Data is written to the stream by the SensorServer and buffered until it is read by the SensorDevice. Thus, data can be reliably streamed to the ACME experiment at a rate slower than its acquisition at the hardware device.

This architecture enables us to acquire data from any computer, regardless of its platform. Only the SensorServer uses platform-dependent native code; the SensorDevice can reside on any remote computer that hosts a Java Virtual Machine. Since each SensorServer is written for a specific device operating on a fixed platform, this dependence is not restrictive.

Figure 5.1: The architecture of the data server is divided into four main components: sensor hardware, SensorServer, SensorDevice and a data stream. The SensorServer runs as a separate process and is not necessarily run on the same computer as the ACME server.

Not all data is passed from the sensor hardware to the ACME experiment.
Users define the criteria for starting and stopping data capture by creating custom SensorTriggers. One SensorTrigger is created to start capturing, and one to stop capturing. These SensorTriggers can be monitored by other processes at runtime by registering a TriggerListener object. The TriggerListener object will be notified when a SensorTrigger is activated. This mechanism provides a rough synchronisation tool to the ACME experiment. The next section clarifies the role of the SensorTriggers in server execution.

5.3.1 Server Execution

This section describes the operation of the data server. Its operation can best be viewed as the state diagram in Figure 5.2. The server begins in a LIMBO state. This is a default state indicating that it has been created, but not yet initialised. The ACME server initialises the data server at startup. The data server then initialises sensor hardware and creates required data buffers. After initialisation, the data server waits in a STOPPED state.

Figure 5.2: Data server state diagram.

The RUNNING state is entered by a request from the ACME experiment to start spooling data. In the RUNNING state, the data server executes a control loop at a rate prescribed by the ACME experiment. The control loop is listed in Figure 5.3. At each iteration, the next frame of data is requested from the sensor hardware. Here, a frame is defined as a data measurement for one time period (e.g., one image from a camera, or one 6-value reading from a force/torque sensor). The current frame is passed with a window of previous data to the START trigger. If the conditions for the START trigger are satisfied by the current frame, the data server is placed into the CAPTURING state. Otherwise, the RUNNING control loop repeats.

Once the START trigger has been activated, the data server is in the CAPTURING state.
The CAPTURING state runs a control loop similar to that for the RUNNING state, with the exception that data is passed onto the data stream (Figure 5.4). Data is read one frame at a time from the sensor hardware, then analysed by the STOP trigger. If the conditions of the STOP trigger are met, the data server enters the STOPPED state. Otherwise, the data is added to the stream and the control loop repeats. Section 5.3.2 elaborates on the flow of data throughout this process.

1. Read next frame from sensor hardware.
2. Evaluate frame in START trigger.
3. If START trigger fires, change state to CAPTURING.
4. Otherwise, loop to step 1.

Figure 5.3: Control loop for RUNNING state.

1. Read next frame from sensor hardware.
2. Evaluate frame in STOP trigger.
3. If STOP trigger fires, change state to STOPPED.
4. Otherwise, send frame to data stream.
5. Loop to step 1.

Figure 5.4: Control loop for CAPTURING state.

5.3.2 Data Flow

The preceding discussion of server control loops presented a simplified view of the flow of data through the system. More details are included in this section. Figure 5.5 illustrates the flow of data through the server. Both the RUNNING and CAPTURING control loops use the same data flow structure to the point indicated in Figure 5.5.

Figure 5.5: Server data flow.

At each iteration of the control loops, one frame of data is read from the sensor hardware. As mentioned in Section 5.3, the sensor hardware typically provides a data buffer; this occurs at either the hardware or device driver level. This buffer is assumed to overwrite old data as the buffer fills. Thus, if data is not read quickly enough, frames may be lost. It is assumed that each frame read has not been read previously.
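The states of Section 5.3.1 and the two control loops of Figures 5.3 and 5.4 can be sketched as a small state machine. This is a simplified illustration in Python rather than the Java implementation: the class and method names are assumptions, the triggers are plain functions that see only the current frame (no window of previous data), and a list stands in for the data stream.

```python
from enum import Enum


class State(Enum):
    LIMBO = "limbo"          # created, but not yet initialised
    STOPPED = "stopped"      # initialised, waiting for a request to spool
    RUNNING = "running"      # reading frames, waiting for the START trigger
    CAPTURING = "capturing"  # passing frames to the stream until STOP fires


class DataServer:
    """Hypothetical sketch of the data server state machine."""

    def __init__(self, read_frame, start_trigger, stop_trigger):
        self.state = State.LIMBO
        self._read_frame = read_frame
        self._start_trigger = start_trigger
        self._stop_trigger = stop_trigger
        self.stream = []  # stands in for the data stream

    def initialise(self):
        self.state = State.STOPPED

    def start_spooling(self):
        if self.state is not State.STOPPED:
            raise RuntimeError("server must be STOPPED before spooling")
        self.state = State.RUNNING

    def step(self):
        """Run one iteration of the control loop for the current state."""
        if self.state is State.RUNNING:
            frame = self._read_frame()
            if self._start_trigger(frame):
                self.state = State.CAPTURING
        elif self.state is State.CAPTURING:
            frame = self._read_frame()
            if self._stop_trigger(frame):
                self.state = State.STOPPED
            else:
                self.stream.append(frame)


frames = iter([0, 0, 9, 4, 3, 2, 1, 0])
data_server = DataServer(
    read_frame=lambda: next(frames),
    start_trigger=lambda frame: frame > 5,   # start on a loud frame
    stop_trigger=lambda frame: frame == 0,   # stop when the signal returns to zero
)
data_server.initialise()
data_server.start_spooling()
while data_server.state is not State.STOPPED:
    data_server.step()
```

Following the loops as listed, the frame that fires a trigger is not itself written to the stream, so in this example only the frames between the two trigger events are captured.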
This frame of data is copied into a ring buffer that resides in the SensorServer. The SensorServer guarantees that the data in the ring buffer will not be overwritten until it has been examined. The ring buffer is passed to the trigger object (either START or STOP depending on the system state). The trigger has a defined window size for looking at the data. This window size must be smaller than the size of the ring buffer and greater than or equal to one frame. By defining a window size greater than one frame, the trigger can perform time-domain filtering of the signal, or use a time-dependent trigger condition. An example of such a trigger is one that is activated by a signal that is greater than the average of the past five frames.

If the server is in the RUNNING state, the data flow ends here. In the CAPTURING state, the new frame of data is appended to a transfer buffer. Once the transfer buffer is full, it is written to the data stream. The size of the transfer buffer is controlled by the user. For data which is sampled at a high rate, it may be desirable to buffer several hundred frames before writing to the stream. This reduces the amount of communication overhead needed to transmit each frame over the network.

The ACME experiment can monitor the data capture by querying the data stream. Once the data capture is complete, the data stream's end-of-file flag is raised.

Figure 5.6: An AudioPacket contains multiple frames of audio data. Each audio frame contains one or more channels of data. Each channel contains one audio sample. Here, two channels (L, R) of 16-bit (2-byte) samples are illustrated.

5.3.3 Specialisation to Sound Data

The ACME sound sensor server is a specialisation of the generic data server model. The specialisation is straightforward with one exception that is explained below.
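Before turning to that exception, the data flow of Section 5.3.2 can be made concrete with a short sketch. It is written in Python for illustration only and makes several assumptions: the class names are hypothetical, a bounded `collections.deque` stands in for the ring buffer (it silently discards the oldest frame, which is harmless here because every frame is examined as soon as it arrives), and a list of lists stands in for the data stream. The trigger is the example given above, one that fires when the signal exceeds the average of the past five frames.

```python
from collections import deque


class AboveRecentAverageTrigger:
    """Fires when the newest frame exceeds the mean of the frames before it."""

    def __init__(self, history=5):
        # The window holds the newest frame plus the frames it is compared with.
        self.window = history + 1

    def fires(self, ring):
        recent = list(ring)[-self.window:]
        if len(recent) < self.window:
            return False  # not enough history has been collected yet
        *past, newest = recent
        return newest > sum(past) / len(past)


class TransferBuffer:
    """Collects frames and writes them to the stream in blocks."""

    def __init__(self, capacity, stream):
        self._capacity = capacity
        self._frames = []
        self._stream = stream

    def append(self, frame):
        self._frames.append(frame)
        if len(self._frames) == self._capacity:
            self.flush()

    def flush(self):
        """Write any buffered frames to the stream as one block."""
        if self._frames:
            self._stream.append(list(self._frames))
            self._frames.clear()


# The trigger window (6 frames) is smaller than the ring buffer (8 frames).
ring = deque(maxlen=8)
trigger = AboveRecentAverageTrigger(history=5)
fired_at = None
for index, frame in enumerate([1, 1, 1, 1, 1, 1, 7]):
    ring.append(frame)
    if fired_at is None and trigger.fires(ring):
        fired_at = index

stream = []
transfer = TransferBuffer(capacity=3, stream=stream)
for frame in range(1, 8):
    transfer.append(frame)
transfer.flush()  # write the final, partially filled block
```

Writing frames in blocks rather than singly is what reduces the per-frame communication overhead; the block size of three is arbitrary and far smaller than the several hundred frames suggested for high-rate data.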
Sound data is typically captured from the sound card at a frame rate of 44 100 Hz. One frame of sound data contains one or two channels of 16 or 8-bit sound samples (Figure 5.6). As explained in Section 5.3.2, a lot of processing and data movement occurs with the receipt of each frame of data. Although data is passed by reference where possible, there are inevitably data copies and memory allocations when new data is passed onto the data stream. There is also some computational overhead each time the trigger analyses the data. The generic server model cannot process incoming data at audio frame rates; because data is not read quickly enough from the hardware, frames are dropped. To achieve desired frame rates, audio data must be processed in groups of more than one audio frame.

A simple redefinition of "frame" suffices to achieve real-time performance with the generic data server model. Here we define an AudioPacket to be an array of one or more audio frames (Figure 5.6). The AudioPacket becomes the conceptual "frame" that is read from the sound sensor hardware by the sound server. Performance of the sound server using AudioPackets is discussed in Section 5.4.1.

5.4 Implementation Details

The generic data server model is an abstract Java class, with interfaces defined for Sensor and Trigger objects. The sound server is also written in Java with the exception of the code that reads data from the sound card.

Originally, data was to be read from the sound card using the JavaSound API from Sun Microsystems. At the time of development, JavaSound was a Beta release (version 0.86). Not all required functionality was available, and frequent changes to the API rendered this option undesirable. Currently, data is read from the sound card using the Microsoft DirectX API (version 6.0). A Java wrapper was written to interface the DirectX C code with the rest of the sound server.
Every effort was taken to create a Java wrapper that parallelled the JavaSound API so that it could be substituted at a later date.

Two triggers are used to capture sound for modelling: a ThresholdTrigger and FixedDurationTrigger. The ThresholdTrigger is used as a START trigger to begin capturing data after the amplitude of the signal exceeds a threshold. The FixedDurationTrigger is used as a STOP trigger to make each recording the same length. This combination of triggers yields recordings of the impulse responses that are identical in length and contain approximately the same length of silence before each impact.

5.4.1 Performance Evaluation

The sound server was subjected to a performance evaluation to guarantee the integrity of the recordings, and estimate bounds on the size of the AudioPackets and transfer buffer.

A one-second 1 kHz sinusoidal tone was used as the test stimulus. A cable connected the line-level output of one computer^1 to the line-level input of a second computer^2. The sound server ran on the second computer and transferred data to a client running on the first computer over an Ethernet connection. A FixedDuration trigger was used as a STOP trigger to record 2.5 seconds of sound for each trial. In each trial, the size of either the transfer buffer or AudioPackets was varied. The test was run five times for each parameter setting with the one-second tone played at a random start time within the 2.5 second recording window. The recorded sound was saved as a PCM wave file and examined visually and audibly for evidence of degradation. Recordings were captured at a 44.1 kHz sampling rate.

Two criteria are used to subjectively measure performance: the length of the recorded tone, and the presence or absence of "pops". The length of the recorded tone is computed manually using a graphical sound editor.
If the recorded tone is less than one second, the parameter settings are designated unsatisfactory. Smaller gaps in the recording are experienced as "pops" when heard, and observed as spikes in the spectrogram (Figure 5.7). Parameter settings producing "pops" are also unsatisfactory.

Results of this evaluation are tabulated in Table 5.1. From this table, it is evident that the audio signal will lose large chunks of data if the AudioPacket size is not great enough. Similarly, "pops" will be present in the recorded data unless the transfer buffer is large enough. From this data, it is concluded that a transfer buffer of two seconds and an AudioPacket size of 100 frames are required for adequate performance.

^1 A dual PII 350 MHz running Windows NT with a Creative SoundBlaster Live! sound card.
^2 A Pentium 120 MHz running Windows 98 with a Creative SoundBlaster Live! sound card.

Figure 5.7: Spectrogram of sound containing "pops". Three "pops" are visible in the spectrogram as vertical lines at approximately 1.4, 1.5 and 1.65 seconds. These "pops" are caused by dropped audio frames. This sample was recorded using a transfer buffer of one second, and an AudioPacket size of 200. The recording parameters were two-channel, 16-bit sound sampled at 44.1 kHz.

Channels  Bits  AudioPacket size  Transfer buffer size (secs)  Too short  "Pops"
1         16    1                 1                            Y          N
1         16    10                1                            Y          N
1         16    100                                            N          Y
1         16    200               1                            N          N
2         16    1                 1                            Y          N
2         16    10                1                            Y          N
2         16    100               1                            Y          N
2         16    200               1                            N          Y
2         16    1000              1                            N          Y
1         16    1                 2                            Y          N
2         16    10                2                            Y          N
2         16    100               2                            N          N
2         16    200               2                            N          N

Table 5.1: Results of the sound server performance evaluation. Highlighted rows indicate parameter settings that are successful (those with N in both result columns). Recordings were captured at a sampling rate of 44.1 kHz on a Pentium 120 MHz running Windows 98.

5.4.2 Limitations

While performance of the sound server is acceptable, one limitation remains. Using the DirectX v6.0 API prevents us from running the sound sensor server on operating systems other than Microsoft Windows 95/98/2000. As mentioned in Section 5.3, this does not prevent us from streaming data to clients on other computing platforms, but restricts us to Windows-compatible sound cards. Currently, Windows is the most widely-supported operating system for sound cards, so this is not a major limitation. Also, the DirectX Java interface maintains the JavaSound design and class structure to enable easy substitution once the JavaSound API is complete.

Chapter 6
Building a Prototypical Model

6.1 Overview

The hardware and software described in the previous two chapters enable us to acquire multiple samples at any location on the surface of an object. These samples are each represented by a sound model. The sound model may be any one of the impulse response models as discussed in Chapter 2. The parameters of the model are computed for each sample using the appropriate estimation technique. In our current implementation, the model of van den Doel [30] represents each sample; the parameters of the model are computed by the technique presented in [30]. Refer to Section 2.4 for a brief description of this technique.

The advantage of acquiring multiple samples at the same location is that instrumentation and background noise can be minimised by averaging the samples. A prototypical model, representative of multiple models, can be created at each sample location. The prototypical model should be less affected by noise than each individual model. Such a prototypical model is used for comparing the acoustic distance between two sample locations; this use is discussed in Chapter 7. A prototypical model is also used for synthesis when the object is simulated.

This chapter outlines one approach to generating a prototypical model from the available data. The approach is an intuitive one and produces satisfying results.
While there are other solutions to this problem which may produce better results, our intention is to provide one example by which to demonstrate the utility of such models.

6.2 Related Work

Although our exact problem appears to be unique, two other fields of audio research have produced related work: speech and speaker recognition, and audio morphing. Neither of these fields shares our exact goal, but each is similar in some respect.

Speech and speaker recognition researchers have investigated methods for clustering sets of audio data. Their problem is one of classic pattern recognition: given a set of N categories (e.g., different speakers or words), and M training exemplars for each category, classify a new data sample into one of the categories. At a high level, this is most often accomplished by creating a prototype for each category from the training set, then computing a distance from the new data to each of the prototypes. These prototypes are typically formed using a modified K-means algorithm on the vectors resulting from Linear Predictive Coding (LPC) analysis [24]. Similar methods were implemented on our sound models, but the variability of our models (due to noise) caused unsatisfactory results.

Audio morphing is a process that smoothly blends one sound into another. If you imagine a continuum between two sounds, an audio morphing algorithm can generate a sound at any point on the continuum. This sound contains a proportion of each sound, yet is perceived as a single new sound. Slaney et al. describe one such method in [27]. They claim that cross-fading conventional spectrograms is not convincing if the two source sounds are not similar in pitch. Their approach is to represent sounds using a smooth spectrogram (derived from the mel-frequency cepstral coefficients [24]) and a residual spectrogram which encodes the pitch.
Interpolation occurs in this higher-order spectral representation, which is then inverted to produce the resultant sound. If their morphing algorithm could be extended to morph between more than two sounds, this approach could be used to generate our prototypical sound models by estimating model parameters from the composite sound produced by the morph. Indeed, a morphing algorithm is a powerful tool in its ability to weight the contribution of each sound to the morphed result.

6.3 Spectrogram Averaging

The approach we use is similar in spirit to the morphing algorithm of [27], but much less sophisticated. We are able to use a simpler algorithm because we are dealing with a narrow class of sounds: single impact sounds which decay exponentially and are similar in pitch. Our approach is to compute the "average" spectrogram of M samples at one sample location, then estimate the model parameters from this "average" spectrogram. The approach is intuitively satisfying and produces reasonable sound models.

Computing the average time signal of sound samples is non-trivial. Because the modes of the impulse response may not be in phase across samples, exact alignment in time is difficult. Phase differences introduced by inexact alignment will create a signal which sounds like many separate sounds played together.

In contrast, computing the average spectrogram is easier since the spectrogram contains no phase information. Furthermore, the average spectrogram is a natural representation for our task, given that the model parameters are estimated from spectrograms. Since the samples are recorded at the same sample location, we assume the samples have the same pitch, thereby avoiding the problems indicated by Slaney et al. [27].

Aligning the M spectrograms is straightforward. Since each spectrogram represents an impact sound, they can be aligned by matching the onset of impact.
This onset is represented in the spectrogram as the time frame with the maximum total amplitude. Once aligned, the average spectrogram is computed as the mean amplitude of each frequency in each time frame.

Since the energy of each impact varies, the energy of each signal is normalised prior to computing its spectrogram (Equation 6.1). Energy normalisation ensures that $\sum_t y(t)^2 = 1$.

Mathematically, our approach is also satisfying. If we assume we are recording a signal $g_i(t)$ which is composed of the true signal $y(t)$ and a random additive noise process $n_i(t)$, the development in Equation 6.2 shows that the average spectrum $\bar{G}(\omega)$ approaches the true spectrum $Y(\omega)$ if we also assume a zero-mean noise process¹.

$$g_i(t) = y(t) + n_i(t)$$
$$G_i(\omega) = Y(\omega) + N_i(\omega)$$
$$\bar{G}(\omega) = \frac{1}{M}\sum_{i=1}^{M} G_i(\omega) = Y(\omega) + \frac{1}{M}\sum_{i=1}^{M} N_i(\omega) \approx Y(\omega) \tag{6.2}$$

¹ This characterisation of noise is a common assumption, though unlikely to be correct in our situation.

6.4 Performance Evaluation

To evaluate the effectiveness of spectrogram averaging, a test was conducted using synthetic data. This test is similar to the evaluation of the parameter estimation algorithm (Section 2.4.1).

For this evaluation, M samples were created for each trial by adding noise to sounds synthesised using fifty random modes. A description of the noise and synthesised sounds is found in Section 2.4.1. Eight signal-to-noise ratios were used: ∞ (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5. Averaging over the M samples produced a spectrogram from which model parameters were estimated. The experiment was conducted using five values of M: 1 (control), 2, 5, 10 and 20. One hundred trials were conducted for each pairing of SNR and M values.

The resulting sound models are evaluated by the same measures as Section 2.4.1; the metrics for frequency ($E_f$), damping ($E_d$) and initial amplitude ($E_a$) are repeated in Equation 6.3.

$$E_f = \ldots, \qquad E_d = \log \ldots, \qquad E_a = \log \ldots \tag{6.3}$$

where $j$ is chosen to minimise $|f_i - f_j| \;\; \forall j = 1 \ldots 50$.
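The averaging procedure of Section 6.3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: spectrograms are taken to be plain lists of time frames (each a list of magnitudes per frequency bin), the function names are hypothetical, and only the frames present in every aligned spectrogram are kept.

```python
def normalise_energy(signal):
    """Scale a time signal so that the sum of its squared samples is 1."""
    energy = sum(s * s for s in signal)
    if energy == 0.0:
        return list(signal)
    scale = energy ** -0.5
    return [s * scale for s in signal]


def onset_frame(spectrogram):
    """Index of the time frame with the maximum total amplitude."""
    totals = [sum(frame) for frame in spectrogram]
    return totals.index(max(totals))


def average_spectrograms(spectrograms):
    """Align magnitude spectrograms on their onset frame and average them.

    Assumption of this sketch: frames that are not available in every
    aligned spectrogram are discarded rather than padded.
    """
    onsets = [onset_frame(s) for s in spectrograms]
    before = min(onsets)                      # frames kept before the onset
    after = min(len(s) - o for s, o in zip(spectrograms, onsets))
    m = len(spectrograms)
    n_bins = len(spectrograms[0][0])
    average = []
    for k in range(-before, after):
        frame = [
            sum(s[o + k][b] for s, o in zip(spectrograms, onsets)) / m
            for b in range(n_bins)
        ]
        average.append(frame)
    return average
```

In use, each recording would first be passed through `normalise_energy`, converted to a magnitude spectrogram by whatever short-time transform is available, and the resulting list handed to `average_spectrograms` before parameter estimation.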
The mean error for each signal-to-noise ratio (SNR) is plotted by the dashed lines in Figure 6.1. The solid lines are the mean error over all eight SNRs. The colour of each line indicates the number of samples (M) used in the spectrogram average. As Figure 6.1 shows, spectrogram averaging substantially reduces the mean error for initial amplitude and frequency estimates. Even averaging two samples yields improvements of 20% and 12% on $E_a$ and $E_f$ respectively.

The mean error of the damping parameter is not significantly reduced by spectrogram averaging. The standard deviation of the error is, however, dramatically reduced with increasing values of M. As Figure 6.2 illustrates, the standard deviation of error is reduced by an average of 61% using only five samples. This result implies that spectrogram averaging yields a more consistent estimation of the damping parameter.

Figure 6.1: Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data: (a) mean initial amplitude errors ($E_a$), (b) mean frequency errors ($E_f$), (c) mean damping errors ($E_d$). The mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean error over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average (1, 2, 5, 10 or 20). For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for definitions of $E_f$, $E_a$ and $E_d$.
Figure 6.2: Standard deviation of error for estimated damping parameters. The standard deviation of the mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean standard deviation over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average. For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for the definition of $E_d$.

Chapter 7

An Adaptive Sampling Algorithm

7.1 Overview

To completely automate the creation of sound models from measurements, the selection of sampling locations must also be automatic. By sampling location we mean the location on the surface of an object where it is to be struck by the sound effector. The most intuitive approach is to select a uniform grid over the surface of the object. Without knowledge of the object's shape, a uniform grid in Cartesian space could also be considered. This was demonstrated in [25] and [26].

As part of the Active Measurement facility, we assume that a surface model of test objects is attainable using the Triclops trinocular camera¹. With a surface model available, a uniform grid can be projected, and the sample locations uniformly distributed.

One question inevitably arises when generating the uniform grid: how finely should we sample? Many everyday objects are composed of different materials (e.g., a glass jar with a metal lid), or have density, volume and thickness variations across their surface. These properties will cause the sound of the object to change perceptually over small regions of the surface. Other objects (e.g., an eraser) may have a relatively constant sound over the entire surface. Consequently, the selection of a multi-purpose sampling density is non-trivial. It cannot be constant, nor can it be pre-determined by object geometry alone.

¹ Currently, software to generate surface models from the Triclops' range data is not available. This is ongoing work of the ACME project. For the examples in this thesis, surface models of test objects are constructed manually.

The algorithm presented in this chapter changes the density of a sampling mesh based on perceived differences in sound over the surface of an object. This algorithm is straightforward and is listed in Figure 7.1.

1. Select an unsampled vertex as the next sample location.
2. Strike the object at the sample location and create a sound model.
3. Compare the new model to the models at all adjoining sample locations.
4. If the acoustic distance between two adjoining models is greater than a perceptual threshold, add a new vertex between these two.
5. Otherwise, repeat from Step 1 until no vertices are unsampled.

Figure 7.1: Adaptive sampling algorithm.

Here, the sample locations are chosen as the vertices of a surface model representing the test object. At each vertex, a sound model is created. If the acoustic distance between this model and one at an adjoining sample location is greater than a perceptual threshold, a new sample location is added between the two (see Figure 7.2). The new sample location should be a vertex in the refinement of the surface. This procedure continues until the acoustic distance between all adjoining models is less than a perceptual threshold.

Figure 7.2: Result of adaptive sampling. The sound models at two adjoining sample locations are compared (left). If the acoustic distance between the vertices is too great, the surface edge is refined by inserting a new vertex (right).

Selection of unsampled vertices in the first step of the algorithm is conducted by any heuristic rule. Currently, the vertex nearest the previously sampled vertex is selected. Other possible heuristics include selecting vertices by their connectivity in the mesh or selecting vertices to minimise the amount of movement by the robots.

The algorithm in Figure 7.1 is simple, but relies on three complex components: a procedure for adding new vertices to the surface model, an acoustic distance metric and a perceptually-relevant threshold for this metric. This chapter describes one implementation of each of these components. Note that these components are research topics on their own and it is not presumed that the implementations discussed herein are optimal solutions. The algorithm in Figure 7.1 is applicable to any vertex insertion rule, distance metric and threshold, and should be considered the principal result of this chapter.

7.2 Related Work

To our knowledge, this is the first algorithm for adaptively selecting locations on the surface of an arbitrary object for purposes of sound modelling. Previous work on modelling of musical instruments from empirical measurements is vague in its selection of sample locations; two examples are [2] and [4]. In [2] a guitar is "thumped" on the bridge by a "sharp instrument" to record measurements for estimates of a body model. A physics-based model was used to estimate the sample location of plucked strings from the recordings. In [4], strings of musical instruments were struck using a force hammer at the point on the instruments' bridge where the strings made contact.

Rules for the addition of new vertices to a surface have been explored in multi-resolution surface research. Multi-resolution surfaces are appealing in computer graphics because their level-of-detail nature simplifies editing and provides scalable render quality which can speed computer animation development. We have selected a subdivision surface representation, using the Loop [19] scheme for vertex insertion. A brief description is provided in Section 7.4.

Distance metrics for sound have been explored in several domains.
Early efforts in this research were in the field of speech recognition. A classic survey of techniques is [11]. Dubnov and Tishby explored comparing musical sounds using a spectral and bispectral acoustic distortion measure [7]. Their approach is to measure statistical similarity using the Kullback-Leibler divergence between models.

More recent work, which focused on creating a "content-based" sound browser (Muscle Fish), uses two levels of features to form a complex parameter space within which distances are computed [15, 33]. One set of features is extracted from each frame of audio data: loudness, pitch, brightness, bandwidth and mel-filtered cepstral coefficients (MFCCs). The time series of these frame values provides a second set of features: the mean, standard deviation and derivative of each frame-level parameter.

To our knowledge, none of the metrics listed above have been used to determine perceptual thresholds on similarity. By "similarity" we mean in the context of perceiving two sounds as being produced by the same object. In our application, we require a metric by which we may decide if two similar sounds are perceived as separate objects and/or materials. Some research has examined the factors by which differences in material and geometry are perceived from audition [8, 9, 12, 16]. While this research does not examine acoustic distances directly, it does provide empirical results indicating which parameters of a sound model are relevant to material and geometric perception. Specifically, the analysis in [16] suggests a metric by which model parameters are related to perceptual similarity. Because empirical thresholds are available for this metric, it was selected for use in this implementation of the adaptive sampling algorithm. Details of the metric are stated in Section 7.5; Section 7.6 contains a discussion relating the results of [16] to thresholds on this metric useful to our application.
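With a vertex insertion rule, a distance metric and a threshold chosen, the loop of Figure 7.1 can be sketched as below. This is a minimal sketch, not the ACME implementation: the callables `measure`, `too_far` and `midpoint` are hypothetical stand-ins for striking the object, applying the perceptual threshold and refining an edge, and the `max_depth` cap is an assumption added here so that the loop always terminates.

```python
def adaptive_sample(vertices, edges, measure, too_far, midpoint, max_depth=3):
    """Sketch of the adaptive sampling loop of Figure 7.1.

    vertices : iterable of hashable vertex identifiers (the coarse mesh)
    edges    : iterable of (vertex, vertex) pairs joining adjoining vertices
    measure  : vertex -> sound model (hypothetical: strike and estimate)
    too_far  : (model, model) -> True if the acoustic distance exceeds
               the perceptual threshold
    midpoint : (vertex, vertex) -> new vertex refining that edge
    max_depth: assumed cap on how many times a coarse edge may be split
    """
    models = {}
    pending = list(vertices)
    open_edges = [(a, b, 0) for a, b in edges]
    while pending:
        # Step 1: select an unsampled vertex (here simply the next in line).
        v = pending.pop(0)
        # Step 2: strike the object there and create a sound model.
        models[v] = measure(v)
        # Steps 3-4: compare adjoining models once both end points are sampled.
        still_open = []
        for a, b, depth in open_edges:
            if a not in models or b not in models:
                still_open.append((a, b, depth))
            elif depth < max_depth and too_far(models[a], models[b]):
                new = midpoint(a, b)
                pending.append(new)
                still_open.append((a, new, depth + 1))
                still_open.append((new, b, depth + 1))
        open_edges = still_open
    return models
```

For example, on a one-dimensional "object" whose sound changes abruptly at position 5, starting from the two end points 0 and 8 concentrates the new samples around the change (4, 5 and 6) rather than spreading them uniformly.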
7.3 Requirements

The sampling algorithm must select points over the surface of the test object in a dense-enough mesh so as to capture perceptually relevant differences in sound over the entire surface. Satisfaction of this requirement is difficult to quantify and requires future perceptual studies.

7.4 Surface Representation

As mentioned in Section 7.2, a subdivision surface representation is used to facilitate vertex insertion. A subdivision surface represents a smooth surface as the limit of a sequence of successive refinements of a coarse mesh [6]. That is, it represents a coarse approximation to the true (smooth) surface with a finite number of vertices, yet can be systematically refined by adding new vertices until, in the limit, the smooth surface is produced.

Several different schemes exist for specifying the location of vertices added during refinement. Typically, the locations of new vertices are weighted combinations of neighbouring coarse vertices' positions. We use the Loop scheme as first proposed by Charles Loop [19]. The Loop scheme applies the vertex masks in Figure 7.3 to a coarse set of vertices to calculate the position of vertices in the next level of refinement. The mask on the left is used to insert a new vertex on the edge between two coarse vertices. The mask on the right is used to refine the position of each coarse vertex.

Figure 7.3: Loop scheme refinement masks for subdivision. Masks on the left are used for edge refinement; masks on the right are used for vertex refinement. $\beta$ is chosen to be $\frac{1}{n}\left(\frac{5}{8} - \left(\frac{3}{8} + \frac{1}{4}\cos\frac{2\pi}{n}\right)^2\right)$, where $n$ is the number of vertices neighbouring the vertex being refined.

The limit position of interior vertices is computed by replacing $\beta$ in Figure 7.3 with $\chi = \frac{1}{\frac{3}{8\beta} + n}$. For vertices on the boundary of a surface, the limit position is computed by changing the coefficients of the even-vertex boundary mask to $\frac{1}{6}$, $\frac{2}{3}$ and $\frac{1}{6}$ [6].
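As a worked check of these weights, the following sketch evaluates $\beta$ and $\chi$ and applies the interior masks. It is illustrative only: the function names are hypothetical, and the edge weights 3/8 and 1/8 are the standard Loop values, assumed here to match the masks drawn in Figure 7.3.

```python
import math


def loop_beta(n):
    """Loop's neighbour weight for an interior vertex with n neighbours."""
    return (1.0 / n) * (5.0 / 8.0 - (3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / n)) ** 2)


def loop_limit_weight(n):
    """Weight chi that replaces beta when evaluating the limit position."""
    return 1.0 / (3.0 / (8.0 * loop_beta(n)) + n)


def edge_vertex(a, b, c, d):
    """New vertex on the interior edge (a, b).

    c and d are the vertices opposite the edge in its two triangles.
    Weights 3/8 and 1/8 are the standard Loop edge mask (assumed).
    """
    return tuple(0.375 * (pa + pb) + 0.125 * (pc + pd)
                 for pa, pb, pc, pd in zip(a, b, c, d))


def refined_vertex(v, neighbours, weight):
    """Apply the vertex mask: (1 - n*weight) * v + weight * sum(neighbours).

    Pass loop_beta(n) for one refinement step or loop_limit_weight(n)
    for the limit position.
    """
    n = len(neighbours)
    return tuple((1.0 - n * weight) * v[k] + weight * sum(p[k] for p in neighbours)
                 for k in range(len(v)))
```

For a regular vertex with six neighbours this gives $\beta = 1/16$ and $\chi = 1/12$, so the limit position weights the vertex itself by 1/2 and each neighbour by 1/12.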
The details of the refinement scheme, and its effect on the geometry of the mesh, are not important to our application; the refinement scheme simply defines a hierarchy of successive edge refinements. Interested readers should consult [6] for a thorough introduction to subdivision surfaces.

In our implementation, we begin by sampling the sound at the limit position of each vertex in the coarsest representation of the surface mesh. The acoustic distance between vertices joined by an edge of the mesh is then computed. If the acoustic distance is too great, the vertex which is the refinement of the joining edge is added to the list of vertices to be sampled.

The minimum vertex-spacing required for adequate surface representation and refinement dictates the maximum spacing of the sound sampling mesh. For some simple-sounding objects, this may produce an overly dense sampling mesh. This is not a large concern for applications such as simulation since the denser mesh will rely less on interpolation between sound models during synthesis.

7.5 Acoustic Distance Metrics

As prescribed by the adaptive sampling algorithm (Figure 7.1), an acoustic distance must be calculated between each pair of vertices joined by an edge in the surface. The calculated distance will be compared to a perceptual threshold to make a refinement decision. The selected distance metric must therefore be indicative of the perceptual proximity of two sounds.

Most applications of acoustic distance metrics discussed by the literature in Section 7.2 use distance to make relative comparisons, for example, classifying a new sound by choosing the group of exemplar sounds to which it is nearest. Our application is different because we need to determine if two models will be perceived as different materials or shapes once synthesised. The realism of a sound model requires that the sound produced over the surface of an object is not discontinuous in regions where it should vary smoothly.
The distance metric must therefore be applicable to an appropriate perceptual threshold. Further discussion of such a threshold is delayed until Section 7.6.

Most sound metrics calculate a distance directly from waveforms. We chose to use a metric derived from model parameters instead. Much of the work on material perception and discrimination discusses discriminants in terms of model parameters. It is therefore convenient to express the distance metric on these parameters as well.

Specifically, we use a logarithmic ratio of the frequency-independent damping coefficient and a ratio of modal frequencies as metrics. These metrics were suggested by the research of Klatzky et al. in their paper on auditory material perception [16]. Sections 7.5.1 and 7.5.2 describe these metrics in greater detail.

7.5.1 Frequency-Independent Damping Coefficient

As stated previously, and repeated in Eq. 7.1, the sound model is composed of $N_f$ frequency modes, each with a distinct frequency ($\omega_{x,i}$), initial amplitude ($a_{x,i}$) and damping coefficient ($d_{x,i}$).

$$p(x,t) = \sum_{i=1}^{N_f} a_{x,i}\, e^{-d_{x,i} t} \sin(\omega_{x,i} t) \tag{7.1}$$

It is theorised that the damping coefficient in Eq. 7.1 ($d_{x,i}$) is related to a material's internal friction parameter ($\phi$) by Equation 7.2 [9, 12, 16].

$$d_{x,i} = 2\,\omega_{x,i} \tan(\phi) \tag{7.2}$$

Furthermore, it has been demonstrated that the internal friction parameter ($\phi$) is an approximate shape and frequency-independent material property [18]. A recent study [16] has shown that a frequency-independent damping factor ($\rho = \pi \tan(\phi)$) is a perceptually useful discriminant of material. An estimate of this parameter ($\hat{\rho}_{x,i}$) can be calculated from the estimated damping coefficient ($d_{x,i}$) of each frequency mode:

$$\hat{\rho}_{x,i} = \frac{\pi\, d_{x,i}}{2\,\omega_{x,i}} \tag{7.3}$$

$\hat{\rho}_{x,i}$ is only an estimate at one frequency and is subject to noise and variability.
A better estimate of the frequency-independent damping coefficient ($\hat{\rho}_x$) is computed as the median of $\hat{\rho}_{x,i}$ for all frequency modes $i$. Given multiple samples at a single vertex, the estimate of the damping coefficient is improved by calculating $\hat{\rho}_x$ as the median of $\hat{\rho}_x$ over all samples.

The distance between two vertices' damping coefficients ($\rho_A$ and $\rho_B$) is expressed as a logarithmic ratio:

$$\mathrm{Distance}_\rho(A, B) = \log\frac{\rho_A}{\rho_B} \tag{7.4}$$

7.5.2 Frequency Similarity

Using frequency similarity as a distance metric is supported by recent perceptual studies. Klatzky et al. found frequency and damping to be "independent determinants of similarity" [16]. The results of their study show that frequency and decay are used independently when judging the similarity of sounds, but in combination when classifying material. Additionally, experiments by Hermes [12] and Gaver [9] support the theory that frequency is an important perceptual feature for material, shape and size estimation.

• For each frequency mode $\omega_{A,i}$ in model A:
  1. Find the frequency mode $\omega_{B,k}$ nearest to $\omega_{A,i}$.
  2. If $\omega_{B,k}$ is unmatched, match it to $\omega_{A,i}$.
  3. Or, if $\omega_{B,k}$ is matched to another mode which is farther than $\omega_{A,i}$, match $\omega_{B,k}$ to $\omega_{A,i}$ instead. Mark $\omega_{A,i}$ as matched, and the mode it replaces as unmatched.
  4. Otherwise, loop to step 1, picking the next nearest frequency mode.
• Repeat until all frequency modes in model A are matched.

Figure 7.4: Algorithm to find unique frequency mapping between two models.

We define the frequency distance between two models by Equation 7.5. Modes are matched between models using the algorithm in Figure 7.4. This algorithm guarantees that each mode in model A will be uniquely matched to a mode in model B. Details of the implementation are included in Appendix C.

$$\mathrm{Distance}_\omega(A, B) = \ldots \tag{7.5}$$

where $\omega_{A,i}$ and $\omega_{B,i}$ are the frequencies of modes matched in models A and B by the algorithm in Figure 7.4.
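The matching procedure of Figure 7.4 can be sketched as follows. This is one possible reading of the figure, not the Appendix C implementation: the function name is hypothetical, model B is assumed to have at least as many modes as model A, and ties are resolved in favour of the existing match. The distance expression of Equation 7.5 itself is not reproduced here.

```python
def match_modes(freqs_a, freqs_b):
    """Match each mode of model A to a distinct mode of model B.

    Returns a list m such that freqs_b[m[i]] is matched to freqs_a[i].
    Assumes len(freqs_b) >= len(freqs_a).
    """
    if len(freqs_b) < len(freqs_a):
        raise ValueError("model B needs at least as many modes as model A")
    owner = {}                               # index in B -> index in A
    unmatched = list(range(len(freqs_a)))
    while unmatched:
        i = unmatched.pop(0)
        # Modes of B ordered nearest first (step 1, and step 4 on retry).
        by_distance = sorted(range(len(freqs_b)),
                             key=lambda k: abs(freqs_b[k] - freqs_a[i]))
        for k in by_distance:
            if k not in owner:               # step 2: free mode, take it
                owner[k] = i
                break
            j = owner[k]
            if abs(freqs_b[k] - freqs_a[j]) > abs(freqs_b[k] - freqs_a[i]):
                owner[k] = i                 # step 3: displace the farther mode
                unmatched.append(j)
                break
    match = [None] * len(freqs_a)
    for k, i in owner.items():
        match[i] = k
    return match
```

Each displacement strictly reduces the distance held by a mode of B, so the loop terminates; a displaced mode of A simply rejoins the queue and is matched to its next nearest free candidate.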
The frequency distance between one model and a collection of models is computed as the frequency distance (Equation 7.5) between the single model and the prototypical model of the collection. Chapter 6 describes the process of creating a prototypical model.

7.6 Perceptual Thresholds

As mentioned in Sections 7.2 and 7.5, the distance metrics selected for this implementation were suggested by the analysis in [16]. The motivation for this selection was the availability of empirical data rating the perceptual similarity of sounds by these metrics.

Experiment | Intercept | Freq. Diff. Coeff. | Decay Diff. Coeff. | Product Coeff. | R²
1 | 88.7 | -0.56 | -1.02 | 0.30 | 0.76
2 | 96.2 | -0.50 | -1.06 | 0.31 | 0.80
mean | 92.45 | -0.53 | -1.04 | 0.305 | 0.78

Table 7.1: Regression of similarity on frequency and decay difference [16]. The mean of the values for the two experiments is also listed.

The study of Klatzky et al. asked subjects to rate the similarity of two synthesised sounds. In their first two experiments, subjects rated how likely two sounds were to have been produced by the same material, regardless of shape. In the third experiment, subjects rated the relative length of the bars that were purported to have produced the synthesised sounds. While these tasks are not identical to ours, we feel they are similar enough to provide suitable thresholds in the absence of more appropriate data. The appropriateness of the metrics and thresholds to our application can be properly evaluated only by a perceptual study of the synthesised models.

The results of Klatzky et al.'s first two experiments are summarised in Table 7.1. The average of the two experiments is listed in the last row of the table. The regression coefficients express the change in perceived similarity for a unit change in the logarithmic ratios of fundamental frequency and decay constant.
The similarity ratings are on a scale of 0 to 100, but the perceived rating of identical sounds was found to be approximately 92.45 (mean intercept). Refer to [16] for more details of the experiments and results.

The regression analysis of Table 7.1 suggests Equation 7.6 as an appropriate threshold expression. Here, the distances are weighted by the regression coefficients, and the sum is compared to a similarity threshold ($S$). The contribution of the three terms could be further weighted by additional empirical constants.

$$92.45 - 1.04\,\mathrm{Distance}_\rho(A, B) - 0.53\,\mathrm{Distance}_\omega(A, B) + 0.305\,\mathrm{Distance}_\rho(A, B)\,\mathrm{Distance}_\omega(A, B) < S \tag{7.6}$$

In our experiments we successfully applied an alternative set of threshold expressions. In Equation 7.7, the coefficients from Table 7.1 are scaled by constant similarity factors ($S_x$) to form separate thresholds for each distance metric. The similarity factor adjusts the amount of perceived material dissimilarity permitted before a refinement is required.

$$\mathrm{Distance}_\rho(A, B) > 1.04\, S_\rho$$
$$\mathrm{Distance}_\omega(A, B) > 0.53\, S_\omega \tag{7.7}$$
$$\mathrm{Distance}_\rho(A, B)\,\mathrm{Distance}_\omega(A, B) > 0.305\, S_{\rho\omega}$$

In our implementation, if any of these thresholds is exceeded, a refined vertex is added. We chose the set of threshold expressions in Equation 7.7 because they provide flexible control over the influence of each distance metric. Equation 7.6 is a more correct threshold on the similarity results of the experiment of Klatzky et al., but given that the similarity judgement in their experiment is not identical to our requirements, we find Equation 7.7 to be an acceptable alternative.

Currently, an empirical value of $S_x = 0.75$ is used as the similarity factor on all distances. This value was estimated by reviewing coarse data collections and manually selecting which models required refinement.
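The refinement decision of Equation 7.7 can be sketched as below. The sketch makes several assumptions that are not stated in the text: natural logarithms are used, absolute values of the distances are compared so that the order of A and B does not matter, and the damping estimate is taken as $\pi d / (2\omega)$ per mode (the constant cancels in the log ratio of Equation 7.4). All function names are hypothetical; the coefficients are the means from Table 7.1.

```python
import math
from statistics import median

# Mean regression coefficients from Table 7.1 (signs dropped).
DECAY_COEFF = 1.04
FREQ_COEFF = 0.53
PRODUCT_COEFF = 0.305


def damping_factor(modes):
    """Median over modes of pi*d/(2*omega); modes = [(omega, d), ...]."""
    return median(math.pi * d / (2.0 * omega) for omega, d in modes)


def damping_distance(modes_a, modes_b):
    """Logarithmic ratio of the two damping factors (Equation 7.4)."""
    return math.log(damping_factor(modes_a) / damping_factor(modes_b))


def similarity_factors(dist_rho, dist_omega):
    """Distances rescaled by the regression coefficients.

    Returns (frequency, decay, product) factors; each is compared
    against its similarity factor S_x in Equation 7.7.
    """
    d_rho, d_omega = abs(dist_rho), abs(dist_omega)
    return (d_omega / FREQ_COEFF,
            d_rho / DECAY_COEFF,
            d_rho * d_omega / PRODUCT_COEFF)


def needs_refinement(dist_rho, dist_omega, s=0.75):
    """True if any of the three thresholds of Equation 7.7 is exceeded."""
    return any(f > s for f in similarity_factors(dist_rho, dist_omega))
```

Keeping the three factors separate, rather than folding them into the single sum of Equation 7.6, mirrors the design choice described above: each metric can be given its own similarity factor without changing the others.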
As an example, Table 7.2 summarises the calculated similarity factors of models collected from three objects: a brass vase, glass wine bottle and plastic speaker. The last row of Table 7.2 compares two locations on the brass vase. Only ten frequency modes were used to model the wine bottle and plastic speaker. Forty modes were used to model the brass vase. Most of the distances reported agree with expectation. Although the distance between the brass vase and wine bottle is low, it is not a surprising result when the brass vase is synthesised with only ten modes, since it then sounds very similar to the wine bottle. Future perceptual studies could be used to determine a less ad hoc value of $S$.

Object A | Object B | Similarity Factor (S): Frequency | Decay | Product
Brass vase | Plastic speaker | 1.14 | 0.781 | 1.61
Plastic speaker | Wine bottle | 1.05 | 0.585 | 1.11
Wine bottle | Brass vase | 0.267 | 0.196 | 0.0943
Brass vase (I) | Brass vase (II) | 0.0919 | 0.000125 | 0.0000208

Table 7.2: Examples of calculated similarity factors.

It should be noted that the frequency distance of Equation 7.5 is not identical to the expression used to compute the coefficients in Table 7.1. In their experiments, Klatzky et al. compared only the fundamental frequency of their synthesised stimuli. Here, however, we compare the average of all frequency modes of the model, but weight their contribution to the average by their initial amplitude. This approximation has proved successful, as will be illustrated by the sample collections in Chapter 8.

Chapter 8

Sample Data Collections

8.1 Overview

This chapter presents the results of four sample data collections. The first collection, a tuning fork, is meant as a calibration experiment. A surface model of the tuning fork is not used, nor is the adaptive sampling algorithm of Chapter 7. For the other three objects, a brass vase, plastic speaker and toy drum, the entire system is used to build a sound model.
The objects were selected to provide examples of a variety of materials. Also, since the shape-acquisition component of ACME is not completed, we required objects with geometries that could be easily modelled manually. Each surface model was constructed from manual measurements using 3D modelling software. Each test object was mounted on the ACME test station. Objects whose diameters are smaller than the diameter of the test station could not be sampled below heights of 30 mm due to collisions between the sound effector and the test station. For future collections requiring complete coverage, objects could be raised on a narrow pedestal. The following four sections discuss the test objects, experimental setup, and the results of the data collections.

                   Coarse     Microphone      Number of   Similarity   Number of
Object Name        Vertices   Distance (mm)   Modes       Threshold    Samples
Tuning fork        1          5               5           N/A          5
Brass vase         136        190             40          0.75         5
Plastic speaker    98         40              10          0.75         5
Toy drum           31         130             40          0.75         5

Table 8.1: Summary of setup parameters for test objects.

Figure 8.1: Setup for acquiring model of tuning fork.

Figure 8.2: Recorded spectrogram.

8.2 Tuning Fork

An A-440 tuning fork was used as a calibration object. The tuning fork was mounted on the ACME test station as illustrated in Figure 8.1. As is also shown, the microphone was located 5 mm from the nearest tine. The tuning fork was struck five times at one location to produce five three-second recordings. A five-mode prototypical model was created from the five recordings. Table 8.1 summarises the setup parameters for all of the test objects.

8.2.1 Estimation Results

Figure 8.2 displays the spectrogram of one recording. The 440 Hz tone is present, as are several overtones at 9 430 Hz and 10 120 Hz. When the tuning fork is struck, only the overtones are audible from a distance. At its close proximity, however, the microphone was able to record the low amplitude 440 Hz tone.

Figure 8.3: The figure on the left (a) shows the spectrogram of a sound synthesised from the prototypical model of the tuning fork. The figure on the right (b) is a closeup of the spectrogram showing the estimation of the fundamental frequency at 430.7 Hz. Background noise contributed a false mode at 215.3 Hz.

Figure 8.3 contains two spectrograms produced by a sound synthesised from the prototypical model. As both spectrograms show, the prototypical model contains relatively accurate estimates of the fundamental frequency and the overtones present in the original spectrogram. The mode representing the 440 Hz tone was estimated at 430.7 Hz. The frequency was estimated from a 1024-point Discrete Fourier Transform (DFT), with a frequency resolution of 43 Hz. Since the estimate is within 43 Hz of the true frequency, the test was a success.

8.3 Brass Vase

The brass vase displayed in Figure 8.4 (a) was the first test object for which a complete sound model was generated. The subdivision surface in Figure 8.4 (b) represented the vase for the adaptive sampling algorithm. This coarse mesh contains 136 vertices. Forty frequency modes were estimated at each sample location, and prototypical models were produced using five recordings at each location.

The vase was secured to the ACME test station using thin double-sided tape. The microphone on the field measurement system (FMS) was used to record the samples, and was located 190 mm behind the vase (see Figure 8.5). Early experiments using the microphone mounted on the sound effector produced poor sound models due to the transient effects of clipping and echoes. These effects are diminished by recording in the far field. At each sample location, the vase was struck normal to the surface by the sound effector.
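
The frequency resolution quoted for the tuning fork in Section 8.2.1 can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes a sampling rate of 44.1 kHz, which is not stated in this chapter but is consistent with a 1024-point DFT having a resolution of about 43 Hz. The function names are hypothetical. Under that assumption, the bin nearest to 440 Hz is centred near 430.7 Hz, and the false mode at 215.3 Hz falls on bin 5.

```python
# Assumed values: the sampling rate is an assumption, the DFT size is from the text.
SAMPLE_RATE_HZ = 44_100
DFT_SIZE = 1024


def bin_resolution(sample_rate_hz: float, dft_size: int) -> float:
    """Spacing in Hz between adjacent DFT bins."""
    return sample_rate_hz / dft_size


def nearest_bin(frequency_hz: float,
                sample_rate_hz: float = SAMPLE_RATE_HZ,
                dft_size: int = DFT_SIZE) -> tuple[int, float]:
    """Return (bin index, bin centre frequency in Hz) closest to frequency_hz."""
    resolution = bin_resolution(sample_rate_hz, dft_size)
    k = round(frequency_hz / resolution)
    return k, k * resolution


if __name__ == "__main__":
    print(f"resolution: {bin_resolution(SAMPLE_RATE_HZ, DFT_SIZE):.2f} Hz")
    k, centre = nearest_bin(440.0)
    print(f"440 Hz falls nearest to bin {k}, centred at {centre:.1f} Hz")
```

If the estimator reports bin centres, a 1024-point analysis at this rate cannot return 440 Hz exactly. A longer transform, or interpolation between bins, would be needed for a finer estimate.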
8.3.1 Estimation Results

Figure 8.6 is a comparison of spectrograms of synthesised sounds and recorded samples at three positions on the vase. White noise was added to the synthesised sounds at a signal-to-noise ratio approximating the signal-to-noise ratio of the recording. The addition of noise creates spectrograms that are more comparable to the originals. Appendix B discusses this technique with examples. The frequency modes were estimated quite accurately at each location in Figure 8.6.

Audibly, the synthesised sounds at most sample locations on the vase were comparable to the recordings. Most often, any difference in the sounds was a lower perceived pitch. Evidence of this is present in the spectrograms of Figure 8.6, particularly at Z = 90 mm. A narrow band of high energy noise is visible in the recorded spectrogram from approximately 0 to 300 Hz (Figure 8.7). This band of noise was estimated as a mode in the model at 215 Hz with a very small damping constant (0.452). In fact, this mode's damping constant is smaller than that of any other mode by at least one order of magnitude.

Though background noise is concentrated between 0 and 300 Hz, a moderate level of noise is also present in a band from 300 to 600 Hz (Figure 8.7). This band of noise artificially reduced the estimated damping constants of frequency modes within that range. As an example, two modes were estimated at 646.0 Hz and 473.7 Hz with damping constants of 8.6 and 4.5 respectively. Though recorded modes in this range typically lasted approximately 0.1 seconds, these estimated modes remain at significant amplitude (> -3 dB) for 0.44 and 0.75 seconds (Figure 8.8). This artificial sustain of low-frequency modes also contributes to the lower perceived pitch. The signal-to-noise ratio for these recordings was in the range of 30 to 40.

Figure 8.6: Results of brass vase experiment. Spectrograms of recorded samples and those synthesised from prototypical models are compared at three positions on the brass vase: Z = 90, 61 and 45 mm. White noise was added to the synthetic sounds to produce more comparable spectrograms. Panels: (a) recorded spectrogram (Z = 90 mm); (b) synthesised spectrogram (Z = 90 mm); (c) recorded spectrogram (Z = 61 mm); (d) synthesised spectrogram (Z = 61 mm); (e) recorded spectrogram (Z = 45 mm); (f) synthesised spectrogram (Z = 45 mm).

Figure 8.7: Detail of narrow-band low-frequency noise. A high-energy band of noise is visible between 0 and 300 Hz. Moderate levels of noise are also visible from 300 to 600 Hz.

Figure 8.8: Effect of noise on low-frequency modes. Though recorded low-frequency modes are audible for approximately 0.1 seconds, moderate noise levels sustain the estimated modes to approximately 0.44 and 0.75 seconds.

8.3.2 Refinement Results

Refinement of the sampling mesh by the adaptive sampling algorithm is illustrated in Figure 8.9. Table 8.2 summarises the results of refinement for all the test objects.

                   Number of vertices
Object Name        Coarse   Rejected   Missed   Added   Sampled
Tuning fork        1        0          0        N/A     1
Brass vase         136      56         14       93      173
Plastic speaker    98       41         14       28      85
Toy drum           31†      0          4        163     190

† Number of vertices of the top surface.

Table 8.2: Summary of refinement results. This table summarises the number of coarse, rejected, missed, added and sampled vertices for all of the test objects. A rejected vertex is one which is outside the working envelope of the Puma arm. Missed vertices were counted when no force was sensed at a sample location (i.e., a hole).

Figure 8.9: Refinement results of brass vase experiment. The refined sample locations are plotted on the surface model. Despite geometric symmetry, refinement patterns are not symmetric as the two views (a) and (b) illustrate. Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements.

An interesting result of the vase's refinement is its unusual asymmetry. Since the vase is approximately circularly symmetric, it was expected that the refinement pattern would also be symmetric. As shown in Figure 8.9, however, refinement varied greatly. The image on the left (a) shows many refined sampling locations, while the image on the right (b) is almost void of refinements. There are two possible explanations. First, it is possible that the vase is not acoustically symmetric. If so, this example is a strong argument for the necessity of an adaptive sampling algorithm. Alternatively, it may be that the acoustic distance between coarse models is very close to the threshold. If so, the variability of parameter estimation may be sufficient to occasionally increase acoustic distances above the threshold. Given the regular pattern of revision apparent in Figure 8.9 (a), this explanation is unlikely.

One aspect of the vase's geometry introduced a difficulty for the system: there are three rows of small holes around the mouth of the vase. The experiment is programmed to identify missed sample locations if no contact is sensed. With this particular object, though, the outer case of the solenoid often contacted the sides of a hole even though the plunger passed through. When this occurred, the system acquired samples of nothing. Of course, these degenerate samples introduced refinements around these holes. Although this may result in overly dense sampling of the top rim, it also increases the likelihood that the surfaces surrounding the holes will be sampled.

8.4 Plastic Speaker

A complete sound model was also generated for the small speaker shown in Figure 8.10 (a). The speaker is completely plastic, with the exception of a metal grill covering the front face.
A cube mesh with 98 vertices represented the speaker for the adaptive sampling algorithm (Figure 8.10 (b)). Although the speaker's surface could be adequately described by fewer vertices, interior vertices were added to seed the refinement of the adaptive sampling algorithm. Since the sounds at the corners of the speaker are similar, refinement would be unlikely if only corner vertices were used to represent the surface.

Ten frequency modes were estimated at each sampling location. Preliminary experiments determined that ten modes sufficiently represent the sound of the speaker at most locations. Five recordings at each sample location were used to create prototypical models. The sound effector struck the speaker normal to the surface at each sample location.

Figure 8.10: A photo (a) of the plastic speaker and the subdivision surface which represents it (b).

The speaker was secured to the ACME test station using double-sided tape along the bottom edges. The microphone on the FMS recorded the samples from a distance of approximately 40 mm. Because of the low amplitude of the impact sounds, the signal would be dominated by room noise at larger distances.

8.4.1 Estimation Results

The speaker was a problematic test object for two reasons. First, the contact sounds it produces are quiet and decay quickly. Low amplitude is a concern since it decreases the signal-to-noise ratio. As shown by the evaluation of the estimation algorithm (Section 2.4.1), estimation accuracy degrades dramatically with increasing noise levels. Also, because the sound decays quickly, the sound of the solenoid's return is sometimes present in the recordings. When the solenoid returns after impact, the plunger often strikes the side of its exit hole. Normally, this chatter is quiet enough, or the microphone far enough away, that it is not recorded. Because the microphone needs to be so close to the speaker, however, the solenoid's sound is recordable.
Figure 8.11: Results of plastic speaker experiment. A spectrogram of a sample recorded at the top of the speaker (a) is compared to a spectrogram of a sound synthesised from a ten-mode prototypical model (b).

Choice of an acceptable recording distance is therefore a trade-off between good signal amplitude and recording the sound of the solenoid. The problem of microphone distance was further complicated by the hardware surrounding the microphone on the FMS (i.e., the camera, Triclops and pan/tilt unit). Frequently, the microphone could not be moved closer to the strike location because the Puma arm would collide with the FMS hardware.

The second problem with using the speaker as a test object is registration. Because surface model creation is not yet automatic, the object must be manually registered to the stage for correspondence with the surface model. With objects that are circularly symmetric, small imprecision is tolerable. A square object, however, must be more precisely positioned. During the test it was noted that the speaker was not accurately positioned and the edges of each face were not reliably struck at a normal angle. It is hoped that this problem will be eliminated by the development of a shape acquisition module for ACME.

Despite these difficulties, good sound models were produced at many sample locations. The spectrograms in Figure 8.11 illustrate the similarity between the recorded and synthesised sounds. The model is clearly an accurate representation of the recorded sample. Some models suffered the same pitch-lowering effects as discussed in Section 8.3.1. Additionally, the sound models of the metal grill were generally not audibly similar to the recordings, since their signal-to-noise ratios were poorer.
In comparison to the brass vase, the speaker's sound model has a narrower bandwidth and sharper decay. This result supports the perceptual studies of Klatzky et al. [16].

8.4.2 Refinement Results

The sound of the plastic speaker is mostly uniform, though it does vary from the edge to the middle of each face. Since the middle of each face is unsupported, the sound there is generally lower in frequency than at the edges. We expected this variation to trigger some refinement of the sampling mesh. As is illustrated in Figure 8.12, very little refinement was required. Future experiments could investigate lower similarity thresholds and a coarser surface model as stimulants of refinement.

Figure 8.12: Refinement results of plastic speaker experiment. Very few refinements were made on the speaker, with most refinements occurring near the edges. The refinement pattern for the top of the speaker is shown here. Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements.

8.5 Toy Drum

The fourth test object, a toy drum, was selected for its diversity in sound across its surface. The drum (Figure 8.13 (a)) is a child's toy, made of plastic with three metal bars suspended across a slot in the top face. Each metal bar has a different length and therefore a different frequency.

A complete model of the drum could not be created for the experiment due to restrictions of our subdivision surface loader. Instead, the drum is approximated by a simple cylindrical mesh (Figure 8.13 (b)). Because the handle of the drum is not modelled by the surface, we were unable to create a sound model for the entire drum. We instead created a sound model of only the top face in order to show the results of refinement on an acoustically complex object.

Five samples were recorded at each sample location, and forty modes were estimated for each model. As with the brass vase and speaker, the drum was affixed to the ACME test station using double-sided tape on its bottom edges. The microphone on the FMS was again used to record the samples at a distance of 130 mm, and the drum was struck normal to its surface.

Figure 8.14: When measuring near the edge of the bars, the plunger often caused the bars to pivot on their supports. When the plunger retracted, the bars would return to their nominal positions, reducing the distance between the plunger and the bar.

8.5.1 Estimation Results

The toy drum's construction introduced two problems which affected the quality of the sound models. Since the metal bars are supported only along their central axis, they are able to pivot around that axis (Figure 8.14). Unfortunately, when the sound effector measured locations near a bar's edge, the bar moved a few millimetres upon contact, then returned once the sound effector was retracted to strike. This movement reduced the distance between the plunger and the metal bars and produced uncharacteristically damped sounds.

The second difficulty arose from the spaces between the metal bars. If a sample location lay in one of those spaces, no sound model was created. This absence prevented any refinement between that sample location and adjoining vertices. Since these holes lay between the bars, adaptive refinement between the bars did not always occur as expected.

Apart from the effects just listed, most of the sound models were successful. Exceptionally good models were produced for the metal bars when they were struck near their centres. For example, Figure 8.15 illustrates the fidelity of the model for the middle bar. With the exception of the noise effects mentioned previously, the spectrograms are nearly identical.

Figure 8.15: Results of toy drum experiment (metal). The spectrogram of a recorded sample of the middle metal bar is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at the same location is shown on the right (b).

Results of modelling the plastic surface were acceptable, though not as successful as the metal bars. As Figure 8.16 demonstrates, the frequency spectrum was typically correct, but the damping parameters were often inaccurate. One additional consequence of the construction of the drum is that the metal bars often resonated when the plastic was struck. Though a minor effect, it may have contributed to the sustain of some modes. More likely, the primary reason for poorer estimation is the lower amplitude response of the plastic.

Figure 8.16: Results of toy drum experiment (plastic). The spectrogram of a recorded sample of the drum's plastic surface is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at that location is shown on the right (b).

8.5.2 Refinement Results

Results of the refinement are plotted in Figure 8.17. As expected, the model was refined substantially, especially around the interface between the metal bars and plastic (Figure 8.17 (b)). Unfortunately, several missed sample locations occurred at gaps between metal bars on the right side. Slight inaccuracies in the manual registration of the drum caused some sample locations on the right side to lie between bars, but not on the left side. Regardless, it is clear from the left side of the diagrams that the sampling mesh is denser near material boundaries. This result is convincing evidence of the success of the adaptive sampling algorithm.

Figure 8.17: Refinement results of toy drum. The refined sample locations are plotted on the surface model in (a). Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements. A simplified diagram (b) also shows sample locations that were missed (diamonds). The location of the metal bars is indicated by the vertical rectangle. Here refinement levels are represented as 'o', '+' and 'x' in ascending order of refinement level.

Chapter 9
Conclusions

9.1 Overview

A system to automatically create sound models of everyday objects was designed and constructed. This system uses a surface model of a test object to adaptively select sample locations. At each of these locations a device, called a sound effector, strikes the object to elicit an acoustic impulse response. Multiple impacts are used to create a prototypical sound model which best represents the sound at that location. The hardware, software and algorithms required for this system were implemented and tested.

Overall, the sound models produced of the tuning fork, vase, speaker and toy drum are encouraging. These results demonstrate that excellent sound models can be constructed under favourable noise and impact conditions. For example, the fundamental frequency of the tuning fork was accurately estimated within the limits of the discrete Fourier transform. Constructions such as the drum's metal bars pose a difficult problem. Still, the system performs reliably for "rigid" objects, and rigidity is a characteristic of many everyday objects.

The Achilles' heel of the system is noise. Evaluation of the parameter estimation algorithm in Chapter 2 revealed large errors in estimation for even modest levels of background noise. Even models with relatively accurate frequency estimates contained spurious low-frequency modes and incorrect decay constants due to noise. These effects are perceived as a lower pitch of the synthesised sounds.
Prototypical models are one way to reduce the effect of noise. Evaluation of the spectrogram averaging technique showed a significant reduction in mean estimation error.

The adaptive sampling algorithm presented in Chapter 7 is sensitive to noisy models and missed sample locations. Its refinement of the vase's and drum's sampling meshes, however, is evidence of its potential. Improvement of the models and acoustic distance metric should improve future results.

9.2 Future Work

As with all research, this thesis has created many new opportunities for future work. This section suggests avenues for future research and development of the system.

Ultimately, environmental noise must be removed at its source. Either a sound-proof enclosure must be constructed around ACME, or it must be located in a room without machines.

Solenoid noise also presented a problem for "low-amplitude" materials. Since the sound is produced by chatter between the plunger and the solenoid casing, it is presumed that the solenoid must be replaced by a special-purpose device. An alternate approach is to coat the solenoid plunger with rubber.

The original design of the sound effector suggested a replaceable tip. Currently, a hard steel tip is used to produce a sufficiently impulsive impact. Other tips could be constructed from other materials to investigate their effect.

Another useful addition to the system would be a force sensor or load cell for the tip of the effector's plunger. Knowing the force profile of the impact could yield more accurate sound models. Recording the back-current through the solenoid coil may also be a solution.

The problem of microphone placement could be addressed by repositioning the microphone on the FMS for each impact. A sophisticated motion planner would be required to prevent collisions with the object, test station and contact measurement system.
One issue not addressed by the current design is the effect that attaching objects to the ACME test station has on the boundary conditions of the sound model. By attaching one side of an object to the test station, that side is prevented from vibrating freely. The resulting sound model is therefore applicable only to synthesis of the object's sound in this configuration. One useful configuration is an "all-free" boundary condition where all but the impulse location are free to vibrate. This is useful for simulations of dropped objects. One possible solution is to hold the object in place by a minimum number of point contacts. A set of rubber cones may be used for this purpose.

Thorough evaluation of the system, particularly the adaptive sampling algorithm, requires a playback device. Intelligent audio morphing algorithms for synthesis may reduce a model's dependence on adaptive sampling. At the very least, software which morphs between sample locations to provide a continuous audio map of objects will encourage perceptual studies evaluating the effectiveness of the acoustic distance metric and perceptual thresholds. This research will hopefully result in an iterative improvement of the adaptive sampling algorithm.

Other experiments should also be conducted to investigate the effect of strike angle relative to surface normal on the sound model. It is clear that the amplitude of the sound will change as the strike angle approaches 0°. It is not immediately clear whether objects require anisotropic sound models. The current system is easily programmed to perform these experiments.

Bibliography

[1] Various Authors. Computer Music Journal Special Issues on Physical Modeling. 16(4) and 17(1), MIT Press, 1996 and 1997.

[2] Kevin Bradley. Synthesis of an acoustic guitar with a digital string model and linear prediction. Master's thesis, Carnegie Mellon University, 1995.

[3] Antoine Chaigne and Vincent Doutaut. Numerical simulations of xylophones. Journal of Acoustical Society of America, 101(1):539-557, 1997.

[4] Perry R. Cook and Dan Trueman. A database of measured musical instrument body radiation impulse responses, and computer applications for exploring and utilizing the measured filter functions. Available online: http://www.cs.princeton.edu/~prc/ism98fin.pdf, 1998.

[5] Creative Technology Ltd. Sound Blaster Live! Hardware Specifications, 2000. Available online: http://www.soundblaster.com.

[6] Tony DeRose. Subdivision surface course notes. SIGGRAPH Course Notes, 1998.

[7] Shlomo Dubnov, Naftali Tishby, and Dalia Cohen. Clustering of musical sounds using polyspectral distance measures. In Proceedings of the 1995 International Computer Music Conference, pages 460-463, 1995.

[8] Robert S. Durst and Eric P. Krotkov. Object classification from analysis of impact acoustics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 90-95, 1995.

[9] W. W. Gaver. Everyday Listening and Auditory Icons. PhD thesis, University of California in San Diego, 1988.

[10] W. W. Gaver. Synthesizing auditory icons. In Proceedings of the ACM INTERCHI'93, pages 228-235, 1993.

[11] Augustine H. Gray, Jr. and John D. Markel. Distance measures for speech processing. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-24(5):380-391, October 1976.

[12] D. J. Hermes. Auditory material perception. IPO Annual Progress Report 33, Technische Universiteit Eindhoven, 1998.

[13] Wesley H. Huang. A tapping micropositioning cell. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2153-2158, 2000.

[14] InterTAN Inc. Optimus ultra-miniature tie-clip microphone specifications, 1996.

[15] Douglas Keislar, Thom Blum, James Wheaton, and Erling Wold. A content-aware sound browser. In Proceedings of the 1999 International Computer Music Conference, pages 457-459, 1999.

[16] Roberta Klatzky, Dinesh K. Pai, and Eric Krotkov. Perception of material from contact sounds. Presence, (in press).

[17] Eric Krotkov. Robotic perception of material. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 88-94, 1995.

[18] Eric Krotkov, Roberta Klatzky, and Nina Zumel. Robotic perception of material: Experiments with shape-invariant acoustic measures of material type. In O. Khatib and J. K. Salisbury, editors, Experimental Robotics IV, number 223 in Lecture Notes in Control and Information Sciences, pages 204-211. Springer-Verlag, 1996.

[19] Charles Loop. Smooth subdivision surfaces based on triangles. Master's thesis, University of Utah, 1987.

[20] Robert L. Mott. Sound Effects: Radio, TV, and Film. Butterworth Publishers, 1990.

[21] Dinesh K. Pai, Jochen Lang, John E. Lloyd, and Robert J. Woodham. ACME, a telerobotic active measurement facility. In Proceedings of the Sixth International Symposium on Experimental Robotics, 1999.

[22] Point Grey Research, Vancouver, Canada. Triclops On-line Manual. Available online: http://www.ptgrey.com.

[23] Precision MicroDynamics Inc. Precision MicroDynamics Inc. MC8-DSP-ISA Register Access Library and User's Manual, 1.3 edition, 1998.

[24] Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. PTR Prentice-Hall, Inc., 1993.

[25] Joshua L. Richmond and Dinesh K. Pai. Active measurement of contact sounds. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2146-2152, 2000.

[26] Joshua L. Richmond and Dinesh K. Pai. Robotic measurement and modeling of contact sounds. In Proceedings of the International Conference on Auditory Display, 2000.

[27] Malcolm Slaney, Michele Covell, and Bud Lassiter. Automatic audio morphing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1001-1004, 1996.

[28] Ken Steiglitz. A Digital Signal Processing Primer with Applications to Digital Audio and Computer Music. Addison-Wesley, 1996.

[29] Mark Ulano. Moving pictures that talk - the early history of film sound. Available online: http://www.filmsound.org/ulano/index.html.

[30] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, May 1999.

[31] Kees van den Doel and Dinesh K. Pai. The sounds of physical shapes. Presence, 7(4):382-395, 1998.

[32] Richard P. Wildes and Whitman A. Richards. Recovering material properties from sound. In Whitman Richards, editor, Natural Computation. The MIT Press, 1988.

[33] Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton. Content-based classification, search and retrieval of audio. IEEE Multimedia, 3(3):27-36, 1996. Also available online (www.musclefish.com).

Appendix A
Sound Effector Specifications

A.1 Mounting Bracket

The construction of the sound effector's mounting bracket deserves a brief description here for future reference. Constructed from a single piece of 1 - | " x | " aluminum, it was formed on a bending-bar following the schedule in Figure A.1. To allow for the finite bending radius of the material, an additional | " ( | " x 2) was added to the length of the material. Following the bending specifications in Figure A.1, the specified spacing between the two ends (i.e., 2.5") was maintained.

Unfortunately, two artifacts of the bending process are present in the bracket: fatigue marks and off-centre alignment. The fatigue marks were produced on the exterior radius of each bend. These occurred because the metal was bent beyond its permissible stress limit.
This might be avoided in future constructions if the metal were first heated. The alignment of the solenoid is also slightly off-centre following the bending. This is a flaw of the alignment of the material in the bending vise.

Figure A.1: Bending schedule for sound effector mounting bracket.

A.2 Control Circuit

The circuit interfacing the sound effector to the Precision MicroDynamics MC8 board is diagrammed in Figure A.2. It is a simple switching circuit, with a 74F245 octal buffer to isolate the MC8 from the relay. This circuit may be duplicated to control other digital output devices such as lights.

Figure A.2: Schematic for solenoid control circuit. Labels in the schematic: +5 VDC; 1 kΩ; Pin J4-37; 1/8 Q1; T1; To Solenoid; +12 VDC; 100 Ω. Q1: 74F245 Octal Buffer. T1: P2N2222 NPN.

Appendix B
Effect of White Noise on Spectrograms

The two spectrograms in Figure B.1 are presented to compare the effect of white noise on the appearance of a spectrogram. Without background noise (Figure B.1 (a)), the frequency modes appear as wide bands, and appear to be sustained longer. It therefore becomes difficult to compare synthesised spectrograms to measured ones. By adding low amplitude (e.g., SNR = 100) white noise to the synthesised sound, the mapping of colours to intensity values is scaled more comparably to the original recording. For this reason, all spectrograms of synthesised sounds in Chapter 8 include white noise added at a signal-to-noise ratio approximately the same as the measured samples. The white noise is added to the synthesised signal prior to computing its spectrogram.

Figure B.1: Effect of white noise on spectrograms. The spectrogram in (a), the pure spectrogram, is produced by a sound synthesised from a forty-mode sound model. The spectrogram in (b) is the same spectrogram, but with white noise added at an SNR of 100. White noise scales the mapping of colour to intensity more comparably to spectrograms of recorded sounds.

Appendix C
Details of Unique Frequency-Mapping Algorithm

A simple algorithm was designed to match frequency modes between two sound models such that a one-to-one mapping exists. That is, given two models (A and B), each with $N_f$ frequencies, a mapping function $m(j)$ is produced such that $F_{A,i}$ maps to $F_{B,j}$ when $i = m(j)$. The algorithm is summarised in Chapter 7 (Figure 7.4) and repeated in Figure C.1 for convenience. This appendix describes the implementation of the algorithm in greater detail.

Five arrays are used to track the matching of modes. The first is a matrix of frequency differences between all modes in the two models (Figure C.2). This difference matrix is used to create the order array. This array orders the indices of modes in model B from nearest to farthest for each mode in model A (Figure C.3). The third array is a list of indices indicating the next mode to check (Figure C.4). A fourth array (Figure C.5) is used to track which modes in model A are currently matched. The fifth array is the result of the algorithm: an array relating the mapping of mode indices in model A to mode indices in model B (Figure C.6). The index of the mode in model A matched to mode j in model B is stored as mapping[j]. The elements of the mapping array are initialised to -1, indicating unmatched modes.

• For each frequency mode $\omega_{A,i}$ in model A:
  1. Find the frequency mode $\omega_{B,k}$ nearest to $\omega_{A,i}$.
  2. If $\omega_{B,k}$ is unmatched, match it to $\omega_{A,i}$.
  3. Or, if $\omega_{B,k}$ is matched to another mode which is farther than $\omega_{A,i}$, match $\omega_{B,k}$ to $\omega_{A,i}$ instead. Mark $\omega_{A,i}$ as matched, and the mode it replaces as unmatched.
  4. Otherwise, loop to step 1, picking the next nearest frequency mode.
• Repeat until all frequency modes in model A are matched.

Figure C.1: Algorithm to find unique frequency mapping between two models.
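
The steps of Figure C.1 can be sketched in runnable form as follows. This is a minimal Python sketch, not the thesis implementation: the function name `find_unique_mapping`, the use of absolute frequency difference as the distance, and the tie-breaking rule (an existing match is kept when two distances are equal) are assumptions made for illustration. Indices are zero-based. As in the text, `mapping[j]` holds the index of the mode in model A matched to mode `j` of model B, with -1 marking an unmatched mode.

```python
def find_unique_mapping(modes_a, modes_b):
    """Return mapping, where mapping[j] is the index of the mode in A matched to mode j in B."""
    n = len(modes_a)
    if len(modes_b) != n:
        raise ValueError("both models must have the same number of modes")

    # difference[a][b]: distance between mode a of model A and mode b of model B
    # (assumed here to be the absolute frequency difference).
    difference = [[abs(fa - fb) for fb in modes_b] for fa in modes_a]

    # order[a]: indices of the modes of model B, nearest to farthest from mode a.
    order = [sorted(range(n), key=row.__getitem__) for row in difference]

    index = [0] * n          # next position in order[a] to try for mode a
    matched = [False] * n    # whether mode a of model A is currently matched
    mapping = [-1] * n       # mapping[b]: matched mode of model A, or -1

    while not all(matched):
        for i in range(n):
            if matched[i]:
                continue
            b = order[i][index[i]]
            current = mapping[b]
            if current == -1:
                # Step 2: the nearest candidate is free, so take it.
                mapping[b] = i
                matched[i] = True
            elif difference[current][b] <= difference[i][b]:
                # Step 4: the existing match is at least as close; try the next nearest.
                index[i] += 1
            else:
                # Step 3: mode i is closer, so it displaces the existing match.
                mapping[b] = i
                matched[i] = True
                matched[current] = False
                index[current] += 1  # the displaced mode moves on to its next candidate
    return mapping


if __name__ == "__main__":
    # Mode 1 of A (101 Hz) displaces mode 0 of A (110 Hz) as the match for 100 Hz.
    print(find_unique_mapping([110.0, 101.0], [100.0, 500.0]))  # [1, 0]
```

Because an advancing `index` entry never moves backwards, no pair of modes is compared twice, which is the property the text relies on for termination.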
Figure C.2: difference matrix. Differences in frequencies between all modes in Model A and all modes in Model B are stored in this array. For example, difference[3][4] is the difference between $F_{A,3}$ and $F_{B,4}$. [Figure annotation: The difference between modes A3 and B4 is 6.]

Figure C.3: order array. Each column of the order array contains the indices of modes in Model B from nearest to farthest of a mode in Model A. For example, order[3][4] is the index of the 3rd nearest mode in Model B to $F_{A,4}$. [Figure annotation: Mode B1 is the third nearest to mode A4.]

Figure C.4: index array. Each element contains an index into the order array at which to select the next mode from Model B for comparison. Each element is non-decreasing, thus eliminating cycles where two modes are repeatedly compared against another pair. [Figure annotation: Check A1 against the 3rd closest mode next iteration.]

Figure C.5: matched array. Indicates which modes of Model A have been matched to a mode in Model B. The algorithm terminates when all elements of this array are true.

Figure C.6: mapping array. When the algorithm terminates, this array maps modes in Model A to modes in Model B. For example, mapping[1] contains the index of the mode in Model A which maps to $F_{B,1}$. [Figure annotation: Mode B1 is mapped to mode A2.]

The pseudo-code in Figure C.7 uses these five arrays to implement the algorithm of Figure C.1. Each unmatched mode i in model A is examined in sequence. The j = index[i]th mode listed in column i of the order matrix is checked against the mapping array. If no mapping exists (i.e., mapping[b] = -1, where b = order[i][j]), the mapping is set to mode i, mode i is marked as matched (i.e., matched[i] = true) and the next mode in model A is examined (i.e., i = i + 1).
If a mapping exists, the difference between the currently mapped modes is compared to the difference between modes i and b (using the difference matrix). If the current mapping's difference is less than that of the proposed new mapping, index[i] is incremented, and the next nearest mode j is examined. Otherwise, mapping[b] is set to i, and the previously mapped mode is marked as unmatched. This process continues until all modes in model A are matched. If the last mode i in model A is examined before the mapping is complete, i is reset to the first unmapped mode in model A and the loop continues.

Since models A and B contain the same number of modes, each of which is uniquely mapped to one other mode, the algorithm is guaranteed to terminate. The index array prevents two pairs of modes from being compared twice, thereby eliminating potential cycles.

    int[] FindUniqueMapping(double[] modesOfA, double[] modesOfB)
    {
        // difference[a][b] is the difference between frequency
        // modes a (from Model A) and b (from Model B)
        difference = calculateDistance(modesOfA, modesOfB);

        // order[a][i] is the index of the ith nearest mode in Model B
        // to mode a in Model A
        order = sort(difference);

        // index[a] is the index of the next mode in the order
        // array for mode a
        int[] index;
        // Initialise all elements of index to 0
        index is all {0};

        // matched[a] indicates whether mode a in Model A has
        // been matched yet
        boolean[] matched;
        // Initialise all elements of matched to false
        matched is all {false};

        // mapping[b] is the index of the mode in Model A that best
        // maps to mode b of Model B
        int[] mapping;
        // Initialise all elements of mapping to -1
        mapping is all {-1};

        /** ALGORITHM CORE BEGINS HERE **/
        // Iterate until all modes in Model A are matched
        while (matched is not all {true})
        {
            // For each unmatched mode in Model A...
            for (i = 0 to modesOfA.length)
            {
                if (matched[i] == false)
                {
                    // Find nearest unmapped mode in Model B
                    for (j = index[i] to order[i].length)
                    {
                        // b is the index of the next nearest mode in Model B
                        b = order[i][j];
                        // a is the index of the mode in Model A to which b
                        // is mapped (if any)
                        a = mapping[b];
                        // if mode b is unmapped, or if it is mapped
                        // to a mode in Model A which is farther, map it to
                        // mode i
                        if ((a == -1) OR (difference[a][b] > difference[i][b]))
                        {
                            mapping[b] = i;
                            if (a > -1) then matched[a] = false;
                            matched[i] = true;
                            index[i] = j + 1;
                            break j loop;
                        }
                    } // end j loop
                }
            } // end i loop
        }
        return mapping;
    }

Figure C.7: FindUniqueMapping pseudo-code.
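For readers who want to experiment, the pseudo-code in Figure C.7 can be transcribed into runnable Python roughly as follows. This is a sketch under stated assumptions rather than the thesis code: the name `find_unique_mapping`, the absolute-value difference, the equal-length check and the example frequencies are all introduced here for illustration, and indices are zero-based.

```python
def find_unique_mapping(modes_a, modes_b):
    """Return mapping, where mapping[b] is the index of the model A mode
    matched to mode b of model B (one-to-one)."""
    if len(modes_a) != len(modes_b):
        # Assumed guard: the text requires equal numbers of modes.
        raise ValueError("both models must have the same number of modes")
    n = len(modes_a)

    # difference[a][b]: absolute frequency difference (absolute value assumed)
    difference = [[abs(fa - fb) for fb in modes_b] for fa in modes_a]
    # order[a]: indices of model B modes, nearest first, for mode a
    order = [sorted(range(n), key=row.__getitem__) for row in difference]

    index = [0] * n        # next position in order[a] to check
    matched = [False] * n  # whether mode a of model A is matched
    mapping = [-1] * n     # -1 marks an unmatched mode of model B

    while not all(matched):
        for i in range(n):
            if matched[i]:
                continue
            for j in range(index[i], n):
                b = order[i][j]
                a = mapping[b]
                # Take b if it is free, or if its current partner is farther.
                if a == -1 or difference[a][b] > difference[i][b]:
                    mapping[b] = i
                    if a > -1:
                        matched[a] = False  # displaced mode must be re-matched
                    matched[i] = True
                    index[i] = j + 1
                    break
    return mapping


# Hypothetical frequencies: modes 0 and 1 of A both prefer mode 0 of B.
result = find_unique_mapping([100.0, 110.0, 300.0], [108.0, 200.0, 310.0])
```

In this example, mode 1 of model A (110 Hz) displaces mode 0 (100 Hz) from the 108 Hz mode of model B, and mode 0 is then matched to its next nearest candidate on the following pass of the outer loop.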

