Automatic Measurement and Modelling of Contact Sounds

by Joshua L. Richmond
B.A.Sc., University of Waterloo, 1998

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Faculty of Graduate Studies (Department of Computer Science).

We accept this thesis as conforming to the required standard.

The University of British Columbia
August 2000
© Joshua L. Richmond, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

Abstract

Sound plays an important role in our everyday interactions with the environment. Sound models enable virtual objects to produce realistic sounds. The manual creation of sound models from real objects is tedious and inaccurate. A brief review of sound models is presented, with details of a sound model for contact sounds.

This thesis documents the development of a system for the automatic acquisition of sound models. The system is composed of four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. A description of each module and its requirements is included. Implementations of each module are tested and explained.

Results of typical data collections are discussed. Sound models for a calibration object, brass vase, plastic speaker and toy drum are constructed using the system. Comparisons of the sound models to the original recordings are displayed for each object.

Under ideal circumstances the system produces accurate sound models. Environmental noise, however, decreases the accuracy of the estimation technique. An evaluation of the parameter estimation algorithm confirms this observation. Many opportunities exist for future work on this system. Ideas for improvements and future investigations are suggested.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction

2 Contact Sounds
  2.1 Overview
  2.2 Related Work
  2.3 A Contact Sound Model
  2.4 Empirical Parameter Estimation
    2.4.1 Performance Evaluation

3 A System for Automatic Measurement of Contact Sounds
  3.1 Objectives
  3.2 System Requirements
  3.3 System Overview

4 A Sound Acquisition Device
  4.1 Overview
  4.2 Related Work
  4.3 Requirements
  4.4 The Active Measurement Facility
  4.5 Sound Effector
  4.6 Sound Capture Hardware
  4.7 Sound Effector Software

5 An Asynchronous Data Server
  5.1 Overview
  5.2 Requirements
    5.2.1 Generic Data Server Requirements
    5.2.2 Sound Server Requirements
  5.3 Server Architecture
    5.3.1 Server Execution
    5.3.2 Data Flow
    5.3.3 Specialisation to Sound Data
  5.4 Implementation Details
    5.4.1 Performance Evaluation
    5.4.2 Limitations

6 Building a Prototypical Model
  6.1 Overview
  6.2 Related Work
  6.3 Spectrogram Averaging
  6.4 Performance Evaluation

7 An Adaptive Sampling Algorithm
  7.1 Overview
  7.2 Related Work
  7.3 Requirements
  7.4 Surface Representation
  7.5 Acoustic Distance Metrics
    7.5.1 Frequency-Independent Damping Coefficient
    7.5.2 Frequency Similarity
  7.6 Perceptual Thresholds

8 Sample Data Collections
  8.1 Overview
  8.2 Tuning Fork
    8.2.1 Estimation Results
  8.3 Brass Vase
    8.3.1 Estimation Results
    8.3.2 Refinement Results
  8.4 Plastic Speaker
    8.4.1 Estimation Results
    8.4.2 Refinement Results
  8.5 Toy Drum
    8.5.1 Estimation Results
    8.5.2 Refinement Results

9 Conclusions
  9.1 Overview
  9.2 Future Work

Bibliography

Appendix A Sound Effector Specifications
  A.1 Mounting Bracket
  A.2 Control Circuit

Appendix B Effect of White Noise on Spectrograms

Appendix C Details of Unique Frequency-Mapping Algorithm

List of Tables

5.1 Results of sound server performance evaluation
7.1 Regression of similarity on frequency and decay difference
7.2 Examples of calculated similarity factors
8.1 Summary of setup parameters for test objects
8.2 Summary of refinement results

List of Figures

2.1 Attributes of contact sounds
2.2 Parameter estimation algorithm
2.3 Mean errors of the parameter estimation algorithm on synthetic data
3.1 Sound measurement-to-production pipeline
3.2 General procedure for sound model creation
3.3 A test object on the ACME test station
4.1 The sound effector
4.2 DigitalOutputConnection server architecture
5.1 Data server architecture
5.2 Data server state diagram
5.3 Control loop for RUNNING state
5.4 Control loop for CAPTURING state
5.5 Server data flow
5.6 One AudioPacket
5.7 Spectrogram of sound containing "pops"
6.1 Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data
6.2 Standard deviation of error for estimated damping parameters
7.1 Adaptive sampling algorithm
7.2 Result of adaptive sampling
7.3 Loop scheme refinement masks for subdivision
7.4 Algorithm to find unique frequency mapping between two models
8.1 Setup for acquiring model of tuning fork
8.2 Recorded spectrogram
8.3 Results of tuning fork collection
8.4 A photo of the brass vase and the subdivision surface which represents it
8.5 Setup for acquiring sound model of brass vase
8.6 Results of brass vase experiment
8.7 Detail of narrow-band low-frequency noise
8.8 Effect of noise on low-frequency modes
8.9 Refinement results of brass vase experiment
8.10 A photo of the plastic speaker and the subdivision surface which represents it
8.11 Results of plastic speaker experiment
8.12 Refinement results of plastic speaker experiment
8.13 A photo of the toy drum and the subdivision surface which represents it
8.14 Pivot of the drum's metal bars
8.15 Results of toy drum experiment (metal)
8.16 Results of toy drum experiment (plastic)
8.17 Refinement results of toy drum
A.1 Bending schedule for sound effector mounting bracket
A.2 Schematic for solenoid control circuit
B.1 Effect of white noise on spectrograms
C.1 Algorithm to find unique frequency mapping between two models
C.2 Difference matrix
C.3 Order array
C.4 Index array
C.5 Matched array
C.6 Mapping array
C.7 FindUniqueMapping pseudo-code

Acknowledgements

"Great discoveries and improvements invariably involve the cooperation of many minds." - Alexander Graham Bell

This thesis is the result of the encouragement and support of dozens of my friends. Without the contributions listed below, this thesis may have never proceeded beyond the course project from which it began.

I owe ACME members past and present a huge thank you! In particular, Jochen Lang and John Lloyd contributed the bulk of the ACME software within which my experiments resided. Thank you for the seemingly endless discussions regarding the sensor class, mysterious aborts and assorted odd ACME behaviours. Derek DiFilippo, Paul Kry and Doug James contributed many focused ideas when mine were fuzzy. Thanks for all the insight into sound, surfaces and life!

My supervisor, Dinesh Pai, played an enormous role in the development of this thesis. His thoughtful suggestions and advice steered me out of dark corners whenever I got lost.
T h a n k you for being a fantastic supervisor and friend! M a n y other friends supplied advice and support when I needed it most over these past two years. J i m Green and J a c o b Ofir were constant barometers of my progress. B o n n i e T a y l o r , K r i s t e n P l a y f o r d and A n d r e a B u n t kept me going through tough times. K e l l y C r o n i n gave me a good reason to finish. A n d of course, the rest of the faculty, staff and students of the department always made me feel welcome. T h i s thesis would not have been possible w i t h o u t the generous financial contributions of N S E R C , B C A d v a n c e d Systems Institute and I R I S .  JOSHUA L . RICHMOND  The University of British Columbia August 2000  xi  Dedicated  to my family:  G a r y , C a r o l and E r i c a R i c h m o n d .  T h a n k you for your unwavering encouragement and s u p p o r t . Y o u are terrific examples of success in work, love and life.  xii  Chapter 1  Introduction W h e n radio dramatists began adding sound effects to their productions in the 1920's, they demonstrated the i m p o r t a n t role of sound in our perception of everyday events and environments [20]. W i t h o u t visual aid, listeners could easily identify unspoken actions, e.g., a character's entrance into a room by the sound of a door opening, followed the sound of footsteps.  Even w i t h the advent of m o t i o n pictures in 1896,  it was quickly realized that the rattles, pings and knocks of events were needed to cement the realism of the viewing experience . W h a t the entertainment 1  industry  has learned is t h a t sound is a critical component of our everyday lives. T h e ubiquity of sound in our lives is not surprising when the physics of contact is considered. W h e n contact occurs between two objects, the energy of the i m p a c t is transferred to each object. T h i s energy propagates t h r o u g h the objects, producing vibrations of their surfaces. T h e frequency, a m p l i t u d e and decay of these vibrations depend, in part, on the shape and material of the object [31].  These  surface vibrations create fluctuations in the air pressure around each object and are perceived as sound. Such "contact sounds" provide a listener w i t h much information: location of contact, contact force, material composition of the object as well as its The emergence of sound in film in 1926 was delayed only by the technical difficulties of synchronising recorded sound and film. In fact, Thomas Edison viewed moving pictures as an accessory to the phonograph! [29] 1  1  shape, size and surface texture. H u m a n s can use these audio cues to recognise events and discriminate between materials [10, 12, 16]. G i v e n the i m p o r t a n c e of sound to real-life interactions, it is clear t h a t sound is a required component of any successful v i r t u a l environment or s i m u l a t i o n . One approach to i n c l u d i n g sound in v i r t u a l environments is synthesis from physics-based models [30]. A sound m o d e l of an object can synthesise appropriate contact sounds, given a contact force and contact location, by direct c o m p u t a t i o n . B y "sound m o d e l " we mean a m a t h e m a t i c a l representation  that can be used for simulation of the  object's sound. T h i s is in contrast to an approach where recorded samples are simply replayed, or modulated to account for various forces and locations. 
A n advantage of the model-based approach is that one model can be used to generate sounds for any number of different interactions (e.g., scraping, pinging, rolling) by s u b s t i t u t i n g the appropriate force profile. T h e parameters of a sound model can be derived a n a l y t i c a l l y for objects w i t h simple geometry and material composition [30]. M o s t everyday objects, however, have complex geometry and material composition which complicate a n a l y t i c a l solutions.  F o r such objects, model parameters can be estimated from e m p i r i c a l  measurements [30]. Recent work on robotic perception has determined that sound models can also be used to a u t o m a t i c a l l y identify materials [8, 17]. A sound model parameterised over the surface of an object can be viewed as an acoustic m a p of the object, analogous to a texture m a p in graphics. T o create such a sound model of an object from empirical measurements requires recording sounds at hundreds of locations over the object's surface. T o populate even a simple v i r t u a l environment w i t h sonified objects could therefore require thousands of measurements. T h e system described in this thesis automates the sound  measurement  procedure. T o our knowledge, this system is the first to automatically  create c o m -  plete sound models of objects. T h i s sound model can be registered to other models  2  of an object (e.g., surface, deformation) to represent a complete reality-based model. Parts of this work were published previously: [25], [26]. A review of sound models and measurement techniques is presented in C h a p ter 2. T h e requirements of a sound measurement system and an overview of our system are included in C h a p t e r 3. T h e components of the system are subsequently described: the sound acquisition device (Chapter 4). asynchronous d a t a server ( C h a p ter 5), prototypical m o d e l generation (Chapter 6) and adaptive s a m p l i n g a l g o r i t h m (Chapter 7).  Results of t y p i c a l d a t a collections are presented in C h a p t e r 8, and  conclusions on the work are stated in C h a p t e r 9.  3  Chapter 2  Contact Sounds 2.1  Overview  C o n t a c t sounds are the sounds produced by our everyday interactions w i t h the environment. of everyday  W e use the phrase "everyday" i n the context of G a v e r ' s definition listening:  the perception of events from the sounds they make [9].  Just as he contrasts everyday listening to musical sounds from musical sounds.  listening,  we distinguish contact  M u s i c a l sounds are themselves the intended result  of some action; contact sounds are the unintentional consequence of an a c t i o n . In general, musical sounds are harmonic while contact sounds are i n h a r m o n i c . These distinctions are not strict, but serve to separate the d o m a i n of our task from the modelling of musical instruments. A s an illustrative example of contact sounds, imagine the sound produced by setting your coffee m u g onto y o u r desk, or pushing it across the desk. T h e sound of the mug s t r i k i n g the desk, and the scraping sound of it being pushed, are both contact sounds. T h e ease w i t h which these sounds are brought to m i n d emphasises the importance of sound to our everyday experiences; though you may have never consciously attended to the sounds in real life, they can be easily recalled. M u c h useful information is conveyed by contact sounds. G a v e r was the first to enumerate this information in [9]. 
He classified the information into three cat4  Material  Restoring  Density  Force  Configuration  Interaction  Damping  A  Internal  Type  Structure  Force  Shape  Size  Resonating Cavities  F i g u r e 2.1: A t t r i b u t e s of contact sounds. G a v e r identified three categories of information conveyed by contact sounds: m a t e r i a l , interaction and configuration. Each category is composed of more p r i m i t i v e dimensions as shown. Adapted from [9].  egories: material, interaction and configuration. These categories are expanded in F i g u r e 2.1. Such information is beneficial to m a n y applications. F o r instance, it is useful feedback in teleoperative systems. A s mentioned previously, the addition of contact sounds to v i r t u a l environments and simulations enhances the realism of the environment. D u r s t and K r o t k o v also demonstrated the use of contact sound models for automatic material classification [8]. F u r t h e r m o r e , recent work has related the parameters of a contact sound model to h u m a n perception of material [16, 12]. T h e system described by this thesis a u t o m a t i c a l l y acquires the measurements required to create contact sound models.  B y "contact sound m o d e l " , we mean a  m a t h e m a t i c a l representation t h a t can be used to synthesise the sound produced by contact between two objects.  T h e model is parameterised over the surface of an  object, much like a texture m a p in graphics. T h i s chapter is an overview of the m a t h e m a t i c a l model we use to represent contact sounds. A brief description of the model and its parameters is given. T h e algorithm for estimating the model's parameters from recorded samples is also explained.  2.2  Related Work  M u c h research has been conducted in the field of sound modelling. In particular, many people have modelled musical instruments.  5  P h y s i c a l modelling is possible  for a variety of musical instruments [1]. A s one example, C h a i g n e and D o u t a u t modelled the acoustic response of wooden xylophone bars using a one-dimensional E u l e r - B e r n o u l l i equation with the addition of two d a m p i n g terms and a restoring force [3].  T h i s formulation is derived from the geometry of the xylophone bars,  w i t h empirical values for the d a m p i n g parameters.  A n o t h e r example is C o o k and  T r u e m a n who used principal component analysis, Infinite Impulse Response (IIR) filter estimation and warped linear predication methods to model the directional impulse response of stringed instruments from empirical d a t a [4]. G a v e r introduced the modelling of everyday sounds [9]. H i s model of metal and wood bars is based on the wave equation w i t h an exponential d a m p i n g term for each frequency mode. T h e fundamental mode, its i n i t i a l a m p l i t u d e and d a m p i n g factor are determined empirically. P a r t i a l modes are calculated as ratios of the fundamental mode. Recently, a physics-based model for contact sounds of everyday objects was developed by van den Doel [30, 31]. T h e model of van den D o e l was selected for use in our work, and will be described in the subsequent sections of this chapter. T h e model of van den D o e l is appealing for several reasons.  F i r s t , it is  similar in structure to models used by G a v e r [9], Hermes [12] and K l a t z k y et al. [16] in their perception experiments.  
Such perceptual studies may yield results useful to evaluating the performance of our system. Furthermore, these studies provide information applicable to our adaptive sampling algorithm as described in Chapter 7. Secondly, the model is suitable for synthesising sounds in real-time, a necessary component of interactive simulations. Finally, it was selected for its effectiveness at representing contact sounds of everyday objects.

It should be noted that the measurement system described herein is not restricted to producing one specific sound model. Any sound model whose parameters can be estimated from recordings of acoustic impulse responses may be created using this system.

2.3 A Contact Sound Model

Formulating a general sound model is complex because it depends on many parameters: material, geometry, mass distribution, etc. This section follows the development of the contact sound model presented in [30, 31].

If we assume linear-elastic behaviour (a reasonable assumption for light contacts), the vibration of an object's surface is characterised by the function $\mu(z,t)$. Here $\mu$ represents the deviation of the surface from equilibrium point $z$ at time $t$. $\mu$ obeys a wave equation of the form in Equation 2.1, where $A$ is a self-adjoint differential operator and $c$ is a constant related to the speed of sound in the material [31].

\[ \left( A - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} \right) \mu(z,t) = F(z,t) \tag{2.1} \]

In the absence of external forces, Equation 2.1 can be solved by the expression

\[ \mu(z,t) = \sum_{i=1}^{\infty} \big( a_i \sin(\omega_i c t) + b_i \cos(\omega_i c t) \big) \, \Psi_i(z), \tag{2.2} \]

where $a_i$ and $b_i$ are determined by boundary conditions, $\omega_i$ are related to the eigenvalues of operator $A$ and $\Psi_i(z)$ are the corresponding eigenfunctions.

When the object is sufficiently far from the listener, the sound pressure due to this solution can be approximated by the impulse response function in Equation 2.3. The exponential term is added to model material damping. Complete details of this derivation are provided in [30].

\[ p(x,t) = \sum_{i=1}^{N_f} a_{x,i} \, e^{-d_{x,i} t} \sin(\omega_{x,i} t) \tag{2.3} \]

This model expresses the sound pressure $p$ at time $t$ as the sum of $N_f$ frequency modes. Here $x$ is the location of the force impulse. Each mode $i$ in the model has a frequency $\omega_{x,i}$, initial amplitude $a_{x,i}$ and exponential damping factor $d_{x,i}$. The damping factor models material damping due to internal friction; in the literature [32], this is parameterised by an internal friction parameter $\phi$ as expressed in Equation 2.4.

\[ d_{x,i} = 2\,\omega_{x,i} \tan(\phi) \tag{2.4} \]

For each location $x$ on the surface of an object, the parameters $\omega_{x,i}$, $a_{x,i}$ and $d_{x,i}$ must be estimated for all $N_f$ modes.

Because the model is an impulse response model, sounds produced by any linear force interaction can be synthesised by a simple convolution of the force and the model.

2.4 Empirical Parameter Estimation

The parameters for the sound model as it is expressed in Equation 2.1 can be determined analytically by Equation 2.2 for objects of simple geometry and material composition [30]. Of course, to solve this expression for arbitrary everyday objects is not feasible. The alternative is to estimate the parameters of a similar model (i.e., Equation 2.3) from empirical measurements. That is, the object is struck at position $x$, the resulting sound recorded, and the parameters estimated from the recording. Examples of such techniques are [4, 8, 30].

Since we are using the model of van den Doel, we will use the parameter estimation algorithm described in the same work. An overview of the algorithm is described in this section. Figure 2.2 lists the steps of the algorithm; a brief explanation of each step follows. Readers requiring more detail are referred to [30].

Figure 2.2: Parameter estimation algorithm.
1. Compute the windowed DFT spectrogram of the recorded sample.
2. Identify the signal.
3. Estimate the frequency modes.
4. Estimate the damping parameters.
5. Estimate the initial amplitudes.

In the first step, a spectrogram is computed for the entire recorded sample. The spectrogram is computed by calculating the discrete Fourier transform (DFT) on fixed-width (in time) segments of the recording. The segments are selected using overlapping Hanning windows. For details on Hanning windows and the DFT, a good introduction is [28].

The signal must now be isolated from the background noise (step two). The segment of the spectrogram with maximum intensity is the start of the signal. This peak corresponds to the onset of the impact. The end of the signal is the first segment $k$ whose intensity $A_k$ falls below $\bar{A} + X\sigma(A)$, where $\bar{A}$ is the average intensity of the region before the signal's start, $\sigma(A)$ is the standard deviation of that region and $X$ is a constant with a typical value of 10.

To estimate the frequency modes (step three), a histogram is created. Each segment (in time) of the spectrogram within the signal region casts votes for the $N_f$ frequencies with greatest amplitude within that segment. The $N_f$ frequencies that obtain the most votes over all the segments are selected as the dominant frequency modes ($\omega_{x,i}$) of the model at that location.

For each mode, the log of the spectrogram is fit to the linear function $-\alpha_i k + \beta_i$, where $k$ is an index into the segments of the spectrogram, and the signal starts at segment $k = 0$. The damping coefficients are then calculated by Equation 2.5,

\[ d_{x,i} = \alpha_i\, Q\, f_s / N, \tag{2.5} \]

where $Q$ is the window overlap factor of the DFT, $f_s$ is the sampling frequency and $N$ is the size of the DFT window.

The initial amplitude of each frequency mode can then be calculated by Equation 2.6,

\[ a_{x,i} = \ldots, \tag{2.6} \]

with $p_{x,i} = d_{x,i} N / f_s$.

As stated, the algorithm assumes the recording is the response to an impulsive impact. The algorithm can be used with responses to other forces by deconvolving the signal with the force profile.

Again, the measurement system described by this thesis could use any suitable parameter estimation technique, and is not restricted to the method outlined above. This technique was chosen because it is designed specifically for the selected sound model.

2.4.1 Performance Evaluation

An evaluation of the estimation algorithm was performed using synthetic data to determine its robustness to noise. Since the real samples will be recorded in a relatively noisy environment, this evaluation is necessary to estimate the expected performance of the algorithm with real data.
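Since both the evaluation below and the eventual resynthesis of contact sounds rest on Equation 2.3, a minimal synthesiser for that model is sketched here. This is an illustration only; the thesis's own evaluation code was written in Matlab, and the class and method names below are hypothetical rather than part of ACME.

    // Minimal sketch of Equation 2.3: p(t) = sum_i a_i * exp(-d_i t) * sin(omega_i t).
    // Frequencies are given in Hz and converted to angular frequency internally.
    public final class ModalSynthesizer {

        /** Synthesise an impulse response from one location's modal parameters. */
        public static double[] synthesize(double[] freqHz, double[] damping,
                                          double[] amplitude, double sampleRate,
                                          double durationSeconds) {
            int n = (int) (sampleRate * durationSeconds);
            double[] p = new double[n];
            for (int k = 0; k < n; k++) {
                double t = k / sampleRate;
                double sum = 0.0;
                for (int i = 0; i < freqHz.length; i++) {
                    sum += amplitude[i]
                         * Math.exp(-damping[i] * t)
                         * Math.sin(2.0 * Math.PI * freqHz[i] * t);
                }
                p[k] = sum;
            }
            return p;
        }
    }

As noted above, sounds for non-impulsive interactions would then be obtained by convolving such an impulse response with the force profile.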
To evaluate the estimation algorithm, a test signal is synthesised using the model in Equation 2.3. Fifty frequency modes are randomly selected from a Gaussian distribution ($\mu$ = 7 kHz, $\sigma$ = 4 kHz). For each frequency mode, damping factors ($d_i$) and initial amplitudes ($a_i$) are randomly selected from uniform distributions over the ranges [10, 30] and [0, 1] respectively. The initial amplitudes are scaled so that their sum is 1.0. Noise is added to this test signal at a specified signal-to-noise ratio.

Most of the noise in the ACME environment originates from fans used to cool equipment. A quick spectral analysis of the room revealed the highest concentration of noise in a band from 0 to 200 Hz. Ambient white noise was present, though at lower energy levels. This room noise is approximated in our simulation by low-pass Gaussian noise, band-limited at 200 Hz by a fourth-order filter. Broadband white noise is added at 1/100th of the normalised amplitude of the low-frequency noise.

One hundred trials of the evaluation were executed. For each trial, a test signal was synthesised with noise added at eight signal-to-noise levels: infinity (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5. (The signal-to-noise ratio is calculated as the maximum signal amplitude divided by the maximum noise amplitude. This yields an appropriate measure of signal-to-noise for our experiments since the signal decays to zero amplitude.) The noisy signal was then processed by the estimation algorithm outlined in Section 2.4. The entire test was implemented in Matlab.

The estimated model parameters are compared by the following metrics. Each estimated frequency mode $\hat{f}_i$ is assumed to be the estimation of the generated frequency mode $f_j$ that minimises the difference $|\hat{f}_i - f_j|$ over all $j = 1 \ldots 50$. A logarithmic error ratio (Equation 2.7) was computed for each estimated model parameter: $E_f$, $E_a$ and $E_d$ denote the error ratios of the frequency, initial-amplitude and damping estimates respectively, where $j$ is chosen to minimise $|\hat{f}_i - f_j|$ for all $j = 1 \ldots 50$. The means of these error ratios are plotted in Figure 2.3.

As illustrated by Figure 2.3, the parameter estimation algorithm is extremely sensitive to noise. For estimates of initial amplitude and frequency, the error increases consistently with increasing levels of noise (Figure 2.3 (a, b)). The trend is not as consistent for the damping parameter estimate (Figure 2.3 (c)). However, as the dashed line in Figure 2.3 (c) indicates, the variance of estimation error increases dramatically with increasing noise. Most importantly, it should be noted that frequency estimates have an error ratio of almost 30 at even modest levels of noise (SNR = 100).

Figure 2.3: Mean errors of the parameter estimation algorithm on synthetic data: (a) mean initial amplitude errors ($E_a$), (b) mean frequency errors ($E_f$), (c) mean damping errors ($E_d$). The mean error over 100 trials is plotted for eight signal-to-noise ratios: infinity, 100, 50, 30, 20, 15, 10 and 5. For convenience, the inverse ratio (noise/signal) is used as the abscissa. Sub-figure (c) also plots the standard deviation of error in the damping parameter estimate. See Equation 2.7 for definitions of $E_f$, $E_a$ and $E_d$.

Chapter 3

A System for Automatic Measurement of Contact Sounds

3.1 Objectives

The goal of this work is to produce a system which automatically acquires contact sound measurements over the surface of an arbitrary object for the purpose of creating a sound model. This system is one component of a larger project focused on creating complete reality-based models. A reality-based model is one whose parameters are calculated from empirical measurements of real objects. A complete reality-based model is inherently multi-modal, with sound comprising only one part. Other modes could include surface texture, shape and deformation.

As part of the reality-based modelling project, our sound measurement system shares its development platform: the Active Measurement (ACME) facility at the University of British Columbia (UBC). The ACME facility is a fifteen degree-of-freedom (DOF) robot designed to acquire measurements for reality-based model creation [21]. The facility is equipped with a 5-DOF gantry, a 6-DOF robot arm on a linear stage and a 3-DOF test stage. These actuators are used to position sensors and make measurements of objects mounted on the test station. Sensors include a 3-CCD colour camera, a Triclops [22] trinocular vision system and a 6-DOF force/torque sensor.

Figure 3.1: The sound measurement-to-production pipeline. Measurements at many locations on an object's surface (x, y, z) are acquired to create a model at each location. Force and position input from a simulation are used to synthesise sounds produced by interactions with the object. A mixer algorithm interpolates between sound models at different locations. Environmental effects such as reverb and spatialisation enhance the realism of the synthesised sound. Our scope is restricted to measurement and modelling (shaded box).

The work that is the subject of this thesis includes the design and implementation of a contact sound measurement system for ACME. As described in Section 3.3, the sound measurement system includes all the hardware and software needed to meet the design requirements listed in Section 3.2.

The scope of the project is restricted to a system for the measurement of contact sounds. Related research, including sound synthesis, interpolation between sound models and modelling of the environment, is outside the scope of this thesis. Figure 3.1 places our project in the context of the sound measurement-to-production pipeline.

3.2 System Requirements

To be successful, a sound measurement system must be able to deliver low-inertia, near-impulsive impacts over the surface of an arbitrary object. Models generated by the system must be registered to the surface of the object for integration with other modes of the complete reality-based model. The force profile of each impact must be known so that the parameters of a sound model can be estimated as described in Section 2.4.
The system should acquire models automatically. Enough samples should be collected to adequately represent the variation in sound over the entire outer surface of the object.

Since the system will be a component of ACME, all devices used by the system must be controllable under the ACME framework. That is, an interface to every device must exist in the ACME Java architecture and be operable from a remote workstation.

It is assumed that a surface representation of the test object is provided to the sound measurement system by the ACME trinocular camera. This aspect of the system is not an objective of the current research and is not discussed herein.

More detailed requirements of each component of the sound measurement system are presented in the respective chapters.

3.3 System Overview

A sound measurement system was designed and implemented to meet the requirements listed in Section 3.2. The system can be viewed as four modules: a sound acquisition device, an asynchronous data server, an algorithm for computing prototypical sound models and an adaptive sampling algorithm. Each of these modules will be discussed in the following four chapters.

Chapter 4 outlines the requirements and implementation of the sound acquisition device. This includes the design of a sound effector for the ACME robot arm, the software to control it and the selection and location of microphones. The sound effector is a solenoid that mounts on the end of the Puma 260 robot arm and delivers near-impulsive impacts to objects under computer control.

A description of the asynchronous data server is provided in Chapter 5. The data server architecture is adaptable to many types of data. Its design and specialisation to sound data are discussed.

One advantage of using a robotic system is the ability to record multiple samples at the same surface location. A collection of samples can be used to generate a prototypical model which best represents the sound at that location. One algorithm for generating prototypical models is explained in Chapter 6.

The final module is an adaptive sampling algorithm for selecting points on the object's surface at which to sample. The sampling algorithm uses the object's surface representation to create a mesh of sample locations. It is adaptive because it uses differences in sound models to adjust the granularity of the sampling mesh. The surface representation and adaptive algorithm are described in Chapter 7.

The general procedure for creating a complete sound model using this system is diagrammed in Figure 3.2. First, a test object is positioned at the center of the test station (Figure 3.3). Typically the object is attached such that it does not move from light contact, yet remains free to vibrate. A surface model of the object is then acquired using the trinocular stereo camera. From this surface model, a coarse sampling mesh is created using the surface's vertices as sample locations. Next, the sound effector is positioned 5 mm away from each sample location. This is done with a guarded move: the sound effector approaches the sample location until contact is sensed, then is retracted 5 mm. Once in position, the sound effector is actuated multiple times, the resultant sound samples recorded, and models computed.
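As a summary of this per-location procedure, the sketch below outlines the measurement loop. It is not ACME code: every class, field and method name is hypothetical, and the prototype-building and refinement steps described next are reduced to single calls.

    // Illustrative outline of the acquisition loop (all APIs hypothetical).
    void acquireSoundModel(SurfaceMesh mesh, int strikesPerVertex) {
        for (Vertex v : mesh.vertices()) {
            arm.guardedMoveToContact(v);     // approach until contact is sensed
            arm.retract(0.005);              // back off 5 mm
            java.util.List<SoundModel> models = new java.util.ArrayList<>();
            for (int i = 0; i < strikesPerVertex; i++) {
                effector.strike();                       // near-impulsive impact
                Recording r = soundServer.capture();     // triggered recording
                models.add(SoundModel.estimateFrom(r));  // Section 2.4
            }
            v.setModel(PrototypicalModel.from(models));  // Chapter 6
        }
        refineWhereNeighboursDiffer(mesh);               // adaptive sampling, Chapter 7
    }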
From the multiple samples, a prototypical model is produced for each sample location. If two prototypical models at adjoining vertices of the surface mesh differ by too much, the sampling mesh is refined, and a new model is acquired at a position between the two coarse locations. This procedure continues until no further refinements are required.

The complete sound model produced by this procedure is easily registered to other modes of the object model since it creates sound models at vertices of the object's surface model.

Figure 3.2: General procedure for sound model creation (move the sound effector back 5 mm; actuate the sound effector and record a sample; create a sound model, repeated M times; create a prototypical model; compute the acoustic distance to neighbouring sample locations).

Figure 3.3: A test object on the ACME test station.

Results of some typical data collections are presented in Chapter 8. The accompanying CD contains audio tracks and an application comparing recorded sounds to sounds synthesised from a model.

Chapter 4

A Sound Acquisition Device

4.1 Overview

In order to estimate the parameters of an object's sound model, a recording of the object being struck with a known force is required. The process of estimating parameters from the recording is described in Section 2.4. Since our goal is to create sound models automatically, we require a device which is capable of striking arbitrary objects in a manner suitable for the estimation process. A device is also required to record the resultant sounds. More detailed requirements of these devices are listed in Section 4.3.

This chapter describes a device that was designed to strike objects for the purpose of sound measurement. The device is an end effector for the Puma 260 robot arm which is part of the Active Measurement facility at UBC. The Active Measurement facility is described in Section 4.4. Details of the end effector and sound capturing hardware are presented in Sections 4.5 and 4.6. Section 4.7 describes the software required to control the end effector. The software used to record the sound samples is the topic of Chapter 5.

4.2 Related Work

Although this is the first work to automatically create sound models over the entire surface of an object, other people have investigated sound model creation from recordings of real objects. The mechanisms by which these recordings were produced are varied.

Cook and Trueman recorded directional impulses from a number of acoustic instruments by striking the instruments' strings with a Modal Shop Model 086C80 miniature force hammer [4]. The sound was recorded using a twelve-microphone icosahedral grid assembly. Recordings were stored using two Tascam DA-88 digital audio tape recorders. It was not mentioned how precisely the impact locations were registered in space, since registration was not important to the study. Impacts were performed manually, and the force profile of each impact was used to filter "bad" impacts. The commercial force hammer used by Cook and Trueman is not suited to our task because it cannot effect an impact under its own power. While it could be mounted at the end of a robot arm, the swinging of a hammer is difficult to control.
D u r s t and K r o t k o v created impulse response models of different materials by dropping an a l u m i n u m cane w i t h a plastic t i p onto each material [8]. T h e cane was dropped from a constant height through a cylindrical guide [17]. Recordings were made using an omni-directional condenser microphone connected to a P C - b a s e d A / D board. T h i s method is clearly not suitable for our task. van den D o e l created sound models from impacts w i t h a non-instrumented hammer [30]. A g a i n , the hammer mechanism is not suited to our a p p l i c a t i o n ; this experiment is mentioned because satisfactory results were obtained w i t h only a crude a p p r o x i m a t i o n of the i m p a c t force. H u a n g uses a t a p p i n g device to position and orient objects in a plane [13]. His requirements of the device are similar to ours, except he does not require the device to operate off the horizontal plane. T h e mechanism he proposes is similar to a pinball plunger; a spring-loaded rod is released by an electric latch, then reloaded  21  automatically [ W . H u a n g , personal c o m m u n i c a t i o n , A p r i l 28. 2000]. Unfortunately, his device may not operate correctly w i t h any vertical inclination.  4.3  Requirements  The s t r i k i n g device must deliver l o w - i n e r t i a impacts to objects at any position on their surface. T h i s requires operation t h r o u g h a ± 9 0 ° vertical range. T h e force of impact should not be so strong as to move the object, yet strong enough to produce a recordable sound. T h e quantity of force is dependent on the material being measured. The force must be near-impulsive. A true impulse is not realisable, but may be approximated by a force t h a t is well localised in space and time. A s demonstrated by van den D o e l , such approximate impulses yield satisfactory sound models [30]. To permit future experiments, the device should be usable for scraping objects. T h i s will permit acquisition of sounds for granular synthesis, another model for sound synthesis. For integration w i t h other models produced using the A c t i v e Measurement facility, the locations of i m p a c t must be registered to a c o m m o n frame-of-reference.  4.4  The Active Measurement  Facility  A s mentioned in the first chapter, our system is a component of a larger project at U B C : the A c t i v e Measurement facility ( A C M E ) . T h e A C M E facility provides a rich environment for our measurement system. C u r r e n t l y , A C M E consists of three main subsystems: the field measurement system ( F M S ) , test station and contact measurement system ( C M S ) [21]. These subsystems are used as a base platform for the hardware required by the sound measurement system.  T h e roles of each  subsystem are expanded below. The field measurement system is a five degree-of-freedom ( D O F ) robot con-  22  sisting of a 3 - D O F gantry and a p a n / t i l t unit. C u r r e n t l y a 3 C C D colour camera and Triclops trinocular stereo c a m e r a are attached to the p a n / t i l t unit. T h e F M S is used to acquire measurements at a distance from the test object (i.e., in "the field"). A microphone is added to the F M S for m a k i n g sound  measurements.  The test station is a 3 - D O F robot which can position (x, y) and orient objects being measured. Its accuracy is ± 0 . 0 0 0 2 5 " and ± 1 0 arc-min [21]. 
The contact measurement system consists of a 6 - D O F P u m a 260 robot arm w i t h an A T I force/torque sensor mounted at its t i p . T h e C M S can move an end effector into contact w i t h the test object i n a w o r k i n g volume around the test station. To prepare the C M S for sound measurement, the three fans of the P u m a control box were replaced by whisper fans. T h e P u m a a r m cannot be used to directly strike the objects because its inertia prevents light, impulsive impacts. Instead, a special end effector is attached. T h e end effector is described in Section 4.5. E a c h of these subsystems is controlled remotely using Java-based control software. F o r more details on this architecture, refer to [21]. U s i n g the A C M E facility as a development platform provides registration of sound samples to other models (e.g., deformation models) since all A C M E sensors and actuators share a c o m m o n frame-of-reference.  T h e sound model is specifically  registered to the surface model of the test object.  4.5  Sound Effector  We refer to the end effector designed to strike objects for sound measurement as the sound effector (see F i g u r e 4.1). T h i s device is centered on an electric push-solenoid (Ledex S T A model 195025-227). T h e solenoid is augmented w i t h a return-spring and m o u n t i n g bracket. A retaining mechanism is designed into the m o u n t i n g bracket to provide some rigidity in the de-energised state of the solenoid.  T h i s rigidity  enables the sound effector to be used for scraping objects. T h e m o u n t i n g bracket is constructed from a l u m i n u m plate.  23  F i g u r e 4.1: T h e sound effector is a spring-return push solenoid mounted on an alum i n u m bracket. A condenser microphone is attached to the b o t t o m of the bracket.  T h e current design is m i n i m a l so as to reduce the weight of the effector. F o r example, given that a threaded rod is available on the robot's interface plate, the design incorporates the rod for both attachment purposes and as the aforementioned retention device. T h e t o t a l torque applied to the r o d by the effector is approximately 0.0653 N - m (total weight: 106 g) — w i t h i n acceptable l i m i t s of the force/torque sensor (500 g). A blueprint of the m o u n t i n g bracket and details of its construction are included in A p p e n d i x A . A n interface circuit is required to activate the solenoid from software. T h e circuit schematic is included in A p p e n d i x A . T h e interface circuit is connected to a digital output of a Precision M i c r o D y n a m i c s M C 8 b o a r d . T h e board provides a 5 V D C output controllable from its on-board S H A R C D S P or the host computer. C u r r e n t l y the M C 8 board is also used to run a P I D c o n t r o l loop for the F M S and test station.  A description of the software used to c o n t r o l the sound effector is  presented in Section 4.7.  24  4.6  Sound Capture Hardware  T h e sound capture hardware consists of two condenser microphones and a P C sound card. T h e microphones are O p t i m u s omni-directional lapel microphones w i t h a fiat frequency response from 70 to 16 000 H z [14]. O n e microphone is attached to the b o t t o m of the sound effector's mounting bracket (Figure 4.1); the other to the p a n / t i l t unit of the F M S . T h i s placement enables near and far-field recording of i m p a c t sounds. 
T h e microphone on the F M S can be moved to any location around the object for experiments evaluating directional impulse responses. A C r e a t i v e Sound Blaster Live! card is used to record the sounds digitally. T h i s card is commercially available and can sample at up to 46 k H z [5].  Using  a P C sound card facilitates easy and affordable upgrades as technology improves. One disadvantage of the card is that only two channels, of sound may be recorded simultaneously. Furthermore, both channels must use the line-in connection, since the microphone connection is single channelled. T h i s restricts our system to using two microphones, b o t h of which must be pre-amplified to line levels. If more input channels are required, a high-end sound card could be purchased.  4.7  Sound Effector Software  A c t i v a t i o n of the sound effector requires a unit step signal from the digital output of the M C 8 b o a r d . T h e step w i d t h determines the stroke distance of the solenoid. Initially, the step function was to be generated i n the interface electronics. A software solution, however, enables us to change the w i d t h of the unit step, and hence stroke length, at runtime. T o control this output using A C M E requires a J a v a interface compliant w i t h the A C M E D e v i c e interface. T h i s section describes the design of the A C M E D i g i t a l O u t p u t C o n n e c t i o n S e r v e r ( D O C S ) — an interface between A C M E and the digital outputs of the M C 8 b o a r d .  25  T h i s interface is used by the  A C M E Server  MC8  HAV  RMI  F i g u r e 4.2: D i g i t a l O u t p u t C o n n e c t i o n server architecture.  sound effector, but can also be used by other devices requiring similar c o n t r o l (e.g., lights). T h e D i g i t a l O u t p u t C o n n e c t i o n S e r v e r has three main components  (Fig-  ure 4.2): a C o n n e c t i o n S e r v e r , O u t p u t S e r v e r and one or more D i g i t a l O u t p u t D e v i c e s (e.g., sound effector, or spot light). T h e M C 8 board has 32 d i g i t a l o u t p u t lines available for external devices [23]. T h e C o n n e c t i o n S e r v e r manages the allocation of each o u t p u t line to a specific D i g i t a l O u t p u t D e v i c e . T h e O u t p u t S e r v e r is a 'C  program (with a J a v a native interface) which controls the initialisation, t i m -  ing and o u t p u t of the signal to the M C 8 board. T h e D i g i t a l O u t p u t D e v i c e is an abstract i m p l e m e n t a t i o n of the A C M E D e v i c e interface.  E a c h device using the  D i g i t a l O u t p u t C o n n e c t i o n S e r v e r is controlled by a class extending D i g i t a l O u t p u t D e v i c e . E a c h device must lock one channel of the M C 8 for its exclusive use by registering w i t h the C o n n e c t i o n S e r v e r at initialisation. T h e O u t p u t S e r v e r and C o n n e c t i o n S e r v e r run as separate processes from the A C M E server. A l t h o u g h the O u t p u t S e r v e r and C o n n e c t i o n S e r v e r reside on the Solaris computer hosting the M C 8 board, devices can be controlled from A C M E experiments r u n n i n g on any computer because the D i g i t a l O u t p u t D e v i c e s c o m m u nicate to the C o n n e c t i o n S e r v e r using the J a v a Remote M e t h o d Invocation ( R M I ) interface. A s mentioned above, t i m i n g of the unit step is controlled by the O u t p u t S e r -  26  ver.  
T h i s ' C program provides resolution better than a millisecond, l i m i t e d only  by the Solaris operating system. Isolating the timing from the A C M E experiment ensures consistent t i m i n g of the o u t p u t signal.  27  Chapter 5  An Asynchronous Data Server 5.1 The  Overview next module of the system is the software used to record sounds produced by  s t r i k i n g the test object. Previously, no generic architecture existed w i t h i n for c a p t u r i n g streaming d a t a . A s w i t h the DigitalOutputConnectionServer  ACME (see  Section 4.7) a generic d a t a server was designed, and then specialised to sound d a t a for this research. T h e generic d a t a server framework became the Sensor class of the A C M E project. T h i s chapter describes the design and implementation of an asynchronous d a t a server for A C M E , including its specialisation to sound d a t a . T h e next section lists the requirements of such software. Section 5.3 describes the architecture of the d a t a server. Implementation details are discussed in Section 5.4.  5.2  Requirements  T h i s section outlines the requirements of a generic d a t a server (Section 5.2.1) and its specialisation to sound d a t a (Section 5.2.2).  28  5.2.1  G e n e r i c D a t a Server R e q u i r e m e n t s  One d a t a server must exist for each sensor device. If multiple sensor devices of the same data, type exist, each sensor must have its own d a t a server. Each d a t a server will run as a separate process. T h i s division enables distribution of the server processes over multiple computers. If the sensor hardware resides on a computer t h a t is not the m a i n A C M E host, the sensor process should also reside on that computer. Since d a t a collection can be an intensive operation, distribution increases the d a t a servers' ability for real-time collection by allocating more processing resources. D i s t r i b u t i o n also reduces the amount of d a t a which must be streamed over the network in real-time. If platform-dependent software is required for a d a t a server, it must not prevent the A C M E experiment from accessing the d a t a from a different operating system. The  d a t a server must be asynchronous.  T h a t is, the d a t a server process  controls the starting and stopping of d a t a collection independently of the main A C M E server. T h i s autonomy eliminates the need for the d a t a server to stream d a t a to the A C M E server in real-time. Since sensor d a t a can be large (e.g., image d a t a from cameras) or frequent (e.g., 44.1 k H z for sound) i t is implausible to transmit each frame of d a t a to the A C M E server for real-time m o n i t o r i n g . The criteria for s t a r t i n g and stopping d a t a collection must be definable by the A C M E experiment. A method must be provided to the A C M E experiment t h a t indicates when d a t a is being collected and when it has finished.  5.2.2  S o u n d Server R e q u i r e m e n t s  T h e sound server must capture d a t a at at least two s a m p l i n g rates: 44 100 H z and 22 050 H z . If the c a p t u r i n g hardware supports higher s a m p l i n g rates, the sound server should accommodate them by user definable properties.  29  T h e format of the d a t a  may  be 8 or 16-bit, and one or more channels (to the l i m i t a t i o n s of the capturing  hardware). 
Since P C sound cards are readily available and of sufficient quality for our purposes, the sound server will capture d a t a from a c o m m e r c i a l sound c a r d .  5.3 The  Server Architecture d a t a server architecture has four main components: sensor hardware, a S e n -  s o r S e r v e r , S e n s o r D e v i c e , and a d a t a stream connecting the S e n s o r S e r v e r and S e n s o r D e v i c e (Figure 5.1). T h e sensor hardware component is an abstraction of b o t h the physical hardware and device drivers.  T y p i c a l l y , a native interface is  created to allow the device to communicate w i t h other J a v a components.  Also,  although incoming d a t a is typically buffered at the hardware level, it cannot be guaranteed to not drop frames of d a t a if read too slowly. T h e S e n s o r S e r v e r is a process which resides on the computer hosting the sensor hardware. T h i s process queries the sensor hardware at a prescribed rate, starts and stops d a t a capture, and  buffers the incoming d a t a .  T h e S e n s o r D e v i c e is a remote interface to the  S e n s o r S e r v e r . It communicates to the S e n s o r S e r v e r v i a the J a v a Remote M e t h o d Invocation ( R M I ) protocol. T h e S e n s o r D e v i c e resides i n the A C M E server and is the A C M E experiment's link to the S e n s o r S e r v e r . A d a t a stream connects the S e n s o r S e r v e r and S e n s o r D e v i c e . T h e stream uses socket c o m m u n i c a t i o n to pass d a t a from the S e n s o r S e r v e r to the S e n s o r D e v i c e . D a t a is w r i t t e n to the stream by the S e n s o r S e r v e r and buffered until it is read by the S e n s o r D e v i c e . T h u s , d a t a can be reliably streamed to the A C M E experiment at a rate slower than its acquisition at the hardware device. T h i s architecture enables us to acquire d a t a from any computer, regardless of its platform. O n l y the S e n s o r S e r v e r uses platform-dependent native code; the S e n s o r D e v i c e can reside on any remote computer t h a t hosts a J a v a V i r t u a l M a chine. Since each S e n s o r S e r v e r is w r i t t e n for a specific device operating on a fixed  30  data flow control flow A C M E Server  Data Stream  RMI  F i g u r e 5.1: T h e architecture of the d a t a server is divided into four main components: sensor hardware, SensorServer, SensorDevice and a data stream. T h e SensorServer runs as a separate process and is not necessarily run on the same computer as the A C M E server.  platform, this dependence is not restrictive. Not all d a t a is passed from the sensor hardware to the A C M E experiment. Users define the c r i t e r i a for starting and stopping d a t a capture by creating custom SensorTriggers. One S e n s o r T r i g g e r is created to start c a p t u r i n g , and one to stop c a p t u r i n g .  These SensorTriggers can be monitored by other processes at  runtime by registering a T r i g g e r L i s t e n e r object. T h e T r i g g e r L i s t e n e r object will be notified when a S e n s o r T r i g g e r is activated.  T h i s mechanism provides a  rough synchronisation tool to the A C M E experiment. T h e next section clarifies the role of the S e n s o r T r i g g e r s in server execution.  5.3.1  Server  Execution  T h i s section describes the operation of the d a t a server. Its operation can best be viewed as the state d i a g r a m in F i g u r e 5.2. T h e server begins in a LIMBO state. 
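For reference, the states referred to in this walkthrough (Figure 5.2) can be summarised as a small enumeration. The transitions noted in the comments follow the state diagram; the type itself is illustrative and not part of the ACME code base.

    // Illustrative summary of the data server states of Figure 5.2.
    // LIMBO -> STOPPED on initialisation; STOPPED -> RUNNING on startSpooling();
    // RUNNING -> CAPTURING when the START trigger fires;
    // CAPTURING -> STOPPED when the STOP trigger fires.
    public enum ServerState {
        LIMBO,      // created but not yet initialised
        STOPPED,    // initialised, waiting for a request to spool data
        RUNNING,    // polling the sensor and evaluating the START trigger
        CAPTURING   // streaming frames until the STOP trigger fires
    }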
T h i s is a default state i n d i c a t i n g that it has been created, but not yet initialised.  The  A C M E server initialises the d a t a server at s t a r t u p . T h e d a t a server then initialises  3]  startSpoolingO  F i g u r e 5.2: D a t a server state d i a g r a m .  sensor hardware and creates required d a t a buffers.  After initialisation, the d a t a  server waits i n a STOPPED state. The  RUNNING state is entered by a request from the A C M E experiment to  start spooling d a t a . In the RUNNING state, the d a t a server executes a control loop at a rate prescribed by the A C M E experiment. T h e control loop is listed i n F i g u r e 5.3. A t each i t e r a t i o n , the next frame of d a t a is requested from the sensor hardware. Here, a frame is defined as a d a t a measurement for one time period (e.g., one image from a camera, or one 6-value reading from a force/torque sensor).  T h e current  frame is passed w i t h a window of previous d a t a to the S T A R T trigger.  If the  conditions for the S T A R T trigger are satisfied by the current frame, the d a t a server is placed into the CAPTURING state. Otherwise, the RUNNING control loop repeats. Once the S T A R T trigger has been activated, the d a t a server is i n the CAPTURING state. T h e CAPTURING state runs a control loop similar to t h a t for the RUNNING state, w i t h the exception that d a t a is passed onto the d a t a stream ( F i g u r e 5.4). D a t a is read one frame at a time from the sensor hardware, then analysed by the S T O P trigger. If the conditions of the S T O P trigger are met, the d a t a server enters the STOPPED state. Otherwise, the d a t a is added to the stream and the c o n t r o l loop  32  1. R e a d next frame from sensor hardware. 2. E v a l u a t e frame in S T A R T trigger. 3. If S T A R T trigger fires, change state to CAPTURING. 4. Otherwise, loop to step 1.  F i g u r e 5.3: C o n t r o l loop for RUNNING state.  1. R e a d next frame from sensor hardware. 2. E v a l u a t e frame in S T O P trigger. 3. If S T O P trigger fires, change state to STOPPED. 4. Otherwise, send frame to d a t a stream. 5. L o o p to step 1.  F i g u r e 5.4: C o n t r o l loop for CAPTURING state,  repeats. Section 5.3.2 elaborates on the flow of d a t a throughout this process.  5.3.2  D a t a Flow  T h e preceding discussion of server control loops presented a simplified view of the flow of d a t a t h r o u g h the system. M o r e details are included in this section. F i g u r e 5.5 illustrates the flow of d a t a t h r o u g h the server. B o t h the RUNNING and CAPTURING control loops use the same d a t a flow structure to the point indicated in F i g u r e 5.5. A t each iteration of the control loops, one frame of d a t a is read from the sensor hardware. A s mentioned in Section 5.3, the sensor hardware typically provides a d a t a buffer; this occurs at either the hardware or device driver level. T h i s buffer is assumed to overwrite old d a t a as the buffer fills. T h u s , if d a t a is not read quickly 33  copy — ' reference Hardware . Buffer  1 frame  ' •> Rmg ; l°" liullci \  \  1 ri";'i i /•  1 frame /  A  /  V  Transfer ' - . Buffer  'Data"'*) Stream  .  RUNNING loop stops here  F i g u r e 5.5: Server d a t a flow.  enough, frames may be lost. It is assumed t h a t each frame read has not been read previously. T h i s frame of d a t a is copied i n t o a ring buffer that resides in the S e n s o r S e r ver.  
T h e S e n s o r S e r v e r guarantees t h a t the d a t a in the ring buffer will not be  overwritten until it has been e x a m i n e d . T h e ring buffer is passed to the trigger object (either S T A R T or S T O P depending on the system state). T h e trigger has a defined window size for looking at the d a t a . T h i s window size must be smaller than the size of the ring buffer and greater than or equal to one frame.  B y defining a window size greater t h a n one  frame, the trigger can perform t i m e - d o m a i n filtering of the signal, or use a timedependent trigger condition. A n example of such a trigger is one that is activated by a signal that is greater t h a n the average of the past five frames. If the server is in the RUNNING state, the d a t a flow ends here.  In the  CAPTURING state, the new frame of d a t a is appended to a transfer buffer.  Once  the transfer buffer is full, it is w r i t t e n to the d a t a stream. T h e size of the transfer buffer is controlled by the user. F o r d a t a which is sampled at a high rate, it may be desirable to buffer several hundred frames before w r i t i n g to the stream.  This  reduces the amount of c o m m u n i c a t i o n overhead needed to t r a n s m i t each frame over the network. T h e A C M E experiment can m o n i t o r the d a t a capture by querying the d a t a stream.  Once the d a t a capture is complete, the d a t a stream's end-of-file flag is  34  One A u d i o Frame (2 x 16-bit channels)  One AudioPacket  F i g u r e 5.6: A n A u d i o P a c k e t contains multiple frames of audio d a t a . E a c h audio frame contains one or more channels of d a t a . E a c h channel contains one audio sample. Here, two channels ( L , R ) of 16-bit (2-byte) samples are illustrated.  raised.  5.3.3  Specialisation to S o u n d D a t a  T h e A C M E sound sensor server is a specialisation of the generic d a t a server model. T h e specialisation is straightforward w i t h one exception that is explained below. Sound d a t a is typically captured from the sound card at a frame rate of 44 100 H z . One frame of sound d a t a contains one or two channels of 16 or 8-bit sound samples (Figure 5.6). A s explained in Section 5.3.2, a lot of processing and d a t a movement occurs w i t h the receipt of each frame of d a t a . A l t h o u g h d a t a is passed by reference where possible, there are inevitably d a t a copies and m e m o r y allocations when new d a t a is passed onto the d a t a stream. T h e r e is also some c o m p u t a t i o n a l overhead each time the trigger analyses the d a t a . T h e generic server model cannot process i n c o m i n g d a t a at audio frame rates; because d a t a is not read quickly enough from the hardware, frames are dropped. T o achieve desired frame rates, audio d a t a must be processed in groups of more than one audio frame. A simple redefinition of "frame" suffices to achieve real-time performance w i t h the generic d a t a server model. Here we define an A u d i o P a c k e t to be an array of one or more audio frames (Figure 5.6). T h e A u d i o P a c k e t becomes the  35  conceptual "frame" that is read from the sound sensor hardware by the sound server. Performance of the sound server using A u d i o P a c k e t s is discussed in Section 5.4.1.  5.4  Implementation  Details  T h e generic d a t a server model is an abstract J a v a class, w i t h interfaces defined for S e n s o r and T r i g g e r objects.  
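To make the division of responsibilities concrete, the sketch below shows one possible shape for these interfaces together with the RUNNING/CAPTURING control loop of Section 5.3.1. The state and class names (SensorServer, SensorTrigger, STOPPED, RUNNING, CAPTURING) follow the text, but the method signatures, the ring-buffer handling and the ThresholdTrigger details are assumptions rather than the actual ACME code.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical reconstruction of the generic sensor-server interfaces.
    interface SensorTrigger {
        int windowSize();                     // frames of history the trigger inspects
        boolean fires(Deque<byte[]> window);  // true when the trigger condition is met
    }

    // Example START trigger: fires when any sample in the newest frame exceeds a threshold.
    class ThresholdTrigger implements SensorTrigger {
        private final int threshold;
        ThresholdTrigger(int threshold) { this.threshold = threshold; }
        public int windowSize() { return 1; }
        public boolean fires(Deque<byte[]> window) {
            for (byte b : window.peekLast())
                if (Math.abs(b) > threshold) return true;
            return false;
        }
    }

    abstract class SensorServer {
        enum State { LIMBO, STOPPED, RUNNING, CAPTURING }
        protected State state = State.STOPPED;   // after initialisation
        protected SensorTrigger start, stop;
        private final Deque<byte[]> ring = new ArrayDeque<byte[]>();
        private static final int RING_CAPACITY = 1024;

        protected abstract byte[] readFrame();                // one frame from the hardware buffer
        protected abstract void writeToStream(byte[] frame);  // append to the data stream

        // Control loop combining the RUNNING and CAPTURING states of Figures 5.3 and 5.4.
        public void startSpooling() {
            state = State.RUNNING;
            while (state == State.RUNNING || state == State.CAPTURING) {
                byte[] frame = readFrame();
                if (ring.size() == RING_CAPACITY) ring.removeFirst();  // ring buffer overwrites old data
                ring.addLast(frame);
                if (state == State.RUNNING) {
                    if (start.fires(ring)) state = State.CAPTURING;
                } else {
                    if (stop.fires(ring)) state = State.STOPPED;
                    else writeToStream(frame);
                }
            }
        }
    }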
T h e sound server is also written in J a v a with the  exception of the code that reads d a t a from the sound c a r d . Originally, d a t a was to be read from the sound card using the JavaSound A P I from Sun M i c r o s y s t e m s . A t the time of development, JavaSound was a B e t a release (version 0.86).  N o t all required functionality was available, and  frequent  changes to the A P I rendered this option undesirable. Currently, d a t a is read from the sound card using the Microsoft D i r e c t X A P I (version 6.0). A J a v a wrapper was w r i t t e n to interface the D i r e c t X ' C code w i t h the rest of the sound server. E v e r y effort was taken to create a J a v a wrapper that parallelled the JavaSound A P I so that it could be substituted at a later date. T w o triggers are used to capture sound for modelling: a T h r e s h o l d T r i g g e r and F i x e d D u r a t i o n T r i g g e r . T h e T h r e s h o l d T r i g g e r is used as a S T A R T trigger to begin c a p t u r i n g d a t a after the amplitude of the signal exceeds a threshold. T h e F i x e d D u r a t i o n T r i g g e r is used as a S T O P trigger to make each recording the same length. T h i s combination of triggers yields recordings of the impulse responses that are identical in length and contain approximately the same length of silence before each impact.  5.4.1  Performance Evaluation  T h e sound server was subjected to a performance evaluation to guarantee the i n tegrity of the recordings, and estimate bounds on the size of the A u d i o P a c k e t s and transfer buffer. A one-second 1 k H z sinusoidal tone was used as the test stimulus.  36  A cable connected the line-level output of one c o m p u t e r  1  to the line-level input of  a second c o m p u t e r . T h e sound server ran on the second computer and transferred 2  d a t a to a client running on the first computer over an Ethernet connection.  A  F i x e d D u r a t i o n trigger was used as a S T O P trigger to record 2.5 seconds of sound for each t r i a l . In each trial, the size of either the transfer buffer or A u d i o P a c k e t s was varied. T h e test was run five times for each parameter setting w i t h the one-second tone played at a random start time w i t h i n the 2.5 second recording w i n d o w . T h e recorded sound was saved as a P C M wave file and examined visually and audibly for evidence of degradation. Recordings were captured at a 44.1 k H z s a m p l i n g rate. T w o c r i t e r i a are used to subjectively measure performance: the length of the recorded tone, and the presence or absence of "pops". T h e length of the recorded tone is c o m p u t e d manually using a graphical sound editor. If the recorded tone is less than one second, the parameter settings are designated unsatisfactory. Smaller gaps in the recording are experienced as "pops" when heard, and observed as spikes in the spectrogram (Figure 5.7).  Parameter settings producing "pops" are also  unsatisfactory. Results of this evaluation are tabulated in Table 5.1. F r o m this table, it is evident t h a t the audio signal will lose large chunks of d a t a i f the A u d i o P a c k e t size is not great enough. Similarly, "pop"s will be present in the recorded d a t a unless the transfer buffer is large enough. F r o m this data, it is concluded t h a t a transfer buffer of two seconds and an A u d i o P a c k e t size of 100 frames are required for adequate performance.  
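The sizes arrived at above translate directly into configuration constants; the following sketch simply records the arithmetic. The constant names are ours and the actual configuration mechanism may differ.

    // Hypothetical sizing constants implied by the evaluation above.
    public class SoundServerConfig {
        public static final int SAMPLE_RATE = 44100;       // Hz
        public static final int CHANNELS = 2;
        public static final int BYTES_PER_SAMPLE = 2;      // 16-bit

        // One AudioPacket groups 100 audio frames into a single conceptual "frame".
        public static final int PACKET_FRAMES = 100;
        public static final int PACKET_BYTES =
                PACKET_FRAMES * CHANNELS * BYTES_PER_SAMPLE;            // 400 bytes

        // A two-second transfer buffer, expressed in audio frames and in packets.
        public static final int TRANSFER_FRAMES = 2 * SAMPLE_RATE;      // 88 200 frames
        public static final int TRANSFER_PACKETS =
                (TRANSFER_FRAMES + PACKET_FRAMES - 1) / PACKET_FRAMES;  // 882 packets
    }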
5.4.2  Limitations  W h i l e performance of the sound server is acceptable, one l i m i t a t i o n remains. U s i n g the D i r e c t X v6.0 A P I prevents us from r u n n i n g the sound sensor server on operating *A dual PII 350MHz running Windows N T with a Creative SoundBlaster Live! sound card. A Pentium 120 M H z running Windows 98 with a Creative SoundBlaster Live! sound card. 2  37  0.5  0  1.S  1  2  Time (sec)  F i g u r e 5.7: Spectrogram of sound containing "pops". Three "pops" are visible in the spectrogram as vertical lines at approximately 1.4, 1.5 and 1.65 seconds. These "pops" are caused by dropped audio frames. T h i s sample was recorded using a transfer buffer of one second, and an A u d i o P a c k e t size of 200. T h e recording parameters were two-channel, 16-bit sound sampled at 44.1 k H z .  Transfer buffer Channels  Bits  A u d i o P a c k e t size  1  16  1  1  16  10  1 1  16 16 16  100  2  16 16 16  size (sees) 1 1  200 1  1 1 1 1 1  T o o short  "Pop"s  Y Y  N N  N  Y  N Y Y Y  N N N N  2 2 2 2 •1 2  16 16 16  200 1000 1 10  1  N N  Y Y  2 2  Y Y  2 2  16 16  100 200  2 2  N N  N N N N  10 100  Table 5.1: Results of the sound server performance evaluation. indicate parameter settings that are successful.  Highlighted rows  Recordings were captured at a  sampling rate of 44.1 k H z on P e n t i u m 120 M H z running W i n d o w s 98.  38  systems other than Microsoft W i n d o w s 9 5 / 9 8 / 2 0 0 0 . A s mentioned i n Section 5.3, this does not prevent us from streaming d a t a to clients on other c o m p u t i n g platforms, but restricts us to W i n d o w s - c o m p a t i b l e sound cards. C u r r e n t l y , W i n d o w s is the most widely-supported operating system for sound cards, so this is not a major l i m i t a t i o n . A l s o , the D i r e c t X J a v a interface maintains the J a v a S o u n d design and class structure to enable easy substitution once the JavaSound A P I is complete.  39  Chapter 6  Building a Prototypical Model 6.1  Overview  T h e hardware and software described in the previous two chapters enables us to acquire multiple samples at any location on the surface of an object. These samples are each represented by a sound m o d e l . T h e sound model may be any one of the impulse response models as discussed in C h a p t e r 2. T h e parameters of the model are computed for each sample using the appropriate estimation technique. In our current implementation, the m o d e l of van den D o e l [30] represents each sample; the parameters of the model are c o m p u t e d by the technique presented in [30]. Refer to Section 2.4 for a brief description of this technique. T h e advantage of a c q u i r i n g m u l t i p l e samples at the same location is t h a t i n strumentation and background noise can be minimised by averaging the samples. A prototypical model, representative of multiple models, can be created at each sample location. T h e p r o t o t y p i c a l m o d e l should be less affected by noise t h a n each i n d i v i d ual model. Such a p r o t o t y p i c a l model is used for c o m p a r i n g the acoustic distance between two sample locations; this use is discussed in C h a p t e r 7. A p r o t o t y p i c a l model is also used for synthesis when the object is simulated. T h i s chapter outlines one approach to generating a p r o t o t y p i c a l model from the available data. T h e approach is an intuitive one and produces satisfying results. 
40  W h i l e there are other solutions to this problem which may produce better results, our intention is to provide one example by which to demonstrate the utility of such models.  6.2  Related Work  A l t h o u g h our exact problem appears to be unique, two other fields of audio research have produced related work: speech and speaker recognition and audio morphing. Neither of these fields shares our exact goal, but each is similar in some respect. Speech and speaker recognition researchers have investigated methods for clustering sets of audio d a t a . T h e i r problem is one of classic pattern recognition: given a set of N categories (e.g., different speakers or words), and M training exemplars for each category, classify a new d a t a sample into one of the categories. A t a high level, this is most often accomplished by creating a prototype for each category from the training set, then c o m p u t i n g a distance from the new d a t a to each of the prototypes. These prototypes are typically formed using a modified K - m e a n s algor i t h m on the vectors resulting from Linear P r e d i c t i v e C o d i n g ( L P C ) analysis [24]. Similar methods were implemented on our sound models, but the variability of our models (due to noise) caused unsatisfactory results. A u d i o m o r p h i n g is a process t h a t s m o o t h l y blends one sound into another. If you imagine a c o n t i n u u m between two sounds, an audio m o r p h i n g algorithm can generate a sound at any point on the c o n t i n u u m . T h i s sound contains a proportion of each sound, yet is perceived as a single new sound. Slaney et al. describe one such method in [27].  T h e y claim t h a t cross-fading conventional spectrograms is  not convincing if the two source sounds are not similar in p i t c h . T h e i r approach is to represent sounds using a s m o o t h spectrogram (derived from the mel-frequency cepstral coefficients [24]) and a residual spectrogram which encodes the p i t c h . Interpolation occurs in this higher-order spectral representation which is then inverted to produce the resultant sound. If their m o r p h i n g a l g o r i t h m could be extended to  41  morph between more than two sounds, this approach could be used to generate our prototypical sound models by estimating model parameters from the composite sound produced by the m o r p h . Indeed, a m o r p h i n g a l g o r i t h m is a powerful tool in its ability to weight the contribution of each sound to the morphed result.  6.3  Spectrogram Averaging  T h e approach we use is similar in spirit to the m o r p h i n g a l g o r i t h m of [27], but much less sophisticated. W e are able to use a simpler a l g o r i t h m because we are dealing w i t h a narrow class of sounds:  single i m p a c t sounds which decay exponentially  and are similar in p i t c h . O u r approach is to c o m p u t e the "average" spectrogram of M samples at one sample location, then estimate the model parameters this "average" spectrogram.  from  T h e approach is intuitively satisfying and produces  reasonable sound models. C o m p u t i n g the average time signal of sound samples is non-trivial. Because the modes of the impulse response may not be i n phase across samples, exact alignment in time is difficult.  Phase differences i n t r o d u c e d by inexact alignment will  create a signal which sounds like many separate sounds played together. 
In contrast, computing the average spectrogram is easier since the spectrogram contains no phase information. Furthermore, the average spectrogram is a natural representation for our task, given that the model parameters are estimated from spectrograms. Since the samples are recorded at the same sample location, we assume the samples have the same pitch, thereby avoiding the problems indicated by Slaney et al. [27].

Aligning the M spectrograms is straightforward. Since each spectrogram represents an impact sound, they can be aligned by matching the onset of impact. This onset is represented in the spectrogram as the time frame with the maximum total amplitude. Once aligned, the average spectrogram is computed by the mean amplitude of each frequency in each time frame.

Since the energy of each impact is not exact, the energy of each signal is normalised prior to computing its spectrogram (Equation 6.1). Energy normalisation ensures that

    Σ_t y(t)² = 1    (6.1)

Mathematically, our approach is also satisfying. If we assume we are recording a signal g_i(t) which is composed of the true signal y(t) and a random additive noise process n_i(t), the development in Equation 6.2 shows that the average spectrum G(ω) is identically equal to the true spectrum Y(ω) if we also assume a zero-mean noise process (this characterisation of noise is a common assumption, though unlikely to be correct in our situation).

    g_i(t) = y(t) + n_i(t)
    G_i(ω) = Y(ω) + N_i(ω)
    G(ω) = (1/M) Σ_{i=1}^{M} G_i(ω) = Y(ω) + (1/M) Σ_{i=1}^{M} N_i(ω)    (6.2)

6.4  Performance Evaluation

To evaluate the effectiveness of spectrogram averaging, a test was conducted using synthetic data. This test is similar to the evaluation of the parameter estimation algorithm (Section 2.4.1). For this evaluation, M samples were created for each trial by adding noise to sounds synthesised using fifty random modes. A description of the noise and synthesised sounds is found in Section 2.4.1. Eight signal-to-noise ratios were used: ∞ (i.e., no noise), 100, 50, 30, 20, 15, 10 and 5. Averaging over the M samples produced a spectrogram from which model parameters were estimated. The experiment was conducted using five values of M: 1 (control), 2, 5, 10 and 20. One hundred trials were conducted for each pairing of SNR and M values. The resulting sound models are evaluated by the same measures as Section 2.4.1; the metrics are repeated in Equation 6.3.

    E_f = |log(f_i / f_j)|,   E_a = |log(a_i / a_j)|,   E_d = |log(d_i / d_j)|    (6.3)

where j is chosen to minimise |f_i - f_j| for all j = 1 ... 50.

The mean error for each signal-to-noise ratio (SNR) is plotted by the dashed lines in Figure 6.1. The solid lines are the mean error over all eight SNRs. The colour of each line indicates the number of samples (M) used in the spectrogram average. As Figure 6.1 shows, spectrogram averaging substantially reduces the mean error for initial amplitude and frequency estimates. Even averaging two samples yields improvements of 20% and 12% on E_a and E_f respectively. The mean error of the damping parameter is not significantly reduced by spectrogram averaging. The standard deviation of the error is, however, dramatically reduced with increasing values of M.
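The alignment-and-averaging step being evaluated here can be summarised in code. The sketch below assumes spectrograms are available as magnitude arrays indexed by time frame and frequency bin; it omits the energy normalisation of Equation 6.1, and the class and method names are ours rather than ACME's.

    // Minimal sketch of the alignment-and-averaging step of Section 6.3.
    // A spectrogram is taken to be a magnitude array indexed as [timeFrame][frequencyBin].
    public class SpectrogramAverager {

        // Index of the time frame with the largest total magnitude (the impact onset).
        static int onsetFrame(double[][] spec) {
            int best = 0;
            double bestSum = -1;
            for (int t = 0; t < spec.length; t++) {
                double sum = 0;
                for (double m : spec[t]) sum += m;
                if (sum > bestSum) { bestSum = sum; best = t; }
            }
            return best;
        }

        // Average M spectrograms after aligning their onsets; frames beyond a
        // spectrogram's length simply contribute nothing to the mean.
        static double[][] average(double[][][] specs, int framesAfterOnset) {
            int bins = specs[0][0].length;
            double[][] avg = new double[framesAfterOnset][bins];
            int[] counts = new int[framesAfterOnset];
            for (double[][] spec : specs) {
                int onset = onsetFrame(spec);
                for (int t = 0; t < framesAfterOnset && onset + t < spec.length; t++) {
                    for (int k = 0; k < bins; k++) avg[t][k] += spec[onset + t][k];
                    counts[t]++;
                }
            }
            for (int t = 0; t < framesAfterOnset; t++)
                for (int k = 0; k < bins; k++)
                    if (counts[t] > 0) avg[t][k] /= counts[t];
            return avg;
        }
    }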
As Figure 6.2 illustrates, the standard deviation of error is reduced by an average of 61% using only five samples. This result implies that spectrogram averaging yields a more consistent estimation of the damping parameter.

[Figure 6.1: three panels, (a) mean initial amplitude errors (E_a), (b) mean frequency errors (E_f), (c) mean damping errors (E_d), each plotted against noise/signal for M = 1, 2, 5, 10 and 20 samples.]

Figure 6.1: Mean errors of the parameter estimation algorithm on spectrogram-averaged synthetic data. The mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean error over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average. For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for definitions of E_f, E_a and E_d.

Figure 6.2: Standard deviation of error for estimated damping parameters. The standard deviation of the mean error over 100 trials is plotted for eight signal-to-noise ratios (∞, 100, 50, 30, 20, 15, 10 and 5) by the dashed lines. A solid line is the mean standard deviation over all signal-to-noise ratios. The colour of each line signifies the number of samples (M) used in the average. For convenience, the inverse ratio (noise/signal) is used as the abscissa. See Equation 6.3 for the definition of E_d.

Chapter 7

An Adaptive Sampling Algorithm

7.1  Overview

To completely automate the creation of sound models from measurements, the selection of sampling locations must also be automatic. By sampling location we mean the location on the surface of an object where it is to be struck by the sound effector. The most intuitive approach is to select a uniform grid over the surface of the object. Without knowledge of the object's shape, a uniform grid in Cartesian space could also be considered. This was demonstrated in [25] and [26]. As part of the Active Measurement Facility, we assume that a surface model of test objects is attainable using the Triclops trinocular camera (currently, software to generate surface models from the Triclops' range data is not available; this is ongoing work of the ACME project, and for the examples in this thesis, surface models of test objects are constructed manually). With a surface model available, a uniform grid can be projected, and the sample locations uniformly distributed.

One question inevitably arises when generating the uniform grid: how finely should we sample?

1. Select an unsampled vertex as the next sample location.
2. Strike the object at the sample location and create a sound model.
3. Compare the new model to the models at all adjoining sample locations.
4. If the acoustic distance between two adjoining models is greater than a perceptual threshold, add a new vertex between these two.
5. Otherwise, repeat from Step 1 until no vertices are unsampled.

Figure 7.1: Adaptive sampling algorithm.
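The loop of Figure 7.1 can be expressed as the following sketch. The Vertex and SoundModel types and the strikeAndModel, neighbours and refineEdge operations are placeholders for the corresponding ACME components rather than actual class names, and the acoustic distance and threshold are left abstract; they are developed in Sections 7.5 and 7.6.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Skeleton of the adaptive sampling loop of Figure 7.1.  Only the control flow
    // is intended to mirror the figure; all other names are placeholders.
    abstract class AdaptiveSampler {
        protected abstract SoundModel strikeAndModel(Vertex v);            // Steps 1-2
        protected abstract double acousticDistance(SoundModel a, SoundModel b);
        protected abstract Vertex refineEdge(Vertex a, Vertex b);          // subdivision vertex on the edge
        protected abstract Iterable<Vertex> neighbours(Vertex v);

        public Map<Vertex, SoundModel> sample(Iterable<Vertex> coarseVertices, double threshold) {
            Deque<Vertex> pending = new ArrayDeque<Vertex>();
            for (Vertex v : coarseVertices) pending.add(v);
            Map<Vertex, SoundModel> models = new HashMap<Vertex, SoundModel>();
            while (!pending.isEmpty()) {
                Vertex v = pending.removeFirst();                // Step 1 (heuristic choice of next vertex)
                SoundModel m = strikeAndModel(v);                // Step 2
                models.put(v, m);
                for (Vertex u : neighbours(v)) {                 // Step 3: compare to sampled neighbours
                    SoundModel n = models.get(u);
                    if (n != null && acousticDistance(m, n) > threshold)
                        pending.add(refineEdge(v, u));           // Step 4: refine the joining edge
                }
            }
            return models;
        }
    }
    class Vertex {}
    class SoundModel {}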
Many everyday objects are composed of different materials (e.g., a glass jar with a metal lid), or have density, volume and thickness variations across their surface. These properties will cause the sound of the object to change perceptually over small regions of the surface. Other objects (e.g., an eraser) may have a relatively constant sound over the entire surface. Consequently, the selection of a multi-purpose sampling density is non-trivial. It cannot be constant, nor can it be pre-determined by object geometry alone.

The algorithm presented in this chapter changes the density of a sampling mesh based on perceived differences in sound over the surface of an object. This algorithm is straightforward and is listed in Figure 7.1. Here, the sample locations are chosen as the vertices of a surface model representing the test object. At each vertex, a sound model is created. If the acoustic distance between this model and one at an adjoining sample location is greater than a perceptual threshold, a new sample location is added between the two (see Figure 7.2). The new sample location should be a vertex in the refinement of the surface. This procedure continues until the acoustic distance between all adjoining models is less than a perceptual threshold.

Selection of unsampled vertices in the first step of the algorithm is conducted by any heuristic rule. Currently, the vertex nearest the previously sampled vertex is selected. Other possible heuristics include selecting vertices by their connectivity in the mesh or selecting vertices to minimise the amount of movement by the robots.

Figure 7.2: Result of adaptive sampling. The sound models at two adjoining sample locations are compared (left). If the acoustic distance between the vertices is too great, the surface edge is refined by inserting a new vertex (right).

The algorithm in Figure 7.1 is simplistic, but relies on three complex components: a procedure for adding new vertices to the surface model, an acoustic distance metric and a perceptually-relevant threshold for this metric. This chapter describes one implementation of each of these components. Note that these components are research topics on their own and it is not presumed that the implementations discussed herein are optimal solutions. The algorithm in Figure 7.1 is applicable to any vertex insertion rule, distance metric and threshold, and should be considered the principal result of this chapter.

7.2  Related Work

To our knowledge, this is the first algorithm for adaptively selecting locations on the surface of an arbitrary object for purposes of sound modelling. Previous work on modelling of musical instruments from empirical measurements is vague in its selection of sample locations; two examples are [2] and [4]. In [2] a guitar is "thumped" on the bridge by a "sharp instrument" to record measurements for estimates of a body model. A physics-based model was used to estimate the sample location of plucked strings from the recordings. In [4], strings of musical instruments were struck using a force hammer at the point on the instruments' bridge where the strings made contact.
Rules for the addition of new vertices to a surface have been explored in m u l t i resolution surface research.  M u l t i - r e s o l u t i o n surfaces are appealing i n computer  graphics because their level-of-detail nature simplifies editing and provides scalable render q u a l i t y which can speed computer a n i m a t i o n development. W e have selected a subdivision surface representation, using the L o o p [19] scheme for vertex insertion. A brief description is provided in Section 7.4. D i s t a n c e metrics for sound have been explored in several d o m a i n s .  Early  efforts in this research were in the field of speech recognition. A classic survey of techniques is [11]. D u b n o v and T i s h b y explored c o m p a r i n g musical sounds using a spectral and bispectral acoustic distortion measure [7]. T h e i r approach is to measure statistical similarity using the K u l l b a c k - L i e b l e r divergence between models. M o r e recent work, which focused on creating a "content-based" sound browser (Muscle F i s h ) , uses two levels of features to form a complex parameter space w i t h i n which distances are computed [15, 33]. One set of features is extracted from each frame of audio d a t a : loudness, pitch, brightness, b a n d w i d t h and mel-filtered cepstral coefficients ( M F C C s ) . T h e time series of these frame values provides a second set of features:  the mean, standard deviation and derivative of each frame-level  parameter. T o our knowledge, none of the metrics listed above have been used to determine perceptual thresholds on similarity. B y "similarity" we mean in the context of perceiving two sounds as being produced by the same object. In our a p p l i c a t i o n , we require a metric by which we may decide if two similar sounds are perceived as separate objects a n d / o r materials. Some research has examined the factors by which differences i n material and geometry are perceived from audition [8, 9, 12, 16]. W h i l e  50  this research does not examine acoustic distances directly, it does provide e m p i r i c a l results indicating which parameters of a sound model are relevant to material and geometric perception. Specifically, the analysis in [16] suggests a metric by which model parameters are related to perceptual similarity. Because e m p i r i c a l thresholds are available for this metric, it was selected for use in this i m p l e m e n t a t i o n of the adaptive s a m p l i n g a l g o r i t h m . Details of the metric are stated in Section 7.5; Section 7.6 contains a discussion relating the results of [16] to thresholds on this metric useful to our application.  7.3  Requirements  T h e s a m p l i n g a l g o r i t h m must select points over the surface of the test object in a dense-enough mesh so as to capture perceptually relevant differences in sound over the entire surface. Satisfaction of this requirement is difficult to quantify and requires future perceptual studies.  7.4  Surface Representation  A s mentioned in Section 7.2, a subdivision surface representation is used to facilitate vertex insertion. A subdivision surface represents a smooth surface as the l i m i t of a sequence of successive refinements of a coarse mesh [6]. T h a t is, it represents a coarse a p p r o x i m a t i o n to the true (smooth) surface with a finite number of vertices, yet can be systematically refined by adding new vertices until, in the l i m i t ,  the  s m o o t h surface is produced. 
Several different schemes exist for specifying the location of vertices added d u r i n g refinement.  T y p i c a l l y , the locations of new vertices are weighted combina-  tions of neighbouring coarse vertices' positions. W e use the L o o p scheme as first proposed by Charles L o o p [19]. T h e L o o p scheme uses the vertex masks in F i g u r e 7.3 to calculate the position of vertices in the next level of refinement. These masks are  51  F i g u r e 7.3: L o o p scheme refinements masks for subdivision. M a s k s on the left are used for edge refinement; masks on the right are used for vertex refinement. (3 is chosen to be £ ( 5 / 8 - ( § + \ cos ^ ) ) . 2  52  applied to a coarse set of vertices to calculate the position of vertices in the next level of refinement. T h e mask on the left is used to insert a new vertex on the edge between two coarse vertices. T h e mask on the right is used to refine the position of each coarse vertex. T h e limit position of interior vertices is computed by replacing (3 in Figure 7.3 with  x  =  3/8/3+  n  l ^ l - F o r vertices on the boundary of a surface, the  limit position is computed by changing the coefficients of the even-vertex boundary mask to j l , | and ^ [6]. The details of the refinement scheme, and its effect on the geometry of the mesh, are not i m p o r t a n t to our application; the refinement scheme simply defines a hierarchy of successive edge refinements. Interested readers should consult [6] for a thorough introduction to subdivision surfaces. In our implementation, we begin by s a m p l i n g the sound at the limit position of each vertex in the coarsest representation of the surface mesh.  T h e acoustic  distance between vertices joined by an edge of the mesh is then computed. If the acoustic distance is too great, the vertex which is the refinement of the joining edge is added to the list of vertices to be sampled. The  m i n i m u m vertex-spacing required for adequate surface  representation  and refinement dictates the m a x i m u m spacing of the sound sampling mesh.  For  some simple-sounding objects, this may produce an overly dense sampling mesh. T h i s is not a large concern for applications such as simulation since the denser mesh will rely less on interpolation between sound models d u r i n g synthesis.  7.5  Acoustic Distance Metrics  A s prescribed by the adaptive sampling a l g o r i t h m (Figure 7.1), an acoustic  distance  must be calculated between each pair of vertices joined by an edge in the surface. T h e calculated distance will be compared to a perceptual threshold to make a refinement decision. T h e selected distance metric must therefore be indicative of the perceptual proximity of two sounds.  53  M o s t applications of acoustic distance metrics discussed by the literature in Section 7.2 use distance to make relative comparisons. F o r example, classifying a new sound by choosing the group of exemplar sounds to which it is nearest. O u r application is different because we need to determine i f two models will be perceived as different materials or shapes once synthesised.  T h e realism of a sound model  requires that the sound produced over the surface of an object is not discontinuous in regions where it should vary smoothly. T h e distance metric must therefore be applicable to an appropriate perceptual threshold.  F u r t h e r discussion of such a  threshold is delayed until Section 7.6. M o s t sound metrics calculate a distance directly from waveforms. 
We chose to use a metric derived from model parameters instead. M u c h of the work on material perception and discrimination discusses discriminants in terms of model parameters. It is therefore convenient to express the distance metric on these parameters as well. Specifically, we use a logarithmic ratio of the frequency-independent d a m p i n g coefficient and a ratio of m o d a l frequencies as metrics. These metrics were suggested by the research of K l a t z k y et al. in their paper on auditory material perception [16]. Sections 7.5.1 and 7.5.2 describe these metrics i n greater detail.  7.5.1  Frequency-Independent D a m p i n g Coefficient  A s stated previously, and repeated in E q . 7.1, the sound model is composed of Nj frequency modes, each with a distinct frequency ( w , i ) , initial amplitude (a ,i) and x  x  d a m p i n g coefficient (d ,«)x  N  f  p(x,t) = X > x , , - e - * d  i ( O  sin(u;x,,-0  (7.1)  i=i It is theorised t h a t the d a m p i n g coefficient in E q . 7.1 (d ,;) is related to a x  material's internal friction parameter (0) by E q u a t i o n 7.2 [9, 12, 16].  54  d  X | l  - = 2w -tan(<£)  (7.2)  X)l  F u r t h e r m o r e , it has been demonstrated that the internal friction  parameter  ((f)) is an a p p r o x i m a t e shape and frequency-independent m a t e r i a l property [18]. A recent study [16] has shown that a frequency-independent d a m p i n g factor (p = is a perceptually useful discriminant of m a t e r i a l .  7rtan((^>))  A n estimate of this  parameter (/5 ;) can be calculated from the estimated d a m p i n g coefficient (d ,t) of x  Xj  each frequency mode:  =  (7  ~2^T  '  3)  cf i is only an estimate at one frequency and is subject to noise and variXj  ability. A better estimate of the frequency-independent d a m p i n g coefficient (p ) is x  c o m p u t e d as the median of p ; for all frequency modes i. G i v e n m u l t i p l e samples X i  at a single vertex, the estimate of the d a m p i n g coefficient is improved by calculating /9 as the median of /5 over all samples. X  X  T h e distance between two vertices' d a m p i n g coefficients (pA and PB) is expressed as a l o g a r i t h m i c ratio:  Distance*(A,  B) = log —  (7.4)  PB  7.5.2  Frequency Similarity  U s i n g frequency similarity as a distance metric is supported by recent perceptual studies. K l a t z k y et al. found frequency and d a m p i n g to be "independent determinants of s i m i l a r i t y " [16]. T h e results of their study show that frequency and decay are used independently when classifying  when judging  material.  the similarity  of sounds, but in  combination  A d d i t i o n a l l y , experiments by Hermes [12] and G a v e r [9]  support the theory that frequency is an i m p o r t a n t perceptual feature for material, shape and size estimation.  55  • F o r each frequency mode u>A,i in model A . . . 1. F i n d the frequency mode uiB.k nearest to u>A,i 2. If the u>B,k is unmatched, match it to UA,% 3. O r , i f u>s,k is matched to another mode which is farther t h a n u>A,i, match ^B,k to UA,% instead. M a r k uA.i as matched, and the mode it replaces as unmatched. 4. Otherwise, loop to step 1, picking the next nearest frequency mode • Repeat until all frequency modes in model A are matched.  F i g u r e 7.4: A l g o r i t h m to find unique frequency mapping between two models. W e define the frequency distance between two models by E q u a t i o n 7.5. M o d e s are matched between models using the algorithm in F i g u r e 7.4.  
This algorithm guarantees that each mode in model A will be uniquely matched to a mode in model B. Details of the implementation are included in Appendix C.

    Distance_ω(A, B) = log(ω̄_A / ω̄_B)    (7.5)

where ω̄_A and ω̄_B are the initial-amplitude-weighted mean frequencies of the modes matched in models A and B by the algorithm in Figure 7.4 (ω̄_X = Σ_i a_{X,i} ω_{X,i} / Σ_i a_{X,i}; see also Section 7.6). The frequency distance between one model and a collection of models is computed as the frequency distance (Equation 7.5) between the single model and the prototypical model of the collection. Chapter 6 describes the process of creating a prototypical model.
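A minimal sketch of how the two distances might be computed from estimated mode parameters is given below, following Equations 7.3 to 7.5 as presented above. The array-based representation and method names are ours, and the natural logarithm is used only for illustration; the base must match the convention assumed by the regression coefficients introduced in Section 7.6.

    import java.util.Arrays;

    // Sketch of the acoustic distances of Equations 7.4 and 7.5, computed from
    // estimated mode parameters of two models.
    public class AcousticDistance {
        // Frequency-independent damping estimate (Equation 7.3), one value per mode,
        // summarised by the median as described in Section 7.5.1.
        static double dampingCoefficient(double[] omega, double[] d) {
            double[] rho = new double[omega.length];
            for (int i = 0; i < omega.length; i++)
                rho[i] = 2.0 * Math.PI * d[i] / omega[i];
            Arrays.sort(rho);
            int n = rho.length;
            return (n % 2 == 1) ? rho[n / 2] : 0.5 * (rho[n / 2 - 1] + rho[n / 2]);
        }

        // Initial-amplitude-weighted mean frequency of the matched modes.
        static double weightedMeanFrequency(double[] omega, double[] a) {
            double num = 0, den = 0;
            for (int i = 0; i < omega.length; i++) { num += a[i] * omega[i]; den += a[i]; }
            return num / den;
        }

        // Equation 7.4: logarithmic ratio of damping coefficients.
        static double dampingDistance(double rhoA, double rhoB) {
            return Math.log(rhoA / rhoB);
        }

        // Equation 7.5: logarithmic ratio of amplitude-weighted mean frequencies.
        static double frequencyDistance(double meanOmegaA, double meanOmegaB) {
            return Math.log(meanOmegaA / meanOmegaB);
        }
    }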
Distance (A,B)  >  1.04 * S  Distance (A, B)  >  0.53 * S  >  0.305 *  P  w  Distance (A, B) * Distance (A, B) P  w  p  (7.7)  w  S  puj  In our i m p l e m e n t a t i o n , if any of these thresholds is exceeded, a refined vertex is added. We chose the set of threshold expressions in E q u a t i o n 7.7 because they provide flexible control over the influence of each distance metric. E q u a t i o n 7.6 is a more correct threshold on the s i m i l a r i t y results of the experiment of K l a t z k y et al., but given that the s i m i l a r i t y judgement in their experiment is not identical to our requirements, we find E q u a t i o n 7.7 to be an acceptable alternative. Currently, a empirical value of S  x  all distances.  = 0.75 is used as the similarity factor on  T h i s value was estimated by reviewing coarse d a t a collections and  manually selecting which models required refinement.  A s an example, Table 7.2  summarises the calculated s i m i l a r i t y factors of models collected from three objects: a brass vase, glass wine bottle and plastic speaker.  T h e last row of Table 7.2  compares two locations on the brass vase. O n l y ten frequency modes were used to model the wine bottle and plastic speaker.  F o r t y modes were used to model the  brass vase. M o s t of the distances reported agree w i t h expectation. A l t h o u g h the distance between the brass vase and wine bottle are low, it is not a surprising result when the brass vase is synthesised w i t h only ten modes, since it then sounds very  58  Object A  Object B  S i m i l a r i t y F a c t o r (S) Frequency Decay Product.  Brass vase Plastic speaker W i n e bottle  P l a s t i c speaker W i n e bottle Brass vase  1.14 1.05 0.267  0.781 0.585 0.196  1.61 1.11 0.0943  Brass vase (I)  Brass vase (II)  0.0919  0.000125  0.0000208  Table 7.2: E x a m p l e s of calculated similarity factors.  similar to the wine bottle. F u t u r e perceptual studies could be used to determine a less ad hoc value of S. It should be noted t h a t the frequency distance of E q u a t i o n 7.5 is not identical to the expression used to compute the coefficients in Table 7.1. In their experiments, K l a t z k y et al. compared only the fundamental frequency of their synthesised s t i m uli.  Here however, we compare the average of all frequency modes of the model,  but weight their c o n t r i b u t i o n to the average by their initial amplitude. T h i s approximation has proved successful, as will be illustrated by the sample collections in C h a p t e r 8.  59  Chapter 8  Sample D a t a Collections 8.1  Overview  T h i s chapter presents the results of four sample d a t a collections. T h e first collection, a t u n i n g fork, is meant as a calibration experiment. A surface model of the t u n i n g fork is not used, nor is the adaptive s a m p l i n g a l g o r i t h m of C h a p t e r 7. F o r the other three objects, a brass vase, plastic speaker and toy d r u m , the entire system is used to build a sound model. The objects were selected to provide examples of a variety of materials. A l s o , since the shape-acquisition component of A C M E is not completed, we required objects w i t h geometries that could be easily modelled manually. E a c h surface model was constructed from m a n u a l measurements using 3 D modelling software. E a c h test object was mounted on the A C M E test station. 
Objects whose diameters are smaller than the diameter of the test station could not be sampled below heights of 30 m m due to collisions between the sound effector and the test station. F o r future collections requiring complete coverage, objects could be raised on a narrow pedestal. The following four sections discuss the test objects, experimental setup, and the results of the d a t a collections.  60  Coarse Vertices 1  Microphone Distance (mm)  N u m b e r of Modes  5  P l a s t i c speaker  136 98  Toy d r u m  31  Object N a m e T u n i n g fork Brass vase  N u m b e r of Samples  5  Similarity Threshold N/A  190 40  40 10  0.75 0.75  5 5  130  40  0.75  5  5  Table 8.1: S u m m a r y of setup parameters for test objects.  F i g u r e 8.1: Setup for acquiring model of t u n i n g fork.  8.2  F i g u r e 8.2: Recorded spectrogram,  Tuning Fork  A n A-440 t u n i n g fork was used as a calibration object. T h e t u n i n g fork was mounted on the A C M E test station as illustrated in F i g u r e 8.1. A s is also shown, the m i crophone was located 5 m m from the nearest tyne. T h e t u n i n g fork was struck five times at one location to produce five three-second  recordings. A  five-mode  proto-  typical model was created from the five recordings. Table 8.1 summarises the setup parameters for all of the test objects.  8.2.1  Estimation  Results  F i g u r e 8.2 displays the spectrogram of one recording. T h e 440 H z tone is present as are several overtones at 9 430 H z and 10 120 H z . W h e n the t u n i n g fork is struck, only the overtones are audible from a distance. A t its close proximity, however, the  61  1  0.5  0  —1.5  .  .2  Time (sec)  (a) Synthesised spectrogram.  (b) Close up.  F i g u r e 8.3: T h e figure on the left (a) shows the spectrogram of a sound synthesised from the p r o t o t y p i c a l model of the t u n i n g fork. T h e figure on the right (b) is a closeup of the spectrogram showing the estimation of the fundamental frequency at 430.7 H z . B a c k g r o u n d noise contributed a false mode at 215.3 H z .  microphone was able to record the low amplitude 440 H z tone. F i g u r e 8.3 contains two spectrograms produced by a sound synthesised from the p r o t o t y p i c a l model. A s both spectrograms show, the p r o t o t y p i c a l model contains relatively accurate estimates of the fundamental frequency and the overtones present in the original spectrogram. T h e mode representing the 440 H z tone was estimated at 430.7 H z . T h e frequency was estimated from a 1024-point Discrete Fourier Transform ( D F T ) , w i t h a frequency resolution of 43 H z . Since the estimation is w i t h i n 43 H z of the true frequency, the test was a success.  8.3 The  Brass  Vase  brass vase displayed in F i g u r e 8.4 (a) was the first test object for which a  complete sound model was generated.  T h e subdivision surface in F i g u r e 8.4 (b)  represented the vase for the adaptive sampling a l g o r i t h m . T h i s coarse mesh contains 136 vertices. F o r t y frequency modes were estimated at each sample location, and prototypical models were produced using five recordings at each location. The  vase was secured to the A C M E test station using t h i n double-sided  62  63  tape. T h e microphone on the field measurement system ( F M S ) was used to record the samples, and was located 190 m m behind the vase (see F i g u r e 8.5).  
Early  experiments using the microphone mounted on the sound effector produced poor sound models due to the transient effects of clipping and echoes. These effects are diminished by recording in the far field. A t each sample l o c a t i o n , the vase was struck normal to the surface by the sound effector.  8.3.1  Estimation Results  Figure 8.6 is a comparison of spectrograms of synthesised sounds and recorded samples at three positions on the vase. W h i t e noise was added to the synthesised sounds at a signal-to-noise ratio a p p r o x i m a t i n g the signal-to-noise ratio of the recording. T h e addition of noise creates spectrograms that are more comparable to the originals. A p p e n d i x B discusses this technique w i t h examples. T h e frequency modes were estimated quite accurately at each location i n F i g u r e 8.6. A u d i b l y , the synthesised sounds at most sample locations on the vase were comparable to the recordings. M o s t often, any difference in the sounds was a lower perceived p i t c h . E v i dence of this is present i n the spectrograms of F i g u r e 8.6, particularly at Z = 90 mm.  A narrow band of high energy noise is visible in the recorded spectrogram  from a p p r o x i m a t e l y 0 to 300 H z (Figure 8.7). T h i s band of noise was estimated as a mode in the model at 215 H z w i t h a very small d a m p i n g constant (0.452). In fact, this mode's d a m p i n g constant is smaller than any other of the modes by at least one order of magnitude. T h o u g h background noise is concentrated between 0 and 300 H z , a moderate level of noise is also present in a band from 300 to 600 H z (Figure 8.7). T h i s band of noise artificially reduced the estimated d a m p i n g constants of frequency modes w i t h i n that range. A s an example, two modes were estimated at 646.0 H z and 473.7  64  0  01  02  03  04  06  OS  07  0)  OS  Time (sec>  0  01  03  03  04  05  00  07  OS  09  Time lsec\  (a) Recorded spectrogram (Z = 90 m m ) ,  (b) Synthesised spectrogram (Z = 90 mm)  0.4  OS  0.0  Time (sec)  (c) Recorded spectrogram (Z = 61 m m ) ,  (d) Synthesised spectrogram (Z = 61 m m )  (e) Recorded spectrogram (Z = 45 m m ) ,  (f) Synthesised spectrogram (Z = 45 m m )  Figure 8.6: Results of brass vase experiment. Spectrograms of recorded samples and those synthesised from p r o t o t y p i c a l models are compared at three positions on the brass vase: Z = 90, 61 and 45 m m . W h i t e noise was added to the synthetic sounds to produce more comparable spectrograms.  65  0.4  Time (sec)  0.4  Time (seel  F i g u r e 8.7: D e t a i l of narrow-band low-frequency noise. A high-energy band of noise is visible between 0 and 300 H z . M o d e r a t e levels of noise are also visible from 300 to 600 H z .  Time (sec)  F i g u r e 8.8: Effect of noise on lowfrequency modes. T h o u g h recorded low-frequency modes are audible for approximately 0.1 seconds, moderate noise levels sustain the estimated modes to approximately 0.44 and 0.75 seconds.  H z w i t h d a m p i n g constants of 8.6 and 4.5 respectively.  T h o u g h recorded modes  in this range typically lasted approximately 0.1 seconds, these estimated  modes  remain at significant amplitude (> -3 d B ) for 0.44 and 0.75 seconds (Figure 8.8). T h i s artificial sustain of low-frequency modes also contributes to the lower perceived pitch. T h e signal-to-noise ratio for these recordings was in the range of 30 to 40.  
8.3.2  Refinement  Results  Refinement of the sampling mesh by the adaptive sampling algorithm is illustrated in F i g u r e 8.9. Table 8.2 summarises the results of refinement for all the test objects. A n interesting result of the vase's refinement is its unusual asymmetry. Since the vase is approximately circularly s y m m e t r i c , it was expected t h a t the  refinement  pattern would also be s y m m e t r i c .  refinement  A s shown in F i g u r e 8.9, however,  varied greatly. T h e image on the left (a) shows many refined sampling locations, while the image on the right (b) is almost void of refinements. There are two possible  66  N u m b e r of vertices Object N a m e  Coarse  Rejected  0 14 14  1  0  Brass vase  136  56  P l a s t i c speaker  98  41  T u n i n g fork  Missed  4 0 Toy d r u m 31+ | N u m b e r of vertices of th e top surface.  Table 8.2: S u m m a r y of refinement results.  Added N/A  Sampled 1  93  173 85  28 163  190  T h i s table summarises the number of  coarse, rejected, missed, added and sampled vertices for all of the test objects. A rejected vertex is one which is outside the working envelope of the P u m a a r m . Missed vertices were counted when no force was sensed at a sample location (i.e., a hole).  Figure 8.9: Refinement results of brass vase experiment. T h e refined sample locations are plotted on the surface model. Despite geometric symmetry, refinement patterns are not s y m m e t r i c as the two views (a) and (b) illustrate.  Vertices are  colour-coded by revision number; black, grey and white represent coarse, first and second refinements.  67  explanations.  F i r s t , it is possible t h a t the vase is not acoustically s y m m e t r i c . If  so, this example is a strong argument for the necessity of an adaptively s a m p l i n g a l g o r i t h m . A l t e r n a t i v e l y , it may be t h a t the acoustic distance between coarse models is very close to the threshold. If so, the variability of parameter estimation may be sufficient to occasionally increase acoustic distances above the threshold. G i v e n the regular pattern of revision apparent i n F i g u r e 8.9 (a), this explanation is unlikely. O n e aspect of the vase's geometry introduced a difficulty for the system: there are three rows of s m a l l holes around the mouth of the vase. T h e experiment is programmed to identify missed sample locations if no contact is sensed.  With  this particular object, though, the outer case of the solenoid often contacted the sides of a hole even though the plunger passed through. W h e n this occurred, the system acquired samples of n o t h i n g . O f course, these degenerate samples introduced refinements around these holes. A l t h o u g h this may result in overly dense s a m p l i n g of the top r i m , it also increases the likelihood that the surfaces surrounding the holes will be sampled.  8.4  Plastic Speaker  A complete sound model was also generated for the small speaker shown in F i g ure 8.10 (a). T h e speaker is completely plastic, w i t h the exception of a metal grill covering the front face. A cube mesh w i t h 98 vertices represented the speaker for the adaptive sampling algorithm (Figure 8.10 (b)). A l t h o u g h the speaker's surface could be adequately described by fewer vertices, interior vertices were added to seed the refinement of the adaptive s a m p l i n g a l g o r i t h m . 
Since the sounds at the corners of the speaker are similar, refinement would be unlikely if only corner vertices were used to represent the surface. Ten frequency modes were estimated at each sampling l o c a t i o n .  Prelimi-  nary experiments determined t h a t ten modes sufficiently represent the sound of the speaker at most locations. F i v e recordings at each sample location were used to  68  (a)  (b)  F i g u r e 8.10: A photo (a) of plastic speaker and the subdivision surface which represents it (b).  create prototypical models.  T h e sound effector struck the speaker normal to the  surface at each sample location. T h e speaker was secured to the A C M E test station using double-sided tape along the b o t t o m edges. T h e microphone on the F M S recorded the samples from a distance of approximately 40 m m . Because of the low amplitude of the impact sounds, the signal would be d o m i n a t e d by room noise at larger distances.  8.4.1  Estimation Results  T h e speaker was a problematic test object for two reasons. F i r s t , the contact sounds it produces are quiet and decay quickly. L o w amplitude is a concern since it decreases the signal-to-noise ratio. A s proven by the evaluation of the estimation algorithm (Section 2.4.1), estimation accuracy degrades d r a m a t i c a l l y w i t h increasing noise levels. A l s o , because the sound decays quickly, the sound of the solenoid's return is sometimes present in the recordings. W h e n the solenoid returns after impact, the plunger often strikes the side of its exit hole. Normally, this chatter is quiet enough, or the microphone far enough, t h a t it is not recorded.  Because the microphone  needs to be so close to the speaker, however, the solenoid's sound is recordable.  69  o  o.i  0£  03  0.4  0.5  0.4  0.7  0.8  09  0  ::  03  Time (sec)  0*  05  •  07  09  nt  Time (seel  (a) Recorded spectrogram.  (b) Synthesised spectrogram.  F i g u r e 8.11: Results of plastic speaker experiment. A spectrogram of a sample recorded at the top of the speaker (a) is compared to a spectrogram of a sound synthesised from a ten-mode prototypical model (b).  C h o i c e of an acceptable recording distance is therefore a trade-off between good signal amplitude, and recording the sound of the solenoid. T h e problem of microphone distance was further complicated by the hardware surrounding the microphone on the F M S (i.e., the camera, Triclops and p a n / t i l t u n i t ) . Frequently, the microphone could not be moved closer to the strike location because the P u m a a r m would collide w i t h the F M S hardware. T h e second problem w i t h using the speaker as a test object is registration. Because surface model creation is not yet automatic, the object must be manually registered to the stage for correspondence w i t h the surface model. W i t h objects that are circularly s y m m e t r i c , s m a l l imprecision is tolerable. A square object, however, must be more precisely positioned. D u r i n g the test it was noted that the speaker was not accurately positioned and the edges of each face were not reliably struck at a normal angle. It is hoped that this problem will be eliminated by the development of a shape acquisition module for A C M E . Despite these difficulties, good sound models were produced at many sample locations.  T h e spectrograms in F i g u r e 8.11 illustrate the similarity between  recorded and synthesised sounds. T h e model is clearly an accurate  70  the  representation  of the recorded sample.  
Some models suffered the same pitch-lowering effects as discussed in Section 8.3.1. Additionally, the sound models of the metal grill were generally not audibly similar to the recordings, since their signal-to-noise ratios were poorer. In comparison to the brass vase, the speaker's sound model has a narrower bandwidth and sharper decay. This result supports the perceptual studies of Klatzky et al. [16].

8.4.2  Refinement Results

The sound of the plastic speaker is mostly uniform, though it does vary from the edge to the middle of each face. Since the middle of each face is unsupported, the sound is generally lower in frequency than at the edges. We expected this variation to trigger some refinement of the sampling mesh. As is illustrated in Figure 8.12, very little refinement was required. Future experiments could investigate lower similarity thresholds and a coarser surface model as stimulants of refinement.

Figure 8.12: Refinement results of the plastic speaker experiment. Very few refinements were made on the speaker, with most refinements occurring near the edges. The refinement pattern for the top of the speaker is shown here. Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements.

8.5  Toy Drum

The fourth test object, a toy drum, was selected for its diversity in sound across its surface. The drum (Figure 8.13 (a)) is a child's toy, made of plastic with three metal bars suspended across a slot in the top face. Each metal bar has a different length and therefore a different frequency. A complete model of the drum could not be created for the experiment due to restrictions of our subdivision surface loader. Instead, the drum is approximated by a simple cylindrical mesh (Figure 8.13 (b)). Because the handle of the drum is not modelled by the surface, we were unable to create a sound model for the entire drum. We instead created a sound model of only the top face in order to show the results of refinement on an acoustically complex object.

Five samples were recorded at each sample location, and forty modes were estimated for each model. Similarly to the brass vase and speaker, the drum was affixed to the ACME test station using double-sided tape on its bottom edges. The microphone on the FMS was again used to record the samples at a distance of 130 mm, and the drum was struck normal to its surface.

Figure 8.14: When measuring near the edge of the bars, the plunger often caused the bars to pivot on their supports. When the plunger retracted, the bars would return to their nominal positions, reducing the distance between the plunger and the bar.

8.5.1  Estimation Results

The toy drum's construction introduced two problems which affected the quality of the sound models. Since the metal bars are supported only along their central axis, they are able to pivot around that axis (Figure 8.14). Unfortunately, when the sound effector measured locations near a bar's edge, the bar moved a few millimetres upon contact, then returned once the sound effector was retracted to strike. This movement reduced the distance between the plunger and the metal bars and produced uncharacteristically damped sounds. The second difficulty arose from the spaces between the metal bars. If a sample location lay in one of those spaces, no sound model was created.
This absence prevented any refinement between that sample location and adjoining vertices. Since these holes lay between the bars, adaptive refinement between the bars did not always occur as expected.

Apart from the effects just listed, most of the sound models were successful. Exceptionally good models were produced for the metal bars when they were struck near their centers. For example, Figure 8.15 illustrates the fidelity of the model for the middle bar. With the exception of the noise effects mentioned previously, the spectrograms are nearly identical.

(a) Recorded spectrogram.  (b) Synthesised spectrogram.

Figure 8.15: Results of the toy drum experiment (metal). The spectrogram of a recorded sample of the middle metal bar is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at the same location is shown on the right (b).

Results of modelling the plastic surface were acceptable, though not as successful as the metal bars. As Figure 8.16 demonstrates, the frequency spectrum was typically correct, but the damping parameters were often inaccurate. One additional consequence of the construction of the drum is that the metal bars often resonated when the plastic was struck. Though a minor effect, it may have contributed to the sustain of some modes. More likely, the primary reason for poorer estimation is the lower-amplitude response of the plastic.

(a) Recorded spectrogram.  (b) Synthesised spectrogram.

Figure 8.16: Results of the toy drum experiment (plastic). The spectrogram of a recorded sample of the drum's plastic surface is shown on the left (a). The spectrogram of a sound synthesised from the prototypical model at that location is shown on the right (b).

8.5.2  Refinement Results

Results of the refinement are plotted in Figure 8.17. As expected, the model was refined substantially, especially around the interface between the metal bars and the plastic (Figure 8.17 (b)). Unfortunately, several missed sample locations occurred at gaps between metal bars on the right side. Slight inaccuracies in the manual registration of the drum caused some sample locations on the right side to lie between bars, but not on the left side. Regardless, it is clear from the left side of the diagrams that the sampling mesh is denser near material boundaries. This result is convincing evidence of the success of the adaptive sampling algorithm.

(a) Sample locations plotted on surface.  (b) Diagram of refinement.

Figure 8.17: Refinement results of the toy drum. The refined sample locations are plotted on the surface model in (a). Vertices are colour-coded by revision number; black, grey and white represent coarse, first and second refinements. A simplified diagram (b) also shows sample locations that were missed (diamonds). The location of the metal bars is indicated by the vertical rectangle. Here refinement levels are represented as 'o', '+' and 'x' in ascending order of refinement level.

Chapter 9

Conclusions

9.1  Overview

A system to automatically create sound models of everyday objects was designed and constructed. This system uses a surface model of a test object to adaptively select sample locations.
At each of these locations a device, called a sound effector, strikes the object to elicit an acoustic impulse response. Multiple impacts are used to create a prototypical sound model which best represents the sound at that location. The hardware, software and algorithms required for this system were implemented and tested.

Overall, the sound models produced for the tuning fork, brass vase, speaker and toy drum are encouraging. These results demonstrate that excellent sound models can be constructed under favourable noise and impact conditions. For example, the fundamental frequency of the tuning fork was accurately estimated within the limits of the discrete Fourier transform. Constructions such as the drum's metal bars pose a difficult problem. Still, the system performs reliably for "rigid" objects - a characteristic of many everyday objects.

The Achilles heel of the system is noise. Evaluation of the parameter estimation algorithm in Chapter 2 revealed large errors in estimation for even modest levels of background noise. Even models with relatively accurate frequency estimates contained spurious low-frequency modes and incorrect decay constants due to noise. These effects are perceived as a lower pitch in the synthesised sounds. Prototypical models are one way to reduce the effect of noise. Evaluation of the spectrogram averaging technique showed a significant reduction in mean estimation error.

The adaptive sampling algorithm presented in Chapter 7 suffers from a sensitivity to noisy models and missed sample locations. Its refinement of the vase's and drum's sampling meshes, however, is evidence of its potential. Improvement of the models and of the acoustic distance metric should improve future results.

9.2  Future Work

As with all research, this thesis has created many new opportunities for future work. This section suggests avenues for future research and development of the system.

Ultimately, environmental noise must be removed at its source. Either a sound-proof enclosure must be constructed around ACME, or it must be located in a room without machines.

Solenoid noise also presented a problem for "low-amplitude" materials. Since the sound is produced by chatter between the plunger and the solenoid casing, it is presumed that the solenoid must be replaced by a special-purpose device. An alternate approach is to coat the solenoid plunger with rubber.

The original design of the sound effector suggested a replaceable tip. Currently, a hard steel tip is used to produce a sufficiently impulsive impact. Other tips could be constructed from other materials to investigate their effect.

Another useful addition to the system would be a force sensor or load cell for the tip of the effector's plunger. Knowing the force profile of the impact could yield more accurate sound models. Recording the back-current through the solenoid coil may also be a solution.

The problem of microphone placement could be addressed by repositioning the microphone on the FMS for each impact. A sophisticated motion planner would be required to prevent collisions with the object, test station and contact measurement system.
One issue not addressed by the current design is the effect that attaching objects to the ACME test station has on the boundary conditions of the sound model. By attaching one side of an object to the test station, that side is prevented from vibrating freely. The resulting sound model is therefore applicable only to synthesis of the object's sound in this configuration. One useful configuration is an "all-free" boundary condition where all but the impulse location are free to vibrate. This is useful for simulations of dropped objects. One possible solution is to hold the object in place by a minimum number of point contacts. A set of rubber cones may be used for this purpose.

Thorough evaluation of the system, particularly the adaptive sampling algorithm, requires a playback device. Intelligent audio morphing algorithms for synthesis may reduce a model's dependence on adaptive sampling. At the very least, software which morphs between sample locations to provide a continuous audio map of objects will encourage perceptual studies evaluating the effectiveness of the acoustic distance metric and perceptual thresholds. This research will hopefully result in an iterative improvement of the adaptive sampling algorithm.

Other experiments should also be conducted to investigate the effect of strike angle relative to the surface normal on the sound model. It is clear that the amplitude of the sound will change as the strike angle approaches 0°. It is not immediately clear whether objects require anisotropic sound models. The current system is easily programmed to perform these experiments.

Bibliography

[1] Various Authors. Computer Music Journal Special Issues on Physical Modeling. 16(4) and 17(1), MIT Press, 1996 and 1997.

[2] Kevin Bradley. Synthesis of an acoustic guitar with a digital string model and linear prediction. Master's thesis, Carnegie Mellon University, 1995.

[3] Antoine Chaigne and Vincent Doutaut. Numerical simulations of xylophones. Journal of the Acoustical Society of America, 101(1):539-557, 1997.

[4] Perry R. Cook and Dan Trueman. A database of measured musical instrument body radiation impulse responses, and computer applications for exploring and utilizing the measured filter functions. Available online: http://www.cs.princeton.edu/~prc/ism98fin.pdf, 1998.

[5] Creative Technology Ltd. Sound Blaster Live! Hardware Specifications, 2000. Available online: http://www.soundblaster.com.

[6] Tony DeRose. Subdivision surface course notes. SIGGRAPH Course Notes, 1998.

[7] Shlomo Dubnov, Naftali Tishby, and Dalia Cohen. Clustering of musical sounds using polyspectral distance measures. In Proceedings of the 1995 International Computer Music Conference, pages 460-463, 1995.

[8] Robert S. Durst and Eric P. Krotkov. Object classification from analysis of impact acoustics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 90-95, 1995.

[9] W. W. Gaver. Everyday Listening and Auditory Icons. PhD thesis, University of California, San Diego, 1988.

[10] W. W. Gaver. Synthesizing auditory icons. In Proceedings of the ACM INTERCHI '93, pages 228-235, 1993.
[11] Augustine H. Gray, Jr. and John D. Markel. Distance measures for speech processing. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-24(5):380-391, October 1976.

[12] D. J. Hermes. Auditory material perception. IPO Annual Progress Report 33, Technische Universiteit Eindhoven, 1998.

[13] Wesley H. Huang. A tapping micropositioning cell. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2153-2158, 2000.

[14] InterTAN Inc. Optimus ultra-miniature tie-clip microphone specifications, 1996.

[15] Douglas Keislar, Thom Blum, James Wheaton, and Erling Wold. A content-aware sound browser. In Proceedings of the 1999 International Computer Music Conference, pages 457-459, 1999.

[16] Roberta Klatzky, Dinesh K. Pai, and Eric Krotkov. Perception of material from contact sounds. Presence, (in press).

[17] Eric Krotkov. Robotic perception of material. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 88-94, 1995.

[18] Eric Krotkov, Roberta Klatzky, and Nina Zumel. Robotic perception of material: Experiments with shape-invariant acoustic measures of material type. In O. Khatib and J. K. Salisbury, editors, Experimental Robotics IV, number 223 in Lecture Notes in Control and Information Sciences, pages 204-211. Springer-Verlag, 1996.

[19] Charles Loop. Smooth subdivision surfaces based on triangles. Master's thesis, University of Utah, 1987.

[20] Robert L. Mott. Sound Effects: Radio, TV, and Film. Butterworth Publishers, 1990.

[21] Dinesh K. Pai, Jochen Lang, John E. Lloyd, and Robert J. Woodham. Acme, a telerobotic active measurement facility. In Proceedings of the Sixth International Symposium on Experimental Robotics, 1999.

[22] Point Grey Research, Vancouver, Canada. Triclops On-line Manual. Available online: http://www.ptgrey.com.

[23] Precision MicroDynamics Inc. MC8-DSP-ISA Register Access Library and User's Manual, 1.3 edition, 1998.

[24] Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. PTR Prentice-Hall, Inc., 1993.

[25] Joshua L. Richmond and Dinesh K. Pai. Active measurement of contact sounds. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2146-2152, 2000.

[26] Joshua L. Richmond and Dinesh K. Pai. Robotic measurement and modeling of contact sounds. In Proceedings of the International Conference on Auditory Display, 2000.

[27] Malcolm Slaney, Michele Covell, and Bud Lassiter. Automatic audio morphing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1001-1004, 1996.

[28] Ken Steiglitz. A Digital Signal Processing Primer with Applications to Digital Audio and Computer Music. Addison-Wesley, 1996.

[29] Mark Ulano. Moving pictures that talk - the early history of film sound. Available online: http://www.filmsound.org/ulano/index.html.

[30] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, May 1999.
[31] Kees van den Doel and Dinesh K. Pai. The sounds of physical shapes. Presence, 7(4):382-395, 1998.

[32] Richard P. Wildes and Whitman A. Richards. Recovering material properties from sound. In Whitman Richards, editor, Natural Computation. The MIT Press, 1988.

[33] Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton. Content-based classification, search and retrieval of audio. IEEE Multimedia, 3(3):27-36, 1996. Also available online (www.musclefish.com).

Appendix A

Sound Effector Specifications

A.1  Mounting Bracket

The construction of the sound effector's mounting bracket deserves a brief description here for future reference. Constructed from a single piece of 1-|" x |" aluminum, it was formed on a bending bar following the schedule in Figure A.1. To allow for the finite bending radius of the material, an additional |" (|" x 2) was added to the length of the material. Following the bending specifications in Figure A.1, the specified spacing between the two ends (i.e., 2.5") was maintained.

Unfortunately, two artifacts of the bending process are present in the bracket: fatigue marks and off-centre alignment. The fatigue marks were produced on the exterior radius of each bend. These occurred because the metal was bent beyond its permissible stress limit. This might be avoided in future constructions if the metal were first heated. The alignment of the solenoid is also slightly off-centre following the bending. This is a flaw of the alignment of the material in the bending vise.

Figure A.1: Bending schedule for the sound effector mounting bracket.

A.2  Control Circuit

The circuit interfacing the sound effector to the Precision MicroDynamics MC8 board is diagrammed in Figure A.2. It is a simple switching circuit, with a 74F245 octal buffer to isolate the MC8 from the relay. This circuit may be duplicated to control other digital output devices such as lights.

Figure A.2: Schematic for the solenoid control circuit. (Labels from the schematic: +5 VDC and +12 VDC supplies, MC8 pin J4-37, 1 kΩ and 100 Ω resistors, Q1: 74F245 octal buffer, T1: P2N2222 NPN transistor, output to the solenoid.)

Appendix B

Effect of White Noise on Spectrograms

The two spectrograms in Figure B.1 are presented to compare the effect of white noise on the appearance of a spectrogram. Without background noise (Figure B.1 (a)), the frequency modes appear as wide bands and appear to be sustained longer. It therefore becomes difficult to compare synthesised spectrograms to measured ones. By adding low-amplitude (e.g., SNR = 100) white noise to the synthesised sound, the mapping of colours to intensity values is scaled more comparably to the original recording. For this reason, all spectrograms of synthesised sounds in Chapter 8 include white noise added at a signal-to-noise ratio approximately the same as that of the measured samples. The white noise is added to the synthesised signal prior to computing its spectrogram.
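The following is a minimal sketch of this noise-addition step. It is illustrative only: it assumes the signal-to-noise ratio is expressed as a ratio of signal power to noise power (rather than in decibels), it uses NumPy rather than the tools used for the thesis, and the function name is not from the thesis code.

    import numpy as np

    def add_white_noise(signal, snr):
        """Add zero-mean white Gaussian noise so that signal power / noise power = snr."""
        signal = np.asarray(signal, dtype=float)
        signal_power = np.mean(signal ** 2)
        noise_power = signal_power / snr
        noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
        return signal + noise

    # Example: noise added at an SNR of 100 before the spectrogram is computed.
    # noisy = add_white_noise(synthesised_signal, snr=100)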
(a) Pure spectrogram.  (b) Spectrogram with additive white noise.

Figure B.1: Effect of white noise on spectrograms. The spectrogram in (a) is produced by a sound synthesised from a forty-mode sound model. The spectrogram in (b) is the same spectrogram, but with white noise added at an SNR of 100. White noise scales the mapping of colour to intensity more comparably to spectrograms of recorded sounds.

Appendix C

Details of the Unique Frequency-Mapping Algorithm

A simple algorithm was designed to match frequency modes between two sound models such that a one-to-one mapping exists. That is, given two models (A and B), each with Nf frequencies, a mapping function m(j) is produced such that FA,i maps to FB,j when i = m(j). The algorithm is summarised in Chapter 7 (Figure 7.4) and repeated in Figure C.1 for convenience. This appendix describes the implementation of the algorithm in greater detail.

Five arrays are used to track the matching of modes. The first is a matrix of frequency differences between all modes in the two models (Figure C.2). This difference matrix is used to create the order array, which orders the indices of modes in model B from nearest to farthest for each mode in model A (Figure C.3). The third array is a list of indices indicating the next mode to check (Figure C.4). A fourth array (Figure C.5) is used to track which modes in model A are currently matched. The fifth array is the result of the algorithm: an array relating the mapping of mode indices in model A to mode indices in model B (Figure C.6). The index of the mode in model A matched to mode j in model B is stored as mapping[j]. The elements of the mapping array are initialised to -1, indicating unmatched modes.

• For each frequency mode ωA,i in model A...
  1. Find the frequency mode ωB,k nearest to ωA,i.
  2. If ωB,k is unmatched, match it to ωA,i.
  3. Or, if ωB,k is matched to another mode which is farther than ωA,i, match ωB,k to ωA,i instead. Mark ωA,i as matched, and the mode it replaces as unmatched.
  4. Otherwise, loop to step 1, picking the next nearest frequency mode.
• Repeat until all frequency modes in model A are matched.

Figure C.1: Algorithm to find a unique frequency mapping between two models.

Figure C.2: difference matrix. Differences in frequencies between all modes in Model A and all modes in Model B are stored in this array. For example, difference[3][4] is the difference between FA,3 and FB,4.

Figure C.3: order array. Each column of the order array contains the indices of modes in Model B from nearest to farthest of a mode in Model A. For example, order[3][4] is the index of the 3rd nearest mode in Model B to FA,4.

Figure C.4: index array. Each element contains an index into the order array at which to select the next mode from Model B for comparison. Each element is non-decreasing, thus eliminating cycles where two modes are repeatedly compared against another pair.

Figure C.5: matched array. Indicates which modes of Model A have been matched to a mode in Model B. The algorithm terminates when all elements of this array are true.

Figure C.6: mapping array.
When the algorithm terminates, this array maps modes in Model A to modes in Model B. For example, mapping[1] contains the index of the mode in Model A which maps to FB,1.

The pseudo-code in Figure C.7 uses these five arrays to implement the algorithm of Figure C.1. Each unmatched mode i in model A is examined in sequence. The j-th mode listed in column i of the order matrix, where j = index[i], is checked against the mapping array. If no mapping exists (i.e., mapping[b] = -1, where b = order[i][j]), the mapping is set to mode i, mode i is marked as matched (i.e., matched[i] = true) and the next mode in model A is examined (i.e., i = i + 1). If a mapping exists, the difference between the currently mapped modes is compared to the difference between modes i and b (using the difference matrix). If the current mapping's difference is no greater than that of the proposed new mapping, index[i] is incremented and the next nearest mode j is examined. Otherwise, mapping[b] is set to i, and the previously mapped mode is marked as unmatched. This process continues until all modes in model A are matched. If the last mode i in model A is examined before the mapping is complete, i is reset to the first unmapped mode in model A and the loop continues. Since models A and B contain the same number of modes, each of which is uniquely mapped to one other mode, the algorithm is guaranteed to terminate. The index array prevents two pairs of modes from being compared twice, thereby eliminating potential cycles.

int[] FindUniqueMapping(double[] modesOfA, double[] modesOfB)
{
  // difference[a][b] is the difference between frequency
  // modes a (from Model A) and b (from Model B)
  difference = calculateDistance(modesOfA, modesOfB);

  // order[a][i] is the index of the ith nearest mode in Model B
  // to mode a in Model A
  order = sort(difference);

  // index[a] is the index of the next mode in the order
  // array for mode a
  int[] index;
  // initialise all elements of index to 0
  index is all {0};

  // matched[a] indicates whether mode a in Model A has
  // been matched yet
  boolean[] matched;
  // Initialise all elements of matched to false
  matched is all {false};

  // mapping[b] is the index of the mode in Model A that best
  // maps to mode b of Model B.
  int[] mapping;
  // Initialise all elements of mapping to -1
  mapping is all {-1};

  /** ALGORITHM CORE BEGINS HERE **/
  // Iterate until all modes in Model A are matched
  while (matched is not all {true}) {
    // For each unmatched mode in Model A...
    for (i = 0 to modesOfA.length) {
      if (matched[i] == false) {
        // Find the nearest unmapped mode in Model B
        for (j = index[i] to order[i].length) {
          // b is the index of the next nearest mode in Model B
          b = order[i][j];
          // a is the index of the mode in Model A to which b
          // is mapped (if any)
          a = mapping[b];
          // if mode b is unmapped, or if it is mapped
          // to a mode in Model A which is farther, map it to
          // mode i
          if ((a == -1) OR (difference[a][b] > difference[i][b])) {
            mapping[b] = i;
            if (a > -1) then matched[a] = false;
            matched[i] = true;
            index[i] = j + 1;
            break j loop;
          }
        } // end j loop
      }
    } // end i loop
  } // end while loop

  return mapping;
}

Figure C.7: FindUniqueMapping pseudo-code.
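For readers who prefer a runnable reference, the following is a compact re-implementation of the same greedy matching in Python. It is an illustrative sketch rather than the code used for the thesis: the function name and the use of NumPy are choices made here, and the difference and order arrays of Figures C.2 and C.3 are built with library calls rather than explicit loops.

    import numpy as np

    def find_unique_mapping(freqs_a, freqs_b):
        """Greedy one-to-one matching of the modes of model A to the modes of model B.

        Returns 'mapping', where mapping[b] is the index of the model-A mode
        assigned to model-B mode b.  Both inputs must have the same length.
        """
        freqs_a = np.asarray(freqs_a, dtype=float)
        freqs_b = np.asarray(freqs_b, dtype=float)
        n = len(freqs_a)
        assert len(freqs_b) == n, "models must have the same number of modes"

        # difference[a, b] = |f_A,a - f_B,b|  (the matrix of Figure C.2)
        difference = np.abs(freqs_a[:, None] - freqs_b[None, :])
        # order[a] lists model-B mode indices from nearest to farthest (Figure C.3)
        order = np.argsort(difference, axis=1)

        index = [0] * n        # next candidate to try for each A mode (Figure C.4)
        matched = [False] * n  # whether each A mode is matched        (Figure C.5)
        mapping = [-1] * n     # B mode -> A mode, -1 if unmatched     (Figure C.6)

        while not all(matched):
            for i in range(n):
                if matched[i]:
                    continue
                # Walk i's preference list from where it last stopped.
                for j in range(index[i], n):
                    b = order[i, j]
                    a = mapping[b]  # current owner of B mode b, or -1
                    if a == -1 or difference[a, b] > difference[i, b]:
                        mapping[b] = i       # i claims b, displacing a if necessary
                        if a != -1:
                            matched[a] = False
                        matched[i] = True
                        index[i] = j + 1     # this pair is never reconsidered
                        break
        return mapping

    # Example: two two-mode models.
    print(find_unique_mapping([500.0, 440.0], [445.0, 1000.0]))  # prints [1, 0]

In the example call, the 440 Hz mode of model A claims the 445 Hz mode of model B, displacing the 500 Hz mode, which is then matched to the remaining 1000 Hz mode; the printed mapping is therefore [1, 0].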
