UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Development of data acquisition and analysis methods for chemical acoustic emission Sibbald, David Bruce 1990

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1990_A6_7 S55.pdf [ 12.46MB ]
Metadata
JSON: 831-1.0060322.json
JSON-LD: 831-1.0060322-ld.json
RDF/XML (Pretty): 831-1.0060322-rdf.xml
RDF/JSON: 831-1.0060322-rdf.json
Turtle: 831-1.0060322-turtle.txt
N-Triples: 831-1.0060322-rdf-ntriples.txt
Original Record: 831-1.0060322-source.json
Full Text
831-1.0060322-fulltext.txt
Citation
831-1.0060322.ris

Full Text

Development of Data Acquisi t ion and Analysis Methods for Chemical Acoustic Emiss ion by David Bruce Sibbald B. Sc. (Hons.), University of British Columbia, 1987 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF CHEMISTRY We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA JULY 1990 © David B. Sibbald, 1990 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of The University of British Columbia Vancouver, Canada Date DE-6 (2/88) i i Abstract Acoustic Emission Analysis (AEA) is the study of the sonic (and ultrasonic) energy released by chemical systems in the form of transient waves, as the system attempts to (re)attain equilibrium. This area of chemistry, and chemical analysis, is ripe for fundamental studies since it has been little explored. The high potential of the technique as a non-invasive, non-destructive reaction monitoring scheme suggests that numerous applications will follow. In this work, an apparatus and software have been constructed to monitor acoustic emission (AE) and collect and process A E data. A broad-band piezoelectric transducer was used to convert the acoustic signals to electrical waveforms which could be captured by a digital storage oscilloscope. These waveforms were then stored on an IBM-compatible computer for further analysis. Analysis of the data was performed using pattern recognition techniques. The signals were characterized through the use of descriptors which can map each signal onto a multi-dimensional feature space. Visualization of the data structure in multi-dimensional space was accomplished using several methods. Hierarchical clustering was used to produce tree structures, known as dendrograms, which attempt to show clustering of the signals into various groups. Abstract factor analysis (AFA) - also called principal components analysis (PCA) - was used to project the data onto a two dimensional factor space to allow for direct viewing of structure in the multi-dimensional data. Sodium hydroxide dissolution, aluminum chloride hydration and heat activation of Intumescent Flame Retardants (IFR's) were used to test the assembled hardware and to provide data to submit to the pattern recognition algorithms coded as part of this work. The solid-solid phase transition of trimethylolethane (Trimet), and the liquid crystal phase transitions of two liquid crystals (a-a;-bis(4-n-decylaniline-benzilidene-4,-oxyhexane), and 4-n-pentyloxybenzylidene-4'-n-heptylaniline) were also monitored and the signals analyzed. The pattern recognition software was able to extract much information from the acoustically emitting samples - information which would not have been apparent by using standard (uni- and bi-variate) methods of analysis. Chemical acoustic emission, coupled with pattern recognition analysis, will be able to provide the chemist with knowledge (qualitative, quantitative, kinetic, etc.) about chemical systems which are often difficult or impossible to monitor and analyze by other means. i v Table of Contents Abstract ii List of Tables vi List of Figures vii Acknowledgement xii I. Chemical Acoustic Emission 1 1 Introduction 1 2 Acoustic Emission Studies 1 3 Apparatus for Chemical Acoustic Emission Monitoring 9 II. Chemometrics 17 1 Projections of Multivariate Data Sets 21 2 Display of Hierarchical Clustering 30 3 Chemometrics in Acoustic Emission 36 III. Experimental 37 1 Hardware 37 1.1 Transducer 37 1.2 Waveguide 41 1.3 Conditioning Amplifier 41 1.4 Digital storage oscilloscope 43 1.5 Computer Interface 43 2 Software 45 IV. Chemical Systems Studied 46 1 Dissolution of Sodium Hydroxide 46 2 Trimethylolethane (TRIMET) 46 3 Intumescent Flame Retardants (IFR) 48 4 Aluminum Chloride Hydration 48 5 Liquid Crystals 50 5.1 a-w-bis(4-n-decylaniline-benzilidene-4'-oxyhexane) 50 5.2 4-n-pentyloxybenzylidend-4'-n-heptylaniline 50 V . Development of Data Acquisition and Analysis Procedures 51 1 Data Acquisition - the Q A Q Program 56 2 Viewing Individual Signals - SIGVIEW 59 4 Signal Classification and Editing 59 5 Frequency Content Analysis Methods - V T R A P S 66 5 Pattern Recognition 70 6 Descriptor Generation - the Program A E M U N C H 75 7 Hierarchical Cluster Analysis - D E N D G R A M 75 8 Factor Analysis - A B S C A T 82 VI. Chemometric Methods Used in This Work 90 1 Descriptors 90 2 Scaling 95 3 Similarity and Distance 103 4 Hierarchical cluster Analysis 105 V 5 Abstract Factor Analysis (AFA) 116 VII. Results and Discussion 119 1 Detailed Characterization of the Hardware and Software Developed 119 1.1 Effect of Transducer on Observed Signals 119 1.2 Effect of Ambient Noise 121 1.3 Effect of Signal Acquisition Rate 121 1.4 Effect of Trigger Level 122 1.5 Effect of Waveguide on Observed Signals 122 1.6 Selection of Descriptors for Pattern Recognition 126 1.7 Choice of Scaling Technique for Descriptors 133 1.8 Visualization of Signal Classes Using Dendrograms 134 2 Chemical Systems 137 2.1 Sodium Hydroxide Dissolution 137 2.2 Trimethylolethane (Trimet) 137 2.3 Intumescent Flame Retardants (IFR) 140 2.4 Aluminum Chloride Hydration 141 2.5 Liquid Crystals 147 VIII. Further Work 153 IX. Conclusions 156 X. Bibliography 157 XI. Appendices 164 1 Data File Formats 164 1.1 .AEA - Acoustic Emission Experiment Data File 165 1.2 .DS1 - Descriptor File 167 1.3 .DEN - Dendrogram File 170 1.4 .AF2 - Abstract Factor Analysis Results 172 2 QAQ Program Listing 173 3 ABSCAT Program Listing 187 4 Hierarchical Cluster Analysis Software 245 4.1 D E N D G R A M Program Listing 248 4.2 DENDPLOT Program Listing 277 v i Legend to Tables Table Page 1 A comparison of acoustic emission monitoring apparatus 12 2 A simulated data set with two dimensions 15 3 The Euclidean distance matrix for the data in Table 2 16 4 The partial distance matrix for Table 2 data after first "fusion" in hierarchical clustering process 16 5 The distance matrix for Table 2 data after first "fusion" using single linkage method of hierarchical clustering 16 6 The distance matrix for Table 2 data after first "fusion" using complete linkage hierarchical clustering 17 7 The distance matrix for Table 2 data after first "fusion" using average linkage methods of hierarchical clustering 18 8 The .RESult file from abstract factor analysis performed on NaOH dissolution experiments showing various functions used to indicate the number of primary factors present in a data set 76 v i i Legend to Figures gure Pag 1 The Kaiser Effect (figure from Bruel and Kjaer course module 600: Introduction to Acoustic Emission) 3 2 A simple apparatus for acoustic emission monitoring 10 3 Integrating AE monitoring apparatus 12 4 AE apparatus using multiple band-pass filters and integrators 13 5 AE Apparatus including digitizing oscilloscope to capture individual signals. 14 6 Typical acoustic emission signal 15 7 Three dimensional plot of detector response versus concentration for two reagents 18 8 Two dimensional plot of system in Figure 7. Detector response versus concentration with one reagent concentration held constant 19 9 Two dimensional data set containing 10 data points. Point 1 is seen to be more similar to Point 2 than to Point 3 because of the relative distances 22 10 Projection of simulated Figure 9 data onto axis 23 11 Data set from Figure 9. Lj is the first principal component - which is the axis containing the most variance. is the second principal component 24 12 Projection of Figure 9 data onto first principal component L j 25 13 Projection of data onto 2nd principle component showing loss of information 26 14 Data from Figure 9 projected onto factor space defined by Lj and Lj. By comparison with Figure 9, factor analysis is a simple rotation of feature space 28 15 Five imaginary cities, A-E 32 16 The five cities joined with minimum length of train tracking. This is a minimal spanning tree for the five cities 33 17 A hierarchical dendrogram for cities in Figure 15 showing two groups or classes 34 18 Chemical acoustic emission monitoring apparatus, i) Bruel and Kjaer broadband transducer - model 8312; ii) Bruel and Kjaer conditioning amplifier model 2638; iii) Tektronix 100 MHz digital storage oscilloscope model T2230; iv) Goerz Metrawatt chart recorder - model SE 120; v) PC/AT computer with IEEE-488 parallel interface 38 v i i i Figure Page 19 Glass wave guide used to transmit acoustic signals from the sample to the transducer when the sample environment is unfavorable for the normal operation of the transducer (eg. due to high temperatures) 39 20 Jacket and sleeve used to hold wave guide and transducer 40 21 Chart recorder trace of melting ice. Vertical axis measures intensity of acoustic emission versus time on horizontal axis (data provided by Dr. P. D. Wentzell) ., 42 22 Acoustic emission signal with voltage scale and digitization level between Oand 255 44 23 Trimethylolethane (TRIMET or 2,2 dimethyl-l,3-dipropanol) 47 24 Q-w-bis(4-n-decylaniline-benzilidene-4'-oxyhexane) 49 25 QAQ - Program display during acoustic emission experiment 57 26 SIGVTEW - Program options for viewing acoustic emission experimental data from .AEA files 60 27 SIGVTEW - Display of acoustic signal from cooling of 4-n-pentyloxybenzylidene-4'-n-heptylaniline 61 28 SIG VIEW - Display of power spectrum of signal in Figure 27 62 29 SIGVTEW - Signal due to electrical noise 63 30 SIGVTEW - Electrical signal repeatedly found on power mains 64 31 SIGVTEW - Intense signal having amplitude beyond the voltage range of oscilloscope 65 32 VTRAPS - Time resolved average power spectrum for melting ice (data provided by Dr. P. D. Wentzell) 68 33 VTRAPS - Frequency variance spectrum for the liquid crystal 4-n-pentyloxybenzylidene-4'-n-heptylaniline. Two regions of high variance suggest that more than one process is occurring 69 34 A E M U N C H - Menu giving choice of descriptors to generate for descriptor file (.DS1) 71 35 A E M U N C H - Display during calculation of descriptors 72 36 A E M U N C H - Plot of RMS versus time 73 37 A E M U N C H - Plot of Kurtosis versus FMAX for NaOH data 74 38 D E N D G R A M - Menu offerring choice of scaling options 76 ix Figure Page 39 D E N D G R A M - Choice of different methods for calculating dendrograms.... 77 40 D E N D G R A M - Dendrogram display. Signal labels are given on left. Horizontal axis is (dis)similarity 78 41 D E N D G R A M - Display can be divided into two "pages" for inspection of detail 79 42 D E N D G R A M - Use of a square-rooted similarity axis 80 43 D E N D G R A M - Use of an exponential similarity axis. This serves to stretch the dendrogram 81 44 ABSCAT - Program's main option menu 83 45 ABSCAT - Performing factor analysis 84 46 ABSCAT - Factor loading display of first six factors. Each vertical bar coresponds to the loadings of a particular feature (descriptor) on the first six principal components starting with the one listed at the upper right (in this case factor 1) 85 47 ABSCAT - Cross-hair "cursor" which is used to help identify the signals referred to by the individual points 87 48 ABSCAT - ''Bubble" which opens up around cross-hairs to enclose points of interest for identification 88 49 ABSCAT - Identification of signals which have data points within the cursor bubble 89 50 Frequency spectrum of acoustic signal in Figure 6 92 51 a) Simulated two dimensional data set with outlier in one dimension. b) Result of range scaling data between 0 and 1. Data has been compressed in one dimension, changing apparent structure 98 52 a) A two dimensional data set with two clusters, b) Data set normalized. The grouping of the data has been lost due to imposition of normalization criteria 100 53 Frequency spectrum of signal from NaOH hydrolysis divided into 8 octiles. Maximum frequency is determined from the sampling rate by the Nyquist theorem. The horizontal bars correspond to the RMS power of each octile 102 54 Scaling routine from program DENDGRAM. Scaling of octiles avoids amplification of noise in eight octiles if the higher variance of the first octile is used 104 55 Difference between Euclid vs Manhattan distances for two dimensional example 107 X Figure Pag 56 Graphic representation of difference between single linkage and complete linkage for distance between a point and a cluster 110 57 Graphic representation of difference in centroid location for weighted and unweighted centroid methods. The location of the centroid of the new cluster formed will lie half way between the centroids of clusters A and B using the Weighted Centroid Method. The Unweighted Centroid Method will use the "center of mass" as the centroid of the new cluster I l l 58 Dendrograms calculated from data in Table 2 using single linkage and complete linkage 112 59 Dendrograms calculated for data in Table 2 using unweighted average linkage and weighted average linkage 113 60 Dendrograms calculated from data in Table 2 using the centroid method and the unweighted centroid method (Grower's method) 114 61 Dendrogram from data in Table 2 calculated using Ward's method 115 62 Frequency response of Bruel and Kjaer transducers (model 8312) 120 63 Four average power spectra for the NaOH hydration using different sample holders 123 64 Values of FMED, FMAX, and FMEAN for the dissolution of NAOH using a 50 mL Beaker, the short waveguide, and the long waveguide 125 65 Plot of AREA vs. RMS for signals from the liquid crystal, 4-n-pentyloxybenzylidene-4'-n-heptylaniline 127 66 Acoustic signal with amplitude outside voltage range digitized. (Signal acquired during cooling of liquid crystal.) 128 67 ABSCAT - Factor loadings for first six factors for NaOH experiment 130 68 ABSCAT - Factor loadings for least significant factors for NaOH experiments 131 69 First two factors for NaOH experiments. The x-axis is the first principal component. The y-axis is the second 132 70 Dendrogram for NaOH experiments. The windowed area includes all the signals from the collection of background showing them to be separated from the other signals 135 71 Acoustic activity of TRIMET (Intensity plotted versus time.) 138 72 Frequency power spectum for acoustic activity during cooling of TRIMET. 139 73 Acoustic activity of AICI3 142 x i Figure Page 74 AICI3 experiments. pH of final solutions versus initial mass of AICI3 144 75 Total acoustic energy recorded versus initial mass of AJCI3 145 76 Reaction time (for 90% of total AE emissions versus mass of AICI3 146 77 RMS versus time for signals from the liquid crystal, 4-n-pentyloxybenzylidene-4'-n-heptylaniline 148 78 Acoustic signal (short duration burst) during initial acoustic activity of the liquid crystal, 4-n-pentyloxybenzylidene-4'-n-heptylaniline 149 79 Continuous acoustic activity during "active phase" of 4-n-pentyloxybenzylidene-4'-n-heptylaniline 150 80 Average power spectrum of acoustic signals from cooling liquid crystal 4-n-pentyloxybenzylidene-4'-n-heptylaniline 151 81 Acoustic flow cell designs. Two stainless steel disks - one with a flow channel inscribed in its surface - are fastened together 154 82 Data acquisition algorithm for QAQ program 175 83 Abstract factor analysis (AFA) algorithm from ABSCAT program 188 84 Dendrogram algorithm from DENDGRAM program 245 x i i Acknowledgements I wish to give my most heartfelt thanks to Dr. Adrian Wade. His cheerful and energetic attitude to this project has been most encouraging. I also wish to commend Dr. P. D. Wentzell. Pete was always willing to help or give advice; and when none was asked for, he related countless anecdotes which helped make the day a little "shorter". Much appreciation is deserved by the people I worked with, for the jovial atmosphere that was created as well as for the assistance and camaraderie: Paul, Terrance, Oliver, Julie, Tony, Steve, Kevin, Ivan, Tim, Helen, Bruce, Patrick, and "little" Adrian. It would be sadly remiss of me not to mention two people who really made U.B.C. a special place for me: Fred Mistry, for breaking up the day into manageable sized chunks; and Dudley Shallcross, for spraying coffee and listening to my advice on female psychology (usually at the same time). A distant recognition is given to Ben Clifford, Gary Leong, and Sandi Clark for helping me get here in the first place. A deep appreciation is also offered to Sara Jane Biles for knowing just when and where to kick me when I needed it most. Finally, I wish to dedicate this work to the two people without whom none of this would be possible, Jane and Bruce Sibbald. But for your love, compassion, and generosity, I would never have been able to try to reach so high. 1 I. Chemical Acoustic Emission L l Introduction Analytical chemistry is that branch of science that attempts to characterize chemical systems based on their behavior and properties. Physical attributes such as viscosity, density, mass, crystallinity and porosity are sometimes used to characterize a sample. Modern chemical analysis almost entirely depends on the interaction of the sample with some form of supplied energy. For example, studies of the interaction of samples with a beam of electromagnetic radiation have led to development of techniques such as atomic absorption, vibrational spectroscopy, fluorescence, phosphorescence, and refractive index measurement. Nuclear magnetic resonance, electron spin resonance, and mass spectroscopy are used to identify / characterize a sample based on its behavior in electric and/or magnetic fields. Characterization of ongoing chemical reactions has been one of the more interesting problems of analytical chemistry. Time resolved methods such as differential scanning calorimetry (DSC), dilatometry, and electrochemical techniques have been employed as well as those techniques mentioned above. Many chemical reactions release energy in some form as they occur. The usual forms of energy release considered are heat and light. Such processes can be monitored by techniques such as calorimetry or luminescence spectroscopy. Energy can also be released in the form of sound over a broad frequency range, presently known to be from a few Hz to over 1 MHz. 1.2 Acoustic Emission Studies The human mind is very efficient at recognizing sound patterns. Both the cause and direction of a sound source are quickly identified by mental analysis of differences 2 in the sound characteristics. Acoustic,parameters, formalized in terms such as pitch, tone, loudness, etc., allow us to distinguish between a myriad of potential candidate sources for the sound. The ear is capable of detecting sound vibrations at frequencies in the range of about 10 Hz to 16 kHz. Vibrations at higher frequencies cannot be heard but still contain useful information. Recently, it was found that plants in need of water emit bursts of high frequency sound. When the plant is re-supplied with water, the sound ceases after the water columns in the stem are fully reestablished1. The earliest recording example of the use of sound in chemical analysis comes from Germany in ca. A.D. 1350, where in a manuscript containing instructions for the making of gun powder, the following information was presented: "If thou wilt try whether sulphur be good or not, take a lump of sulphur in thine hand and lift it to thine ears. If the sulphur crackle, so that thou hearest it crackle, then it is good; but if the sulphur keep silent and crackle not, then it is not good, and must be treated as thou shalt hear hereafter how it shall be prepared."2 Metallurgists are well aware of the value of sound in the production of materials. In 1936, Forster et al. observed that sound waves were generated during the formation of martensite3. The first laboratory investigation of the phenomenon of acoustic emission did not take place until 1953 when J. Kaiser reported his findings4. The term "Kaiser effect" describes the property of a material such that it will only generate transient mechanical waves (AE) when it is placed under a stress (or load) which is greater than that to which it has previously been exposed. This is illustrated in Figure 1. The upper trace shows the force, F, to which a sample is being subjected. The lower trace shows the acoustic energy, Vp e ak - in Volts, being emitted by the sample. The Figure 1) The Kaiser Effect. A sample will release acoustic energy (indicated by the plot of peak Voltage, V P E A K , versus time, t) only when subjected to a load, F, greater than that which it has previously experienced. Here one can see the level of AE drop off as the load is held constant at F 0 . LO 4 first load cycle involves ramping the force up to a value FQ - The load is held constant until the second load cycle at which time it is increased further. One can see that the sample was acoustically active during the first load cycle but did not emit further acoustic emissions until submitted to a greater load than had been achieved previously. The application of acoustic emission to an engineering structure was first documented eleven years later by A. T. Green et al.5. The integrity of a Polaris Model A3 solid rocket motor case is poorly determined by conventional techniques because it is a filament wound structure (unlike steel and titanium rocket cases). In this case, high frequency piezoelectric accelerometers were attached directly to the Polaris chamber. By looking at the amount of acoustic activity at low operating pressures, it was possible to predict the burst (failure) pressure. Acoustic emission was thus introduced as a non-destructive technique for the testing of materials. Because of its non-destructive nature, the use of acoustics for materials testing was enthusiastically examined for commercial applications6*8. The equipment for monitoring acoustic emission was already available. A large background of knowledge concerning acoustics and sound existed (the Journal of the Acoustical Society of America was founded in 1928) and the mathematics of the propagation of sound waves had been well studied. The need for a better understanding and detection in SONAR has resulted in a great theoretical background in the area of sound waves in water9'10. Many of the early acoustic methods relied not on the acoustic energy emitted (as for Kaiser's and Green's experiments) but on the attenuation of a supplied acoustic wave to which the sample was exposed. For example, the ultrasonic C-scan is performed by passing an acoustic pulse through the sample11. The pulse is reflected or scattered by defects or interfaces with separate regions of different acoustical 5 impedance. The signal will,normally reflect from the back of the sample and return after the same time and with the same amplitude across the entire sample. It's failure to do so is caused by a defect, and thus a defect can be located12. Rayleigh waves have also been used for evaluation purposes. The surface acoustic wave (SAW) technique measures the attenuation of an acoustic signal travelling along the surface of a sample. Reflections from a crack in the surface or a flaw which lies just beneath the surface can be observed1 .^ The last two decades have seen much progress in the use of Acoustic Emission (AE) for materials testing. The Journal of Acoustic Emission14 was started in 1981 as a medium for the users of AE to publish their work. Applications of AE have included the study of the intergranular cracking of steel. Nozue and Kishi counted the number of AE signals as well as their energy from a piece of tempered steel15. They discovered that the number of AE signals was proportional to the total number of micro-cracks and that the energy of the signals was nearly proportional to the crack area. The human hearing system involves the use of two ears which allow one to determine the direction from which a sound is coming. Similarly, the location of faults in gas pipelines has been found by correlating the arrival of acoustic signals at multiple transducers placed at different locations on the pipeline16. This is the same technique that seismologists use to locate the epicenter of an earthquake. This type of destructive testing is widely used by the materials industry for quality control. For example, Hoa and Li report the use of AE in the testing of reinforced fibre composites17. 6 A piezoelectric crystal will respond to an impulse with a particular resonant frequency. The resonant frequency depends on the mass of the crystal. A film deposited on the surface of the crystal acts to change the mass of the resonator and therefore changes the resonant frequency. This phenomenon has been used to provide a gas phase microgravimetric sensor18. The mass of a deposited film can be determined by measuring the change in resonant frequency of a piezoelectric crystal. This application has been extended to use in liquids. When a suitable chemically selective binding agent is incorporated into a film which is deposited on the surface of the crystal, the mass sensitive sensor becomes a chemically selective detector19. The extension of this technique to that of an highly selective immunosensor for antigens is discussed in an excellent review article20. Thermal analysis has been used to test a material for purity and integrity. The occurrence of sound during these experiments is also well documented. The existence of such emissions may be confirmed by a visit to any analytical laboratory where fire assay is performed, since as samples cool they emit profusely. Smith describes an apparatus to monitor the decrepitation of minerals21. This early technique is known today as thermosonimetry (TS)22. In this experiment, which is usually run simultaneously with differential scanning calorimetry (DSC), the acoustic activity of a sample is recorded as a function of temperature while the sample is heated and/or cooled. In this way, Clark23 was able to characterize the phase transition of potassium dichromate that occurs at about 520 K. Lonvik24 was able to show that the TS activity of a sample of brucite, Mg(OH)2, was dependent on the site from which the sample had been mined. Also, it was shown that the frequency content of the acoustic signals differed between the samples. 7 Many chemical reactions also emit audible sound. Schoolboys are familiar with the nature of crystals obtained from a mixture of NH4OH solution and I2 such that it makes for a good prank when sprinkled on the floor in front of the chalkboard. In 1957, Ranke Madsen used his ear to determine the endpoint of a titration as acid was added to a carbonate solution25. In 1963, Belyaev et al. 2 6 reported acoustic activity during the crystalloluminescence of BaCClO^)^ LiF, and IvjNaCSO^. It was suggested that the luminescence was the result of the crystals cracking as they grew. However, when in 1978, van Ooijen et al.27 noted acoustic activity during the precipitation of dichloro(pyrazine)zinc (II), the phenomenon was still largely a curiosity to chemists. The observation of van Ooijen et aL27 that the intensities of the signals were proportional to the concentrations of the reagents again suggested that AE could be applicable as a form of chemical analysis. In the late 70s, Betteridge et al decided to investigate the phenomenon of acoustic emission to determine if AE was indeed widely observable in chemistry28"30. Forty-three chemical systems were investigated. These systems included the addition of CUSO4 to a solution of sodium bicarbonate, the reduction of unsaturated hydrocarbons by KMnO,^ , reactions of luminol with peroxide in the presence of copper ions, recrystallization of KC1, and the addition of Mg to a solution of ferric chloride. The results were very conclusive. Thirty-two of the systems studied were found to emit a detectable level of acoustic energy within the frequency range studied. Of the systems that were "silent", nine involved merely the mixing of two organic solvents such as CCI4 and hexane. The setting of an epoxy resin and the addition of ferric chloride to phenol were the other systems that didn't emit detectable levels of acoustic energy. The results of the experiments were compared and pattern recognition techniques were used to group the systems into classes, showing that acoustic emission was able to be used for such a purpose. The individual signals from three of the systems were also compared and were sufficiently different to allow one to 8 distinguish between the different reactions. This key study proved conclusively that acoustic emission from chemical systems was not only a common occurrence, but also had analytical potential. In 1984, Sawada et al. investigated AE from the gelation of sodium carbonate and calcium chloride31. They saw different acoustic behaviors as a function of time (see for example Figure 1) and attributed them to different mechanisms of the reaction. The acoustic emissions from the precipitation and dissolution of sodium thiosulfate, and the phase transitions of p-cresol, methyoxybenzilidene-4-n-butylaniline (MBBA), and water were also observed32. The results suggested that the occurrence of AE was linked to volume changes during the phase transition. In 1986, Belchamber et al. 3 3 investigated one particular chemical system in detail with the intent of discovering whether AE was a useful quantitative tool for chemical analysis. The hydration of silica gel was chosen as this process emits very loud audible sounds, is rapid, and has analogues in industrial catalysis. Catalyst hydrolysis is difficult to monitor in-situ by other techniques. The level of AE activity was found to be dependent on factors such as particle size, amount of sample, initial water content, and temperature. For the first time, individual AE signals from a chemical reaction were captured, recorded, and analyzed in detail. Using pattern recognition techniques, and descriptive statistical factors (descriptors) which summarized aspects of the signals, frequency content, amplitude, and time-domain behaviour, they were able to classify the individual signals as being caused by particle fracture or by gas evolution. Through non-linear calibration curves, this work showed that AE could be quantitatively applied to chemical analysis and might be suited to industrial process monitoring. 9 Chemical Acoustic Emission is one of the primary research interests of Dr. Adrian Wade at the University of British Columbia. Graduate students have investigated acoustic emission from chemical processes such as the gelation of PVAA, PVAc and PAA, the dry-blending of PVC, the 81 °C solid-solid phase transition of trimethylolethane, phase transitions of the liquid crystal a-w-bis(4-n-decylaniline-benzilidene-4'-oxyhexane), the heat activation of intumescent fire retardants, and the 43.6 °C polymorphic transition of hexachloroethane34. Current work in the group includes the use of acoustic emission generated by a forced extension and bending of plastics and composite materials on two "stresser" systems. This is an extension of the work in reference 30. Also, a video system is being used to correlate acoustic activity with visual changes in the sample. Another study seeks to use AE to improve the efficiency of drop weight testing procedures. ChemometricA techniques such as hierarchical clustering and factor analysis figure largely in the analysis of the signals. These tools were implemented as part of this present thesis work. 1.3 Apparatus for Monitoring Chemical Acoustic Emission The apparatus used for AE experimentation has had to evolve as researchers have sought to obtain more information from each experiment, and have been provided with or have developed chemometric routines capable of automatically analyzing the larger data sets. The early AE systems were simply a microphone attached to the sample21 (Figure 2). This enabled one to tell if and when AE was occurring as well as giving some indication as to the rate of the process being studied. The data were collected in real time and different processes could be noticed if they had different intensities in the time domain. The utility of this apparatus is limited by its poor quantitative ability and lack of frequency information. Interpretation in the absence of A. Chemometrics is defined as "the chemical discipline that uses mathematical, statistical and other methods employing formal logic to: (a) design or select optimal measurement procedures and experiments; and (b) to provide maximum chemical information by analyzing chemical data. 3 5 10 Microphone -OO O O Sample Amplifier Chart Recorder Figure 2) A simple apparatus for acoustic emission monitoring. The acoustic activity is converted into an electrical impulse by the microphone which can be amplified and used to drive a chart recorder. 11 these is somewhat subject to operator bias . Detection of the presence of simultaneous processes is not generally possible. With the addition of an integrator (or computer to fulfil this function - Figure 3), the AE apparatus has the same abilities but can provide more information37. The integration trace which quantified the total AE energy output allowed for the determination of the driving force behind the process (ie. whether it was first order, linear, etc.)^. The data could be fitted to a rate equation, and so give some idea about the mechanism involved. Still no information about the frequency content of the emissions was obtained, although lower frequencies could be selectively excluded. By using multiple frequency bands and multiple integrators (Figure 4), the real-time data shows the time dependence of acoustic frequencies39'40. This is the principle of the multi-channel spectrum analyzer. Simultaneous processes may be distinguished if they differ in their frequencies. Noise may be eliminated by the selection of appropriate filters. This system is more expensive however, and the short time-constant integrating nature of the system makes the study of slowly emitting samples difficult. The choice of frequency bands for the filters is important as their number is limited to the number of channels. The frequencies of interest must be found experimentally. The choice of filter bands may also be predetermined by what is available from manufacturers as tunable filters are much more expensive than fixed. Even with this apparatus one still cannot visualize individual signals. By capturing the individual signals (Figures 5 & 6), not only are they easily viewed but their frequency spectra are also available30'33,41 Human or automated pattern recognition abilities can then be used to recognize the different types of signals which occur within each period of AE activity. The signals from stressed polymers were Integrator Microphone ) Sample Amplifier Chart Recorder Figure 3) A n integrating A E monitoring apparatus. Integration of the acoustic energy output can be plotted directly on a chart recorder by an electonic integrator or can be stored on comppter through the use of an analog to digital converter. A to D converters Band Pass Filters Microphone ) OO o O Sample Amplifier Chart Recorder Figure 4) A E apparatus using multiple band-pass to filters and integrato to allow for the analysis of individual frequency bands. TEMPERATURE CHART I RECORDER TRANSDUCER CONDITIONING AMPLIFIER OSCILLOSCOPE COMPUTER Figure 5) AE Apparatus including digitizing oscilloscope to capture individual signals. (f) 0.800 0.600: 0.400^ 0.200 "5 0.000 > -0.200 4 -0.400 i -0 .600: -0.800 i-0.000 Acoustic Emission during heating/cooling of a,w-bIs(4~n-decylanilIne-ben2ilidene-4'-oxyhexane) 4.096E-5 8.192E-5 1.229E-4 1.638E-4 2.048E-4 Time (seconds) Figure 6) A typical acoustic emission signal. (Intensity of signal - in Volts plotted versus time - in seconds). 16 able to be identified by automated pattern recognition techniques used by Belchamber et al. 30>42-46. The availability of different avenues of analysis of the data means that (hopefully) more analytical insights can be revealed. Because the amount of data now collected can now easily be several megabytes per experiment, real time analysis is presently not possible and enormous storage capacities are required. The collection of every signal is not possible for a fast emitter - one where the emission rate exceeds the rate at which signals may be acquired and stored41. Because of this, AE energy quantitation is presently less reliable on signal capture systems47. A tremendous amount of information potentially exists in the individual signals as well as in how they occur and change over time. Pattern recognition techniques have previously been applied to acoustic emission data with much success. These factors, plus the present absence of a solid theoretic background regarding the origins of chemical acoustic emission, presently make empirical pattern recognition the best choice for analyzing AE data. 17 II. Chemometrics The availability of less expensive electronics and computer components, and the need for even greater sensitivity and chemical selectivity, have led to development of hyphenated analytical techniques48. Chemists have begun to use instrumentation that produces very large amounts of multidimensional data. For example, tandem mass spectroscopy (MS/MS) uses two mass analyzers to produce two-dimensional mass spectra4 .^ In hyphenated techniques, the output of one system becomes the input for another. Also common now is the use of multiple detectors to simultaneously collect multidimensional information. Examples of this are diode array detection in UV/Vis spectrophotometry50, and Whole Column Detection (WCD) chromatography51. More information about a sample normally means that it should be easier to draw chemical inferences. However, when the amount of information becomes too large for interpretation within an acceptable time frame, the benefits are lost and much data remains uninterpreted. Such is the case in satellite surveillance, where it has been estimated that less than 5% of data collected is ever interpreted52. Indeed, it has been said that "we live in the age of the sorcerer's apprentice, in that we create data faster CO than we can sweep it away . The mind can easily appreciate the significance of a two dimensional display of data. Any scientific journal is full of two dimensional plots. Two dimensional representations of three dimensional surfaces are also largely recognizable - as a contour map or an evening with the family slides will confirm. When the information takes on a higher dimensionality than those we are accustomed to dealing with, namely two or three, it becomes much more difficult to visualize the structure within a data set and therefore to evaluate its meaning. What is often done is to look at successive subsets of the data in an attempt to find a pattern, eg. several two dimensional plots "simultaneously". 18 Figure 7) Three dimensional plot of detector response versus concentration for two reagents. 19 Figure 8) Two dimensional plot, of system in Figure 7. Detector response versus concentration with one reagent concentration held constant 20 Figure 7 shows a situation where a two dimensional approach may not be effective. The two horizontal axes correspond to concentrations of reagents. The vertical axis is an instrumental response (eg. absorbance). From the surface plot one can see that the best response occurs when the reagents are present in equal concentrations and that the response increases with increase of reagent. If a two dimensional view of this surface were taken (Figure 8) , where concentration of reagent A is held constant and reagent B is plotted against response, the resulting sigmoidal curve would lead one to conclude that there is a maximum response attained when the varied reagent concentration, B, equals that of the reagent that is held constant, A. It would not, however, reveal the tendency of the response to increase with higher concentrations of both reagents. This example of making incorrect conclusions based on only a two dimensional view of a higher dimensioned data set shows the need for some method of displaying all the data simultaneously. The amount of information lost by showing only two dimensions of a three dimensional surface can be estimated by direct comparison. The amount of information lost by using only 2 of m dimensions can only be guessed at. That is, unless the mathematics of information theory54 are used. Many types of patterns are easily recognized by the human mind. As was previously discussed, one is able to distinguish between two different speakers simply by the sound of their voices. Indeed some workers have similarly tried to set the results of GCMS chemical analyses to music to alert operators to the presence of difficult samples55. One would also seek to be able to use the ability of the eye to identify trends or patterns in high dimensional data when performing chemical analysis. However, since the limit for the number of dimensions for a visual representation of data is three, a method must be used to condense the higher dimensional information into a more manageable two- or three-dimensional form. Workers have tried representing features of multivariate data as the lengths of vectors projecting from a 21 single point^ or even as the size of features of a human face57. Keeping in mind the example given above, such graphical methods must not only be able to project m dimensions onto two or three dimensions, such that it is condensed into a more manageable form, but must also maintain the information content of the data. In this way, the human observer can use their own built in pattern recognition abilities. Chemical acoustic emission may have an experiment which produces a large number of signals. Each of these signals may be represented in "feature space" by several descriptors. (This will be discussed further in chapter 5). In order to be able to adequately analyze and observe the data requires the use of methods to visualize the multi-dimensional data. Two techniques commonly used are the projection of the data onto an appropriate set of axes, and the display of the hierarchy of the data set (in the form of clustering). II. 1 Projections of Multivariate Data Sets Figure 9 shows a two dimensional data set with 10 data points. One notices that there is a similarity among some of the data points. For example, because of its relative proximities, point 1 can be seen to be more similar to point 2 than it is to point 3. The distance between points can be thought of as a measure of their similarity. Data points that are similar will occupy the same region in what is termed a "feature space" or "descriptor space" which has the same number of dimensions as there are variables. Finding similarities in 2 dimensions is relatively simple. Data that occupy m-dimensions have the same property, but it are difficult to visualize. Fortunately, there is a method that allows us to project the m-dimensional feature space onto 2 dimensions. 22 CV2 X CD cd •i—i cd > 10 8 6 4 2 0 - 2 - 4 - 6 X o o 3 o o 8-i 1 4- ! ~ i - - --! ! - 8 - 6 - 4 - 2 0 2 4 Variable X i °i X 6 -I i 8 10 Figure 9) Hypothetical data set containing 10 data points each of which described by two variables - X i and X 2 . Point 1 is seen to be more s imi lar to Point 2 than to Point 3 because of the relative distances. 23 •O-G o—o—o— o—©—o— o—e | f _ , _ _ | j 1 1 1 j j 1 - 8 - 6 - 4 - 2 0 2 4 6 8 10 Variable X i Figure 10) Projection of simulated Figure 9 data onto xj axis. 24 10 CV2 X CD i — i cd • r—I cd > 0 -10 _— Q \ O J& o/ o 1 1 -1 1 - 1 0 - 8 - 6 - 4 - 2 0 2 4 6 8 10 Variable X x Figure 11) Data set from Figure 9. L i is the first principal component - which is the axis containing the most variance. is the second principal component 25 o G O — e - e o QSD o i i 1 1 1 1 1 1 1 1 1 - 10 -8 - 6 - 4 - 2 0 2 4 6 8 10 1st Principal Component Figure 12) Projection of Figure 9 data set onto first pr incipal component L j . T h i s is the axis projection which best represents the structure of the two d imensional data set. 26 -0 OOGE) L. i i — i — i — i 1 1 1 — i 1 — i - 1 0 - 8 - 6 - 4 - 2 0 2 4 6 8 10 2nd Principal Component Figure 13) Projection of F igure 9 data set onto second pr incipal component. Structure of original data set is almost completely lost. Da ta structure can then be seen to be mostly contained in the first pr incipal component. 27 Taking the data in Figure 9, one can see that there are two (or three) regions in which the points tend to cluster. This clustering can still be shown if the data is plotted on only one axis. However, axes X\ or X2 alone may obscure this clustering (see Figure 10). Projecting the data onto line L\ in Figure 11 results in a one dimensional plot that retains the structure of the data (Figure 12). One might arbitrarily choose line L2 but the information contained in the data would be lost (see Figure 13). One can thus see the need for discrimination when choosing a decreased dimensionality on which to project the data. Indiscriminate use of 2 dimensional subsets of m-dimensional data sets may give meaningless pictures of the data! In Figure 11, line L\ is chosen to contain the maximum variation in the data. This is called a principal component. The principal component is a new variable which describes the objects. One can see that a linear combination of X\ and x2is used to give the new variable l\. h\ = a ^ i i + bi*2i (IM) One can similarly reduce the data set in a similar manner for the m-dimensional case. The first principal component is chosen to maximize the variance. A second principal component is chosen orthogonal to the first and such that it accounts for as much as possible of the variation not accounted for by the first principal component. The m-dimensional space has now been projected onto a new two-dimensional space (a factor space), as defined by the two principal components. The projection of the data set from the feature space onto the factor space is a simple mathematical transformation. The plot of the data in factor space (Figure 14) can be readily seen to be a simple rotation of the data in Figure 9. A new basis set (comprised of the factors) is chosen. These are linear combinations of the original 28 CD O ft fl o o I — I cd ft • I — I o G •i—i PL, C\2 10 8 6 4 2 0 -2 -4 -6 -8 -10 o ° o o L 2 o o o L l o 1 0 - 8 - 6 - 4 - 2 0 2 4 6 8 10 1st Pr incipal Component Figure 14) Data from Figure 9 projected onto factor space defined by L i and L/j. By compar ison with Figure 9, factor analysis is a simple rotation of feature space. 29 experimental measurements, (the features). This operation results in a set of factor scores and a set of loadings. The loadings are the coefficients of the individual features used to construct the factor vectors (eg. a and b in equation (II.l) above). The scores are the projections of the data points in feature space onto a particular factor (the coordinates in the new space) To fully represent the m-dimensional data set (which resides in a feature space comprised of m dimensions) in factor space requires the use of m factors. These factors can be divided into two groups. The primary factors (or primary principal components) will contain the important information available from the data. The secondary factors will be due mostly to noise, to random errors, and to features that are mainly constant for all samples. The primary factors are therefore, those in which one will be most interested and the secondary factors can largely be ignored. This reduces the dimensionality down to only the number which contain meaningful information. More will be said later on processes used to determine if a factor is primary or secondary. The loadings are an indication of the merit of the features used to describe the data. If a variable (for example, absorbance at a particular wavelength) has very little variation within the sampled data, it will not contribute a great deal to the primary principal components. This will result in it having a small loading. A feature which varies greatly across the data set will have a large contribution to the primary factors and therefore will have a large loading. The scores matrix is simply the transformed data set in factor space. The score for each data point on each of the new basis vectors (factors) is the projection of the data point from feature space onto that factor. The scores matrix defining the data 30 points in factor space can be transformed back into the original data matrix in feature space by multiplying it by the loadings matrix. Feature reduction is the projection of the multi-dimensional data set onto a lower dimensional factor space. This is accomplished by plotting the scores of the data for two factors against each other. In this way, one can view a projection of the feature space onto the two dimensional plane defined by the two factors chosen. Feature selection is the process whereby the features are used or ignored based on their discriminating ability, as given by the factor loading for each feature. This allows one to ignore certain variable measurements and thus in experimental design, eliminate future experiments if a feature is not helpful in describing the structure of the data. For example, if the pH of a set of solutions is roughly the same, then pH is not a good discriminator and perhaps the measurement need not be taken for future analyses of the same type from the same system. II.2 Display of Hierarchical Clustering The basis of pattern recognition is the characterization of items that are similar as opposed to those that are dissimilar. Earlier, it was noted that objects 1 and 2 of Figure 9 were similar because of their closeness in feature space. Distance was used as a measure of similarity. This is intuitive for two dimensions. Objects that are similar will have similar values and therefore will occupy similar positions when plotted. For the multi-dimensional case, the same can also be said. A distance is easily measured in two dimensions - the length of the straight line between two points. For higher dimensions, where the concept of a straight line may be confusing, there are several methods of calculating the similarity of data points in feature space. The mathematics of some will be given later. 31 If one can measure a similarity between data points then one should be able to construct a suitable "tree structure" that would represent the pattern of the data in much the same way that an evolutionary tree represents the similarities between different classes of life. The similarity of experimental data points can be estimated mathematically and displayed similarly. Figure 15 shows a layout of five imaginary cities. Suppose a network of train lines is to be built that connects these cities. It is desired that the length of track used be minimal and that there is a maximum of one route between any two cities (ie. no circuits). The planning of such a task can be accomplished easily. The two closest cities are joined together first. This would be cities A and B. Then a track is placed between the next two closest cities that have no connecting route. This process continues until all cities are served by at least one train track and the network of tracks serves to connect all the cities (Figure 16). This is one type of "tree" used to show the relationship between the cities. The tree in which the sum of the links is a minimum is called the "minimal spanning tree". If we wished to divide the cities into two groups (say for electoral boundaries) then it could be done by drawing the line across the longest link. This would cluster the cities into two groups. Fortunately for political engineers, cities lie on a two dimensional map. Unfortunately for scientists, data often have higher dimensionality. The dendrogram is a different sort of tree. It is similar to the minimal spanning tree in that it can be used to divide a set of objects into groups or clusters. It differs in that it can be used to represent objects that have any dimensionality. One might think of the dendrogram as a transformation of the minimal spanning tree for a multi-dimensional data set onto two dimensions. The method for calculating the dendrogram is very similar to that for the minimal spanning tree. 32 Figure 15) Layout of five imaginary cities - A-E. 33 Figure 16) T h e Five cities joined with m i n i m u m length of train tracking. T h i s is a minimal spanning tree for the five cities. The "boundary" along the longest section (between cities B and C) divides the cities into two groups. B C D E Figure 17) A hierarchical dendrogram for cities in Figure 15 showing two groups or classes. 35 Figure 17 is the dendrogram for the cities in Figure 15. Each city is represented by a point along the vertical axis. The distance of the link between two cities is shown by the horizontal scale. Cities that are close together will be connected by lines that don't extend very far up the vertical axis. One can see that there are indeed two clusters of cities. This is the main use of the dendrogram - to see if there is any clustering of data into specific groups. This technique of displaying data as a dendrogram is known as agglomerative hierarchical clustering. By fusing the objects together, a stepwise set of clusters is formed. If the investigator has a priori knowledge of the data (that is, the data have an expected grouping), then hierarchical clustering will provide a list of which data points are members of each of the known number of clusters. If the structure of the data is not known, then the dendrogram display can be used to determine the number of clusters (subsets) present in the data (if any). These two methods, cluster analysis58 and principle components analysis59, are widely used in chemistry. Chemometrics, as Massart's definition35 suggests, includes techniques for analyzing data and choosing experiments. Chemometrics has allowed scientists to use mathematics to achieve analytical measurements and insight into highly multidimensional chemical systems which once may have seemed impossible. The uses and methods which can be described by the term "chemometrics" are numerous. These include non-linear calibration curves, resolution of chromatogram peaks for components with almost equal elution times, experimental optimization60, and evaluation of experimental methods. For an excellent survey of these, the reader is directed to any of several reviews on chemometrics61"63. There are also a number of 36 text books which give great detail and instruction on particular aspects of chemometrics64"67. II.3 Chemometrics in Acoustic Emission The use of chemometrics for the analysis of acoustic emission data has already revealed much new information. The analysis of acoustic emission from stressed polymers using pattern recognition techniques led to an "automated" method of identifying the samples30'68. Using pattern recognition techniques, Chan and coworkers were able to determine abnormal conditions during the welding of carbon steel69. Other workers have used pattern recognition on acoustic emission data from carbon reinforced fibre composites70. Other approaches to the analysis of acoustic emission data have included time series analysis71, and a two-stage clustering procedure developed by Majeed and Murthy72. These successful applications of chemometrics to acoustic emission data from materials lead one to assume that chemical acoustic emission analysis would benefit from the use of these and other chemometric routines. Initial attempts have confirmed this to be so31'33. 37 III. Experimental The experimental aspect of this thesis work has been: 1) the development and coding of chemometric techniques suitable for analysis of highly multivariate data, such as is obtained from chemical acoustic emission, 2) the development of a computer controlled apparatus and data acquisition software to collect chemical acoustic emission signals, 3) the application of the hardware to a number of chemical experimental systems, 4) the analysis of the chemical acoustic emission data collected using the chemometric techniques developed and coded, 5) the critical evaluation of the chemometric techniques and their utility in chemical acoustic emission studies. III.l Hardware. The next section details the hardware used and the experimental chemistries studied. The experiments were carried out using the apparatus outlined in Figure 18. III. 1.1 Transducer The sample container is placed in contact with a broadband transducer (Bruel and Kjaer, Naerum, Denmark, Model number 8312) which incorporates a 40 dB preamplifier. The transducer has a known frequency response and can access from 100 Hz to 1 MHz. Some signals were observed above 1 MHz but no calibration of the response in this region was available from the manufacturer. The transducers have an operating temperature range between -10 °C and 55 °C (although Bruel and Kjaer claim that the temperature could probably approach 100 °C for short periods without any damage73 as has been confirmed in this laboratory). Vibrations from the work surface were effectively eliminated both by the high pass filter and by a 3/4" layer of foam padding laid underneath the transducer. ACOUSTIC EMISSION INSTRUMENTATION Chart Recorder Sample D Transducer (Peak Detect) Amplifier 0 0 , 0 0 o( Oscilloscope IEEE-488 PC/AT SOURCE SENSOR FILTERING k AMPLIFICATION OISPLAY & ACQUISTION SIGNAL PROCESSING Figure 18) Chemical acoustic emission monitoring apparatus, i) Bruel and Kjaer broadband transducer - model 8312. ii) Bruel and Kjaer conditioning amplifier - model 2638. iii) Tektronix 100 MHz digital storage oscilloscope - model T2230. iv) Goerz Metrawatt chart recorder -model SE 120. v) PC/AT computer with IEEE-488 interface. 00 39 4 cm 4 cm. Short Waveguide 12 cm. Long Waveguide 30 cm. I—I 1 cm. Figure 19) Glass wave guide used to transmit acoustic signals from the sample to the transducer when the sample environment is unfavorable for the normal operation of the transducer (eg. due to high temperatures). 40 W a v e g u i d e C y l i n d e r ( " s l e e v e " ) T r a n s d u c e r S u p p o r t ( ' j a c k e t " ) Figure 20) Jacket and sleeve used to hold waveguide and transducer. 41 III. 1.2 Waveguide For experiments where the maximum comfortable operating temperature (ca. 80 °C) is exceeded, a waveguide was used to transmit the acoustic signals but not the heat from the sample to the transducer. The waveguides (Figure 19) were all of glass construction and manufactured in this department. These are similar to those described by Clark74. The "wine glass" shape waveguides allowed the sample to be placed in the cup at the top which was situated in an oven (CENCO, Central Scientific Company of Canada). Temperatures to 250 °C could be reached although heating rate and temperature control were known to be a problem with this oven. The bottom of the waveguide extended out of a hole cut in the bottom of an oven where it is held by a metal "jacket and sleeve" (Figure 20) which was designed to fit the model 8312 transducers such that only the weight of the waveguide and sample rests on the active area of the transducer. The support was manufactured in the department. Two waveguides were designed and differed only in length. One waveguide had a stem of length 30 cm and the other 12 cm. III. 1.3 Conditioning Amplifier The cable from the transducer was connected to a conditioning amplifier (Bruel and Kjaer, Naerum, Denmark, Model 2638) which provides band pass filter capabilities (linear, 0.1 Hz -10 kHz, 50 kHz - 2 MHz, 100 - 2 MHz, 200 kHz - 2 MHz) as well as amplification (0 to 60 dB). The amplifier has two outputs. The direct output carries the amplified and filtered AC (alternating current) signal from the transducer. The second output from the amplifier is of the "sample and hold" variety which yields a DC (direct current) damped peak voltage with a time constant of about 200ms75. This is connected to a chart recorder (Goerz Metrawatt, Vienna, Austria, Model SE 120) to obtain traces such as seen in Figure 21, and is the output used by integrating AE monitoring systems47. 42 Figure 21) Chart recorder trace of acoustic emission from melting ice. Vertical axis measures intensity of acoustic emission (in Volts) versus time on horizontal axis. (Data provided by Dr. P. D. WentzcM). 43 III. 1.4 Digital Storage Oscilloscope The direct (a.c.) output is attached to a 100 MHz digital storage oscilloscope (Tektronix, either model T2230 or T2340A). The oscilloscope digitizes the incoming signal into a record of 1024 (or 4096) 8-bit integers according to its operating mode. Many operating parameters may be controlled by a microcomputer across an IEEE-488 parallel interface76. Details of the control language used are given elsewhere77'78. The Volts-per-division setting, VDIV, defines the Voltage range that is visible on the oscilloscope's screen. This is also the range into which the signal is digitized into values from 0 to 255. The maximum Voltage in the range is digitized as 255 - ie. the largest number that may be represented by eight bits. The minimum Voltage, which will be negative, is digitized as zero. A zero Volt signal will be digitized as 127 (Figure 22). The signal is then stored in the scope's memory as an array of numbers which represent the Voltage at each time interval. The actual length of the time interval for each signal acquisition is determined by the time-per-division setting, TDIV. The oscilloscope samples and digitizes at 100 MHz and then averages the results to give the correct TDIV. Records of 1024 points were used throughout this work due to the larger storage space and longer analysis time 4096 point signals would require. III. 1.5 Computer and Instrumentation Interface The signals were then downloaded over an IEEE-488 parellel interface to a P C / A T IBM-compatible computer for storage on a hard-disk. Our original work was done with 12 MHz IBM PC/AT compatible microcomputers (NORA Systems, Vancouver) and was later continued using a 20 MHz Intel 80386-based portable microcomputer (model T5200 Toshiba). Both were equipped with an Intel 80287 or 80387 math coprocessor chip and a standard GPIB interface card. The interface to the 0 . 6 6 -C / ) 0.00 > - 0 . 2 2 -- 0 . 6 6 -:0.88 Digitization Of an Acoustic Signa 1.000E-4 1.200E-4 i 255 1.400E-4 0 Time (seconds) Figure 22) Acoustic emission signal with voltage scale on left vertical axis and corresponding digitization level between 0 and 255 on right vertical axis. 45 oscilloscope was provided by drivers written in assembly language and available from • VQ National Instruments (National Instruments Corp., Austin, Texas) . III.2 Software With the exception of the GPIB-PC interface drivers, all software used was coded in this laboratory. All programming is currently done in Microsoft QuickBASIC (version 4.00B, MicroSoft Inc., Mississauga, Ontario), although earlier programming used QuickBASIC 3.0, 4.0, as well as GWBASIC. The programs for data analysis and display require an Enhanced Graphics Adapter (EGA) card and EGA monitor. The programs include SIGVIEW, DENDGRAM, ABSCAT, and AEMUNCH. The data acquisition program, QAQ, and one of the analysis programs, VARI-TRAPS, do not require the EGA card. The operation of these programs will be outlined in Chapter Five. 46 IV. Chemical Systems Studied The work by Betteridge et a/.28 showed that many types of chemical systems are acoustically active. Several different systems were investigated in this work to characterize their acoustic behavior. The pattern recognition methods coded were applied to the data collected. The chemical reactions explored include the dissolution of NaOH, the solid-solid phase change of trimethylolethane (TRIMET), the activation of intumescent flame retardants, the hydration of AICI3, and liquid crystal phase changes. IV. 1 Dissolution of Sodium Hydroxide The dissolution of sodium hydroxide is exothermic and occurs fairly rapidly. This process has been studied using acoustic emission in this laboratory and has shown to provide useful information38,41. Pellets of sodium hydroxide from BDH (Toronto) were used to provide acoustic signals to test the equipment. IV.2 Trimethylolethane (Trimet) Trimethylolethane (Figure 23) is a polyol with a neopentyl structure and is a white, solid powder at room temperature. The compound has very favorable properties for industrial application (as evidenced by its use in plastics, paint resin, lubricants, and as coatings for pigments), due to its hardness, resistance to water, heat and light degradation and oxidation80. Trimet has a melting point of 190 °C without decomposition and has a plastic crystal transformation at 80 °C. The plastic crystal transformation allows Trimet to absorb a fair amount of heat energy without changing temperature (AH = 46.1 cal/g cf. A H m s i o n = 10.7 cal/g)80. Trimet's thermal properties have made it useful as an 47 CU CU o B 03 CU o tm a, •5 ro co c u C J C J cu C J C O C J s t s e ci 43 "o -t-> s m s-3 OX) 48 insulating material and as a heat storage medium for solar energy panels. The sample of solid powder TRIMET was graciously provided by Dr. P. Eckler of International Minerals and Chemical Corporation (IMC), Terre Haute, Indiana. IV.3 Intumescent Fire Retardants (IFR) Intumescent fire retardants (IFR's) are designed to be used as coatings for electrical wiring that will prevent the wire from being damaged by heat, and smoke during a fire81"86. IFR's are typically composed of i) a material which yields acid at temperatures between 100°C and 250°C (such as an inorganic acid); ii) a polyhydric, carbon-rich material; iii) an organic amide, amine, or azo compound; iv) a material which can serve as a blowing agent by decomposing and releasing gases (such as a halogenated compound); and v) a hydrated salt. The protective nature of an IFR is due to its behavior upon heating. The upper layer of the plastic body softens. The blowing agent then releases gases which cause the plastic to foam. The acid released then causes cross-linking and the polymer sets. Five samples of polypropylene samples mixed with ammonium polyphosphate and pentaerythritol were sent by Dr. L. M. Shorr of IMI (TAMI), Haifa, Israel. IV.4 Hydration of Aluminum Chloride Aluminum chloride, A I C I 3 , reacts violently with water to produce aluminum hydroxide and hydrochloric acid gas. A I C I 3 + 3H 20—> Al(OH)3 + 3HC1 This reaction is performed industrially to analyze for A I C I 3 by checking the pH of the water to which the A I C I 3 is added. This technique is suspect, considering that large amounts of HCl are released as vapor, rather than fully absorbed into solution. It was thought that acoustic emission monitoring of this process might provide a better alternative. 49 Liquid Crystal c C , w - b ! s ( 4 - n - d e c y l a n l l l n e - f o e n z i l i d e n e - 4 / - o x y h e x a n e ) Figure 24) a-w-bis(4-n-decyIaniline-benzilidene-4'-oxyhexane) 50 IV.5 liquid Crystals IV.5.1 Q -u -bis(4-n-decylaniline-benzilidene-4'-oxyhexane) The liquid crystal a-a>-bis(4-n-decylaniline-benzilidene-4'-oxyhexane) (Figure 24) has the unusual property of being rigid at the ends and loose in the middle. A small sample was provided by Professor G. Luckhurst (University of Southampton, U.K.) for preliminary investigation by acoustic emission. With a report of liquid crystal transformations being acoustically active32, it was hoped that the phase transitions of this compound (which occur between 100 °C and 180 °C would also be acoustically active. IV.6.2 4-n-pentyloxybenzylidend-4'-n-heptylaniline A sample of a second liquid crystal was also provided by Professor G. Luckhurst. The phase transitions of this compound fall within the allowed operating temperature of the transducer. 30.0 °C 41.0 °C 51.5 °C 53.1 °C 63.2 °C 78.0 °C c sG S B S C S A N I This experiment was performed by Dr. P. Y. T. Chow and used a temperature probe in direct contact with the sample. 51 V. Development of Data Acquisition and Analysis Procedures This project was begun in October 1987. There were no previous students of this research group working in this area, since this group was founded at around the same time. Careful design of both equipment and software was critical if meaningful chemical acoustic emission data were to be collected, and the system developed was to be widely applicable. The criteria used were that the apparatus should: i. be relatively inexpensive; ii. allow broad range of acoustic frequencies to be monitored; iii. be controlled by one or more personal computers of the IBM PC family, and not be dependent on UBC site mainframes; iv. provide integral mass storage for raw data and processed files; v. provide adequate signal capture rates ( > 2 MHz burst mode) and resolution (minimum of 8-bit i.e. 0.4%) vi. use software capable of supporting the highly graphical user interface needed for signal processing applications; vii. be useful to both the university laboratory, and to industry. viii. where possible, use commercially available hardware but in-house developed software (the reasons for which are discussed below). xi. be expandable and modular to allow future needs to be met (e.g. use of two sensors or faster acquisition rates), x. where possible exceed both the hardware and software specifications of the equipment developed by Prof. D. Betteridge's research group at University of Wales in Swansea, and at British Petroleum Research Centre, Sunbury-on-Thames, UK. 52 Hardware Design: An appropriate starting point for this was the experimental apparatus which was published by Belchamber et al. in 198330, and modified for subsequent use in 198633. A full comparison of these systems with those designed as part of this work is given in Table 1. Table 1. Comparison of Acoustic Emission Apparatus Types Type Equipment Type of Data Analysis Limitations microphone chart recorder qualitative data "if and when" AE is occur-ring - detect differ-ent processes if different intensities in time domain - no frequency info. - no quantified data B - integrator - chart recorder - storage device (eg. PC) quantized measure of AE energy look at nature of driving force. rate kinetics from inte-gration of AE energy no frequency data can't view individual signals multiple band pass filters integrators storage device quantized measure of acoustic frequency components elimination of noise compo-nents can see differ-ent processes if different frequency components - more expen-sive - can't see indi-vidual signals high speed digitizer storage device digitization of individual signals - pattern recog-nition of indi-vidual signal types - different classes of signals visible if different signal shapes - not real-time data - energy quan-tization less reliable - most ex-pensive 53 These earlier systems were based around Digital Equipment Corporation (DEC) MINC 11/23 processors and peripherals. The far less expensive IBM PC/XT and PC/AT range looked to be the most suitable replacement for these. Apple Macintosh systems were the only other real contender, but at higher expense, without an open architecture, and with far fewer commercially available data acquisition cards. Fast 12 MHz PC/ATs were readily available in the fall of 1987. Newly introduced Intel 80386-based systems, while still expensive, promised to provide an upgrade route compatible with the hardware and software to be initially developed. Indeed, this has since allowed us to achieve faster performance with minimal additional development. A further perceived advantage was that the maintenance of DEC hardware is generally the domain of DEC specialists, and that the cost of hardware and software maintenance contracts from DEC was 10-15% of the initial cost per annum, a recurring annual charge which we could well do without. Maintenance of IBM PC's, where necessary, could be handled by local suppliers or by the skilled staff of our electronics workshop. We chose to keep the transducer and conditioning amplifier specified by Belchamber, and previously used by Betteridge33, since at the time no better products were known of, and the bandwidth of these units was considered acceptable. Funding was not available for apparatus capable of handling multiple sensors at this time. Belchamber30 originally used a Tektronix transient digitizer to capture individual signals. More recent work had used rack-based analog-to-digital convertors (made specifically by DEC for MINC systems). The functionality of both could be achieved by any of a range of more recent digital storage oscilloscopes produced by Tektronix, some of which were available at a substantial discount to the university. These were capable of communicating with IBM PC class computers via a (slow) serial 54 interface or a much faster IEEE-488 parallel instrumentation interface. The latter interface option was chosen. A Rayonics fast transient capture system was also actively pursued, since this provided extended record lengths, however, the Tektronix 2230 oscilloscope eventually purchased was half the price and adequate additional funding was not available at that time. Software: Data acquisition and processing programs developed by Betteridge's group and used by Belchamber30'33 were written in DEC FORTRAN. These contained many useful routines, but provided essentially character-based output. In part this was due to the difficulty of writing quality graphical interfaces in standard FORTRAN 77. It was decided to abandon FORTRAN in favour of a language more able to make full use of the PC environment, including colour, pixel-based graphics, and pointing devices such as a mouse or light-pen. The alternative languages initially considered were Pascal, BASIC, C, and FORTH. Although other workers in this department had made wide use of Pascal (Turbo Pascal, Borland, Scotts Valley, CA.), it was not ideal because of its inflexible syntax (compulsory declaration of variables, use of semicolons and BEGIN/END statements, etc.) and its decreasing use in academic environments. The lower level languages, C and FORTH, while very powerful and fast, were known to result in code which was difficult to maintain by students who were, first and foremost, chemists by training. FORTH had the obvious advantage of true multitasking, which would have simplified some aspects of instrument control, but is not very readable. A fast, compiled version of BASIC, which allowed structured programming and modularity like Pascal, the mathematical capabilities of FORTRAN or C, and easy control of instruments (like FORTH) through drivers, was the obvious choice. MicroSoft QuickBASIC was selected. Borland's TurboBASIC could also have been used, but has since not been supported so widely by hardware manufacturers. 55 Since we were not going to be using the same hardware as Betteridge, purchase of their software was not sought. Rather, the mathematical principles behind key routines they used and several others were researched, and encoded. In-house development of fresh data acquisition and analysis software had several valuable advantages. i) the ability to add to and otherwise customize the code to exactly meet the needs of this research project, and its sponsors; ii) an intimate understanding of the algorithms employed and the opportunity to use the most appropriate algorithm to solve each problem; iii) the ability to design and use proprietary binary file formats ideally suited to chemical acoustic emission data, and so save valuable storage space which would have been required by less efficient encoding (e.g. as larger flat ASCII or binary-format database files which may carry a large overhead); iv) the training of a research student in multivariate methods of data analysis and chemometrics; v) long term maintenance of the software routines encoded by future students in this group, and easy incorporation of these routines into their own programs (thus saving them considerable effort). An equivalent commercial software package (ICEPAK, TekTrend International, Montreal, PQ) was available for one small part of the task - signal classification. However, its cost was US$ 9999 per copy for a compiled version without source code. The utility of this code for analyzing our data was evaluated by staff at the distributing company in cooperation with our postdoctoral researcher, Peter Wentzell. It was considered useful, but too restrictive for general usage. Even if commercial software 56 had been available at a more reasonable price, the lack of source code would have severely restricted its utility in the research environment. The major task of this thesis was design and development of this software. The task was achieved largely by the author, with help from Peter Wentzell and Ph. D. student Oliver Lee. A discussion of this code, and more recent additions by other members of this research group, will be submitted for publication87. V. l Data acquisition - the QAQ.BAS program The program QAQ.BAS (Quick AcQuisition), co-written with D. A. (Tony) Boyd, and subsequently modified by O. Lee and K. A. Soulsbury, is a driver program for the oscilloscope (Figure 25) which allows the user to enter the experimental parameters such as experiment title, amplifier setting, etc. and then collects the signals. (See Appendix). The experimental information is stored along with the signals in an .AEA file (Acoustic Emission Analysis). This facilitates consistent comparisons with other AE experiments. While using the program the oscilloscope is constantly monitored by the microcomputer. The oscilloscope is continuously digitizing the electrical input from the transducer and amplifier. A trigger threshold is set slightly above the level of background noise. When the sample produces an acoustic signal of magnitude greater than the preset threshold, the oscilloscope is triggered and stores the waveform. The computer, upon being notified that a signal has been acquired, downloads the signal from the oscilloscope, stores it on the hard disk, and then signals the oscilloscope that it is ready to receive another signal. The .AEA file is a raw data file. One is created for each experiment. This contains a header, which has all the operating parameters of the experiment and some WELCOME TO 'NEWQAQ' - PLEASE BUCKLE UP? MXXMXXMMXXXMXMMMMXMXXXXMXXXXXXXXMXXXKKX EXPT. TITLE: NaOH dissolution DATE! 05-82-1990 FILE DESTINATION: NAOHS001.AEB Scope settings I Tine Scale = 2E-4 Volts / DIM = ,2V Conditioning Attplifier : Filter = 50knz":"2HHz" AMPIification = 40 dB MODE : Triggered (Continuous) PRESS F10 TO STOP WAITING FOR TRIGGER Figure 25) QAQ - Program display during acoustic emission experiment. Ol *0 58 comments, followed by the signals produced during the experiment (as linear arrays of integers from 0 to 255) and the time of acquisition of each signal (recorded using the computer's internal clock). The signals are identified by number (stored as a four character string) and are each assigned a four character classification (which initially is the same by default for all signals from any particular experiment). The data files are in such a format that allows inclusion of parameters such as pH, temperature, etc. (see Appendix 1). The use of 8-bit integers (which have a numerical range of 0 to 255) to store the signal is the most efficient means in terms of disk storage. The array of integers can be converted into the signal's voltage profile by using the voltage per division setting of the oscilloscope which is read and recorded by the computer. For example, to convert the amplitude-time profile of the signal, t - where tj is the integer value for the signal amplitude at time interval i, to the voltage profile s, the following equation is used: s i = (*i - eA>' ' 8 / 1 0 2 4 (V-1) where t^ is the average value of tj. There are 8 divisions in which the 0 to 255 range is divided. V.2 Viewing Individual Signals - SIGVIEW.BAS The signals themselves can be recalled and viewed at any time after the experiment is terminated. Dr. P. D. Wentzell has written a program, named n o SIGVIEW , (Figure 26) which can display each individual waveform graphically (Figure 27). With this it is possible to scan backwards and forwards through the .AEA file. It is also possible to search through the file and select individual signals for viewing 59 based on their time of acquisition, numeric identifier, or their assigned class. A Fast Fourier transform (FFT) of the signals can be calculated at any point and viewed (Figure 28). An option for windowed FFT's is also included. Hardcopy output is possible by selecting the option to save a signal in ASCII6 format for later use by commercially available plotting programs such as SIGMAPLOT (version 3.0, Jandel Scientific, Sausolito, California) and LOTUS 1-2-3 (release 2.2, LOTUS, Cambridge, Maryland). V.4 Signal Classification and Editing SIGVIEW allows one to change the classification of a signal. The four character field which contains the class for each signal can be changed to something meaningful. Experimental experience allows some signals to be immediately recognized as noise. Electrical noise (Figure 29) has a very distinct waveform when compared to a transient from a chemical reaction (eg. Figure 27). Also easily identified are signals originating elsewhere in the mains power supply, possibly due to a relay switching machinery on or off (Figure 30). It is also possible to acquire signals which have their maximum and/or minimum amplitude outside the range that the oscilloscope was set to digitize (Figure 31). This type of signal may not be useful for frequency based data analysis as it will have a frequency spectrum which contains artefacts due to the artificial restriction of the signal to a maximum/minimum value. Grossly over range, signals should perhaps be largely ignored for data analysis purposes. B. American Standard Codes for Information Interchange. This type of file is one in which the data is stored in an unencrypted form. The file is therefore "readable" without the use of a program to decode the information. SIGVIEW: AE Signal Display Software Up Arrow <CR> Down Arrow F or f M or w S or s N or n I C or or E L R A B or ESC T or t or e or 1 or r or a b H or h Connand Options Display next signal Display next signal Display previous signal Show Fourier (power) spectrun of current signal Show windowed power specrtun Show current signal (after 'F' or 'H') Select nuwber of signal to be displayed Change signal identifier Change signal class Change signal tine Change signal extra record Locate signal by ID, class, tine or extra rec Repeat last locate Generate ASCII data file of display Renove signals with Class = 'BAD' Exit SIGUIEM Display this help screen - - - Hit Any Key to Continue — Figure 26) SIGVIEW - Program options for viewing acoustic emission experimental data 6 from .AEA files. o .794265 _ Figure 27) SIGVIEW - Program display of acoustic signal from cooling of liquid crystal - 4-n-pentyloxybenzylidene-4'-n-heptyaniIine. Vertical axis is measured in Volts and Horizontal axis is measured in seconds. 2 1 . 7 2 2 D G 2 . 6 : E - G 4 )1 0 2495.117 C F R E Q 3 Signal 11, ID-012 , Class»B50 , Tine- 26.64063, Extra record-Figure 28) SIGVIEW - Program display of calculated power spectrum of signal in Figure 27. Horizontal axis is measured in kHz. Vertical axis corresponds to arbitrary units. .3991484 -, f -.41033 : 0 CTIHE3 Signal 25G, ID= 256, Class* .0004092 0i Time= 754.0195$ Extra record" Figure 29) SIGVIEW - Program display of electrical noise signal -background noise with one spike. Vertical axis is measured in Volts and Horizontal axis is measured in seconds. 0\ CO Figure 30) SIGVIEW - Program display of electrical signal (noise) repeatedly found on power mains. Vertical axis is measured in Volts and Horizontal axis is measured in seconds. Figure 31) SIGVIEW - Program display of intense acoustic signal having amplitude beyone the Voltage range of the oscilloscope settings. Vertical axis is measured in Volts and Horizontal axis is measured in seconds. 66 In the case of such "improper signals", the class field can be changed to "BAD An option exists in SIGVIEW to delete the signals which have the class of "BAD_". This results in a smaller data file which will be more representative of the experiment. However, if unwisely used it can eliminate useful data and so lead to disaster. The electrical noise signals are easily identified, have essentially the frequency content of the background and show up as sharp spikes on the chart recorder. For these reasons they may indeed be useful as they give some indication of the background frequency components and are helpful in that they can be used to calibrate the time axis on the chart recorder output where there are no other signals. An experiment may produce numerous AE signals. As discovered later, some samples emit continuously for several hours. At an acquisition rate of one signal about every other second for the Tektronix T2230 scope, perhaps 7500 signals could be recorded for a single experiment! Not only does this require ca. 7.5 Mbytes of storage but to look through these by hand even with the ease of utility of the SIGVIEW program would take too long and would be unable to provide much meaningful information. Thus other more automated methods of data analysis are necessary. V.5 Frequency Content Analysis Methods - VTRAPS.BAS To get a better understanding of the way the frequency content of the acoustic signals changes over the course of the experiment a three dimensional view is convenient. These can be given by time resolved average power spectra (TRAPS) and time resolved total power spectra. Commercial software such as SURFER (v. 3.0, Golden Software, Golden, Colorado) is available to plot three dimensional data sets. A program was needed to extract the time domain information from the data file, calculate the Fourier transform, obtain the power spectrum, and store this data as a 67 three dimensional data file in ASCII format for use by the plotting program. The power spectrum of a signal shows the energy of each frequency component and is calculated from the FFT by summing the squares of the amplitude of real and imaginary components for each frequency interval and taking the square-root of the result. This project was begun by J. A. Horner and evolved into VARI-TRAPS - a program written largely by O. Lee of this research group. The duration of the experiment is divided into time windows. The power spectra of the signals that fall into each particular window are averaged together. The result is a data file (with the extension .TRA) which contains three dimensional data - time, frequency, and intensity. This can then be used to generate a three-dimensional plot of how the frequency content of the acoustic emissions of an experiment (or of experiments) changes over time (Figure 32). It is also possible to average several experiments together in this way, or compute difference maps. VARI-TRAPS will also calculate the variance spectrum (Figure 33) of an experiment. The frequency spectrum is an array of 512 points - each corresponding to a narrow frequency range. The variance of each frequency component within each time window is calculated for the duration of the experiment. If two different types of signals are generated by a process, and the frequency content of these signals is different, then there will be high variance for frequencies that are present in one of the signal types and not present in the other. Conversely, there will be a low variance for frequency components which are either present or absent from both. 68 T ^ D e p e n d e n c e o Figure 32) VTRAPS Ti 5 4-o 3-d cti P H cd 2-0 156 312 468 624 780 936 1092 1248 Frequency (kHz) Figure 33) VTRAPS - Frequency variance spectrum for the liquid crystal 4-n-pentyloxybenzylidene-4'-n-heptylaniIine. Two regions of high variance suggest that more than one process is occurring. 70 V.5 Pattern Recognition Pattern recognition techniques allow one to see more of the structure present in a multi-dimensional data set. The signals may belong to one or more classes. One would wish to see this and assign classes to the signals. The acoustic emission data is not directly suited to pattern recognition in the form it is collected. Although the data file of time domain signals can be thought of as a multi-dimensional data set (amplitude of the signal at each sample time for each signal), it would not produce as meaningful results for a number of reasons. The signal is a digitized electrical signal resulting from a vibration at the transducer. While, signals may be very similar in terms of energy level, decay rate, and frequency content, the amplitudes and time domain behavior of the waveforms may differ due to small variations in the frequency spectrum and the many different locations within the sample that an acoustic event can occur. An acoustic signal is a vibration which must travel to the transducer. The path the signal takes will affect the final detected waveform due to dispersion and echoes. To allow for pattern recognition of the acoustic emission data, the signals are not used themselves but rather each signal is characterized by a number of "descriptors". A descriptor is a numerical value which describes a statistical property of the signal. Some properties which may be described in this way are root-mean square voltage (RMS), kurtosis, area, median and modal frequency (terms defined in section VI. 1). This transforms the raw signal data set into a smaller multi-dimensional data set that can be effectively subjected to pattern recognition techniques. AECRUNCH ANALYSIS SETUP MENU DESCRIPTOR FILE? Y N AVERAGE POWER SPECTRA? Y N SMOOTH FOR AVERAGING? Y N 1.25 MHz LIMIT FOR DESCRIPTORS? Y N CORRECT FOR AMPLIFIER GAIN? Y N DESCRIPTORS: RMS Y N PEAK Y N AREA Y N CREST Y N KURTOSIS Y N Z-CROSS Y N FMAX Y N FMED Y N FMEAN Y N FBANDW Y N FCREST Y N TBINS Y N FBINS (Unsealed) Y N FBINS (Scaled) Y N Windowed Y N Descriptor Count: 19 Fourier transform Yes Use arrows to Move, spacebar to change, ESC to exit. Figure 34) AEMUNCH - Menu giving choice of descriptors to generate for descriptor file (.DS1). WAVEFORM CHARACTERIZATION IN PROGRESS FILE BEING ANALYZED! LCI.AEA PROGRAM STARTED AT! 04!10119 NUMBER OF SIGNALS TO BE ANALYZED! 1104 NUMBER OF SIGNALS COMPLETED! 4 TIME DOMAIN! RMS -PEAK = AREA = CREST FACTOR = KURTOSIS = 0-CROSSINGS -1.935073E-03 5.223754E-03 1.342146 2.699512 4.577221 375 AVERAGED FREQUENCY DISTRIBUTION! Jxxxxxxxxxxxxxxxxxxxxxxxxxx 1XXXXXXXXXXXXXXX !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx !xxxxxxxxxxxxxxxxxxxxxxxxxxxxx IXXXXXXXXXXXXXX IXXXXXXXXX IXXXXXXXXXX ! XXXXXXX FREQUENCY DOMAIN! FREQUENCY MAX. (KHz) = 493.1641 MEDIAN FREQUENCY = 471.1915 MEAN FREQUENCY = 448.1932 BANDWIDTH <>15X) = 510.254 FREQUENCY CREST = 3.766646 SIGNAL FREQUENCY DISTRIBUTION! SXXXXXXXXXXXXX SXXXXXXXXXXX !XXXXXXXXXXXXXXXXXXXXXXX !XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX sxxxxxxxxx sxxxx !XXXXXXX sxxxx Figure 35) AEMUNCH • Display during calculation of descriptors. 0 KURTOSIS vs. Tine 228.2109 Figure 36) A E M U N C H - Program display of kurtosis plotted versus time (in seconds) for NaOH experiment using long waveguide. (3 units have been added to the kurtosis for each signal for plotting). 224.6094 53.71094 • o o o •o-o-o o o oo o oo o -o o o o oo oo o o ooo o o o oo o o o o oo 1.220747 S.0G5516 FMAX vs. KURTOSIS Figure 37) A E M U N C H - Program display for scatterplot of frequency of maximum intensity (FMAX) versus kurtosis (plus three) for NaOH experimental data. 75 V.6 Descriptor Generation - the Program AEMUNCH.BAS The program AEMUNCH.BAS, written by Dr. P. D. Wentzell (with assistance from this author and D. A. Boyd), takes an experimental data file (.AEA) and for each signal calculates a desired selection of descriptors (Figure 34). If frequency domain descriptors are requested then the Fourier transform is also calculated when the descriptors are generated (Figure 35). The resulting multi-dimensional data set is saved in a file with the extension .DS1, which contains the descriptors for each signal in an experiment. The descriptor values can be plotted against time (Figure 36), or against each other (Figure 37) using AEMUNCH.BAS. But this only allows for visualization of two dimensional subsets of the data which very likely won't reveal all of the interesting information present in the multi-dimensional data. V.7 Hierarchical Cluster Analysis - DENDGRAM.BAS Cluster analysis allows for the visualization of the clustering of the data in feature space. For this, the program DENDGRAM.BAS was written by D. B. Sibbald. This will first load in a descriptor file. Then the operator has a choice of options for scaling the data (Figure 38 - a discussion of scaling techniques follows). Any of several methods for calculating the dendrogram may be chosen (Figure 39) and the calculation follows. Visual display of the dendrograms on the screen allows one to view in close-up detail, and to change the method of display used (Figures 40 - 42). This work has been reported in the literature89 and is included as an appendix. It is possible that a dendrogram may have been calculated for a set of data with a sufficiently large.number of signals such that the resolution of the graphics display is unable to display the signal identifier for each individual signal. (In this case, a page of the signals can be selected (Figure 43) and the identifiers for this page viewed.) The dendrogram is stored in a file with the extension .DEN. This contains information on the original data file, the XXKXXKXXXXXXKXMMKKXKKMXKXMXXXXXXXKKXKXKKMKXXKXXKMMKKKKMXXXXKXKXXKXXXXXXKXXXXK SCALING 1) Auto-scaling 2) Range scale (0 - +1) 3) Perforw no scaling 4) Special functions. Use arrows to choose. Press RETURN to select. Figure 38) DENDGRAM - Menu offerring choice of scaling options. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) DENDROGRAM for : TURTA. DES Output file I TURTA.DEN I! Single Linkage 2.' Conplete Linkage 3: Average Linkage (Weighted) 4! Average Linkage (Unweighted) 5! Centroid 6! Weighted Centreid (Median) 7! Ward's Method Minkowski Factor ! 2 Use arrow keys to choose, C to change Minkowski factor ? for a brief sunMary of the different techniques, ESC to select. Figure 39) DENDGRAM • Choice of different methods for calculating dendrograms. (Q -qu i t T-tuo page display P-plot I-Identify WLN^-change display foi^nat) Figure 40) DENDGRAM - Dendrogram display. Signal labels are given on left. Horizontal axis is (dis)similarity. 00 ( O - o t h e r page S - s i n g l e page P - p l o t Q - q u i t WLN^-change d i s p l a y f o r m a t ) ^ 1— ) 1 1 1 1 1 1 1 h Figure 41) DENDGRAM - A "page" can be selected for a detailed listing of the samples producing the dendrogram. This shows one half of the dendrogram in VO ( Q - q u i t T-tuo page display P-plot I-Identify WLH^-change display fornat) Figure 42) DENDGRAM - Use of a square-rooted similarity axis. o (Q-quit T-two page d i s p l a y P-plot I - I d e n t i f y WLN^-change d i s p l a y format) Figure 43) DENDGRAM - Use of exponential similarity axis for display of dendrogram in Figure 40. 82 method of calculating the dendrogram, and the data scaling option that was performed. V.8 Factor Analysis - ABSCAT.BAS While giving an idea of the clustering of the data points, a dendrogram doesn't actually give a picture of the data itself. Abstract factor analysis (AFA)C, also called principal components analysis (PCA) facilitates projection of a multi-dimensional data set onto two dimensions for viewing while retaining most of the data's structure. The program ABSCAT.BAS, written by this author90, will take the data from a descriptor file and allow several viewing and processing options (Figure 44). As with AEMUNCH.BAS, the data can be displayed as X-Y plots. This will show relations between two of the descriptors or give a time profile of one of the descriptors. The program calculates the set of loadings and scores for the data set (Figure 45). Stepwise parameters that are useful in determining whether a factor is a primary or secondary factor are printed into data file with the extension .RES. The AFA results, the factor loadings and scores matrix, are stored in another data file (.AF2). The loadings of the variables on the factors can be viewed to give the relative importance of each feature (Figure 46). From the .AF2 file, the factors can be plotted against each other to view projections of the data set onto any two dimensional plane defined by two of the factors. One useful feature of ABSCAT.BAS is that signals with different values in their class fields are plotted in different colors. This greatly assists the operator in evaluation of the data. When viewing a plot, it is desirable to be able to identify a particular signal C. Abstract factor analysis produces linear combinations of the descriptors that contain the best distribution of variance for viewing the data but which don't correspond to any actual property of the data - thus the term abstract factor analysis. Target factor analysis, by comparison, attempts to project the data onto factors that correspond to real properties (such as a UV-Vis absorption spectrum). Abstract Factor Analysis Utility 1 ) Abstract Factor Analysis 2 ) Print out Results of previous Abstract Factor Analysis 3 ) Show AFA factor loadings 4 ) Create DEScriptor file f r o M AFA results 5 ) View Scattergraa 6 ) Show directory 7 ) Explain file extensions 8 ) Exit Enter Selection ? Figure 44) A B S C A T - Program 's main option menu. CO Now Performing Abstract Factor Analysis Calculating covariance Matrix : 11/11 Now calculating vector tt 11 of 11 Calculating scores Matrix, ( 113/ 297 ) Figure 4 5 ) ABSCAT - Performing factor analysis. The covariance matrix is first calculated and is then used to determine the principal components ("vectors"). The factors are then used as a new basis set to project the original data into the "scores" matrix. CO i l e : NA0HEXP1.AF2 Factor tt 1 EVAL = 1.581E+G4 R E L . 1 • ll 1 I 1 i IL k 1 I, A. R P A < Z F F F - - 1 2 3 4 5 6 7 8 M E R R U - M M M B C / / / / / / / / S A E E R C A E E W R 8 8 8 8 8 8 8 8 K A S T R X D A ( E T O O H 1 S F F F F F F F F S S s j > > * > M > i > I S y. S ) U-Scale axis for eigenvector D-Digital values for loadings ESC ? Figure 46) ABSCAT - Factor loading display of first six factors. Each vertical bar corresponds to the loadings of a particular feature (descriptor) on the First six principal components starting with the one listed at the upper right (in this cas factor 1). Each "group" is comprised of six individual vertical lines corresponding to each of the first six factors. m 86 which is represented by a dot on the screen. This is possible within ABSCAT.BAS. A cursor is placed on the screen and can be positioned by the operator (Figure 47). When on or near the point(s) of interest, a bubble is opened up. This can be expanded or contracted to enclose all the data points of interest (Figure 48). Information on all the data points which are within the bubble can be listed including the signal's ID, class, and the values for the two variables / factors being plotted (Figure 49). If a group of data points are selected, the class designation of all such points can be changed. This allows one to view the points as separate colors on the display as well as on different displays using the same data. In this way, the structure of the data that is projected onto one plane for viewing can be compared with the display obtained when it is projected onto another plane. If desired, the new class designations can be stored in the descriptor file for future reference. Scatter-plot of LCI X-Axis = Tine Y-Axis = RMS 4.40E-03 Relative scale (V-ax is/X-axis) = 0.8880 X = 4.08E+02 V = 7.24E-E i I ra Use cursor keys to move cross. Press return to place.ESC-exit.?-Help. 3.05E+0' Figure 47) ABSCAT - Scatterplot of RMS Voltage versus time (in seconds) for liquid crystal - 4-n-pentyIoxybenzylidene-4'-n-heptyaniline. The "cross-hair" is a cursor which allows the user to help identify the points which correspond to each signal. Scatter-plot of L C I X-Axis = Time 4.40E-B3 Relative scale (Y-axis/X-axis) = 8.0888 Y-Axis = RMS X = 4.08E+B2 Y = 7.24E-f • cm Use cursor keys to select area.ENTER to select.?-Help .ESC-exit . 3.05E+0; Figure 48) ABSCAT - "Bubble" which opens up around cross-hairs to enclose points of interest for identification. CO CO Identif icat ion of points centered about Tine = 4.0829E+82 ± 1.531E+B2 RMS = 7.2421E-84 ± 4.434E-84 ID CLASS X-value V-value Distance from c r o s s h a i r s 2 4 E -N 2.7573E+82 6.3124E--04 aX = 1.326E+82 *V= 9.297E--85 2 5 E -N 3.1643E+02 8.6398E--04 aX= 9.186E+81 AV= 1 .398E--84 2 6 8 3.3933E+02 3.7999E--04 *X= 6.896E+81 3 . 4 4 2 E - -84 2 7 8 3.9821E+82 3.5991E--04 aX = 1.B08E+01 ^y= 3 . 6 4 3 E - -84 2 8 E -N 4.1919E+02 9.7768E--04 *X= 1.090E+01 2 . 5 3 4 E - -84 2 9 8 4.6061E+02 3.4373E--04 AX= 5.232E+01 3 . 8 8 5 E - -84 38 8 4.6168E+02 3.3898E--84 *X= 5.331E+81 3 . 8 5 2 E - -84 31 8 4.6253E+02 3.2259E--04 *X= 5.424E+01 4 . 0 1 6 E - -84 32 8 4.6346E+02 3.1792E--84 AX= 5.517E+81 4 . 8 6 3 E - -84 33 8 4.6445E+02 3.5190E--84 AX= 5.616E+B1 *Y= 3 . 7 2 3 E - -84 34 8 4.6539E+02 3.2676E--84 AX= 5.718E+B1 3 . 9 7 5 E - -84 36 0 4.6797E+02 3.3196E--84 oX= 5.968E+81 A ¥ = 3 . 9 2 2 E - -84 37 8 4.6890E+B2 3.2263E--84 *X= 6.861E+81 4 . 8 1 6 E - -84 38 0 4.7816E+82 3.2737E--84 *X= 6.187E+81 3 . 9 6 8 E - -04 47 E--N 4.8274E+82 7.7276E--84 aX= 7.445E+81 4 . 8 S 5 E - -85 49 0 4.8483E+82 3.5537E--84 AX= 7.654E+81 3 . 6 8 8 E - -84 55 0 4.9148E+82 3.6661E--84 *X= 8.319E+81 AY= 3 . 5 7 6 E - -84 58 0 4.9466E+02 3.6B11E--84 AX= 8.637E+81 3 . 6 4 1 E - -84 Space bar to see next page, B f o r prev ious page* P to see p l o t , F to create l i s t f i l e , C to change c l a s s i f i c a t i o n , ESC t o e x i t . 0 igure 49) ABSCAT • Identification of signals which have data points within the cursor bubble. In this example, the signals with the class designation "E-N" are due to electrical noise and have higher RMS values than the signals due to the liquid crystal. 90 VI. Chemometric Methods Used in this Work VI. 1 Descriptors For analysis, each of the many individual acoustic signals (eg. Figure 6) was described by an array of descriptors. These descriptors evolved over the course of the work and include : 1) Root Mean Square Voltage - RMS Here, sj is the voltage of the signal at each time interval fj and n is the number of time intervals ie. the length of the signal in points. This gives a measure of the energy of the signal. More intense signals and those which decay slowly will have higher RMS values. n RMS = [ £ ( S i 2 ) / n ] 1 / 2 i = l (VI.1) 2) Maximum Peak - MAXPK MAXPK = max | si | The value of the signal's highest peak. (VI.2) 3) Area AREA = E | s; | The summed absolute area under the signal (VI.3) 4) Crest CREST = MAXPK/RMS (VI.4) This is a measure of how "spiky" the signal is. 91 5) Number of zero crossings - Z-CROSS This is the number of times that the signal crosses the zero point. 6) Kurtosis This is the fourth statistical moment and describes the spread of the data about its mean value. A Gaussian distribution will have a kurtosis of zero. 5" is the average value of the signal - normally zero volts and a is the population standard deviation of all the signal values, sj, about zero. 7-14) RMS time octiles The RMS of each eighth of the time signal is calculated. The next descriptors0 are based on the power spectrum (Figure 50) which is calculated for each signal. Each point fj corresponds to the intensity at frequency interval number i. The length of the power spectrum, len(f), is one half the length of the signal from which it is calculated (ie. Ien(f) = n / 2). Frj, the frequency at interval i, can be found using the sampling rate TDIV. Since there are 100 points per division on the oscilloscope, the sampling rate is 100 / TDIV and thus KUR = 1/n £ [ ( S j - s ) / a ] 4 - 3 (VI.5) a = [E (Si)V ( n - l ) ] V 2 (VI.6) Fri = (1 / 2 ) (100 / TDIV) (i -1) / len(f) (VI.7) D. These are calculated using the entire frequency spectrum up to the Nyquist frequency. They will therefore be affected by the setting of the acquisition rate of the scope. Acoustic Emission during heating/cooling of a,aj-bis(4-n-decylanll ine-benzn!dene-4 ' -oxyhexane) 100-c c 2 5 0 0 Frequency (kHz) Figure 50) Frequency power spectrum of acoustic signal in F igure 6. Vert ical axis measures intensity in arbitrary units. ro 93 15) FMAX - Frequency with maximum intensity The frequency component which is most prevalent. FMAX = Frj such that fj = max | f | (VI.8) 16) FMED - Median frequency The median point of the entire power spectrum. FMED is chosen such that: j len(f) Fn: E fj = E fj (VI.9) J i=l i=j+l 17) FMEAN - Mean frequency The weighted average of all frequency components. FMEAN = E ( ^  • Frj) / len(f) (VI.9) 18) FSD - Frequency standard deviation about the mean len(D FSD = 2 ( F r r FMEAN)2 / len(f) (VI.11) i A signal with one predominant frequency component will have a low FSD. Conversely, a signal containing many (or broad) frequency components will have a relatively large FSD. 19) FCREST - Frequency crest This is similar to the time domain crest. It is calculated by dividing the value of the largest peak, ie. the intensity at FMAX, by the total RMS power of the power spectrum. 94 20) BW15% -15% Band width The frequency range about the FMAX which is above 15% of the intensity at FMAX. 21-28) RMS Octiles of the power spectrum These values are calculated from each eighth of the power spectrum. Only first order descriptors are used here, ie. the descriptor values computed for each signal are dependent solely on the properties of that signal. Second order descriptors would include inter-signal dependent features such as (eg.) time between consecutive signals. The work by Belchamber et al. used just 5 descriptors (amplitude, variance, half life, median frequency, band width)30,91. Further descriptors are presently under evaluation by other workers in this group and will be reported in future publications87,103. Through the use of these descriptors, each individual acoustic signal can be represented by a point in a descriptor space which has one dimension for each descriptor. This space is also termed a feature space. Two techniques that transform this feature space into a form that can be viewed in two dimensions are Principle Components Analysis (PCA) and hierarchical clustering. The assumption behind these methods is that similar data points will occupy the same region of descriptor space and will thus appear "near" to one another. Therefore, one can classify pairs of data points as being similar or dissimilar based on the distance between them. 95 V I . 2 Scaling Before calculating the distance between points based on several descriptors, the units of each descriptor must be considered. Different descriptors will have different units with different ranges and / or magnitudes which will make comparisons difficult unless compensation is made for this. The method of scaling used in pattern recognition depends on the nature of the data. For example, in gas chromatography (GC), data are commonly scaled such that the area under the chromatogram is unity -thus eliminating the effects of sample size on the peak areas. Before discussing the techniques of scaling implemented in this work it will be helpful to define some terms. The data matrix can be represented by X where JCJJ represents the value of the j descriptor for the i signal. Thus: x = r x n , x 21' x12< x 2 2 , x q2 ' x IP x 2p qp where p is the number of descriptors and q is the number of signals. The scaled data set will be represented by Y . Descriptors with similar units but different base values may be mean centered. For example, if for a set of solutions, the concentrations of two components are measured, the average concentration of one component may be 100 ppm while that of the other component may be 10 ppm. Mean centering is accomplished by replacing the original data set X by a new set Y calculated as follows: and x^ j = (1/q) E (xjj) xe. x^ j is the mean value for descriptor j. (VI.12) (VI.13) 9 6 It can easily be demonstrated that mean centering may not be sufficient. In the above example, a change in concentration of 50% in the 10 ppm concentration can be completely obscured by a 20% change in the 100 ppm concentration. This problem can be avoided by the use of range scaling. Range scaling sets the maximum value of the descriptor to one and the minimum to zero. Range scaling eliminates most of the problems associated with having descriptors with different units. The resulting descriptor columns all have values between zero and one and there is no dependence on the units of the descriptors. However, if in a set of measurements there is an outlier - a point which lies far from the mean of the measurements - then range scaling will place too much weight on that point and have the affect of making that descriptor meaningless for pattern recognition purposes, as demonstrated in Figure 51. To account for different units amongst the descriptors as well as allowing for the possibility of outliers, other scaling algorithms have been proposed, and implemented here. Normalization, such as in the gas chromatography case, is often used. The data set is scaled such that the sums of the values for the scaled descriptors for a sample equal a convenient constant, K (normally one). y\i = • *MIN/) / <*MAXJ " *MINJ ) (VI.14) (VI.15) Thus the transformation performed is (VI. 16) 98 15 0 + -10 -15 1.5 Unsealed data OO • " o A 1 — 1 1.0 + 0.5 + 0.0 + -0.5 -0.5 10 20 30 Range scaled data C P • 0.0 0.5 1.0 1.5 Figure 51) a) Simulated two dimensional data set with outlier in one dimension, b) Result of range scaling data between 0 and 1. Data has been compressed in one dimension, changing apparent structure. 99 More generally, normalization involves setting the sum of the squares of the descriptors for each data point to a constant. Normalization is a process where the vector describing the individual data points is scaled independently of any other data vector. It should be taken into consideration that normalizing a data vector (a row in the above matrix) forces the data to lie on a certain hyperplane within the feature space. For example, Figure 52 shows a data set which appears to have two well separated clusters. When the data set is normalized, the data are forced to lie on a circle defined by the normalization equation. This also happens in higher dimensions. With these considerations, methods which scale by descriptor (scaling by column - such as range scaling) rather than by each data vector (scaling by row - as in normalization) were chosen. One method, auto-scaling, has been previously used for pattern recognition for AE data30'33. Auto-scaling is also known as the Z-transform and involves scaling the data matrix X such that each of the variable vectors (columns) have zero mean and unit variance. This is accomplished by the following. j (VI.17) yjj = (*ij-xAj) / ° i (VI.18) where ^ A J = U/q) E (x\p (VI.19) x\j is the average value for descriptor j and a j is the standard deviation of descriptor j about the mean. ai = [ ( l / q ) E ( * j j - x A J ) 2 ] V 2 (VI.20) 100 Figure 52) a) A two dimensional data set with two clusters. b) Data set normalized. The grouping of the data has been lost due to imposition ofnormafization criteria. 1 0 1 By auto-scaling each descriptor to have unit variance, we give equal weight to each descriptor. Thus, the effects of the descriptors' units and absolute magnitudes are eliminated. The problem of outliers skewing the scaled data is also minimal. Even auto-scaling has its drawback, and that is that if a variable is essentially non-variant or has variation due only to random noise, then its importance becomes greatly exaggerated. The minor fluctuations suddenly have as much meaning as a critical descriptor. This has been addressed in this work by grouping similar descriptors such that they have a common scaling, as will be described below. In our pattern recognition studies, we include the power present in each octile of the frequency spectrum (Figure 53). If the sampling rate is set such that the Nyquist frequency falls well above the maximum response frequency at which the transducer has a significant response, then there will be sections of the frequency spectrum that describe nothing other than noise. In most of the experiments reported here, the Nyquist frequency was 2.5 MHz. With the transducer sensitive only to about 1.2 MHz, this can leave perhaps 3 octiles that contain little analytical information. Of course, one may sometimes observe where very intense emissions which occur above the somewhat arbitrary 1.2 MHz boundary. Scaling of the less variant octiles in such a way as to have as much importance as the more highly variant octiles would clearly be wrong for processes which have all their emission (eg.) below 700 kHz. The method of scaling thus used for this work is a modification conceived during this work. LINK scaling involves auto-scaling the data with the exception of the frequency octiles, which were scaled such that yy = fry - x^) /aF (VI.21) where a F is the standard deviation of the values in first octile (Figure 54). The first octile normally has the most variance of all the octiles. In this way, the relative importance of the less variant octiles is maintained in relation to the first octile. Of 1248 1560 1872 2184 2496 Frequency (kHz) Figure 53) Frequency spectrum of signal from NaOH hydrolosis divided into 8 octiles. Maximum frequency is determined from the sampling rate by the Nyquist theorem. The horizontal bars correspond to the RMS power of each octile. (RMS powers are multiplied by a factor of 10 for plotting). 103 course, if it is known in advance that any descriptor is not fulfilling a useful purpose but only describing noise or remaining largely constant, then it should be removed from the analysis. VI.3 Similarity and Distance With the feature space now defined by unitless descriptors, a measure of the signals' similarities can be calculated. The distance between the position of pairs of data points in the feature space is taken as the measure of dissimilarity. As with scaling, there are different techniques for calculating the distance between points in the multi-dimensional feature space. The most commonly used measure is the Euclidean distance. This is simply an extension of Pythagorean principle to more than two dimensions. rfij = [ E(y i k - y j k ) 2 ] 1 / 2 (VI.22) where is the distance between the data points representing signals i and j. The Manhattan or city block distance is calculated as the sum of the vertical and horizontal distances along each axis (descriptor). rfy = = I y\k - ?jk I (VI.23) The difference between this measure and the Euclidean distance is shown in Figure 55. These two metrics are actually special cases of the Minkowski formula66. dii = i s | y i k - y j k | r ] 1 / r (VI.24) One can see that when r equals two, this equals the Euclidean distance and when r equals one, the Manhattan distance. Other values of r are possible, but these are not common. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 1 2 3 4 5 6 7 8 9 10 11 12 •.Me! variance 8.700E-04 1.162E-02 5.546E+02 3.322E+00 3.446E+01 5.521E+04 2.446E+04 1.150E+04 1.297E+03 1.208E+01 2.378E-01 3.443E-01 SCALING I Average 6.3122E-02 2.5542E-01 4.9588E+01 4.4548E+00 5.5428E+00 7,0448E+02 7.3374E+02 8.4362E+02 5.6600E+02 9.8468E+00 9.5110E-01 1.1818E+00 THECMHEl.DSl HaximiM 1.8886E-01 8.5010E-01 1.5922E+02 1.2145E+01 4.8528E+01 1.2549E+03 1.2402E+03 1.2319E+03 6.5407E+02 1.7801E+01 2.5005E+00 2.7127E+00 HinimiM 2.4011E-02 1.2730E-01 1.8122E+01 2.4294E+00 2.2335E+00 3.9063E+01 3.3691E+02 5.1641E+02 4.4776E+02 3.2785E+00 2.3165E-01 2.3230E-01 Hode Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Auto scale Link scale 13 f " r b.0903E-0i Link scale 14 15 16 17 18 1.592E-01 6.366E-02 1.243E-02 5.970E-03 3.154E-03 6.9538E-01 ! 3.8914E-01 ' 3.2112E-01 1.9480E-01 1.8741E-01 2.0488E+00 1.8186E+00 8.2313E-01 5.2122E-01 5.0983E-01 2.1303E-01 1.2322E-01 8.5206E-02 7.4046E-02 7.1246E-02 Link scale Link scale Link scale Link scale Link scale 3/8 F' Colunn 13 . Hode = Auto Scale wean to : 0,0000,Use variable 12 variance Press L to load in saved Forwat file, P - Print, ESC - Execute Figure 54) DENDGRAM - Link scaling of frequency RMS octiles avoids amplification of noise in eight octiles (variables 12-19) if the higher variance of the first octile (variable 12) is used. o 105 The interpoint distances are then calculated. This results in a similarity matrix D containing the distances between all pairs of data points. D = r d n , d 1 2» • • • • d l q d21- d22> d2q 1 dql> dq2> •••• dqq J rfy is the distance between points i and j. This matrix is symmetrical and the diagonal elements, rfy, are zero. VI.4 Hierarchical Cluster Analysis The procedure for hierarchical cluster analysis is outlined below. The two most similar points, (ie. those with the shortest distance between them) are fused together into one cluster. A new distance matrix is then calculated. The procedure is repeated until only one cluster is present. Different methods for calculating a dendrogram exist and these differ mainly in the method of calculating a new distance matrix at each step58. The methods available in the DENDGRAM program written by D. B. Sibbald8 7 8 9 can be divided into two groups: those that rely solely on the distance matrix and those that refer to the actual descriptor values to calculate the new distance matrix. Table 2 contains a sample two dimensional data set. From this we can calculate the distance matrix in Table 3 (using the Euclidean distance). Table 2. Sample two dimensional data set. X y 1) 0.0 3.6 2) 1.07 1.45 3) 0.0 0.0 4) 6.68 4.02 5) 3.0 2.8 106 Table 3 . Distance matrix for data in Table 2 . 1 2 3 4 5 1 ) 0 2 ) 2 . 4 , 0 3 ) 3 . 6 , 1 . 8 , 0 4 ) 6 . 7 , 6 2 , 7 . 8 , 0 5 ) 3 . 1 , 2 . 3 , 4 . 1 , 3 . 8 , 0 Using the distance matrix, the first fusion made is between points 2 and 3 (at a distance of 1 . 8 ) . This leaves the new distance matrix in Table 4 . Table 4 . Distance matrix after first fusion. 1 2 - 3 4 5 1 ) 0 2 3 ) [ ] , 0 4 ) 6 . 7 , [ ] , 0 5 ) 3 . 1 , [ ] , 3 . 8 , 0 The spaces marked [ ] denote the distances between the remaining points and the newly formed cluster. These will be calculated and filled in according to the dendrogram method being used. Single linkage takes the distance between a point and a cluster as the shortest distance between the point and any of the points in the cluster. The new distance matrix using the single linkage method is shown in Table 5. Table 5. Distance matrix using Single Linkage. 1 2 - 3 1 ) o " 2 3 ) 2 . 4 , 0 4 ) 6 . 7 , 6 . 2 , 5) 3 . 1 , 2 . 3 , Complete linkage on the other hand, takes the distance as the largest distance between the point and any of the points in the cluster, and gives the new distance matrix shown in Table 6. A graphical representation of the difference between single linkage and complete linkage is shown in Figure 56. 4 5 0 3 . 8 , 0 8 Figure 55) Difference between Euclid vs Manhattan distances for two dimensional example. Euclidean distance between points is 10 while the Manhattan "city block distance is 14. H O 108 Table 6. Distance matrix using Complete Linkage. 1 2-3 4 5 1) 0 23) 3.6, 0 4) 6.7, 7.8, 0 5) 3.1, 4.1, 3.8, 0 Average linkage, as the name suggests, takes the distance between the remaining points and the new cluster as the average distance between the point and the points in the cluster. The new distance matrix for average linkage is given in Table 7. To continue with the algorithm, the shortest distance is taken as the next fusion. One notices that in each method's algorithm, the second fusion is between two different pairs of objects! Table 7. Distance matrix using Average Linkage. 1 2 - 3 4 5 1) 0 23) 3.0, 0 4) 6.7, 7.0, 0 5) 3.1, 3.4, 3.8, 0 There are two different ways of applying the average linkage algorithm. To outline the difference the next step will be examined. When point #1 is joined with cluster 2-3, the matrix is reduced further and the distances from each of points #4 and 5 to the new cluster 1-2-3 need to be calculated. The weighted average linkage method takes this distance as the average of the distances between the object in question and the distances to the previous two components of the new cluster. For the case of point #4 this distance is d123-4 = <rfl-4 + d23-4> / 2 = ( 6 J + 7 - ° ) / 2 = 6 - 8 5 - CV 1 - 2 5 ) 109 The unweighted average linkage method however, calculates this distance as the average distance between the object in question and all points in the new cluster. This is calculated for point # 4 as d123-4= (dl-4> / 3 + (<*2-4)/3 + (rf3-4> / 3 (V I-2 6) = (dlmA) I 3 + (</23_4) x 2 / 3 = (6.7)/3 + (7.0) 2 / 3 = 6.9 The methods involving direct use of the descriptor values also progress in the same way. Two points / clusters are fused to form a new object and a new distance matrix is calculated. The difference is that for each calculation of a new distance matrix element, these algorithms rely on the original (scaled) descriptor values (matrix Y) rather than manipulating the distance matrix D (as the methods discussed above). Therefore, these algorithms are more complicated and require more computing time. In the centroid methods, when a cluster is formed, the position in feature space representing the center of the cluster - the centroid - is calculated. It is the centroid to which the distances are calculated for the elements of the new distance matrix. Like the average linkage method, the centroid method can be performed weighted or unweighted. The unweighted centroid method (or Grower's method) uses the mass center of the new cluster as the centroid. The weighted centroid method uses the mid-point of the two previous centroids / positions as the centroid. Figure 57 illustrates this difference. Ward's method differs markedly from the other methods. Instead of calculating the distances between points and clusters, the amount of information lost with each possible fusion is calculated. At each step, each possible fusion is considered. The new Cluster A Cluster B Figure 56) Graphic representation of difference between single linkage and complete linkage for distance between two clusters. Single linkage uses the shortest distance between any two points in the cluster; complete linkage uses the longest. H Figure 57) Graphic representation of difference in centroid location for weighted and unweighted centroid methods. The location of the centroid of the new cluster formea will lie half way between the centroids of clusters A and B using the Weighted Centroid Method. The Unweighted Centroid Method will use the "center of mass" as the centroid of the new cluster. 3 2 5 1 4 I I I L J SINGLE LINKAGE 3 2 5 1 4 J I I I I I 1 L COMPLETE LINKAGE Figure 58) Dendrograms calculated from data in Table 2 using single linkage and complete linkage. 3 2 1 5 4 RUERRGE LINKAGE 3 2 1 5 4 WEIGHTED AUERAGE LINKAGE Figure 59) Dendrograms calculated for data in Table 2 using unweighted average linkage and weighted average linkage. H LO 3 2 1 5 4 CENTROID METHOD 3 2 1 5 4 UNWEIGHTED CENTROID (GROWER'S) METHOD Figure 60) Dendrograms calculated from data in Table 2 using the centroid method and the unweighted centroid method (Grower's method). H J I I I I 1 IDARD'S METHOD Figure 61) Dendrogram from data in Table 2 calculated using Ward's method. 116 mass-centered centroid is calculated for each hypothetical fusion and the Error in the Sum of Squares (ESS) is calculated. The sum of squares for a data point is the error produced when describing the point as part of a centroid. If the coordinates of a centroid are cy (with the data point in question being y|), the ESS for that point is ESSj = E ( y y - C j ) 2 (VI.27) The ESSj is calculated for each point and then totaled over all points to give the ESS for each possible fusion. ESS = £ ESSj (VI.28) i The fusion made is the one with the least ESS value - ie. the least total sums of the squares of the differences between the data point and the centroid in each descriptor. The dendrograms for each method using the distance matrix are shown in Figures 58 - 61. It is visible that each method produces a slightly different dendrogram from the same data. VI.5 Abstract factor analysis (AFA) While the dendrogram gives a clustering of the objects, AFA can be used to obtain a two dimensional projection of the data such that the information loss is minimal and the structure of the data set is maintained. The scaled data matrix represents the feature-space position of the data points. AFA transforms the data matrix such that a new set of basis vectors is chosen. These 117 basis vectors define the factor space. The new basis vectors are called factors or principle components. The first factor is chosen as the axis along which a maximum of the data set's variance is contained. The feature space can then be reduced in dimensionality essentially by subtracting out the variance accounted for by this factor. A second factor (orthogonal to the first) is similarly chosen to contain a maximum of variance of the reduced feature space. This process is continued until the feature space is entirely mapped into the factor space. The program ABSCAT, written by D. B. Sibbald90 performs abstract factor analysis on the acoustic emission descriptor data. The same scaling method is used to remove any dependence on the units of the descriptors before applying the routine. The first step is the calculation of the covariance matrix C . qj = E Oki'ykj) (vi.27) k This will be a square matrix with the same number of dimensions as there are descriptors - p. A normalized approximation of the first new basis vector to be extracted is made, Lj, by setting the l\ equal to the square root of p. Ll = (h> h> •••> p^> A new approximation is obtained by multiplying by the covariance matrix. ^l(NEW) = ^l(OLD) ' C (VI.29) The eigenvalue associated with the new vector, L 1(NEVV)» *s calculated and the vector is normalized. Ei = [2 ( / j ) 2 ] V 2 (VIJO) h = h I E x (VI.31) 118 This procedure is repeated until consistency is achieved. This is tested for by checking the value of E N E W against the previous value, E Q L D - ^ there is less than 1/10001*1 of a percent change then the vector L N is accepted. The residual covariance matrix C R is then calculated. CRij = cy - [ E N ' Ij • l j ] (VI.32) The second factor is extracted the same way. The same initial approximation is made for L2 and the sequence of obtaining successive estimates is performed by multiplication with CR . Eventually, a complete set of factors is extracted and these form the set of basis vectors for factor space: F - the loadings matrix. F = (L x L3 Lp) The scaled original data set is then projected from feature space onto factor space to produce the scores matrix represented by Z. This is easily accomplished by multiplying the scaled data matrix Y by F , the transpose of F. Z = Y F (VI.33) The scores matrix, Z, is a representation of the data matrix Y. The rows correspond to the coordinates of the data point in the factor space in the same way that the rows in the data matrix corresponded to each individual signal's descriptors. A two dimensional plot can now be made of the data in matrix Y which contains as much of the structural information possible. Plotting the scores of the first two factors against each other is achieved by plotting the first two columns of Z. This is a hyperplane in the multi-dimensional feature space that contains the most variance in the data. Hopefully, any structure of the data in feature space that was hidden by multi-dimensional nature will be visible in factor space. 119 VH. Results and Discussion VII.l Detailed Characterization of the Hardware and Software Developed VII. 1.1 Effect of Transducer on Observed Signals The piezoelectric transducer served the function of converting the ultrasonic signal into an electrical voltage. An ideal transducer8 would allow for an exact transduction of the vibration into a representative electrical signal and would have the following characteristics: a) good sensitivity, b) broad band response, c) flat response over the range of sensitivity, and d) low cost. The sensitivity to a broad range of frequencies allows the observation of signals with different frequency components and adequate quantization and characterization of their frequencies with a single transducer. That the response be constant to all frequencies is also important. For calculations of signal energy (root mean square voltage), signals of equal power but different frequency should not give vastly different apparent energies. The frequency response curve of the Bruel and Kjaer Model 8312 transducer used (#1381596) is given in Figure 62. This is the factory calibration and is given only to 1 MHz with a resolution of 50 kHz. As is shown, other transducers, even those by the same manufacturer with the same model number, have different responses41. The sensitivity to frequencies above 1 MHz is not reported but since some signals have been seen to contain components as high as 1.7 MHz, this suggests that the transducer still has a limited response above its reported operating range. It is unknown whether the frequency response is limited by the piezoelectric material or by the built-in pre-amplifier. One must hope that the ongoing development in transducer design is able to produce more ideal transducers so that research groups may directly compare their work with that of others. This topic is discussed in the literature41,92. Indeed, new transducers by Acoustic Emission Technology Corporation (Sacramento, California)93 may suit this purpose and standards are emerging94,95. Transducer Frequency Response #1381596 — #1381603 100 200 300 400 500 600 700 800 900 1000 1100 Frequency (kHz) Figure 62) Frequency response of two different Bruel & Kjaer model 8312 transducers. 121 Vn.1.2 Effect of Ambient Noise The Bruel and Kjaer transducer was found to be sufficiently sensitive to most of the systems investigated. Use of the 50 kHz - 2 MHz filter was seen to successfully render the system immune to ambient room noise in tests with speech, clapping, and "noise" from a nearby cassette-radio. Comparison of AE data obtained using different transducers of this type is not directly possible without very high resolution (12-bit minimum) data acquisition. Different techniques have been suggested for characterization of the frequency response of transducers96. It has been suggested to measure the sensitivity as a function of frequency and correct the resulting signal to yield the "true" signal97"99 but this requires a large amount of computation for each signal. This was not acceptable for this work in part because of computational time, and in part because of the 8-bit resolution limitation of this apparatus. This is why the same transducer was used for all experiments. VII. 1.3 Effect of Signal Acquisition Rate The ability of the scope to digitize the incoming signals and download the data to the computer was also satisfactory for pattern recognition (ie. qualitative) purposes. The model T2230 was able to transmit a 1024 byte signal every 1.5 seconds. This meant that faster emitting systems released far more acoustic signals than could be captured. However, the sampling of these was thought to be quite adequate for pattern recognition and classification purposes. Quantification of total acoustic energy was not possible using this system, as signals occurring during data transmission were missed and thus could not contribute to the energy sum. Use of the model T2430A oscilloscope, which was able to capture between ten and twenty signals a second, still 122 was subject to the same limitations, although using this scope, data sets of up to 7500 signals have been captured. VII. 1.4 Effect of Trigger Level An unfortunate feature of the T2230 was that the trigger level could not be controlled by computer, and had to be adjusted manually. The knob is not marked with any degradations nor has it any detents. Moreover, the computer interface to the scope was incapable of monitoring the trigger level. Thus, maintaining consistent trigger levels between experiments for future comparison was difficult. Marks on the dial and manual measurement of the trigger mark on the scope display were used to be consistent. Experiments with too low a trigger level will have baseline noise as a large proportion of signals while those with too high a trigger level will selectively discriminate against low intensity acoustic signals (ie. they will be absent from the final data file). Fortunately, the more capable T2430A oscilloscope purchased later in this work not only has a detented knob for setting the trigger level, but the value of the trigger level is displayed (in mV) on the screen and can be set and accessed via the computer-oscilloscope interface. VII. 1.5 Effect of Waveguide on Observed Signals A number of the experiments involved the use of a waveguide. It was important to characterize the ability of the waveguide to transmit the acoustic signals successfully to the transducer. This was investigated with NaOH pellets dissolving in water. Four experiments were performed: one using just a beaker as the sample vessel; one using each of the two waveguides; and a "blank" run which collected background noise. The average power spectra for the four experiments are given in Figure 63. Background signals were collected with a lower trigger level setting than used for the three experiments with NaOH. The waveguides were able to transmit the signals from the 0.200 0.100-0.000 310 820 930 Frequency (kHz) 310 620 930 Frequency(kHz) 1240 Figure 63) Four average power spectra for the NaOH hydration using different sample holders. The waveguides can be seen to attenuate the higher frequencies (above about 250 kHz) as compared to the beaker. w 124 NaOH sample to the transducer. Some attenuation of the signals was evident. In the experiment using the beaker, the signals contained frequency components of up to about 400 kHz. A peak at about 800 kHz is a sharp resonance of the transducer, and is seen in all spectra. The short waveguide attenuates the higher frequency components of signals such that, they are limited to about 250 kHz. The spectrum for the long waveguide also shows the acoustic signals attenuated to about 250 kHz but with a sharper profile at around 75 kHz compared to that of the short waveguide. The attenuation of the frequencies can be seen in Figure 64 with the lower average values of FMED and FMEAN for the experiments using the waveguides. These results are not surprising since workers in sonar have known for some time that higher frequencies are more readily attenuated as a function of distance than lower ones100. It can also be seen from Figure 63 that the use of the 50 kHz - 2 Mhz filter setting on the amplifier does not guarantee the exclusion of frequency components which fall outside this range. Although the frequency spectra show very little power below 50 kHz, there is still some power present. This suggests that the removal of these frequencies does not appear to be as would be expected from an ideal filter. It is thus apparent that the nature of the amplifier used becomes another factor to be considered. This experiment indicated the need to perform replicate experiments with the same sample vessel and / or waveguide apparatus. Comparisons of experiments performed with different waveguides are complicated by the different attenuations. Sound is attenuated according to a relationship similar to Beer's Law1 0 1. The coupling agent used to attach the apparatus to the transducer has also been shown to affect the signals acquired and is another important factor33. Silicone grease was therefore used for all experiments. N O CD CD 1000 800 6 0 0 -400 200 0 beaker short long background B S L B FMAX FMED FMEAN Figure 64) Median frequency (FMED), frequency of maximum intensity (FMAX), and mean frequency (FMEAN) for the four NaOH data sets. It can be seen that the waveguides attenuate the signals by the lower values of F M E D , F M E A N , and FMAX compared to those acquired with the 50 mL beaker. H 126 VII. 1.6 Selection of Descriptors for Pattern Recognition The choice of the best descriptors depends somewhat on the data being used. As this work was to lead toward largely automated data analysis, it was decided initially to use all descriptors at each step until studies could be done to determine more general conclusions about them. The higher octiles are included as the actual frequencies they refer to will depend on the acquisition rate settings on the equipment. Some descriptors are better at discriminating between different classes of signals than others. The performance of the descriptors to describe each signal sufficiently for pattern recognition analysis depends on the choice of descriptors used. Fisher weights of the descriptors have been calculated to show their discriminating capabilities102. This paper concluded that frequency domain descriptors gave the best discriminating values. The time domain descriptors such as RMS, PEAK, and AREA tended to be affected by the distance from the source to the transducer and thus may not be as qualitatively useful for large volume samples. Figure 65 shows the correlation between the AREA and RMS descriptors obtained for the liquid crystal, 4-n-pentyloxybenzylidene-4'-n-heptylaniline. Most points lie on a line, that is, there appears to be a high correlation of these two descriptors in this experiment. Using ABSCAT to identify points, it was found that the points lying off the line were due to signals which have amplitudes greater than the range digitized (Figure 66). The near-linear relationship between AREA and RMS would suggest that including both in pattern recognition algorithms would be redundant. While the linear relationship is useful to determine signals which may be corrupt, pattern recognition would be more consistent (and would take less computing time) by eliminating one or the other. Also, a composite descriptor, such as the ratio AREA / RMS could be used. 0.003 A Initial activity O Continuous emission O Electrical noise 0.002--0.001 0.000 0 1 Area Figure 65) Plot of R M S Voltage versus signal A R E A for the liquid crystal, 4-n-pentyIoxybenzyIidene-4 '-n-heptylaniline. The data points tend to lie on a curve; except for the electrical noise signals. Th is type of plot can therefore be used to determine a signal's validity. Figure 66) S I G V I E W - Program display of acoustic signal with ampl i tude outside Voltage range digitized by oscilloscope. Vert ical axis is measured in Volts and Horizontal axis is measured in seconds. 03 129 Table 8 Summary of Factor analysis 11-09-1989 NAOHEXP1.DS1 => Link Scaling => NAOHEXPLAF2 Eigen-value N ON) W!N + 1 1/Uv RMS VAR CPV 1 2100 2.37 9.71 5.96E-1 5.11E-1 51.12 2 885 2.16 4.10 4.46E-1 2.16E-1 72.69 3 410 2.12 1.90 3.55E-1 9.98E-2 82.68 4 193 1.31 8.93E-1 3.03E-1 4.70E-2 87.38 5 148 1.24 6.84E-1 2.56E-1 3.60E-2 90.98 6 119 1.74 5.52E-1 2.11E-1 2.91E-2 93.88 7 68.4 1.35 3.17E-1 1.80E-1 1.67E-2 95.55 8 50.6 1.26 2.34E-1 1.53E-1 1.23E-2 96.78 9 40.1 1.27 1.86E-1 1.28E-1 9.78E-3 97.76 10 31.5 1.38 1.46E-1 1.03E-1 7.68E-3 98.53 11 22.7 1.59 1.05E-1 8.17E-2 5.54E-3 99.08 12 14.3 1.30 6.64E-2 6.43E-2 3.50E-3 99.43 13 11.0 2.74 5.09E-2 4.67E-2 2.68E-3 99.70 14 4.02 1.35 1.86E-2 3.83E-2 9.79E-4 99.80 15 2.98 1.27 1.38E-2 3.07E-2 7.26E-4 99.87 16 2.34 1.51 1.08E-2 2.29E-2 5.69E-4 99.93 17 1.55 1.32 7.16E-3 1.59E-2 3.77E-4 99.97 18 1.17 4.72 5.43E-3 6.64E-3 2.86E-4 99.99 19 0.249 — 1.15E-3 0.00 6.06E-5 100.0 = average eigenvalue = 216 RMS = Residual Mean Squared error V A R N = variance accounted for by eigenvalue N N CPV = cumulative percent variance = R VARj i = l Table 8 shows the stepwise discriminating parameters for the NaOH data. The eigenvalues are used to calculate functions which give an indication as to the number of primary factors. The average value for the eigenvalues is sometimes used. Factors which have an eigenvalue less than that of the average eigenvalue are labelled as F i l e : N A 0 H E X P 1 . A F 2 Factor tt's 1 - 6 L 0 A D I N G 1_JL 1_1L p. M S E A K A R E A C K R U E R S T T 0 S I S c R 0 S S M A X M M E E D A N F B W ( 1 5 v. ) F C R 8 E S F T ' 2 8 F > 3 / 8 F j 4 5 6 7 8 8 8 8 F F > F 8 / 8 Figure 67) ABSCAT - Factor loadings for First six factors for NaOH experiment. Emphasis on primary factors is due largely to all descriptors other than higher numbered frequency octiles. F i l e : NA0HEXP1.AF2 Factors 13 - 20 L 0 A D I N G L_UL d i 11 Jul R P A C K Z F F F F F 1 2 3 4 5 6 7 8 M E R R U - M M M B C / / / / / / / / S A E E R C A E E W R 8 8 8 8 8 8 8 8 K S T T 0 S I S R X 0 S S A N ( 1 5 '/. ) E S T F F F F F F F F > > > > > > > » Figure 68) ABSCAT • Factor loadings for least significant factors for NaOH experiments. NaOH Dissolution o -4—' o o u_ -4—> o o L_ •4-> U) < "O c o o OJ (/) • < — f - — i — 1 — 1 — 1 — 0 A • A \ • * 1 1 1 1 1 1 A 50 mL Beaker A A A A A A A A . , • Short Waveguide 0 Long Waveguide 1 1 *H h 1 1 1 & ID 0 A A 0 Background 1 1 1 1 r - ^ - H - 7 - 3 - 1 1 3 First Abstract Factor 7 Figure 69) Plot of factor space defined by first two principal components for the NaOH experiments. The background signals, denoted by a circle, can be seen to occupy a seperate region of the factor space from the signals acquired from the dissolving NaOH. 133 secondary factors. Using this criteria, the NaOH data seems to have only three or four primary factors. Other criteria sometimes used also suggest that there are perhaps only four primary factors. The first four factors for the NaOH data account for 80% of the variance in the 19 dimensional feature space as shown in Table 8 by the Cumulative Percent Variance column (CPV). The factor loadings display from ABSCAT shows the loading of the descriptors on the first six factors for the combined NaOH experiments (Figure 67). All descriptors except the higher numbered frequency octiles contribute to the variance of the first six factors. This is also visible by looking at Figure 68 which shows the features which comprise the least significant factors. This picture suggests that these octiles may not be as useful as descriptors. By looking at the space (two-dimensional plane) defined by the first two factors (Figure 69), it can be seen that the background signals are separated from the NaOH signals by factor analysis. Also, the spread of the signals acquired from samples in waveguides is less than that of the signals acquired from the sample in a beaker -further evidence of the attenuation of the signal by the acoustic path from the sample to the transducer. VII. 1.7 Choice of Scaling Technique for Descriptors As discussed above, the use of auto-scaling alone was found to be inadequate for the automated analysis. The frequency octiles alone contained differing variances. (The actual frequency ranges of the octiles will depend on the settings of the digitizing oscilloscope, but an automated method of analysis should account for this automatically). link scaling was used to prevent the descriptors corresponding to the less variant RMS frequency octiles from being given greater importance than they are due. 134 VII.1.8 Visualization of Signal Classes Using Dendrograms The program DENDGRAM has seven different methods for calculating a dendrogram for a set of data. In addition, several different measures of similarity are possible. The Minkowski exponent allows the use of the squared Euclidean, the Manhattan City Block, or any distance measure which can be calculated using the Minkowski formula (given in Chapter 6). This means that a large number of different dendrograms can be generated from the same set of data. The dendrograms in Figures 58 - 61 are all calculated from the same data in Table 2. These make it clear why comparisons of the structure of different data sets by looking at dendrograms calculated by different methods (and that includes the distance metric used), is not recommended. It is advisable to use a method which is familiar to the operator and to do all comparisons using this method. The choice of which to use is not a simple matter. The literature is full of advice58,61'88 but the bottom line is that there is no "best" method. The choice of algorithm for the calculation of the dendrogram can be a matter of personal preference rather than one of formal guidelines. The squared Euclidean distance measure is almost exclusively used in this work (and in this laboratory, and largely in the field of pattern recognition in general). This is not because of any particular benefit of this measure over others but because of the familiar geometry involved. Our use of the dendrogram to show the number of classes present in a set of data has led us to try different methods in a search for the one best method. The outliers of a set of data, which usually correspond to signals due to background or even electrical noise, could be easily identified by a dendrogram regardless of the method used. No one method appeared to be the best. Each had its own merits and pitfalls. Single linkage has the method most commonly used in this laboratory, but perhaps this NA0HEXP1.DEN Display.method - LINEAR 018 BACK Use cursor keys to move, M-to nark sample, L-to l i s t samples, N-next marked. Figure 70) Dendrogram for NaOH experiments. The windowed area includes figure / u ; all the signals from the collection of background showing them to be separated from the other signals. Ul 136 is simply because single linkage is the default which appears first in the list of methods offered by DENDGRAM (Figure 39). The use of a dendrogram to determining the number of possible cluster-types of signals is somewhat subjective. By the use of appropriate display methods, it may be easier for the operator to make judgments. For example, the axis of similarity need not be linear88. DENDGRAM has options which allow for exponential display (Figure 41) and for various powers of the similarity axis (Figure 42). Early in our work, we found the use of the centroid dendrogram methods prone to cross-over. This complicated analysis, and the use of a "relative" similarity axis was developed89. In this way, the occurrence of cross-over does not create problems for interpretation. These different display methods help to "stretch" the dendrogram so that its structure and therefore the relationships among the elements of the data set can be better visualized. None of the display methods actually change the dendrogram but only serve to enhance the information. (This is discussed in reference 89). Figure 70 shows the dendrogram calculated for the NaOH experiments. The background signals are all found together as a single class of signals (denoted by the window on the figure). This shows the ability of the dendrogram to allow for some simple classification of the types of signals. It is apparent however, that the dendrogram is not able to be used to make such conclusions by itself. It is though, a useful means of visualization of the data structure. 1 3 7 VII.2 Chemical Systems VII.2.1 Sodium Hydroxide dissolution Solid NaOH has been used in this laboratory and has been shown to be a good system to study using acoustic emission47. Sodium hydroxide dissolution was used as an acoustic source to test the effects of the pathway from the sample to the transducer. The dissolution of sodium hydroxide occurs spontaneously. The pellets of NaOH dissolved completely in between 3 to 5 minutes. The reaction releases acoustic energy while the pellet is dissolving and ceases once the NaOH is completely dissolved. The source of the acoustic emission may be due to the fracture of the pellet structure and localized boiling. Bubbles are released from the pellet, and this may also be the source of acoustic energy. The bubbles may be due to absorbed air being emitted or localized boiling occurring. VII.2.2 Trimethylolethane (Trimet) Trimet was shown to be very acoustically active, as the chart recorder peak voltage output trace shows (Figure 71). The reported temperature80 of the solid-solid phase transition is 81°C. When the temperature of the sample was raised through that temperature, no acoustic emissions were detected by the apparatus. The oven was then allowed to cool to room temperature. No detectable emissions were released by the sample as the temperature of the oven dropped through the transition temperature. Indeed, no appreciable emissions were observed from the sample until the oven reached a temperature of 40° C. At this point, it was thought that no acoustic activity would be seen, as the oven had effectively cooled to within ten degrees of the temperature of the room. The 15 gram sample then started emitting acoustic signals, and continued to release the acoustic energy for four hours. At first glance this suggests an extremely large energy storage capacity - "move over cold fusion". The proposed use Figure 71) Acoustic activity during cooling of 10 grams Trimet. Intensity plotted versus time. Each horizontal oivision equals two minutes. 03 Power Spectrum for TRIMETCooling 61 : Frequency (kHz) Figure 72) Frequency power spectum for acoustic activity during cooling of Trimet. 140 of this material as an energy storage medium for solar energy panels seems to be full of promise. This is even more apparent when the duration of the acoustic activity is taken into account. The release of energy in the form of acoustic vibrations is believed to accompany the release of other forms of energy - such as heat. The sample appeared to have some success at maintaining the heat of the 81° C transition against an ambient temperature of 40 ° C. The power spectrum of the cooling Trimet is shown in Figure 72 and is quite obviously different from the NaOH spectrum (using the long waveguide in Figure 61). This suggests that a different process (or processes) is occurring in these systems. VII.2.3 Intumescent Fire Retardants (IFR) The intumescent fire retardants proved to be tricky samples on which to perform acoustic emission experiments with the current apparatus. The IFR samples were fine, white powders - physical mixtures of polypropylene, ammonium polyphosphate, and pentaerythritol. The minimum activation temperature of the samples was about 200 °C and temperatures to 400*0 would be necessary to ensure complete carbonification of the plastic substrate. The first experiment was performed in the oven. The oven had a maximum operating temperature only slightly above 2005 C and thus the sample was slowly heated to this temperature. The physical dynamics of the IFR mechanism are such that slow heating doesn't approximate well to the conditions of a fire. To circumvent this in further experiments, a Bunsen burner was used as the heat source with the sample in the short wave guide. Even with this approach, the rate of heating was not consistent between experiments. 1 4 1 The samples of IFR were found to be very active acoustically once a rapid rate of heating could be achieved. It was not clear whether the signals originated from the release of the gas (blowing agent) or from the setting of the polymer. One experiment in which the gases released from the sample actually caught fire produced a large number of signals while the fire burned. This perhaps would be the best technique to monitor this process by AE as it is similar to the actual operating conditions of the IFR and provides a more consistent heating. It is difficult to make judgements about the differences in the samples based on data that is collected under non-standard conditions. VII.2.4 Hydration of Aluminum Chloride The aluminum chloride experiments were performed using the integrator/analyzer acoustic emission apparatus developed in this laboratory47. There is no frequency data in this approach and as such no pattern recognition can be performed. One benefit of the use of this apparatus is the ability to regenerate the AE intensity plot normally only provided as a chart recorder tracing. The 200 ms time constant provided allows the output signal to drive a chart recorder and a computer analog to digital converter. The samples of A1C13 all produced the same shape of acoustic activity plot. Figure 7 3 shows this plot for one of the samples. By comparison of the initial and final baselines, some air hydration of the A I C I 3 was taking place before the addition of water. The initial addition of water produced a very large spike - due to the instantaneous emission of acoustic energy as the first cloud of HCl was released. This reaction produced very intense acoustic emissions - pointing to the vigorous nature of the reaction. 1 4 2 Time (seconds) ure 73) Acoustic activity from 5 mL of H2O added to 0.5 grams AlCi3. 1 4 3 The AE activity returned to the baseline at the completion of the reaction. AE therefore, follows the progress of the reaction as emissions are only detected during the reaction. The source of the acoustic activity may be attributed to HCl evolution, crystal fracture, and possibly localized boiling. As this is a heterogeneous reaction, it is a difficult one to monitor in situ and characterize by other means. The AE plot (Figure 73) appears to show a two-phase reaction, but the initial phase may be simply a mixing anomaly. A better method of monitoring this system may be to dissolve the A I C I 3 in dry methanol and merge the MeOH with a H 2 O stream. Figure 74 shows the pH of the final solutions as a function of weight of A I C I 3 . One notices that there is a correlation. Because of the logarithmic nature of the pH scale, accurate interpolations are somewhat difficult. In the hopes that AE may provide a more consistent correlation with mass, the total integration of the peak level output of the conditioning amplifier is plotted against mass (Figure 75). The lack of simple correlation is obvious. Furthermore, one would expect an increase in detected AE with an increase in sample weight. It is known that the point at 1.5 grams AICI3 in Figure 75 may not be valid as the sample vessel was not centered on the active element of the transducer. While the removal of this data point still doesn't leave a simple correlation, its presence raises the issue of acoustic coupling between the sample and the transducer. How much a slight variation in coupling affects the transmission of the AE is not well understood, but is well known that there is an effect8'41,96. The further question is raised of the transmission properties of the different 50 mL vessels used for each sample. Obviously, a consistent approach is demanded. 144 pH vs. weight of Alumininum Chloride mixed with 5.4 mL of water 04 H 1 1 1 0.000 0.600 1.000 1.500 2.000 A1C13 (grams) Figure 74) AICI3 experiments. p H of final solutions versus init ial mass o f A l C l 3 . 1.0E6 -3 O 4.0E5 2.0E6 0.000 Total AE from A1C13 mixed with 5.4 mL of water 0.500 1.000 1.500 A1C13 (grams) 2.000 Figure 75) Total acoustic energy recorded versus initial mass of AICI3. 146 500 400 4-(D I a I 300 T o 200 -f crj O 100-f 0.000 Time for react ion vs. weight of A1C13 mixed wi th 5.4 mL of water + 0.500 + + 1.000 1.500 2.000 A1C13 (grams) Figure 76) Reaction time (for 90% of total A E emissions) versus mass o f A l C B . 1 4 7 These experiments suggested reasons why the peak output of the amplifier is inadequate for good quantization. This time constant inhibits adequate quantization as, for example, a signal of duration 10 ms may have the same peak level as a signal of duration 100 ms. Obviously, the latter signal will have the greater energy content, but this isn't discernable from the peak output. By examining the duration of the reaction, a relationship with mass can be surmised. However, since the water was added over a period of time, any dependance of reaction time on mass may be obscured (Figure 76) by the mixing time. Also, the existence of air hydration has been noted from the initial baseline. An added possibility is that the increase in mass may have led to increased heating. A faster reaction would have produced signals more rapidly and therefore, a greater percentage of them would have been "missed" during the "dead time" of the oscilloscope. This would also explain the lack of correlation between total AE and sample mass (Figure 75). VT1.2.5 Liquid Crystals The sample of a-w-bis(4-n-decylaniline-benzilidene-4'-oxyhexane) was found to be acoustically active. However, on completion of the experiment, it was found that discernible discoloration of the sample had occurred. It was then not apparent whether the acoustic activity had been due to the phase transformations or decomposition. The highest experimental temperatures, 180-200 °C in an enclosed oven, were such that direct observation of the sample temperature wasn't possible. Thus it isn't known exactly the temperature of the sample at the time(s) of acoustic activity. The lower-temperature phase transitions of the sample of 4-n-pentyloxybenzylidend-4'-n-heptylaniline were such that a temperature probe could be used during the experiment performed by Dr. P. Y. T. Chow. The plot of the RMS CO r— I o > DH 0.005 0.004 2^ 0.003 co 0 . 0 0 2 0.001 0.000 0 o o A Initial reaction O Continuous emission O Electrical noise <$> o o H 1 1 h H J H H 1 1 H—J-—f 1 1 1 H 1-10 20 30 Time (minutes) 40 50 Figure 77) Plot of R M S Voltage versus time for the l iquid crystal, 4-n-pentyloxybenzylidene-4 '-n-heptylaniline. Signals due to electrical noise are denoted by diamonds and can be seen to be seperate f rom sample activity. OO 0.60 -0.60 -\ . 1 , 1 0.0 1.0E-4 2.0E-4 3.0E-4 4.0E-4 T i m e ( s e c o n d s ) Figure 78) Acoustic signal (short duration burst) during initial acoustic activity of liquid crystal, 4-n-pentyloxybenzylidene-4'-n-heptyIaniline. VO 0 . 6 0 OT 0 . 4 0 -< ! - 0 . 4 0 -- 0 . 6 0 -J , , , 1 0 . 0 1 . 0 E - 4 2 . 0 E - 4 3 . 0 E - 4 4 . 0 E - 4 T i m e ( s e c o n d s ) Figure 79) Continuous acoustic activity during "active phase" of 4-n-pentyioxybenzyIidene-4'-n-heptyIaniIine. 151 0.030-I ! Liquid Crystal 0.000 H 1 1 »- h — 0 500 1000 Frequency (kHz) Figure 80) Average power spectrum of cooling l iquid crystal 4-n-pentyioxyDenzylidene-4'-n-heptylaniline. 152 of acoustic output versus time is given in Figure 77. There is an initial area of acoustic activity. This initial period is characterized by signals of high intensity compared to the remainder of the experiment. This would suggest that there are two different processes occurring. The activity of the sample after 8 rriinutes (as the sample was cooling through 67 °C) would appear to be due to continuous emission as the signals all have similar RMS values. This is indicative of a sample which is continuously emitting. The initial activity is a series of individual bursts (Figure 78) as compared to the continuous signals captured after 8 minutes Figure 79. Quite noticeable are the signals which have an RMS value at a level above the normal RMS of the continuous emissions (Figure 77). These are due to some electrical noise carried on the line and can be distinguished by the AREA vs RMS plot as seen in Figure 31. Again, the average power spectrum in Figure 80 is different to that of NaOH, and to that of TRIMET. Further work which may prove useful on this sample of liquid crystal (indeed, on any sample) would be to hold the temperature nearly constant to allow the sample to only cycle through one specific phase transition. This would provide signals specifically from each transformation and may lead to the ability to distinguish amongst them by AE. Improved temperature control is needed to achieve this. 153 Vffl. Further Work The repeatability of time domain behavior of emission needs to be established. This can be done by ensuring consistent attachment of the transducer to the sample as well as using the same acoustic path (sample to transducer) for all replicate experiments and for those that the data is to be compared between systems. More accurate quantitation of acoustic energy released by a sample is needed for determinations of total acoustic output. This could be accomplished through the use of a true RMS meter connected to the transducer (in series with the conditioning amplifier). The use of the DC (peak) output of the Bruel and Kjaer amplifier is not ideal for quantitative measurements of the total energy of the acoustic emissions from a sample. The need for a transducer with linear response has been discussed. This will allow comparison of data taken with different transducers without having to correct for the variation in the frequency response. Also, one would wish to have a wider response range so that higher frequency signal components may be detected, recorded, and analyzed. A means to lower the acoustic detection limit must be pursued - especially when one considers how too high a trigger level can bias results10"*. A video camera may be used to capture the images of the acoustically active processes. In this way, the acoustic signals acquired as data may be related to an observable process that is occurring. It will also be of benefit to hyphenate acoustic emission with techniques such as infrared spectroscopy and nuclear magnetic resonance to correlate acoustic emission with changes in a system which are not directly observable by eye. 1,5 mm (depth and width) g r o o v e p a t t e r n s on s u r f a c e o f two c o n s t r u c t e d bas es . Ends o f g r o o v e channels a r e c o n n e c t e d t o f e e d t u b e s t h r o u g h ho les In t h e b a s e by V 4 inch c h r o m a t o g r a p h y f i t t i n g s , s . o ° — ^ 3 mm Sta in less S t e e l C o v e r Figure 81) Acoustic flow cell designs. Two stainless steel disks - one with a flow channel inscribed in its surface • are fastened together. <S2 0 ° o ^ o > 1 mm Tef lon S h e e t 16 mm Sta in less S t e e l Base - with Inscr ibed g r o o v e s ui 155 The other principal research interest of Dr. Adrian Wade is the use of flow injection analysis ( F I A ) 5 0 , 1 0 4 for the characterization of chemical systems. It is thought that a hyphenated A E - F I A technique might prove useful (as in the AICI3 work). Towards this end, two flow cells have been designed and constructed in this department from aluminum (Figure 81). Initial experiments with 1.0 M H C l and 3 M Na2CC«3 were unable to produce detectable acoustic activity. Further flow cells (Figure 82) have since been designed and constructed from glass (also in this department) and should be able to provide better sensitivity. 156 IX. Conclusions An apparatus and nescessary data acquisition and analysis sortware has been designed to study chemical acoustic emission. It has been shown that acoustic emission data may readily be recorded from a number of chemical systems. A range of descriptors have been developed and code produced to generate and display these. The use of pattern recognition techniques for the analysis of acoustic emission data appears to be almost mandatory given the large amount and high dimensionality of data produced by even simple AE experiments. Link scaling is proposed and has been shown to be of utility. The use of a relative similarity axis for display of dendrograms is also proposed and appears to be useful in some circumstances89 Chemical acoustic emission proves to be a promising means of detecting chemical reaction. Differences in power spectra suggest it has significant analysical potential. Industry has already welcomed the idea of a non-invasive "microphone" on the outside of a reactor vessel. More people should open their ears and put more emphasis on the sounds of chemical reactions as well as the sights. 157 X. Bibliography 1. "Inaudible Screaming of Thirsty Tomatoes is Chemist's New Tool", (on work by S. Bittman and A. P. Wade), The Globe and Mail, Friday, March 17, 1989, p. A l l . 2. Codex Germanicus, ca. 1350 A.D., quoted in J. Read, Prelude to  Chemistry, p. 75, G. Bell: London, (1939). 3. F. Forster, E. Scheil, Zeitshrift fur Metalkunde, 24, (1936), 245. 4. J. Kaiser, Arkiv fur das Eisenhuttenwessen, 24, (1953), 43. 5. A. T. Green, C. S. Lockman, R. K. Steele, "Acoustic Verification of Structural Integrity of Polaris Chambers", Modern Plastics, 41, (1963), 137. 6. T. Holroyd, "Acoustic Emission from an Industrial Applications Viewpoint", Journal of Acoustic Emission, 7, (1988), 193. 7. J. A. Simmons, H. N. G. Wadley, "Theory of Acoustic Emission from Phase Transformations", Journal of Research of the National Bureau of Standards, 89, (1984), 55. 8. D. G. Eitzen, H. N. G. Wadley, "Acoustic Emission: Establishing the Fundamentals", Journal of Research of the National Bureau of Standards, 89, (1984), 75. 9. E. F. Carome, P. E. Parks, S.J. Meaz, "Propagation of Acoustic Transients in Water", Journal of the Acoustical Society of America, 36, (1964), 946. 10. C. Allan Boyles. Acoustic Waveguides. J. Wiley and Sons: New York, 1984. 11. I. G. Scott, C. M. Scala, "A Review of Non-destructive Testing of Composite Materials", NDT International, April 1982, 75. 12. D. Mool, R. Stephenson, "Ultrasonic Inspection of a Boron / Epoxy - Aluminum Composite Panel", Material Evaluation, 29, (1971), 159. 13. F. A. Firestone, J. R. Frederick, "Refinements in Supersonic Reflectoscopy. Polarized Sound", Journal of the Acoustical Society of America, 18, (1946), 200. 14. Journal of Acoustic Emission, Acoustic Emission Group, Los Angeles, California, U.S.A., 1982. 15. A. Nozue, T. Kishi, "An Acoustic Emission Study of the Intergranular Cracking of AISI4340 Steel", Journal of Acoustic Emission,!, (1982), 1. 16. J. W. McElroy, "Development of Acoustic Emission Testing for the Inspection of Gas Distribution Pipelines", Monitoring Structural Integrity by Acoustic  Emission. ASTM STP 571, American society for Testing and Materials, 1975, pp. 59-79. 158 17. S. V. Hoa, L. Li, "Acoustic Emission During Quasi-Static Loading / Hold / Unloading in Notched Reinforced Fiber Composite Materials", Journal of Acoustic Emission, 7, (1988), 145. 18. G. Sauerbrey, "The Use of Quartz Oscillators for Weighing Thin Layers and for Microweighing", Zeitschrift fur Physik, 155, (1959), 206. 19. J. F. Alder, J. J. McCallum, "Piezoelectric Crystals for Mass and Chemical Measurements", Analyst, 108, (1983), 1169. 20. M. Thompson, G. K. Dhaliwal, C. L. Arthur, G. S. Calabrese, "The Potential of the Bulk Acoustic Wave Device as a Liquid-Phase Immunosensor", IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, UFFC-34. (1987), 127. 21. F. G. Smith, P. A. Peach, "Apparatus for the Recording of Decrepitation in Minerals", Economics in Geology, 44, (1949), 449. 22. K. Lonvik, "Thermosonimetry", Thermochimica Acta. 110. (1987), 253. 23. G. M. Clark, M. Tonks, M. Tweed, "Thermal Properties of Potassium Dichromate", Journal of Thermal Analysis. 12. (1977), 23. 24. K. Lonvik, "An Experimental Investigation of the Thermal Decomposition of Brucite by Thermosonimetry", Thermochimica Acta. 27. (1978), 27. 25. E. Ranke Madsen, Ph. D. Thesis, University of Copenhagen, 1957. 26. L. M. Belyaev, V. V. Nabatov, Yu. N. Martyshev, "Luminescence Time in the Processes of Tribo- and Crystalloluminescence", Soviet Physics - Crystallography, 7, (1963), 464. 27. J. A. C. van Ooijen, E. van Tooren, J. Reedijk, "Acoustic Emission during the Preparation of Dichloro(pyrazine)zinc(II)", Journal of the American Chemical Society, 100, (1978), 5569. 28. D. Betteridge, M. T. Joslin, and T. Lilley, "Acoustic Emissions from Chemical Reactions", Analytical Chemistry, 53, (1981), 1064. 29. M. T. Joslin, "Analytical Implications of Acoustic Emissions from Chemical Reactions", Ph. D. Thesis, University of Wales, Swansea, Wales, June 1987. 30 R. M. Belchamber, D. Betteridge, P. Y. T. Chow, T. J. Sly, A. P. Wade, "Applications of Computers in Chemometrics and Analytical Chemistry", Analytica Chimica Acta, 150, (1983), 1292-1299. 31. T. Sawada, Y. Gohshi, "Acoustic Emissions Arising from the Gelation of Sodium Carbonate and Calcium Chloride", Analytical Chemistry, 57, (1985), 366. 32. T. Sawada, Y. Gohshi, C. Abe, K. Furaya, "Acoustic Emission from Phase Transition of Some Chemicals", Analytical Chemistry, 57, (1985), 1743. 159 33. R. M. Belchamber, D. Betteridge, M. P. Collins, T . lilley, C. Z. Marczewski, A. P. Wade, "Quantitative Study of Acoustic Emission from a Model Chemical Process", Analytical Chemistry, 58, (1986), 1873. 34. O. Lee, Y. Koga, A. P. Wade, "Acoustic Emission Study of the Phase II/III Transformation of Hexachloroethane", Talanta, i n press. 35. M. R. Detaevernier, Y. Michote, L. Buydens, M. P. Derde, M. Desmet, L. Kaufman, G. Musch, J. Smeyers-Verbeke, A. Thielemans, L Dryon, D. L. Massart, "Feasibility Study Concerning the Use of Expert Systems for the Development of Procedures i n Pharmaceutical Analysis", Journal of Pharmaceutical & Biomedical Analysis, 4, 297, (1986). 36. J. G. Delly, "Sights and Sounds at 250X", The Microscope, 34, (1986), 63. 37. P. D. Wentzell, S. J. Vanslyke, A. P. Wade, "Programming Direct Memory Access Data Acquisition", Trends in Analytical Chemistry, 9, (1990), 3. 38. A. P. Wade, S. J. Vanslyke, P. D. Wentzell, "A Simple Acoustic Emission Monitoring System", Manuscript in preparation. 39. R. M. Belchamber, M. P. Collins, "Mill material physical property determining method - detects noise and converts into electrical signal that is processed by digital bandpass filter and analyzed.", British Petroleum Research Centre, U.S. Patent Application, WPI# 89-0953211/13, (1989). 40. R. M. Belchamber, M. P. Collins, "Machine acousic emission monitoring method - using pattern recognition procedure to develope soft models used as templates to detect fluctuations.", British Petroleum Research Centre, U.S. Patent Application, WPI# 89-152794/21, (1989). 41. P. D. Wentzell, A. P. Wade, "Chemical Acoustic Emission Analysis in the Frequency Domain", Analytical Chemistry. 61. (1989), 2638. 42. D. Betteridge, J. V. Cridland, T. Lilley, N. R. Shoko, M. E. A. Cudby, D. G. M. Wood, Polymer, 23, (1982), 178. 43. D. Betteridge, J. V. Cridland, T. Lilley, N. R. Shoko, M. E. A. Cudby, D. G. M. Wood, Polymer, 23, (1982), 249. 44. D. Betteridge, P. A. Connors, T. Lilley, N. R. Shoko, M. E. A. Cudby, D. G. M. Wood, "Analysis of Acoustic Emissions from Polymers", Polymer, 24, (1983), 1206. 45. T. Lilley, "A Study of Acoustic Emission from Polymers", Ph. D. Thesis, University of Wales, Swansea, Wales, December, 1980. 46. P. Chow, "A Pattern Recognition Study of Acoustic Emissions from Polymers under Stress", M. Sc. thesis, University of Wales, Swansea, Wales, March 1983. 47. S. J. Vanslyke, Chemistry Undergraduate Thesis, University of British Columbia, Canada, April 1989. 160 48. L. Seungho, M. N. Myers, R. Beckett, J. C. Giddings, "Particle Seperation and Characterization by Sedimentation / Cyclical-Field Field-Row Fractionation", Analytical Chemistry, 60, (1988), 1129. 49. A. P. Wade, P. T. Palmer, K. J. Hart, C. G. Enke, "Developement of Algorithms for Automated Elucidation of Spectral Feature / Substructure relationships in Tandem Mass Spectroscopy", Analytica Chimica Acta. 215. (1988), 169. 50. P. M. Shiundu, P. D. Wentzell, A. P. Wade, "Spectrophotometric Determination of Palladium with Sulphochlorophendazorhodanine by Flow Injection", Talanta, 37, (1990), 329. 51. D. G. Gelderloos, K. L. Rowlen, J. W. Birks, J. P Avery, C. G. Enke, "Whole Column Detection Chromatography: Computer Simulations", Analytical Chemistry, 58, (1986), 900. 52. R. J. Woodham (U.B.C. Computer Science Department), B.C. Advanced Systems Institute Workshop: "Advanced Systems for Material Sensing and Process Control in the Forest Industry", PAPRICAN Vancouver Laboratories, Vancouver, November 14 ,1988. 53. R. E. Dessy, "Chemists in the Microelectronic Toolbox", 72nc* CSC meeting, Victoria, Canada, 1989. 54. C. E. Shannon, "A Mathematical Theory of Communication", Bell Systems Technological Journal, 27, (1948), 379-423 and 623-656. 55. C. C. Sweeley, J. F. Holland, D. S. Towson, B. A. Chamberlin, "Interactive and Multisensory Analysis of Complex Mistures by an Automated Gas Chromatography System", Journal of Chromatography, 399, (1987), 173. 56. R. J. Williams, Biochemical Institute Studies IV : Individual Metabolic Patterns  and Human Disease. "An Exploratory Study Utilizing Predominately Paper Chromatographic Methods", University of Texas Publication no. 5109, University of Texas: Austin, TX, (1951), ch. 1. 57. H. Chernoff, "The Use of Faces to Represent Points in k-Dimensional Space Graphically", Journal of the American Statistical Association, 68, (1973), 361. 58. K. Burton, "Cluster Analysis", European Spring School of Chemometrics, Elsevier, Amsterdam, 1988. 59. D. L. Massart, A. Dijkstra, L. Kaufman, Eds., Evaluation and Optimization of  Laboratory Methods and Analytical Procedures. Elsevier: Amsterdam, 1978. 60. K. Burton, G. Nickless, "Optimisation via Simplex: Part I. Background, Definitions and a Simple Application", Chemometrics and Intelligent Laboratory Systems,!, (1987), 135. 61. B. R. Kowalski, "Chemometrics", Analytical Chemistry, 52, (1980), 112R. 62. R. G. Brereton, "Chemometrics in Analytical Chemistry: A Review", Analyst, 112. (1987), 1635. 161 63. S. D. Brown, T. Q. Barker, R. J. Larivee, S. L. Monfre, H. R. Wilk, "Chemometrics", Analytical Chemistry, 60, (1988), 252R. 64. B. R. Kowalski, Ed., Chemometrics: Theory and Application. ACS Symposium Series 52, American Chemical Society: Washington, D. C , 1977. 65. D. L. Massart, L. Kaufman, The Interpretation of Analytical Chemical Data bv  the Use of Cluster Analysis. New York: Wiley, 1983. 66. M. A. Sharaf, D. L. Illman, B. R. Kowalski, Chemometrics. New York: Wiley, 1986. 67. S. N. Deming, S. L. Morgan, Experimental Design: A Chemometric Approach. Elsevier: Amsterdam, 1987. 68. R. M. Belchamber, D. Betteridge, Y. T. Chow, T. Lilley, M. E. A. Cudby, D. G. M. Wood, "Evaluation of Pattern Recognition Analysis of Acoustic Emission from Stressed Polymers and Composites", Journal of Acoustic Emission, 4, (1985), 71. 69. R. W. Y. Chan, D. R. Hay, V. Caron, M. Hone, R. D. Sharp, "Classification of Acoustic Emission Signals Generated During Welding", Journal of Acoustic Emission, 4, (1985), 115. 70. A. Maslouhi, C. Roy, "Analysis of AE Signals in Time and Frequency Domains Coupled to Pattern Recognition to Identify Fracture Mechanisms in CFRP", Journal of Acoustic Emission, 8, (1989), S292. 71. R. M. Belchamber, D. Betteridge, M. P. Collins, T. Lilley, C. Z. Marczewski, A. G. Hawkes, "Time Series Analysis of Acoustic Emission Signals from Glass Reinforced Plastics", Acoustic Emission Monitoring and Analysis in  Manufacturing - PED-vol. 14. D. A. Dornfeld, Editor, The American Society of Mechanical Engineers: New York, USA, (1984),. 72. M. A. Majeed, C. R. L. Murthy, "An Efficient Unsupervised Pattern Recognition Procedure for Acoustic Emission Signal Analysis", Journal of Acoustic Emission, 8, (1989), S16. 73. R. O. Newman, Personal Communication, Bruel and Kjaer Canada, Richmond, British Columbia, Canada. 74 G. M. Clark, "Instrumentation for Thermosonimetry", Thermochimica Acta, 27, (1978), 19. 75. Bruel and Kjaer amplifier model 2638, technical manual. 76. O. Lee, P. D. Wentzell, D. A. Boyd, A. P. Wade, "Programming Control and Data Acquisition Routines for the IEEE-488 Instrumentation Interface", Trends in Analytical Chemistry, in press. 77. Tektronix 2230 Digital Storage Oscilloscope Operators, Tektronix, Beaverton, Oregon, U.S.A., 1987. 162 78. Tektronix 2430A Digital Oscilloscope Operators, Tektronix, Beaverton, Oregon, U.S.A., 1988. 79. GPIB-PC User Manual for the IBM Personal Computer and Compatibles, National Instruments Corporation, Austin, Texas, U.S.A., April 1988 Edition. 80. A Complete Guide to TRIMET Brand of Trimethylolethane. Pittman-Moore, Inc: Terra Haute, Illinois, 1988. 81-86. G. Camino, L. Costa, L. Trossarelli, "Study of the Mechanism of Intumescence in Fire Retardant Polymers", Polymer Degradation and Stability Part I: 6, (1984), 243. Part II: 7, (1984), 25. Part in: 7, (1984), 221. Part IV: 8, (1984), 13. PartV: 12, (1985), 203. Part VI: 12, 1985 (213). 87. D.B. Sibbald, P. D. Wentzell, O.Lee, I. H. Brock, K. A. Soulsbury and A. P. Wade, "An Integrated Data Acquisition and Analysis Environment for Chemical Acoustic Emission", manuscript in preparation. 88. P. D. Wentzell, D. B. Sibbald, D. A, Boyd, A. P. Wade, "Chemometric Methods for Acoustic Emission Analysis", 15 FACSS conference, Boston, 1988. 89. D. B. Sibbald, P. D. Wentzell, A. P. Wade, "Display Methods for Dendrograms", Trends in Analytical Chemistry, 8, (1989), 289. 90. D. B. Sibbald, P. D. Wentzell, D. A. Boyd, O. Lee, A. P. Wade, "Is Acoustic Emission Just Going through a Phase?", 15tn FACCS conference, Boston, 1988. 91. R. M. Belchamber, D. Betteridge, Y. T. Chow, T. Lilley, M. E. A. Cudby, D. G. M. Wood, "Looking for Patterns in Acoustic Emissions", First International Symposium on Acoustic Emission from Reinforced Composites, The Society of the Plastics Industry, Inc, July 19-21, 1983,1. 92. A. P. Wade, "Acoustic Emission: Is Industry Listening?", Chemometrics and Intelligent Laboratory Systems, in press. 93. Technical information on broadband sensors types FC500 and FAC500, Acoustic Emission Technology Corporation, Sacramento, California. 94. J. R. Mitchell, "Fundamentals of Acoustic Emission and Applications as an NDT Tool for FRP", 34th Annual Technical Conference, Reinforced Plastics / Composites Institute, The Society of the Plastics Industry, Inc., (1979), Section 3-F, p. 1. 95. T. M. Proctor Jr., "More Recent Improvements on the NBS Conical Transducer", Journal of Acoustic Emission, 5, (1986), 134. 96. Ono, Y. Higo, Progress in Acoustic Emission, 2, (1984), 343. 97. Y. Higo, H. Inaba, "The General Problems of AE Sensors", Journal of Acoustic Emission, 8, (1989), S24 163 98. H. Hatano, E. Mori, Journal of the Acoustical Society of America, 59, (1976), 344. 99. S. Kallara, P. K. Rajan, J. R. Houghton, "Acoustic Emission Transducer Modelling using System Identification Techniques", Journal of Acoustic Emission, 8,1989, S28. 100. L. Brekhovskikh, Yu. Lysanov, Fundamentals of Ocean Acoustics. Springer-Verlag: Berlin, 1982, pp. 9-11. 101. T. J. Mason, J. P. Lorimer, "Sonochemistry : Theory, Applications and Uses of Ultrasound in Chemistry", Wiley: New York, 1988. 102 P. D. Wentzell, A. P. Wade, "A Comparison of Pattern Recognition Descriptors", submitted to Journal of Chemometrics, May 1990. 103. A. P. Wade, K. A. Soulsbury, P. Y. T. Chow, I. H. Brock, "Characterization of Chemical Acoustic Emission Near the Conventional Detection Limit", manuscript in preparation for Analytica Chimica Acta. 104. J. Ruzicka, E. H. Hansen, Flow Injection Analysis. 2nd. Ed., Wiley, New York, 1988. XI. 1 Data File Formats The formats of the data fdes used have evolved over the course of this work -and indeed are perhaps due for a further change to address the need for greater than 8-bit resolution and longer record lengths. Authors involved in the design of the formats include everybody remotely involved with the acoustic emission experiments and data analysis. The formats have been designed for efficiency and for ease of utility. This work uses the IBM-PC class of computers which run under MicroSoft DOS 3.30 (MicroSoft, Redmond, Washington). Each data file has an eight character name with a specific three character extension corresponding to its type. A file is normally referred to by its type - for example - a .EXT file. The files used in this work include: i) .AEA This file is created by the QAQ program and includes the experimental data - signals, times, operating parameters, user comments, etc.. ii) .DS1 This descriptor file contains the descriptors for each of the signals contained in the .AEA file of the same name and is calculated by the AEMUNCH program. (For example, the file DATA.DS1 would be the descriptor file for the experiment file DATA.AEA.) Other versions of the descriptor file are the .DES file, and a .DS2 file. iii) .DEN This is an ASCII file which contains the linking structure of the dendrogram. It is calculated from the descriptor file by the DENDGRAM program. iv) .AF2 This ASCII file contains the factor loadings and scores of a principal components analysis of a descriptor file. This file is calculated by the ABSCAT program - which also outputs a results file, .RES, that contains information on the significance of the factors. 165 XI.1.1 .AEA - Acoustic Emission Experiment Data File The AEA files are random access files made up of two blocks: a single HEADER block and many SIGNAL blocks. The purpose of the header block is to record general information about the experiment which is common to every signal in the signal block. The header block occupies a minimum of 1040 bytes and is logically organized into 80 byte records, plus a 4 byte record for every extended field present. The signal block contains the digitized signal obtained from the digitization device. This signal is stored as 1024 unsigned integers, thus occupying 1024 bytes. A 16 byte preamble stores important information particular to that signal - id, class, time of acquisition and room for an extra four-byte record. Appended to each signal may be extended records containing experimental data such as temperature, pressure, force, etc.. F i e l d F i e l d D e s c r i p t i o n B y t e s Note s 1 E x p e r i m e n t T i t l e 40 UNUSED 2 Date 10 "MM-DD-YYYY" UNUSED 2 S a m p l i n g Mode 6 "M0DE=n" see NOTE (1) UNUSED 2 Scope ID 18 "T24 3 0Axxxxxxxxxxxx" 2 Comments (1) 80 u s e r comments 3 Comments (2) 80 4 T r a n s d u c e r S e r i a l # 20 "S/N—xxxxxxxxxxxxxxxx" UNUSED 60 5 Time P e r 100 P o i n t s 20 " T / D - x x x x x x x x x x x x x x x x " V o l t s p e r D i v i s i o n 20 " V / D - x x x x x x x x x x x x x x x x " G a i n 20 "GAIN-xxxxxxxxxxxxxxx" F i l t e r S e t t i n g 20 " F I L T E R - x x x x x x x x x x x x x " 6 T r i g g e r L e v e l 20 "TLEV=xxxxxxxxxxxxxxx" P r e - T r i g g e r Count 20 "PRET=xxxxxxxxxxxxxxx" S a m p l i n g D e l a y Time 20 "DLY=xxxxxxxxxxxxxxxx" E x t r a F i e l d 20 166 7 - 12 UNUSED 80 each for total 480 bytes 13 UNUSED 70 # of Extended Fields 4 (NEXD$) see NOTE (2) spaces 2 Extra Field Name 3 Extra Field Status byte 1 (EFSB) 14+* A l l remaining records are composed of 4 byte "EXTENDED" fields according to VAL(NEXD$). Each is composed of a 3 byte name followed by the status byte for that extended record field (EXSB). Signal Blocks Field Field Description Bytes Field Type / Notes 1 Signal ID 2 Signal Class 3 Time of Signal (sec) 4 Extra Field Record 4 STRING 4 STRING 4 SINGLE PRECISION REAL 4 5 Digitized Signal 1024 6+* Extended Field(s) 4 see NOTE (3) + There may be consecutive multiple occurrences of these fields depending on the number needed in the file. These fields may not be present depending on the status of the extra record and the number of extended records. Note (1) Sampling Modes: 1 = Level Triggered; 2= Continuous Trigger; 3=External #1; 4 = External #2 Note (2) This string (NEXD$) represents the number of EXTENDED FIELDS. These additional fields are to store additional experimental data particular to an experiment. Note (3) Beginning of the EXTENDED fields. The number of extended fields is equal to VAL(NEXD$). NEXD$ is found in Record #13 of the Header Block. Each extended field is 4 bytes of the type designated by the status byte in the Header Block. EFSB: 0 0 0 0 0 0 0 0 0 0 1 0 0 X X Y Y = 0 if the field is not in use. = 1 if the field is in use. XX field contains data of type Placement in field CHR$(EFSB) 00 STRING "ABCD" ! 01 SINGLE PRECISION REAL "xxxx" # 10 INTEGER "OOxx" % 11 LONG INTEGER "xxxx" 167 XI. 1.2 .DS1 - Descriptor Files The descriptor files are produced by the AEMUNCH program. They contain the descriptors calculated from each signal in the .AEA file and are stored in random access format in a .DS1 fde. The identification, class, time and any extended or axtra records for each signal are stored. There are currently three formats for the descriptor files. The original .DES file, which is a simple ASCII file output by the original descriptor file generating programs, PATCHAR and AECRUNCH, the binary .DS1 file - which contains all the signal information from the .AEA file - and the .DS2 file - an ASCII version of the .DS1 file. The binary descriptor file (.DS1) consists of 4-byte records and has the following format: Header Block Field Field Description Bytes Field Type / Notes 1 Number of samples 2 NROWS 2 # of variables 2 NCOLS 3 # of extended records 2 NEXD$ 4 Status byte for extra recordl EFSB (0 i f not used.l i f used) 5* Name of extra record field 4 EN$, 3 chars + a space 6+* Name of extended record 3 EXN$ UNUSED 4 Blanks for future use Extended field status byte 1 EXSB This 8 byte record is present once for each extended record (NEXD) "EXN...x" 7+ Descriptor names 8 VARNM$. NCOLS of these for each descriptor + There may be consecutive multiple occurrences of these fields depending on the number needed in the file. These fields may not be present depending on the status of the extra record and the number of extended records. 168 Signal Block - repeated for each signal (NROWS) Field Field Description Bytes Field Type / Notes 1 Signal identification 4 ID$ - Character string 2 Signal classification 4 CLSS$ - Character string 3 Time of signal acquisition 4 TIME - Single precision "OOxx" 4 Extra record 4 EXTRA$ 5+* Extended record 4 6+ VAR! 4 NCOLS descriptors single precision real numbers + There may be consecutive multiple occurrences of these fields depending on the number needed in the file. * These fields may not be present depending on the status of the extra record and the number of extended records. For the random access of the .DS1 file, the following equations have been calculated. For brevity, the parameters have been shortened for placement in equations. e = NEXT) - number of extended records y = EFSB n = NROWS - number of signals m = NCOLS - number of descriptors The capitalized parameters refer to the specific element being referred to. A = signal number (1 to n) D = descriptor number (1 to m) T = extended record number (1 to e) Records are four bytes long, note the difference between records and fields. Some fields take more than one four byte record. Record Description 1 Contains the number of signals (n) and number of descriptors per signal (m). Packed as 2-byte integers into the 4-byte records. 2 2-byte integer # of extended records (e), 1-byte (AO) analysis options, Status byte for extra record (x) 3 If (x) < > 32 then a record is present containing three characters for the name of the extra record (EN$) and a zero-byte. 169 Record Description 4 & 5 These two records appear once for each extended record (e). Three character bytes, 4 blank bytes and a status byte (EXSB). If (e) = 0 then these records are not present. 2e+y+2& Descriptor name for first descriptor. Maximum of 8 characters 2e+y+3 packed into two 4-byte records.f y = 1 if x <> 32,y=0 otherwise] 2e+y+4 & 8 character descriptor name for second descriptor. 2e+y+5 2m + 2e+y+2 Identifier for signal 1 (4 characters). 2m + 2e+y+3 Class for signal 1 (4 characters). 2m + 2e + y+4 Time for signal 1 (single precision) 2m + 2e+y+5 Extra record for signal 1 (4 characters) 2m + 2e+y+6 First extended record for signal 1 (Only if (e) < > 0) 2m + 3e + y+6 First descriptor value for signal 1 (single precision) 2m + 3e + y+7 Second descriptor for signal 1 (single precision) 3m + 3e + y+6 Identifier for signal 2 (4 characters) (1 +A)e + y+ (A+ l)m + 4(A-l) + 2 Identifier for signal A (l + A)e + y+(A+ l)m + 4(A-l) + 3 Class for signal A (l + A)e + y+(A+ l)m + 4(A-l) + 4 Time for signal A (l + A)e + y+(A+ l)m + 4(A-l) + 5 Extra record for signal A (1 +A)e+y+(A + l)m + 4(A-l) + 5 + T Extended record T for signal A (2+A)e+y+(A+ l)m + 4(A-l) + 5 + D Descriptor D for signal A 2e + y + (A+ l)m + 4*(A-l) + (A-l)n+2 Identifier for signal A There are three possible ASCII file formats. The .DES and .SCL types are consistent with the original descriptor file format, and have the form, (no. of signals - NROWS), (no. of descriptors (name of descriptor 1) (name of descriptor 2) (name of descriptor 3) - NCOLS) (name of descriptor NVARS) (signal 1 ID), (signal 1 class), (signal 2 ID), (signal 2 class), (descriptor 1), (descriptor 1), (descriptor 2), (descriptor 2), (signal NCOLS ID), (descriptor 1), (descriptor 2), 170 All of the string variables (descriptor names, ID's, and class) are in double quotes. Note that this format does not facilitate the use of time and extra records. There is also no additional record at the beginning. If this type of file is converted to the .DS1 format, the second record in the .DS1 file is written as all zeros, the time records are written as floating point zeros, and the extra record consists of all spaces. The other ASCII format is the .DS2 file. This file is essentially the same as the .DES file except that (1) the second record of the .DS1 file appears as two integers on the second line of the .DS2 file, and (2) the time and extra record are inserted after the class for each signal, the latter being written as a string. The .DS2 files are complete and can be converted back to the .DS1 files without loss of information. Note that the ASCII files can be edited and in fact that is one of their intended uses. Great care should be taken in doing this, however, as inconsistencies in file format can cause problems. XI. 1.3 .DEN - Dendrogram Data File The .DEN file contains the dendrogram structure calculated by DENDGRAM. At the front of the file is the number of signals and the number of descriptors used -along with their names. The file then has three sections. The first contains the linkages needed to construct the dendrogram. A pair of signals is listed along with a dissimilariy (distance). A cluster is referred to by the number of the signal of one of its members. The second section contains the list of signals along with the ID, arid CLASS designations. The last section contains information for the user reading the file. This includes contains the name of the original descriptor file, the scaling method used, the method for calculating the dendrogram, and the date and time of calculation. 171 The .DEN file is in ASCII format and has the following structure. (Number of signals - NROWS), ( Number of descriptors - NCOLS) (Descriptor name #1) (Descriptor name #NCOLS) (Signal number), (Signal number), (Dissimilarity of linkage) (There will be NROWS-1 linkages) (Signal number), (Signal number), (Dissimilarity of linkage) (Maximum linkage dissimilarity.®) (Signal number), (Signal ID), (Signal CLASS) (Signal number), (Signal ID), (Signal CLASS) (This is followed by the scaling used, the method of calculation and the name of the original data file.) @ This is the dissimilarity (distance) between the two most dissimilar signals. The signals are listed in order that they need to be plotted in. This will not nescessarily (indeed rarely) be the order in which they appear in the .DSl file. This order is calculated during the DENDGRAM calculation and ensures that the signals are listed in an order for which a dendrogram can be drawn. As an example, this is the .DEN file used to generate Figure 17. 5,2 "x" »y" 2 1 1.044031 4 3 1.726268 5 4 2.022375 2 4 3.623534 6.382006 2 " " w 3." 1, " ", " a" 4, " ", " a" 3, " ", " a" 5, " ", " a" "No Scaling" "Single Linkage" Original Data File -FIG17.DES Start time 19:18:47 Finished at : 19:18:49 04-12-1990 172 XI 1.4 .AF2 - Abstract Factor Analysis Output File These contain the results on the factor analysis of a descriptor file. There are currently two versions in existence. The original .AFA file and the newer used .AF2 file. They both have the same format and are written as sequential ASCII files. The eigenvalues and eigenvectors appear first. These are followed by the loadings matrix (in the same format as a DES file - ie. ED, CLASS, followed by the NCOLS loadings for that vector). The .AFA file was the first version used for AFA results. When later version of the descriptor file (.DS1) were developed, it was thought that the factor analysis results should also contain the extra information available - namely the time of the signal and any extra or extended records. The .AF2 file then contains TIME, EXTRA record and any extended records so that a .DS1 file created from this (by the ABSCAT program) is complete. The .AF2 file has the following format for a data set which uses no extended records. If these are present, they are included after the extra record field (ie. before the projected coordinates). The .AFA file is exactly the same except for the second line of the file (NEXD, EFSB), the name of the extra record, and the extra record and the time fields. NROWS, NCOLS NEXD, EFSB EXN$ EXNM$ [only if NEXD > 0 -NEXD times.] (The following information is present for each eigenvector - NCOLS times.) EVAL(1) [first eigenvalue] EVEC(1,1), EVEC(1,2), ....[linear vector containing first e-vector] .(The following block contains the projection of the original descriptor file onto .the factor space defined by the eigenvectors.) ID, CLASS, TIME, Extra record C(l), .., C(NCOLS) NCOLS coordinates of signal projected onto factor space 173 The .AF2 file also contains a trailing line containing the names of all the descriptors used from the original DEScriptor file. The process of calculating a set of basis vectors leads to a set of eigenvalues. The eigenvalues can be used to determine the significance of the factors. This information is available as a .RES file which is saved with the same name as the .AF2 file. This ASCII file is in column format with the parameter names at the top of the column and each row corresponding to the values for the parameters for each eigenvector. The file can be viewed from DOS by the TYPE command as it is formatted for the 80 column screen display, or it can be viewed from within ABSCAT. (A sample .RES file is included as Table 8). XI.2 QAQ - Data Acquisition from Digital Storage Oscilloscope The QAQ program was written in Microsoft's QuickBASIC 4.0 by D B. Sibbald. Authors contributing to this program include D. A. Boyd (who helped write the original data acquisition program - DATADUMP - in GWBASIC), K. A. Soulsbury and O. Lee (who helped with the addition of the code for the model T2430A), and P. D. Wentzell (who wished for the absence of colour in the screen display). The operation of the program requires a TEKTRONIX Digital Storage Oscilloscope (either model T2230 or T2430A) which is interfaced via an IEEE-488 parallel interface. Use of the program is straight-forward and is outlined in the following flowchart. The program first looks for a oscilloscope connected as a device on the general purpose interface board (GPIB). Having found one, it the initializes the scope to default settings. The experimental details are asked for from the user. These include experiment title, a brief description of the experiment, and information on how data is to be collected. The scope can be run in triggered mode - where the scope waits for a 174 signal with amplitude greater than a preset trigger level - or in continuous mode - where the scope is continuously being polled regardless of amplitude. The program is also capable of adding an acquisition delay in between the acquisition of each signal. This is made available for fast emitting systems and for those which emit for a lengthy period of time causing large data storage requirements. The oscilloscope is then polled for its settings. The user is informed if any of the scope settings are incorrect for the intended mode of operation and is instructed on the correct settings. Once the scope is set up properly, the time-per-division (TDIV) and Volts-per-division (VDIV) settings are read. These parameters will be required to convert the downloaded signals from a string of 8-bit (0-255) integers to a Voltage-time waveform. The user information and the operating parameters are written to an .AEA file. The acquisition of data then follows. The operation of the scope differs slightly depending on the mode of data acquisition selected. In triggered mode, the scope is armed (via the IEEE-488 interface) and the computer waits for an interrupt from the scope which signifies that a signal has been acquired. The signal is then downloaded from the oscilloscope and is written to the file along with the signal ID, class (same for all signals in an experiment), and the time of acquisition. If an acquisition delay hs been selected, the computer then pauses the specified number of seconds before sending the command to the scope to be armed. This process is repeated until a specified number of signals have been acquired or until the user terminates the experiment manually. The continuous mode of acquisition is similar. The exception being that the scope is not armed but instead runs in untriggered mode. The computer requests that the current contents of the scope's buffer be downloaded as often as possible, or as often as the user specified acquisition delay dictates. Inltlc •sclllo aize scope Input Details •n Experiment From User Check Scope For Proper Settings •pen ,AEA File For Data Storage — Arm •scllloscope Walt For Signal Dr Termination Trigger Terminate Upload Signal From •scllloscope Close File Write Signal And Time To Data File ^ E n ^ Figure 82) Data acquisition algorithm for QAQ program. H (Jl 1 NEWQAQ - QAQ f o r MICROSOFT QUICKBasic W r i t t e n by Dav id S l b b a l d Recoded from the GWBASIC program w r i t t e n by D. S l b b a l d and Tony Boyd. September, 1988 R e v i s i o n 2 .0 - A p r i l 1989 W r i t t e n by Dav id S i b b a l d & Kev in Sou lsbury .AEA f i l e s produced d i r e c t l y A l t e r a t i o n s to headerout to conform to cu r ren t format A d d i t i o n a l sound f e a t u r e s E d r i v e now not needed Labora to ry f o r Automated Chemical A n a l y s i s Department o f Chemis t ry U n i v e r s i t y o f B r i t i s h Columbia 2036 Main M a l l , Vancouver, B .C . NEWQAQ - Data A c Q u i s i t i o n from the TEK 2230 O s c i l l o s c o p e Hardware r e q u i r e d : 1) TEKTRONIX 100MHz D i g i t a l Storage O s c i l l o s c o p e 2) GPIB i n t e r f a c e . OPERATION The program f i r s t removes the MOUSE d r i v e r s from memory. It i s assumed tha t the mouse c o n t r o l programs are s t o r e d on the C: d r i v e i n the \M0USE1 s u b d i r e c t o r y . I f the l o c a t i o n o f these d r i v e r s i s d i f f e r e n t , then the program must be m o d i f i e d . I f NO mouse d r i v e r s are p r e s e n t , then there i s no need to worry . The mouse d r i v e r s i n t e r f e r e w i t h the PEN command and need to be d i s a b l e d f o r c o r r e c t ope ra t i on o f t h i s program. The scope i s then p o l l e d f o r i t ' s s t a t u s . I f I t i s not p resen t , then an e r r o r i s r epo r t ed . P o s s i b l e causes a re : scope turned o f f , scope not 2 connected, computer not c o n f i g u r e d f o r scope/GPIB d r i v e r s NOTE : The d r i v e r s used f o r SIGMAPLOT and GPIB a re not c o m p a t i b l e . The USER i s then bombarded w i t h q u e s t i o n s r e l a t i n g to the exper iment . The a m p l i f i c a t i o n s e t t i n g s and f i l t e r i n g o p t i o n s a re reques ted . The in tended sampl ing mode i s asked f o r which w i l l a f f e c t the r e q u i r e d scope s e t t i n g s . The scope i s checked f o r the s e t t i n g s tha t a re r e l e v e n t t o the a c q u i s i t i o n o f s i g n a l s . The user i s in formed o f any tha t need r e s e t t i n g . The ac tua l a c q u i s i t i o n r o u t i n e s a re as f o l l o w s : SET scope to a s s e r t SRQ when a s i g n a l i s a c q u i r e d ( f o r t r i g g e r e d mode) The waveform i s then downloaded and the SCOPE i s armed ( f o r t r i g g e r e d mode) I f a d e l a y i s r e q u e s t e d , t h i s i s done here f i r s t . Now the s i g n a l s (maximum 9999) a re w r i t t e n d i r e c t l y t o the f i l e s p e c i f i e d as they a re p r o c e s s e d . VARIABLES TIMES#() - A l i n e a r a r r a y c o n t a i n i n g the TIMER v a l u e o f each s i g n a l TEMPS! () - " " temperature read from the ADC COMMENTS - An a r r a y c o n t a i n i n g two l i n e s o f user comments and a l i n e w i t h the t r i g g e r mode. WRT$ - T2230 i s the i d e n t i t y o f the scope f o r the GPIB i n t e r f a c e . BD% - The d e v i c e number tha t Is a s s i g n e d to the scope by the GPIB i n t e r f a c e (IBFIND ' c r e a t e s ' the a c t u a l va lue ) MESS$() - An a r r a y f o r passage to the r o u t i n e MESSAGE. Th is a r r a y i s used to p r i n t messages on the sc reen o f the o s c i l l o s c o p e . FILT$() - An a r r a y c o n t a i n i n g a l l the v a l i d f i l t e r s e t t i n g s on the c o n d i t i o n i n g a m p l i f i e r . FILTERS - Con ta ins the chosen s e t t i n g ( i npu t by user ) REP$ - Used to c o n t a i n the r e p l y from the scope f o r c e r t a i n q u e r i e s . EXTITLE$ - The t i t l e o f the exper iment - input by user <7\ 3 AMP% - An Integer referring to the amplification setting (user input) FR00T$ - The name of the f i l e root. 1 to 5 characters in length. Al l f i l e s are stored with this name. Signals are stored with a three d ig i t number appended to FR00T$ (eg. NAHE001.AEB) SAMPMODES- A two d ig i t string referring to the sampling mode. The f i r s t le t ter is a T for time-delay OR a C for continuous. The second let ter is a T for triggered or a C for untriggered. DELAY! - The number of seconds to pause between each acquisit ion in Time - delayed acquisition mode. TIMDIVS - The seconds per d iv is ion . The scope sends i t back as a str ing In engineering notation.( eg 50E+3 for 50ms per division) V0LDIV$ - The volts per division sett ing. This is sent in funny form. COUNTX - The current signal number (IE. the number of the NEXT signal to be acquired. INITIAL* - The TIMER reading when the acquisit ion is started. It is possible that the f i r s t signal wil l be taken at 0.0 seconds. DRIVES - A two character string representing the drive in use. (D:) GPIB SUBROUTINES IBFIND - Opens a device for communication. IBWRT - Writes a str ing to the device. IBRD - Inputs a str ing from the device output buffer. The str ing parameter must be a buffer containing enough space for the device output. GPIB VARIABLES IBSTA% - Returns TRUE (-1) i f the IBFIND function fa i l s DECLARATIONS Common GPIB status variables COMMON SHARED ibsta%, IBERR%, ibcnt% GPIB Subroutine Declarations 4 DECLARE SUB IBFIND (BDNAMES, bd%) DECLARE SUB Ibrd (bd%. RD$) DECLARE SUB IBRDF (bdX. FLNAMES) DECLARE SUB ibwrt (bd%, wrt$) DECLARE SUB IBWRTF (bd%, FLNAME$) ckBASIC Subroutine Declarations DECLARE SUB CHECKSCOPE (bd%, sampmode$) DECLARE SUB GETDTAILS (ext i t leS, COMMENTS(), AMP%, FILT$(). frootS. sampmodeS, sennumS, classS) DECLARE SUB getdrive (defaults, frootS, driveS, dir$) DECLARE SUB GETTEMP (TEMP!) DECLARE SUB headerout (C0MMENTSO, delay, sensornum$) DECLARE SUB MESSAGE (bd%, SOURCESO, NUM%) DECLARE SUB SCOPE (bd%, mesS, REP%, REPS) DECLARE SUB TTLE () CALL TTLE DIM TEMPS!(2000) ' stores the temperature DIM COMMENTS(1 TO 3) CONST wrtS » "T2230" ' identi ty of scope for IBFIND DIM MESS$(10) DATA 5 DATA "15Hello!","l3Welcome to QAQ by DAVID SIBBALD" DATA "12 and TONY BOYD" DATA " 9Please follow the inst ruct ions" ," 8given by the computer." READ nummes% FOR 1% = 1 TO nummes% READ MESS$(I%) NEXT DATA 6 DATA "Linear","0.1 Hz - 10kHz"."50kHz - 2MHz","100kHz - 2MHz" DATA "200kHz - 2MHz","400kHz - 2MHz" READ numfilts% DIM FILT$(0 TO numfiltsX) FOR 1% = 1 TO numfilts% READ FILT$(I%) 5 NEXT CALL IBFINO(wrt$, bd%) ' open scope f o r communicat ion REP$ = SPACE$(80) CALL SC0PE(bdX, " S T A ? " , 0, REPS) ' check s t a t u s o f scope IF i bs ta% < 0 THEN 4000 ' ERROR Now tha t we have e s t a b l i s h e d communication w i t h the scope, l e t ' s c l e a r out the MOUSE c o n t r o l l e r s t ha t i n t e r f e r e w i t h the PEN commands. SHELL " c : \ m o u s e l \ c p a n e l o f f > k i l l . f i l " SHELL "c : \mouse l \mouse o f f > k i l l . f i l " SHELL " e r a s e k i l l . f 1 1 " Th is w i l l remove the mouse d r i v e r s from memory. P lease remember to p l a c e them back o r O l i v e r w i l l be d i s c o n c e r t e d . CALL SCOPEfbdX, " I N I " , 0, REP$) ' se t scope to i n i t i a l s t a t u s COLOR 14 CALL MESSAGE{bd%, MESS$(), 0) ' c l e a r s scope CALL MESSAGE(bd%. MESS$(), 5) ' p r i n t MESS$() CALL G E T D T A I L S ( e x t i t l e $ , COMMENTS(), AMP%, F I L T S O , f r o o t S , sampmodeS, sennumS, c l a s s S ) FILTERS = FILT$(0) ERASE FILTS IF MID$(sampmodeS, 2 , 1) = " T " THEN d e l a y ! = VAL(RIGHTS(sampmodeS, LEN(sampmodeS) - 2)) sampmodeS = LEFTS(sampmodeS, 2) delay% = -1 END IF IF sampmodeS = " C C " AND d e l a y ! = 0 THEN delay% = -1 d e l a y ! = .5. END IF CLS LOCATE 5, 1 PRINT " P l e a s e se t the f o l l o w i n g on the scope : - " LOCATE 7, 1 IF RIGHTS(sampmodeS, 1) = "T " THEN PRINT " l ) T r i g g e r Mode : S i n g l e Sweep Mode" ELSE 6 PRINT "1) T r i g g e r Mode : P . P . AUTO" END IF PRINT "2 ) V e r t i c a l Mode : Channel 1 " PRINT "3 ) H o r i z o n t a l Mode : A " PRINT "4 ) STORE Mode" PRINT "5 ) 1 K A c q u i s i t i o n " PRINT "6) P lease make su re tha t the SEC/DIV and the V0LTS/DIV In the CAL d e t e n t s . " PRINT "7 ) HF REJECT OFF" INPUT a$ check the BOZO's new scope s e t t i n g s CALL CHECKSCOPE(bd%, sampmodeS) CALL MESSAGE(bd%, M E S S S O . 0) now we have the scope se t up f o r the exper iment LOCATE 2 , 15: PRINT " WELCOME TO 'NEWQAQ' - PLEASE BUCKLE UP LOCATE 3 15* PRINT *' * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * COLOR 9 LOCATE 5 , 10: PRINT "EXPT. T ITLE: " ; 'EXTITLES LOCATE 7, 10: PRINT "DATE: " ; 'DATES LOCATE 14, 5 : PRINT "Time S c a l e = " ; 'TIMDIVS LOCATE 14, 35 : PRINT " V o l t s / DIV = " ; 'VOLDIVS LOCATE 18, 5: PRINT " F i l t e r = " ; 'FILTERS LOCATE 18. 3 5 : PRINT " A m p l i f i c a t i o n = " ; 'AMP%;" dB" LOCATE 18, 56: PRINT " d B " LOCATE 9 . 10: PRINT " F I L E DESTINATION: " ; 'FROOTS; " .AEA" LOCATE 10, 18: PRINT "SIGNAL # : " ; LOCATE 20 , 35 : PRINT "MODE : " ; COLOR 13 LOCATE 12, 24: PRINT "Scope s e t t i n g s : " LOCATE 13, 24: PRINT "================" LOCATE 16, 20 : PRINT " C o n d i t i o n i n g A m p l i f i e r : " LOCATE 17, 20: PRINT "======================«" COLOR 11 LOCATE 5, 23 : PRINT e x t i t l e S LOCATE 7, 18: PRINT DATES LOCATE 18, 51 : PRINT AMP% 7 LOCATE 9 , 28: PRINT f r o o t $ ; " . A E A " LOCATE 10. 27 : PRINT USING "####"; 1; LOCATE 18, 19: PRINT FILTERS LOCATE 20, 42 SELECT CASE sampmodeS CASE "TT " PRINT " T r i g g e r e d (De lay " ; d e l a y ! ; " s e c ) " CASE " C T " PRINT " U n t r i g g e r e d (De lay " ; d e l a y ! ; " s e c ) " CASE " T C " PRINT " T r i g g e r e d (Con t inuous ) " CASE " C C " PRINT "Cons tan t Sampl ing" END SELECT COLOR 14 LOCATE 21 , 14: PRINT "PRESS F10 TO STOP " LOCATE 24, 1: COLOR 15 FOR 1% = 0 TO 1000 STEP 100 SOUND 1000 + 1%, (1% + 20) / 100 NEXT 1% PRINT" Buck le up, Ho ld on to your hat ! P ress re tu rn to s t a r t ! " INPUT ; " " , a$ LOCATE 24, 1 PRINT SPACE$(75); Get V o l t a g e s c a l e and Time s c a l e from scope TIMDIVS = SPACE$(30) CALL SC0PE(bd%, "HOR? ASE" , - 1 , TIMDIVS) ' Get A SEC/DIV PS% = INSTR(TIMDIVS, " : " ) LN% = INSTR(TIMDIVS, " ; " ) TIMDIVS = MID$(TIMDIV$, PS% + 1, LN% - PS% - 1) VOLDIVS = SPACE$(30) CALL SC0PE(bd%, "CHI? VOL", - 1 , VOLDIVS) ' Get VOLTS/DIV-. PS% = INSTR(VOLDIV$, " : " ) LN% = INSTR(VOLDIV$, " ; " ) VOLDIVS = MI0$(V0LDIV$, PS% + 1. LN% - PS% - 1) It i s not nescessa ry to check fo r HMAG ON (x lO on SEC/OIV s e t t i n g on the scope because a l though the d i s p l a y on the scope shows s m a l l e r t ime 8 s c a l e , the IK s i g n a l from the scope i s c o l l e c t e d a t the HOR ASE ra te count% = 1 COLOR 11 LOCATE 14, 20: PRINT TIMDIVS LOCATE 14, 51 : PRINT VOLDIVS count% = 1 ON KEY(IO) GOSUB 2270 KEY(IO) STOP s i g n a l S = SPACES(1037) OPEN d r i v e S + f r o o t S + " . a e a " FOR OUTPUT AS #1 CALL headerout(COMMENTS(), d e l a y , sennumS) SELECT CASE sampmodeS This i s the s e c t i o n tha t d e a l s w i t h Cont inuous samp l ing and a l s o t i m e - d e l a y sampl ing o f t r i g g e r e d s i g n a l s . A t r i g g e r i s de tec ted by the scope and SRQ i s a s s e r t e d . The computer c a n ' t de tec t an ' SRQ event BUT the SRQ a s s e r t i o n s e t s the PEN f l a g . So we need to check f o r PEN events t o de termine i f a t r i g g e r has o c c u r r e d . CASE " T C " . "TT" CALL MESSAGE(bd%, MESS$() . 0) COLOR 13 LOCATE 23. 10: PRINT "CLEARING SCOPE" FOR 1% = 1 TO 2000: NEXT CALL SC0PE(bd%, "MES 0 " , 0, REPS) CALL SC0PE(bd%, "OPC ON" , 0 , REP$) CALL SC0PE(bd%, " E V E ? " , - 1 , REPS) ' C l e a r f i r s t 2 SRQ events CALL SC0PE(bd%, " E V E ? " , - 1 , REPS) ' Th i s w i l l be the MES 0 ON PEN GOSUB 2110 i n i t i a l * = TIMER Count% = 0 PEN ON COLOR 28: LOCATE 23. 10: PRINT "WAITING FOR TRIGGER"; 1000 KEY(IO) ON KEY(IO) STOP GOTO 1000 Cont ro l comes here when SRQ a s s e r t e d (PEN) 2110 PEN OFF ^1 VO 9 count% = count% + 1 COLOR 11 LOCATE 10, 27 : PRINT USING "####"; count 0/.; id$ = RIGHT$(STR$(10000 + count%), 4) t ime = TIMER - i n i t i a l * IF t ime < 0 THEN t ime = t ime + 86400 t im$ = MKSS(time) KEY(IO) STOP COLOR 15: LOCATE 23 , 10: PRINT "TRIGGERED " 'CALL GETTEMP(TEMPI) 'TEMPS!(COUNT%) = TEMP! CALL 1bwrt(bd%, " c u r v ? " ) s i g n a l $ = SPACE$(1040) CALL ibrd(bd%, s i g n a l ! ) s i g n a l ! = M ID$ (s i gna l$ , 10, 1024) PRINT #1, i d $ ; c l a s s $ ; t im$; " " ; s i g n a l ! ; COLOR 28: LOCATE 23 , 10: PRINT "WAITING " ; LOCATE 23 , 18 KEY(IO) ON IF d e l a y * THEN ' pause be fore ARMing scope WHILE TIMER < t ime + d e l a y ! WEND END IF KEY(IO) STOP CALL SCOPE(bd%, "SGL ARM", 0, REP$) ' rearms t r i g g e r CALL SC0PE(bd%, "RQS ON", 0 , REP$) ' set SRQ f l a g PEN ON PRINT "FOR TRIGGER" IF count% = 9999 GOTO 3000 2270 RETURN Th i s s e c t i o n d e a l s w i t h Cont inuous sampl ing o f Cont inuous s i g n a l s . CASE " C C " , " C T " 3000 LOCATE 23 . 10 PRINT " C o l l e c t i n g Data" i n i t i a l ! = TIMER count% = 1 2000 1d$ « RIGHT$(STR$(10000 + count%), 4) 10 CALL ibwrt(bd%, "CURV?") s i g n a l j = SPACE$(1040) CALL 1brd(bd%, s i g n a l $ ) s i g n a l ! = M I O $ ( s i g n a l $ , 10, 1024) t ime = TIMER dtime = t ime - i n i t i a l * IF dt ime < 0 THEN dt ime = dt ime + 86400 END IF tim$ = MKSI(dt ime) PRINT #1, i d $ ; c l a s s $ ; t i m $ ; " " ; s i g n a l $ ; 'CALL GETTEMP(TEMPJ) 'TEMPS!(C0UNT%) = TEMP! IF count% = 9999 THEN 3000 count% = count% + 1 COLOR 11 LOCATE 10, 27 : PRINT USING "####"; count%; KEY(IO) ON IF delay% THEN WHILE TIMER < t ime + d e l a y ! WEND END IF KEY(IO) STOP GOTO 2000 END SELECT GOTO 3000 KEY(IO) OFF PEN OFF RETURN 3000 PEN OFF CLOSE #1 LOCATE 23 , 1 COLOR 15 PRINT " R i g h t ! T h a t ' s t ha t then 11 ' Remember the MOUSE d r i v e r s ? Remember the Alamo? 2710 SHELL " c : \mouse l \mouse > k l l l . f l l " SHELL " e r a s e k i l l . f i l " ' Doing 1t t h i s way keeps the message from d e s t r o y i n g the screen d i s p l a y ' Y o u ' r e welcome, O l i v e r . Stand up when I'm t a l k i n g to you ! (Oh, you a re END 4000 CLS PRINT " P l e a s e check the c o n f i g u r a t i o n o f computer (run TEK)" PRINT " ( A l s o check the connec t ions to the s c o p e . ) " GOTO 2710 BadDIr : FOR 1% = 1 TO 5 LOCATE 15 + 1%, 1 PRINT SPACE$(80) NEXT 1% PRINT "No Such D i r e c t o r y - Try a g a i n " CALL g e t d r 1 v e ( d e f a u l t $ , f r o o t S , d r1ve$ , d i r $ ) RESUME NEXT SUB CHECKSC0PE (bd%, sampmode$) • Th i s s e c t i o n checks the scope s e t t i n g s to make sure they are such tha t data a c q u i s i t i o n i s p o s s i b l e . C u r r e n t l y checked elements : TRIGGER MODE HORIZONTAL MODE VERTICAL MODE STORE MODE IK ACQUISITION CALIBRATION DETENT OF VERTICAL MODE 100 ERFLAG3S = 0 CALL SC0PE(bd%, " A T R ? " , - 1 . REP$) ' ask f o r t r i g g e r mode IF LEFT$(sampmodeS, 1) = "T " THEN IF MIDS(REPS, 15, 6) <> "SGLSWP" THEN PRINT "The t r i g g e r mode i s not se t to SINGLE SWEEP." 12 PRINT "The s w i t c h i s on the r i g h t s i d e o f the scope below A TRIGGER" PRINT "Push bu t ton l a b e l l e d SGL SWP 1n" ERFLAG% = 1 END IF ELSE IF MIDS(REP$, 15, 6) <> "PPAUT0" THEN PRINT "The t r i g g e r mode Is not se t t o PP AUTO." PRINT "The bu t ton Is on the r i g h t s i d e o f the scope below A TRIGGER" PRINT "Push the bu t ton l a b e l l e d P_P AUTO i n . " ERFLAG% = 1 END IF END IF CALL SC0PE(bd%, "VM0DE?", - 1 , REPS) ' v e r t i c a l mode s e t t i n g IF MIDS(REP$, 7, 3) <> " C H I " THEN PRINT "The VERTICAL MODE s e t t i n g i s not CHI " PRINT "The s w i t c h i s near the l e f t s i d e o f the ins t rument p a n e l " PRINT "Move s w i t c h f u l l y t o the l e f t . " ERFLAG5S = 1 END IF CALL SC0PE(bd%, "STORE?" , - 1 , REPS) ' ask f o r s t o r e mode s t a t u s IF MID$(REP$, 7, 2) <> "ON" THEN PRINT "The scope Is not In s t o r e mode." PRINT "The bu t ton i s t o the l e f t o f the knob marked VAR HOLD OFF" PRINT "Push the bu t ton i n . " ERFLAG% = 1 END IF CALL SC0PE(bd%, "HOR? MOD", - 1 , REPS) IF MID$(REPS, 17, 6) <> "ASWEEP" THEN PRINT "The h o r i z o n t a l mode i s not se t t o A . " PRINT "The s w i t c h i s t o the l e f t o f the SGL SWP b u t t o n . " PRINT "Move the s w i t c h f u l l t o the l e f t . " ERFLAG% = 1 £ END IF H i s no way to check f o r the HF REJECT so we do have to hope tha t 13 the person has a c l u e whether he has got i t o r n o t ! ! ! CALL SC0PE(bd%, "ACQ? POI " , - 1 . REP$) IF INSTR(REP$, "1024" ) = 0 THEN PRINT "The A c q u i s i t i o n i s not se t f o r 1 K . " PRINT "Push the but ton marked IK under ACQUISTION i n . " ERFLAG% = 1 END IF To check the C A L i b r a t e detent on the VOLTS/DIV knob, i t i s nescessary ' to ask f o r the V/DIV s e t t i n g and then see i f EVENT 555 i s r e t u r n e d . I f s o , then the knob i s not i n the CAL ib ra te de ten t . ' - c l e a r SRQ b u f f e r f o r check ing CAL detent on VOLTS/DIV DO CALL SC0PE(bd%, " E V E ? " , - 1 , REP$) LOOP UNTIL INSTR(REP$, " 0 ; " ) CALL SC0PE(bd%, "CHI? VOL". - 1 , REP$) CALL SC0PE(bd%, " E V E ? " , - 1 , REP$) IF INSTR(REP$, "555" ) THEN PRINT "The Channel 1 VOLTS/DIV knob i s not i n the CAL ib ra ted p o s i t i o n . " PRINT "Turn the smal l knob marked CAL f u l l y c l o c k w i s e . " PRINT "Check the CAL knob on the A and B SEC/DIV knob a l s o . " ERFLA6% = 1 END IF There i s no way to check the CAL knob on the SEC/DIV s e t t i n g . I t would seem l o g i c a l t o check f o r an EVENT 555; as i n the case f o r the ' VOLTS/DIV but t h i s c a p a b i l i t y does not e x i s t i n the TEKTRONIX 2230. IF ERFLAG% <> 0 THEN PRINT PRINT "PLEASE make adjustments and press RETURN"; INPUT a$ PRINT GOTO 100 END IF CLS OKAY, now we ' re ready to get g o i n ' ! ! ! BaZoom! END SUB 14 SUB g e t d r i v e ( d e f a u l t s , f r o o t $ , d r i v e $ , d i r $ ) SHELL " d i r > d i r f i l " OPEN " i " , #1. " d i r f i l " FOR 1% = 1 TO 3 LINE INPUT #1, a$ NEXT CLOSE #1 SHELL " e r a s e d i r f i l " d e f a u l t ! = MID$(a$, 16) IF R IGHT$(de fau l t$ , 1) <> " \ " THEN d e f a u l t ! = d e f a u l t ! + " \ " 270 LOCATE 16. 1 PRINT " E n t e r f i l ename o r path and f i l e n a m e , ( f i l e name must not exceed 8 c h a r a c t e r s ) " LOCATE 17, 1 PRINT " D e f a u l t d i r e c t o r y : " ; d e f a u l t s INPUT f r o o t $ f roo tS = UCASE$( f roo tS) a% * INSTR( f roo t$ , " : " ) IF a% = 2 THEN d r i v e S = L E F T $ ( f r o o t $ , 2) f r o o t $ = R IGHT$( f roo t$ , LEN( f roo tS ) - 2) ELSEIF a% <> 0 THEN PRINT "Oo not type any spaces i n r e p l y . " SOUND 800, 1 GOTO 270 END IF a% * 0 DO n% = a% + 1 a% = INSTR(n%, f r o o t S , " \ " ) LOOP UNTIL a% = 0 IF n% > 0 THEN d i r $ = L E F T S ( f r o o t S , n% - 1) f r o o t S = R IGHTS( f roo tS , LEN( f roo tS ) - n% + 1) END IF IF d r i v e S = " " AND d i r $ = " " THEN defau l t% = 1 IF LEN( f roo tS ) > 8 THEN 15 SOUND 70, 2 GOTO 270 ELSEIF LEN( f roo t$ ) < 1 THEN PRINT "You must supp ly a f i l e n a m e . " SOUND 80 , 2 GOTO 270 END IF 280 IF d r i v e S = " " THEN SHELL " d i r > d i r f i l " OPEN " 1 " , #1, " d i r f i l " LINE INPUT #1, a$ LINE INPUT #1, a$ CLOSE #1 SHELL " e r a s e d i r f i l " d r i v e $ = HIDSfaS, 18, 1) + " : " END IF END SUB SUB GETDTAILS ( e x t i t l e S , COMHENTSO, AMP%, F L T S O . f r o o t S , SMPMODES, sennumS, c l a s s S ) Th is i s the r o u t i n e tha t h a s s l e s the user f o r meaningless d e t a i l s . EXTITLES - 40 c h a r a c t e r s t r i n g c o n t a i n i n g experiment t i t l e COMMENT - 80 c h a r a c t e r s t r i n g s c o n t a i n i n g comments. AMP% - In teger c o n t a i n i n g a m p l i f i c a t i o n o f c o n d i t i o n i n g a m p l i f i e r FLTSO - Con ta ins the p o s s i b l e f i l t e r s e t t i n g s o f the C A . On r e t u r n , FLTS(0) c o n t a i n s the chosen s e t t i n g . FROOTS - Con ta ins the f i l ename (max. 5 l e t t e r s ) to be used. On r e t u r n , the cu r ren t d i r e c t o r y i s the in tended p a t h . SMPMODES - A two c h a r a c t e r s t r i n g tha t con ta ins the mode s e l e c t e d . The f i r s t c h a r a c t e r w i l l be a C i f cont inuous sampl ing i s d e s i r e d as opposed to T f o r Time de lay samp l ing . The second l e t t e r i s a T 1f t r i g g e r e d s i g n a l s are to be a c q u i r e d or a C f o r cons tan t samp l ing . I f t ime de lay i s reques ted , then the STRS o f the number o f seconds i s tacked on to the end o f SMPMODES f o r r e t u r n . CLASSS - A four byte c h a r a c t e r s t r i n g that c on ta i ns the d e f a u l t c l a s s i d e n t i f i c a t i o n f o r use when c o l l e c t i n g d a t a . De fau l t =" 0" 16 SHARED d r i v e S CLS PRINT "UELC0ME TO 'NEWQAQ' - The next gene ra t i on o f DATADUMP" LOCATE 3 , 1 PRINT " E n t e r exper iment t i t l e (40 c h a r a c t e r s ) | " LINE INPUT e x t i t l e S IF L E N ( e x t i t l e S ) > 40 OR L E N ( e x t i t l e S ) < 1 THEN SOUND 2000, 1 GOTO 210 END IF e x t i t l e S = L E F T S ( e x t i t l e $ + STRINGS(40, " " ) , 40) LOCATE 3 , 1 PRINT e x t i t l e S PRINT STRING$(80, " " ) LOCATE 5, 1 PRINT " P l e a s e en te r a d e s c r i p t i o n o f the e x p e r i m e n t . " PRINT " E n t e r two l i n e s o f l e n g t h no more than e i g h t y c h a r a c t e r s FOR 1% = 1 TO 2 LOCATE 6 + 1 % LINE INPUT C0MMENT$(I%) IF LEN(COMMENTS(1%)) > 80 THEN SOUND 1200, 1 GOTO 220 END IF IF COMMENTS(1%) = " " THEN 1% = 2 COMMENTS(1%) = LEFT$(COMMENT$(I%) + STRINGS(80, " " ) . 80) NEXT LOCATE 4 , 1 FOR 1% = 1 TO 2 PRINT C0MMENT$(I%) NEXT FOR 1% = 1 TO 2 PRINT STRINGS(80, " " ) NEXT LOCATE 8, 1 PRINT"Do you w ish T r i g g e r e d s i g n a l s or Constant sampl ing ( C / T ) ? ' aS = INPUTS!1) aS = UCASES(aS) 17 IF a$ = " C " THEN SMPMODES = a$ ELSEIF a$ = " T " THEN SMPMODES = a$ ELSE SOUND 1000, 1 GOTO 240 END IF LOCATE 10, 1 PRINT "Do you w ish to have a t i m e - d e l a y between sampl ing? ( Y / N ) " ; a$ = INPUT$(1) a$ = UCASE$(a$) IF a$ = " N " THEN SMPMODES = SMPMODES + " C " ELSEIF a$ = " Y " THEN LOCATE 12, 15 INPUT "Number o f seconds to de lay " ; b$ IF VAL(b$) = 0 THEN SMPMODES = SMPMODES + " C " ELSE d e l a y ! = VAL(b$) SMPMODES = SMPMODES + " T " END IF ELSE SOUND 1000, 1 GOTO 230 END IF IF RIGHT$(SMPMODE$,l)="T" THEN SMPMODES = SMPMODES + STRStde lay ! ) SELECT CASE MIDS(SMPMODE$. 2 , 1) CASE " T " C0MMENT$(3) = "Time de lay (" + RIGHTS(SMPMODES, LEN(SMPMODES) - 2) + " seconds) : CASE " C " COMMENTS(3) = "Cont inuous sampl ing : END SELECT SELECT CASE LEFTS(SMPMODE$, 1) CASE " T " C0MMENT$(3) = COMMENTS(3) + " T r i g g e r e d " 18 CASE " C " C0MMENT$(3) = COMMENT$(3) + " U n t r i g g e r e d " END SELECT LOCATE 7, 1 PRINT COMMENTS(3) FOR 1% = 1 TO 5 PRINT STRINGS(80, " " ) NEXT LOCATE 9, 1 INPUT "What i s the a m p l i f i c a t i o n s e t t i n g on the c o n d i t i o n i n g a m p l i f i e r " ; a$ AMP% = VAL(a$) IF (AMP% < 1 AND a$ <> " 0 " ) OR (AMP% > 60) THEN SOUND 900, 1 GOTO 250 END IF LOCATE 8, 1 PRINT " A m p l i f i c a t i o n : " ; AMP% LOCATE 9, 1 PRINT STRINGS(80, " " ) LOCATE 10, 1 PRINT "What i s the FILTER s e t t i n g ? " FOR 1% = 1 TO UBOUND(FLTS) PRINT 1%; " ) " ; FLT$(1%) NEXT PRINT INPUT aS a% = VAL(a$) IF a% < 1 OR a% > UBOUND(FLTS) THEN SOUND 800, 1 GOTO 260 END IF FLT$(0) = FLTS(a%) LOCATE 9, 1 PRINT "FILTER : " ; FLT$(0) LOCATE 10. 1 FOR 1% = 1 TO UBOUND(FLTS) + 3 PRINT STRINGS(80, " " ) 19 NEXT LOCATE 11 , 1 INPUT "sensor number : "; sennum$ LOCATE 13, 1 INPUT " E n t e r c l a s s prompt: < 0>"; C$ c l a s s $ = LEFT$(C$ + SPACE$(3) + " 0 " , 4) LOCATE 14, 1 PRINT " C l a s s : " ; c l a s s $ LOCATE 12, 1 FOR 1% = 1 TO 8 PRINT STRING$(80, " ") NEXT LOCATE 12, 1 PRINT " C l a s s : " ; classS LOCATE 15, 1 CALL getdrive(defaultS, frootS, drive$, dir$) IF default* = 1 THEN PRINT " F i l e specification = "; defaultS; froot$; " . A E A " ELSE PRINT " F i l e specification = "; driveS; dir$; froot$; " . A E A " END IF LOCATE 20, 1 INPUT " I s above information correct"; a$ IF UCASE$(LEFT$(a$, 1)) = " N " THEN 210 The current directory i s changed to tha t specified I f the directory does not exist, we will not find a problem until after the experiment. To prevent t h i s annoying problem, the directory i s created. I f i f already exists an error will occur but the execution will continue. ON ERROR GOTO BadDir IF LEN(d i r$ ) > 1 THEN SHELL "md" + LEFT$ (d i r $ , LEN(d i r$ ) - 1) IF LEN(d i r$ ) > 0 THEN SHELL " c d " + LEFT$ (d i r $ , LEN(d i r$ ) - 1) ON ERROR GOTO 0 END SUB SUB GETTEMP (TEMP!) Th is routine one day w i l l p o l l the RTI-815 ADC and return a value for 20 ' the temperature. (One day) END SUB SUB headerout (COMMENT$(), delay, sensornumS) ' Th is routine prints out the header f i l e . ' The format of the header f i l e i s : ' The format of the header f i l e i s : 'EXPERIMENT TITLE (MAX 40 CHARACTERS) DATE (8 CHARS) 'TWO comment lines o f 80 characters each ' T h i s line contains the MODE of sampling. 'T/0= V/D= GAIN= FLT= SHARED extitleS, FILTERS, AMP%, TIMDIVS, VOLDIVS. ini t ia l* PRINT #1, LEFT$(extitle$ + SPACE$(40) , 4 0 ) ; SPACES(2) ; DATES; SPACES(IO); LEFTS("T2230" + SPACE$(18) , 18 ) ; FOR 1% = 1 TO 2 PRINT #1, LEFT$(COMMENT$(I%) + SPACE$(80) , 8 0 ) ; NEXT PRINT #1, LEFTS("SN=" + sensornumS + SPACES(80) , 8 0 ) ; PRINT #1, LEFT$( "T /100p t= " + TIMDIVS + SPACE$(20) . 2 0 ) ; PRINT #1. LEFTS("V /D=" + VOLDIVS + SPACE$(20) . 2 0 ) ; PRINT #1. LEFT$("GAIN=" + STR$(AMP%) + SPACE$(20) . 2 0 ) ; PRINT #1, LEFT$( "FLT=" + FILTERS + STRING$(20, " " ) . 2 0 ) ; PRINT #1, SPACES(40) ; PRINT #1, LEFT$("ACQ DELAY=" + STRS(delay) + SPACE$(40) , 4 0 ) ; FOR index% = 1 TO 6 PRINT #1, SPACE$(80) ; NEXT EXR$ = "NUL " ' 3 character name o f extra record ' s t a t u s o f extra record - see IHB documentation , eh? PRINT #1, SPACES(70) ; " 0 " ; SPACES(2) ; EXRS; END SUB SUB MESSAGE (bd%. SOURCES!) . NUM%) ' Subrout ine to print SOURCES in format for IBWRT H ' The string in sourceS must have the first two spaces containing the * ' line number of the screen for the message to be written on 21 The numbers a re from 16 a t the top to 1 at the bottom (ups ide down) I f NUM% i s ze ro then "MES 0" i s sent which c l e a r s the screen and re tu rns the sc reen to d i s p l a y mode. Be aware tha t the scope screen has a l i m i t e d memory and i t i s not p o s s i b l e to p r i n t on every space on the screen a t the same t ime . Yes , spaces at the beg inn ing o f s t r i n g s count towards f i l l i n g the s c o p e ' s screen b u f f e r so i t i s adv i sed tha t a l l messages be l e f t j u s t i f i e d . SOURCES() i s an a r r a y c o n t a i n i n g the s t r i n g s to be p r i n t e d i n the format d i s c u s s e d above : 2This message w i l l go on the second l i n e from the bot tom" " l O T h i s w i l l be p r i n t e d on the 5th l i n e from the t o p . " NUM% : i s the number o f elements o f SOURCES tha t are to be p r i n t e d . I f NUM% = 0 then the screen i s c l e a r e d and no message i s s e n t . B0% : i s the In teger exp ress i on that r e f e r s to the TEK scope f o r c a l l s to I BURT. ' GENERIC COMMAND FOR WRITING TO SCREEN : ' MESSAGES="MES <num>:"+CHR$(34)+"message (40 chars per l i n e ) " + c h r $ ( 3 4 ) ' CALL IBWRT (#,MESSAGES) SOURCES(O) = "MES 0" IF NUM% = 0 THEN CALL ibwrt (bd%, S0URCE$(0)) ELSE FOR 1% = 1 TO NUM% TS = SOURCES!1%) TLEN = LEN(TS) T l$ = "MES " + LEFT$(T$, 2) + " : " + CHR$(34) T$ = T l$ + RIGHTS(T$, TLEN - 2) + CHR$(34) CALL ibwrt(bd%, TS) NEXT 1% END IF END SUB SUB SCOPE (bd%, mesS, REP%, REPS) Th is r o u t i n e sends MESS to the dev i ce l a b e l l e d BD% I f a r e p l y i s expec ted , then REP% shou ld be set TRUE (-1) The s t r i n g i s re tu rned i n REPS. 60 c h a r a c t e r s i s c o n s i d e r e d enough f o r most purposes. CALL ibwrt(bd%, mesS) 22 IF REP% THEN REPS = SPACE$(60) CALL ibrd(bd%, REPS) END IF END SUB 'Th i s r o u t i n e p r i n t s the f i r s t t i t l e page SUB TTLE COLOR 15, 11 CLS LOCATE 4 , 25 PRINT "NEWQAQ - Data a c q u i s i t i o n " LOCATE 8, 25 PRINT "8y Oave S i b b a l d & Kev in S o u l s b u r y " LOCATE 14 PRINT TAB(20) ; "The Labo ra to ry f o r Automated Chemical A n a l y s i s ' PRINT TAB(20) ; "Department o f C h e m i s t r y " PRINT TAB(20) ; "The U n i v e r s i t y o f B r i t i s h Co lumb ia " PRINT TAB(20) ; "Vancouve r , B . C . , CANADA" COLOR , 0 END SUB 187 XI.3 ABSCAT - Abstract Factor Analysis / Scattergram Plotting Utility Program ABSCAT was written by D. B. Sibbald in QuickBASIC 4.0. The AFA routine was translated from a program written in FORTRAN by P. D. Wentzell. The display of the two-dimensional scattergrams uses an installed enhanced graphics adapter (EGA) card. The program has many options and any are available from the program in any order, allowing for recall of previous analyses. The procedure for calculation of the AFA analysis is as follows. The user specified descriptor file (with a .DS1, .DES, .SCL) extension is loaded and subjected to the same scaling options as in the DENDGRAM program. The AFA routine first involves the calculation of the covariance matrix. This is then used to generate each principal component. The principal component is then removed from the covariance matrix and the process continues until a complete set of basis vectors has been calculated. (The complete set of basis vectors will consist of the same number of vectors as there were descriptors in the original data fileF). The scaled data matrix is then projected onto the new set of basis vectors (factor space). The basis vectors, along with the projected data, are stored in a data file (.AF2). Also, a report file (.RES) is created which lists various parameters which can help determine the number of primary factors present in the data. The .RES report file can be viewed from within ABSCAT, and help screens are available which explain the significance of parameters such as the root-mean-square (RMS) error, the cumulative percent variance (CPV) and the imbedded error (IE) function. The loadings of the original descriptors on the factors can be displayed F That is, unless some of the descriptors have been removed from analysis by the scaling routine. 188 graphically. This gives an indication of the relative importance of each descriptor as a descriminating parameter. ABSCAT also plots two-dimensional scattergrams of data files. These data files can either be descriptor files (with .DS1, .DES, .SCL extensions) or can be the factor analysis data file (.AF2, and .AFA files). Two descriptors / factors can be plotted against each other or they can be plotted against time. Colored plotting is used to allow one to distinguish and assign different classes of signals visually. On the advice / insistence of the members of this research group who wished to use the program, some elementary DOS functions were incorporated into ABSCAT to allow for people to view the contents of the computer's hard disk (or of floppy disks) to jog a memory which wasn't quite able to keep up with the myriad files created during the data analysis routines. Estimate Next Basis Vector Load Data Scale Data Calculate Covariance Matrix Multiply By Transpose of Covariance Matrix To Get New Estimate For Basis Vector ND AU Dimensions Normalize New Estimate Compare New Eigenvalue To Previous One YES Remove Base Vector From Covariance Matrix <Aoop Use New Basis Set To Project Original Data Onto Factor Space Save File 0 Figure 83) Abstract factor analysis (AFA) algorithm from A B S C A T program. CO <D 1 ABSCAT - Abstract Factor analysis, Scattergram utility. ************** Written by David Sibbald (Abstract factor analysis routine by Peter Wentzell from FORTRAN) Laboratory for Automated Chemical Analysis Chemistry Department, University of British Columbia. Vancouver , B.C A useful chemometrlc util ity program for use with AECRUNCH and OENDGRAM. (See also SIGVIEW) The purpose of this program it to : ' 1) Perform abstract factor analysis on a f i le. 2) View the scatterplot of various formats of fi les. 3) Create DEScrlptor files for use In DENDGRAM using abstract factors. ' Subroutine Declarations DECLARE SUB TITLESCRN () DECLARE SUB TITLESCRN2 () DECLARE SUB GETKEY (KEYS) DECLARE FUNCTION FEXISTX (F$. LENGTH*) DECLARE SUB GETFIL (INFILS. EXTS, EXISTX) DECLARE FUNCTION FRTEXTS (FILES. ROOTS, EXTS) DECLARE FUNCTION GETOIRS () DECLARE SUB SLEP () DECLARE SUB READCOL (Ft. C0L#(), COLX) DECLARE SUB READVALS (F$, VARNAMJO, HESS*(). ID() AS ANY. CLASS() AS ANY, TIME EXTRA() AS ANY. EXRJ, EXNAMJO, EXTSXO, EXTENSO. SCALEDS) OECLARE SUB READPARS (F$. NUMSAMSX. NUMVARSX. EXT1X. EXT2X) DECLARE SUB REAOFACTS (FILES. NUMFAXX, EVALS!(). EVECStO, VARNAMSO) DECLARE SUB NEVOES (FILES, SUCCESSX) DECLARE SUB FACLOOISPLAY (FILES) 2 DECLARE SUB FACDISHELP () DECLARE SUB verprlnt (WORDS, XX, YX) ' Subroutines used by AFA section DECLARE SUB AFA (FILES) DECLARE SUB ADVSCAL (MESS*(). VARNAMSO, F$. SCALEDS, DELVARX()) OECLARE SUB SCALECOL (MAT#(), COLX, MODEX. VALUE*. DIV#) DECLARE SUB SCLFORSAV (FILES, SCS. AX(). B*(), C#()) DECLARE SUB SCLFORLOD (FILES, SCI, M0DEX(). A#(), B#()) DECLARE SUB SCALER (MESS#(). VARNAMSO, FILES. SCALEDS. DELVARXf), OUTFILES) DECLARE FUNCTION COLMAXI (MAT#(), COLX, STARTROVX, ENDROWX. COLMINI) DECLARE FUNCTION VARIANCE! (MAT#(), COLX, NUMROVSX, AVE*) ' Subroutines used by SCATtergram section DECLARE SUB SCAT (FILES) OECLARE FUNCTION INSIDEX (X*. Y*. XD*. YD*. YSCALE*, XSCALE*. RADX) DECLARE SUB CROSSHAIRS (XX. YX. CROSSCOLX) DECLARE SUB MARK (XX. YX, MRKX) DECLARE SUB PMRK (XCORD*. YCORD*) OECLARE SUB EGGBOX (XX, YX. RADX, COLRX, CRCLEX(J) DECLARE SUB TIMPRINT (MESS*(), VARNAMSO, F$) ' Subroutines for explaining the output In the RESults f i le after performing an AFA. DECLARE SUB RESFILHELP () DECLARE SUB CPVHELP () DECLARE SUB IEHELP () DECLARE SUB INOHELP () DECLARE SUB REHELP () DECLARE SUB RMSHELP () DECLARE SUB EVALHELP () DECLARE SUB NACHELP () OECLARE SUB EOAHELP () DECLARE SUB VARHELP () Define some Options , variables, Constants etc. 1 COLOR FGROUNO. bground CLS LOCATE 2 , 20 PRINT " A b s t r a c t F a c t o r A n a l y s i s U t i l i t y " FOR IX « 1 TO numcholcesX ' change t h i s l i n e to handle more c h o i c e s than 9 LOCATE IX * 2 + 7 - numcholcesX \ 2 , 10 PRINT IX; " ) " ; cho1ce$( IX) NEXT 10 LOCATE 22 . 15 PRINT " E n t e r S e l e c t i o n " ; s l c t X - 0 LOCATE 22 . 31 INPUT ; a$ aX ' VAL(a$) IF aX > 0 ANO aX <« numcholcesX THEN s l c t X « aX IF s l c t X « 0 THEN 10 SELECT CASE s l c t X CASE 1 ' F a c t o r A n a l y s i s CLS LOCATE 2 , 10 PRINT " F a c t o r A n a l y s i s " LOCATE 4 , 1 PRINT " E n t e r f i l ename ( i n c l u d i n g root and ex tens ion ) : <"; PRINT FILES + FEXTS; ">" COLOR 3 PRINT " (Legal ex tens ions » " ; Q$; " . 0 E S . D S 1 . 0 S 2 . S C L " ; QS COLOR 11 INPUT F$ IF F$ <> " " THEN EXTENSS - "DS1 .0ES .0S2 .SCL" F$ • UCASES(FJ) CALL GETFIL(F$, EXTENSS, EXISTX) IF F$ » " " THEN PRINT " I l l e g a l f i l e s p e c i f i c a t i o n " CALL s l e p ELSEIF NOT EXISTX THEN CLS TYPE RECRD R AS STRING * 4 END TYPE CONST b l u e = 1 CONST l t b l u e - 11 CONST y e l l o w = 14 CONST red = 4 CONST p u r p l e = 13 CONST green • 10 CONST FGROUND » l t b l u e CONST bground • p u r p l e CONST l e g a l e x t S - " . D E S . D S 1 . D S 2 . S C L . A F A . A F 2 " CONST maxdim! = 23000 CALL t l t l e s c r n QS » CHR$(34) number o f o p t i o n s a t main menu Max 1s n ine DATA 8 DATA " A b s t r a c t F a c t o r A n a l y s i s " DATA " P r i n t out R e s u l t s o f p r e v i o u s A b s t r a c t Fac to r A n a l y s i s " OATA "Show AFA f a c t o r l o a d i n g s " DATA " C r e a t e O E S c r i p t o r f i l e from AFA r e s u l t s " DATA "V iew S c a t t e r g r a m " DATA "Show d i r e c t o r y " )" DATA " E x p l a i n f i l e e x t e n s i o n s " DATA " E x i t " READ numcholcesX DIM c h o i c e S d TO numcholcesX) FOR IX » 1 TO numcholcesX READ c h o i c e $ ( I X ) NEXT CUROIRECTORYS = GETDIRS CALL TITLESCRN2 ^ H PRINT "Can't f i n d " ; F$ CALL s l e p GOTO 1 END IF FILESPEC$ = F$ ELSE IF F$ = " " AND FILES = " " THEN GOTO 1 END IF FILE$ = FRTEXT$(FILESPECS, FROOTS, FEXT$) CALL AFA(FILESPECS) CASE 2 ' Copy REP f i l e t o p r i n t e r CLS LOCATE 2 , 10 PRINT " P r i n t out RES f i l e " LOCATE 4 , 1 PRINT " E n t e r name o f .AF2 (AFA) f i l e ( I n c l u d i n g path) INPUT F$ IF FS <> " " THEN EXTENSS = " R E S " F$ = UCASE$(F$) CALL GETFIL(F$, EXTENSS, EXIST%) IF F$ = " " THEN PRINT " I l l e g a l f i l e s p e c i f i c a t i o n " CALL s l e p ELSEIF NOT EXIST% THEN PRINT " C a n ' t f i n d " ; F$ CALL s l e p GOTO 1 END IF RESFILES = F$ ELSEIF F$ = " " .AND FILES = " " THEN GOTO 1 ELSE RESFILES = FROOTS + FILES + " . R E S " END IF IF NOT FEXIST%(RESFILE$, IS) THEN PRINT " C a n ' t f i n d " ; RESFILES PRINT " ( F a c t o r A n a l y s i s not per formed o r F i l e has been moved)" CALL s l e p GOTO 1 END IF LOCATE 6, 1 INPUT " P r i n t e r o r sc reen ( P / S ) " , aS a$ = UCASE$(LEFTS(a$, 1)) IF a$ = " P " THEN LPRINT "RESULTS FROM ABSTRACT FACTOR ANALYSIS OF FILE : " ; LPRINT FILES SHELL " t y p e " + RESFILES + " >prn" ELSE CLS numlines% = 1 6 OPEN " I " , #1, RESFILES FOR 1% = 1 TO 6 F ILES; ">" LINE INPUT #1, a$ PRINT aS NEXT VIEW PRINT 7 TO 7 + numl lnesX a% = 0 WHILE NOT EOF( l ) LINE INPUT #1, a$ a% = a% + 1 PRINT a$ IF a% = numl inesX THEN aX = 1 LOCATE 2 5 , 5 COLOR 15 PRINT"Press key < ? - e x p l a i n column head ings>" ; COLOR 11, 13 GOSUB 95 LOCATE 7 + numl inesX - 2 END IF WEND CLOSE #1 S LOCATE 25, 5 • w COLOR 15 PRINT " P r e s s a key to c o n t i n u e . <? -exp la i n h e a d i n g s * " ; GOSUB 95 COLOR 11 VIEW PRINT END IF GOTO 1 CALL ge tkey(a$) IF a$ = " ? " OR a$ = " / " THEN CALL r e s f i l h e l p COLOR 11, 13 VIEW PRINT 7 TO 7 + numl ines* GOTO 95 END IF RETURN CASE 3 CLS LOCATE 2, 10 PRINT " P r i n t out Fac to r Load ings " LOCATE 4 . 1 PRINT " E n t e r name o f .AF2 (AFA) f i l e ( i n c l u d i n g path) :< " ; F ILES; ">" INPUT FS IF FS <> " " THEN EXTENSS = "AF2 .AFA" F$ = UCASE$(F$) CALL GETFIL(F$, EXTENSS, EXISTX) IF F$ = " " THEN PRINT " I l l e g a l f i l e s p e c i f i c a t i o n " CALL s l e p ELSEIF NOT EXIST% THEN PRINT " C a n ' t f i n d " ; F$ CALL s l e p GOTO 1 END IF f l s f i l e S = F$ ELSEIF F$ = " " AND FILES = " " THEN GOTO 1 ELSE f l s f i l e S = FROOTS + FILES + " . A F 2 " END IF IF NOT F E X I S T % ( f l s f i l e $ , 18.) THEN PRINT " C a n ' t f i n d " ; f l s f i l e S CALL s l e p GOTO 1 END IF CALL F A C L O D i s p l a y ( f l s f l l e S ) FILESPECS = f l s f i l e S FILES = FRTEXT(FILESPEC$, FROOTS, FEXTS) CASE 4 CLS IF FEXT$ <> " . A F A " THEN FEXTS = " . A F 2 " LOCATE 2, 10 PRINT " C r e a t e D e s c r i p t o r f i l e from a b s t r a c t f a c t o r s . " LOCATE 4 . 1 PRINT " E n t e r name o f .AF2 (AFA) f i l e ( I n c l u d i n g path) INPUT F$ IF FS <> " " THEN EXTENSS - " A F 2 . A F A " F$ = UCASE$(F$) CALL GETFIL(F$, EXTENSS, EXISTX) IF F$ = " " THEN PRINT " I l l e g a l f i l e s p e c i f i c a t i o n " CALL s l e p ELSEIF NOT EXISTX THEN PRINT " C a n ' t f i n d " ; FS CALL s l e p GOTO 1 END IF DESFILES = F$ ELSEIF F$ = " " AND FILES = " " THEN GOTO 1 ELSE DESFILES = FROOTS + FILES + FEXTS END IF CALL NEWOES(OESFILE$, SUCCESS%) IF SUCCESS* THEN FEXTJ = " . D S 2 " FILESPEC$ = FROOT$ + FILES + FEXT$ CASE 5 CLS LOCATE 2 , 10 PRINT "V iew SCATTERGRAM of data f i l e . " LOCATE 4 . 1 PRINT " E n t e r name of f i l e ( I n c l u d i n g ex tens ion ) " ; IF FILES <> " " THEN PRINT "< " ; FILES + FEXT$; "> " ; END IF PRINT " : " COLOR FGROUNO - 8 PRINT "Lega l ex tens ions " ; CHRS(238); " " ; l e g a l e x t S COLOR FGROUND INPUT F$ FS = UCASE$(F$) IF F$ <> " " THEN EXTENS = " . DS1. DES-. DS2. AF2. AFA. SCL" CALL GETFIL(F$, EXTENS, EXISTX) IF F$ = " " THEN PRINT " I l l e g a l f i l e s p e c i f i c a t i o n " CALL s l e p GOTO 1 ELSEIF NOT EXISTX THEN PRINT " C a n ' t f i n d " ; F$ CALL s l e p GOTO 1 END IF FI LESPECS = F$ ELSEIF F$ = " " AND FILES = " " THEN GOTO 1 ELSE FILESPECS = FROOTS + FILES + FEXT$ END IF CALL SCAT(FILESPECS) 9 FILES = FRTEXT$(FILESPECS, FROOTS, FEXT$) 10 CASE 6 CLS LOCATE 2, 1 PRINT " E n t e r d i r e c t o r y to show <"; CURDIRECTORYS; ">" PRINT " ( en te r " ; QS; " . * " ; Q$; " t o show ALL f i l e s ) " INPUT d$ SHELL "DIR " + d$ + " > t e m p f 1 1 . d i r " CLS LOCATE 1, 1 COLOR y e l l o w OPEN " i " , #2, " t e m p f i l . d i r " LINE INPUT #2, a$ PRINT " " ; aS LINE INPUT #2, a$ PRINT " " ; a$ LINE INPUT #2, a$ PRINT " " ; a$ VIEW PRINT 4 TO 24 COLOR FGROUND WHILE NOT E0F(2) 1inenumX = 1 DO LINE INPUT #2, a$ IF RIGHTS(a$, 12) <> "TEMPFIL DIR" THEN IF RIGHT$(" " + d S , 2) ' " . * " THEN PRINT " " ; a$ 1 i nenumX = 11 nenum% + 1 ELSEIF MID$(a$, 15, 3) = "DIR" THEN PRINT " " ; a$ l inenumX = l inenumX + 1 ELSEIF INSTR( lega lex t$+ "REP" , MID$(a$ ,10 ,3 ) ) <> 0 THEN PRINT " " ; aS l inenumX = l inenumX + 1 END IF H END IF 4* 11 LOOP UNTIL 11nenum% = 18 OR EOF(2) LOCATE 24, 5 COLOR y e l l o w + 16, red PRINT " p r e s s any key to c o n t i n u e . . . " ; a$ = 1NPUT$(1) COLOR FGROUND, bground CLS LOCATE 5, 1 WEND CLOSE #2 VIEW PRINT CLS KILL " t e m p f i l . d i r " CASE 7 COLOR 15, 0 CLS ' l ook f o r f i l e ABSCAT.TXT - Th is shou ld con ta in the l a t e s t d i r t on the f i l e fo rma ts . OPEN " a b s c a t . t x t " FOR RANDOM ACCESS READ AS #1 1 = LOF( l ) CLOSE #1 IF 1 <> 0 THEN OPEN " I " . #1, "ABSCAT.TXT" WHILE NOT ( E O F ( l ) ) LINE INPUT #1, a$ PRINT a$ WEND CLOSE #1 ELSE ' i t ' s not t h e r e . P r i n t out the s tandard d i r t PRINT PRINT " F i l e E x t e n s i o n s " PRINT PRINT " .DES - O r i g i n a l D e s c r i p t o r f i l e . Conta ined d e s c r i p t o r s i n ASCII f o rma t . " PRINT 12 PRINT " .DS1 - C u r r e n t D e s c r i p t o r f i l e . BINARY f i l e . Con ta i ns more i n f o r m a t i o n . " PRINT:PRINT " .DS2 - ASCII ana log o f DS1 f i l e . " PRINT:PRINT " . A F A - O r i g i n a l A b s t r a c t F a c t o r A n a l y s i s f i l e . C o n t a i n s s c o r e s and l o a d i n g s " PRINT:PRINT " . A F 2 - Cur ren t format o f AFA f i l e . Has complete i n f o from DS1 f i l e . " PRINT:PRINT " . R E S - Tab le o f I n d i c a t o r f u n c t i o n s f o r each F a c t o r e x t r a c t e d i n AF2 f i l e . " PRINT:PRINT " . S F F - S c a l i n g format f i l e . Ho lds USER-de f ined s p e c i a l s c a l i n g i n f o . " PRINT END IF a$ = INPUT$(1) CASE numchoices% END ' e x i t program - no f a n f a l r - "Dead i s d e a d " . . . Ian Sh iTd t CASE ELSE END SELECT GOTO 1 SUB advsca l (mess#(), varnam$() , F$ , s c a l $ , DELVAR%()) Th is r o u t i n e i s c a l l e d when the user d e c i d e s to p l a y God and mess up the data w i th some w i e r d s c a l i n g . I f t he re Is some knowledge about the data then "use r d e f i n e d " s c a l i n g 1s a good Idea but can be d i s a s t r o u s when a p p l i e d I n d i s c r i m i n a t e l y . So t h a t ' s tha t t hen . The columns are shown on the sc reen and the v i t a l s t a t i s t i c s a re shown o f each v a r i a b l e . These a re s t o r e d i n a r r a y s which need not ever be accessed again so i t i s prudent t o not d e c l a r e t h i s r o u t i n e as STATIC or at l e a s t to erase the a r r a y s be fo re c o n t i n u i n g . One BUG : Yes I am r e p o r t i n g a bug tha t I know to be p resent but c o u l d not f i x . When s e l e c t i n g the mode, p r e s s i n g the arrow keys 19 t imes causes a STRING FORMULA TOO COMPLEX e r r o r on an o the rw ise p e r f e c t l y good command. I vo Ui 13 c o u l d d i s c o v e r no obv ious or s e c r e t reason f o r t h i s except t ha t the e r r o r does not occur when the program 1s opera ted from execu tab le mode. I f e e l t ha t the QUICKBASIC environment 1s h o s t i l e t o people who REALLY c a n ' t make up t h e i r mind. VARIABLES SCVAL# - These two v a r i a b l e s s t o r e va lues f o r each column. They a re the SCDIV* - parameters passed to SCALCOL. I f AUTO s c a l i n g or SHIFT/MULTIPLY s c a l i n g 1s used , then SCVAL* Is the va lue to be s u b t r a c t e d from each element ( to s c a l e the mean to z e r o , f o r example, SCVAL# i s equal t o the column a v e r a g e . ) . SCDIV# i s the va lue tha t each element Is d i v i d e d by. For AUTO s c a l i n g , SCDIV# shou ld be the SQuare Root o f the v a r i a n c e . SCM0D% - The S C a l i n g HODe f o r each p a r t i c u l a r column. The numbers r e f e r to the s t r i n g s s t o r e d In S C ( ) . l - Auto s c a l i n g . 2 - Range s e a l i n g . 3 - User d e f i n e d S h i f t / M u l t i p l y 4 - Remove v a r i a b l e from a n a l y s i s VARNZ# - The v a r i a n c e o f each column o f the m a t r i x . AVE# - The average o f each 'co lumn. COLMINS!- The minimum va lue o f the column. COLMAXS!- The maximum va lue o f the column. CURR0W% - An i n t e g e r r e f e r r i n g to the number o f the row on the sc reen tha t i s c u r r e n t l y s e l e c t e d . (NB. each row cor responds to a column o f the m a t r i x ) SCRNROWS- Th is i s a " c o n s t a n t " to g i v e the number o f v a r i a b l e s tha t can be d i s p l a y e d on the screen a t once. These are d i s p l a y e d s t a r t i n g on 1 ine TOP. TOP - Th is Is the f i r s t l i n e tha t the v a r i a b l e i n fo rma t i on i s d i s p l a y e d on . I f t h i s i s changed. I t must be kept i n mind tha t l i n e s 1-3 a re used f o r t i t l e i n f o rma t i on and l i n e s 23-25 a re used f o r o the r info/ input . [SCRNROWS + TOP < 23] T0PR0W% - Th is i s the In teger co r respond ing to the v a r i a b l e # t ha t i s to appear on the f i r s t l i n e [LOCATE TOP] BOTROWX - Th is i s the v a r i a b l e number tha t appears on the l a s t row on the d e f i n e d " v a r i a b l e s c r e e n " . [TOP + SCRNR0WS%] 14 CURSTAT%- Re fe rs t o the c u r r e n t s t a t i s t i c t ha t Is h i g h l i g h t e d . Th i s w i l l depend on the SCM0D% f o r the c u r r e n t co lumn. SCAL$ - The s t r i n g t h a t r e p r e s e n t s the method o f s c a l i n g . T h i s ge ts saved a t the end o f the SCL f i l e . I f SCALJ - "VIEW MODE" on c a l l i n g then TIMM0DE% i s se t p o s i t i v e . TIMM0DE%- Is TRUE (-1) i f SCAL$ c o n t a i n s the s t r i n g "VIEW MODE" on c a l l i n g . TIMmode i s when peop le l i k e TIM j u s t want t o l ook a t the column s t a t i s t i c s . P r i n t out Is a v a i l a b l e as w e l l as a f u t u r e b i t t o put out a Lotus f i l e . (God knows why) The p e r f e c t i o n i s t w i l l a l s o n o t i c e t h a t MESS* Is passed as a b lank a r r a y w i t h o n l y one d i m e n s i o n . Th i s may not be a c c e p t a b l e t o some COMPUTER GODS but i t works to p r i n t o u t TIM'S s t a t i s t i c s . NUMsams% = U80UND(mess#, 1) NUMVARS* = UBOUND(varnam$) CONST SCRNR0WS% = 18 CO.NST TOP = 4 CONST y e l l o w = 14 CONST bgrnd = 13 CONST h lgh t = 8 CONST n tex t = 11 CURROWX = 1 CURSTAT% = 1 IF s c a l $ = "VIEW MODE" THEN TIMmodeX = -1 s c a l $ = DIM SC(4) AS STRING * 15 SC(1) = "Auto s c a l e SC(2) = "Range s c a l e SC(3) = " S h i f t / M u l t i p l y " SC(4) = " E r a s e / Remove DIM SCVAL#(1 TO NUMVARS%) DIM s c d i v # ( l TO NUMVARS%) DIM SCM0D%(1 TO NUMVARS%) DIM v a r n z * ( l TO NUMVARS%) DIM AVE*(1 TO NUMVARS%) DIM colminsKl TO NUMVARS%) £ DIM c o l m a x s ! ( l TO NUMVARS%) O IF TIMmode% THEN DIM col#(NUMVARSX) END IF p t l $ = " Column it , Mode = pt$= " II | 11.111"" | # # . H i t " " | i t . i t i t " " | tl.mt'" DIM pt2$(4) p t 2 $ ( l ) =• " Sca le mean to :###.####,Use v a r i a b l e ti v a r i a n c e p t2$(2) =• " Range Between Hi Hit (min) and i i i . i i i i (max)" p t2$(3) = " S h i f t mean to :itt.Htl, M u l t i p l y by i i i . i i i i pt2$(4) c u l a t e v a r i a b l e s t a t i s t i c s f o r each column. c% = 1 FOR IX = 1 TO NUMVARS% SCVAL#(IX) = 0 scd iv# ( IX ) • IX SCMODX(IX) = 1 IF TIMmodeX THEN CALL r e a d c o l ( F $ , c o l # ( ) , IX) FOR JX = 1 TO NUMsamsX mess#(JX, 1) = co l#(JX) NEXT ELSE cX = IX END IF varnz#( IX) • VARIANCE#(mess#(), c X , NUMsamsX, AVE#(IX)) c o l m a x s ! ( I X ) = co lmax! (mess#() , c X , 1, NUMsamsX, co lm1ns ! ( IX ) ) NEXT COLOR 11, bgrnd CLS LOCATE 1, 1 PRINT STRING$(80, " * " ) ; LOCATE 2 , 24 PRINT "SCALING : " ; F$ LOCATE 3 , 1 PRINT " V a r i a b l e | Va r iance | Average | Maximum | Minimum IF TIMmodeX THEN 16 PRINT " Name" ELSE PRINT " Mode " ENO IF SOUND 400, 1 ' Wake up Bob TOPROWX = 1 16 COLOR 11 LOCATE TOP, 1 P r i n t out s t a t i s t i c s f o r each co lumn. B0TR0WX = TOPROWX + SCRNROWSX IF B0TR0WX > NUMVARSX THEN B0TR0WX » NUMVARSX FOR IX = TOPROWX TO BOTROWX PRINT USING p t $ ; IX; CSNG(varnz#( IX ) ) ; CSNG(AVE#(IX)) ; c o l m a x s ! ( I X ) ; PRINT c o l m i n s ! ( I X ) ; IF TIMmodeX THEN PRINT varnam$(IX) ELSE PRINT SC(SCMODX(IX)) END IF NEXT IX IF NOT TIMmodeX THEN LOCATE 25 , 1 PRINT " P r e s s L t o l o a d i n saved Format f i l e . P - P r i n t . " ; PRINT " ESC - E x e c u t e " ; SPACE$(16) ; LOCATE 23 . 1 COLOR 0 , bgrnd PRINT " " + varnam$(CURROWX); STRING$(40, " " ) ; ELSE LOCATE 25, 1 PRINT " P r e s s ESC t o r e t u r n . P - P r i n t numbers o u t . F - l o t u s F i l e " ; END IF ' H i g h l i g h t c u r r e n t v a r i a b l e (CURROWX) COLOR y e l l o w , h l g h t LOCATE CURROWX + TOP - TOPROWX, 1 £ PRINT USING p t $ ; CURROWX; CSNG(varnz#(CURROW%)); CSNGI colnaxs!(CURROWX); colmlns!(CURROW%); IF TIMmodeX THEN PRINT varnam$(CURROWX); ELSE PRINT SC(SCMODX(CURROW%)); END IF 212 COLOR y e l l o w , bgrnd IF NOT TIMmode% THEN LOCATE 24, 1 PRINT USING p t l $ ; CURROWX; IF SCMODX(CURROWX) » 4 THEN COLOR 14, n igh t PRINT LEFT$(SC(SCMODX(CURROWX)), 5 ) ; P r i n t out the bottom menu l i n e IF SCMODX(CURROWX) <> 4 THEN LOCATE 24. 27 PRINT USING pt2$(SCM0DX(CURR0WX)); SCVAL#(CURROWX); scdiv#(CURROW%); ELSE COLOR bgrnd, bgrnd PRINT STRING$(54, " " ) ; END IF ' P r i n t out the CURSTATX s e l e c t i o n i n h i g h l i g h t COLOR y e l l o w , h lgh t LOCATE 24, (CURSTATX - 1) * 22 + 22 IF SCMOD%(CURROWX) < 4 THEN SELECT CASE CURSTATX CASE 1 PRINT LEFT$(SC(SCMOD%(CURROWX)), 5 ) ; CASE 2 PRINT USING "###.####"; SCVAL*(CURROWX); CASE 3 IF SCMODX(CURROWX) = 1 THEN PRINT USING "##"; scdiv#(CURROW%); ELSE PRINT USING "###.####"; scdiv#(CURROWX); 17 ): 18 END IF END SELECT END IF END IF the RIGHT key was p ressed the LEFT key was p ressed Get command key and p rocess 1236 CALL ge tkey(a$) IF ASC(a$) = 0 THEN a$ = RIGHT$(a$, 1) 1237 SELECT CASE a$ CASE CHR$(77) CURSTATX = CURSTATX + 1 IF CURSTATX > 3 THEN CURSTATX = 1 GOTO 212 CASE CHR$(75) CURSTATX = CURSTATX - 1 IF CURSTATX = 0 THEN CURSTATX = 3 GOTO 212 CASE CHR$(72) CURROWX = CURROWX - 1 IF CURROWX = 0 THEN CURROWX = NUMVARSX IF CURROWX < TOPROWX THEN TOPROWX » CURROWX IF CURROWX > BOTROWX THEN TOPROWX = CURROWX CASE CHR$(80) CURROWX = CURROWX + 1 IF CURROWX > NUMVARSX THEN CURROWX = 1 IF CURROWX < TOPROWX THEN TOPROWX = CURROWX IF CURROWX > BOTROWX THEN TOPROWX = CURROWX - SCRNROWSX CASE " L " IF TIMmodeX THEN 1236 LOCATE 25, 1 INPUT ; " Inpu t name o f format f i l e to l o a d : " , a$ a$ = UCASE$(a$) CALL GETFIL(a$ , " S F F " , EXISTX) IF a$ = " " THEN 1236 IF NOT EXISTX THEN LOCATE 25, 1 PRINT STRING$(80, " " ) ; the UP key was p ressed SCRNROWSX the Down key was p ressed 03 19 LOCATE 25, 1 PRINT " C a n ' t f i n d " ; a$ ; CALL s l e p GOTO 1236 END IF CALL SCLFORLOD(a$, s c a l $ , SCMODX(), SCVAL#(), scd i v# ( ) ) CASE " P " , " p " GOSUB p r i n o u t CASE " F " , " f " GOSUB f i l o u t CASE CHR$(27) ' ESC key p ressed GOTO 39 CASE CHR$(13) ' RETURN CTRL-M IF TIMmodeX THEN 1236 COLOR y e l l o w , bgrnd LOCATE 25, 1 PRINT SPACE$(79) ; LOCATE 25, 1 SELECT CASE CURSTATX CASE 1 ' Mode s e l e c t o r MOX = SCMODX(CURROWX) LOCATE 25, 1 PRINT " S e l e c t new mode - " ; FOR IX = 1 TO 4 PRINT S C ( I X ) ; NEXT IX LOCATE 25, LEN(SC(1)) * 5 + 5 LOCATE 25, MDX * 15 + 4 COLOR y e l l o w , n igh t PRINT SC(MDX); COLOR y e l l o w , bgrnd a$ = " " n up menu on bottom l i n e f o r Mode s e l e c t i o n CALL ge tkey(a$) IF a$ = CHR$(13) GOTO 61 a$ = RIGHT$(a$. 1) IF a$ = CHR$(75) THEN 'LEFT 20 MDX = MDX - 1 IF MDX = 0 THEN MDX = 4 ELSEIF a$ = CHR$(77) THEN 'RIGHT MDX = MDX + 1 IF MDX = 5 THEN MDX = 1 END IF GOTO 58 IF MDX <> SCM0DX(CURROWX) THEN ' change i n mode SCMODX(CURROWX) = MDX IF MDX = 1 THEN SCVAL#(CURROWX) = 0 scdivf(CURROWX) = CURROWX ELSEIF MOX = 2 THEN SCVAL#(CURROWX) = 0 scdiv#(CURROWX) = 1 ELSEIF MDX = 3 THEN SCVAL*(CURROWX) = 0 scdiv#(CURROWX) = 1 ELSE 'mdX = 4 == Erase v a r i a b l e END IF END IF CASE 2 ' v a l u e s e l e c t o r COLOR y e l l o w , h l g h t PRINT " Change v a l u e to : " ; INPUT ; " " , a$ IF SCMODX(CURROWX) = 2 THEN ' RANGE s c a l i n g : check MIN = MAX IF VAL(a$) = scdiv#(CURROWX) THEN LOCATE 25 , 1 PRINT "The v a l u e s f o r maximum and minimum must b e " ; PRINT " d i f f e r e n t " ; SOUND 37 , 3 GOTO 1236 END IF END IF SCVAL#(CURROWX) = VAL(a$) £ CASE 3 'd iv# s e l e c t o r vo COLOR 14, n i gh t IF SCMODX(CURROWX) = 1 THEN INPUT ; " S c a l e column t o v a r i a n c e o f v a r i a b l e #", dv# = VAL(aS) IF dv# > 0 AND dv# < NUMsamsX THEN scdiv#(CURROW%) ELSEIF SCM0DX(CURROWX) - 2 THEN INPUT ; " S c a l e maximum to =", a$ IF VAL(aS) = SCVAL#(CURROWX) THEN LOCATE 25 , 1 PRINT SPACE$(79); LOCATE 25, 1 SOUND 90 , 1 PRINT "The va lues f o r Maximum and Minimum"; PRINT " cannot be the same"; GOTO 1236 ELSE scdiv#(CURROWX) = VAL(a$) END IF ELSE PRINT " Va lue t o m u l t i p l y by ="; INPUT ; " " , a$ dv# = VAL(a$) IF dv# <> 0 THEN scdiv#(CURROWX) = dv# ELSE LOCATE 25 , 1 COLOR 14, h lgh t PRINT "Use ERASE/REMOVE to e l i m i n a t e v a r i a b l e " ; PRINT " from a n a l y s i s " ; SOUND 100, 1 GOTO 1236 END IF END IF CASE ELSE END SELECT CASE ELSE 22 END SELECT a$ = GOTO 16 'GOSUB tha t p r i n t s out s t a t i s t i c s t o a s t a n d a r d l i n e p r i n t e r p r i nout : LOCATE 25, 1 PRINT STRING$(80, " " ) ; LOCATE 25, 1 PRINT "Se t p r i n t e r t o top o f form and p r e s s any key . ESC a b o r t s . " ; CALL ge tkey (a$) IF a$ = CHR$(27) THEN LOCATE 25 , 1 PRINT STRING$(80, " " ) ; GOTO 16 END IF pt t im$ = " | #«.*»»"" \tf.*m"" | ##.####"" | ##.####"" |" LPRINT " V a r i a b l e s t a t i s t i c s f o r : " ; F$ LPRINT " | V a r f a b l e | V a r i a n c e | Average | Maximum | Minimum | " LPRINT " I + + + — + FOR IX = 1 TO NUMVARSX LPRINT STRING$(7, " " ) + " | " + LEFT$(varnam$(IX) + SPACE$(12) , 1 2 ) ; LPRINT USING pt t1m$; CSNG(varnz#( IX ) ) ; CSNG(AVE#(IX)) ; c o l m a x s ! ( I X ) ; c o l m i n s l ( I X ) NEXT IX LPRINT LPRINT " Th i s da ta packaged on : " ; 0ATE$ RETURN ' GOSUB which p r i n t s s t a t i s t i c s t o a s tanda rd ASCII f i l e f o r use i n LOTUS e g . f i l o u t : Q$ = CHR$(34) c$ = " , " QCS = Q$ + c$ + Q$ OPEN " o " . #1, " S t a t s f l l . v p p " ^ PRINT #1, Q$; " V a r i a b l e s t a t i s t i c s f o r : " ; F$ ; Q$ o PRINT #1, Q$ + " V a r i a b l e " + QC$ + " V a r i a n c e " + QC$; PRINT #1, "Ave rage " + QC$ + "Maximum" + QC$ + "Minimum" FOR 1% = 1 TO NUMVARSX PRINT #1, Q$; varnam$(IX) ; Q$; c $ ; CSNG(varnz#( IX)) ; c $ ; PRINT #1, CSNG(AVE#(IX)); c $ ; c o l m a x s l ( I X ) ; c $ ; c o l m i n s l ( I X ) NEXT IX CLOSE #1 RETURN 'Now we get t o do the s c a l i n g 39 IF TIMmodeX THEN EXIT SUB ' bye T1m ! COLOR 15, 13 FOR IX = 22 TO 25 LOCATE IX, 1 PRINT STRING$(79, " " ) ; NEXT IX LOCATE 22 , 1 PRINT " Enter User-name o f s c a l i n g format : <"; s c a l $ ; " > " ; INPUT a$ IF a$ <> " " THEN s c a l j = LEFT$(a$, 20) LOCATE 22, 2 PRINT s c a l $ ; SPACE$(60); LOCATE 23 , 10 INPUT ; "Save s c a l i n g format ( Y / N ) " ; a$ s a v s c a l $ = LEFT$(UCASE$(a$), 1) IF s a v s c a l j = " Y " THEN LOCATE 23, 2 INPUT ; "Name o f User s c a l i n g format f i l e to save : " , f i $ f i $ = UCASE$( f i$ ) SCFILE$ = FRTEXT( f i $ , f o rmr t$ , formext$) IF SCFILEI » " " THEN s a v s c a l $ = " N " f i $ = fo rmr t$ + SCFILE$ + formext$ LOCATE 23, 2 IF s a v s c a l $ = " Y " THEN PRINT " S a v i n g Sca le format f i l e : " ; f i $ ; SPACE$(25); CALL s c l f o r s a v ( f i $ , s c a l $ , SCMODXO, SCVAL#(), scd i v# ( ) ) END IF 24 FOR IX = 23 TO 25 LOCATE IX, 1 PRINT SPACE$(79) ; NEXT END IF LOCATE 25 , 20 COLOR y e l l o w , bgrnd PRINT "Now S c a l i n g the D a t a " ; FOR IX = 1 TO NUMVARSX SELECT CASE SCMODX(IX) CASE 1 ' au to s c a l i n g v l# = AVE#(IX) dv# = SQR(varnz#(scd iv# ( IX ) ) ) MDX = 3 CALL s c a l e c o l ( m e s s # ( ) . IX, MDX, v l # , dv#) IF SCVAL#(IX) <> 0 THEN v l# = -SCVAL#(IX) dv# = 1 MDX = 3 CALL s c a l e c o l ( m e s s # ( ) , IX, MDX, v l # , dv#) END IF CASE 2 ' range s e a l i n g CALL s e a l e c o l ( m e s s # ( ) , IX , 2 , SCVAL#(IX) , s cd i v# ( IX ) ) CASE 3 MDX = 3 v l# = SCVAL#(IX) dv# = 1 / scd i v# ( IX ) CALL s c a l e c o l ( m e s s # ( ) , IX . MDX, v l # , dv#) CASE ELSE DELVARX(IX) = -1 END SELECT NEXT IX END SUB SUB AFA (FILES) O 25 A Pe te r Wen tze l l r o u t i n e ( from MALINOWSKI) Th is s u b r o u t i n e per forms the a b s t r a c t f a c t o r a n a l y s i s on the a r ray D# The e i g e n v e c t o r s (column ma t r i x ) a re s t o r e d i n EVEC# w i t h the co r respond ing e i g e n v a l u e s i n EVAL# and the row m a t r i x In D# 0#() - Is the a r ray d f r e a l data tha t i s read from FILES EVEC# - 1s the a r ray tha t c o n t a i n s the e i g e n v e c t o r s . ( f i r s t index i n d i c a t e s v e c t o r number) EVAL# - the e igenva lues Load i n f i l e and check f o r s c a l i n g PRINT "Load ing f i l e " CALL r e a d p a r s ( F I L E S , NROWSX, NCOLSX, e x t l % , ext2%) DIM d#(NR0WS%, NC0LS%) DIM varnam$(NCOLS%) DIM time!(NROWS%) DIM cLass(NROWSX) AS RECRD DIM id(NROWSX) AS RECRD DIM EXTRA(NROWSX) AS RECRD DIM EXTEN$(NROWSX, ext2X) DIM EXNAMS(ext2X) DIM EXTSX(ext2X) CALL r e a d v a l s ( F I L E S , varnam$() , d#() , 1d ( ) , c L a s s ( ) , t 1 m e ! ( ) , EXTRA(), EXR$.EXNAM$(), EXTSX() , EXTEN$(), sca ledS) Per form any d e s i r e d s c a l i n g on the data DIM DELVARX(NCOLSX) PRINT STRING$(60, " " ) ; OUTFILES = FRTEXT$(FILES, FR$, FX$) OUTFILES = FR$ + OUTFILES + " . A F 2 " CALL s c a l e r ( d # ( ) , varnam$() , F ILES, s c a l e d S , DELVARX(), OUTFILES) IF varnam$(0) = "EX IT " THEN EXIT SUB CLS LOCATE 3 , 5 LOCATE 3. 15 26 PRINT "Now Per fo rm ing A b s t r a c t F a c t o r A n a l y s i s " LOCATE 5, 2 Remove any v a r i a b l e s from m a t r i x tha t a re f l a g g e d In DELVARX(). The columns must be compressed. Th i s w i l l a l s o dec rease the number o f f a c t o r s removed and speed compu ta t i on . NSHIFTX = 0 s h i f t X = 0 F ind f i r s t v a r i a b l e t o be removed and the number t o be removed. FOR IX = 1 TO NCOLSX IF DELVARX(IX) THEN NSHIFTX = NSHIFTX + 1 IF s h i f t X = 0 THEN s h i f t X = IX END IF NEXT IF NSHIFTX > 0 THEN FOR JX = s h i f t X TO NCOLSX - NSHIFTX DO s h i f t X = s h i f t X + 1 LOOP UNTIL NOT DELVARX(sh i f tX ) FOR KX = 1 TO NROWSX d#(KX, JX) = d#(KX, s h i f t X ) NEXT NEXT END IF NCOLSX = NCOLSX - NSHIFTX C a l c u l a t e c o v a r i a n c e (unnorma l l zed) m a t r i x and s t o r e i n zt DIM z#(NC0LSX, NCOLSX) LOCATE 20, 5 PRINT " C a l c u l a t i n g c o v a r i a n c e m a t r i x FOR IX = 1 TO NCOLSX LOCATE 20, 37 PRINT USING "###"; IX; FOR JX = 1 TO NCOLSX sum# = 0 FOR KX = 1 TO NROWSX 1 / " ; NCOLSX; to o NJ 27 sum# = sum* + d#(K%, 1%) * d* (KX, J%) NEXT z * ( I X , J%) = sum* NEXT NEXT Covar iance m a t r i x c a l c u l a t e d . Now determine e i g e n v e c t o r s and e igenva lues by i t e r a t i o n DIM temp*(NCOLSX) DIM evec*(NCOLSX. NCOLSX) DIM EVAL*(NCOLSX) f a p r o x * = SQR(1* / NC0LS%) FOR 1% = 1 TO NC0LS% FOR JX = 1 TO NC0LS% evec*( I%, J%) « f ap rox * NEXT EVAL*( IX) = 1 NEXT Now loop to i t e r a t e each e i g e n v e c t o r t o s e l f - c o n s i s t e n c y M u l t i p l y e i g e n v e c t o r a p p r o x i m a t i o n by cova r i ance m a t r i x and no rma l i se to get new e i g e n v e c t o r and e i g e n v a l u e . Check e igenva lue f o r s e l f - c o n s i s t n c y LOCATE 22, 1 PRINT SPACE$(79); LOCATE 22, 6 PRINT "Now c a l c u l a t i n g v e c t o r * o f " ; NC0LS%; FOR 1% = 1 TO NCOLSX LOCATE 22, 31 PRINT USING "*#*"; IX; 145 sumsq* * 0 FOR JX = 1 TO NCOLSX temp*(JX) = 0 FOR KX = 1 TO NCOLSX temp*(JX) = temp*(JX) + z#(JX, KX) * evec#(IX, KX) NEXT 28 sumsq* = sumsq* + temp*(JX) * 2 NEXT o l d v a l * = EVAL*(1X) EVAL#(IX) = SQR(sumsq*) FOR JX = 1 TO NCOLSX evec#( IX, JX) = temp#(JX) / EVAL*( IX) NEXT i c n t X = i c n t X + 1 IF i c n t X = 1 GOTO 145 t e s t * = ABS( (EVAL* ( IX ) - o l d v a l * ) / EVAL* ( IX ) ) IF t e s t * > .000001 GOTO 145 S e l f c o n s i s t e n c y a t t a i n e d - now c a l c u l a t e r e s i d u a l m a t r i x t o be used i n de te rmin ing next e i g e n v e c t o r , u n l e s s a l l have been a t t a i n e d . IF IX = NCOLSX GOTO 140 i c n t X = 0 FOR J X = 1 TO NCOLSX FOR KX = 1 TO NCOLSX z#(JX. KX) = z * ( J X , KX) - EVAL*( IX) * e v e c * ( I X , JX) * evec#( IX, KX) NEXT NEXT 140 NEXT Column m a t r i x c a l c u l a t e d , now c a l c u l a t e row m a t r i x by m u l t i p l y i n g o r i g i n a l data m a t r i x by i n v e r s e ( t r a n s p o s e ) o f the column m a t r i x . Row m a t r i x i s s to red i n o r i g i n a l da ta by u t i l i z i n g a temporary s t o rage a r r a y . LOCATE 24, 1 PRINT SPACE$(79) ; LOCATE 24, 5 PRINT " C a l c u l a t i n g s c o r e s m a t r i x . ( / " ; NROWSX; " ) " ; FOR IX = 1 TO NROWSX LOCATE 24, 34 PRINT USING "##*"; IX; FOR JX = 1 TO NCOLSX ^ temp*(JX) = 0 CJ FOR K% = 1 TO NC0LS% temp#(J%) = temp*(JX) + d#(IX, KX) * evec#(J%, K%) NEXT NEXT FOR JX = 1 TO NCOLS% d#(IX, J%) = temp#(JX") NEXT NEXT A b s t r a c t f a c t o r a n a l y s i s comple te , save f i l e . Output r e s u l t s to f i l e . The f i l e i s g i ven the same name(and roo t ) but i s s t o r e d w i t h the ex tens ion .AF2 A f i l e f o r f a c t o r a n a l y s i s would no rma l l y on l y have the ex tens ions .DES; .SCL ; .DS1; .DS2 LOCATE 15, 10 PRINT "Output f i l e - " ; OUTFILES Q$ = CHRS(34) OPEN " o " , #2, OUTFILES PRINT #2, NROWSX, NCOLS* PRINT #2, ext2%, e x t l X IF e x t l % <> 0 THEN PRINT #2, QS + EXR$ + Q$ FOR n% = 1 TO ex t2X PRINT * 2 , Q$ + EXNAMS(nX) + CHRSC&H20 OR EXTSX(n%)) + Q$ NEXT FOR 1% = 1 TO NCOLSX PRINT #2, EVAL#(I%) FOR J% = 1 TO NC0LS% - 1 PRINT #2, e v e c * ( I X , J%), NEXT PRINT #2, evec#(I%, NC0LS%) NEXT FOR 1% = 1 TO NROWSX PRINT #2, Q$ + 1d( IX) .R + Q$; Q$ + cLass( I%) .R + Q$; PRINT #2, t ime ! ( I%) , Q$ + EXTRA(I%).R + Q$; FOR J % = 1 TO ext2% PRINT #2, QS + EXTEN$(IX. JX) + Q$; 29 30 NEXT FOR J % = 1 TO NC0LS% - 1 PRINT #2, d#(I%, J%) , NEXT PRINT #2, d * ( I X , NCOLS%) NEXT PRINT #2, scaledS FIRSX = -1 FOR 1% = 1 TO NC0LS% + NSHIFTX IF NOT DELVARX(IX) THEN IF NOT FIRS% THEN PRINT #2, " , " ; END IF PRINT #2, varnamS ( I X ) ; FIRS% = 0 END IF NEXT PRINT #2, CLOSE #2 P r i n t out r e s u l t s f i l e REPFILES = FRTEXT$(OUTFILE$, a $ , B$ REPFILES = aS + REPFILES + " . R E S " DIM o u t f u n c s ! ( 5 ) DIM 0UTFUNC$(5) REpos% = 0: IEposX = 0: INDposX = 0 CPVposX = 5 VARposX = 4 EoAposX = 2 rmsPOSX = 3 NACposX = 1 sum* = 0 FOR 1% = 1 TO NCOLSX sum* = sum* + EVAL#(IX) NEXT OUTFUNCS(REposX) = " RE " OUTFUNCS(IEposX) = " IE " OUTFUNCS(INDposX) * " IND" OUTFUNCS(CPVposX) = " CPV" OUTFUNCS(VARposX) = " VAR" OUTFUNCS(EoAposX) = " 1 / A V " OUTFUNCS(rmsPOSX) = " RMS" OUTFUNCS(NACposX) = "1/1 " to o 31 OPEN " 0 " , #5, REPFILES PRINT #5, "Summary o f F a c t o r a n a l y s i s " ; DATES PRINT #5. " " PRINT #5, F ILES , " => " : s c a l e d S ; " => " ; OUTFILES PRINT #5, PRINT #5, " N E-VALUE(l ') " ; FOR 1% = 1 TO 5 PRINT #5, OUTFUNCS(IX); " " ; NEXT PRINT #5, PRINT #5, " == ========== " ; FOR IX = 1 TO 5 PRINT #5, "==== " ; NEXT PRINT #5, FOR IX = 1 TO NCOLSX SUM2Y# = 0 FOR JX = IX + 1 TO NCOLSX SUM2Y* = SUM2Y* + EVAL#(JX) NEXT IF IX < NCOLSX THEN ' s k i p over f u n c t i o n s tha t a r e n ' t l e g a l Real E r r o r ou t funcs ! (REposX) » SQR(SUM2Y# / (NROWSX * (NCOLSX - IX ) ) ) imbedded e r r o r o u t f u n c s ! ( I E p o s X ) = SQR(SUM2Y# * 1% / (CDBL(NROWSX) * NCOLSX * (NCOLSX - IX) ) ) i n d i c a t o r f u n c t i o n ou t funcs ! ( INDposX) = ou t funcs ! (REposX) / (NCOLSX - IX) * 2 change i n e i g e n v a l u e (N A c c e l e r a t i o n ) ou t funcs ! (NACposX) = EVAL#(I%) / EVAL#(IX + 1) ELSE ' s e t them to dummy va lues - d o n ' t wor ry , we won' t p r i n t them *8 ou t funcs ! (REposX) = 111.111 o u t f u n c s ! ( I E p o s X ) = 111.111 ou t funcs ! ( INDposX) = 111.111 out funcs! (NACposX) = 111.111 32 END IF Root Mean Square E r r o r ou t funcs l ( rmsPOSX) = SQR(SUM2Y# / (NROWSX * NCOLSX)) v a r i a n c e ou t funcs ! (VARposX) = EVAL#(IX) / sum* CUMULATIVE PERCENT ERROR ou t f uncs ! (CPVposX) = 100 * (sum* - SUM2Y#) / sum* e igenva lue as f r a c t i o n o f average ou t f uncs ! (EoAposX) = EVAL#(IX) * NCOLSX / sum* prntS = " tt tt.tft PRINT #5, USING p r n t S ; IX; EVAL#(IX) ; FOR PX = 1 TO 5 IF o u t f u n c s l ( P X ) = 111.111 THEN PRINT #5, " " ; ELSE IF P% = CPVposX THEN PRINT #5, USING " ###.### " ; o u t f u n c s ! ( P X ) ; ELSE PRINT #5, USING "tt J * ; o u t f u n c s ! ( P X ) ; END IF END IF IF P% = 5 THEN PRINT #5, NEXT NEXT PRINT #5, IF NSHIFTX > 0 THEN IF NSHIFTX = 1 THEN PRINT #5, " V a r i a b l e removed from a n a l y s i s : " ; ELSE PRINT #5, "These v a r i a b l e s removed from a n a l y s i s : " END IF FOR IX = 1 TO NSHIFTX DO c o l % = c o l X + 1 ^ LOOP WHILE DELVARX(colX) = 0 S PRINT #5, varnam$(col%) NEXT END IF CLOSE #5 ERASE varnam$ ERASE EVAL# ERASE evec* ERASE d# ERASE tempi ERASE DELVARX ERASE z# FILES - OUTFILES END SUB FUNCTION co lmax! (MAT#(), c o l X , s t a r t r owX , ENDROWX, COLMIN!) r e t u r n s the maximum va lue o f the column COLX i n ma t r i x MAT* from STARTRO' ' t o ENDROWX. COLMIN! Is re tu rned as the minimum va lue o f the column CONST INIT! = -999 cmax! = INIT COLMIN! = cmax! FOR nrX = s ta r t r owX TO ENDROWX vl# = MAT#(nrX, c o l X ) IF vl# < COLMIN! OR COLMIN! = INIT THEN COLMIN! = vl# IF vl# > cmax! OR cmax! * INIT THEN cmax! = v l# NEXT nrX co lmax! = cmax! END FUNCTION SUB cpvhe lp COLOR 11, 12 CLS LOCATE 1, 20 PRINT "He lp sc reen fo r CPV - Cumulat ive Percent V a r i a n c e " PRINT 34 PRINT " The cumu la t i ve pe rcen t v a r i a n c e i s a measure o f the percent o f the t o t a l " PRINT " v a r i a n c e i n the da ta wh ich i s accounted f o r by the c u r r e n t se t o f p r ima ry " PRINT " e i g e n v e c t o r s . " PRINT PRINT " n PRINT " (N * 1 ) " PRINT " :1 " PRINT " CPV(N) = " PRINT " c PRINT " (N * 1 PRINT " :1 " PRINT " PRINT PRINT " The percen t v a r i a n c e c r i t e r i o n a c c e p t s the set o f l a r g e s t e i g e n v a l u e s " PRINT " r e q u i r e d to account f o r the v a r i a n c e w i t h i n a chosen s p e c i f i c a t i o n . The" PRINT " problem i s t ha t i n o r d e r to use a 98X (eg . ) v a r i a n c e l e v e l , one must be a b l e " PRINT " t o make an a c c u r a t e e s t i m a t e o f the t r u e v a r i a n c e In the d a t a . " CALL s l e p CLS END SUB This r o u t i n e draws a c r o s s a t p o s i t i o n xX .yX on the s c r e e n . It w i l l a l s o erase the c r o s s depending on the s e t t i n g o f CR0SSC0LX ' SQUAREXO i s an a r r a y wh ich a 10x10 square i s s t o r e d . The next c a l l o f t h i s ' r o u t i n e w i l l r e p l a c e the o r i g i n a l p i c t u r e ( the reby e r a s i n g the p r e v i o u s ' c r o s s h a i r s ) . OLDGETX Is used to de termine I f t he re was a p rev ious GET. ' XOLDX, YOLDX c o n t a i n the c o o r d i n a t e s tha t the a r r a y SQUARE r e f e r s t o . SUB CROSSHAIRS (XX, YX, CROSSCOLX) STATIC DIM squareX(45) IF o l dge tX THEN to o PUT (xold%, y o l d X ) . squareX, PSET oldget% = 0 END IF IF CROSSCOLX >= 0 THEN xold% = XX - 4 IF xo ldX < 0 THEN xo ldX '« 0 yo ldX = YX - 5 IF yo ldX < YMINX THEN yo ldX = YMINX GET ( xo ldX . y o ! d X ) - ( x o l d X + 9 , yo ldX + 9 ) . squareX LINE (XX, YX + 4)-(XX, y o l d X ) , CROSSCOLX LINE ( xo ldX , YX) - (XX + 5 , Y X ) , CROSSCOLX o l dge tX » -1 END IF END SUB Th is r o u t i n e i s c a l l e d by SCAT to draw a c i r c l e o f c o l o r COLRX around the p o s i t i o n XX, YX. The r a d i u s ( i n p l o t u n i t s ) i s RAOX The p rev ious image i s con ta ined i n CRCLEX() . Th is r o u t i n e opera tes much the same way tha t CROSSHAIRS ope ra tes . BOXTHEREX i s a boolean va lue tha t s t a t e s whether t he re i s a l r e a d y a box SUB eggbox (XX, YX, RADX, co>rX, c r c l e X O ) STATIC box thereX, xo ldX , yo ldX IF boxthereX THEN IF c o l r X >= 0 THEN PUT ( xo ldX , y o l d X ) , c r c l e X , PSET boxthereX = 0 ELSE boxthereX = 0 END IF END IF IF c o l r X > 0 THEN GET (XX - RADX - 1. YX - RADX - 1 ) - (XX + RADX + 1, YX + RADX + 1 c r c l e X CIRCLE (XX, Y X ) , RADX, 15, . , 1 xo ldX = XX - RADX - 1 yo ldX = YX - RADX - 1 35 36 boxthereX = -1 ELSE ERASE c r c l e X END IF END SUB SUB eoahelp COLOR 11, 12 CLS LOCATE 1, 20 PRINT " H e l p - s c r e e n f o r 1/AV - r a t i o t o average e i g e n v a l u e " PRINT PRINT " Th i s column c o n t a i n s the e i g e n v a l u e s d i v i d e d by the average e i g e n v a l u e f o r " PRINT " the e n t i r e f a c t o r s p a c e . " PRINT PRINT " The average e i g e n v a l u e proposed by K a i s e r i s based upon a c c e p t i n g a l l " PRINT " e i g e n v a l u e s w i t h v a l u e s above the average and r e j e c t i n g those b e l o w . " CALL s l e p CLS END SUB SUB e v a l h e l p COLOR 11. 12 CLS LOCATE 1, 20 PRINT " E i g e n v a l u e h e l p - s c r e e n " PRINT PRINT " The e i g e n v a l u e s a re c a l c u l a t e d as the sum o f the squares o f t h e " PRINT " c o e f f i c i e n t s o f the e i g e n v e c t o r s . T h e r e f o r e , the l a r g e r e i g e n v a l u e s " *o o 37 PRINT " co r respond to the e i g e n v e c t o r s ( f a c t o r s ) tha t are more i m p o r t a n t . " PRINT PRINT " The e i g e n v e c t o r s can be s p l i t i n t o two groups : those tha t account f o r " PRINT " r e a l v a r i a t i o n s i n the data (pr imary e i g e n v e c t o r s ) and those tha t a re due" PRINT " ma in l y t o no ise and exper imenta l e r r o r (secondary e i g e n v e c t o r s ) . " CALL s l e p CLS END SUB SUB f a c d i s h e l p COLOR 11 CLS LOCATE 1, 20 PRINT "He lp sc reen f o r F a c t o r Loadings D i s p l a y " PRINT PRINT " Th i s d i s p l a y shows the l oad ings o f the A b s t r a c t Fac to rs on each o f t h e " PRINT " o r i g i n a l d e s c r i p t o r s . The d e s c r i p t o r names are shown at the bottom o f the " PRINT " sc reen read ing v e r t i c a l l y . Above each d e s c r i p t o r i s p l o t t e d the a b s o l u t e PRINT " va lue o f the no rma l i zed f a c t o r l oad ings f o r the f a c t o r which i s l i s t e d PRINT " at the upper r i g h t o f the s c r e e n . The load ings f o r each f a c t o r a re d i s p l a y e d " . PRINT " i n the same c o l o r ac ross a l l d e s c r i p t o r s and i n the same c o l o r as t h e " PRINT " E igenva lue i n the upper r i g h t . The f a c t o r l oad ings f o r a f a c t o r a re a lways " PRINT " d i s p l a y e d i n the same c o l o r r e g a r d l e s s o f which f a c t o r " PRINT " i s the ' c u r r e n t ' f a c t o r be ing v iewed. Th is i s conven ient when v iew ing a PRINT " s imu l taneous p l o t o f s i x f a c t o r s . Th is can be done by p r e s s i n g " ; CHR$(34); 38 PRINT " I " ; CHR$(34); " . " PRINT " In t h i s d i s p l a y , the l o a d i n g s f o r the s i x f a c t o r s a re a l l d i s p l a y e d a t o n c e . " PRINT " Each f a c t o r i s d i s p l a y e d i n i t s own c o l o r and the l o a d i n g s a re c o n s e c u t i v e " PRINT " from l e f t to r i g h t f o r each d e s c r i p t o r . (NB. the l o a d i n g s f o r the f i r s t " PRINT " f a c t o r a re a lways d i s p l a y e d as b l u e . ) " PRINT " The v e r t i c a l s c a l e i s n o r m a l i z e d (0 t o 1) f o r the c u r r e n t f a c t o r . " PRINT " In RELa t l ve d i s p l a y ( d e f a u l t mode) the f a c t o r l o a d i n g s a re s c a l e d a c c o r d i n g " PRINT " to the r a t i o o f e i g e n v a l u e s ( w i t h the ' c u r r e n t ' e i g e n v a l u e ) . PRINT " " PRINT " Use the l e f t and r i g h t ar row keys to change the c u r r e n t f a c t o r . " PRINT " Use the V key t o t o g g l e between ABSo lu te (no s c a l i n g ) and R E L a t i v e d i s p l a y . " PRINT " Use the 0 key t o p r i n t the f a c t o r l o a d i n g s i n t o a f i l e f o r p r i n t i n g . " LOCATE 25, 25 PRINT " P r e s s Any K e y " ; END SUB SUB FACLODisplay (FILE$) YMINX = 30 YMAX% = 2 0 8 CLS l f t $ = CHR$(0) + CHR$(75) rg t$ = CHR$(0) + CHR$(77) c$ = " V - S c a l e a x i s f o r e i g e n v e c t o r D-D ig1 ta l va lues f o r l o a d i n g s ESC ? " com$ = SPACE$(80) LSET com$ = c$ PRINT STRING$(30, " * " ) ; PRINT " F a c t o r L o a d i n g s " ; PRINT STRING$(30, " * " ) ; PRINT 1 0 PRINT Reading f a c t o r s from : ; FILES co CALL r e a d p a r s ( F I L E $ , NUMsams%, NUMVARSX, e x t l X , ext2X) DIM EVecs!(NUMVARSX, NUMVARSX) DIM EVALS!(NUMVARSX) DIM varnam$(NUMVARSX) CALL r e a d f a c t s ( F I L E $ , NUMVARSX, E V A L S ! ( ) , E V e c s ! ( ) , varnam$()) SCREEN 9 COLOR 14, 0 LOCATE 1, 1 PRINT " F i l e : " ; FILES LINE (12, YMINX)- (12, YMAXX + 1 ) , 15 LINE - ( 6 4 0 , YMAXX + 1 ) . 15 IVANMODEX = -1 P r i n t out v a r i a b l e names c o l r X = 11 TBX = 1 IF NUMVARSX < 39 THEN TBX = 2 IF NUMVARSX < 25 THEN TBX = 3 FOR IX = 1 TO NUMVARSX IF c o l r X = 11 THEN c o l r X = 14 ELSE c o l r X = 11 END IF COLOR c o l r X CALL v e r p r i n t ( v a r n a m $ ( I X ) , 2 + TBX * 1%, 16) NEXT FIRSTVECX = 1 EVSCALEX = -1 P l o t f a c t o r l o a d i n g s COLOR 11 LINE (13, YMINX)-(649, YMAXX), 0, BF IF EVSCALEX THEN CALL v e r p r i n t f ' A B S LOADING", 1, 4) ELSE CALL v e r p r i n t ( " R E L LOADING", 1, 4) END IF 40 LOCATE 25, 1 PRINT com$; IF IVANMODEX THEN l a s t v e c X = FIRSTVECX ELSE l a s t v e c X = FIRSTVECX + 5 IF l a s t v e c X > NUMVARSX THEN l a s t v e c X • NUMVARSX END IF FOR JX = FIRSTVECX TO l a s t v e c X COLOR (JX) MOO 7 + 9 FACSCALEX = (YMAXX - YMINX) FOR IX = 1 TO NUMVARSX XposX = (1 + TBX * IX) * 8 + JX - FIRSTVECX Y d i s X = A B S ( E V e c s ! ( J X , IX) ) * FACSCALEX IF EVSCALEX THEN Y d i s X = Y d i s X * (EVALS! ( JX ) / EVALS!(FIRSTVECX)) END IF IF Y d i s X > 0 THEN LINE (XposX, YMAXX)-(XposX,YMAXX - Y d i s X + 1) NEXT NEXT LOCATE 1, 40 COLOR (FIRSTVECX) MOD 7 + 9 PRINT USING " F a c t o r _### EVAL = # # . # # # ; FIRSTVECX. EVALS!(FIRSTVECX) PRINT ev$ ; 7 CALL ge tkey (a$ ) SELECT CASE a$ CASE CHR$(27) CLS SCREEN 0 EXIT SUB CASE " v " , " V " EVSCALEX = NOT EVSCALEX CASE " I " . " 1 " IVANMODEX = NOT IVANMODEX CASE " 0 " , " o " LOCATE 25, 1 41 PRINT STRING$(80. " " ) ; LOCATE 25 , 1 INPUT ; " E n t e r f i l ename f o r output : " , 0F$ IF 0F$ <> " " THEN OPEN " o " , #1, 0F$ PRINT #1, " F a c t o r l o a d i n g s f o r " ; FILES PRINT #1, STARTX = 1 1101 STEN0% = NUMVARSX IF STENDX - STARTX > 5 THEN STENDX = STARTX + 4 PRINT #1, "VARIABLE " ; FOR IX = STARTX TO STENDX PRINT #1, USING "FACTOR Jtt " ; IX; NEXT PRINT #1, PRINT #1, " - - " ; FOR IX = STARTX TO STENDX PRINT #1, " NEXT PRINT #1, FOR JX = 1 TO NUMVARSX PRINT #1, varnamS(JX); FOR IX = STARTX TO STENDX PRINT#1,USING" tt.ttt;CSNG(EVecs!(IX,JX)); NEXT PRINT #1, NEXT PRINT #1, SPACE$(8); FOR IX = STARTX TO STENDX PRINT #1, " " ; NEXT PRINT #1, PRINT #1, " E - v a l u e : " ; FOR IX = STARTX TO STENDX PRINT #1, USING " tt.ttt; CSNG(EVALS! ( IX) ) ; NEXT PRINT #1, IF STENDX < NUMVARSX THEN STARTX = STENDX + 1 PRINT #1, GOTO 1101 END IF CLOSE #1 END IF CASE " d " , " D " he lpX = 0 SCREEN 9 , , 1 CLS PRINT " V a r i a b l e Load ing on F a c t o r #"; FIRSTVECX; PRINT SPACE$(10) ; " E i g e n v a l u e ="; EVALSS(FIRSTVECX) PRINT " " COLOR 10 STARTX = 1 tabposX = 1 STENDX = NUMVARSX LOCATE 3 IF STENDX - STARTX > 18 THEN STENDX = STARTX + 17 FOR IX = STARTX TO STENDX P$ = "tt) " + varnam$(IX) + " tt.ttt LOCATE . tabposX PRINT USING P$ ; IX, CSNG(EVecs!(FIRSTVECX, IX) ) NEXT IF STENDX < NUMVARSX THEN tabposX = tabposX + 20 STARTX = STENDX + 1 GOTO 9001 END IF LOCATE 25 , 1 PRINT " P r e s s any k e y . . . " ; SCREEN 9 , , , 1 CALL ge tkey (a$) SCREEN 9, , 0, 0 CASE l f t $ FIRSTVECX = FIRSTVECX - 1 IF FIRSTVEC% = 0 THEN FIRSTVEC% = NUMVARSX CASE r g t $ FIRSTVEC% = FIRSTVECX + 1 IF FIRSTVECX > NUMVARSX THEN FIRSTVECX = 1 CASE " ? " IF he lpX THEN SCREEN 9, , 1, 1 ELSE he lpX = -1 SCREEN 9 , , 1. 1 CALL f a c d i s h e l p END IF CALL s l e p SCREEN 9, , 0 , 0 CASE ELSE GOTO 7 END SELECT GOTO 2 END SUB FUNCTION FEXISTX (F$ , LENGTHS) Th i s f u n c t i o n opens F$ f o r random access r e a d . LENGTHX i s re tu rned as the ' l e n g t h o f the f i l e . I f LENGTHS = 0 then F$ i s e rased and 0 i s re tu rned as the f u n c t i o n va lue IF F$ <> " " THEN OPEN F$ FOR RANDOM ACCESS READ WRITE AS #1 LENGTHS = L0F(1) CLOSE #1 IF LENGTHS > 0 THEN FEXISTX - -1 ELSE FEXISTX = 0 KILL F$ END IF END IF END FUNCTION FUNCTION FRTEXT$ ( F I L E S . ROOT$. ex t$ ) This r o u t i n e takes FILES and breaks i t i n t o r o o t , f i l e , e x t e n s i o n The f i l ename i s r e t u rned as the f u n c t i o n v a l u e . I f no roo t or e x t e n s i o n , then a n u l l s t r i n g Is r e t u r n e d . I f no f i l e n a m e , the f u n c t i o n r e t u r n s "NONAMES" as the va lue o f FRTEXTS. The f i l ename Is the f i r s t e i g h t c h a r a t e r s a f t e r the l a s t ' \ ' c h a r a c t e r . A l l spaces i n FILES are removed. LENGTHX = LEN(FILES) F$ = FILES FILES = " " ROOTS = " " FOR aX - 1 TO LENGTHX cS = MID$(F$, aX. 1) SELECT CASE c$ CASE " " THese a re c h a r a c t e r s t ha t a re i gnored - removed from FILES CASE " / " , These c h a r a c t e r s t e r m i n a t e the l e g a l components o f FILES The s t r i n g i s chopped o f f h e r e . EXIT FOR CASE ELSE The rema in ing p o s s i b i l i t i e s a re l e g a l c h a r a c t e r s f o r FILES FILES - FILES + c$ END SELECT NEXT aX ex tS = " " aX = 1 IF LEFTS(FILE$, 2) = THEN aX = 3 aX = INSTR(aX, F I L E S , " . " ) IF aX <> 0 THEN ex t$ = MIDS(FILE$. aX , 4) ROOTS = LEFT$(F ILE$ , aX - 1) ELSE ROOTS = FILES END IF 45 We have the e x t e n s i o n taken ca re o f , now k i c k out the r o o t . LENGTH% = LEN(ROOTS) a% = 0 DO nX = a% + 1 a% - INSTR(n%, ROOTS, " \ " ) LOOP UNTIL aX « 0 NAMES = RIGHT$(R0OT$, LENGTH* - n% + 1) ROOTS = LEFTS(ROOTS, nX - 1) a% = INSTR(NAMES, " : " ) IF a% = 2 THEN ROOTS = LEFTS(NAMES, 2) NAMES = RIGHTS(NAMES, LEN(NAMES) - 2) END IF NAMES = RTRIM$(LEFTS(NAMES + " " , 8) ) IF LEN(nameS) = 0 THEN name$ = " " The above l i n e i s added so tha t a DEFAULT f i l ename can be s u p p l i e d FRTEXTS = NAMES END FUNCTION FUNCTION GETDIRS T h i s f u n c t i o n takes a DIR of the cu r ren t d i r e c t o r y and saves i t as ' DIRFIL.TMP. The f i l e i s then read i n t o A$. I t i s assumed tha t the c u r r e n t d i r e c t o r y appears on the t h i r d l i n e , a f t e r a two-space t a b . I t was seen by DOS 3.30 tha t t h i s i s the c a s e . Other v e r s i o n s o f DOS may g i v e d i f f e r e n t ' f o rma ts . ( Indeed. DOS 4.01 does ! ) SHELL " d i r > d i r f i l . t m p " OPEN " 1 " . #1, " d i r f i l . t m p " LINE INPUT #1, a$ LINE INPUT #1, aS LINE INPUT #1, aS a% = INSTR(aS, " " ) ' two-spaces GETDIR = LTRIM|(RIGHTS(aS, LEN(a$) - a%)) CLOSE #1 KILL " d i r f i l . t m p " END FUNCTION 46 SUB GETFIL ( INF ILS, e x t $ , EXIST%) This r o u t i n e ge ts a f i l e name and checks i t f o r p roper fo rmat . FRTEXTS i s c a l l e d to pa rse the f i l e name In to i t s p a r t s . The ex tens i on 1s checked a / o se t a c c o r d i n g t o the v a l u e o f EXTS on c a l l i n g . ' EXTS on c a l l i n g : - con ta i ns a l i s t o f l e g a l e x t e n s i o n s f o r F IL$ sepe ra ted by p e r i o d s . ' eg . " . D S 1 . 0 E S . S C L " - con ta i ns a s u p p l i e d e x t e n s i o n tha t i s the ONLY l e g a l e x t e n s i o n . ' eg . " R E S " - con ta i ns a s u p p l i e d d e f a u l t e x t e n s i o n AND o t h e r a c c e p t a b l e e x t e n s i o n s . ' eg . " A F 2 . A F A " - i s b lank - meaning not t o check the e x t e n s i o n . NOTE : The d i f f e r e n c e between formats 1 and 3 i s t ha t 1n format 3 , i f none of the o the r l e g a l e x t e n s i o n s i s g i ven ( In the example above - . A F A ) , then the ex tens i on i s taken t o be the f i r s t t h ree c h a r a c t e r s preceded by a In format 1, i f no e x t e n s i o n i s g i v e n , FIL$ Is r e t u r n e d as a b l ank . IF ex tS <> " " THEN IF LEFT$(ex t$ , 1) <> " . " THEN DEFEXTENS = " . " + LEFT$(ex t$ , 3) DEFEXTEN% = -1 ex t$ = RIGHTS(ext$ , LEN(extS) - 3) IF ex tS <> " " THEN EXTENSIONS! = DEFEXTENS + ex tS CHECKEXTEN% = -1 END IF ELSE EXTENSIONS! = e x t $ CHECKEXTEN5C = -1 END IF END IF FS = FRTEXTS(INFIL$, FROOTS, FEXT$) ' CHECK FOR BAD EXTENSION to IF CHECKEXTEN% THEN £ IF INSTR(EXTENSIONS$, FEXT$) = 0 OR FEXT$ = " " THEN IF DEFEXTENX THEN FEXT$ = DEFEXTEN$ ELSE INFIL$ = " " EXISTX = 0 EXIT SUB END IF END IF ELSEIF DEFEXTENX THEN FEXTJ = DEFEXTEN$ END IF INFIL$ = FROOT$ + F$ + FEXT$ EXISTX = FEXISTX(INFIL$, 18.) END SUB SUB getkey (key$) a$ = " " c l e a r keyboard b u f f e r . Th is ensures tha t the program d o e s n ' t run away the u s e r . I t a l s o annoys the pants o f f o f people who are Impat ien t . BWAHAHA WHILE a$ <> " " a$ = INKEY$ WEND WHILE a$ = " " a$ = INKEY$ WEND key$ = a$ END SUB SUB i e h e l p CLS COLOR 11. 12 LOCATE 1. 20 PRINT " IE PRINT 47 PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT PRINT from PRINT " PRINT " 48 The imbedded e r r o r f u n c t i o n can be used to determine the number o f f a c t o r s " i n a data m a t r i x w i t h o u t r e l y i n g upon any es t ima te o f the e r r o r . " 1/2.. (N * E V ) " IE(N) = n : N+l r c * (c - n) where : - N = e i g e n v e c t o r o f i n t e r e s t " c = number o f columns ( e i g e n v e c t o r s ) ' r = number o f rows ( samp les ) " T= e i g e n v a l u e " Imbedded E r r o r " The imbedded e r r o r i s a f u n c t i o n o f the secondary e i g e n v a l u e s . " The IE f u n c t i o n s h o u l d dec rease as we use more p r imary e i g e n v e c t o r s . However," once we have exhaus ted the p r imary s e t , the IE shou ld I n c r e a s e . ( T h i s may be" obscured i f non-random e r r o r s a r e p r e s e n t ) " CALL s l e p CLS END SUB SUB indhe lp COLOR 11, 12 CLS LOCATE 1, 20 PRINT " F a c t o r I n d i c a t o r f u n c t i o n " PRINT PRINT " Ma l inowsk l d i s c o v e r e d an e m p i r i c a l f u n c t i o n which appears to be more PRINT " s e n s i t i v e than the IE f u n c t i o n . " £j PRINT w 49 PRINT " PRINT " IND(N) = RE PRINT " " PRINT " (c - N) 2 " PRINT PRINT " The IND f u n c t i o n reaches a minimum when N equa ls the ' c o r r e c t ' number" PRINT " o f f a c t o r s . " PRINT PRINT " where : - N » e i genvec to r o f i n t e r e s t " RINT " c • number o f columns ( e i g e n v e c t o r s ) " PRINT CALL s l e p . CLS END SUB FUNCTION INSIDEX (X#. Y#, XD#, YD#, YSCALEI, XSCALE#, RAD%) Th i s f u n c t i o n i s used by SCAT t o determine i f the p i x e l at X#,Y# i s w i t h i n RAD% ' number o f p i x e l (determined by YSCALE* and XSCALE#) o f the p i x e l a t XD#.YD# XDST# = ABS(X# - XD#)' YDST# = ABS(Y# - YD*) y d i s t ! = YDST# * YSCALE# x d i s t ! = XDST# * XSCALE* d s t ! = S Q R ( x d i s t ! * 2 + y d i s t ! " 2) IF d s t ! <= RADX THEN REsX - -1 ELSE REsX = 0 END IF INSIDEX = REsX END FUNCTION SUB MARK (XX, YX, c o l X ) STATIC ' T h i s r o u t i n e draws a l i t t l e box at p o s i t i o n x , y w i t h c o l o r COLX PSET (XX. Y X ) , c o l X LINE (XX - 2 , YX - 2 ) - (XX + 2 , YX - 2 ) . c o l X 50 LINE (XX + 2 , YX - 2 ) - ( X X + 2 , YX + 2 ) , c o l X LINE (XX + 2 , YX + 2)-(X% - 2 , YX + 2 ) , c o l X LINE (XX - 2 , YX + 2)- (X% - 2 , YX - 2 ) , c o l X END SUB SUB nachelp COLOR 11, 12 CLS LOCATE 1, 20 PRINT " H e l p - s c r e e n f o r l / T -PRINT PRINT " As f a c t o r s become l e s s s i g n i f i c a n t , the co r respond ing e i g e n v a l u e s (1)" PRINT " dec rease i n v a l u e . By d i v i d i n g the e i g e n v a l u e (1) by the e i g e n v a l u e f o r " PRINT " the next f a c t o r ( T ) , one can get an idea f o r the r e l a t i v e s i g n i f i c a n c e PRINT " o f the i n d i v i d u a l e i g e n v e c t o r s . PRINT PRINT " The va l ues f o r l / T shou ld change i n a p s e u d o - l i n e a r f a s h i o n w h i l e t h e " PRINT " e i g e n v e c t o r s be long to the p r ima ry s e t . The boundary between the pr imary s e t " PRINT " o f e i g e n v e c t o r s and the secondary ( i n s i g n i f i c a n t ) se t shou ld be noted PRINT "by a l a r g e 1/1' va l ue f o l l o w e d by a ve ry low v a l u e . " CALL s l e p CLS END SUB ' Th is s u b r o u t i n e takes an AFA f i l e and c r e a t e s a D E S c r i p t o r f i l e o f the ' a b s t r a c t f a c t o r s . The use r i s asked f o r the number o f f a c t o r s t o be used . SUB NEWDES (AFAFILES, SUCCESSX) PRINT OPEN " i " , #1. AFAFILE$ w ITYPX = INSTR("AFAAF2" , RIGHT$(AFAFILE$. 3 ) ) ^ INPUT #1, nr%, ncX PRINT "The f i l e h a s " ; ncX; " a b s t r a c t f a c t o r s . " 00 LOCATE 8 , 1 PRINT "How many do you w ish to p l a c e i n the new D E S c r i p t o r f i l INPUT n% IF n% > ncX OR nX < 1 THEN PRINT "Lega l v a l u e s ("; 1; " - " ; ncX; " ) " GOTO 6100 END IF PRINT a f a f $ = FRTEXTJ(AFAFILEJ. A r o o t S , aext$) INPUT " What f i l ename Is t o be used as the new DES f i l e " ; F$ F$ = UCASE$(F$) I f the f i r s t c h a r a c t e r 1s a - then the same root i s to be used . IF LEFTS(F$, 1) = " - " THEN F$ = RIGHT$(F$. LEN(F$) - 1) OUTFILES = FRTEXTS(FS, FROOTS, FEXTS) OUTFILES = ArootS + OUTFILES + " . D S 2 " ELSE OUTFILES = FRTEXTS(F$, FROOTS, FEXTS) OUTFILES = FROOTS + OUTFILES + " . D S 2 " END IF OPEN OUTFILES FOR RANDOM ACCESS READ AS #3 EXISTX = L0F(3) > 0 CLOSE #3 IF EXISTX THEN PRINT "That f i l e a l r eady e x i s t s . Overwr i te (Y/N) 7" a$ = INPUTS(l) a$ = UCASES(aS) IF a$ <> " Y " THEN 43 END IF PRINT PRINT "New d e s c r i p t o r f i l e = " ; OUTFILES PRINT Q$ = CHRS(34) DIM row#(ncX) DIM cLass AS RECRD 52 DIM 1d AS RECRD DIM EXTEN AS STRING * 4 DIM EXTRA AS STRING * 4 OPEN " o " . #3, OUTFILES PRINT #3. n r X . nX INPUT #1, e x t 2 X , e x t l X PRINT #3, e x t 2 X , e x t l X IF ITYPX = 4 THEN IF e x t l X <> 0 THEN INPUT #1, EXR$ PRINT #3, Q$ + EXR$ + Q$ END IF FOR NnX = 1 TO ex t2X INPUT #1, EXTENS PRINT #3, Q$ + EXTENS + Q$ NEXT END IF Sk ip over e i g e n v a l u e s and e i g e n v e c t o r s i n Input f i l e . FOR IX = 1 TO ncX * (ncX + 1) INPUT #1, row#(l) NEXT Wr i te out column names to ou tpu t f i l e FOR IX » 1 TO nX WRITE #3, "FTR #" + STR$(IX) NEXT Input e n t i r e row, then p r i n t the f i r s t NX co lumns. FOR IX = 1 TO nrX INPUT #1, i d . R , c L a s s . R , t i m e ! , EXTRA PRINT #3, QS + Id .R + QS; QS + c L a s s . R + QS; t i m e ! , QS + EXTRA + Q$; FOR NnX = 1 TO ex t2X INPUT #1, EXTENS PRINT #3, QS + EXTENS + Q$; NEXT FOR JX = 1 TO ncX INPUT #1, row#(JX) NEXT to FOR JX = 1 TO nX - 1 £ 53 PRINT #3, row#(J%), NEXT J % PRINT #3. row#(nX) NEXT PRINT #3, " T h i s f i l e was produced by ABSCAT.BAS" PRINT #3, "The complete f a c t o r a n a l y s i s f i l e i s : " ; AFAFILE$ PRINT #3, nX; " o f " ; nc%; " f a c t o r s . " PRINT #3, DATE$ IF NOT E0F(1) THEN INPUT #1, s c a l e d $ PRINT #3, s c a l e d $ END IF CLOSE #3 CLOSE #1 SUCCESS* = -1 AFAFILES = OUTFILES LOCATE 24, 1 PRINT " P r e s s Any K e y . . . " ; CALL s l e p END SUB SUB pmrk (XC0RD#, YCORD#) Th i s s u b r o u t i n e assumes the p l o t t e r i s open as f i l e #1 Th is r o u t i n e a l s o assumes tha t the p l o t t e r i s se t up p r o p e r l y w i t h the P l po in t at the o r i g i n o f the graph to be drawn (not n e s c e s s a r i l y the lower l e f t o f the paper) HRK$ i s added t o a l l o w f o r user de f i ned c h a r a c t e r s f o r l a b e l l i n g the p o i n t s d i f f e r e n t l y . An a r ray w i l l e v e n t u a l l y be ass igned to co r respond the sepera te p o i n t s t o t h e i r p l o t c h a r a c t e r . MRK$ = " 9 9 , 4 , 0 . 0 , 8 , - 4 , 0 . 0 , - 8 ; " PRINT #1, " P A " ; XC0RD#; " , " ; YC0RD#; PRINT #1, " U C " + MRKJ END SUB 54 SUB readcol (F$ , co1#( ) , co l%) Th is r o u t i n e reads In one column ( d e s c r i p t o r o f F$) I t i s assumed t ha t the e x t r a r eco rds e t c . a r e taken c a r e o f by a p r e v i o u s ' c a l l t o READPARS w i t h t he a p p r o p r i a t e R0W#, d imens ion ing Parameters . f i l ename i n c l u d i n g e x t e n s i o n a r r a y r e t u r n e d w i t h column v a l u e s o f f i l e column to read i n . FTYPEX = ( I N S T R ( " . D E S . D S 1 . D S 2 . S C L . A F A . A F 2 " , RIGHT$(F$. 4 ) ) / 4) + 1 SELECT CASE FTYPEX CASE 1, 4 ' DES , SCL OPEN " I " , #1, F$ INPUT #1, NUMsamsX, NUMVARSX FOR nX = 1 TO NUMVARSX INPUT #1, varnam$ NEXT FOR NXX = 1 TO NUMsamsX INPUT #1, 1d$ INPUT #1, c L a s s $ FOR NyX * 1 TO c o l X INPUT #1, col#(NXX) NEXT FOR NyX = c o l X + 1 TO NUMVARSX INPUT #1, temp* NEXT NEXT CLOSE #1 CASE 2 ' DS1 OPEN F$ FOR RANDOM ACCESS FIELD #1, 4 AS REC1$ GET #1 tmp$ = LEFT$(REC1$, 2) NUMsamsX = CVI(tmp$) F$ co l#( ) -c o l X READ AS #1 LEN • 4 GET the f i r s t r e c o r d Parse out the f i r s t two by tes NJ Conver t t o number o f s i g n a l s & tmp$ - RIGHT$(REC1$, 2) NUMVARSX * CVI(tmp$) GET #1 ext2X = CVI(LEFT$(REC1$, 2) ) e x t l X = ASC(RIGHT$(REC1J, 1)) AND &H7 IF e x t l X <> 0 THEN GET #1 EXR$ = REC1$ FOR nX = 1 TO ext2X GET #1 GET #1 NEXT Get the d e s c r i p t o r names FOR IX = 1 TO NUMVARSX GET #1 GET #1 NEXT IX S igna l / d e s c r i p t o r i n f o rma t i on FOR IX = 1 TO NUMsamsX GET #1 GET #1 GET #1 GET #1 FOR nX = 1 TO ext2X GET #1 NEXT FOR JX = 1 TO c o l X GET #1 NEXT co l# ( IX ) = CVS(REC1$) FOR JX = c o l X + 1 TO NUMVARSX GET #1 NEXT NEXT CLOSE #1 CASE 3. 3 ' DS2 OPEN " I " , #1. F$ INPUT #1, NUMsamsX, NUMVARSX Parse out the l a s t two by tes Convert to no. o f d e s c r i p t o r s get the e x t r a reco rd s k i p extended reco rd names s i g n a l ID s i g n a l c l a s s t ime o f s i g n a l e x t r a reco rd extended records INPUT #1. ex t2X , e x t l X IF e x t l X <> 0 THEN INPUT #1, EXR$ FOR nX = 1 TO ex t2X INPUT #1, EXNAM$ NEXT FOR nX = 1 TO NUMVARSX INPUT #1, varnam$ NEXT FOR NXX = 1 TO NUMsamsX INPUT #1, id$ INPUT #1, c l s s $ INPUT #1, t i m e ! INPUT #1, EXTRA$ FOR nX = 1 TO ext2% INPUT #1, EXTENS NEXT FOR NyX = 1 TO c o l X INPUT #1. col#(NXX) NEXT FOR NyX = c o l X + 1 TO NUMVARSX INPUT #1, temp* NEXT NEXT CLOSE #1 CASE 5 ' " A F A " OPEN " I " , #1, F$ INPUT #1, NUMsamsX, NUMVARSX INPUT #1. ex t2X , e x t l X * READ past e i g e n v a l u e s and e i g e n v e c t o r s FOR IX = 1 TO NUMVARSX + 1 FOR KX = 1 TO NUMVARSX INPUT #1, temp# NEXT NEXT IX * * * READ i n a l l the v a l u e s h e r e , eh? FOR IX = 1 TO NUMsamsX INPUT #1, id$ INPUT #1, cLass$ INPUT #1, t i m e ! INPUT #1, EXTRAS FOR KX = 1 TO c o l X INPUT #1, col#(I%) NEXT FOR K% • c o l X + 1 TO NUMVARSX INPUT #1, tempi NEXT NEXT CLOSE #1 CASE 6 ' " A F 2 " OPEN " I " , #1, F$ INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex tZX , e x t l X IF e x t l X <> 0 THEN INPUT #1. EXR$ FOR nX « 1 TO ext2X INPUT #1, EXNAMJ NEXT FOR nX = 1 TO NUMVARSX INPUT #1, EIGENVAL! FOR MX » 1 TO NUMVARSX INPUT #1, EIGENVEC! NEXT NEXT FOR NXX = 1 TO NUMsamsX INPUT #1, 1d$ INPUT #1, cLass$ INPUT #1, t ime ! INPUT #1, EXTRAJ FOR nX = 1 TO ext2X INPUT #1, EXTENS. NEXT FOR NyX = 1 TO c o l X INPUT #1, col#(NXX) NEXT FOR NyX = c o l X + 1 TO NUMVARSX 7 58 INPUT #1, temp# NEXT NEXT CLOSE #1 END SELECT END SUB SUB read fac t s (F ILES , NUMFAXX, E V A L S ! ( ) , E V e c s ! { ) , varnam$()) ' Th is s u b r o u t i n e reads In the e i g e n v a l u e s EVALS! ( ) and e i g e n v e c t o r s EVECS!( ) ' from F ILES. The v a r i a b l e names VARNAMSO a re read In from the f i l e . O l d e r f i l e s may not have them. OPEN " I " , #1, FILES INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex t2X , e x t l X IF e x t l X <> 0 THEN INPUT #1, EXRS FOR nX = 1 TO ex t2X INPUT #1, EXNAMS NEXT FOR IX = 1 TO NUMFAXX INPUT #1, EVALS! ( IX ) FOR JX = 1 TO NUMVARSX INPUT #1, E V e c s ! ( I X , JX) NEXT NEXT ' Sk ip the f a c t o r s c o r e s m a t r i x so we can get a t the d e s c r i p t o r names. FOR IX = 1 TO NUMsamsX INPUT #1, 1d$, c L a s s $ , t i m e ! , ex t$ FOR JX = 1 TO ex t2X INPUT #1, ex tn$ NEXT FOR JX = 1 TO NUMVARSX INPUT #1, d e s c r ! NEXT NEXT INPUT #1, sca ledS LINE INPUT #1, DESS £ CLOSE #1 CO 59 aX = 0 FOR 1% = 1 TO NUMVARSX BX = aX + 1 aX = INSTR(aX + 1, DES). " . " ) IF aX = 0 THEN aX « LEN(DES$) + 1 varnamJ(IX) = MID$(DESJ'. BX, aX - BX) NEXT END SUB SUB readpars ( F I L E S . NUMsamsX, NUMVARSX, e x t l X , ext2X) Th is r o u t i n e reads 1n the header In fo rmat ion on the da ta f i l e . ' NUMSAMSX and NUMVARSX are parameters re tu rned by the r o u t i n e . V a r i a b l e s The parameters r e t u rned by t h i s sub rou t i ne a re : NROWS - number o f samples s t o r e d i n f i l e NCOLS - number o f d e s c r i p t o r s f o r each sample EXT1X - va l ue f o r the s t a t u s o f the 4 -byte e x t r a r e c o r d . 0 i f not used. O the rw i se , i t conforms to the conven t ion ment ioned above. 1 » f ou r by te s t r i n g 3 = r e a l number 5 = In teger o f form (OOnn) 7 = long i n t e g e r NEXT2X - Number o f extended r e c o r d s . The data f i l e s have the f o l l o w i n g formats DES Th is f i l e i s s t o r e d as a s e q u e n t i a l ASCII f i l e . Th i s f i l e format was l a t e r upgraded to a l l o w f o r more i n f o con ten t . NROWS, NCOLS Number o f samples, # o f v a r i a b l e s "Name o f v a r i a b l e # l " NCOLS o f these [ 1 t o NCOLS ] 60 Name i s 8 c h a r a c t e r s i n quotes . "Name o f v a r i a b l e #2 ID , C L S S , v a r # l . . . , v a r # N C O L S ID.CLSS a re f o u r - c h a r a c t e r s t r i n g s t h a t c o n t a i n the I d e n t i t y ( Ie number), r e c o r d (not u s e d ) . F o l l o w i n g these on each l i n e a re the NCOLS d e s c r i p t o r s f o r t h a t sample . There a r e NROWS such l i n e s . DS1 Th is f i l e Is s t o r e d as a random a c c e s s b i n a r y f i l e but has more i n f o r m a t i o n s t o r e d . As w e l l as the ID and CLASS o f each sample , the t ime o f a c q u i s i t i o n , and an e x t r a 4 -by te r e c o r d i s s t o r e d w i t h each s i g n a l . These appear on the l i n e a f t e r the ID and CLASS but be fo re the d e s c r i p t o r s . As w e l l , t he re 1s a s t a t u s byte f o r the EXTRA r e c o r d t e l l i n g what type o f v a r i a b l e i t i s and whether or not i t i s b e i n g used . The 4 -by tes a re a lways p r e s e n t . The s t a t u s by te EXSTAT Is |00100xxy| where y « 0 i f r e c o r d i s not u s e d , xx « 00 - f o u r c h a r a c t e r s t r i n g . 1 i f r e c o r d i s used . 01 - r e a l ( s i n g l e p r e c i s i o n ) 10 - I n tege r o f form (OOnn) 11 - l ong I n t e g e r . Note: the 1 i n the t h i r d b i t 1s se t t o one f o r c o m p a t a b l l i t y w i t h p r e v i o u s v e r s i o n s . A l l f i e l d s a re t h e r e f o r e se t up as 4 - b y t e s . NROWS, NCOLS Number o f samples , # o f v a r i a b l e s NEXT.EXTSTAT ] Number o f ex tended r e c o r d s , s t a t u s o f e x t r a r e c o r d (0 i f not u s e d . l i f used) "EXT " [on ly i f EXTSTAT <>0] ] Name o f e x t r a r e c o r d , 3 chars + a space "XN1_ _ X " [on ly i f NEXT >0 -NEXT t i m e s . Name o f ex tended r e c o r d . 3 char,Sj + 4 spaces and s t a t u s byte X . ^ 61 "XN2 _ _ X " "Name o f v a r i a b l e # 1" NCOLS o f these [ 1 t o NCOLS ] Name 1s 8 c h a r a c t e r s , ( i e 2-4byte r eco rds ) "Name o f v a r i a b l e #2 10 CLSS TIME EXTR EXN1 EXN2 NROWS t imes var# l ,var#2 var#NC0LS ID,CLSS,TIME,EXTR are f o u r - b y t e reco rds t ha t c o n t a i n the I d e n t i t y ( l e number), C l a s s , T i m e o f a c q u i s i t i o n , e x t r a r e c o r d . Any extended records a re I nse r t ed a t t h i s p o i n t . F o l l o w i n g these fou r (+NEXT2%) f i e l d s a re the NCOLS v a r i a b l e s f o r t ha t sample. S to red as f o u r - b y t e r e a l s 0S2 Th is f i l e i s s t o r e d as a sequen t i a l ASCII f i l e but has the same format as the OSI f i l e . The two can be i n t e r c o n v e r t e d w i t h no l o s s o f I n f o rma t i on . NROWS, NCOLS Number o f samples, # o f v a r i a b l e s NEXT.EXTSTAT ] Number o f extended r e c o r d s , s t a t u s o f e x t r a reco rd (0 I f not u s e d . l i f used) 'EXT " [on ly i f EXTSTAT <>0] ] Name of e x t r a r e c o r d , 3 chars + a space XN1_ _ X " [on ly i f NEXT >0 -NEXT t imes . Name of extended r e c o r d . 3 chars + 4 spaces and s t a t u s byte X . See note i n XN2 _ _ X " DS1 f i l e f o r use o f s t a t u s by te . "Name o f v a r i a b l e # 1" "Name of v a r i a b l e #2 NCOLS of these [ 1 to NCOLS ] Name i s 8 c h a r a c t e r s In quo tes . " ID " " C L S S " TIME! " E X T R " , Any extended reco rds go here NROWS l i n e s c o n t a i n i n g 62 var# l ,var#2 var#NC0LS ID,CLSS,TIME* ,EXTR a re f o u r - b y t e reco rds t ha t c o n t a i n the I d e n t i t y ( l e number), CLASS, t ime o f the s i g n a l ( r e a l ) . a n d the EXTRa r e c o r d . Any extended r e c o r d s a re then p l a c e d f o l l o w e d by the NCOLS v a r i a b l e s f o r t ha t sample. There a re NROWS such l i n e s - one f o r each sample. AFA.AF2 These f i l e s c o n t a i n the r e s u l t s o f the f a c t o r a n a l y s i s o f a d e s c r i p t o r f i l e . They both have the same format and a re w r i t t e n as s e q u e n t i a l ASCII f i l e s . The e i genva lues and e i g e n v e c t o r s appear f i r s t which i s f o l l o w e d by the load ings m a t r i x i n the same format as a DES f i l e ( ID ,CLASS, f o l l o w e d by the NCOLS l o a d i n g s f o r t ha t v e c t o r ) . The AF2 f i l e a l s o c o n t a i n s TIME, EXTRA record and any extended r e c o r d s so t ha t a DS1 f i l e c r e a t e d from t h i s i s complete . NROWS, NCOLS Number o f samples , # o f v a r i a b l e s NEXT, EXstat% # o f ex tended r e c o r d s , e x t r a r e c o r d s t a t u s NEXT.EXTSTAT ] Number o f extended r e c o r d s , s t a t u s o f e x t r a r e c o r d (0 I f not u s e d . l i f used) "EXT " [on ly i f EXTSTAT <>0] ] Name o f e x t r a r e c o r d . 3 chars + a space "XN1_ _ X " [on ly i f NEXT >0 -NEXT t i m e s . Name o f extended r e c o r d . 3 cha rs + 4 spaces and s t a t u s byte X . See note i n "XN2 _ _ X " DS1 f i l e f o r use o f s t a t u s b y t e . NJ to EVAL( l ) - f i r s t e i g e n v a l u e O 63 EVEC(1 ,1 ) ,EVEC(1 ,2 ) - l i n e a r v e c t o r c o n t a i n i n g f i r s t e - v e c t o r - These two l i n e s a re repeated f o r each e i g e n v e c t o r . (NFAX t imes - * ) " ID " " C L S S " R # l , . . , R#NFAX NROWS l i n e s c o n t a i n i n g ID,CLSS and NCOLS rea l va lues f o r the l o a d i n g s . AF2 a l s o has extended records and t ime . NOTE : The AF2 f i l e , as w e l l as c o n t a i n i n g the e x t r a / extended reco rds a l s o has the added b e n e f i t o f c o n t a i n i n g a t r a i l i n g l i n e c o n t a i n i n g the names o f a l l the d e s c r i p t o r s used from the o r i g i n a l OESc r i p to r f i l e . SELECT CASE RIGHTS(FILE$, 3) CASE "DES" OPEN " I " . #1, FILES INPUT #1, NUMsamsX, NUMVARSX CLOSE #1 e x t l X = 0 ext2X = 0 CASE " A F A " , " A F 2 " OPEN " I " , #1, FILES INPUT #1, NUMsamsX, NUMVARSX INPUT #1. ex t2X, e x t l X CLOSE #1 CASE " D S 1 " OPEN FILES FOR RANDOM ACCESS READ AS #1 LEN = 4 FIELD #1, 4 AS RECS GET #1 ' F i r s t reco rd tmp$ = LEFTS(REC$, 2) ' s p l i t In to h a l f NUMsamsX = CVI(tmpS) NUMVARSX = CVI(RIGHT$(REC$, 2)) GET #1 64 e x t l X = ASC(RIGHT$(RECS, D ) AND &H7 ext2X = CVI(LEFT$(REC$, 2)) ' Conver t e x t r a record CLOSE #1 CASE "DS2" OPEN " I " , #1, FILES INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex t2X , e x t l X CLOSE #1 CASE " S C L " OPEN " I " , #1, FILES INPUT #1, NUMsamsX, NUMVARSX CLOSE #1 e x t l X = 0: ext2X = 0 END SELECT END SUB SUB readvals (F$ , varnamSO, mess#(), 1d() AS RECRD, c L a s s f ) AS RECRD, time!(), EXTRA() AS RECRD, EXR$, EXNAM$(), E X T S X O , EXTENS ( ) , scaledS) This routine is the second half of the read f i le sub program. The routine i s broken in two halves in order for D IMens lon lng in the main module. Parameters . FS - f i l ename i n c l u d i n g e x t e n s i o n VARNAMS - a r r a y r e t u r n i n g names o f d e s c r i p t o r s MESS#() - a r r a y r e t u r n i n g m a t r i x o f da ta - i f MESS* i s passed as a l x l a r r a y then the v a l u e s a re not s t o r e d - t h i s i s u s e f u l f o r read ing In e x t r a parameters o n l y TIME!() - a r r a y r e t u r n i n g t imes o f a c q u i s i t i o n ID() - a r r a y o f 4 - b y t e s t r i n g s o f i d e n t i f i c a t i o n CLASS() - 4 - c h a r a c t e r s t r i n g c o n t a i n g c l a s s d e s i g n a t i o n SCALEDS - s t r i n g c o n t a i n g method o f s c a l i n g i f e x t e n s i o n i s " . S C L " EXR$ - 3 - c h a r a c t e r s t r i n g c o n t a i n i n g name o f e x t r a r e c o r d EXTRASO- a r r a y o f 4 - b y t e s con ta i ng e x t r a r e c o r d s f o r each s i g n a l EXNSO - a r r a y c o n t a i n g 7 -cha r names o f extended reco rds K> EXNTX() - A r r a y c o n t a i n g 1 by te codes f o r t ype o f extended r e c o r d s . Same | _ i conven t i on as f o r EXT1X i n READPARS. EXTEN$()- 2-D a r r a y c o n t a i n i n g 4 -by te extended reco rds f o r each s i g n a l . DIM EXTRA AS RECRD FTYPE% = ( INSTR( " .DES.DS1.DS2.SCL.AFA.AF2" , SELECT CASE FTYPEX CASE 1 , 4 ' OES , SCL OPEN " I " , #1. F$ INPUT #1, NUMsamsX, NUMVARSX FOR nX = 1 TO NUMVARSX INPUT #1, varnam$(nX) NEXT FOR NXX = 1 TO NUMsamsX INPUT #1. 1d(NXX).R INPUT #1, cLass(NXX) .R FOR NyX = 1 TO NUMVARSX IF UBOUNO(mess#, 2) = 1 THEN INPUT #1, temp* ELSE INPUT #1, mess*(NXX, NyX) END IF NEXT NEXT IF FTYPEX = 4 THEN INPUT #1, s c a l e d ! CLOSE #1 CASE 2 ' OSI OPEN F$ FOR RANDOM ACCESS READ AS *1 LEN FIELD #1. 4 AS REC1$ GET #1 tmp$ = LEFT$(REC1J, 2) NUMsamsX = CVI(tmp$) tmp$ » RIGHT$(REC1$, 2) NUMVARSX = CVI(tmp$) GET #1 ext2X = CVI(LEFT$(REC1$, 2) ) e x t l X = ASC(RIGHT$(REC1$, 1)) AND &H7 RIGHT$(F$, 4) ) / 4) + 1 GET the f i r s t r eco rd Parse out the f i r s t two by tes Convert to number o f s i g n a l s Parse out the l a s t two by tes Convert to no. o f d e s c r i p t o r s get the e x t r a reco rd 66 IF e x t l X <> 0 THEN GET *1 EXR$ = REC1$ FOR nX = 1 TO ex t2X GET #1 EXNAM$(nX) = REC1$ GET #1 EXTSX(nX) = ASC(R1GHT$(REC1$, 1) ) AND &H7 EXNAMj(nX) = EXNAM$(nX) + LEFT$(REC1$, 3) NEXT Get the d e s c r i p t o r names FOR IX = 1 TO NUMVARSX GET *1 tmp$ = REC1$ GET #1 varnam$(IX) » tmpj + REC1$ NEXT IX S i g n a l / d e s c r i p t o r I n f o rma t i on FOR IX = 1 TO NUMsamsX GET *1 1d ( IX ) .R = REC1$ GET *1 c L a s s ( I X ) . R = REC1$ GET *1 t i m e ! ( I X ) = CVS(REC1$) GET *1 EXTRA(IX).R » REC1$ FOR nX = 1 TO ex t2X GET #1 EXTEN$(IX, nX) = REC1$ NEXT FOR JX = 1 TO NUMVARSX GET #1 IF UBOUND(mess#, 2) > 1 THEN mess#(IX, JX) = CVS(REC1$) END IF NEXT s i g n a l ID s i g n a l c l a s s t ime o f s i g n a l e x t r a r e c o r d NJ K 1 NEXT CLOSE #1 CASE 3 ' DS2 OPEN " I " , #1, F$ INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex t2X . e x t l X ' IP e x t l X <> 0 THEN INPUT #1, EXR$ FOR nX = 1 TO ext2X INPUT #1, EXNAM$(nX) EXTSX(nX) » ASC(RI6HT$(EXNAM$(nX), 1)) AND &H7 EXNAM$(nX) = LEFT$(EXNAM$(nX), LEN(EXNAM$(nX)) NEXT FOR nX = 1 TO NUMVARSX INPUT #1, varnam$(nX) NEXT FOR NXX = 1 TO NUMsamsX INPUT #1, 1d(NXX).R INPUT #1, cLass(NXX) .R INPUT #1, t ime!(NXX) INPUT #1, EXTRA(NXX).R FOR nX = 1 TO ext2X INPUT #1, EXTEN$(NXX, IX) NEXT FOR NyX = 1 TO NUMVARSX IF UBOUNO(mess#, 2) > 1 THEN INPUT #1, mess#(NXX, NyX) ELSE INPUT #1, temp# END IF NEXT NEXT CLOSE #1 CASE 5 ' " A F A " t OPEN " I " , #1, F$ INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex t2X , e x t l X READ pas t e igenva lues and e i g e n v e c t o r s FOR IX = 1 TO NUMVARSX + 1 FOR KX = 1 TO NUMVARSX INPUT #1, temp# NEXT NEXT IX * READ i n a l l the v a l u e s h e r e , eh? FOR IX = 1 TO NUMsamsX INPUT #1, i d ( I X ) . R INPUT #1, c L a s s ( I X ) . R INPUT #1, t i m e ! ( I X ) INPUT #1, EXTRA.R FOR KX = 1 TO NUMVARSX IF UBOUND(mess#, 2) > 2 THEN INPUT #1, mess#(IX, KX) ELSE INPUT #1, temp# END IF NEXT NEXT CLOSE #1 FOR 1% = 1 TO NUMVARSX varnam$(IX) = " F t r #" + STR$(IX) NEXT CASE 6 ' " A F 2 " OPEN " I " . #1, F$ INPUT #1, NUMsamsX, NUMVARSX INPUT #1, ex t2X , e x t l X IF e x t l X <> 0 THEN INPUT #1, EXR$ FOR nX = 1 TO ext2% INPUT #1, EXNAM$(nX) EXTSX(nX) = ASC(RIGHT$(EXNAM$(nX), 1)) AND &H7 EXNAM$ = RIGHT$(RIGHT$(EXNAM$(nX), LEN(EXNAM$(nX)) SPACE$(7) , 7) NEXT FOR nX = 1 TO NUMVARSX INPUT #1, EIGENVAL! FOR MX = 1 TO NUMVARSX INPUT #1, EIGENVEC! NEXT NEXT FOR NX% = 1 TO NUMsamsX INPUT #1, 1d(NXX).R INPUT #1, cLass (NXX) .R INPUT #1, t ime! (NXX) INPUT #1. EXTRA(NXX).R FOR nX = 1 TO ext2X INPUT #1, EXTEN$(NX%, nX) NEXT FOR NyX = 1 TO NUMVARSX IF UBOUNO(mess#. 2) > 2 THEN INPUT #1, mess#(NXX, NyX) ELSE INPUT #1, temp# END IF NEXT NEXT CLOSE #1 FOR IX = 1 TO NUMVARSX varnam$(IX) » " F t r f" + STR$(IX) NEXT END SELECT END SUB SUB rehe lp COLOR 11, 12 CLS LOCATE 1, 20 PRINT "RE - Real E r r o r (aka Res idua l Standard D e v i a t i o n ) PRINT 1/2., PRINT c ' PRINT " (N * E V ) " PRINT " RE(N) = n : N+l n " PRINT " " PRINT " r * (c - n) " 69 70 PRINT PRINT " The Real E r r o r shou ld be l e s s than the e s t i m a t e d e r r o r i n the data ( e r r o r PRINT " l i m i t s ) when N = the ' c o r r e c t ' number o f f a c t o r s . RE w i l l then decrease w i t h " PRINT " N r e a c h i n g z e r o o n l y i f t h e r e i s no e r r o r i n the d a t a . " CALL s l e p CLS END SUB SUB r e s f i l h e l p 93 SCREEN , , 1, 1 COLOR 11, 12 CLS LOCATE 1, 20 PRINT "He lp window f o r RESULTS f i l e " PRINT hecurX = 1 LOCATE 22, 15 PRINT "Use arrows t o choose . P ress RETURN t o s e l e c t . ' DIM he lp$ (9 ) h e l p $ ( l ) = " E he lp$ (2 ) = " he lp$ (3 ) - " E i g e n v a l u e " he lp$ (4 ) = " he lp$ (5 ) = " VALUE( l ) 1/1 • 1/AV RMS VAR p r imary v e c t o r s he lp$ (6 ) = " he lp$ (7 ) = " he lp$ (8 ) = " he lp$ (9 ) = " COLOR 10, 12 FOR IX = 1 TO LOCATE 7 + CPV RE IE IND Column c o n t a i n i n g the e i g e n v a l u e s E igenva lue N d i v i d e d by E igenva lue N+l " E igenva lue N d i v i d e d by the average Root Mean Square E r r o r " V a r i a n c e o f da ta se t accounted f o r by Cumula t i ve Percen t V a r i a n c e " Real E r r o r " Imbedded E r r o r " F a c t o r I n d i c a t o r F u n c t i o n " 10 PRINT he l p$ ( IX ) to to NEXT 1% COLOR 13, n i gh t LOCATE 7 + hecur%. 10 PRINT LEFT$(he lp$(hecur%), 12) COLOR 10, 12 94 a$ = INKEYS IF a$ = " " GOTO 94 IF a$ = CHR$(13) THEN GOTO 911 IF a$ = CHR$(27) THEN CLS SCREEN , , 0, 0 EXIT SUB END IF IF ASC(a$) <> 0 GOTO 94 a$ = RIGHT$(a$, 1) SELECT CASE a$ CASE CHR$(72) hecur% » hecur% - 1 IF hecur% = 0 THEN hecur% = UBOUND(help$) GOTO 93 CASE CHR$(80) hecur% = hecur% + 1 IF hecur% > UBOUND(helpS) THEN hecur% = 1 GOTO 93 CASE ELSE GOTO 94 END SELECT 911 c a l l a p p r o p r i a t e sub rou t ine SELECT CASE hecur% CASE 1 CALL e v a l h e l p CASE 2 CALL nache lp CASE 3 CALL eoahe lp CASE 4 71 72 CALL rmshelp CASE 5 CALL v a r h e l p CASE 6 CALL cpvhe lp CASE 7 CALL r e h e l p CASE 8 CALL i e h e l p CASE 9 CALL Indhe lp CASE ELSE END SELECT GOTO 93 END SUB SUB rmshelp COLOR 11, 12 CLS LOCATE 1, 20 PRINT "RMS - Root Mean Square E r r o r " PRINT PRINT " c 1 1 PRINT " £ ( T ) PRINT " RMS(N) = n : N + l n PRINT " — PRINT " r * c PRINT PRINT CALL s l e p CLS END SUB SUB s c a l e c o l (MAT#(), c o l % , Mode%, VALUE*, DIV*) STATIC This r o u t i n e per forms the s c a l i n g on the COLX ' th column of MAT* a c c o r d i n g ttf NJ the va lue o f MODE% and ( i f n e s c e s s a r y the) VALUE*. u i 73 MODEX 1 The column i s a u t o s c a l e d such tha t the mean o f the column i s VALUE* and the t o t a l v a r i a n c e equa ls 0IV#. For t r u e a u t o s c a l i n g , the mean shou ld be s c a l e d to 0 and the va lue o f DIV# shou ld be 1. To s c a l e a column such tha t i t has the same v a r i a n c e r e l a t i v e to a d i f f e r e n t co lumn, DIV# shou ld equal the quo t ien t o f the v a r i a n c e s . DIV* » VARIANCEfof o ther column) / VARIANCEfof COLXumn) 2 The column i s range s c a l e d such tha t the column maximum i s VALUE* and the minimum va lue i s DIV#. For range s c a l i n g 0 - 1 , VALUE*=1 and DIV* » 0 . NOTE: i t does not mat te r tha t VALUE* be l a r g e r than DIV* except t ha t the s c a l e d va l ues w i l l a l l be i n v e r t e d about the column mean. I f VALUE* = DIV* then the va lues w i l l a l l be se t to VALUE* Th i s i s how a column i s removed from the c a l c u l a t i o n o f d i s t a n c e . 3 Each element o f the column has VALUE* sub t r ac ted from i t and i s then d i v i d e d by DIV*. Th i s MODE* Is c a l l e d r e c u r s i v e l y from M0DEX=1 and Is Inc luded to a l l o w m u l t i p l i c a t i o n o f a column by a c e r t a i n va lue (DIV* = 1/FACT0R) o r s h i f t i n g a column by a VALUE* (NOTE: DIV* shou ld equal 1 to accomp l i sh a s h i f t and VALUE* shou ld equal 0 f o r a m u l t i p l i c a t i o n o n l y . NSAMS% = UB0UND(MAT#) SELECT CASE Mode% CASE 1 v a r n z * = VARIANCE(MAT*(), c o l X , NSAMS%, average*) v l * = average* - VALUE* dv* = SQR(DIV* * va rnz* ) MD% = 3 CALL s c a l e c o l ( M A T * ( ) , c o l % , MD%, v l# , dv*) CASE 2 cmax! = co lmax ! (MAT* ( ) , co l% , 1, NSAMSX, cmin ! ) v l * = cm in ! dv* = 1 IF cmax! <> cmin ! THEN dv* = (cmax! - cmin ! ) / (DIV* - VALUE*) MDX = 3 CALL s e a l e c o l ( M A T * ( ) , c o l % , MDX, v l * , dv*) v l * = VALUE* * SGN(DIV* - VALUE*) 74 dv* = 1 MDX = 3 CALL s c a l e c o l ( M A T * ( ) , c o l X , MDX, v l * , dv* ) CASE 3 FOR NXX = 1 TO NSAMSX MAT*(NXX, c o l X ) = (MAT*(NXX, c o l X ) - VALUE*) / DIV* NEXT NXX CASE ELSE STOP END SELECT END SUB SUB s c a l e r (mess * ( ) , varnam$() , F$ , s c a l e d $ , DELVARXO, OUTFILES) ' Th is program w i l l s c a l e the m a t r i x mess* a c c o r d i n g t o user i n p u t . The method ' f o r s c a l i n g the rows i s d e s c r i b e d i n the SUBprogram SCaLECOL. SCALEX = 1 sca lmax ! = 1 s c a l m i n ! = 0 COLOR 11, 13 CLS SOUND 330, 1 PRINT STRIN6$(80, " * " ) LOCATE 3 , 25 PRINT "SCALING FOR FACTOR ANALYSIS" LOCATE 5, 20 PRINT "DATA FILE = " ; F$ LOCATE 7, 20 PRINT "OUTPUT FILE = " ; OUTFILES LOCATE 22, 15 PRINT "Use arrows t o choose . P r e s s RETURN to s e l e c t . " ; DIM s c l e $ ( 4 ) s c l e $ ( l ) = "Au to S c a l e " s c l e $ ( 2 ) = "Range S c a l e " s c l e $ ( 3 ) = "No S c a l i n g " to s c l e $ ( 4 ) = " S p e c i a l S c a l i n g F u n c t i o n s " ^ COLOR 10, 13 LOCATE 7. 34 PRINT OUTFILES + STRING$(46 - LEN(Ol!TFILE$), " " ) ; FOR IX = 1 TO 4 LOCATE 7 + 1% * 2 , 15 PRINT LEFT$(STR$(IX) + " ) " + sc le$( I%) + SPACE$(30), 30) NEXT 1% COLOR 13, h lgh t LOCATE 7 + SCALE% * 2 , 15 IF SCALE* > 0 THEN PRINT LEFT$(STR$(SCALEX) + " ) " + sc leS(SCALEX) + SPACE$(30), 30) ELSE LOCATE , 34 PRINT OUTFILE$ END IF COLOR 10, 13 IF SCALE* = 2 THEN LOCATE 11. 55 PRINT "Max = " ; LEFTS(STR$(seal max!) + SPACE$(18), 18) LOCATE 12, 55 PRINT "Min = " ; LEFT$(STR$(sca lmin ! ) + SPACE$(18), 18) ELSE LOCATE 11, 55 : PRINT SPACE$(24); LOCATE 12, 55: PRINT SPACE$(24); END IF CALL ge tkey(a$) IF a$ = CHR$(13) THEN IF SCALE* > 0 THEN GOTO 11 ELSE LOCATE 8, 20 INPUT ; "OUTPUT FILE =", 0F$ FX$ = " A F 2 " CALL GETFIL(0F$, FX$, EXIST*) IF 0F$ = " " THEN GOTO 6 ELSEIF EXIST* THEN PRINT " a l r e a d y e x i s t s . OVERWRITE ( Y / N ) " INPUT B$ IF UCASE$(LEFT$(B$, 1)) <> " Y " THEN GOTO 6 END IF END IF OUTFILES = 0F$ END IF LOCATE 8 . 20 PRINT STRING$(60, " " ) ; GOTO 3 END IF IF a$ = CHR$(27) THEN varnam$(0) = " E X I T " EXIT SUB END IF IF ASC(aS) <> 0 GOTO 4 a$ = RIGHT$(a$, 1) SELECT CASE a$ CASE CHR$(72) SCALE* = SCALE* - 1 IF SCALE* = -1 THEN SCALE* = UBOUND(scle$) GOTO 3 CASE CHR$(80) SCALE* = SCALE* + 1 IF SCALE* > UBOUND(scle$) THEN SCALE* = 1 GOTO 3 CASE CHR$(77) ' t he RIGHT key was p r e s s e d IF SCALE* = 2 THEN COLOR 14, h lgh t LOCATE SCALE* * 2 + 7, 61 PRINT SPC( IO) ; LOCATE SCALE* * 2 + 7, 61 COLOR 14, 13 INPUT num$ sea l max! = VAL(num$) COLOR 14, h lgh t LOCATE SCALE* * 2 + 8 , 61 PRINT SPC( IO) ; LOCATE , 61 COLOR 14, 13 INPUT num$ s c a l m i n ! = VAL(num$) ' END IF GOTO 3 CASE ELSE GOTO 4 END SELECT I t LOCATE 22, 1 PRINT SPACE$(79) LOCATE 22, 14 IF FEXISTX(OUTFILE$, 1&) THEN PRINT OF$; " a l r e a d y e x i s t s . Overwr i te (Y/N) ? " ; CALL ge tkey(a$) IF UCASE$(a$) <> " Y " THEN 5 END IF LOCATE 22, 1 PRINT SPACE$(79) LOCATE 22, 30 IF SCALE* = 4 THEN PRINT " P l e a s e w a i t , c a l c u l a t i n g v a r i a b l e s " ELSE PRINT "Now S c a l i n g " END IF ON SCALE* GOTO 12, 13, 14, 15 The f i r s t c h o i c e i s s e l e c t e d . Th is w i l l be the A u t o s c a l i n g f u n c t i o n f o r a l columns. 12 vl# = 0 dv# = 1 MD* = 1 FOR n c * = 1 TO NUMVARS* CALL sca leco l (mess# { ) , n c * , MD*, v l # , dv#) NEXT n c * sca led$ = "Auto S c a l e d " GOTO 40 The second s c a l i n g f e a t u r e was s e l e c t e d . T h i s w i l l be range s c a l i n g . 13 vl# = sea l max* dv# = s c a l m i n ! MD* = 2 FOR n c * = 1 TO NUMVARSX CALL sca leco1 (mess# ( ) , ncX, MDX, v l # , dv#) NEXT ncX sca led$ = "Range S c a l e d : " + STR$(sca lm1n! ) + " t o " + STR$(scalmax GOTO 40 The t h i r d s c a l i n g f e a t u r e i s Not t o s c a l e . T r i c k y 14 sca led$ = "No S c a l i n g " GOTO 40 Now i t ' s go ing to get i n t e r e s t i n g . We have to supp ly the column s t a t i s t i c s 15 CALL advsca l (mess# ( ) . varnam$() . F$ , s c a l e d $ , DELVARX()) 40 LOCATE 21 , 30 END SUB SUB SCAT (FILE$) ' ********************************************** SCATTER-GRAM By Dav id S i b b a l d U n i v e r s i t y o f B r i t i s h Columbia Labora to ry f o r Automated A n a l y t i c a l Chemis t r y Crea ted May 30 , 1988 ( P a y d a y ! ! ) • ********************************************** ' QuickBASIC v e r s i o n 4 .0 SCATtergram i s des igned t o read i n the r e s u l t s ' from ABFACT and d i s p l a y a p l o t o f f a c t o r v s . ' f a c t o r . P l o t t i n g Is p o s s i b l e on a H P - C o l o r P r o ' (7440) p l o t t e r . 79 CONST INIT# = 99999 CONST xmaxX = 600 CONST YMIN% = 30 CONST YMAXX = 330 CONST c lassmaxX = 25 COLOR 15, 0 CLS PRINT " * * * * * * * * * * * * * * * * * SCATTERGRAM by David S i b b a l d * * * * * * * * * * * * * * * * * * * * * * * * * " PRINT PRINT "Read ing " ; FILES CALL r e a d p a r s ( F I L E S , NROWSX, NCOLSX, e x t l X . ext2%) DIM colnam$(0 TO NCOLSX) colnam$(0) = "Time DIM X c o l # ( l TO NROWSX) DIM y c o l # ( l TO NROWSX) DIM c L a s s ( l TO NROWSX) AS RECRD DIM i d ( l TO NROWSX) AS RECRD DIM t i m e ! ( l TO NROWSX) DIM EXTENS(NROWSX, ext2X) DIM EXTRA(NROWSX) AS RECRD DIM EXNAM$(ext2X) DIM EXTSX(ext2X) DIM row#(NROWSX, 1 ) ' Th is i s a f l a g to the sub rou t i ne not to read i n ROW* CALL r e a d v a l s ( F I L E $ , co lnam$( ) , row#(), i d ( ) , c L a s s ( ) , t i m e ! ( ) , EXTRA() EXR$, EXNAM$(), EXTSX() , EXTEN$(), sca ledS) FLNM$ = FRTEXT$(FILES, FRT$. FEXT$) PETE wants to have the d i f f e r e n t c l a s s e s p l o t t e d as d i f f e r e n t c o l o r s s o . . . Search through each CLASS, every t ime we come on a d i f f e r e n t one, a s s i g n a new c o l o r . CLASSESf) c o n t a i n s a l l the d i f f e r e n t c l a s s e s . COLRSXO c o n t a i n s the l i s t o f p o s s i b l e c o l o r s . COLRMAXX i s the number o f c o l o r s s t o r e d i n C0LRS%(). The p o s i t i o n o f the CLASS i n CLASSESf) i s the key to the c o l o r con ta ined i n C0LRSX() . EG: The c l a s s found i n element 5 o f CLASSES!) w i l l appear as the c o l o r number i n element 5 i n C0LRSX() . 80 DIM CLASSES(0 TO c lassmaxX) AS RECRD DIM CLPNX(classmaxX) c o l r s $ = " 7101114131215 9 1 2 3 5 6" COLRMAXX = L E N ( c o l r s $ ) / 2 - 1 DIM c o l r s X ( 0 TO COLRMAXX) FOR IX = 0 TO COLRMAXX c o l r s X ( I X ) = VAL (MIO$(co l r s$ , ( IX + 1) * 2 - 1. 2) ) NEXT numclassesX = 0 FOR IX = 1 TO NROWSX aX = 0 WHILE aX <= numclassesX AND CLASSES(aX) .R <> c L a s s ( I X ) . R aX = aX + 1 WEND IF aX > numclassesX THEN numclassesX = aX CLASSES(aX).R = c L a s s ( I X ) . R END IF IF numclassesX = c lassmaxX THEN EXIT FOR NEXT IX ' We now have a l i n e a r a r r a y o f CLASSES. For each p o i n t , a l l we need to do ' i s f i g u r e out which o f CLASSES 1t be longs to and then a s s i g n t ha t c o l o r . ' Because t he re may be more c l a s s e s than c o l o r s , the c o l o r a s s i g n e d i s found ' to be the MODulus o f the d i v i s i o n o f the c l a s s number d i v i d e d by COLRMAXX. 1000 SCREEN 0 COLOR 15, 13 CLS PRINT " F I L E = " ; FILES PRINT PRINT " S e l e c t f a c t o r s t o p l o t - ( 0 t o e x i t ) : " COLOR 10 STARTX = 1 tabposX = 5 1001 STENDX = NCOLSX LOCATE 5 IF STEND% - START% > 16 THEN STEND% = START% + 16 FOR 1% = START* TO STEND% LOCATE , tabpos% PRINT 1%; " ) " ; colnamS(IX) NEXT IF STENDX < NCOLSX THEN tabposX = tabposX + 15 STARTX = STENDX + 1 GOTO 1001 END IF COLOR 14 LOCATE 21, 50 PRINT " To see column s t a t i s t i c s , " LOCATE 22, 50 PRINT " e n t e r ' S ' f o r Y a x i s . " COLOR 15 LOCATE 22, 5 PRINT " E n t e r y - a x i s (by number)" ; INPUT ; " : " , a2$ IF UCASE$(LEFT$(a2$, 1)) = " S " THEN c a l c u l a t e column s t u f f here CALL TIMPRINT(row#(), co lnam$( ) , FILES) GOTO 1000 END IF a2% = VAL(a2S) IF a2% = 0 THEN 4000 IF a2% < 1 OR a2X > NCOLSX GOTO 1000 LOCATE 23, 5 PRINT ; " E n t e r x - a x i s " ; IF RIGHTS(FILE$, 3) = " D S l " THEN PRINT " (0 to p l o t v s . t i m e ) " ; INPUT ; " : " , a l X IF a l X < 0 OR a l X > NCOLSX GOTO 1000 IF a l% = 0 THEN ' check i f TIME i n f o a v a i l a b l e IF RIGHTS(FILES, 3) <> " D S l " AND RIGHT$(FILES, 3) <> "DS2" AND RIGHTS(FILE$, 3) <> " A F 2 " THEN GOTO 1000 FOR IX = 1 TO NROWSX Xco l# ( IX) = t ime. ' ( IX) 81 82 NEXT END IF xaxnmS = co lnamS(a lX) yaxnm$ = colnamS(a2X) Now tha t we know which f a c t o r s t o p l o t l e t s t r y to get them on the sc reen F i r s t read them i n . LOCATE 25, 10 COLOR 9 PRINT " P l e a s e wa i t . . . r e a d i n g v a l u e s from f i l e . " ; COLOR 15 IF a l X <> 0 THEN CALL r e a d c o l ( F I L E S , X c o l * ( ) , a l X ) CALL r e a d c o l ( F I L E S , y c o l * ( ) , a2X) That was easy , now f i n d the max and min v a l u e s o f the f a c t o r s . RXSCALEX = 0 RYSCALEX = 0 XSCALE* = 0 YSCALE# = 0 MIN1# = INIT* min2iC = INIT# MAX1* = INIT* max2* = INIT* FOR IX = 1 TO NROWSX IF Xco l# ( IX ) > MAX1* OR MAX1* = INIT* THEN MAX1* = Xco l * ( I%) IF Xco l# ( IX) < MINI* OR MINI* = INIT* THEN MINI* = X c o l * ( I X ) IF a2X > 0 THEN IF y c o l * ( I X ) > max2* OR max2* = INIT* THEN max2# = yco l * ( I%) IF y c o l * ( I X ) < min2* OR min2* = INIT* THEN min2* = yco l# ( IX) END IF NEXT Now t h a t ' s done, f i n d out where on the sc reen the axes go. Then c a l c u l a t e s c a l e f a c t o r and o f f s e t t o p l o t p o i n t s on s c r e e n . X - a x i s f i r s t , h o p e f u l l y , the y - s c a l e w i l l be the same. to IF MAX1* * MINI* >= 0 THEN O IF MAX1# - MINI* = 0 THEN XOFFSETX = xmaxX \ 2 XSCALE* •= 1 ELSEIF MAX1* <= 0 THEN XOFFSETX = xmaxX - 1 XSCALE# = .9 * (xmax% - 2) / (-MIN1#) ELSE XOFFSET% = 2 XSCALE* = .9 * (xmaxX - 2) / (MAX1*) END IF ELSE IF (MAX1# - MIN1# = 0) THEN XOFFSETX = xmaxX / 2 XSCALE* = .4 * (xmaxX - 2) / MAX1# ELSE XOFFSET% = COBL(xmax%) * (-MIN1#) / (MAX1* - MIN1#) XSCALE# = .9 * (xmax% - 2) / (MAX1* - MINI*) END IF END IF 1700 IF a2% = 0 THEN Y0FFSET% = (YMAXX - YMIN%) / 2 YSCALE* = 1 ELSE IF max2* * min2* >= 0 THEN IF max2# <= 0 THEN Y0FFSET% = (YMAX% - YMIN% - 1) YSCALE* = .9 * (YMAXX - YMIN% - 2) / (-m1n2*) ELSE YOFFSETX = 2 YSCALE* = .9 * (YMAXX - YMINX - 2) / (max2#) END IF ELSE YOFFSETX = (YMAXX - YMINX) * (-min2#) / (max2* - min2#) YSCALE* = .9 * (YMAXX - YMINX - 2) / (max2* - min2*) END IF END IF 84 x o f f s e t X and x s c a l e * a r e c o r r e c t , now c a l c u l a t e whether x - s c a l e w i l l do fo r y - s c a l e . i e . check p o s i t i o n o f min2 and max2 to see i f they f a l l on the screen us ing x s c a l e * TYSCALE* = YSCALE* TXSCALE* = XSCALE* IF a2X = 0 GOTO 2000 posmax* = max2* * XSCALE* + YOFFSETX posmin* = min2* * XSCALE* + YOFFSETX IF posmin* > 0 AND posmax* < (YMAXX - YMINX) THEN RYSCALEX = -1 END IF now check i f i t i s p o s s i b l e t o s c a l e x - a x i s w i t h y - s c a l e * . oNLY CHeck i f i t i s NOT p o s s i b l e t o s c a l e Y - a x i s . IF NOT RYSCALEX THEN posmax* = MAX1* * YSCALE* + XOFFSETX posmin* = MINI* * YSCALE* + XOFFSETX IF posmin* > 0 AND posmax* < xmaxX THEN RXSCALEX = -1 END IF END IF XCROSSX = XOFFSETX YCROSSX = YMAXX - YOFFSETX 2000 CLS SCREEN 9, , 0, 0 keycodeX * 0 COLOR 4 LINE (0 , YMAXX - YOFFSETX)-(xmaxX, YMAXX - YOFFSETX) LINE (XOFFSETX, YMAXX)-(XOFFSETX. YMINX) draw t i c k marks on axes IF a2X = 0 GOTO 2500 FOR I* = 0 TO xmaxX STEP xmaxX / 15 LINE (XOFFSETX + I*, YMAXX - YOFFSETX - 1) - (X0FFSETX + I*, YMAXX -YOFFSETX + 1) LINE (XOFFSETX - I*, YMAXX - YOFFSETX - 1) - (X0FFSETX - I*. YMAXX -YOFFSETX + 1 ) N> to 85 NEXT stp# = xmax% / 15 * TYSCALE* / TXSCAIE* IF stp# < 3 THEN stp# = 4 IF stp# > YMAX% - YMINX THEN stp# = YMAX% FOR 1% = 0 TO (YMAXX - YMIN%) STEP stp# LINE (XOFFSETX-l.YMAXX-YOFFSETX-IX)-(XOFFSETX+l,YMAXX - YOFFSET% - 1%) LINE (XOFFSETX-l,YMAXX-YOFFSETX+IX)-(XOFFSETX+l,YMAXX - YOFFSET% + 1%) NEXT 2500 COLOR 11 LOCATE 1. 6 PRINT " S c a t t e r - p l o t o f " ; FLNM$; COLOR 12 LOCATE 1, 38 PRINT " X - A x i s = " ; xaxnm$; " " ; IF a2% <> 0 THEN PRINT " Y - A x i s = " ; yaxnm$ COLOR 7 LOCATE 2 , 11 PRINT " R e l a t i v e s c a l e ( Y - a x i s / X - a x i s ) ="; SC$ = LTRIM$(STR$(TXSCALE# / TYSCALE*)) PO% = INSTR(SC$, " . " ) . LE% = LEN(SC$) IF POX = 0 THEN IF LEX < 6 THEN SCF$ = STRING$(LE% + 1. "#") ELSE SCF$ = "##.## END IF ELSE IF POX < 6 THEN SCF$ = STRING$(POX + 1. "#") IF 6-POX > 0 THEN SCF$ = SCF$ + " . " + STRING$(6 - POX, "#") ELSE ' SCF$ = "##.## END IF END IF PRINT USING SCF$; CSNG(TXSCALE# / TYSCALE#) 86 END IF COLOR 13 FOR IX = 1 TO NROWSX XCORDX = Xco l# ( IX ) * TXSCALE* + XOFFSETX IF a2X <> 0 THEN YCORDX = y c o l * ( I X ) * TYSCALE* + YOFFSETX ELSE YCORDX = YOFFSETX END IF Find c o l o r number f o r p l o t t i n g . aX = 1 DO WHILE aX <= numclassesX IF c L a s s ( I X ) . R <> CLASSES(aX) .R THEN aX = aX + 1 ELSE EXIT DO END IF LOOP IF aX > numclassesX THEN c o l X = 11 ELSE c o l X = c o l r s X ( a X MOD COLRMAXX) END IF CALL MARK(XCORDX, YMAXX - YCORDX, c o l X ) NEXT 2650 LOCATE 25, 1 PRINT STRING$(80. " " ) ; COLOR 12 LOCATE 2 , 1 PRINT USING ; CSNG((-YMINX + YMAXX - YOFFSETX) / TYSCALE*) ; LOCATE 25, 71 PRINT USING "##.## ; CSNG((xmaxX - XOFFSETX) / TXSCALE*) ; LOCATE 25, 1 COLOR 15 PRINT "(N-new axes I - I d e n t i f y E - e x i t O-Output K - L i s t C l a s s e s " ; IF RYSCALEX OR RXSCALEX THEN PRINT " Y - r e s c a l e a x e s " ; M PRINT " ) " ; £ 2100 CALL ge tkey(a$) a$ = UCASE$(a$) IF a$ = CHR$(27) GOTO 1000 IF a$ = " E " GOTO 1000 IF a$ = " I " AND a2X <> 0 GOTO 5000 IF a$ = " N " GOTO 1000 IF a$ = " K " GOTO 6000 IF a$ = " 0 " GOTO 3000 IF a$ = " Y " AND (RYSCALEX OR RXSCALEX) THEN IF RYSCALEX THEN TYSCALE* = YSCALE* + XSCALE* - TYSCALE* ELSE TXSCALE* = XSCALE* + YSCALE* - TXSCALE* END IF GOTO 2000 END IF GOTO 2100 ' Th is s e c t i o n d e a l s w i t h i d e n t i f y i n g i n d i v i d u a l p o i n t s on the graph 5000 LOCATE 25. 1 c r s s c o l o r X = 15 CALL CROSSHAIRS(XCR0SSX, YCROSSX, c r s s c o l o r X ) 5050 LOCATE 25, 1 PRINT "Use c u r s o r keys to move c r o s s . Press r e t u r n to p lace .ESC H e l p . " ; 5100 LOCATE 2 , 56 XD* = (XCROSSX - XOFFSETX) / TXSCALE* YD* = (-YCROSSX - YOFFSETX + YMAXX) / TYSCALE* PRINT USING "X = # * . * * ; CSNG(XD*); PRINT USING " Y = # # . # * ; CSNG(YD#); 5101 CALL getkey(a$) IF a$ = " ? " OR a$ = " / " THEN GOSUB 5102 keycodeX = 0 GOTO 5101 END IF IF LEFT$(a$, 1) = CHR$(0) THEN a$ = RIGHT$(a$, 1) SELECT CASE ASC(a$) CASE 75 ' l e f t IF XCROSSX > 0 THEN XCROSSX « XCROSSX - 1 CASE 77 ' r i g h t IF XCROSSX < xmaxX THEN XCROSSX = XCROSSX + 1 CASE 72 ' up IF YCROSSX > YMINX THEN YCROSSX = YCROSSX - 1 CASE 80 ' down IF YCROSSX < YMAXX THEN YCROSSX - YCROSSX + 1 CASE 71 ' home IF XCROSSX > XMINX + 10 THEN XCROSSX = XCROSSX - 10 END IF CASE 55 ' SHIFT HOME IF XCROSSX > XOFFSETX THEN XCROSSX = XOFFSETX ELSE XCROSSX = XMINX END IF CASE 79 ' end (SHIFT) IF XCROSSX < xmaxX - 10 THEN XCROSSX = XCROSSX + 10 END IF CASE 49 ' SHIFT END IF XCROSSX < XOFFSETX THEN XCROSSX = XOFFSETX ELSE XCROSSX = xmaxX END IF CASE 73 , 56 ' pg-up , SHIFT UP IF YCROSSX > YMINX + 10 THEN YCROSSX = YCROSSX - 10 ELSE YCROSSX = YMINX END IF CASE 8 1 , 50 ' pg -dn , SHIFT DOWN IF YCROSSX < YMAXX - 10 THEN YCROSSX = YCROSSX + 10 ELSE YCROSS% = YMAXX END IF CASE 54 ' SHIFT r i g h t IF XCROSS% < xmax% - 10 THEN XCROSS% = XCROSSX + 10 ELSE XCROSS% = xmaxX END IF CASE 52 ' SHIFT l e f t IF XCROSS% > 10 THEN XCROSS% = XCROSSX - 10 ELSE XCROSS% = 0 END IF CASE 57 ' SHIFT PGUP IF YCR0SS% > YMAX% - YOFFSETX THEN YCROSS% = YMAXX - YOFFSETX ELSE YCROSSX = YMINX + 1 END IF CASE 51 ' SHIFT PGDN IF YCROSSX < YMAXX - YOFFSETX THEN YCROSSX = YMAXX - YOFFSETX ELSE YCROSSX = YMAXX - 1 END IF CASE 13 ' r e t u r n IF XCROSSX > xmaxX - 3 OR XCROSSX < 3 THEN SOUND 45 , 1 GOTO 5101 ELSEIF YCROSSX > YMAXX - 3 OR YCROSSX < YMINX + 3 THEN SOUND 45, 1 GOTO 5101 END IF boxco lo rX = 7 rad i usX = 2 89 90 1199 LOCATE 25, 1 PRINT "Use c u r s o r keys to s e l e c t area.ENTER to s e l e c t . ? - H e l p . E S C -e x l t . " ; DIM c r c l eX (maxd im ! ) 1200 CALL eggbox(XCROSSX, YCROSSX, r a d i u s X , b o x c o l o r X , c r c l e X ( J ) >201 CALL ge tkey(B$) IF B$ = " ? " OR 8$ = " / " THEN keycodeX = 0 GOSUB 5203 GOTO 5201 END IF IF LEFT$(B$, 1) = CHR$(0) THEN B$ = RIGHT$(B$, 1) SELECT CASE ASC(B$) CASE 75, 72 ' l e f t , u p GDX = 0 IF (XCROSSX - r a d i u s X > 1) THEN IF (YCROSSX - r a d i u s X > YMINX + 1) THEN IF (YCROSSX + r a d i u s X < YMAXX - 1) THEN r a d i u s X = r a d i u s X + 1 GDX = -1 END IF END IF END IF IF NOT GDX THEN SOUND 60, 1 GOTO 5201 END IF CASE 77, 80 ' r i g h t , down IF r a d i u s X > 2 THEN r a d i u s X = r a d i u s X - 1 ELSE SOUND 60, 1 GOTO 5201 END IF CASE 71 ' home r a d i u s X = 6 CASE 79 ' end CASE 73, 56, 52 ' p g - u p , s h i f t u p , s h i f t l e f t GD* = 0 IF (XCROSS% - r a d i u s * > 10) THEN IF (YCR0SS% - r a d i u s * > YMIN% + 10) THEN IF (YCR0SS% + r a d i u s * < YMAX% - 10) THEN r a d i u s * = r a d i u s * + 10 GD* = -1 END IF END IF END IF IF NOT GD* THEN SOUND 60, 1 GOTO 5201 END IF CASE 8 1 , 50, 54 ' p g - d n , s h i f t r i g h t , s h i f t down IF r a d i u s * > 11 THEN r a d i u s * = r a d i u s * - 10 ELSE r a d i u s * = 2 END IF CASE 13 ' r e t u r n GOTO Imems CASE 27 ' ESC CALL eggbox(XCROSS*, YCROSS*, RAD*, 0, c r c l e * ( ) ) 'ERASE c r c l e * GOTO 5050 CASE ELSE GOTO 5201 END SELECT GOTO 5200 CASE 27 ' ESC CALL CROSSHAIRS(XCROSS*, YCROSS*, -1) GOTO 2650 CASE ELSE END SELECT CALL CROSSHAIRS(XCROSS*, YCROSS*, c r s s c o l o r * ) 91 GOTO 5100 92 5102 ' P r i n t out f u n c t i o n s o f c u r s o r keys d u r i n g ID o p e r a t i o n SCREEN 9 , , 1, 1 COLOR 11, 0 CLS PRINT " Moving c r o s s " PRINT PRINT " KEY PRINT " PRINT " Up, Down PRINT " L e f t , R igh t PRINT PRINT " PgUp, PgDn PRINT " Home, End PRINT " SHIFTed arrows PRINT PRINT " SHIFT Home PRINT " SHIFT End PRINT PRINT " SHIFT PgUp PRINT " SHIFT PgDn PRINT PRINT " RETURN PRINT PRINT " ESCape PRINT PRINT " ? CALL s l e p COLOR 15 CLS SCREEN 9, , 0, 0 RETURN FUNCTION Moves c u r s o r " by one p i x e l " Moves c u r s o r u p / d o w n / l e f t / r i g h t " by ten p i x e l s Moves c u r s o r t o l e f t / r i g h t s i d e o f s c r e e n " o r t o a x i s " Moves c u r s o r t o t op /bo t tom o f s c r e e n " o r t o a x i s " Opens c i r c l e f o r d e f i n i n g a r e a " E x i t s to p r e v i o u s menu" P r i n t t h i s s c r e e n " 5203 P r i n t out f u n c t i o n s o f c u r s o r keys d u r i n g ID c i r c l e o p e r a t i o n SCREEN 9, , 1, 1 CO Ul COLOR 11 CLS PRINT " PRINT PRINT " KEY PRINT " PRINT " Up. L e f t PRINT PRINT " R i g h t , Down PRINT PRINT " Home PRINT PRINT " SHIFT Up PRINT " SHIFT L e f t PRINT " PgUp PRINT PRINT " SHIFT Down PRINT " SHIFT R igh t PRINT " PgDn" PRINT PRINT " RETURN PRINT PRINT " ESCape PRINT PRINT " ? CALL s l e p COLOR 15 CLS SCREEN 9 , , 0, 0 RETURN D e f i n i n g A r e a " FUNCTION" Expand c i r c l e by one p i x e l u n i t " Con t rac t c i r c l e by one p i x e l u n i t " Reset c i r c l e to d e f a u l t (6 p i x e l rad i Expand c i r c l e by ten p i x e l u n i t s " Con t rac t c i r c l e by ten p i x e l u n i t s " I D e n t i f i e s p o i n t s i n c i r c l e " E x i t s to p rev ious menu" P r i n t t h i s s c r e e n " D i s p l a y c o l o r codes f o r d i f f e r e n t c l a s s e s 6000 IF keycode% THEN SCREEN 9 , , 1, 1 ELSE SCREEN 9 , , 1, 0 CLS 93 94 PRINT "CLASS - COLOR #" PRINT "===== =======" PRINT SCREEN 9 , , 1, 1 STARTX = 1 tabposX = 2 2001 STENDX = numclassesX LOCATE 3 IF STENDX - STARTX > 18 THEN STENDX = STARTX + 18 FOR IX » STARTX TO STENDX LOCATE , tabposX COLOR c o l r s X ( I X MOD COLRMAXX) PRINT CLASSES( IX ) .R ; " " ; c o l r s X ( I X MOD COLRMAXX) NEXT IF STENDX < numclassesX THEN tabposX = tabposX + 18 STARTX = STENDX + 1 GOTO 2001 END IF IF numclassesX = 25 THEN COLOR 11 LOCATE , tabposX PRINT "OTHER"; " " ; 11 END IF LOCATE 25, 1 COLOR 13 PRINT "PRESS ANY KEY TO CONTINUE"; END IF 6001 a$ = INPUT$(1) keycodeX = -1 SCREEN 9 , , 0, 0 GOTO 2100 lmems: ' Th is r o u t i n e p r i n t s out the members t ha t a re c o n t a i n e d i n the c i r c l e ' de f i ned by XX, YX, and RADX. Th i s i n v o l v e s a t r i c k y f u n c t i o n tha t shou ld ^ ' on ly be at tempted by a p r o f e s s i o n a l . DO NOT t r y t h i s a t home. (Yes, d e a r . ) w 95 ' SCREEN 9 page two Is used f o r d i s p l a y t o a l l o w concur ren t v iew ing o f the elements and t h e i r IO ' s keycode% = 0 LOCATE 25, 1 PRINT " p r e s s any k e y . . . " ; ' SPACE$(54); SCREEN 9 , , 1, 0 CLS PRINT " I d e n t i f i c a t i o n o f p o i n t s cen te red about " ; COLOR 12 PRINT RIGHT$(STRING$(8, " " ) + colnam$(al%), 8 ) ; COLOR 15 PRINT USING " = ! * . # # * * ; CSNG(XD#); PLUSMINUS$ = CHR$(241) + "##.### PRINT USING PLUSMINUSS; CSNG(radius% / TXSCALE! ) ; LOCATE 2 , 41 COLOR 12 PRINT RIGHT$(STRING$(8, " " ) + colnam$(a2%), 8 ) ; COLOR 15 PRINT USING " = t t . t m ; CSNG(YDI); PRINT USING PLUSMINUSS; CSNG(radius% / TYSCALE! ) ; LOCATE 3 , 1 PRINT " ID CLASS X - v a l u e Y - v a l u e D is tance from c r o s s h a i r s " LINE (0 , 4 1 ) - ( 6 4 0 , 41) LOCATE 24, 1 PRINT "Space bar to see next page, B f o r p rev ious page, P to see p l o t , LOCATE 25, 1 PRINT " F t o c r e a t e l i s t f i l e , C to change c l a s s i f i c a t i o n , ESC to e x i t . ' ' F ind a l l members tha t be long , p r i n t them out i n batches o f 18, a l l o w f o r scann ing up and down through them PgUP, PgDn and f l i p p i n g back and f o r t h between the two s c r e e n s . NMEMS% = 0 1% = 1 backend% = 1 5250 LOCATE 4, 1 b a c k s t a r t X = backend% 96 backend% = 1% NUMMEMS% = 0 WHILE NUMMEMS5C < 19 AND 1% <= NR0WS5C XI = Xcol#(I%) Y# = ycol#(I%) IF INSIDE%(XI. Y#, XD#, YD* . TYSCALE! , TXSCALE! , radius%) THEN NUMMEMS% = NUMMEMS% + 1 PRINT i d ( I%) .R ; " " ; c L a s s ( I X ) . R ; PRINT USING " I I . I ! * ! ; CSNG(XI ) ; PRINT USING " # # . » § § # ; CSNG(YI ) ; d e l t a $ = CHR$(127) PRINT USING d e l t a $ + " X = * l . * * l ; CSNG(ABS(XD* - X ! ) ) ; PRINT USING d e l t a $ + "l=*1 M * ; CSNG(ABS(YD* - Y * ) ) END IF 1% = 1% + 1 WEND NMEMS% = NMEMS% + NUMMEMS% IF NUMMEMS% < 1 THEN SCREEN 9 . , 0 , 0 SOUND 45, 2 LOCATE 25 , 1 PRINT "No p o i n t s s e l e c t e d . . . P ress any k e y . . . " ; CALL ge tkey (a$ ) GOTO 5199 END IF SCREEN 9 , , 1, 1 c l e a r sc reen below l a s t p r i n t e d l i n e (but above command l i n e s ) LINE (0 , (3 + NUMMEMS%) * 14 ) - ( 639 , 3 2 0 ) , 0, BF get some input here about whether t o go to 5250 and what to change i% t o . We need to keep t r a c k o f the f i r s t member o f each se t o f 18 so tha t we can s t a r t a t the r i g h t p l a c e q u i c k l y , Ie w i thou t scann ing th rough the damn t h i n g each t ime . Going forward i s no prob lem. Perhaps i t i s best to a l l o w go ing forward o n l y . I t i s no prob lem to a l l o w scann ing backwards each t ime by one page o n l y . 5260 LOCATE 25, 67 COLOR 13 PRINT CHRS(2) ; «J COLOR 15 5261 CALL ge tkey(a$) LOCATE 25, 67 PRINT " " ; SELECT CASE UCASE$(a$) CASE " " IF 1% >= NROWS% THEN 1% = 1 GOTO 5250 CASE "B" 1% = backs ta r t% NMEMS% = NMEMS% - NUMMEMSX GOTO 5250 CASE " C " LOCATE 24, 1 PRINT SPACE$(79) ; LOCATE 25 , 1 PRINT SPACE$(79) ; LOCATE 24, 1 PRINT " E n t e r new c l a s s i f i c a t i o n f o r ALL p o i n t s i d e n t i f i e d . LOCATE 25, 1 INPUT ; B$ IF B$ <> " " THEN c lasschange% = -1 B$ = LEFT$(B$ + " " , 4) EXIST% = 0 aX = 1 WHILE NOT EXIST% AND a% <= numclassesX EXIST% = (B$ = CLASSES(aX).R) aX = a% + 1 WEND IF NOT EXIST% THEN numclassesX = numclassesX + 1 CLASSES(numclassesX).R = B$ END IF aX = 1 WHILE BS <> CLASSES(aX).R AND a% <= numclassesX aX = aX + 1 WEND IF aX > numclassesX THEN c o l X = 11 ELSE c o l X = c o l r s X ( a X MOD COLRMAXX) END IF SCREEN 9 , , 0, 1 FOR i k X - 1 TO NROWSX X# = Xco l#(1kX) Y# = yco l# (1kX) IF INSIDEX(X#, Y#. XD#, YD#, TYSCALE#,TXSCALE#.radiusX) THEN c L a s s ( i k X ) . R = B$ XX = X# * TXSCALE* + XOFFSETX YX = -Y# * TYSCALEI - YOFFSETX + YMAXX CALL MARK(XX, YX, c o l X ) END IF NEXT i k X SCREEN 9 , , 1, 1 END IF GOTO lmems CASE " F " LOCATE 25 , 1 PRINT " E n t e r f i l e n a m e t o c r e a t e (ENTER to a b o r t ) " ; INPUT ; " " , LIST$ IF LISTS = " " THEN 5262 LISTS = UCASES(LISTS) OPEN " o " , #5, LISTS PRINT #5, "E lements o f : " ; FILES PRINT #5, "X a x i s = " ; c o l n a m $ ( a l X ) , "Y a x i s = " ; colnam$(a2X) PRINT #5, USING "X r a d i u s = # # . # # # ; XD#; PRINT #5, USING PLUSMINUSS; r a d i u s X / TXSCALE* PRINT #5, USING " y r a d i u s = # # . # # # ; YD#; PRINT #5, USING PLUSMINUSS; r a d i u s X / TYSCALE* PRINT #5, " ID CLASS X - v a l u e Y - v a l u e D i s t a n c e from c r o s s h a i r s " PRINT #5. " J % = 1 backendX = 1 WHILE JX <= NSAMSX X# «= X c o l * ( J X ) Y* * y c o l * ( J X ) IF INSIOEX(X*, Y#, XD#, YD* , TYSCALE*, TXSCALE*. radius%) PRINT 1d(J%).R; " " ; c L a s s ( J X ) . R ; PRINT * 5 , USING " * * . * * * ; X * ; PRINT * 5 , USING " * * * * * ; Y * ; PRINT * 5 , USING CHR$(127) + " X = # * . * # * ; XO* - X#; PRINT * 5 , USING CHR$(127) + " Y = # # . # * * ; YD* - Y* END IF JX = JX + 1 WEND CLOSE * 5 5262 LOCATE 25. 1 PRINT"F to c r e a t e l i s t f i l e , C to change c l a s s , E S C to e x i t . CASE CHR$(27) SCREEN 9 . . 1, 0 CLS SCREEN 9 , , 0, 0 GOTO 5299 CASE " P " SCREEN 9 , , 1, 0 a$ = INPUT$(1) SCREEN 9 , , 1, 1 CASE ELSE END SELECT GOTO 5260 5299 CALL CROSSHAIRS(XCROSS%, YCROSS%, -1) CALL eggbox(XCROSS%, YCROSSX, RADX, - 1 , c r c le%( ) ) 'ERASE c r c l e X GOTO 2000 Th is s e c t i o n d e a l s w i t h output 3000 LOCATE 25. 1 PRINT STRING$(80, " " ) ; 99 100 LOCATE 25, 1 COLOR 11 PRINT " ( P - HP C o l o r P r o p l o t t e r S - SIGHAPLOT f i l e H- p r i n t e r ) " ; a$ = INKEY$ a$ = " " WHILE a$ = " " a$ = INKEYS WEND IF UCASE$(a$) = " P " THEN 3004 IF UCASE$(a$) = " S " THEN 3100 IF UCASE$(a$) = " H " THEN 2650 ' NOT YET GOTO 2650 3004 CLS PRINT " P l e a s e make su re the p l o t t e r i s t u rned on and paper i s l o a d e d . " INPUT " P r e s s RETURN (Q - to a b o r t ) " , R$ IF R$ = " q " OR R$ = " Q " THEN 3999 PRINT 3005 OPEN " C 0 M 1 : 9 6 0 0 , S , 7 . 1 , R S , C S 6 5 5 3 5 , D S , C D " FOR RANDOM AS *1 PRINT * 1 , " I N ; " 3006 PRINT * 1 , " O S ; " pause w h i l e w a i t i n g f o r p l o t t e r t o respond to OS command PRINT " T a l k i n g t o p l o t t e r . " ; FOR I! = 1 TO 5000 NEXT LOCATE 4 , 1 PRINT IF EOF( l ) THEN LOCATE 4 , 1 PRINT "The p l o t t e r i s not l oaded w i t h p a p e r . " PRINT " P l e a s e do so now and p r e s s r e t u r n . " PRINT " ( E n t e r ' Q ' t o abor t p l o t . ) " ; INPUT R$ IF UCASE$(LEFT$(R$, 1)) = " Q " GOTO 3999 to VO 101 ' Con t ro l i s now sent t o 3005. O r i g i n a l l y , the C0M1 por t was not c l o s e d and ' the c o n t r o l was passed back to 3006. However, at t h i s po in t i f the tu rkey ' sa t and r e p e a t e d l y p ressed r e t u r n w i thout l o a d i n g paper (3 or 4 t i m e s ) , then the p l o t t e r b u f f e r would f i l l and the computer would hang - w a i t i n g f o r ' space i n the p l o t t e r b u f f e r . The pause i s added to account f o r the de lay t ime between OS command and the p l o t t e r response . CLOSE #1 GOTO 3005 END IF LINE INPUT #1, a$ PRINT #1, CHR$(27) + " . P 3 : " CLS PRINT PRINT "Axes w i l l be l a b e l l e d . Fi lename and date w i l l a l s o appear on the p l o t . " PRINT 3040 PRINT " E n t e r TITLE (maximum 42 c h a r a c t e r s ) | " LOCATE , 43 PRINT " | " ; LOCATE , 1 INPUT " " , t i t l e ! IF L E N ( t i t l e S ) > 42 GOTO 3040 c e n t e r t i t l e In f i e l d o f 42 spaces cnt% = 42 - L E N ( t i t l e S ) cnt% = cn tX \ 2 t i t l e $ = SPACES(cnt%) + t i t l e S 3010 DIM pn%(3) LOCATE 12, 1 INPUT " E n t e r pen number f o r p l o t t i t l e ( 0 - 8 ) : " , pn%(l) IF pn%(l) < 1 OR pn%(l) > 8 GOTO 3010 INPUT " Pen number f o r date : " , pn%(3) 3020 LOCATE 14, 1 INPUT " Pen number f o r axes : " , pn%(2) IF pn%(2) < 1 OR pn%(2) > 8 GOTO 3020 3030 LOCATE 16. 1 en te r input here f o r l a b e l s on p o i n t s 102 FOR IX = 1 TO numclassesX 3031 PRINT " Pen f o r s i g n a l s o f c l a s s <"; CLASSES( IX ) .R ; ">: " ; INPUT " " , CLPNX(IX) IF CLPNX(IX) < 1 OR CLPNX(IX) > 8 GOTO 3031 NEXT INPUT "Pen number f o r s i g n a l s w i t h no c l a s s : " , BS CLPNX(O) = CLPNX(l) IF BS <> " " THEN CLPNX(O) = VAL(B$) set d imens ions on p l o t t e r . P lease see HP7440A program manual f o r a l i s t o f what i s be ing done h e r e . I f you want to change a n y t h i n g , y o u ' l l need i t ! ! ! PRINT #1, " p a O . 6 9 0 0 ; " PRINT #1, " D R ; " PRINT #1, " S I . 4 , . 5 ; " PRINT #1, " S P " ; pnX(l) PRINT #1, " L B " + titleS + CHR$(3) + " ; " PRINT #1, " P A 0 . 7 2 9 0 ; " PRINT #1, " S I . 2 , . 2 ; " PRINT #1, " P A 9 0 0 0 , 7 2 9 0 ; " IF LEN(FILE$ + FEXTS) > 11 THEN PRINT #1, " C P " ; 11 - LEN(FILE$ + FEXTS) ; " , 0 ; " END IF PRINT #1, " L B " + FILES + FEXTS + CHRS(3) + " ; " PRINT #1, " S P " ; p n X ( 3 ) ; " ; " PRINT #1, " L B " + DATES + CHR$(3) + " ; " PRINT #1, " P A 3 0 0 . 6 6 7 5 ; " PRINT #1, " S I . 3 , . 3 ; " PRINT #1, " L B P l o t o f " + CHRS(3) PRINT #1. " C P " ; LEN(RTRIM$(co lnam$(a2X)) ) ; " , 0 ; " PRINT #1, "LB v s . " + CHR$(3) PRINT #1, " S P " ; pnX(2) PRINT #1, "CP " ; -LEN(RTRIM$(colnam$(a2X)) ) - 5 ; " , 0 ; " PRINT #1, " L B " + RTRIMS(colnam$(a2X)) + CHR$(3) + " ; " PRINT #1, "CP 5 , 0 ; " PRINT #1, " L B " + RTRIM$(co lnam$(a lX)) + CHR$(3) + " ; " Now tha t a l l the t i t l e and c rap 1s p r i n t e d l e t ' s draw some axes . PRINT 01, " L T ; " to o PRINT #1, " P A O , " ; YOFFSETX * 50 / 3 + 700; " ; " PRINT #1, " P O ; P R 1 0 0 0 0 , 0 ; " PRINT #1, " P U ; " PRINT #1. " P A " ; XOFFSETX * 50 / 3 ; " . 7 0 0 ; " PRINT #1, " P D ; P R 0 . 5 0 0 0 ; " PRINT #1. " P U ; " F i r s t , how about remapping the paper to cor respond to sc reen u n i t s . PRINT #1, " IU 0 ,700 ,10000 ,5700 ; " PRINT #1, " I P " ; X 0 F F S E T X * 5 0 / 3 ; " , " ; Y 0 F F S E T X * 50/3+700; " . 1 0 0 0 0 , 5 7 0 0 ; ' YPLOT = CSN6((YMAXX - YOFFSETX) / TYSCALE*) IF YPLOT < 1 THEN YPLOT = 1 PRINT #1, " S C O , " ; CSN6((xmaxX - XOFFSETX) / TXSCALE*) ; " , 0 , " ; YPLOT; ' The p l o t t e r w i l l now d i r e c t l y map the coo rd i na tes o f x c o l * and y c o l * PRINT * 1 , "SI . l , . l ; " FOR cX = 1 TO numclassesX PRINT * 1 , " S P " ; CLPNX(cX) ; " ; " FOR sX = 1 TO NROWSX IF c L a s s ( s X ) . R = CLASSES(cX).R THEN CALL p m r k ( X c o l * ( s X ) , y c o l * ( s X ) ) END IF NEXT NEXT now p l o t a l l p o i n t s tha t have no c l a s s i n p l o t c o l o r (1) PRINT * 1 , " S P " ; STR$(CLPNX(0)) ; " ; " FOR IX = 1 TO NROWSX FOR cX = 1 TO numclassesX IF c L a s s ( I X ) . R = CLASSES(cX).R THEN GOTO 901 END IF NEXT CALL p m r k ( X c o l * ( I X ) , y c o l * ( I X ) ) 901 NEXT PRINT * 1 , " S P ; " PRINT * 1 , " I N ; " PRINT * 1 , " P A O , 7 4 7 9 ; " 103 CLOSE #1 3999 GOTO 2000 104 3100 LOCATE 25, 1 PRINT STRING$(80, " " ) ; LOCATE 25, 1 INPUT ; " E n t e r f i l e n a m e f o r S igmaplo t ou tpu t : " ; sp f$ 'pu t e r r o r c h e c k i n g here e x t e n s i o n = " . A S P " LOCATE 25, 1 PRINT STRING$(80, " " ) ; LOCATE 25, 1 PRINT ">TYPE " ; s p f $ ; " f o r SIGMAPLOT l o a d i n g i n s t r u c t i o n s . " ; Q$ = CHR$(34) OPEN " 0 " , * 1 , sp f$ PRINT * 1 , QJ+"0utput f i l e from ABSCAT. Source f i l e :"+FILE$+ext$ + 0$ PRINT * 1 , Q$ + " x - a x i s = " ; c o l n a m $ ( a l X ) ; " ; y - a x i s = " ; c o l n a m $ ( a 2 X ) ; Q$ PRINT #1, Q$+"Read i n t o SIGMAPLOT w i t h " ;1+numc lassesX; "co lumns and " ; PRINT #1, " use ' . " ' ; " as d e l i m i t e r " + Q$ PRINT #1, Q$+"Skip 5 f i e l d s . Column 1 = x - a x i s . Other columns a re y - " ; PRINT * 1 , " a x i s " + Q$ PRINT * 1 . Q$ + " x - a x i s " + Q$; FOR IX = 1 TO numclassesX PRINT #1. " , " + Q$ + "CL=" + CLASSES( IX) .R + Q$; NEXT PRINT * 1 , FOR IX = 1 TO NROWSX PRINT #1. C S N G ( X c o l * ( I X ) ) ; " , " ; FOR aX = 1 TO numclassesX - 1 IF c L a s s ( I X ) . R = CLASSES(aX) .R THEN EXIT FOR ELSE PRINT * 1 , Q$ + " - " + Q$; " , " ; END IF NEXT PRINT #1, CSNG(yco l# ( IX ) ) ; £ FOR BX = aX + 1 TO numclassesX - H 1 PRINT #1. Q$ + " - " + Q$; NEXT PRINT #1. NEXT CLOSE #1 COLOR 13 PRINT " . . . P r e s s any k e y . . . " ; CALL s l e p GOTO 2650 I f Mr. User has changed some o f the c l a s s e s o f the s i g n a l s , then i t seems o n l y f a i r to ask h i m / h e r / o t h e r i f they shou ld be w r i t t e n to the f i l e . And HEY! the word " o t h e r " i s s e x i s t ! ! ! ! Should be s p e l l e d " o t h e i r " f o r n e u t r a l i t y . R igh t p e r s i b l i n g s ? 4000 IF c lasschangeX AND RIGHT$(FILE$, 3) = " D S l " THEN LOCATE 25, 1 PRINT "Oo you wish the new c l a s s e s to be w r i t t e n to the f i l e " ; INPUT ; " " ; a$ IF UCASE$(LEFT$(a$, 1}) = " Y " THEN OPEN FILES FOR RANDOM ACCESS WRITE AS #1 LEN = 4 FIELD #1, 4 AS REC$ FOR 1% = 1 TO NROWSX LSET REC$ » c L a s s ( I X ) . R spot& » (1% + 1) * NCOLSX + 4 * ( IX - 1) + 4 PUT #1, spot& NEXT CLOSE #1 END IF END IF SCREEN 0 END SUB SUB SCLFORLOD (F ILES. SC$, ModeX(), a#() , B#()) Th is subprogram i s des igned to read i n a f i l e saved by SCLFORSAV and parse i t i n t o the ADVSCAL subprogram so that the l a z y J o e ' s d o n ' t have to type so much. A l s o i t i s n i c e to be a b l e to ensure that s c a l i n g i s c o n s i s t e n t 106 between da ta s e t s . I f t he number o f d e s c r i p t o r s i s d i f f e r e n t between the cu r ren t data set and the saved format f i l e , then (uh oh) the program reads i n on l y enough to use ( i e . the e x t r a f o r m a t t i n g i s i g n o r e d ) . I f t he re i s not enough i n the f i l e the v a r i a b l e s l e f t over a re l e f t as the d e f a u l t OPEN " i " , #3, FILES INPUT #3, SC$ FOR IX = 1 TO UBOUND(ModeX) INPUT #3, ModeX( IX) , a# ( IX ) , B#(IX) NEXT CLOSE #3 END SUB 5UB s c l f o r s a v ( F I L E S , SC$, a X ( ) , B#() , c#( ) ) See the notes on SCLFORLOD. And see I f they a r e n ' t s t r a i g h t f o rward . The f i l e has a s imp le ASCII fo rmat . OPEN " 0 " , #2, FILES WRITE #2, SC$ FOR IX = 1 TO UBOUND(aX) WRITE #2, a X ( I X ) , B#( IX) , c#( IX) NEXT IX PRINT #2, " T h i s f i l e c r e a t e d on " + DATES + " a t " + TIMES CLOSE #2 :"ND SUB 5UB s l e p My v e r s i o n o f QB 4 . 5 ' s SLEEP command. Had to w r i t e t h i s when 4 .5 kept c r a s h i n g . aS = INKEYS: aS = " " : WHILE a$ = " " : a$ = INKEY$: WEND [NO SUB SUB TIMPRINT (mess#(), varnam$() , FS) Th is r o u t i n e was i n s p i r e d by Timothy C r o w t h e r s ' a t tempts a t g e t t i n g out the column s t a t i s t i c s . S o . . . . we need to c a l l ADVSCAL and have the format changed s l i g h t l y . A l l o w p r i n t sc reen and l o t u s f i l e output i f t ime permi ts to to 107 ' The t r i c k i s , how do we c a l l advsca l w i thout hav ing the t h i n g a c t u a l l y ' s c a l e the data? W e l l , how about I f we pass s c a l $ as "VIEW MODE". Now, i f ' the ADVSCAL r o u t i n e checks f o r "VIEW MODE" and behaves a p p r o p r i a t e l y . . . s c a l $ = "VIEW MODE" DIM DELVAR%(1) ' dummy a r ray f o r parameter c o n s i s t e n c y CALL 'advsca l (mess# ( ) , varnam$() , F$ , s c a l $ , DELVAR%()) END SUB SUB t i t l e s c r n COLOR 15, 11 CLS LOCATE 3, 15 PRINT " A s t r a c t Fac to r A n a l y s i s and Scat te rgram U t i l i t y . " LOCATE 5, 35 PRINT "ABSCAT" LOCATE 10, 18 PRINT " L a b o r a t o r y fo r Automated Chemical A n a l y s i s " LOCATE 12, 28 PRINT "Department o f Chem is t r y " LOCATE 12, 24 PRINT " U n i v e r s i t y o f B r i t i s h Co lumbia" LOCATE 13, 25 PRINT "Vancouver , B r i t i s h Co lumb ia" LOCATE 23, 27 COLOR 11 PRINT " P r e s s any key to c o n t i n u e . " END SUB Th is i s a smal l attempt at a sense o f humour. Approx imate ly 6% of the t ime the f r on t sc reen w i l l p i s s somebody o f f f o r th ree seconds. Hope i t i s n ' t A d r i a n . (Note, t h i s r ou t i ne i s o b v i o u s l y not mandatory - but I can t h i n k o f b e t t e r ones ! ) SUB TITLESCRN2 a$ = INPUT$(1) RANDOMIZE TIMER a% = RND(l) * 100 + 1 108 IF a% < 6 THEN SOUND 45 , 32 LOCATE 24, 30 COLOR 28 PRINT "EXCEPT THAT ONE ! ! ! ' a# = TIMER WHILE TIMER < a# + 3 WEND END IF :ND SUB >UB va rhe lp COLOR 11, 12 CLS LOCATE 1, 20 PRINT " H e l p - s c r e e n f o r VAR - V a r i a n c e o f e i g e n v e c t o r " PRINT PRINT " In p r a c t i s e , e i g e n v e c t o r s hav ing l a r g e v a r i a n c e s are c o n s i d e r e d p r i m a r y " PRINT " e i g e n v e c t o r s , whereas e i g e n v e c t o r s hav ing sma l l v a r i a n c e s a re c o n s i d e r e d PRINT " t o be secondary e i g e n v e c t o r s . PRINT " PRINT " PRINT " PRINT " VAR(N) » PRINT " PRINT " PRINT " PRINT PRINT " where : PRINT " PRINT " PRINT " PRINT CALL s l e p CLS c E n : l N = e i g e n v e c t o r o f i n t e r e s t " c = number o f columns ( e i g e n v e c t o r s ) ' 1 = e i g e n v a l u e " to END SUB FUNCTION VARIANCE* (MAT#(), c o l X , numrowsX, AVE!) ' r e tu rns the v a r i a n c e o f the COLX' th column o f the MATf r i x from the f i r s t ' the NUMROWSX'th row. sum# » 0 FOR NXX «= 1 TO numrowsX sum# = sum! + HAT#(NXX, c o l X ) NEXT NXX AVE* = sum# / numrowsX v r * = 0 FOR NXX = 1 TO numrowsX v r * = vr# + (MAT#(NXX, c o l X ) - AVE#) * 2 NEXT NXX IF v r * THEN VARIANCE* * v r * / (numrowsX - 1) ELSE VARIANCE* « 0 END IF END FUNCTION SUB v e r p r i n t (W0RD$. XX, YX) ' Th is r o u t i n e p r i n t s out the s t r i n g W0R0$ v e r t i c a l l y w i t h the f i r s t l e t t e r at XX ,YX . I f the word i s to be p r i n t e d upwards, g i ve XX as n e g a t i v e . LNX = LEN(WORDJ) IF XX < 1 THEN XX ' -XX enX •= YX s tX - YX + LNX - 1 dir% = -1 ELSE s tX = YX enX = YX + LNX - 1 dir% - 1 END IF psX = 0 FOR IX = s tX TO enX STEP d i r X psX = psX + 1 LOCATE IX, XX PRINT MID$(WORD$, psX, 1 ) ; to NEXT . END SUB tnmooBOX 7> Z-<r->0 245 XI.4 Hierarchical Clustering / Dendrogram Analysis Program D E N D G R A M was written in QuickBASIC 4.0 entirely by D. B. Sibbald. The knowledge for the dendrogram algorithms is attributed to the course presented by Dr. Ken Burton at the 2nd Spring School of Chemometrics in April, 1988 at Bristol, U.K.. The attendance of the course was precipitated by the coincidence of the dates of the course with that of the birthday of a special friend living in Norfolk. The program has been divided into two halves because of size. The first half calculates the dendrogram from a data file (.DS1) created by P. D. Wentzell's A E M U N C H program. The second half (DENDPLOT) is dedicated to displaying the dendrogram on the screen and plotting it on a Hewlett Packard plotter (HP-Color Pro Plotter HP7440A, Hewlett Packard, San Diego, CA). The operation of the program is as follows. The user specifies a data file with the extension of .DS1 - extensions of .DES, .SCL, are also permitted. These correspond to files created from earlier versions of A E M U N C H (most notably A E C R U N C H , and a version written by D. A. Boyd in QuickBASIC 3.0 - PATCHAR) and to a prescaled data file. The data file is then loaded in and scaling options are presented. These include auto scaling, and range scaling, as well as allowing the user to define a specific scaling algorithm. It is through the use of the user-defined scaling menu that LINK scaling is achieved. After scaling the data file, the user is first asked if a scaled data file should be saved for future use. Then, the calculations follow. The distance matrix is calculated. All signals are then clustered together successively until all data points (signals) have been joined into one cluster. This involves selecting the closest pair of objects (data 2 4 6 points and/or clusters) and fusing them into one object. The distance matrix is then recalculated and the process continues. The resulting dendrogram is stored in a file (with the extension of .DEN) as a list of object pairs and the distances (or dissimilarities) at which they were joined. This file can then be used by DENDPLOT to generate a graphical display (using an installed enhanced graphics adapter (EGA) card) or to produce a hardcopy. f Star t j 247 Load Data Scale Data - LINK Scaung - Auto Scaung - Rang* Scaung - Normalization I Calculate Distance Matrix Join Closest Points and Clusters Recalculate New Distance Matrix Loop Join All Save File Figure 84) Dendrogram algorithm from DENDGRAM program. * * * * * * * * * * * * * DENDGRAM by Dav id S l b b a l d * * * * * * * * * * * * * * * * * * * * * * * * * W r i t t e n January 14, 1988 a t U n i v e r s i t y o f B r i t i s h Columbia * Update #7 March 1988 Update #8 June 28 , 1988 Update #9 J u l y 4 t h , 1988 Update #9.1 January 5, 1989 9 -Imp l lmenta t lon o f new f i l e formats use o f s i n g l e p r e c i s i o n use o f bubble s o r t d u r i n g the c a l c u l a t i o n (removed i n v . 8) a d d i t i o n o f p l o t t i n g a l g o r i t h m f o r HP C o l o r Pro 7440 Use o f qb v e r s i o n 4 .0 MINDPI i n a l g o r i t h m (see below) c a l c u l a t i o n o f d i s t a n c e m a t r i x t o speed execu t i on a d d i t i o n o f c l a s s i f i e r t o f i l e , a d d i t i o n o f sub rou t i nes FIXCSD and FIXCCD - use o f m a t r i c e s f o r d i s t a n c e s between c l u s t e r s and samples. Menu f o r cho i ces o f SEVEN d i f f e r e n t methods o f c a l c u l a t i n g Dendrogram. Menu f o r s e l e c t i o n o f s c a l i n g f u n c t i o n s . - A b i l i t y t o s c a l e columns i n d i v i d u a l l y . A b i l i t y t o s c a l e columns to meet the v a r i a n c e o f o ther columns. S h i f t i n g and M u l t i p l i c a t i o n . A b i l i t y f o r BATch f i l e c o n t r o l . Use o f new(er) f i l e formats f o r DES f i l e . DS1 and DS2 f i l e s now e x i s t . SC2 f i l e format p a r a l l e l 2 w i t h DS2. - Future p l a n s to a l l o w f o r use o f d i s k as RAM t o I nc rease memory c a p a c i t y o f p r o g . Th is program i s des igned t o l o a d 1n a da ta m a t r i x [ samples , v a r i a b l e s ] f rom a .DES f i l e The format o f the DES f i l e i s as f o l l o w s : numsams% - the number o f samples NUMVARS - the number o f v a r i a b l e s per sample VARNAM$(NUMVARS) - a l i n e a r a r r a y o f s t r i n g s c o n t a i n i n g the d e s c r i p t o r s o f the v a r i a b l e s these a re f o l l o w e d by numsams% rows o f : " i d . . " , " c l s s " , v a r ! ( 1 ) , v a r ! ( 2 ) var ! (numvars%) With v e r s i o n 9 . 1 , i t i s p o s s i b l e t o use the newer d e s c r i p t o r f i l e formats . DS1 F i l e fo rmat . The DS1 f i l e 1s a b i n a r y random a c c e s s f i l e . I t c o n s i s t s o f 4 - b y t e records and has the f o l l o w i n g fo rma t : RECORD DESCRIPTION 1 Con ta i ns the number o f s i g n a l s (n) and number o f d e s c r i p t o r s per s i g n a l (m). Packed as 2 -by te In tege rs In to the 4 - b y t e s . 2 B lank r e c o r d f o r f u t u r e m o d i f i c a t i o n . C u r r e n t l y z e r o . 3 & 4 D e s c r i p t o r name f o r f i r s t d e s c r i p t o r . 8 c h a r a c t e r s packed i n t o two 4 - b y t e r e c o r d s . 5 & 6 Second d e s c r i p t o r . w co 3 2m + 3 I d e n t i f i e r f o r s i g n a l 1 (4 c h a r a c t e r s ) 2m + 4 C l a s s f o r s i g n a l 1 (4 c h a r a c t e r s ) 2m + 5 Time f o r s i g n a l 1 ( s i n g l e p r e c i s i o n ) 2m + 6 E x t r a r e c o r d f o r s i g n a l 1 (4 c h a r a c t e r s ) 2m + 7 F i r s t o f m d e s c r i p t o r s f o r s i g n a l 1 ( s i n g l e p r e c i s i o n ) 3m + 6 m' th d e s c r i p t o r f o r s i g n a l 1 ( s i n g l e p r e c i s i o n ) 3m + 7 I d e n t i f i e r f o r s i g n a l 2 3m + 8 C l a s s f o r s i g n a l 2 (y+l)m+4y+2+x D e s c r i p t o r x f o r s i g n a l y ( s i n g l e p r e c i s i o n ) (y+l)m+4(y- l )+3 I d e n t i f i e r f o r s i g n a l y (y+l)m+4(y- l )+4 C l a s s f o r s i g n a l y (y+l)m+4(y- l )+5 Time f o r s i g n a l y (y+l)m+4(y- l )+6 E x t r a r e c o r d f o r s i g n a l y The a l t e r n a t e ASCII d e s c r i p t o r f i l e has the e x t e n t i o n DS2 and has the same format as the DES f i l e except t ha t the second reco rd o f the DSl f i l e appears as two i n t e g e r s on the second l i n e and tha t the t ime and e x t r a r e c o r d a re I n s e r t e d a f t e r the c l a s s f o r each s i g n a l . The f i l e Is t h e r e f o r e the ASCII coun te rpa r t o f the DSl f i l e . The DS2 f i l e s a re thus comple te and a t r a n s f o r m a t i o n between DSl and DS2 f i l e s are t h e r e f o r e complete w i t h no l o s s o f I n fo rma t i on . SCALING S c a l i n g i s then per formed on the data ma t r i x - A u t o - s c a l i n g : mean-center ing each v a r i a b l e vec to r to have u n i t v a r i a n c e . - Range s c a l i n g : s c a l i n g each v a r i a b l e vec to r t o have va lues from 0 t o +1 Other o p t i o n s such as we igh t ing a re now a v a i l a b l e w i t h the s p e c i a l f u n c t i o n s op t i on In the s c a l e menu. The v a r i a b l e rows can be s h i f t e d . 4 m u l t i p l i e d , ranged between any two v a l u e s ( I n c l u d i n g an I n v e r s i o n about the median o r about the minimum v a l u e . The t ransformed m a t r i x i s s t o r e d as a f i l e In the same format as the o r i g i n a l data f i l e w i t h a one l i n e s t r i n g a t the end g i v i n g some i n fo rma t i on as t o the method o f s c a l i n g . A DSl f i l e hav ing been sca led i s saved as a SC2 f i l e . (NB. A DES f i l e w i l l r e s u l t In an SCL f i l e be ing saved and a DSl f i l e w i l l r e s u l t i n a SC2 f i l e . ) : i = ; ; : : : : : = 3 = = : = = s : : : = = : : : : = 3 s = z : : s s 3 8 S C S s e s e 8 S = : n s S 3 t S B S B C t S 8 E S s : a e t s BATCH FILE c o n t r o l : For BATCH f i l e c o n t r o l , t he command l i n e must be se t up In the f o l l o w i n g format (so t h e r e ! ) : >DENDGRAM / f i l e n a m e /S=scopt /M=method [ /Minkowski va lue ] [ /0=outf11e] where 1) f i l ename 1s the f i l e ( I n c l u d i n g roo t and e x t e n s i o n ) f o r i n p u t . V a l i d e x t e n s i o n s ( S C L . D E S . D S l , 0 S 2 ) 2) scopt Is the s c a l i n g f u n c t i o n d e s i r e d . V a l i d parameters : /S=AUT0 - Auto s c a l i n g /S=RANGE - Range s c a l i n g (0 t o 1) /S=N0 - No s c a l i n g /S=YES - The f i l e 1s taken t o be a l r e a d y s c a l e d . No f u r t h e r s c a l i n g I s done and the SCLES a t the end o f the .SCL f i l e i s r e a d . / S = s c l f i l e - User d e f i n e d s c a l i n g . SCLFILE must be a p r e v i o u s l y saved s c a l i n g format f i l e . 1f no /S= o p t i o n 1s g i v e n , then no s c a l i n g Is per fo rmed. 3) method Is the method f o r c a l c u l a t i n g the DENDROGRAM. V a l i d o p t i o n s : /M=x - x Is an In tege r between 1 and 7 1 - S i n g l e L inkage 2 - Complete L inkage 3 - Weighted Average L inkage 4 - Average L inkage (unweighted) 5 - C e n t r o i d 6 - Median C e n t r o i d to 7 - Ward s Method I f no /M= o p t i o n Is g i v e n then S i n g l e L inkage Is u s e d . * ° 5 4) The Minkowski f a c t o r can be supp l i ed . De fau l t = 2 . ' 5) The output roo t can be s u p p l i e d I f I t i s r e q u i r e d to be d i f f e r e n t . Sample command l i n e f o r a complete l i n k a g e dendrogram from TEST.DES ' i n the a c t i v e d i r e c t o r y , t o be s c a l e d acco rd ing t o LINK.SFF 1n a ' lower d i r e c t o r y , u s i n g a Minkowski f a c t o r o f 1 (Manhattan c i t y block d i s t a n c e s ) and saved i n the root d i r e c t o r y . ' DENDGRAM /TEST /S«=SCALE\LINK.SFF /M«2 / l /0=\TEST S==Z=e==SSr=S83SBS=3aESI3Sa3BBBBBSBSnOBBSSBe9SS = = S S C = S = = S S S S = = = : = : = S = DECLARE SUB ADVSCAL ( M E S S ! ( ) , ID() AS ANY, CLASSO AS ANY, F$ , SCLE$) DECLARE SUB BATCHGUYS (batchCALLX, BATOPTJO) DECLARE SUB CALCDIST (MESSI ( ) , NUMSAMSX, NUMVARSX, SSDISTI ( ) , MAXDSTI) DECLARE FUNCTION clustmemX (CLUSTNUMX, MEMX) DECLARE FUNCTION co lmax l (MAT! ( ) , COLX, s t a r t r o w X , endrowX, COLMIN!) DECLARE SUB DOPT ( F I L E S , methodX, method$() , M i n k o w s k i ! , OUTFILES) DECLARE SUB f l x c s d ( C S D ! ( ) . PRTRX, PTR2X. OMEMX, NMEMX, methodX) DECLARE SUB f l x c c d ( C C D ! ( ) . PTR1X, PTR2X, OMEMX, NMEMX, methodX) DECLARE SUB FRTEXT ( F I L E S , FROOTS, FEXT$) DECLARE FUNCTION MATCH (XX, YX) DECLARE SUB READPARS (NUMSAMSX, NUMVARSX, EXT1X. EXT2X, F$) DECLARE SUB READVALS (VARNAMSO, MESSI ( ) . ID() AS ANY, CLASS() AS ANY, SCALEDS, F$) DECLARE SUB s c a l e c o l (MAT! ( ) , COLX, modeX, v a l u e ! , d i v ! ) DECLARE SUB s c a l e r (MESS! ( ) , ID ( ) AS ANY,CLASS() AS ANY, S C L E S O , FILES) DECLARE SUB s c l f o r l o d (F ILES, s c a l $ , modeX(), a ! ( ) . B ! ( ) ) DECLARE SUB s c l f o r s a v (F ILES, s c a l $ , a X ( ) , B ! ( ) , C ! ( ) ) DECLARE SUB s c l s a v e (MAT! ( ) , ID( ) AS ANY,CLASSO AS ANY.scSO . specX, F$) DECLARE FUNCTION VARIANCE! (MAT! ( ) , COLX, numrowsX, AVE!) * * * FNTXTX r e t u r n s the p r i n t l i n e on screen cor respond ing * * * t o g r a p h i c s l i n e n DEF f n t x t X (N) IF N < 8 THEN f n t x t X = 1 ELSE f n t x t X = (N + 9) * 25 / 350 END DEF CONST I n i t = 3E+38 6 CONST bgrnd = 11 CONST h l gh t = 4 CONST YELLOW = 14 CONST db lue = 9 CONST n t e x t *> 15 CONST l e g e x t S = " . D E S . D S 1 . D S 2 . S C L . S C 2 " TYPE RECRD R AS STRING * 4 END TYPE 1 CLEAR RESTORE DATA 7 , " S i n g l e L i n k a g e " , " C o m p l e t e L i n k a g e " DATA "Average L inkage ( W e i g h t e d ) " . " A v e r a g e L inkage (Unweighted) " DATA " C e n t r o i d " . " W e i g h t e d C e n t r o i d ( M e d i a n ) " , " W a r d ' s Method" READ nummethodsX DIM SHARED method$( l TO nummethodsX) FOR IX • 1 TO nummethodsX READ methodS(IX) NEXT IX REM SDYNAMIC DIM BAT0PT$(5) ' Check t o see 1 f t h i s i s done by r e m o t e - c o n t r o l . (Don ' t l e t my Dad ' have the remote ) . I f the COMMAND l i n e c o n t a i n s pa ramete rs , the ' BATCHYESX w i l l be TRUE and the program can run a u t o m a t i c a l l y . CALL BATCHGUYS(batchyesX, BATOPTSO) 5 COLOR n t e x t , bgrnd CLS IF batchyesX THEN F$ = BAT0PT$(1) FILES •= BATOPTS(l) IF BAT0PT$(2) = " Y E S " THEN mdX = 2 GOSUB 100 IF BAT0PTS(2) = " Y E S " THEN 600 GOTO 500 END IF PRINT S T R I N G $ { 2 2 , " * " ) ; " OEND-GRAM by David S i b b a l d " ; STRING$(30. " * " ) LOCATE 5 , 19 PRINT "1) C a l c u l a t e dendrogram" LOCATE 6, 19 PRINT "2) Show d i r e c t o r y " LOCATE 7. 19 PRINT "3) Draw dendrogram from DEN f i l e " LOCATE 8 . 19 PRINT "4) Jump out o f a moving au tomob i le " LOCATE 11 , 16 PRINT " S e l e c t number"; a$ « INPUT$(1) IF a$ < " 1 " OR a$ > " 4 " GOTO 8 mdX = VAL(a$) ON mdX GOTO 10, 20 , 30. byby LOCATE 13, 9 PRINT " E n t e r da ta f i l e t o c a l c u l a t e dendrogram ( Inc lude e x t e n s i o n ) . . . " LOCATE , 1 0 PRINT " [ Legal ex tens i ons " ; CHR$(238); " {"; q$ = CHR$(34) PRINT Q$; MID$(1egext$, 2 , 3 ) ; Q$; FOR IX - 2 TO LEN( legex tS) / 4 PRINT QS; MID$( legex t$ , ( IX - 1) * 4 + 2 , 3 ) ; Q$ NEXT PRINT "} ] " LOCATE 16, 20 INPUT " F i l e =", FILES IF FILES » " " THEN 5 FILES • UCASES(FILES) CALL FRTEXT(FILES. FROOTS. FEXTS) IF ( INSTR( " .DES.SCL .DS l .DS2 .SC2" ,FEXTS) » 0 ) OR FEXT$ = SOUND 300, 2 GOTO 10 END IF F$ = FROOTS + FILES + FEXTS GOSUB 100 GOTO 500 LOCATE , 1 0 PRINT " D i r e c t o r y t o show [RETURN f o r c u r r e n t d i r e c t o r y ] " LOCATE , 10 INPUT ; a$ s h e l l S = "DIR " + aS + " > TEHPFIL .DIR" SHELL s h e l l S OPEN " I " , #1, "TEMPFIL .D IR" CLS IX = 0 WHILE NOT E0F(1) LINE INPUT #1, aS LOCATE , 2 PRINT aS IX = IX + 1 IF IX = 18 THEN LOCATE 24, 15 COLOR 14 PRINT " P r e s s any k e y " ; COLOR n tex t a$ = INPUT$(1) IX = 0 CLS LOCATE 5, 2 END IF WEND LOCATE 24, 15 COLOR 14 PRINT " P r e s s any k e y " ; COLOR n t e x t a$ = INPUTS(l) CLOSE #1 SHELL "ERASE TEMPFIL.DIR" ^ GOTO 5 Oi LOCATE 13, 8 H 9 PRINT " En te r DEN f i l e to draw dendrogram (no e x t e n s i o n ) " LOCATE 15, 20 INPUT " F i l e « " . FILE$ FILES » UCASE$(FILES) CALL FRTEXT(FILES, FROOTS, FEXTS) FEXTS = " . D E N " F$ • FROOTS + FILES + FEXTS GOSUB 200 GOTO 899 ' * * * * * * Reads da ta from f 1 l e $ i n t o mess! Open f i l e and get the v i t a l In fo rmat ion f o r d imens ion ing v a r i a b l e s . 100 CALL READPARS{NUMSAMSX, NUMVARSX, EXT1X, EXT2X, FS) Dimension v a r i a b l e s t ha t need to be read from f i l e DIM VARNAMS(1 TO NUMVARSX) DIM MESS!(1 TO NUMSAMSX, 1 TO NUMVARSX) DIM CLASS(1 TO NUMSAMSX) AS RECRD DIM ID(1 TO NUMSAMSX) AS RECRD ' Read i n da ta from f i l e . No need to pass NUMSAMS, NUMVARS because they are read by the s u b r o u t i n e . CALL READVALS(VARNAMS(), M E S S ! ( ) , I D ( ) . C L A S S O . SCALEDS, F$) Dimension v a r i a b l e s needed f o r c a l c u l a t i o n o f dendrogram DIM DENDX(1 TO NUMSAMSX - 1, 1 TO 2) DIM DENDD!(1 TO NUMSAMSX - 1) DIM quashva rX( l TO NUMVARSX) RETURN * * * * * * Reads dendrogram In to dendX 200 OPEN " 1 " , #2, F$ INPUT #2, NUMSAMSX, NUMVARSX DIM VARNAMS(1 TO NUMVARSX) DIM MESS!(1 TO NUMSAMSX, 1 TO NUMVARSX) DIM CLASS(1 TO NUMSAMSX) AS RECRD DIM ID(1 TO NUMSAMSX) AS RECRD 10 FOR NX = 1 TO NUMVARSX INPUT #2, VARNAMS(NX) NEXT DIM PL0TX(1 TO NUMSAMSX) ' C o n t a i n s numbers o f samples i n o rde r o f t . t . b DIM 0ENDX(1 TO NUMSAMSX - 1. 1 TO 2) DIM DENDD!(1 TO NUMSAMSX - 1) FOR npX = 1 TO NUMSAMSX - 1 FOR NX = 1 TO 2 INPUT #2, 0ENDX(npX, NX) NEXT INPUT #2, DENDD!(npX) NEXT INPUT #2, MAXDST! FOR npX = 1 TO NUMSAMSX INPUT #2. PLOTX(npX) INPUT #2, ID(PLOTX(npX)) .R INPUT #2, CLASS(PLOTX(npX)J.R NEXT INPUT #2, SCALEDS INPUT #2, methodS(1) methodX • 1 CLOSE #2 RETURN * S c a l e s da ta a c c o r d i n g t o c h o i c e CLS DATA 4 DATA " A u t o - s c a l i n g DATA "Range s c a l e (0 - +1)" DATA "Pe r fo rm no s c a l i n g " . " S p e c i a l f u n c t i o n s . " READ numofsca lesX DIM SCLES(numofscalesX) FOR IX = 1 TO numofsca lesX READ SCLES(IX) w NEXT Ui INF1LES = F$ to 11 OUTFILE$ • FROOTS + FILES + " . D E N " F$ = FROOTS + FILES + " . S C L " CALL s c a l e r ( M E S S ! ( ) . IO( ) . CLASS( ) , S C L E S O , F$) IF NUMSAMSX < 2 THEN GOTO 1 SCALEDS - SCLE$(1) ERASE SCLES * * * * * * * * * * Make Dendrogram by David S i b b a l d * * * * * * * * * * Th i s program 1s des igned t o c a l c u l a t e the j o i n t s and c o r r e l a t i o n s between the s t u p i d p o i n t s In the s c a l e d ma t r i x mess! The dendrogram Is s t o r e d i n a f i l e w i t h the ex tens i on .DEN The a l g o r i t h m works a% f o l l o w s : 0) c a l c u l a t e d i s t a n c e m a t r i x ( D I S T ! ( , ) ) . Th is i s added i n v e r s i o n 8. I t was an element o f the e a r l i e r v e r s i o n s but removed to save computer memory f o r the l a r g e r NUMSAM s i z e s . Now that v e r s i o n 4 . 0 can handle m a t r i c e s o f l a r g e r than 64 K 1t seems w ise to I nc lude DIST! a g a i n . The f u n c t i o n FNDSTSS! has been changed. 1) f i n d s h o r t e s t j o i n t - the c l u s t e r s a re compared to each o the r - the c l u s t e r s a re compared to l oose samples - the l oose samples are compared to each o the r (Th is o rde r Is a r b i t r a r y , the s h o r t e s t f u s i o n i s made r e g a r d l e s s ) 2) se t dendX t o a p p r o p r i a t e v a r i a b l e s based on type o f j o i n t i n 1) -- merge two c l u s t e r s - add sample In to c l u s t e r - j o i n two samples In to new c l u s t e r 3) C a l c u l a t e the new m a t r i c e s f o r the d i s t a n c e s between samples and c l u s t e r s CSOIST! ( ) and between c l u s t e r s CCDIST ! ( ) . 4) repeat 1) to 3) u n t i l the number o f j o i n t s (nnaybs) i s equal t o the number o f samples (numsamsX) minus one (eg. 4 samples == 3 J o i n t s ) 12 5) For output t o the sc reen o r p r i n t e r , the o rde r ( top tobot tom) o f the samples must be known. Th i s i s s t o r e d i n PLOTX. The o rde r w i l l a lways end up In the a r r a y SAMPPTR() s t a r t i n g w i t h element C L S N X ( l ) . See the d i s c u s s i o n i n the CLUSTMEMX f u n c t i o n f o r the d e t a i l s o f t h i s MAGICAL s i d e e f f e c t . 6) Output : The da ta f i l e (.DEN) 1s s t o r e d as d e n d X ( l . l ) , d e n d X ( l , 2 ) . DENOD!(1) DENDX(NUMSAMSX-l . l ) , 0ENDX(NUMSAMSX-1,2). DENDD!(NUMSAMSX-1) P L O T X ( l ) , ID$*4 (1 ) , CLASS$*4(1) PLOTX(NUMSAMSX), ID$*4(NUMSAMSX) ,CLASS$*4(NUMSAMSX) For output t o s c r e e n , each sample 1s a s s i g n e d a p o i n t i n ( x , y ) space. These p o i n t s a r e j o i n e d by a J - t y p e shape f o l l o w i n g the order o f dendX. As a p o i n t i s j o i n e d i t i s a s s i g n e d the new p o s i t i o n a t the c e n t e r o f the c r o s s - l i n e . 600 IF NOT batchyesX THEN LOCATE 25 , 1 PRINT " P r e s s r e t u r n to c o n t i n u e " ; SLEEP: a$ = INKEYS methodX = 1 Mink! = 2 CALL DOPT(INFILE$, methodX, method$() , M i n k ! , OUTFILES) ELSE methodX = VAL(BAT0PT$(3)) Mink ! = VAL(BAT0PT$(5)) END IF CLS PRINT "DENDGRAM f o r . . . " ; INFILES; OFILES - OUTFILES CALL FRTEXT(0FILE$. 0UTR00TS. OUTEXTS) IF OFILES <> FILES THEN PRINT " >==> " ; 0UTR00TS + OFILES + OUTEXTS 13 END IF LOCATE 6, 15: PRINT SCALED? LOCATE 10, 15: PRINT method}(methodX) LOCATE 23, 1 PRINT " C a l c u l a t i n g d i s t a n c e m a t r i x " ; t l m e l j • TIMEJ DIM SSDIST!(1 TO NUMSAMSX, 1 TO NUMSAMSX) CALL CALCDIST(MESS!() , NUMSAMSX, NUMVARSX, S S D I S T ! ( ) , MAXDST!) IF methodX > 4 THEN DIM c l s t v a r ! ( l TO NUMSAMSX \ 2 , 1 TO NUMVARSX) ELSE ERASE MESS! END IF CSDIST! (S ,C) c o n t a i n s the d i s t a n c e between sample S and c l u s t e r C. The m a t r i x Is comp le te l y c a l c u l a t e d (except where the number o f c l u s t e r s Is l e s s than NUMSAMSX / 2 ) . In a d d i t i o n , the f i r s t row (CSDIST! (0 ,X) ) c o n t a i n s the number o f members o f c l u s t e r X . The f i r s t column (CSDIST ! (S ,0 ) ) con ta ins a BOOLEAN va lue f o r [sample S j o i n e d i n t o any c l u s t e r ] < - l • TRUE.O • FALSE>. DIM c s d 1 s t ! ( 0 TO NUMSAMSX, 0 TO NUMSAMSX \ 2) CCDIST! (X,Y) c o n t a i n s the d i s t a n c e m a t r i x between c l u s t e r s X and Y. The m a t r i x i s o n l y h a l f c a l c u l a t e d (because o f symmetry) so f o r the c o r r e c t r e s u l t s X must be l e s s than Y. DIM c c d i s t ! ( l TO NUMSAMSX \ 2 , 1 TO NUMSAMSX \ 2) See the comments on the FUNCTION CLUSTMEMX fo r the d e s c r i p t i o n o f the use o f the a r r a y s CLSNX and SAMPPTRX. DIM c l s n X ( l TO NUMSAMSX / 2) DIM sampptrX(0 TO NUMSAMSX) LOCATE 23 . 1 PRINT " C a l c u l a t i n g Dendrogram nnaybsX » 1 n c l s t s X = 0 WHILE nnaybsX < NUMSAMSX ' * * * Main loop LOCATE 24, 1 14 t$ = " ###/### " PRINT USING t $ ; nnaybsX; NUMSAMSX - 1; mind! = MAXDST! + 1 ' * * * f i n d s h o r t e s t d i s t a n c e between c l u s t e r s FOR nCX = 2 TO n c l s t s X IF c s d i s t ! ( 0 , nCX) = 0 GOTO 610 FOR nYX = 1 TO nCX - 1 IF c s d i s t ! ( 0 , nYX) <> 0 THEN d s t ! = c c d i s t ! ( n Y X , nCX) IF d s t ! < m ind ! THEN mind ! = d s t ! nmin lX = nYX nmin2X = nCX new jo in tX = 1 END IF END IF NEXT 610 NEXT ' * * * f i n d s h o r t e s t d i s t a n c e between c l u s t e r and po in t FOR nCX = 1 TO n c l s t s X IF c s d i s t l ( 0 , nCX) = 0 GOTO 620 FOR NSX * 1 TO NUMSAMSX IF c s d i s t ! ( N S X , 0) » 0 THEN d s t ! = c s d i s t ! ( N S X , nCX) IF d s t ! < m i n d ! THEN newjo in tX = 2 nmin lX * nCX nmin2X = NSX mind ! = d s t ! END IF END IF NEXT 620 NEXT ^ 15 * * * f i n d s h o r t e s t d i s t a n c e between two samples FOR NS% - 2 TO NUMSAMSX IF c s d 1 s t ! ( N S X , 0) - 0 THEN FOR nYX - 1 TO NSX - 1 IF c s d 1 s t ! ( n Y X , 0) « 0 THEN d s t ! = SSDIST!(NSX, nYX) IF d s t ! < nrlnd! THEN mind l » ds t t nmin lX - NSX nmlnZX - nYX newjo in tX « 3 END IF END IF NEXT END IF NEXT * * * Set up new va lues i n dendX 660 ON newjo in tX GOTO 700, 710, 720 * * * Two c l u s t e r s t o be j o i n e d ' NMIN1X = c l u s t e r NUMBER ' NMIN2X = c l u s t e r NUMBER NMIN2X < NMIN1X We need to j o i n the members o f c l u s t e r NMIN2X onto the back o f ' c l u s t e r NMIN2. Th i s Is done by s e t t i n g SAMPPTR() o f the l a s t member o f c l u s t e r NMIN1 t o be equal to the f i r s t member o f c l u s t e r NMIN2 (or ' CLSNX(NMIN2X)). 700 DENDX(nnaybsX, 1) « clustmemX(nm1nlX, 1) DENDX(nnaybsX, 2) = clustmemX(nm1n2X, 1) DENDDI(nnaybsX) - mind! 0MEMX • c s d 1 s t ! ( 0 , nminlX) NMEMX = c s d i s t l ( 0 , nro1n2X) lastsamX • clustmemX(nm1nlX, 0MEMX) sampptrX( lastsamX) • c lsnX(nmin2X) 16 ' Set a p p r o p r i a t e va l ues 1n the " c l u s t e r " v e c t o r o f CS0 IST ! ( ) c s d 1 s t ! ( 0 , nmin2X) = 0 c s d i s t ! ( 0 , nmin lX) « NMEMX + OMEMX GOTO 750 710 ' * * * Add sample t o c l u s t e r ' NMIN1X = c l u s t e r number ' NMIN2X = sample number DEN0X(nnaybsX, 1) = nm1n2X DENDX(nnaybsX, 2) = clustmemX(nm1nlX, 1) DENDO!(nnaybsX) = m i n d ! OMEMX = c s d 1 s t ! ( 0 , nmin lX) NMEMX = 1 lastmemX = clustmemX(nm1nlX, OMEMX) sampptrX(lastmemX) = nm1n2X ' set sample NMIN2X to " u s e d " i n the sample v e c t o r o f CSDIST c s d i s t ! ( n m i n 2 X , 0) = -1 add one member t o c l u s t e r NMIN1X c s d 1 s t ! ( 0 , nmin lX) = OMEMX + 1 ' set NMIN2X to n e g a t i v e f o r p a s s i n g In to Fix 's nmin2X * -nm1n2X GOTO 750 720 ' * * * Two p o i n t s t o make new c l u s t e r DENDX(nnaybsX, 1) » nmin lX DENDX(nnaybsX, 2) = nmin2X DENDDI(nnaybsX) = m i n d ! NMEMX = 1 OMEMX = 1 n c l s t s X « n c l s t s X + 1 c l s n X ( n c l s t s X ) = nmin lX sampptrX(nmin lX) = nmin2X set two samples to " u s e d " and se t the number o f members to TWO in new c l u s t e r . c s d 1 s t ! ( n m i n l X , 0) = -1 to (Ji (Ji 17 csd i s t ! ( nm1n2X , 0) » -1 c s d 1 s t ! ( 0 , n c l s t s X ) » 2 nm1n2X • 0 nmin lX » n c l s t s X 750 CALL f 1 x c s d ( c s d f s t ! ( ) . nmin lX . nmfn2X, OMEMX, NMEMX, methodX) CALL f 1 x c c d ( c c d 1 s t ! ( ) , nmin lX , nmin2X, OMEMX, NMEMX, methodX) nnaybsX 3 nnaybsX + 1 WEND ' Now t ha t dendX Is c a l c u l a t e d , f i g u r e o rder f o r PLOTXIng Th i s i s s imp le because the order o f the samples In CLSNX(l ) IS the ' o r d e r ! ! T h i s l i t t l e t r i c k was f i g u r e d out by David S i b b a l d du r ing a ' p a r t i c u l a r l y bad bought o f mental I n s t a b i l i t y . S e r e n d i p i t o u s l y , an ' a l g o r i t h m based on some o f the d e l i r i u m tremens r e s u l t e d tha t made l i f e much e a s i e r . ERASE c s d i s t ! , c c d i s t l , SSDISTI, c l s t v a r ! , quashvarX DIM P L 0 T X ( 1 T 0 NUMSAMSX) ' Con ta ins # 's o f samples In o rde r o f t . t . b PLOTX(l) - clsnX(l) FOR IX - 2 TO NUMSAMSX PLOTX(IX) - sampptrX(PLOTX(IX - 1)) NEXT CLS PRINT "Dendrogram c a l c u l a t e d . " ERASE c l s n X , sampptrX IF batchyesX THEN F$ - BAT0PT$(4) ELSE F$ = OUTFILES END IF SOUND 2000, 5 OPEN " o " , #4, F$ WRITE #4, NUMSAMSX, NUMVARSX FOR nvX = 1 TO NUMVARSX 18 WRITE #4, VARNAMS(nvX) NEXT FOR ndX » 1 TO NUMSAMSX - 1 PRINT #4, DENDXfndX, 1 ) , DENDX(ndX, 2 ) , DENDD!(ndX) NEXT IF methodX » 7 THEN MAXDST! » DENDD!(NUMSAMSX - 1) PRINT #4, MAXDST! FOR npX = 1 TO NUMSAMSX WRITE #4, PLOTX(npX), ID(PLOTX(npX)) .R, CLASS(PLOTX(npX)) .R NEXT P r i n t o u t s t u f f such as s c a l i n g method, c a l c u l a t i o n method. Minkowski f a c t o r , d a t e , e t c . . WRITE #4, SCALEDS PRINT #4, CHR$(34) + methodS(methodX) + CHR$(34), IF Mink! <> 2 THEN PRINT #4, CHRS(34)+"Minkowski f a c t o r ="+STRS(Mink!)+CHR$(34) ELSE PRINT #4, END IF PRINT #4, " O r i g i n a l Data F i l e » " + FROOTS + FILES + FEXTS PRINT #4, " S t a r t t i m e " , timelS PRINT #4, " F i n i s h e d a t : " , TIMES PRINT #4, DATES CLOSE #4 PRINT " Dendrogram saved as " ; F$ LOCATE 23 , 15 PRINT " P r e s s any key t o v iew g r a p h . . . [ESC to e x i t ] " IF batchyesX THEN GOTO byby a$ = INPUTS(l) IF aS = CHRS(27) THEN GOTO byby Now p lo t dendrogram 899 CLS SHELL "dendp lo t " + F$ The f o l l o w i n g l i n e i s an a l t e r n a t i v e t o the above l i n e . The f i l e ' DENDPLOT.BAS needs t o be i n the c u r r e n t d i r e c t o r y w i t h QuickBASIC i n the pa th . 19 'SHELL "QB /RUN OENDPLOT "+F$ byby: SCREEN 0 END REM $STATIC SUB ADVSCAL ( M E S S ! ( ) . ID() AS RECRD. CLASSO AS RECRD. F$ . s c a l $ ) Th i s r o u t i n e i s c a l l e d when the user dec ides t o p l a y God and mess up the da ta w i t h some w E i r d s c a l i n g . I f t he re i s some knowledge about the da ta then " u s e r d e f i n e d " s c a l i n g i s a good idea but can be d i s a s t r o u s when a p p l i e d i n d i s c r i m i n a t e l y . So t h a t ' s t ha t t hen . ' (Eh , R i c h a r d ? ) The columns a re shown on the screen and the v i t a l s t a t i s t i c s are shown o f each v a r i a b l e . These a re s t o r e d i n a r r a y s which need not ever be accessed aga in so i t 1s prudent t o not d e c l a r e t h i s r o u t i n e as STATIC o r a t l e a s t t o e rase the a r r a y s be fo re c o n t i n u i n g . One BUG : Yes I am r e p o r t i n g a bug t ha t I know to be present but c o u l d not f i x . When s e l e c t i n g the mode, p r e s s i n g the arrow keys 19 ' t imes causes a STRING FORMULA TOO COMPLEX e r r o r on an o therw ise p e r f e c t l y good command. I cou ld d i s c o v e r no obv ious or sec re t reason f o r t h i s excep,t t ha t the e r r o r does not occur when the program i s opera ted from e x e c u t a b l e mode. I f e e l tha t the QUICKBASIC environment i s h o s t i l e t o peop le who REALLY c a n ' t make up t h e i r mind - l i k e S a r a . SHARED NUMVARSX, NUMSAMSX, VARNAM$(), quashvarXO SHARED ba tchyesX, BAT0PT$() SCRNROWSX » 16 TOP = 5 TOPROWX « 1 COLOR n t e x t , bgrnd CURROWX - 1 c u r s t a t X «• 1 s c a l $ • DIM s c v a 1 ! ( l TO NUMVARSX) DIM s c d i v K l TO NUMVARSX) DIM scmodXU TO NUMVARSX) 20 DIM v a r n z ! (1 TO NUMVARSX) DIM AVE!(1 TO NUMVARSX) IF batchyesX GOTO s k i p DIM c o l m i n s ! ( l TO NUMVARSX) DIM c o l m a x s ! ( l TO NUMVARSX) DIM sc (4 ) AS STRING * 15 s c ( l ) = "Auto s c a l e sc (2 ) = "Range s c a l e sc (3 ) = " S h i f t / M u l t i p l y " sc (4 ) = " E r a s e / Remove FOR IX = 1 TO NUMVARSX v a r n z l ( I X ) = VARIANCE! (MESS! ( ) , IX , NUMSAMSX, A V E ! ( I X ) ) IF NOT batchyesX THEN s c v a l l ( I X ) = 0 s c d i v l ( I X ) = IX scmodX(IX) = 1 c o l m a x s ! ( I X ) =co lmax ! (MESS! ( ) , IX ,1 .NUMSAMSX.co lmi n s ! ( I X ) ) END IF NEXT IF batchyesX THEN CALL s c l f o r l o d ( B A T 0 P T $ ( 2 ) , s c a 1 $ . s c m o d X ( ) . s c v a l ! ( ) . s c d i v ! ( ) ) GOTO d o i t END IF p t l $ = " Column it , Mode » " DIM p t2$(4) p t 2 $ ( l ) = " S c a l e mean t o :###.####,Use v a r i a b l e tt v a r i a n c e p t2$(2)=" Range Between Jit I i i i (min) and i l l . l i f t (max)" p t2$(3)=" S h i f t mean t o t i t . i i i i , M u l t i p l y by f i t . t i l l pt2$(4) = " " CLS LOCATE 1, 1 PRINT STRING$(80, " * " ) ; LOCATE 2 , 24 PRINT "SCALING : " ; F$ LOCATE 4 , 1 pt$=" ft | ##.### |##.#### | t t . t f l t " " | I t . t t t t " " I " 21 PRINT " V a r 1 a b l e | Va r i ance | Average | Maximum | Mi numum | Mode" SOUND 400, 1 16 COLOR 1, bgrnd LOCATE TOP, 1 B0TR0W% » T0PR0W% + SCRNROWSX IF BOTROWX > NUMVARSX THEN BOTROWX « NUMVARSX FOR IX » TOPROWX TO BOTROWX PRINT USING p t $ ; IX; v a r n z l ( I X ) ; A V E ! ( I X ) ; c o l m a x s ! ( I X ) ; c o l m l n s ! ( I X ) ; PRINT sc(scmodX( IX) ) NEXT IX LOCATE 25 , 1 . PRINT " P r e s s L t o l o a d 1n saved Format f i l e . ESC to e x e c u t e . " SPACE$(26) ; LOCATE 23 , 1 COLOR n t e x t , bgrnd PRINT " " + VARNAM$(CURROWX); STRING$(40, " " ) ; COLOR YELLOW, h l gh t LOCATE CURROWX + TOP - TOPROWX, 1 PRINT USING p t $ ; CURROWX; varnz!(CURROWX); AVE!(CURROWX); colmaxs!(CURROWX); colmlns!(CURROWX); PRINT sc(scmodX(CURROWX)); 212 COLOR n t e x t , bgrnd LOCATE 24, 1 PRINT USING p t l j ; CURROWX; IF scmodX(CURROWX) - 4 THEN COLOR YELLOW, h lgh t PRINT LEFT$(sc(scmodX(CURROWX)), 5 ) ; IF scmodX(CURROWX) <> 4 THEN LOCATE 24, 27 PRINT USING pt2$(scmodX(CURROWX)); scval !(CURROWX); scdiv! (CURROWX); ELSE COLOR n t e x t , bgrnd PRINT STRING$(54, " " ) ; END IF COLOR YELLOW, h lgh t 22 LOCATE 24, ( c u r s t a t X - 1) * 22 + 22 IF scmodX(CURROWX) < 4 THEN SELECT CASE c u r s t a t X CASE 1 PRINT LEFT$(sc(scmodX(CURROWX)) PRINT LEFT$(sc(scmodX(CURROWX)), CASE 2 PRINT USING "###.####"; scva l ! (CURROWX); CASE 3 IF scmodX(CURROWX) = 1 THEN PRINT USING "##"; scd iv ! (CURROWX); ELSE PRINT USING "###.####"; scd iv ! (CURROWX); END IF END SELECT END IF 1236 a$ = INKEY$ IF a$ « " " GOTO 1236 IF ASC(a$) = 0 THEN a$ = RIGHT$(a$, 1) 1237 SELECT CASE a$ CASE CHR$(77) ' the RIGHT key was p r e s s e d c u r s t a t X = c u r s t a t X + 1 IF c u r s t a t X > 3 THEN c u r s t a t X = 1 GOTO 212 CASE CHR$(75) ' the LEFT key was p r e s s e d c u r s t a t X = c u r s t a t X - 1 IF c u r s t a t X = 0 THEN c u r s t a t X = 3 GOTO 212 CASE CHR$(72) ' the UP key was p r e s s e d CURROWX * CURROWX - 1 IF CURROWX » 0 THEN CURROWX = NUMVARSX IF CURROWX < TOPROWX THEN TOPROWX ' CURROWX IF TOPROWX<CURROWX-SCRNROWSX THEN TOPROWX=CURROWX-SCRNR0WSX CASE CHR$(80) ' t he Oown key was p r e s s e d CURROWX = CURROWX + 1 IF CURROWX > NUMVARSX THEN CURROWX = 1 IF CURROWX > BOTROWX THEN TOPROWX = CURROWX - SCRNROWSX ^ IF TOPROWX > CURROWX THEN TOPROWX = CURROWX 0 0 23 CASE " L " LOCATE 25 , 1 INPUT ; " Inpu t name o f format f i l e t o l oad : " , a$ a$ « UCASE$(a$) CALL FRTEXT(a$, r o o t a j . ex ta$) 61 IF a$ = " " THEN 1236 a$ = roo ta$ + a$ + exta$ CALL s c i f o r i o d ( a $ , s c a l $ , scmod%(), s c v a l ! ( ) , s c d i v ! ( ) ) CASE CHR$(27) GOTO 39 CASE CHR$(13) ' RETURN COLOR n t e x t , bgrnd LOCATE 2 5 , 1 PRINT SPACES(79) ; LOCATE 2 5 , 1 SELECT CASE c u r s t a t X CASE 1 ' Mode s e l e c t o r mdX - scmod%(CURROWX) 58 LOCATE 25, 1 PRINT " S e l e c t new mode - " ; FOR IX = 1 TO 4 PRINT s c ( I X ) ; NEXT IX LOCATE 25, L E N ( s c ( l ) ) * 5 + 5 LOCATE 25, mdX * 15 + 4 COLOR YELLOW, h lght PRINT sc(mdX); COLOR n t e x t , bgrnd a$ = " " 59 a$ » INKEY$ IF a$ = " " GOTO 59 IF a$ = CHR$(13) GOTO 61 aS » RIGHT$(a$, 1) IF a$ = CHR$(75) THEN 'LEFT mdX « mdX - 1 IF mdX « 0 THEN mdX » 4 ELSEIF a$ - CHR$(77) THEN 'RIGHT 24 mdX = mdX + 1 IF mdX = 5 THEN mdX » 1 END IF GOTO 58 IF mdX <> scmodX(CURROWX) THEN ' change In mode IF scmodX(CURROWX) » 0 THEN quashvarX(CURROWX) - 0 scmodX(CURROWX) = mdX IF mdX = 1 THEN scval!(CURROWX) = 0 scdiv!(CURROWX) » CURROWX ELSEIF mdX - 2 THEN scval!(CURROWX) » 0 scdiv!(CURROWX) = 1 ELSEIF mdX = 3 THEN scval!(CURROWX) = 0 scdiv!(CURROWX) - 1 ELSE 'mdX = 4 == Erase v a r i a b l e ENO IF END IF CASE 2 ' v a l u e s e l e c t o r COLOR YELLOW, h l g h t PRINT " Change v a l u e t o : " ; INPUT ; " " , a$ IF scmodX(CURR0WX)«2 THEN 'RANGE s c a l i n g : check MIN=MAX IF VAL(a$) - scdiv!(CURROWX) THEN LOCATE 25 , 1 PRINT"The v a l u e s f o r maximum and minimum must be d i f f e r e n t " ; SOUND 37 , 3 GOTO 1236 END IF END IF scval!(CURROWX) = VAL(a$) CASE 3 ' d1v ! s e l e c t o r COLOR YELLOW, h l gh t IF scmodX{CURROWX) = 1 THEN INPUT ; " S c a l e column to v a r i a n c e o f v a r i a b l e #", a$ 25 dv ! » VAL(a$) IF dv!>0 AND dv ! <= NUMVARSX THEN scdiv!(CURROWX) = dv! ELSEIF scmodX(CURROWX) » 2 THEN INPUT ; " S c a l e maximum to a$ IF VAL(a$) » scval!(CURROWX) THEN LOCATE 25, 1 PRINT SPACE$(79); LOCATE 25, 1 SOUND 90 , 1 PRINT "The va lues f o r Maximum and Mlnumum cannot be ' t he same"; GOTO 1236 ELSE scdiv!(CURROWX) = VAL(a$) END IF ELSE PRINT " Va lue to m u l t i p l y by ="; INPUT ; " " , a$ dv ! «= VAL(a$) IF dv ! <> 0 THEN scdiv!(CURROWX) - dv ! ELSE LOCATE 25, 1 COLOR YELLOW, h lgh t PRINT "Use ERASE/REMOVE to e l i m i n a t e v a r i a b l e from a n a l y s i s " ; SOUND 100, 1 GOTO 1236 END IF END IF CASE ELSE END SELECT CASE ELSE END SELECT a$ « GOTO 16 26 'Now we get to do the s c a l i n g 39 COLOR n t e x t , bgrnd FOR IX = 22 TO 25 LOCATE IX , 1 PRINT STRING$(79, " " ) ; NEXT IX LOCATE 22, 1 PRINT " En te r User-name o f s c a l i n g format : <"; s c a l $ ; " > INPUT a$ IF a$ <> " " THEN s c a l $ = LEFTS(a$ , 20) LOCATE 22, 2 PRINT s c a l $ ; SPACE$(60) ; LOCATE 23, 10 INPUT ; "Save s c a l i n g format ( Y / N ) " ; a$ s a v s c a l $ = LEFT$(UCASE$(a$), 1) IF s a v s c a l $ = " Y " THEN LOCATE 23 , 2 INPUT ; "Name o f User s c a l i n g format f i l e to save : " , f i $ f i $ = UCASE$( f i$ ) CALL F R T E X T ( f i $ , f o r m r t S , formext$) IF f i $ = " " THEN s a v s c a l $ « " N " f1$ = fo rmr t$ + f i $ + formext$ LOCATE 23 , 2 PRINT " S a v i n g S c a l e format f i l e : " ; f i $ ; SPACE$(25) ; CALL s c i f o r s a v ( f 1 $ , s c a l $ , scmodX() , s c v a l ! ( ) , s c d i v ! ( ) ) FOR IX = 23 TO 25 LOCATE IX , 1 PRINT SPACE$(79) ; NEXT END IF LOCATE 25 , 20 COLOR b l u e , h l gh t PRINT "Now S c a l i n g the D a t a " ; d o i t : DIM scprnt$(NUMVARSX) FOR IX » 1 TO NUMVARSX SELECT CASE scmodX(IX) 27 CASE 1 ' au to s c a l i n g v l ! - AVE!(I%) dv ! - v a r n z ! ( s c d 1 v t ( I % ) ) / v a r n z ! ( I X ) mdX » 1 CALL s e a l e c o l ( M E S S ! ( ) , 1%, md%, v l ! , d v ! ) s c p r n t $ ( I X ) « " Auto S c a l e d t o mean «" + STR$(scva l ! (1%)) IF s c d i v l ( I X ) <> IX THEN s c p r n t $ ( I X ) = scp rn t$ ( IX ) + " , and f o r v a r i a n c e o f v a r i a b l e " + S T R J ( s c d i v ! ( I X ) ) END IF CASE Z ' range s c a l i n g CALL s c a 1 e c o l ( M E S S I ( ) , IX, 2 , s c v a l ! ( I X ) , s c d i v ! ( I X ) ) s c p r n t $ ( I X ) = " Range s c a l e d : Min = " + S T R $ ( s c v a l ! ( I X ) ) + " , Max - " s c p r n t $ ( I X ) = s c p r n t J ( I X ) + S T R $ ( s c d i v ! ( I X ) ) CASE 3 mdX « 3 v l ! » s c v a l ! ( I X ) dv ! » 1 / s c d l v l ( I X ) CALL s e a l e c o l ( M E S S ! ( ) . IX, mdX, v l ! , d v ! ) s c p r n t $ ( I X ) = " S h i f t e d by " + STR$(v l ! ) + " and m u l t i p l i e d by " s c p r n t $ ( I X ) » scp rn t$ ( IX ) + STR$(1 / dv ! ) CASE ELSE quashvarX(CURROWX) - 1 s c p r n t $ ( I X ) ** " Removed from a n a l y s i s " END SELECT NEXT IX scp rn t$ (0 ) » s c a l $ IF NOT batchyesX THEN CALL s c l s a v e ( M E S S ! ( ) , ID ( ) , C L A S S O , s c p r n t $ ( ) , 1, F$) END IF END SUB SUB BATCHGUYS (batchCALLX, BAT0PT$()) Th is dumb s t u p i d r o u t i n e explodes the command l i n e i n t o parameters . I f t he re a re the proper number o f nescessary parameters then the 28 ' BATCHCALLX f l a g i s s e t . ' BAT0PT$ c o n t a i n s the f o l l o w i n g : ' BAT0PT$(1) = the f i l e n a m e o f the DES f i l e . I n c l u d i n g root + e x t e n s i o n (2) = the s c a l i n g o p t i o n . AUTO, RANGE, NO, YES o r the name o f the s c a l i n g format f i l e . AUTO 1s the d e f a u l t . (3) = the method o f DENDROGRAM-ATION. An i n t e g e r between 1 and 7. 1 Is the d e f a u l t . (4) = output f i l e n a m e f o r DEN f i l e . The d e f a u l t i s the same name as the i npu t f i l e n a m e but w i t h the p roper e x t e n s i o n . (5) « The Minkowski f a c t o r . The d e f a u l t i s two. BATOPTJ(I) = " " BAT0PT$(2) = "AUTO" BAT0PT$(3) = " 1 " BAT0PT$(4) = " " BAT0PT$(5) = " 2 " C l $ = UCASE$(C0MMAND$) l c X = LEN(C1$) NX ' 1 batchCALLX = INSTR(C1$, " / " ) > 0 Go through command! l o o k i n g f o r ' / ' DO aX - INSTR(NX, C l $ , 7") IF aX = 0 THEN GOTO THERE Check f o r type o f parameter . The f i r s t 2 c h a r a c t e r s a f t e r the / i s the key. I f t he re i s no equal s i g n , then a l e t t e r I n d i c a t e s tha t the parameter i s the Input DES f i l e . A number I n d i c a t e s the d e s i r e d Minkowski f a c t o r . I f t h e r e Is an equal s i g n , then the l e t t e r preceeding I t t e l l s o f the k e y . C$ = MID$(C1$, aX + 2 , 1) IF C$ = " - " THEN C$ = MID$(C1$, aX + 1, 1) SELECT CASE C$ CASE " S " ' S c a l i n g o p t i o n s$ = " " NX = aX + 3 DO s$ = s$ + C$ C$ » MID$(C1S. NX, 1) NX - NX + 1 LOOP UNTIL C$ • " " OR C$ = " / " OR NX > l c X Now check f o r AUTO. RANGE. NO, YES IF INSTRC'AUTO.RANGE.NO.YES", s$) THEN BAT0PT$(2) « s$ ELSE the s t r i n g Is a f i l e n a m e . Check f o r i t s e x i s t e n c e . s$ » UCASES(sS) CALL FRTEXT(s$, s r o o t $ , sex t$) F$ « s r o o t $ + s$ + sex t$ OPEN " R " , #1, F$ le& - L0F(1) CLOSE #1 IF le& - 0 THEN BAT0PT$(2) - "AUTO" KILL F$ ELSE BAT0PT$(2) « F$ END IF END IF CASE " M " C$ = MID$(C1$, aX + 3 , 1) NX = aX + 3 methX « VAL(C$) IF methX >= 1 AND methX <« 7 THEN BAT0PT$(3) CASE " 0 " s$ - " " C$ « " " NX » aX + 3 DO sS - s$ + C$ 30 C$ = MID$(C1$, NX, 1) NX = NX + 1 LOOP UNTIL C$ = " " OR C$ = " / " OR NX > l c X + 1 the s t r i n g Is a f i l e n a m e . s$ = UCASE$(s$) CALL FRTEXT(s$, s r o o t $ . sex t$ ) F$ = s r o o t $ + s$ + " . D E N " BATOPT$(4) = FS CASE ELSE END SELECT ELSE There i s no equal s i g n so the t h i n g 1s e i t h e r a f i l ename or i t i s the Minkowski f a c t o r . NX = aX + 1 s$ = " " C$ = " " DO s$ = s$ + C$ C$ = MID$(C1$, NX. 1) NX « NX + 1 LOOP UNTIL (C$ - " " OR C$ » " / " OR l c X + 1 < NX) v a l s « VAL(s$) IF v a l s = 0 THEN The s t r i n g i s a f i l e n a m e . We need to check f o r the e x t e n s i o n .DES o r . S C L . A l s o i f f i l e e x i s t s . s$ = UCASE$(s$) CALL FRTEXT(s$, FROOT$, FEXT$) IF NOT I N S T R ( " . D E S . S C L . D S l . D S 2 " , F E X T S ) THEN FEXT$=".DES" END IF F$ = FROOTS + s$ + FEXTS OPEN " R " , #1, F$ le& = L O F ( l ) CLOSE #1 IF le& = 0 THEN PRINT "We have a FUCK UP" BATOPTS(l) = F$ 31 ELSE BAT0PT$(5) « s j END IF END IF LOOP UNTIL NX > l c X THERE: IF BAT0PT$(4) » " " THEN There was no Input f o r the output f i l e . There fo re we w i l l g i v e 1t the same as the Input f i l e but w i th the ex tens i on DEN F$ = BAT0PT$(1) CALL FRTEXT(F$. FROOTJ, FEXT$) F$ = FROOT$ + F$ + " . D E N " BAT0PT$(4) = F$ END IF END SUB SUB CALCDIST ( m s s ! ( ) . NUMSAMSX. NUMVARSX, d s t ! ( ) , maximum!) Th is r o u t i n e c a l c u l a t e s the d i s t a n c e m a t r i x The s imp le e u c l l d l a n d i s t a n c e i s c a l c u l a t e d , un less WARD'S method i s used SHARED M i n k ! , methodX, quashvarXO IF methodX <> 7 THEN FOR XX = 2 TO NUMSAMSX FOR YX = 1 TO XX - 1 d t ! = 0 FOR numyX = 1 TO NUMVARSX IF NOT quashvarX(numyX) THEN dt !=dt !+ABS((mss! (XX,numyX)-mss! (YX,numyX)) ) "Mink! END IF NEXT d t ! = d t ! " (1 / M ink ! ) d s t ! ( X X , YX) = d t ! d s t ! ( Y X . XX) = d t ! IF d s t ! ( X X , YX) > maximum! THEN maximum! = d s t ! ( X X , YX) NEXT YX NEXT XX ELSE 32 DIM t e m p v a r ! ( l TO NUMVARSX) FOR XX = 2 TO NUMSAMSX FOR YX = 1 TO XX - 1 e s s ! = 0 FOR IX = 1 TO NUMVARSX tempva r l ( IX ) = (mss ! (XX , IX) + m s s ! ( Y X , IX)) / 2 e s s ! = e s s ! + (mss ! (XX, IX) - t empva r ! ( IX ) ) " 2 e s s ! = e s s ! + (mss ! (YX. IX) - t e m p v a r l ( I X ) ) * 2 NEXT IX d s t ! ( X X , YX) = e s s ! d s t ! ( Y X , XX) « e s s ! NEXT YX NEXT XX maximum! = 1 n i t END IF END SUB FUNCTION clustmemX (CLUSTNUMX, MEMX) ' Th is f u n c t i o n r e t u r n s the MEMX'th member o f c l u s t e r CLUSTNUMX. The c l u s t e r members a re not s t o r e d s p e c i f i c a l l y i n any a r r a y but a re ' coded i n t o SAMPPTRX. CLSNX(X) p o i n t s t o the f i r s t member o f c l u s t e r ' X . The va lue s t o r e d In SAMPPTRX(CLSNX(X)) p o i n t s t o the second ' member. The v a l u e s t o r e d i n SAMPPTRX(SAMPPTRX(CLSNX(X))) p o i n t s t o ' the t h i r d member, e t c . . The SAMPPTRXO a r r a y Is I n i t i a l i z e d to ze ro so any n o n - e x i s t a n t member Is r e t u r n e d as ZERO. For t h i s r e a s o n , the a r ray SAMPPTRXO Is d imens ioned w i t h a ZERO' th element even though i t i s not used s p e c i f i c a l l y . SHARED c s d 1 s t ! ( ) , c l s n X ( ) . samppt rXO IF MEMX > c s d i s t ! ( 0 , CLUSTNUMX) THEN clstmmX = 0 ELSE clstmmX = clsnX(CLUSTNUMX) FOR IX = 2 TO MEMX clstmmX = sampptrX(clstmmX) NEXT IX clustmemX = clstmmX END IF 33 END FUNCTION FUNCTION co lmax l (HAT!(). COLX, s t a r t r owX , endrowX, COLMIN!) r e t u r n s the maximum value of the column COLX i n ma t r i x MAT! from ' s t a r t r owX t o ENDROWX. COLMIN! Is re tu rned as the minimum va lue o f the co lumn. cmax! = I n i t COLMIN! » cmax! FOR nrX = s t a r t r o w X TO endrowX v l ! • MAT! (n rX , COLX) IF v l ! < COLMIN! OR COLMIN! « I n i t THEN COLMIN! = v l ! IF v l ! > cmax! OR cmax! • 1n1t THEN cmax! = v l I NEXT nrX c o l max! • cmax! END FUNCTION SUB DOPT U N F I L E S . methodX, methodSO, M i n k o w s k i ! , 0UTFILE$) Th i s r o u t i n e selects the method for DENDGRAM. The Minkowski f a c t o r can a l s o be changed here. The output f i le can a l s o be m o d i f i e d here . methmaxX =• UBOUND (method*) COLOR YELLOW, 3 pn t : CLS PRINT STRING$(80, " * " ) COLOR 10 LOCATE 18, 5 PRINT "Use arrow keys to choose , C to change Minkowski f a c t o r " LOCATE 19, 5 PRINT " ? f o r a b r i e f summary o f the d i f f e r e n t t e c h n i q u e s , ESC to s e l e c t . " p r n t : COLOR n t e x t , bgrnd LOCATE 3 . 20 PRINT "DENDROGRAM for INFILE$ LOCATE 4 , 20 PRINT "Output f i le : " ; OUTFILES FOR IX - 1 TO methmaxX LOCATE IX + 6, 20 34 PRINT LEFT$(STR$(IX) + " : " + method$(IX) + SPACE$(60) , 60) NEXT LOCATE 15, 30 IF methodX <> 7 THEN PRINT "Minkowsk i F a c t o r : " ; M i n k o w s k i ! ; SPACE$(10) ; ELSE PRINT SPC(69 ) ; END IF COLOR n t e x t , h l gh t IF methodX THEN LOCATE methodX + 6. 20 0UT$=LEFT$(STR$(METHODX) + " : " +METH0D$(METH0DX)+SPACE$(40) PRINT LEFT$(0UT$(40) ELSE LOCATE 4 . 37 PRINT OUTFILES; END IF COLOR 9 , bgrnd LOCATE 2 1 . 10 PRINT SPACES(65) LOCATE 2 1 . 10 1234 aS - INKEYS IF a$ = " " GOTO 1234 IF ASC(a$) - 0 THEN a$ = RIGHT$(a$, 1) 1235 SELECT CASE a$ CASE CHR$(77) ' the RIGHT key was p ressed CASE CHR$(75) ' the LEFT key was p ressed CASE CHR$(13) ' t he RETURN key IF methodX THEN 1234 LOCATE 5, 10 COLOR 0 INPUT "Name o f DEN f i l e t o save : " , a$ a$ = UCASE$(a$) CALL FRTEXT(a$, r o o t a j , ex ta$) IF a$ <> " " THEN OUTFILES = rootaS + aS + " . D E N " COLOR n t e x t X o> LOCATE 5, 10 . 35 PRINT SPACES(69) ; LOCATE 4 , 37 PRINT SPACE$(40) ; CASE CHR$(72) ' the UP key was p ressed methodX • methodX - 1 IF methodX = -1 THEN methodX • methmaxX CASE CHR$(80) ' the Down key was p ressed methodX » methodX + 1 IF methodX > methmaxX THEN methodX = 1 CASE " 1 " TO RI6HT$(STRJ(methmaxX). 1) methodX - VAL(a$) CASE CHR$(27) IF methodX GOTO bye CASE " ? " COLOR n t e x t , 3 CLS PRINT "The e x p l a n a t i o n o f the va r i ous methods can be found 1n the a r t i c l e by Ken" PRINT "Bu r ton In the Chemometrics notes from B r i s t o l ( r e f M.SC. T h e s i s by" PRINT " Dav id S l b b a l d , UBC Chemis t ry Dept. 1990). B r i e f l y , the d i f f e r e n c e I n " PRINT " the methods i s i n the way i n which the d i s t a n c e between c l u s t e r s Is c a l c u l a t e d . " LOCATE 5 , 19 PRINT" - D i s t ance taken as s h o r t e s t d i s t a n c e between any member o f " PRINT " the c l u s t e r . " LOCATE 7, 21 PRINT " - D i s t ance taken as LONGEST d i s t a n c e between any members" PRINT " o f c l u s t e r . " LOCATE 9 , 21 PRINT " - When a j o i n Is made, the d i s t a n c e to the new c l u s t e r I s " PRINT " taken to be the weighted average o f the d i s t a n c to the o r i g i n a l components." 36 LOCATE 11 , 31 PRINT " - Same as 3 above but the d i s t a n c e t o a new" PRINT " c l u s t e r i s taken as h a l f t he sum o f the d i s t a n c e s to the p r e v i o u s components . " LOCATE 13, 13 PRINT " - The a c t u a l " ; CHR$(34) ; " l o c a t i o n " ; CHR$(34); " o f the c l u s t e r i s f o u n d " PRINT " i n the v a r i a b l e - s p a c e . D i s t a n c e s a re c a l c u l a t e d to t h i s p o i n t . " LOCATE 15, 35 PRINT " - The c e n t r o i d 1s c a l c u l a t e d to be a t t h e " PRINT " h a l f way p o i n t between the two p r e v i o u s components . " LOCATE 17, 18 PRINT " - The d i s t a n c e s a re c a l c u l a t e d as the r e s u l t i n g l o s s o f i n f o " PRINT " i f an a r b i t r a r y j o i n i s made. The E r r o r o f Sum o f Squares (ESS) i s the sum" PRINT " o f the sqares o f the r e s i d u a l s o f each v a r i a b l e 1n the p o s s i b l e c e n t r o i d . " LOCATE 20 , 17 PRINT " - The d i s t a n c e i s c a l c u l a t e d as t h e " ; PRINT S P C ( 5 ) ; " r o o t o f the sum " PRINT " o f t h e " ; PRINT S P C ( 5 ) ; PRINT "power o f the d i f f e r e n c e s i n each v a r i a b l e . " PRINT " For example, the E u c l i d e a n d i s t a n c e i s g i ven when"; PRINT S P C ( 3 ) ; " e q u a l s 2 . 0 . " PRINT " The Manhattan C i t y B lock d i s t a n c e i s g i ven when" ; PRINT S P C ( 3 ) ; " e q u a l s 1 . 0 . " ; COLOR 12, 7 LOCATE 5 , 1: PRINT "1 - S i n g l e L i n k a g e " ; LOCATE 7, 1: PRINT "2 - Complete L i n k a g e " ; LOCATE 9 , 1: PRINT " 3 - Weighted A v e r a g e " ; LOCATE 11, 1: PRINT "4 - Unweighted Average L i n k a g e " ; LOCATE 13, 1: PRINT " 5 - C e n t r o i d " ; 37 LOCATE 15. 1: PRINT "6 - Growers Method (Mean C e n t r o i d ) " ; LOCATE 17, 1: PRINT "7 - Ward 's Method" LOCATE 20 . 1: PRINT "Minkowski f a c t o r " ; COLOR YELLOW, 3 LOCATE 20 , 53 : PRINT " r t h " LOCATE 2 1 , 11 : PRINT " r t h " LOCATE 22 . 54: PRINT " r" LOCATE 2 3 , 52 : PRINT " r"; LOCATE 2 5 , 1 INPUT ; " c o n t i n u e . . . " , a$ CLS GOTO pnt CASE " c " , " C " IF method% < methmaxX THEN LOCATE 23 , 10 INPUT " E n t e r Minkowski f a c t o r " ; a$ v l ! - VAL(a$) IF v l ! » 0 THEN M inkowsk i ! = 2 ELSE M inkowsk i ! - v l ! END IF LOCATE 23 . 10 PRINT SPACE$(65) END IF CASE ELSE END SELECT GOTO p r n t : bye: CLS END SUB SUB f i x c c d ( C C D ! ( ) , PTR1%, PTR2X, OMEMX, NMEMX, methodX) Th is sub rou t i ne has a f u n c t i o n . Okay, here i t i s . . . The c l u s t e r - c l u s t e r d i s t a n c e ma t r i x needs to be f i x e d now tha t a new f u s i o n has been made. Th i s guy does i t . The parameters pass are s i m i l a r t o those i n the o ther r o u t i n e s (FIXCSD). CCD! -The c l u s t e r - c l u s t e r d i s t a n c e mat r i x 38 PTR1X The c l u s t e r be ing added t o PTR2X The c l u s t e r be ing absorbed OMEMX The number o f members In the o r i g i n a l c l u s t e r NMEMX The number o f members o f the newly formed c l u s t e r . The number added Is o b v i o u s l y NMEMX-OMEMX SHARED c s d 1 s t ! ( ) , M E S S ! ( ) , n c l s t s X , c l s n X ( ) , NUMVARSX, NUMSAMSX SHARED M i n k ! , c l s t v a r ! ( ) , sampptrXO, quashvarXO SELECT CASE methodX CASE 1, 2 FOR c l X = 1 TO n c l s t s X IF c s d 1 s t ! ( 0 , c l X ) <> 0 AND c l X <> PTR1X THEN c l l X - c l X c l 2 X = PTR1X IF c l l X > c l 2 X THEN SWAP c l l X , c l 2 X IF PTR2X > 0 THEN IF PTR2X < c l X THEN d s t 2 ! = CCD!(PTR2X, c l X ) ELSE d s t 2 ! = C C D ! ( c l X , PTR2X) END IF d s t l ! = C C D ! ( c l l X , c l 2 X ) ELSEIF PTR2X = 0 THEN d s t l ! « c s d 1 s t ! ( c l u s t m e m X ( P T R l X , 1 ) , c l X ) d s t 2 ! = c s d 1 s t ! ( c l u s t m e m X ( P T R l X , 2 ) , c l X ) ELSE d s t 2 ! = c s d i s t ! ( - P T R 2 X , c l X ) dstl! = C C D ! ( c l l X , c l 2 X ) END IF IF (methodX = 1) XOR (dstl! < d s t 2 ! ) THEN C C D ! ( c l l X . c l 2 X ) = d s t 2 ! ELSE C C D ! ( c l l X , c l 2 X ) = dstl! END IF END IF NEXT c l X ^ CT\ CASE 3 , 4 n f a c l » . 5 : o f ac t • .5 IF methodX = 3 THEN n f a c ! = NMEMX / (NMEMX + OMEMX) o f a c ! » OMEMX / (NMEMX + OMEMX) END IF FOR c l X = 1 TO n c l s t s X IF c s d 1 s t ! ( 0 , c l X ) <> 0 AND c l X <> PTR1X THEN c l l X « c l X c l 2 X - PTR1X IF c l l X > c l 2 X THEN SWAP c l l X . c l 2 X IF PTR2X > 0 THEN d s t l ! = C C D ! ( c l l X , c l 2 X ) IF c l X < PTR2X THEN d s t 2 ! = C C D ! ( c l X , PTR2X) ELSE d s t 2 ! = CCD!(PTR2X. c l X ) END IF ELSEIF PTR2X « 0 THEN d s t l ! » c s d i s t ! ( c l u s t m e m X ( P T R l X , 1 ) , c l X ) d s t 2 ! » csd1s t ! ( c lus tmemX(PTRlX , 2 ) , c l X ) ELSE d s t l ! = C C D ! ( c l l X . c l 2 X ) d s t 2 ! - c s d l s t ! ( - P T R 2 X , c l X ) END IF C C D ! ( c l l X , c l 2X ) • d s t l ! * o f a c ! + d s t 2 ! * n fac END IF NEXT c l X CASE 5 , 6 FOR c l X » 1 TO n c l s t s X IF c l X <> PTR1X AND c s d 1 s t ! ( 0 , c l X ) <> 0 THEN d s t ! - 0 FOR IX - 1 TO NUMVARSX IF NOT quashvarX(IX) THEN d s t ! = d s t ! + A B S ( ( c l s t v a r ! ( c l X , I X ) -c l s t v a r ! ( P T R l X , I X ) ) ) " M i n k ! 40 END IF NEXT IX d s t ! = ( d s t ! ) * (1 / M i n k ! ) IF c l X < PTR1X THEN C C D ! ( c l X , PTR1X) = d s t ! ELSE CCD!(PTR1X, c l X ) « d s t ! END IF END IF NEXT c l X CASE 7 DIM t e m p v a r ! ( l TO NUMVARSX) FOR c l X = 1 TO n c l s t s X IF c s d i s t ! ( 0 , c l X ) <> 0 AND c l X <> PTR1X THEN e s s ! = 0 c f c ! = c s d i s t ! ( 0 , c l X ) / ( c s d 1 s t ! ( 0 , c l X ) + c s d i s t ! ( 0 , P T R l X ) ) p f c ! = c s d i s t ! ( 0 , P T R l X ) / ( c s d i s t ! ( 0 . c l X ) + c s d i s t ! ( 0 , P T R l X ) ) FOR IX « 1 TO NUMVARSX IF NOT quashvarX( IX ) THEN t empva r l ( IX ) » c l s t v a r ! ( c l X . IX) * c f c ! t empvar ! ( IX ) - c l s t v a r ! ( P T R 1 X , I X ) * p f c ! + t e m p v a r ! ( I X ) END IF NEXT FOR MEMX = 1 TO c s d 1 s t ! ( 0 . c l X ) curmemX = c l us tmemX(c lX , MEMX) FOR IX = 1 TO NUMVARSX IF NOT quashvarX( IX ) THEN ess!=ess!+(MESS»(curmem%, IX) - t empva r ! ( IX ) ) " 2 END IF NEXT NEXT MEMX FOR MEMX = 1 TO c s d i s t ! ( 0 , PTR1X) curmemX - c lus tmemX(PTRlX. MEMX) FOR IX = 1 TO NUMVARSX IF NOT quashvarX( IX ) THEN e s s ! = e s s ! + (MESS! (cu rmemX, IX ) - tempvar ! ( IX ) ) "2 41 END IF NEXT IX NEXT MEMX IF c l X < PTR1X THEN CCD! (c1X, PTR1X) • e s s ! ELSE CCD!(PTR1X, c l X ) » e s s ! END IF END IF NEXT clX ERASE tempvar ! END SELECT END SUB SUB f i x c s d ( C S D ! ( ) , PTR1X, PTR2X, OMEMX. NMEMX, methodX) S i m i l a r to FIXCCD, but f i x e s the c l u s t e r - s a m p l e m a t r i x . Only the elements a c t u a l l y a f f e c t e d need be changed (acco rd ing to METHODX) SHARED M E S S ! ( ) , S S D I S T ! ( ) , n c l s t s X . c l s n X ( ) , NUMVARSX. NUMSAMSX SHARED c lstvarlO, M i n k ! , sampptrXO, quashvarX() n f a c ! • . 5 : ofac! » .5 IF methodX •» 3 OR methodX = 5 THEN n f a c ! = NMEMX / (NMEMX + OMEMX) o f a c ! • OMEMX / (NMEMX + OMEMX) END IF IF methodX > 4 AND methodX < 8 THEN FOR IX = 1 TO NUMVARSX IF NOT quashvarX( IX) THEN IF PTR2X = 0 THEN ' j o i n two p o i n t s to make new c l u s t e r v a r l ! = MESS!(clustmemX(PTRlX, 1 ) . IX) v a r 2 ! « MESS!(clustmemXfPTRIX, 2 ) , IX) ELSEIF PTR2X < 0 THEN v a r 2 ! = MESSI(-PTR2X, IX) v a r l ! « c l s t v a r ! ( P T R l X , IX) ELSE v a r l ! * c l s t v a r ! ( P T R l X , IX) v a r 2 l * c l s t v a r ! ( P T R 2 X , IX) END IF 42 v a r l ! = v a r l ! * o f a c ! v a r 2 ! = v a r 2 ! * n f a c ! c l s t v a r ! ( P T R l X , IX) - v a r l ! + v a r 2 ! END IF NEXT -END IF c l s t v a r ! i s now c o r r e c t f o r t he new j o i n . C a l c u l a t e d i s t a n c e s between a l l s i n g l e p o i n t s and the new c l u s t e r . SELECT CASE methodX CASE 1. 2 FOR SAMPX = 1 TO NUMSAMSX IF CSD!(SAMPX, 0) - 0 THEN IF PTR2X > 0 THEN d s t l ! = CSD!(SAMPX. PTR1X) d s t 2 ! •= CSD! (SAMPX. PTR2X) ELSEIF PTR2X = 0 THEN d s t l ! = SSDISTKSAMPX, c lus tmemX(PTRlX, 1)) d s t 2 ! = SSDIST!(SAMPX, c lustmemXfPTRIX, 2) ) ELSE d s t l ! = CSD!(SAMPX, PTR1X) d s t 2 ! « SSDIST!(SAMPX, -PTR2X) END IF IF (methodX - 1) XOR ( d s t l ! < d s t 2 ! ) THEN CSD!(SAMPX, PTR1X) - d s t 2 ! ELSE CSD!(SAMPX, PTR1X) - d s t l ! END IF END IF NEXT SAMPX CASE 3 , 4 FOR SAMPX » 1 TO NUMSAMSX IF CSD!(SAMPX, 0) = 0 THEN IF PTR2X > 0 THEN d s t l ! = CSD!(SAMPX, PTR1X) d s t 2 ! = CSD!(SAMPX, PTR2X) ELSEIF PTR2X = 0 THEN M CO 43 d s t l ! •= SSDIST!(SAMPX, c lustmemX(PTRlX. 1)) d s t 2 ! = SSOIST!(SAMP%, clustmemX(PTRlX. 2)) ELSE d s t l ! = CSO!(SAMPX, PTR1X) IF SAMPX < -PTR2X THEN d s t 2 ! = SSD1ST!(SAMPX, -PTR2X) ELSE d s t 2 ! » SSDIST!( -PTR2X, SAMPX) END IF END IF CSD!(SAMPX, PTR1X) » d s t l ! * o f a c ! + d s t 2 ! * n f a c ! END IF NEXT SAMPX CASE 5, 6 FOR SAMPX - 1 TO NUMSAMSX IF CSD!(SAMPX, 0) = 0 THEN d s t ! = 0 FOR IX = 1 TO NUMVARSX IF NOT quashvarX( IX) THEN ds t !=ds t !+ABS((MESS! (SAMPX, IX) -c l s t v a r ! ( P T R l X , I X ) ) ) " M i n k ! END IF NEXT IX CSD!(SAMPX, PTR1X) = d s t ! " (1 / M ink ! ) END IF NEXT SAMPX CASE 7 DIM t e m p v a r ! ( l TO NUMVARSX) FOR SAMPX = 1 TO NUMSAMSX IF CSD!(SAMPX, 0) = 0 THEN ess I « 0 FOR IX » 1 TO NUMVARSX IF NOT quashvarX(IX) THEN tempvar l ( IX ) - c l s t v a r ! ( P T R 1 X , IX) * o f a c ! tempvar ! ( IX) « tempvar ! ( IX)+MESS!(SAMPX, IX)*n fac ! 44 e s s ! = e s s ! + (MESS!(SAMPX, IX) - t e m p v a r ! ( I X ) ) " 2 END IF NEXT IX FOR MEMX = 1 TO C S 0 ! ( 0 , PTR1X) FOR IX » 1 TO NUMVARSX IF NOT quashvarX( IX ) THEN curmemX = cIustmemX(PTRlX. MEMX) e s s ! = e s s ! + (MESS! (cu rmemX, IX ) - tempvar ! ( IX ) ) "2 END IF NEXT IX NEXT MEMX CSD!(SAMPX, PTR1X) - e s s ! END IF NEXT SAMPX ERASE tempvar! CASE ELSE e r r o r END SELECT END SUB SUB FRTEXT ( F I L E S . FROOTS, FEXTS) STATIC It i s t h i s r o u t i n e s f u n c t i o n t o break a f i l e name i n t o r o o t , f i l e , and e x t e n s i o n . f i l e S i s the i npu t s t r i n g . FILES c o n t a i n s f i l ename on r e t u r n (maximum 8 c h a r a c t e r s ) FROOTS c o n t a i n s pa th i n f o r m a t i o n FEXTS con ta i ns e x t e n s i o n Loop to remove spaces from f i l e name LENGTHX = LEN(FILES) F$ = FILES FILES = " " FOR IX = 1 TO LENGTHX IF MID$(F$, IX, 1) <> " " THEN FILES = FILES + MID$(F$ , IX ,1 ) NEXT IX FROOTS - " " FEXTS = " " £j aX « INSTR(FILES. " . " ) vo 45 IF a% = 0 GOTO nroot ' no " . " means no ex tens ion IF aX * LENGTHX THEN FEXT$ = " M FILES • LEFTS(F ILES. LENGTHX - 1) LENGTHX « LEN(FILES) GOTO n roo t END IF FEXTS = MID$(FILE$. a%. 4) FILES » LEFTS(F ILES, aX - 1) e x t e n s i o n taken c a r e o f , now peel o f f root n r o o t : , LENGTHX - LEN(FILES) aX = INSTR(FILES. " \ " ) IF aX = LENGTHX THEN FROOTS = FROOTS + FILES FILES -GOTO f i n END IF IF aX - 0 GOTO f i n FROOTS » FROOTS + LEFT$(FILE$, aX) FILES - RIGHTS!FILES. LENGTHX - aX) GOTO nroot f i n : END SUB FUNCTION MATCH (XX. YX) MATCH t e l l s us 1f sample xX i s one o f the samples i n node YX SHARED DENDXO IF YX > 0 THEN MATCH « (XX « DEN0X(YX, 1) OR XX » DENDX(YX, 2)) END IF END FUNCTION SUB READPARS (NUMSAMSX, NUMVARSX, EXTIX, EXT2X, FS) Th is r o u t i n e 1s the f i r s t h a l f o f the f i l e read ing s e c t i o n . The second h a l f Is c a l l e d READVALS. Th i s s e c t i o n reads i n the two parameters a t the head o f the f i l e FS (FS Is opened acco rd ing to ' FTYPEX). NUMSAMSX Is the number o f s i g n a l s recorded i n the f i l e and 46 NUMVARS i s the number o f d e s c r i p t o r s per s i g n a l . EXT1 and EXT2 a re not c u r r e n t l y used but a re I nc luded f o r f u t u r e use . FTYPEX = (INSTR("DESDS1DS2SCLAFAAF2", RIGHTS(F$, 3 ) ) - 1) / 3 + SELECT CASE FTYPEX CASE 1, 4 , 5 OPEN " i " , #1. F$ INPUT #1, NUMSAMSX INPUT #1, NUMVARSX EXT1X = 0 EXT2X = 0 CLOSE #1 CASE 2 OPEN F$ FOR RANDOM ACCESS READ AS #1 LEN FIELD #1, 4 AS REC1S GET #1 TMP$ * LEFT$(REC1$, 2) NUMSAMSX = CVI(TMPS) TMP$ « RIGHT$(REC1$, 2) NUMVARSX = CVI(TMP$) GET #1 TMP$ = REC1S EXT1X = &H7 AND ASC(RIGHTS(TMP$ EXT2X = CVI(LEFTS(TMP$. 2 ) ) CLOSE #1 CASE 3 , 6 OPEN " 1 " , #1. F$ INPUT #1, NUMSAMSX INPUT #1, NUMVARSX INPUT #1, EXT2X, EXT1X CLOSE #1 END SELECT GET the f i r s t r e c o r d Parse out the 1ST 2 by tes Conver t t o # o f s i g n a l s Parse out l a s t 2 by tes Conver t # o f d e s c r i p t o r s get the e x t r a r e c o r d conve r t In f u t u r e * * * * * * * D) END SUB SUB READVALS (VARNAMSO, M E S S ! ( ) , ID() AS RECRD, CLASS() AS RECRD, SCALEDS, F$) 47 Th is r o u t i n e 1s the second h a l f o f the read f i l e sub program. The r o u t i n e Is broken 1n two ha lves 1n o rde r f o r OIMensioning i n the main module. DIM EXTRA AS RECRD FTYPEX = (INSTR("DESDS1DS2SCLAFAAF2",R16HT$(F$,3)) - 1) / 3 + 1 SELECT CASE FTYPEX CASE 5 STOP ' NOT PRESENT IN THIS VERSION ' TALK TO DAVE - OR USE ABSCAT TO CREATE DS2 FILE CASE 1, 4 OPEN " I " . #1, F$ INPUT #1. NUMSAMSX, NUMVARSX FOR NX = 1 TO NUMVARSX INPUT #1, VARNAM$(NX) NEXT FOR nxX - 1 TO NUMSAMSX INPUT #1, ID(nxX).R INPUT #1. CLASS(nxX).R FOR nYX • 1 TO NUMVARSX INPUT #1, MESS!(nxX, nYX) NEXT NEXT IF FTYPEX » 4 THEN INPUT #1, SCALEDS CLOSE #1 CASE 2 OPEN F$ FOR RANDOM ACCESS READ AS #1 LEN = 4 FIELD #1, 4 AS REGIS GET #1 TMP$ = LEFT$(REC1$, 2) NUMSAMSX « CVI(TMPS) TMP$ = RIGHT$(REC1S, 2) NUMVARSX = CVI(TMPS) GET #1 TMP$ » REC1S EXT1X » &H7 AND ASC(RIGHT$(TMP$, 1)) EXT2X - CVI(LEFT$(TMP$, 2 ) ) GET the f i r s t r eco rd Parse out the 1ST 2 by tes Convert to I o f s i g n a l s Parse out l a s t two bytes Convert # o f d e s c r i p t o r s get the e x t r a reco rd conver t i n f u t u r e * * * * * * 48 IF EXT1X > 0 THEN GET #1 ' s k i p e x t r a r e c o r d name FOR NX = 1 TO EXT2X GET #1 ' s k i p extended r e c o r d s GET #1 NEXT Get the d e s c r i p t o r names FOR IX = 1 TO NUMVARSX GET #1 TMP$ = REClS GET #1 VARNAMS(IX) * TMP$ + REC1J NEXT IX S igna l / d e s c r i p t o r i n f o r m a t i o n FOR IX = 1 TO NUMSAMSX GET #1 10(1%)-R = REClS GET #1 CLASS( IX) .R = REClS GET #1 GET #1 FOR JX » 1 TO EXT2X GET #1 NEXT FOR JX = 1 TO NUMVARSX GET #1 MESS! ( IX , JX) ' CVS(RECIS) NEXT NEXT CLOSE #1 ' s i g n a l ID s i g n a l c l a s s t ime o f s i g n a l e x t r a r e c o r d s k i p extended reco rds CASE 3 OPEN " I " , #1, F$ INPUT #1. NUMSAMSX, NUMVARSX INPUT #1, EXT2X, EXT1X IF EXT1X <> 0 THEN INPUT #1, EXT$ ' read e x t r a r e c o r d name FOR NX = 1 TO EXT2X 49 INPUT #1, EXNM$ ' read extended reco rd names NEXT FOR NX « 1 TO NUMVARSX INPUT #1, VARNAMS(NX) NEXT FOR nxX ' 1 TO NUMSAMSX INPUT #1, ID(nxX).R INPUT #1. CLASS(nxX).R INPUT #1, TIMEI INPUT #1, EXTRA.R FOR NX - 1 TO EXT2X INPUT #1, EXTENS NEXT FOR nYX - 1 TO NUMVARSX INPUT #1, MESS!(nxX, nYX) NEXT NEXT CLOSE #1 CASE 6 OPEN " I " , #1, F$ INPUT #1, NUMSAMSX. NUMVARSX INPUT #1, EXT2X. EXTiX IF EXTIX <> 0 THEN INPUT #1, EXRS FOR NX = 1 TO EXT2X INPUT #1, EXTNMS NEXT FOR IX <• 1 TO NUMSAMSX INPUT #1, ID( IX) .R INPUT #1, CLASS(IX) .R INPUT #1, TIME! INPUT #1. EXTRA.R FOR NX » 1 TO EXT2X INPUT #1, EXTENS NEXT FOR JX » 1 TO NUMVARSX INPUT #1, MESS!( IX. JX) NEXT 50 NEXT CASE ELSE 'BLOWUP END SELECT END SUB SUB s c a l e c o l (MAT! ( ) , COLX, modeX, v a l u e ! , d i v ! ) STATIC Th is r o u t i n e per forms the s c a l i n g on the C O L X ' t h column o f MAT! acco rd ing to the va lue o f MODEX and ( i f n e s c e s s a r y the) VALUE! . MODEX 1 The column i s a u t o s c a l e d such tha t the mean o f the column i s VALUE! and the t o t a l v a r i a n c e equa l s D IV ! . For t r ue a u t o s c a l i n g , the mean s h o u l d be s c a l e d to ZERO and the v a l u e o f DIV! shou ld be ONE. To s c a l e a column such tha t i t has the same v a r i a n c e r e l a t i v e t o a d i f f e r e n t co lumn, DIV! shou ld equal the q u o t i e n t o f the v a r i a n c e s . DIV! = VARIANCEfof o t h e r column) / VARIANCE(of COLXumn) 2 The column i s range s c a l e d such t ha t the column maximum i s VALUE! and the minumum v a l u e i s D I V ! . For range s c a l i n g 0 - 1 , VALUE!=1 and DIV! = 0 . NOTE: I t does not mat te r t ha t VALUE! be l a r g e r than DIV! except t h a t the s c a l e d va l ues w i l l a l l be i n v e r t e d about the column mean. I f VALUE! = DIV! then the v a l u e s w i l l a l l be se t t o VALUE! 3 Each element o f the column has VALUE! s u b t r a c t e d from i t and i s then d i v i d e d by D I V ! . T h i s MODEX i s c a l l e d r e c u r s i v e l y from M0DEX=1 and i s i n c l u d e d to a l l o w m u l t i p l i c a t i o n o f a column by a c e r t a i n v a l u e (DIV! = 1/FACTOR) o r s h i f t i n g a column by a VALUE! (NOTE: DIV! shou ld equal 1 to accomp l i sh a s h i f t and VALUE! shou ld equal 0 f o r a m u l t i p l i c a t i o n o n l y . SHARED NUMSAMSX SELECT CASE modeX CASE 1 v a r n z ! = VARIANCE(MAT!() , COLX. NUMSAMSX, a v e r a g e ! ) v l ! = ave rage ! - v a l u e ! dv ! = SQR(d iv ! * v a r n z ! ) ^ mdX = 3 ^ 51 CALL s c a 1 e c o l ( M A T ! ( ) , COLX, mdX, v l ! , d v ! ) CASE 2 cmax! = c o l m a x ! ( M A T ! ( ) , COLX, 1, NUMSAMSX, cmin ! ) v l ! = c m i n ! dv ! - 1 IF c m a x l o c m l n ! THEN dv! « (cmax! - cm in ! ) / ( d i v ! - v a l u e ! ) mdX • 3 CALL s e a l e c o l ( M A T ! ( ) , COLX, mdX, v l ! , d v ! ) v l ! = v a l u e ! * SGNfd lv ! - v a l u e ! ) 3 dv ! » 1 mdX = 3 CALL s c a l e c o l ( M A T ! ( ) , COLX, mdX, v l ! , d v ! ) CASE 3 FOR nxX = 1 TO NUMSAMSX MAT!(nxX, COLX) = (MAT!(nxX, COLX) - v a l u e ! ) / d i v ! NEXT nxX CASE ELSE STOP END SELECT END SUB SUB s c a l e r ( M E S S ! ( ) . ID() AS RECRD, CLASSO AS RECRD, SCLE$( ) , F$) Th is program w i l l s c a l e the ma t r i x mess! acco rd ing to user Input . The method f o r s c a l i n g the rows Is d e s c r i b e d In the SUBprogram SCaLECOL. SHARED NUMVARSX. NUMSAMSX SHARED ba tchyesX, BAT0PT$() IF batchyesX THEN 4 IF BAT0PT$(2) - "AUTO" GOTO 12 IF BAT0PT$(2) = "RANGE" GOTO 13 IF BAT0PT$(2) = "NO" GOTO 14 GOTO 15 END IF s c a l e X = 1 sca lmax! • 1 s c a l m i n ! • 0 COLOR 1, bgrnd CLS 52 SOUND 330, 1 P R I N T " * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ********************** LOCATE 3 , 30 PRINT "SCALING" LOCATE 22 , 15 PRINT "Use arrows to choose . P ress RETURN to s e l e c t . " ; COLOR 10, bgrnd FOR IX - 1 TO UB0UND(SCLE$) LOCATE 7 + IX * 2 , 15 PRINT LEFT$(STR$(IX) + " ) " + SCLE$( IX) + SPACE$(30) , 30) NEXT IX COLOR n t e x t , h l gh t LOCATE 7 + s c a l e X * 2 , 15 PRINT LEFT$(STR$(sca leX) + " ) " + SCLE$(sca leX) +SPACE$(30),30) COLOR 10, bgrnd IF s c a l e X = 2 THEN LOCATE 11, 55 PRINT "Max = " ; LEFT$(STR$(sca lmax ! ) + SPACE$(18) , 18) LOCATE 12, 55 PRINT "Min » " ; LEFT$(STR$(sca lm in ! ) + SPACE$(18) , 18) ELSE LOCATE 11, 55 : PRINT SPACE$(24) ; LOCATE 12, 55 : PRINT SPACE$(24) ; END IF a$ = INKEY$ IF a$ = " " GOTO 4 IF a$ = CHR$(13) GOTO 11 IF a$ = CHR$(27) THEN NUMSAMSX = -1 EXIT SUB ' ESC was p r e s s e d END IF IF ASC(a$) <> 0 GOTO 4 a$ = RIGHT$(a$, 1) SELECT CASE a$ CASE CHR$(72) s c a l e X • s c a l e X - 1 IF s c a l e X - 0 THEN s c a l e X • UBOUND(SCLES) GOTO 3 CASE CHR$(80) s c a l e X = s c a l e X + 1 IF s c a l e X > UBOUND(SCLEJ) THEN s c a l e X = 1 GOTO 3 CASE CHR$(77) ' the RIGHT key was p ressed IF s c a l e X » 2 THEN COLOR YELLOW, h lgh t LOCATE s c a l e X * 2 + 7, 61 PRINT SPC( IO) ; LOCATE s c a l e X * 2 + 7, 61 COLOR YELLOW, bgrnd INPUT num$ sea lmax! » VAL(numJ) COLOR YELLOW, h lgh t LOCATE s c a l e X * 2 + 8 , 61 PRINT SPC( IO) ; LOCATE , 61 COLOR YELLOW, bgrnd INPUT num$ s c a l m l n ! = VAL(num$) END IF GOTO 3 CASE ELSE GOTO 4 END SELECT LOCATE 22, 1 PRINT SPACE$(79) LOCATE 22 , 30 IF s c a l e X - 4 THEN PRINT " P l e a s e w a i t , c a l c u l a t i n g v a r i a b l e s " ELSE PRINT "Now S c a l i n g END IF ON s c a l e X GOTO 12, 13. 14, 15 54 ' The f i r s t c h o i c e i s s e l e c t e d . T h i s w i l l be the A u t o s c a l l n g f u n c t i o n f o r a l l co lumns. 12 v l ! = 0 dv ! = 1 mdX = 1 FOR nCX = 1 TO NUMVARSX CALL s c a l e c o l ( M E S S ! ( ) , nCX, mdX, v l ! , d v ! ) NEXT nCX SCLE$(1) = "Au to S c a l e d " GOTO 40 The second s c a l i n g f e a t u r e was s e l e c t e d . Th i s w i l l be range s c a l i n g . 13 v l ! » sca lmax ! dv! = s c a l m l n ! mdX ' 2 FOR nCX = 1 TO NUMVARSX CALL s e a l e c o l ( M E S S ! ( ) , nCX, mdX. v l ! . d v ! ) NEXT nCX SCLE${1) = "Range S c a l e d : " + STR$(sca lm tn ! )+ " t o " +STR$(scalmax!) GOTO 40 The t h i r d s c a l i n g f e a t u r e i s Not t o s c a l e . T r i c k y 14 S C L E J ( l ) = "No S c a l i n g " GOTO 40 15 ' Now i t ' s go ing t o get I n t e r e s t i n g . We have to supp ly the column s t a t i s t i c s CALL ADVSCAL(MESS!() , I D ( ) , C L A S S O , F$ , SCLE$(1)) 40 LOCATE 2 1 . 30 END SUB SUB s c l f o r l o d ( F I L E $ . s c $ , modeX() . a ! { ) , B ! ( ) ) Th is subprogram i s des igned to read In a f i l e saved by SCLFORSAV and parse i t i n t o the ADVSCAL subprogram so t ha t the l a z y J o e ' s d o n ' t 1 0 have to type so much. A l s o i t i s n i c e t o be a b l e to ensure tha t *» 55 s c a l i n g Is c o n s i s t e n t between da ta s e t s . I f the number of d e s c r i p t o r s Is d i f f e r e n t between the cu r ren t da ta se t and the saved format f i l e , then (uh oh) the program reads i n o n l y enough to use ( l e . the e x t r a f o rma t t i ng i s I gno red ) . I f the re i s not enough in the f i l e the v a r i a b l e s l e f t over a re l e f t as the d e f a u l t . OPEN " 1 " , #3, FILES INPUT #3, sc$ FOR IX • 1 TO UBOUNO(modeX) INPUT #3, modeX(IX), a l ( I X ) , B I ( IX ) NEXT CLOSE #3 END SUB SUB s c l f o r s a v (F ILES, scS, a X ( ) , B l ( ) . C ! ( ) ) ' Th is i s the s i s t e r r o u t i n e to SCLFORLOD. OPEN " 0 " , #2, FILES WRITE #2, sc$ FOR IX = 1 TO UBOUNO(aX) WRITE #2, a X ( I X ) , B ! ( I X ) , C I ( IX ) NEXT IX PRINT #2, " T h i s f i l e c r e a t e d on " + DATES + " at " + TIMES CLOSE #2 END SUB SUB s c l s a v e (MAT I ( ) , ID() AS RECRO, CLASS() AS RECRO, scSO, specX, F$) Th is r o u t i n e saves the ma t r i x as a s c a l e d f i l e in the same format as the o r i g i n a l da ta f i l e . (Note: a s c a l e d DSl f i l e is saved as a SC2 ' f i l e ) SHARED NUMSAMSX, NUMVARSX SHARED VARNAMSO ' * * * * * Output m a t r i x t o f11e$ . COLOR n t e x t , h lgh t LOCATE 25 , 1 PRINT SPACE${79); LOCATE 25, 10 56 INPUT ; "Save s c a l e d data f i l e " ; aS IF UCASE$(LEFTS(a$, 1)) = " Y " THEN EXTS - RIGHT$(FS, 3) FS ' LEFTS(F$, LEN(FS) - 4) LOCATE 25. 10 PRINT " F i l e Name = <"; F$ ; "> " ; INPUT ; a$ aS • UCASES(aS) CALL FRTEXT(a$, rootaS, extaS) IF aS - " " THEN a$ - F$ ELSE aS * rootaS + aS END IF IF EXTS • " D S l " OR EXTS » "DS2" THEN exta$ = " . S C 2 " ELSE extaS - " . S C L " END IF aS • a$ + ex taS OPEN "o". #3, a$ WRITE #3, NUMSAMSX WRITE #3, NUMVARSX FOR nvX • 1 TO NUMVARSX WRITE #3, VARNAMS(nvX) NEXT QS «= CHR$(34) FOR nxX « 1 TO NUMSAMSX PRINT #3. Q$ • I 0 (nxX) .R + QS, QS + CLASS(nxX).R + QS, FOR nYX « 1 TO NUMVARSX - 1 PRINT #3, MAT!(nxX, nYX) , NEXT PRINT #3, MAT!(nxX, NUMVARSX) NEXT WRITE #3, sc$(0 ) IF specX THEN >J l lmX « UBOUND(scS) 0 1 57 FOR IX * 1 TO 11mX WRITE #3. " V a r i a b l e #" + STR$(IX) + sc$( IX) NEXT IX END IF CLOSE #3 ' Th is i s a pause put i n f o r no o b v i o u s l y apparent reason . Perhaps some people were d i s t r e s s e d at the speed of s c a l i n g and sav ing of smal l data se t s and t h i s i s put i n to ease these p e o p l e ' s a n x i e t i e s . (Kim knows who these people a r e . ) FOR IX « 1 TO 20000 NEXT IX , END IF END SUB FUNCTION VARIANCE! (HAT! ( ) , COLX, numrowsX, AVE!) re tu rns the va r i ance o f the COLX' th column of the MAT! r ix from the ' f i r s t to the NUMROWSX'th row. sum! » 0 FOR nxX » 1 TO numrowsX sum! => sum! + MAT!(nxX, COLX) NEXT nxX AVE! = sum! / numrowsX v r ! = 0 FOR nxX = 1 TO numrowsX v r ! = v r ! + (HAT!(nxX, COLX) - AVE!) " 2 NEXT nxX IF numrowsX > 1 THEN VARIANCE! » v r ! / (numrowsX - 1) ELSE VARIANCE! = 0 END IF END FUNCTION We're done! Time to go out on the boat and s k i p K a t h l e e n ' s c o o k i e s ! 1 ********************************************************** 2 ' OENDPLOT - By Dav id S i b b a l d ' The Labo ra to ry f o r Automated Chemical A n a l y s i s Department o f Chemis t ry ' U n i v e r s i t y o f B r i t i s h Columbia ' November 1989 '********************************************************************** ' Th i s Is pa r t two o f the DENDGRAM program package. Th is program loads ' i n the dendrogram data from a .DEN f i l e c r e a t e d by the DENDGRAM ' program and d i s p l a y s I t on the s c r e e n . EGA w i t h 256K o f memory 1s r e q u i r e d . '********************************************************************** DECLARE FUNCTION DFCALC! (POSIT! , MAX!, FORMS) DECLARE SUB GETFILE (FILES) DECLARE SUB PMOVE ( p x X ( ) . P Y X ( ) . nxX, nYX, o y l X , oy2X) DECLARE SUB LST (CURSORX, MARKX(), 1d() AS ANY, CLASS() AS ANY, PLOTXO. C X O ) OECLARE SUB MRK (XX, YX, Y1X, YLX, CX) DECLARE SUB CMRK (XX, YX. CX) DECLARE SUB p l o t t e r (DENDX(), DENDD!(), PLOTXO. 1d() AS ANY, CLASS() AS ANY, MAXOST!) DECLARE SUB DLINE (X IX , Y1X. X2X, Y2X, YMAXX, TWOPAGESX, PL7440X) OECLARE SUB FRTEXT (F ILES, FROOTS, FEXTS) DECLARE FUNCTION WADEDST! (NODEX) * * * F N T X T X r e t u r n s the p r i n t l i n e on screen co r respond ing * * * t o g r a p h i c s l i n e n OEF f n t x t X (N) IF N < 8 THEN f n t x t X - 1 ELSE f n t x t X = (N + 9) * 25 / 350 END DEF TYPE RECRD R AS STRING * 4 END TYPE CONST bgrnd = 11 CONST h l gh t » 4 CONST YELLOW = 14 CONST db lue « 9 CONST n t e x t •= 15 REM SDYNAMIC OATA LINEAR,.LOGARITHMIC,QUADRATIC DMETHSX « 4 DIM DSMETHS(DMETHSX) FOR IX = 1 TO DMETHSX READ DSMETHS(IX) NEXT Check COMMANDS to see 1f t h i s program has been c a l l e d from DENDGRAM w i th a f i l e name. I f s o , we can read I t s t r a i g h t i n , o t h e r w i s e , get one. FILES = COMMANDS IF FILES - " " THEN CALL GETFILE(FILES) IF FILES • " " THEN GOTO BYBY END IF IF INSTR(FILE$, " . " ) = 0 THEN FILES = FILES + " . D E N " Okay, now we have a f i l e name. " W e ' r e not su re where the f i l e name came from, a l l w e ' r e su re I s : we have one now" - a p o l o g i e s to Rod, the f a s c i s t geographer . So, read I t In OPEN " I " , #1, FILES INPUT #1, NUMSAMSX, NUMVARSX DIM VARNAMS(NUMVARSX) FOR IX - 1 TO NUMVARSX INPUT #1. VARNAMS(IX) NEXT Dimension some v a r i a b l e s to h o l d the s t u f f DIM DENDD!(NUMSAMSX - 1) 'The a r r a y o f the d i s s i m i l a r i t i e s DIM DENDX(NUMSAMSX - 1, 2) 'The l i s t o f p a i r s o f f u s i o n s 3 DIM PLOTX(NUMSAMSX) 'The o rde r In which the samples l i e 'on the v e r t i c a l a x i s . DIM id(NUMSAMSX) AS RECRD 'The ID ' s o f the samples DIM CLASS(NUMSAMSX) AS RECRD 'The CLASS d e s i g n a t i o n s FOR IX - 1 TO NUMSAMSX - 1 INPUT #1, DENDX(IX, 1) INPUT #1, DENDX(IX, 2) INPUT #1, DENDD!(IX) NEXT INPUT #1, MAXDST! FOR IX = 1 TO NUMSAMSX INPUT #1, PLOTX(IX) INPUT #1, 1d(PL0TX( IX) ) .R INPUT #1, CLASS(PLOTX(IX)) .R NEXT INPUT #1, sca led$ INPUT #1, method! CLOSE #1 ' I t shou ld be s a i d t ha t a t t h i s p o i n t , the ID o f sample number one -' ID(1) i s not the same as the ID o f the f i r s t sample. (Huh?). ID(1) ' c o n t a i n s the ID o f the sample ( s i g n a l ) which i s d i s p l a y e d at the top ' (o r l e f t ) o f the dendrogram - 1e. I t w i l l be one o f the two samples ' i n v o l v e d i n the ve r y f i r s t f u s i o n made when c o n s t r u c t i n g the ' dendrogram. So be c a r e f u l when c o n v e r t i n g between t h i s program and ' the o the r data a n a l y s i s program. Th is Is c o n f u s i n g , but more use fu l than a sc reen door on a b a t t l e s h i p . Now p l o t dendrogram 899 CLS DIM SHARED SCRNX DIM SHARED VEEWX SCRNX - 9 CALL p lo t t e r (DENDX( ) , DENDD!(), PLOTXO, i d ( ) , CLASS( ) , MAXDST!) BYBY: SCREEN 0 CLS END 4 REM $STATIC SUB CMRK (XX, YX, CX) Subrout ine to make a c u r s o r a t p o s i t i o n x , y on the s c r e e n w i t h c o l o r CX ch$ = " H3BD6E3L5" ' b i t mapping f o r a c r o s s mk$ = " C " + STR$(CX) + " B M " + STR$(XX - 1) + " , " + STR$(YX) + ch$ DRAW mk$ END SUB FUNCTION DFCALC! (POSIT ! , MAX!, F0RM$) This f u n c t i o n c a l c u l a t e s a p o s i t i o n f o r the p l o t t i n g o f a dendra l ' l i n k o c c u r l n g a t POSIT! d i s s i m i l a r i t y v a l u e . MAX! s h o u l d be the ' maximum va lue ( Ie - the va lue o f DENDD!(NUMSAMSX-1) OR MAXDST!) from main module) . FORM$ Is a number o f the p l o t method d e s i r e d . The VAL(F0RM$) Is t a k e n . Th i s Is an i n t e g e r r e f e r l n g to the format o f d i s p l a y . I f an e x t r a parameter Is r e q u i r e d to be passed (such as the R! va lue f o r q u a d r a t i c d i s p l a y ) then t h i s i s s t o r e d i n FORMS a f t e r a decimal p o i n t . The f u n c t i o n v a l u e 1s r e tu rned as a s i n g l e p r e c i s i o n v a l u e between 0 and 1. Th is v a l u e co r responds t o the p l o t p o s i t i o n r e l a t i v e t o MAX!. Eg. a p o s i t i o n c a l c u l a t e d to be shown at 1/4 the d i sp lacemen t from the bottom the the MAX! shou ld have would r e t u r n .25 as the f u n c t i o n v a l u e . So the p o i n t o f a l l t h i s Is t h a t the r e t u r n e d v a l u e must s t i l l be m u l t i p l i e d by the p l o t window s i z e (XMAXX-XMINX) and added t o the ' o f f s e t (XMINX) ' CURRENTLY: 1 L i n e a r (normal) 2 L i n e a r ( r e l a t i v e a x i s - a f f e c t i o n a t e l y c a l l e d the Wade-Wentze l l ) 3 Loga r i t hm ic 4 Quad ra t i c o f the form x " R t h power SELECT CASE INT(VAL(FORM$)) CASE 1 to -J CO D! « POSIT! / MAX! CASE 2 D! » POSIT! / MAX! ' The a c t u a l c a l c u l a t i o n f o r the Wade- Wentze l l method Is ' done In the WAOEOST! r o u t i n e c a l l e d by the PLOTTER s u b r o u t i n e . CASE 3 D! = L0G(P0SIT! + 1) / LOG(MAX! + 1) IF D! < 0 THEN D! • 0 CASE 4 R! » VAL(RIGHT$(FORM$, LEN(FORMS) - INSTR(FORM$, " . " ) ) ) D! • (POSIT! " R! ) / (MAX! " RI) CASE ELSE STOP END SELECT IF 0! < 0 THEN 01 - 0 IF D! > 1 THEN D! - 1 DFCALC! - D! END FUNCTION SUB DLINE (X IX . Y1X, X2X, Y2X, YMAXX, TWOPAGESX, PL7440X) ' * * * DLINE draws a l i n e between two p o i n t s I f the y - c o o r d l n a t e 1s o f f s c a l e then page two 1s used I f TWOPAGESX - - 1 SHARED SCRNX, VEEWX IF PL7440X THEN PRINT #1, " P A " ; Y1X; X IX ; " ; P D ; P A " ; Y2X; " , " ; X2X; " ; P U ; " GOTO 3000 END IF IF NOT TWOPAGESX THEN SCREEN SCRNX, , 0 , VEEWX LINE (X IX , Y IX) - (X2X, . Y2X) ELSE IF Y l X > YMAXX THEN IF Y2X > YMAXX THEN SCREEN SCRNX, , 1. VEEWX 6 LINE (XIX, Y1X - 2 * YMAXX + 350)-(X2X,Y2X-2*YMAXX+350) ELSE s l o p e ! » (XIX - X2X) / (Y1X - Y2X) xmldX = s l o p e ! * (YMAXX - Y2X) + X2X SCREEN SCRNX, , 0 . VEEWX LINE (X2X. Y2X)-(xm1dX, YMAXX + 10) SCREEN SCRNX, , 1, VEEWX LINE (xmldX. 350 - YMAXX - 10 ) - (X1X , Y1X -2*YMAXX+350) END IF ELSE ' y l X <= ymaxX IF Y2X <= YMAXX THEN SCREEN SCRNX, , 0 , VEEWX LINE (XIX. Y 1 X H X 2 X , Y2X) ELSE s l o p e ! = (XIX - X2X) / ( Y l X - Y2X) xmidX • s l o p e ! * (YMAXX - Y2X) + X2X SCREEN SCRNX, , 0 , VEEWX LINE (X IX . Y lX ) - ( xm1dX, YMAXX + 10) SCREEN SCRNX, . 1. VEEWX LINE (xmldX, 350 - YMAXX - 10 ) - (X2X , Y2X - 2*YMAXX+350) END IF END IF END IF 3000 o END SUB SUB FRTEXT (F ILES, FROOTS, FEXTS) STATIC I t i s t h i s r o u t i n e s f u n c t i o n t o break a f i l e name i n t o r o o t , f i l e , and e x t e n s i o n . f l l e S Is the Input s t r i n g . FILES con ta i ns f i l ename on r e t u r n (maximum 8 c h a r a c t e r s ) FROOTS c o n t a i n s pa th i n f o r m a t i o n FEXTS con ta i ns e x t e n s i o n Loop to remove spaces from f i l e name LENGTHX = LEN(FILES) F$ « FILES ^ FILES = "" VO 7 FOR IX * 1 TO LENGTHX IF MID$(F$. IX . 1) <> " " THEN FILE$ = FILE$ + MID$(F$, IX,1) NEXT IX FROOT$ = " " FEXT$ -AX ' INSTR(FILE$, " . " ) IF AX • 0 GOTO n roo t ' no " . " means no ex tens ion IF AX = LENGTHX THEN FEXTS - " " FILE$ » LEFT$(F ILE$, LENGTHX - 1) LENGTHX = LEN(FILE$) GOTO nroot END IF FEXT$ = MID$(FILE$, AX. 4) FILE$ = LEFT$(FILE$, AX - 1) ' e x t e n s i o n taken ca re o f , now peel o f f roo t n r o o t : LENGTHX » LEN(FILES) AX - INSTR(FILE$, " \ " ) IF AX • LENGTHX THEN FROOTS - FROOTS + FILES FILES - " " GOTO f i n END IF IF AX = 0 GOTO f i n FROOTS » FROOTS + LEFTS(FILES. AX) FILES = RIGHTS(FILES, LENGTHX - AX) GOTO nroot f i n : END SUB SUB GETFILE (FILES) ' Th i s r o u t i n e i s c a l l e d I f DENDPLOT Is c a l l e d wi thout any f i l e be ing passed to i t . CLS LOCATE 3 , 20 PRINT "DENDPLOT - Dendrogram p l o t t i n g u t i l i t y " 8 LOCATE 10. 15 INPUT " E n t e r f i l ename (no e x t e n s i o n ) " ; FILES FILES - UCASES( FILES) IF FILES = " " THEN GOTO by Put e r r o r check ing i n here t o check f o r a l e g a l f i l e name. In case the user i s not pay ing a t t e n t i o n w h i l e wa tch ing Don C h e r r y ' s C o a c h ' s Corner du r i ng Hockey N igh t i n Canada. Eh , Duane? AX = INSTR(FILES, " . " ) IF AX > 0 THEN FILES = LEFT$(F ILES, AX - 1) FILES = FILES + " . D E N " OPEN FILES FOR RANDOM ACCESS READ AS #1 L& = L0F(1) CLOSE #1 IF L& = 0 THEN KILL FILES PRINT F ILES; " not f o u n d . " STOP END IF by: END SUB SUB LST (CURS0RX.MARKX().1d() AS RECRD.CLASSO AS RECRD.PL0TX() , c o l r X O ) Th is r o u t i n e p r i n t s out the dendrogram samples i n o r d e r . The sample ' at the CURSORX p o s i t i o n Is kept In the m idd le o f the d i s p l a y ( u n l e s s the c u r s o r Is a t the top o r bo t tom) . NUMSAMSX = UBOUND(MARKX) SCREEN 9 , , 0, 0 LOCATE 25 . 1 PRINT STRING$(80, " " ) ; LOCATE , 1 COLOR 14 PRINT " P r e s s any k e y . . . " ; SCREEN 9 , , 1, 0 CLS to C O o Determine what range t o p r i n t out TOPSAMX = 1 IF NUMSAMSX > 23 THEN IF CURSORX > 12 THEN TOPSAMX » CURSORX - 11 END IF END IF BOTSAMX - TOPSAMX + 22 IF BOTSAMX > NUMSAMSX THEN BOTSAMX » NUMSAMSX ' P r i n t out ID and CLASS 'S SCREEN 9 . , 1. 1 LOCATE 1, 1 COLOR 14 PRINT " ID CLASS" FOR IX = TOPSAMX TO BOTSAMX - 1 CURSCOLRX - co l rX( -MARKX( IX) ) IF IX « CURSORX THEN CURSCOLRX - co l rX( -MARKX(IX) + 2) IF CURSCOLRX » 0 THEN CURSCOLRX » 15 COLOR CURSCOLRX PRINT 1d(PLOTX( IX) ) .R; " " ; CLASS(PL0TX(IX)) .R NEXT PRINT 1d(PLOTX(BOTSAMX)).R; " " ; CLASS(PLOTX(BOTSAMX)).R; LOCATE 25, 1 COLOR 14 PRINT "(0 - SHOW DENDROGRAM, ESC - E X I T ) " ; SCRNX • 1 530 AS - INKEY* AS - " " WHILE A$ • " " AS - INKEYS WEND AS - UCASES(AS) IF A$ - CHR$(27) THEN SCREEN 9. , 1, 1 CLS SCREEN 9 , , 0 , 0 10 EXIT SUB END IF SCRNX = -SCRNX + 1 SCREEN 9 , , SCRNX, SCRNX GOTO 530 END SUB SUB MRK (XX, YX, Y l X , YLX, CX) STATIC ' Th is r o u t i n e draws a box around the c u r s o r a rea f o r the c u r r e n t page. ' The box has the d imens ions 7 by YLX w i t h the upper r i g h t c o r n e r b e i n g ' p o s i t i o n e d a t X X , Y l X . The s u b r o u t i n e CMRK Is c a l l e d to draw the ' cu rso r a t p o s i t i o n XX,YX w i t h c o l o r CX. DIM AREAX(IOOO) DIM c o l r X ( 3 ) co l rX (O) = 0 ' Background c o l r X ( l ) = 12 ' Marked c o l r X ( 2 ) = 11 ' C u r s o r c o l r X ( 3 ) = 10 ' Marked w i t h c u r s o r IF PREVX THEN PUT ( xo ldX - 7, y o l d X ) - ( x o l d X , yoldX+ Y L X ) , AREAX PREVX = 0 END IF ' S tore area In to AREAX f o r red raw ing l a t e r GET (XX - 7, Y 1 X ) - ( X X , Y l X + Y L X ) , AREAX CALL CMRK(XX, YX, CX) IF CX = c o l r X ( l ) THEN CX « c o l r X ( 2 ) IF CX > 8 THEN CX = CX - 8 LINE (XX - 1, Y1X) - (XX - 7 , Y l X + Y L X ) . CX, B PREVX = -1 xo ldX = XX yo ldX - Y l X END SUB SUB p l o t t e r (DENDX(), DENDD! ( ) , P L O T X O , i d ( ) AS RECRD,CLASS() AS RECRD, MAXDST!) to CO H 11 ' * * * * " P r in t -Dendrogram by Dav id S l b b a l d Th is 1s the program wh ich conve r t s the dendrogram f i l e ' c r e a t e d by DENDO to a v i s u a l d i s p l a y method$, * * * * * 901 s c a l e d S , DSMETH$() These a re the va lues f o r the window on the g r a p h i c s screen y - c o o r d i n a t e on screen o f samples Background Harked Cursor Harked w i t h cu r so r SHARED SCRN% SHARED F ILES , XMAX% = 640 ' YMAXX » 320 ' XMINX «= 50 ' YMINX = 20 NUMSAHSX = UBOUND(ld) DIM HARKXf1 TO NUMSAMSX) DIM P0SYX(1 TO NUMSAMSX) DIH P0SXX(1 TO NUMSAMSX) DIM c o l r X ( 3 ) c o l r X ( O ) » 0 c o l r X ( l ) « 12 c o l r X ( 2 ) = 11 c o l r X ( 3 ) = 10 DFORMS » " 1 " cu rexp ! = 2 COLOR , c o l r X ( O ) d l s tmax ! « MAXDST! IF NUMSAMSX < 100 THEN WADEMAX! = WADEDST!(NUMSAMSX - 1) by l $ = " ( Q - q u l t T-two page d i s p l a y P-p lo t I - I d e n t i f y WLN~-change d i s p l a y format) by2S • " ( 0 - o t h e r page S - s l n g l e page P -p lo t Q-qu i t WLN"-change d i s p l a y f o rma t ) " IF PL7440X THEN ' * * * se t va lues f o r p l o t t e r output YMINX = 75 YMAXX » 975 OVERT! » (YMAXX XMINX « 50 XMAXX = 600 DOPRINTX = 0 GOTO 920 END IF YMINX) / (NUMSAMSX) 12 by$ = by l $ ' F igure out where the p l o t i s a l l o w e d t o go , and what t ype o f d i s p l a y . IF TW0PAGESX <> 0 THEN YMAXX = 3 1 4 IF NUMSAMSX <= (YMAXX - 14) / 14 THEN YMINX = 28 * ((YMAXX - 28) / 28 - NUMSAMSX / 2) + 28 tb topX = fn tx tX(YMINX) XMINX = 70 OVERT! = 28 DOPRINTX » -1 ELSEIF NUMSAMSX < (YMAXX - 28) / 7 THEN YMINX = 14 * ((YMAXX - 14) / 14 - NUMSAMSX / 2) + 21 XMINX = 70 OVERT! = 14 DOPRINTX = -1 ELSE YMINX = 20 DVERT! = 2 * (YMAXX - YMINX) / (NUMSAMSX) XMINX = 10 DOPRINTX = 0 ENO IF ELSE ' * * * one page IF NUMSAMSX > 23 THEN YMINX = 20 YMAXX » 325 DVERT! = (YMAXX - YMINX) / (NUMSAMSX - 1) XMINX = 10 DOPRINTX = 0 ELSEIF NUMSAMSX > 11 THEN DVERT! - 14 YMINX = 7 * (25 - NUMSAMSX) + 5 XMINX = 70 DOPRINTX = -1 ELSE DVERT! = 2 8 YMINX - 3 . 5 * (25 - NUMSAMSX) - 5 co XMINX = 7 0 . t o 13 DOPRINTX = -1 END IF END IF ' * * * * G raph i cs t o draw dendrogram on sc reen 920 SCREEN SCRNX. , 1. VEEWX CLS SCREEN SCRNX, . 0 . VEEWX CLS PAGLENX = DVERT! * 23 ' Set the Ycords o f the samples based on p l o t o rde r FOR NX » 1 TO NUMSAMSX P0SYX(PL0TX(NX)) = YMINX + DVERT! * (NX - 1) NEXT IF PL7440X THEN GOSUB 2000 IF PL7440X - 0 THEN A$ = " S " VEEWX = 0 GOTO 1021 END IF END IF ' Set xcords o f a l l samples t o the l e f t a x i s FOR NX » 1 TO NUMSAMSX P0SXX(PL0TX(NX)) = XMINX NEXT COLOR 15 IF DOPRINTX THEN FOR NX = 1 TO NUMSAMSX ' P r i n t out samplenames IF P0SYX(PL0TX(NX)) > YMAXX THEN VTABX = fntxtX(POSYX(PLOTX(NX)) + 350 - 2 * YMAXX) SCREEN SCRNX, , 1, VEEWX c o l r X - co l rX(-MARKX(NX)) CALL CMR(C(XMINX, P0SYX(PL0TX(NX)) + 350-2*YMAXX,col rX) LOCATE VTABX, 1 ELSE 14 SCREEN SCRNX, , 0 , VEEWX VTABX = fn tx tX(POSYX(PLOTX(NX))) c o l r X = co l rX( -MARKX(NX)) CALL CMRK(XMINX, P0SYX(PL0TX(NX)) , c o l r X ) LOCATE VTABX, 1 END IF o u t l $ = SPACE$(4) RSET o u t l $ = id (PLOTX(NX) ) .R out2$ = SPACE$(4) RSET out2$ - RTRIM$(CLASS(PLOTX(NX)).R) out$ " o u t l $ + out2$ PRINT o u t $ ; NEXT END IF DSCALE! - ABS(XMINX - XMAXX) * .98 FOR pX = 1 TO NUMSAMSX - 1 pos lX = DENDX(pX. 1) pos2X = DENDX(pX. 2) OLDylX = POSYX(poslX) 0LDy2X - POSYX(pos2X) NEWYX » (OLDylX + 0LDy2X) / 2 IF WWMETHX THEN NEWXX-OSCALE!*DFCALC!(WADEDST!(pX).MAXDST!.0F0RM$)+XMI NX ELSE NEWXX=OSCALE! * DFCALC!(DENDD!(pX). MAXDST!,DF0RM$)+XMIN% END IF CALL DLINE(POSXX(pos lX) , OLDylX, NEWXX. OLDylX, YMAXX. TWOPAGESX, PL7440X) CALL DLINE(P0SXX(pos2X). 0LDy2X, NEWXX, 0L0y2X, YMAXX, TWOPAGESX, PL7440X) CALL DLINE(NEWXX, OLDylX, NEWXX, 0LDy2X, YMAXX, TWOPAGESX, PL7440X) CALL PMOVE(POSXX(), P O S Y X O , NEWXX, NEWYX, OLDylX, 0LDy2X) NEXT CALL DLINE(NEWXX, NEWYX, XMAXX, NEWYX, YMAXX. TWOPAGESX, PL7440X) 15 SCREEN 9, , -TWOPAGESX, VEEWX LOCATE 1. 1 PRINT FILE$ LOCATE 1 , 6 + LEN(FILES) PRINT " D i s p l a y method • " ; PRINT DSMETH$(I NT{VAL(DFORM$))); Patch In here I f you w ish the number on the end o f DSF0RM$ p r i n t e d SELECT CASE INT(VAL(DFORM$)) CASE 4 PRINT S P C ( 3 ) ; PRINT "POWER ="; RIGHT$(DFORM$, LEN(DF0RM$) - INSTR(DFORM$, "•")); PRINT " Use + and - k e y s . " ; CASE ELSE END SELECT IF WWMETHX THEN LOCATE 1, 65 PRINT " R e l a t i v e a x i s " END IF Draw s i m i l a r i t y a x i s and p r i n t out command l i n e s COLOR 11 IF NOT TWOPAGESX THEN LOCATE 25, 1 PRINT b y l $ ; COLOR 13 LINE (XMINX, YMINX - 5)-(XMAXX, YMINX - 5) ELSE SCREEN SCRNX, , 0, VEEWX LOCATE 1. 1 PRINT by2$; COLOR 13 LINE (0 , YMAXX + 11)-(XMAXX, YMAXX + 11) SCREEN SCRNX, . 1. VEEWX LOCATE 25, 1 COLOR 15 16 PRINT by2$; COLOR 13 LINE (0 . 350 - YMAXX - 11)-(XMAXX, 350 - YMAXX - 11) SCREEN SCRNX. , 0 . VEEWX END IF Draw hashmarks on s i m i l a r i t y a x i s IF NOT PL7440X THEN IF TWOPAGESX THEN SCREEN SCRNX, , 1, VEEWX COLOR 12 YX » 350 - YMAXX - 12 GOSUB s c a l e SCREEN SCRNX. , 0 , VEEWX COLOR 12 YX = YMAXX + 9 GOSUB s c a l e ELSE SCREEN SCRNX, , 0 , VEEWX COLOR 12 YX = YMINX - 8 GOSUB s c a l e END IF ELSE YX = YMINX - 10 GOSUB s c a l e PRINT #1, " P A O . O ; " PRINT #1, " S P ; " CLOSE #1 TWOPAGESX = 0 VEEWX = 0 PL7440X = 0 GOTO 901 END IF GOTO 1020 This GOSUB p r i n t s out the hashmarks on the a x e s . S p e c i f i c a l l y , i t draws a l i n e 5 p i x e l s l ong (YX t o YX+4) a t the p o s i t i o n c a l c u l a t e d CO 17 us ing the DFCALC f u n c t i o n s c a l e : FOR I! • 0 TO MAXDST! STEP MAXDST! / 10 D! » D F C A L C ! ( I ! , MAXDST!, DFORM$) * DSCALE! + XMIN% LINE ( D l . Y%)-(DI , YX + 4) NEXT RETURN 1020 INIT! = TIMER A$ » INKEYJ A$ = " " Pause and wa i t f o r key . I f the Bozo d o e s n ' t p lay w i t h the d i s p l a y , i t goes away. The t ime cons tan t shou ld be changed f o r longer d e l a y s , o r removed f o r people w i t h th ree c a r s who are never on t ime (but no ment ion ing D a r c y ' s name.) WHILE A$ « " " AND TIMER < INIT! + 10000 A$ = INKEY) WEND IF A$ - " " THEN A$ = " Q " A$ = UCASE$(A$) 1021 SELECT CASE A$ CASE " Q " SCREEN SCRNX, , 0 , 0 CLS SCREEN SCRNX, , 0 , 1 CLS GOTO 1030 CASE " T " IF TWOPAGESX = 0 THEN TWOPAGESX = -1 GOTO 901 END IF CASE "W" IF NUMSAMSX < 100 THEN IF WWMETHX THEN MAXDST! = d is tmax! WWMETHX « 0 18 ELSE MAXDST! = WADEMAX! WWMETHX = 1 END IF END IF GOTO 920 CASE " N " IF DF0RM$ = "1" THEN SOUND 50. 1 GOTO 1020 END IF DF0RM$ = " 1 " GOTO 920 CASE " L " IF DF0RM$ = " 3 " THEN SOUND 50, 1 GOTO 1020 END IF DF0RM$ = " 3 " GOTO 920 CASE " " " IF LEFT$(DFORM$, 1) = " 4 " THEN SOUND 51 . 1 GOTO 1020 END IF DF0RM$ = " 4 . " + STR$(cu rexp ! ) GOTO 920 CASE "+" , "=" IF INT(VAL(LEFT$(DFORM$, 1 ) ) ) = 4 THEN c u r e x p ! = c u r e x p ! + .1 IF cu rexp ! > 2 THEN c u r e x p ! = 2 DF0RM$ = " 4 . " + STR$(cu rexp ! ) GOTO 920 ELSE SOUND 70. I GOTO 1020 END IF CASE " _ " IF INT(VAL(LEFTJ(DFORMS, 1 ) ) ) = 4 THEN c u r e x p ! » c u r e x p l - .1 IF c u r e x p ! » 0 THEN c u r e x p ! = .1 OFORM$ « " 4 . " + STR$(curexp! ) GOTO 920 ELSE SOUND 70 , 1 GOTO 1020 END IF CASE " X " ' change power Th i s c o u l d be Implemented Inpu t lng a power from the keyboard but I was busy . CASE " S " IF TWOPAGESX THEN TWOPAGESX = 0 VEEWX = 0 GOTO 901 END IF CASE " P " CLS IF TWOPAGESX THEN VEEWX » (VEEWX + 1) MOD 2 SCREEN SCRNX, , VEEWX, VEEWX CLS END IF PL7440X » -1 TWOPAGESX • -1 GOTO 901 nice "n" " " CASE 0 , IF TWOPAGESX THEN VEEWX « (VEEWX + 1) MOD 2 SCREEN SCRNX, , VEEWX, VEEWX END IF CASE " I " IF NOT TWOPAGESX THEN CURSORX = 1 20 GOSUB IDCL LOCATE 25 , 1 PRINT by$; ELSE LOCATE 25 , 1 COLOR 14 PRINT STRING$(80, " " ) ; LOCATE 25, 1 PRINT "You must be i n s i n g l e page mode to i d e n t i f y p o i n t s . " ; A$ = INKEY$ A$ - " " INIT! = TIMER WHILE A$ = " " AND INIT! + 10 > TIMER A$ = INKEY$ WEND GOTO 920 END IF CASE ELSE 'burp END SELECT GOTO 1020 This s e c t i o n d e a l s w i t h i d e n t i f y i n g and changing the c l a s s o f s i g n a l s For a one page d i s p l a y , we w i l l use page two to p r i n t out the dendrogram ID 's i n PLOT o r d e r . We w i l l a l l o w the s e l e c t i o n o f s p e c i f i c e lements on the dendrogram i t s e l f and h i g h l i g h t these on the p r i n t out o f I D ' s . The ID page w i l l c o n t a i n the ID and CLASS of each sample. IDCL: CURSORX = 1 IDCL2: LOCATE 25 , 1 PRINT SPACE$(79) ; w LOCATE 1, 65 CO PRINT STRING$(15, " " ) ; 21 LOCATE 25 , 1 PRINT "Use c u r s o r keys to move, M-to mark sample, L - to l i s t samples , N-next m a r k e d . " ; IF FEXTJ = " . D S 1 " THEN PRINT " , C to change c l a s s i f i e r " ; ' ****** IF NUMSAMSX > 23 THEN PGLNTHX • DVERT! * 23 + 2 ELSE PGLNTHX = NUMSAMSX * 14 + 2 END IF ex tX = 0 1110 WHILE ex tX = 0 LOCATE 1, 70 CURSCOLRX » colrX(-MARKX(CURSORX) + 2) COLOR CURSCOLRX PRINT id(PL0TX(CURS0RX)) .R; " " ; CLASS(PL0TX(CURS0RX)).R; TBX = (CURSORX - 1) * DVERT! + YMINX TOPX = 1 IF NUMSAMSX > 23 AND NUMSAMSX - CURSORX < 12 THEN TOPX - (NUMSAMSX - 23 - .5) * DVERT! + YMINX ELSE IF CURSORX > 11 AND NUMSAMSX > 23 THEN TOPX « (CURSORX - 11 - .5) * DVERT! + YMINX ELSE TOPX = ( - . 5 * DVERT!) + YMINX END IF END IF IF TOPX < 1 THEN TOPX « 1 CALL MRK(XMINX, TBX, TOPX, PGLNTHX, CURSCOLRX) A$ = INKEY$ A$ = " " INIT! » TIMER WHILE A$ = " " AND TIMER < INIT! + 1000 A$ - INKEYS WEND IF AJ = " " THEN A$ = CHR$(27) IF ASC(A$) = 0 THEN A$ • RIGHT$(A$, 1) 22 SELECT CASE UCASE$(A$) CASE " M " ' the M key - Duh, thanks K e l l y . MARKX(CURSORX) - NOT MARKX(CURSORX) CASE " N " ' the N key IX = CURSORX + 1 MX = 0 DO WHILE NOT MX MX « MARKX(IX) IX » IX + 1 IF IX > NUMSAMSX AND NOT MX THEN MX = -1 IX = 0 END IF LOOP IF IX = 0 THEN SOUND 105, 2 ELSE c o l r X = colrX(-MARKX(CURSORX)) CURSORX = IX - 1 CALL MRK(XMINX, TBX, TOPX, PGLNTHX. c o l r X ) END IF CASE " L " ' the L key CALL LST(CURSORX, MARKX(). i d ( ) . C L A S S ( ) , P L O T X ( ) , c o l r X ( ) ) GOTO IDCL2 CASE CHR$(80) ' the DOWN key was p ressed c o l r X = colrX(-MARKX(CURSORX)) IF CURSORX < NUMSAMSX THEN CURSORX = CURSORX + 1 CALL CMRK(XMINX, TBX, c o l r X ) CASE CHR$(72) ' the UP key c o l r X = colrXt-MARKX(CURSORX)) IF CURSORX > 1 THEN CURSORX = CURSORX - 1 CALL MRK(XMINX, TBX, TOPX, PGLNTHX. c o l r X ) CASE CHR$(56),CHR$(73),CHR$(57) ' the s h i f t UP, PgUp.SH PgUp c o l r X = colrX(-MARKX(CURSORX)) IF CURSORX > 11 THEN CURSORX » CURSORX - 10 ^ ELSE ^ 23 CURSORX = 1 END IF CALL MRK(XMIN%, TBX, TOPX, PGLNTHX, c o l r X ) CASE CHR$(27) ' the ESCape key ex tX « -1 CASE CHR$(50) . CHR$(81), CHR$(51) ' s h f t DOWN ,PgDn.shf tPgDn c o l r X » colrX(-MARKX(CURSORX)) IF CURSORX < NUMSAMSX - 11 THEN CURSORX = CURSORX + 10 ELSE CURSORX • NUMSAMSX END IF CALL MRK(XMINX, TBX, TOPX, PGLNTHX, c o l r X ) CASE CHRS(71), CHR$(55) ' HOME, S h i f t HOME c o l r X = colrX(-MARKX(CURSORX)) CALL MRK(XMINX, TBX, TOPX, PGLNTHX, c o l r X ) CURSORX = 1 CASE CHR$(79), CHRS(49) ' END, S h i f t END c o l r X » colrX(-MARKX(CURSORX)) CALL MRK(XMINX, TBX, TOPX, PGLNTHX, c o l r X ) CURSORX = NUMSAMSX CASE ELSE END SELECT WEND LOCATE 1, 70 PRINT SPACE$(9) CALL MRK(XMINX, TBX, TOPX, PGLNTHX, colrX(-MARKX(CURSORX))) COLOR 15 RETURN ********* Opens the po r t fo r the p l o t t e r S e l e c t s pen 3 f o r l a b e l l i n g the dendrogram S e l e c t s pen 2 f o r l a b e l l i n g each sample S e l e c t s pen 1 be fo re r e t u r n i n g c o n t r o l back to PRIN-DEND 2000 CLS PRINT "Make su re tha t HP C o l o r P r o i s connec ted" PRINT "Load paper " PRINT "Pen 1 - b l ack (0.3mm) Pen 2 - b lue (0.3mm) Pen 3 p u r p l e " PRINT " P r e s s R e t u r n " ; INPUT " " , A$ A$ « LEFTS(A$, 1) IF AS = " q " OR AS = " Q " THEN PL7440X = 0 RETURN END IF OPEN "COM1:9600 ,S ,7 ,1 ,RS ,CS65535 ,OS ,CD" FOR RANDOM AS #1 'OPEN " 0 " . #1, "HPPLOT.OPL" PRINT #1, " S P 3 ; " PRINT #1, " S C O . 1 0 0 0 , 0 , 7 0 0 ; " PRINT #1, "PA 1 0 , 6 7 0 ; 0 R 1 , 0 ; S I . 4 , . 4 ; " CLS PRINT " En te r T i t l e - maximum 40 c h a r a c t e r s | " LINE INPUT TI$ TI$ = LEFT$(T I$ , 40) CLS PRINT "Now p l o t t i n g . . . " ; TI$ PRINT #1, " L B " + TIS + CHR$(3) PRINT #1, " P A 0 , 6 9 5 ; S I . 2 , . 2 ; " PRINT #1, " S P 2 ; " PRINT #1, " L B " + DATES + CHR$(3) PRINT #1, " D R 0 . 1 ; " PRINT #1, " S I . 1 7 , . 1 7 ; " PRINT #1, " P A 1 0 0 0 , 7 0 ; " PRINT #1, " L B " + FILES + CHR$(3) PRINT #1, " P A 1 0 0 0 . 2 0 0 ; " PRINT #1. " L B " + s c a l e d S + CHR$(3) PRINT #1, " P A 1 0 0 0 . 3 5 0 ; " PRINT #1, " L B " + methodS + CHRS(3) IF WWMETHX THEN PRINT #1, " P A 1 0 0 0 . 5 5 0 ; " PRINT #1. "LBww" + CHR$(3) END IF s i z e ! = 7 / NUMSAMSX IF s i z e ! > .3 THEN s i z e ! » .3 PRINT #1. "DR 1 , 0 ; " IF s i z e ! < .045 THEN s i z e ! = .045 PRINT #1, "S I " ; s i z e ! ; " , . 2 ; " CLASLENX « 0 FOR NX = 1 TO NUMSAMSX LX = LEN(RTRIM$(LTRIM$(CLASS(NX).R))) IF LX > CLASLENX THEN CLASLENX = LX NEXT FOR NX = 1 TO NUMSAMSX Nl$ = SPACE$(4) RSET Nl$ - (RTRIM$(LTRIM$(1d(PL0TX(NX)).R))) N2$ - RTRIM$(LTRIM$(CLASS(PLOTX(NX)).R)) LETPOSX = POSYX(PLOTX(NX)) + s i z e ! * 35 PRINT #1. "PA " ; LETPOSX; " , 0 ; " IF NUMSAMSX < 30 THEN BSX - LEN(N2$) IF BSX > 0 THEN PRINT #1, " C P " + STR$(-BSX) + " PRINT #1, " L B " + N2$ + CHR$(3) + " ; " LETPOSX » POSYX(PLOTX(NX)) + s i z e ! * 5 PRINT #1, "PA " ; LETPOSX; " , 0 ; " PRINT #1, "CP 0 , 1 ; " Nl$ = RTRIM$(LTRIM$(N1$)) BSX = LEN(NlJ ) IF BSX > 0 THEN PRINT #1. " C P " + STR$(-BSX / 2) PRINT #1, " L B " ; Nl$ + CHR$(3) + " ; " ; " " N$ « " 1 2 " ' 2 c h a r a c t e r s ELSE N2$ = LEFT$( " " + N2$, CLASLENX) N$ = N2$ + Nl$ FOR l nX = 4 + CLASLENX TO 1 STEP -1 PRINT tl, " C P - 1 , 1 ; " PRINT #1, " L B " + MID$(N$, l n X , 1) + CHR$(3) + NEXT END IF NEXT 26 XMINX = 19 * LEN(N$) PRINT #1, " S P 1 ; " RETURN 1030 END SUB SUB PMOVE (pxX ( ) , P Y X ( ) , nxX, nYX, o y l X , oy2X) PMOVE looks through PYX() [posyX() In main module] f o r anyX "members ' that have y - c o o r d l n a t e s 0Y1X o r OY2X [ o l d y l , 2 X ] . AnyX matches have t h e i r P o s i t i o n s moved t o NXX,NYX [newx.newy] |nsX = numsamsX| NSX = UBOUND(PYX) FOR IX » 1 TO NSX IF PYX(IX) = o y l X OR PYX(IX) = oy2X THEN PYX(IX) = nYX pxX( IX) = nxX END IF NEXT END SUB FUNCTION WADEDST! (NODEX) This r o u t i n e r e c u r s i v e l y WADEs th rough the dendrogram t r e e s t r u c t u r e to f i n d the maximum he igh t needed t o d i s p l a y the dendrogram s t o r e d 1 dendX as In the newly proposed fo rmat . The r o u t i n e takes the c u r r e n t d i s t a n c e s t o r e d In dendd!(nodeX) and adds i t t o the maximum o f the v a l u e s r e t u rned from the r e c u r s i v e ' c a l l s o f WADEDST(leftnodeX) and WADEDST(r ightnodeX). SHARED DENDXO, DENDD! () MAX! = DENDD!(NODEX) rnodeX = 0 rmax! • 0 InodeX = 0 lmax! = 0 cnodeX = NODEX DO 27 cnodeX = cnode% - 1 LOOP UNTIL cnode% = 0 OR MATCH(DEN0%(NOOE%, 1), cnode%) IF cnodeX THEN InodeX » cnodeX lmax! - WAOEDSTI(InodeX) END IF cnodeX » NODEX DO cnodeX • cnodeX - 1 LOOP UNTIL cnodeX = 0 OR MATCH(DENDX(NODEX, 2 ) , cnodeX) IF cnodeX THEN rnodeX = cnodeX rmax! * WADEDST!(rnodeX) END IF IF rmax! > lmax! THEN MAX! = MAX! + rmax! ELSE MAX! = MAX! + lmax! END IF WADEDST! » MAX! END FUNCTION SUB WRITECLASS ( c l s s AS RECRD, NUMSAMSX, NUMVARSX, SAMPX, F$ , opX) ' Th i s s u b r o u t i n e w r i t e s the c l a s s o f sample # SAMPX to the DSl f i l e F$ . (See opening c r e d i t s f o r format o f DSl f i l e ) The fo rmula f o r f i n d i n g the reco rd number o f the CLASS reco rd o f sample SAMPX i n a da tase t c o n t a i n i n g NUMVARSX d e s c r i p t o r s i s : ' REC! = (SAMPX+l)*NUMVARSX+4(SAMPX-l)+4 CLSS = 4 by te s t r i n g to be w r i t t e n NUMSAMSX = The maximum number o f s i g n a l s i n the f i l e . NUMVARSX = The number o f d e s c r i p t o r s per s i g n a l . SAMPX » the sample to be w r i t t e n F$ • the name of the f i l e ( .DS l e x t e n t i o n ) ' OPX = BOOLEAN v a r i a b l e . True i f f i l e F$ i s a l r e a d y open. 28 REC1$ = SPACE$(4) IF NUMSAMSX < SAMPX THEN GOTO nope r e c ! » (SAMPX + 1) * NUMVARSX + 4 * (SAMPX - 1) + 4 IF NOT opX THEN OPEN F$ FOR RANDOM ACCESS WRITE AS #1 LEN = 4 FIELD #1, 4 AS REC1$ END IF LSET REC1J « c l s s . R PUT #1. r e e l IF NOT opX THEN CLOSE #1 nope: END SUB to VO O 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0060322/manifest

Comment

Related Items