Tensile Strength Estimation of Lumber By Hossein S. Saboksayr B. A . Sc. (Electrical Engineering) Sharif University of Technology, 1988 M . A . Sc. (Electrical Engineering) Concordia University, 1995 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F D O C T O R O F P H I L O S O P H Y in T H E F A C U L T Y O F G R A D U A T E STUDIES ( D E P A R T M E N T O F E L E C T R I C A L A N D C O M P U T E R E N G I N E E R I N G ) We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y O F BRITISH C O L U M B I A August 2001 © Hossein S. Saboksayr, 2001 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. -The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract The problem of wood tensile strength estimation of softwood lumber is studied in this thesis. The main contributions brought to this topic here are first, a set o f knot geometry features that can be used in board strength estimation, and second a learning algorithm that selects the best set of features for the purpose o f strength measurement. The estimation problem is posed as an empirical learning problem that is based on the measured properties of wood. The process o f producing the required database consisted o f three distinct tasks: selecting and preparing the boards, measuring a set of properties of wood for every board, and estimating the measured strength of each board from the measured profiles. A set of boards, providing a random sample o f softwood lumber, already existed at U B C (from previous experiments). These boards were measured and used as the preliminary database. A second set of boards was selected randomly from the regular production of softwood lumber. These boards created the evaluation data set. For the measurement task, all the boards were scanned using the available measurement machines. These machines were S O G and Microwave for grain angle measurement, X -ray for local density measurement, dynamic bending machine for the Modulus of Elasticity measurement, as well as the ultimate tensile strength tester for measuring the tensile strength of a board. The output profiles per board were saved in a data file (one data file per board per machine). The measured data files were stored in a database consisting o f a structure of directories. In the strength estimation task all the measured profiles o f a board were mapped to specific features (usually statistical moments) and the features were then mapped to the strength of the board. One o f the features of a board is the set o f its knots. A conic model of a knot was chosen and the related mappings were developed such that the X-ray scanning could be used in order to detect the existence, location, and shape o f knots in a board. Then geometrical features were proposed such that the set o f knots of a board could be transformed into a set of features suitable for strength estimation methodology o f this thesis. ii Since specimens are costly to measure, means to reduce the number required were developed. To this end statistical learning theory was applied. This theory addresses the suitability of the learning model for the physical problem and the effectiveness o f the features for the estimation problem. Based on this theory, the A S E C learning model was developed. The learning problem for wood tensile strength estimation was divided into three problems: defining the most suitable feature set, measuring the suitability of a learning machine, and using the a priori knowledge about the dependence in the learning machine. A method for measuring the suitability o f a regression estimator (VC-dimension) was developed in order to select the best model in a class o f models. The A S E C learning model was developed in order to find the best set o f new features from the given feature set by using the known dependencies. Different learning machines were tested in order to determine what model is most suitable for tensile strength estimation of lumber. The validity of all the methods was demonstrated by analytical proof, by simulation, or by test on the database. iii Table of Contents ii Abstract „ , , viii List of Tables ix List of Figures Nomenclature X U xvi Acknowledgement Chapter 1. Introduction 1 1.1. Objective and limitations 1 1.2. The paradigm 2 1.3. Specimens 2 1.4. An overview of the experimental work 4 1.5. An overview of the analysis 6 1.6. Review of other attempts to transform measured mechanical properties into strength 7 1.7. Economic importance of this knowledge 10 1.8. Why Geometrical features are important 10 1.9. Outline of this thesis 11 1.10. The Contributions of this thesis 12 Chapter 2. Measurement means; practical constraints 13 2.1. Measurement of the Physical Properties of Wood 14 2.1.1. Modulus of elasticity of wood 14 2.1.2. Moisture content 15 2.1.3. Dielectric Properties 16 2.1.4. Density of wood 17 2.2. Measuring density with an X-ray machine 17 iv 2.3. Measuring Grain Angle with SOG and Microwave Machines 22 2.3.1. Grain angle via an SOG machine 22 2.3.2. Grain angle measurement via microwave 25 2.4. Modulus of Elasticity using the Continuous Lumber Tester 27 2.4.1. Calibration 28 2.5. Ultimate Tensile Strength Tester 36 Chapter 3. Modeling wood failure 37 3.1. Tensile strength of clear wood as a random variable 38 3.2. The effect of grain angle on strength 39 3.3. The effect of density variation on strength 40 3.4. The effect of holes on strength 41 3.5. The effect of knots on strength 41 3.6. Other factors 42 3.7. A 2x4 board as a system 43 3.8. Proposed approach to modeling strength as a continuous function of wood properties 44 Chapter 4. Characterizing a board; features 47 4.1. A general view on defining the features 49 4.2. Properties of clear wood 51 4.3. Characterizing the defects 55 4.4. Geometrical features 57 4.5. Characterizing feature sets are not unique 59 4.6. Board model 60 Chapter 5. Extracting features from measured profiles 62 5.1. The image of a knot in an X-ray image; the shadow image 62 v 5.2. Thresholding; classifying a pixel into knot or clear wood classes 64 5.2.1. Bayesian classification 66 5.2.2. Edge detection 68 5.2.3. Experimental data 70 5.3. Indexing the pixels of a knot 74 5.4. Knot pattern 75 5.4.1. Knot model parameters 78 5.4.2. Measuring and calculating the features of the pattern 78 5.4.3. Matching patterns 82 5.5. Geometrical features of a knot 82 5.6. X-ray image analysis; extraction of board image and defect detection 83 5.6.1. Edges of the board 84 5.6.2. Knot detection 85 5.6.3. Three dimensional knot shape detection (simulation) 90 5.7. Knot detection for real data 96 5.8. X-ray and SOG interrelation 100 5.9. Sufficiency of sampling rate; the problem of resolution 103 5.10. Preparation of specimens and measurement process 105 5.11. Specimen Characterization 112 Chapter 6. Strength Estimation; Empirical Learning for Small Sample Size 120 6.1. Strength estimation as an empirical learning problem 122 6.2. High dimensionality of the feature set 130 6.3. The relationship between estimation accuracy, learning capacity, and sample size 132 6.4. Extending the learning-capacity (VC-dimension) to regression estimators 135 6.4.1. The equivalence of classifiers and regression estimators 135 6.4.1.1. Learning problem statement 135 6.4.1.2. Constructing a classifier by using a regression estimator 137 vi 6.4.1.3. Constructing a regression estimator by using a classifier 138 6.4.2. The VC-dimension of a regression estimator 142 6.4.3. Corollary 143 6.4.4. Measuring the VC-dimension at a point 143 6.5. Strength estimation by using existing methods 149 6.5.1. Linear regression 149 6.5.2. K-Nearest Neighbor and radial basis functions 150 6.5.3. Multi-layer neural networks 151 6.5.4. SPORE 153 6.6. Alternating Space Expansion and Contraction (ASEC) 154 6.7. Estimation accuracy of different learning machines 163 6.8. Discussion of the results 165 Chapter 7. Conclusions and Future Directions 167 7.1. Conclusions 167 7.2. Future Directions 168 Appendix: Board failure; initiation and propagation of a fracture 170 References 174 vii List of Tables Table 1: The average output, average MOE, and transformation coefficients for selected specimens. 31 Table 2: The number of output signals of measurement machines 108 Table 3: Specimen file names 109 Table 4: The measured VC dimension of a linear regression estimator 146 Table 5: The parallel feature sets 157 Table 6: The computed after each iteration of ASEC 162 Table 7: The accuracy of estimation by using different learning machines. 164 Table 8: The accuracy of the estimation for different feature sets. 165 vii i List of Figures Figure 1.1. Wood strength estimation problem paradigm; different stages of the learning problem.2 Figure 1.2. Different Specimen sets that were produced during the course of this research. 4 Figure 2.1. Three-point and four-point bending test. 15 Figure 2.2. The mechanism of a Continuous Lumber Tester 15 Figure 2.3. A simple diagram of the X-ray system and the shadow image 18 Figure 2.4. The profile of one cross-section of the board. 19 Figure 2.5. The shadow of an X-ray image. 20 Figure 2.6. The shadow image of a knot. 20 Figure 2.7. The contour map of an X-ray image. 22 Figure 2.8. The rotating head for grain angle measurement. 23 Figure 2.9. A measured grain angle profile of a board. 24 Figure 2.10. Microwave System for measuring wood grain angle 25 Figure 2.11. The arrangement of the probes (left) and the structure of a probe (right). 27 Figure 2.12. Typical output profiles of a dynamic bending machine. 28 Figure 2.13. The extracted and adjusted dynamic bending machine output profile for specimen #0760 29 Figure 2.14. Measured static MOE of specimen #0760 30 Figure 2.15. Linear relationship between the average measured dynamic bending machine output 32 Figure 2.16. Measured MOE profiles by and static bending methods. 34 Figure 2.17. The repeatability of the measured profiles for machine. 35 Figure 2.18. The repeatability of the measured profiles for machine (continued). 36 Figure 3.1. Strength estimation as a learning problem. 45 Figure 4.1. Steps of strength estimation method. 50 Figure 4.2. The histogram representation of density distribution of a board 53 Figure 4.3. Grain angle profile of a board and its distribution. 54 Figure 4.4. The MOE profile of a board and its distribution. 55 ix Figure 4.5. The local density profile of the board generated by a moving window. 57 Figure 4.6. The density profile of a knot as compared with the density of clear wood of a board. 58 Figure 4.7. Conic model of a knot. 58 Figure 4.8. The model of a board that consists of the clear wood and defects. 60 Figure 5.1. The X-ray image of a knot 63 Figure 5.2. Contour map of an X-ray image. 64 Figure 5.3. Probability distribution of clear wood density and knot wood density. 66 Figure 5.4. Density distribution of X-ray image pixels of a specimen 73 Figure 5.5. Sampling pixels of a knot from an X-ray image. 74 Figure 5.6. A search for neighboring knot pixels. 75 Figure 5.7. A knot pattern (from its image) 76 Figure 5.8. Two orthographic projections of a knot. 77 Figure 5.9. The pattern of a knot. 78 Figure 5.10. The approximate image of the knot. 79 Figure 5.11. The four points of the pattern 80 Figure 5.12. The flow chart for knot pattern analysis. 81 Figure 5.13. The output of the X-ray system with and without the sample board. 84 Figure 5.14. Output of the X-ray imaging machine. 85 Figure 5.15. The edge of all knots of a board detected by fixed Bayesian based thresholding. 86 Figure 5.16. The edge of all knots of a board ... 86 Figure 5.17. Knot detection by using Bayesian thresholding. Specimen 130. 87 Figure 5.18. Knot detection by using Bayesian thresholding. Specimen 70. 87 Figure 5.19. The top view X-ray image and discretized X-ray image of a knot. 88 Figure 5.20. The side view X-ray image and discretized X-ray image of a knot. 89 Figure 5.21. The diagram of a knot cone and its parameters. 92 Figure 5.22. Contour diagrams showing the simulated top X-ray image ... 94 Figure 5.23. Estimation accuracy for the knot diameters. 95 Figure 5.24. The simulated knot pattern that is partially stretching outside the board. 96 x Figure 5.25. The result of applying the knot detection algorithm to the X-ray image. 97 Figure 5.26. The simulated X-ray images. 98 Figure 5.27. The location and size of knots in CCD camera images. 99 Figure 5.28. Most of the knots show a grain variation in the grain angle profile. 101 Figure 5.29. Visual image, X-ray image, SOG profile, and MOE profile. 102 Figure 5.30. The measured signal and its Fourier-transform. 103 Figure 5.31. The measured signal and its Fourier-transform of a typical SOG profile. 104 Figure 5.32. The database structure. I l l Figure 5.33. The coordinate of a knot in a board 113 Figure 5.34. The general flow chart of geometrical feature extraction program 115 Figure 5.35. The flow chart of the knot extraction routine 116 Figure 5.36. The flow chart of knot extraction routine 2 117 Figure 5.37. The flow chart of the statistical feature extraction program 118 Figure 6.1. The learning problem 123 Figure 6.2. Distance of all other boards from a sample board. 131 Figure 6.3. The block diagram for estimating the VC-dimension 147 Figure 6.4. A multi-layer neural network. 152 Figure 6.5. The cascade form of SPORE-1. 153 Figure 6.6. Two iterations of ASEC 155 Figure 6.7. Estimation accuracy from the contracted space (top) and the empirical risk (bottom). 158 Figure 6.8. The curve of the upper bound of the risk functional for the given number of features. 159 Figure 6.9. Replacement of close features in the trajectory table. 161 Figure 6.10. The flow chart of one cycle of the ASEC 163 xi Nomenclature A a a i B CART CCD camera CLT™ cm c D d d i d(.,) E E(.) F feature P F(.) A-) g(-) h h(.) I 0 KAR Minimum of a bounded function Fracture surface Area of either half of a knot pattern Maximum of a bounded function Classification And Regression Tree. A learning method A Charge-Coupled-Devicec digital camera Continuous Lumber Tester, a machine for measuring the dynamic modulus of elasticity of aboard The cost of misclassifying an element of class'/' into class 'f Speed of light Deflection (m) of a board in a bending test Diameter of a knot at the center of the knot Diameter of a knot at either end of it Distance between two points Modulus of elasticity Also, energy of matter Expected value of a random variable A The feature space that represents X A property of a profile, e.g., the average of a profile The physically meaningful set of features. The cumulative distribution function An approximating function /th feature that is selected inyth iteration of subset selection in ASEC A function A measure of the degree of freedom of a data set Also, the learning capacity of a machine An element of the approximating function set, in a dictionary based learning machine Intensity of an electromagnetic wave Also, moment of inertia (m3) of an object Also, size of a subset that is selected in an iteration of ASEC The initial intensity of electromagnetic wave before attenuating in matter Knot Area Ratio, the worst ratio of a knot cross-section to the board cross-section The stress intensity factor for opening mode of failure xn K The critical stress intensity factor for opening mode of failure K The stress intensity factor for sliding mode of failure K The critical stress intensity factor for sliding mode of failure Km The stress intensity factor for tearing mode of failure L The span {rn) of the bending test L(.) A loss function, based on which risk functional is defined A Total number of specimens in a training set A Distance of the center of a knot pattern from either end M(.) The Fourier transform of the MOE profile MEL Machine Evaluated Lumber. A lumber-grading standard MOE Young's Modulus of Elasticity MOR Modulus Of Rupture. The bending stress level that causes board to break MSR Machine Stress Rated; A lumber-grading standard m Mass of matter Also, the mean of a probability distribution function m(.) The measured MOE profile m'(.) The measured MOE profile with zero mean m Mean of the probability distribution function of class'/' N Number of pixels in a class n Number of samples or specimens n. Number of samples or specimens in class'/' o The center of a knot pattern P The load (AO in deflection of a board. profile A measured property in the form of a one dimensional signal or a two dimensional image p. Any point on the knot pattern p A knot pixel in a generated cube p Projection of a knot pixel on the central line of a generated cube p(x) The probability distribution of V if it belongs to class 7 psi Pound per Square Inch Q(.,.) A loss function qt Probability of class V happening R The complete space of physical features R A region in the measured variable X R(.) The risk functional, the mimmization of which is the goal of training a learning machine R (.) Empirical risk functional, by minimizing which a learning machine is trained Rempi) Empirical risk functional of classifiers xiii RempC) Empirical risk functional of regression estimators RZqk) Empirical risk functional of regression estimators that are made up of a group of classifiers r Distance (in a two dimensional polar coordinate) of a material element around the tip of a fracture r2 Coefficient of determination (^(xy) = c o v ^ x ^ ) . An estimation accuracy i A measure a a ' x y r(.) Penalization function S The tensile strength of a board S The set of samples that belong in class; S A set of approximating functions with fixed complexity § Estimated tensile strength S(.) The Fourier transform of the SOG profile SOG Slope Of the Grain. Slope of the grain of wood SPORE A learning method for high dimensional feature space SPF Spruce, Pine, and Fir s standard deviation of a distribution s(.) The measured SOG profile T An iteration of subset selection in ASEC t Thickness of a specimen U Potential energy in a board due to stress U Surface energy of a fracture w The set of parameters of an approximating function w * Optimum set of parameters of an approximating function X The measurable space of physical features Also, the measured density (the value) of a pixel of an X-ray image $ The space of all the measured profiles X A single point in the feature space x The input to a learning machine Y The vector of outputs y The output of a learning machine y A vector z A sample point z The sample space, the set of all sample points Z2^ A set of 2A specimens <J>(.) Penalization functional, which defines the complexity of an approximating function xiv A The difference between two consecutive threshold levels A(.) The difference between practical and ideal risks T* A bootstrap sample from a training set A Population of all the image pixels The upper bound error for empirical risk functional n A real number between zero and one a The angle between two vectors Also, a set of parameters of a learning machine ax Optimum set of parameters resulting from a training set of size A a0 Optimum set of parameters resulting from minimizing the risk functional H The location parameter in an approximating function Also, mass absorption coefficient (1/crri) a The scale factor in an approximating function Also, the stress level in a board Also, the square root of variance of a distribution a The critical stress level c X The shape factor in an approximating function Also, the regularization parameter in an empirical risk functional ys Elastic surface energy per unit thickness 9 Angle (in a two dimensional polar coordinate) of an element around the tip of a fracture p Mass density iglcrrfi) e Total permittivity of matter Also error in strength estimation e(A) Error in empirical risk functional due to the size of the learning set d The permittivity of matter £" The loss factor in permittivity of matter n Class (clear wood or knot wood) of an image pixel (oj The output of a classifier £ (Z2 )^ Maximum variation in empirical risks of A samples, in a set of size 2 A vx Empirical risk of a set of size A, in VC dimension detection process xv Acknowledgements I would like to acknowledge and express my appreciation for the support, patience, and continuous encouragement o f my supervisor Peter Lawrence, which has contributed a great deal to the completion o f this thesis. I would also like to acknowledge the help and cooperation of Frank Lam (Co-supervisor), David Barrett, Gary Schajer, Greg Grudic, Jacek Biernacki, Carl Flatman, Wilson Lau, Glen Trarup, and Bob Myronuk, A l l the required measurements were done in Wood Products Laboratory (Wood Science Department o f U B C ) as well as our industrial collaborators. The cooperation and help from the laboratory coordinators and researchers is also acknowledged. I also acknowledge my friendly roommates (in CICSR) , M r . Philip K . L i u and M r . Mohamed K . Cherif. The relaxed atmosphere o f my workplace has definitely had a positive effect on my work. xvi To my parents for their unlimited love and support Chapter 1. Introduction 1.1. Objective and limitations The goal o f this research is to find a suitable method for estimating a board's tensile strength by measuring the important-known material properties o f wood. A s wil l be shown, this problem inherently leads to a series of studies of wood structure, wood properties measurement, feature definition, learning systems, and estimation accuracy analysis. This thesis is an attempt to tackle this problem on all levels to produce a coherent analysis o f the problem. From a practical point o f view, the problem of strength estimation can be divided into the following problems: collecting a set of specimens, selecting measurement means, defining and extracting features from the measured signals, and estimating the strength by using the features of a specimen. Strength estimation can be cast as a learning problem. To do so, one needs different parallel measurement profiles of different wood properties and must define a variety of different features, in order to reduce the chance o f having any un-modeled factors. Therefore, a high dimensional feature space is needed for strength estimation. In the end, the problem of wood strength estimation is transformed into a learning problem in a high dimensional (more than 20 dimensions) feature space. The practical restrictions can be considered as conditions of the learning problem. There are potentially three major restrictions to the application o f the resulting strength-estimating model. The first restriction is, that because of measurement difficulty, the number of specimens for the database wil l be limited, which could make the database unsuitable for some learning systems. The second restriction is that properties of each species can vary depending on the region o f harvest, which means that any single model cannot be claimed as a general model of wood strength. Finally, the production lines may have different measurement means that have demonstrated reliability over time. 1 Therefore, many asymptotic learning methods may not be suitable for strength estimation and various learning methods should be explored in order to get the best result. Also, this research does not claim generality for the produced model to many species and wood harvesting regions. However, since the database of this thesis consists o f softwood lumber produced here in British Columbia (from here on B . C . ) the strength estimating model is suitable for B . C . 1.2. T h e p a r a d i g m The problem of wood strength estimation can be summarized as shown in Figure 1.1. The problem can be divided into four major tasks; specimen selection, measurement, feature extraction, and model development. Each part o f the problem is briefly explained in the following. specimen profiles measurement signal conditioning and feature extraction features learning estimated strength a priori knowledge Figure 1.1. Wood strength estimation problem paradigm; different stages o f the learning problem. 1.3. S p e c i m e n s A model is only representative o f the database used to formulate it. For this study, specimens were selected at random from normal production in a regular mill. In particular, four bundles o f regular production from two different mills (two bundles from each) were shipped to the laboratory for measurement and destructive testing. (The boards with obvious fractures were removed.) Each board was assigned a number that was written on the cross-section o f one o f its ends. This number was used to code all the data files that would be related to the same board and was kept constant throughout the project. Also, the same number was used to reference the specimen in the final feature set. During the course o f this research, two independent data sets were produced for the pilot study and the evaluation study, respectively. The pilot study data set consisted of 800 boards and was further divided into two separate sets: set one was used to develop a set o f features and the boards in this set were destructively tested for tensile strength; set two was used as a repetition of the first set as well as for comparing the measurement machines in the laboratory with the measurement machines used in the industry. A description of all the measurement machines and their limitations wil l be presented later in the related sections. In the evaluation study, presented in Sections 6.5 to 6.8, 1100 boards were selected from local mills. Similar to the boards in the pilot study, the boards in the evaluation study were a random sample o f the produced boards. A l l the measurements were done using industrial machines and the tensile strength of the specimens were measured at the University of British Columbia (from here on, U B C ) . The following diagram shows the specimen sets and the related measurement machines. 3 SOG Cook-Bolinder Vision-Smart X-ray UTS Pilot Study Preliminary Set (400 boards) SOG + Cook-Bolinder + Vision-Smart X-ray UTS Microwave CTL™ + Advantage2 Pilot Study Measurement Comparison Set (400 boards) Microwave Dynamic bending machine Advantage2 UTS Evaluation Study (1100 boards) Figure 1.2. Different Specimen sets that were produced during the course of this research. The S O G and Microwave machines measure the local grain angle of wood. Cook-Bolinder and CLT™machines measure the flatwise M O E of the board. Vision-Smart X-ray and Advantage2 measure the local density. U T S is for ultimate tensile strength measurement. 1.4. An overview of the experimental work The measurement means wil l be discussed in detail in Chapter 2. Briefly, they consisted o f the following systems. For X-ray, a Vision Smart X-ray machine (at U B C Wood Products Laboratory) and a Newnes Advantage2 were used. The X-ray machines ([1]) produced two-dimensional 4 density images o f each board. Two images o f the same board, one from the top and one from the side, were produced. The Vision Smart X-ray machine was used for the pilot study and was set in the Wood Products Laboratory o f U B C . The Newnes Advantage2 X -ray machine was used for the evaluation study. The data produced from the two machines were similar, but the Newnes Advantage2 database was used in the evaluation study because o f the higher resolution and because its conveyor had a more consistent speed. For S O G , the capacitance based heads, at U B C Wood Products Laboratory, and the microwave machine were used. These machines measured the local grain angle of wood along the board. The grain angle measurements were very similar between the two machines therefore the microwave measurements were used for strength estimation in evaluation study. For M O E , Cook-Bolinder M O E tester (at U B C Wood Products Laboratory) and Metriguard CLT™([4]) machine (at our industrial collaborator), also called dynamic bending machine here, were used. Cook-Bolinder M O E tester measured the flatwise Modulus o f Elasticity ( M O E ) profile of a board in a three-point test (see Figure 2.1) and measured the M O E profile o f one side o f the board in one scan. A Metriguard CLT™ machine is commonly used for wood grading and produces the two M O E profiles by one pass o f the specimen through the machine (see Section 2.1.1). A s wil l be shown in Section 2.4.1, the CLT™ machine is repeatable and its output is highly correlated to static M O E (see Section 2.1.1). Therefore, in the evaluation data set, only the CLT™ profile was measured and used for strength estimation. The ultimate strength tester (at U B C Wood Products Laboratory) was used to get the tensile strength o f each board in both the pilot study and evaluation study. In the U B C Wood Products Laboratory, all o f the different types of measurements were carried out independently. Once the measurement of one profile o f all the boards was finished, the laboratory was prepared for the next measurement. For each measurement, as each board was fed into the measurement machine its number was registered in a table. The measured profile of each board was saved in a different data file. Once an experiment (e.g. an X-ray measurement) was over, all the data files were renamed so that the file name represented the name o f the measurement (e.g. X-ray) 5 and the board number (e.g. 0114). The calibration files (if any) were kept with the data files in a directory to be used in a feature extraction program. 1.5. A n overv i ew o f the analys is A s was stated above, the selected specimens limited the application of the model. Since the sample boards were from B C , the strength estimation model is suitable only for softwood production in local industry. The limitations of the model not withstanding, the best choice of measured properties for model development is that they produce the minimum amount of uncertainty for board characterization. Measurement selection and feature extraction can be viewed as attempts at specimen characterization, such that boards with different features are mapped to different points in the feature space. Since the strength of a board is a random variable in this space, good feature definition minimizes the variance o f this random variable at a given point. Each specimen is characterized by a set of features, which are the statistical or the geometrical transformations of the measured profiles. The total of 64 features wil l be used to characterize each specimen of the database o f this thesis. Each one-dimensional profile wi l l be transformed into eight statistical features and the combination of the two X-ray images wil l be transformed into 16 features. There wil l be three M O E profiles. O f these three, two profiles are the measured profiles and the third profile is the average profile o f the two measured profiles (at every point along the board). Therefore, there wil l be 24 features from the measured M O E profiles. The measured grain angle profiles wil l be grain angle, grain alignment, longitudinal signal amplitude, transverse signal amplitude, longitudinal signal phase, and transverse signal phase. However, the three signals; grain angle, grain alignment, and longitudinal signal amplitude were used in the strength estimation. Therefore, there wil l also be 24 features from the measured grain angle profiles. Each of these profiles wil l be transformed into: minimum, maximum, average, standard deviation; variance, kurtosis, skewness; and the standard deviation o f the absolute value. The X-ray images o f the board wil l be transformed into the board three-dimensional model. The clear wood features: average, standard deviation, and variance of clear wood 6 density (as a random variable); the board volume; and the board density (sum of clear wood and non-clear wood density). Also, the knots wil l be characterized by: the number of knots; the ratio of knot volume to board volume; average and standard deviation of neighboring knot distances; average and standard deviation o f knot three-dimensional shape form the main axis of the board; average and standard deviation o f the standard-deviation o f all the knot pixels from the main axis of the board; and average and maximum of knot area ratio. The learning model is a regression-like estimator that relates measured features to the strength of the board. The selected features are given in Section 5.11. At this level, the concept o f learning capacity (or V C dimension) of the learning system is important (see Section 6.3). Briefly, the limited number o f measured specimens limits the accuracy of the learning system. In other words, there should be a search among different methods of learning to find the best model for a limited-size training set. Statistical learning theory was used to guide this search. A group of learning systems operating on different concepts was tested to find which method works best. A learning method was suggested whose structure is compatible with the constraints o f this learning problem. Finally, the performance of different learning models was compared. 1.6. R e v i e w o f o ther at tempts to t r a n s f o r m m e a s u r e d m e c h a n i c a l proper t i e s into s trength Since wood is a traditional structural material, there are many heuristic and scientific methods developed for estimating its strength. Among these, the weight o f a piece o f wood, which was used to represent the density o f wood, and the detection o f the presence of knots are the oldest. Scientific methods were later developed for finding the relationship between wood strength and one or more measured properties of wood. Since an estimation method can almost be identically used for tensile or the bending strength of wood, we wil l not distinguish between those. In the following discussion, different approaches to wood strength estimation are summarized. 7 The modulus of elasticity of wood is widely considered as the best single feature for estimating the strength of wood ([5],[6],[7]). In this method, the M O E (modulus of elasticity) profile o f a board is measured and a regression estimator maps the average of M O E profile to the strength o f wood. This method is the basis of an industry standard for machine stress grading. Special measurement machines were developed to measure this profile while the board passes through the machine at high speed. The coefficient of determination (r^xy) = c o v ^ x ^ ) between strength and the measured feature is r 2 « 0.5. i y The existing technology allows only the flatwise M O E profile of a board to be measured, that is, the M O E profile is measured by placing the bending force on the wider face (plank) of the board. Therefore it leaves out the applications where the bending force is applied to the narrower face (edge) of the board. The grain angle of wood has also been used for estimating strength ([8],[9]). In a similar approach, a regression estimator transforms the average grain angle to the strength o f wood. For this purpose, the capacitance based grain angle measurement and microwave based grain angle measurement tools were developed. The capacitance based grain angle measurement machine is most affected by the grain angle at the surface o f the board. The microwave measurement method can produce more than one measurement. It can estimate the average grain angle, and also the moisture content and the local density of wood. The features are usually the statistical moments of the measured profile and a linear regression estimator can be used to map the features to the strength. A finite element model of wood has been developed for estimating the strength of wood by assuming that the grain angle of wood is known at every point ([10]). Other parameters o f wood such as local density, M O E , etc., are typically assumed to be constant. This model is reported to produce accurate results for single knot and double knot pieces o f wood. The highest reported accuracy is r1 = 0.89 ([10]). Because o f the high computation time, this method is useful for laboratory study but is not suitable for industrial speeds. The effect of knots on the strength of wood was studied in ([11],[12],[13]). The use of knots in strength estimation is usually the limiting factor, and is applied in the visual grading of wood. There are category and boundary rules for estimating the impact o f a 8 knot on the strength. Therefore, a board is downgraded if, based on the rules, the knot or knots are considered to be significant strength reducing factors. Although this is obviously a very qualitative assessment o f their effect on strength, this method is an industry standard. This fact shows the high impact o f knots on the quality o f the graded wood. The geometrical features, as wi l l be discussed later, are especially developed to include this factor for characterizing a board. The local density of wood has also been suggested for strength estimation ([14],[15]). For this purpose X-ray machines (called X-ray lumber graders) were developed. It was proposed (arguably) as a non-contact measurement means that can replace the M O E measurement method ([15],[16]). In this case, the statistical features o f an X-ray image of a board are transformed to its strength by using a linear regression model. Visual systems ( C C D cameras, infra-red spectroscopy, and laser scanning) were also used for defect detection and grading of wood ([17],[18]). These methods are based on finding the difference between the color o f defects and that o f clear wood. Also, three-dimensional profiling (using laser scanning) is studied for estimating the defects and shape o f the board. The detected defects are related to the surface o f the board and a regression estimator transforms their features to strength reducing factors of the board. The grade o f wood was estimated in ([19]) by using the slope of the grain, infrared spectroscopy, M O E , and visual scanning. A classification and regression tree ([20]) method was used to learn the best grading thresholds. The goal was to find better rules for strength classes. In the Classification And Regression Tree ( C A R T ) method the feature space is repeatedly divided into smaller sections, as long as the total estimation error is reduced. In learning systems theory, the S P O R E algorithm ([21]) was developed specifically for very high dimensional problems. This method o f learning was originally developed for human-to-robot skill transfer. This method is suitable for very large dimension feature space (about 100 dimensional) and large numbers of specimens (about 10,000) and uses almost no a priori information about the data. A s wil l be shown in Chapter 6, this method is one o f the most successful methods for predicting the strength of wood. In this method, one generates a polynomial in the feature space ([21]) by producing two variable estimator blocks and adding the blocks to the model such that the estimation error is 9 minimized. The restriction of the wood strength estimation problem is that the practical number o f measured specimens is limited to around 1000. This is because the measurement process is costly and time consuming. The S P O R E algorithm has been tested for typical sample sizes of 10000. Therefore, this model may not be stable for a typical sample size used for wood strength estimation problem. 1.7. E c o n o m i c i m p o r t a n c e o f this knowledge The economic importance of this study arises from the limitations of the wood-grading standard. With the M S R lumber grading standard, three criteria must be met for every bundle that is sold. A flat M O E value (E) and an edge M O R (the bending stress that breaks the board) value (F) are provided for a grade. The following three criteria should be met for the assigned grade ([22]): a) The average flat M O E of a grade should be more than E; b) The fifth percentile o f M O E o f the grade should be more than 0.82 x E; c) The fifth percentile of M O R of the grade should be more than 2.1 xF. In practice, a stress-grading machine measures the M O E profile of each board. The average and minimum of this profile is measured and transformed into its grade. Recently, there have been attempts to improve the market value o f the grading system. In a recent study ([23]) it was shown that in M E L grading standards the limiting factor is the ultimate tensile strength of the board. Therefore, an increased estimation accuracy of the tensile strength increases the grade yield by more than 5%. It was shown (by simulation) that an increase in M O E prediction from 0.7 to 0.9 wil l produce more than 17,5 million U$/year in British Columbia ([23]). 1.8. W h y G e o m e t r i c a l features a r e i m p o r t a n t A s wil l be shown in future chapters, the problem o f learning the tensile strength of lumber is an empirical learning problem. In problems with small sample numbers, the a priori knowledge about the input-output dependence directly improves the estimation. 10 The geometrical feature set is a first step in including the classical study of wood failure under stress into this learning problem. Naturally, the features that are introduced are crude and only general information is used as a priori knowledge. It wil l be shown in Chapter 6 that even this set o f features produces relatively good estimation accuracy. The potential advantage of a geometrical feature set approach is that it is based on the three-dimensional model of a board, which wil l be produced by using scanning profiles. Since the model o f knots, the most common defect, is produced here, more refinement of features is possible by more carefully studying the effect of knots on strength. For example, through simulation enough samples can be generated so that a learning machine can find the effect o f common types o f defects on strength. 1.9. O u t l i n e o f this thesis This thesis is presented in seven chapters. Chapter 2 describes the measurement systems used in this project. A look into the wood property that is to be measured is presented. Also, measurement restrictions that are due to the limitations of the measurement machines are presented. Chapter 3 presents the effect of common strength factors and how they are modeled. Chapter 4 presents the chosen approach for feature definition and board characterization. Extraction of features from measured profiles is presented in Chapter 5. Details o f knot detection algorithm that is the basis o f the geometrical feature set is presented in this chapter. The problem of learning systems is discussed in Chapter 6. The learning problem and different learning machine structures are presented. Also, the A S E C learning machine is described in this chapter. The Appendix presents an examination of wood failure. This study motivated the focus of this thesis and provides a basis for comparison between classical study on this subject and the approach of this thesis. A l l the experimental results are given in the related chapters. 11 1.10. T h e C o n t r i b u t i o n s o f this thesis The main contributions of this thesis are as follows: 1. For analyzing an X-ray image, image segmentation (using Bayesian classification) is developed. In this method, the statistical properties of the image is used for segmentation. 2. A geometrical model of knots is developed and the detection algorithm and transformations are presented. 3. For characterization of a board, a geometrical feature set is developed. 4. In order to estimate the accuracy of a learning system, a method for measuring the learning capacity (VC-dimension) of a regression estimator is presented. 5. The A S E C learning method (see Section 6.6) is presented. In this method, the a priori knowledge about the input-output relationship is used to transform the given feature set into a better set of features. This method combines feature transformation and feature selection. 12 Chapter 2. Measurement means; practical constraints In this chapter the measurement systems used in the experiments are introduced and their limitations are discussed. For every wood property there are different technologies of measurement. Each technology and method of measurement is limited by its inherent restrictions that arise from the system's setup or its sensor requirements and limitations. As shown in Section 1.2, the measurement process is the first step in characterizing a specimen. The limitations of the measurement process directly affect the accuracy of the strength estimation because it is the basis o f specimen characterization and there is no other method for later compensation o f the generated error. Measurement restrictions can be examined from three different perspectives. The first limitation is that the measured property is affected by other properties o f wood, that is, one property of wood cannot be singled out. For example, the measured density of wood, i f measured by X-ray methods, increases with the increase of moisture content. The second limitation in the measurement process is influenced by the measurement method and the spatial resolution of the measured profile. Obviously, a property of wood cannot be measured at a single point and is usually the average of that property over an area or volume of wood, either at the surface of the board or through the depth o f the board. Any improvement of the measurement technique can result in a more sensitive machine and, therefore, improve the spatial resolution of the measured profile. The third limitation in the measurement process is dependent on the quality of the measurement machine, the sampling rate, and the noise level o f the output profile. In order to remove the measurement noise, almost all systems use averaging in a short period. For a board-moving system with a set o f fixed sensors, temporal averaging leads to spatial averaging. Higher-speed sensors allow the system to generate the output faster. In the following section the physical basis of the measurement machines is discussed. 13 2.1. M e a s u r e m e n t o f the P h y s i c a l P r o p e r t i e s o f W o o d The physical properties of wood relevant to the measurement systems o f this project are discussed in the following. 2.1.1. Modulus of elasticity of wood The modulus o f elasticity o f a specimen is measured by using bending machines. The board is bent and the force needed to bend the board is measured. Young's modulus or modulus o f elasticity ( M O E ) , is widely recognized by the wood products industry as the best variable estimator o f the bending strength of boards ([5], [6], [7]). It is defined as the ratio of stress over strain. In the three point bending test, as shown in Figure 2.1a, force is applied at the midpoint o f the board. M O E is calculated as follows. MOE = PD 148/D (1) Where P is the load (N), D is the deflection (m), L is the span of the measurement (m), and / is the moment of inertia (m 3 ) . This test is sensitive to the location o f defects in the wood specimen. Therefore, the operator tries to find the weakest point on the board and places it right at the midpoint (and on the tensile stress side), where the force is being applied. To reduce the sensitivity of this, a four-point test is often used (see Figure 2.1b). In this test, the middle part of the board is in constant bending moment, therefore, the whole span between the two forcing points is uniformly tested. 14 F F F — d TK—j ; d Bending force 1 1 Tensile force at the lower half part (a) (b) Figure 2.1 Three-point and four-point bending test. What is shown in Figure 2.1 is the basis of static M O E measurement systems where the board is placed on holders and bending force is applied. In a mill the boards are passed sequentially through a machine that continuously measures the M O E . The CLT™ (Continuous Lumber Tester) is one of these machines. A s shown in Figure 2.2 the dynamic bending machine has four rollers that are slightly displaced from a straight line. The first and last rollers are placed to grip the board and push the board through the system. The middle rollers are slightly displaced (Figure 2.2) so that they bend the board as it passes through. The rollers' locations are fixed, therefore, the bending of the passing board is fixed. The force applied by the board to the bending rollers is measured and sent to the output o f the system. Figure 2.2 The mechanism o f a Continuous Lumber Tester (CLT™, or dynamic bending machine) machine. 2.1.2. Moisture content The conductivity of wood is proportional to its moisture content. Therefore, by placing two conducting nails inside the board and measuring the resistance between them the moisture content of the board can be measured. Although this factor was not used as a feature of specimens, it was regularly checked to make sure that the boards were dry ( moisture content less than 12%). 15 2.1.3. Dielectric Properties Measuring the grain angle o f wood is usually based on its directional electric property. The permittivity of wood depends on the angle between an applied electric field and the longitudinal direction of the tracheid of wood. Also, the permittivity depends on the density, moisture content, and temperature of wood as well as the frequency o f the electric field used to measure it. Although these factors are briefly discussed here, they are not of significant importance because the grain angle measurement is based on the difference of permittivity along the grain and across the grain. The directional and variable permittivity o f wood is based on the polarization of its material and internal water and vapor. There are five polarization types that contribute to the overall polarization of wood. These are electronic polarization, ionic polarization, dipole polarization, interfacial polarization, and electrolytic polarization ([31]). Electronic polarization is due to the polarization o f the atoms o f the dielectric. Ionic polarization is a result o f elastic displacement o f atoms in a molecule. Dipole polarization is due to alignment of the existing dipole molecules in the direction of the electric field. Interfacial polarization is due to the interaction o f water and the cell wall that makes a cavity similar to a dipole. Electrolytic polarization is caused by the electrolysis of the soluble components due to the electric field. The first two polarizations are the major ones while the last three create the loss factor, making the permittivity a complex number as follows: e =s' - ie" (2) The permittivity is a directional factor with three major directions, longitudinal, radial, and tangential. The longitudinal permittivity is the biggest factor while the radial and tangential permittivity factors are approximately the same. A l l three factors are increased by wood density and moisture content. The permittivity of wood decreases as the frequency o f the applied electric field increases. A n increase in temperature may also cause an increase or decrease in permittivity, depending on the frequency o f the applied electric field ([31]). The loss factor (£") is usually much smaller than the real part of the permittivity itself, and therefore can be overlooked in the qualitative analysis of the measurement systems. 16 2.1.4. Density of wood It was shown that an X-ray source could be used for wood densitometry ([24]). The X-ray can be generated by an X-ray tube that projects beta rays onto a metal plate. The X-ray absorption phenomenon is a combination o f three absorption factors: the photoelectric absorption, the scattering absorption, and pair production absorption. The electrons of the matter rarely absorb the X-ray photons. This phenomenon is called true absorption or photoelectric absorption. Also, some of the photons are deflected by the atoms of the mass, which results in the scattering factor. In pair production absorption the ray transforms to matter (electron and positron) by the Einstein equation. The attenuation o f the ray as it passes through the board is as follows. where, ju, is the mass absorption coefficient (1/cm), and t is the specimen thickness. This equation can be written as follows. where/? is density (g/cm3) and p/p is the mass-absorption coefficient in square centimeters (area) per gram ([25]). A method of calculating the mass absorption coefficient for different materials is published in [26] and tables of it are published ([25]). A s the attenuated ray comes out of the wood, it is transformed to a measurable signal by an array of sensors. Each sensor is based on ionization of its atoms (gas or crystal) in an applied electric field ([27]), generating photons in the process. Every photon absorbed by the detector generates a pulse o f electric current. The count o f the pulses in a time interval is proportional to the intensity of the X-ray. 2.2. M e a s u r i n g densi ty w i t h a n X - r a y m a c h i n e X-ray scanning is used to measure the local density of wood. A s the X-ray passes through the board its intensity is attenuated. A sensor array measures the ray intensity. I = I0exp(-ttt) (3) I = I0exp(-(ju/p)pt) (4) 17 The board density image is mainly used for knot detection, which leads to the board's structural model. Nevertheless, the system and method o f this densitometry are commonly used and are explained in the following. The system consists of an X-ray source tube, which projects an X-ray fan on the object, and a linear array o f sensors that detects the intensity o f the rays after passing through the board. A metal frame reduces the vibration caused by any moving components. Lead curtains create a shield for the operator from the X-rays. A few rollers lead the board (pushed by the conveyor) through the system. As previously discussed, the attenuated ray coming out of the wood is transformed into a measurable signal by a linear array o f sensors ([27]). < X-ray tube Board cross section Shadow <•—Sensor array Figure 2.3 A simple diagram of the X-ray system and the shadow image results from a non-collimated X-ray. The board is extended and moves perpendicular to this page. The measured variable is the ray density, which is calibrated to wood density. For that purpose a set o f calibration aluminum plates with knot density and different thickness were used. They were placed in the machine and a polynomial was fitted to the output signal so that it transformed the measured intensity to wood density. A n array o f sensors (in a single line) can capture the density profile across the board at the measurement point. Each measurement generates a density profile related to one cross-section o f the board. A conveyor moves the board along, while repeated scanning takes place. In the end, an image o f wood density results for the whole board. The X-ray densitometry method and the structure of the measurement system impose some constraints on the produced image that are discussed here. The X-ray generated from the 18 tube is not collimated; therefore, it creates a shadow of the board corner at the sides o f the density profile. If the angle o f the ray (6) is relatively small (for the vertical distance between the source and the sensor array) there wil l be little horizontal deviation (s) in the projection o f the measured signal and thus no collimation is needed, which was the case for the X-ray system used in this thesis. A s wil l be shown, the angle of the ray can be measured from the X-ray image. The shadow o f the board (s) (as is shown in Figure 2.4) is less than three pixels in the machine used, which is considered negligible in the measured profiles and is ignored in the feature extraction procedure. X-ray profile of the cross section of the board at one point o. o £ e CO >H 0) >> (0 >< 500 — o l I I I , I , U I I I 0 20 40 60 80 100 120 140 sensor number (1 to 128) Figure 2.4 The profile o f one cross-section o f the board. The shadow of the board creates the side ramps. In Figure 2.4 the shadow of the board is 2 pixels at each side and the total number of pixels for the board is 35. Using the standard board size (89mm by 35 mm), one can calculate the maximum ray angle (G) with the center-line as follows (Figure 2.5): 19 e\ r(35mm) A < — ' V 89 mm \ . ,„ . , . (2 pixels) w (35 pixels) Figure 2.5 The shadow of an X-ray image. , s „ 5 89 mm , 2 ^ 89 mm v „ „ „ 0 = a t a n ( - ) = a t a n ( - * — ) = a t a n ( ^ * ^ ) = 8.1° (5) There is a more important shadow image that is related to the image o f knots of a board. Figure 2.6 shows an imaginary knot projected on the image plane similar to the result o f X-ray scanning of the board. A similar pattern is generated from the X-ray image o f a knot and wil l be used to detect knots in a board. The density of the image at every point is obviously the summation o f the wood density throughout the thickness of the board at that point. I f two perpendicular images are produced, a sense of density distribution in the body o f the board can be obtained. Figure 2.6 The shadow image of a knot. The sensor array makes a fixed resolution profile across the board. The image resolution along the board depends on the speed o f the board. I f the conveyor can push the board 20 through the system with a constant speed this resolution wil l be constant too. Through the course of this thesis, two X-ray systems were used for X-ray scanning. One o f them (made by Vision Smart) did not have any conveyor attached to it. Therefore a conveyor was placed next to the machine that pushed most of the board through the X-ray scanner and the end of the board (about 90 cm o f the length of the board) would free-run through the system because of its momentum. The deceleration of the board was compensated for as a preprocessing step when the image was being processed. The image was segmented into two parts, the first part related to the part o f the board that was pushed through the X-ray system with constant speed. The second part o f the image was related to the part of the board that was decelerating. A constant speed (that is the average speed during deceleration) was assigned to the second part of the image, as the momentary speed could not be estimated. The strength estimations by using each o f the outputs of the two machines resulted in similar strength estimation accuracy therefore here only a typical system is described. Another consideration regarding the X-ray image is that there should be a zero density region that surrounding the board density image. The content of the surrounding area is noise whose level is much less than the density o f clear wood. A threshold level can separate these parts and extract the density image of the board. Figure 2.7 shows the contour map of the top (for the plank of the board) and side images of the board. The lower right part of the image shows increased density that can be due to natural density variation o f clear wood or related to a defect, such as a resin canal. Vibration of the board is not considered in this case because slight vertical displacement of the board does not affect the measured image. This is because the source is far from the board surface and the slight change in the vertical location o f the board does not create significant change in the sensor readings. The noise level o f the system in the surrounding area is negligible, as can be seen from Figure 2.4. However, filtering by a square filter o f size three was advised (by the manufacturer) for smoothing the wood density. The system stores the measured ray intensity in 16 bits of data, which covers the spectrum of black to white. 21 c o n t o u r m a p of X - r a y s c a n from p l a n k o f t h e b o a r d « 0 75 S 7 0 a | 65 3 • SO JS * 55 50 • N^l: t,ML]» 4 0 0 600 600 1000 1 2 0 0 X - r a y i m a g e p i x e l n u m b e r (in th e o rig in a I p ic tu re ) c o n t o u r m a p of X - r a y s c a n from side o f t h e b o a r d _ 115 m e 0 400 600 BOO 1 0 0 0 1 2 0 0 X - r a y im a g e p i x e l n u m b e r (in the o r i g i n a i p i c t u r e ) Figure 2.7 The contour map o f an X-ray image. The final digital image is displayed by using two diagrams, the contour map and the three dimensional diagram o f the density. The contour map (Figure 2.7) is used for showing the general form of the image and the relative location of the defects in a board. The three dimensional density-diagram (similar to Figure 4.6) is used for showing the density variation due to the existence of a defect. 2.3. M e a s u r i n g G r a i n A n g l e w i t h S O G a n d M i c r o w a v e M a c h i n e s The principle o f grain angle measurement was discussed above. In this section the details of the two grain-angle measurement machines are presented. 2.3.1. Grain angle via an SOG machine The capacitance based measurement system that was used in this research was based on measuring the impedance o f a capacitor whose dielectric was the board at the measurement point and its operation is as follows (based on directional permittivity of wood). If two parallel strips of metal are placed on the top surface o f a board and the 22 impedance o f the created capacitor is measured, the impedance depends on the angle of the electric field o f the capacitor. I f the capacitor (i.e. the two metal plates) is rotated around its center the measured capacitance varies. The maximum capacitance is measured when the electric field is along the direction o f the grain. In a rotating head capacitor a cylindrical head is rotated around its axis. The bottom of the cylinder, where it contacts the board, four pieces of metal and a cross-shaped dielectric compose the circle as is shown in Figure 2.8. This arrangement creates two pairs o f parallel capacitors that are perpendicular to each other (as shown by the dotted lines in Figure 2.8). Therefore i f the electric field of two o f the parallel capacitors is along the grain o f wood then the electric field of the other two capacitors is perpendicular to the grain of wood. The angle o f the head where one of the two impedances is maximized and the other one is minimized is the angle of the grain o f wood. The impedance of each o f the capacitors is most affected by the grain of wood at the surface o f the board. The wood that is beneath the surface is affected less because the magnitude of electric field is less inside the board. Also, the measurement is taken over the area of the head. The diameter of the measurement head was about 5cm. The underlying assumption for accurate measurement is that the grain angle, and other factors that affect the permittivity, are about the same over the area of the measurement head. The measurement head rotates at 60 cycles per second and every half cycle one measurement can be completed. Therefore the sampling rate is limited to 120 samples per second. The grain-angle sampling rate is then controlled by the speed o f the conveyor. The sampling rate o f S O G measurement was about 1500 samples per board, which is about one sample per 3.3mm. The system's resolution is 0.1 degree and its accuracy is (±0.9) degree and the system covers -89 to 89 degrees ([2],[3]). Figure 2.8 The rotating head for grain angle measurement. 23 Figure 2.9 shows a profile that is measured by using this system. The large variations in Figure 2.9 are related to grain angle defects, which are usually related to knots. Small variations are related to the measurement noise. The two constant measurements at the beginning and the end of the profile is produced by the system when the board is not attached to the measurement head. The large variations, at the beginning and the end o f the board's grain angle profile, are produced when the board first reaches the measurement head or leaves it. A measured grain angle profile 1 i 1 1 1 J i MT* | 1*1-J [ J | ] ! i 1 800 1000 1200 1400 1600 1800 2000 sample number Figure 2.9 A measured grain angle profile of a board. The grain angle measuring system consisted of a metal frame and plastic rollers to push the board along the system and before the measurement heads. A stepper motor controls the movement and speed of the board. Two measurement heads were placed on the system so that two profiles could be measured simultaneously, one from the face of the board and one from the edge of the board. The diameters of the two heads were different. One head diameter was almost equal the thickness across the face of the board and the other head diameter was almost equal the thickness across the edge of the board. Four profiles were measured by using this arrangement, two profiles from the two faces and two profiles from the two edges. B y rearranging the system four extra profiles were 24 measured from the two faces of the board, each grain angle profile related to half o f a face. 2.3.2. Grain angle measurement via microwaves The microwave grain angle measurement machine is based on the directional attenuation of an electric field in wood. A simple diagram of such a machine is shown in Figure 2.10 ([28],[29]). In this machine a linearly polarized electromagnetic wave at microwave frequency («10 G H z ) is transmitted through the thickness o f the specimen. The directional conductivity and dielectric property of wood creates a non-symmetric attenuation and delay o f the wave. Therefore, the output wave is an elliptically polarized wave whose major axis is along the grain. The Microwave based grain angle measurement is based on identifying this ellipse. The standard deviation o f the measured grain angle is 2 degrees. M M Linearly polarized transmitting antenna Probe Circularly polarized receiving antenna Figure 2.10 Microwave System for measuring wood grain angle Let's say the electric field is represented by a sine wave as follows: E = E sin(tf*) (6) where the amplitude (E0) and the angular frequency (aj) are constant. Wood as an orthotropic material shows different loss factor and phase factor along and across its 25 grain. This would transform the linearly polarized electromagnetic wave to an elliptically polarized wave as shown in the following ([28],[29]). The incident wave components on longitudinal and transversal axes are as follows: E^E^cosiQ) (7) £ f f = £^s in (e ) (8) where 0 is the angle o f the incident field with the longitudinal axis. A s the transmitted wave propagates through wood, it is attenuated and delayed by the longitudinal and transverse propagation constants. rP = OLp+J% (9) rT = aT+j0r (10) Moisture content is the major factor of the wave attenuation and the wood material is the major factor of the wave delay. If the thickness of the specimen is d, the output wave is as follows: E T O = E I p e V = E 0 cos(9) e V * ^ (11) E ^ = E ^ e V = E 0 sin(0) e-<vwT« (12) which is an elliptically polarized wave. The assumption for the validity of equations (11) and (12) is that the two components of the electric field are uncoupled. Furthermore, based on a commonly used approximation ([30]), all internal multiple-reflections, inside the board, are neglected. The ratio o f En to Ew was used as a feature o f the grain angle that defines the alignment of the grain angle through the thickness of the board. In order to identify the ellipse, one needs to measure the amplitude of the electric field at least in three different angles. Therefore four probes, with a 45 degree angle between each other, were located underneath the specimen ([30]). Each probe is used to measure the amplitude o f the electric field along the direction of the probe. A probe adds a 455 K H z wave to the microwave. At the other end of the machine, the modulated signal is received 26 and the power (rms value) of the received signal is measured. The measured power represents the power o f the transmitted signal along the direction of the probe. to the signal source Figure 2.11 The arrangement of the probes (left) and the structure of a probe (right). Once the amplitude (or rms value) o f the signal along the direction of the probes is known, the general ellipse equation can be solved and the direction of the grain can be obtained ([29]). The measured grain angle profile is similar to that o f capacitance based (SOG) machine. The difference is that the output of the S O G machine is more affected by the surface grain o f the specimen, while the output o f the microwave machine represents the average grain angle through the depth of the specimen. 2.4. M o d u l u s o f E l a s t i c i t y u s i n g the C o n t i n u o u s L u m b e r T e s t e r A s was discussed on Section 2.1, the Continuous Lumber Tester machine measures the flatwise M O E profile of the specimen by moving the board through two slightly displaced rollers and measuring the bending force. Two profiles are measured from each board (Figure 2.12) one from each side o f the board. The machine is calibrated by using aluminum bars. However the average M O E of the two profiles was compared with the static M O E measurement (Section 2.1) in the lab. A linear dependence between the measured average output and the static M O E was observed. A s is shown in Figure 2.12 the noise level when there is no board in the machine is negligible as compared with the signal level. Therefore, thresholding was used to extract the M O E profile from the measured profile. The two profiles that are shown in Figure 2.12 are apart by a fixed distance. This distance is equal to the distance of the two measurement cells. The measured signal at the output of each sensor is an average over the span o f the board between one gripping roller and the other sensor roller. Therefore, the signal is very 27 smooth and does not need a high sampling rate. As is shown in Figure 2.12 there are about 500 samples over the length of the board. About 70 centimeters from each end of the board is not measured. The distance between the two sensors was twice that length, i.e., 140 centimeters ([4]). M O E test; CLT machine output (profile 1) 0 200 400 600 800 1000 1200 1400 1600 1800 2000 sample number M O E tett; C L T machine output (profile 2) a f 0.5 I 1 . .—— i 200 400 600 800 1000 1200 1400 1600 1800 2000 sample number Figure 2.12 Typical output profiles o f a dynamic bending machine. 2.4.1. Calibration Unlike other measurement machines the calibration information for the dynamic bending machine was available at the time of the experiments. The following test was carried through at U B C ([32]) in order to ensure that the dynamic bending machine measurement was calibrated and that it had the same reliability as the other measurement systems. Fifteen specimens were randomly selected and were tested in a flatwise-static bending test as well as being tested by the dynamic bending machine machine. The average of the dynamic bending machine output and the average o f the static M O E profile were used to calculate the calibration factors. A s wil l be shown in the following, the result shows good linear correlation between the average dynamic bending machine and the average static M O E along the board. The repeatability of each measurement was tested as follows. For 28 each of five randomly selected specimens, the dynamic bending machine measurement was carried though five times. The repeated profiles were plotted in a diagram for comparison. Visual comparison shows good match of repeated profiles. The dynamic bending machine output measurement was explained previously and a sample output was shown in Figure 2.12. Each profile was extracted from the corresponding measured (dynamic bending machine output) signal by removing the surrounding area by using a threshold level of 0.1 v. magnitude. The result is shown in Figure 2.13. LandMark adjusted output voltage profiles for specimen #0760 18 16 1.4 1.2 I 1 I S 0.8 h-U 0.6 0 4 0.2 Q | i 1 i 1 1 i 1 1 0 100 200 300 400 500 600 700 800 sampling Instant (millisecond) Figure 2.13 The extracted and adjusted dynamic bending machine output profile for specimen #0760 The flatwise-static M O E profiles were measured by means of two independent Static Bending machines, i.e., Metriguard Model 440 Static Bending Tester and Eldelco D A R T Static M O E Tester ([32]). Static M O E measurements were taken at equal spaces fifteen centimeters along each board. Seventy centimeters from both ends o f the board were not measured in order to have a proper support of the board. Two sample profiles are shown in Figure 2.14 and good agreement is obtained between the two static test machines. One o f the machines measures the M O E of a specimen by three-point test and the other measured the M O E by a four point M O E test (see Section 2.1.1). The difference between the two profiles o f Figure 2.14 is due to this difference in the machines. 29 Measured static M O E of specimen #0760 by using Metriguard (blue)and Eldelco (green) machii 3 2.5 2 </> a. op 1 ,1.5 ui o SE 0 5 0 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 board length (mm) Figure 2.14 Measured static M O E of specimen #0760 by using Metriguard (blue) and Eldelco (green) machines, (a) original measurement, (b) adjusted for sampling position. The extracted and adjusted profile shows the actual measured M O E at the represented points o f the board. The average o f the two profiles in Figure 2.13 (shown as V in Table 1) and the average of the two profiles o f Figure 2.14 (shown as E in Table 1) were used for calibration. The measured average dynamic bending machine and average static M O E are shown in the following Figure 2.15. The coefficient of determination of the two variables is r 2 = 0.92, which shows a linear relationship between the two variables. i | 30 Table 1: The average CLT™ output, average M O E , and transformation coefficients for selected specimens. Specimen # V (CLT™) E ( M O E ) c, c\ 760 1.3683 1.8833 0.7265 1.3764 83 1.7110 1.8623 0.9188 1.0884 749 1.2610 1.5726 0.8019 1.2471 9 1.1853 1.3637 0.8692 1.1505 570 1.0565 1.2586 0.8394 1.1913 198 1.5896 1.8423 0.8628 1.1590 503 1.5518 1.7860 0.8689 1.1509 616 1.8549 2.0528 0.9036 1.1067 62 1.3717 1.4725 0.9315 1.0735 94 1.0792 1.4200 0.7600 1.3158 642 1.4377 1.7064 0.8425 1.1869 2 1.9447 2.1876 0.8890 1.1249 612 1.6248 1.7914 0.9070 1.1025 623 1.4489 1.7315 0.8368 1.1950 17 1.7727 2.1795 0.8134 1.2295 31 22 2 1 Average of CLT output vs average static MOE 2 -1 . 8 h UJ i i . 8 h -1.7 1.6 1.4r-1.3 12 1.3 1.4 1.5 1.6 1.7 Average of the measured CLT output 1.8 1.9 Figure 2.15 Linear relationship between the average measured dynamic bending machine output and the average static M O E . Therefore for a linear relationship, the calibration equation o f each cell (that produces each dynamic bending machine profile) is as follows: V=CE + C (13) A s was stated before, V is the average of the dynamic bending machine output and E is the average static M O E . C 0 is the output offset of the dynamic bending machine machine and is equal to the output voltage when there is no board inside. Therefore, the part of a profile that is not related to the board (for example, see the side parts o f the two profiles shown in Figure 2.12) is equal to this offset factor. Each measured profile is the output of one force-measuring cell, which measures the required force for bending the board along each o f its faces. For the two cells there wil l be four factors (C 0 1 and C 0 2 for offset factors and C n and C 1 2 for the scale factors), which are shown in the following equations: Cm = 0.0626 (14) C =0.0172 02 32 (15) C M and C 1 2 are calculated by comparing the measured profiles (from dynamic bending machine ) with the static M O E profiles of the selected samples and by using the following equations: Cu =(K-C0l)/Ei (16) Ca =(K-C02)/E2 (17) As shown in Figure 2.13 and Figure 2.16 the two profiles are almost the same therefore the scale factors are about the same (Cn = Cn = C) and are calculated as follows: C, = (El + Ey((Vx - CJ + (V2 - CJ) (18) The 10% of the beginning and the end o f the dynamic bending machine profile are removed in order to avoid the transient part o f the measured profile. The following calibration factors were calculated by using linear regression. Equation (19) is the calibration equation for profile #1 (the output o f cell #1 o f the dynamic bending machine) and Equation (20) is the calibration equation for profile #2 (the output of cell #2) o f the dynamic bending machine: P r o f i l e d : E= 1.1808 (V- 0.0626) (19) Profile #2: E= 1.1808 (V- 0.0172) (20) The final calibrated dynamic bending machine profiles of a few specimens are shown in Figure 2.16. In order to check the consistency o f the measured profiles, five boards were randomly selected and the dynamic bending machine profile of each were measured five times. Figure 2.17 and Figure 2.18 show that the superimposed profiles match well for both cells o f the machine. The measured M O E values are very close but the location of the profiles can be compromised in some cases because of the board slippage under the conveyor rollers of the dynamic bending machine. 33 , ig a LandMark and UBC MOE profiles tor specimen MS95 % IQ° LandMark and U B C MOE profiles for speclmen #0760 , 1 0 - LandMark and UBC MOE profiles for specimen MK21 ! I I i | r-i , A ! I *4 — 1 i i — t / I / I / T N _ 1 / i V > -— -.Z^pA-a ij i / ^ 1 I f \ ' " V I ' / ' / 1 1 1 l / / 1 t / / 1 1 III I i , i _ y \ ~jf i ! a I 1 > C - M . 1 > 1 1 i i itftgiiofd profile Yellow ! 1 — hi \ i "I I | ! ! 1 . . I 1 ' -T -andWiark prof LandMark prof UBC Eldelco p UBC Metrlguar el Blue • 2- Red oftle Green 3 profile; Yellow I . . . 1.0 2 2.5 3 3.5 length (m) LandMark and U B C MOE profiles for specimen #0204 I I I I / V \ f \ t J i . I i LaMMr LandMi UBC E U B C M L 4 — I irk profile 1: Blu rk profile 2: Rec ielco profile G Itrkjuard praflla — L _ _ ; i V run — 1 I — ! _ _ 2.5 3 3.9 4 4 5 5 k*ngth(m) 0 0.S 1 1.5 2 2.5 3 3.5 4 4.5 5 length (m) , to5 LandMark and UBC MOE profiles br specimen #0098 , if/ LandMark and UBC MOE profles for specimen #0629 A i r | 1 1 1 1 \ J . . _J 1 \ V LandMark profile! BIO LandMafk praMa2' RtK 1 I 1 1 1 1 1 \ JSC EUako preito: Q UBC IflWiguvb paella 1 1 1 ! 1 1 I 1 1 I I i I UJ O ! I I „ _jfcli ™. Ji_Ji~ _____ ! \ / 1 \ / 1 n 1 T v ^ f J 1 J 1 1 1 LancMl i 1 Itk prolldl Blue rk prolle2: Red 1 1 1 1 1 1 1 UBC El UBC M laleo profile G Btriguarti profile 1 Yellm i i 1 1 ~ ~ T 1 1 . " T 1 | I 0.5 1 1.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 length (m) Figure 2.16 Measured M O E profiles by CLT™ and static bending methods. 34 x iff LandMark repeal led M O E profiles tor spec m e n #0 158. cell #1 X KJ 6 LendMartt repealed MOE proales tor specimen #0158. cell #2 , 1 f / Landmark r e p e a t H d M O E p r o l e s tor specimen#0182. cel l #1 j J'JJx f- second-pass, green third peas: yeJpw fcrffi oass: red 1 l l h pass: magi i t . £ ; . . . . L. . \ —A— — A . . . . . I » .1 ™ T tecond1 pnf! green hlrd pass yellow M h . W * : M ' K h p * '"••» « • 0 0.5 1 1.5 2 2A 3 3.5 4 4.5 5 length On) x to6 LandMark repeated MOE profiles tor specimen #0162. cell *2 'J 0.5 1 ' S 2 2.5 3 3.5 4 4.S length i m) / i \ i j V" . . . . . . . . MC0fKf|p>9S' gnMfl Mrd pass: yellow brth pe£t j • h t*dt m a g W a . . . . . •-0 0.5 1.5 2 2.5 3 3.5 4 4.5 5 k»ngtri(ra) LandMark repeat eed MOE proOJes tor specimen 40207. ceil an , -o LandMark repeated M O E preflea tor specimen 00207, cad 02 I Mi I ! / 1 \ ! secondlpass: grain thrt fpass . yeabw M a 0 OS 1 1.5 2 2.5 3 3.5 4 4.5 5 lenglfi imi V v • • • * HI secondposs: green third p a n : yellow brth pass: red • | ] 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 length (m) Figure 2.17 The repeatability of the measured profiles for CLT™ machine. 35 Figure 2.18 The repeatability of the measured profiles for CLT™ machine (continued). 2.5. Ultimate Tensile Strength Tester This machine destructively tests the boards. The board is put under increasing tensile stress until it breaks apart. The tensile stress is recorded at every instant and is kept when the board is broken. The system consists of two grips; one moving and one fixed. A metal frame holds the grips and provides the fencing for security of the operator from wood particles that may fly around due to the release o f tensile stress when the board breaks. The moving grip is driven by a pneumatic system whose measured pressure is transformed to the applied tensile stress. 3 6 Chapter 3. Modeling wood failure I f a board is tested in tension, a three-dimensional stress distribution is generated inside the board. This stress distribution is not usually uniform due to the variation of wood properties inside the board. A crack usually is initiated in one part (or more) where the stress level is high. I f the applied tensile stress (to the board) is increased, the chance o f wood failure under stress is increased. Once the wood fails at a point o f the board, it generates a higher stress zone around it. Since the property of wood usually changes gradually, the excessive stress can also cause the failure at a neighboring point. This phenomenon can generate a chain o f failures that finally ends when the board breaks apart ([33]). In the following sections the factors influencing the strength o f wood are presented. Also, the measurable features o f wood that have important effects on strength are explained. A board is composed of clear wood and various defects, which interact when tensile stress is applied. For example, assuming that wood property is known in a neighborhood, by knowing the geometrical features o f the defects in the board and wood properties, the stress distribution may be estimated. Each of the features (defects and clear wood) and their effect on total strength are discussed in the related sections. The discussions o f each section are mostly limited to qualitative descriptions o f the related physical factors. A feature or a set o f features wil l represent each physical factor. Therefore, the qualitative discussion provides an insight into the problem and helps in defining the features. A n exact relationship between each physical phenomenon and the tensile strength is not needed (and in most cases it is not known). The collective relationship wil l be learned by the estimation (or learning) algorithm, which wil l be discussed in subsequent chapters. Two very important aspects of wood are the complexity of its structure (as a fibrous anisotropic material) and variation in its local features (due to various defects). Therefore, even i f the features of wood were accurately measured at every point o f the board and i f 37 the strength model was fully developed, the computation o f a board's tensile strength doesn't seem feasible for on-line use. Therefore, simplification seems an integral part o f this problem i f a practical strength estimator is desired. 3.1. T e n s i l e s trength o f c l e a r w o o d as a r a n d o m v a r i a b l e In strength-of-material testing in the early twentieth century it was observed that the tensile strength o f similar specimens are dispersed and cannot be modeled by a single value. The tensile strength o f matter, therefore, was modeled by a probability density function. Weibull weakest link theory ([34],[35]) is a theoretical explanation o f the probabilistic behavior of material strength. This theory was based on two major assumptions. The first assumption was that the cause of fracture is the flaw (microscopic cracks) in the material and that these flaws are independent and are randomly scattered in the volume o f the material. The second assumption was that the entire material holds up against the applied stress until one o f the flaws fails. This assumption is actually the weakest link theory, which is applicable to brittle material like glass or china. It was shown that this assumption is valid for wood i f the mode o f fracture is brittle. In non-brittle material the strength of a specimen is not equal to the strength of its weakest element and the failure of the specimen is a random process rather than a sudden event. When the fracture starts, stresses are redistributed and it doesn't necessarily lead to immediate collapse or failure. Also, neighboring points can affect the stress tolerance o f an element of the specimen. For the brittle case, the probability o f fracture at a point, S, is related to the stress level, a, over the volume of the material, V, by the following equation (Weibull distribution, [34],[35]): log(l -S) = -fn(o)dv (21) V where w(.) is a heuristic function and is defined as follows: 38 «(o) = ( ^ % (22) u where <r0, cr, and w are properties ofthe material. The probability of failure, S, is equal to zero for a < a 0 . The location parameter, cro, is not usually measured due to the practical difficulty in finding a minimum value for ultimate strength of material. Therefore n(d) is commonly defined as follows. n(d) = (^T (23) u There are different reported parameters in the literature. For SPF (Spruce-Pine-Fir) softwood lumber cr » 30 and m « 3 is used for simulation (covariance « 40%) ([23]). For Douglas fir, an average tensile strength of 14200 p.s.i. (102 MPa) and standard deviation of 3500 p.s.i. (26 MPa) is reported (covariance « 25%) in [36]. It was observed ([37]) that the size (the bigger dimension of the cross-section) o f the specimen affects the strength and caused modification o f this model ([38],[39],[40],[41]). However, it was shown that the size effect vanishes i f the thickness o f material is more than 30 mm ([42]). The modifications o f this model that account for redistribution o f stress due to the existence o f crack as well as size effect were developed ([43],[44]). 3.2. T h e effect o f g r a i n angle o n s trength Grain angle o f wood is the angle between local longitudinal axis o f wood grain and the longitudinal axis of the board. The average grain of a tree doesn't necessarily grow along the longitudinal axis of the tree. In some cases the grain may grow in a spiral form around and along the trunk o f the tree ([45]). The average grain angle affects the strength of a small specimen and is described by the Hankinson equation ([46]). The impact o f grain angle on strength is due to the fact that the strength o f wood is directional. Tensile strength o f wood along the grain is the highest while the tensile strength is lowest across the grain. Although related to the species o f the tree, the longitudinal strength is usually many times larger in magnitude than the strength perpendicular to the grain. The 39 following equation (Hankinson equation) is an accepted relation between wood property (e.g., tensile strength or M O E ) and wood grain angle ([47]): P sin \0) + Q cos\0) v ' where, N is the desired property o f the test specimen, P is the property along the grain, Q is the property across the grain, and 9 is the angle o f the grain with respect to the stress direction, n was empirically derived, and is usually between 1.5 and 2 for tensile strength. In [48], P for tensile strength o f Douglas fir is reported to be equal to 13800 p.s.i. (99 Mpa) and Q for the same tensile strength is reported to be 340 p.s.i. (2.4 Mpa). Although improvements to Hankinson equation were suggested ([49], [50]), this equation is still widely used. In a board, not only the average grain angle affects the strength o f the specimen, the local grain angle can affect the strength even more severely. Local grain angle is created by deviation of grain of the trunk o f the tree into its branches. Also, the deviation of grain angle around a branch that is a consequence o f natural growth o f the tree can have considerable effect on the strength of a piece o f lumber. Another effect that is not as important, for strength estimation, is that the type of failure also changes depending on the angle between tensile stress and the grain. I f tensile stress is perpendicular to the grain, failure in clear wood is like that of brittle material ([38]), while i f the tensile stress is along the grain failure in clear wood is like that o f tough material ([51]). 3.3. T h e effect o f dens i ty v a r i a t i o n on s trength One of the features o f wood that was known for many centuries to be related to the strength was its density. Although not highly correlated with strength, the specific gravity of wood relates to the density o f fibers o f wood, i.e., the thickness o f the cell walls. The correlation between density and strength is valid within a species, and even valid within the body of a tree. Each ring o f a tree represents one year of its age. The light color part is early wood that grows in spring (when water is abundant). The cells that grow in this 40 season have bigger void at the center of the cell. The dark color part is related to late wood (summer and fall). The grain that grows in this season has smaller voids at the center o f the cell and is denser and stronger than that o f early wood. One of the reasons why wood shows orthotropic behavior is this difference between the strength of the two types o f wood in a ring ([38]). The body o f a tree shows a cylindrical symmetry that arises from the relative axial symmetry of growth rings. Therefore, at every point the properties of clear wood are relatively constant in three major directions; along the tree, along the radius of the trunk, and tangent to the trunk (perpendicular to the other two axes). Also, the difference between heartwood (close to the center o f the tree) and sapwood (away from the center and carrying water) as well as juvenile wood (the wood developed when the tree was young) and the rest of the tree can affect the strength o f a board. 3.4. T h e effect o f holes o n s trength Although holes are not likely to be seen in a board, natural holes can be created in wood due to decay or missing knots. There are two effects that can be associated with holes; smaller effective cross-section of board and increased local stress arise due to the shape, size, and location of the hole. Usually, the increased stress that is caused by the shape o f the hole initiates the fracture, which propagates and results in the failure o f the board. The holes with sharp ends that are pointing across the board cause largest local stress. Therefore even a small saw cut, because of its shape, could be more strength reducing than a larger circular hole. However, the shape o f natural defects is not usually sharp. 3.5. T h e effect o f knots o n s trength Knots may be the most common defects that exist in lumber. A s the trunk o f a tree grows radially, the base o f each branch is gradually taken into the body. Different types of knot can be generated based on how the branch is incorporated in the trunk wood. A knot can 41 appear as a distinguishably defined cylindrical shape in a board, or it can appear as an acute change of direction in local grain. Knots have two features that affect the tensile strength of a board. The first feature is that knot wood almost does not contribute to the strength of the board's cross-section at its location. Therefore, a knot is sometimes modeled by a hole in the board. However, the shape o f the knot cannot always be simply modeled as a hole because there is also density variation from clear wood to knot wood that makes the effect o f a knot more complicated. The second effect is that due to the existence of the knot, the grain of wood bears local variations. This variation o f the grain reduces the local strength o f wood around the knot. In practice, it is obvious that knots are the cause of failure in most cases ([11],[12]). Visual grading that usually deals with the largest knot of a board is an example o f empirical information that transforms knot features to board grades. The size of a knot and its closeness to the edge of the board are two important factors that reduce the strength. Knot area ratio represents what percentage o f the cross-section o f the board is actually taken by the knot. This, in fact, is the increase in the stress on clear wood section of the board cross-section. 3.6. O t h e r factors Other factors such as wane, resin canal, and decay affect the strength but they are hard to model or learn by a learning algorithm. The reason is that these factors don't have any specific shape and density and that they don't happen frequent enough for a learning algorithm to be trained for their condition. Any error in measuring the features of wood wil l have an impact on the estimated strength. This factor can be modeled as a shift in the feature space. Some o f the features o f wood are not measurable. For example grain structure, which has a defining effect on strength, cannot be measured directly. Therefore, a heuristic approach to strength estimation wil l be limited to a specific species or a group of species. 42 3.7. A 2x4 b o a r d as a sys tem It is almost impossible to see a board without any defects. The impact o f defects on strength is so severe that the failure of a board is almost always assigned to a dominant defect or defects. A comparison o f board tensile strength and the tensile strength o f small samples o f clear wood from the same boards ([11]) shows low correlation. This comparison substantiates the observation of defect dominance on tensile strength. The actual failure o f a board under tensile stress usually takes place due to the interaction o f more than one feature. For example an edge knot initiates the fracture, but the propagation of the fracture is dependent on the general strength of the wood. In practice, there can be more than one fracture that is initiated but the first one that results in the total failure ends the experiment. Therefore, a board should be considered as a system o f wood features such that their interaction determines the tensile strength o f the board. Since a board usually has many defects in it, different strength zones can be defined (not always easily/possible). A study ([52],[53]) showed that i f defects are closer than a certain distance to each other, they could create local weak zones. In practice, it can be seen that close knots, for example, create low strength zones. This is due to the fact that high density variation and high grain variation as well as high stress zones exist in a small region, which lets the fracture be initiated and propagated in a relatively low stress level as compared with the average strength of the board. This fact is not only related to knots but also related to local high grain angle (like spikes), resin canals, or some times local decayed zones. However, knots are the most common factors that cause the failure of a board ([11]). One more factor that must be considered in board tensile strength estimation is that all the elements of a board are random variables. This means that all the elements of the system are random variables. Therefore the problem o f board tensile strength estimation, in general, can be considered as a nonlinear mapping of input random variables (features) to the strength. 43 3.8. P r o p o s e d a p p r o a c h to m o d e l i n g s trength as a cont inuous funct ion o f w o o d proper t i e s It is the goal of this thesis to find a mapping that transforms the measured properties o f a board to its tensile strength. The place where a failure initiates and the path on which the crack wil l grow are not known. Although it is desired to get a detailed and accurate measurement of all the physical attributes of a board, the complexity of analysis wil l make the study practically impossible. Therefore, the problem of transforming a board's physical properties to its tensile strength leads to transforming selected features of measured profiles to board tensile strength. Let's say X is the set of strength factors that should be measured in order to estimate the strength of the board. The set X is the set of all the necessary profiles. If X were known, the estimation error would be due to model development; i.e., board characterization (Chapter 4) and/or the learning method selection (Chapter 6). It is obvious that X cannot be fully produced by practical measurement. The measured profiles cannot be detailed enough to represent all the local variations of wood properties, as was obvious from the limitations of measuring machines of Chapter 2. For example the measured grain angle profile of the board can only be considered as a coarse presentation o f three-dimensional grain-direction of all the points o f the board. Also assuming that we know all the strength affecting factors, not all the factors are being measured due to unavailability of a suitable measuring machine and/or due to lack of reliability of an existing machine. The available set o f measurable strength factors, X, is a subset o f X, and its elements o f ! are usually the smoothed form of the elements o f X. Any improvement of the existing measurement machines or any invention of new machines that measure a new property o f wood would improve the estimation by making X more similar to X. One characteristic o f the measured profiles, that is, the elements of X, is that each one can be a single number, a vector, or an image (matrix). Any change in the measurement machines affects X. Figure 3.1 shows the problem of strength estimation as a learning problem ([21], [54]). In this figure, x is a vector that represents X, u is the vector that is not related to the output but can be a part o f the input feature set as a part o f exploratory 44 data analysis, v is the vector of features that are included in the generated feature set and represents X. g(x) is the actual process and g(x) is the model that is developed by the learning machine, y is the actual output and y is the estimated output. Figure 3.1. Strength estimation as a learning problem. To reduce the dimensionality of the problem the measured profiles, X, can be transformed into a feature space, F, that represents X. For example, a board can be identified by its defects and a general description of its average wood properties. The feature space, F, could also be very high dimensional and is not necessarily unique. Since there aren't any rules in defining the features, there can be many redundancies in the feature set or one may define a feature that is contributing to strength estimation. Some a priori knowledge about any dependence between the strength and the feature set helps in developing a more accurate model. As wil l be discussed in the Appendix, the problem o f wood failure under tensile stress was studied in the past. Therefore, it is desired to choose a set o f features that are physically meaningful and best represent F so that the existing knowledge of strength of material can be used at model development stage that wi l l be discussed in Chapter 6 The selected feature space can be denoted by P. Since the tensile strength, S, is a random number it can be assumed that it is a hyper-surface in the measurable physical features space, with added uncertainty. The estimated tensile strength o f the board, S, is the hyper-surface in P that estimates S. 45 A board can be related to a point in P. The estimation process requires the transformation of this point to S. As the point is transformed from one space to the next some uncertainty, or variance, is added to it. Once the methods of measurement, feature extraction, and model development are chosen the estimated strength for a given point in physical space is a random variable. Let's say the estimated strength of a board with measurable features P is denoted by S(JU,<T,X) where ju, a, and X are location, scale, and shape factors, and s is a random variable with known distribution ([55]). The three factors o f the distribution are functions o f the feature (P), assuming that boards are uniquely identified in the feature space. Therefore, the tensile strength of a board can be studied as a random function (for the random nature o f material failure) o f the features as follows. s(ju,a,A) = s<jj(P),o(hMP))==giP) (25) 46 Chapter 4. Characterizing a board; features There are two classical approaches to solving the strength estimation problem. One approach is based on the assumption of having relatively accurate knowledge about wood properties at every point of the board. This approach leads to an analytical approach that can be used in a method like linear elastic finite element modeling ([38],[64]). On the other hand, there have been estimation methods that were based on general properties of wood ([65]). In this case general features o f the board are mapped to the strength o f the board. The existing standard o f wood grading is based on such an approach. The strength estimation methodology o f this thesis is something between the two approaches but closely related to the second approach. A model of board structure (fundamentally similar to what is done in the first approach) was developed. Then the characterizing features o f the board were extracted from this model. Finally, the features was mapped to the strength of the board. In this method, the failure of wood is considered to be a function of general properties o f wood at a section o f the board. The basic assumption here is that wood properties don't change much in a small region. Therefore, a few (although it is not known how many) features of that section can characterize it. Since the exact location o f fracture initiation and the path o f fracture propagation are not needed, the features should be mapped to a single value that is the final stress level that leads to the failure o f the board. It should be noted, as was explained in Chapter 3, that the strength o f a board section is generally a random variable. However, the variance of strength at a point in the feature space is not studied in this thesis and the average of the strength distribution was estimated. Therefore, the strength o f a board can be modeled as g(P) where g(.) is a real function o f the real valued feature vector P. The problem o f characterizing the board for strength estimation is basically an attempt to define the weakest part of the board and the general properties o f the board. Therefore 47 two types o f features were produced; one type for the weakest section and one type for general properties o f wood. It may seem relatively straightforward to detect the weakest section or sections o f the board and develop the strength estimation model for that section. The underlying assumption here is that the weakest section can be indicated and its features can be used for strength estimation. This deduction arises from the fact that the causes of many failure cases are major defects that can be easily seen. If no distinct defect exists, interaction o f defects can go beyond the neighboring defects. Therefore, it is not always clear as how to distinguish different sections of a board. Therefore, characterizing a board cannot be considered a trivial task. Also, it can be expected that the feature set to be almost definitely noisy. A s was explained above, a single major defect or a combination o f defects can cause the failure. In many cases of failure a single defect or a group of defects can cause the failure o f the board. The section o f the board that has the dominant defect(s) is the weakest section of the board. In such cases a feature that represents the worst defect is needed in the representative feature set. The minimum of the M O E profile is related to the weakest section o f the board or equivalently to the biggest defect. Also, in many cases the board contains defects that are similar. In such cases there isn't one distinct defect that causes the failure. These defects increase the probability o f the failure at lower stress levels. Therefore, a set o f features that represents the overall existence o f defects should be included. From Chapter 3 it seems obvious that there are two basic factors that govern wood strength estimation; wood strength at a point or a section of the board and the structure o f the board. The former one defines how strong the material is and the latter one defines the stress distribution and stress concentration that has a definite reducing impact on strength. The stress distribution in a board is not going to be studied in this thesis but a set of features that seem to be related to the stress concentration should be included in the representative feature set. The definition o f features depends on the measurement constraints. The goal is to produce a set o f features that are most relevant to strength or failure and can be accurately 48 measured or calculated by using the measured features. The understanding of measurement constraints is important at this stage because some o f the features that may seem most relevant may not be measured properly. The measurement systems and their restrictions wil l be explained in Chapter 5. Local density, grain angle, and bending modulus of elasticity are the measured profiles. The representative features are therefore related to one o f these measured properties. Obviously, the unmeasured properties o f wood wil l add to the uncertainty o f the estimation, as was explained in Chapter 3. 4.1. A g e n e r a l v iew o n de f in ing the features The need for extracting a feature set from the measured profiles o f a board arises from the fact that analyzing the measured profiles of the board is practically impossible because they contain numerous redundant or highly correlated pieces o f information. For reducing the unnecessary pieces of data, features of the measured profiles are used to develop a model of the board. These features are basically the characterizing features o f the board. The features that wil l be introduced serve a double purpose, they represent a board (distinguishing different boards) and they represent the strength-related physical features o f the board. Therefore the representative feature set needs to be a minimum necessary size. In order not to compromise prediction accuracy for feature meaningfulness, the method o f Figure 4.1 can be followed. The meaningfulness of the features makes them different than the features that are usually introduced in dimensionality reduction techniques as these techniques map (linearly or nonlinearly) the given set o f features to a fewer number o f features. For example, in a three layer neural network with few hidden layer neurons, the output of each neuron is dimensionally reduced feature. Obviously in a problem like wood strength estimation such a technique cannot be much o f help in using any existing information from other studies. However, i f the X-ray image is transformed to the board model, it can be transformed to Knot Area Ratio ( K A R ) , a feature that has been studied, by other researchers ([13]), as an effective strength-affecting feature. 49 measured profiles board structural model -> representative features 1 strength model I ^ representative features 2 _^ Figure 4.1. Steps o f strength estimation method. The board structural model is the generalization of the measured profiles. This model is developed based on the existing knowledge of the constraints that should exist on the measured profiles. However, the model tends to be simple and tends to remove the anomalies that are not considered in the model. For example, a small-size density variation is usually considered a measurement error. Therefore the measured profiles can generate features that can add to the generated features through the board model, shown as representative features 1 in Figure 4.1. A learning system, shown as the "strength model" in Figure 4.1, uses a fixed length of features. Therefore, the feature set should have fixed length. However, the fixed length feature set imposes restrictions on the estimation that can best be studied by the complexity o f signals. In the following, the effect o f a fixed number o f features on the estimation accuracy is discussed. It is obvious that the complexity of a board structure is variable. A board with many small defects is structurally more complex than a board with few major defects, although the strength o f both boards may be the same. For example i f a board with twenty knots is to be characterized, twenty sets of features for the knots and a set of features for the clear wood is needed. But i f a board with three large knots is to be characterized, only three sets of features for the knots and a set of features for the clear wood are needed. Since transformation of input to output needs a fixed length of features, the feature sets of clear wood and defects wil l be transformed into a fixed length vector of features that is the board feature set. It may seem that the board model is not needed because one could extract a set of representative features directly from the measured profiles and feed those to the strength estimator (Figure 4.1). The board model allows one to develop a set o f features that are physically meaningful, for example the features that are related to structure of the board. 50 B y using this model the existing knowledge from fracture mechanics and wood grading can be used in order to develop a meaningful representative feature set. This fact wi l l become obvious when knot feature extraction is involved. The characterization o f a board starts by categorizing its strength affecting factors. The factors that affect the failure (or strength) of the board are distinguished and are extracted from the measured profiles. A set of features that best characterize each factor wi l l be produced. Based on the feature sets of all the factors, a set o f features that best characterizes the board wil l be generated. The strength factors are categorized into clear wood and defects. The average properties o f clear wood contribute to the strength of the board. The defects, on the other hand, are the strength reducing factors. The measured profiles are the density, grain angle, and modulus of elasticity. The strength model is a learning system (parametric and non-parametric) with fixed length of input vector. In order to produce a fixed length vector of features as the representative feature set of the board, the statistical representation of the features is used, i.e., the similar features of defects in a board are represented by their statistical features. For example, a feature o f a knot is its volume, but a representing feature for the related board is the average o f all knot volumes. The clear wood of a board wil l be represented in the same fashion. The clear wood can be represented more accurately because it is one entity whose properties change gradually. This method is based on the regularity o f the defects. I f the defect sizes are about the same and their strength reducing effect is about the same, then this method can represent the probability o f failure at lower strength levels. 4.2. P r o p e r t i e s o f c l e a r w o o d Characterizing the clear wood of the board is basically defining what is regular in the board. For that purpose, the regular part o f a measured profile wil l be defined and wil l be extracted from the measured profiles. Here, a definition of clear wood and its attributes is provided and will be used to set the methodology for clear wood feature extraction and 51 clear wood modeling. Naturally, every measurement technique imposes some constraints on the measured property and therefore the methodology of feature extraction wil l be developed accordingly. The clear wood of a board is the non-defect wood that is related to the regularly grown portion of the trunk o f the tree with two distinct attributes. The first attribute is that this portion of a board contributes most of the body of the board. The second attribute is that the properties o f clear wood are either constant or vary gradually. However, randomness of properties can be expected because clear wood is a natural product. The first attribute, statistically speaking, states that the chance of a point being clear wood is more than that being a defect. The second attribute leads to using averaging or using regression for extracting the clear wood properties from the measured profiles. These attributes are the basis for extracting clear wood properties from the measured profiles. A density profile is a two dimensional profile in the form of an image. Every pixel of the image is proportional to the density o f that point of the board. I f the pixels of the image are considered as random numbers and presented as a histogram diagram, clear wood contributes the major shape of the distribution. The statistical mode o f the distribution is the most common density of the board and defines the average density o f clear wood. Also, the natural variation of density in clear wood is obviously a second feature. This feature is a strength-reducing factor (Figure 4.2). Therefore, the density o f clear wood at every point of the board is modeled by its average density and by its natural variation that is modeled by the standard deviation of the histogram diagram. 52 density distribution of knot + clear wood (specimen #97) 4000 3500 3000 2500 frequency 2000 1500 1000 500 0 06 0.8 1 1.2 1.4 1.6 1.8 2 density Figure 4.2. The histogram representation o f density distribution of a board If the clear wood density is assumed to be constant at all clear wood points o f the board, the mode of this distribution is the natural choice (Figure 4.2). The mode of distribution is the simplest form that can be used to model the clear wood density. Although, the average o f the distribution is usually close to the mode, the mode value has the slight advantage that it is not affected by large density affecting defects like decays or resin canals. The standard deviation should be independently stored as a feature. Also i f the density is assumed to change gradually along and across the board, a plane that is a linear regression of the image can model the clear wood density at every point. In geometrical defect detection it wil l be shown that two images are needed to develop the model o f the defect. One image is related to the wider face (plank) of the board, that is, the top image, and one is related to the narrower face (edge) o f the board, that is, the side image. The two images can produce the density o f all the points of the board. The grain angle profiles o f the board cannot be analyzed in the same way as done above. The sampling rate in this measurement is low therefore the number of measurements in this profile is considerably less than that of density profile. The sharp local grain angle variations that can be seen in the grain angle profile are related to the grain variation around knots. The shape of grain angle variation in the measured profile depends on the relative location of the measurement head with respect to the center o f the knot. The histogram representation does not generate a clear shape of distribution (Figure 4.3) 53 because the defects usually take a large portion of the profile. The grain angle features of clear wood should be extracted from the grain angle profile. The average and standard deviation o f the measured grain angle profile are considered as the grain angle features of the clear wood. grain angle profile of a board histogram distribution ofthe grain angle of a board Figure 4.3. Grain angle profile of a board and its distribution. The bending modulus o f elasticity measurement o f a board is an average over a span of the board. This profile does not vary much and the sampling rate is low. Therefore, similar to grain angle profile, the profile is better for feature extraction than its histogram diagram (Figure 4.4). However, unlike the grain angle profile the defects o f the board do not create any sharp changes in the profile. Therefore, the local measurement of the M O E of a point can be assigned to the cross-section of the board at that point. Although the variation in the M O E profile is gradual, the minimum of this profile usually indicates a weak zone in the board that is either related to the major defects of the board or a weak clear wood zone. This feature is not related to clear wood general properties and wil l be considered as a feature of the entire board. 54 X106 MOE profile of a board 3.4, 1 1 1 . . , 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 length (mm) histogram distribution of the MOE of a board 1 1 • ill j , • • • mi 'Pin m • i ' i 11 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 3.2 3.4 M O E * * ) x 1 0 » Figure 4.4. The M O E profile of a board and its distribution. 4.3. C h a r a c t e r i z i n g the defects Characterizing the defects starts with detecting the defects from the measured profiles. Defect detection can be categorized into three problems: detecting general variations, sharp local variations, and extreme values. The general variations were explained in Section 4.2, along with clear wood properties, in the form o f standard deviation of the measured local density. The local variation detection is based on the existing difference between the density or grain angle o f the defect and the clear wood. The extreme values are the minimum values o f wood density and modulus o f elasticity, which are the weak points of the board. Knots are the distinct defects that most commonly exist in wood. They were studied and modeled in detail by using their geometrical features. This is possible because of a knot's defined shape as well as its distinct density difference with respect to clear wood density. Each representing feature wil l be an indication of the defect's size, location (across the board), or shape. A grain deviation appears as sharp changes in the profile (Figure 4.3). This can be detected by using thresholding techniques. N o other specification is assigned to it because the shape and size o f the variation depends on the relative location of the measurement head with respect to knot location. It wi l l also be shown that the grain variation can be modeled by having the knot model. 5 5 These attributes of a feature are basically the properties that similarly affect the strength. The size o f a defect reduces the clear wood material. The shape o f the defect affects the high stress zones that are generated by the defect when the tensile stress is applied. Also, the location of the defect across the board influences the strength reducing effect of the defect. I f a similar defect is closer to the edge of the board, it more likely initiates a fracture. Also general variations of clear wood or general minima of strength factors are the features that reduce the strength. Therefore, the standard deviation of the measured properties (from clear wood values) in Figure 4.2, Figure 4.3, and Figure 4.4 are the features that can represent some of the defects of wood. Also, the minimum of the measured profile for M O E (Figure 4.4) and the minimum o f density across the board are the features that are related to weak points in the board. The minimum of density across the board is the average density in a small section of the board (Figure 4.5). Such general minima or maxima are not seen in grain angle profile of the measured samples. When the specific features cannot be defined, general features that may be related to the strength can be defined. For example a grain angle defect may not have a specific shape or size or location but its maximum grain deviation defines the weakest part o f that defect and is related to the likelihood o f failure at that point. Similarly a decayed zone can be identified because o f its regional reduced density level but a specific generic shape cannot be assigned to it. Also, the location across the board cannot be assigned to the decay zone because its large size makes this feature meaningless. 56 x igverage board density of a section with 1/10 of the length of the board -A St \ O r 500 1000 1500 2000 2500 3000 3500 4000 4500 board length Figure 4.5. The local density profile of the board generated by a moving window. Window span is 1/10 o f the length of the board. 4.4. G e o m e t r i c a l features o f a knot In this section the geometrical features of a knot wil l be explained. The geometrical features o f a knot are the features that are obtained by modeling the shape of the knot. The features characterize a knot's size, its shape, and its relative location with respect to other knots. The creation o f a knot is due to the existence o f a tree branch, which has a defined circular cross-section. The remainder of the branch keeps its defined cross-section shape and leads to the geometrical model that wil l be produced in Chapter 5. The knot can be detected in the density image o f the board because its density is usually distinctly different than the density o f clear wood. Figure 4.6 shows the X-ray density profile of a knot in a piece o f wood. The three-dimensional image of such a knot can be imagined as what is shown in Figure 4.7. The flat part o f the density profile in Figure 4.6 is related to clear wood that surrounds the knot and the protrusion is related to knot wood. It wil l be shown in Chapter 5 that two density images o f knot that are taken from mutually perpendicular directions are sufficient to identify the model of the knot (Figure 4.7). 57 Density profile of a knot Figure 4.6. The density profile of a knot as compared with the density of clear wood o f a board. This model is based on the generalization o f the cylindrical shape of a branch. Therefore a knot is modeled by a cone, which is cut by two parallel planes. The cone represents the knot and the parallel planes are the surface of the board. The features o f the geometrical shape are the features o f the related knot. More detailed features can be extracted from this shape because of its increased accuracy as compared with other defects. Knot volume, its axial deviation, the taper, the average radius, the distance to the closest knot, and knot-area-ratio are the features o f a knot. The axial deviation o f the knot is the standard deviation o f all knot points from the central line of the board. Knot taper is the ratio o f the smaller diameter to the larger diameter. Knot average radius is the average radius o f the knot cone in the board. Knot area ratio is the maximum of the ratio o f knot cross-section to the board cross-section. The Knot Area Ratio ( K A R ) is a visual quality level ([13]) measurement that is used in industry for down grading a piece of lumber i f a large knot exists in it. V Figure 4.7. Conic model of a knot. 58 Knot volume and average knot radius represent the reduction in the material of the surrounding clear wood area. Similarly, knot-area-ratio represents the minimum clear wood material at the cross-section of the board where the knot is. The axial deviation of the knot from the center ofthe board shows how close the knot is to a corner of the board. Knot taper represents the possible existence of the tip of the knot in the board that can create a high stress zone. The distance o f the knot to its closest neighbor can be used to represent the expected increase in stresses that can be created from knots that are very close to each other. 4.5. C h a r a c t e r i z i n g feature sets a r e not u n i q u e In defining a set o f features, it should be noted that different sets of features can be defined that represent similar properties of wood. It is obvious that any transformation of a feature or a group of features wil l generate the same estimated strength as long as the learning system can learn the inverse of the transformation. Since there aren't any rules for defining features, the definition o f features that are representing the same factor is avoided. For example, the average and the mode of the density histogram of a board (Figure 4.2) are two features that are conceptually representing the same factor. The mode of the board's density profile is the regular density o f the clear wood o f the board, which is related to the strength of the board. The average of the X-ray image can be considered the same as long as the defects are neglected. In defining the representative features of a board their physical meaning wil l be considered. Therefore, the emphasis is on the features that are related to board model as opposed to those that are related to the measured profiles (Figure 4.1). The advantage of this methodology is that the existing knowledge from wood grading and wood mechanics can be used for feature definition. 59 4.6. B o a r d m o d e l The board model is the basis of generating a set of features that physically make sense. For that matter, the clear wood properties that were explained in Section 4.2 are extracted from the profiles and inserted at all the points of the board model. B y using the geometrical features of knots a basic three-dimensional model of the board can be generated (Figure 4.8). This model is a structural model o f the board. The knots are detected by using the geometrical method and are modeled by holes of the same shape. Figure 4.8. The model of a board that consists of the clear wood and defects. The clear wood part of the model at every point indicates the density, grain angle, and modulus of elasticity of wood at that point. The density o f wood at every point is the average of estimated density from the two density images after removing the knots. There may be more than one grain-angle profile of the board. The grain angle o f wood at a point is resulted from the interpolation o f the measured profiles after removing the anomalies (the trace of the defects like knots) in the profile. The modulus o f elasticity profile is usually measured as a pair, one from each side of the board. The average profile shows the average M O E o f the measured points. The M O E of all the points in a cross-section of the board are assigned the value of M O E at the related point in the measured profile. It is obvious that the complexity o f the characterizing feature set increases as the number of the defects changes. This can be seen from the fact that i f each defect is characterized by a set of features the total number of features increases i f the number of defects increases ([66],[67]). The complexity of the characterizing feature set can be related to the accuracy of the strength estimation of the board. As was mentioned before, the characterizing feature set of the board is a fixed length vector of features. In order to transform the variable number of features to a fixed number 60 of features the statistical moments were used. The clear wood features and general variations o f clear wood were defined before. The features that are related to the board model are the features that are related to geometrical features. Knot volume, its axial deviation, the shortest distance to a neighboring knot, taper, and knot area ratio are the features that were added. 61 Chapter 5. Extracting features from measured profiles In this chapter the extraction of features from the measured profiles is presented. The features, that wil l represent a board for strength estimation, were discussed in Chapter 4 and the measured profiles were presented in Chapter 2. Statistical features are used for representing the grain angle measurement and the modulus o f elasticity measurement. The X-ray images, however, are used for deriving the location and shape of knots. Most of this chapter is related to detecting knots in an X-ray image and modeling the shape of the pattern. As was explained in Chapter 4, a model of board density is produced by using the X-ray scanning of the board. This model consists of the knots of the board and clear wood density distribution. A knot produces a local protrusion (a bump) in the regular surface that is the X-ray image o f the clear wood. Since the X-rays are almost parallel, the edge of this bump is a pattern that can represent the shape of the knot and wil l be used to extract the parameters of the knot model. It wil l be shown that two patterns are sufficient to produce the model o f the knot. The three dimensional shape o f a knot and its location in the board is the basis o f geometrical features of the board. Shown in Figure 5.2 is an X-ray image of the wood density of a board and the surrounding area. The surrounding area is related to the portion of the measured profile where there is no board in the system and the system output is the noise level. Since the noise level in all the measured profiles is considerably lower than the signal level, the portion of the measured profile that is related to the board can easily be separated from the surrounding part of the profile by thresholding. 5.1. T h e i m a g e o f a knot i n a n X - r a y i m a g e ; the s h a d o w i m a g e As was explained in Chapter 4, the density image of a knot is generated by X-ray scanning of the board. Figure 5.1 shows the density image of a knot, which is a part of an 62 X-ray image. A s is shown in this Figure a knot can be seen as a local peak in a relatively flat surface that is related to clear wood o f the board. Because o f the large density difference between clear wood density and knot wood density (as can be seen in Figure 5.1) a threshold level can separate knot pixels in this image. This is the basis o f knot detection that wil l be derived in the following sections. Figure 5.1. The X-ray image o f a knot A contour map can show the existence and location o f knots in an X-ray image. Figure 5.2 shows the contour map o f a measured profile. The two relatively vertical lines are the edges of the board (the surrounding area were removed) and the closed elliptic contours are related to the knots. Each closed elliptic contour is related to a density variation similar to Figure 5.1. 63 Figure 5.2. Contour map o f an X-ray image. 5.2. T h r e s h o l d i n g ; c lass i fy ing a p ixe l into knot o r c l e a r w o o d classes The existence o f knots and the related wood density variation makes the density distribution o f the board a bimodal distribution ([68]). In order to classify the image pixels into two classes o f clear wood and knot wood, Bayesian based thresholding was used. The density o f wood is assumed to be normally distributed, therefore, knot wood and clear wood can be modeled by two normal distributions whose averages and standard deviations were measured from the existing sample boards. For example, the distribution of density in knot and clear wood o f a board (as wi l l be discussed in detail later in this chapter) from a randomly selected board shows that the mean and standard deviation of clear wood distribution (assuming a normal distribution) are x = 1 and cr = 0.05, 64 relatively. The mean and standard deviation of knot wood distribution are xk= 1.39 and <rk = 0.44, relatively. The distributions are shown in Figure 5.4. The problem o f separating the image pixels into knot pixels and clear wood pixels was cast as a Bayesian decision-making problem. Every pixel of the image should be classified into one of two populations with known probability distributions (Figure 5.3). In deriving the threshold level two points were kept in mind, detecting knots and preserving the shape o f the knot pattern. Detecting small variations in clear wood density of the board was not o f interest here. The knot patterns, on the other hand, were specifically important because they were transformed to geometrical features of the board. Therefore, the threshold value was chosen so that it generates the least chance o f error in the output knot pattern. Although the density distribution of clear wood pixels of the X-ray image is related to the natural variation in wood density, assuming that knots are the only defects of the board, the density distribution of knot pixels is affected by different factors as is explained in the following. The density distribution of a knot depends on the type o f the knot. For example, i f the density distribution o f knot wood is modeled by a Normal distribution the mean of the distribution changes as the type of the knot changes. There are few types o f knots therefore one can expect few possible density distributions. The threshold level is specific to the two density distributions. A single threshold level imposes error in classification procedure. However, this error cannot be avoided because the types o f knots cannot be distinguished by using their X-ray images. 65 probability distribution • density knot wood R R, Clear wood Threshold level Figure 5.3. Probability distribution o f clear wood density and knot wood density. The actual statistical distribution of a knot-related X-ray image pixel is more complex because it comprises two components: the statistical distribution of knot wood density, and the statistical distribution due to the projection o f a knot cone onto the image plane. For a tilted knot, for example, the density value o f a pixel changes gradually from the knot wood to a combination o f knot wood and clear wood depending on how far the pixel is from the center of the knot. 5.2.1. Bayesian classification Let's say nx is the population o f all the clear wood pixels in an X-ray image and n2 is the population of all the knot wood pixels in the same image. A is the population o f all the image pixels {nx u x2). X is a random variable that is defined as the measured density at a pixel in the X-ray image, x is an observation, i.e., it is the value o f a pixel in an X-ray image. Rx and R2 are the two regions of X that define clear wood and knot wood respectively. q is the a priori probability of a pixel belonging in nx and q2 is the a priori probability of the same pixel belonging in n2 (qx + q2 = 1). px is the conditional probability distribution function o f nx and p2 is the conditional probability distribution o f n2. C(2 |l) is the cost o f misclassifying a pixel into R2 (i.e., classifying it as a knot pixel) while it belongs in Rx (it is a clear wood pixel). Also, C(l|2) is the cost o f misclassifying a pixel into Rx (i.e., classifying it as clear wood) while it belongs in R2 (it is a knot pixel). 66 The conditional probability o f the pixel belonging in nx (clear wood) given that its value (the observation) is x can be expressed as follows: 1iPi(x) P l = q,p^) + q1p^) ( 2 6 ) The cost o f misclassification is C = C(2|l) qx fPl(x) dx + C(l|2) q2 fp2(x) dx (27) Rt and R2 are chosen so that: p,(x) C(\\2)g2 R>Pl(x)*C(2\l)qi ™ /?,(*) C(\\2)q2 RrPi(x) <C(2\l)qi ( 2 9 ) The probability that a pixel is clear wood is much higher than that it relates to a knot, therefore q>>q2 (30) Since qx and q2 are the probability of a pixel belonging in a knot and clear wood correspondingly, they can be estimated by using the total clear wood area and total knot area in a sample X-ray image. The location, size, and shape o f a knot wil l be detected by using the edge of the particular knot in the X-ray image. Extracting these parameters is the whole purpose of edge detection. I f too many pixels are misclassified from knot to clear wood it disfigures the shape of the knot edge pattern, therefore affects the rest of the process. I f the threshold level is set to a higher level the resulted parameters of a knot wil l be biased to a lower value (because the size of the pattern wil l be smaller), which can be tolerated. Therefore, the cost o f misclassifying a clear wood pixel to a knot pixel (C(2|l)) is much higher than the cost o f misclassifying a knot pixel to a clear wood pixel (C(l|2)). 67 C ( 2 | 1 ) » C ( 1 | 2 ) (31) Therefore Rx is for all x's that are non-zero and R2 starts where px(x) diminishes to almost zero at a high percentile point. I f Xj, is the threshold level, R 2 = {V x > xr |pfa) = C(2\l)q ' * °* (32) If x T is a high percentile (for example 99 percentile) ofpx(x) it satisfies equation (32). 5.2.2. Edge detection As can be seen from previous section, knot edge detection in an X-ray image of a board relies on finding px(x); clear wood density distribution, by using the X-ray image. The frequency distribution (a histogram diagram) related to the X-ray image pixels is the summation of two normal distributions with different average and standard deviations. Knowing the fact that the density of wood changes in a very limited range as well as equation (30), one can conclude that px(x) has a dominating effect regarding the overall statistical measures of the combined frequency distribution. Therefore, the mean and the standard deviation of the overall distribution can be measured and used directly in px(x). In the following it is shown that the statistical mean of px(x) is almost equal to the statistical mean of the overall distribution: m, = (33) (34) where x , is a pixel that belongs inpx(x) and x , is a pixel that belongs inp2(x). m (35) 68 = -[nm, + nm] flL 1 1 2 2 J m = qimi+ q2m2 (36) Equation (30) and qx + q2 = 1 result in the following: m&m (37) The standard deviation of the overall distribution does not have such a simple relationship with the distribution o f the overall distribution as shown in the following. The standard deviation of the overall distribution s is as follows. The first term of equation (38) is the standard deviation of the clear wood distribution. Usually m2 is a multiple of mx therefore the second term of equation (38) is not negligible. The standard deviation of clear wood can be estimated by using the portion of the overall distribution that is mostly contributed by clear wood. Assuming that the standard deviations o f the two distributions are in the same order, one can conclude, from equation (30), that the maximum frequency of knot wood distribution is much less than the maximum frequency of clear wood distribution. For example for qx = 10 q2, the maximum frequency of clear wood distribution is about ten times as much as the maximum frequency o f knot wood distribution. This fact can be seen in Figure 5.4-a where the clear wood distribution obviously dominates the overall distribution (Figure 5.4-a, d, and e). The samples within a neighborhood of the mode value of the overall distribution is a censored representation of clear wood density distribution and can be used for estimating the standard deviation o f clear wood distribution. A censored sample o f a distribution is the sample of the distribution when a part of the distribution cannot be sampled. The limits of this neighborhood is defined by a threshold level, which should be more than 1 n s1 = —r Z (x - m) n-l itl v - ' (38) 69 q21 qx in order to avoid the maximum frequencies of knot wood distribution. I f qx is assumed to be almost equal to 10 q2, the threshold level should not be less than 10%. The limit of the neighborhood can be set to be 20% of the mode frequency of px(x). The standard deviation of the clear wood density distribution is equal to the estimated standard deviation o f the pixels within this neighborhood. Once the parameters of are estimated the threshold level for knot detection wil l be one of its high percentiles (like 99 percentile). 5.2.3. Experimental data The threshold level detection is studied on a typical sample. The overall statistical distribution of the density of wood is derived from the X-ray image of the sample. The actual density distributions of clear wood and knot wood were extracted subjectively for comparison. The statistical mean and standard deviation of the overall distribution wil l be compared with the statistical mean and standard deviation of clear wood distribution. The mean of the overall distribution wil l be used as the estimation of the mean of clear wood distribution. The 20% maximum frequency neighborhood wil l be used to estimate the standard deviation o f clear wood density distribution. The threshold level for knot edge detection wil l be derived based on the actual and estimated clear wood density distributions. The cost o f misclassification wil l be derived for each case. The distribution o f wood density (clear wood and knot wood) is shown in Figure 5.4-a. This diagram is the density distribution of all the pixels o f the X-ray image of specimen #97. One can easily see the dominating effect of clear wood density distribution by comparing the overall density distribution (Figure 5.4-a) with clear wood density distribution (Figure 5.4-e) and with knot wood density distribution (Figure 5.4-d). The actual knot wood density distribution and clear wood density distribution were extracted from the X-ray image for comparison. The pixels of the X-ray image were divided into knot and clear wood subjectively and with a confidence gap; i.e., the sampled knot pixel values are high enough to be obviously recognized from clear wood pixels and the sampled clear wood pixels were far from the knots (Figure 5.5). The gap affects the distribution of the knot density, as the lower values are more likely to be cut 70 out. Figure 5.4-d and Figure 5.4-e show the two distributions. It is obvious that the average density of knot wood is considerably higher than the average density of clear wood. The mean and standard deviation of the overall distribution are shown in equations (39) and (40): x = l (39) cr = 0.97 (40) The mean and standard deviation of actual knot wood distribution and clear wood distribution are as follows: * = 1 (41) a = 0.05 (42) * k = 1.39 (43) The mean and standard deviation o f the estimated clear wood distribution based on 20% of maximum frequency is as follows: * = 1 (44) ac = 0.04 (45) and the neighborhood of the mode is as follows: [0.91 1.09] (46) Equations (41), (42), (44), and (45) show that the mean and the standard deviation of the clear wood distribution are closely estimated. One can conclude from Figure 5.4-b and -e that the threshold level should be 1.2. In this case the probability o f misclassifying a clear wood pixel to a knot pixel is almost zero. Therefore, the cost of misclassification is as follows: 71 C=C(\\2)qJp2(x)dx (47) R l C , = C ( l | 2 ) ? 2 x 0 . 1 4 (48) The 99 percentile of estimated clear wood density distribution is 1.09. Therefore, the threshold level by using this criterion wil l be 1.1. The cost of misclassification is as follows: C 2 = C(2 | l ) qx fp{x) dx + C( l |2 ) q2 fp2(x) dx (49) R R 2 1 C2 = C(2 | l ) qx 0.032 + C( l |2 ) q2 0.064 (50) If it is assumed that C ( l | 2 ) = 10C(2|1) (51) and <7, = 10<72, (52) the cost of misclassification is as follows: C= 3.264 C{\\2)q2 (53) 72 density distribution of knot + clear wood (specimen #07) simulated density distribution of knot wood (specimen #67) frequency (a) frequency 0.6 0.8 1.4 1.6 simulated density of dear wood (specimen #07) 0.75 0.8 0.85 0.9 0.95 1 1 05 1.1 1.15 1.2 normalized density density distribution of knot wood (specimen #67) Frequency frequency 0.8 0.85 0.B 0.95 1 1.05 1.1 1.15 1.2 normalized density density distribution of dear wood (specimen #67) normalized density frequency (e) 1.2 1.3 normalized density Figure 5.4. Density distribution o f X-ray image pixels o f a specimen (a) overall-distribution; the distribution o f all X-ray image pixel values (b) simulated density distribution of clear wood (Normal distribution with similar mean and variance), (c) simulated density distribution (Normal distribution) o f knot wood (d) density distribution of knot wood (normalized to average density of clear wood) (e) density distribution of clear wood (normalized to the average value). 73 contour of a knot n i i II. sampling area for knots sampling area knot boarder for clear wood 2 4 12 14 Figure 5.5. Sampling pixels o f a knot from an X-ray image. Figure 5.5 shows the samples pixels for the distributions of Figure 5.4. The area around each knot o f the board is separated and the clear wood pixels and knot pixels are separated as shown in Figure 5.5. The threshold level that shows the border line between knot pixels and clear wood pixels is equal to the average o f minimum and maximum wood density in the image of Figure 5.5. 5.3. I n d e x i n g the pixels o f a knot Once the threshold level is found and the knot pixels are detected, the neighboring pixels that are related to a knot are grouped. A l l the pixels that belong to the same knot are adjacent in the X-ray image. Once one pixel that is related to a knot is found all the neighboring pixels can be found and marked. Therefore, indexing the pixels o f knots in the X-ray image is a double search. The first search is through all the pixels o f the image looking for a knot pixel that was not indexed before (looking for a new knot). Once a pixel is detected it is indexed and a second search starts. This search indexes all the pixels that are neighbor to this pixel or are the neighbors o f the neighbors o f this pixel. Therefore, all the knot pixels that are related to the same knot are found and indexed in this search. Once all the pixels o f one knot are indexed the first search resumes to find another knot. 74 A simple routine that can be used for indexing all the pixels o f the same knot is as follows. The first found pixel (the result o f first search) is placed in a stack. Then all the neighboring pixels are checked to see i f they are knot pixels. A l l the neighboring knot pixels are placed in the same stack as well. Then all the pixels in the stack can be retrieved and their neighboring pixels are check to see i f they are knot pixels. A neighboring knot pixel is added to the stack i f it already doesn't exist in the stack. The search terminates when all the pixels in the stack are checked and no new pixel is added to the stack. Figure 5.6. A search for neighboring knot pixels. Once the black pixel is known to be a knot pixel, all the gray pixels are check too. 5.4. K n o t p a t t e r n The knot pattern is the edge o f the image o f a knot. The edge pixels are the outermost pixels of a group of pixels that are indexed to be related to a knot. The inside pixels can be easily detected because all their neighboring pixels are knot pixels. The edge pixels are simply non-inside pixels. Figure 5.7 shows an edge pattern that is obtained by using the method presented above. This pattern is simply the edge of the density profile shown in Figure 5.1. 75 Ellipse Shape Pattern of a real knot —A _l _« 1 1 1 ; . — » , - M — — H r i 2 4 6 8 10 12 14 16 (mag a pixel Figure 5.7. A knot pattern (from its image) Solving this pattern for knot parameters is a basic problem in our knot detection procedure. The knot pattern can be imagined to be the orthographic projection of a cone onto an image plane ([69]). The cone is cut by two parallel-planes. Figure 5.8 shows the cone and its images on two image planes. Deriving the parameters o f the cone from the features of the pattern in x-y plane wil l result in solving the knot detection problem. It wil l be shown that the pattern in x-z plane can easily be solved. The patterns that are shown in Figure 5.8 are equivalent to knot patterns in an ideally simple case. In this case, the clear wood does not have any variations (standard deviation equal to zero) therefore the threshold level can be set very close to the average clear wood density. Also, the knot density is distinctly different than the clear wood density at all points o f the knot. Because the knot pattern results from the projection of all the points o f the cut cone onto the image plane, different sections of it can be distinguished. This pattern consists o f two half ellipses whose major axes are along the same line; therefore, their minor axes are parallel. The two ellipses are connected by two straight-lines as is shown in Figure 5.8. The center part of the pattern is a trapezoid whose two bases are equal to the diameters of the cone at the two ends. 76 z A the knot x Figure 5.8. Two orthographic projections of a knot. Projection line is perpendicular to the image plane (x-y and x-z planes). The special shape of this image can be proven to be correct by circulating the generator line around the axis o f the cone and following its projection on the image plane. It can be proven that the symmetry axis of the pattern is the projection of the axis of the cone. Also, the centroid o f the image lies on symmetry axis o f the pattern. It wi l l be shown that in most cases one of the extremities o f the symmetry axis o f the pattern is the farthest point from the centroid o f the contour. Therefore, the symmetry line can be drawn from the centroid and the farthest point; that is, the point with maximum distance from the centroid, that is on the perimeter of the pattern. The axis o f a knot cone is assumed to be closely radial; therefore, it wouldn't have a large component along the tree's length. It means that the edges o f the board wil l not cut the pattern parallel to the symmetry axis (i.e., its major direction). This axis wi l l be parallel to the large axis of the board. 77 o Figure 5.9. The pattern o f a knot, o is the centroid o f the pattern. The dashed line indicates the symmetry line of the pattern. 5.4.1. Knot model parameters A knot is modeled by a set of parameters that are easy to derive and useful in practice. The two centers of the two ellipses of the knot and the radii o f the cone at the two ends can define the knot cone. The two centers define the axis o f the cone and the two radii define the acute angle of the cone and size o f the knot. Therefore, eight parameters can define the knot. The centers of the two knot ellipses are projected to the centers of the two half ellipses of the knot pattern (o ] and o2 in Figure 5.9). The two diameters of the cone are projected onto the two minor axes of the corresponding half ellipses (dx and d2). Once the two patterns in two perpendicular image planes are obtained the knot parameters can easily be solved. 5.4.2. Measuring and calculating the features of the pattern A simple solution to finding dx and d2 assumes that the knot pattern is a trapezoid. What is needed to find dx and d2 are the two areas of half trapezoid and the height o f the trapezoid from the center to the two ends. I f the area of the smaller end is ax and the area of the bigger end is a2, dx and d2 (see Figure 5.10) are found as follows: 2a, d = Y ' d (54) 78 (55) d Figure 5.10. The approximate image of the knot, o is the centroid o f the pattern. In order to measure dx, d2, Xx, and X2, the centroid of the pattern, o, is found first (see Figure 5.11). The farthest point of the edge of the pattern from o is the tip of the smaller ellipse. p2 is the point that the angle of (px, o, p2) is closest to 180 degrees. p3 is the point with Z(p}, o,pi) = 90° andp 2 is the point that Z(p2, o,p3)= 180°. The pattern is then divided into two halves on the two sides of (pi, pj line. Let's say p is a random point in the pattern, i f ~st = (o , p) J_ (p3 ,pA) is the perpendicular projection of "a* on line (p 3 ,p), it defines which side of the pattern the point p lies. The angle of H* and (o,px) ( Z ( (o~~px),~dL )) indicates the side. If this angle is zero p belongs in one side (p is closer to p) and i f it is 180°, p belongs to the other side (p is closer top2). Figure 5.11 shows the four points and the center o f the pattern. Adding the area of all the pixels of one side gives the area of one side of the pattern. Xx is the distance between o and px, X2 is the distance between o and p2, and d is the distance between p3 and p4. From equations (54) and (55) the two diameters o f the two ellipses are obtained. The projections of the two centers of the cone on an image are estimated by shifting px and p2 into the connecting line by dx 12 and d212. When the pattern ends to the edge o f the board it becomes a trapezoid (Figure 5.8, the pattern in x-z plane). In such a case the detection problem becomes a straightforward 79 problem. In such a case the sections of the pattern that touch the border of the board are measured. The length of each section is dx and d2, respectively. The diameter of the knot cone at the two ends equals d{ and d2 that are readily measured from either x-y or x-z pattern. The centers of the two ellipses of the knot are obtained from the two projections of each center that are obtained from each pattern. Also the centroid of the knot is estimated by using the two centroids of the two patterns. Figure 5.11. The four points of the pattern In the following the block diagram of knot pattern analysis is presented. In this algorithm, it is assumed that all the elements of the pattern are previously detected and saved in an nx2 matrix where n (a variable) is the number o f pixels in the knot pattern. 80 load the knot elements £ find o, the center of the pattern 1 n j n X = " E J C , V = - Z V find the edge elements (X) findpx, the farthest point from o. find />, the complement of /?, with respect to o. find /?3, a perpendicular point to (p, , p2) P 3 = min I find p the complement of p with respect to o. PA= max Z.(p o,p)) Figure 5.12. The flow chart for knot pattern analysis. 81 5.4.3. Matching patterns In order to match the two patterns in either o f the images the longitudinal location o f the centroids and the span o f the patterns (the length of the pattern along the board) are used, as is explained in the following. The two patterns are matching i f the two spans are overlapping. I f more than one pattern overlaps with a pattern, the ones with closest centroids are chosen. The two sides o f the two patterns are also matched by finding the smaller end and the bigger end. The smaller diameters are averaged to result in the smaller diameter o f the knot. The projected centers o f the ellipse on these images result in the center o f the smaller ellipse of the knot. Similarly the parameters o f the larger ellipse of the knot can be obtained. 5.5. G e o m e t r i c a l features o f a knot The geometrical features o f the knot are indicators of its size, taper, and deviation from center. The volume of a knot, dx I d2, axial deviation, and knot area ratio are the geometrical features o f a knot. In order to measure the volume and the axial deviation of a knot, a model o f the knot is generated (by using the measured parameters) in a cubic matrix. B y using the center o f the two ellipses the axis line o f the cone can be generated. The cone is generated as a parametric equation. For every point of the matrix cube its projection on the axis of the knot is obtained. The radius o f the cone at this point is obtained by linearly interpolating the two radii o f the cone. I f the distance o f the cube pixel from the axis is more than the related radius o f the cone the pixel is marked as a knot pixel. Adding the volume of all the knot pixels o f the cube results in the volume of the knot. Knot area ratio is equal the ratio o f knot area to board area at every cross-section of the board. The axial deviation is the standard deviation of all the pixels of the knot when compared with the central line o f the board. The axial deviation of the knot is shown in the following: 82 1 / N Axial deviation = J^\jTi (d(pa,p))2 (56) where N is the number of knot pixels, d(pn,p) is the distance between two points, pn is the wth knot pixel and p is the projection o f pn on the central line of the cube. 5.6. X - r a y i m a g e analys i s ; ex trac t ion o f b o a r d i m a g e a n d defect detect ion A s was explained in previous chapters, detecting knots from the X-ray image is the cornerstone o f geometrical features for characterizing a specimen. In this section the extraction o f a board X-ray image from the measured X-ray image and the extraction o f knots from the board X-ray image wil l be presented. In Section 5.2 the image segmentation problem was generally related to the density distribution of clear wood and the required equations for extracting useful information from an X-ray image were developed. In this section those equations are applied to a typical specimen. The measured X-ray image can be divided into three parts: the surrounding area, the clear-wood area, and the knot area. Each area of the image is detected and separated from the rest by thresholding the density of the image pixel. Separating the board X-ray image from the rest of the image is simply done by a fixed thresholding because the noise level is not comparable with the signal level. The overall statistical distribution o f the density of wood is derived from the X-ray image o f the sample. The actual density distributions of clear wood and knot wood were extracted subjectively for comparison. The statistical mean and standard deviation o f the overall distribution wil l be compared with the statistical mean and standard deviation o f clear wood distribution. The mean of the overall distribution wil l be used as the estimation o f the mean o f clear wood distribution. The 20% maximum frequency neighborhood wil l be used to estimate the standard deviation of clear wood density distribution. The threshold level for knot edge detection wil l be derived based on the actual and estimated clear wood density distributions. The cost o f misclassification wil l be derived for each case. 83 5.6.1. Edges of the board Extracting the image o f the board from the X-ray image was overlooked to this point. A threshold level can extract this segment of the image as well. First the noise level in the sample image is measured. Figure 5.13 shows the comparison of the output levels. The noise level is related to the same run that the image o f the board is extracted. For that purpose the average o f the sensor output before the board reaches the location o f the sensor is used for representing the noise level. This comparison shows that the signal to noise ratio is at least 100. Also the comers o f the board cross-section profile shows that the image does not have large shadow area. The shadow image appears as a slow slope o f the line descending from the clear wood density to the surrounding area. the X-ray system output with a board tha X-ray system output with out a board X-ray tensor number X-ray sensor mirriber Figure 5.13. The output o f the X-ray system with (top) and without (bottom) the sample board. The maximum sensor noise level is about 0.01 of the average signal level. Therefore, a threshold scale of 0.6 was used to separate the board image from the surrounding area. This level is well below the wood density level and well beyond the noise level (see Figure 5.4-a). For segmentation of the X-ray image it is assumed that the board is located in a straight extension along the image. Therefore, the average values of pixels along the image and 84 across the image are found. The search starts from both the beginning and the end of the image. When the average pixel value is more than the threshold level, the location is marked as the edge of the board. T h * o u t p u t o f t h e X - r a y m a c h in a r ! . . . „ 1 5 5 0 0 1 0 0 0 1 5 0 0 2 0 0 0 l e n g i n T h e > 2 5 0 0 3 0 0 0 3 5 0 0 o f t h e board ( m m ) -ray i m a g e o f t h e b o a r d 4 0 0 0 4 5 0 0 I I I t>; i r I r r - I " i • • • - — r n 5 0 0 1 0 0 0 1 5 0 0 2 0 0 0 2 5 0 0 3 0 0 0 l e n g t h o f t h e b o a r d ( m m ) 3 5 0 0 4 0 0 0 4 5 0 0 7igure 5.14. Output o f the X-ray imaging machine (top) and the X-ray image of the board (bottom). The coordinate o f the image o f the board in Figure 5.14 is used for knot detection. The same method is used for processing the edgewise image o f the board. 5.6.2. Knot detect ion One can conclude from Equation (32) and Figure 5.4-b that the threshold level should be 1.2. In this case the probability o f misclassifying a clear wood pixel to a knot pixel is almost zero. I f this level o f thresholding is used the edges o f knots of the board would be as shown in Figure 5.15. 85 knots edges with threshold scale equal to 1.2 board withd (mm) 620 1240 1860 2480 3100 board length (mm) Figure 5.15. The edge o f all knots o f a board detected by fixed Bayesian based thresholding. The threshold level is 1.2 times the average clear wood density. The edges of all knots o f a board are detected by fixed level Bayesian based thresholding. The threshold level is 1.1 times the average clear wood density. The edges o f knots of the board that are detected by using this thresholding are shown in Figure 5.16. knots edges with threshold scale equal to 1.1 board withd (mm) _ILi i J l U • U i U I i S j _ 620 1240 1860 2480 3100 3720 4340 board length (mm) Figure 5.16. The edge of all knots o f a board detected by fixed Bayesian based thresholding. The threshold level is 1.1 times the average clear wood density. Other randomly selected specimens are shown in Figure 5.17 and Figure 5.18. 86 knots edges with threshold scale equal to 1 2 45 620 1240 1860 2480 board length (mm) 3100 3720 4340 Figure 5.17. Knot detection by using Bayesian thresholding. Specimen 130. board width (mm) knots edges with threshold scale equal to 1.2 1240 1860 2480 3100 3720 4340 board length (mm) Figure 5.18. Knot detection by using Bayesian thresholding. Specimen 70. The following figures show the effect of thresholding to the image of a knot. 87 88 real size side view X-ray image of a knot, specimen #100 12 11 10 9h 8 7-6 5 4h 3 2 1 12 11 10 9 81 5 7-6 5 -4 3 2 1|-6 8 length (cm) 10 12 14 real size discretized side view X-ray image of a knot, specimen #100 8 length (cm) 10 12 14 Figure 5.20. The side view X-ray image and discretized X-ray image of a knot. 89 5.6.3. Three dimensional knot shape detection (simulation) In this section, the projection pattern is solved and all its parameters are detected. A projection pattern is a pattern that is generated in an x-ray image o f the knot. A knot in a perfect board and its x-ray image(s) are artificially generated. Simulation was used for comparing the generated parameters with detected parameters and for analyzing the accuracy o f the detection method. The error o f the knot detection algorithm wil l be reflected on strength estimation and, therefore, it is reasonable to expect that any improvement in the knot detection algorithm wil l improve the strength estimation accuracy. A knot can be modeled by using a higher density cone (than clear wood) inside a cube. The cone o f the knot is defined by seven parameters; the center o f the knot (three parameters), the direction vector (three parameters), and the acute angle of the knot (one parameter). The X-ray image of the board is basically the orthographic projection of the object (i.e., the board) on the image plane. In the following it is shown that one can start from the X -ray image and obtain all the parameters o f the knot. The board itself is a 3 -D matrix. Each element o f the matrix that is inside the knot (can be defined by using vector algebra) has the value of one while each element that is outside the knot has the value o f zero. The image (a 2 -D matrix) is obtained by adding the pixel values along a specific direction. Since the image is simulated, its edges represent the most accurate pattern that can be obtained by measurement. In fact the actual measured images are not this accurate because they suffer from general blurring and lack o f uniform material density distribution in the knot and in the clear wood area. The fact of having perfect image in this case allows the threshold level to be chosen close to the clear wood density and obtain a pattern that is almost as accurate as a theoretical pattern. The detection procedure starts by detecting the existence of a knot due to the existence of some points in the image with higher density than the median value. Then the clear wood density is used for finding one point in the knot area. The entire pixels that are in the knot are searched for finding the edge points and the centroid of the image. The image is 90 presented by its edge so that the inner density o f the knot does not affect the procedure (this makes the procedure more general). The farthest point from the image centroid and the centroid itself are located on the symmetry line of the pattern. There is a complementary point to the farthest point that lies on the symmetry line and the edge of the pattern. The distance of these two points is the length of the pattern. The width of the pattern is the longest line segment that is drawn between two points of the edge and is perpendicular to the symmetry line. In this simulation the length of this line is assumed to be close to the length of the line segment that passes the centroid and is perpendicular to the symmetry line. These two line segments are called the symmetry line and the perpendicular line in this report. Having the centroid and the farthest point from it, one can find the three other points by using vector algebra. The areas of the pattern that are at either side of the perpendicular line are given different names. The area of inner side o f the pattern from the perpendicular line towards the farthest point is called the vertex-side area and the other area is called the non-vertex-side area. Each of them is found by scanning the entire pixels of the pattern and finding what side of the line the pixel is located. Then the area o f the pixel is added to the related area. The vertex-side area is used for finding the smaller diameter of the knot cone. The simulation parameters are as follows. (x o ly 0,z 0) = vertex location (v l 5v 2,v 3) = the direction vector o f the knot <p - cone acute angle V V i + v\ y/ = arctan( — ) is the angle of the knot. L = Max{(3S-z0)[tan(i//+<p) - t an (y -# ) ] , (38-z0) \m(y/+(p) + z 0 tan(^-#>)} is the length o f the pattern. In this equation 38 is the thickness of the board. / = 2x((38-z0) / cos(y/)) • tan(^j) is the larger diameter of the knot 91 / ' = 2x(z 0 / cos(^)) • tan(^) is the smaller diameter of the knot. The detected parameters are as follows. L is the length of the pattern. / is the width o f the pattern, which is equal to the larger diameter of the pattern, / ' i s the smaller diameter of the pattern. 1'0 517 x Figure 5.21. The diagram o f a knot cone and its parameters. 1 K y/ is the solution o f L • cos(ys) - 38 sin(^) = ^ • ( / + / ) • Since y/ is in [0 , and since K c o s ( ^ and sin(^) are always decreasing in [ 0 , ^ ] , the above equation has a unique solution. The location of the knot vertex is kept fixed at (x0 = 0 , y0 = 0) and - 2 0 < 2 Q < - 3 5 in increments o f 2. Two relatively small cone acute angle values are chosen to be q> = 3,5,7 degrees. Three values for the knot angle are y/ = 25, 28, 31 degrees, v is equal to v 2 so that by choosing y/ the three direction vectors can be determined. For the given knot angles (y/) and knot vertex location (0,0,z0), the smaller 92 diameter o f the knot would be less than 2.5 cm and the larger diameter would be less than 5 cm. Therefore, the size of the knot is relatively small and the accuracy o f the model can be assessed by this simulation. A three dimensional matrix simulated a small board segment (38 x 89 x 100 mm). Every element of the matrix represents a point in the artificial board. Each dimension o f the matrix is one dimension of the board (x, y, or z). The matrix elements that are inside the knot are equal to one and the rest o f the elements are equal to zero. These element values are selected for simplicity, such that the edge of the knot can be easily detected in a simulated X-ray image as follows. The simulated X-ray image is a two dimensional matrix whose element values are obtained by adding the elements of the simulated board in one direction. The x-y simulated image (equivalent to the top X-ray image) is obtained by adding up the elements o f the simulated board along the z axis. Similarly for obtaining an element value of x-z simulated image, all the elements of the simulated board along the y axis are added. A l l the nonzero elements belong in the knot image. Two X-ray images o f a simulated knot are shown in Figure 5.22. The simulation parameters are y/ = 35, <p = 5, and z 0 = - 30. The resolution of the image is 0.5 mm. 93 20 40 80 80 100 120 140 160 180 200 Simulated side X-ray image of a knot * //J 1 ii VI ii I If 1 V 1 1 i 1 I M 20 40 60 80 100 120 140 160 180 200 Figure 5.22. Contour diagrams showing the simulated top X-ray image (top) and side X-ray image (bottom) of a knot. The simulation parameters are y/ = 35, <p = 5, and z f l = - 30. Image resolution is 0.5 mm. Figure 5.23 shows the error o f detection (for the image resolution) for larger and smaller diameters of the simulated knot. The coefficient o f determination (A*2) between the detected smaller diameter of the knot and the generated smaller diameter is 0.84, and the r2 between the detected larger diameter o f the knot and the generated larger diameter is 0.98. The error of the larger diameter in some cases is relatively big. 94 3 £ 2 E I • I-•s I" I"2 -3 •5 0 error tor the larger diameter of the knot , 1 1 j . * . „ * * + + ** • + fc-^H-t...-*... * * + • t +* +*** :* • . H L*+ : ' " * f + ++ i 16 20 25 S larger diameter ofthe knot (mm) error for the smaller diameter of the knot 1 + I I + £ $ +* i * + +_ + + ** i + •V i * 30 40 50 60 smaller diameter ofthe knot (mm) large diameter, r2=0.83746 80 i — i — i — 1 1 + i * + + + +- + + + > . . • V * + • i i i 2 0 2 5 3 0 3 5 4 0 4 5 5 0 5 5 small diameter, r^o 97999 40 30 + + + . . , . * j * * * * T i i 35 40 Figure 5 . 2 3 . Estimation accuracy for the knot diameters. The horizontal axis represents the generated diameter and the vertical axis shows the detected diameter. The set of knots that are simulated contains the knot size/locations that can stretch outside the limit o f the board and cause excessive error in the detection. The following Figure shows a simulated knot that stretches outside the limit o f the board and produces error in detection. 9 5 5.7. K n o t detect ion f o r r e a l d a t a The knot detection algorithm was applied to a few samples for comparison of the detection of knot size and location with image data. The results are shown in the following figure: 96 knots in two images of flatwise and edgewise (top; flat, bottum; edge) specimen #100 620 1240 ' - T V U k 1860 2480 3100 3720 4340 620 1240 1860 2480 3100 3720 4340 Figure 5.25. The result o f applying the knot detection algorithm to the X-ray image of specimen #100. The location of each knot is detected and marked by an asterisk. The two knot-cone diameters are also printed at the location of the knot. For example, the two numbers (10,6) show that the diameter o f the knot at one of the faces is equal to 10 and it is equal to 6 at the other face of the board. The board can be regenerated from the knot parameters and an artificial X-ray image can be produced. The result is shown in Figure 5.24. 97 flatwise X-ray image simulation, generated by using simulated board 75 62 45 37 25 12 1 ^1 620 1240 1860 2480 3100 3720 4340 edgewise X-ray image simulation, generated by using simulated board 37 25 12 620 1240 1860 2480 3100 3720 4340 Figure 5.26. The simulated X-ray images. These images are generated from the geometrical features o f specimen #100. Also location o f knots can be detected in C C D camera images o f a board. About 80% o f the knot patterns in each simulated image (Figure 5.26) overlap with those in the discretized form o f the X-ray image (Figure 5.25). In order to calculate the overlap of the two images, the X-ray image is discretized the same way as is described in knot detection algorithm and the output image is a binary with pixel values o f zero or one. The pixels that represent a high-density defect are marked as one and the rest o f the pixels are marked as zero. The simulated X-ray images are similarly processed. Therefore, the M N overlap o f the two images is equal to § ^x^jpcj^ij). M and N are the number of rows and columns o f each matrix. Dividing the over lap o f the two images by the sum o f defect M N area in the simulated image (g J} x (ij)) gives the ratio of the overlap. A s a byproduct of knot detection algorithm, the location and size of knots can be indicated in other measured profiles such as C C D camera image (Figure 5.25). 98 Knot geometrical features of X-ray are used to identify knots in the TOP CCD image 500 1000 1500 2000 2500 3000 3500 4000 Knot geometrical features of X-ray are used to identify knots in the BOTTOM C C D image 500 1000 1500 2000 2500 3000 3500 4000 Figure 5.27. The location and size o f knots in C C D camera images are detected from geometrical features. 99 5.8. X - r a y a n d S O G i n t e r r e l a t i o n Similar to density variations in a board the grain angle variation can be divided into two factors, the variations due to the existence o f knots and other variations in the regular grain angle of wood. Figure 5.28 shows the variations in the grain angle profile that is caused by the existence of knots. The type of variation in the grain profile (as can be seen in Figure 5.28) depends on the relative path of the (grain angle) measurement head with respect to the knot. Therefore, the shape o f the grain angle variations in Figure 5.28 varies. In Figure 5.29 all the profiles of a sample board as well as the board image are shown. It is interesting to note that the variations due to the existence o f knots (Figure 5.28) can be modeled using the geometry o f knots. A n approximate model proposed in [70] is based on fluid flow around a solid obstacle in its path. The average grain angle in this model replaces the direction o f the fluid stream. The direction of the fluid at every point around the obstacle defines the direction o f the grain angle around the knot ([70]). Assuming that the general trend of clear-wood grain angle is the direction of a line with a fixed angle with the longitudinal direction of the board, one can generate the local variations of grain angle around a knot by generating the fluid direction around the knot ([71]). 100 10 0 -10 10 0 -10 -20 60 50 40 30 20 10 ! 1 r T i i ! I i i 500 1000 1J 00 2000 2500 3000 3500 4000 4500 ! I I i i ! ! 1 i i i i 1 i i I i 500 1000 15( 0 2000 2500 3000 3! 00 4000 4500 ' • : - : ^ - - r r r~'i r T l [ • ; ^ : , ' T : t 1 i f f i i i i i i i 620 1240 1860 2480 3100 3720 4340 Figure 5.28. Most of the knots show a grain variation in the grain angle profile. The top two profiles are grain angle profiles o f the top half and bottom half o f the specimen #100. The bottom image is the X-ray image o f the same specimen. 101 102 5.9. Suf f ic iency o f s a m p l i n g rate; the p r o b l e m o f reso lut ion The sampling rate of the natural properties of wood that is regarded as the resolution of the measurement system can have direct impact on the accuracy of the estimation because it can change the information content o f the input signal to the estimation process. The resolution o f the image compared to the size o f a knot is the controlling factor in geometrical feature sets. The longitudinal resolution o f the current X-ray image is close to 3mm (-1/10 inch), which is very much smaller than the size o f strength determining knots. In the following, the effect of the sampling rate of the signals on the estimation accuracy wil l be studied by using Fourier analysis of M O E and SOG. B y comparing the spatial distance o f measurement points (along the board) with the sampling instances of an analog signal, one can analyze the sufficiency of the sampling rate for measured signal. A s wi l l be shown in the following, the sampling rate of the slope o f the grain signal and the modulus o f elasticity signal are much more than the signal bandwidth, therefore no aliasing takes place and no part of the information is lost. Figure 5.30 shows the Fourier transform o f an M O E signal. In order to capture the effect of the sampling rate, the average of the signal is removed. Figure 5.30, shows that the bandwidth of the signal is much less than the sampling rate. The zero mean MOE profile of specimen #3 Discrete Fourier transform of the MOE profile (specimen *3) 100 200 300 400 SOO 800 TOO SOO sample number along tne board (n) Figure 5.30. The measured signal and its Fourier-transform o f a typical M O E profile with zero mean (specimen #3). 103 m = The measured M O E profile of the board m'= m -m = The zero mean profile (shown in Figure 5.30) M(k) = \jtm'(ri) e * * " 1 ^ (57) A s is shown in Figure 5.30, the maximum of the Fourier transform magnitude (|M(k)|) at low frequency is more than 275 (max(|M(k)|)>275), and the magnitude o f the Fourier transform is less than 0.8 (|M(k)|<0.8) for k>300. Therefore, the energy of the signal at the tail o f the Fourier transform (near k=400) is much less than the energy at low frequency levels (k<50) therefore the measured signal wil l not be affected by aliasing. Similarly, the grain angle profile and its discrete Fourier transform are shown in Figure 5.31. The average of the grain angle profile is not removed because the average grain angle is usually small. A measired grain angle profile of speamen #3 dscrele Fouler transform ofthe gram a r ^ pro«e (specimen <G) E800| Figure 5.31. The measured signal and its Fourier-transform o f a typical S O G profile (specimen #3). Similar to equation Figure 5.31, the discrete Fourier transform o f the grain angle profile can be written as follows. s(n)= The sampled grain angle profile of the specimen. 104 n - The sample number. N= Total number of samples in a profile. The number of samples is 1300. S(k) = The discrete Fourier transform of s(ri), defined as follows: S(k) = jjSt s(n) e^->xn-.yN k=\,...,N (58) A s is shown in Figure 5.31, the maximum of the Fourier transform magnitude (|5(A:)|) at low frequency is more than 300 (max(\S(k)\) > 300), and the magnitude o f the Fourier transform is less than 20 (\S(k)\ < 20 for k > 600). The energy of the signal at the tail o f the Fourier transform (near k = 650) is much less than the energy at low frequency levels (k < 200) therefore the measured signal wi l l not be affected by aliasing. 5.10. P r e p a r a t i o n o f spec imens a n d m e a s u r e m e n t process During the course of this thesis two complete sets of data were produced and analyzed. The first set o f boards was used for developing the required skills and producing the programs. The second set of boards was used for actual analysis and comparison of models and methods. The boards were selected from typical production from mills in B C . A few o f the boards had suffered damage during shipment and were removed from the database. Once the set of boards for the experiments was selected each board was assigned one number (starting from 1). The code o f each board was written on one cross-section o f the board and was used in all the measurements. B y using the board code the direction and side were assigned to a board by having the code in front and upward (not upside down). This direction was needed to be consistent through X-ray measurement for geometrical feature extraction. The boards were dry (moisture content less than 12%). The moisture content of the boards was measured before all the measurements to make sure that the boards remain dry. The boards were kept under roof and away from rain and snow. For transportation, 105 the boards were first wrapped in plastic covers to make sure that they are kept from the elements. The CLT™ machine has two outputs ( B N C connections) that port the (amplified) output voltage of the two M O E sensors. A n Atlantis data acquisition board (in a laptop) with five channels of analog inputs (only two channels were used) was connected to the CLT™ output port. The driver program o f the data acquisition board had the option o f transforming the existing data files from binary to text. This option was later (in the laboratory) used to transform the data file format from binary to text format. When the size o f the data files was not too big, (i.e., it could be saved on internal hard-drive), the text format was more desirable because the file could be checked by a variety o f different applications and proved useful through the experiments. The Newnes Advantage2 X-ray machine has a set of measurements and wood cutting tools that are controlled by a group o f computers. This machine also had a conveyor system that consistently moved the board through the machine. The output of the X-ray sensors was saved in the machine's memory. The data files were then downloaded from the computer and transferred to the laboratory at U B C . The Newnes Advantage2 output files were named in sequential order. A s each board was going through the machine an assistant registered the code of the board. The table was later used to change the file name so that it showed the board code. This wil l be explained in detail later. Removing the surrounding area In the following, three terms wil l be used frequently that should be defined first. The measured signal is related to what is saved in a data file. There can be more than one measured signal in a data file. For example a M O E data file contains two measured signals. The measured signal o f a data file ( M O E , S O G , or X-ray) either contains a measured one-dimensional profile (or simply a profile) of the board or it contains an image (X-ray image) of the board. We use the term profile for one-dimensional profiles as well as the images. A l l the measured signals contain a surrounding area to each profile. For one-dimensional profiles the surrounding area is the output of the measurement system before and after the 106 actual profile. For X-ray images the surrounding area is the part o f the image that is not covered by the board. The surrounding area contains only measurement noise and is easily separated from the measured profile by thresholding as follows. Since the density of clear wood in X-ray image is detected by statistical methods, the threshold level was set to be almost half of the density o f clear wood. A scan algorithm searches for the beginning and the end o f the board in two dimensions of the image. Starting from the start or the end of each axis of the image, the scan algorithm searched for the point where the measured density level was more than the threshold level and marked them as the beginning or the end o f the X-ray image. The surrounding area in M O E signals was only the noise level (at a level of 0.001 volt) that was much below the M O E profile level (at least 2 volts). In order to extract the measured M O E profiles from the data files a fixed threshold level (0.1 v) was used. Starting from the start or the end o f a profile, a scan algorithm searched for the point where the measured signal level was more than the threshold level and marked them as the beginning and the end of the M O E profile. As was discussed in Chapter 2 the dynamic bending machine (CLT™) produces two M O E profiles. I f the two profiles are significantly different it shows that the board is bent. In order to reduce the effect of the bend in a board the average of the two profiles was saved as a third output signal o f the dynamic bending machine. Therefore, the number of M O E profiles is three. In the capacitance-based grain-angle-measuring system the output is kept constant when the measurement head is not in contact with wood. For finding the beginning and the end of the measured grain angle profile, the scan algorithms search for any sudden change in the measured signal. Once the two points (start and end) are marked a small portion o f the profile that corresponds to a few centimeters of the length of the board were removed in order to exclude the transient response o f the measurement machines. The output was considered as the measured profile o f the board and was used for feature extraction. There were six measured profiles per specimen. The structure o f the capacitance based grain angle measurement was discussed in Chapter 2. The size o f the head was about 5 cm. In order to cover the wider face of a board two scans were carried out. Also, one scan for each narrower face of a board was carried out. Therefore, the output o f the capacitance based grain-angle-measuring machine is six signals. 107 Similar to M O E signals the surrounding area in Microwave signals was noise. In order to extract the measured grain angle profiles from Microwave signals the average o f the signal was used as a threshold level. Starting from the start or the end o f a profile, a scan algorithm searched for the point where the measured signal level was more than the threshold level (half the average of the measured signal). The rest of the algorithm is the same as that o f the capacitance based system. The output signals of the Microwave system were; grain angle, grain alignment, longitudinal signal amplitude, transverse signal amplitude, longitudinal signal phase, and transverse signal phase. Table 2: The number of output signals of measurement machines Board density (X-ray) Grain angle (capacitance based) Grain angle (Microwave) Modulus of Elasticity (CLT™) The number of output profiles 2 6 6 3 The type o f output profile Two dimensional density image One dimensional profile One dimensional profile One dimensional profile Practical considerations and the database Boards were first coded in the laboratory and few broken specimens were removed. The broken specimens have been broken in transportation or have a relatively large fracture along and partly across the board. The board code was written at the cross section o f each specimen. There was little or no control on keeping the order of specimens in a set when the specimens were to be taken from a bundle. 108 A l l the data files that were produced by the measurement machines were in sequential order with regard to the measured specimen. For example the name of the stored data files from the dynamic bending machine were CLT_0001.data, for the first specimen that was measured and CLT_0002.data for the second specimen. In every measurement process the code of every passing board was registered by an assistant as follows: 1 0014 2 0528 Therefore, it was known that the file CLT_0001.dat was related to specimen 0014 and that the file CLT_0002.dat was related to specimen 0528. It is obvious that any mistake in registering the specimen code would directly cause a mix up in the database. The experiments had to be finished within a limited time slot due to the availability o f the Newnes machines. Therefore, there was little chance for repeating a part o f an experiment. In order to avoid specimen mix up there was a repeated measurement of the same specimen as is explained in the following. After a specified number (say 20) of the specimens were measured the last specimen was measured again. This repeated measurement made two relatively identical data files in the measured data files. Table 3: Specimen file names Sequence of the passing board The board code Final file name 1 0014 CLT_0001.dat 2 0528 CLT_0002.dat 19 0751 CLT_0019.dat 109 20 1049 CLT_0020.dat 21 1049 CLT_0020.dat 40 0081 CLT_0040.dat 41 0081 CLT_0041.dat Any error in registering the board codes would appear as a shift in this table. There was rarely any mistake in registering the board, however, any error would cause a number (usually 20) to either be removed from the database or be sent back for another round of measurement. In the end, the table of specimen codes would match all the (remaining) data files. Then based on the board code table, the data file names were changed so that the file name represented the board code. For example, 1 0014 CLT_0001.dat CLT__0001.dat -» Landmark_CLT_0014.dat If possible, the format of the data files was also changed to an easier to handle format for future research. Landmark_CLT_0014.dat -> Landmark_CLT_0014.txt The strength o f all the boards was recorded in one file, which contained two columns, one for board code and one for the registered strength. The database is a structure of directories that represent the related measurements as follows. 110 I L G S X-ray Edge | NewnesX_e_0014.dat I Flat I NewnesX_f_0014.dat — C L T | Landmark_CLT_0014.txt Microwave ' CAE_MW_0014 . tx t ' Strength 1 ILGS_strength.txt Figure 5.32. The database structure. The X-ray files are binary files that contain 128x1400 matrices. Each element is represented by 16 bits. The CLT™ file is a text file that is a matrix of 4000x3. The first column is the location indicator, the second column is the first dynamic bending machine profile (cell 1), and the third column is the second dynamic bending machine profile. The microwave file is a text file that contains a matrix o f seven columns. The first column is the location indicator, and other columns (as was said before) were the related grain angle, grain alignment, longitudinal signal amplitude, transverse signal amplitude, longitudinal signal phase, and transverse signal phase. Ill The strength file is a text file that contains a matrix of size two columns; the first column is the board code, and the second column is the tensile strength (MPa). A l l the board codes o f the specimens in the database are included in a specimen list text file (ILGS_boards.txt) that is placed in the I L G S directory. A data processing program reads the board codes from the specimen-list file, generates the name of all the related data files, and loads the files from the pre-specified directories of the database. 5.11. S p e c i m e n C h a r a c t e r i z a t i o n In a pilot study in the Wood Science Department of U B C , the statistical features were applied to the raw data. In the work reported here the following features were computed. For one-dimensional profiles the average, minimum, maximum, standard deviation, variance, the standard deviation o f the absolute value, Fisher skewness, Fisher kurtosis of the profile are the feature set of the profile. These features are compatible with those used in previous statistical analysis in the Wood Science Department of U B C . These features characterize the board. Fisher skewness (sk) and Fisher kurtosis (k) are defined as follows. ^ w = ^ 5 £ i ^ ( 5 9 ) k M = N £ L l ^ 1 ( 6 0 ) In the above equations, x is the variable, N is the number of samples in the profile, x is the average value o f the variable, and cris the standard deviation o f the variable x. For X-ray feature definition the geometrical features were produced. From the X-ray images of the board the knots were detected and a model board was generated. The coordinate of a knot is shown in Figure 5.33. For each knot the centers (xy,z) o f the two 112 end ellipses and the diameter o f the cone at the two centers were saved as the parameters o f the knots. For each knot the characterizing features are as follows. 1. Fy Volume 2. F2: Axial deviation (average distance of the knot pixels from the center o f the board), 3. Fy Axia l standard deviation (standard deviation of the knot pixels form the center o f the board), 4. F4: The ratio o f small diameter to the large diameter, 5. F5: Closest distance to a neighboring knot, 6. F6: Knot area ratio ( K A R : the ratio of a knot cross-section to the board cross-section) Figure 5.33. The coordinate o f a knot in a board These features were processed to produce a final set o f features as follows. 1. The average o f knot area ratio (F 6 ) over all the knots of the board 2. The maximum of knot area ratio (F 6 ) over all the knots o f the board 3. The standard deviation o f knot area ratio (F^ over all the knots of the board 4. The average of axial deviation (F 2 ) over all the knots in the board 113 5. The standard deviation o f axial deviation (F2) over all the knots in the board 6. The average o f axial standard deviation (F 3 ) over all the knots 7. The standard deviation of axial standard deviation (F 3 ) over all the knots 8. The board distribution mode. This feature is equivalent to the density o f clear wood of the board. 9. The standard deviation of the density distribution of the board 10. The variance of the density distribution of the board 11. The average of the distance between neighboring knots (F 5 ) . For each knot the neighboring knot is the closest knot. 12. The standard deviation of the neighboring knot distances (F 5 ) over all the knots in the board. 13. The number of knots in a board 14. The ratio o f knot volume to the board volume 15. The board volume is identified by the X-ray image 16. The board density, which is the summation of the density over the board. This feature is the summation o f all the pixels of the X-ray image of the board. I f two images exist, the average o f this feature for two images is used. Geometrical feature extraction program The flow chart of the knot detection program is shown in the following. This program is a part of geometrical feature extraction program that follows. 114 Fetch a board code from the board list * Load flat wise and edgewise images Run knot extraction routine * Save geometrical features of the specimen Figure 5.34. The general flow chart of geometrical feature extraction program 115 Extract board image from the X-ray image Find mode $ Discretize the image by using thresholding Run knot extraction routing A vector of all knot patterns in flatwise image Extract board image from the X-ray image * Find mode Discretize the image by using thresholding if A vector o f all knot patterns in edgewise image Run knot extraction routine 2 Scan for all matching patterns in the two vectors 4r A l l knots in the board Regenerate each knot from their parameters and extract features i r A list o f all features of all knots Extract the geometrical features of the board from knots features Figure 5.35. The flow chart of the knot extraction routine Knot pattern extraction routine is as follows: 116 i Search for a non-zero element in the discretized image. return Starting from this point find all the non-zero elements that belong to the same pattern Extract the features of this pattern and save them Figure 5.36. The flow chart o f knot extraction routine 2 Statistical feature extraction program The feature extraction program for one-dimensional profiles is the same for grain angle and for dynamic bending machine profiles. Only the threshold levels are adjusted. 117 * Fetch a board code from the board list Generate the file name and the directory name and load the data file Scan for the beginning and the end o f the profile A vector that represents the measured profile I Extract the statistical features Save the features Figure 5.37. The flow chart o f the statistical feature extraction program The output of each feature extraction program is a matrix whose first column is the board code and other columns are the extracted features. Once all the feature files are generated one output file is produced that is the combination of all the extracted features. The first column is the board code, then all the features (dynamic bending machine, Microwave, 118 and Geometrical features), and the last column is the measured strength of the specimen. This file is used in all the learning algorithms of Chapter 6. 119 Chapter 6. Strength Estimation; Empirical Learning for Small Sample Size The final stage of the strength estimation process is the empirical learning problem. Once the feature set is defined and the strength of each specimen is measured, a suitable machine should be chosen in order to produce the best predictive model. Due to the high cost of generating a learning set for wood strength estimation, the number of specimens in the training set is very limited. Therefore, a number of different learning machines need to be examined to discover what model works best. The problem of learning (or function approximation) can be stated as finding the optimal parameters of the approximating function by minimizing a risk functional. The type of input variables (continuous or discrete) and the task (regression or classification) of the function should be defined. A s were shown in the previous chapters the features of a board, which are the input variables o f the approximating function, are numeric variables. The approximation problem is to find a regression function to estimate the strength of a new board. In Section 6.1 a review of learning systems is presented in order to create the common ground for various learning machines for strength estimation, as described later in this chapter. In this section the principles and components of a learning method are discussed and the relationship between two accuracy measures (also called performance criteria), such as the mean square error and the coefficient of determination is presented. The coefficient of determination between measured and estimated strength is used in the standard o f wood grading. Mean square error is usually recognized as the measure of accuracy in learning methods. The problem of high dimensionality for feature space is discussed in Section 6.2. The high dimensionality of feature space is known as "the curse" or complexity of dimensionality ([54]). 120 In Section 6.3 a review of the theoretical upper bound ([66]) on the error o f a learning system is presented. The upper bound o f the error is a function of sample size and the learning capacity o f the learning system. Therefore, in Section 6.3 the relationship between the sample-size, the learning-capacity o f the learning machine, and the estimation error is presented. The learning-capacity o f a learning machine is the factor that defines how suitable the machine is for the given task. The understanding that is developed in this section wil l later be used in the A S E C model (Section 6.6). A practical method for measuring the learning capacity of a classifier exists in the literature ([73]). The equation of the upper bound error is valid for both classifiers and regression estimators ([81]), assuming that the learning-capacity o f the machine is known. A practical method o f learning-capacity measurement for regression estimators is presented in Section 6.4. The learning capacity of regression estimators is defined based on the definition o f the learning capacity o f classifiers and thus, the presented capacity measurement method for regression estimators is based on the existing method for the learning-capacity measurement of classifiers. Different learning machines that have been used for strength estimation are discussed in Section 6.5, and the A S E C algorithm is presented in Section 6.6. The existing learning machines o f Section 6.5 are linear regression, K-nearest neighbor ([74]), the radial basis network, the multi-layer neural network ([75]), and S P O R E ([21]). Parametric Learning Machines The linear regression method is commonly used in practice and is a parametric learning machine. Parametric learning machines develop a model by assuming that the data distribution has a known parametric form. A parametric learning machine can result in constant error due to irresolvable biasing, because o f the restriction imposed by the fixed structure o f the approximating function. Nonparametric Learning Machines Neural networks, K-nearest neighbor, S P O R E , and A S E C , are non-parametric learning machines. These machines reduce the bias error by changing the complexity (see sub-Section 6.1) o f the learning machine. A simple multi-layer neural network consists o f 121 three layers (input, output, and middle layers) with one layer (middle layer) for non-linear mapping of the input to the output ([75]). The K-nearest neighbor algorithm and radial basis networks estimation methods are also called memory based learning machines and do not generate a model from the given data but predict the output for a new specimen by finding its closeness to the given samples. S P O R E is a learning method that was developed for learning conditions similar to that dealt with in this research. A s explained in Chapter 4, a minimum set of features is not known. Therefore, there may potentially be a group o f features that are poorly correlated with the tensile strength o f a board. S P O R E was developed specifically for this condition and for when a large number o f features and noisy input and output exists with no a priori knowledge about the input-output (i.e., feature-strength) relationship. The A S E C algorithm is developed in this thesis in order to include the a priori knowledge about the input-output (feature-strength) relationship in the learning process. In A S E C , the features and the input-output transformation are selected such that the estimation error is minimized. The a priori knowledge is included in the form of input-output transformations. A S E C uses subset selection ([82]) and space expansion ([83]) and learning system error estimation (Section 6.1 and [81]) in order to transform the given set of features to a new set of features with minimum size and best accuracy of estimation. 6.1. S t r e n g t h es t imat ion as a n e m p i r i c a l l e a r n i n g p r o b l e m In this section the problem o f wood strength estimation is looked at as a learning problem, or equivalently, a function approximation problem. The problem of learning (or function approximation) can be modeled as finding the optimal parameters o f the approximating function by minimizing a risk functional. The type o f input variables and the task of the function should be defined. A s were shown in the previous chapters the features o f a board, which are the input variables o f the approximating function, are numeric variables. The approximation problem is to find a predictive model to estimate the strength of a new board. 122 The process of data generation and the learning problem are shown in Figure 6.1. The sample space is the collection o f boards randomly picked from a typical mill-run in the region. It is assumed to be a typical sample o f the board production. The actual strength affecting factors are denoted by F. These factors may or may not be measurable. The A feature set presented to the learning system is denoted by F. These features were presented and discussed in Chapter 5. The measured strength of the board (the output o f the destructive test) is denoted by S. The learning system estimates a dependence between A A F and S. The estimate o f the strength is the output o f the learning system (S). A board from the sample space F Actual Failure process Measurement and feature computation A F •> s Learning system A s Figure 6.1 The learning problem It is obvious that the input variables to the learning system are not the same as the actual factors of the failure process. This has been explained in detail in previous chapters. The A features in F are the features, among measurable features, that are reasonably related to A board strength. The learning system uses S and F to generate an estimate of the strength A (S). The whole strength estimation problem is then formulated as follows: S=S + e (61) In order to separate the analytical discussion of the function approximation problem from the wood strength estimation problem, the regular notation in the literature is used in the A A following. Therefore F is denoted as x and S is denoted as^y. The estimated strength, S, is 123 denoted asj\x,w) (ory) which is the output o f the approximating function and w denotes the set of parameters of the learning system. The learning problem is stated as estimating the parameters o f the approximating function in order to find the best mapping from x to y from a set of given (x^y). A set o f approximating functions, flx,w), are chosen to estimate the dependence. It is assumed that the dependence between x and y can be modeled by the chosen set off[x,w). The choice of the approximating function depends on the researcher and depends on the nature of the dependence between x and y. Different forms of J\x,w) were tested in order to find the best model. A learning problem can be formulated as a minimization problem. The goal is to minimize a risk functional, R(w), which determines the difference between the actual dependence between x andy, and the estimated dependence that is modeled by f[x,w). R(w) = fL(y,Ax,w))p(xy)dxdy (62) A common empirical risk functional, which is an estimate o f R(w), is used for regression purposes as shown in equation (64). L(y,fix,M>)) = (A^)-yY (63) The risk functional can be estimated as follows. Where R (w) denotes the empirical risk functional that wi l l be minimized in the estimation problem. The parameter vector w is the vector o f all the adjustable parameters o f / O -Therefore the dependence between x andy is estimated as shown in equation (65). y=fix,w) + e (65) Equation (65) is equivalent to equation (61). The difference between j> and j\x,w) can be contributed by two factors: the inherent difference between the true dependence (between 124 y and x) and j\x,w), and the measurement noise. In a small neighborhood in the feature space the risk functional can be written as in equation (66) ([72]): EMx,w) - yf] = E[ (few) - Efew)] f]+(y- E\fe,wW (66) The function E(.) represents an average over all training samples. The first term of the right hand side o f equation (66) is the variance of the approximating function. This term is related to the complexity o f fe,w) and wil l be explained later. The second term is the square of bias error, which is the actual error between the approximating function and the true dependence. The complexity offe,w) defines how fast / changes with respect to the variation of its input, x. The complexity offe,w) can be measured with respect to a given set of data (as wil l be done later). However, the complexity of a function may be defined by using the number o f its parameters. For example for a linear model, the complexity of the function is equal to the number of coefficients. Generally speaking, depending on whether the complexity offe,w) is low or high, the dominant estimation error can be bias error or variance error. Informally speaking, i f the complexity o f / is limited, the resulting approximating function cannot follow all the variations o f the measured output and the bias error wi l l be dominant in equation (66). In this case, the variance error wil l be limited. A n obvious form o f bias error happens when the set of approximating functions cannot estimate the true dependence, for example, trying to approximate a second-degree dependence (y = x2 ) using a linear regression. I f the complexity offe,w) is too high, the variance error wi l l dominate equation (66). The variance error is a result o f having a limited number o f specimens for training the learning machine. In this case, i f the complexity offe,w) is too high, the approximating function follows all the variation ofy that is caused by the noise. Assuming that fe,w) can model y, minimizing R^Jw) as shown in equation (64) generates little bias error in training stage, but large error results due to the variance o f the resulting function, fe,w*). Therefore, a second factor to be rmrrimized is the complexity of the estimating function fe,w). Therefore, finding the right parameters of 125 the approximating function in the learning problem means finding a compromise between the bias error and variance error. Model development In order to solve the model development problem four stages are followed: 1. a set o f approximating functions is chosen; 2. the a priori knowledge or assumptions are used to place constraints on the approximating function (in order to reduce the variance error); 3. an inductive principle is chosen in order to reach a solution and compare the results (Section 6.1.2); 4. a learning method (e.g. an optimization algorithm) is developed in order to construct the approximating function based on the inductive principle. Each o f the parts o f this process is explained in the following. The structure of the approximating function j%x,w) depends on the choice of the designer. The definition of the approximating function's complexity is related to a priori knowledge about the type of dependence. Basically, J{x,w) belongs to a wide class of functions. Most of the time the approximating function, fix,w), is generated by using a class of basis functions. Based on the type of the basis functions, the approximation method can be called universal or local. Also based on the freedom o f the class of basis functions the approximation method is called parametric, semi-parametric, or non-parametric. In parametric form the shape o f the approximating function is defined a priori (e.g. linear regression) while a non-parametric structure uses the data to find the right shape o f the approximating function. In a semi-parametric form (also called dictionary method) there is a limited number of basis functions that a designer can choose from. For example, linear regression is a parametric structure o f an approximating function, and a multi-layer neural network is a semi-parametric form o f structure. A n example of the universal approximator is x* that can construct the polynomials (J\x,w) = 1) w. x*). The S P O R E algorithm is a non-parametric structure that wi l l be introduced in subsequent sections. 126 The a priori knowledge can be used to limit the set of approximating functions, or limit the optimization goals. For example, the early stopping rule in multi-layer neural networks limits the complexity o f these learning machines. There are different inductive principles that lead to defining the complexity term and minimizing the risk functional of equation (62). These principles are Bayesian inference, the penalization principle, the neural network early stopping rule, the minimum description length, and the structural risk minimization ([66]). A s wil l be shown, we wil l use the penalization principle and the neural network early stopping rule. Structural risk minimization also overlaps wit work in this project, but does not add to it, and therefore, is not studied in detail. In the penalization inductive principle, R^fiv) is defined as follows: RJM = \ t(Ax,w)-y)2 + A d>mw)] (67) where AO\f[x,w)] is the term that measures the complexity of the approximating function. The penalization functional, <I>[/(x,w)], is defined based on a priori knowledge o f the data. The regularization parameter, A, is derived from the data in order to make the learning problem data driven. This means that A should be chosen so that minimizing R^Jw) is equivalent to minimizing R(w). In the neural network early stopping rule the complexity penalization term is controlled by limiting the repeated training steps. The complexity of the approximating function can be defined in different ways. The penalty functional, Of.] , can be parametric or non-parametric. I f a function of the parameters o f the approximating function defines the complexity o f the approximating function, the complexity functional is called parametric. I f the penalty functional restricts the shape o f the approximating function it is called a non-parametric penalty functional. For example, £ w] is a parametric penalty functional while Of/] = J " ^ ^ ds (for a given filter G(s)) is a non-parametric penalty-functional. The complexity of the learning methods used in this research is explained with the learning methods. 127 For a given Of/fow)], finding the right X is called complexity control or model selection. A s mentioned above, the right X is chosen so that minimizing the empirical risk functional is equivalent to minimizing the actual risk functional. In order to do so, an estimate of the actual risk functional should be obtained. Then for a series of X, the approximating function is found by minimizing the empirical risk functional. w* = argmin ,w) - yj + X ®[j%x,w))} (68) The desired value o f X is the one that produces an approximating function, fl{x,w*), which minimizes the estimated risk functional. Here w* is the vector ofthe parameters offix,w*) . The input-output relationship is estimated by the following equation: y,=/(*X) (69) In order to evaluate the risk functional two approaches can be taken: analytical and numerical. In an analytical approach, the actual risk functional is related to the empirical risk functional by the following equation: RM-ri^RJp) (70) where r(.) is called the penalization function, h is a measure of the degree of freedom of the data set and is similar to X. n is the number o f specimens. This approach is commonly developed for linear estimators. Cross-Validation In the cross-validation technique the data set is divided into two sets, one for training the learning machine and the other for finding the accuracy of the developed learning machine. Therefore, the cross-validation technique uses the data for approximating the actual risk functional, R(w). In this method the actual risk functional is estimated by using a portion o f the data set not used for training. In this case, the data set is divided into two parts: one part for estimating the parameters of the estimation function, and the second part (validation set) for estimating the risk functional. In this case, different values of X 128 can be used for developing the approximating function and the estimation o f risk functional is used for choosing the optimal regularization parameter (Z*). A n effective method of estimating the risk functional is k-fold cross validation. For example, in ten-fold cross validation, the specimens of the data set are randomly categorized into ten groups. Producing the approximation function and estimating the risk functional is repeated as follows. Each group in turn is considered as the validation set. The rest of the data set is used to develop the approximating function. For the group /' the risk functional is then estimated as follows: rr^VWLXj^-yjV (71) The estimation of the actual risk functional is as follows: R(w) « | z r (72) The above learning method is good for understanding the learning process. However, in practice other methods can be followed that accomplish the same goal with less effort. For example, in training the neural networks the number o f neurons in the middle layer and the training goal are two factors that can define the complexity of the network. Therefore, a two dimensional search can be completed by using the cross-validation technique in order to find the optimal values. Risk functional The learning system usually minimizes the risk functional but it is customary (from a statistical point o f view) to compare different random variables by using r2, the coefficient of determination, defined by the following equation: (73) * y The coefficient of determination, r2, is related to the risk functional Z (y, -ft*))2 in the sense that the maximum of r2 happens when the minimum of Z (y, -fix))2 is happening. 129 The following relationship between r1 and Z (y,-AX)Y 1S the basis of the comparison (see [84], pp. 531). / J = 1 - [ ^ E ( X - ^ , ) ) 2 ] / ^ (74) Therefore, the risk functional is a Euclidean norm o f the estimation error and the approximating functions are U (Lebesgue integrable) functions. The ten-fold cross validation technique can be used to estimate the risk functional. In this technique, the data set is divided into two parts, one for developing each model and one part for testing all the models as to how well they can predict the model. In order to do so the data set is randomly sorted and 10% of the data is separated for the final prediction test. The training data set is also sorted for 10 fold cross validation. Here, the data set is randomly grouped into ten sets. In each repeat of the training, one of the sets (without replacement) is picked for testing. The remaining nine data sets are grouped into the training data set used for model parameter estimation. The tenth set, which is the test set, is used to find out what the risk functional value is. The process is repeated ten times; therefore, ten values of the risk functional are obtained. The average of these estimated risk functional determines how well the model works. 6.2. H i g h d imens iona l i ty o f the feature set A s mentioned above and seen in Chapter 5, the representative features are logically chosen such that they seem to be best correlated to strength. The feature set therefore would be high dimensional. The high dimensionality o f the feature set makes it difficult to develop an estimator that can desirably estimate the tensile strength because the samples are not densely distributed in the feature space. This issue, which is called "the curse of dimensionality", is a restricting factor. This problem is usually dealt with by placing restricting constrains on the approximating function^*). This means thatX*) wil l be o f lower complexity, or of smoother function. 130 There are four issues that affect the function approximation process when dealing with high dimensional data as follows ([72], [21]). First, the number of samples needed for generating the same density o f the data (in the feature space) increases exponentially with dimension. Second, as the dimensionality of the feature space increases, in order to cover a fraction of the feature space volume (e.g. a hyper cube) a greater portion o f each axis is needed. Third, in the unit hyper-cube every point in the feature space is closer to an edge than to a neighboring point. Fourth, for every point in the feature space all other points look close to each other. Therefore every point can look like an outlier and the concept o f interpolating between sampling points (that is usually associated with function approximation) can be deceptive. Distance of one specimen from others in the feature 101 1 1 1 1 T " 8 idean nee in ire space 4 2 rji i i i i i i 200 400 600 800 1000 1200 Number of other specimens Figure 6.2 Distance o f all other boards from a sample board. The Figure shows that the average o f the distance is less than 3 while the distance from the closest point is more than 1. Figure 6.2 demonstrates the high dimensionality o f the feature space by showing the variation o f distance from a randomly selected point. This Figure shows that the Euclidean distance of a point (in the feature space) from the neighboring points is close to 1, while the average distance o f the point form all other points is between 2 and three. Since Figure 6.2 is produced from the feature set o f this thesis, it is a demonstration of the 131 high dimensionality issue in the strength estimation problem. The number of features is 88 and the number o f specimens is more than 1000. 6.3. T h e re la t ionsh ip between es t imat ion a c c u r a c y , l e a r n i n g capac i ty , a n d s a m p l e size In order to get an estimate of the structure o f the learning system, one can use upper bound error estimation based on the learning capacity of a learning machine. The learning capacity of a learning machine (also called the VC-dimension) is a measure o f how well a machine can learn the training data. I f the learning capacity o f the machine is very low, the machine cannot learn the variations in the training data and the trained system wil l produce a bias error. I f the learning capacity o f the machine is too high, it wil l generate a variance error, or the model wi l l simply over-fit its parameters. I f the learning capacity o f a learning machine is known, statistical learning theory provides the framework for analyzing bias error and variance error. The result o f this theory tends to be conservative; that is, an over estimation of error is produced. Therefore, the cross validation technique is used for producing the estimation o f error. The upper bound of error guides the design of the learning machine. The learning capacity o f a learning machine defines how much o f the variance in the learning data can be learned by the machine. The learning capacity o f a learning machine is roughly equal to the number of specimens (number of samples in the training set) that the learning machine can learn without generating any error. For a limited learning set size, this factor controls the estimation error. Small learning capacity causes bias error and large learning capacity causes variance error. A n estimate of the learning capacity (or VC-dimension) of a learning machine can be measured for a classifier ([73]) and a regression estimator (see Section 6.4). This measurement is based on repeating the machine training-step numerous times and measuring the variance o f error at each learning step. Therefore, this approach becomes impractical for cases where training a model is already difficult. 132 In order to find the optimum learning capacity of a learning machine, the designer can adjust a controllable learning factor such that, theoretically, by finding the bounds, or practically by measuring estimation error, the performance o f the system is improved. In the case o f polynomials, one of the controlling factors o f the learning capacity of the machine is the number o f terms of the function. For multi-layer neural networks the number o f neurons o f the hidden layer, for K-nearest neighbor the size of neighborhood (K), and similarly for radial basis functions the size of the neighborhood (the radius of the radial function) controls the learning capacity of the machine. In the case of K-nearest neighbor and the radial basis function, the functions locally minimize the risk functional. But i f K or radial function radius is kept constant, the overall estimation error can be used for finding the optimum K or radial function radius. If the estimation error is minimized at a level of the controllable learning factor, that level identifies the best structure of the learning system. For example, by increasing K in a K -nearest neighbor and using 10 fold cross validation for estimation error evaluation, a minimum point for the error curve can be found which defines the best learning capacity of the learning machine for the given data set. A n upper bound of the estimation error can be established based on the approximation error when the machine is being trained. The estimation error is a function o f the number o f training samples, learning estimation errors (error in learning the training set), and the learning capacity of the machine. The learning system learns the dependence between x andy for X given samples by minimizing the following empirical risk functional: where the empirical data or the training data (zj =(x^)) is the set of X samples that was mentioned above: (75) (76) z={zp...,zj (77) 133 The empirical risk functional is an estimation (over a finite number of samples) of following risk functional: R(a) = fQ(z,a)dF(z) (78) Where F(z) is the cumulative distribution function. If the estimated parameter set that results from minimizing the empirical risk functional (a) is put to the test, it will produce i?(ax) estimation error. The best parameter set for the same learning machine could be obtained by minimizing equation (78). If the best parameter set is called a0, the following is the inherent error due to the limited number of samples. A(ax)=R(aJ-R(a0) (79) The estimated error at the training level, which is estimated by R^a), should be adjusted to count for the limited number of samples. This difference was derived in [81] and is as follows. R(aJ<R(a)+Be(X) f ' 47? (a) /1+ ^ x (80) s(X) This bound holds with the probability of I-7. The two new parameters in inequality (80), B and e(X), are defined as follows. 0<Q(z,a)<B (81) IX 1 /Klny+1)-Jn(77) £(X) = 4 (82) The learning capacity of the machine, h, is assumed to be known. Equations (80) to (82) define the learning error as a function of the VC-dimension (h) and the number of specimens (X) in the database. 134 6.4. E x t e n d i n g the l e a r n i n g - c a p a c i t y ( V C - d i m e n s i o n ) to regress ion es t imators In this section the Vapnik-Chervonenkis dimension related to classifiers is generalized to cover regression problems. Since the VC-dimension is originally defined for classifiers, the equivalence o f classifiers and regression estimators wi l l be demonstrated first. Then, based on the VC-dimension of classifiers, the VC-dimension of regression estimators wil l be defined. Statistical learning theory can be used for estimating an upper bound for estimation variance. The VC-dimension o f classifiers was presented in [73]. The goal is to measure the dimensionality o f a learning machine and derive an upper bound for the risk functional estimation error. In [73] a practical method was presented for measuring the corresponding dimension of a classifier. The analytical V C dimension for all learning models cannot yet be derived, thus there is a need to estimate it for any particular machine. The measurement is based on finding the classification error for a size o f training samples. B y varying the size of the training sample one can find the sample size that initiates the learning process. In this section we wil l show that the same technique can be applied to regression estimators as well. 6.4.1. The equivalence of classifiers and regression estimators In this section it wil l be shown that under a general condition (on the output o f the regression estimator) a classifier and a regression estimator can easily be transformed into each other. 6.4.1.1. Learning problem statement A classification problem is defined on a set o f (xt,a)) where x.eR" is the input space and 0 ( e { O , l } is the output space, which is the class o f the given input. A classifier consists of a set of basis functions (/(.)) with a set of parameters asA. 135 The classification problem is the problem o f learning the dependence between x. and co. for a given class o f estimating functions j\x,a). In the empirical risk minimization principle the learning problem is equivalent to minimizing a risk functional on the given parameter space (a&A) o f the estimating functions (/Tx,a)). The risk functional is a distance measure and is usually the mean squared error or a similar measure. L%x,a) = \Z(fc,a)-a# (83) " i=l The parameter set that produces the best classification fit is denoted by a* and is defined as follows: a* = min R^a) = min \ I(/T>, a) - a,)* (84) A regression problem is defined on a set o f (x*y,) where X.GR" and y.sR. A regression estimator estimates the dependence of x. on y.. Similar to a classification problem, a regression problem minimizes a risk functional on the given set of estimating functions (f[x,a)). The risk functional is the same. LXx,a)=\±(j(x,a)-yy (85) The parameter set that produces the best regression fit is denoted by a* and is defined as follows: If one assumes that the same set of estimating functions is used for both regression and classification problems, then the question arises whether either problem could be stated by using the other. In other words, i f one has a regression estimator, is there any way to use it for a classification problem? Or i f one has a classifier, is there any way to use it for a regression problem? Solving this problem leads to the VC-dimension o f a regression estimator i f the VC-dimension o f the corresponding classifier is known. 136 6.4.1.2. Constructing a classifier by using a regression estimator The regression estimator learns the a posteriori probability (p(y\x)) at every point as wil l be shown in the following. For reconstructing the regression estimator, we need to construct a two-class classifier by using the regression estimator. In the case of a two-class classifier, the estimating function f[x,a) can be used to define the a posteriori probability of p(Cl \ x) or p(C2 \ x) as follows. Establishing a classifier by using the a posteriori probability is the subject o f sufficient statistics [79]: Assuming D is the domain of x where the regression problem is defined, we can define the normalized regression estimator as follows: (87) C2 = {xi\y>T} /=!,...,« (88) J(x,a) = fyp(xy) dy = jypixy) dy + pp(xy) dy (89) This leads to the following necessary and sufficient conditions: (90) and (91) Then the a posteriori probability of a class can be derived as follows: (92) and (93) 137 Therefore, assuming the two conditions of (90) and (91) are valid, the two classes are defined as follows: C={x\Ax,a)<T) (94) and C2 = {x\f{x,a)>T} (95) Note that in the above there is no assumption about the type of the estimation problem (classification or regression). Therefore, the learning problem is reduced to training the regression estimator to learn the dependence (either regression or classification). The classification problem then results from setting a decision level for distinguishing the two classes. The two conditions o f (90) and (91) can be questions wheny(x,#) = T. Therefore, in a classification problem, the decision level can be adjusted so that minimum classification error results. The nominal decision level for a classification problem is naturally T = (A+B)/2 where A and B are the outputs related to either class. nam x ' r 6.4.1.3. C o n s t r u c t i n g a regress ion est imator by u s i n g a class i f ier In the previous section it was shown that a classifier could be constructed by using the regression estimator. A regression function can be constructed by defining the output of a classifier so that it satisfies the following condition: The threshold level T in equation (96) is an arbitrary variable that belongs to the range of the regression estimator (min(y) < T< max(y)). It wil l be shown that the regression estimator can be constructed (as closely as desired) by using the set o f classifiers defined by equation (96). The regression estimator is developed on a training set as follows: V x , x , T <y<T<y)=>(a><a>) (96) SR-{(x^),i-K..,n} (97) 138 The estimator is developed by using equations (85) and (86). A group o f classification sets can be defined as follows. Assume the specimens are ordered such that y. <yM. B y using a decision level the output o f a classifier can be defined as follows: V x , x , T <y<T<y)=>(a><a>) (98) Then a set o f input/output can be defined that represents a class, as follows: Sa = {(x,a>), j= \,...,n, G)x,...,(Oi =1, G)i+l, ... , a = 0} (99) Then a group of n classifiers can be trained to learn the dependence in each Sa by using equations (83) and (84). Each classifier (Sa) is also equivalent to a classifier o f the previous section with the decision level of y. < T <y /+1. Assuming that the range o f the regression problem is limited and is divided into equally separated threshold levels, an upper bound for equation (85) can be reached as follows: A = min(y), B = max(y) (100) A ( = | T r T J = ( B ^ y ( n + l ) = A (101) The output o f the regression estimator can be approximated by the outputs of all the classifiers o f Sa (i = 1,... ,ri). yri = r.yc+A/2 (T* = max(T=l) , z'=l,... ,ri) (102) Where yct is the output o f the classifier ^ C i . The minimum mean squared error in this case wil l be as follows: (a*) = min L(x,a) = \ | ( v , - j ^ 2 (103) The error in the regression estimator can be defined as follows: max(l9 -fix,a)\)<N2 (104) 139 The output of the regression model and the output o f the regression estimator are related as follows: yH=A*,a) + et (105) Where s is the error (noise) in the estimation of the output of the regression estimator. This noise is uniformly distributed and is independent ofy. and j\x,d) ([80]): M $ N i A / 2 (106) ' 10 |£J>A/2 Properties o f the quantization error follow: E(e) = 0 (107) and o 2 = var( £) = J e2 ^de (108) -A/2 ^ a 12 A 12 (m+ If where m is the number of classifiers in the model. The variance of noise approaches zero as the number o f classifiers approaches infinity. In the following we wil l show that R^p(ct) is an upper bound for R^fa): \t{yt-Ax,a)-s)2 \ p,-fix, a)?-\t(yt -fix,d» s + \t (s)2 140 In the two extreme cases where one o f R " (a) or Rr (a) is zero, the other one wil l be equal to cr 2. Therefore the distance o f the two empirical cost functionals is as follows: ^l|£(y ( -A '«))- f J + d|(^2l «(=1 " «=i " ;=i Loosely accepting the following equation: a 2 =-2X*) 2 (111) we wil l have the following bound: s u p { ^ ( « H ? r ( a ) | } < a 2 (112) a Similarly, the risk functionals are related as follows: B RT(cc) = f(y-fix,a) - sfp(y) dy (113) A B B B = f(y-fedjfpiy) dy -lf(y -J\x,a)) .ep(y)dy + fe>p(y) dy A A A Which results in the distance of the two risk functionals, as follows: sup{\Rcr(a)-RXd)\}<a2 (114) 141 Where d is the variance of e. This shows that R " (a) is an upper bound for Rr (a) and empx 7 r r etnpx ' Rrip) is an upper bound for Rr(a). I f the number of classifiers approaches infinity the bound Rcr(a) approaches R(a). 6.4.2. The VC-dimension of a regression estimator In the above the construction of each learning system by using the other has been explained. In order to measure the VC-dimension of a given regression estimator we wil l construct a classifier from it and reconstruct the regression estimator from the classifiers. In the process, the regression estimator is represented by a finite number of classifiers whose VC-dimension can be measured ([73]). Then by equation (114) the upper bound for the empirical risk functional o f the regression estimator wi l l be derived. In order to find the VC-dimension of the regression estimator we should derive an upper bound for R(x,a). It is obvious that i f the maximum o f VC-dimension o f the regression estimator in the range [AJ3] is known we wil l have the following: suP{|/n«HC(«)i} * o 15> Therefore, with cr tolerance, an upper bound for the risk functional can be obtained by using m classifiers. The maximum estimation error for the regression estimator can be derived as follows: snp{\R-(a)-R;ja)\} + swp{\Rel(a)-R;ja)\} (116) h + sup{\RcXa)-R~(a)\} <tf-f*') + 2 c 2 a A. sup{|/r(a)-*;(a)|} + sup{\R;ja)-R^a)\} (117) + sup{|/r(«) -R(a)\} < sup{|/r(a) - / T ( a ) + R^(a) -/r (a) + R(a) - iT(a) |} Therefore, the upper bound for maximum risk functional error can be written as follows: s u p { | i r ( a ) - ^ ( « ) | } < ( | ) ( - ^ ) + 2 . o 2 (118) A s the number o f classifiers increases, & approaches zero and A m a x approaches to the V C -dimension of the regression estimator ([81]). 6.4.3. Coro l lary A n upper bound not related to the assumed distribution o f s can be obtained by placing AV4 instead of d: sup{^(a)-/r (a)|) < (R-^ -) + f (119) 6.4.4. Measur ing the V C - d i m e n s i o n at a point In order to measure the V C dimension of the regression estimator for the given set of data, the regression data set is transformed to a classification problem as follows: (x,,co) (120) where, the classification output (CD) is defined as follows: a>=l, if y > T ' n / \ T ( 1 2 1 ) In order to measure the V C dimension of the regression estimator, the variance in the estimation for a data set with variable set size should be measured. Therefore, a number of different test set sizes (X) are selected. For each set size, 20 different data sets (from the original data set given in equation (120)) are produced: Z2]={(x\, a$,(K> «P. •••>«. «0> - 0> - >(^ > O ) (122> The set size (X) is between 1 and 100, and 20 equally spaced numbers are selected. The maximum variation in estimated empirical risk functional must be measured. The empirical risk functional of a set of size X is defined as follows: 143 v x ( a )=^£ | tv /V*) l < 1 2 3 > The maximum variation in the measured empirical risk functional is defined as follows: = sup [ v f t Z > ) - v * ( Z » ] (124) The expected value o f £(Z 2 X ) is a monotonically decreasing function of the V C dimension o f the learning machine: £ {sup [ v } ( Z » - v^<Z>)]} * 0 ( | ) (125) where h is the V C dimension of the learning machine. The expected value is approximated by an average over N (=20) test sets: m = ^ t w % (126) The V C dimension of the learning machine (J{x,a), as A) is the integer parameter that X makes the best fit between <&(jp and £(X..): /7* = a r g m i n | : [ ^ i ) - 0 ( ^ ) ] 2 (127) The largest deviation in the empirical risk is estimated by using the opposite class of half o f the members o f TP* as follows: V^ = \t\wrfc,djR+\ t [a>rfixpa)Y (128) where the opposite class (tn) is defined as follows: m = 0 if 0 = 1 m = 1 / / co< = 0 144 The deviation function (0(^)) has the following simplified form that was suggested in [73]: // 1^0.5 * / | > 0 . 5 (130) The constant d is empirically derived ([73]) and is equal to 0.39. Example In this example the V C dimension of a regression estimator wi l l be measured. The original dependence between input and output of the samples is a norm function and the estimator function is a linear regression estimator. The regression problem wil l be transformed to a classification problem by using a threshold level and the V C dimension of the learning machine wil l be measured by following the steps in [73]. The V C dimension o f a polynomial is known to be equal to the number of its coefficients. It wi l l be shown that the measured V C dimension is very close to the known V C dimension o f the linear regression estimator. The function to be studied is the norm function of ten independent variables as follows: Where, the independent variables (x) are independent random variables in [0,10]. The number of variables (AO is changed and the V C dimension is measured. A linear regression estimator is used to learn the input-output relation: The V C dimension o f this linear regression estimator is equal to N+l. A sample set of size ten thousand is produced, by generating one thousand random points for each variable (A7) and the related output from equation (131): (131) = c+ 2jc x2 0 7 7 ' < (132) 145 (xt,y) i = l , . . . , 1 0 0 0 (133) Where the input variable is a vector o f size 10. The threshold level is equal to the average of the output: T - £ t , , (134) For solving equation (127), it was assumed that h is within the range o f [1,100] and a "look-up table" was generated for ®(~fp The solution to equation (127) was found by searching for the best h that minimized the error. Table 4 shows that the measured V C dimension is very close to the number of coefficients o f the linear regression estimator. Changing the threshold level (7) does not affect the measured V C dimension. Table 4: The measured V C dimension of a linear regression estimator. Number of variables Number of coefficients Measured V C dimension 5 6 6 10 11 10 15 16 16 20 21 21 25 26 25 30 31 31 35 36 37 40 41 42 60 61 61 146 T h e V C - d i m e n s i o n measurement a lgori thm Repeat n times Load the empirical data set {z} /=!,... ,X r Find the range of the output (v) •< T Load the threshold level (T) r r/,- = <> if y,<T 1 y ' - l i f y ^ T r Randomly assign each empirical pair (xJ5y'p to one of the two sets £,{, %\ r Negate the output of all the empirical elements in ^ y» = 1 if y-= 0 1 y'; = o if y;=i r Find e(X) feauation 821 Solve <|>C£) for h* (equation 127) Figure 6.3 The block diagram for estimating the VC-dimension 147 This routine works especially well when 2h* < X ^10/7*. I f an initial value for h* is known, the range of X can be defined readily. The routine works well when an assumption for h* is made and the range o f X is obtained. Then this range of X is used to find a new h* and this loop continues until the sequence of h* approaches to a constant value. Significance of VC-dimension calculation In order to get the theoretical distribution of the error o f the risk functional, the V C -dimension of a regression estimator is needed. Obtaining an estimate o f the risk functional (R(cc)) requires having a sufficient number of specimens. This condition is not always practically possible. The VC-dimension of a regression estimator (either measured or estimated) provides a means for obtaining the error distribution o f the (regression) estimator. This error distribution is used in every step of A S E C for finding the size o f the selected subset. For grading purposes, the practical significance of the VC-dimension is that one can estimate the grading improvement for any increase in the number of database specimens. Assuming that the VC-dimension o f the regression estimator (h) is known, or is measured, at a threshold level (or grading level) and assuming that the number of specimens increases from Xt to X2, one can calculate the expected improvement in grading For the same learning problem and the same learning machine, the empirical risk does not change much by changing the number o f the specimens, thus: as follows: + « * £ ) (135) Rice) 2 (136) (137) 148 Therefore the improvement in the risk functional o f the regression problem is as follows. * > ) - * > ) M y ) - e ( y ) (138) B y using equations 74 and 138, one can obtain the improvement in the coefficient o f determination (f). 6.5. S t r e n g t h es t imat ion b y u s i n g exist ing methods Each o f the learning machines used for strength estimation is explained in the following. 6.5.1. Linear regression A s a parametric model, a linear regression is the first model to be used for checking the dependence between features, x, and the output, y. This model generally fits a polynomial to the given data set. The approximating function is shown in equation (139): flx,w) = TwiXi (139) where N is equal to the number o f features and the complexity o f the approximating function. Therefore, one can test the effect o f the complexity o f the approximating function on the estimation, at the lower end of the complexity, by using this model. The higher degree terms in equation (139) can be introduced as new features; therefore, an estimation o f the parameters ofj\x,w) can simply be formulated as follows: fix,w) =2>x = wft +w2x2+ ...+ wxm = yt (140) or, Xw=Y (141) w=XlY (142) The capital letters denote the vectors of the same variables that lower case letters represent. X1 is the pseudo-inverse of X, which is (XX)lX. The potential problem with 149 the polynomial fit is that it generates a higher dimensional feature space than already exists. For example, for the strength estimation of wood, about 1000 specimens are collected and the number of features is about 100. A third degree polynomial creates 200 new features (the original features to the power o f two and three). Multiplying the number of features in the polynomial learning method rapidly adds to the problem of high dimensionality of the feature space. Also, since the form of the function is defined at the beginning, the bias error has a lower bound ([66]). 6.5.2. K-Nearest Neighbor and radial basis functions These two networks are local networks. A s mentioned before, the local networks do well when the samples are not dense. These methods generate a prediction at a given point of the input space without generating a hyper surface in the input space. These two methods are adaptive networks, called kernel-based networks, in the sense that fix) is a local function ([66], [74]). The radial basis network can be phrased as follows: j ^ i V / t f l l x - x J ) (143) 2 Where ht(.) is usually a bell-shaped function (ht (x) = ex). \\x-x. || is the Euclidean distance of the input x from the location of the radial basis function xt. The complexity of the approximating function (see page 125) is inversely proportional to the radius of h (.). Since the radius is the same for all hf(.) a line search can find the optimum radius. Therefore, for a given radius, w{ can be found by an optimization technique (similar to a training algorithm for a neural network) and the risk functional value is computed. A n optimization technique can find the radius that minimizes the risk functional. Similarly a K-nearest neighbor uses the average of yt for K neighbors of the point. The larger K is the smoother the function wil l be, therefore, the complexity o f the function is inversely proportional to K . The choice of the distance is important in this case. In order to construct the approximating function, the risk functional for the neighborhood the kernel function covers should be minimized. A n alternative would be to use similar kernel function over the entire space o f the input variables. Therefore, the summation of 150 all the local risk functionals is the empirical risk functional. The optimal parameter set is the one that minimizes the empirical risk functional. In a K-nearest neighbor machine, the size of neighbors controls the learning capacity of the machine. Therefore, by increasing the number of neighbors and finding the neighborhood size where the maximum estimation accuracy occurs, the best structure of the machine can be found. It was seen that the neighborhood of size 5 results in the most accurate estimation from 10 fold cross validation ([72]). In a radial basis function machine the diameter of the radial function controls the learning capacity of the learning machine. By keeping the radial function diameter constant, one can maximize the overall estimation accuracy and find the right structure of the learning machine. By using 10 fold cross validation for accuracy approximation and by doing a search on radial function diameter, the best diameter was obtained. 6.5.3. Multi-layer neural networks Multi-layer neural networks are semi-parametric adaptive networks. These networks usually consist of three fully connected layers. The first layer is the input (features). The second layer consists of the neurons and the third layer is the output layer, which is a linear neuron that sums the scaled output of all the neurons of the second layer. These networks are robust with respect to the training set and perform very well when dense samples are available, even for high dimensional inputs ([66],[75]). This network is a universal approximator; that is, given enough neurons the network can approximate the continuous function that appears in applications ([76],[77]). Multi-layer neural networks with different neuron activation functions are trained by using back propagation. The network can be modeled as follows. where M is the number of neurons, N is the number of features, b is the bias, and af are the scale factors of each feature. hf (.) is the activation function of the related neuron, g (.) y = th(g(x)) (144) g(x) = b + lZaixi (145) 151 is a linear function that generates the input to the neuron. A network consists of the necessary number of neurons at its middle layer and a linear neuron (hj (x) = x) as the output of the network. The typical forms of h. (.) are linear, sigmoid, or radial basis. The design of the network consists of finding the right number of neurons and activation functions. The common neurons are sigmoid functions, the logistic function and the arc-tangent function, as defined by equations (146) and (147) in the following. The choice of either one of these functions does not affect the result much ([72]): Kx)=J^r, (146) h(x)=^tanl(x)+^ (147) Networks of more than three layers are also being used. The extra layer(s) adds to the complexity of the network. By limiting the number of neurons and limiting the norm of the parameter vector, one can control the complexity of the network. The first usually is choosing the structure of the network, and the second is choosing the early stopping rule. Figure 6.4 A multi-layer neural network. 152 In this case, only a three-layer network is used. The hidden layer consists of symmetric sigmoidal neurons and the output layer is a linear neuron. The best accuracy was obtained by 5 hidden neurons. Adjusting the accuracy goal at the training level did not improve the results. 6.5.4. S P O R E This method o f function approximation was developed for a set of problems with the same conditions as the problem at hand (Ph.D. thesis of Grudic, [21]). This method is a nonparametric adaptive network that works well even when the number of features is large. There are different versions of the S P O R E algorithm but what is applied in this research is SPORE-1 , as is explained in the following. The network consists of a cascade of blocks of two variable polynomials with adjustable degrees. The output of each block is fed to the next block until the end o f the cascade. The output function is a linear combination o f all the blocks (Figure 6.5). x. a £ « L ( - ) Figure 6.5 The cascade form of SPORE-1 . Each polynomial (g(.,.)) is defined as follows: (148) O's(0,l,...) 153 The network is developed in block sections, a few blocks at a time, and progresses until all the variables are included. The number o f blocks in a block section is more than a constant number 'depthl ' . The order of the input variables is random. The parameters of each block in the block section is estimated for minimizing the following cost function. £J<g(;-)-yJ (149) where the set T® is a bootstrap sample from the training set. In Figure 6.5, the output scale of each block, ap is proportional to the inverse of the mean square error of the block. When the change in the cascade error in the last 'depthl+1' blocks is less than a constant factor s the construction of the new blocks stops. Then a search in the block set takes place to find a subset of the blocks that minimizes the risk functional (equation (149)) more than the block set does. I f such a subset exists, the blocks that are in the block set but not in the subset, are removed and variables are freed for later approximation improvement. The same process continues by using a block set of size 'depth2' from the remainder of the variables. This second block set is constructed to learn the error of the first block set. It can be seen that K defines the complexity of the network and the learning algorithm is a comprehensive search among different combinations o f the variables. In this model the learning capacity of the machine has to be controlled by adjusting its parameters. The model finds the best structure by trial and error. Therefore, no notion o f learning capacity is used for this machine. 6.6. A l t e r n a t i n g S p a c e E x p a n s i o n a n d C o n t r a c t i o n ( A S E C ) A s was implied through this thesis and in many exploratory estimation problems, not only the input-output dependence (fixjv)) is not known but also the representative feature set (x) could be arbitrarily defined. In A S E C method both the parameters of the estimating function (f[x,w)) and the set of input features (x) are optimized in order to improve the estimation accuracy. Two iterations o f A S E C are shown in Figure 6.6. 154 f Iteration I : X 1 / A l r . ( f Iteration I I : \ / X 2 A2 r Figure 6.6 Two iterations o f A S E C In a field like the strength o f material, the a priori knowledge usually exists in the form of relationships between strength affecting factors (represented by g in Figure 6.6), and between them and the strength o f material (represented by / i n Figure 6 .6) . /and g do not have to be the same in different iterations. For producing the new feature set (x* + 1), the existing feature set fx*) is transformed by the nonlinear transformation (g) to a large set of features, x 7 is a small subset o f the generated feature set that produces the smallest estimation error. If the set of transformations (g) is similar to a universal approximator, its convergence could be guaranteed for a large number of specimens. A good example of a universal approximator is a generalized polynomial, as follows. Z C ; x. + E C , , x. x. + ZC] x*+ ... (150) If the individual functions in g, are exponential transformations of the given feature set fx*), then the terms of equation (150) could be used to check i f all the necessary terms are included in g. 155 Although the polynomial model is a general approximator, it is not the final model when a small number of measured specimens are available. For example, for estimating y = tan(x) with a given number of samples and a given accuracy r2, a polynomial with a relatively large number of terms should be generated. The accuracy of the estimation depends inversely on the number o f coefficients because o f the limited number o f specimens. Therefore, a range o f nonlinear dependencies that can reasonably exist should be included in the expansion guide. The problem of wood features to tensile strength was modeled by using the following. A n approximation model is selected. In our case the model is as follows: ^ = I c ( i ( (151) The feature transformation (g(x)) was done by two sets of nonlinear transformations: a set o f such nonlinear transformations o f a single feature; and a set o f cross-multiplication of features. The nonlinear transformations o f a single feature (see Appendix) is shown in the following: { ^ I x U s i g n f x W M , * 2 } (152) The cross multiplication o f features investigate the inter-relation of the features an is shown in the following: {**} (153) The contraction method is a selection method that identifies a subset o f the given feature set so that the best estimation (minimum error) can be obtained. The Gram-Schmidt method o f space spanning is used for this purpose, as follows. 1. The closest vector to the output is selected. 2. A l l other features (including the output) are replaced by their orthogonal projection on the first feature. Steps 1 and 2 are repeated to generate the subset o f given size. 156 In order to include the variation in the feature set, a group of training feature sets (e.g. 10) called T (, is generated in order to demonstrate the direction of feature vectors and their variations. For each training feature set (T) the minimum subset is obtained and the table of consequent features is generated. Table 5: The parallel feature sets. iteration 1 iteration 2 iteration 3 f„ T 2 T 3 f31 f33 The selected subset (based on Table 5) is then as follows: The features are then eliminated from the set one by one such that a subset with highest accuracy is obtained. This step eliminates the features whose contribution to the accuracy of the model can be compensated for by the other features. Based on the given contracted feature set, the model of equation (151) was developed and the accuracy of the estimated and is compared to that of the previous model (accuracy o f model 0 is equal to 0). If improvement results another iteration wil l be attempted. I f no improvement results, the process wil l be terminated. The number of iterations in Gram-Schmidt selection can be obtained from statistical learning theory. In this theory there is an inherent estimation error associated with the generalization ability of the learning system. The generalization ability (measured by learning capacity or VC-dimension) for a polynomial can be estimated by the number of coefficients, which is the number of Gram-Schmidt iterations. 157 A starting point for the number o f iterations (for example 10% of the training set size) is considered and the subset is generated. The learning accuracy of the estimator can be obtained as shown in the following Figure: Figure 6.7 Estimation accuracy from the contracted space (top) and the empirical risk (bottom). 158 The overall (lower bound) estimation accuracy is obtained by subtracting the error from the estimation accuracy. upper bound of risk functional 40 56 numbar of features Figure 6.8 The curve o f the upper bound o f the risk functional for the given number of features. The optimum feature set size T is the basis number in the contraction step (step 3). T is a minimum number of iterations because the statistical learning theory provides a conservative measure of performance. The actual number o f iterations is a multiple of T (between 3 and 6 times) and a more realistic iteration number can later be obtained by using the cross validation technique. The cross validation technique works as follows. Each training set is again divided into a group of cross validation sets and wil l be called second sets. The second training sets and the second testing sets are used to find the realistic T. Each second set generates one trajectory o f features for the higher level set. Different trajectories are different rows o f the feature table in 'Table P. From each trajectory a diagram o f estimation accuracy is obtained whose maximum is at the iteration number. The average of these profiles is used for finding the iteration number. B y doing so, the iteration number, which is one o f the learning parameters, is produced without the effect 159 of the test set. Once this process is repeated for all 'n' scenarios, a set o f estimation accuracy and an optimum iteration number wil l be obtained: {*„/,}, (VJ ( W (155) One can use the average of these values as the outcome of one cycle of A S E C , as follows: Different trajectories are representative of the variation o f the feature vector directions. I f the variation is large enough to get closer to a different vector, the projection vector at the "ith" level switches to the new vector, thus a new trajectory results. I f the two projection vectors are very close the variation in the feature vector can be ignored. A measure o f this closeness is the condition number of the transfer matrix of equation (151). Therefore, i f by adding a projection vector to 'A ' the condition number of 'A ' jumps (beyond some threshold level so that it can produce the numerical error), the projection vector wi l l be removed from the table and replaced by the previous projection vector (Figure 6.9). Although the process o f space spanning for the trajectory with the replaced projection vector should be repeated to account for the change, it can be ignored because the induced change is very small. Finally, the minimum subset wi l l be as follows: (156) (157) 160 T f f i 11 12 13 T f f f 2 21 22 23 T f f f 3 31 32 33 f « « f 22 32 T f f„ f i 11 12 13 T f f 2 21 ^ _ 2 2 _ - > 23 T f • f f 3 31 22 33 Figure 6.9 Replacement of close features in the trajectory table. Example The given feature set consisted o f 88 features (geometrical features from X-ray images and statistical features from others). As mentioned above the two function sets of {-, \x\, sign(x)A/jxj, x2} and {x x] (cross multiplication) were used in order to generate a new feature set from any given feature set. The approximation model was a linear regression model, i.e., yl = T.ci x . The Gram-Schmidt space spanning method was followed to generate a sequence o f features, for example, one row of Table 5. Ten rows were produced. I f the coefficient o f determination between two features was more than r2 = 0.9, one o f them would be replaced with the other. Figure 6.7 shows the improvement in estimation as more terms were added to the feature set (see (154)). Using the inequality of equation (80), as an equality for 77 = 0.5 (representing the expected value), and B=( m a X 2 ™ n ) 2 (see equation (81)) the diagram of 0 was obtained. The cross-multiplication o f features produced a large number of new features. In order to avoid memory overload, an iteration o f the A S E C consisted of two cycles; in each only one o f the transforming function sets were applied. Once the sequence of features was 161 produced, the risk functional was estimated by using ten-fold cross validation. The result is shown in the following: Table 6: The computed r 2 after each iteration of ASEC r2 The estimation accuracy after the first iteration 0.6448 The estimation accuracy after the second iteration 0.6565 The estimation accuracy after the third iteration 0.7093 The estimation accuracy did not improve after three iterations. The final feature-set size was 38. The block diagram for an iteration of the ASEC A n iteration of A S E C consists o f expanding a given set of features to a new set of features (usually with more features) and then contracting the expanded set in order to produce a new feature set with fewer features. The new feature set includes new and more effective features. To expand a feature set, a set of predetermined functions can be used. The generality of this method is not compromised because i f such a set of functions is not available, any set of universal approximator functions can be used. A given set is contracted by selecting a series of features, by using the Gramm-Schmidt theorem. 162 Import the feature set Expand the imported set New feature set r Generate a series of features * Estimate the error on empirical risk Select the set with minimum empirical risk Figure 6.10 The flow chart o f one cycle o f the A S E C This cycle is repeated to the point where it no longer improves the estimation accuracy. 6.7. E s t i m a t i o n a c c u r a c y o f d i f ferent l e a r n i n g m a c h i n e s A l l the learning machines were tested using the cross validation technique. The ten fold cross validation technique is a commonly used test. In this test, each specimen is randomly assigned to one o f ten sets. In ten repetitions, each set, in order, is used as the testing set and the rest o f the ten sets are combined into the training set. The learning machine is trained using the training set and the estimation accuracy is tested by the test set. For generating the ten sets, a random number was assigned to each specimen. The specimens were sorted with respect to the random numbers. Starting from the smallest random number, one tenth o f the specimens were removed and assigned to the first set. Other sets were separated in the same fashion. In cases where the learning capacity of the 163 learning machine had to be determined, the best structure was determined by using the first training set and the same structure was used for the rest of the accuracy testing. The number o f neighbors controls the learning capacity o f the K-nearest neighbor. To test the K-nearest neighbor the number of neighbors was increased (starting from 1) and the cross validation technique was applied. For the multilayer neural network, the number o f hidden neurons and the early termination factor control the learning capacity. Starting with one hidden neuron (and increasing), the training was repeated. The early stopping level was changed to increase the cross validation accuracy. Since the steepest descent algorithm was used for neural network training, the estimation accuracy did not necessarily reach as low as the stopping level. Therefore, the training process was repeated by using another initial value for the parameters. Dr. Grudic, who developed the algorithm, trained the S P O R E learning machine. This machine adjusts its learning by doing repeated searches. For the A S E C algorithm the number o f coefficients is taken as equivalent to the learning capacity of the machine. The algorithm reached maximum accuracy in five iterations, two non-linear mappings and three cross-multiplication mappings. The expected accuracy diagram was produced, for any iteration, and the optimum number o f features (that minimized the estimation error) was found. The best number of parameters was usually approximately 20. Two to three times this number was transferred to the next iteration. The increased number of features normally did not affect the results. Too many features slowed down the algorithm to an impractical rate. Table 7 shows the strength estimation accuracy of the specimens using different learning machines. Table 7 : The accuracy of estimation by using different learning machines. Learning machine r 2 by using 10-fold cross validation Linear regression 0.60 K-nearest neighbors 0.31, k=15 Multi-layer neural networks 0.68 S P O R E 0.73 164 A S E C 0.71 A n estimate o f the contribution of different features is also studied here using a linear regression model. In order to see the contribution o f the measurement machines to the strength estimation accuracy, one or more measurement systems are excluded from the input (the profiles). The cross validation technique is not used here because the model is not being tested. Therefore, the entire set of specimens is used for developing the regression model and then testing the accuracy o f the estimation. The purpose o f this test is to show that all the measurements are contributing to final estimation accuracy and, no one o f them can be replaced by a combination of other measured properties. Table 8 shows the accuracy of the estimation for the given set of features. Table 8: The accuracy of the estimation for different feature sets. included features for estimation r 2 at learning X-ray 0.4059 M O E 0.5625 S O G 0.5275 X-ray and M O E 0.6101 X-ray and S O G 0.6011 M O E and S O G 0.6572 X-ray and M O E and S O G 0.6805 6.8. D i s c u s s i o n o f the results Table 7 shows two properties of the learning problem for the database o f this thesis. B y comparing the accuracy of linear regression and nonlinear transformations (Neural 165 Networks, A S E C , and SPORE) it becomes clear that higher order self and cross products of features lead to improved performance. The poor estimation accuracy from K-nearest neighbor shows the effect of high dimensionality of the feature space. As was pointed out in Section 6.2, high dimensionality o f a feature set (in this case, 88 features) makes the concept o f 'neighborhood' ineffective. Especially i f some dimensions are inherently noisy as they are in these types of measurements. The A S E C algorithm and the S P O R E algorithm followed two different approaches for improving the strength estimation. In the A S E C algorithm, the given feature set is being transformed (by using a priori knowledge) into a more effective set of features, while the S P O R E algorithm used the given feature set (88 features) and found the best hyper-surface that maps them to the output (measured strength). The closeness o f the estimation accuracy by the two algorithms shows the equal validity o f the two approaches. However, the estimation accuracy o f A S E C could be improved by searching for better transformations and more complex estimation functions (a simple linear estimator was used in this research). The final feature set that was produced by A S E C included 38 features. This shows that the original feature set (with 88 features) includes redundant information. Using a smaller (and more effective) feature set reduces the VC-dimension of some of the estimators (e.g. linear regression) and thereby improves the estimation accuracy for those models. This can be seen in Table 6, which shows that the error is minimized at 38 features. 166 Chapter 7. Conclusions and Future Directions 7.1. C o n c l u s i o n s The problem of estimating the tensile strength of lumber is the focus of this thesis. This problem was posed as a multi-sensory measurement problem and a small-sample learning problem. It was shown that the regression approximation was successful in estimating strength, and that regular industrial estimation accuracy can be significantly improved by using a multiple sensory system. The measurement means were those most commonly used in industry, so their speed could be adjusted to match speeds found in industry. The principles and practical limitations of each measurement means was studied in order to make sure each did not produce the significant effect on estimation accuracy. A study for calibrating the dynamic bending machine, the machine commonly used for wood grading, was performed. The result of this study showed that the flat-wise dynamic M O E that was measured by the dynamic bending machine was highly correlated (r 2 = 0.92, see Section 2.4.1) to flat-wise static M O E . The significance o f this finding is that the type o f M O E measuring machine used would not affect the accuracy of strength estimation. A conic model o f knots and a knot detection algorithm was developed. The estimation accuracy o f this model was demonstrated through simulation (^ = 0.81 for smaller diameter and r2 = 0.98 for larger diameter). This model was developed for single knots, which comprise the majority of the defects of boards. It was demonstrated that this model could be used for producing a simple representation of a board consisting of clear wood and single knots. Also, it was shown that the model and the related detection algorithm could be used for detecting the same defects in other measured profiles, such as the C C D camera image o f a board. Characterizing a board consisted of transforming the measured profiles into a set o f previously studied statistical features and geometrical features as proposed in this thesis. 167 The measured profiles, except for the X-ray images, were transformed to their statistical moments. From the two X-ray images of a board the geometrical features o f the board were calculated. The knots locations were shown to match the location of the actual knots. A brief survey o f different learning techniques was presented and the problem o f strength estimation as an empirical learning problem was discussed. Because of the high cost of measuring such properties, the problem of lumber strength estimation was considered as a small-sample learning problem. It was shown that classification VC-dimension could be extended to the VC-dimension o f a regression estimator. During this demonstration it was shown that a classifier could be constructed by using a regression estimator, and vice versa. This concept established the very basis of this research, which is to improve the grading accuracy (a classification problem) by improving strength estimation accuracy (a regression problem). The A S E C learning model was proposed in order to produce an optimal feature set from the original feature set. This model allows a researcher to incorporate a priori knowledge about input-output dependence. I f such dependence does not exist, a generic mapping can be used. The accuracy o f strength estimation when using this model, along with a simple relationship between knot-size and stress-distribution, showed that this model performs better than most existing learning models. 7.2. F u t u r e D i r e c t i o n s Throughout the course o f this research, a high speed of computation has been a goal since the real time applicability o f the process is a focus. Therefore, the solutions presented here were attempted without any recursive approach. Any improvement in estimation accuracy by using gained by recursive algorithms (for detection) could be further investigated. In particular, the knot detection algorithm could be repeated by adjusting the size and location of a partial cone in the space in order to maximize the similarity o f its orthographic projections to the detected knot patterns from the X-ray images. A more substantial model of clear wood properties for the board model (see Figure 4.8) could be developed. For example, the gradual variations in clear wood density could be 168 added to the model. Also grain angle, and M O E could be included with the properties o f wood in every element o f the model. Any performance improvement for the A S E C algorithm by modifying its transformations could be investigated. In essence, the effect of other a priori knowledge in the form of feature transformations (for example higher order terms in Irwin's equation) as well as non-linear estimators are to be investigated. Table 8 indicates that all the measurement means contributed to the estimation accuracy o f the tensile strength of wood. The estimation accuracy is likely to be improved by adding new measurement means, such as ultrasound or visual sensing, in order to gather relevant information that is not presently available. Other analyses, such as stress analysis, could be used for generating more effective features from the existing measurements. The estimation improvement due to knowing basic wood properties could also be examined. 169 Appendix: Board failure; initiation and propagation of a fracture Although fracture mechanics was not used in this research for analyzing the tensile strength of lumber, it provides the understanding of how the failure of wood takes place. In selecting the representative features of lumber, one needs an understanding of the failure process because it sometimes plays a leading role. This Appendix is intended as a brief examination of the wide area of fracture mechanics. In the literature, a crack is defined as a split in material that grows (propagates) when applied stress increases. A fracture is the growth of a crack due to applied stress. These two terms (crack and fracture) are loosely used interchangeably here because the meaning is conveyed by the text and because it is not essential to the adopted methodology o f this thesis. The process of failure due to fracture can be caused by applied force or by material status. "Fatigue" and "creep" are two examples where fracture is caused by the feature of the applied force and by the feature of the material state. In fatigue, the cyclic pattern of the applied force causes the fracture but in creep the state of material (hot metal, for example) causes the fracture in low stress levels. Also, fast fracture propagation and slow fracture propagation are distinguished in the literature. The tensile strength of lumber being estimated in this thesis occurs where slow monotonic loading is applied to a test specimen. There are three major modes of failure, opening (mode-I), shear (mode-II), and tearing (mode-Ill). Mode-I o f failure happens when the crack progresses across the board while the board is under tensile stress. Mode-II of failure happens when the crack is progressing along the board. Mode- I l l is not usually considered for the failure of a board. The failure o f a specimen in most cases is an opening mode of failure (mode I), as wil l be explained later. However, the propagation of fracture in wood (across and along the grain) suggests that failure mode must be a combination of mode-I and mode-II ([56]). There are numerous micro and macro fractures that already exist in wood. High stress or low strength zones create the condition in which the existing fractures can grow. In 170 theory, the fracture can start at any point in/on the board because micro fractures are randomly distributed in a board. In practice, it is almost certain that one of the defects of wood wil l be the location of fracture initiation, and in most cases, a knot initiates a fracture ([11], [12]). The tip of a fracture creates a high stress zone that deforms the material and causes its failure. The stress concentration at the tip of the fracture is much more than that o f other natural defects o f wood. Therefore, a fracture has a good chance of propagating until it stops at some point. It was observed that a crack ends in another defect (like knot) or when the crack propagation direction goes along the board. There are three types of fracture: opening, sliding, and tearing. Once the failure starts at a point it propagates step wise, that is, the fracture propagates consequently along the grains and across them until complete failure takes place. In practice a combination of opening and sliding takes place in the failure of wood. The sliding mode usually takes much more tensile stress to propagate. This is why in some cases one crack practically stops progressing. I f another crack initiates with stress level close to that of the first crack, then this second crack gets a chance to propagate. In grown knots there is density difference and high grain variation with respect to clear wood. In some cases of knots, only a high grain angle exists and that happens where the knot is related to the bottom of a branch (where the branch wood starts by high grain deviation o f the trunk wood). Variation in the density of a knot with respect to clear wood causes high stress zones, which can cause the failure o f clear wood. In high grain angle parts o f a knot the tensile strength o f wood drops significantly such that it can fail under regular tensile stress that clear wood, such as, trunk wood, with regular grain angle can stand. The normally used tool for analyzing the initiation and propagation o f fractures in material is the linear elastic finite elements. Critical stress intensity factor, or fracture toughness ([57]) is the concept on which linear elastic fracture mechanics is based. It is assumed that once a fracture is initiated, it can progress almost regardless of body and loading geometry (given certain specific general conditions ([33],[58],[59])). 171 Once a crack is created in the body of the board, it generates its own high stress zone. The stress distribution around a crack tip and its propagation is the subject o f many theories. Based on stress distribution around a fracture, different theories were developed that predict the failure of material ([33], [60], [61], [62]). Griffith ([39]) developed a fracture theory based on the decrease in material potential energy (Up) and fracture surface energy (LQ as follows: 7?+77-° <158) where Up and C/Sare as follows: J / = - ^ 0 5 9 ) U = 4aty (160) s * s Equation (158) states that when the stress level reaches its critical level, Up wil l be transformed to U„. Therefore the critical stress level can be obtained as follows: 2Ey a - A / - * (161) a is the stress level, E is modulus of elasticity, t is thickness, a is fracture surface, and ys is the elastic surface energy per unit thickness. Based on this theory, i f the critical stress in the material exceeds the critical level, the material fails and the crack propagates. A theory that distinguishes different modes o f failure was published in [57]. Based on this theory, the stress distribution around the tip ofthe fracture can be modeled as follows: <r, = ^(K, + Kn f£0) + Km f°(ej) + higher terms (162) where r and 0 are the coordinates o f the material element in a two dimensional polar coordinate system, r and 0 are measured with respect to fracture tip and fracture line. Kp 172 KJP and Km are stress intensity factors for opening, sliding, and tearing modes of failure. For a two-dimensional tensile stress problem only an opening mode of failure exists. For a fracture of length 2a critical stress is found from the following equation. KM=<ryfc (163) KIc is a material property. Once the stress intensity factor of any type exceeds the critical level the fracture of that type takes place. The mixed mode of failure that takes place in a material such as wood was explored in ([63]). The failure criterion in this case is as follows: Ic lie K} and Ku are the mode-I and mode-II stress intensity factors, and KIc and KIIc are the fracture toughness for those failure modes, which are material properties (constant). A theory initially used in fracture analysis routines ([12]) was based on maximum stress failure theory. A second theory that combines mode-I and mode-II of failure was later used for wood ([63]). The information from fracture mechanics contributes to the estimation problem in two different ways. Although the effect of knots on strength is the subject o f other research, geometrical features wil l be defined in order to include this knowledge (in a qualitative way) in the feature set. For example, knowing that the knots are usually the initiating points of fractures, the distance between two knots is included as a feature. Also, the form of any dependence of strength to the feature set can be incorporated into the learning system as a priori knowledge. For example, equation (162) states that the stress level produced due to the existence of a fracture diminishes by ~\=. In A S E C model (Chapter 6) this information is used for generating new features from the given set o f features by four different mappings of ~, \x\, sign(xj\fx, and x2. 173 References [I] "VisionSmart X-ray machine manual", VisionSmart INC, Edmonton, Alberta, Canada. [2] "Grain Angle Measurement with the Metriguard Model 510", Metriguard Inc, Pullman, WA 99163. [3] "Model 5100 slope-of-grain indicator operation manual", Metriguard, INC, PO Box 396, Pullman, Washington 99163,1977. [4] Logan J. D., "Bending Stress in the Metriguard Model 7200 HCLT and the CLT", Metriguard INC. [5] Blass H., Gard W., "Machine Strength Grading of Timber", Pacific Timber Engineering Conference, Australia, pp. 598-603, 1994. [6] Perstorper M., "Strength and Stiffness Prediction of Timber Using Conventional and Dynamic Methods" [7] Lam F., Foschi R. I., Barrett J. D., Q. Y. He, "Modified algorithm to Determine Localized Modulus of Elasticity of Lumber", Wood Science and Technology, Springer-Verlag, pp. 81-94, 1993. [8] Bechtel F. K., Allen J. R., "Non-destructive Testing Methods for Lumber", United States Patent, 1990. [9] McDonald K. A., Cramer S. M., Bendtsen B. A., "Research Progress in Modeling Tensile Strength of Lumber from Localized Slope of Grain", Proceedings of the Sixth Symposium on the Nondestructive Testing of Wood, Washington State University, Pullman, WA, 1987. [10] Cramer S. M., McDonald K. A., "Predicting Lumber Tensile Stiffness and Strength with Local Grain Angle Measurement and Failure Analysis", Wood and Fiber Science, 21(4), pp. 393-410, 1989. [II] McGowan, W. M., "Parallel-to-grain tensile properties of visually graded 2 by 6 -inch Douglas-fir", Information Report VP-X-46, Forest Products Laboratory, Vancouver, BC. [12] Shi Y., "Multiple Knot Failure Mechanisms in Structural Lumber", Ph. D. thesis dissertation, Civil and Environmental Engineering Department, the University of Wisonsin, Madison, 1996. [13] Courchene T., "The Effect of Edge Knots on the Strength of Western SPF MSR Lumber", MSc. thesis dissertation, The University of British Columbia, Vancouver, 1996. [14] Schajer G. S., "Method for Estimating The Strength of Wood", United States Patent, 1990. [15] Newnes INC., "X-rays help measure lumber strength on-line", Wood Technology, pp. 41-43, June 1997. [16] Logan J. D., "X-ray Lumber Grading V.S. Traditional Methods, Review of a recent article", Metriguard INC., Pullman, W A 1998. [17] Quin F. J., Steele P. H. Shmulsky R., "Locating Knots in Wood with an Infrared Detector System", Forest Products Journal, vol. 48, no. 10, pp. 80-84, 1998. 174 [18] Cho T. H., Conners, R. W., Araman P. A., "A Computer Vision System for Automated Grading of Rough Hardwood Lumber Using a Knowledge-Based Approach", Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 1990. [19]Rouger F., "Application of a Modified Statistical Segmentation Method to Timber Machine Strength Grading", Wood and Fiber Science, 28(4), pp. 438-449, 1996. [20]Breiman L., Friedman J. H., Olshen R. A., Stone C. J., "Classification and Regression Trees", Wadsworth International Group, 1984. [21] Grudic G., "Nonparametric Learning from Examples in Very High Dimensional Spaces", Ph.D. Thesis dissertation, The University of British Columbia, 1997. [22] National Lumber Grading Authority, "North American Export Standard for Machine Stress-Rated Lumber", 1986. [23] Biernacki J. M., Lam F., Barrett J. D., "Benefits of Improved Strength and Stiffness Prediction of MEL and MSR Lumber", Faculty of Forestry, The University of British Columbia, 1997. [24]Hoag M. L., Krahmer R. L., "Polychromatic X-ray attenuation characteristics and wood densitometry applications", Wood and Fiber Science, v. 23(1), pp. 23-31, 1991. [25]Bertin E. P., "Principles and practice of X-ray Spectrometric Analysis", Plenum Press, Newyork, 1970. [26] Victoreen J. A., "Calculation of X-ray mass-absorption coefficient", Jounal of Applied Physics, V. 20, pp. 1141-1147, 1949. [27] Schirato R. C , Polichar R. M., Reed J. H., "Development of monolithic CdZnTe arrays with improved energy and spatial resolution", pp.47-56, SPIE, v. 2278 X-ray and UV Detectors, 1994. [28] Shen J., Schajer G., Parker R., "Theory and Practice in Measuring Wood Grain Angle Using Microwaves", IEEE Transaction on Instrumentation and measurement, pp. 803-809, vol. 43, No. 6, December 1994. [29] Shen J., "Wood Property Measurements Using Microwaves", Ph.D. dissertation, The University of British Columbia, 1995. [30] Shen J., Schajer G., Parker R., "A New Probe for Characterizing Elliptically Polarized Microwave Fields", Journal of Microwave Power and Electromagnetic Energy, pp. 55^ 62, vol. 28, No. 4, 1993. [31] Torgovnikov G. I., "Dielectric Properties of Wood and Wood-Based Materials", Springer Verlag, 1993. [32] Saboksayr H., Lau W., "Landmark CLT™ Calibration", UBC internal report, May 1999. [33]Broberg, K. B., "Cracks and Fracture", Academic Press, 1999. [34] Weibull, W., "A Statistical Theory of the Strength of Material", Proceedings of Swedish Institute of Engineering Research (Ingeniorsvetenskapsakademiens), Stockholm, 1939. [35] Weibull, W., "Investigating into Strength Properties of Brittle Material", Svenska teknologf m reningens f m rlag, Stockholm, 1938. 175 [36]McGowan W. M , "Parallel-to-grain Tensile Properties of Visually Graded 2x6-inch Douglas-Fir", report # VP-X-46, Forest Products Laboratory, Department of Fisheries and Forestry of Canada, Vancouver, B.C., 1968. [37]Boatright S. W., Garret G. G., "The Effect of Microstructure and Stress State on the Fracture of Behavior of Wood", Journal of Materials Science, vol. 18, pp. 2181-2199, 1983. '» • > ' [38] Barrett, J. D., "Fracture Mechanics and the Design of Wood Structures", Philosophical Transactions of Royal Society of London, v. 299, p. 217-226, 1981. [39] Griffith, A. A., "The Phenomenon of Rupture and Flow in Solids'', Philosophical Transactions of Royal Society of London, v. 221, p. 163-197, 1921. [40] Smith, I., Hu, L. J., "Factors Affecting Mode I Fracutre of Plantation Grown Red Pine", Wood Science and Technology, v. 28, p. 144-157, 1994. [41] Wagner, H. D. "Statistical Concepts in the Study of Fracture Properties of Fibrous Composites", Application of Fracture Mechanics to Composite Materials, v. 6, Elsevier Science Publisher, Amsterdam, 1989. [42] Stanzl-Tschegg S. E. , Tan D. M., Tschegg E. K., "New Splitting Method for Wood Fracture Characterization", Wood Science and Technology, vol. 29, pp. 31-50, 1995. [43]Bazant, Z. P., Xi, Y., "Statistical Size Effect in Quasi-Brittle Structures Part I: Is Weibull Theory Applicable?", Journal of Engineering Mechanics, v. 117, p. 2609-2622, 1991. [44]Bazant, Z. P., Xi, Y., "Statistical Size Effect in Quasi-Brittle Structures Part II: Nonlocal Theory", Journal of Engineering Mechanics, v. 117, p. 2623-2640, 1991. [45]Haygreen J. G., Bowyer J. L., "Forest Products and Wood Science, an introduction", The Iowa States University Press, 1982. [46]Hankinson, R. L., "Investigation of Crushing Strength of Spruce at varying angles or grain, U.S. Air Service Information Circular #259, Chief of Air Service, Washington, D.C., 1921. [47] "Wood Engineering Handbook", Forest Products Laboratory, Prentice Hall, 1990. [48]Dabholkar A. Y., "Analysis of Wood With Knots and Cross grain", Ph.D. dissertation, Colorado State University, 1980. [49] Goodman J.R, Bodig J., "Orthotropic Strength of Wood in Compression", Wood Science, vol. 4(2), pp. 83-94, 1971. [50]Norris C. B., "Strength of Orthotropic materials subject to combined stresses", USDA Forest Products Lab, Report No. 1816, Forest Products Laboratory, Madison, WI, 1962. [51] Cote W. A., HannaR. B., "Ultrastrucniral Characteristics of Wood Fracture Surfaces", Wood and Fiber Science, vol. 15(2), pp. 135-163, 1983. [52]Lam F., Varoglu E., "Variation of Tensile Strength Along the Length of Lumber, Part 1: Experimental", Wood Science and Technology, pp. 351-359, 1991. [53]Lam F., Varoglu E., "Variation of Tensile Strength Along the Length of Lumber, Part 2: Model Development and Verification", Wood science and Technology, pp. 449-458, 1991. 176 [54] Friedman J. H., "An Overview of Predictive Learning and Function Approximation", in From Statistics to Neural Networks, Cherkassky V., Friedman J. H., Wechsler H., pp. 1-61, Springer-Verlag, 1994. [55] Bury K., "Statistical Distributions in Engineering", The University of British Columbia, 1995. [56] Ashby M. F., Easterling K. E. Harrysson R., Maiti S. K., "The Fracture and Toughness of Woods", Procedings of the Royal Society of London, vol. A 398, pp. 261-280, 1985. [57] Irwin G. R., "Analysis of Stresses and Strains Near the End of a Crack Traversing a Plate", Journal of Applied Mechanics, vol. 24, pp. 361-364, 1957. [58]Barenblatt G. I., "The Formation of Equilibrium Cracks During Brittle Fracture. General Ideas and Hypotheses. Axial-symmetric Cracks". Journal of Applied Mathematics and Mechanics, vol. 23, pp. 622-636, 1959. [59] Barenblatt G. I., "Equilibrium Cracks Formed During Brittle Fracture. Rectilinear Cracks in Plane Plates". Journal of Applied Mathematics and Mechanics, vol. 23, pp. 1009-1029, 1959. [60]Kolosov, G., Doctoral dissertation, Dorpat, Estonia, 1909. [61]Kolosoff, G., Uber einige Eigenschaften des evenen Problems der Elastizitatstheorie. Zeitschrift fur Mathemetik und Physik, vol. 62, pp. 384-409, 1914. [62] Inglis C. E., "Stresses in a Plate due to the Presence of Cracks and Sharp Corners", Transactions of the Institution of the Naval Architects, vol. 1, p. 219-230, 1913. [63] Wu E. M., "Application of Fracture Mechanics to Anisotropic Plates", Journal Of Applied Mechanics, pp. 967-974, 1967. [64] Cramer S. M., Goodman J. R, "Model for Stress Analysis and Strength Prediction of Lumber", Wood and Fiber Science, 15(4), pp. 338-349, 1983. [65] Lam F., Varoglu E., "Grading MSR Lumber for the Optimization of Market Value", [66] Cherkassky V. S., Mulier F., "Learning from Data", John Wiley & Sons, 1998. [67] Vapnik, V , "The Nature of Statistical Learning Theory", New York: Springer Verlag, 1995. [68]Castleman K. R., "Digital Image Processing", Prentice-Hall, 1979. [69] Horn P., "Robot Vision", McGraw-Hill, 1986. [70] Dabholkar A. Y., "Analysis of wood with knots and cross grain", Ph. D. Dissertation, Colorado State University, Fort Colins, USA, 1980. [71] Daugherty R. L., Franzini J. B., "Fluid Mechanics with Engineering Applications", McGraw-Hill Book Company, 1977. [72] Friedman J. H., "An Overview of Predictive Learning and Function Approximation", in From Statistics to Neural Networks, V. Cherkassky, J. H. Friedman, and H. Wechsler (eds.), NATO ASI Series F, 136, New York: Springer Verlag, 1994. [73] Vapnik V., Levin E., Le Cun Y., "Measuring the VC-dimension of a Learning Machine", Neural Computation, 6, 851-876, 1994. 177 [74] Hand D. J., "Kernel Discriminant Analysis", Research Studies Press, 1982. [75]Zurada, J. M., "Introduction to Artificial Neural Systems", West Publishing Co., [76] Hornik K., Stinchcombe M. , White H., "Multilayer Feedforward Networks are Universal Approximators", Neural Networks, Vol. 2, pp. 359-366, 1989. [77] Funahashi K., "On the Approximate Realization of Continuous Mappings by Neural Networks", Neural Networks, Vol. 2, pp. 183-192, 1989. [78]Breiman L., Friedman J. H., Olshen R. A., Stone C. J., "Classification and Regression Trees", Wadsworth Inc., 1984. [79] Van Trees H. L., "Detection, Estimation, and Modulation Theory", part I, John Wiley and Sons, 1968. [80] Gersho A., Gray R. M. , "Vector Quantization and Signal Compression", Kluwer Academic Publishers, 1991. [81] Vapnik V., "Statistical Learning Theory", John Wiley and Sons, 1998. [82] Miller A. J., "Subset Selection in Regression", Chapman and Hall, 1990. [83]Kreyszig E., "Introductory Functional Analysis with Applications", John Wiley & Sons, 1989. [84] Devor J. L. , "Probability and Statistics for Engineering and the Sciences", third edition, Brooks/Cole Publishing Co, 1991. 178
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Tensile strength estimation of lumber
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Tensile strength estimation of lumber Saboksayr, Hossein 2001
pdf
Page Metadata
Item Metadata
Title | Tensile strength estimation of lumber |
Creator |
Saboksayr, Hossein |
Date Issued | 2001 |
Description | The problem of wood tensile strength estimation of softwood lumber is studied in this thesis. The main contributions brought to this topic here are first, a set of knot geometry features that can be used in board strength estimation, and second a learning algorithm that selects the best set of features for the purpose of strength measurement. The estimation problem is posed as an empirical learning problem that is based on the measured properties of wood. The process of producing the required database consisted of three distinct tasks: selecting and preparing the boards, measuring a set of properties of wood for every board, and estimating the measured strength of each board from the measured profiles. A set of boards, providing a random sample of softwood lumber, already existed at UBC (from previous experiments). These boards were measured and used as the preliminary database. A second set of boards was selected randomly from the regular production of softwood lumber. These boards created the evaluation data set. For the measurement task, all the boards were scanned using the available measurement machines. These machines were SOG and Microwave for grain angle measurement, X-ray for local density measurement, dynamic bending machine for the Modulus of Elasticity measurement, as well as the ultimate tensile strength tester for measuring the tensile strength of a board. The output profiles per board were saved in a data file (one data file per board per machine). The measured data files were stored in a database consisting of a structure of directories. In the strength estimation task all the measured profiles of a board were mapped to specific features (usually statistical moments) and the features were then mapped to the strength of the board. One of the features of a board is the set of its knots. A conic model of a knot was chosen and the related mappings were developed such that the X-ray scanning could be used in order to detect the existence, location, and shape of knots in a board. Then geometrical features were proposed such that the set of knots of a board could be transformed into a set of features suitable for strength estimation methodology of this thesis. Since specimens are costly to measure, means to reduce the number required were developed. To this end statistical learning theory was applied. This theory addresses the suitability of the learning model for the physical problem and the effectiveness of the features for the estimation problem. Based on this theory, the ASEC learning model was developed. The learning problem for wood tensile strength estimation was divided into three problems: defining the most suitable feature set, measuring the suitability of a learning machine, and using the a priori knowledge about the dependence in the learning machine. A method for measuring the suitability of a regression estimator (VC-dimension) was developed in order to select the best model in a class of models. The ASEC learning model was developed in order to find the best set of new features from the given feature set by using the known dependencies. Different learning machines were tested in order to determine what model is most suitable for tensile strength estimation of lumber. The validity of all the methods was demonstrated by analytical proof, by simulation, or by test on the database. |
Extent | 18574283 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-10-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0065354 |
URI | http://hdl.handle.net/2429/13832 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2001-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2001-715302.pdf [ 17.71MB ]
- Metadata
- JSON: 831-1.0065354.json
- JSON-LD: 831-1.0065354-ld.json
- RDF/XML (Pretty): 831-1.0065354-rdf.xml
- RDF/JSON: 831-1.0065354-rdf.json
- Turtle: 831-1.0065354-turtle.txt
- N-Triples: 831-1.0065354-rdf-ntriples.txt
- Original Record: 831-1.0065354-source.json
- Full Text
- 831-1.0065354-fulltext.txt
- Citation
- 831-1.0065354.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0065354/manifest