UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Using texture energy measures for the segmentation of forest scenes Dumoulin, François 1985

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-UBC_1986_A6 D85.pdf [ 7.39MB ]
Metadata
JSON: 831-1.0096668.json
JSON-LD: 831-1.0096668-ld.json
RDF/XML (Pretty): 831-1.0096668-rdf.xml
RDF/JSON: 831-1.0096668-rdf.json
Turtle: 831-1.0096668-turtle.txt
N-Triples: 831-1.0096668-rdf-ntriples.txt
Original Record: 831-1.0096668-source.json
Full Text
831-1.0096668-fulltext.txt
Citation
831-1.0096668.ris

Full Text

USING T E X T U R E ENERGY MEASURES FOR THE SEGMENTATION OF FOREST SCENES by FRANCOIS DUMOULIN B.A.Sc, Laval University, 1978 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF FORESTRY in THE FACULTY OF GRADUATE STUDIES (Department of Forestry/Remote Sensing) We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA December 1985 ©Francois Dumoulin, 1985 In p resen t ing this thesis in partial fu l f i lment of the requ i rements for an a d v a n c e d d e g r e e at the Univers i ty of Brit ish C o l u m b i a , I agree that the Library shall m a k e it freely avai lable for re fe rence and s tudy . I further agree that p e r m i s s i o n for ex tens ive c o p y i n g o f this thesis for scho lar ly p u r p o s e s may be g ranted by the h e a d o f m y d e p a r t m e n t o r by his o r her representat ives . It is u n d e r s t o o d that c o p y i n g o r p u b l i c a t i o n of this thesis for f inancia l gain shall no t b e a l l o w e d w i t h o u t m y wr i t ten pe rm iss ion . D e p a r t m e n t of F o r e s t r y T h e Un ivers i ty o f Brit ish C o l u m b i a 1956 M a i n M a l l V a n c o u v e r , C a n a d a V 6 T 1Y3 D a t e December 17. 1985 DE-6(3 /81) A b s t r a c t The stratification of forest cover is a basis for most forest management activities. With the development of computer-based image analysis systems, attempts have been made to automate the photo-interpretation process using local spectral signatures. The results have generally failed to meet the classification accuracy of trained humans. To complement the spectral analysis, local texture analysis is proposed. This thesis investigates the potential of a technique based on texture energy measures (Laws 1980) for the segmentation of forest scenes. The technique is tested with a set of two panchromatic aerial photographs digitized at four spatial resolutions: 1.5, 5, 10 and 20 m. The main conclusion of these tests is that given an adequate spatial resolution, texture energy measures can provide a reliable segmentation of forest cover into two classes of textures corresponding to stands with open and closed canopies. Only the finest spatial resolution (1.5 m) proved to be adequate. ii C o n t e n t s Abstract ii List of Tables vi List of Figures vii Acknowledgements viii 1 Introduction 1 2 Understanding Texture 5 2.1 In Search of a Definition 5 2.2 A First Dimension: the Primitives 6 2.3 A Second Dimension: the Spatial Organization 6 2.4 On the Perception of Texture 7 2.4.1 The Theory of Early Visual Processing 7 2.4.2 A n Important Conjecture 8 2.4.3 The Theory of the Dual Visual System 9 2.5 Properties of Textures 12 2.6 Image Texture and the Physical World 14 2.7 Textural Patterns in Forest Scenes 15 2.7.1 The Three-Dimensional Forest 16 2.7.2 The Illumination 18 2.7.3 The Viewing Geometry 19 2.7.4 The Recording Process 19 iii 3 Computational Approaches to Texture Analysis 21 3.1 First-Order Statistics 24 3.2 Second-Order Statistics 24 3.2.1 Gray Level Dependency Matrices 25 3.2.2 Gray Level Differences 28 3.2.3 Gray Level Run Lengths 30 3.3 Measures of Spatial Frequency 31 3.3.1 Autocorrelation Function 31 3.3.2 Fourier Transform 32 3.4 Texture Transforms 34 3.4.1 Local Statistics of Image Intensities 37 3.4.2 Spatial Filtering 38 3.4.3 Distribution of Local Features 39 4 Laws' Texture Energy Measures 43 4.1 Three Computational Processes 44 4.2 The Spatial Masks 46 4.3 The Macro-Statistic 48 4.4 The Principal Components 49 4.5 The Performance 50 4.6 Advantages and Disadvantages 50 5 Material and Methods 53 5.1 The Photographs 53 5.2 The Reference Data 58 5.3 Equipment 60 5.4 Software Required 60 5.5 The Experiments 61 6 Results and Discussion 65 6.1 The Coarser Spatial Resolutions 66 6.2 Lake Agata - 1.5 m 70 6.3 Lake Relay - 1.5 m 73 6.4 Improving the Technique 74 6.4.1 Local Rank Correlation 74 6.4.2 No Ratioing 75 iv 6.4.3 Local Histogram Standardization 76 6.4.4 Customized Filters 77 6.4.5 Selecting a Subset of Laws' Masks 77 7 Conclusion 97' Bibliography 101 Appendix Principal Components Analysis 107 v L i s t o f T a b l e s 6.1 L. Agata - 5, 10, 20 m — Correlation Matrices 89 6.2 L. Agata - 5, 10, 20 m — Variance Explained 90 6.3 L. Agata - 5, 10, 20 m — Correlation: Energy Planes vs Components . . 91 6.4 L. Relay - 5, 10, 20 m — Correlation Matrices 92 6.5 L. Relay - 5, 10, 20 m — Variance Explained 93 6.6 L. Relay - 5, 10, 20 m — Correlation: Energy Planes vs Components . . 94 6.7 L. Agata - 1.5 m — Principal Components Analysis 95 6.8 L. Relay - 1.5 m — Principal Components Analysis 96 vi L i s t o f F i g u r e s 2.1 Two Random Dot Patterns With Different Second-Order Statistics 10 3.1 Three Representations of a Joint Probability Distribution 26 3.2 Two Partitions of the Fourier Power Spectrum 35 3.3 Intensity Extrema Along a Scan Line 41 4.1 Laws' Approach to Texture Analysis 45 4.2 The Nine Masks Proposed by Laws (1980) 47 5.1 Geographic Location of the Two Sites 56 5.2 The Original Images at the 1.5 and 5 m Resolutions 57 5.3 Positions of the Digital Images on the Photographs 59 6.1 Lake Agata - 5 m — The First Two Components 78 6.2 Lake Agata - 5, 10, 20 m — A Human Interpretation 79 6.3 Lake Agata - 1.5 m — Two Texture Energy Transforms 80 6.4 Lake Agata - 1.5 m — The First Three Components 81 6.5 Lake Agata - 1.5 m — A Computer-Based Classification 82 6.6 Lake Agata - 1.5 m — A First Human Interpretation 83 6.7 Lake Agata - 1.5 m — A Second Human Interpretation 84 6.8 Lake Relay - 1.5 m — The First Two Components 85 6.9 Lake Relay - 1.5 m — A Computer-Based Classification 86 6.10 Lake Relay - 1.5 m — A Human Interpretation 87 6.11 Lake Agata - 1.5 m — An Example of L3L3SDV Plane 88 vii A c k n o w l e d g e m e n t s I would like to express my sincere gratitude to my supervisor, Dr. Robert J. Wood-ham, for his support and for a few enlightening conversations which have guided my work. I also want to thank the people of the Forestry Service at the St-Maurice Woodlands Division of CIP Inc. for providing me with the photographs and forest maps required for my research. Special thanks go to Andre Dion, a fellow forester, and to three gifted photo-interpreters: Raymond Genest, Horst Prager, and Pierre-Paul Guimond. I must also extend my appreciation to the fellow graduate students with whom I have shared the facilities of the Laboratory for Computational Vision during this past year. Finally, I would like to thank Nedenia M . Krajci who took care of the digitizing of the photographs used in my research. The funding for this thesis was provided by the Natural Sciences and Engineering Research Council of Canada in the form of two postgraduate scholarships (1983-84 and 1984-85), and by the Canadian Forestry Service in the form of a fellowship (1985). viii Chapter 1 Introduction Nowadays, for statistical and ecological considerations, the stratification of forest cover is a basic requirement for planning most forest management activities. Since the 1940's, forest cover type mapping has been done by interpreters using aerial photographs of different types (panchromatic, colour, infrared false colour) and at various scales. To perform a segmentation of forest cover, photo-interpreters must rely on image features such as tone or colour, texture, context, shape and size (Avery and Berlin 1985). Fur-thermore, specific knowledge acquired through appropriate training enables them to infer other features such as topography, soil characteristics and site quality. The imminence of wood shortages in several parts of the world has prompted an in-tensification of forest management activities. This translates into increased and pressing needs for mass screening of aerial photographs and other types of remotely sensed images, mainly for inventory and monitoring purposes. With the development of computer-based image analysis systems and the availability of multispectral imagery in digital format, there have been numerous attempts to automate the segmentation of forest cover (van 1 Roessel 1971; Kalensky and Sayn-Wittgenstein 1974; Kalensky and Scherk 1975; Kalen-sky and Wightman 1976; Zsilinszky and Pala 1978). Most of these trials used only local spectral signatures, processed by per-point classifiers in a supervised or unsupervised mode. The results have generally failed to meet the classification accuracy achieved by experienced photo-interpreters. This indicates that spectral evidence alone is not enough to adequately classify the forest cover (Kalensky and Sayn-Wittgenstein 1974; Starr and Mackworth 1978; Hoffer 1978). It has been suggested that a better approach would be to integrate more of the features used by photo-interpreters, such as topography, texture, and context (Sayn-Wittgenstein and Kalensky 1974; Kalensky and Wightman 1976; Harris 1980). Studies have shown that the addition of topographic data (elevation, slope and aspect) to the spectral information may increase significantly the classification accuracy (Hoffer and Staff 1975; Hoffer et al. 1979; Woodcock et al. 1980). Unfortunately, the non-availability of digital elevation models for many areas of interest is a major restriction on the use of this approach. Texture is an important cue used by human interpreters for the segmentation of forest scenes. Research on texture perception and texture analysis started in the 1960's, with major insights provided by the work of Bela Julesz and his colleagues on pre-attentive vision (Julesz 1965, 1975, 1984; Julesz et al. 1973; Julesz and Bergen 1983). Several ways of translating image texture into useful measurements have since been proposed by researchers in the field of image analysis and pattern recognition. In the context of the forest cover segmentation problem, texture has recently been 2 incorporated into various machine-based schemes and is the subject of an increasing num-ber of exploratory studies (Triendl 1972; Sayn-Wittgenstein and Kalensky 1974; Iisaka et al. 1978; Schneider 1978; Strahler et al. 1979; Teillet et al. 1981). In 1980, a new technique for texture analysis based on texture energy transforms was introduced by Kenneth Ivan Laws (1980) in his doctoral thesis at the University of Southern California. His approach has captured the interest of many of those re-searchers in quest of textural features suitable for the segmentation of various types of textured images (Davis 1982; Pietikainen et al. 1981, 1983; Harwood et al. 1983). On one experimental data set, Laws' texture energy transforms were shown to yield a better discrimination than the most popular method, based on co-occurrence matrices. Accord-ing to Laws, these texture energy transforms are good descriptors of natural textures, that are usually characterized by a significant degree of randomness. The objective of this thesis is to investigate the use of Laws' texture energy measures for the segmentation of forest scenes into texture fields. To proceed with the presentation of this research, a basic understanding of texture is essential. This is the topic of chapter 2. In the same chapter, a parallel is drawn between the spatial patterns occurring in the forest and the textures observed in a photograph. Many of the textural features proposed for segmentation or classification of natural scenes are briefly reviewed in chapter 3. The method developed by Laws (1980) is presented in chapter 4. The experiments performed using three panchromatic aerial photographs resampled at various spatial resolutions are presented in chapter 5, followed by the results and a discussion in chapter 6. The technique of principal component analysis used in 3 the experiments reported in this paper is summarized in the Appendix. 4 Chapter 2 Understanding Texture Together with context and tone, texture is an important cue for the human visual system. The lack of correlation between tonal and textural dimensions in a wide range of patterns observed in natural scenes indicates that texture can convey additional infor-mation (Aggarwala 1978). Humans have an amazing ability to process this information and to discriminate a wide variety of textural patterns. 2.1 I n S e a r c h o f a D e f i n i t i o n Despite the importance of texture for our visual system and after more than 20 years of research in psychophysics on the perception of texture, there is still no single, formal, agreed upon definition of texture. The perception of texture is not yet fully understood and therefore no definitive definition can be formulated. Instead of a single definition of general acceptance, many specific definitions of tex-ture are in use. These definitions differ in wording but they all share some basic concepts. Robert M . Haralick (1979) described texture as an organized area phenomenon charac-5 terized by two dimensions. The first dimension relates to the texture primitives, and the second, to the spatial organization of these primitives. 2.2 A F i r s t D i m e n s i o n : t h e P r i m i t i v e s The notion of primitives, sometimes called texels, is central to texture (Ballard and Brown 1982). These primitives are derived from the intensity changes observed in an image. Simply put, the primitives are small regions with identifiable tonal or colour properties (Haralick 1979). They may extend over a single pixel or they may be an aggregate of pixels, in which case they can take various shapes. Aside from tone and shape, primitives with sufficient spatial extent can also be de-scribed by other attributes such as contrast and orientation. A textural pattern may include different types of primitives. Thus, a first characterization of a textural pattern can be based on the attributes of its various primitives. These attributes may be fixed, or varying according to a probabilistic rule. 2 . 3 A S e c o n d D i m e n s i o n : t h e S p a t i a l O r g a n i z a t i o n The second dimension of texture is the spatial dependence of its primitives. This ar-rangement may be regular, as for the bricks in a wall or the scales of the skin of a snake. A textural pattern with regular placement rules is best described by a grammar that could be used for generating it (Ballard and Brown 1982). A grammar provides a structural model. 6 The spatial dependence of the texture primitives may also vary according to proba-bilistic rules, as in an aerial view of the trees in a forest stand. Such textural patterns are best described by a stochastic model. 2.4 O n t h e P e r c e p t i o n o f T e x t u r e The notion of primitives is common to recent theories on vision and texture segmentation (Beck et al. 1981; Marr 1976; Julesz and Bergen 1983). But, there is some disagreement on the importance of placement rules for the discrimination of textural patterns. Two recent models of human vision are considered here. The first one is the theory of Early Visual Information Processing by the late David Marr (1976, 1982) and the second is the Dual Visual System model proposed by Bela Julesz (1984). In both models, the perception of differences in textural patterns is considered more fundamental than form recognition, and does not involve attentive processes such as scrutiny. 2.4.1 The Theory of Early Visual Processing Marr's theory is based on a hierarchy of symbolic representations of an image. The first one, the primal sketch, is the representation of interest for the discrimination of most types of textural patterns. Marr defines it as a primitive but rich symbolic description of the intensity changes occurring in an image. It is formulated in terms of tokens (primitives), which are various types of edges, line segments (thin bars) and blobs (Marr 1976). Each of these tokens has a list of attributes: orientation, size, local contrast, 7 position, and number of termination points. In this model of vision, the perception of textural patterns is implemented by grouping processes acting on a series of first-order statistics computed on the attributes of the tokens of the primal sketch. Grouping is a central concept in Marr's theory. The tokens of the primal sketch can be directly agglomerated into regions (e.g. textural patterns) through recursive grouping processes, if the presence of a distinctive local property is detected. The series of rules controlling the local grouping processes are based on statistics of the distribution of token attributes computed over moderately sized regions. Measures of spatial proximity and collinearity are also used. Examples of the proposed distributions of token attributes are: the number of blobs for different contrast and intensity levels; the number of tokens having a specific orientation; the widths, lengths and contrasts; and the spatial density of tokens or groups of tokens. In his publications, Marr (1976, 1982) provides examples of the potential of this symbolic representation and the grouping processes for texture discrimination. 2 . 4 . 2 A n I m p o r t a n t C o n j e c t u r e In the mid 1960's, after several experiments with random dot patterns, Julesz (1965, 1975) conjectured that textural patterns with identical first- and second-order statistics but different third- and higher-order statistics could not be distinguished by humans. By nth-order statistics, it is meant statistics computed from the nth joint probability 8 distribution of picture elements. For example, a second-order statistic can be computed from the distribution of the values observed at the two points of a dipole repeatedly dropped over an image. If two textural patterns have the same nth-order statistics, all their lower-order statistics are also the same, since the statistics of any given order can be uniquely determined by those of a higher-order (Julesz 1975). An example of textural patterns differing in their second-order statistics is given in Figure 2.1. Julesz and others have since found several counter examples where textures with identical second-order but different third-order statistics could be preattentively dis-criminated (Julesz et al. 1973, 1978). This was shown to be the result of differences in local features (texture primitives). The differences in second-order statistics are largely explained by differences in first-order statistics of local features such as lines and line ends (Pietikainen et al. 1983). Marr's grouping processes use such first-order statistics of local features (the tokens of the primal sketch). 2.4.3 The Theory of the Dual Visual System The two visual systems contemplated in Julesz's Dual Visual System model are the preattentive and attentive visual systems. The first one does not involve scrutiny, while the second one relies heavily on this process. Only the preattentive visual system is of interest here, since the segmentation of most types of textural patterns does not require the use of cognitive processes and happens almost instantaneously, that is within 50 to 100 msec (Julesz 1984). 9 Figure 2.1: Two Random Dot Patterns With Different Second-Order Statistics. 10 According to Julesz's theory, the preattentive visual system can detect differences in local features, anywhere in the image at once. These local features, called textons, are elongated blobs of various shapes (e.g. ellipses, rectangles, line segments) with specific attributes: colour or tone, angular orientation, width, length, binocular and movement disparity, and flicker rate. The set of textons also include line ends and crossings of line segments. Except for this last type of texton, the primitives or local features used in Julesz's model are similar to those of Marr's model. With regard to texture discrimination, the main difference between the models lies in the assertion made by Julesz (1984) concerning the inability of the preattentive system to compute even simple first-order statistics of the local features or to perceive the positional relationship between adjacent primitives. He argues that only differences in texton types or in their density can be detected at the preattentive level. According to Marr's theory, the discrimination of textures in non-attentive vision involves not only detection processes but also construction processes. Furthermore, the rules governing Marr's grouping processes are based on first-order statistics of the local features and on their spatial organization. This brief overview of how the perception of texture is implemented according to two recent theories on human vision clearly highlights the importance of the notion of primitives. The second dimension of texture, the spatial dependence, is not given the same importance in both theories. The Marr's grouping processes use measures of spatial 11 proximity and collinearity of tokens, while much simpler computations (density only) are involved in the preattentive visual system described by Julesz. In the context of the segmentation of images into texture regions, if a supervised approach is to be implemented, it is desirable to use a set of textural features compatible with the human perception of visual textures (Tamura et al. 1978). 2.5 Properties of Textures Texture is essentially a local property of an image, it characterizes the structural rela-tionship of the picture elements (pixels) in a region (Faugeras and Pratt 1980). Textural patterns are very sensitive to the scale of the image and to the tesselation used in the dig-ital recording or transformation of that image (Sayn-Wittgenstein and Kalensky 1974). For a fixed scale, an increasing pixel size will cause a reduction of the relative variation between the pixel values (Sadowski et al. 1978; Markham and Townshend 1981). When the variation in image intensities for neighbouring pixels is reduced, tone tends to dom-inate texture as the cue providing the strongest discrimination between the regions in the image. Texture being a local property, the dimension of the region under observation will also affect the perception of texture (Moik 1980). If it is too small, only the primitive will be visible, the spatial arrangement will not. While tone is very much affected by variations in the illumination, texture is relatively stable under such changes. Humans can recognize the same textural pattern under a wide 12 range of illuminations and contrasts. But, tone and texture are not fully dissociable. Textural patterns differing in their element density, length or width, often differ also in their large scale average intensity (Riley 1981). Texture can also be a hierarchical phenomenon: a particular textural pattern may repeat itself to form a larger textural pattern observable only in an image at a smaller scale (Nevatia 1982). While the leaves may be the primitives of the textural pattern observed in an image of a tree crown, the tree crowns themselves become the primitives in an image of a forest stand. Mathematical and computational descriptions of textural patterns rely heavily on quantitative and qualitative measures of the attributes of the primitives and the place-ment rules. For humans, simple qualitative descriptions of the textural pattern are more intuitive. These descriptions use perceptual texture dimensions to characterize and dis-criminate textural patterns (Laws 1980). Coarseness, density, uniformity, regularity and directionality are only a few of these perceptual dimensions. Coarseness refers to the relative size of the primitives. Density relates to the distance between neighbouring primitives. Uniformity concerns the type of primitives involved in a textural pattern: if only one type of primitive is involved, the texture is said to be uniform. If the placement rule is fixed, then the texture is perceived as regular. Textural patterns are also discrim-inated if they differ by the dominant direction in the arrangement of their primitives. 13 2.6 Image Texture and the Physical World An image is essentially a two-dimensional representation of a three-dimensional world. Some knowledge of this world and of the characteristics of the imaging process may facilitate the interpretation of the textural patterns. For example, the texture fields observed in an image may be caused by a particular mapping of a physical structure occurring in the three-dimensional world (e.g. trees in a forest, buildings in a city) or may correspond to a textured surface in the physical world, such as the black and white squares of a checker board. According to Marr (1982), the main factors responsible for the mapping of the real world into the particular intensity values of an image are: the geometry and the reflectances of the surfaces, the illumination of the scene, and the viewpoint. Texture is usually a property of surfaces and humans tend to relate variations in the size and shape of texture primitives to the orientation and depth of a plausible sur-face in the physical world (Ballard and Brown 1982). A regular textured surface in the three-dimensional scene will have an altered distribution in the two-dimensional image according to the local orientation of the surface with regard to the viewer. Thus, occa-sionally, some key information about the organization of the real world can be inferred from texture gradients. The direction of the gradient, or the direction of the maximum rate of change in the size of the primitives, can be interpreted as the rotation of the surface around the line of sight of the camera. The magnitude of that gradient provides a measure of depth, or the 14 degree of tilt of the surface with respect to the focal plane of the camera. To infer surface slope and aspect from texture gradients, a good knowledge of the characteristics of the projection system is essential to factor out the effects of changing surface orientation from those due to perspective (Riley 1980). Aside from texture gradients, discontinuities in image texture are associated with discontinuities in surface structure or in surface geometry, and can therefore provide additional information about the three-dimensional world (Riley 1980). In general, textures can be grouped in two classes: man made textures and natural textures. The man made textures tend to be more regular, with more precise placement rules and relatively well defined primitives. An aerial view of a city conveys this idea. On the other hand, most natural textures are more easily described by statistical models because they can't be characterized by an obvious fixed primitive or a fixed pattern of repetition (Nevatia 1982). The primitives of many natural textures seem to be just beyond our perceptual resolving power (Laws 1980), to be very close to the size of a single pixel (Ballard and Brown 1982). 2.7 Textural Patterns in Forest Scenes In this thesis, the use of texture analysis for the segmentation of forest scenes is investi-gated. Some of the factors responsible for the textural patterns observed in such scenes are briefly discussed in this section. An image of a forest being a mapping of the three-dimensional forest, the main factors giving rise to textural patterns are: the structure 15 and the other attributes of the forest, the illumination conditions, the viewing geometry, and the recording process. 2.7.1 T h e T h r e e - D i m e n s i o n a l F o r e s t A forest is a complex composition made of various types of objects and surfaces. The ob-jects are trees, shrubs and rocks. The surfaces are small ponds and streams, exposed soil, and ground level vegetation. Variations occur in the distribution and in the attributes of these objects and surfaces. In a monochrome image, this is all mapped into aggregates of pixels of variable tone and size, corresponding to crowns varying in shape, dimension and colour; shadows of trees varying in size and shape; patches of undergrowth (openings) of variable spatial extent and tone, and water bodies of relatively homogeneous tone. The texture observed in forest scenes may be associated with several parameters of the three-dimensional world, and more specifically of the arborescent vegetation. It may be affected by differences in the age of forest stands: fine textures generally correspond to young stands, and coarser ones to mature growth (Whiteford 1978). It may also re-flect changes in species composition and in stand density. Photo-interpreters routinely translate the textural patterns observed in aerial photographs into percentages of crown closure and stand densities, and they also use textural features to supplement the tonal information for species recognition. In a computerized approach, Schneider (1978) was successful in differentiating spruce stands from pine stands using textural features de-scribing the specific branching patterns of the two species. Iisaka et al. (1978) obtained 16 a relative success in a computer-based segmentation of a digitized aerial photograph into two stand densities using textural features. Clearly, an understanding of the biological rules governing the spatial distribution of trees and tree attributes should help in the analysis of textures. In nature, there seems to be a fragile equilibrium between the factors that cause randomness and those that oppose it. While the mechanisms of reproduction tend to favor clustering, natural dispersing processes result in an increasing randomness (Skellam 1952). Two broadly defined biological forces control the spatial distribution of trees: the effect of competition and the common influences (Sayn-Wittgenstein 1970). In general, factors such as soil fertility, topography and micro-climate change grad-ually with distance. These common influences explain the relative homogeneity found in nature. Where a tree of given attributes is encountered, chances are that other trees with roughly similar attributes will also be present in the neighbourhood. The competition among trees affects the distribution in a different manner and acts over shorter distances. Trees have spatial requirements for the development of their crown and root system. This reduces the likelihood of finding a tree within a very short distance of another one. In dense stands, competition will result in a more uniform distance between trees. Spatial distributions in plant communities have been extensively studied by ecologists and forest statisticians. Recently, some researchers in the field of remote sensing have investigated the effects of tree distributions on the sensor responses in order to predict the parameters of these distributions using the sensor responses (Strahler and Xiaowen 17 1981; Franklin 1984). In forest stands as in most plant communities, the spatial patterns are rarely regular, sometimes random, and often clumped (Franklin 1984). These patterns are usually related to the population dynamics. In fully stocked stands where competition is in effect, the distribution tends to be random or regular, rarely clumped (Newnham, 1968). But in sparse stands where the spatial patterns are often clumped, the clumps are generally of variable size (Vandermeer 1981). In a study conducted in Northern California in an area dominated by pine species, Franklin (1984) found no repeating pattern or clumping occurring at sampling rates between 5 and 50 m. 2.7.2 The Illumination The illumination conditions prevailing at the time of image capture have an important effect on the mapping of the real world. Bright sunshine will produce sharply contrasted shadows of tree crowns, with orientation and spatial extent that will depend on the position of the sun. The resulting textural patterns will be mixtures of shadows, crowns, and undergrowth. Differences in stand density will affect the proportion of shadow, crown and undergrowth composing the textural patterns. Under cloudy skies, the shadows will be poorly contrasted, if present at all. Another factor related to the illumination is the presence of haze in the atmosphere. It tends to reduce the contrast in textures. This can reach the point where the textural information is completely lost. 18 2.7.3 The Viewing Geometry Another important factor affecting how the real world is represented in an image is the viewing geometry. If the view angle is truly vertical and the projection is orthographic, the mapping of a group a trees located anywhere in the scene will take the form of a group of roughly circular blobs superimposed over the undergrowth and the shadows of surrounding trees. But if a perspective projection is used, as for typical aerial pho-tographs, the phenomenon of radial displacement will cause a quite different mapping of the crowns of trees located near the edge of the frame. In a similar manner, if the view angle is not vertical, the mapping of the three crowns will take different shapes, from elliptical to triangular. With a perspective projection, topography can confound the mapping of the three-dimensional forest into textures in the two-dimensional image. This confounding in-fluence of terrain is minimized when the interpretation is done stereoscopically, but is difficult to factor out in the analysis of single images. 2.7.4 The Recording Process In the case of a forest scene, the spatial resolution and the spectral sensitivity of the sensor used in the recording and subsequent processes are of prime importance. For example, if in the spectral band chosen softwood and hardwood species have similar spectral signatures, a dense stand of softwoods may present the same texture as a dense stand composed of a mixture of hardwood and softwood species. The same two stands may 19 be mapped into distinguishable textural patterns in a spectral band in which softwoods and hardwoods are contrasted. Losses in texture information will occur if the spatial resolution is so coarse that each pixel corresponds to a mixture of species. In a study on the influence of spatial resolution on classification accuracies, Kan et al. (1975) suggested that the physical size of the features to be detected should be at least four times larger than the scanner resolution in order to ensure total containment of the feature in at least one pixel. In particular cases, the recording process itself may be the cause for a textural pattern that characterizes the whole image, not just a region. The striping in Landsat TM and MSS images is a typical example of such textural pattern. These systematic patterns must be removed prior to local texture analysis. 20 Chapter 3 Computational Approaches to Texture Analysis In the past 15 years, a large variety of techniques have been proposed to translate image texture into useful measurements for machine-aided interpretation. Some of the proposed textural features correlate with the perceptual texture dimensions used by humans (Hsu 1978, 1979; Tamura et al. 1978). But others, even among the most popular and powerful ones, are rather poorly correlated with these qualitative dimensions. An excellent review of early techniques is given by Robert M. Haralick (1979). Tech-niques developed since then are discussed in recent textbooks on image analysis and computer vision (see Ballard and Brown 1982, and Nevatia 1982). The purpose of the present chapter is to present a subset of these techniques and to discuss their advantages and disadvantages. This brief overview will serve as an introduction to the approach selected for our experiments, the texture energy measures (Laws 1980), presented in the next chapter. For this review, the textural features are grouped into the following categories: 21 1. Deterministic approach 2. Stochastic approaches • First-order statistics • Second-order statistics • Measures of spatial frequency • Texture transforms Different techniques for texture analysis do not all address the same problem. This explains, in part, the variety of textural features proposed. A first group of techniques is concerned only with the classification of samples of texture, while a second group is designed also to segment an image into texture fields. With the first group of techniques, a preliminary segmentation of the scene must be achieved by some arbitrary means and each image segment is considered to be a pure sample of a texture. The samples are often obtained by systematically dividing the original image into small regions of equal size. With the second group, image segmentation and texture discrimination are performed simultaneously. Since texture is a property of regions, there is a problem, which bears similitudes with the chicken and egg dilemma. That is, which comes first since textures determine regions and regions determine textures. In the presentation that follows of different computational approaches to texture analysis, it will be made clear if the approach assumes a previously segmented scene or if it is suitable as a segmentation scheme. 22 Deterministic Approach Textural patterns composed of primitives with fixed attributes and organized according to regular placement rules can be fully described by a deterministic model. To derive such structural formulation, individual primitives must first be isolated and explicitly described. The placement rules are then recovered from the spatial organization of the primitives. Parsing the structure of a textural pattern is difficult when both the primitives and the placement rules are unknown (Tamura et al. 1978). Techniques implementing a deterministic approach require an a priori segmentation of the scene into homogeneous texture fields. Few natural textures meet the rigid requirements of a structural model. Neither their primitives nor their spatial arrangement are completely uniform (Nevatia 1982). Instead, they usually exhibit considerable variation. Thus, structural approaches are rarely appropriate for the analysis of natural scenes. Stochastic Approaches Statistical texture measures have been motivated by the lack of simply defined patterns in natural textures (Nevatia 1982). Several of them are supported by Julesz's conjecture on the inability of humans to differentiate textural patterns having similar second- but different higher-order statistics. 23 3.1 First-Order Statistics First-order statistics computed from histograms of pixel attributes are among the sim-plest features proposed for texture discrimination. The standard deviation, skewness, kurtosis and higher moments have been used. The statistics computed describe a re-gion considered to be composed of only one texture. Thus, this technique assumes a previously segmented scene. The first-order statistics are weak textural measures since they do not account for the spatial distribution of the pixel attributes (Nevatia 1982). They fail to discriminate a large variety of textures differing by the size and shape of their primitives or by their spatial arrangement. The mean and standard deviation of image intensities have been shown to be insufficient descriptors of texture fields (Pratt et al. 1978). For several types of textures, the first-order statistics can be made equal and the resulting images will still be perceived as being different textures. 3.2 Second-Order Statistics Some popular features for texture analysis deal with second-order statistics. These fea-tures have been shown to be well suited for the discrimination of several types of natural textures (Haralick et al. 1973; Weszka et al. 1976). Three closely related techniques have been proposed. They are based on gray level dependency matrices, gray level differences and gray level run lengths. All three types of features measure principally the coarseness and directionality in textural patterns. The segmentation of the scene into homogeneous 24 texture fields is generally a prerequisite for the use of these three techniques. 3.2.1 G r a y L e v e l D e p e n d e n c y M a t r i c e s Textural features based on gray level dependency were first proposed by Rosenfeld and Troy (1970), but were popularized by Haralick et al. (1973). These features are defined as functions of joint probability distributions of gray levels. Their computation proceeds as follows: • The original image is segmented into small regions composed of (hope-fully) only one texture. • For each sub-image, a set of gray level joint probability distributions are computed. The distributions are stored into co-occurrence matrices. Each entry Mijjj of such a matrix is the number of cases where a pair of pixels separated by a distance d, in direction 6, have the intensity values i and j. See the example presented in Figure 3.1. • Textural features, defined as functions, are computed from these matri-ces to characterize the texture of each sub-image. At the end of the process, each sub-image is associated with an array of values describing its texture. A classifier operates on these arrays of features to group the sub-images into classes of textures. A total of 14 textural features have been proposed by Haralick et al. (1973). One example is the CONTRAST feature which measures the amount of local variation present in a texture: 25 A s u b - i m a g e ... 0 0 2 2 2 3 1 1 1 3 1 3 1 3 3 3 2 3 2 3 3 2 2 0 2 2 1 0 0 1 j 0 1 2 3 T h e c o - o c c u r r e n c e o m a t r i x f o r a n a n g l e i 1 o f 0 a n d a d i s p l a c e m e n t 2 o f 2 p i x e l s . 3 0 1 2 0 1 2 0 2 2 1 1 2 0 0 3 3 G r a y l e v e l d i f f e r e n c e s 0 1 2 3 f o r s a m e o r i e n t a t i o n 6 8 6 0 a n d d i s p l a c e m e n t . run length 1 2 3 4 5 6 _ 0 G r a y l e v e l r u n l e n g t h s _1 / f o r t h e s a m e a n g l e . e 2 3 1 2 0 0 0 0 4 0 1 0 0 0 2 2 1 0 0 0 4 1 1 0 0 0 Figure 3.1: Three Representations of a Joint Probability Distribution. 26 Tamura et al. (1978) observed that Haralick's contrast feature is heavily affected by the coarseness of a texture and has little correlation with the perceived contrast. Second-order statistics are affected by differences in first-order statistics. The tech-nique of equal probability quantization (Conners and Harlow 1978), also known as his-togram equalization or histogram flattening, is commonly used to equalize the first-order statistics. Although proven to be quite effective in several cases, the gray level dependency approach has a few shortcomings: • If the original number of quantization levels is large, big texture samples are needed to obtain reliable estimates of the joint probability distribu-tions. But increasing the size of the sample also increases the chances of having more than one texture present in it. A solution is to reduce the number of quantization levels but this means a loss of information for textures characterized by low tonal variations. A typical compromise is to use 16 gray levels and sub-images with side lengths in the order of 50 pixels (Pratt 1978) • The number of possible combinations of distances and orientations is large. To save time and computing resources only a limited number of combinations must be selected. Even with a reduced number of displace-ments and orientations, the dimensionality of the feature space remains fairly large and is a major handicap of the technique (Faugeras and Pratt 27 1980). A full set of co-occurrence matrices is needed for each texture sample. • These textural features do not correlate well with perceptual texture dimensions (Tamura et al. 1978). For the textural features which are linear functions of the co-occurrence matrices, the preliminary segmentation and the computation of the co-occurrence matrices themselves can be avoided, as indicated by a study done at McGill University (see Teillet et al. 1981). These features can be derived directly from the pixel values. They are computed in a moving window and the result is stored in a new feature plane at a location corresponding to the center pixel. When this approach is taken, the technique becomes a texture transform as discussed in Section 3.4. 3.2.2 Gray Level Differences Weszka et al. (1976) suggested a group of textural features defined as first-order statistics of local property values as a substitute to the set of features based on the full joint probability distribution. The local property selected is the absolute difference between the gray tones of a pair of pixels separated by a distance d in direction 9. The distribution Pay of these differences is stored in a 1 x K array. Each entry of the array is the number of cases where a difference i (absolute value) is observed, in an image with K quantization levels. An example of such distribution is given in Figure 3.1. The CONTRAST feature is an example of textural feature derived from Pjj : 28 K-1 Cd,e — i2 Pd,$,i where i is a gray level difference. This textural feature is equivalent to the CONTRAST feature derived from the co-occurrence matrices. In fact, the correspondance between Haralick's co-occurrence ma-trices and the probability density function of the gray level differences is direct. For a specific angle and displacement combination, each entry of is the sum of all entries of the co-occurrence matrix Md,e along lines parallel to the main diagonal (Wesska et al. 1976). Textural features derived from the distribution of gray level differences are used in exactly the same manner as those based on co-occurrence matrices. The textural pattern of each sub-image is described by an array of features. These arrays are input to a classifier to produce a grouping of the sub-images into classes of similar textures. In a comparative study, similar results were obtained with textural features derived from co-occurrence matrices and other features based on gray level differences (Weszka et al. 1976). The technique using the gray level differences suffers from problems which are similar to those mentioned for the gray level dependency approach. Its main advantage is a substantial reduction in the storage requirements (1 x K arrays vs K x K matrices). 29 3.2.3 Gray Level Run Lengths A third technique in this category uses textural features based on distributions of gray level run lengths for various orientations. A gray level run is defined as a maximal string of consecutive, collinear picture points having the same gray level (Galloway 1975). A series of histograms are produced by finding the number of runs for each length encoun-tered, for various orientations. An example of such histogram is given in Figure 3.1. The histograms are stored in matrices having the lengths and gray levels for indices. To reduce the size of the matrices, the original range of gray levels is generally remapped in fewer intervals. The textural features are defined as functions of these distributions. An example is the RUN PERCENTAGE feature: RPce = E E ^ ,=0 j-l 1 1 where Pij,e is the number of runs of length j in direction 6, with gray levels falling in the i th quantization interval. N is the number of pixels in the sub-image. This feature is the ratio of the total number of runs in a given direction, to the total number of possible runs of length one. It is, to some extent, a measure of coarseness. In a comparative study, textural features derived from the distribution of run lengths gave poorer results than those based on co-occurrence matrices or gray level differences (Weszka et al. 1976). 30 3.3 Measures of Spatial Frequency Textural patterns differing in coarseness, periodicity or directionality may sometimes be differentiated using textural features derived from the distribution of spatial frequencies present in the patterns. Two techniques employ such features. The first one is based on the autocorrelation function, and the second one, on the Fourier transform of the textural patterns. These two representations are closely related: the Fourier transform of the autocorrelation function of an image is equal to the absolute value squared of its Fourier transform (Pratt 1978). Both techniques require that the scene be first divided into sub-images (samples) containing only one texture. In general, frequency domain features have little application to randomly spaced texture elements as encountered in most natural textures (Laws 1980). Their usefulness is limited to patterns having definite periodicity or directionality. 3.3.1 Autocorrelation Function The autocorrelation function measures the degree of match between an image and a shifted version of itself (Haralick 1979). It provides information about the size of the primitives and to some extent about their spatial arrangement. The equation for the discrete case is: 1 y - M - . ' - l j j (M-i)(N-j) ^ n = 0 Im,n • l m + « ' , n + ; MN <^m=0 <^f»=0 m.n 31 where M and N are the two dimensions of the image and i and j are the displacements in the x and y directions respec-tively. The autocorrelation function of a coarse texture falls slowly with increasing shifts (i and j). Periodicity causes cyclic falls and rises of the function. Uncorrelated or poorly correlated patterns are common in natural textures and the autocorrelation function fails to properly characterize such patterns. The autocorrelation function is not a sufficient texture descriptor (Pratt et al. 1978). 3.3.2 Fourier Transform The Fourier transform is a global computation frequently used in image processing for restoration and enhancement purposes. In the context of texture analysis, each texture sample is mapped in a new coordinate system (spatial frequencies) and textural features are based on the peaks observed in the new mapping. The Fourier transform re-expresses a function (e.g. an image) as a series of sinusoidal waves varying in amplitude, frequency and phase. The equation of the transform for the continuous case is: 7(u,v)= f °° / +°° / ( z , j / ) e - 2 " > * + ^ dxdy J —00 J—CO where i = \/— 1, and u and v are spatial frequencies. 32 The transform is generally a complex valued function. It can be formulated in terms of its real R and imaginary I components, but a more useful representation is based on the magnitude (amplitude) and phase components: magnitude : M(u,v) = sJJl2 (u,v) + I2(u,v) phase :${u,v) = tan i j j ^ ^ ) Most of the applications of the Fourier transform in texture analysis used the power spectrum, which is simply the square of the magnitude. The usual presentation of the power spectrum places the origin of both axes in the middle of the new image. Peaks close to the center indicate low frequencies (long period) while those closer to the border of the image represent high frequencies (short period). A periodic pattern has peaks corresponding to its period in its power spectrum. The peaks are located near the origin for a coarse texture, while a finer one has its peak spread out over a wider range of frequencies. Textural features based on Fourier transforms are functions of these peaks. A typical textural feature is the average value in a partition of the power spectrum. Two different partitions have been proposed: radial bins and angular bins (see Fig-ure 3.2). The features computed over radial bins measure the coarseness and periodicity 33 of the pattern. Those associated with angular bins measure directionality. A coarse texture has a higher average value in a radial bin close to the origin. A textural pattern with strong linear features with a preferred orientation has its major peaks in a particular angular bin. In a comparative study (Weszka et al. 1976), the Fourier features did not perform as well as textural features based on second-order statistics. They were also more expensive to compute. Very few studies have used the phase component of the Fourier transform for texture analysis. In one such study, the texture information content of the phase component proved to be low for the textural patterns under consideration (Eklundth 1979). The main disadvantage of the Fourier-based features is that they fail to properly characterize natural textures, in which randomness is common. This class of textures have their peaks scattered in the frequency domain (Nevatia 1982). Because Fourier features can only be defined for texturally homogeneous samples, they are not suited for implementation in an automatic segmentation-classification scheme. 3 . 4 T e x t u r e T r a n s f o r m s Several of the most recent techniques are based on texture transforms, which are feature planes produced by applying local operators all over the original image. In most cases, the operators are functions of the intensities present in a moving window successively centered over each pixel. The value returned by the operator is assigned to the center 34 Fourier Features ffc ' V , H i ^ v / I K J J ) u / r \ u radial bins angular bins Figure 3.2: Two Partitions of the Fourier Power Spectrum u and v are spatial frequencies. 35 pixel of the window and stored in a texture transform plane. Each pixel in such texture transform is a measure of a textural property for a neighbourhood of fixed size in the original image. The texture transform planes are extremely well suited for both the region clas-sification with a priori segmentation and the simultaneous segmentation-classification problems. They may serve as input for some of the classifiers commonly used in statis-tical pattern recognition. But the assumption of normality required by some classifiers (e.g. maximum likelihood) is often violated by the texture transforms. The various techniques included in this category differ mainly by their local operators. On that basis, three groups of techniques are identified. They are: • Local statistics of image intensities. • Spatial filtering. • The distribution of local features in a neighbourhood. Techniques in the first two groups involve only one level of computation: the output of the operator applied to the original image is directly the texture transform. For techniques in the third group, two levels of computation are required. At the first level, local features are detected by a local operator and the result is stored in a temporary plane. At the second level, the temporary plane is processed with a different local operator that computes a statistic of the distribution of the local features. The performance of these techniques is dependent on the selection of an appropriate window size. 3 6 3.4.1 Local Statistics of Image Intensities Texture transforms corresponding to first-order statistics of the local distribution of image intensities have been developed by Hsu (1978). Within a window successively centered over each pixel, a number of statistics are computed. Hsu proposed two window sizes: 3 x 3 and 5x5. For the smaller window, he defined 17 features, including the first four central moments (average, standard deviation, skewness, kurtosis). For the bigger window, six additional features are defined. They are related to the location of intensity minima and maxima in the horizontal and vertical directions. The technique is simple and computationally fast. Classification accuracies of 85-90% were reported for a case of automated land-use mapping using digitized panchromatic aerial photographs (Hsu 1978). In a study using Landsat MSS data (Irons and Petersen 1981), Hsu's textural features failed to define texture classes that would correlate well with the land-cover classes present in the scene. The authors attributed the differences between their results and those reported by Hsu mainly to the different spatial resolutions of the data. The Landsat MSS data has a resolution of 79 x 79 m, comparatively to 2.67 x 2.67 m and 17.3 x 17.3 m for the data used in Hsu's work. In both studies, the third and fourth central moments (skewness and kurtosis) did not produce useful information for the discrimination of land-use classes. Still with Landsat MSS data, Woodcock, Strahler and Logan (1980) reported some success with the standard deviation textural feature for a 3 x 3 window. 37 The main problem with this technique is that features based on first-order statistics fail to differentiate a wide class of textures, as mentioned in Section 3.1. 3.4.2 Spatial Filtering Iisaka et al. (1978) proposed a series of texture transforms produced by spatial filter-ing. Their approach considers texture as a spatial distribution of area extensive objects characterized by their size, shape and reflectance or emissive properties (Iisaka 1979). By filtering the original image with a set of predefined patterns or spatial environments (lattice filters), response values are obtained and stored in texture transforms. The size of the filters is typically 3 x 3 or 5 x 5. Each filter represents a different pattern: horizontal lines, vertical lines, diagonals, squares, etc. The original image is convolved with the filters, and for every pixel, the response value obtained is a measure of the resemblance of the sub-image with the pattern described in the filter. For a specific data set, texture transforms derived from spatial filtering gave better results than others derived from the gray level dependence (Iisaka et al. 1978). Because of the different spatial patterns stored in the masks, the technique has the potential of detecting differences in first-, second- and possibly higher-order statistics. It can roughly quantify the perceptual texture dimensions of directionality and coarseness. The technique of spatial filtering, as implemented by Iisaka et al. (1978), has a serious drawback: due to the non zero mean of the filters, the response values are quite affected by broad shifts in the image intensity. Such shifts can be caused, for example, by 38 illumination gradients across the scene at the time of image capture. A simple solution is to modify the filters so that their new mean is zero, using negative and positive values instead of only positive integers as proposed by Iisaka et al. (1978). 3.4.3 Distribution of Local Features A few statistical approaches dealing with local features (texture primitives) and the local density of these features (measure of spatial arrangement) have been suggested. The general scheme followed by these techniques proceeds in two steps: first, the presence of local features is detected and the result is stored in a temporary image; then, a local operator applied to the temporary image computes a first-order statistic of the distribution of the local features. The final output is a texture transform in which the gray tone at resolution cell (t, j) indicates how common is the local feature around resolution cell (i,j) in the original image (Haralick 1979). Proposed local features are: intensity extrema (minima and maxima), edges, and combinations of lines, edges and spots. Using Intensity Extrema Relying on the evidence that the human visual system responds to local extrema in an image, Carlton and Mitchell (1977) developed a technique to produce texture transforms based on the location and relative magnitude of local extrema. On a given scan line, an intensity extremum of magnitude T is assigned to a pixel if its intensity value is the 39 lowest (highest) immediately before the intensity rises (drops) by T levels above (below) the value of that pixel (see Figure 3.3). The computations proceed as follows: 1. A logarithmic transform of the image intensities is performed to simulate the response of the human visual system. 2. The logarithmic image is scanned both vertically and horizontally to locate different sizes of extrema, corresponding to different thresholds. The outputs of the two perpendicular scans are then combined. 3. In a moving window applied to the resulting image, the sum of each size of extremum is computed and assigned to the center pixel of the window in a texture transform. A different texture transform is produced for each size of extrema. Carlton and Mitchell (1977) used three different thresholds for low level, medium level and high level extrema. Carlton and Mitchell noted the importance of the window size, which can be varied to allow for a hierarchical segmentation. Texture transforms produced with large windows resulted in the segmentation of the image into broad regions, which can be further detailed with texture transforms produced with smaller windows. The technique is computationally fast and simple to implement. Examples of im-age segmentation provided by Carlton and Mitchell, where forest and grass were easily discriminated using textural features only, indicate a good potential for the analysis of natural scenes. With such texture transforms, differences in texture coarseness, contrast and density (spatial arrangement) should be detectable. 40 6 -5 -p i x e l o n s c a n l i n e Figure 3 .3 : Intensity Extrema Along a Scan Line A indicates an extremum of 2 B indicates an extremum of 3 41 Using Lines, Edges and Spots As early as 1970, Rosenfeld and Troy proposed a measure of edgeness, or the amount of edges per unit area, for the classification of textures differing in coarseness. The edge detector selected for their study was the Roberts gradient, which is the sum of the absolute values of the differences between diagonally opposite neighbouring pixels. Triendl (1972) exploited a similar idea but with a different edge detector: a 3 x 3 Laplacian. His technique consisted in filtering the original image with the feature detector and then smoothing the output. The resulting texture transform may be used as input for supervised or unsupervised classifications. Triendl also computed a texture transform corresponding to a double filtering of the original image with a 3 x 3 averaging filter. These two texture transforms contained enough information to provide a relatively good segmentation of a forest scene containing stands varying in age and height. Triendl suggested that a larger set of local feature detectors could be used to detect the presence of line segments, edges, and spots, with various widths, contrasts and directions. An articulated implementation of this idea of local feature detection for texture anal-ysis has been developed by Laws (1980). This implementation is presented in detail in the next chapter. 42 Chapter 4 Laws5 Texture Energy Measures In 1980, in a Ph.D. thesis published at the University of Southern California, K.I. Laws introduced a new set of texture transforms based on the degree of match between a pixel neighbourhood and a set of standard spatial masks. Called texture energy measures, these image transforms are well suited for the segmentation and the classification of scenes composed of natural textures. The new approach has raised a lot of interest among the researchers involved in texture analysis (Davis 1982; Harwood el al. 1983), and is now presented as a major technique in recent textbooks on image processing and machine vision (Nevatia 1982, Ballard and Brown 1982). This is the approach selected for the experiments described in chapters 5 and 6. The general idea is to detect the presence of local features such as edges, lines and spots, by filtering the image with a set of spatial masks, and then to produce a series of transforms by computing a local first-order statistic of these features. In the texture transforms thus produced, the value of a pixel represents a quantitative description of the textural pattern present in the immediate neighbourhood of that pixel. 43 As mentioned in the previous chapter, Triendl (1972) proposed a fundamentally sim-ilar idea a few years earlier. But the texture energy approach is a more articulated implementation involving a specific set of spatial masks, the definition of which is the main novelty in Laws' approach. 4 . 1 T h r e e C o m p u t a t i o n a l P r o c e s s e s The sequence of computations required for the production of the texture energy measures is illustrated in Figure 4.1 and summarized in the following three steps: 1. Spatial Filtering A series of new images is produced by convolving the original one with a set of small spatial filters, called micro-windows by Laws, that are typically 3 x 3 or 5 x 5. 2. Non-Linear Local Transform Each new image is converted into an energy plane by processing it with a local statistical operator. The macro-statistic retained by Laws, called ABSAVE, is the average of the absolute values in a macro-window cen-tered over this pixel. The size of the macro-window, typically 15 x 15, is chosen to be smaller than the expected smallest texture field. 3. Data Compression Using the principal components transformation, the energy planes are linearly combined into a smaller number of images with higher informa-tion content. 44 LAIS' TEXTURE ENERGY MEASURES The Process Figure 4.1: Laws' Approach to Texture Analysis The process starts with the original image (/). Eight filtered images (F) are produced. They are converted into energy transforms (E) for which principal components (C) are computed. A classifier is used to produce a segmented image ( 5 ) . 45 The principal component planes are the end product of Laws' technique. Only a subset of these planes is required for the segmentation of the scene. All the intermediate images, including the texture energy transforms, can be discarded. 4.2 The Spatial Masks Laws' spatial masks were empirically derived by experimenting with a large number of filters proposed in the literature, and by testing these with various samples of natural textures. The masks retained have the common characteristic that they can all be constructed by simply multiplying three basis vectors by their transposes. The basis vectors are: Laws indicated that these three vectors span the space of the 1 x 3 vectors, meaning that any vector of this size can be expressed as a linear combination of the three basis vectors. A set of nine 3x3 filters results from the multiplication of the transposes of the basis vectors by the basis vectors themselves. The nine masks are presented in Figure 4.2. It can be shown that these nine masks are linearly independent and thus span the space of 3x3 vectors. Any 3x 3 matrix can be expressed as a linear combination of these masks. S3 L3 E3 ( 1, 2, 1 ) — a local averaging (-1, 0, 1 ) — an edge detector (-1, 2,-1 ) — a spot detector 46 L3L3 L3E3 L3S3 1 2 1 -1 0 1 -1 2 -1 2 4 2 -2 0 2 '-2 4 -2 1 2 1 -1 0 1 -1 2 -1 E3L3 E3E3 E3S3 1 -2 -1 1 0 -1 1 -2 1 0 0 0 0 0 0 0 0 0 1 2 1 -1 0 1 -1 2 -1 S3L3 S3E3 S3S3 1 -2 -1 1 0 -1 1 -2 1 2 4 2 -2 0 2 -2 4 -2 1 -2 -1 1 0 -1 1 -2 1 Figure 4.2: The Nine Masks Proposed by Laws (1980) The transpose signs are not shown, but L3E3 stands for LZ'EZ. 47 If the local features in the texture pattern are expected to be more complex than those depicted by the small masks, a set of twenty-five 5 x 5 masks, or forty-nine 7 x 7 masks, can also be derived from the original three basis vectors. The resulting masks are mostly center weighted or symmetrical. Research done at the University of Maryland with the set of 5 x 5 masks indicated that the pattern described is more important than the actual numerical values stored in the mask (Pietikainen el al. 1981). An important characteristic for eight of the nine 3 x 3 masks is that the sum of their entries is zero. This ensures that the response values obtained by convolving an image with these masks will be invariant to broad changes in the average gray level. The ninth mask, labelled L3L3, does not have the zero sum property. This mask provides a measure of the intensity in a micro-window, from which a measure of the local contrast can be derived for the macro-window. Several of the nine 3 x 3 filters proposed by Laws are not new, they have been used in edge detection before. For example, the S3S3 filter is one of the Laplacian masks, a rotation invariant second derivative operator and the L3E3 and E3L3 masks are the two Sobel gradient operators. 4 . 3 The Macro-Statistic To extract the texture information contained in the filtered images, Laws experimented with several local statistics computed in a moving window. He concluded that either the 48 variance or the standard deviation alone is sufficient to extract this information. But for a zero-mean field such as produced by convolving a sub-image with a zero sum mask, the local standard deviation can be replaced by a much quicker computation, the sum or mean of the absolute values of the convolved image. Laws called this statistic the ABSAVE macro-statistic. Pietikainen el al. (1981) indicated that the local maxima of the responses of the masks is almost as good as the ABSAVE statistic for the extraction of the texture information from the filtered images. In order to produce texture energy measures that are invariant to broad changes in average gray level, such as those caused by changes in illumination, Laws suggested that the energy planes be computed only for the zero mean masks and that these planes be ratioed with a plane he called L3L3SDV. Each value in the L3L3SDV plane is a measure of the contrast occurring in the neighbourhood of each pixel in the original image. This plane is produced by first filtering the image with the L3L3 mask, and then computing the standard deviation (not the ABSAVE statistic) in a moving window. 4 . 4 The Principal Components According to Laws, the principal components planes represent natural texture dimensions and are more reliable texture measures than the texture energy planes themselves. They are also better for visual display. The technique of principal components analysis is described in more detail in the Appendix. 49 The principal component planes can be seen as a series of overlay planes with the same tesselation. Thus, each pixel of the original image is associated with a texture feature vector composed of its value in each of these planes. Using the feature vectors as input, the segmentation of the scene is possible with simple per-pixel classifiers such as the nearest-centroid classifier, or with more intelligent segmentation algorithms followed by an editing phase. 4.5 The Performance The usefulness of the texture energy measures for the segmentation of natural scenes and for the classification of natural textures has been demonstrated by Laws. For a specific data set, he obtained much higher classification accuracies (maximum of 94%) than those possible with the most popular approach, based on co-occurence matrices (maximum 72%). The power of these energy measures and their superiority over measures based on joint distributions of pixel values has been confirmed by other experiments in which hit rates of nearly 90% were achieved in the classification of samples of natural textures (Pietikainen el al. 1981, 1983). 4 . 6 Advantages and Disadvantages The selection of Laws' approach for the experiments presented in this thesis was moti-vated by its numerous advantages and relatively few disadvantages over most of the other techniques proposed for the analysis of natural textures. To summarize, the advantages 50 Since the texture energy planes (or principal components) can be handled like supplementary spectral channels, the technique is particularly well suited for image segmentation using popular per-point classifiers. The discrimination power of the texture energy measures have been shown to be greater than that of features derived from co-occurence matrices, for at least several samples of natural textures. The computations involved are simple and relatively quick to perform using stan-dard software for convolution and box filtering (McDonnell 1981). The whole pro-cess can even be speeded up substantially with special purpose software to handle all the convolutions at once, using directly the three basis vectors. The approach is intuitively more appealing and more tractable than other popu-lar techniques. It also seems quite compatible with the models of human vision proposed by Julesz (1984) and Marr (1982). Since Laws' set of nine 3 x 3 spatial masks spans the space of 3 x 3 vectors, no other set of vectors of the same format will do better. The textural measures derived using the zero sum masks are invariant to gradual changes in average brightness. Therefore, two sub-images differing in their average gray level but depicting the same textural pattern will be classified as being the same. Using the L3L3SDV plane for ratioing the ABSAVE planes, the texture energy measures can be made contrast invariant, although Laws indicated that the effect of this correction on the classification accuracy is small. 51 There are of course some problems associated with Laws' approach. Like any tech-nique involving spatial filtering, the window size must match the resolution and the scale of the original image. The determination of an acceptable micro-window size requires an a priori knowledge of the size of the local features composing the textural patterns present. For the macro-window size (for the ABSAVE statistic), a priori knowledge about the average size of the textured fields is required. The lack of a priori knowledge can be compensated by experimenting with different sizes of windows, at the cost of more time and computational resources. 52 Chapter 5 Material and Methods The objective of the experiments conducted for this thesis is to investigate the use of Laws' texture energy measures for the segmentation and classification of the forest cover on panchromatic aerial photographs. The specific goal is to relate textural information to the percentage of crown closure. 5.1 The Photographs The two forest scenes selected for these experiments are large format (23 cm x 23 cm) panchromatic aerial photographs at a nominal scale of 1:15000. They were taken over sites located in the vicinity of the Gouin Reservoir in Quebec. The two sites are in the boreal forest region (Rowe 1972). More precisely, the first one, referred hereafter as the Lake Agata site, is in Section B.3 (Gouin), while the second one, referred as the Lake Relay site, is in Section B.l.a (Laurentide-Onatchiway). Both sites are covered by relatively similar forests composed mainly of coniferous species, which typically account for about 83% of the volume. The dominant species, in order of decreasing importance, 53 are: • Softwood Species • black spruce • balsam fir • jack pine • white spruce (Picea mariana (Mill.) BSP.) (Abies balsamea (L.) Mill.) (Pinus banksiana Lamb.) (Picea glauca (Moench) Voss) Hardwood Species • white birch • trembling aspen • balsam poplar (Betula papyrifera Marsh.) (Populus tremuloides Michx.) (Populus balsamifera L.) The geographic location of the sites is shown in Figure 5.1. Here are more details about each site and the associated photograph: • Site 1 Code name Lake Agata Location — longitude 74°52' West — latitude 48°49' North Elevation 420 to 520 m above sea Photograph — reel no Q-72810 — photo no 182 — focal length 153.37 mm — date June 5, 1972 — hour approx. 08:30 (Eastern Daylight Time) — sun elevation 32° — sun azimuth 91° (clockwise from North) 54 • Site 2 Code name Location — longitude Lake Relay 73°28' West 48°10' North 500 to 580 m above sea Q-72326 36 152.47 mm June 17, 1972 approx. 08:45 (Eastern Daylight Time) 35° — latitude Elevation Photograph — reel no — photo no — focal length — date — hour — sun elevation — sun azimuth 94° (clockwise from North) On both sites, the topography is relatively smooth. There is no shadowing due to topography except for a small landform (probably an esker) on the lake Agata scene. A central portion of each photograph measuring 19 cm x 19 cm has been digitized at a resolution of 100 micrometers (1900x 1900 pixels), corresponding to a spatial resolution of about 1.5 m on the ground. For each scene, three more images were produced by degrading this high spatial resolution to 5, 10 and 20 meters with a resampling algorithm using a four points bi-cubic convolution. Figure 5.2 presents the 1.5 m and 5 m resolution images for both sites. The four spatial resolutions were selected in order to investigate the potential of texture analysis with imagery currently produced by airborne multispectral scanners (5 m and 1.5 m), or to be produced by the sensors of the S P O T satellite (10 and 20 m) scheduled for launch in late 1985 or early 1986. 55 Figure 5.1: Geographic Location of the Two Sites 1 — Lake Agata 2 — Lake Relay 56 Lake Relay - 5 m original original Figure 5.2: T h e Original Images at the 1.5 and 5 m Resolutions 57 For the three coarser tesselations, the images cover the same area, but have different number of rows and columns: 128 x 128 for the 20 m resolution, 256 x 256 for the 10 m resolution and 512 x 512 for the 5 m resolution. To cover the same area at a resolution of 1.5 m, 1900 rows and 1900 columns are required for close to 4 megabytes of data. Images of this size would quickly use up the disk space available. Therefore, only subimages of 512 rows and 512 columns were used at the 1.5 m resolution. This size corresponds to the maximum resolution of the image display device. The location of the subimages on the original photographs is given in Figure 5.3. 5.2 The Reference Data Three segmentations of each photograph made independently by three experienced photo-interpreters constitute the reference data. The three photo-interpreters are: Horst Prager, Raymond Genest and Pierre-Paul Guimond. Their work and the photographs were graciously provided by the Forestry Service of the St-Maurice Woodlands Division of CIP Inc., a large Canadian pulp and paper company. These segmentations conform to the stratification criteria of the Inventory Branch of the provincial Ministry of Energy and Resources in Quebec. For the purpose of this research, information concerning species, age structure and height was dropped, to keep only the cover type (softwood, mixed wood, hardwood) and the four classes of crown closure: 58 I— 5.1 cm — I 1 19 c m 2 3 c m Figure 5.3: Positions of the Digital Images on the Photographs 1 — full photograph 2 — 5, 10 and 20 m resolutions 3 — Lake Agata - 1.5 m resolution 4 — Lake Relay - 1.5 m resolution 59 1 8 1 % and above 2 6 1 % — 8 0 % 3 4 1 % — 6 0 % 4 2 5 % — 4 0 % Since the boundaries between stands were drawn on transparencies overlayed on the photographs, the registration of the photographs and the associated interpretations was straightforward. The interpretations were digitized in vector format using a digitizing tablet. 5 . 3 Equipment The equipment of the Laboratory for Computational Vision used for this thesis include: • an Optronics System C-4500 scanning digitizer • a DEC VAX 11/780 computer • a Raster Technologies Model One/25 image display workstation Other input and output peripherals devices were also utilized. 5.4 Software Required To implement Laws' approach, several programs are needed. Some of these were already available, the rest had to be developed. The principal programs required are: 60 CONVOLVE To perform the spatial filtering. Written by Alan Carter. LOCSTAT To compute a variety of local statistics in a moving window. Written by Francois Dumoulin. RATIO To produce an image by ratioing two other images. Written by Francois Dumoulin. REBPP To requantize an image into 8 bits per pixel for display or data compression purposes. Written by Francois Dumoulin. PRINCOM NNC To perform a principal component analysis and/or transform. Written by Francois Dumoulin, based on a program written by Tim Lee. A nearest centroid classifier (misnamed nearest neighbour). Written by Stewart Kingdon. A few other programs written by Stewart Kingdon, Marc Majka and Francois Du-moulin were also used. 5.5 The Experiments The main experiments conducted for this thesis are simply the application of Laws' approach to each of the four tesselations of the two original scenes. In each case: the process starts with the digitized image and ends with the production of the principal component planes and statistics. The results, in image and tabular 61 formats, are then interpreted to discover what type of information is present, and if it correlates in any way with differences in the percentage of crown closure. Every digital image has been processed in exactly the same manner. The computa-tions involved are summarized in the following five steps: 1. Nine new images are produced by filtering (convolving) the original image with each of the nine 3 x 3 masks. The 9 images produced are called feature planes. 2. An image labelled L3L3SDV, which is a measure of the local contrast in the original image, is produced by computing the local standard deviation of the feature plane associated with the L3L3 mask. A moving window of size 15 x 15 is used for this computation. 3. For each of the other eight feature planes, the average of the absolute values is computed in a macro-window of size 15 x 15 successively centered over each pixel. The resulting images are called energy planes. 4. To achieve contrast invariance, each of the energy planes is divided by the L 3 L 3 S D V plane. To ensure that the resulting image will have an optimal dynamic range, the following ratio (Hord 1982) is used: I3 = R arctan (7X / I2) If the two images involved in the ratio each have a range of 0 to 255, the resulting image will also have the same range of values when R is equal to 162.34. 5. With the principal components analysis, the dimensionality of the feature space is reduced from eight images to typically three or four. In addition to these computations, for the two 1.5 m tesselations, a classification of the scene into two classes of crown closure was done using a nearest-centroid classifier. 62 The classification accuracy was assessed by visual comparison with the ground truth images. To ensure that no artefact would be caused by the edge of the image in the convolution and macro-filtering processes, a buffer of 8 pixels was left on each side of each digital image. This buffer was removed just prior to the ratio phase. Therefore, for a 5 m tesselation, the size of the image at the beginning of the process is 528 x 528, but is reduced to 512 x 512 for the energy transforms and principal component planes. The pixel values are stored in integer format. Therefore, to minimize the loss of information due to truncation in the ratioing process, the dynamic range of the energy planes and the L3L3SDV plane was modified to fill the full 0-255 range (8 bits per pixel). In the experiments reported here, the principal components are derived from the correlation matrix instead of the more commonly used variance-covariance matrix. The choice between these two matrices is not a trivial one to make (see the appendix). One possible justification for using the correlation matrix in this particular case is the fact that the possible range of response values is not the same for all of Laws' masks. For example, if the original image has 256 (0-255) levels of quantization, the E3E3 filter can return values in the range -510 to 510, while the L3S3 filter can return values in the range -2040 to 2040. It is therefore possible that for a pattern similar to the one described by the E3E3 mask, the value returned by the L3L3 filter is very close to the one returned by the E3E3 filter. But, since the dynamic range of each energy plane is standardized prior to the ratioing process, the principal components derived from both matrices should be roughly similar. 63 . Apart from the main experiments, other trials to simplify and to speed up the com-putation and to improve the results were also conducted. The results of these trials are summarized at the end of the next chapter. 64 Chapter 6 Results and Discussion The investigation of Laws' texture energy measures was carried out first with the digital images derived from the Lake Agata scene. The Lake Relay scene served afterward to test the analysis for consistency. In this chapter, the digital images are referred by the name of the site and by their spatial resolution, e.g. L. Agata - 10 m. The output of the principal components analysis is in the form of statistics and images. Four groups of statistics are computed by the program PRINCOM: 1. The correlation matrix, which indicate how similar are the eight texture energy transforms. 2. The eigenvalues and the percentage of the total variation explained by each principal component. 3. The eigenvectors or the weight of each input image in the composition of a given principal component. 4. The coefficients of correlation between the texture transforms and the principal component planes. These coefficients indicate what type of local feature dominates a given principal component. 65 There is a lot of redundancy in these data. Therefore, only some of these statistics are reported and discussed in the following sections. Since the three coarser tesselations gave very similar results, the presentation and discussion of the results is done in two phases: first, for the 5, 10 and 20 m resolutions; and then for the 1.5 m resolution. The figures and tables are grouped at the end of this chapter. To facilitate their interpretation, here is a simple description of the spatial pattern represented in each For the sake of simplicity, the texture transforms are hereafter referred by the name of the filter involved, e.g. the ESLS texture transform. First, the results for the Lake Agata scene are described. The correlation matrices for the texture energy transforms associated with the 5, 10 and 20 m spatial resolutions are presented in Table 6.1. At all three spatial resolutions, the high coefficients of filter: E3E3 E3L3 E3S3 L3E3 L3S3 S3E3 S3L3 S3S3 Crossing of edges Horizontal edge Horizontal edge (complex pattern) Vertical edge Vertical line Vertical edge (complex pattern) Horizontal line Central peak (Laplacian derivative) 6.1 The Coarser Spatial Resolutions 66 correlation indicate that the texture transforms are all measuring essentially the same thing, responding to the same features of the original image. Consequently, the first principal component explains virtually all (93 to 96%) of the variation occurring in each data set. This is clearly shown in Table 6.2. Except for the second component at the 20 m resolution, all the other components explain very little of the total variation, that is less than 2%, and are therefore ignored. The coefficients of correlation between the principal components planes and the tex-ture transforms are presented in Table 6.3. The texture transforms are all strongly correlated with the first principal component and have little correlation with the sub-sequent components, except for the second component at the 20 m resolution. This is consistent with the high coefficients of correlation observed between the texture trans-forms themselves. As an aside, the three possible outcomes of the spatial filtering performed by con-volving a sub-image with any of Laws' masks are: • If there is no variation occurring in the image window, that is the pixels have almost the same gray level, then, due to the zero mean property, all eight filters will return a value very close to zero. • If the sub-image presents a clear spatial pattern, then the mask depicting the most similar pattern will return the highest value. This is in relative terms, since the range of possible response values is not the same for all masks, as discussed in Section refrange. • If some variation occurs in the window but with no local cohesion, i.e. no complex texture primitives, then all the filters will return relatively 67 similar values. This last outcome seems to be the best explanation for the results obtained with the three coarser resolutions. In the original image (see Figure 5.2), one can effectively detect a significant variation in the forest cover, but no clearly defined local features. The variation is essentially occurring at the level of single pixels, not of aggregates of pixels. The preceeding observations, based on the statistics, are confirmed by the visual interpretation of the principal component images. The first two principal component planes for the Lake Agata - 5 m tesselation are given as examples in Figure 6.1. In the first component, regions where little or no variation occurs, such as the lakes and the swamps, are sharply contrasted with those where significant variation is apparent. The high values, or bright tones, indicate the presence of sufficient variation and correspond essentially to the forest cover. The low values, or dark tones, locate the water bodies and barren sites. The second and subsequent principal component images are very noisy and do not correlate visually with the stratification of the scene into classes of crown closure shown in Figure 6.2. The three tesselations derived from the Lake Relay scene were also processed with Laws' technique. The results are comparable, as shown by the statistics presented in Tables 6.4, 6.5 and 6.6. 68 The stronger second component obtained at the 20 m resolution with both scenes remains unexplained, even after a careful visual analysis of the second component plane. In both cases, the maximum coefficient of correlation involves the E3L3 texture trans-form. It suggests that these two second components would be locating concentrations of horizontal features (lines, edges) not detected at the 5 and 10 m resolutions. But whatever information is mapped in these components, it does not correlate visually with the percentage of crown closure. With this type of forest scene and these coarse spatial resolutions, Laws' approach fails to generate textural features good enough to support a detailed and useful segmentation of the forest cover. It can at best provide a rough segmentation of the image in only two classes: forest and non-forest. Such a simple segmentation could be achieved more efficiently without texture analysis. Instead, the segmentation could be done with an image transform produced by filtering the original image with a low-pass (averaging) filter. In such a transform, the bright values stand for the swamp areas, the dark ones for the water, and everything else in between is the forest. One could argue that the window sizes selected for the computation of the texture transforms may have been too big. But in the the case of the micro-window (Laws' masks), the size of 3 x 3 is a minimum for the detection of local features using the convolution operation. Even features having the extent of only two pixels will be detected by filters of this size. For the macro-window, a test was done by computing the texture transforms with a 5 x 5 macro-window instead of 15 x 15. The results were no better. 69 6.2 Lake Agata — 1.5 m At this higher resolution, the results of the texture analysis, presented in Table 6.7, are substantially different from those obtained with the coarser resolutions. The first difference is the wider range of values in the correlation matrix, with a low of 0.494 and a high of 0.920. This indicates that some of the texture energy transforms are detecting local features with more spatial extent than a single pixel. But with a minimum correlation close to 0.5, there is still an important common factor among the texture transforms. Examples of texture energy transforms are presented in Figure 6.3. As a consequence of this lower correlation, three principal components are required to explain 95% of the total variation in the data set, comparatively to only one at the coarser resolutions. The correlation coefficients between the texture transforms and the principal compo-nent planes are quite informative. The first component, which correlates strongly with all the texture transforms, is the common factor. As mentioned in Section 6.1, all the filters return relatively similar values when the size of the texture primitives is one pixel. Areas where this type of variation occurs are delineated in the first component. The second component correlates significantly more with the E3L3 texture transform than with any other. It is thus responding to concentrations of horizontal features, lines or edges. For the third principal component, the maximum coefficient of correlation is associ-ated with the L3E3 texture transform which is sensitive to vertical features. 70 The visual interpretation of the principal component planes agrees with these ob-servations. The first three components are illustrated in Figure 6.4. By overlaying the original image (Figure 5.2) with each of the significant principal component images, it is possible to determine what local features of the three-dimensional world are detected by each component. The first component is sensitive to high concentrations of spot-like features. Such concentrations are a characteristic of dense stands, in which the bright tips of the crowns are heavily contrasted with the dark and opaque background of shadows, with little or no ground level vegetation being visible. The second component detects concentrations of horizontal edges and lines. The only important horizontal features in the original image (L. Agata - 1.5 m) are elongated tree shadows and the radial displacement in the horizontal (East-West) direction. These local features are characteristic of open stands. As opposed to dense stands in which the shadows merge together to form an opaque mask, in open stands, the shadows of the trees are in sharp contrast with the lighter tones of the lower vegetation. The shadows have an elongated form because the photograph was taken early in the morning, when the sun is at a low elevation and almost directly from the East (see Section 5.1). The third component is dominated by vertical (North-South) features. These vertical lines and edges are caused by the phenomenon of radial displacement. Again, this can only be observed in open stands, where the crowns are not substantially masked by shadows or obstructed from view by surrounding trees. To summarize, the first principal component locates stands having a high percentage 71 of crown closure, and the second and third components locate the stands with a more open canopy. With these three principal component images serving as input to a nearest centroid classifier, the supervised classification of the scene presented in Figure 6.5 was produced. The salt and pepper effect typical of the per-pixel classifiers has been removed by process-ing the segmented image with a median filter. The lake was delineated by thresholding a transform of the original image produced by filtering it with a local averaging filter. The final product is a classification in four strata: dense stands, open stands, barren sites and water bodies. The main boundaries between the dense and the open stands match roughly the ones observed in the segmentations produced by the photo-interpreters. Two of the three human interpretations are illustrated in the Figures 6.6 and 6.7. The photo-interpreters must stratify the variations in crown closure into four classes, but a lot of subjectivity is involved. By comparing the computer-aided segmentation with the two photo-interpretations, it is easy to see that the areas identified as dense stands by the classifier, correspond to a mixture of the classes 2 and 3 used by the photo-interpreters, while the open stands delineated by the computer correspond to a mixture of classes 3 and 4. 72 6 . 3 Lake Relay — 1.5 m The same analysis was performed with the 1.5 m tesselation of the Lake Relay scene. The statistics of the principal components analysis are given in Table 6.8. The results differ in nuances only. The first principal component is still locating con-centrations of spot-like features, while horizontal features are highlighted in the second one, and vertical features in the third one. But the portion of the total variation mapped in the second and third principal components is significantly smaller than for the Lake Agata - 1.5 m scene. Two main reasons explain these weaker components. First, the image (L. Relay - 1.5 m) was extracted closer to the centre of the aerial photograph than for the other scene, and the effect of the radial displacement is much smaller, reducing the number and the extent of some of the horizontal and vertical features. Secondly, the open stands include a certain percentage of hardwoods which cast a more fuzzy shadow than the cone shaped conifers. A supervised classification was done using the process described in the preceeding section, but with only the first two components as input. The two principal compo-nent images are shown in Figure 6.8, followed by the computer-aided segmentation in Figure 6.9 and one of the three photo-interpretations in Figure 6.10. The results of this experiment indicate that the segmentation of forest scenes contain-ing mixed and hardwood stands may be difficult. But it should be possible to consistently differentiate the dense and open softwood stands from the rest of the land cover. 73 6 . 4 Improving the Technique This implementation of Laws' approach to texture analysis could certainly be improved. For example, the computation of the texture energy measures could be speeded up by developing specialized software. With less obvious modifications, it may be possible to detect more subtile differences in textural patterns. In this section, a few ideas for possible improvements are discussed, some of which have been implemented and tested. 6.4.1 Local Rank Correlation Harwood et al. (1983) have proposed an alternative to the convolution operation required in the computation of the texture energy transforms. Their idea was to replace it by Spearman's rank correlation coefficient. The new operation consists in computing the cross-correlation of a ranked version of Laws' masks with a ranked version of the moving window. Although Harwood et al. did not explore this avenue in their paper, the rank cor-relation method seems to be an interesting alternative to achieve contrast invariance, since ranking has the effect of standardizing the local variances. To investigate this idea further, the required software was developed and tested with the L. Agata - 1.5 m scene. The only change made to the process is the replacement of the convolution and ratio steps by the local rank correlation. With this modification, the computation of the texture energy transforms is much faster. The visual interpretation of the principal component planes was deceptive. While the 74 first two components are quite similar in appearance to those described in Section 6.2, the third component, accounting for 13% of the variation, and the fourth one accounting for 5%, are very noisy and extremely difficult to translate into anything useful. The main reason for these poor results is that undesirable artefacts are introduced by the ranking process itself. Although ranking standardizes the local variance and reduces the effect of noise, it may also break the local structure in the moving window. For example, a linear feature is a sequence of pixels having roughly but not necessarily exactly the same intensity values. Ranking expands these small differences, while it reduces the contrast with the background. Therefore, in extreme cases, a dark line sitting in a bright neighbourhood may be mapped into a messy checkerboard. The original pattern can be dramatically altered. Laws' original approach gave much more useful results than this special implementation based on the rank correlation method. 6.4.2 No Ratioing According to Laws (1980), the impact of the contrast invariance property on the final result, the segmented image, is small. In an attempt to speed up substantially the whole process, the ratio step required to achieve contrast invariance was dropped. With this modified sequence of computations, the texture energy transforms were produced for the L. Agata - 1.5 m scene. In the resulting first principal component image, the steep edges occurring at the boundary between the lake and the forest, or the swamp and the forest, produced, through convolution, high values which are confused with the high 75 values characterizing the dense stands. Therefore, when dealing with forest scenes, the contrast invariance property should not be overlooked. In Figure 6.11, the L3L3SDV plane necessary for the ratioing is presented. The steep edges are clearly outlined in it. 6.4.3 Local Histogram Standardization Another way to derive texture energy measures with the contrast invariance property is to standardize the local variance everywhere in the image before computing the texture energy measures. It scales down the problematic steep edges. This is done in the following steps: • The mean and standard deviation are calculated in a moving window. • The scale and the offset required to transform these statistics to a desired mean and a desired standard deviation are computed. • The value of the center pixel is then modified by applying to it the scale and the offset. The result is stored in a new image. One problem with this simple technique is that in areas where there is very little variation, such as lakes, the small variation gets expanded dramatically. This is undesir-able. It introduces new texture patterns where none was present. To solve this problem, a threshold is set. If the calculated standard deviation falls below it, only the offset is applied to the center pixel. The technique was tested with the L. Agata - 1.5 m scene. Some improvement was noted. The principal components are more contrasted and more of the total variation is 76 concentrated in the first 3 principal components (97.6 vs 94.6% in Section 6.2). Unfortunately, the computation is quite lengthy and this offsets completely the sav-ings realized by removing the ratio step. 6.4.4 Customized Filters It may be tempting to replace Laws' masks by a smaller number of customized filters that would detect more specific features. But the set of masks proposed by Laws spans the space of 3 x 3 matrices, therefore, any new filter of that size can be expressed as a linear combination of Laws' masks. And since the principal component analysis concentrates the total variation in the first few components, there is nothing to gain in overall performance by developing customized filters. 6.4.5 Selecting a Subset of Laws' Masks Another appealing idea would be to use only the subset of Laws' masks that correlate strongly with the first 3 or 4 principal components. This has been tried, again with the L. Agata - 1.5 m scene. The principal components images produced using the four best filters are more fuzzy, i.e. the regions in the images are less differentiated. The conclusion is that although some filters correlate poorly with the original principal components (see Table 6.7), the total weight of these filters in each of the first 3 or 4 principal components is not negligeable. They are required to obtain sufficient contrast in the principal components images for subsequent classification. 77 Lake Agata - 5 m principal component / Lake Agata - 5 m principal component 2 Figure 6.1: Lake Agata - 5 m — The First Two Components 78 Figure 6.2: Lake Agata - 5, 10, 20 m — A Human Interpretation The classes of crown closure are: (1) 81% and above; (2) 61 - 80% ; (3) 41 - 60% ; (4) 25 - 40%. The other classes are: (reg) regeneration and (np) non productive (mainly swamps). 79 Lake Agata - 1.5 m S3S3 energy transform Figure 6.3: Lake Agata - 1.5 m — Two Texture Energy Transforms 80 Lake Agata - 1.5 m principal component 1 Figure 6.4: Lake Agata - 1.5 m — The First Three Components 81 Figure 6.5: Lake Agata - 1.5 m — A Computer-Based Classification 82 Figure 6.6: Lake Agata - 1.5 m — A First Human Interpretation The classes of crown closure are: (1) 81% and above; (2) 61 - 80% ; (3) 41 - 60% ; (4) 25 - 40%. The other classes are: (reg) regeneration and (np) non productive (mainly swamps). 83 Figure 6.7: Lake Agata - 1.5 m — A Second Human Interpretation The classes of crown closure are: (1) 81% and above; (2) 61 - 80% ; (3) 41 - 60% ; (4) 25 - 40%. The other classes are: (reg) regeneration and (np) non productive (mainly swamps). 84 Lake Relay - 1.5 m principal component / Lake Relay - 1.5 m principal component 2 Figure 6.8: Lake Relay - 1.5 m — The First Two Components 85 Lake Relay - 1.5 m supervised classification Figure 6.9: Lake Relay - 1.5 m — A Computer-Based Classification 86 Figure 6.10: Lake Relay - 1.5 m — A Human Interpretation The classes of crown closure are: (1) 81% and above; (2) 61 - 80% ; (3) 41 - 60% ; (4) 25 - 40%. The other classes are: (reg) regeneration and (np) non productive (mainly swamps). 87 Figure 6.11: Lake Agata - 1.5 m — An Example of L 3 L 3 S D V Plane 88 Table 6.1: L. Agata - 5, 10, 20 m — Correlation Matrices Lake Agata - 5 m E3E3 E3L3 E3E3 1, .000 E3L3 0, .949 1, .000 E3S3 0, .977 0, .929 L3E3 0, .951 0, .889 L3S3 0. .966 0, .905 S3E3 0. .975 0. .940 S3L3 0. .956 0. .954 S3S3 0. .965 0, .922 Lake Agata - 10 m E3E3 E3L3 E3E3 1 .000 E3L3 0 .953 1. .000 E3S3 0 .975 0 .935 L3E3 0 .946 0, .892 L3S3 0 .957 0 .913 S3E3 0 .982 0, .945 S3L3 0 .968 0, .961 S3S3 0 .973 0 .933 Lake Agata - 20 m E3E3 E3L3 E3E3 1 .000 E3L3 0, .893 1, .000 E3S3 0 .971 0, .820 L3E3 0 .928 0, .842 I.3S3 0 .950 0. .801 S3E3 0 .974 0, .850 S3L3 0 .961 0, .857 S3S3 0 .960 0, .822 E3S3 L3E3 L3S3 1.000 0.946 1.000 0.977 0.962 1 .000 0.973 0.928 0 .955 0.949 0.902 0 .931 0.978 0.925 0 .958 E3S3 L3E3 L3S3 1.000 0.954 1.000 0.980 0.970 1 .000 0.970 0.929 0 .953 0.958 0.916 0 .942 0.979 0.932 0 .960 E3S3 L3E3 L3S3 1.000 0.894 1.000 0.976 0.908 1.000 0.970 0.880 0.951 0.955 0.876 0.947 0.979 0.869 0.954 89 S3E3 S3L3 S3S3 1.000 0.968 1.000 0.978 0.942 1.000 S3E3 S3L3 S3S3 1.000 0.979 1.000 0.983 0.965 1.000 S3E3 S3L3 S3S3 1.000 0.983 1.000 0.981 0.961 1.000 Table 6.2: L. Agata - 5, 10, 20 m — Variance Explained Lake Agata - 5 ra pel pc2 pc3 pc4 I Var. expl . 95.53 1.90 1.01 0.54 Cumulative 95.53 97.43 98.44 98.99 pc5 pc6 pc7 pc8 I Var. expl . 0.39 0.28 0.20 0.14 Cumulative 99.38 99.66 99.86 100.00 Lake Agata - 10 m p e l pc2 pc3 pc4 % Var. expl . 96.00 Cumulative 96.00 1.84 97.83 0.86 98.69 0.44 99.13 pc5 pc6 pc7 pc8 % Var. expl . 0.38 Cumulative 99.51 0.23 99.74 0.17 99.90 0.10 100.00 Lake Agata - 20 m p e l pc2 pc3 pc4 % Var. expl . 93.00 Cumulative 93.00 3.40 96.39 2.00 98.40 0.59 98.99 pc5 pc6 pc7 pc8 % Var. expl . 0.50 Cumulative 99.49 0.28 99.77 0.15 99.92 0.08 100.00 90 Table 6.3: L. Agata - 5, 10, 20 m — Correlation: Energy Planes vs Components Lake Agata - 5 m pel pc2 E3E3 0. .990 -0. . 003 E3L3 0, .958 0, .227 E3S3 0, .989 -0, .054 L3E3 0. .959 -0, .214 L3S3 0. .979 -0. .148 S3E3 0. .987 0, .047 S3L3 0, .972 0, .164 S3S3 0. .981 -0, .016 Lake Agata - 10 m pel pc2 E3E3 0, .989 -0 .028 E3L3 0, .961 -0 .207 E3S3 0. .989 0, .061 L3E3 0. .962 0 .229 L3S3 0. .979 0 .157 S3E3 0. .988 -0, .066 S3L3 0. .981 -0, .131 S3S3 0. ,986 -0 .013 Lake Agata - 20 m pel pc2 E3E3 0 .990 0 .034 E3L3 0 .891 0 .430 E3S3 0 .982 -0 .135 L3E3 0 .933 0 .132 L3S3 0 .972 -0 .142 S3E3 0 .985 -0 .082 S3L3 0 .978 -0 .053 S3S3 0 .977 -0 .139 pc3 pc4 pc5 0.026 -0.031 -0.017 0.140 -0.094 0.025 -0.068 -0.033 0.062 0.158 0.023 -0.074 -0.005 0.026 0.115 -0.090 0.029 -0.073 -0.002 0.156 0.008 -0.149 -0.075 -0.046 pc3 pc4 pc5 -0.008 -0.070 -0.090 0.168 0.050 -0.032 -0.041 0.088 -0.033 0.111 -0.081 0.004 0.012 0.082 0.053 -0.096 -0.065 0.008 -0.010 -0.035 0.124 -0.128 0.030 -0.033 pc3 pc4 pc5 0.003 -0.003 -0.071 -0.128 -0.056 0.026 0.004 -0.096 -0.027 0.326 0.062 -0.027 0.091 -0.086 0.131 -0.102 0.073 -0.029 -0.108 0.132 0.085 -0.081 -0.028 -0.084 91 pc6 pc7 pc8 0.123 0.012 -0.039 -0.031 -0.017 0.009 0.007 0.074 0.058 -0.035 0.016 0.006 -0.010 -0.063 -0.017 0.022 -0.065 0.055 -0.019 0.033 -0.022 -0.061 0.010 -0.050 pc6 pc7 pc8 -0.072 0.002 -0.035 0.026 -0.018 0.007 -0.028 0.068 0.035 0.047 0.018 0.007 -0.041 -0.061 -0.017 -0.002 -0.048 0.056 -0.014 0.042 -0.020 0.087 -0.004 -0.034 pc6 pc7 pc8 0.101 -0.023 -0.038 -0.020 0.000 0.005 0.037 0.064 0.033 -0.028 0.009 0.007 -0.006 -0.041 -0.010 0.004 -0.057 0.047 0.006 0.047 -0.016 -0.099 0.000 -0.028 Table 6.4: L. Relay - 5, 10, 20 m — Correlation Matrices Lake Relay - 5 m E3E3 E3L3 E3E3 1. .000 E3L3 0, .933 1. .000 E3S3 0, .970 0, .927 L3E3 0, .942 0, .890 L3S3 0. .948 0, .914 S3E3 0. .956 0, .940 S3L3 0, .946 0. .958 S3S3 0. .945 0. .925 Lake Relay - 10 m E3E3 E3L3 E3E3 1. .000 E3L3 0, .959 1. .000 E3S3 0, .989 0 .949 L3E3 0. .969 0, .924 L3S3 0. .976 0 .936 S3E3 0, .989 0, .958 S3L3 0, .974 0, .971 S3S3 0, .981 0, .952 Lake Relay - 20 m E3E3 E3L3 E3E3 1. .000 E3L3 0. .874 1. .000 E3S3 0. .984 0, .839 L3E3 0. .955 0. .836 L3S3 0. .963 0. .805 S3E3 0. .979 0. .861 S3L3 0. .972 0. .875 S3S3 0. .969 0, .829 E3S3 L3E3 L3S3 1 .000 0 .942 1 .000 0 .975 0 .962 1 .000 0 .968 0 .930 0 .962 0 .953 0 .914 0 .945 0 .975 0 .925 0 .964 E3S3 L3E3 L3S3 1 .000 0 .962 1 .000 0 .985 0 .975 1 .000 0 .986 0 .957 0 .974 0 .973 0 .937 0 .961 0 .988 0 .955 0 .975 E3S3 L3E3 L3S3 1.000 0.951 1.000 0.978 0.968 1.000 0.976 0.932 0.952 0.967 0.927 0.948 0.985 0.941 0.968 92 S3E3 S3L3 S3S3 1.000 0.974 1.000 0.983 0.956 1.000 S3E3 S3L3 S3S3 1.000 0.983 1.000 0.989 0.973 1.000 S3E3 S3L3 S3S3 1.000 0.982 1.000 0.980 0.966 1.000 Table 6.5: L. Relay - 5, 10, 20 m — Variance Explained Lake Relay - 5 m pel pc2 pc3 pc4 % Var. expl. 95.43 Cumulative 95.43 1.74 97.17 1.05 98.22 0.68 98.89 pc5 pc6 pc7 pc8 % Var. expl. 0.46 Cumulative 99.35 0.35 99.70 0.19 99.89 0.11 100.00 Lake Relay - 10 m pel pc2 pc3 pc4 % Var. expl. 97.24 Cumulative 97.24 1.25 98.48 0.64 99.12 0.28 99.40 pc5 pc6 pc7 pc8 % Var. expl. Cumulative 0.25 99.65 0.20 99.85 0.10 99.95 0.05 100.00 Lake Relay - 20 m % Var. expl. Cumulative pel 94.39 94.39 pc2 3.17 97.57 pc3 1.22 98.79 pc4 0.39 99.17 pc5 pc6 pc7 pc8 % Var. expl. 0.31 Cumulative 99.49 0.28 99.77 0.14 99.91 0.09 100.00 93 Table 6.6: L. Relay - 5, 10, 20 m — Correlation: Energy Planes vs Components Lake Relay - 5 m pel pc2 E3E3 0. .978 -0. .022 E3L3 0. .958 0. .229 E3S3 0. .987 -0. .052 L3E3 0. .960 -0, .217 L3S3 0, .982 -0, .122 S3E3 0, .987 0, .044 S3L3 0. .978 0. .137 S3S3 0. .982 0. .003 Lake Relay - 10 m pel pc2 E3E3 0. .993 -0. .014 E3L3 0. .969 0. .202 E3S3 0. .993 -0. .038 L3E3 0, .973 -0. .173 L3S3 0. .987 -0. .111 S3E3 0. .993 0. .019 S3L3 0. .985 0. .121 S3S3 0. .990 -0. .003 Lake Relay - 20 m pel pc2 E3E3 0 .991 -0 .006 E3L3 0 .888 0 .455 E3S3 0 .989 -0 .090 L3E3 0 .967 -0 .065 L3S3 0 .977 -0 .153 S3E3 0 .986 -0 .019 S3L3 0 .983 0 .022 S3S3 0 .984 -0 .099 pc3 pc4 pc5 -0.088 0.176 0.047 -0.137 -0.035 -0.087 0.057 0.084 -0.067 -0.136 -0.082 0.036 0.030 -0.062 -0.069 0.099 -0.017 0.065 0.011 -0.059 0.103 0.156 -0.006 -0.029 pc3 pc4 pc5 -0.003 -0.073 -0.000 0.124 0.011 -0.053 -0.069 0.012 -0.037 0.136 -0.030 0.030 -0.002 0.107 -0.014 -0.065 -0.046 0.032 -0.029 0.038 0.101 -0.085 -0.019 -0.058 pc3 pc4 pc5 -0.013 0.019 0.118 0.042 -0.037 -0.016 -0.027 -0.070 0.040 0.229 0.058 0.001 0.091 -0.031 -0.058 -0.122 0.033 0.015 -0.117 0.108 -0.058 -0.073 -0.083 -0.046 94 pc6 pc7 pc8 -0.003 0.030 -0.026 -0.032 0.006 0.005 0.047 -0.061 0.044 -0.048 -0.031 0.007 0.080 0.065 -0.019 -0.048 0.056 0.053 0.083 -0.037 -0.017 -0.080 -0.028 -0.048 pc6 pc7 pc8 -0.073 0.009 0.032 -0.000 0.007 -0.007 -0.049 -0.046 -0.031 0.033 -0.018 -0.007 -0.016 0.036 0.011 0.017 0.053 -0.031 0.009 -0.027 0.011 0.079 -0.014 0.022 pc6 pc7 pc8 0.030 -0.002 -0.042 0.005 0.005 0.001 0.034 -0.036 0.055 -0.051 -0.017 0.010 0.080 0.050 -0.012 -0.060 0.069 0.024 0.040 -0.040 -0.000 -0.069 -0.028 -0.035 Table 6.7: L. Agata - 1.5 m — Principal Components Analysis CORRELATION MATRIX E3E3 E3L3 E3S3 L3E3 L3S3 S3E3 S3L3 S3S3 E3E3 1.000 E3L3 0, .747 1. .000 E3S3 0. .902 0. .593 1, .000 L3E3 0, .773 0, .683 0, .701 1 .000 L3S3 0. .866 0, .566 0 .920 0, .769 1.000 S3E3 0. .912 0. .696 0. .865 0, .697 0.794 1.000 S3L3 0. .869 0. .770 0. .774 0, .596 0.727 0.876 1.000 S3S3 0. .849 0. .494 0, .926 0, .571 0.856 0.873 0.775 VARIANCE EXPLAINED pel I Var. expl. 79.94 Cumulative 79.94 pc5 % Var. expl. 1.42 Cumulative 97.95 pc2 pc3 pc4 8.60 6.06 1.93 88.53 94.60 96.53 pc6 pc7 pc8 0.87 0.63 0.55 98.82 99.45 100.00 CORRELATION - ENERGY PLANES VS PRINCIPAL COMPONENTS pel pc2 pc3 pc4 pc5 pc6 pc7 pc8 E3E3 0, .970 0, .022 -0. .016 0. .027 0 .007 0 .224 -0 .002 -0 .073 E3L3 0, .767 0 .589 -0. .124 -0. .168 -0, .130 -0 .041 0 .015 -0 .019 E3S3 0, .941 -0, .250 0. .055 -0, .111 -0 .092 0 .045 -0 .054 0 .150 L3E3 0 .804 0 .265 0. .501 0, .148 0 .032 -0 .044 -0 .070 0 .002 L3S3 0, .913 -0, .209 0. .236 -0. .178 0, .124 -0 .045 0 .122 -0 .030 S3E3 0. .943 -0, .033 -0. .157 0. .241 -0, .083 -0, .035 0 .126 0 .038 S3L3 0, .895 0, .126 -0. .348 0. .023 0 .233 -0 .036 -0 .060 0 .034 S3S3 0 .896 -0. .381 -0. .111 0. .002 -0 .100 -0 .095 -0 .087 -0 .107 95 Table 6.8: L. Relay - 1.5 m — Principal Components Analysis CORRELATION MATRIX E3E3 E3L3 E3S3 L3E3 L3S3 S3E3 S3L3 S3S3 E3E3 1 .000 E3L3 0 .893 1 .000 E3S3 0 .904 0 .814 1. .000 L3E3 0 .929 0 .849 0. .906 1. .000 L3S3 0 .898 0 .811 0 .977 0, .935 1 .000 S3E3 0 .922 0 .867 0 .959 0 .901 0 .940 1. .000 S3L3 0 .898 0 .916 0, .903 0, .873 0 .889 0, .961 S3S3 0 .871 0 .786 0, .981 0, .869 0 .954 0. .961 1.000 0.896 1.000 VARIANCE EXPLAINED pel pc2 pc3 pc4 % Var. expl, Cumulative 91.53 91.53 4.01 95.54 2.18 97.73 0.90 98.63 pc5 pc6 pc7 pc8 I Var. expl. 0.75 Cumulative 99.38 0.32 99.70 0.17 99.87 0.13 100.00 CORRELATION - ENERGY PLANES VS PRINCIPAL COMPONENTS pel pc2 pc3 pc4 pc5 pc6 pc7 pc8 E3E3 0 .956 0, .129 0. .152 0 .210 -0 .000 0 .035 -0, .005 -0 .015 E3L3 0 .905 0. .393 -0. .056 -0 .063 0 .133 -0 .025 0. .010 0 .002 E3S3 0 .973 -0, .194 -0, .020 0 .008 0 .074 0 .023 -0, .047 0 .069 L3E3 0 .949 0, .023 0, .276 -0 .099 -0 .087 -0 .061 -0, .018 0 .004 L3S3 0 .968 -0, .181 0. .079 -0 .092 0 .048 0 .091 0, .049 -0 .027 S3E3 0 .982 -0. .042 -0, .122 0 .047 -0 .081 -0 .045 0, .078 0 .033 S3L3 0 .958 0. .138 -0, .191 -0 .044 -0 .133 0 .057 -0, .043 -0 .014 S3S3 0 .957 -0 .237 -0, .115 0 .029 0 .052 -0 .076 -0 .025 -0 .053 96 Chapter 7 Conclusion In recent years, with the advent of computers, digital imagery and multispectral sensors, attempts have been made to replace the visual interpretation of photographs by more cost efficient, less time consuming, and more reliable techniques from the field of digital image processing. The increasing needs for mass screening of aerial photographs for the management of precious forest resources have prompted the adaptation of these computer-aided solutions to the interpretation of forest scenes. So far the main focus has been on the analysis of spectral signatures. Unfortunately, only limited successes have been achieved with this approach. Species composition, topography, age, stand structure, and a few other factors greatly affect the spectral properties of forest stands. To complement the spectral analysis, it is also necessary to process spatial information. Textural analysis is one way of doing so. In the past 15 years, several techniques have been proposed to extract and quantify the textural information but veiy few applications to forest scenes are reported (Triendl 1972; Iisaka el al. 1978; Schneider 1978; Strahler 1981; Teillet el al. 1981). 97 In this thesis one of the most recent additions to this series of techniques has been investigated. The technique, based on texture energy measures (Laws 1980), has been implemented and tested with two panchromatic aerial photographs digitized at four spatial resolutions. The main conclusion of these tests is that given an adequate spatial resolution, the texture energy measures can provide a reliable segmentation of the forest cover in two classes of textures corresponding to stands with open and closed canopies. With the coarser tesselations, at 5, 10 and 20 m, the textural features failed to partition the forest cover in a meaningful way. This result casts some doubts on the usefulness of texture analysis with forest scenes recorded by the Landsat MSS (79 x 79 m) and Thematic Mapper (30 x 30 m) sensors, and even by the sensors of Spot (10 x 10 m and 20 x 20 m). Of course, it is always possible that a different approach to texture analysis could produce better textural features but this remains to be demonstrated. The outcome of the experiments done with the 1.5 m resolution is more promising. The texture energy measures were distinctive enough to provide a segmentation of the forest scenes in two classes of crown closure. A visual comparison of the computer-aided classification with three interpretations made by experienced photo-interpreters indicate that the four segmentations have a lot in common. It also reveals that although the au-tomatic interpretation may lack the detail of the human interpretation, it does not suffer from subjectivity. This relative success confirms the results obtained by Triendl (1972) and Iisaka el al. (1978) who were among the first to report successful segmentations of forest scenes using textural features. Their data was also at relatively high spatial 98 resolutions. The results reported in this thesis are quite informative about the key image features differentiating open stands from dense stands. This differentiation requires more than detecting tree crowns and estimating their density. In fact, the crowns are often so poorly contrasted with their surroundings that they can't be reliably detected. The true clues are the tree shadows. Open stands are characterized by discrete and highly contrasted shadows. These appear as linear features with a specific orientation. In dense stands, the shadows tend to merge together to form an opaque mask. This dark background is overlayed with a multitude of shiny spots corresponding to the brightly lit tip of the dominant crowns. This textural pattern is typical of dense stands. The phenomenon of radial displacement associated with perspective projections also permits the differentiation of sparse stands from dense stands. It generates linear features which are only observable in open stands, where they are not masked by shadows or obscured from view by surrounding trees. The forest scenes used for this thesis included only a few mixed stands and no pure deciduous stands of sufficient dimensions. Because of the more fuzzy and larger shadows of hardwoods, texture energy measures may not provide as good a segmentation as with coniferous stands. This issue requires further research. Another topic for further investigations is the effect of topography on the perceived textures. Also, more intelligent segmentation algorithms are needed. From a practical standpoint, the texture analysis of forest scenes, as implemented in this thesis, is relatively expensive in terms of computational resources. For exam-99 pie, to compute the first three principal components for a central section (19 cm x 19 cm) of an aerial photograph (1:15000) digitized at a 1.5 m spatial resolution with 256 levels of quantization, the storage requirements would peak at 47 megabytes during the process and the computations could take several hours on a Vax 11/780. And this is all required to detect only broad differences in the percentage of crown closure. But as we move towards Fifth Generation computer systems, the technological advances in terms of software and hardware will make texture analysis and other image processing techniques more affordable and more applicable to forestry problems. We may see the development in the near future of sophisticated inferential systems designed especially to produce detailed photo-interpretations based on spectral and textural signatures, and taking into account various types of ancillary information such as topography. 100 Bibliography Aggarwala, R.K. 1978. Role of texture in computer-aided signature analysis - a case study. Proc. ISP and IUFRO Int. Symp. on Remote Sensing for Observation and Inventory of Earth Resources and the Endangered Environment, Freiburg, W. Germany, 473-489. Avery, T.E. and G.L. Berlin. 1985. Interpretation of aerial photographs. 4th ed., Minneapolis, MN: Burgess Publ. Co., 554 p. Ballard, D.H. and C. M. Brown. 1982. Computer vision. Englewood Cliffs, NJ: Prentice-Hall, 523 p. Beck, J., S. Prazdny and A. Rosenfeld. 1981. A theory of textural segmentation. Com-puter Vision Laboratory, Univ. of Maryland, Tech. Rep. 1083, 79 p. Carlton, S.G. and O.R. Mitchell. 1977. Image segmentation using texture and gray level. Proc. IEEE Computer Society Conference on Pattern Recognition and Image Processing, New-York, 387-391. Conners, R.W. and CA. Harlow. 1978. Equal probability quantizing and texture analysis of radiographic images. Computer Graphics and Image Processing, 8, 447-463. Davis, L.S. 1982. Image texture analysis: recent developments. Proc. IEEE Symp. on Pattern recognition and image processing, Las Vegas, Nevada, 214-217. Eklundh, J.O. 1979. On the use of Fourier phase features for texture discrimination. Computer Graphics and Image Processing, 9, 199-201. Faugeras, O. and W.K. Pratt. 1980. Decorrelation methods for texture feature ex-traction. IEEE Trans, on Pattern Analysis and Machine Intelligence, PAMI-2, 323-332. Franklin, J. 1984. Characterizing the spatial pattern in forest stands for improved canopy reflectance modeling. Proc. Surveying and Mapping Conv., Washington D.C, American Society of Photogrammetry, 10 p. 101 Galloway, M. 1975. Texture analysis using gray level run lengths. Computer Graphics and Image Processing, 4i 172-179. Haralick, R.M. 1979. Statistical and structural approaches to texture. Proc. IEEE, 67, 786-804. Haralick, R.M., K. Shanmugan and I. Dinstein. 1973. Textural features for image classification. IEEE Trans, on Systems, Man and Cybernetics, SMC-S, 610-621. Harris, R. 1980. Spectral and spatial image processing for remote sensing. Int. Journal of Remote Sensing, 1, 361-375. Harwood, D., M. Subbarao, and L.S. Davis. 1983. Texture classification by local rank correlation. Computer Vision Laboratory, Univ. of Maryland, Tech. Rep. 1314, l i p . Hoffer, R.M. 1978. Biological and physical considerations in applying computer-aided analysis techniques to remote sensor data, in Remote sensing: the quantitative approach. P.H. Swain and S.M. Davis (Eds.), New York: McGraw-Hill, 227-289. Hoffer, R.M. and Staff. 1975. Computer-aided analysis of Skylab multi-spectral scanner data in mountainous terrain for land use, forestry, water resource and geologic applications. Final Report on Contract No NAS9-13380, Skylab EREP Project 398, LARS Contract Report 121275, Purdue University, 398 p. Hoffer, R.M., M.D. Fleming, L.A. Bartolucci, S.M. Davis and R.F. Nelson. 1979. Digital processing of Landsat MSS and topographic data to improve capabilities for com-puterized mapping of forest cover types., Laboratory for Applications of Remote Sensing, Purdue University, Tech. Rep. 011579, 169 p. Hord, R.M. 1982. Digital image processing of remotely sensed data. New York: Aca-demic Press, 256 p. Hsu, S. 1978. Texture-tone analysis for automated land-use mapping. Photogrammetric Engr. and Remote Sensing, 441 1393-1404. Hsu, S.Y. 1979. The Mahalanobis classifier with the generalized inverse approach for automated analysis of imagery texture data. Computer Graphics and Image Pro-cessing, 9, 117-134. Iisaka, J . 1979. Texture analysis by space filter and application to forestype classifica-tion, Proc. Fifth IEEE Symp. on Machine Processing of Remotely Sensed Data, Purdue University, 392-393. Iisaka, J., E. Nakata, Y. Ishii, M. Imanaka and Y. Miyazaki. 1978. Application of texture analysis and image enhancement techniques for remote sensing. Proc. Twelfth Int. Symp. on Remote Sensing of the Environment, Ann Arbor: MI: Environ. Res. Inst, of Michigan, 1957-1971. 102 Irons, J.R. and G.W. Petersen. 1981. Texture transforms of remote sensing data. Remote Sensing of Environment, 11, 359-370. Johnson, R.A. and D.W. Wichern. 1982. Applied multivariate statistical analysis., Englewood Cliffs, NJ: Prentice-Hall, 549 p. Julesz, B. 1965. Texture and visual perception. Scientific American, 212 (2), 38-54. Julesz, B. 1975. Experiments in the visual perception of texture. Scientific American, 2S2 (4), 34-43. Julesz, B. 1984. Toward an axiomatic theory of preattentive vision, in Dynamic Aspects of Neocortical Function, G.M. Edelman, W.E. Gall and W.M. Cowan (Eds.), Neuroscience Research Foundation Inc., 585-612. Julesz, B. and J.R. Bergen. 1983. Textons, the fundamental elements in preattentive vision and perception of textures. The Bell System Technical Journal, 62, 1619-1645. Julesz, B., E.N. Gilbert and L.A. Shepp. 1973. Inability of humans to discriminate be-tween visual textures that agree in second-order statistics - revisited. Perception, 2, 391-405. Julesz, B., E.N. Gilbert and J.D. Victor. 1978. Visual discrimination of textures with identical third-order statistics. Biological Cybernetics, SI, 137-140. Kalensky, Z. and L. Sayn-Wittgenstein. 1974. Thematic map of Larose forest from ERTS-1/MSS digital data. Canadian Surveyor, 28, 113-118. Kalensky, Z. and L.R. Scherk. 1975. Accuracy of forest mapping from Landsat com-puter compatible tapes. Proc. Tenth Int. Symp. on Remote Sensing of Environ-ment, Ann Arbor, MI: Environ. Res. Inst, of Michigan, 1159-1167. Kalensky, Z. and J.M. Wightman. 1976. Automatic forest mapping using remotely sensed data. Proc. XVI IUFRO World Congress, Oslo, Norway, reprinted by Forest Management Institute, Environment Canada, 21 p. Kan, E.P., D.L. Ball, J.P. Basu and R.L. Smelser. 1975. Data resolution versus forestry classification and modeling. Proc. Second Symp. on Machine Processing of Re-motely Sensed Data, Purdue University, 1B-24-1B-36. Laws, K.I. 1980. Textured image segmentation. Image Processing Institute, University of Southern California, USCIPI Rep. 940, 178 p. Markham, B.L. and J.R.G. Townshend. 1981. Land cover classification accuracy as a function of sensor spatial resolution. Proc. Fifteenth Int. Symp. of Remote Sensing of Environment, Ann Arbor, MI: Environ. Res. Inst, of Michigan, 1075-1090. 103 Marr, D. 1976. Early processing of visual information. Philosophical Transactions of the Royal Society of London, B275, 483-524. Marr, D. 1982. Vision. New York: W.H. Freeman and Company, 397 p. McDonnell, M.J. 1981. Box-filtering techniques. Computer Graphics and Image Pro-cessing, 17, 65-70. Mitchell, O.R., C.R. Myers and W. Boyne. 1977. A max-min measure for image texture analysis. IEEE Trans, on Computers, C-26, 408-414. Moik, J.G. 1980. Digital processing of remotely sensed images. Nasa, Pub. SP-431, U.S. Government Printing Office, 330 p. Morrison, D.F. 1976. Multivariate statistical methods. 2nd ed., New York: McGraw-Hill, 415 p. Nevatia, R. 1982. Machine perception. Englewood Cliffs, NJ: Prentice-Hall, 209 p. Newnham, R.M. 1968. The generation of artificial populations of points (spatial pat-terns) on a plane. Canadian Forest Service, Forest Management Institute, Inf. Rep. FMR-X-10, 28 p. plus appendices. Pietikainen, M., A. Rosenfeld and L.S. Davis. 1981. Texture classification using aver-ages of local pattern matches. Computer Vision Laboratory, Univ. of Maryland, Tech. Rep. 1098, 22 p. Pietikainen, M., A. Rosenfeld and L.S. Davis. 1983. Experiments with texture classi-fication using averages of local pattern matches. IEEE Trans, on Systems, Man and Cybernetics, SMC-1S, 421-426. Pratt, W.K. 1978. Digital image processing. New York: John Wiley and Sons, 750 p. Pratt, W.K., O.D. Faugeras and A. Gagalowicz. 1978. Visual discrimination of stochas-tic texture fields. IEEE Trans, on Systems, Man and Cybernetics, SMC-8 , 796-804. Riley, M.D. 1981. The representation of image texture. Artificial Intelligence Labora-tory, Mass. Inst, of Technology, AI-TR-649, 67 p. Roessel, J. van. 1971. Automated mapping of forest resources from digitized aerial photographs. Proc. IUFRO, Joint Report by Working Group on Application of remote sensors in Forestry, 177-188. Rosenfeld, A. and E.B. Troy. 1970. Visual texture analysis. Proc.IEEE Symp. on Feature Extraction and Selection in Pattern Recognition, LM. Garnett (Ed.), Ar-gonne, Illinois, IEEE Publ. 700-910, 115-124. 104 Rowe, J.S. 1972. Forest regions of Canada. Publication 1300, Canada Department of Environment, Canadian Forestry Service, Ottawa, Ontario, 172 p. Sadowski, F.G., W.A. Malila and R.F. Nalepka. 1978. Applications of MSS systems to natural resource inventories. Proc. of the National Workshop on Integrated Inven-tories of Renewable Natural Resources, Fort Collins, Colorado, Rocky Mountain Forest and Range Experiment Station, USDA Forest Service, Gen. Tech. Rep. RM-55, 248-256. Sayn-Wittgenstein, L. 1970. Patterns of spatial variation in forests and other natural populations. Pattern Recognition, 2, 245-253. Sayn-Wittgenstein, L. and Z. Kalensky. 1974. Interpretation of forest patterns on computer compatible tapes. Proc. Second Canadian Symp. on Remote Sensing, Guelph, Ontario, 268-276. Schneider, W. 1978. Extraction of textural signatures from forest aerial photographs using a special-purpose digital computer. Proc. Int. Symp. on Remote Sensing for Observation and Inventory of Earth Resources, Freiburg, W. Germany, Int. Arch, of Photogram., XXII (7), 507-517. Skellam, J.G. 1952. Studies in statistical ecology. I Spatial pattern., Biometrika, 39, 346-362. Starr, D.W. and A.K. Mackworth. 1978. Exploiting spectral, spatial and semantic constraints in the segmentation of Landsat images. Canadian Journal of Remote Sensing, 4, 101-107. Strahler, A.H. 1981. Stratification of natural vegetation for forest and rangeland inven-tory using Landsat digital imagery and collateral data. Int. Journal of Remote Sensing, 2, 15-41. Strahler, A.H., T.L. Logan and C.E. Woodcock. 1979. Forest classification and in-ventory system using Landsat, digital terrrain,and ground sample data. Proc. Thirteenth Int. Symp. on Remote Sensing of Environment, Ann Arbor, MI: Environ. Res. Inst, of Michigan, 1541-1557. Strahler, A.H. and L. Xiaowen. 1981. An invertible coniferous forest canopy reflectance model. Proc. Fifteenth Int. Symp. on Remote Sensing of Environment, Ann Arbor, MI: Environ. Res. Inst, of Michigan, 1237-1244. Tamura, H., S. Mori and T. Yamawaki. 1978. Textural features corresponding to visual perception. IEEE Trans, on Systems, Man and Cybernetics, SMC-8, 460-473. Teillet, P.M., D.G. Goodenough, B. Guindon, J.F. Meunier and K. Dickinson. 1981. Digital analysis of spatial and spectral features from airborne MSS of a forested 105 region. Proc. Fifteenth Int. Symp. of Remote Sensing of Environment, Ann Arbor, MI: Environ. Res. Inst, of Michigan, 883-904. Triendl, E.E. 1972. Automatic terrain mapping by texture recognition. Proc. Eighth Int. Symp. on Remote Sensing of Environment, 771-776. Vandermeer, J. 1981. Elementary mathematical ecology. New York: John Wiley and Sons, 294 p. Wallis, R. 1977. An approach to space-variant restauration and enhancement of images, in Image Science Mathematics., CO. Wilde and E. Barrett Eds., (Proceedings of the Symposium on Current Mathematical Problems in Image Science), North Holliwood, CA: Western Periodicals, 107-111. Whiteford, G. 1978. Aerial photograph interpretation as remote sensing, in Introduction to remote sensing of the environment. B.F. Richason Jr. (Ed.), Dubuque, IA: Kendall/Hunt Publ. 41-58. Weszka, J.S., C.R. Dyer and A. Rosenfeld. 1976. A comparative study of texture mea-sures for terrain classification. IEEE Trans, on Systems, Man and Cybernetics, SMC-6, 269-285. Woodcock, C.E., A.H. Strahler and T.L. Logan. 1980. Stratification of forest vegetation for timber inventory using Landsat and collateral data. Proc. Fourteenth Int. Symp. on Remote Sensing of Environment, Ann Arbor, MI: Env. Res. Inst, of Michigan, 1769-1787. Zsilinsky, V.G. and S. Pala. 1978. Digital analysis of Landsat data for selective forest stand typing - An initial feasibility study. Proc. ISP and IUFRO Symp. on Re-mote Sensing for Observation and Inventory of Earth Resources and Endangered Environment, Freiburg, W. Germany, 1763-1778. 106 Appendix Principal Components Analysis The principal components analysis, also known as the Karhunen-Loeve Transform, is essentially a technique used to analyze the covariance structure of a data set, or alternatively its correlation structure. The technique is covered in detail in Johnson and Wichern (1982) and Morrison (1976). With this technique, the total variability observed in a data set composed of n observations of p variables, can be mapped into a new system of p orthogonal axes. In the context of image processing, the variables are images of a same area and the observations are the pixel values. A typical data set would be the four spectral bands of a Landsat MSS scene. Often, in such a set of images, high correlations are found. The principal components provide a new set of images that are uncorrelated and ranked in order of decreasing variance. The three general uses for the principal component analysis in image processing are: image compression, feature selection, and image enhancement (Moik 1980). To fully reproduce the total variance in a system of p variables, p principal components are 107 required. Image compression, or data reduction, is achieved by retaining only those first few components that explain all but a negligible part of the total variance. For feature selection, the statistics produced by the principal component analysis indicate which ones among the original images convey the most specific or most discriminant information. Finally, the interpretation of a set of images may be simplified with the enhancements provided by a principal component transformation: the first component usually highlights the features which are common to all the input images, and the sub-sequent components present more specific information. The principal components can be derived from the covariance matrix SpXp or from the correlation matrix i2 p Xp • The resulting components are generally not the same and differ by more than a simple scaling factor (Morrison 1976). The components are affected by the scale in which the variables are measured. If an image has a wider range of values, it will likely have a larger variance than another one with a substantially smaller range, and will consequently be weighted more heavily in the first few components. There is no definitive rule to tell which matrix to select. If it is important that the principal components be expressed in the original units, then the covariance matrix must be used. If the original units are not meaningful and if the only interest is to decorrelate the data set, then the correlation matrix is the proper choice. Using the correlation matrix is equivalent to performing the principal component analysis on a data set of standard scores: 108 where Z{j is a standard score, X{j is an original observation i on variable j, Xj and a y are respectively the mean and the standard deviation associated with variable j. Computation of the Principal Components The principal components are computed in the following three steps: 1. Computation of the covariance matrix Spxp for the sample data set Xpxn . The correlation matrix i2p X p is computed if needed. 2. Extraction of the eigenvectors (characteristic vectors) of the covariance or the correlation matrix. 3. Transformation of the data with the eigenvectors serving as a new set of axes. By definition, the kth. principal component is the linear combination whose coefficients e/t,- are the elements of the normalized eigenvector associated with the largest eigenvalue A* of the covariance matrix. The constraints are: PCk = six orthogonality : 0 unit length : 1 109 The variance of each component is equal to the corresponding eigenvalue. The total system variance, given by the trace of the covariance matrix S , is equal to the sum of the variances of the p components, that is to the sum of the p eigenvalues. Thus, the proportion of the variance explained by component k is given by A* / tr(S). The correlation of the i th image and the fcth component is given by e,£ v/AJt / \fS~ii. The correlation between the components and the original images indicate which image contributed the most variance to each of the component. This is of great help for the interpretation of the principal component planes. When the components are derived from the correlation matrix R , the total "vari-ance" of the standard scores is given by the trace of the correlation matrix, tr(R) , which is p , the number of variables. The correlation between the i t h image and the kth. component becomes e,-fc v/XJb and the proportion of the "variance" explained is Xk/tr(R). The Program Princom The program PRINCOM was developed to perform the principal components analyses required for this thesis. The features of this special implementation include: • possibility to select the covariance matrix or the correlation matrix. • can deal with a maximum of 8 input images and 8 principal components. • option to output only the statistics or to produce also up to p principal components, where p is the number of input images. 110 • to reduce the loss of information by truncation during the conversion to the integer format in which the images are stored, an automatic scaling of the standard deviation of each principal component plane is provided. The resulting images have a standard deviation of 64 and a mean of 128 for a range of 256 gray levels. An example of a complete output of the program PRINCOM is presented in the next pages. The overflow and underflow statistics given at the end of this output concern the case when a pixel value to be written to a principal component plane is outside of the 0—255 range (8 bits per pixel). This is caused by the scaling and the shifting of the mean and it usually involves 5% or less of the total number of pixels. I l l PRINCIPAL COMPONENTS ANALYSIS INPUT IMAGES: iml = 'relay.10.el' im2 = 'relay.10.Ie' im3 = 'relay.10.ss' DIMENSIONS: rows= cols= 256 256 STATISTICS FROM INPUT IMAGES iml im2 im3 minimum maximum mean 0 255 204 .1 0 255 211.8 0 255 211.9 VARIANCE-COVARIANCE MATRIX iml im2 im3 iml im2 im3 811.2 716.0 1032.4 740.0 989.3 1448.8 CORRELATION MATRIX iml im2 im3 iml im2 im3 1.0000 0.9241 0.9523 1.0000 0.9555 1.0000 112 PRINCIPAL COMPONENTS (using correlation matrix) pel pc2 pc3 eigenvalue 2.88805e+00 7.59343e-02 3.60123e-% variance 96.27 2.53 1.20 cumulative 96.27 98.80 100.00 COEFFICIENTS pel iml 0.5750 im2 0.5757 im3 0.5814 eigenvectors) pc2 pc3 0.7255 -0.3782 -0.6872 -0.4431 -0.0371 0.8128 CORRELATION - Images vs principal components pel pc2 pc3 iml 0.9772 0.1999 -0.0718 im2 0.9783 -0.1894 -0.0841 im3 0.9880 -0.0102 0.1542 IMAGE VALUES OUT OF RANGE (occurrences): pel pc2 pc3 underflow 3989 1216 1623 overflow 0 1537 1931 113 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0096668/manifest

Comment

Related Items