Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

The rapid recovery of three-dimensional structure from line drawings Rensink, Ronald Andy 1992

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1992_fall_rensink_ronald_andy.pdf [ 9.66MB ]
JSON: 831-1.0051339.json
JSON-LD: 831-1.0051339-ld.json
RDF/XML (Pretty): 831-1.0051339-rdf.xml
RDF/JSON: 831-1.0051339-rdf.json
Turtle: 831-1.0051339-turtle.txt
N-Triples: 831-1.0051339-rdf-ntriples.txt
Original Record: 831-1.0051339-source.json
Full Text

Full Text

T H E R A P I D R E C O V E R Y O F T H R E E - D I M E N S I O N A L S T R U C T U R E F R O M L I N E D R A W I N G S by R O N A L D A N D Y R E N S I N K B.Sc . (Physics ) , The University of Waterloo , 1979 M.Sc . (Phys i cs ) , The University of B r i t i s h Co lumbia , 1982 M.Sc . (Computer Science), The University of B r i t i s h C o l u m b i a , 1986 A T H E S I S S U B M I T T E D I N P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F D O C T O R O F P H I L O S O P H Y in T H E F A C U L T Y O F G R A D U A T E S T U D I E S D E P A R T M E N T O F C O M P U T E R S C I E N C E We accept this thesis as conforming to the required standard T H E U N i v ^ R S I T Y O F B R I T I S H C O L U M B I A September 1992 © Ronald A n d y Rensink, 1992 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. 1 further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of <^<^-^p<yi/v7ii sc/py^ce The University of British Columbia Vancouver, Canada Date g y V . 2 ' ^z, DE-6 (2/88) Abstract A computat ional theory is developed that explains how line drawings of polyhedral objects can be interpreted rapidly and i n parallel at early levels of human vision. The key idea is that a t ime- l imited process can correctly recover much of the three-dimensional structure of these objects when split into concurrent streams, each concerned with a single aspect of scene structure. The work proceeds in five stages. The first extends the framework of M a r r to allow a process to be analyzed i n terms of resource l imitat ions. Two main concerns are identified: (i) reducing the amount of nonlocal information needed, and (ii) making effective use of whatever information is obtained. The second stage traces the difficulty of line interpretat ion to a smal l set of constraints. W h e n these are removed, the remaining constraints can be grouped into several relatively independent sets. It is shown that each set can be rapid ly solved by a separate processing stream, and that co-ordinating these streams can yield a low-complexity "approx imat ion" that captures much of the structure of the original constraints. In part i cu lar , complete recovery is possible in logarithmic t ime when objects have rectangular corners and the scene-to-image projection is orthographic. The th i rd stage is concerned w i t h making good use of the available information when a fixed time l imit exists. This l imit is motivated by the need to obtain results wi th in a t ime independent of image content, and by the need to l imit the propagation of inconsistencies. A m i n i m a l architecture is assumed, v iz . , a spatiotopic mesh of simple processors. Constraints are developed to guide the course of the process itself, so that candidate interpretations are considered in order of their l ikel ihood. The fourth stage provides a specific algorithm for the recovery process, showing how it can be implemented on a cellular automaton. F ina l ly , the theory itself is tested on various line drawings. It is shown that much of the three-dimensional structure of a polyhedral scene can indeed be recovered in very l i tt le t ime. It also is shown that the theory can explain the rap id interpretation of line drawings at early levels of human vision. Contents Abstract i i List of Figures v i List of Tables ix Acknowledgements x 1 Introduction 1 1.1 The Prob lem 2 1.2 The Approach 4 1.3 L imitat ions and K e y Assumptions 6 2 Background 8 2.1 R a p i d Para l le l Processing 8 2.1.1 Computat i ona l Studies 9 2.1.2 Psychophysical Studies 19 2.1.3 Computat i ona l versus Psychophysical Studies 26 2.2 The Interpretation of Line Drawings 28 2.2.1 Computat i ona l Studies 28 2.2.2 Psychophysical Studies 34 2.2.3 Computat i ona l versus Psychophysical Studies 38 2.3 High-level versus Low-level V is ion 39 2.3.1 T h e Structure of Low-level V is ion 39 2.3.2 The Role of R a p i d ParaUel Recovery 42 2.4 The Analys is of Resource-Limited Processes 43 2.4.1 M a r r ' s Framework 44 2.4.2 Extensions 45 2.4.3 A Revised Framework 47 2.5 R a p i d Line Interpretation 50 2.5.1 Basic Terms 50 2.5.2 Formulat ion of the Prob lem 51 3 Low-Complexi ty Recovery 54 3.1 General Issues 55 3.1.1 Concurrent Streams 55 3.1.2 Reduction to Canonica l Forms 56 3.1.3 Approx imat i on Strategies 60 3.2 Indiv idual Dimensions 61 3.2.1 Cont igui ty Label l ing 62 3.2.2 Convexity Label l ing 67 3.2.3 Slant Sign Label l ing 70 3.2.4 Slant Magni tude Label l ing 75 3.3 Integration of Dimensions 78 3.3.1 Convex Objects 79 3.3.2 C o m p o u n d Convex Objects 81 3.3.3 Rectangular Objects 85 4 Computat ional Analysis 100 4.1 E x t e r n a l Constraints 101 4.1.1 Image-to-Scene mapping 101 4.1.2 General Principles 103 4.1.3 Struc tura l Assumptions 105 4.1.4 System of E x t e r n a l Constraints 106 4.2 Internal Constraints 108 4.2.1 Processing Architecture 109 4.2.2 General Principles 110 4.2.3 Selection of In i t ia l Candidates 118 4.3 The R a p i d Recovery Process 124 4.3.1 Arch i tec tura l Specifications 124 4.3.2 Robustness 126 4.3.3 Basic Operation 126 5 A l g o r i t h m and Implementation 129 5.1 The Cel lular Processor 129 5.1.1 Basic aspects 130 5.1.2 Cel lular Processors as Cellular A u t o m a t a 131 5.1.3 Programming 133 5.2 A l g o r i t h m for R a p i d Recovery 136 5.2.1 Determinat ion of Basic Image Properties 137 5.2.2 Determinat ion of Junct ion Properties 137 5.2.3 In i t ia l Assignment of Interpretations 142 5.2.4 Propagat ion of Interpretations 142 5.2.5 F i n a l Assignment of Results 144 5.3 Neura l Implementation 144 6 Tests of the T h e o r y 146 6.1 Performance on Line Drawings 146 6.1.1 Rectangular Objects 147 6.1.2 Nonconforming Objects 151 6.1.3 Impossible Objects 157 6.2 Preattentive Recovery of Scene Structure 162 6.2.1 Basic Assumptions 162 6.2.2 Exp lanat i on of Psychophysical Results 165 7 S u m m a r y and Conclusions 173 List of Figures 1.1 E a r l y recovery of three-dimensional structure 3 2.1 Linkage between zone and surrounding locations 10 2.2 Types of junctions 31 2.3 Huffman-Clowes labell ing set 32 2.4 Penrose triangle 36 2.5 Extended computational framework 48 3.1 L i n k i n g of local constraints 59 3.2 Separation into ind iv idual dimensions 62 3.3 Cont igui ty labell ing 63 3.4 Set of contiguity constraints 64 3.5 Inconsistent drawing wi th consistent contiguity labelling 65 3.6 Reformulation of contiguity constraints 66 3.7 Set of convexity constraints 68 3.8 Inconsistent drawing wi th consistent convexity labelling 68 3.9 Reformulation of convexity constraints 69 3.10 Slant sign labell ing 71 3.11 Constraints on isolated L-junctions 72 3.12 Slant sign constraints for arrow- and Y- junctions 73 3.13 Slant sign labeUings for rectangular corners 74 3.14 Huffman-Clowes labellings for convex objects 80 3.15 Examples of compound convex objects 82 3.16 Huffman-Clowes labellings for compound convex objects 82 3.17 Free chain complexes 84 3.18 Examples of rectangular objects 85 3.19 Constraints on L-junctions 86 3.20 Huffman-Clowes labeUings for rectangular objects 86 3.21 P lanar i ty constraint 87 3.22 Interior angle constraint 87 3.23 Cont igui ty constraints on obtuse L- junct ion combinations 89 3.24 Terminat ion configurations 90 3.25 Cont igui ty constraints on Y - junc t i on combinations 92 3.26 Slant sign constraints applied to impossible figures 95 3.27 Condit ions of shallow slant 97 3.28 Combinat ions of angles into corners 97 3.29 Rescaling of image angles 98 4.1 Isolation of inconsistency in contiguity labelling 103 4.2 System of external constraints 107 4.3 E x a m p l e of complementary labell ing 112 4.4 E x a m p l e of reformulation of bijective constraint 115 4.5 Example of reformulation of nonbijective constraint 115 4.6 In i t ia l contiguity interpretations 120 4.7 In i t ia l convexity interpretations 122 4.8 In i t ia l slant-sign interpretations 123 4.9 Example of the rapid recovery process 128 5.1 Cel lular processor architecture 130 5.2 Calcu lat ion of orientation differences 139 5.3 Determinat ion of contiguity relations 141 6.1 Interpretation of convex rectangular object 149 6.2 Interpretation of nonconvex rectangular object 150 6.3 Interpretation of occluded rectangular objects 152 6.4 Interpretation of nonrectangular object 154 6.5 Interpretation of origami object 155 6.6 Interpretation of nonplanar object 156 6.7 Interpretation of object of inconsistent contiguity and convexity 158 6.8 Interpretation of object of inconsistent slant 160 6.9 Interpretation of object of inconsistent depth 161 6.10 Results explained by theory 163 6.11 Slant estimates for Condit ion A 166 6.12 Slant estimates for Condit ion B 167 6.13 Slant estimates for Condit ion C 169 6.14 Slant estimates for Condit ion E 170 6.15 Slant estimates for Condit ion G 172 List of Tables 2.1 Complexities of coherence classes Acknowledgments This work has benefited greatly from the contributions of many people. In part icular , I would like to thank the members of my committee for their help during my stay here at U B C . M a n y thanks to J i m E n n s , who has worked closely wi th me on many of the psychophysical experiments described here, and who has taught me much about the world of psychophysical experimentation. Thanks also to J i m L i t t l e , who has been a source of interesting discussions on the nature of parallel computation and its role in early vision, and to Dav id Lowe for his support and help in bridging the worlds of computational and biological vision. I am also indebted to A l a n M a c k w o r t h and W h i t m a n Richards for their many helpful comments and suggestions. I have been extremely fortunate in having several friends and associates able to critique various aspects of this work and who were kind enough to actually do so. This work has been much improved by their efforts. 1 would particularly like to thank Esfandiar B a n d a r i , James Elder , M a r c i a Grabowecky, Richard M a n n , Richard Pol lock, Greg Provan, M a r i o n Rodrigues, and M a r c Romanyc ia for their help in this regard. Space restrictions do not allow me to mention a l l the others who also have contributed, but I would like to assure them that their help is not forgotten. People can of course contribute to one's life in many other ways, and here again I have been rather fortunate in meeting many people who have helped make my stay here an enjoyable experience. In addition to the people mentioned above, I would also like to thank Debbie A k s , Françoise Guyaux , Chr i s Healey, Julie Johnson, Valerie M c R a e , D a n McReynolds , Jane M u l l i g a n , D a n RazzeH, L a n a Tr ick , Lindsey Wey, and C a r o l Whitehead. I wish you aU well . F ina l ly , 1 would like to express my deepest gratitude to my supervisor. Bob Woodham, for a l l the guidance and support I have received over the years. I have learned from him a great deal about the art of formulating and solving a scientific problem, and this work is in many ways an attempt to meet his high standards. Wi thout this guidance, and without the tolerance he has shown for my interests in sometimes rather esoteric fields, this work would not have been possible. Chapter 1 Introduction Those aspects of human vision most directly involved with the incoming image have a char-acteristic mode of operation: they are rapid (usually completed within several hundred m i l -liseconds), spatial ly paral le l (operating simultaneously across the visual field), and automatic (unaffected by changes in goals during the course of processing). This has led to an assump-t ion that these "ear ly" processes determine only simple geometric and radiometric properties of the image, e.g., line orientation, color, and contrast. There is considerable support for this assumption on computat ional grounds — these are the only kinds of properties can be reliably determined by spatial ly- l imited processors operating within a fixed amount of t ime. To reliably determine properties of the corresponding scene, therefore, a later stage of more t ime-consuming operations is needed. This division into early and later processes has formed the basis for many computat ional and psychophysical studies of the human visual system. However, the underlying assumption is false — for some Images, recovery of scene properties can be done at early stages of processing, rapidly and i n parallel [Ram88, E R 9 0 a , ER91] . In figure 1.1(a), for example, the drawing of the block w i t h a unique three-dimensional orientation can be detected almost Immediately. However, this is not possible when these drawings are altered slightly (figure 1.1(b)), showing that this phenomenon is not due to simple image properties alone, but to some aspect of the recovered scene structure. The goal of this thesis is to explain how properties of the scene can be recovered rapidly and in parallel at early levels of visual processing. In particular, it develops a computat ional theory of how the human visual system can rapidly interpret line drawings to obtain the three-dimensional structure of the corresponding polyhedral objects. Since the general problem of line interpretation is NP-complete [KP88] , a great deal of time may sometimes be required for its solution, even when parallel processing is used. If recovery is to be rapid , therefore, it cannot be based on this mapping , but rather must be based on an approximation in which the reliabihty and completeness of the output have been lowered to some degree. The central idea developed here is that a good approximation can be obtained by sp l i t t ing the recovery process into several quasi-independent streams, each based on a set of constraints that can be quickly solved. It is shown that relatively few constraints need to be altered in order to achieve this decomposition, and that the resulting "quick ajid d i r ty " process can recover a substantial amount of scene structure in very l itt le t ime. It is also shown that this model can explain the recovery of three-dimensional structure at early levels of human v is ion . In common wi th other areas of computational analysis, this study is first and foremost concerned w i t h how Information can be used by a visual system. For rapid recovery, however, the structure of the problem is no longer dictated entirely by the optics of the s ituation — Instead, Umits on processing t ime must also be taken Into account. This work shows how this perspective can be Incorporated Into a computational framework, and how It can lead to a new source of constraints on the representations and processes used In early vision. 1.1 The Problem In what follows, the scene domain Is taken to be the set of opaque polyhedral objects w i t h tr ihedral corners. The term ' t r ihedra l ' Is used here In a narrow sense, referring to corners formed from the Intersection of three planar surfaces in such a way that only three edges can radiate from any vertex, and that the vertex cannot contact any other edge. The Image domain Is the corresponding set of drawings formed by the projection of these objects onto the image plane. The rap id recovery process must recover from these drawings as much of the scene structure as possible wi th in some fixed amount of t ime. The goal of this work Is to develop a computat ional theory of this process, one which accounts for those aspects of three-dimensional structure recovered In human early vision. There are several reasons for this choice of problem. F i r s t , there Is evidence that human vision actually does recover three-dimensional structure rapidly and In parallel at early levels [ER90b, EI191, ER92 ] . The phenomenon is a striking and robust one, wi th a strong sensitivity to the arrangement of the lines. A s such, there Is considerable potential for making predictions about the kinds of line arrangements for which recovery wiU and wiU not be successful. ( a ) (b ) Figure 1.1: E a r l y recovery of three-dimensional structure. A line drawing that corresponds to a distinct three-dimensional block can be detected almost immediately when the block slants upwards (a). Ro ta t ing the page so that this block slants downwards causes detection to become more difficult, showing that slant has an asymmetry typ ica l of many properties of early vision (see [TG88 , ER90b] ) . W h e n line relations are slightly altered (b), detection is equally difficult under a l l conditions (also see [ER91]), indicating that slant is not recovered at a l l . Second, a great deal is known about the l imits to which three-dimensional structure can be recovered from line drawings,^ this problem having been the focus of several decades of work i n the area of computational vision (see section 2.2.1). Moreover, the general problem of line interpretation has been shown to be NP-complete [KP85]. Since the time required to solve an NP-complete problem can (in the worst case) increase exponentially w i t h its size,^ this rules out the possibiUty that the process can always be sped up by parallel processing alone. F ina l ly , of al l the rap id recovery processes, line interpretation is perhaps that which most severely taxes the abilities of early vision. Relations between image and scene are more ten-uous here than for most other recovery processes; indeed, many aspects of line interpretat ion are often considered to be learned conventions (see, e.g., [Sug86]). Thus , i f a mechanism can be found for the rap id interpretation of line drawings, it becomes plausible that similar mech-anisms might also exist for recovery processes based on more realistic associations between image and scene. 1.2 The Approach For a t ime- l imited process, the goal is no longer to extract al l available information from an image, but rather to make good use of the available computational resources. Two factors are therefore of pr imary concern: (i) minimiz ing the sheer amount of data transformation and transmission that needs to be carried out in parallel , and (ii) maximiz ing the effectiveness of these transformations in extracting three-dimensional structure. Th is work examines how these two factors influence the structure of the recovery process at the levels of computat ion, a lgor i thm, and implementation. Chapter 2 provides the background mater ia l for this analysis. It begins wi th a survey of the major empir ical and theoretical results on the l imits of rapid paral le l processing. This is followed by an overview of the important results concerning the recovery of three-dimensional structure from line drawings. A discussion is then presented of the ways in which these two 'Theories of line interpretation, however, have rarely taken into account noise and other distortions of the image. Complications also arise from shadow edges and texture boundaries. In the interests of simplicity, these will not be discussed here. ^Although there remains a possibility that NP-complete problems are in class P (i.e., can be carried out in polynomial time in the worst case), this situation appears highly unlikely [GJ79, Joh90]. Even if P = N P , the possibility would stiU exist that such problems are P-complete, meaning that their speed could not be substantially increased by the use of parallel processing [GR88]. threads can be drawn together. Next , M a r r ' s framework of computational analysis is extended to cover the case of resource-limited processes. T w o sorts of computational constraints are distinguished: "external" constraints on the static form of the mapping between image and scene, and " in terna l " constraints that guide the course of the process that generates i t . The analysis of rap id recovery itself begins in chapter 3, which examines the ways in which the image-to-scene mapping used in the general problem of Une interpretation can be replaced by an approximation of lower complexity. In part icular , it shows that low-complexity recovery can be carried out by weakening the constraints to allow their separation into independent subsets, each concerned w i t h a single aspect of the scene. Four such aspects are considered: the contiguity of edges, the positive convexity of edges, the sign of edge slants, and the magnitude of edge slants. It is shown that each of these subsets can be solved in subUnear t ime by a processing stream containing a sufficiently large set of parallel processors, and that this complexity is not increased when interaction between the streams involves only a one-way transmission of information. A l though the interpretative power of the resultant mapping is somewhat reduced, a considerable amount remains; indeed, it is shown that contiguity, convexity, and slant can be recovered completely in logarithmic t ime when a l l corners are rectangular, i.e., composed of mutual ly orthogonal surfaces. The next step is to develop constraints that maximize the UkeUhood of successful Inter-pretat ion when a Umlt Is placed on processing time. This Is done In chapter 4. A fixed amount of t ime Is assumed to be available. This choice Is consistent wi th the l imits typ i ca l for an early v isual process, and also has the advantage that the propagation of inconsistencies Is localized. In keeping wi th this mlnlmaUst vein, computational resources are l imited to a mesh of simple processors. A set of external constraints Is developed to hmlt the space of possible Interpretations. Four principles are used for the choice of constraints: separation of dimensions, local ity of constraints, local coordination of dimensions, and the structural assumption that the corners of the polyhedra are rectangular. Internal constraints are then developed that guide the course of the recovery process through this space of possible so-lutions. These are based on four principles: maintenance of interpretative power, locally irreversible computat ion, minimizat ion of Inconsistency, and an ordering of search to select preferred interpretations of m a x i m u m contiguity and convexity. Taken together, the exter-n a l and internal constraints define a process capable of recovering a considerable amount of three-dimensional structure In very htt le t ime. A l t h o u g h the external and internal constraints l imit the way in which the process uses information, they do not completely specify an algorithm. Chapter 5 provides this specifica-t ion , and implements the resulting algorithm on a mesh architecture. This is done v i a the device of a cellular processor. This mechanism is formed by part i t ioning the image into a set of disjoint "ce l ls" , each governed by a Unite-state processing element that can be programmed to execute a few simple operations on the contents of its ceU, and that can communicate only wi th its immediate neighbor. A s such, it obeys the general architectural Umitations assumed for the computat ional analysis while simultaneously being easy to control and analyze. T h e resulting a lgor i thm provides an existence proof that rapid line interpretation can be done on an architecture of the assumed type. The final step of the work is to test the theory on actual line drawings. In chapter 6, the process is tested on domains that range from those in which a l l underlying assumptions are obeyed to impossible figures which cannot correspond to any k ind of polyhedron at a l l . It is shown that a considerable amount of three-dimensional structure can indeed be recovered in very l i tt le t ime, and that this process degrades gracefully as the underlying s tructural assumptions about the scene domain are violated. These results are then used as the basis of predictions about the kinds of line drawings that can and cannot be rapidly detected by the human visual system. The theory is shown to be capable of explaining the abil ity of early human vision to recover three-dimensional structure rapidly and in parallel . 1.3 Limitations and Key Assumptions Before embarking on the development of the theory, it is important to acknowledge a number of Umitations and assumptions that could potentially l imit its relevance. First of a l l , the treatment here is concerned exclusively w i t h the rapid recovery of three-dimensional structure from line drawings. The advantage of this approach is that the problem domain is small and has a simple mathemat i ca l description, making the analysis of the recovery process as simple as possible. B u t this domain is a highly artif icial one — the figures contain no gaps or any other k ind of noise, nor do they describe cracks and markings which are found on just about any real surface. Indeed, these stimuli are so artificial that the analysis runs the risk of saying nothing at a l l about processes that interpret more reahstic images. The scope of the theory is also l imited by the architectural constraints required for the computational analysis. The analysis here assumes only a two-dimensional array of relatively simple processors, each connected only to its immediate neighbors (chapter 4). Since this l imi ta t i on is relatively severe, the resulting process provides a lower bound to what might be reasonably expected from a spatiotopic array of processors. But the assumption of a m i n i m a l processing architecture also means that the predictions are applicable only to the extent that such an architecture actually is representative of that used in human early vision. There is also considerable lat itude in the choice of the finer details of the theory. Several of the choices made here are somewhat tentative, intended only to show that such a theory can be developed. Consequently, they are unUkely to withstand the test of t ime. Insofar as the theory can explain the rapid recovery of three-dimensional orientation by the early human v isual system, it assumes that this system actually does carry out this process. Results to date [ER90b, E R 9 1 , ER92] show that the recovery of three-dimensional orientation at early levels is sufficient to explain most known results concerning the sensitivity of early vision to line drawings of opaque polyhedra. But while this sensitivity cannot be explained in terms of simple operations on the image (e.g., spatial filtering), there st i l l remains the possibil ity of some other "image-based" explanation, e.g., a sensitivity to particular spat ial relations between the lines, or the " loading- in" of a complete object model v ia lookup that based on image features (e.g. [PE90]). F ina l ly , even i f three-dimensional orientation actually is computed at these early levels, there is s t i l l no guarantee that the process is in any way attempting to make good use of available computat ional resources. Evolut ion often produces biological systems that are adequate rather than opt imal (see, e.g., [Ram85, Gou89]), and it may well be that rap id recovery falls Into this category. If so, its operation Is governed by constraints other than those based on effectiveness, and the computational model developed here w i l l be largely Irrelevant for explaining human performance. A scientific theory, however, ult imately succeeds or falls to the degree that It explains phenomena In a succinct way, and suggests new avenues of research to explore (e.g., [Lak78]). A s the following chapters show, the theory developed here Is able to account for the recovery of three-dimensional structure at early levels of human vision, and can make predictions as to what other kinds of line drawings can and cannot be recovered in this way. Furthermore, It does so by developing principles that are potentially applicable to other areas of perception and cognit ion, and to both artif icial and biological systems. Chapter 2 Background The problem of rapid line interpretation is a fusion of two concerns that historically have been quite separate: (i) determining the extent to which properties can be extracted rapid ly and i n paral le l from an image, and (ii) determining the extent to which line drawings can be interpreted as opaque polyhedral objects. Thus , a good way to begin is to survey the pr inc ipal developments i n each of these areas. It is then shown how aspects of both can be usefuUy combined into a rap id recovery process, and how this process can be analyzed by an extension of the computat ional framework of M a r r [Mar82]. 2.1 Rapid Parallel Processing The earliest stages of v isual processing are characterized by the uniform application of rela-tively simple operations at each location in the visual field (see, e.g., [Zuc87b, L B C 8 9 ] ) . It is evident that the problems solved at these levels make great use of parallel ism, w i th one or more processors assigned to each patch of the image. It is less evident, however, what the l imits of this k i n d of processing might be. This section surveys some of the m a i n results pertaining to rap id parallel processing. Theoretical results are presented first, w i th discussion focusing on the way in which a problem's structure determines its complexity on a parallel processor. This is followed by an overview of what is known about the extent of rapid parallel processing in human early vision. 2.1.1 C o m p u t a t i o n a l S t u d i e s T w o different routes can be taken when studying the l imits of parallel processing. T h e first starts wi th a given architecture and then determines its suitabil ity for various classes of problems. Such "processor-dependent" analysis is widely used, particularly to ascertain the capabilities of an existing machine (e.g., [PD84, L B C 8 9 ] ) . B u t the emphasis here is on problems rather than architectures per se. Consequently, a " d u a l " approach is taken: a class of problems is specified and the suitabil ity of various architectures for this class then examined. This approach can be based on the amount of coherence in the mapping between input and output image. It is shown that this coherence has a large influence on the l imi t s to which an operation can be sped up by a parallel architecture. A . Basics To estabUsh what is meant by the "su i tab i l i ty " of an architecture for a particular problem, consider first a network of Tur ing machines joined together by high-bandwidth connections. Such a " m a x i m a l " architecture obviously allows the greatest use to be made of paral le l ism, regardless of problem structure. Its generality, however, means that computational resources are often wasted. A natural architecture is therefore defined as one which best matches the given problem, i.e., which uses a m i n i m a l set of resources to carry out the task. Such an architecture can be obtained (conceptually, at least) by starting wi th a max imal architecture and then weakening the power of the ind iv idual processors and the communication patterns between them unt i l a change occurs in the t ime or space required. The m i n i m a l configurations at each of these transitions are exactly the natural architectures. Since different choices of t ime and space bounds are possible, there is usually more than one natura l architecture for a given problem. In general, finding the natura l architecture for a problem is difficult — even the mapping of neighbors in the problem space to neighbors in the architecture is N P - h a r d ^ [NKP87] . B u t when problems are based on the mapping of images to images^ the coherence in the mapping simplifies matters considerably (see, e.g., [Sto87, Sto88]). F ind ing a natura l architecture for a problem then reduces to determining its mapping coherence and relating it to the time and ' T h i s means that the problem is at least as hard as any NP-complete task. ^Without loss of generality, the calculation of a lower-dimensional result can be expressed as a mapping in which the output image contains repeated instances of the result. For example, calculation of the average value in an image can be expressed in terms of an output image containing the average at each location. <— - s —» Forward Linkaae From zone in Input To image in output - Contributes to result • • + T -s • ' • Backward Linkage From image in input To zone In output - Selects local operator Figure 2.1: Linkage between zone and surrounding locations, space required on various kinds of architectures. i) Mapping Coherence To begin w i t h , let a zone be some sxs patch of pixels in the image. Zones may overlap, so that each mxm image contains ( m - s ) ^ zones. Mapp ing coherence is described here in terms of how the input inside a zone influences the output image in the surrounding locations (figure 2.1). A zone is said to interact with the rest of the image if there is at least one direction i n which the range of this influence is unl imited . The linkage of the mapping is defined as the number of degrees of freedom in this interaction." ' For str ict ly local operations, such as those typ ica l of early vision, no interaction exists between a zone and locations sufficiently far away in the output. These are consequently zero-Unkage problems - any changes within a zone are propagated only a finite distance away. A t the opposite extreme, consider the sorting of image intensities. Here, changing any of the pixel values in the zone potentially changes the value of the output at a location arbi trar i ly far away. The linkage is therefore proport ional to zone area. ^Many of the ideas presented here regarding mapping coherence have their origin in the work of Stout [Sto87, Sto88]. However, the definition of linkage used here is considerably different, being based on degrees of freedom in a more uniform fashion. Distinctions such as unidirectional and bidirectional linkage, as well as the resulting set of coherence classes, are also novel. Linkages can run both ways, however, so that two kinds of problem can be distinguished. In unidirectional problems, the information transmitted from the zone can be computed purely locally — no pixel values are needed from the rest of the image beyond some finite surrounding region. For example, determining the average pixel intensity requires only one parameter (the sum of p ixel intensities) to be transmitted from any zone. Since the pixels outside the zone do not affect this value, this interaction is clearly unidirectional , w i t h no outside information needed. In bidirectional problems, on the other hand, the interaction between a zone and its surroundings runs both ways: not only do changes in the zone influence the rest of the image, but the rest of the image influences what is required of the zone. More precisely, the information to be transmitted from a zone cannot be determined in isolation for bidirect ional problems, since values from the rest of the image are needed to select the appropriate quanti ty to be calculated locally. A n example of this is the calculation of the median intensity of an image i n which the range of p ixel values is unHmited (e.g., random variables wi th a gaussian distr ibution) . A single degree of freedom can be assigned to the local output : the number of pixels above (or below) the global median. But the value of this median must first be t ransmit ted to the zone, and this value may be affected by changes in pixel intensities at locations arbi trar i ly far away.^ In essence, the k ind of linkage back from the image to the zone reflects the amount of contextual information needed to select the appropriate loca l operation (figure 2.1). Given this characterization of mapping coherence, problems can be grouped according to the strength and directionality of their Unkage.^ Four classes are considered here: zero l inkage, constant linkage, linkage proport ional to zone perimeter, and linkage proport ional to zone area. Constant- and perimeter-hnkage problems are further divided into unidirectional and bidirect ional subclasses. A n y operation involved in visual processing can be placed into one of these classes,^ and it is shown below that placement into a class puts bounds on its complexity on various kinds of architectures. *In a sense, the difference between unidirectional and bidirectional problems corresponds to that between deterministic and nondeterministic problems: in the unidirectional case, the outputs of isolated zones are sufficient to produce the solution, whereas in the bidirectional case, they are sufficient only to verify it. (For a discussion of the relation between deterministic and nondeterministic problems see, e.g., [GJ79].) ^The strength of the linkages can be different in the two directions. But the simple classification into unidirectional and bidirectional problems is sufficient for present purposes. ^Operations having a structure that does not match well with the examples discussed here (e.g., those involving fractal quantities) wUl require a finer division of coherence classes. But as a first approximation, lower bounds for such problems can be obtained by "rounding down" to the nearest coherence class. ii) Complexity Measures The complexity of a problem can be analyzed in a relatively processor-independent way v ia the methods of complexity theory (see, e.g., [Baa78, G J 7 9 , JohQO]). Here, the basic unit is taken to be the time required to merge two independent quantities into one — for example, the addit ion or mult ip l i cat ion of two numbers, or the testing of their equality. T h e complexity of a given algor i thm is then measured by the number of such operations required for the most difficult case i n the problem set.^ The complexity of a given problem is that of the least-complex algorithm capable of solving it on a given architecture. Differences in the speed of basic operations — such as arise in different mechanical or biological systems — are eliminated by the use of 0 -notat ion , which describes the t ime only to w i th in a constant factor. A n algorithm is said to require 0 ( / ( n ) ) time if there exist positive constants c, d and N such that for any input of size n > N, the t ime cf(n) < T{n) < df(n). If the a lgor i thm is the least complex known to solve the problem, the complexity of the problem is said to have an upper bound of 0(f{n)). Similarly, the problem has a lower complexity bound Q,{g{n)) when any algorithm to solve it must have a complexity of at least 0{g{n)). If a problem is bounded above by 0(f{n)) and below by Q(f(n)), it has (exact) complexity 0 ( / ( n ) ) (see, e.g., [Baa78, Har87]). Defined in this way, the time needed to solve a problem is - to wi th in a polynomial factor - independent of the part icular set of instructions of the machine (see, e.g., [Har87]). Th is quantity is therefore an invariant of the problem, (see, e.g., [Baa78, T W W 8 8 ] ) , and so can be used for abstract, machine-indifferent analysis [RP91]. A s for the case of mapping coherence, processes can be grouped into various complexity classes. (For a good survey of these classes, see [Joh90].) One of these is the set P of processes that can be carried out i n polynomial t ime; such processes may have a different complexity on different (serial) architectures, but this complexity wi l l always be po lynomial (see, e.g., [GJ79, Har87]) . This class can be further subdivided according to the degree that complexity is lowered by the introduct ion of parallel processing. The class NC is defined as the set of problems having subUnear complexity when a sufficient number of processors are provided.^ '^Complexity measures can also be based on average-case analysis, as well as on a probabilistic analysis that ignores exceptional cases of small measure (see [TWW88]). However, worst-case measures are those most often used, in part because of the relative ease of analysis. These measures are also preferred here since they avoid the need to develop extra procedures to handle cases in which the computational hmitations are exceeded. *More precisely, these are problems of complexity 0{\og,^ n) when 0{n'') processors are available (where exponents k,p G Z). In contrast, a class of "P-complete" problems has been found that is apparently incapable of being sped up this way; in essence, these problems remain "serial" no matter how many processors are allowed (see, e.g. [GR88]). Note that this view of complexity is based on the number of operations needed to combine data and so the t ime needed for data transmission across space is often ignored. Whi l e this is suitable for many situations, it is less so for others, especially for operations on images, where data is often moved around a considerable distance during the course of the computat ion. Transmission delays are severe in biological systems (where speeds are typically on the order of I m / s [She83]), and are also a factor in the operation of machine systems [Uhr87, p. 261]. W h e n applying complexity measures to image-processing problems,^ therefore, the effects of transmission delay must be kept in m i n d . Hi) Architectural Parameters Given that the complexity of an image-processing problem depends on the underlying architecture, it is important to estabUsh what the relevant parameters might be. In what foUows, architectures are described by graphs where each node represents a separate process-ing element ( P E ) and each edge a direct connection between the corresponding P E s . Thus , a m a x i m a l architecture corresponds to a complete graph in which each P E (equivalent to a Tur ing machine) is directly connected to aU the others. This model is superficiaUy differ-ent from the parallel random access machine ( P R A M ) often used in theoretical studies of paral le l processing (e.g., [GR88]), since the P R A M is defined as an abstract machine w i t h a shared memory immediately accessible to any of the processors.•'^ B u t this shared memory allows direct communication between P E s , and so the P R A M and m a x i m a l architectures are essentially equivalent. In this formulat ion, the complexity of a problem can be analyzed by tracing the flow of information through the network. The path taken by each piece of information can be represented by a pa th through the graph that begins at the point where it is picked up in the image, and terminates at its final position(s) in the output . The nodes of the graph are ®As used here, the terms 'image-processing problem' and 'image operation' are largely synonymous. The only difference is that the specification of a problem does not necessarily contain an explicit rule to obtain the output from the input, whereas this is generally true of the term 'operation'. ^"Strictly speaking, this characterizes only the most powerful variant: the P R I O R I T Y concurrent read -concurrent write ( C R C W ) P R A M . Since no other P R A M variants will be considered here, this qualification wiU not be explicitly mentioned. assigned weights representing the time required for local computation.^^ A weight can be assigned to each data path by accumulating the weights of aU nodes encountered along the way. The processing t ime for a computation is then the max imum weight of al l the data paths in the computation. Thus , complexity is largely governed by two sets of parameters: (i) those concerned w i t h the processing resources available to each P E , and (ii) those concerned w i t h the pattern of data transmission between P E s . A wide variety of processing elements are possible for a parallel architecture. A t one extreme, each P E has the power of a Turing machine and can operate completely indepen-dently of the others. This is basically the multiple instruct ion, multiple data-stream ( M I M D ) architecture [Fly72], in which each P E may carry out a different set of instructions. Weaken-ing the power of the P E s decreases their abiUty to respond to different signals so that they become less able to respond to the structure of the image and less flexible in communicating wi th their neighbors. In the extreme case this becomes a single instruct ion, multiple data-stream ( S I M D ) architecture [Fly72], where al l P E s operate in lockstep, carrying out the same operation everywhere in the network. A similar spectrum of possibilities exists for data transmission. The simplest network is a two-dimensional ^/n X ^/n array of n processing elements. Here, each of the processors is assigned to some particular zone or set of zones in the image, and operates in complete isolation from the others. This is the k i n d of architecture generally thought to exist at the very earliest stages of v isual processing, i.e., the ret ina and the striate cortex (e.g., [RobSO]). The simplest form of processor-processor interaction occurs in the mesh, a network, where each P E i n the array is connected with its nearest neighbors (e.g., [Ros83]). Here, data can be sent from any P E to any other P E wi th time proportional to the distance in the mesh. Transfer can be greatly sped up (at least in terms of the number of switches involved) by way of a pyramid network, in which a hierarchical communication structure is used. The basic -v/n X ^/n mesh forms the lowest level of this hierarchy. This mesh is then part it ioned into a set of A; X A; nonoverlapping sections, w i th the P E s in each section then connected to a single P E in a higher-level y/nik x y/n/k mesh. This higher-level mesh is in turn sectioned and connected to the P E s in a st i l l higher-level mesh, this process continuing u n t i l only one P E exists at the highest level (see, e.g. [Ros86]). The resulting structure is hierarchical , al lowing any two P E s separated by distance m to communicate through O ( l o g m ) switches. '^In this view, switches and memories are regarded as nodes corresponding to simple computations. Linkage A r r a y Mesh P y r a m i d Hypercube Zero Constant (unidirectional) Constant (bidirectional) Perimeter (unidirectional) Perimeter (bidirectional) A r e a 0 (1 ) 0 (1 ) 0 (1 ) 0 (1 ) 0 ( n ) 0 ( l o g n ) 0(logra) ? 0{\og^n) ? 0 ( n ) 0 (7zi /2) 0 ( l o g n ) 0{n'') 0{n) ? ? f2(n) O( l ogn ) Table 2.1: Complexities of coherence classes. P y r a m i d networks have been proposed for several of the more "g lobal" processes of v i s i on , such as line tracing [Ede87] and selective visual attention [KU84 , Tso90] A more highly-connected architecture is that of the hypercube (e.g., [Hil84]). A d-dimensional hypercube has 2'^  corners; i f the positions of neighbors in the hypercube differ by some constant distance a > 0 along any dimension, corners w i l l be separated by at most a distance of ad. Thus , i f n = 2'^  P E s are connected such that each corresponds to a different corner of the hypercube, then two P E s can communicate wi th in O ( l o g n ) t ime. A l t h o u g h this is the same as for the p y r a m i d , the greater number of possible paths yields a greater effective bandwidth , which allows the hypercube to avoid the bottlenecks that can arise at the higher levels of the pyramid [StoST]. • B . Classes of Image-Processing Problems This section presents the major results known about the l imits to which various kinds of image-processing problems can be sped up by parallel processing. Problems are grouped according the coherence of the corresponding mapping between input and output . Arranged i n this way, an interesting pattern emerges from these results - the lower-bound complexities due to data transmission are the same for al l problems in any coherence class (table 2.1.1). A n d these lower bounds prove to be the dominant factors in the complexity of many image-processing problems. i) Zero-linkage Problems B y definition, zero-linkage problems have no interaction between a zone and a locat ion that is sufficiently far away. These are exactly the problems best handled by local operations. The simplest of these are local measurements, i.e., the uniform apphcation of a spat ial ly-l imited template across the image. These include pointwise remappings of intensity (e.g., gamma correction) and convolutions by functions of l imited spatial extent. More generally, zero-Unkage problems include those that can be solved using properties of fixed support, i.e., where the property can be extracted from a fixed set of points in each zone [UU84]. The l imit on transmission distance means that each P E can complete its operation within a fixed t ime independent of image size, and so these problems have 0 (1 ) complexity. The l imi ted t ime and spatial extent also mean that each P E need only be a finite-state automaton. A n a t u r a l architecture for a zero-linkage problem is therefore a simple array in which each finite-state P E takes its input from the corresponding zone in the image. A l t h o u g h communication time is m i n i m a l in an array, an extensive amount of w i r ing is usually required to connect pixels to their P E s , especially i f the zones are large. Furthermore , such a network would be impossible to reconfigure when a different zone size is required. These drawbacks are largely eliminated by using a mesh. Here, the input is part i t ioned into nonoverlapping sections, w i t h each P E taking its input from a single section. Since P E s do not generally have direct access to a l l information in a zone, information must be t ransmit ted through the mesh. In essence, a mesh trades off t ime for space. For zero-linkage problems, the natura l mesh architecture is the cellular automaton [ T M 8 7 , C H Y 9 0 , T M 9 0 ] , for which the processing elements are simple finite-state automata . B y Umit ing the number of iterations allowed for each P E , a cellular automaton can carry out zero-linkage problems such as spatial filtering [PD84]. A s the P E s are given more power, they are able to combine simple measurements in interesting ways — for example, to determine the color or orientation of hne segments by comparing the magnitudes among a basic set of local measurements (see, e.g., [Gra85]). ii) Constant-linkage Problems Constant-Unkage problems are characterized by a positive Unkage whose strength does not depend on the size of the zone. Two variants can be distinguished: unidirectional and bidirect ional . Unidirectional problems One of the simplest unidirectional problems is to determine the average value of the pixels in an image. A s discussed in section 2.1.1, determining this quantity requires only one parameter (the sum of p ixel intensities) from any zone. Similarly, the calculation of the standard deviation is also undirectional , requiring two parameters (the sum of p ixe l intensities, together wi th the sum of their squares) to be obtained from each zone. Other problems which can be formulated this way include finding the min imum distance between black (or white) pixels in the image [Sto87], determining the center of mass [Tan84], and detecting horizontal or vertical concavities [Sto87]. A H these tasks can be carried out in O(logn) t ime on a pyramid architecture [Tan84, Ros86, Sto87]. In general, logarithmic complexity can be achieved for any process in which each P E reduces the data from the level below it to a constant amount, and passes this d a t a upwards [Sto87]. The pyramid is consequently a natura l architecture for the entire class of unidirectional constant-Unkage problems. Hypercubes allow no additional reductions of complexity. Bidirectional problems Relatively l i tt le work has been done on this class of problems. Determining the median can be done in O(log'^ra) t ime on a pyramid ; it is not known whether this quantity can be lowered [Sto87]. Another bidirectional constant-linkage problem is the determination of extreme points, i.e., those points located at the corners of the smallest convex polygon containing a l l points in the image. The complexity of this problem also is 0(log'^ n) on a p y r a m i d [Sto87]. It may be that this (provisional) hmit applies to al l such problems. Hi) Perimeter-linkage Problems A large set of problems can be characterized by linkage proportional to the perimeter of the zone. A g a i n , both unidirectional and bidirectional variants can be distinguished. Unidirectional problems This class is exemplified by connected component labell ing ( C C L ) , where each distinct component in the image is to be assigned a unique label . Note that a special case of this problem is the determination of whether aU lines in the image are connected. Provided that the components passing through the perimeter of a zone are correctly labelled (as far as the zone is concerned), no other aspect of the zone's contents are needed to solve this problem. The number of degrees of freedom is therefore equal to the number of perimeter crossings. Assuming a uniform distr ibution of components in the image, this is directly proport iona l to perimeter length. Another perimeter-linkage problem is the determination of the lengths of a l l Unes in the image. Here, two parameters (label and tota l length inside the zone) are required at each perimeter crossing. C C L can be done i n Q{n) t ime on a mesh [Sto88], 0(ra^/^) t ime on a pyramid [MS87] , and 0( log(n) ) t ime on a hypercube [LAN89] . This latter Umit is the same when a P R A M is used [SV82]. It may be that these Umits also apply to the other problems in this class. Bidirectional problems This class includes problems of constraint r e l a x a t i o n , w h i c h are characterized by l oca l measurements that require context for their complete interpretation [HZ83, KI85] . Since these problems depend only on local constraints, the effect of a zone on the result is determined by a band of pixels along the border, the exact width depending on the range of the l o ca l process. Linkage is consequently proport ional to the length of this border. But values i n the loca l zone must also be consistent wi th their surroundings, making the problem bidirectional . T w o types of relaxation problem exist: continuous and discrete. For continuous relajc-at ion , local values (as weU as interaction terms) are represented as real numbers. A m o n g other things, this allows non-zero probabiUties to be assigned to different interpretations of any local feature. The problems themselves are generally formulated in terms of max imiz ing or min imiz ing some global quantity, which then allows them to be recast as finite difference equations [HZ83]. One part icular ly interesting set of problems involves reconstructing sur-faces by finding the extremum of some global measure such as the smoothness or error of the reconstructed surface. This approach is the basis of general frameworks of visual processing such as regularization theory [PTK85] and Markov random fields [GG84] . Continuous relaxation can also be formulated in terms of Unear programming [BB82 , p. 420-430]. Since Unear programming can be done in polynomial time [Kar84], it is Ukely that continuous relaxation is of this complexity. For problems that can be cast as the solution of eUiptical equations (either Unear or nonUnear), the number of iterations required to solve the problem to wi th in a given accuracy is 0{n^), where L is the order of the equation [Bra77, p. 281]. O n a pyramid architecture, where multiresolution techniques [Bri87] can be used. These are often referred to as relaxation processes. They are described here as problems, however, since they are abstract specifications of input-output mappings that are quite independent of the particular processes used to carry them out. this is reduced to 0{n) iterations [Gla84]. In contrast to continuous relaxation, discrete relaxation requires that values assigned to pixels be integers, and that only one interpretation be allowed for each object. M a n y of these problems are NP-complete , including the interpretation of line drawings [KP85] . It is strongly suspected (although not proven) that NP-complete problems take an exponential ly large amount of t ime in the worst case [GJ79]. Assuming this to be true, the complexity of discrete relaxation results more from the cost of search than from bottlenecks on d a t a transmission. iv) Area-linkage Problems Final ly , problems exist for which linkage is proportional to the area of the zone. These area-linkage problems have m i n i m a l coherence between values in the input and the output at any location. A n example is the rotation of a discrete image by 180°. Here, a change i n one part of the input can change the output at a position arbitrari ly far away. Another example is the sorting of pixel intensities. Here again, a change in the value of a single pixel can lead to changes at locations far removed from the original zone. Area-l inkage problems involve such large amounts of data transmission that p y r a m i d architectures (and variants thereof) cannot efficiently handle the transmission of data . B o t -tlenecks exist at the higher-level P E s of the pyramid , and so considerable time is therefore required to move data over large distances. Image rotat ion, for example, is of complexity Q,{n) on a pyramid . A similar l imit exists for sorting [Sto87]. This latter value may a general l imit for area-Unkage problems on this architecture. W h e n the communication bottlenecks are bypassed by the use of more completely-connected architectures, the complexity of area-linkage problems is reduced. For example, sorting on a hypercube requires O ( l ogn ) t ime [Sto87], the lowest complexity possible on any architecture [GR88]. 2.1.2 P s y c h o p h y s i c a l S t u d i e s E a r l y vision consists of those operations in the human visual system that are rap id , spatially paral le l , and require l i t t le attention. Since these operations are directly involved with the incoming image, they are relatively easy to study empirically. Consequently, they have long been the subject of psychophysical investigation (see, e.g., [Zuc87b]). Since the focus here is on the l imits to this k ind of processing, this section surveys only the results of psychophysical studies on the descriptions used at the highest stages of early vision. A s this survey shows, there is a remarkable degree of convergence to these results. A . Basics Psychophysical studies of rap id visual processing have largely been concerned wi th those activities that occur almost instantaneously and without conscious effort. For example, when a horizontal Une is placed among a group of vertical Unes, it invariably "pops out" of the image, no matter how many vertical Unes there may be. On the other hand , detecting a T-shaped figure among L-shaped figures requires a much slower and more effortful serial scan of the display [Tre88]. This is generally taken as evidence that fast search is based on "v i sua l pr imit ives" formed rapidly and in paraUel across the visual field, while slow search is based on constructs formed serially at higher levels. This difference in performance (in both accuracy and response time) can be used to determine the set of properties determined rapidly and in parallel in early vision. For the most part , experiments have been based on one of three types of task: visual search, texture segmentation, or grouping. i) Visual Search In v isual seaxch, the task is to determine whether a displayed image contains a subset of some given collection of target patterns. Performance is generaUy measured by the accuracy or speed of the response. Psychophysical studies explore how this performance varies as a function of the number and type of target patterns In the coUectlon, the number and type of Items in the display, and the duration of the display Itself (see, e.g. [Rab78, Rab84]). V i s u a l search experiments date back to the work of Green and Anderson [GA56] , who demonstrated that search speed for a target was unaffected by variations In the shapes of the other Items, except for those of the same color. This suggested that color is available at early levels to aUow the selective processing of visual information. Further work by Neisser [Nel63] showed this to hold for simple geometrical properties as weU: target letters embedded in a group of nontargets are detected more quickly when they have a distinctive shape or orientation. More recent studies (e.g., [Tre82, TreSS, Dun89]) measure response time for a single target as a function of the number of items in the display, wi th response accuracy be ing held constant. These experiments show that if the target is sufficiently distinct from the other items, response t ime is effectively independent of the number of items present — subjectively, the target "pops out" of the display. Otherwise, detection time is roughly proport ional to the number of items in the display, wi th the constant of proportionality being twice as large for target-absent displays as for target-present ones. This latter pattern is t a k e n as evidence for a serial scanning process that terminates when the target pattern has been found [Rab78, Tre82, Tre88]. ii) Texture Segmentation A n alternative way to investigate rapid parallel processing is based on the perception of visual texture. Texture perception has several different aspects. These include obta in ing surface shape from texture gradient, determining the intrinsic structure of a surface, and finding the boundaries between regions of different texture (see, e.g., [Wil90]). M u c h of what is known about texture perception is based mostly on studies of this latter aspect, called texture segmentation. In part icular , studies have concentrated on finding the determinants of "effortless" segmentation, i.e., segmentation occurring within several hundred milliseconds of i n i t i a l viewing and w i t h no conscious scrutiny (e.g., [Jul81]). Segmentation itself has several different aspects, including the detection of regions of different texture, and the determination of the shape of the possible boundaries [Wil90]. For the most part , experiments proceed either by measuring the time required to perform these tasks to wi th in a given accuracy or by measuring performance accuracy as a function of display t ime. In both cases, a pattern of results is found that is much the same as that for visual search. Textured regions can be separated effortlessly from each other when they differ sufficiently in the density of their elements, or i f these elements are sufficiently distinct from each other (e.g., a region of horizontal lines against a region of vertical lines) [Jul86]. It must be kept in m i n d , however, that texture segmentation is a process with goals that are in many ways different from those of v isual search, and so may not necessarily involve the same set of elements. ni) Visual Grouping The representations used in early vision can also be studied by finding the determinants of v isual grouping [Bec66, Bec82]. This approach has its origins in the Gestalt laws of grouping. Disconnected elements can be grouped together into larger units (such as lines and regions) on the basis of s imilarity and spatial organization [Zuc87a]. Turning this around provides a way to define these properties operationally — similarity and spatial organization are exactly those properties that lead to visual grouping. Since studies of v isual grouping often are based on the conscious perception of grouping strength (e.g., [Bec82, SBG89]) and not on processing speed or accuracy, their results do not necessarily pertain to rap id parallel processing. But it has been found that "spontaneous" grouping is not based on the overall shapes of objects, but rather on the similarity of their "elementary par ts " [Bec82]. To the extent that these parts are consistent wi th the elements of v isual search or texture perception, they can provide a check on the descriptions formed at early levels of vis ion. Three types of grouping are commonly studied: (i) segregation into regions, (ii) segre-gation into populations, and (iii) creation of intrinsic surface structures. The first of these is similar to texture segmentation. B u t , whereas segmentation is generally concerned w i t h the boundaries of textured regions, segregation focuses on the hnking of items into distinct regions. Populat ion segregation is similar in most ways except that l ink ing based on prox imity in the image is replaced by l inking based on proximity in a more abstract space of intrinsic properties (e.g., color or orientation). Thus , for example, a group of yellow dots intermixed wi th blue dots can be separated into two distinction populations, even though no geometrical boundaries exist. Experiments are generaUy based on judgements of whether two or more kinds of features are scattered throughout the image. A l though it is also sometimes termed "texture segmentation" [Bec82], this task is conceptually quite different, involving the pooUng of v isual elements based on their intrinsic properties rather than their locations. The t h i r d type of grouping is the formation of " intr ins ic " structures, such as one-dimensional contours or two-dimensional flow patterns, which can arise even in images formed only of dots [Ste78, ZSS83]. Like the segregation of elements into regions or populations, this process is generally thought to be based on simple properties computed over local zones i n the image [Bec82]. B . Models of R a p i d V i s u a l Processing The general pattern of results from experiments on visual search, texture segmentation, and grouping is much the same: performance is governed by a small set of simple image properties such as line orientation, curvature, contrast, and color (see [TG88]). The explanat ion of these results, however, is far from straightforward. F i r s t , the hmited range of conditions over which d a t a i s collected can make it difficult to determine whether a process requires constant , logari thmic , or linear t ime. A n d there is no necessary connection between those properties that are computed quickly and those that are computed in parallel — a description m a y be the result of an extremely fast-acting serial mechanism, or conversely, a parallel mechanism may st i l l require t ime that increases wi th the size of the input [Tow72]. In spite of these reservations, several theories have been proposed to explain many aspects of the results. A l t h o u g h differing in details, these theories agree that simple properties are computed rapidly and i n paral le l at an early "preattentive" stage, and that complex properties require the application of more sophisticated serial operations at a subsequent "at tent ive" stage of processing [Bec82, Jul86 , Tre88]. i) Feature Integration Theory Feature-integration theory [Tre82, TG88] was originally developed to explain why "pop-out " i n v isual search occurs when the properties of targets differed sufficiently from those of nontargets, but not when they differed only in the spatial arrangement of their parts . Accord ing to this theory, the preattentive system is composed of a set of parallel spatiotopic maps, each describing the distr ibution of a particular property (or "feature") across the v isual field. These features are simple properties of the two-dimensional image, including or ientat ion, curvature, binocular disparity, color, and contrast [TG88]. Once these maps have been computed, a target containing a unique feature can be detected simply by checking for act iv i ty i n the relevant map [Tre88]. The separation of the maps, however, means that spatial relations between features can-not be represented explicit ly. Instead, the coherence of items is represented indirectly v ia a "master m a p " Unking together the appropriate locations in the feature maps. The compu-tat i on of complex structures therefore requires a spotlight of attention to access the master map and Unk up aU the relevant features into a coherent whole. Since this spotUght must seriaUy inspect each coUection of features present in the image, the detection of complex features requires t ime proport ional to the number of features present. Th is explains why, for example, targets distinguished only by inside/outside relations do not pop out [TG88]. In its original form, feature-integration theory did not account for several phenomena. A m o n g these were the finding that conjunctions of simple features at the same location can be rapidly detected when their constituents are strongly discriminable, and the finding that search rate increases smoothly w i t h the discriminabil ity of the st imuli [Tre88]. The first of these has since been explained by postulating an inhibit ion (or excitation) of the master map at locations where elements are strongly activated. This allows all items containing nontarget features to be effectively ignored, leaving a small remainder among which the target can be quickly detected [Tre88, C W 9 0 , TS90] . The second effect is accounted for by postulating that the spotlight of attention operates not on indiv idual items but on groups of items, the size of the group varying w i t h the discr iminabi l i ty of its members [TG88]. B o t h these refinements, however, mainta in the assumption that only simple local operations are carried out in paral le l at preattentive levels. ii) Resemblance Theory Resemblance theory [DII89, Dun89] is an alternative account of v isual search that differs from feature-integration theory in several ways. It shares the basic premise that simple features are computed at the preattentive level but postulates that the speed of search depends entirely on the resemblance between the target and nontarget patterns in the image. It explains the relatively slow search for conjunctions as due to the similarity of target and nontarget items arising from their common features. One of the more interesting aspects of this theory is that resemblance is based on the degree of transformation needed to map the features of one figure into those of another [DH89]. It therefore is a first step away from the idea that preattentive processes are necessarily based on simple loca l properties. A l t h o u g h some of the difficulty of conjunction search is apparently due to conjunction itself [Tre91], the possibihty remains that some aspects of preattentive operation are best explained in terms of features resulting from procedures applied to simple line elements. iii) Texton Theory In contrast to both feature-integration and resemblance theory, texton theory was de-veloped to account for effortless texture segmentation. Here, perceived texture is thought to depend entirely on the first-order densities of spatial patterns called textons. These are localized geometric shapes w i t h simple properties, including endpoints, elongated blobs. Une crossings, and Une segments of various lengths, widths, and orientations [Jul84a]. Texton theory explains texture segmentation by a model similar to those used for v i sua l search, wi th processing being separated into distinct preattentive and attentive systems. T h e preattentive system is composed of a set of spatiotopic maps, each describing the d is tr ibut ion of a particular texton across the visual field. Effortless segmentation occurs when the regions differ sufficiently in the first-order densities of their constituent textons. Because only texton densities are involved, textures cannot be effortlessly segmented when they differ only i n the relative arrangements of their textons (e.g., a region of L-shaped figures against a region of T-shaped figures). To separate such regions therefore requires conscious " s crut iny" by higher-level processes [Jul84a]. Textons have much in common wi th the set of features postulated for visual search. They include not only length, w i d t h , and orientation, but also color, mot ion, binocular disparity, and flicker [Jul84a]. Indeed, given that Une-crossings are no longer considered to be true textons [Not91], the two sets appear to be almost identical . Textons have even been used to explain visual search itself, using a mechanism analogous to the spotUght of attention being postulated to account for the detection of particular texton combinations [JB83]. Like feature-integration theory, texton theory has also been revised to allow groups of items to be searched in parallel w i th in Umited regions, the size of these regions varying w i t h the strength of the density gradient [Jul87]. But important differences also exist. Whereas feature-integration and resemblance theo-ries are based on the absolute presence or absence of features, texton theory posits boundaries based on the local differences between texton densities [Jul86]. Furthermore, while most (if not aU) textons have properties similar to those of preattentive features, they are quite dif-ferent ontologically: textons are geometric elements containing specific properties, and are not the properties themselves. In essence, each texton contains a conjunction of simple prop-erties. T h u s , although effortless texture segmentation cannot be based on spatial relations, it can be based on the conjunction of simple features [Jul84b]. iv) Spatial Filtering Recent attempts to provide an algorithmic framework for texture segmentation have shown that much of it can be explained in terms of the spatial filters postulated for edge detection, v iz . , localized Unear filters of differing widths and orientations (e.g.. [Cae84, B A 8 8 , VP88 ] ) . It also has been suggested that the texture boundaries themselves are determined v ia operations analogous to those used for edge-detection, w i th the array of filter outputs being smoothed and the lines of m a x i m u m change then used to mark the tex-ture boundaries [VP88 , G B 8 9 , B C G 9 0 ] . Direct psychophysical evidence has been obta ined In favor of this view [Not91]. A s a consequence, there is now some doubt about the need to mainta in textons as a separate set of texture primitives (see, e.g., [Not91]). But consensus remains that texture primitives — whatever these ult imately may be — are based only on simple local properties computed rapidly and reliably from the image. The spatial-filter model also helps to explain the grouping of image elements. Such filters respond not only to actual hues of a given orientation and length but also to simple s t ruc tura l groups having the same general outUnes, such as the " v i r t u a l Hues" formed by a row of dots of similar contrasts [Zuc86]. However, although filters are thought to be necessary for the grouping process, they are not usually beHeved to be sufficient. A p a r t from exceptions such as the rap id detection of " locally paral le l " structure In Glass patterns [Ste78]), grouping processes are generally thought to require nonlocal Integration of Information across the image (e.g., [Zuc87b]). It also appears unlikely that the properties determined at the preattentive level can be explained entirely In terms of a single set of filter-based elements. For example, populat ion grouping Is based on the lightness differences of the elements, rather than by the contrast ratios that govern region segregation. Furthermore, conjunctions of these properties do not support populat ion segregation, whereas they do support region segregation [BGS91]. T h e two sets of processes cannot therefore involve the same set of basic elements. Further support for this view comes from studies that show texture Identification and texture segmentation to be based on different sets of pr imit ive elements [Not91]. Thus , given the possible existence of several different sets of preattentive elements. It Is l ikely that at least some of them are not directly related to spat ial filters. 2.1.3 C o m p u t a t i o n a l versus P s y c h o p h y s i c a l Studies F r o m a computat ional viewpoint, there are good theoretical grounds for the assumption of a distinct stage of early v isual processing. To begin w i t h , almost aU. the properties at this level have two important characteristics: (i) they are zero-Unkage (section 2.1.1),^"^ and As a convenient way of speaking, the linkage of a property is identified with the linkage of the corresponding image-processing problem. (ii) they have a fixed support, i.e., the relevant property can be extracted from a fixed set of points in the zone. A spatially-bounded template can therefore determine the relevant property at each point and the corresponding map can be computed rapidly and in paral le l . A l t h o u g h recent experiments have shown that conjunctions of preattentive features can pop out when sufficiently distinct [TS90], this has l i tt le effect on the general argument since, as several models have shown (e.g., [CW90, TS90]) , this can be accounted for entirely by a more sophisticated search mechanism that selectively suppresses (or excites) the outputs of the simple feature maps. In contrast to these "template" properties, others are neither zero-linkage nor have a fixed support. For example, determining whether a given object is inside or outside a neighboring object cannot be done wi th in some fixed zone, since there are no Umits to the extent of the neighbor's boundaries. Even i f l imits were imposed, there would stiff be no fixed points which could always be used. Thus , a different template is required for each of the exponentially-increasing number of possible s h a p e s . I t has therefore been suggested that "nonloca l " properties, including v ir tual ly a l l types of grouping and spatial relations, are determined procedurally v i a specialized m'iiuaZ roufmes applied to earlier "base" descriptions [U1184]. M a n y of these routines are serial, spatially inhomogeneous, and are thought to be controlled by higher-level processes. A s such, their application is sometimes Unked (to greater or lesser extent) w i t h the spothght of attention required at attentive levels [UI184, TG88] . Th is point of view receives some confirmation from the finding that spatial relations such as parallel ism and inside-outside cannot be detected preattentively [TG88]. However, this grouping of visual processes into distinct early and later stages is not without its difficulties. Consider first the property of length. This is generally regarded as a pr imit ive quantity, both i n empirical studies on visual search (e.g., [TG88]) and in computat ional models of early vision (e.g., [Mar82]). But length is not a zero-linkage property — a gap arbitrari ly far away can change the value assigned to a Une. It also is not easily determined by a template, or even a set of templates along the Une, since the value from any ind iv idua l template depends on the overlap between it and the Une being measured. A t best, length might be determined from competition among the set of templates along the given Une (cf. [Zuc87a]) but this begins to introduce a nonlocal element into the computation. It also has been found that binocular disparity (and possibly depth) can be determined ^*For example, consider a surface patch divided into n intervals. If k possible values (e.g., color, height) exist for each interval, then fc" different combinations are possible. preattentively [NS86]. It is possible to call disparity a zero-linkage property, in the sense that the value at any point depends only on some finite surrounding zone in the image. B u t there is no way in which it can be given a compact fixed support — to ascertain disparity requires the matching of patches in the left and right images, and the contents of these patches can be quite arbitrary. Match ing must therefore be done procedurally. Recent results have also shown that the preattentive system can determine properties such as direction of Ughting and three-dimensional orientation — properties not of the image, but of the scene to which It corresponds [RamSS, E R 9 0 a , ERQOb, ER91] . Such recovery appears to have a nonlocal procedural component, since its success does not depend completely on the presence or absence of any part icular local property, but instead depends on the entire system of line relations present in the item [ER90b, E R 9 1 , ER92] . These findings ca l l Into question the basic assumptions behind the conventional assignment of visual processes to early and later levels. In part icular , they call into question the reasons for believing that three-dimensional structure cannot be rapidly determined by a spatiotopic array of para l le l processors. 2.2 The Interpretation of Line Drawings The problem of Une interpretation is one that is simple enough to allow easy formulat ion and experimental manipulat ion , yet complex enough that Its solution requires at least some degree of InteUlgence. A s such, it provides an interesting arena in which to study processes of a type generally thought to be restricted to higher levels of cognition. This section surveys some of the m a i n results of the computational and psychophysical studies that have been carried out in this area. In part icular , It examines the case where the drawings correspond to two-dimensional projections of opaque polyhedra. Some of the more important theoretical results are first surveyed. This is foUowed by an overview of what is known about the abiUty of humans to Interpret such drawings. 2.2.1 C o m p u t a t i o n a l S t u d i e s The problem of determining the three-dimensional structure of an object from its corre-sponding Une drawing has been the focus of a considerable amount of work in the field of computat ional vision (see, e.g., [CF82]) . This section reviews several important results that have been obtained. These results have for the most part been developed wi th in a single framework - the blocks world. W h e n the basic assumptions of this framework are met, a great deal can be said about what can and cannot be recovered from a hne drawing. A . Basics E a r l y work on the machine interpretation of Une drawings (e.g., [Rob65]) attempted to analyze scenes composed of a smal l set of known polyhedra. The goal was to identify Unes in the image w i t h edges of part icular instances of these objects. Recognition proceeded by using a priori knowledge of the polyhedral shapes to determine which image regions corresponded to which surfaces. A l t h o u g h research in "model-based" vision (e.g., [Low85]) continues to use such global constraints, attention also turned to the use of " local models", i.e., constraints on the local structure of the objects in the scene. G u z m a n [GA68] showed that the structural relations among the Unes of the junctions were often sufficient for the extraction of three-dimensional structure. Subsequent work (e.g., [Clo71, H u f Z l , Wal72 , Mac76]) gave this approach a more soUd theoretical framework i n which to formulate and discuss issues of Une interpretation. This theoretical framework was based on the so-caUed "blocks w o r l d " , a scene domain comprised of polyhedral objects wi th tr ihedral corners, i.e., corners formed from the intersec-t ion of three polygonal faces (see, e.g., [CF82]). The corresponding image domain is formed by the orthographic projection of these objects onto the image plane. Given that corners are t r ihedra l in the narrow sense (section 1.1), this projection consists of straight-Une segments connected by junctions of either two or three Unes. B y using Une drawings alone, aU effects of surface coloration (e.g., reflectance and texture) and shading are discounted. Viewing di -rection and direction of Ughting are held constant, w i th the two directions often being made coincident i n order to avoid shadows. The result is a "min iwor ld " i n which attention can be focused entirely on the recovery of surface geometry. The most comprehensive, and difficult, problem concerning Une interpretation in this min iwor ld is that of realizability: Given a Une drawing, does it correspond to an actual arrangement of polyhedral objects in some three-dimensional scene? If so, what are the three-dimensional shapes and positions of these objects? V i r tua l l y aU the work done on the blocks wor ld has proceeded by spUtting this problem into two parts: a quaUtative aspect concerned wi th the structural relations between edges and surfaces in the scene, and a quantitative aspect concerned wi th the slants of the Unes and the depths of the vertices [Sug86, K P 8 8 ] . A l t h o u g h both aspects can be approached Independently, the results of quahtatlve analysis have usually been used as the starting point for quantitative analysis (see, e.g., [Sug86]). B . Qualitative Interpretation Since the faces of a polyhedral object are planar, its structure is completely determined by the locations of its edges, i.e., locations where the orientation of adjoining faces suddenly changes. Qual i tat ive analysis is based on the structural relations between these edges (see, e.g., [Mal87]). Edges can be subdivided into convex and concave forms according to whether or not the edge folds outward, i.e., whether or not an external plane can be placed into contact w i t h the edge. A further subdivision results from the relation between object and viewer. Edges can be grouped according to whether both or just one face is visible.^^ T h i s latter k i n d of edge is referred to as a boundary edge. These correspond exactly to places i n the viewer-centered description where depth changes discontinuously. Two types of boundary edge are often distinguished, according to which side of the line corresponds to the visible face. The remaining edges are referred to as interior edges. These correspond to locations where the depth gradient changes discontinuously. These are divided into two types, according to whether the corresponding edge is convex or concave. Between them, boundary and interior edges describe not only an object's shape but also the segmentation of the image, i.e., which regions in the image do or do not correspond to connected surfaces in the scene (see, e.g. [Sug86, Kan90]) . To interpret a Une drawing, each Une must be labeUed as a particular k ind of edge (convex, concave, or boundary) . The interpretation is guided by a set of expUcit constraints on the various edge labeUings. These constraints can be provided largely by restrictions on the labeUing of junctions [Huf71, Clo71]. Four types of junct ion can be distinguished (figure 2.2). The first three are triUnear, formed from the jo ining of three Unes: (i) arrow-junctions, for which the greatest angle between two Unes is greater than 180°, (ii) Y- junct ions , for which it is less than 180°, and (ni) T- junctions, for which it is exactly 180° (see figure 2.2). There also exist L- junct ions , formed from the jo ining of two noncoUinear Unes. Each type of junct ion leads to a part icular set of constraints. These constraints, first given by Huffman [Huf71] and Clowes [Clo71], are shown in figure 2.3. These correspondences fai l to hold when there is an accidental aUgnment of viewing direction with particular arrangements of surface edges. Since accidental alignments are exceedingly rare, the assumption usuaUy is made that they ^*It is evident that this distinction applies only to convex edges. T-junction Y-junction Arrow-junction L-junction Figure 2.2: Types of junctions. do not occur; under this general viewpoint constraint, the correspondences between junct ion and edge types w i l l always ho ld . The problem of finding a consistent set of labels for a given drawing is known as the line labelling ^Tohlem. Every polyhedral scene gives rise to a unique set of labels [Ric88], and i f a drawing is realizable, it can be consistently labelled [Huf71, Sug86]. But the converse is not true. The separation into independent quahtative and quantitative components means that the metric structure of the scene is not available to the qualitative labell ing process. Consequently, Hne drawings can be consistently labeHed, but have no correspondence w i t h any polyhedral object [Huf71, Kan90] . The labeHing of a given drawing can be carried out by a relatively straightforward pro-cedure. A s figure 2.3 shows, each type of junction can be labeHed i n several different ways. To reduce the number of local candidates, interpretation often begins wi th the appHcation of " W a l t z filtering" to eHminate labels that are locaHy inconsistent [Wal72, Mac77]. Th is k i n d of consistency check is a relatively simple procedure that can be carried out in po lynomia l t ime [MF85] . W a l t z fHtering finds a correct interpretation if the locally consistent labels are globaUy consistent as weH. But this does not always occur. Consequently, it is sometime necessary to < I < < I < t Boundary edges: Solid Interior edges: + space Convex Concave Figure 2.3: Huffman-Clowes labelling set. explore al l possible combinations of the remaining labels, each combination then tested for global consistency. Since the labelling problem is NP-complete [KP85] , it is highly unlikely that a globally consistent solution can always be found in polynomial time. Instead, the worst-case t ime is likely to be an exponential function of the number of junctions (and lines) in the image. A l t h o u g h the Huffman-Clowes constraints never lead to inconsistency in a drawing that corresponds to a physically reahzable object, they sometimes consistently label an object that cannot be realized. T w o types of error occur: inconsistencies in the global topological struc-ture , and inconsistencies in the depths of the surfaces (see, e.g., [ D r a S l , Kan90]) . Topological inconsistencies can be el iminated when al l corners are rectangular [Kan90]. Inconsistencies in depth , however, must be handled v i a more powerful constraints based on the metric structure of the scene. The use of metric constraints was pioneered by M a c k w o r t h [Mac73], who developed an approach based on the observation that regions in the image must correspond to flat planes in the scene. Each plane is represented by its gradient, a two-dimensional measure of its orienta-t ion in space. Since aU faces of a polyhedral object are planar, its coherence can be captured by constraints i n the gradient space, which eUminate many inconsistent interpretations. A l t h o u g h constraints on gradient space are useful, they do not eliminate a l l inconsistent interpretations [DraSl ] . Th is is because only part ia l use is made of three-dimensional infor-mat ion — gradients ignore the fact that planes are also specified by their depth along the hne of sight. This latter quantity forms the basis of sidedness reasoning [DraSl ] , i n which constraints are based on the condition that one plane must always be in front of the other on a given side of their intersection line. The resulting set of constraints then ensures that a l l consistent interpretations correspond to physically realizable objects [DraSl ] . C . Quantitative Interpretation A n alternative to qualitative line interpretation is to work directly w i th the quantitative structure to obta in the depths and the three-dimensional orientations of the objects in the scene. This technique, first suggested in [Fal72], is based on the observation that the junctions around a common region correspond to points and edges around a common planar face. Th is plane can be described by a linear equation, w i th the unknowns being the depths of the corners in contact w i t h the face. Col lect ing the equations for each region in the drawing yields a system of Unear equations, which can be solved by straightforward means [Sug86, Kan90] . In general, these systems of equations are underdetermined. Thus , even when an absolute depth is attached to one point , several degrees of freedom sti l l remain [Sug86]. A d d i t i o n a l constraint on the solutions is therefore required. One such constraint is an a priori specifica-t ion of the three-dimensional orientation of particular faces. Since each of these specifications is independent, the number of degrees of freedom is reduced by the number of orientation specifications that can be given. Other kinds of local constraint also are possible. If a junction corresponds to a rectangular corner, the slant and ti lt of the corresponding faces and edges are completely determined by the angles of the Unes about its vertex, the values depending only one whether the junct ion is concave or convex [Per68, Mac76, KanQO]. Furthermore, there exists a set of necessary (but not sufficient) conditions on a junction that corresponds to a rectangular corner [Per68]: an arrow-junction must have one angle greater than 90°and the other two less than 90°, while a Y - junct ion must have al l its angles greater than 90°[Per68, Kan90] . Three-dimensional orientations can also be recovered when only two of the angles are 90°[Kan90]. In this case, the hnes must be correctly identified wi th the corresponding edges in the scene. Recovery of slant is possible for both orthogonal and perspective projection of the object onto the image plane [Kan90, ch. 8]. G loba l constraint also is used. One approach is to specify the recovered surface as the smoothest of aU possible candidates[Kan90, ch. 10]; this loosely corresponds to the regu-larization technique suggested for several aspects of early vision [ P T K 8 5 ] . More generally, there is an interplay between local and global constraints. For example, Mulder & D a w s o n [MD90] have shown that for some objects, a complete quantitative interpretation requires that only a subset of corners be rectangular. This essentially is a special case of m a x i m i z i n g the rectangularity in the recovered figure. 2.2.2 P s y c h o p h y s i c a l Studies In contrast w i t h the work on computational aspects of hne interpretation, work i n psy-chophysics has been rather heterogeneous. It encompasses a wide variety of experimental methodologies and s t imul i , as wel l as different theoretical frameworks. However, there is wide agreement i n the general pattern of experimental results, and these patterns also are consistent w i t h many of the results from computational studies. A . Basics Investigations into the perception of Hne drawings extend back to the very beginnings of experimental psychology. The first comprehensive explanation of how drawings could be perceived as three-dimensional objects was given at the turn of the century by M a c h , who proposed that the v isual system operates on a "principle of economy" (see [Att82]). This gave way to the Gestalt principle of figurai "goodness", which selected those interpretations that required m i n i m a l "energy" for their representation (see e.g., [Hoc78, pp . 131-155]). Accord ing to this principle, a Hne drawing of a cube is perceived as a three-dimensional object rather than as a coHection of two-dimensional Unes because this requires less energy for its representation. Similar reasons also explained the tendency to perceive its sides and angles as equal whenever possible. The vagueness of Gestalt laws eventually led to their abandonment by many workers in the field. However, the central insight remained that interpretation must involve constraints on the interpreted object. This provided the starting point for later investigations (e.g., [ H M 5 3 , Att54]) which attempted to provide a more rigorous study of these constraints. A s i n the case of machine vision, these later studies can be categorized into two groups: those concerned w i t h the qualitative aspects of hne interpretation, and those concerned w i t h its quantitative aspects. Studies in the first group focus on the factors that determine whether a hne drawing is perceived as a set of Unes or as a three-dimensional structure. Studies in the second group are concerned wi th the perception of the metric properties of the structure itself. Reflecting a bias toward viewing Une interpretation as a "high-level" activity, both kinds of studies tend to rely on verbal reports of consciously perceived structure. B . Qualitative Interpretation In contrast w i th computat ional studies of quaUtative structure, psychophysical studies have tended to focus on global aspects of the interpretation rather than local properties such as the convexity or concavity of ind iv idual edges. A t least some of this emphasis Ukely is due to the legacy of the Gestalt school, w i th its emphasis on the m i n i m u m energy of the entire interpretat ion. One of the first attempts to put this approach on a more rigorous footing was the work of Attneave [Att54], who recast the principle of m i n i m u m energy into one of " m a x i m a l s impUcity" , where simpUcity was based upon the " in format ion" contained in the percept. B y identifying this information wi th that used in information theory, it was hoped to have a more objective basis for the rules of the interpretation process. Since the absolute amount of information depends upon the coding scheme, such rules cannot be entirely objective. Nevertheless, a few general principles can be derived. For example, maximal ly simple structures have Unes of equal length, symmetry about the or ig in, corners of equal angle, etc. In the case of a cube, this approach correctly predicts that its Une drawing is interpreted as a symmetric structure wi th edges of equal length rather than as an asymmetrical set of Unes of unequal length. A l t h o u g h this approach could explain the perception of simple Une drawings, it could not do so for more complex ones without imposing ad hoc rules on how various regularities could be traded off against each other [HM53 , Att54] . In turn , this could not be done without the specification of a part icular coding scheme. Figure 2.4: Penrose triangle. Such schemes have been proposed (e.g. [Lee71]). If a large enough set of rules is imposed, it can indeed explain the perception of many kinds of line drawings [Res82, BL86] . B u t such " m i n i m a l description" approaches suffer from serious drawbacks. First of a l l , the emphasis on global measures means that a drawing that cannot be consistently interpreted must be represented as a two-dimensional structure. Th is is at odds with the finding that globally inconsistent figures such as the Penrose triangle (figure 2.4) are perceived as three-dimensional objects. (Also see, e.g. [Hoc78, pp 152-155].) It also is difficult to provide a plausible mechanism for finding m i n i m a l encodings. Even if this could be done, the search for the m i n i m a l description would st i l l take considerable time [Att82, Res82]. Most important ly , perhaps, it is difficult to justi fy why the size of the description itself should be the m a i n determinant of the process, rather than some property of the structure being described. A rather different approach was taken by Weisstein [WM78] , who investigated how various line drawings influenced the adaptation of the visual system to sinusoidal gratings. A simple blank hexagon placed on a grating extending over the entire visual field resulted in a complete lack of adaptat ion at the locations it covered. A t these locations, relatively low-contrast gratings could be easily detected, although this was not possible in the rest of the v isual field. B u t when a Y - j u n c t i o n was added to the hexagon, adaptation suddenly appeared i n the blank field, as i f that area had been "fiHed i n " by the surrounding gratings. These results were explained by a tendency for the early visual system to perceive this figure as a cube, which was then separated from the flat background. More generally, it was found that Une segments can be more accurately identified when they are part of a drawing of a coherent three-dimensional object than when they are among a set of unstructured Unes [WH74]. This was found to hold for junctions as weU [BWH75] , indicat ing that local properties govern this process. A similar set of results was obtained by Walters , who found hues to be perceptually brightened when interprétable as edges of a coherent three-dimensional object. Th is brightening was found to be unaffected by g lobal properties, depending only on junct ion type and hne length (over distances of less than 1°) [Wal87]. C . Quantitative Interpretation A s computat ional studies show, hne interpretation can be achieved v ia constraints on the quantitative structure of the recovered object. Interestingly, psychophysical studies suggest that several quantitative constraints are indeed involved in the human perception of l ine drawings. One of these constraints is that of rectangularity, i.e., the requirement that po lyhedral objects have sides at right angles to each other. The projection of rectangular corners yields junctions having a part icular set of constraints on the angles between their hues (section 2.2.1). E m p i r i c a l tests [Per72, SheSl] have shown that subjects are highly sensitive to these constraints, being able to determine accurately whether a hne drawing does or does not corre-spond to a rectangular cube. This can be done even when some of the hues are removed f rom the figure, provided that at least one Une is kept from each of the three orientations [Per82]. In contrast, subjects are far worse at recognizing which structures contain corners w i t h an-gles of 60° or 120° [She81]. Consequently, it is hkely that the crit ical factor is rectangularity rather than simple equahty among the angles. Rectangularity also makes it possible to determine the orientation of an object in three-dimensional space (see section 2.2.1). A very high correlation has been shown between actual slant and judgements obtained from Une drawings of rectangular figures [AF69]. A l t h o u g h the perceived slants are less steep than the actual slants, this " f lattening" can be lessened when contributions from other cues are reduced [Att72]. Another useful s tructural property is that of bi lateral symmetry, i.e., symmetry about a plane through the center of an object. This property is generally perceived in a hne drawing whenever it is consistent w i t h the laws of projective geometry (e.g., [Per76, P C 8 0 , Per82]) Even when the task itself makes no use of i t , subjects spontaneously perceive symmetry about half the t ime. Indeed, it is even possible to alternate between interpretations based on symmetry and rectangularity [Per76]. Not al l kinds of symmetry can be detected, since equal angles of 60° or 120° are not generally perceived as symmetrical [SheSl]. The preference for bi lateral symmetry may have ecological origins — most animals are bilaterally symmetr ic [Per 76]. Subjects can also detect the coplanarity of two planes in a Une drawing, and although the accuracy for this is somewhat lower than for the detection of rect angularity, it is stiU quite good [Per82]. Th is shows that the human visual system can recover at least some quantitative spatial relations from Une drawings. L i t t l e is known about the mechanisms that carry out Une interpretation in human v is ion . The two extremes have been proposed: "convergence" and "direct computat ion" mechanisms [PC80, Per82]. The former is essentially a general-purpose relaxation process (section 2.1.1) that can incorporate various constraints into its operation. Recovered properties are obtained as the computation settles into ah equiUbrium state. The latter is a special-purpose device that computes properties directly (i.e., in a non-iterative way) , obtaining its speed at the price of decreased flexibiUty. There is insufficient evidence to determine which of these two processes (if either) is responsible for Une interpretation, but the sensitivity of the v i sua l system to several kinds of geometrical properties has been taten to support the existence of the general-purpose " indirect" process [Per82]. 2.2.3 C o m p u t a t i o n a l versus P s y c h o p h y s i c a l Studies A s is evident from sections 2.2.1 and 2.2.2, there is considerable agreement between the results of computat ional and psychophysical studies in the areas where they overlap. According to computat ional models, there is enough information in the junctions to allow the recovery of almost a l l quaUtative structure from a Une drawing. This result is echoed in the finding that the perceived three-dimensionaUty of a Une drawing depends on the types of the junct ion involved. The similarities extend to the quantitative aspect of interpretation as weU, where the importance of s tructural constraints such as rect angularity has been estabUshed in bo th areas of study. Agreement, however, is not the same as completeness — many aspects of Une interpreta-t ion have not yet been investigated by either k ind of study. For example, most computational and psychophysical studies have been based on perfect or near-perfect Une drawings, so that interpretation in the presence of noise is a relatively unexplored domain. Another largely unexplored area is the complexity involved with interpreting various kinds of Une drawings. In its most general form, the reaUzabiUty problem (section 2.2.1) is NP-complete [KP85] , and so the interpretation of some drawings must sometimes be difficult and time-consuming, regardless of whether the system is artif icial or biological. No studies, however, have explored the way in which complexity issues are handled by the human visual system. The fact that psychophysical studies are usually based on reports of relatively high-level percepts shows a tacit agreement that hne interpretation requires sophisticated processing. But how then to account for rap id hne interpretation in early vision? Evident ly , rap id interpretation must be possible for only a subset of the scene domain , one for which the t ime complexity is very low. A few sub domains of this k i n d are known to exist. One of these is the orthohedral world , where aU objects are constrained to have surfaces parallel to the x,y, and z planes. Here, the labehing of n hnes can be done in 0(n) t ime on a serial processor, and in O(log^ n) t ime when processors are available [KP88] . But this result does not necessarily pertain to rapid recovery in human vision, since 'nP processors may not always be available, and O(log^ n) t ime may not always be allowed. More generaUy, it is not clear which aspects of scene structure can be rapidly determined, or even which aspects should be. These questions can only be examined in the context of a computational theory. 2.3 High-level versus Low-level Vision In order to develop a computat ional theory of rapid parallel recovery, it is necessary to know what role this process could play. This section re-examines the reasons for separating vis ion into high and low levels, and for the particular assignment of various processes to these levels. It then examines what can be expected of a rapid recovery process, and shows how it can help bridge the gap between the two levels. 2.3.1 T h e S t r u c t u r e of L o w - l e v e l V i s i o n Information-processing tasks generaUy have aspects common to al l inputs and aspects ap-pUcable only to special cases. In vision, these two aspects take the form of distinct levels: a " low" level based on the general constraints of geometry, physics, and information theory, and a " h i g h " level based on the more specific relations between indiv idual objects in the scene (e.g.,[Mar82, Fel85]. The boundary between low- and high-level vision therefore reflects the Umits of a " common core" beUeved to be derivable (usually in a bottom-up fashion) f rom general considerations alone. Owing to the inverse relation between the generaUty of a constraint and the s tructural complexity of the objects it appHes to (see, e.g., [Sal85, p. 49]), this common core must involve structures of a relatively simple structure. In part icular , the common core is usually taken to be a viewer-centered map (or set of maps) of various scalar properties of the image or scene, w i t h possibly some explicit representation of structural grouping as well [BT78, Mar82] . Th is characterization of low-level vision differs from that of early vision, in that emphasis is placed on properties of the input -output mapping itself rather than on properties of the process that generates it.-^^ But processing speed is used (often imphcit ly) to decompose low-level vision into a sequence of "hor izonta l " modules. Each of these is concerned wi th a distinct stage of the computat ion, and is apphed to the entire image (see, e.g., [Mar82, UU84, Uhr87]) . W h i l e there is no consensus on the exact structure of these stages, there is considerable agreement on their existence and general operation. A . E a r l y Stage The first stage of low-level vision is generally identified wi th early vision, i.e., based on operations carried out rapidly and in parallel across the visual field. Th is is sometimes described as the "image processing" stage, since the representations for both input and output are generally arrays of pixels, usually w i th the same spatial dimensions [Ree84]. E a r l y vision is beUeved to provide a quick in i t ia l analysis of the image, making expUcit those properties useful for subsequent stages of processing (e.g., the locations and orientations of Unes i n the image) [Mar79, Mar82].-^'' Its primit ive elements therefore describe properties that can be reUably determined i n this way. Typical ly , this is done by the concurrent appUcation of fixed templates to each point i n the image (e.g., spatial filtering [Gra85]). Th is early stage is common to virtuaUy aU computational models of low-level vision, tak ing on forms such as the "raw pr ima l sketch" of M a r r [Mar82], the " M I R A G E model " of W a t t and M o r g a n [ W M 8 5 , Wat88] , and the "cortex transform" of Watson [Wat87]. A l t h o u g h the primitives used in these models differ in detai l , they generally describe simple properties of the image, such as color, orientation, and spatial frequency. It has been recognized (e.g. [Mar82]) that primitives should describe properties of the scene whenever possible (e.g., using ^^This distinction between early and low-level vision is not one that is usually drawn. However, it helps to illustrate one of the points being made here, viz., that computational models must incorporate issues of resource use. ^'^Interestingly, in his earlier work, (e.g. [Mar79]), Marr emphasized that "there seems to be a clear need for being able to do early visual processing roughly and fast as well as more slowly and accurately" [Mar79, p. 31]. This idea became less prominent in later work. contrast to obtain changes i n surface reflectance). But scene-based properties that can be rehably determined w i t h templates are few and far between. For the most part , a complete determination of scene properties requires subsequent stages of processing that employ more sophisticated and time-consuming operations. B . Later Stages There are many ways to associate properties of the scene with pr imit ive image elements (see [dYvE88]) . Because of this , and because of the shortage of relevant information f rom psy-chophysical and neurophysiological studies, there is no general consensus as to how subsequent processing is carried out. One possibihty is that reconstruction is based directly on the image elements, using con-straints derived from the nature of the scene and the way it is projected to the image plane. This is sometimes assumed to be done v i a separate streams for each k ind of v isual m e d i u m (e.g., information obtained v i a luminance, mot ion , or texture) or for different scene and image properties (see, e.g., [CAT90]) . But although some recovery processes can be carried out a l -most immediately when paral le l processing is available (e.g. the recovery of three-dimensional surface orientation v ia photometric stereo [W008I]), much more time is generally required. For example, the interpretation of hne drawings is an NP-complete problem [KP85] , which effectively rules out the possibihty of always speeding it up sufficiently by parallel process-ing alone (section 2.1.1). Recovery processes described by the frameworks of regularization theory [PTK85] or M a r k o v random fields [GG84] also are relatively time-intensive, typical ly requiring several thousand iterations for images of moderate size (e.g., [Bla89]). Furthermore, their close association to relaxation problems makes it Ukely that the t ime required increases at least Unearly w i t h the size of the input . A n alternative approach is to bui ld up the scene descriptions more gradually, v i a an intermediate stage containing "non-template" properties that can be determined quickly. For example, the pr imit ive elements of M a r r ' s raw pr imal sketch [Mar82] are grouped together on the basis of local properties (e.g., common orientation) to form higher-level symboUc structures. Th i s grouping is done recursively, so that highly complex elements can be built up. The result is a " fuU" p r i m a l sketch that is available to subsequent processes, such as those involved w i t h texture segregation [Mar82]. U U m a n [UU84] has suggested that many spatial relations (including those described i n the fu l l p r i m a l sketch) are obtained v i a the application of v isual routines to the elements of early vision (section 2.1.3). These routines are based on a small set of simple operations such as mark ing and propagation, which are then concatenated together to form the desired procedure. This allows many "non-template" properties to be extracted from the image i n t ime proport ional to the size of the input . But although the complexity of these strategies is relatively low, the Unear complexity bounds are stiU insufficient for many purposes, espe-ciaUy i f images are large and complex. Furthermore, many of these operations are spatially inhomogeneous, suggesting that they may be based on a higher-level serial control [UU84]. Another alternative is to choose a more modest common core, e.g., the image itself, w i t h perhaps a few of its properties (such as the orientations of Une fragments) made expUcit. E s -sentially, this identifies low-level vision wi th some variant of early vision, perhaps augmented by high-speed grouping processes. Th is approach is found in many model-based recognition schemes (e.g., [ B r o S l , Bie85, Low85]), where recognition proceeds v i a the matching of image features to projections of a predefined model onto the image plane. It also is found in tech-niques that use image features to index directly into a large set of predefined models (e.g., [PE90]). But models are not always available, especiaUy for unknown environments. E v e n when they are, occlusion often removes many of the relevant features, raising the possibiUty of confusion w i t h other objects that share the same subset of visible features. Furthermore, this approach must be able to handle al l possible views of aU possible objects at al l possible orientations in the scene. This makes the system unwieldy as the number of objects to be represented increases: memory requirements can become substantial if aU possibiUties are to be stored; i f procedures are used to reduce the memory requirements, computation t ime increases. Thus , a " m i n i m a l core" based on simple image properties is often too m i n i m a l for low-level vis ion. A more complete intermediate description of the scene is therefore required. 2.3.2 T h e R o l e of R a p i d P a r a l l e l R e c o v e r y It would appear that low-level vision faces a di lemma of sorts, since a common core based on properties of the scene cannot be computed quickly, while simple image-based properties are insufficient for general purposes. But there is a way around this di lemma: instead of demanding that interpretations make opt imal use of available information, demand only that they be "reasonably correct". In part icular , instead of demanding that interpretations be consistent over the entire image, demand only that they be consistent over spatially Umited zones. Relaxing consistency in this way allows the recovery of scene properties to have a com-plexity far below that of " o p t i m a l " recovery: not only is m a x i m a l use made of paraHehsm, but the interaction of each processor with its neighbors can be considerably simphfied (cf. section 2.1.1). Since nonlocal context provides much of the information for interpretat ion, the outcome is usually subopt imal ; in fact, interpretations may exist only over a sparse set of locations in the v isual field. Thus , a rapid recovery process cannot be expected to produce a description that is complete, or even globally coherent. W h a t can be expected, however, is that some of this description wiU be accurate enough for tasks further along the processing stream. Such "quasi -val id" estimates could be useful in several ways. For example, they could help guide processes that cannot afford to wait for a complete analysis of the scene (e.g., active visual processes such as gaze or focus of attention [Bal91]). They could also act as precursors to serve as the i n i t i a l estimates for slower processes that restore some degree of global consistency [ER92]. They might also be used as (invariant) indexes into higher-level object models, thereby increasing the efficiency of model-based recognition. In any event, this view of early vision suggests that parallel processes may play a greater role t h a n previously suspected — in essence, the "hor izontal" stages of the conventional theories may be complemented w i t h "ver t i ca l " islands of locally-consistent interpretations. Given the plausiblhty of this viewpoint, the problem now is to develop it into a rigorous theory of early v isual processing. It is essential to find a way to describe a rapid recovery process precisely and to justi fy its operation. W h a t is required for this is a framework that allows it to be given a computat ional analysis in the sense of M a r r [Mar82]. 2.4 The Analysis of Resource-Limited Processes If rap id recovery is to be given a rigorous computational analysis, a general framework must exist that allows a clear formulation of the problem and sets the ground rules for its explana-t i on . The framework proposed by M a r r [Mar82] goes a long way towards this end. However, it can only be used to analyze processes for which the hmited resource is the information available in the image [RP91]. A few studies (e.g., [FB82, Ros87, Tso87]) have grappled w i t h the issue of how t ime and space hmitations influence the structure of a visual process, but a general framework for the incorporation of resource hmitations has not yet appeared. Such a framework is therefore developed here, based on a direct extension of M a r r ' s framework. 2.4.1 M a r r ' s F r a m e w o r k Accord ing to M a r r [Mar82], the complete analysis of a visual process involves three different levels of explanation: 1. Computat ional level. This is concerned w i t h the functional aspects of the task. It consists of two parts : (i) a description of the constraints between the input and output of a v isual process, and (ii) a justif ication of why these particular constraints were chosen. 2. A lgor i thmic level. Analys is at this level describes and justifies the representations and algorithms used. It is essentially a constructive demonstration that an algorithm exists capable of generating the required mapping. 3. Implementational level. This level is concerned wi th the description and justif ica-t ion of the physical substrate on which the algorithms are implemented. A n " implemen-t a t i o n a l " explanation provides a constructive demonstration that there exists a physical system that can carry out the required computations. One of the strengths of this framework is its recognition of a separate " computat ional " level of explanation focusing on both the what and the why of the input - output mapping. The what is concerned wi th the expUcit description of the constraints on the form of the mapping . This aspect of analysis is complete when the constraints are shown to determine a unique mapping . The why is concerned wi th the justif ication of these constraints, showing that the resulting set of associations between input and output is suitable for the purposes at hand. To use an example taken from M a r r [Mar82, pp. 22-24], the what of a cash register's function is explained by describing its output as the sum of its inputs. The why of this function is explained by the need for a pricing mechanism that has a zero value, is commutative and associative, and that allows inverse operations. In this approach, constraints on the input -output mapping of a visual process are assumed to be machine-indifferent, originating from the laws of optics or from the structure of the objects under consideration. A s such, it impUcitly assumes that the mappings are shaped only by the information available in the image, and not by Umits on the computat ional resources. Th is aUows analysis to be completely general, w i th no dependence on the structure of the processor carrying out the computations. W h e n the process is Umited pr imari ly by the ' * M a r r [Mar82] does consider efficiency to be important, but only once the task itself has been laid out. Efficiency itself is therefore addressed at the algorithmic rather than the computational level of analysis. available information, it can be completely explained by this k i n d of analysis. But when it is hmited by other factors, something more is needed. 2.4.2 E x t e n s i o n s A . E x t e r n a l and Internal Constraints If resource hmitations are to be incorporated into a computational framework, several i m -portant distinctions must first be made. The first is that between external and internal constraints. E x t e r n a l constraints are those on the "s tat i c " aspect of the mapping, i.e., those definable without regard to the way the output is generated. These are essentially the con-straints that apply when the processor is viewed as a "black box" . In the case of the cash register, for example, the requirements of commutativ i ty and associativity are external con-straints, apphcable only to the final form of the output function. These constraints can operate either directly v i a relations between input and output elements, or more indirect ly v i a relations between the elements of the input or output domains (see, e.g., [RM89]). W h e n a process can be analyzed entirely in terms of external constraints, the internal details of the processor are irrelevant. But when hmited computational resources enter into the picture, it becomes important to consider exactly how the process is carried out. This is specified by the internal constraints, which apply to the way the output is generated [RP91]. M o r e precisely, these are invariants of the information flow that occurs during the course of the computat ion. These constraints include hmits on the communication bandwidth and architectural constraints on the set of basic operations to be used. Internal constraints can therefore influence the complexity of a given operation on various kinds of processors. B . Resources and Resource Limitations It is important to recognize that when an information-processing task is analyzed, a subset of constraints is usuaUy specified that is fbced and not subject to further discussion. Eor example, when analyzing a process to recover shape from shading, the available information is determined by the viewing conditions and sensor array specified in the problem formulation. Exp lanat i on then centers around the constraints used to recover shape from this information, but the available information itself remains as a given throughout this analysis, and does not need to be explained. A s such, the available information is effectively a "boundary condit ion" for the analysis, Umiting the set of input -output mappings that can be considered. Such quantities are referred to here as resources, and the corresponding constraints as resource Umitations. Resources can involve either the external or the internal aspects of processor operat ion. E x t e r n a l resources are quantities that can be defined independently of processor structure . These include not only the available information, but also such things as the to ta l amount of t ime or energy used. The corresponding constraints are referred to as external limitations. These generally result from higher-level factors in the surrounding environment. A s such, they can be considered to be constant over the course of processing (cf. [Sal85, ch.4]). Internal resources can be similarly defined as those quantities relevant to the interna l structure of the processor. Examples of these include communication bandwidth , the d i s t r i -but ion of memory buffers wi th in the architecture, and the proportion of matter tak ing the form of processing elements. The corresponding constraints on these quantities are referred to as internal limitations. Considering again the example of the cash register, the explanat ion of a part icular design may involve an internal hmitat ion such as the requirement that m e t a l gears be used for a l l operations. Note that a similar complementarity exists for both kinds of constraint - as a given analysis requires fewer resources in its "boundary conditions", it appHes to a wider range of processes. If an analysis requires the existence of five identifiable points in an image, it also is apphcable to processes based on six identifiable points. If only four points are required i n the analysis, it can be appHed to an even larger set of processes. In the same way, an analysis that explains the operation of a cash register containing twenty gears also applies to a wider range of processes than one based on a l imi tat ion of forty gears. C . Abstractness A quantity is said to be abstract to the degree that its physical composition is relevant to the analysis. The most abstract quantities are purely formal ones, i.e., those that are independent of the properties of the underlying substrate. Such formal quantities include information and computat ional measures of t ime (section 2.1.1). The corresponding constraints and l i m i t a -tions are as abstract as the least abstract quantity involved. For example, the requirement of commutat iv i ty is a purely abstract constraint on the operation of a cash register, being completely independent of its mater ia l composition. Similarly, the requirement that a base 10 representation be used is also independent of physical structure. A s these examples show, both internal and external constraints can be completely abstract. More concrete quantities contain intrinsic constraints due to the physical properties of the underlying substrate, such as its density or thermal conductivity. These properties can affect both external and internal aspects of the processor's operation. For example, setting an upper Umit on the weight of a cash register hmits the tota l value that can be represented. This results i n an "approx imat ion" of the addition operation in which the output is given a definite upper bound. This upper bound may not necessarily be important for pract i ca l purposes if the cash register is an electronic device, but it may weU have a serious effect i f addit ion is required to be done mechanically. D . Completeness A n analysis is said to be complete to the extent that the constraints determine the uniqueness of the mapping , a lgor i thm, or Implementation being analyzed. For example, requiring addi -t ion to be based on a posit ional numeric representation provides only a part ia l specification of its algorithmic structure, since — among other things — the particular base has not been specified. The choice of base has no Impact on functional properties such as commutat iv i ty and associativity. It may, however. Influence the efficiencies possible for various operations. Note that the Init ial set of hmitations assumed In the formulation of a problem already sets Umits to the kinds of mappings, algorithms, or implementations that are possible. T h e set of constraints obtained from a computational analysis therefore serves to complete this or iginal set of specifications. 2.4.3 A R e v i s e d F r a m e w o r k The above considerations can be Incorporated Into a coherent system by a straightforward extension of M a r r ' s framework. The resulting system Is summarized in figure 2.5. A s In the original framework, analysis Is carried out at three different levels of explanation. The most general of these is the computational level, where analysis Is centered around the description and justif ication of the mapping between image and reconstructed scene. Analys is at this level is complete when It Is shown that the mapping described by the constraints is (i) unique, and (II) Is consistent w i t h the given hmitations. 1. Computational Level Constraints sufficient to determine input-output mapping that is (i) unique, and (ii) exists witliin given limitations j External Internal Abstract n All ^H::;:::;:::;: Possibly some Concrete Possibly some iov:::;: Possibly some :::!:::::.•: 2. Algorithmic Level Constraints sufficient to determine procedural decomposition that is (i) unique, and (ii) exists within given limitations External Internal Abstract g J All remaining Concrete I:;:;:;!;:;:; Possibly some Possi bl y 30 me 3. Implementational Level Constraints sufficient to determine physical instantiation that is (i) unique, and (ii) exists within given limitations Figure 2.5: Extended computational framework. If the analysis is to be general, these hmitations must be abstract (i.e., involve no phys i ca l properties) and external (i.e., have no dependence on the internal structure of the processor). The constraints derived under these conditions (shown in the upper left quadrant of figure 2.5) are therefore independent of any assumptions about the processor ItseK. This essentially corresponds to an analysis carried out at the computational level in M a r r ' s framework, except that constraints may now be justified by an appeal to abstract resources other than available information (e.g., t ime or space). It may not be possible to explain a mapping in such a general way if it has been shaped by the physical properties or internal structure of the processor. Analysis must then be completed by Invoking hmitat ions that are less general. These may be less abstract or may involve the Internal structure of the processor to some degree. The corresponding constraints are located in the remaining quadrants of figure 2.5. Note that if hmitations pertain t o only a few aspects of the process, they can give rise to only a part ia l set of constraints on Its physical substrate or architecture. If so, this st i l l allows the analysis to be apphcable to a relatively large set of processes. Similar considerations apply to the algorithmic level of analysis, where the goal Is to decompose the given process Into a system of more elementary data structures and operations. Exp lanat i on at this level describes and justifies the constraints that make this decomposition unique. If Internal constraints exist at the computational level, the two levels of analysis win not be completely Independent — the algorithmic analysis must not only obtain a set of abstract Internal constraints, but also ensure that they are consistent with those obtained from the computat ional level (upper right quadrants in figure 2.5). A n algorithmic analysis Is complete to the extent that It specifies a decomposition that is both unique and consistent w i t h a l l other constraints. It is general to the extent that nothing is assumed about the physical composition of the processor itself. The final level of analysis is that of implementation. A s In M a r r ' s framework, the goal Is to specify a set of constraints that determine a unique physical instantiat ion of the processor. However, there are now two sources of constraint to contend wi th : external constraints on the t o t a l amount of mater ia l , and internal constraints on its d istr ibution within the processor. Note that the implementat ional constraints do not determine the physical implementation precisely, but only to the "granular i ty " of the algorithmic analysis. Once a process is under-stood at the three levels, analysis can be recursively apphed to each of the components of this decomposition. 2.5 Rapid Line Interpretation Computat i ona l models have been most successful when (i) the parameters of the prob lem (such as input , output , and resource use) can be clearly specified, (ii) the problem can be solved by a modular process, and (ui) the constraints obtained by the analysis lead to testable predictions (see, e.g., [PTK 85 ] ) . It is evident from section 2.3.2 that the rapid recovery process is highly modular , requiring v i r tua l ly no interaction wi th other aspects of visual processing. It is also evident that knowledge of the constraints on this process can lead to predictions about the kinds of hne drawings that can and cannot be rapidly detected at early levels of human vision. This section shows that the problem itself is weh defined, wi th a l l relevant parameters clearly specified. 2.5.1 B a s i c T e r m s A . T i m e In order to keep the analysis as general as possible, the basic unit of time is taken to be that required to combine two independent quantities, or to transmit across some unit distance. B y describing t ime in terms of 0 -notat ion , this basic unit does not need to be specified i n greater detai l (see section 2.1.1). It is also important here to distinguish between serial and parahel measures of t ime. Ser ia l t ime refers to that required on a serial machine; essentiaUy, this describes the tota l amount of "work" needed. Paral le l t ime is the mi n i mum time required on a given parallel architecture, and is often less than serial time.^^ Unless otherwise specified, t ime is identified here w i t h paraUel t ime. B . R a p i d processing For many visual processes, it Is assumed (often ImpUcltly) that opt imal or near-optimal use Is made of the Information available In the image. This effectively places a fixed lower bound on the information to be used for a process, the exact bound depending on the input image. Since every problem has an intrinsic complexity, any such " informatlon-hmlted" problem must have a lower bound on the time It requires (see section 2.1.1). '®The solution of some problems cannot be sped up by using a paraUel architecture — see section 2.1.1. Similar considerations hold for other resource Umitations. In particular, a " t ime-Umited" process can be defined by placing upper bounds on the available processing t ime, these bounds depending on the input image. N o upper bound is expUcitly given for the information used by such a process, but complexity considerations imply that such an bound must exist. G iven the complementary nature of their upper and lower Umits, it is seen that — at least in a very broad sense — information-Umited and time-Umited processes are duals of each other. Intuitively, a rapid process is a time-Umited process for which the upper bounds on t ime are relatively low. In the interests of precision, the term ' rap id ' refers here to any process for which the complexity is a subUnear function of the number of Unes in the i m a g e . T h i s choice is mot ivated by two considerations. F i r s t , processes that can be carried out in po lynomial t ime form a natura l complexity class, w i th aU polynomial processes retaining po lynomia l complexity even when carried out on various machines (section 2.1.1). A s such, Unear-t ime processes cannot be readily isolated. Given that processes of high-degree po lynomial complexity cannot be considered as rapid , the subUnear criterion must be imposed if an awkward theoretical boundary is to be avoided. T h e choice of the subUnear criterion also is motivated by practical reasons: it is gen-erally impossible to distinguish a Unear-time parallel process from a constant-time process appUed sequentiaUy to each location in the visual field [Tow72]. Consequently, only subUnear processes can be readily identijfied as being carried out in paraUel. 2.5.2 F o r m u l a t i o n of the P r o b l e m In what foUows, the expression ' rapid recovery' refers to the rapid interpretation of Une drawings. The scene domain is a restriction of the blocks world (section 2.2.1) in which only three edges can be in contact about the vertex of any corner (section 1.1). The scene is assumed to be projected onto the image plane v i a a monocular orthographic projection. T h e inputs are therefore drawings composed of straight Une segments wi th no dangUng ends and which meet in junctions composed of either two or three Unes. The outputs are viewer-centered dense descriptions (i.e., maps) of the structure of the corresponding polyhedra in ^° Note that this is completely separate from considerations of efficiency. The efficiency of an information-limited process is a measure obtained by comparing the time it requires against the absolute lower bound imposed by complexity considerations. Similarly, the efficiency of a time-limited process is measured by comparing the amount of information it extracts from the image against the maximum that could be achieved. In both cases, efficiency is described in the same terms. ^^The term 'real-time' has been suggested for processes requiring at most Knear time [Vol82]. the s c e n e . A n estimate of the relevant properties is assumed to exist at every point along these hnes. A s for rap id processing generally, the available time is hmited to a subhnear function of the size of the problem, i.e., the number of the hnes and vertices in the drawing. The problem is to recover as much of the scene structure as possible wi th in the aUocated t ime. In what foUows, a rather severe hmitat ion is imposed: the recovery process must use only a constant amount of t ime, i.e., the amount of t ime must be independent of the size or content of the input . Th is is motivated by several considerations. F i r s t , i f the output of the rap id recovery process were the basis for more complex operations at higher levels (section 2.3), control of this interface would be greatly simphfied if it could be assured that recovery was always completed w i th in a fijced amount of time. Second, constant-time hne interpretation is an extreme case of rapid recovery, and there-fore an interesting problem in its own right. A m o n g other things, any structure recovered under these conditions sets a lower bound on what can be expected of any rap id recovery process. A n d given that extremely low hmitations are involved, the results obtained would be apphcable to the widest variety of processes (section 2.4.2). F ina l ly , constant-time interpretation leads in a very natural way to the locally-consistent estimates assumed to be provided by rap id recovery (section 2.3). Since transmission speeds are finite, a constant-time hmitat ion translates into a constant-distance hmit on the trans-mission of information i n the output . A s such, inconsistencies resulting from violations of the under ly ing assumptions are not propagated throughout the image, but are restricted to rela-t ively smal l regions, or "patches". This consequently avoids the destruction of Interpretations In areas where these assumptions do hold . To make the analysis relevant for the greatest range of processors, relatively severe h m -itat ions are also placed on the available processing resources (cf. section 2.4.2). Since the r a p i d interpretation is hkely to be done in-place by a spatiotopic array of processing elements (section 2.3), the number of processors must be proport ional to the number of locations upon which the hne drawing falls; accordingly, 0{n) processors are assumed to be available for an input of size n. The simplest way to coordinate these elements is as a two-dimensional array of independent processors. But although this architecture is in some sense a m i n i m a l ^^The term 'structure' refers here to properties of the scene. These are chosen to be the (positive) convexities, slant signs and slant magnitudes of the edges, as well as the contiguity relations between edges and surfaces (section 4.1.1). one, it requires a considerable amount of wiring for each processing element (section 2.1.1). Processors are therefore assumed to be arranged in a mesh, w i t h each element connected only to its nearest neighbor. It also is assumed that each processor is simple enough that its operation requires only a fixed amount of space and time.^^ A l t h o u g h the part icular space and time Umitations that apply to rapid recovery are not known, most aspects of this process can be analyzed without knowing their exact values. T h i s can be done by assuming that the t ime required for local processing is less than that required for transmission across some smaU fraction of the image. This amounts to an assumption that the complexity of rap id recovery is dominated by transmission t ime, a point of view largely i n accord wi th known Umits on biological and artif icial processors (section 2.1.1). A m o n g other things, this assumption removes the need to distinguish between time as defined by signal propagation and time as defined by the number of switches along the path (section 2.1.1), since these two measures are directly proport ional to each other for a mesh architecture. The interpretation process can therefore be described in terms of the percolation of i n -formation through a mesh network at some constant speed. The absolute size of the image, the size, speed and spacing of the processors, and the speed of transmission do not need to be known — a l l that is relevant is the ratio of transmission speed to the length of the Unes in the drawing. Even this can be eUminated by a rescaUng of the image (e.g., setting the average Une length to some constant). Consequently, the computational analysis is largely independent of the details of any part icular representation or architecture used. These assumptions, of course, do not rule out the use of a more complex architecture such as a pyramid. Rather, they merely avoid assuming the extra processing power, allowing the analysis to apply to a larger set of processes. Chapter 3 Low-Complexity Recovery The success of a rapid recovery system rests upon its abil ity to recover a large amount of scene structure wi th in a small amount of t ime. Since the interpretation of a Hne drawing is an NP-complete problem (section 2.2.1), a mapping that recovers a l l possible three-dimensional structure is not generally suitable for this purpose. Instead, a low-complexity "approx ima-t i o n " must be used that captures only part of the relevant structure. A n approximation can differ from a more complete mapping in three ways: (i) fewer degrees of freedom in the input , (ii) fewer degrees of freedom in the output , and (iii) fewer transformations of the given data . The first way (see, e.g., [Tso87]) essentially reduces the input resolution, while the second (see, e.g., [Lev86]) reduces the expressiveness of the output . B u t rap id recovery is assumed to have the same k ind of mapping as for opt imal interpretat ion, v iz . , an association of scene properties to each (high-resolution) Une in the image (section 2.5.2). A p p r o x i m a t i o n is therefore based on the t h i r d way — fewer transformations of the data . Because fewer transformations are involved, scene properties cannot always be recovered successfuUy at each zone in the image. If constraints are chosen carefuUy, however, the UkeUhood of this recovery can remain high. Given the Umitations on time and transmission distance, this UkeUhood is highest for those aspects that (i) are easy to compute, and (U) require m i n i m a l "nonlocal " input , i.e., m i n i m a l input from areas outside the zone. This chapter examines the extent to which low-complexity recovery can be carried out along concurrent streams, each concerned wi th a single dimension of scene analysis. Four part icular dimensions are considered: the contiguity and convexity of edges and the sign and magnitude of edge slants. Complexity bounds are derived that show the extent to which each of these properties can be computed in subhnear t ime and with m i n i m a l nonlocal in format ion . It also is shown that these streams can be combined to completely recover both qual i tat ive and quantitative structure in subUnear t ime for several subdomains of polyhedral objects , including convex polyhedra and polyhedra wi th rectangular corners. 3.1 General Issues Several general issues are involved in the specification of a mapping for a rapid recovery process. This section discusses three of the more important ones: the degree to which pro-cessing power can be increased by concurrent processing streams, the complexity of so lv ing the constraints wi th in each stream, and the tradeoffs that exist when approximating a given mapping by one of lower complexity. 3.1.1 C o n c u r r e n t S t r e a m s A decomposition into separate processing streams is found in many computational models of early vision (see, e.g., [PTK85] ) . This decomposition often has its origins in the processing of different media (e.g., contours defined by luminance, mot ion , or texture [CAT90]) , or in the processing of different aspects of the output (e.g., motion, color, and binocular dispar-i t y ) . E a c h of these streams essentially contains a bundle of highly-correlated information (a "dimension") that describes some particular aspect of structure in the image or scene. The existence of separate streams is beUeved to faciUtate the development of perceptual processes, since natura l selection can act independently on each one [ S i m S l , Mar82] . B u t there also is another reason for their existence — they maximize the sheer amount of data transformation that can be done within a given amount of t ime. If a set of operations are independent of each other, they can be carried out faster in paraUel rather than i n sequence. A l so , the complexity of each operation is often lower when fewer and less complex variables are involved. If the constraints can be reformulated such that each dimension involves only a few variables, then a m a x i m u m amount of data transformation is possible. Such a dimension is readily obtained by coalescing the original variables into a few groups, which are then treated as coarse-grained variables governed by a smaller set of "coUapsed" constraints [ M M H 8 5 , Mal87] . B y grouping the original set of variables in several different ways, the or iginal problem can be largely decomposed into several simpler subproblems, each of which can be solved by a concurrent processing stream. Note that the sets of properties handled by these streams do not need to be independent of each other — only the systems of constraints need to be this way. Decomposing a process into concurrent streams can lead to a considerable reduct ion of processing t ime, but the price of this reduction is a loss of coherence: the solutions obta ined in each stream are not necessarily compatible wi th those obtained in the other streams. T h u s , cross-dimensional constraints must be incorporated if aU (or even much) of the power of the original set of constraints is to be retained. A n important aspect of developing a successful approximation is therefore to maximize the use of cross-dimensional constraints without increasing the complexity of the problem. One such strategy is based on a hierarchical decomposition of variables [MMH85] . Here , the original variables are grouped together into a few sets of coarser-grained variables that obey a simple set of collapsed constraints. Once the set of coUapsed constraints is solved, the result is used as the basis for a new problem involving finer-grained variables. This can i n t u r n be applied to yet another set of constraints on even finer-grained variables. In essence, the problem has been decomposed into a sequence of simple streams in which the outputs of the coarser-grained systems help wi th the solution of the finer-grained ones. More generally, low-complexity interaction is possible if information from an unambigu-ous set of results in one stream can be transmitted to help constrain possible solutions i n another. Since the transmission is based on unambiguous (local) results, the backward flow of information from the second stream to the first one has no further effect on the or iginal re-sult . Th is essentially corresponds to a unidirectional hnkage (section 2.1.1) between streams, w i t h Unkage now generalized to apply not only to interactions across geometrical space, but across more abstract dimensions as well. If the amount of information to be t ransmit ted is smal l , the cross-dimensional constraints wi l l not add to the complexity of the problem. 3.1.2 R e d u c t i o n to C a n o n i c a l F o r m s If an approximation is to capture much of the structure of the original mapping, it must focus on those aspects of the scene that are (i) easy to compute and (ii) need a m i n i m a l amount of information from outside the local zone. One way to help ensure that these conditions are met is to select dimensions such that their determination can be reduced to the solution of some low-complexity problem. Two problems are of particular importance in this regard: 2-Satisfiabihty ( 2 - S A T ) and connected components labeUing ( C C L ) . In order to simplify the reduction to these problems, only the constraints on arrow-, Y -, and L-junctions are considered exphcitly. The constraints on T-junctions are handled by a preprocessing step where v i r t u a l gaps are introduced between the stems and the crossbars of each T - junct i on , the two hnes afterwards treated as unconnected. The crossbar of the T- junct ion then corresponds to an occluding edge, while the stem becomes an unconstrained hne that has at least one "danghng" end. This reformulation has the advantage that the remaining hnkages automaticahy spht the n hnes in the image into separate partitions, each of which can be treated separately.^ A . Reduct ion to 2 - S A T One way to ensure that a dimension is easy to compute is to restrict the set of " i n t r a -dimensional" constraints so that they correspond to an instance of the 2-Satisfiabihty (2 -SAT) problem. This can be defined in terms of a set of boolean variables^ V — {vi,V2,. •., Vn) and a set of clauses C = ( c j , C 2 , . . . , c^), w i th each Ci containing either one or two variables. T h e problem is to assign t r u t h values to the Vi such that al l clauses in C have at least one ' t rue ' htera l (see, e.g. [GJ79]). Since the clause Ck = {vi, Vj} is a disjunction of variables, it has the equivalent form = ~ {vi Avj). Consequently, any problem involving binary constraints on two-valued variables can be treated as an instance of 2 -SAT [MacQl]. For the hne labelhng problem, the variables are the possible edge labels, w i th the set of Huffman-Clowes constraints (section 2.2.1) determining their allowable combinations. Since these variables arc four-valued, reduction to 2 -SAT can only be done by decomposing the set of variables into sets of simpler elements. For the most part , such a reduction occurs v ia the direct transcription of edge labels into two-valued variables and the recasting of the remaining junct ion constraints into binary form (i.e., into a form involving only two variables). For example, edge convexity could be expressed in 2 - S A T by taking the '-|-' and ' - ' labels as the complementary values to be attached to the (interior) edges, and — as far as possible — implementing the constraints on the junct ion interpretations v ia binary constraints on these variables. Because of the 'Structures such as holes soraetimes lead to separate sets of labels being used for different parts of the same object. But this occurs even in Huffman-Clowes labelling. ^More precisely, the 2 -SAT problem is defined in terms of a set of literals U = ( « i , ¥i , M2, «2, • . . , u„, ¥„). These literals are constrained such that only one of the pair Ui or «i can be used, allowing them to be treated as two-valued boolean variables, the value of variable i being 'true' if ui is selected and 'false' if a; is selected. independence of the part i t ions , variables in the image can take on more than two values, provided that this does not occur wi th in any single part i t ion . Br ing ing together the above considerations yields T h e o r e m 3.1 7/ a line drawing of n lines has one or more partitions, each such that 1. All variables take on 2 values, and 2. All constraints on more than one variable are binary, then the relevant labelling problem can be reduced to an instance of 2-SAT. A U 2 -SAT problems of size n can be solved in 0 ( n ) t ime on a serial processor [E1S76], and in O(log'^n) t ime when O(n^) processors are available [KP88] . B . Reduct ion to C C L The reduction to 2 - S A T makes Uttle appeal to the geometrical organization of the constraints per se, using them only in regards to the ease of computation. But spatial coherence can also be used, both to obtain an approximation of lower complexity, and to help minimize the amount of nonlocal input needed for the local interpretations. In part icular , note that there sometimes exists a coordination among the sets of edge labels possible for a junct ion , or more generally, for some connected subset of Unes in the drawing. For example, i f boundary edges are ignored, al l Unes i n a Y - junc t i on must have the same label , either '-|-' or To capture this notion of coordination, define a bijective constraint on a set of variables Ui as one where the number of possible values for each variable is the same, and w i t h a 1:1 Unking between the allowable values (figure 3.1). For a bijectiVe constraint, therefore, the value of one variable determines the values of the others."^ The variables are essentially "locked together", and can be treated as a single quantity, w i t h appropriately reformulated constraints being appUed to neighboring variables. W h e n two bijective constraints apply to a common variable, the resultant set of con-straints is also bijective. This can simpUfy analysis considerably, allowing sets of junctions w i t h bijective constraints on m variables to be treated as a single m-valued complex when these junctions are connected to each other by Unes in the drawing (figure 3.1). A n example ^More precisely, the value of any variable is related to that of any other by a bijective function. bijective nonbijective nonbijective bijective bijective / N / \ / S / ^ / S /-\ /-•: /--.. /-\ complex j complex j+1 Figure 3.1: L i n k i n g of loca l constraints. Lines connect values compatible w i t h each other. of such a co-ordinated complex is the Necker cube. Here, two globally-consistent interpre-tations are possible, each of which has no local interpretations in common w i t h the other. Because the junctions are hnked v ia bijective constraints, the interpretation of any one junc-t ion immediately determines those of aU the others. For a bijective complex, globahy inconsistent labeUings can be removed by sending a signal from locations where legal values are missing and propagating it along the hnes of the complex, the signal causing the withdrawal of the relevant value at each location along the way. Th is propagation can be stopped at locations where the value has already been removed, and so when aU propagated signals have stopped, only the consistent labeUings wiU remain . This process is essentiaUy a variant of connected component labeUing ( C C L ) , w i t h connections made on the basis of the bijective constraints found at each junction. ' ' If only one part icular interpretation is required, al l but one value can be deleted from one of the locations, and the interpretations associated with the deleted variables removed. The interpretation process for a complex of bijective constraints can therefore be reduced to C C L . Since a Une drawing may contain several complexes separated or surrounded by "free" variables not in a complex, this is not necessarily true of the interpretation of the drawing itself. A lower-complexity approximation wiU only be possible when m i n i m a l effort is used in assigning values to the free variables and co-ordinating the interpretations of the various complexes. A t least two such conditions exist: when variables can have only one *More abstractly, this is a unidirectional perimeter-linkage problem (section 2.1.1), since all that is required is knowing which of the m labels to attach to each of the hnes crossing the boundary of the zone. The result of joining together two complexes across adjacent zones is always a single complex, since the constraints across the boundary are also bijective. value, or when they a^e not subject to any constraints at a l l . This consequently yields T h e o r e m 3.2 If a drawing of n lines has one or more partitions, each such that 1. All variables take on m values, and 2. All constraints on more than one variable are bijective then the problem can be reduced to an instance of CCL. The complexity of C C L is Q{n) t ime for a serial processor, and O( l ogn ) t ime when n pro -cessors are available^ [SV82, L A N 8 9 ] . Bijective constraints can also simpUfy the analysis of cross-dimensional interactions. If the interpretation in one stream corresponds to a single complex that covers the entire drawing, the number of possibilities is fixed, being at most the number of values possible for any loca l variable. A n d if the interpretation in a different stream also corresponds to such a complex, it too win have a fixed number of possible interpretations. Since only a fixed number of possible combinations needs to be examined, the interaction between the two streams w i l l increase complexity by at most a constant factor. This remains true even if the complexes do not cover the entire drawing, or i f complexes of different streams are not aligned wi th each other — the important factor is only that at each location in the image only a fixed number of combinations is possible. 3.1.3 A p p r o x i m a t i o n Strategies It is often the case that a set of constraints must be altered if a problem is to be reduced to a low-complexity form. A l t h o u g h there are a large number of ways that this can be done, two general strategies — each diametricaUy opposed to the other — can be discerned. The first of these is a conservative strategy, which increases the number of constraints u n t i l aU can be re-expressed i n the appropriate (e.g., binary or bijective) form. This approach effectively rejects a subset of legal labellings, avoiding those that require greater t ime (cf., "unsound reasoning" [Lev86]). Loosely speaking, speed is gained by increasing the number of " T y p e I " errors, i.e., increasing the number of realizable drawings (i.e., those that correspond *The number of processors is actually linear in the number of edges and the number of vertices. But since all vertices in the Hne drawings considered here have at least two and at most three edges, only the number of edges is used here. to a polyhedral scene) that are not detected as such. The result is a "quick and d i r t y " est imate as to what can exist in the scene. The opposite of this is a liberal strategy, which removes constraints unt i l the remainder can be put into the appropriate form. Here, low complexity becomes possible by increasing the " T y p e IV error rate, i.e., increasing the number of unrealizable drawings deemed to be reahzable. Such a strategy can be used as the basis for a quick "preprocessor" that provides hmits as to what cannot exist in the scene. In general, elements of both strategies may be used to develop an approximation, the Type I and T y p e II error rates being traded off against each other. 3.2 Individual Dimensions Given that hne interpretation is to be carried out along separate dimensions, which dimen-sions should these be? Several sets of considerations must be taken into account. JS the determination of a dimension is to have a low complexity, it must involve as few values as possible; indeed, i f the associated problem is to be reduced to 2 - S A T , the variables must have only two possible values. Similarly, i f use of nonlocal information is to be kept low, constraints should be bijective. A n d if interactions between dimensions is to be min imized , each dimension must involve constraints that interact w i th the others only in a unidirect ional way. It also is assumed that the dimensions involve quantities that are viewer-centered, a condition generaUy assumed for aU of early visual processing [Mar82]. A m o n g other things, this ensures that the recovery process obeys the more general viewpoint consistency constraint [Low87], which assumes that the scene is viewed from a single direction. It also entails that three-dimensional orientation must always be defined wi th respect to the direction of viewing. Each dimension must also obey a second constraint used by virtual ly aU theories of hne interpretat ion: the general viewpoint constraint (section 2.2.1). This requires any interpreta-t ion to be stable under smal l changes in viewing direction. One of the consequences of this constraint is that no two edges in the scene can be contained in a plane at right angles to the image plane. This allows accidental ahgnments to be ruled out — arrow- and Y- junct ions wiU always correspond to coherent corners in the scene, and since corners are assumed to involve no more than three edges (section 1.1), T-junctions wiU always correspond to occlusions of ///////////// 0. Input Image ///////Ay/// ///////////// //////////û!y 1. Contlgultij stream -contiguity relations between line and flanking regions 2. Convexity Stream - convex/nonconvex values to lines 5. Slant Sign Stream - sign of slant values to lines 4. Slant Magnitude Stream - magnitude of slant values to lines Figure 3.2: Sepaxation into indiv idual dimensions. one edge by a noncontiguous surface. In general, there are usually only a few viewing direc-tions that give rise to unstable interpretations, and so only a small penalty in interpretative power is given up in return for a large gain in performance (see, e.g., [Sug86]). In what follows, attention is given to bo th the quaUtative and the quantitative aspects of " o p t i m a l " Une interpretation (section 2.2.1). To further increase the amount of concurrent processing (section 3.1.1), each of these is further spUt, yielding four largely independent dimensions: edge contiguity, edge convexity, slant sign, and slant magnitude (figure 3.2). 3.2.1 C o n t i g u i t y L a b e l l i n g M u c h of the effectiveness of processing at early levels depends on knowing whether neigh-bor ing regions in the image correspond to contiguous or noncontiguous surfaces in the scene [Hor86, pp . 354-355]. Consequently, a reasonable candidate for an independent processing stream is one concerned wi th the determination of contiguity. To be as independent as possible, the corresponding dimension must avoid quantities that describe the internal structure of the objects (e.g., convexity and concavity). Furthermore, it would also help reduce complexity i f labels can have only two possible values. Thus , Huffman-Clowes ( H C ) labelUng cannot be used. A somewhat different scheme is therefore proposed — labels indicate only whether the sides flanking a Une correspond to surfaces that Figure 3.3: Contiguity labelling. are contiguous (C) or noncontiguous (N) wi th the corresponding edge in the scene^ (figure 3.3). In contrast w i th H C labelhng, each hne therefore has two labels, one for each side. A . Constraints on Contiguity Labelling In order to exclude doubly discontiguous edges (i.e., wires), contiguity constraints are required for the hnes. These are subject to the constraint that both sides cannot be labelled w i t h ' N ' , since the polyhedral world contains no wires; however, aU other combinations of ' C and ' N ' are possible (figure 3.4). Constraints on junctions are taken from the Huffman-Clowes scheme by identifying convex and concave edges wi th doubly-contiguous hnes, and boundary edges with singly-contiguous hnes (figure 3.4). This coUapses the H C constraints into the set shown in figure 3.4. It is apparent that any coherent scene wiU give rise to a consistent set of labels, and that this can be done by a process similar to that used for H C labelhng. The result is a segmentation of the image into sets of regions corresponding to noncontiguous surfaces in the scene. Because these constraints have been derived from the Huffman-Clowes set, any drawing which can be given a consistent H C labeUing can also be given a consistent contiguity l a -beUing. The converse s i tuat ion, however, does not necessarily hold : a consistent contiguity labeUing may not correspond to a consistent H C labeUing (e.g., the drawing in figure 3.5). The increased susceptibihty of the contiguity system to false labeUings stems from the loss ^Mackworth [Mac74, MMH85] describes a somewhat similar scheme of "connect" and "nonconnect" edges, based on the distinction between interior and boundary edges. However, it differs from the present scheme in using one rather than two labels per hne. -^z c ^ c - 4 -vN < I < Figure 3.4: Set of contiguity constraints. / N C C C C c N N Figure 3.5: Inconsistent drawing wi th consistent contiguity labeUing. of the correlations between contiguity and convexity, which are not taken into account when contiguity alone is considered. Thus , a consistent H C solution that has been "weakened" by coUapsing the convex and concave labels is only one of perhaps several possible solutions to the contiguity labelUng problem. B . Complexi ty of Contiguity Labelling Since contiguity labelUng involves only two values, its reduction to 2 -SAT depends entirely on the extent to which it can be described by a set of binary constraints. A s shown in figure 3.6, almost a l l of these constraints can be converted into binary form. Constraints on Unes are quite simple, since only a prohibit ion against double discontiguity is needed. For Y- junct ions , an addit ional bijective constraint is imposed: the "inside edges" of a region (i.e., Unes sharing a common region) must have the same contiguity labeUing. A m o n g other things, this yields the constraint that at most one of the three faces bordering a Y - j u n c t i o n can be noncontiguous. For arrow-junctions, a l l Unes except for those on the "outside" of the arrowhead must be marked as contiguous, and the outer sides of these junctions are subject to the bijective constraint that both must have the same value. There are 16 possible combinations of N and C labels on L-junctions, of which 6 are a l -lowed. A constraint against doubly-discontiguous Unes leaves 3 x 3 = 9 possibiUties. A con-straint against diagonal N labels removes another two. This leaves only one more constraint — that against 4-way contiguity (figure 3.6) — to be enforced. This constraint, however, cannot be enforced using binary constraints. Low complexity can therefore be guaranteed only for approximations in which this constraint has somehow been replaced. ~(N,) ~ ( C 3 - N 4 ) (N, ^ N3) (N^ N4) (N, ^ N^) Figure 3.6: Reformulation of contiguity constraints. Conservative approximation One way to remove the need for an exphcit constraint against 4-way contiguity is to require the inside and outside edges of an L- junct ion to have identical contiguity values; alternatively, one of the inside edges can be constrained to be discontiguous. V i a theorem 3.1, this results in Proposi t ion 3.1 When binary constraints are added that prohibit the 4-way contiguity of L-junctions, contiguity labelling can be reduced to 2-SAT. Liberal approximation A low-complexity approximation can also result by omitt ing the need to exclude 4-way contiguity on L-junctions. This leads to Proposi t ion 3.2 When the constraint against 4-way contiguity on L-junctions is omitted, contiguity labelling can be reduced to 2-SAT. Note that similar reductions to C C L are not possible unless extremely severe alterations are made to the constraints. 3.2.2 C o n v e x i t y L a b e l l i n g G i v e n that contiguity is concerned wi th inter-object relations, its natura l complement is in t ra -object structure, v i z . , edge convexity. A s for the case of contiguity, the standard H C labels are not suitable for present purposes, and must be replaced. A two-valued system is used here, based on that of [Mal87]: ' -h ' for edges of positive convexity (this has the same meaning as i n the H C system), and 'o ' for aU others. Note that the label 'o ' does not necessary correspond to negative convexity, but rather, serves as the complement required in a two-valued system. In what follows, the term 'convexity ' refers to positive convexity, in the sense defined here. A . Constraints on Convexity Labelling The constraints on convexity labell ing can be determined from those of the HuflFman-Clowes set by collapsing the labels i n a manner similar to that done for contiguity. The resultant set is shown in figure 3.7. A n y coherent scene wiU give rise to a consistent set of labels, which can be found by a " s tandard" labeUing process (section 2.2.1). Because the convexity constraints are a subset of the H C constraints, any drawing that can be given a consistent H C labelUng can also be given a consistent convexity label l ing. A s for the case of contiguity, however, the converse situation does not necessarily ho ld . A n example of this is shown in figure 3.8, which can be given a consistent convexity labelUng even though a consistent H C labelUng is impossible. The results of the contiguity and convexity streams can be combined if the edges marked as '4- ' in the convexity stream match a subset of the doubly-contiguous edges in the contiguity stream. The remaining 'o ' edges can then be assigned H C labels on the basis of contiguity alone. It is evident that combining the results in this way is possible exactly when a solution of the H C constraints can be found. But such a co-ordination requires the results in bo th streams to be weakened versions of the H C solution and, since the streams are separated, this does not generally occur. B . Complexi ty of Convexity Labelling Since convexity labelUng involves only two values, its reduction to 2 -SAT depends on the extent to which the constraints can be put into bijective or binary form. A s shown in figure + Figure 3.7: Set of convexity constraints. u u u ~(0, - +2) - ( + 2 - 0 3 ) ~(02^ +3) - ( N j ^ O,) - ( 0 3 - N,) ~(0, ^ O^) - ( + , - 0 3 ) ~(02-+3) ~(+3- +,) ~ ( 0 3 - 0 , ) (+ , ) u u 2 Figure 3.9: Reformulation of convexity constraints. 3.9, a l l of these can be put into this form. Theorem 3.1 therefore yields: Proposit ion 3.3 Convexity labelling of line drawings can be reduced to 2-SAT. A s is evident from figure 3.9, a l l of these constraints are also bijective, except for the prohibit ion against the double convexity of L-junctions. This suggests that approximations can be derived without a great alteration of the set of constraints. Conservative approximation A low-complexity approximation to convexity labelhng can be obtained by requiring a l l L- junctions to either have bo th sides labeUed with 'o ' , or else to have only one side labeUed wi th 'o ' . Theorem 3.2 then leads to: Proposit ion 3.4 If all L-junctions are constrained to have both sides labelled'•o\ or to have only one side labelled 'o ' , convexity labelling can be reduced to CCL. Liberal approximation A hberal approach would be simply to allow interpretations to contain doubly-convex L- junct ions . Theorem 3.2 then yields: Proposit ion 3.5 If both sides of L-junctions are allowed to be convex, convexity labelling can be reduced to CCL. 3.2.3 Slant S i g n L a b e l l i n g T h e quantitative aspect of Une interpretation considered here is the three-dimensional or ienta-t ion of the edges of each polyhedron. This property has two aspects: tilt, the two-dimensional orientation in the image plane, and slant, the deviation away from this plane. Since t i l t is already available in the image, processing can focus entirely on the recovery of slant. The determination of slant can itself be spUt into two components, concerned w i t h its sign and magnitude respectively. Slant sign (see, e.g., [Kan90]) describes whether the depth of the edge increases or decreases as it travels along some direction. It remains invariant under any positive rescaling of the depth, i.e., it can represent the "z-affine" structure, which may be the most important aspect of the recovered scene [TB90]. In this sense it is s imilar to convexity. B u t slant sign is viewer-centered rather than object-centered, and so is more typ i ca l of the properties thought to be handled by early vision (section 2.1.2). Slant sign is represented here by a double arrow (>- ), the direction of the arrow i n d i -cating the direction to foUow to increase distance from the viewer (figure 3.10). ^ The only consequence of using this representation is that under the general viewpoint constraint (sec-t i on 3.2), the slant sign must remain the same under smaU changes of viewing posit ion. Zero slant is therefore not aUowed. This can be stated as a constraint that no edges in the scene can be at right angles to the Une of sight. It is evident that any polyhedral scene obeying this general constraint wiU give rise to a consistent labelUng of the Une drawing. A . Constraints on Slant Sign Labelling A l t h o u g h many approaches (e.g., [Sug86]) require the quaUtative aspects to be solved before the quantitative aspects, the demands of rapid processing (section 3.1.1) require that the two types of aspects be determined largely concurrently. But if this is to be done, some '^ It may be useful to view the arrowheads as parallel lines receding into the distance. *In contrast to the other quantities, slant sign can only be defined with respect to a particular direction of travel. If slant sign is to be treated as a pure scalar, a canonical direction must therefore be defined. A natural choice for a coordinate system is one based on the lines surrounding each vertex, the reference direction being that in which the vertex is approached. Represented in this way, slant sign is subject only to an additional constraint that the labeUing of lines be spMt, with opposite ends of the Mnes having opposite values. A directional component also exists in the labelling of lines by arrows in the H C system, and the splitting required to put it into scalar form has becomes the basis of the contiguity system developed here. But because constraints on the slant system are binary and bijective, using a ' "split" representation will affect neither the power nor the complexity of slant sign labelling. In the interests of clarity, the "directional" form is used. Figure 3.10: Slant sign labelling. addit ional a priori assumptions are needed about the structure of the polyhedra in the scene — otherwise, any combination of slant signs can be attached to the hnes about a junction.^ Such structural assumptions do indeed seem to be used by the human visual system to determine three-dimensional structure (section 2.2.2). Convex polyhedra A very general s tructural assumption is that the polyhedra are convex. This prohibits Y- junct ions from having a l l hnes slanted towards the viewer,-^^ since this would correspond to a dent i n the surface. Similarly , an arrow-junction could not have its stem slanted away f rom the viewer while its two outer edges had to opposite slant. Constraints also come into play v i a the planarity of the faces: If the face is convex, two "chains" of arrow labels exist, which diverge from the junct ion at greatest distance and converge on the junct ion nearest the viewer. Directangular corners A more specific assumption is that polyhedra have directangular corners, i.e., corners for which two edges are at right angles to a th i rd about which they can " s w i v e l " . C o n s t r a i n t s can be based on the observation that two edges meeting at a right angle in the scene wih ®For example, a junction can always be interpreted as a very shallow corner, and this can be tilted or flipped to achieve any combination of signs. ' °More precisely, the distance to the viewer cannot be decreased as the distance from the vertex is increased. " F o r example, a book partway open has directangular angles at points where the spine meets the top and bottom of the covers. One possibi l i ty if y = 90° Intersection at 90" to line in image = zero slant Two possibi l i t ies U y ^ 90° 161 < 90°: Slant signs A and B have opposite values 151 = 90°: Slant sign B has zero value 151 > 90°: Slant signs A and B have same val ues Figure 3.11: Constraints on isolated L-junctions. always give rise to lines of opposite sign when the angle in the image plane is less than 90°, and to hnes of the same sign when the image angle is greater than 90°(figure 3.11). Given the hne corresponchng to the swivel edge, then, the slant signs of the other two hnes can be immediately determined (figure 3.12). It foHows that the constraints on the slant signs of arrow- and Y- junct ions are bijective (section 3.1.2), the exact constraints depending on whether the angles between the hne pairs are greater or less than 90°. However, constraints on L-junctions cannot be determined unless the orientation of the swivel axis in the image can be identified, since otherwise the angle i n the image may not correspond to a 90°angle i n the scene. Note that although the orientation must be given, it does not matter on which side of the junct ion the hidden swivel hes — a change of 180°will result in the same set of bijective constraints. Rectangular corners A powerful constraint apparently used by the human visual system is that of rectangu-lar i ty , the assumption that ah edges in each corner are at right angles to each other (section 2.2.2). A s in the more general case of directangular corners, knowing the label attached to one of the hnes on an arrow- or Y - junc t i on immediately determines those of the others. B u t now it is not necessary to know in advance which of the hnes corresponds to the swivel edge, since al l edges are equivalent. The constraints themselves take on a simple form — for arrow-junctions, the slant signs of the wings must be opposite that of the stem, whereas a l l Opposite sign as (a) Stem is swivel axis (b) Stem is not sv/ivel axis Figure 3.12: Slant sign constraints for arrow- and Y- junct ions . lines i n Y- junct ions must be given the same slant signs [Kan90]. Requiring a consistent set of slant signs for these junctions leads to Perkins ' laws (section 2.2.1): for Y- junct ions , a l l angles must be greater than 90°, while for arrow-junctions, the largest angle must be less than 270°, and the second-greatest less than 90°. The set of constraints on slant sign labels for rectangular corners are shown in figure 3.13. Note that the slant sign labels on arrow- and Y- junct ions become closely matched to the convexity labellings: for Y- junct ions , edges are convex exactly when they are slanted away f rom the viewer, and are nonconvex when they are slanted towards the viewer. Similarly , an arrow-junction w i l l have its stem slanting towards the viewer when it is nonconvex, and away when convex, the other lines taking on complementary values. The homogeneity of angles also means that there is no ambiguity about the angle be-tween the edges of the L-junctions. A n d since this angle is 90°, the slant sign of one Une automaticaUy determines that of the other (figure 3.11). Thus , L-junctions can be described entirely in terms of bijective constraints, without any need for a priori knowledge about the direction of the hidden swivel axis. 1 1 Figure 3.13: Slant sign labellings for rectangular corners. B , Complexi ty of Slant Sign Labelling Directangular corners W h e n corners are directangular, they give rise to bijective constraints on arrow- and Y -junctions. A n d when the directions of the hidden swivel axes are known, L-junctions have a similar set of constraints. Thus , from theorem 3.2, Proposit ion 3.6 When all comers are directangular and the directions of the swivel axis at all junctions are known, slant sign labelling can be reduced to CCL. Rectangular Corners W h e n corners are rectangular, a special swivel axis need not be singled out. A n d since L- junctions always have bijective constraints under this condit ion, this yields Proposit ion 3.7 When all corners are rectangular, slant sign labelling can be reduced to CCL. Note that the differences between directangular and rectangular corners do not lead to a significant difference in the complexity of slant sign labell ing. Rather , the main difi"erences are in the amount of a priori information needed from nonlocal sources. 3.2,4 Slant M a g n i t u d e L a b e l l i n g Slant magnitude is an absolute value which represents the "steepness" of an edge w i t h re-spect to the image plane. This quantity is completely independent of slant sign, being i n -variant under inversion about the image plane, but sensitive to the rescahng of depth. It is also a quantity that takes on a continuous value. A m o n g other things, this latter property means that the part icular representation used (e.g., angle, gradient) is not important f rom a computat ional point of view, since these quantities can be transformed into each other v i a information-preserving operations. A . Constraints on Slant Magnitude Labelling A s for slant sign, assumptions must be made about the structure of the polyhedra i n the scene if this dimension is to be determined independently of the others. Known corners If the three-dimensional structure of a corner is known and its edges have been identified w i t h the corresponding hnes in the image, a system of equations can be set up between the slants of these edges and the angles between the Unes of the junct ion [Kan90, p.288] sin sin ^ 2 cos(^i — ^2) + cos cos ^ 2 = cos7 1 2 , sin <jf>2 sin (^3 cos(^2 - ^3) + cos 4>2 cos <?!>3 — cos 7 2 3 , sin <?i>3 sin 1^ 1 cos(^3 - 6 * 1 ) - f cos <;è3 cos ij!>i = C O S 7 3 1 , (3.1) where (pi is the slant of edge i (with zero being along the hne of sight towards the viewer), 9i the angle of edge i in the plane, and 7 i j the angle between edge i and j. A solution can be found for any value of angles chosen. However, this solution requires an iterative scheme (e.g., Newton-Raphson) unless addit ional constraints are introduced [KanQO]. In order to keep the measure of slant symmetrical about the image plane, the angle a = 7 r / 2 - < ^ i s used for the slant magnitude ItseH, w i th a always in the interval (—7r/2,7r/2). Slant magnitudes cannot be determined for L-junctions in isolation, even when the angle between their edges is known. But equation 3.1 shows that if the slant magnitude of one of the edges is known, that of the other can be determined. A n d because this is an equation Unear in sin<p and cos^, </> (and therefore a) can be solved for analytically. If 7 is not 90°, two values are possible, corresponding to edges of greater or lesser slant (figure 3.11). These can take on different slant signs, depending on the particular value of 7; if this occurs, slant magnitudes must become signed i n order to maintain the correct binding betvi^een the signs and the magnitudes assigned to the edge. Otherwise, the slant sign of the known edge need not be given, since the solutions are symmetrical about the image plane. Directangular corners If the corner is known to be directangular and if the Une corresponding to the swivel edge can be identified in the image, the set of equations 3.1 takes on the form [Kan90, p.289]: B = 2 C0S(^1 - ^ 2 ) C 0 S ( ^ 1 - ^ 3 ) , C O s 2 ( ^ i - e 3 ) 1+ ; COs'^ /3 C 0 5 ( ^ 2 - ^ 3 ) V C O s 2 ( ^ 2 - ^ 3 ) , C = sin^/3 X -B + VB^ - 4AC 2A 7 r / 2 - t a n - \ / X (3.2) « 2 = 7 r / 2 - t a n - ( ' ° ' [ ^ ^ ~ ^ \ ^ o t a i ) , (3.3) COS(e/3 - (f2) « 3 = 7 r / 2 - t a n - ~ J ^ , , (3-4) cos(^3 - ^ i ) c o t a i where /? denotes the angle about the swivel axis, taken here to be edge 3. These values are coordinated sets, and so allow magnitude and sign to be completely separated. Since the two solutions of these equations are reflections of each other about the image plane [Kan90], arrow- and Y- junct ions have unique magnitude estimates for each edge. A l t h o u g h slant magnitudes cannot be determined for L- junctions, one constraint s t i l l appUes — i f the angle between corresponding edges is 90°, the magnitude of one edge uniquely determines that of the other. If the angle is not 90°, two values are possible (figure 3.11). Thus , i f the swivel angle cannot be identified, three values are possible for the slant of the second edge. Rectangular corners For a rectangular corner, aU. edges are orthogonal to each other, and the relation between slant and junct ion angle can be expressed in the much simpler form [Per68, Kan90] Note that the equahty of a l l angles between edges also ehminates the need to know which hne is the projection of the swivel axis. The rectangularity of the corner also means that there is no ambiguity in identi fying the angle 7 between the edges of any corner corresponding to an L- junct ion . A n d since 7 is 90°, the slant magnitude of one edge is uniquely determined by the magnitude of the other. For rectangular corners, therefore, L-junctions are completely described by bijective constraints. B . Complexi ty of Slant Magnitude Labelling Directangular Corners W h e n corners are directangular, there are three possible sets of magnitudes for a junc t i on , corresponding to the three possible choices of swivel axis. If the direction of the swivel axis and the swivel angle (3 are known, unique magnitude estimates can be assigned to edges contacting arrow- and Y- junct ions (equations 3.3 - 3.4). Furthermore, this condition also leads to b inary constraints on the magnitudes possible for L- junctions. But a chain of such L- junctions could cause the number of possible values to increase exponentially w i th its length, these values being impossible to resolve except by sequentially proceeding along the chain. In the worst case, therefore, the determination of slant magnitude for directangular corners could require at least hnear t ime, even on a parallel architecture. Rectangular Corners JS aU corners are rectangular, the magnitudes for the edges of arrow- and Y- junct ions can be obtained directly from equation 3.5. Values for L-junctions can be determined from the fact that slant magnitude remains invariant under a reflection of one edge by 180°; consequently, only the angle of the hidden edge is needed, and not its direction in respect to the vertex. B y determining and rebroadcasting the values of al l orientations in the part i t ion to a l l junct ions, the direction of the hidden edge can be made available to aU L-junctions.^'^ Once the local estimates of slant magnitude have been obtained, it only remains to check their consistency. A s discussed in section 3.1.2, such a consistency check can be carried out v i a C C L . Since aU other operations can be done in constant t ime, this yields Proposit ion 3.8 When all corners are rectangular, the determination of slant magnitude has a complexity no greater than that of CCL. Note that although the computation of the magnitude as given by equation 3.5 can be done in constant t ime, it does involve several trigonometric functions. But this calculation can be done quite simply if the slope of the slant rather than its angle is the relevant quantity. In part icular if the square of the slope (essentially, a "slope energy") is used, this removes the need for both an inverse tangent and a square root function. The only remaining quantities then become cosine functions of the angles between junct ion lines, which can be determined quite simply v i a the dot product (cf. section 5.2.2). Since slope energy and slope angle are related by a monotonie function, the particular quantity chosen is of no great importance for most purposes. In the interests of maintaining a parallel between two-dimensional and three-dimensional orientations, slope is represented here by its angle. 3.3 Integration of Dimensions A s shown in section 3.2.2, completely separated dimensions are often unable to capture large parts of the mapping structure contained in the original set of constraints. For example, a drawing may have several different contiguity and convexity labeUings, and if these are chosen such that the edges wi th positive convexity correspond to Unes that are doubly con-tiguous, the two can be combined into a complete H C labell ing. The separation of streams, however, means that it wiU generally be impossible to pick out the appropriate contiguity and convexity interpretations from among the alternatives. Instead of yielding a completely coherent interpretat ion, the process wiU be more Ukely to yield two part ia l interpretations that are incompatible w i th each other. ^^If only two directions exist in the drawing, any magnitudes compatible with equation 3.5 can be assigned to the edges. This loss of interpretative power, however, can be lessened by a controUed amount of interaction between streams. A s discussed in section 3.1.1, this can be done without ra is ing the complexity of the process provided that it is based on the unidirectional transmission of unambiguous results. In order to quickly achieve unambiguous results, a conservative strategy must be employed, based on addit ional structural constraints which rule out many legal interpretations (section 3.1.3). It is shown here that such a strategy can succeed for several sub domains of polyhedral objects. 3.3.1 C o n v e x O b j e c t s A part icular ly simple domain in which to begin is that of convex objects. These are po lyhedral objects in which aU edges of the object are convex; consequently, "mater ia l " always exists along the shortest pa th connecting any two points on two contacting edges (i.e., two edges that meet at a corner). A . Constraints on Labell ing of Convex Objects B y definition, the interior edges of convex objects are convex. A s such, al l arrow- and Y -junctions have a unique interpretation in both the contiguity and the convexity streams. Convexi ty also forces the inner edges of any L- junct ion to be contiguous; this in turn forces a unique convexity labehing of aU. L- junctions, i.e., aU edges nonconvex. The resulting set of constraints, shown in figure 3.14, leads to a unique set of convexity labels. The only indeterminate quantities are the contiguity labels on the outer edges of arrow-junctions and L- junct ions . Constraints on the outer edges of arrow-junctions are binary and bijective, requir ing both edges to have the same value, whereas those on L-junctions are simple b inary constraints that prohibit more than one side from being contiguous. B . Complexi ty of Labell ing Convex Objects A U convexity labeUings are unique, and so the contiguity labels in any consistent interpreta-t ion must necessarily be compatible wi th the convexity labels. Consequently, the determina-t ion of a complete quahtatlve interpretation reduces to the determination of a consistent set of contiguity labels. Figure 3.14: HufFman-Clowes labellings for convex objects. i) Reduction to 2-SAT Since a l l relevant constraints are binary, and since only two labels apply, proposit ion 3.1 leads directly to T h e o r e m 3.3 The line labelling of convex objects has a complexity no greater than that of 2-SAT. Note that if the H C labels are required, they can be recovered simply by assigning ' - ' to any doubly-contiguous non-convex hnes, and assigning boundary Unes to singly-contiguous lines. Similar ly , any consistent interpretation based on separate convexity and contiguity labels can be put into H C form. ii) Reduction to CCL For convex objects, the contiguity of the outer edges of the L-junctions results only from the contact of adjacent blocks, and not on any intrinsic structural property. It is therefore evident that a legal labelling for a drawing exists i f and only if it can be assigned an interpretat ion in which a l l blocks have been moved to a posit ion in which they are "free f loat ing" . This latter condition can obtained by setting al l outer edges of L-junctions to be discontiguous; the result is a set of unique contiguity labels on aU outer edges. Since a l l constraints are unique, no additional work is required to coordinate the results in the contiguity and the convexity streams. The complexity of the interpretation process is therefore exactly that needed to check for the presence of inconsistencies in each (section 3.1.2). This depends upon the conditions assumed for the image and scene domains. If only one set of connected hnes exists in the image, the preprocessing to remove T-junctions can be omitted , and the need to maintain partit ions ehminated. Under these conditions, the complexity of the interpretation process is exactly that of detection. But the blocks wor ld generaUy aUows several blocks to exist simultaneously in the scene, making it necessary to distinguish between various groups of hnes in the image. Since the basic H C labeUings are unique, so are those of the contiguity and convexity streams. A n d since both these streams involve the same partit ions, the integration of values is straightforward, leading to T h e o r e m 3.4 If objects are assumed to not contact each other, the line labelling of convex objects has a complexity no greater than that of CCL. fn essence, then , a principle of " m i n i m a l exterior contiguity" has been invoked to ob ta in a problem of lower complexity by reducing the set of preferred solutions. A s opposed to a purely conservative strategy (section 3.1.3), however, this strategy does not affect the labeUing problem in its narrowest sense, v iz . , a determination if at least one solution exists. Note also that i f an interpretation of a part i t ion exists, it is necessarily the only "free f loat ing" interpretation possible. Thus , a complete determination of scene structure (i.e., solving the reaUzabiUty problem) can be reduced to f inding a solution to a system of Unear equations and inequaUties [Sug86]. This can be solved v ia hnear programming, which can be carried out in po lynomial t ime [Kha79]. Linear programming, however, is a P-complete problem [Joh90, p.80], and as such is unhkely to be solvable by a sub-hnear algorithm even when parallel processing is available (section 2.1.1). 3.3.2 C o m p o u n d C o n v e x O b j e c t s Consider now a sUghtly less restricted domain in which it is stiU assumed that mater ia l always exists along the shortest path connecting any points along two contacting edges (as for convex objects), but for which the edges themselves are no longer required to be convex. These objects are referred to here as compound convex objects, since they can be readily reahzed by the attachment of convex objects to each other, this attachment being subject to the general constraint that only three edges can make contact at any vertex. Examples of such objects are shown in figure 3.15. < I < t Figure 3.16: Huffman-Clowes labellings for compound convex objects. A . Constraints on Labelling of C o m p o u n d Convex Objects C o m p o u n d convex objects give rise to almost the same set of arrow-, Y - , and T- junct ion labellings as found i n the " s tandard" H C set. B u t because of the shortest-path requirement, there must always be a common surface on the side formed by the interior angle of any L -junc t i on , and on the surfaces between edges of an arrow-junction. The interpretation process can therefore be based on the set of junct ion labeUings shown in figure 3.16. (The conversion into separate contiguity and convexity constraints is straightforward.) Note that only four constraints have been removed from the original Huffman-Clowes set.^'^ ' ^ T h e interpretation of an arrow-junction with a concave stem should not be allowed if consideration is focused on compound convex objects per se, since it can reduce the ability of the system to detect line drawings not obeying the constraints assumed. However, this interpretation can easily be removed, with all arguments going through unaffected. It is left in to show that the set of constraints in figure 3.16 potentially applies to a slightly larger domain of polyhedral objects. B . Complexi ty of Labelling C o m p o u n d Convex Objects To establish bounds on the complexity of hne labelhng, consider first the convexity system. F r o m figure 3.16, it is seen that no L-junctions can have a convex edge; consequently, a l l must have a 'o ' label attached to both edges. V i a proposition 3.4, it follows that convexity labelhng for this domain can be reduced to C C L , the computation proceeding independently for each par t i t i on . Compat ib ih ty between the convexity and contiguity streams can be guaranteed by trans-m i t t i n g the identities of any edge marked as ' - f ' and constraining the relevant edges to be doubly contiguous. Unique contiguity values can also be assigned to the inside edges of Y -junct ions, and to the crossbars of T- junctions. Since the partitions are the same for b o t h streams, contiguity labeUing needs to be done at most two times for each part i t ion — once for each of the two possible convexity interpretations. i) Reduction to 2-SAT A U contiguity constraints on Unes and L-junctions in figure 3.16 can be put into the b inary form described in section 3.2.1. Apphcat ion of proposition 3.1 then yields T h e o r e m 3.5 For compound convex objects, line labelling has a complexity no greater than the maximum of that of 2-SAT and CCL. ii) Reduction to CCL Since the labelhng of arrow-junctions, Y- junct ions , and T-junctions can al l be based on bijective constraints, the possibihty is raised that the hne labeUing of compound convex objects can be reduced to C C L . This can be done by showing that the bijective constraints on L- junct ions and hnes are unnecessary. Notice that each complex of bijective constraints beginning on the outside of an L- junct ion can be considered a " cha in" that travels along the sides of arrow-junctions, terminat ing either when it contacts an edge with a unique value (e.g., the stem of an arrow-junction) , another outer L- junct ion edge, or a danghng edge (figure 3.17). These chains can be readily determined v i a C C L . Because the junction constraints preserve contiguity, a l l values along a chain must have the same value. Thus , i f a chain terminates at a junct ion which forces it to have a unique value, or contains an edge which is simUarly constrained, aU of its elements chain A chain B Figure 3.17: Free chain complexes. must be set to that value, an operation which can be carried out by C C L . Otherwise, the chain is free to take on either contiguity value. A s long as it ensures that the basic contiguity constraints at its ends are obeyed, the chain can be considered to be essentially decoupled from the rest of the interpretation, its values then unaffected by subsequent assignments in the rest of the drawing. If a free chain has at least one end in contact w i th an L- junct ion , interpret its constituent variables as discontiguous. Th is assignment is always compatible w i th the constraints of figure 3.16. A n interpretation constrained in these ways is therefore possible i f and only i f it is possible to interpret the drawing as a set of compound convex objects. The use of this restr ict ion is essentially a generalized application of the principle of " m i n i m a l exterior cont iguity" used i n the analysis of convex objects. Invoking this principle essentially causes these objects to be dismantled into separate convex components whenever possible. Somewhat similar considerations apply to a chain that has dangUng edges at both its ends, except that here the chain w i l l be interpreted as contiguous. In a direct paraUel w i t h the previous principle, this can be seen as a principle of " m a x i m u m interior contiguity" . Note that the two contiguity principles have been invoked to obtain a problem of lower complexity by reducing the set of preferred solutions. A s opposed to a purely conservative strategy (section 3.1.3), however, this strategy does not affect the labell ing problem in its narrowest sense, since a restricted solution wiU be found if at least one more "general" solution exists. H a v i n g dealt w i t h L- junctions, it must now be shown that an explicit binary constraint is not needed against doubly-discontiguous Unes. If no junctions are present, a line can immediately be given any legal labeUing. Otherwise, as figure 3.16 shows, the prohibit ion against double discontiguity is automatical ly imposed for aU junctions. This proves T h e o r e m 3.6 For compound convex objects that are assumed to not contact each other, line labelling has a complexity no greater than that of CCL. Figure 3.18: Examples of rectangular objects. 3.3.3 R e c t a n g u l a r O b j e c t s Low-complexity interpretation is also possible for rectangular objects, i.e., polyhedral objects for which a l l corners have edges that meet at right angles. These constitute a large domain of objects, examples of which are shown in figure 3.18. A . Constraints on Labelling of Rectangular Objects Rectangular objects impose no addit ional exphcit constraints on the H C labeUings of arrow- , Y - , and T- junct ions . Constraints only apply to L-junctions, the particular choice of con-straints depending on the angle in the image.^'* There are two cases to consider here. T h e first is when this angle is acute (i.e., less than 90°). Because the angles between the corre-sponding edges in the scene are 90°, the hnes of an acute L- junct ion must have opposite slant signs (section 3.2.3). A n d since the hidden edge of a rectangular corner is always slanted away from the viewer [Kan90], the consistency of the slant signs leads to a bijective constraint on the convexity labeUing (figure 3.19(a)). A paral le l s i tuation exists for obtuse L-junctions (i.e., those for which the angle is greater than 90°). Rectangularity now forces both sides to take on the same slant signs. W h e n both edges are slanted away from the viewer, they must be interpreted as a pair of singly-contiguous hnes; i f they are slanted towards the viewer, three interpretations are possible (figure 3.19(b)). The resultant set of junctions labeUings, shown in figure 3.20, is much the same as that of Huffman-Clowes, except that the constraints shown in figure 3.19 have been added. A more quantitative constraint that can also be used is that of planarity: if the planarity of the faces is to be maintained, any chain of three connected hnes having three different " T o avoid possible confusion, this angle is taken to be the smaller of the two possibilities. Hidden edge must be in this zone Hidden edge must be in this zone Hidden edge must be in this zone K Hidden edge must be in this zone Figure 3.19: Constraints on L-junctions. 1 Figure 3.20: Huffman-Clowes labellings for rectangular objects Figure 3.21: P lanar i ty constraint. Surface normals for a common surface must be the same. ~ ( C 3 - N4) Figure 3.22: Interior angle constraint. directions in the image cannot aU be labeUed as contiguous (figure 3.21). This constraint ap-phes to both sides of the chain. A s figure 3.21 shows, this constraint stems from a prohibi t ion against having different surface normals defined from each of the hne pairs. Another quantitative constraint is the interior angle constraint: if a slant sign is to have local consistency, any chain of two connected obtuse L-junctions must alternate in the consistency labels attached to the edges on their interior angles (figure 3.22). This arises from the close connection between slant sign and contiguity for these L-junctions (figure 3.19), together w i t h the requirement that slant signs on these junctions must alternate. B . Complexi ty of LabeUing Rectangular Objects Since the convexity label l ing of edges involves only binary bijective constraints, proposi t ion 3.4 ensures that this can be reduced to C C L . A s for the other domains discussed here, only two convexity interpretations are possible for each part i t ion. A n d as before, compat ib i l i ty can be ensured by first solving for convexity and then requiring hnes labelled as ' + ' to be doubly contiguous. In order to reduce contiguity interpretation to a low-complexity problem, constraints must be put into an appropriate form. The constraints on arrow- and T-junctions are a l -ready binary and bijective, as are the constraints on acute L-junctions. Since the contiguity constraints on Y- junct ions can also be described by a set of binary bijective constraints (sec-t ion 3.2.1), the reduction of the contiguity labelling problem centers on the constraints for obtuse L-junctions and Unes, i) Reduction to 2-SAT L e m m a 3.1 For rectangular objects, the planarity and interior angle constraints allow the constraint against the 4-way contiguity of obtuse L-junctions to take on binary form, this reformulation having complexity no greater than that of CCL. Proof : If an obtuse L- junct ion is isolated, simply assign it one of the legal labeUings of figure 3.19. V i a an exhaustive enumeration of aU possibiUties, it can be seen that the p lanari ty constraint rules out a jo ining of acute and obtuse junctions. Unique values can also be assigned when the shared edge is an "outer" edge of an arrow-junction, w i t h the stem point ing away from the interior angle (figure 3.23). W h e n the stem points towards the interior angle, a binary (although not bijective) constraint can be imposed on the possible values. A unique set of contiguity labels can also be made possible for Y- junct ions by invoking the planarity constraint (figure 3.23). Otherwise, a l l shared edges are wi th others of the same type, and so the junct ion is part of a chain of obtuse L- junctions. These chains can be detected in a preprocessing step based on C C L . In such a chain, each interior side of a junction is an exterior side of its neighbor. A n d since the interior angle constraint forces the interior labeUings of neighbors to be different, it is impossible that such a junction can have both of its interior and exterior sides labeUed as contiguous. Since only a direct assignment of values and constraints to local configurations are involved, the complexity of these c Required by unary, binary constraints Impossible due to planarity constraint N H Required by b i jec t ive , planarity constraints Impossible due to in ter ior angle constraint N Figure 3.23: Cont iguity constraints on obtuse L- junction combinations. operations are no greater than that of C C L . The proof then fohows from the observation that the interior angle constraint is both binary and bijective. • T h e o r e m 3.7 For rectangular objects, the planarity and interior angle constraints allow line labelling to have a complexity no greater than the maximum of that of 2-SAT and CCL. Proof : Since hnes can be handled by a binary constraint (section 3.2.1), it is only necessary to express the constraint against the 4-way contiguity of obtuse L-junctions in b inary form. F r o m lemma 3.1, it foUows that these can be cast into the appropriate form v i a a preprocessing step that assigns unique values to many of the obtuse L- junct ion labels, and binary constraints to the rest. A n d it foUows from the l emma that this step has a complexity no greater than C C L . • These are the same complexity bounds found by Kirousis and Papadimitr i ou [KP88] for the somewhat more restricted case of the orthohedral world, in which aU edges are required to be paraUel to one of the three main axes in the scene. The addition of exphcit planarity and interior angle constraints, therefore, makes similar low-complexity recovery possible for the more general domain of rectangular objects. (a) L-termination (A) (b) L-termination (B) (c) Y-termination Figure 3.24: Termination configurations. ii) Reduction to CCL A s for the case of convex and connected convex objects, it is also possible to show that Une labelUng for rectangular objects is of complexity no greater than that of C C L . This requires a careful isolation of the remaining binary constraints on obtuse L-junctions and on Unes. L e m m a 3.2 Bijective constraints can determine the correct contiguity labelling of obtuse L-junctions, except for the case of "L-terminations', in which an L-junction contacts an arrow-junction with its stem oriented toward the interior angle (figure 3.24). Proof: A s l e m m a 3.1 shows, most values on obtuse L-junctions can either be uniquely as-signed or given bijective constraints, except for the case where it contacts an arrow-junct ion with its stem point ing toward from the interior angle. If the L - terminat ion is of type A (i.e., free variables for both exterior edges), the exterior edges require some addit ional constraint to ensure that they cannot both be contiguous. If the L - terminat ion is of type B (i.e., free variables on only one of the exterior edges), the side contacting the L- junct ion is uniquely determined, and so no constraints are needed for the other side. • It must now be shown that there is no need for an expUcit binary constraint against doubly-discontiguous Unes. L e m m a 3.3 Bijective constraints alone can enforce the prohibition against double disconti-guity, except for the case of a "Y-termination", i.e., a configuration in which a dangling edge contacts two arrow-junctions with stems oriented away from that edge (figure 3.24(c)), Proof : If no junctions axe present, a line can immediately be given any legal label l ing. Otherwise, as figures 3.19 and 3.20 show, the prohibition against double discontiguity is automatical ly imposed for a l l junctions except obtuse L-junctions and Y- junct ions . A s shown in figure 3.23, the constraints on obtuse L-junctions ensure that each hne has at least one contiguous side. This leaves only Y- junctions to be considered. The only constraints on Y- junct ions are the set of bijective constraints shown in figure 3.6, which require that the same value be assigned to hnes contacting a common region. If a drawing does correspond to a scene containing rectangular objects, however, two Y- junct ions wiU never contact each other, for to do so would immediately violate the p lanari ty constraint. The constraint against double contiguity can therefore be inherited directly from the junctions that the Y - junc t i on contacts. Comphcations arise from the danghng edges arising from occlusion (i.e., the stems of T- junct ions) , but these can be handled by a preprocessing stage: If all three edges are dangling: the junction can simply take on any legal inter-pretat ion. If two edges are dangling: the remaining edge necessarily contacts another junc -t ion , and so obeys the constraint; the inner edges of the danghng hnes are undeter-mined, but without loss of generality the appropriate constraints can be enforced by requiring these to be contiguous. If only one edge is dangling: A n exhaustive examination of al l cases (figure 3.25) shows that the planarity constraint ensures the appropriate contiguity con-dit ion for aU configurations except the Y- terminat ion . • Given lemmas 3.2 and 3.3, it must now be shown that the remaining contiguity constraints on L - and Y-terminat ions can be handled appropriately. A n y complex beginning at an L -or Y - t e r m i n a t i o n can be seen as a " cha in" that travels along the sides of Y- junct ions and the outer sides of arrow-junctions, and terminates at another L - or Y - t e r m i n a t i o n . These chains are similar to those used to analyze the interpretation of compound convex objects (sec.tion 3.3.2), and are handled in much the same way. Since aU other aspects of the hne interpretation can be handled by bijective constraints, it is only necessarily to show that the chains can also be labeUed in a consistent way v ia a process of complexity no greater than that of C C L . c c c Required by unary, binary const ra in ts Impossible due to p lanar i ty constra int Required by unary, binary const ra in ts Impossible due to p lanar i ty constra int Cannot ex is t for a rectangular object Impossible due to p lanar i ty constra int Figure 3.25: Cont iguity constraints on Y- junc t i on combinations. Contiguity labell ing can be carried out by first assigning labels to variables constrained to take on unique values, and to those chains that contain such a variable. Next , any chain wi th more than two orientations to its set of edges must necessarily be discontiguous i f it is to obey the planarity constraint; consequently, each free chain that remains cannot b e n d , but must travel in one general direction only. The remaining free chains can now be labelled. Note that since these chains are decoupled from the rest of the interpretation (section 3.3.2) it does not matter whether this is done before or after determining labels for the rest of the drawing. Indeed, it follows from lemma 3.2 and the bijective nature of the constraints that al l variables outside the chains w i l l have only one possible value. In order to reformulate the constraints on the remaining free chains, the range of possible interpretations is restricted in a manner similar to that employed in the other two domains , v iz . , a restriction such that a solution for the restricted variant exists i f and only if a solu-t ion exists for the more general case. Consider first the chains that are connected together cycUcally i n a group, i.e., connected by common L - or Y-terminations^^ If the number i n such a group is even, let a l l termination configurations be contiguous on one side only. If the number is odd , pick a termination configuration and set both of its sides to be contiguous if it is a Y - t e r m i n a t i o n or discontiguous i f it is an L- terminat ion , and then constrain the remaining configuration to be contiguous only on one side. Th is results in a set of bijective constraints that allow al l chains in the cycle to be interpreted while continuing to prohibit double discontiguity. It is evident that this process can be carried out in parallel for a l l cychc chains in the par t i t i on . Those chains that are not cycUc must contact a termination configuration for which one side of the central L - or Y - junc t i on has already been assigned a definite contiguity value. If there is only a single chain between such junctions, it can be determined in a fixed amount of t ime whether or not it can be given a value consistent wi th those already assigned; this is possible exactly when a legal labeUing exists for the drawing. A similar situation holds for two connected chains. T h e common L-terminations are necessarily of type A . If three or more chains are connected together, a shghtly more complex procedure can be used: 1. Determine the values for the endpoints if they are to have a legal labelhng of their termination configurations. 2. Constra in al l terminations between the chains along this path to have sides of opposite contiguity. 3. If the number of chains is even and the endpoints have different contiguity labels, the alternation of contiguity at the terminations wih suffice for a legal labelhng. A s imilar s ituation holds when the number of chains is odd and the endpoints have the same contiguity labels. Otherwise, pick one of the inner termination configurations: If it is an L-termination: it must be of type A ; constrain both sides of its central junct ion to be discontiguous. If it is a Y - t e r m i n a t i o n : constrain both sides of its central junct ion to be con-tiguous. The result of this process is a sequence of chains that must alternate in value at each ter-minat ion condit ion, except at the configurations described above. A solution for this set of constraints is possible exactly when a legal labelhng can be obtained for the drawing. The detection of the chains and the propagation of values along their extent can be carried out entirely by C C L . Since the constraints along these chains are bijective, their solution can also be obtained v ia this procedure. A n d since both convexity and contiguity labeUing can be reduced to C C L , this yields T h e o r e m 3.8 For rectangular objects, the planarity and interior angle constraints allow line labelling to have a complexity no greater than that of CCL. C . Slant Sign Constraints For rectangular objects, constraints exist on the slant sign of each edge in the drawing (sec-tions 3.2.3 and 3.2.4). The resultant set of constraints are shown in figure 3.13. Since a l l constraints are bijective, the entire drawing is imphcit ly Unked together in one entire com-plex w i t h only two possible interpretations. From proposition 3.7, it then foUows that the determination of slant signs can be reduced to C C L under these conditions. (a) Drawing delectad as impossible (b) Drawing not detected as Impossible rectangular object rectangular obiect - cannot be given consistent - can be given consistent set of slant sign labels set of slant sign labels Figure 3.26: Slant sign constraints applied to impossible figures. Described in this way, the determination of slant sign does not rely on any other system, and so can serve as an independent source of constraint on the f inal interpretation. Indeed, the use of slant sign yields a process of higher accuracy that that obtained from qualitative labeUing alone [Kan90], since it permits a greater rejection of drawings that cannot correspond to a rectangular object (figure 3.26(a)). However, because only local orientations about the junctions are involved, these addit ional constraints are not sufficient to eUminate aU impossible figures (figure 3.26(b)). Constraints on slant sign not only provide more information about the corresponding polyhedron, but can also speed up the interpretation process itself. For example, i f a Y -junct ion has a l l three edges slanted away from the viewer, it must correspond to a convex corner. Similarly , i f the stem of an arrow-junction slants away from the viewer, it must be concave and the other edges convex. A s shown in figure 3.19, the slant sign determines the labeUing of acute L-junctions and of obtuse L-junctions for which the edges slant away, aUowing aU contiguity constraints to be put immediately into binary form. Note also that i f the slant sign stream is used (together wi th the convexity stream) as the basis for contiguity interpretat ion, it eUminates the need for expUcit planarity and interior-angle constraints. D . Slant Magnitude Constraints Slant magnitudes are constrained v ia equation 3.5. It foUows from this equation that the magnitude of one edge immediately determines that the other (section 3.2.4). When only two Une directions are present in a part i t ion , slant magnitude is underconstrained; the slant magnitudes of the edges can then be fixed simply by assigning some arbitrary value to one of the directions. Since the parahel hnes in each part i t ion represent paraUel edges in the scene, once the part icular magnitudes have been chosen, C C L can be used to propagate them to aU Unes i n the par t i t i on . A similar s ituation exists when three different hne directions exist in the image, except now the slant magnitudes are uniquely determined. A s discussed in section 3.2.4, the deter-minat ion of the slant magnitudes for this situation can also be reduced to C C L . E . Robustness i) Image perturbations The quaUtative and quantitative interpretation of rectangular objects reUes on a special form of the general viewpoint assumption, v iz . , the assumption that slants in the scene are never zero, and consequently, that hnes in the image are never perpendicular to each other. A l t h o u g h this assumption is sufficient for theoretical purposes, any pract ical system must be able to compensate for errors that arise from the measurement of image properties. A s such, an addit ional set of techniques is required to ensure that the interpretation process remains robust against smaU perturbations of the input image. For the interpretation of rectangular objects, perturbations have their greatest effect when one or two edges have a slant differing only shghtly from zero. W h e n only one of the edges is very shaUow, the other two are in a plane closely ahgned with the hne of sight; as a consequence, their projections onto the image are be nearly at right angles to the projection of the shaUow edge (figure 3.27(a)); among other things, this makes it difficult to distinguish arrow- and Y- junct ions from T-junctions. If two of the edges are shaUow, their projections are nearly at right angles to each other (figure 3.27(b)), making it difficult to distinguish between acute and obtuse L- junct ions. A s such, shallow edges can cause a potential instabiUty in the labelhng of convexity, contiguity, and slant sign. Furthermore, from equation 3.5 it also foUows that estimates of slant magnitude are also sensitive to smaU errors in hne orientation angle 9i under these conditions. One way to obta in robustness against such perturbations is to alter shghtly the angles of the hnes in the junct ions , setting them to values that are aU the same. This helps both to remove the effect of local perturbations, and to reduce the effects of perturbations introduced at any later stage of processing. The exact procedure depends on which of the two situations (a) One edge shallow (b) Two edges shallow Figure 3.27: Conditions of shallow slant. Figure 3.28: Combinations of angles into corners. is encountered. In b o t h cases, the procedure begins by obtaining the distribution of Une directions i n the par t i t i on . Ideally, only three directions would exist, corresponding to the three directions of the edges of the corresponding object. If more than three exist, a procedure (e.g., taking the mean of each of the three distributions) can be used to remap the hnes onto a smaller set of angles. Once determined, these new values can be broadcast to al l junctions. This remapping applies to a l l possible corners (figure 3.28), since it follows from equation 3.5 that slant magnitude is indifferent to the particular combination chosen. Since a similar reassignment can also be used for the f inal set of directions obtained, alteration of Une direction can be based entirely on a pair of canonical junctions, obtained from the appropriate rearrangement of Unes in the image (figure 3.29). Consider first the case where one edge has a shaUow slant. A s figure 3.27(a) shows, this is signaUed by the existence of two nearly-paraUel Une directions in the image. Using the Normal to projection of nonshallov edge Slopes increased with respect to normal j slopes I decreased Normal to projection of shallow edge with respect to normal (a) One edge shallow (b) Two edges shallow Figure 3.29: Rescaling of image angles. n o r m a l to the shallow edge to complete a local coordinate frame, the slopes of the other two Such a rescahng of orientations corresponds to a rotation of each corner of the object about an axis perpendicular to the shallow edge, w i th the slant of the shallow edge being increased. Note that this is not a true rotation of the object as a whole, since this requires a change i n the distances between junctions as well. But i f the rotation is small , the change i n foreshortening is neghgible, and the transformation can be Interpreted as a shift in viewing posit ion that results in more robust interpretation. A similar technique can be used when two shahow edges exist. From figure 3.27(b), It Is seen that this condition Is signahed by the existence of two hne orientations that are nearly perpendicular to each other. A s for the case of one shallow edge, slopes can be rescaled, except that now the rescahng Is done wi th respect to an axis perpendicular to the edge which Is not shallow (figure 3.29(b)). The resulting transformation corresponds to a rotation about an axis at right angles to the nonshaUow edge, the rotation serving to increase the slant of the two shaUow edges. If it is necessary to apply both transformations to the hnes of an Image, this can be done simply be applying the required corrections in some fixed way. Thus , provided that three different hne directions can be distinguished In a part i t ion , they can always be remapped into a new set of orientations that can disambiguate any local ambiguities caused by smaU perturbations i n the Input Image, and that minimize the effects of any other perturbations that might be introduced by subsequent processing stages. hnes can be rescaled u n t i l at least one has a value 6min (figure 3.29(a)). ii) Perspective distortion Tl ie results obtained here assume the scene-to-image projection to be orthographic, i .e., rays from the scene contact the image plane at right angles. Since the scene-to-image mapp ing is the same at a l l points in the image, a spatially-uniform set of rules can therefore be used to recover the scene structure. This greatly simphfies the development and analysis of the recovery process. Orthographic projection is almost always a good approximation of the perspective projec-t ion that actual ly governs the mapping of objects onto the image plane. However, it breaks down when an object extends over a large fraction of the visual field. In such a case, only one point corresponds to a perpendicular projection from the object to the image plane, and a " r a -d i a l " distort ion arises that is centered about this point. A l though this distortion complicates the recovery process, it does not affect its interpretative power — a global transformation of the image can always be found that maps each junct ion to its equivalent under ortho-graphic projection (see, e.g., [Kan90, ch.8]). Consequently, both qualitative and quantitative structure can always be recovered. Since the emphasis of this work is pr imari ly on rap id recovery and not on robustness per se, special corrections for perspective distortion are not developed here. Note that perspective distort ion alters only the angles and lengths of lines in the image, and so the basic qualitative aspects of the recovery process are largely unaffected. Furthermore, if these distortions are smal l , the angles can be reahgned by the broadcast mechanism used to handle smal l perturbations i n the input . Thus , the only situation not encompassed by this approach is the relatively rare case where the projection of an object extends over a considerable fraction of the v isual field. Chapter 4 Computational Analysis A computat ional analysis describes and justifies an image-to-scene mapping that is (i) unique, and (ii) possible wi th in the given external and internal hmitations (section 2.4). For the mapping considered here, the external Hmitations are that the information comes f rom a single orthographic projection of a blocks world scene of the type described in section 1.1, and that a constant amount of time is available for its operation (section 2.5.2). Internal hmitat ions are that a mesh architecture is used, and that the processors have a fixed number of states. Th i s chapter develops a set of constraints that defines a process capable of recovering a large amount of the scene structure within these l imitations. To ensure that local computation is relatively simple, a set of external constraints is chosen that Hmit the range of possible mappings to those that can be determined in subhnear time.^ A set of internal constraints is developed to control the search through the space of possible solutions. These constraints are chosen so that a reasonable chance exists of f inding a plausible interpretation wi th in the allotted t ime. The constraints developed here, of course, are not necessarily those used by the human early vis ion system. M a n y factors are potentially involved in the rap id recovery of three-dimensional structure, not a l l of which are known or fuhy appreciated at the present t ime. A s such, this analysis is not intended primari ly as a definitive treatment of the rapid recovery process, but rather as an i l lustrat ion of how a computational analysis of this process can be carried out . *More precisely, the complexity of the mapping must be a subUnear function of the number of lines in the image (see section 2.5.1.) 4,1 External Constraints A s discussed in section 2.5.2, tlie output of a rapid recovery process is a dense set of estimates assigned to each spatiaUy-hmited patch (or "zone") in the image.^ These estimates must be bo th local ly consistent and computable in a constant amount of t ime. If they are to have a good chance of corresponding to the actual structure of the scene, the corresponding property must be easy to compute and use m i n i m a l information from outside the zone. G iven the correlation between the amount of nonlocal information that must be t rans -m i t t e d and the complexity of an operation (section 2.1.1), it fohows that recovered quantities must be computable by a low-complexity process, ideally one of no more than logar i thmic complexity (cf. chapter 3.1.2). This can be ensured by an appropriate choice of external con-straints (section 2.4) on the f inal form of the mapping. These constraints serve to ehminate those mappings that cannot be computed in subhnear t ime. 4.1.1 I m a g e - t o - S c e n e m a p p i n g For a rap id recovery process, the goal is to reconstruct as much of the three-dimensional scene as possible, w i t h interpretations required to be consistent only over spat ial ly-hmited zones (section 2.3.2). Th is goal is somewhat different from that of the "classical" problem of hne interpretat ion, which assigns unambiguous interpretations to each hne, and completely rejects drawings that cannot be given a globally consistent interpretation. Consequently, the image-to-scene mappings need not be the same for the two types of tasks. Possible differences in the image-to-scene mapping include not only differences in the part icular associations between input and output quantities, but also differences in the quan-tities themselves. The first step to find a mapping suitable for rap id recovery is therefore to determine an appropriate set of inputs and outputs. A . Basic Quantities A fixed hmit on time translates into a fixed hmit on the distance over which information can be transmitted . If recovery is to be robust wi th regards to this hmit , it cannot be based on global properties (e.g., the number of features present in the image), or on extensive ^The exact size of these zones is not critical, the main constraint being that they are small enough that each contains no more than a few Une segments (section 4.3.1). properties (e.g., the lengths of Hnes and edges). Instead, it involves only those properties that can be determined locally, i.e., over arbitrari ly small areas of the visual field (cf. section 2.1.1). Loca l i ty apphes to properties of both the input and the output. In what foUows, the basic quantities in the image domain are taken to be the two-dimensional orientations of the Unes and the locations of their endpoints.-^ The quantities i n the scene domain are taken to be the (positive) convexities, slant signs and slant magnitudes of the edges, as weU as the contiguity relations between edges and surfaces. In part i cu lar , the form of the output is taken to be exactly that used in chapter 3.1.2 — a set four spa-tiotopic maps, one for each property, in which the variables describe the absence or presence of the corresponding quantity. A U properties are assumed to be "dense", being attached to al l points along the Unes i n each zone. Note that this differs from the "sparse" form used i n the "c lassical" problem of Une interpretation (section 2.2.1), where properties are attached to ind iv idual Hnes, rather than points (see, e.g., [Mal87]). This choice of properties is motivated not only by the requirement that the properties be local , but also by the results of chapter 3.1.2, which show that these quantities can i n -deed be rapidly recovered for several sub-domains of polyhedral objects. Note that these are not " template" properties, which can be calculated reUably on the basis of local information (section 2.1.1), but instead require at least some nonlocal information for their complete determination. However, the low complexity indicates that relatively Uttle nonlocal informa-t ion is needed. This is the key to the effectiveness of a rapid recovery process — even though nonlocal information is generally needed for a complete local interpretation, at least some of this structure can be rapidly recovered if the amount of information needed is small. Of course, other quantities (such as the slants of the surfaces) could also be used, and conversely, some of the quantities used here may not actuaUy be recovered by the human early v isual system. But the quantities chosen here encompass both the quaUtative and quantitat ive aspects of Une interpretation (section 2.2.1), and are therefore adequate for present purposes, v iz . , iUustrating how rap id parallel recovery can be done. B . Isolation of Indeterminate Values Since the interpretation provided by a rapid-recovery process does not need to be consistent over the entire image (section 2.3.2), the surrounding interpretations do not need to be ^Locations are always relative to a particular zone, so that absolute coordinates are not needed. N H Figure 4.1: Isolation of inconsistency in contiguity labeUing removed when a local inconsistency is found. Indeed, this is not even desirable, for the propagation required for this removal can take considerable time (cf. section 3.1.2), and also results in information being lost from those areas which do obey the assumptions. Similarly , the hmited time available may be insufhcient to completely determine a l l interpretations, so that ambiguities also must be exphcitly handled in some way. In order to allow these possible outcomes to be exphcitly signaUed, the set of output states is expanded to include an ' F label for inconsistencies, and an ' A ' label for ambiguities. These can be apphed to any variable in any dimension. B y careful apphcation of these labels, indeterminate values can be isolated so as to allow a definite interpretation to be given to the other parts of the drawing. A n example of this is shown in figure 4.1. If the contiguity constraints of section 3.3.3 are used on this figure, a globaUy consistent labeUing is not possible. At tach ing an T label to one of the hnes on the central notch, however, allows the remaining edges to be given a definite interpretation. In order that these labels can be used wherever needed, no exphcit constraints are placed on their apphcation; i f necessary, they can be assigned to aU hnes in drawing. However, the use of these labels must be minimized i f the greatest amount of information is to be obtained about the three-dimensional structure of the scene. Since this cannot be done by constraints on the static assignment of the labels, it must be done v ia constraints on the process that generates these assignments (section 4.2). 4.1.2 G e n e r a l P r i n c i p l e s Chapter 3.1.2 shows that low-complexity approximations can be formed in many different ways, depending on the constraints selected. It is now necessary to select one particular set of constraints, and to justi fy this selection. Three general principles are relevant here. A . Separation of Dimensions A s shown i n chapter 3.1.2, a low-complexity mapping can be obtained when external con-straints apply pr imar i ly to simple variables within separate dimensions, wi th only a l i m i t e d amount of interaction between the corresponding streams. This strategy is taken as a basic principle here. In part icular , a l l constraints that are nonbijective (i.e., do not have a 1:1 mapping between allowable values — see section 3.1.2) must involve only two variables, each w i t h two possible values.'' This ensures that the problem is easy to solve (section 3.1.2). Since it has been shown to lead to low-complexity mappings for many subdomains (section 3.3), the set of dimensions used here is exactly that of chapter 3.1.2: contiguity, posit ive convexity, slant sign, and slant magnitude. B . Locality of Constraints Most constraints used in hne interpretation (sections 2.2.1 and 3.3) are local , pertaining to ind iv idua l hnes and junctions. A n example of this is the requirement that al l edges in a Y -junct ion must be labelled as '-f-' or 'o ' (section 3.2.2). Constraints such as the interior angle constraint (section 3.3.3), however, involve relations ôeiiyeen junctions, and so are not of this form. Since nonlocal constraints require image-based as well as scene-based information to be t ransmit ted , they increase the demands on computational resources. They are also awkward to enforce on a mesh processor, where information travels at a constant speed across the image (section 2.5.2). Only local constraints must therefore be used. A s a special case of this principle, note that it is impossible to ensure that a single label can be attached to any Une (section 2.2.1), since a Une can extend over a considerable distance in the image. Consequently, constraints on junct ion labeUings are stiU allowed, but they now apply to Une segments of fixed length rather than Unes of arbitrary length. A n auxiUary set of constraints is required to constrain neighboring segments to take on the same values. *Note that more values are possible for these variables, but that only two must enter into the nonbijective constraints. This allows T and ' A ' labels to be used in addition to the two definite values, since they do not enter into any explicit constraint. c. L o c a l Coordinat ion of Dimensions M u c h of the power of the original mapping can often be captured by a low-complexity approx-imat ion only if there is an interaction between the interpretations in the separate dimensions (section 3.3). Ideally, this would be carried out by a globally-coordinated sequencing between the different streams. Since global coordination is not feasible here, however, the interactions between dimensions must be reformulated to occur at a local level. A s discussed in section 3.3, the key to the successful integration of dimensions is the unidirect ional transmission of information. But global coordination is not needed — this can be achieved by t ransmit t ing information from a local interpretation after it has been assigned an unambiguous value. For example, when the edges of a Y - junct ion have a unique labelhng as convex, they are necessarily contiguous, and so can determine the corresponding interpretation in the contiguity stream. This k i n d of interaction allows local constraints to assist the interpretation process without any danger of increasing the complexity of the process. 4.1.3 S t r u c t u r a l A s s u m p t i o n s A s shown in section 3.3, approximations of subhnear complexity can exist when appropriate restrictions are placed on the contiguity and convexity interpretations of L- junctions. These restrictions can be achieved v ia assumptions about the structure of the polyhedra, and as discussed i n section 3.3, there are several sets of assumptions which can be used towards this end. L i what fohows, interpretation is based on constraints obtained from the assumption of rectangular corners (section 3.3.3). There are several reasons for this choice. First of ah, the visual system is exceptionaUy good at detecting junctions corresponding to rectangular corners, and using the rectangularity assumption to determine the three-dimensional orientations of the corresponding edges (see section 2.2.2). There is no reason to suppose that this preference is hmited only to the higher stages of v isual processing. A second set of reasons involves issues of symmetry and structure. If an angle between two edges i n a corner is unknown, 90° is a natural default, simply because it is midway on the range of a l l possible angles;^ in some sense, it may be considered to be an expected ^If edges are assumed to be unmarked, a rotation of 180° is an identity transform. Attaching an edge to a value. Furthermore, the fact that a l l edges are perpendicular to each other makes it s imple to convert between the slants of the edges and the slants of the faces: for rectangular corners, the normal to the surface corresponding to a region is parallel to the edge that is opposite it in the junct ion . Final ly , there are reasons based on computational complexity. The interpretation of rectangular objects requires relatively httle in the way of processing resources, since it is among the least complex of a l l hne interpretation problems. Rect angularity also allows edge slant to be determined rapidly by loca l processes, something not generally possible for the other domains considered in section 3.3. Indeed, the estimation of edge slant does not even need to wait for the quahtative analysis to be completed (cf. [Sug86]), but can proceed i n tandem wi th the determination of contiguity and positive convexity. 4.1.4 S y s t e m of E x t e r n a l C o n s t r a i n t s Bring ing together the requirements outhned above, the rap id recovery of three-dimensional structure is assumed here to be governed by the external constraints shown in figure 4.2. These involve four quasi-independent dimensions: contiguity, positive convexity, slant s ign, and slant magnitude. The intra-dimensional constraints are given by the permissible l a -beUings of the junctions; these are essentiaUy the constraints developed in section 3.3.3. Interactions between dimensions, shown by the arrows in figure 4.2, are readily derivable f rom this same set of constraints. This system of constraints is largely a reformulation of those of section 3.3.3. In part ic -u lar , the nonlocal p lanarity and interior-angle constraints have been replaced by local con-straints on slant sign, which then influence contiguity v i a the inter-dimensional interactions. There also exists a more direct local (nonbijective) constraint against doubly-discontiguous Unes. A l t h o u g h not required to at ta in a process of logarithmic complexity when global co-ordinat ion is possible (section 3.3.3), this constraint can improve the speed and power of the interpretation process when only local processing is allowed. Note that the contiguity constraints on obtuse L-junctions cannot in general be put into b inary or bijective form. In the absence of a definite interpretation for the two inner or two outer Unes, only a single common constraint (that requiring bo th inner Unes to take on the same value) can be appUed. But the remaining constraints can be put into b inary form i f corner does mark it, but two adjacent edges that differ by 180° are stiU effectively the same edge. Contiguity Convexity Slant Sign Figure 4.2: System of external constraints. Arrows indicate interactions between dimensions. a dependency on the state of particular "trigger" variables is introduced so that constraints are put into effect only after a definite interpretation has been assigned.^ Consequently, aU bijective and binary constraints on the junctions have been kept and the rest reformulated to take on one of these two forms. A l though global co-ordination of the type required to achieve the results of section 3.3.3 is no longer possible, these constraints do allow the three-dimensional structure of the scene to be recovered. Given sufficient t ime , a consistent interpretation without ambiguities or inconsistencies is possible whenever the drawing corresponds to a set of rectangular objects. Because of the T and ' A ' labels, however, the requirements of section 2.3.2 are also met: a scene that does not contain rectangular objects everywhere may stiU give rise to local interpretations in those regions where the basic s tructural assumptions are obeyed. 4.2 Internal Constraints A l t h o u g h external constraints hmit the range of interpretations which can be given to a line drawing, they are not generally sufficient to determine its form completely. For example, the mark ing of edges as inconsistent or ambiguous must be kept to a low level (section 4.1.1), but this requirement conflicts w i t h the prohibit ion against global measures (section 4.1.1). More generally, the interpretation process must operate wi th in a fixed amount of t ime, and while the external constraints developed in section 4.1 do select a set of solutions that can be calculated quickly and wi th a m i n i m u m of nonlocal information, they do not provide any guidance as to what should be done when such hmits are imposed. To complete the specification of this mapping, therefore, an independent set of internal constraints must be imposed to help ensure that the best use is made of the available pro-cessing time.^ These are constraints on the generation of the interpretation itself (section 2.4.2). For the process here, these constraints are required to lead to a subset of interpre-^These constraints, obtained from figure 4.2, are as follows: 1. If one of the inner edges is contiguous, no more than one outer edge can be contiguous. 2. If one of the outer edges is discontiguous, both inner edges must be contiguous. 3. If both inner edges are discontiguous, both outer edges must be contiguous. 4. If both outer edges are contiguous, both inner edges must be discontiguous. '^The use of such constraints is essentially an elaboration of Marr's principle of graceful degradation [Mar82, p. 106], extended to cover not only reductions of available information, but reductions of other resources as well. tal ions that have a relatively high hkehhood of corresponding to the structure actually i n the scene. A l t h o u g h the probabiUties of various image-to-scene associations depend on the particular scene domain under consideration, exact knowledge of these probabihties is not generaUy necessary — aU that is required is an ordering of the various candidates. A s such, it is possible to provide a set of principles that are potentially appUcable to many domains encountered in the na tura l world. 4.2.1 P r o c e s s i n g A r c h i t e c t u r e Internal constraints act on the flow of information that occurs during the course of computa-t ion . In order to develop such constraints it is first necessary to specify — at least at a general level — an "abstract architecture" that describes the way in which information processing and information transmission are carried out. A . Processing over Zones F r o m the definition of the rap id recovery problem (section 2.5.2), the only processing resources assumed to be available are a spatiotopic mesh of processors, w i th each processor having a relatively smaU set of states. If good use is to be made of these resources, each processor (or group of processors) in the mesh must be assigned to a separate zone i n the image, i.e., to a compact contiguous area of hmited spatial extent (section 2.1.1). This requirement stems pr imari ly from considerations of efficiency. W h e n only a fixed amount of t ime is aUowed, each processor can only act on a fixed number of inputs.^ Since the number of processors increases w i t h the size of the input (cf. section 2.5.2), effectiveness can be maintained by assigning each processor to a separate region of the image. A n d i f processors are uniform i n regards to their processing power (as assumed here), it is best i f these regions have the same size. Since transmission delays wi th in a zone must be kept to a low level, regions must also be contiguous and compact.^ (This is related to the preference for local quantities described in section 4.f .1.) The demand for contiguity is forced not only by the need for compactness, but *This is a generalization of an order-limited perceptron [MP69], with the output function being any function that can be calculated in a fixed amount of time. ^This restriction means that the process can be carried out by a generalized version of a diameter-hmited perceptron [MP69]. by the recognition that the external constraints are highly sensitive to breaks in the l ines, and since hne drawings may fal l anywhere, no part of the image can afford to be sk ipped . Cont iguity also reduces the computational power required of the indiv idual processors, since it is easier to handle a small set of unbroken hnes than a large set of disconnected hne fragments. B . Communicat ion between Neighboring Zones If the mapping between input and output could be described entirely in terms of non-interacting zones, recovery could be carried out on an array of processors, each of w h i c h calculates only simple template properties of its corresponding zone. But if anything be-yond the most rudimentary hne interpretation is to be carried out, communication between processors is required. In what foUows, it is assumed that the assignments of processors to zones maintains a spatiotopic organization and that neighboring processors in the mesh are assigned to neighboring zones in the image. Once again, this is motivated by considerations of efiiciency. Since al l constraints between the local interpretations are themselves local , there is relatively httle to be gained by hav-ing some other assignment of zones to processors. Furthermore, the operation of the l o ca l processors (as weU as the analysis itself) is simpUfied, since the transmission of in formation takes place only v ia the zone-to-zone percolation of information through the " v i r t u a l mesh" formed by the lattice of zones over the image. In part i cu lar , this information flow originates from zones containing an interprétable junct ion , and propagates at a constant rate along the connecting Unes. Internal constraints therefore act by controUing the in i t ia l assignment of interpretations wi th a zone, and by controUing the propagation of these values along the hnes of the drawing. 4.2.2 G e n e r a l P r i n c i p l e s A t the most general level, internal constraints can take effect in two ways: (i) constraints on the basic operations used, and (ii) constraints on the representations operated upon. These effectively provide general constraints on the propagation of information around the " v i r t u a l mesh" that occurs during the interpretation process. Four general principles are used here to provide constraints on this propagation. A . Maintenance of Interpretive Power If the process is to have a good chance of recovering some part of the scene structure, it must not be too quick to throw away possible interpretations. Consequently, a hberal interpretat ion strategy is used: a candidate interpretation is kept unless an inconsistency is detected. Since inconsistencies are determined by the set of external constraints, this becomes the requirement that internal constraints must not exclude any interpretation consistent wi th the external constraints. In other words, internal constraints must not have any ehminative power — they must be entirely concerned wi th the ordering of the various possible solutions. In order to allow al l possible interpretations to be handled i n a systematic way while keeping true to the demand that only two values exist in each constraint (section 4.1.2), the outputs of each of the four streams are spht into two separate subsystems: Contiguity : T w o complementary subsystems to represent the possibihty of contiguity and noncontiguity. Convexity : T w o complementary subsystems to represent the possibihty of convexity and nonconvexity. Slant Sign: T w o complementary subsystems to represent the possibihty of the two types of slant sign.^° Slant Magnitude : T w o different subsystems — a quantitative subsystem to carry the value of the estimate, and a quahtatlve subsystem to represent the possibihty that this value can legitimately be assigned. For the complementary subsystems, the existence of a possible interpretation is signalled by a 'possible ' state attached to the relevant edge, while its impossibihty is hkewise signaUed by an ' impossible ' state (figure 4.3).^^ The use of these subsystems allows a l l possible interpretations to be represented quite s imply: Definite: assignment of 'possible' to an edge in one of the subsystems and ' impossible ' to its complement. A m b i g u o u s : assignment of 'possible' in both subsystems. Inconsistent: assignment of ' impossible' in both subsystems. Slant towards or away from the viewer is not a pure scalar like the other two quantities — it has a directional component that must be taken into account (cf. section 3.2.3). This can be handled simply by having each subsystem represent an increase in depth as the line segment is traversed from one of the ends. " T h i s is somewhat analogous to the use of relevance logic in reasoning (see, e.g., [Lev86]. Cont igu i ty Subsystem Noncontigulty Subsystem '6 U '4 3 2 6 { poss ib le , imposs ib le } G { poss ib le , imposs ib le } Figure 4.3: Example of complementary labell ing. B . Locally Irreversible Computat ion If a process is to make good use of available t ime, it must avoid doing and undoing the same operations without any net effect. This requirement — essentially a form M a r r ' s principle of least commitment [Mar82, pp . 106-107] — rules out hypothesize-and-test strategies, favoring instead "one-shot" processes that require only a few steps for their completion. In the absence of global control , this must be done by a local mechanism that forces the process to avoid redundant processing while simultaneously ensuring that it w i l l not exclude any consistent interpretat ion. To combine this principle w i th that of maintaining interpretative power, the following scheme is used: 1. A 'possible' state is in i t ia l ly assigned to a l l values of al l complementary subsystems as weU as the qualitative subsystem of the slant magnitude stream. 2. Whenever a local inconsistency is found, the corresponding value is marked as ' impos-sible' , and this value w i l l never be withdrawn. This is essentially a simple form of Waltz filtering (section 2.2.1), w i th an in i t ia l m a x i m u m uncertainty steadily reduced to the point where no local inconsistencies remain. Given the s tructural assumptions that have been made, httle nonlocal information is required for a local interpretat ion. Convergence to a definite interpretation in each zone is therefore hkely to be fast. E v e n if only a hmited amount of time is available, the result is hkely to provide at least some informat ion to higher stages of processing. The s ituation is similar for the slant magnitude stream, except that the quahtative sub-system serves to indicate the confidence of the corresponding magnitude estimate. W h e n l oca l inconsistencies are found in the estimates of slant magnitude, an ' impossible' label is assigned to the variables involved, and then propagated along the Une. Note that the state of the interpretation can be described in terms of the distr ibution of the 'possible' states over the edges in the complementary subsystems. More precisely, local uncertainty exists only when a value and its complement are both possible. This aUows the overaU uncertainty attached to an interpretation to be described by an "entropy" measure that includes aU the ind iv idua l local uncertainties. A l though never used by the actual process itself, this measure can provide a way to describe the overaU state of uncertainty in the interpretation at any given moment. Loca l irreversibiUty can be incorporated into complementary labeUing by a reformulation of the constraints found at the local junctions. In order to preserve the power of the or iginal set of constraints, this reformulation is subject to the foUowing condition: any set of definite values ruled out by the original constraints of figure 4.2 must also be ruled out by the new set of constraints. F r o m figure 4.2, it is seen that aU constraints except the contiguity constraints on ob-tuse L- junctions are "context free", i.e., the particular set of constraints depends only on the geometrical configuration in the image, and not on the set of labels attached (cf. figure 4.2). Reformulation is based on the idea that once such constraints have been set up , eUm-inat ion of possible interpretations can occur by a simple priority mechanism that aUows an ' impossible ' state to replace any 'possible' state. Since 'possible' states are init ial ly assigned to a l l variables, the reformulation involves only the ways in which ' impossible ' states are to be t ransmit ted . The situation for state-dependent junctions is s imilar, except that no constraints are apphed u n t i l definite assignments have been made to the inner or outer edges. Consequently, it is possible to reformulate the constraints in a way that aUows the process to be locally irreversible while maintaining the complete set of external constraints: i) Unary constraints Since only two values can exist in a subsystem, a unary constraint (i.e., a constraint that acts on a single variable) necessarily requires the variable to have a unique value. For example, a unary constraint exists on the inside edges of an arrow-junction that force them to be contiguous. Given two complementary subsystems, unary constraints can be easily enforced by m a r k -ing the corresponding value in the complementary subsystem as 'impossible' . The value i n the or iginal subsystem, however, remains unaffected. This leaves a definite interpretat ion which can only be altered by becoming inconsistent (i.e., the value in both subsystems being ' impossible ' ) . For an arrow-junction, therefore, the Inside edges In the non-contiguity subsys-tem are marked as ' impossible ' and the corresponding edges In the contiguity system reta in their or ig inal state of 'possible'. ii) Bijective constraints Bijective constraints are such that a 1:1 correspondence exists between the values of the variables Involved (section 3.1.2). For example, a bijective constraint exists on the contiguity labels of the inside edges of an acute L- junct ion , since both of these edges must be either contiguous or discontiguous (section 3.2.1). Since only two values exist for each variable, bijective constraints take on a simple f o rm: either the variables have the same values, or else they have opposite values. If two adjacent hnes are required to have the same values, both must have the same definite values. I.e., the corresponding variables In the complementary subsystem must be ' impossible' . Th is constraint can be enforced by the requirement that If one of the corresponding variables In a subsystem Is 'Impossible', so must be the other variable in the same subsystem. If two adjacent hnes are required to have opposite values, their corresponding variables are s imilarly constrained, except that now the constraint apphes to variables In "opposing" subsystems (figure 4.4). Note that this latter type of constraint provides a binding between the two subsystems, which are otherwise largely Independent. iii) Nonbijective constraints The Intradlmenslonal constraints that are not bijective Involve a single prohibit ion against a part i cular combination of values (figure 4.2). These constraints can be reformulated quite simply. If one of these values in such a prohibited combination definitely occurs (i.e., its complement is ' impossible ' ) , then the other value must be excluded (i.e., its "direct" state Is 'Impossible') . Otherwise, nothing else is done. Note that in contrast to bijective constraints, only a one-way transmission of ' impossible ' states Is Involved. For example. If one side of a hne has been marked as definitely discontiguous (i.e.. Its value in the contiguity subsystem is ' impossible') then the other side must be contiguous .u^ u G ( possible, impossible } V, G { possible, impossible ) if (Vj = impossible) (u^= impossible) if <v^ = impossible)-> (u^ = impossible) = impossible) (v,= impossible) if (u, = impossible) -> (v, = impossible) 1 4 if (u^= impossible) -> (v^ = impossible) Figure 4.4: Example of reformulation of bijective constraint. Contiguity Subsystem Noncontiguity Subsystem u^  G {possible, impossible} Vj G { possible. Impossible } if (u^ = impossible) -> (v^» impossible) if (U2= impossible) -> (v^ = impossible) Figure 4.5: Example of reformulation of nonbijective constraint. (i.e., its value in the discontiguity subsystem is ' impossible') . But i f one side is marked as contiguous, nothing else necessarily follows, and so no transmission results (figure 4.5). iv) State-dependent constraints The contiguity constraint to be apphed to obtuse L-junctions depends on the interpreta-t ion attached to its edges. This k i n d of constraint can be reformulated in a straightforward way by imposing the appropriate set of constraints when the "trigger" variables take on definite values, i.e., when their values in the noncontiguity subsystem are 'impossible' . v) Interdimensional constraints Since information from one dimension is sent to another only when a definite interpreta-t ion has been achieved (section 4.1.2), the reformulation of the relevant constraints is fair ly straightforwaxd: when a part icular set of values has been definitely assigned to the edges about a junct ion (i.e., the corresponding complementary values are 'Impossible'), the associ-ated values i n the other stream can be given definite values (I.e., their complements are set to ' impossible ' ) . Note that i f a variable is deemed to be inconsistent, its value in both subsystems is ' impossible ' . Consequently, the transmission of information across dimensions can cause the corresponding variable in some other dimension to also be labehed as inconsistent. Since processing t ime Is hmited , however, the propagation of these Inconsistencies Is unhkely to affect greatly the quahty of the f inal interpretation, an assumption borne out by tests on a variety of hne drawings (chapter 6). C . M i n i m i z a t i o n of Inconsistency It is important to control the propagation of labels so that m i n i m a l Inconsistency results, I.e., ' impossible ' are assigned to no more variables than necessary. This condition is automat i -cally obeyed i f the drawing corresponds to a rectangular object, for the constraints are such that appropriate values can always be assigned to the variables. Indeed, when redundant constraints are added, more routes become available for propagation, and the faster spread of 'Impossible' states then speeds up the interpretation process. However, when the scene contains objects that do not conform to the underlying s tructura l assumptions, inconsistencies can arise in the resulting Interpretation. Consequently, the more routes available for propagation, the greater the spread of inconsistent interpretations. In order to hmlt the spread of such Inconsistencies, therefore, some care must be taken when selecting the part icular set of constraints to be used. In part icular , i f a variable is subject to a unary constraint, no other constraints must be apphed to i t . For example, the Inner edges of an arrow-junction are constrained to be contiguous. If they are also constrained to have the same value, the Interpretative power of the system Is not affected regarding rectangular objects, since no Interpretation exists In which these Unes can be assigned definite Interpretations as discontiguous. However, i f an ' impossible ' label has been t ransmit ted to one of these hnes, such a constraint wih cause it to be propagated to the others and assign them values that could never exist in any polyhedral scene. B y disallowing such a constraint, the opportunity for inconsistencies to spread is minimized while the power of the or ig inal system of constraints Is maintained. D . Pr ior i ty M a r k i n g T h e preceding principles have brought w i t h them a shift in emphasis from individuals to ensembles. But human perception tends towards ind iv idual interpretations. W h e n v iewing a Necker cube, for example, perception alternates between single unambiguous cubes, rather than being a superimposed set of aU possible interpretations. If the recovery process is to reduce the set of interpretations that are simultaneously possible, and if it is to remain effective, it must focus on those that have the greatest hkehhood of corresponding to the structure in the scene. This can be done by mark ing such preferred interpretations as distinct. If this m a r k i n g does not otherwise affect the interpretation process, it can allow priority to be given to the most Ukely candidates while stiU keeping available aU other possible interpretations. The simplest way to incorporate prior i ty mark ing is to extend the set of values that can be given to a variable ~ i n addition to 'possible' and ' impossible' , include a 'preferred' state, which has prior i ty over a 'possible' s t a t e . I n regards to aU constraints developed so far, the 'possible' and 'preferred' states can be treated as equivalent. The introduct ion of this dist inction can be viewed as a way to spUt the recovery process into two concurrent substreams, dealing w i t h ensembles and individuals respectively. A s such, the addit ional constraints required for pr ior i ty marking must be Umited entirely to 'possible' and 'preferred' values. The only exception is a requirement that when the complement of a 'possible' value is marked as ' impossible ' , the value itself must be upgraded to 'preferred'. However, this " in t ra - s t ream" transmission does not affect the set of constraints deaUng w i t h ensembles, since these are involved entirely wi th the propagation of ' impossible' states. Consequently, the introduct ion of the possible-preferred distinction has no adverse effects on the abiUty of the recovery process to eUminate inconsistent interpretations. If desired, the final output can be represented using " s tandard" labels that express the two definite interpretations, the inconsistent interpretation, and the ambiguous interpretation: 1. If one subsystem has a 'preferred' state and the other does not, take its value as a definite interpretation. 2. Otherwise, i f bo th subsystems have 'preferred' or 'possible', set the interpretation to be ambiguous. ^^Although interpretations can be even better distinguished by the use of several different priority levels, selection is usually from just a few alternatives, so that this system is sufficient for present purposes. 3. Otherwise, both subsystems must have ' impossible' states. Set the interpretation to be inconsistent. Pr i o r i ty mark ing is a " d u a l " to the interpretation of ensembles. Instead of using a hbera l strategy to ehminate impossible interpretations, it uses a conservative strategy to generate hkely ones. It is therefore based on the same set of external constraints as the "ensemble" system, except that it involves 'preferred' and ' impossible' labels. Initiahy, ah variables are set to 'possible' , w i th the exception of a small in i t ia l set that are assigned a 'preferred' state. Constraints are enforced in much the same way as those of the ensemble substream, except that they involve the transmission of 'preferred' states to the "direct" subsystems instead of ' impossible ' states to the complementary subsystems. Consequently, the complexity of pr ior i ty mark ing is the same as that of the ensemble substream. The only asymmetry between the two processes is that interpretations in the ensemble substream can override those of the pr ior i ty substream, but not vice versa. In part icular , when an ' impossible ' state is assigned to a variable in some subsystem, it is effectively wi thdrawn from pr ior i ty marking . In addit ion, the value of the corresponding variable in the complemen-tary subsystem is upgraded from 'possible' to 'preferred'. This asymmetry reflects the basic difference underlying the assignment of the two kinds of label: possible-impossible distinc-tions are based on necessary consequences of the set of assumptions, while possible-preferred distinctions are generally based on considerations of hkehhood. Since complementary subsystems are not used for slant magnitude, there is no need to distinguish between 'possible' and 'preferred' values. However, 'preferred' labels can be used to signal when there is some evidence for a definite assignment of magnitude (e.g., magnitudes obtained v ia Perkins ' laws). Th is proves especially useful in distinguishing slants that are zero by default from those that have been determined to be zero, since the latter can be treated equivalently to any nonzero slant magnitude. 4.2.3 S e l e c t i o n of I n i t i a l C a n d i d a t e s The course of processing is controhed not only by constraints on the dynamic flow of informa-t ion , but also by constraints on the in i t ia l interpretations that are considered. In part icular , the pr ior i ty mark ing mechanism developed in the previous section provides a way to dis-t inguish a subset of selected candidates, but does not itself provide any principles to guide this selection. A l though the most appropriate selection of in i t ia l candidates depends on the paxticular domain being modelled, a few general principles appear to be widely applicable. A . Contiguity The first of these principles is that of maximum interior contiguity, which assumes that the inner surfaces of corners are contiguous whenever possible. This principle is an extension of the one used to reduce the complexity of labeUing convex and compound convex objects by selecting a preferred set of interpretations (sections 3.3.1 and 3.3.2). It may also be a basis for the human perception of Une drawings [Mac74][Hor86, p. 355]. T h e principle of m a x i m u m interior contiguity appUes only to the contiguity stream. T h e part icular set of junct ion markings , shown in figure 4.6, is as foUows: Arrow-junctions : Cont igui ty is preferred for aU Unes except for the outer pair , which do not generaUy form an interior angle. Since contiguity is necessary for inner hnes, noncontiguity is impossible. Y- junct ions : Cont iguity is preferred for aU Unes. L-junctions (obtuse): Contiguity is preferred for the inside edges, since most of the possible interpretations assign them this value. The outer edges of these junct ions , however, cannot be given preferred interpretations, for although one of these edges can be contiguous, the other cannot, and the symmetry of the situation makes it impossible to prefer one over the other. L-junctions (acute): Outer edges are necessarily contiguous, while symmetry makes it impossible to prefer any interpretations of their inner edges (figure 4.2). T-junctions : The crossbars have a necessary contiguity relation, so that preferred values foUow immediately. N o preference is given to values on T- junction stems, since these are effectively isolated Unes.^^ B . Convexity Preference in the convexity stream is determined by the principle of maximum convexity, i n which assumes that aU tr ihedral corners in the scene have positive convexity. This principle '^Strictly speaJdng this is not true for more reahstic surfaces, where the stem represents not only an edge of a different surface, but can also represent a crack or thin surface marking on the same surface. T o address this issue more fully would require the development of a rapid-recovery system based on a more extensive set of labels, and this is beyond the scope of the present work. Contiguity Subsystem Noncontiguity Subsystem • • • • Impossible •::-::-:y::-:> Possible Preferred stems from the observation that convex corners are more common than concave ones; indeed, concave corners do not necessarily correspond to actual structures of the object itseff, but may instead result from contact between adjacent objects [Bie85]. The i n i t i a l set of junct ion markings in the convexity stream, shown in figure 4.7, is as foUows: Arrow-junctions : Convexity is preferred for the stems, while non-convexity is preferred for the outer wings. Y- junct ions : Convexity is preferred for aU hnes. L-junctions (obtuse): A U Unes are necessarily non-convex, leading to preferred values i n the nonconvexity subsystem. L-junctions (acute): A l t h o u g h constrained to have one convex and one nonconvex side, symmetry makes it impossible to assign a preference. T-junctions: The crossbars of the T-junctions correspond to occluding edges, and so are necessarily non-convex. C . Slant Sign Owing to the close connection that exists between convexity and slant sign when the corners are rectangular (section 3.2.3), the principle of max imum convexity can also determine pre-ferred states for values in the slant sign stream. These are shown in figure 4.8. Since the corners are assumed to be rectangular, the convex edges in arrow- and Y- junct ions corre-spond directly to edges that are slanted away from the viewer, and non-convex edges to edges slanted towards the viewer. Consequently: Arrow-junct ions : Slant toward the viewer is preferred for the stems, while slant away is preferred for the outer wings. Y- junct ions : Slant away from the viewer is preferred for aU hnes. Other junctions do not contain enough information to determine slant sign directly, and so no preference can be assigned on their account. Convexity Subsystem Nonconvexity Subsystem • • • • Impossible Possible Preferred Slant Sign Subsystem - away Slant Sign Subsystem - toward • • • • Impossible Possible Preferred D . Slant Magnitude If an arrow- or Y - j u n c t i o n obeys Perkins ' laws (section 3.2.4), the values in its quantitative subsystem axe assigned the corresponding slant magnitudes, and the values in the quahtatlve subsystem axe set to 'preferred' to show that a definite interpretation has been made. O t h -erwise, the magnitude of the edges is set to a default value of zero, and the corresponding quahtatlve label is set to 'possible' so that it can be overridden by any definite interpretation. 4.3 The Rapid Recovery Process Taken together, the external and internal constraints developed above go a long way towards specifying a mapping that allows a large amount of scene structure to be recovered in very httle t ime. M i n i m a l assumptions have been made about processing resources — it is assumed only that a mesh of relatively simple processors is available, and that the time required for local computat ion is less than that of data transmission to nearby locations. Consequently, these constraints are largely independent of the details of the underlying mechanism. If the theory is to be complete, however, it must lead to a mapping that is uniquely specified. Several architectural parameters must therefore be specified. It must also be shown how the external and internal constraints can be smoothly combined into a rap id recovery process that is robust to small perturbations in the input . 4.3.1 A r c h i t e c t u r a l Speci f icat ions The constraints developed in the previous sections have the advantage that they are apphcable to a range of possible processors. Because they depend on a few aspects of the processor, however, these aspects must be given a definite specification i f the resultant mapping is to be unique. The choices made here are intended to be as general as possible, and to reflect what is known of the human visual system when the specification of particular parameters is unavoidable. To begin w i t h , the processing elements are assumed to be finite-state, making it necessary to convert continuous quantities such as two-dimensional orientation and slant magnitude into discrete form. Spat ia l location must be represented with a high degree of precision, reflecting the high acuity possible even at early stages in human vision (see, e.g., [WB82]). Each cell is therefore assumed to be able to represent location to within l / 1 6 t h of its own size.^^ O n the other hand, the measurement of hne orientation in the early visual system is based on channels of a half-amphtude bandwidth of about 1 0 - 2 0 ° [TG79], and so is much less precise. Consequently, orientation measurements are quantized to intervals of 10°. The estimates of slant magnitude must also be quantized. Like two-dimensional orienta-t ion , these are given a relatively coarse-grained representation, wi th magnitude quantized to intervals of 20°, centered around values of 0°, 20°, 40°, 60°, and 80°, Another issue is the way in which the zones can be arranged over the image. Three m a i n types of regular tesselation are possible: rectangular, tr iangular, and hexagonal. The part ic -ular choice does not greatly matter when processing does not involve coordinate-dependent quantities, but this must be made definite for purposes of analysis. In order to simphfy the implementation as much as possible, it is assumed that aU. zones have the same shape and size, and that they form a rectangular lattice over the image. The coordination of communication between zones must also be specified. Processing over each zone is carried out by a separate processor or group of processors, and communication between these processors may proceed either synchronously (coordinated by a global clock) or asynchronously. Since the process acts v ia an irreversible priority override mechanism, and since the available propagation paths are c o n s t a n t , p r e c i s e temporal coordination of operation is not important . Consequently, the issue of synchronous communication has htt le impact on the performance of the process. The major difference between the two types of communicat ion is therefore in the ease of implementation and analysis. In what foUows, synchronous communication is assumed. F ina l ly , an appropriate size must be chosen for the zone themselves. This depends in part on the absolute number of available processors, or more precisely, on the ratio of processors to the size of the input . It is assumed here that each zone can be made small enough to contain at most three hnes (i.e., enough for a single junct ion) . Beyond these requirements, the exact size of the zones is unimportant for present purposes — since processing speed is dominated by transmission time (section 2.5.2), changing the size of the zones only leads to Since each cell is later assumed to correspond to a visual area of roughly 10 min arc (section 5.3), this yields an precision of less than 1 min arc, roughly comparable to the limits of human visual acuity [WB82]. ^^State-dependent constraints are similar, the only difference being that a delay is introduced by the re-quirement that a definite set of labels be assigned to the critical variables. a rescaling of the t ime course of the process. 4.3.2 R o b u s t n e s s The assumption of rectangularity carries wi th it an obhgation to protect the process f rom the instabihties that result when hnes in the image are nearly paraUel or are nearly at right angles to each other. For arrow- and Y- junct ions , techniques similar to those of section 3.3.3 can be apphed in a straightforward fashion, at least locaUy. In part icular , an arrow- or Y - j u n c t i o n containing a 90°angle is treated as if the angle were shghtly larger. Since a global broadcast of the reassigned angles is not feasible using a mesh architecture, ambiguous L-junctions cannot be immediately resolved. They are consequently treated here as junctions containing constraints common to both acute and obtuse L-junctions (see, e.g., [Mal87]). One such constraint is that at least one edge must be nonconvex (see figure 4.2). The sensitivity of slant magnitude estimation can be reduced by a few addit ional measures. For junctions i n clear violat ion of Perkins ' laws (section 3.2.4), edges are given no i n i t i a l preferred slant magnitude (i.e., the values in the qualitative system are set to 'possible') . Constraints are also weakened so that neighboring estimates are acceptable only if they are w i th in adjacent ranges. FinaUy, to Umit the accumulation of errors that would result i f estimates of slant magnitude were propagated v ia L- junctions, estimates are taken only f rom direct sources (i.e., at the arrow- and Y- junct ions) , w i th values propagated only as far as the next junct ion . 4.3.3 B a s i c O p e r a t i o n Given the addit ional refinements of the sections 4.3.1 and 4.3.2, the process is completely specified. Since many of the constraints apply to the generation of the interpretation, and not simply its f inal f orm, the image-to-scene mapping cannot be given a closed-form description, fnstead, the interpretation of a given hne drawing can only be obtained by carrying out the process itself. The detailed operation of the recovery process is discussed in chapter 5, where an algo-^^Note that the absolute scale is important for any real system, leading to a preference for cell sizes that are as large as possible. Thus, the absolute size of a cell involves a time-space trade-off (cf. section 5.1.3): a larger number of smaller, simpler cells increases computational simphcity, while a smaller number of larger, more complex cells reduces transmission time (when internal transmission is not a factor). The choice of appropriate size is hkely to be based on some compromise between these two sets of conflicting requirements. r i t h m is developed that embodies al l the relevant external and internal constraints. However, the recovery process itself is shghtly more abstract than this, since it is completely specified without the addit ional details of the algorithm. The basic elements of its operation, t a k i n g place i n each zone concurrently, are as follows: 1. In i t ia l measurements are made of the termination locations and the orientations of the hne segments wi th in the zone. Terminations include not only true endpoints of the hnes, but also crossings of the zone boundaries. The locations of these terminations are represented with high precision (1/16 of the zone size). Orientation measurements are quantized i n units of 10°. 2. The type of junct ion present (if any) is the zone is estabhshed, and the angles between its hnes determined. 3. In i t ia l interpretations are assigned to aU variables in ah. substreams. If the zone con-tains one or more disconnected hnes, a l l values are assigned 'possible'. If it contains a junct ion , the hnes are labeUed according to the rules described in section 4.2.2. Th i s is done separately for the values and complementary values in each of the streams. 4. Values are propagated along connecting hnes to neighboring zones v ia the pr ior i ty mechanism described in section 4.2.2. This is done in tandem for both subsystems i n a l l streams. Since communication is only possible between zones that are immediate neighbors (section 4.2.1), this leads to a percolation of information along the hnes at a constant speed. Propagat ion of labels proceeds by assigning ' impossible ' states to eUminate inconsistent interpretations, and by assigning 'preferred' labels to select a preferred subset of the remaining possibihties. 5. Simultaneous with this " intra-dimensional" process, an " inter-dimensional" propaga-t ion is also occurring, t ransmit t ing information from zones that contain a variable w i t h a definite interpretation. Th is transmission appUes only to zones at the same location in the image, and foUows the rules given in figure 4.2. 6. T h e transmission of information along Unes and between dimensions continues unt i l the t ime hmit is reached. Inconsistent interpretations are identified by the assignment of ' impossible ' to an edge i n both subsystems. Ambiguous interpretations are identified by the assignment of 'possible' in both subsystems. Of the remaining interpretations, those deemed to be most Ukely are distinguished by the 'preferred' state. A n example of this process is shown in figure 4.9, which iUustrates how the in i t ia l convexity estimates assigned to a drawing evolve into a more complete interpretation. Convexity (+) Nonconvexity (o) jQzMagnitude iiiii Impossible «ftssïPossible ^••Preferred Narrow gray lines mark cell boundaries Chapter 5 Algorithm and Implementation The computat ional analysis of chapter 4 has yielded a set of external constraints o n the "s tat i c " associations between image and recovered scene, and a set of internal constraints on the "dynamic " aspects of the recovery process itself. These specify a unique image-to-scene mapping, and provide some general hmitations on the transformations that are to be used. W h a t is now required is to show that these constraints can be incorporated into a complete, well-defined system. In part icular , the process must be decomposed to the point where it can be carried out v i a the operations available on a device having the processing hmitat ions assumed i n the computat ional analysis (section 2.4). The analysis here is based on a device called the cellular processor. This is a type of cehular automaton (section 5.1.2) formed by part i t ioning a dense mesh of processors into a relatively sparse set of disjoint "cel ls" , each of which is assigned a simple processing element to carry out the loca l interpretations. It is shown that the basic operations of this mechanism can be implemented on a mesh of simple finite-state processors. The algorithm itself is then developed v i a a simple program based on these basic operations. The general properties of this mechanism are shown to be compatible w i t h what is known of primate cortical structure, and a tentative suggestion put forward regarding the way in which it might be implemented i n human visual cortex. 5.1 The Cellular Processor If it is to be effective, a rap id recovery process must be based on estimates made over regions of the image that are contiguous and compact, i.e., over zones (section 4.2.1). This introduces two different spatial scales into the recovery process: (i) a fine-grained scale that supports the Figure 5.1: Cel lular processor architecture. high resolution of the input and output representations, and (ii) a coarser-grained scale based on the size of the zone. A useful mechanism to handle this situation is the cellular processor. This is a device consisting of two spatiotopic meshes: (i) a dense mesh of measurement elements that determine basic image properties (e.g., color, contrast, and orientation), and a sparser mesh of more complex control elements that carry out the local interpretations (figure 5.1). 5.1.1 B a s i c aspects The cellular processor allows algorithmic analysis to be carried out in a straightforward way, w i t h issues of measurement and control separated as much as possible. Each measurement element ( M E ) can be loosely identified wi th a mechanism that measures some template property, such as the color or orientation of hnes. These M E s are assumed to have a small set of possible output values that are determined entirely by the contents of a spatiaUy-hmited neighborhood around the corresponding point in the image. A s such, they have no internal states and operate independently of each other. The spatiotopic order of the set of inputs is assumed to be maintained in the set of M E outputs , so that the array of M E s and the array of their outputs can both be referred to as the "measurement layer" , the distinction between the M E s and their outputs being clear from context. Adjacent elements in this layer may or may not have overlapping input regions. It is assumed that the density of M E s is sufficiently high that that no information in the image is lost.^ ^This layer has some interesting similarities with the dense set of localized filters found in the striate cortex To carry out more complex operations, the measurement layer is partit ioned into a n u m -ber of compact, contiguous sections (or "ceUs"),^ wi th the outputs in each ceU assigned t o a control element ( C E ) at the corresponding location in a higher-level "control layer" . E a c h C E is assumed to be sufficiently complex that it can respond to a l l possible combinations of outputs in its ceh. Towards this end, each C E is given a small finite memory to hold interme-diate quantities derived from the M E outputs (e.g., the number of hne segments it contains, their location, and the areas of any region bounded by them). Note that these quantities need not be determinate — in situations where space or time is extremely hmited, or where there is some inherent uncertainty i n the measurements themselves, statistical quantities m a y be preferred (see, e.g., [Ros86]). ft also is assumed that each C E can control at least some of its M E s v ia backprojections that override the M E output.'^ In addit ion, each C E is assumed to have a small set of operations that it can perform on its memory locations and on a its M E s . These operations form the basis of the local processing carried out by the processor. In contrast to the isolated elements of the measurement layer, elements in the control layer are able to interact w i t h their nearest neighbors, having access to at least some aspects of their neighbor's current state. This adds a degree of " la tera l " control to the "bo t tom-up" and " top-down" strategies generally employed in visual processing. 5.1.2 C e l l u l a r P r o c e s s o r s as C e l l u l a r A u t o m a t a Since each M E is an isolated mechanism performing a single operation, al l interesting aspects of the recovery process are carried out by the processors in the control layer. Consequently, the evolution of a cellular processor can be completely described by a rule that maps the current state of each control element onto a new state, the new value being determined by (i) the outputs of the M E s wi th in its ceh, (u) the contents of its memories, and (in) the states of its immediate neighbors. Since processing must be indifferent to the absolute spatial coordinates in the image, this mapping must be spatially uniform. Furthermore, the process is assumed to operate v ia synchronous communication between ah zones (section of primates (see section 5.3). ^The meaning of the term 'cell' corresponds to that of 'zone', but at the level of architecture rather than that of image. ^This can be accomplished by special internal memories, each capable of overriding the outputs of one particular M E . In this formulation, the output of the C E can be expressed either as the set of M E outputs or as the set of G E memory states. 4.3.1). Described in this way, a ceUular processor (or more precisely, its control layer) is seen to be a special type of ceUular automaton. CeUular automata are discrete deterministic systems formed by a cî-dimensional gr id of identical processors operating according to a fixed local law. More precisely, a ceUular au-tomaton ( C A ) can be defined as a quadruple [Kar90] A = {d,S,NJ), (5.1) where is a positive integer describing the dimension of A , 5 is a finite set of states, is a set of n neighborhood vectors (each of the form x — {xi, ...Xd)), and / is the local transit ion function from 5 " to 5 . The ceUs of A are arranged along an infinite c?-dimensional gr id , their positions indexed by elements Z'^, the rf-dimensional space of integers. CeUular automata were developed originaUy by U l a m and von Neumann as tractable approximations of highly nonhnear differential equations in biological systems (see [TM90]) . B u t they are also interesting in their own right, since local rules can lead to a variety of complex, spatiaUy-extended structures (see, e.g., [ C H Y 9 0 , Smi90, TM90] ) . CeUular automata have been used for simple image operations, including thresholding, pointwise arithmetic on image pairs, and convolution (see, e.g., [Gol69, P D L + 7 9 , Ros83]). Other operations include the shrinking and expansion of elements in the image, and the formation of their convex huU [PDL+79] . Indeed, it is Ukely that C A s can do considerably more than this , since given the appropriate transit ion functions and in i t ia l configurations, they are capable of universal computat ion , i.e., computing any function that can be computed by a Tur ing machine (see, e.g., [CHY90] ) . In order to conform w i t h the general constraints of the recovery process, a two-dimensional gr id is used, and the neighborhood set N is the set of ceUs at most a unit distance away in the hor izontal or vertical direction. ' ' Thus , the neighborhood is composed of nine ceUs: the ceU itseK, and a layer formed by overlapping 3 x 1 arrays of ceUs immediately to the top , b o t t o m , r ight , and left. Consequently, the transit ion function / is described by a mapping ^ S that associates each possible pattern of neighborhood states to the new state of the center ceU. •*A rectangular tesselation is not necessary for cellular automata that operate on images — several appli-cations (e.g., [Gol69]) are based on a hexagonal array. Indeed, any C A vs^ ith an arbitrary neighborhood N is equivalent in its computing power to one with a von Neumann neighborhood, i.e., one with neighbors to the top, bottom, left, and right (see [PDL+79]). 5.1,3 P r o g r a m m i n g A . Basic Considerations Viewing tlie control layer of a cellular processor as a cellular automaton, its programming reduces to the design of an appropriate transition function and selection of an appropriate in i t ia l configuration of values. There are, however, three important constraints part i cular to its operation. F i r s t , to rule out the necessity for any k ind of higher-level global mechanism, the i n i t i a l value of each C E must be determined entirely by the M E s within its corresponding ceh. This means that the i n i t i a l configuration of values must be in spatial register w i t h the input image, thereby prohibi t ing the use of the special-purpose in i t ia l patterns or auxihary elements often used i n general C A design. Similarly , the f inal configuration also is required to be i n register w i t h the image, since the output is required to be a spatiotopic map. This rules out algorithms that deform the spatial organization found in the input , such as the shr inking process used to count the number of items in an image [Gol69]. F inal ly , the operation of the processor itself must be in-place, i.e., the memory in inactive ceUs cannot be used as scratch space for intermediate calculations. The use of scratch space is a viable option when the in i t ia l configurations are such that known subsets of the gr id can be guaranteed to remain inactive (see, e.g., [Arb87, ch. 7]). However, this condition cannot in general be met when an arbi trary set of input images (and therefore Initial configurations) is possible. The power of a cehular architecture cannot therefore be harnessed in the manner used for many classes of general C A problems, v iz . , by designing an appropriate in i t ia l configuration. Instead, the appropriate information must be stored locally i n each ceh. This can be done by increasing the number of states in S (i.e., increasing the number of states in each control element). Increasing power in this way also ahows the transit ion function to have a more natura l structure, simphfying the design and analysis of the system's behavior [Arb87, ch. 7]. A t the lowest possible level, therefore, the programming of a cehular processor reduces to the selection of a set of states for each C E , together w i t h a transit ion function that operates on these states. B u t to help ensure that the processor respects the constraints described above, it is convenient to program at the shghtly higher level of simple operations on part icular properties accessible by the C E . Once such a set of operations has been specified, any part icular recovery process can then be specified by the appropriate concatenation of operations. This is effectively a general mechanism for the "abstract programming" of paral le l processes, with the resultant program loaded into each of the C E s , where it acts somewhat hke paraUehzed version of a v isual routine [UU84]. B . Elementary Structures and Operations The data structures to be used i n programming the control elements are straightforward: the values of the M E s in the corresponding ceU, the contents of the internal C E memories, and the accessible properties of the adjacent C E s . These are a l l simple scalars, wi th only a smal l set of possible values. A s such, they can be handled in a uniform way. M o r e lat i tude exists in the choice of elementary operations. There is in some sense a " n a t u r a l " set of basic operations — if too few exist, it may not be possible to carry out ah the intra-ceU operations wi th in a single time step; i f too many exist, they merely add to the space required by the C E . The elementary operations chosen here are simple forms of da ta input , output , and transformation: 1. Input of information from M E s to memory elements. Connectivity wi th in a ceh is assumed to be high enough to aUow a C E to estabhsh direct access from any M E to any of its internal memory elements. 2. O u t p u t of information from memory elements to M E s . Connectivity also is assumed high enough to allow a C E to estabhsh backprojections from any of its i n -ternal memory to at least some of its M E s . The interpretation output by the processor takes the form of values of these latter M E s (or equivalently, of the corresponding memory elements that override them). 3. Simple operations on information in memory elements. It is assumed that each C E can add, subtract , mult ip ly , and perform integer division on the contents of the memory elements. It also is assumed that a two numbers can be compared to determine the higher value. Inputs and outputs for these operations are always taken from the memory elements; transfer of contents between memory elements is s imply a special case where no operation is performed. In addit ion, each C E is assumed to have an input from higher levels that provides a simple control on its operation. Depending on the value of this signal, the C E either resets its memories to some default state, begins/continues its operation, or halts its operation. c. Combining Basic Operations The cehular processor is programmed by creating an appropriate transit ion function and set of states from the elementary structures and operations described above. This can be done most simply by concatenating elementary instructions together into a sequence, an operation which corresponds to the composition of the corresponding transit ion functions. B o t h simple and compound operations can be concatenated into new compound operations. Note that the resultant transit ion need not be carried out in a sequence of separate transitions — it can be "fused" into a more complex function that can be carried out in one step. The replacement of a sequence of simple operations by a single transit ion corresponds to the use of a lookup table (cf. section 7). In this sense the process is consistent w i t h the loading-in of a complete object model based on its features in the image (e.g. [PE90]) . However, the approach here involves items of a smaller " local models" composed entirely of locally-definable properties. Note that the issue here here centers around the advantages of a larger sequence of simple transitions as opposed to a smaller sequence of more complex transitions — an instance of the basic time-space tradeoff found in more general models of computing (see, e.g., [Har87]). Operations can similarly be combined v i a the " i f - t h e n " conditional construct, the result simply formed from the two alternative functions. The loop construct of conventional pro-gramming languages also is aUowed, but only if the body of the loop is carried out a hmited number of times. A s used here, the loop is a simple programming convenience, which is "unrohed" in the actual implementation of the transit ion function. A loop controUed by a variable can be translated into several separate unroUed loops, which are then selected v i a condit ional constructs. In a similar fashion, procedures can also be used to help specify the process, but each is to be treated as a macro that is replaced in the actual transit ion function by the set of instructions it contains. A s such, procedures cannot cal l each other recursively. FinaUy, the program given to the ceUular processor does not need an exphcit 'ha l t ' com-m a n d , since it is assumed that the processor is suspended (as weU as started) by an exphcit command from higher levels. 5.2 Algorithm for Rapid Recovery G i v e n the set of operations available to the cehular processor (section 5.1.3), it remains to use these as the basis of an algorithm capable of carrying out the recovery process sketched in section 4.3. A l t h o u g h the constraints on the recovery process and on the ceUular processor are not sufficient to specify a unique algorithm, this is not important for the present purpose, which simply is to show that such an algorithm can exist. The algorithm used here can be summarized as foUows: For each control element: 1. Obta in from the measurement elements the locations of aU Une terminations and the orientations of aU Une segments wi th in the ceU. Terminations include not only true endpoints of the Unes, but also points at which the zone boundaries are crossed. A s required by the specifications of section 4.3.1, orientation measurements are quantized i n units of 10°. 2. Determine the type of junct ion( i f any) that is present, and make exphcit several of its properties, such as the values of the angles involved. 3. Ass ign in i t ia l labels to the hnes according to the rules described in section 4.2.2. 4. For each subsystem of each stream, repeat the foUowing: a. Read the relevant values from any neighboring C E that shares one of the hnes. Update the current values v ia the priority mechanism described in section 4.2.2. b. Read the relevant values from those streams containing a variable w i t h a definite interpretation, and update the current values according to the rules given i n figure 4.2. Since this transmission apphes only to zones at the same location in the image, only the internal memories of the C E are involved. c. A p p l y the intra-hne constraints according to the rules given in figure 4.2.2. These eUminate any local inconsistencies that may have arisen in the new set of values. 5. Stop iteration when the t ime hmit is reached. The final interpretations are determined from the assignment of the 'possible', 'preferred', and ' impossible' labels in each sub-system, according to the rules given in section 4.2.2. T h e foUowing sections describe in greater detail how these operations are carried out by the ceUular processor. 5.2.1 D e t e r m i n a t i o n of B a s i c Image P r o p e r t i e s T h e measurement elements in each ceh describe the image basic properties available to the control element. These include the locations of the hne terminations and the orientations of the hne segments in the area subtended by the ceh. There are a variety of ways this can be done. Here, each M E is assumed to signal the existence of a hne centered at the corresponding locat ion in the image array. Line segments of different orientation, horizontal length, and vert ica l length are represented by different sets of M E s , each signalling the presence or absence of its part icular type of segment by a simple binary output . A s required by the architecture specifications given in section 4.3.1, these elements represent length and position to a high degree of precision, w i th orientation represented only coarsely. These outputs contain a complete (in fact, redundant) description of ah hne segments i n the ceh, and can therefore support the determination of a l l the image properties needed by the control element. Three properties are of particular interest, al l of which are represented v i a a bank of Unite-state memory elements: N u m b e r of lines in the cell: This can be determined from a count of the number of M E outputs that are active. A m a x i m u m of three is assumed (section 4.3.1). T h e endpoints of each segment: These are calculated for each segment from the knowledge of the relevant center point and the horizontal and vertical lengths. N o more than six endpoints need to be stored. T h e orientation of each segment: These can be taken directly from the orientation label of the appropriate M E . N o more than three values need to be stored. 5.2.2 D e t e r m i n a t i o n of J u n c t i o n P r o p e r t i e s The next step is to obtain those properties of the junct ion useful for subsequent interpretation. These only need to be calculated once, their values then stored in an appropriate bank of memory elements. Five part icular sets of properties are used here: junct ion position, junct ion angles, junct ion type, junct ion rectangularity, and an auxihary set of hne descriptions (two for each hne) used for the interpretation of contiguity. A s required by the recovery process, a l l quantities are finite. A . J u n c t i o n Position Junctions are detected simply by finding the intersection point of the hnes in the cell. It is assumed that each ceU is sufficiently small that at most one junct ion (and therefore one intersection point) can exist w i th in the area it subtends. The existence of the intersection point is determined by testing for the identity of the endpoints. For the case of T- junct ions , a shghtly different procedure is used, based on a unique zero distance from an endpoint of one hne segment to a different hne segment. If no intersection point is found, no junct ion exists wi th in the ceU. Note that this is possible even i f the junct ion contains several hnes, since these hnes may be non contacting. If an intersection point is found, its location is stored into an appropriate memory element. B . Junct ion Angles The angle 9ij between each pair of connected hnes Si and Sj is simply the absolute value of the difference of the two orientations. The only real difficulty here is to determine how the hnes are connected - as seen from figure 5.2, each pair of Unes can be combined in two different ways, corresponding to acute and obtuse forms. These can be distinguished v i a the dot product of the two hnes, defined to be (see, e.g., [Tho72]) cos(^ij) = (a» •aj)/\ai\\aj\. The disambiguation of acute and obtuse junctions can be based on the sign of the cosine: positive for acute angles, negative for obtuse. If the difference between two hne orientations actually corresponds to angle 9ij, it wiU therefore have a value between 0° — 90° for pairs w i t h a positive dot product , and between 90° - 180° for a negative dot product. If these conditions do not hold , the angle must be 180° minus this value (figure 5.2). Since only the sign of the dot product is important , division by the magnitudes need not be performed, and so can be readily carried out by the control element. Note also that the dot product is a true scalar quantity (see, e.g., [Tho72]), so that no artifacts are introduced by the selection of any part icular co-ordinate system. A m o n g other things, this takes care of any problems introduced by the discontinuity in orientations at 180°. It also means that orientation can be taken w i t h reference to any co-ordinate system, the only requirement being that the same system is used locally for any junct ion . Figure 5.2: Calculat ion of orientation differences. C . Junct ion T y p e Junctions are classified by a two-stage process. The in i t ia l classification based on their arity, i.e., the number of hnes existing at the common intersection point. This value can be obta ined simply by counting the hnes that have an endpoint identical wi th the intersection point . Junctions are then marked as foUows: N o junction: N o intersection point exists T- junct ion : One endpoint contacts the intersection point L-junction: T w o endpoints contact the intersection point A r r o w - , Y - j u n c t i o n : Three endpoints contact the intersection point Further disambiguation can be based on the values of the angles between the hnes: L-junction (acute): angle is less than 90°) L-junction (obtuse): angle is greater than 90°) Arrow- junct ion : angles sum to less than 360° Y - j u n c t i o n : angles sum to exactly 360° Comphcations can arise when junct ion angles are nearly orthogonal, since the uncertainty in the sign of the dot product makes it difficult to discriminate acute L- junctions from obtuse ones in a rehable way. It also becomes difficult to distinguish arrow- from Y- junct ions if two such angles are present in a junct ion (i.e., one of the hne pairs is almost coUinear wi th an-other) . Various techniques can lend robustness to the recovery process under these conditions (cf. section 3.3.3), but in the interests of simphcity, only a few are used here (section 4.3.2). In part icular , L- junctions w i t h angles determined to be 90° (based on the quantized estimates in memory) are treated as a separate type of L- junct ion that has the constraints common to both obtuse and acute L- junctions. A s for the other kinds of L-junctions, right-angled L- junctions can be determined by a simple test of junct ion angles. D . Junct ion Rectangularity A n important basis for the recovery of slant magnitude is the assumption that the junct ion corresponds to a rectangular corner in the scene, wi th slant magnitudes assigned only to those edges belonging to a junct ion obeying Perkins ' laws (section 4.3.2). Consequently, it is important to indicate whether or not a junct ion can correspond to such a corner. This can be done v i a a simple test based on the angles and type of the junct ion: L- junct ion : no T- junct ion : no A r r o w - j u n c t i o n : one angle > 90° and two angles < 90° Y - j u n c t i o n : three angles > 90° Rectangulari ty is flagged simply by assigning a zero value to aU angles i n junctions that do not pass this test. In the interests of robustness, this procedure must be extended to handle right angles as weU. Note that since two angles of 90° in a junct ion form a T- junct ion, only one right angle is allowed in any arrow- or Y - j u n c t i o n , aUowing the extension to be done in a simple way: A r r o w - j u n c t i o n : one angle >= 90° and and two angles <= 90°. Y - j u n c t i o n : three angles >= 90°. E . Contiguity Lines In contrast to the other interpreted properties, the contiguity of Unes wi th their flanking regions requires the assignment of two values per hne, one for each side (section 3.2.1). Cont igui ty is therefore represented here by a pair of contiguity hnes ("c-hnes") obtained by Contiguity lines on both sides of each junction line Inner contiguity lines in same direction as adjacent junction line Figure 5.3: Determination of contiguity relations. offsetting the original "parent" hne a few pixels on either side (figure 5.3). The value of each c-hne indicates whether its corresponding region is contiguous with the parent hne in the junct ion . In a l l respects, c-hnes are treated as regular junct ion hnes, taking on the states of 'possible' , 'preferred', and ' impossible' . Because several constraints apply to c-hnes that share a common region, it is useful to have a record of which c-hnes are connected to each other inside the junct ion. Since almost al l connected c-hnes are the inside edges of adjacent hnes (cf. section 3.2.1), the test for connectedness reduces almost completely to a search for these inside edges.^ The problem, then, is to determine which c-hnes are on the " ins ide" , i.e., which c-hne faces the junct ion hne opposite its parent (figure 5.3). A simple way to solve this problem is based on the cross product, which for hnes a and b is defined as the determinant (see, e.g., [Tho72]) where the ej are unit vectors in the x, y, and z directions. In the case of two dimensions, the cross product describes the area swept out by the two vectors, its sign depending on the sense of the rotat ion required to ahgn a w i th b. Consider first one of the junct ion hnes. The cross product of this hne w i t h an adjacent junct ion hne can be readily determined by the control element, the sign of this quantity describing the sense (either clockwise or counterclockwise) in which this hne must be rotated ^The only exception is for the outer edges of the arrow-j unction, and these can be handled straightforwardly. a X 6 = ex ey ez aa: ay a^ = |a||6| sin(6'a6)ez, bx by bz to line up wi th the adjacent hne. Consider now the associated c-hnes, together w i t h the hnes formed by jo ining their outer points to the junction intersection. Only the c-hne on the inside edge (i.e., the c-hne facing the adjacent junction hne) can give rise to a hne i n the same rotat ional direction as that of the adjacent junct ion hne (figure 5.3). Repeating the same procedure for the c-hnes of the adjacent hne then yields the pair of inside edges. The cross product is a quantity independent of the co-ordinate system (see, e.g., [Tho72]). It therefore allows processing to be unaffected by the particular co-ordinates used in repre-senting the orientation of the hnes. 5.2.3 I n i t i a l A s s i g n m e n t of Interpretat ions Once the basic image and junct ion properties have been determined, the next step is to assign the i n i t i a l interpretations to the variables in each of the four streams. The states of these variables aie held in eight separate banks of memory elements, one element per subsystem. Only three possible values can be attached to any complementary variable, and only five are possible for slant magnitude (section 4.3.1). Consequently, these memory elements only need to take on a few possible states. The assignment operation Itself is a straightforward procedure that sets the values of the relevant memory elements, the particular choice of values depending only on the junct ion type (section 4.3). Th is can be carried out by using a set of conditional statements. Together w i t h the values describing the structure of the junctions, the resulting set of C E memory states provides the in i t ia l configuration for the iterative part of the interpretation process. 5.2.4 P r o p a g a t i o n of Interpretat ions G i v e n an array of in i t i a l interpretations, it remains to transmit these values to neighboring locations and streams. A s discussed in section 4.2.2, this is done by an iterative process that at each i terat ion replaces values of low priority w i th values of higher priority, i.e., 'preferred' replacing 'possible' , and ' impossible ' replacing 'possible' and 'preferred'. The constraints at each junct ion guide the local transmission of these values, resulting in "waves" of higher-pr ior i ty states c irculating around the hnes in the image. The propagation of these waves continues u n t i l an equihbrium state is reached or unt i l the process is t imed out. variables sharing constraints w i t h the "target" variable are accessed, and if any of these has a value of higher pr ior i ty than the value of the target variable, the memory element is set to this value. This can be done even for the case of state-dependent constraints (section 4.2.2, since only an addit ional conditional construct is required to put the appropriate constraints into effect. Information access occurs v i a four different avenues: neighboring cells, neighboring streams, intra- junct ion constraints, and intra-hne constraints (section 4.3). Since transmis-sion is based on a simple prior i ty mechanism, the order of the access operations w i th in each stream is unimportant . This allows the propagation process to be carried out by a relatively simple set of operations. A . Updates from Neighboring Cells The updat ing of values from sources outside the ceh can be done concurrently for each hne segment. To access the appropriate values from neighboring locations, first determine which neighbors contain a continuation of the relevant hne segment. This is done by reading the set of endpoint locations stored in each neighboring control element and testing for equality against the endpoints of the local hne segment. Since a hne crossing a ceU boundary is divided into two segments that terminate at the same point (i.e., the boundary) , this test succeeds i f and only i f the segment continues into the neighboring ceh. For each ceU containing a continuation of the segment, access the relevant set of memory elements and compare their values against those of the current ceU. Since continuations are required to have the same values, updating foUows the rules for bijective constraints described in section 4.2.2. B . Updates from Neighboring Streams Just as information is t ransmitted from neighboring locations, it also is t ransmitted from neighboring streams. The only difference between the two situations is that whereas inter-ceU transmission is based simply on priority, inter-stream transmission usually has an addit ional dependence on the part icular type of junction and on the particular Une in that junct ion (section 4.2.2). This dependence is fixed for each junct ion type, w i th updat ing carried out by a set of condit ional assignments between the appropriate variables. Since this updat ing is based on simple priority, the order in which streams are evaluated is unimportant . c. Updates from Intra-junction Constraints After assigning new values to the hnes based on sources external to the junct ion, the next step is to impose the set of local constraints on the hnes of the junction itself. Updat ing follows the rules described in section 4.2.2, wi th the particular constraints depending on junc t i on type. Consequently, it can be carried out by a set of conditional constructs. Since the f inal result depends only on pr ior i ty of the values involved, the order of evaluation of the hnes is un important , and can even be done in parallel . D . Updates from Complementary Subsystems A f inal path of Information transmission originates in the constraint between the values i n complementary subsystems: i f any value in a subsystem has been set to ' impossible ' , the value of its complement is upgraded to 'possible' (see section 4.2.2). Since this constraint holds for aU hnes at al l times, it can be carried out v ia a conditional assignment incorporated into the assignment mechanisms used in the other access paths. 5.2.5 F i n a l A s s i g n m e n t of R e s u l t s A f t e r the propagation of values has been halted, a final "postprocessing" phase can be used to transform the states of the sets of complementary variables into a more " s tandard" repre-sentation that expresses the two definite interpretations, the inconsistent interpretation, and the ambiguous interpretation. The rules of this transformation are given in section 4.2.2. T h e transformation itself can be carried out straightforwardly on the relevant memory elements since only a simple remapping of values is involved. 5.3 Neural Implementation T h e final requirement of a computat ional analysis is that it demonstrate the existence of a physical system capable of carrying out the process — in part icular , one compatible w i th the neural mechanisms beheved to underhe human vision [Mar82]. But the detailed knowledge about the neurophysiology of vision is hmited mostly to processes that measure simple image properties such as contrast and orientation (see, e.g., [Bis84, Sch86]). A n detailed analysis of the neural implementation of rap id recovery is therefore not possible at the present time. However, the ceUular processor developed in section 5.1 is largely compatible w i t h what is known about the anatomy and physiology of primate visual systems. The main stream beheved to be involved i n form vision begins wi th ret inal ceUs that measure simple properties such as the contrasts and motions of luminance gradients i n the image. The outputs of these ceUs extend to the lateral geniculate nucleus, and the geniculate ceUs in t u r n extend to the v isual cortex, w i t h a spatiotopic order maintained at aU points along the way (e.g., [Bis84]). The v isual cortex serves as the location where the outputs of this stream are brought together w i t h those of the other streams. It contains ceUs sensitive to a variety of simple properties, including contrast, color, and hne orientation (e.g., [dYvE88]) . These form a dense spatiotopic map , w i th each point in the array containing a description of the various image properties at the corresponding point in the input image. A s such, this map is an array similar i n many respects to the measurement layer of the ceUular processor. The spatiotopic ordering of ceUs in the primate visual cortex is not quite point-to-point . Rather , it is "patch-to -patch" , w i th each set of ganghon ceUs in a ret inal patch projecting to a separate module (or "hypercolumn") in the visual cortex [HW74, H u b S l , Bis84]. Each hypercolumn is a vertical section of the cortex wi th an area of approximately 1mm x 1mm; the pr imate visual cortex is thought to have about 2000 such columns, each containing at least several thousand ceUs [HubSl ] . The corresponding patch of the visual field increases wi th eccentricity from the fovea, but around the fovea itself it has dimensions of about 10' arc (i.e., 1/6°) [HubSl ] . A U the measurements made over each patch are brought together i n the corresponding hypercolumn, aUowing it to completely analyze its section of the visual field. A s imilarity w i t h the control layer of the ceUular processor is evident. Th is similarity is reinforced by the observation that most connections between ceUs are vertical ones wi th in the column itseff, lateral connections to other areas being much sparser and shorter, often wi th lengths of only 1-2 m m (i.e., extending only to nearest neighbors) [HubSl ] . H hypercolumns can be identified wi th the control elements of the ceUular processor, it would imply that hypercolumn operation is more sophisticated than geheraUy beUeved. B u t such sophistication would not be implausible given the number of ceUs in each hypercolumn and the density of their internal connections. In this context it is important to note that hypercolumn organization is extremely common, being found in most parts of the cortex in virtuaUy a l l mammaUan species [GJM88] . Thus , it is not absolutely essential that rapid recovery is carried out in the hypercolumns of the visual cortex — the hypercolumns of the extra-striate visual areas (see, e.g., [MNS7]) could also be used for this purpose. Chapter 6 Tests of the Theory The final stage of the analysis is to test the theory on actual hne drawings of po lyhedral objects. T w o sets of issues are of interest here. The first is how weU the recovery process handles various kinds of hne drawings. The process is tested on drawings of objects that violate the underlying assumptions about scene structure, and on drawings that cannot be given a consistent global interpretat ion. It is shown that a substantial amount of three-dimensional structure can be recovered under a wide range of conditions. The second set of issues concerns the abihty of the theory to explain the recovery of three-dimensional structure at the preattentive level of human vision. It is shown that the theory can explain — at least in broad outhne — how early visual processing can recover three-dimensional orientation f rom some kinds of hne drawings, and why it cannot do so for others. 6.1 Performance on Line Drawings To examine the power and the hmitations of the recovery process, it is tested on a range of hne drawings, including those in which ah underlying assumptions are obeyed as weh as those in which various assumptions are violated. A l though the resulting interpretations are not perfect indicators of the overaU effectiveness of the process, they do provide an idea of the relative ease or difhculty of Interpreting the various kinds of hne drawings. Since the speed of the process is determined primari ly by the speed of information trans-miss ion, the absolute size of the hne drawing has virtuaUy no influence apart from a rescahng of the t ime course (section 2.5.2).^ The effects of size are therefore ehminated by rescaling ah drawings so that their average hne length is the same. For the drawings considered here, the average hne length is set to 5 ceh widths. Transmission speed can be similarly factored out by measuring time in terms of the number of transitions between adjacent ceUs, or equivalently, by the number of iterations. This value is essentially a free parameter, which can have different values when recovery is used in different situations or for different purposes. However, in order to obtain an indication of the relative difficulty of recovery for various kinds of hne drawings, it is useful to base comparison on one particular t ime hmit . A s a representative value, the number of transitions is such that information is propagated along a distance of twice the average hne length. This allows enough time (on average) for the estimates from each junct ion to be transmitted to their nearest neighbors, and for any updated values to be transmitted back. Since the average length Is 5 ceh widths, 10 transitions are used. 6,1.1 R e c t a n g u l a r O b j e c t s W h e n scenes contain only rectangular objects, al l assumptions about the structural con-straints (section 3.1.1) are true, giving the process the best chance to obtain a globaUy con-sistent interpretation of a l l scene-based properties. The corresponding hne drawings therefore test the abihty of the process to obtain such interpretations under Ideal conditions. i) Convex objects The objects most amenable to rapid recovery are simple convex rectangular blocks (figure 6.1), since these not only obey ah structural assumptions, but also obey the principle of m a x i m u m convexity that is used to select the in i t ia l set of interpretations (section 4.2.2). A s figure 6.1 shows, almost ah the three-dimensional structure has been recovered, wi th un -ambiguous preferred values assigned to al l the Unes in al l four streams, and w i t h almost a l l the alternatives ruled out as impossible. A remnant of uncertainty remains In the center of the drawing, where the alternative con-vexities and slant signs are not yet completely ruled out. The propagation of the ' impossible ' ^Performance does change as the size of the entire object approaches the dimensions of a zone, since the assumption of no more than three Unes per cell (section 4.3.1) can no longer be held. However, drawings here are assumed to be large enough that this is of no concern. values from neighboring ceUs does, however, provide these areas wi th a definite interpretation after a few more iterations. Th is iUustrates a common feature of the process — ambiguity is typicaUy eUminated by proceeding from the outside of the drawing to the inside. Th is is largely due to the low ambiguity of the L-junctions, which are most often found on the outside border of the drawing. The other area of uncertainty is the assignment of contiguity to the outer edges of the drawing. This is due to the inherent ambiguity of the Une drawing itself, which can be interpreted as a block attached to various surfaces (floor, waU, ceiUng) or as a block without any attachments at a l l . The recovery process has no means for preferring one over the other, and so the interpretation of these values remains ambiguous. ii) Nonconvex objects Nonconvex rectangular objects obey aU structural constraints, but contain nonconvex corners that are initiaUy assumed to be convex (section 4.2.2). A s seen from figure 6.2, the i n i t i a l assignment of an incorrect set of values to the nonconvex junct ion does not seriously affect the final interpretation. Cont igui ty is assigned unambiguously and correctly to almost a l l surfaces, w i t h the exception of the outer edges, which — as for the case of the convex block — cannot be given an unambiguous interpretation. Note that the preference for contiguity of the edges of Y- junct ions (section 4.2.2) has caused the lower edge to be given a 'preferred' value, although the opposite interpretation has not been definitely ruled out. The other streams similarly contain edges that either have a definite interpretation or involve preferred interpretations. Ambiguous convexity and slant sign interpretations exist on the edges of the concave Y -junct ion i n the center of the drawing. This is due to its in i t ia l preference as a convex junct ion and to the subsequent assignment of preferred complementary values based on values from its neighbors. The corresponding ambiguity in these neighbors (i.e., the convex Y- junctions) is removed v i a the certainty in the L- junct ion interpretations. This again iUustrates that many of the unambiguous interpretations are first formed on the outside of the drawing and then propagated inward. Because slant magnitude does not depend on the convex/con cave distinction, it is unam-biguously assigned to aU hnes, Umited only by the transmission distance. Magnitude • • • •Hmpossible wftW-Possible Preferred Narrow gray lines mark cell boundaries r...:...\«m. ..Mn. :...:..»]».:.«1.. Convexity (+) Nonconvexity (o) Slant Sign (1) Slant Sign (2) Slant Magnitude (Value) Slant Magnitude (Confidence) •Qc Magnitude iiiii Impossible «UfcK Possible Narrow gray lines mark cell boundaries I Preferred iii) Occluded objects W h e n several objects exist in the scene, projection to the image plane often results in the par t ia l occlusion of one object by another. Information from the occluded junctions is lost , a loss which is only part iahy compensated for by the constraints from the T- junctions. A s figure 6.3 shows, however, the recovery process is fairly robust against the effects of occlusion. Assignments of contiguity are as good as those for indiv idual blocks; indeed, they are somewhat less ambiguous, since the T-junctions have added extra information to the crossbars. Convexity and slant are almost unimpaired, w i th only a shght increase in the area of uncertainty around the central Y- junct ions . The only significant loss of information occurs in the hne connected to an occluded arrow-junct ion on one end, and to an L- junct ion on the other. The L- junct ion can provide an assignment of slant sign, but cannot transmit slant magnitudes. Consequently, the hne must remain uninterpretable wi th in this stream. 6.1.2 N o n c o n f o r m i n g O b j e c t s Another test of rap id recovery concerns its abihty to interpret hne drawings of "nonconform-i n g " objects, i .e., those that do not conform to a l l the structural assumptions that underhe the recovery process. The abihty of the process to recover various scene properties under such conditions provides an indication of its robustness in domains beyond those for which it is op t ima l (cf. section 2.3). i) Nonrectangular objects Given the importance of rectangularity for the in i t ia l assignment of slant magnitudes (section 3.2.4) and the constraints on convexity (section 3.2.2) and slant signs (section 3.2.3), it is important to determine how recovery is affected when these assumptions are no longer true of the scene. F r o m figure 6.4, it is seen that the process can stiU recover a fair amount of structure. The inner edges of a l l hnes are interpreted unambiguously as contiguous. T h e outer edges of the drawing remain largely uninterpreted. W h e n more iterations are allowed the contiguity interpretation assigned to the acute L- junct ion spreads around the outside of the drawing. Most of the trihnear junctions have been given unambiguous interpretations in the con-vexity and slant sign dimensions. A l though a contradiction in slant magnitude has been Convexity (+) Nonconvexity (o) Slant Sign (1) Slant Sign (2) Slant Magnitude (Value) Slant Magnitude (Confidence) Narrow gray lines mark cell boundaries found for one of the edges, and cannot be assigned to two others (since the junctions violate Perkins ' laws), the remaining four edges have been assigned definite values. ii) Origami objects Another class of objects that do not conform to the structural assumptions are the origami objects [KanSO], formed by jo ining extremely thin polygonal plates to each other along their edges. A l t h o u g h they are similar to polyhedra in having planar surfaces, origami objects are never sohd, and so their projections cannot be interpreted as sohd polyhedra. A n example of such a drawing is the chevron shown in figure 6.5. A s seen from figure 6.5, the interpretation process is fairly robust to the violation of this assumption. Most of the outer edges are interpreted as contiguous, an interpretation at odds wi th that given to the convex block. But three of the four inner edges of the rectangles are stiU interpreted unambiguously as being contiguous. The results in the other three streams are largely unaffected by the violation of this assumption, w i th the interpretations matching almost exactly wi th those of the sohd convex block. iii) Nonplanar objects M u c h of the power of a hne interpretation process stems from a basic assumption that the surfaces of the corresponding object are planar (see section 2.2.1). The drawing in figure figure 6.6 violates this basic assumption, the upper surface being uninterpretable as a plane. The local nature of the rap id recovery process, however, allows much of the structure of nonplanar objects to be recovered, since global consistency is not enforced. This is iUustrated in the interpretations shown i n figure 6.6. Contiguity is assigned correctly almost everywhere, w i t h inconsistent interpretations assigned only to the inner edges of the notch in the upper surface. Similar considerations apply to convexity and slant sign. Furthermore, slant magni -tudes are unambiguously assigned to a l l hnes, a result due to the absence of a check on slant magnitude at L-junctions (section 4.1.3). Slant Sign (1) Slant Sign (2) Slant Magnitude (Value) Slant Magnitude (Confidence) nQ^Magnitude iiiii Impossible «ww-Possible Preferred Narrow gray lines mark cell boundaries Slant Magnitude (Value) Slant Magnitude (Confidence) ZTQ^ Magnitude iiiii Impossible *WWB: Possible Preferred Narrow gray lines mark cell boundaries Convexity (+) Nonconvexity (o) Slant Sign (1) Slant Sign (2) Slant Magnitude (Value) Slant Magnitude (Confidence) : Q c Magnitude iiiii Impossible *isîi?. Possible Preferred Narrow gray lines mark cell boundaries 6.1.3 I m p o s s i b l e O b j e c t s Objects are said to be " impossible" i f they cannot exist under the assumption that connecting hnes in the image correspond to connecting edges in the scene. If accidental ahgnments are allowed, connecting image hnes can correspond to disconnected edges in the scene, so that a corresponding object can be found for any hne drawing [Kul87]. But the conditions required for this are extremely unstable, v io lat ing the general viewpoint constraint (section 3.2), so that such interpretations are not generaUy aUowed. Instead, the drawing is interpreted as an impossible figure containing a set of globaUy inconsistent interpretations. A s a f inal test of its abihties, the rap id recovery process is apphed to drawings of these impossible objects. To keep the influence of other factors to a m i n i m u m , al l junctions are such that they can be consistently interpreted as rectangular corners. The apphcation of the recovery process to these drawings consequently provides a good test of how weh it can handle global inconsistency. i) Objects of inconsistent contiguity and convexity The first class of impossible objects are those that correspond to drawings that cannot be given a consistent set of contiguity and convexity labeUings. The example considered here is shown i n figure 6.7. Such drawings violate the basic assumption that a surface contiguous w i t h a given edge remains contiguous throughout its entire length; among other things, this ehminates the distinction between object and background [Kul87]. In addit ion, several of the hnes cannot be given a consistent convexity interpretation along their length, providing a second source of inconsistency. Because the interpretation process involves only local sections of the drawing, however, it is relatively robust to such inconsistencies. This is ihustrated in figure 6.7. Here, almost aU hnes are given an unambiguous contiguity interpretation that is correct locaUy. The only exceptions i n this stream are two horizontal hnes that have been interpreted as inconsistent. Inconsistencies in convexity and slant sign are also picked up, but these are restricted entirely to the inner hnes, the outer sections having a completely unambiguous interpretat ion. Slant magnitude is completely unaffected by the inconsistencies in contiguity and convexity, w i th unambiguous interpretations assigned to virtuaUy aU Unes. Convexity C+) !• \ - i « # (I'l--Slant Sign (1) Slant Magnitude (Value) ••••-T:;:.::::.:;:!!JJ.-,.;v/..... >^Mai*K B B B • • B B B ' ' Nonconvexity (o) E E L Slant Sign (2) Slant Magnitude (Confidence) llQn Magnitude i i i iMmposs ib le «ww: Possible Narrow gray lines mark cell boundaries I Preferred ii) Objects of inconsistent slant Another class of impossible objects give rise to drawings in which the hnes cannot be given a consistent set of slant estimates. A n example is shown in figure 6.8. Such inconsistency negates the basis for the propagation of slant estimates along common edges. The results of the recovery process are shown in figure 6.8. A s seen from this figure, m u c h of the (local) three-dimensional structure is stiU recovered. Contiguity is assigned correctly to al l hnes, the only uncertainty existing in the outer edges. Convexity also is largely unaffected, although inconsistencies have begun to appear in the Y- junct ions . These inconsistencies are more severe i n the slant sign stream, although the arrow-junctions and L-junctions re ta in unambiguous interpretations. Because the estimation of slant magnitudes is independent of slant sign, unambiguous magnitude estimates are assigned to a l l the hnes. iii) Objects of inconsistent depth P a r t of the reason for the speed of the rapid recovery process is that it avoids global checks of the resulting description, using the consistency of the world itself as the basis for coherent interpretations. One important example of this is the complete lack of any check on depth information (section 3.1.1). This renders the process susceptible to a number of " ihusions" on drawings for which the corresponding surfaces have globally inconsistent depths. A n example of such a drawing is the Penrose triangle, shown in figure 6.9. A s seen from this figure, al l four streams result in interpretations that are largely u n a m -biguous for a l l hnes. The only exceptions are uncertain contiguity estimates for the outer hnes of the drawing, and uncertain slant estimates for the innermost hnes. B o t h of these are to be expected, since the uncertainty i n outer contiguity occurs for almost a l l drawings, and the uncertainty in slant estimates is a consequence of the inner hnes contacting only L - and T- junct ions , neither of which can give rise to a magnitude estimate. V i r t u a l l y aU local struc-ture is therefore recovered, w i th no inconsistencies being detected. The interpretation is an iUusion of exactly the type expected, w i t h v irtual ly aU edges assigned definite interpretations even though the corresponding object cannot be reahzed. Convexity C+) * : ; ; ; : Jt I ;* • I • Slant Sign (1) Slant Magnitude (Value) Slant Magnitude (Confidence) • Q c Magnitude i i i i : Impossible 4sww Possible Narrow gray lines mark cell boundaries I Preferred Convexity (+) Nonconvexity Co) Slant Sign (1) Slant Sign (2) Slant Magnitude (Value) Slant Magnitude (Confidence) • Q c Magnitude i i i i : Impossible «was Possible Preferred Narrow gray lines mark cell boundaries 6.2 Preattentive Recovery of Scene Structure The ult imate goal of the theory developed here is to explain the rapid recovery of three-dimensional structure in human early vision. In part icular , the goal is to explain why certain kinds of hne drawings can be rapidly detected in visual search tasks, and why others cannot. Figure 6.10 shows the set of results considered. The search items, together wi th the search rates, are taken from [ER91] and [ER92]. In ah. cases, two search rates are presented - those for displays in which the target is present, and those for which it is absent. The recovery ra t i o p is the measure developed in section 6.2.1 to explain these rates. A l though not exhaustive, this set is representative of what is known about search rates for various kinds of hne drawings. B y making several relatively simple assumptions about the relation of recovered structure to search rates, the theory is able to explain the relative difficulty of search for a l l cases examined. Because these assumptions are fairly general, they also aUow predictions to be made for drawings not yet tested. 6.2.1 B a s i c A s s u m p t i o n s T i m e and Space Parameters To carry out the analysis, it is necessary to specify both the size of the drawings and the amount of t ime to be allocated. In what foUows, drawings are scaled to have the same m a x i m u m extension. This is done so that the relative sizes match those of the drawings used for the experiments described in [ER91] and [ER92]. The extent of the drawings is taken to be 5 ceUs. If ceUs are related to hypercolumns (section 5.3), this wiU correspond closely to the ac tual number of hypercolumns involved. The time hmit is set at 5 iterations — enough for a one-time propagation of information across the m a x i m u m extent of the drawing. This is only meant to be a representative value, useful as the basis for a comparison of the difficulty of interpretation for various kinds of drawings. Relat ing structure to search rates Since the goal of this work is to explain the relative preference for certain kinds of hne drawings over others, and not the phenomenon of rapid detection per se, no commitment is Search Items Rates (ms/item) Condition larget Distractor Present Absent ji 12 oo B. 51 96 0.4 C. D. E. iCi S l j 22 35 37 52 31 65 66 80 oo 0.0 1.2 0.0 1.1 H. \ \ i \ r i 63 101 0.0 Figure 6.10: Results explained by theory. The search items and search rates are taken from [ER91] and [ER92] . The recovery ratio p, discussed in section 6.2.1, describes the difference i n the recovered three-dimensional structure of the target and distractor items. The correlation between p and search rate is evident. made here to any part icular model of v isual attention or visual search. Instead, a set of four relatively general assumptions is used to relate recovered structure to search rates: 1. Search rates increase with greater target-distractor distinctiveness. This assumes that search rates are largely governed by a signal-to-noise ratio that com-pares the relative number of distinctive features in the target to the number of features it shares w i t h the distractors. Th is is a widely-accepted assumption used to explain search rates for many kinds of v isual stimuh (e.g., [TG88, DH89]) . 2. Target-distractor distinctiveness is based on differences in slant. It is assumed that the slant sign and slant magnitude of each hne in the interpreted drawing are combined into a single quantity that acts as an irreducible feature, capable of being detected almost immediately when sufficiently distinct (section 2.1.2). This as-sumption is supported by the finding that the speed of search for hne drawings can be better explained in terms of three-dimensional rather than two-dimensional orientation [ER90b] . 3. C o m m o n uninterpretable lines increase target-distractor similarity. Uninterpretable hnes are assumed to be part of the "noise" that interferes wi th the process of distinguishing target from distractor in visual search tasks. Such interfer-ence could exist for a variety of reasons. If, for example, the rapid-recovery system acted only to eUminate impossible interpretations, Unes without a definite slant es-t imate would be assigned a l l possible values. This set of values would therefore be common to bo th target and distractor. 4. Slant is represented as a departure from zero. This takes slant to be a quantity hke two-dimensional orientation, which is represented as a departure from the canonical orientations of vertical or horizontal [TG88]. Here, the canonical value is assumed to be zero, i .e., a three-dimensional orientation perpendicular to the hne of sight. To obtain a quantitative measure of target-distractor similarity, addit ional assumptions are needed to refine the original set: 1'. Search rates increase with the recovery ratio p. The quantity p is defined here as the rat io of the target-distractor difference over the target-distractor similarity. A l -though this is a considerable simphfication that among other things completely ignores configurational effects among two-dimensional features, it nevertheless provides a rough quantitative measure that captures something of the trade-off between distinctiveness and similarity. 2'. Differences are based on unambiguous slant estimates. Unambiguous estimates are those for which a slant magnitude has been assigned to the hne (sec-tion) and for which one of the slant signs is preferred. In hght of assumption 4, only differences in nonzero slants contribute to tbe distinctiveness measure — since target and distractor always differ by a 180° rotat ion in the image, the slants of corresponding hnes differ in their sign. Consequently, the slant difference is always twice the value of the slant itseff. Since the exact location of a feature is not important in visual search (section 2.1.2), ambiguity may also arise i f two hnes of the same orientation have different slants. 3'. Similarities are based on ambiguous slant estimates. If an ambiguous inter-pretat ion exists for the slant sign, or i f a slant magnitude is not possible, the correspond-ing hne segment is considered to add to similarity in the same way as uninterpreted hnes. 4'. Slant signals are proportional to line length along each orientation. In effect, each smal l section of hne is assumed to signal the value of the slant at its locat ion, and to pass this value on to the mechanisms governing visual search. Since the location of a feature is not important for this purpose (cf. section 2.1.2), ah signals from a common orientation can simply be summed together. The tota l signal is therefore proport ional to the cumulative length along a particular direction. In order to avoid specifying different weights for different slants, each nonzero slant and slant difference are assigned the same value. In summary, then, search rates are assumed to increase wi th the recovery ratio p, defined as Yle Y^i {segment agj has unambiguous nonzero slant) ^ Y^e I2j {segment agj has ambiguous slant) ' where agj denotes a hne segment of orientation 6 in the image. Because of the asymmetry between upward and downward slants [ER90b] (also see fig 1.1), this rat io is taken to apply only to cases where the object corresponding to the target is slanted upward. A g a i n , it should be emphasized that the theory developed here is not addressed towards explaining such an asymmetry, but rather is only intended to explain the relative difficulty of search. 6.2.2 E x p l a n a t i o n of P s y c h o p h y s i c a l R e s u l t s Context Effects The first test of the theory is to see if it can explain why different contexts influence the detectabihty of a Y - junc t i on among a set of similar junctions rotated by 180°. The de-tectabihty of this junct ion is greatly affected by the presence and shape of the surrounding outhne, as shown in figure 6.10, taken from [ER91]. W h e n Y- junct ions are surrounded by a Slant Sign (1) Slant Sign (2) •OcMagnitude i i i i i Impossible wsfw Possible I M M Preferred Narrow gray lines mark cell boundaries Figure 6.11: Slant estimates for Condit ion A . Slant angle (in degrees) obtained by mult ip ly ing slant magnitude number by 20. quasi-hexagonal frame (Condit ion A ) they are detected quite rapidly (7 m s / i t e m for target present; 12 m s / i t e m for target absent). But a square surround (Condit ion B ) causes search to slow down considerably (51 m s / i t e m for target present; 96 m s / i t e m for target absent). A comparison of the interpretations for Condit ion A (figure 6.11) and for Condit ion B (fig-ure 6.12) shows that this effect is readily explained in terms of the recovered three-dimensional structure. The interpretation of Condit ion A contains no ambiguity in regards to slant, wi th a considerable difference between target and distractor. Since there are no nonzero slants in common, the recovery rat io p is infinite, accounting for the fast search that occurs for this condition. Slant Sign (1) Slant Sign C2) Slant Magnitude (Value) Slant Magnitude (Confidence) n Q z Magnitude lii»^ Impossible •»}«»; Possible Preferred Narrow gray lines mark cell boundaries Figure 6.12: Slant estimates for Condi t ion B . Slant angle (in degrees) obtained by mult ip ly ing slant magnitude number by 20. A l t h o u g h the long stem of the Y - junc t i on is assigned a unique slant, this is only one-third the "s ignal " obtained from the drawing of Condit ion A . Furthermore, a considerable amount of uninterpretable structure exists. The recovery ratio therefore has a relatively low value {p = 0.4), which explains the much lower search rate. Contiguity To determine whether the slow search found in Condit ion B is due to the failure of the recovery process or s imply due to the presence of T- junctions, consider the drawings of Condit ion C and Cond i t i on D , taken from [ER92]. A s seen from the figure, search for the target i n Condi t ion C is fast, w i th about the same speed as for that of Condit ion A . Consider now the drawings of Cond i t i on D . Since targets composed of two items can be easily detected in a background of single items [TG88], the target should be easy to detect i f the distractor is not segmented into two groups. The target also differs in overah shape in the image, which can only help to speed search. B u t the search rates (22 m s / i t e m for target present; 31 m s / i t e m for target absent) clearly show that search is relatively difficult. The interpretation of the drawing in Condit ion C is shown in figure 6.13. The T-junctions have part i t ioned the drawing into two groups, each of these being interpreted as a complete block w i t h unambiguous slants assigned to al l hnes. The high recovery ratio is therefore high (p = oo), explaining the high speed of search. The distractors in Condi t i on D , being identical to those of Condi t ion C , have hkewise been interpreted as a pair of blocks. Since al l nonzero slants match those of the separate blocks in the target i tem, however, no slant differences exist, and so p is zero. Target and distractor differ only in the relative location of their parts, and since relative location cannot be determined at early levels (e.g., [Jul84a, Tre88], search is to be expected to be relatively slow. A l t h o u g h search is faster than in Condit ion B , this can easily be attr ibuted to some weak effect resulting from the overall difference in two-dimensional shape. Rectangularity Condit ions E and F (taken from [ER91]) provide a direct test of the rectangularity constraint. These drawings have been distorted so as to violate the assumption of rectangularity in two different ways. In Condit ion E , the internal Y - junc t i on has been altered so that the system of junctions cannot be consistently interpreted as rectangular; indeed, the top surface is no Slant Sign (1) Slant Sign (2) 1 Slant Magnitude (Value) ® 55? m Slant Magnitude (Confidence) oQ^ Magnitude i iH! Impossible WBWW Possible Narrow gray lines mark cell boundaries 1 Preferred Figure 6.13: Slant estimates for Condit ion C . Slant angle (in degrees) obtained by mult ip ly ing slant magnitude number by 20. •Qcf^^Snitude i i in Impossible Possible Preferred Narrow gray lines mark cell boundaries Figure 6.14: Slant estimates for Condi t i on E . Slant angle (in degrees) obtained by mult ip ly ing slant magnitude number by 20. longer even planar. Tbis condition leads to slow search. To control for the possibihty that paraUehsm rather than rectangularity is the key property, Condit ion F uses a cube stretched vertically, so that parahel hnes remain parallel while both the Y - junc t i on and arrow-junction now violate Perkins ' laws. Search is again slowed. The interpretation of the drawing in Condit ion E is shown in figure 6.14. Here, the distortions of the junctions have created conflicts in both slant sign and slant magnitude along several hnes. The low value of the recovery ratio (p = 1.2) then explains the slow search speeds found. The results of Condi t ion F are also easily explained - since the junctions violate Perkins ' laws, an in i t ia l assignment of slant magnitude is not even attempted. Consequently, p is zero and search is slow. Connectedness To test the possibihty that rap id recovery is based on the direct lookup of complete objects rather than v i a the interaction of more local structures (cf. section 2.3), search rates were determined for the drawings of Conditions G and H (taken from [ER91]). Condit ion G corresponds to the rectangular block of Condit ion A , wi th gaps introduced midway along the lengths of the hnes. If lookup depends on the presence of local features alone, search rates should be similar to those for Condi t ion A . However, search slows down dramatical ly for this condition (52 m s / i t e m for target present; 80 m s / i t e m for target absent). A similar s ituation arises i n Cond i t i on H , where the junctions themselves have been removed, leaving only a set of isolated hnes in place. A g a i n , search slows down considerably (63 m s / i t e m for target present; 101 m s / i t e m for target absent). These results show that junctions are necessary for three-dimensional orientation to be recovered, but that they are not sufiicient. A l t h o u g h difficult to account for by a process based on the lookup of complete objects, these results are readily explained by the rapid-recovery process developed here. The inter-pretat ion of the drawing in Condi t i on G is shown in figure 6.15. The introduction of the gaps results i n two major differences from the estimates for Condit ion A : (1) instead of a single object, the drawing gives rise to a number of smaller parts scattered about the image, and (h) the isolation of the L- junct ion prevents them from receiving any k ind of slant estimate. T w o sources of slowdown therefore emerge: not only are there a larger number of items to be considered, but the recovery rat io itself has a low value (p = 1.1) due to the uninterpreted L-junctions.'^ A n even simpler explanation can be given for the results of Condit ion H . Here, the absence of junctions prevents any slant estimate from being assigned to the hnes. A s such, they are left as sets of simple two-dimensional objects, which require higher-level processing to be grouped into assembhes corresponding to three-dimensional objects. scatter in slant estimates would also result if lines in the drawings are sufficiently small that accurate orientation measurements cannot be made. This scatter could only reduce search rates further. Slant Sign (1) Slant Sign (2) • O ^ Magnitude i i i i ! Impossible www Possible B I I M Preferred Narrow gray lines mark cell boundaries Figure 6.15; Slant estimates for Condit ion G . Slant angle (in degrees) obtained by mul t ip ly ing slant magnitude number by 20. Chapter 7 Summary and Conclusions A computat ional theory is developed to explain the rapid interpretation of hne drawings at early levels of human vision. This is done by first extending the framework of M a r r [Mar82] to allow processes to be analyzed in terms of hmits on their computational resources. The problem of rap id hne interpretation is then examined along two dimensions: (i) reducing the t o ta l amount of information to be transmitted , and (h) making effective use of the information that is processed. The first of these is addressed by developing constraints on the structure of the recovered object that aUow it to interpreted in subhnear time. The second is handled by constraints on the dynamic operation of the recovery process so that it considers the most hkely interpretations f irst. It is shown that the resulting process can be implemented on a mesh of simple processing elements, and that it can recover a considerable amount of three-dimensional structure in very httle t ime. It also is shown that such a process can explain the abihty of human vision to recover three-dimensional orientation at preattentive levels. These results are relevant to several areas of study. F i r s t , the extension of M a r r ' s frame-work developed in section 2.4 provides a way to discuss the various factors involved when a process is to be explained i n term of hmited computational resources. Th is extension has ele-ments contained in previous attempts to incorporate resource hmitations (e.g., [FB82, Tso87]) into a computat ional framework, but it also puts forward several new distinctions (e.g., exter-n a l vs. internal constraints, constraint vs. hmitat ion) , and treats these in a more systematic way. A l t h o u g h st i l l in rudimentary form, this framework can help guide the development of computat ional theories for other resource-hmited processes. Another , more concrete framework is the taxonomy of image mappings proposed in sec-t ion 2.1.1. Here, mappings are grouped on the basis of information flow across the image, which i n t u r n is related to lower bounds on their computational complexity. The structure of this framework remains conjectural at the moment. If proven, these results would be inter-esting extensions of the work of M i n s k y and Papert [MP69] on the abihties of simple paral le l architectures to carry out various kinds of operations on images. The developments in chapter 3 provide several interesting results concerning the complex-i ty of coUapsed constraint satisfaction problems. These results support earUer observations (e.g., [Mac74, Mal87]) that such systems can often be solved quite easily. They also show that careful selection and coordination of such "coUapsed" subsystems can lead to approximations that are not only soluble i n subUnear t ime, but that also retain much of the information i n the or ig inal set of constraints. It would be interesting to see whether the approach developed here (viz . , separation into weakly interacting subsets of binary and bijective constraints) could be usefuUy apphed in other domains. The complementary subsystems developed in chapter 4 provide an interesting way to handle local inconsistencies and ambiguities. In part icular , their incorporation into a pair of Uberal and conservative interpretation schemes suggests a general way to handle interpre-tat ion problems that require inconsistency and ambiguity to be exphcitly represented and treated in a systematic fashion. F ina l ly , the results of chapters 5-6 provide support for the view of early vision sketched in section 2.3.2 — that the "hor izontal" modules formed by different levels of processing can be complemented by "vert i ca l " columns capable of providing interpretations that are locaUy consistent. This has imphcations for the study of both machine and biological v is ion systems. The algorithms developed in chapter 5 show that this style of processing can be easily incorporated into a machine vision system, aUowing it to obtain rapid estimates of scene-based properties at aU points in the image. It is seen from the results of section 6.1 that a considerable amount of scene structure can often be recovered this way. Consequently, a rap id recovery process can greatly facihtate the overaU operation of a machine vision system. The results of section 6.2 hold a similar imphcation for biological vision systems — rap id recovery at early levels can be used to help quickly construct higher-level descriptions of the wor ld . Furthermore, given that hne interpretation is relatively difficult at early levels (cf. section 1.1), the results of chapter 6 make it plausible that other kinds of rapid recovery processes may also exist at these levels. O p e n Questions and Future Directions M a n y of the results concerning the actual performance of the rapid recovery process are based on t ime and space parameters assumed to be representative of early visual processing. A l t h o u g h suitable as a first approximation, the selection of these values is nevertheless some-what arbitrary. It would therefore be useful to carry out a set of psychophysical experiments to examine the t ime course of this process in greater detail , and to see if these values t r u l y are representative. A m o n g other things, such experiments might be able to confirm or refute the theory in regards to the order in which various properties are actually recovered. A related set of issues apphes to the recovery ratio of section 6.2, used to relate recovered structure to search rate. This quantity is sufhcient for present purposes, but is only a rough indicator of search difficulty, and ideally would be replaced by a more rehable measure. T h e general idea that a signal-to-noise ratio largely governs search speed is widely accepted (e.g., [TG88, DH89]) , but a more precise measure is not currently known. A s such, this problem is not hmited to explaining the results of search for hne drawings. But as data accumulates from more search experiments, it might at least be possible to refine the recovery ratio to take Into account such possibihties as several canonical slant values, and different weights for different slant magnitudes. A more general set of concerns involves the way in which rap id recovery Is related to object recognition. One of the main roles assumed for rapid recovery is to provide early estimates of scene-based properties that facihtate later processes, including those Involved w i t h object recognition (section 2.3.2). It Is entirely possible, however, that object recognition proceeds by a lookup mechanism that uses simple Image properties to retrieve a complete globahy-conslstent model of the object (e.g., [PE90]). If so, rap id recovery at early levels could be accounted for entirely in this way. The results of section 6.2, however, show that recovery is destroyed by nonrectangular corners and by the introduct ion of gaps Into the drawings, something rather difficult to account for In terms of this mechanism. Furthermore, a theoretical objection can also be raised against the Indiscriminate use of lookup tables, since an enormous amount of memory would be required to store ah possible views of each object at ah possible angles (see section 2.3). Lookup for a hmited number of objects, however. Is entirely possible. Indeed, the process developed here can itself be viewed as using a simple form of lookup (cf section 5.1.3), the in i t ia l interpretations based on a smaU number of " loca l " models Invoked by the junctions and the resulting interpretations then weeded out by in situ constraints. Since even the consistency of global models w i th each other must also be estabhshed in some way, the issue is therefore one of determining the appropriate granularity of the models involved. A n interesting direction for future research is to ascertain the various levels of granularity that might be used, and to determine how models of different granularity might interact. In any event, it has been shown here that smaher-grained " l oca l " models are sufficient to allow a substantial amount of three-dimensional structure to be recovered in very httle t ime. It has also been shown that the properties recovered in this way can be used to explain why part icular kinds of hne drawing are or are not interpreted at early levels of human vision. A s such, the central point of this work has been estabhshed — substantial amounts of scene structure can be recovered i n very httle time by sphtting a process into quasi-independent streams that are each concerned wi th a single aspect of scene structure. This principle is , of course, not hmited to rap id hne interpretation, and it wiU be interesting to see if it can be apphed to other forms of rap id perception and rapid cognition. Bibliography [AF69] F . Attneave and R. Frost . The determination of perceived tridimensional orienta-t ion by m i n i m u m criteria. Perception & Psychophysics, 6B:391-396, 1969. [Arb87] M . A . A r b i b . Brains, Machines, and Mathematics (2nd ed.). New York and B e r h n : Springer, 1987. [Att54] F . Attneave. Some informational aspects of visual perception. Psychological Re-view, 61:183-193, 1954. [Att72] F . Attneave. Representation of physical space. In A . W . Me l ton and E . M a r t i n , editors. Coding Processes in Human Memory, pages 11-29. Washington D C : V . H . W i n s t o n & Sons, 1972. [Att82] F . Attneave. Pragnanz and soap bubble systems: A theoretical exploration. In J . Beck, editor, Organization and Representation in Perception, pages 11-29. HiUs-dale, N J : E r l b a u m , 1982. [BA88] J . R . Bergen and E . H . Adelson. Ear ly vision and texture perception. Nature, pages 363-364, 1988. [Baa78] S. Baase. Computer Algorithms: Introduction to Design and Analysis. Reading, M A : Addison-Wesley, 1978. [Bal91] D . H . B a l l a r d . A n i m a t e vision. Artificial Intelligence, 48:57-86, 1991. [BB82] D . H . B a l l a r d and C M . B r o w n . Computer Vision. Englewood Chffs, N J : Prentice-H a l l , 1982. [BCG90] A . C . Bov ik , M . C l a r k , and W . S . Geisler. Mult ichannel texture analysis using locahzed spatial filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:55-73, 1990. [Bec66] J . Beck. Effect of orientation and of shape similarity on perceptual grouping. Perception & Psychophysics, 1:300-302, 1966. [Bec82] J . Beck. Textura l segmentation. In J . Beck, editor. Organization and Representa-tion in Perception, pages 285-317. Hillsdale, N J : E r l b a u m , 1982. [BGS91] J . Beck, N . G r a h a m , and A . Sutter. Lightness differences and the perceived seg-regation of regions and populations. Perception & Psychophysics, 1991. [Bie85] I. Biederman. H u m a n image understanding: Recent research and a theory. Com-puter Vision, Graphics and Image Processing, 32:29-73, 1985. [Bis84] P . O . Bishop. Processing of v isual information wi th in the retinostriate system. In J . M . Brookhart and V . B . Mountcast le , editors. Handbook of Physiology, Vol. Ill, Sensory Processes, Part 1, pages 341-424. Bethesda, M D : Amer i can Physiological Society, 1984. [BL86] F . Bosehe and E . L . J . Leeuwenberg. A test of the m i n i m u m principle requires a perceptual coding system. Perception, 15:331-354, 1986. [Bla89] A . Blake. Comparison of the efficiency of deterministic and stochastic algorithms for v isual reconstruction. IEEE Trans. Pattern Analysis and Machine Intelligence, 11:2-12, 1989. [Bra77] A . Brandt . Mul t i - l eve l adaptive techniques for part ia l differential equations: Ideas and software. In J . R . Rice, editor. Mathematical Software III, pages 277-318. New Y o r k , N Y : Academic , 1977. [Bri87] W . L . Briggs. A Multigrid Tutorial. Phi ladelphia , P A : Society for Industrial and A p p h e d Mathemat ics , 1987. [Bro81] R . A . Brooks. Symbohc reasoning among 3d models and 2d images. Artificial Intelligence, 17:285-348, 1981. [BT78] H . G . Barrow and J . M . Tenenbaum. Recovering intrinsic scene characteristics from images. In A . R . Hanson and E . M . Riseman, editors. Computer Vision Systems, pages 18-24. New Y o r k , N Y : Academic , 1978. [BWH75] K . Berbaum, N . Weisstein, and C.S . Harr is . A vertex-superiority effect. Bull. Psychonomic Society, 6:418, 1975. [Cae84] T . CaeUi. On the specification of coding principles of visual image processing. In P. DodweU and T . Caehi , editors. Figurai Synthesis, pages 153-184. HiUsdale, N J : E r l b a u m , 1984. [CAT90] P. Cavanagh, M . A r g u i n , and A . Treisman. Effect of surface medium on v i -sual search for orientation and size features. Journal of Experimental Psychology, 16:479-491, 1990. [CF82] P . R . Cohen and E . A . Feigenbaum. V is i on . In E . A . Feigenbaum and P . R . Cohen , editors. The Handbook of Artificial Intelligence (Vol. 3), pages 139-194. Stanford, C A : Heuris Tech Press, 1982. [CHY90] K . C u h k II , L . P . H u r d , and S. Y u . Computat ion theoretic aspects of cehular automata . Physica D, 45:357-378, 1990. [Clo71] M . B . Clowes. O n seeing things. Artificial Intelligence, 2:79-112, 1971. [CW90] K . R . Cave and J . M . Wolfe. Modehng the role of parallel processing in v isual search. Cognitive Psychology, 22:225-271, 1990. [DH89] J . Duncan and G . W . Humphreys. V i s u a l search and stimulus similarity. Psycho-logical Review, 96:433-458, 1989. [Dra81] S . W . Draper . The use of gradient and dual space i n hne-drawing representation. Artificial Intelligence, 17:461-508, 1981. [Dun89] J . D u n c a n . Boundary conditions on parallel processing in human vision. Percep-tion, 18:457-469, 1989. [dYvE88] E . A . de Yoe and D . C . van Essen. Concurrent processing streams in monkey v isual cortex. Trends in Neuroscience, 11:219-226, 1988. [Ede87] S. E d e l m a n . Line connectivity algorithms for an asynchronous pyramid computer. Computer Vision, Graphics,and Image Processing, 40:169-187, 1987. [EIS76] S. E v e n , A . I ta i , and A . Shamir. O n the complexity of timetable and mult i com-modi ty flow problems. SI AM Journal on Computing, 5:691-703, 1976. [ER90a] J . T . Enns and R . A . Rensink. Influence of scene-based properties on visual search. Science, 247:721-723, 1990. [ER90b] J . T . Enns and R . A . Rensink. Sensitivity to three-dimensional orientation in v isual search. Psychological Science, 1:323-326, 1990. [ER91] J . T . Enns and R . A . Rensink. Preattentive recovery of three-dimensional orienta-t ion from hne drawings. Psychological Review, 98:335-351, 1991. [ER92] J . T . Enns and R . A . Rensink. A model for the rapid interpretation of hne drawings in early vision. In Visual Search II. London: Taylor k Francis, 1992. [Fal72] G . Falk . Interpretation of imperfect hne data as a three-dimensional scene. Arti-ficial Intelligence, 3:101-144, 1972. [Fel85] J . Feldman. Connectionist models and parallelism in higl i level vision. In A . Rosen-feld, editor, Human and Machine Vision II, pages 86-108. New Y o r k , N Y : A c a -demic, 1985. [Fly72] M . J . F l y n n . Some computer organizations and their effectiveness. IEEE Transac-tions on Computers, 21:948-960, 1972. [GA56] B . F . Green and L . K . Anderson. Color coding in a visual search task. Journal of Experimental Psychology, 51:19-24, 1956. [GA68] A . Guzman-Arenas . Decomposition of a visual scene into three-dimensional bodies. In Proc. AFIPS Fall Joint Computer Conf, pages 291-304, 1968. [GB89] R. Gurnsey and R . A . Browse. Asymmetries in visual texture discrimination. Spa-tial Vision, 4:31-44, 1989. [GG84] S. Geman and D . Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 6:721-741, 1984. [GJ79] M . R . Garey and D .S . Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W . H . Freeman, 1979. [GJM88] I.I. Glezer, M . S . Jacobs, and P . J . Morgane. Imphcations of the ' in i t ia l b r a i n ' concept for brain evolution in cetacea. Behavioral and Brain Sciences, 11:75-116, 1988. [Gla84] F . Glazer . Mul t i l eve l relaxation in low-level computer vision. In A . Rosenfeld, editor, Multiresolution Image Processing and Analysis, pages 312-330. New Y o r k and B e r h n : Springer, 1984. [Gol69] M . J . E . Golay. Hexagonal parallel pattern transformations. IEEE Transactions on Computers, 18:733-740, 1969. [Gou89] S . J . G o u l d . Wonderful Life: The Burgess Shale and the Nature of History. New Y o r k : N o r t o n , 1989. [GR88] A . Gibbons and W . Ryt te r . Efficient Parallel Algorithms. Cambridge : Cambridge University Press, 1988. [Gra85] N . G r a h a m . Detection and identification of near-threshold visual patterns. Journal of the American Optical Society, A, 2:1468-1482, 1985. [Har87] D . Hare l . Algorithmics. Reading, M A : Addison-Wesley, 1987. [Hil84] W . D . HiUis. T h e Connection Machine: A computer architecture based on ceUular automata. Physica D, 10:213-228, 1984. [HM53] J . Hochberg and E . M c A l i s t e r . A quantitative approach, to figurai goodness. Jour-nal of Experimental Psychology, 46:361-364, 1953. [Hoc78] J . E . Hochberg. Perception (2nd ed.). Englewood Chffs, N J : Prentice-Hah, 1978. [Hor86] B . K . P . H o r n . Robot Vision. Cambridge, M A : M I T Press; New York: M c G r a w - H i l l , 1986. [Hub81] D . H . Hubel . Explorat ion of the pr imary visual cortex, 1955-78. Nature, 299 :515-524,1981. [Huf71] D . A . Huf fman. Impossible objects as nonsense sentences. In B . Meltzer and D . Michie , editors. Machine Intelligence 6, pages 295-323. Edinburgh : E d i n b u r g h University Press, 1971. [HW74] D . H . Hube l and T . N . Wiesel . Sequence regularity and geometry of orientation columns i n monkey striate cortex. J. Comparative Neurology, 158:267-294, 1974. [HZ83] R . A . H u m m e l and S .W. Zucker. O n the foundations of relaxation labehng pro-cesses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5 :267-287,1983. [JB83] B . Julesz and J . R . Bergen. Textons, the fundamental elements in preattentive vision and perception of textures. Bell System Technical Journal, 62:1619-1645, 1983. [Joh90] D . S . Johnson. A catalog of complexity classes. In J . van Leeuwen, editor. Handbook of Theoretical Computer Science, Vol. A, pages 69-161. Amsterdam: Elsevier , 1990. [Jul81] B . Julesz. Textons, the elements of texture perception, and their interactions. Nature, 290:91-97, 1981. [Jul84a] B . Julesz. A brief outhne of the texton theory of human vision. Trends in Neuro-science, 7:41-45, 1984. [Jul84b] B . Julesz. Toward an axiomatic theory of preattentive vision. In G . M . Edeknan , W . E . G a l l , and W . M . Cowan, editors. Dynamic Aspects of Neocortical Function, pages 585-612. Neurosciences Research Foundation, 1984. [Jul86] B . Julesz. Texton gradients: The texton theory revisited. Biological Cybernetics, 54:245-261, 1986. [Jul87] B . Julesz. Preattentive human vis ion, a hnk between neurophysiology and psy-chophysics. In F . P l u m et a l . , editor. Handbook of Physiology, Vol V. Amer i can Physiology Society, 1987. [KanSO] T . Kanade . A theory of or igami world. Artificial Intelligence, 13:279-311, 1980. [Kan90] K . K a n a t a n i . Group-Theoretical Methods in Image Understanding. New York and Berhn : Springer, 1990. [Kar84] N . K a r m a r k a r . A new po lynomial time algorithm for hnear programming. Com-binatorica, 4:373-395, 1984. [Kar90] J . K a r l . Reversibihty of 2d cehular automata is undecidable. Physica D, 45 :379-385, 1990. [Kha79] L . G . Khach iyan . A polynomial algorithm for hnear programming. Soviet Math. Dokl., 20:191-194, 1979. [KI85] J . K i t t l e r and J . Ihingworth. Relaxation labelling algorithms - a review. Image and Vision Computing, 3:206-216, 1985. [KP85] L . M . Kirousis and C H . Papadimitr iou . The complexity of recognising polyhedral scenes. In 26th FOCS, 1985. [KP88] L . M . Kirousis and C H . Papad imi tr i ou . The complexity of recognizing polyhedral scenes. Journal of Computer and System Sciences, 37:14-38, 1988. [KU84] C K o c h and S. U h m a n . Selecting one among the many: A simple network i m -plementing shifts in selective visual attention. Technical Report A . I . Memo 770, M I T , 1984. [Kul87] Z. K u l p a . P u t t i n g order in the impossible. Perception, 16:201-214, 1987. [Lak78] I. Lakatos . The Methodology of Scientific Research Programmes. Cambridge : Cambridge University Press, 1978. [LAN89] W . L i m , A . Agrawal , and L . Nekludova. A fast parahel algorithm for labehng connected components in image arrays. In P . M . Dew, R . A . Earnshaw, and T . R . Heywood , editors. Parallel Processing for Computer Vision and Display, pages 169-179. Reading, M A : Addison-Wesley, 1989. [LBC89] J . J . L i t t l e , G . E . Blehoch, and T . A . Cass. Algor i thmic techniques for computer vision on a fine-grained paraUel machine. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:244-257, 1989. [Lee71] E . L . J . Leeuwenberg. A perceptual coding language for visual and auditory pat -terns. American Journal of Psychology, 84:307-350, 1971. [Lev86] H . J . Levesque. M a k i n g behevers out of computers. Artificial Intelligence, 3 0 : 8 1 -108, 1986. [Low85] D . Lowe. Perceptual Organization and Visual Recognition. Boston: Kluwer A c a -demic, 1985. [Low87] D . Lowe. The viewpoint consistency constraint. International Journal of Computer Vision, 1:57-72, 1987. [Mac73] A . K . M a c k w o r t h . Interpreting pictures of polyhedral scenes. Artificial Intelligence, 4:121-137, 1973. [Mac74] A . K . M a c k w o r t h . On the Interpretation of Drawings as Three-Dimensional Scenes. D . P h i h thesis. University of Sussex, January 1974. [Mac76] A . K . M a c k w o r t h . Model -driven interpretation i n inteUigent vision systems. Per-ception, 5(3) :349-370,1976. [Mac77] A . K . M a c k w o r t h . Consistency in networks of relations. Artificial Intelligence, 8:99-118, 1977. [Mac91] A . K . M a c k w o r t h . The logic of constraint satisfaction. Technical Report TR-91-26 , Department of Computer Science, University of B r i t i s h Co lumbia , 1991. [Mal87] J . M a h k . Interpreting hne drawings of curved objects. Int. J. Computer Vision, 1:73-103, 1987. [Mar79] D . M a r r . Representing and computing visual information. In Artificial Intelligence: An MIT Perspective, pages 15-80. Cambridge, M A : M I T Press, 1979. [Mar82] D . M a r r . Vision. San Francisco: W . H . Freeman, 1982. [MD90] J . Mulder and R . J . M . Dawson. Reconstructing polyhedral scenes from single two-dimensional images: The orthogonahty hypothesis. In P . K . Patel-'Schneider, editor, Proceedings of the 8th Biennial Conf. of the CSCSI, pages 238-244, 1990. [MF85] A . K . M a c k w o r t h and E . C . Freuder. The complexity of some polynomial network consistency algorithms for constraint satisfaction problems. Artificial Intelligence, pages 65-74, 1985. [MMH85] A . K . M a c k w o r t h , J . A . Mulder , and W . S . Havens. Hierarchical arc consistency: Exp lo i t ing structured domains in constraint satisfaction problems. Computational Intelligence, 1:71-79, 1985. [MN87] J . H . MaunseU and W . T . Newsome. V i s u a l processing in monkey extrastriate cor-tex. Ann. Rev. Neurosci., 10:363-401,1987. [MP69] M . M i n s k y and S. Papert . The Perceptron: Principles of Computational Geometry. Cambridge, M A : M I T Press, 1969. [MS87] R . Mi l l er and Q . F . Stout. D a t a movement techniques for the pyramid architecture. SI AM Journal of Computing, 16:38-60, 1987. [Nei63] U . Neisser. Decision time without reaction time: Experiments in visual scanning. American Journal of Psychology, 76:376-385, 1963. [NKP87] L . N i , C . T . K i n g , and P. Pr ins . Paral le l algorithm considerations for hypercube management. In Proceedings of the 1987 Intl. Conf. on Parallel Processing, pages 717-720, 1987. [Not91] H . C . Nothdurf t . Different effects from spatial frequency masking in texture seg-regation and texton detection tasks. Vision Research, 31:299-320, 1991. [NS86] K . Nakayama and G . H . Silverman. Serial and parahel processing of visual feature conjunctions. Nature, 320:264-265, 1986. [PC80] D . N . Perkins and R . G . Cooper J r . How the eye makes up what the hght leaves out. In M . A . Hagen, editor. The Perception of Pictures: Vol II, pages 95-130. New York , N Y : Academic , 1980. [PD84] K . Preston, J r and M . J . B . Duff. Modern Cellular Automata: Theory and Appli-cations. New York : P lenum, 1984. [PDL+79] K . Preston , J r . , M . J . B . Duff, S. Lev ia ld i , P .E .Norgren , and J . - I .Tor iwaki . Basics of cehular logic w i t h some apphcations in medical image processing. Proc. IEEE, 67:826-856, 1979. [PE90] T . Poggio and S. Ede lman. A network that learns to recognize three-dimensional objects. Nature, 343:263-266, 1990. [Per68] D . N . Perkins . Cub ic corners. Technical Report Quarterly Progress Report No . 89, Research Laboratory of Electronics, M I T , 1968. [Per72] D . N . Perkins . V i s u a l discrimination between rectangular and nonrectangular par-aUelopipeds. Perception & Psychophysics, 12:396-400, 1972. [Per76] D . N . Perkins . How good a bet is good form? Perception, 5:393-406, 1976. [Per82] D . N . Perkins . The perceiver as organizer and geometer. In J . Beck, editor. Orga-nization and Representation in Perception, pages 73-93. Hihsdale, N J : E r l b a u m , 1982. [ P T K 8 5 ] T . Poggio, V . Torre , and C . K o c h . Computat iona l vision and regularization theory. Nature, 317:314-319,1985. [Rab84] P. R a b b i t t . The control of attention in visual search. In Varieties of Attention, pages 273-291. New Y o r k , N Y : Academic , 1984. [Ram85] V . S . Ramachandran . The neurobiology of perception. Perception, 14:97-103,1985. [Ram88] V . S . Ramachandran . Perceiving shape from shading. Sci. Am., 259:76-83, A u g 1988. [Ree84] A . P . Reeves. Parahel computer architectures for image processing. Computer Vision, Graphics, and Image Processing, 25:68-88, 1984. [Res82] F . Restle. Cod ing theory as an integration of gestalt psychology and information theory. In J . Beck, editor, Organization and Representation in Perception, pages 31-56. Hil lsdale , N J : E r l b a u m , 1982. [Ric88] W . Richards. Image interpretation: Information at contours. In W . Richards , editor. Natural Computation, pages 17-36. Cambridge, M A : M I T Press, 1988. [RM89] R. Reiter and A . K . M a c k w o r t h . A logical framework for depiction and image interpretat ion. Artificial Intelligence, pages 125-155, 1989. [Rob65] L . G . Roberts . Match ing perception of three-dimensional sohds. In Optical and Electro-optical Information Processing, pages 159-197. Cambridge, M A : M I T Press, 1965. [RobSO] J . G . Robson. Neura l images: The physiological basis of spatial vision. In C .S . H a r -r is , editor. Visual Coding and Adaptability, pages 177-214. Hihsdale, N J : E r l b a u m , 1980. [Ros83] A . Rosenfeld. Paral le l image processing using ceUular arrays. Computer, 16:14-20, 1983. [Ros86] A . Rosenfeld. P y r a m i d algorithms for perceptual organization. Behavior Research Methods, Instruments, & Computers, 18:595-600, 1986. [Ros87] A . Rosenfeld. Recognizing unexpected objects: A proposed approach. In Proc. of the DARPA Image Understanding Workshop, pages 620-627, 1987. [RP91] R . A . Rensink and G . Provan. The analysis of resource-hmited vision systems. In Proc. Thirteenth Annual Conf. of the Cognitive Science Society, pages 311-316, 1991. [Sal85] S . N . Salthe. Evolving Hierarchical Systems. New York : Co lumbia University Press, 1985. [SBG89] A . Sutter , J . Beck, and N . G r a h a m . Contrast and spatial variables in texture segregation: Testing a simple spatial-frequency channels model. Perception & Psychophysics, 46:312-332, 1989. [Sch86] P . H . SchlUer. The central visual system. Vision Research, 26:1351-1386,1986. [She81] R . N . Shepard. Psychophysical complementarity. In M . K u b o v y and J . R . Pomer -antz, editors, Perceptual Organization, pages 279-341. Hihsdale, N J : E r l b a u m , 1981. [She83] G . M Shepherd. Neurobiology. Newr York and Oxford : Oxford University Press , 1983. [Sim81] H . A . S imon. The Sciences of the Artificial (2nd ed.). Cambridge, M A : M I T Press; New Y o r k : M c G r a w - H i U , 1981. [Smi90] M . A . S m i t h . Representations of geometrical and topological quantities in cehular automata . Physica D, 45:271-277, 1990. [Ste78] K . A . Stevens. Computat i on of locaUy parallel structure. Biological Cybernetics, 29:19-28, 1978. [Sto87] Q . F . Stout. P y r a m i d algorithms opt imal for the worst case. In L . U h r , editor . Parallel Computer Vision, pages 147-168. New York , N Y : Academic , 1987. [Sto88] Q . F . Stout . M a p p i n g vision algorithms onto parahel architectures. Proc. IEEE, 76:982-995, 1988. [Sug86] K . Sugihara. Machine Interpretation of Line Drawings. Cambridge, M A : M I T Press; New Y o r k : M c G r a w - H i h , 1986. [SV82] Y . Shiloach and U . V l s h k i n . A n O(log n) paraUel connectivity algorithm. Journal of Algorithms, 3 :57-67,1982. [Tan84] S . L . Tanimoto . Sort ing, histogramming, and other statistical operations on a p y r a m i d machine. In A . Rosenfeld, editor, Multiresolution Image Processing and Analysis, pages 136-145. New York and Berhn : Springer, 1984. [TB90] J . T . T o d d and P . Bressan. The perception of 3-dimensional affine structure from m i n i m a l apparent motion sequences. Perception & Psychophysics, 48:419-430, 1990. [TG79] J . P . Thomas and J . Gihe. Bandwidths of orientation channels in human vis ion. Opt. Soc. Am. A, 5:652-660, 1979. [Tho72] G . B . Thomas. Calculus and Analytic Geometry. Reading, M A : Addison Wesley, 1972. [TM87] T . Toffoh and N . H . Margolus . Cellular Automata Machines: A New Environment for Modeling. Cambridge, M A : M I T Press, 1987. [TM90] T . TofFoh and N . H . Margolus . Invertible ceUular automata: A review. Physica D, 45:229-253,1990. [Tow72] J . T . Townsend. Some results concerning the identifiabihty of paraUel and serial processes. British J. Math. Statist. Psychol., 25:168-199, 1972. [Tre82] A . Treisman. Perceptual grouping and attention in visual search for features and for objects. J. of Experimental Psychology, 8:194-214, 1982. [Tre88] A . Treisman. Features and objects: The fourteenth bartlett memorial lecture. Quart. J. of Experimental Psychology, 40A:201-237, 1988. [Tre91] A . Treisman. Search, similarity, and integration of features between and w i t h i n dimensions. Journal of Experimental Psychology: Human Perception and Perfor-mance, 17:652-676,1991. [TS90] A . Treisman and S. Sato. Conjunct ion search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16:459-478,1990. [Tso87] J . K . Tsotsos. A "complexity Leve l " analysis of vision. International Journal of Computer Vision, 1:346-355, 1987. [Tso90] J . K . Tsotsos. A n a l y z i n g Vis ion at the Complexity Level . Behavioral and Brain Sciences, 13, 1990. [ T W W 8 8 ] J . F . Traub , G . W . WasiUcowski, and H . Wozniakowski . Information-Based Com-plexity. New Y o r k , N Y : Academic , 1988. [Uhr87] L . U h r . Highly paraUel, hierarchical, recognition cone perceptual structures. In L . U h r , editor. Parallel Computer Vision, pages 249-292. New Y o r k , N Y : A c a -demic, 1987. [UU84] S. UUman. V i s u a l routines. Cognition, 18:97-159, 1984. [Vol82] R. VoUmar. Some remarks about the efficiency of po lyautomata . International Journal of Theoretical Physics, 21:1007-1015, 1982. [VP88] H . Voorhees and T . Poggio. Comput ing texture boundaries from images. Nature, 333:364-367,1988. [Wal72] D . L . W a l t z . Generating semantic descriptions from drawings of scenes w i t h shad-ows. Technical Report A I - T R - 2 7 1 , M I T , 1972. [Wal87] D . Walters . Selection of image primitives for general purpose visual processing. Computer Graphics and Image Processing, 37:261-298, 1987. [Wat87] A . B . Watson . Efficiency of a model image code. J. Opt. Soc. Am. A, 4:2401-2417, 1987. [Wat88] R . J . W a t t . Visual Processing. Hihsdale, N J : E r l b a u m , 1988. [WB82] J . M . Woodhouse and H . B . Bar low. Spatial and temporal resolution and analysis. In H . B . Barlow and J . D . M o h o n , editors. The Senses, pages 133-164. Cambridge : Cambridge University Press, 1982. [WH74] N . Weisstein and C . S . Harr is . V i s u a l detection of hnes: A n object-superiority effect. Science, 186:752-755, 1974. [W1190] F . Wi lk inson . Texture segmentation. In W . C . Stebbins and M . A . Berkley, editors. Comparative Perception - Vol II: Complex Signals, pages 125-156. New York : John Wi ley and Sons, 1990. [WM78] N . Weisstein and W . Maguire . Comput ing the next step: Psychophysical measures of representation and interpretation. In A . R . Hansen and E . M . Riseman, editors. Computer Vision Systems, pages 243-260. New York , N Y : Academic , 1978. [WM85] R . J . Wat t and M . J . M o r g a n . A theory of the primitive spatial code in h u m a n vision. Vis. Res., 25:1661-1674, 1985. [Woo81] R . J . Woodham. Ana lys ing images of curved surfaces. Artificial Intelligence, 9 :117-140,1981. [ZSS83] S . W . Zucker, K . A . Stevens, and P. Sander. The relation between proximity and brightness s imilarity in dot patterns. Perception & Psychophysics, 34:513-522, 1983. [Zuc86] S. Zucker. E a r l y orientation selection: Tangent fields and the dimensionahty of their support. In A . Rosenfeld, editor. Human and Machine Vision II, pages 335-364. New Y o r k , N Y : Academic , 1986. [Zuc87a] S. Zucker. The diversity of perceptual grouping. In M . A . A r b i b and A . R . Hanson, editors, Vision, Brain, and Co-operative Computation, pages 231-261. Cambridge , M A : M I T Press, 1987. [Zuc87b] S. Zucker. E a r l y vision. In S .C . Shapiro, editor, Encyclopedia of Artificial Intelli-gence, pages 1131-1152. New York : John Wi ley and Sons, 1987. 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items