\u00a3n of tyP e left bank and chains of type right bank, and the chains alternate in type and orientation around C. Assume that all pairs of construction edges occur in the river graph; if only one edge of a. pair existed then one face would violate our head-to-tail assumption. We will conclude that property T l must be satisfied. With the construction edges, the definition of when construction edges are added to a river imply that each river has a single drain point. The river bank edge semantics imply that when r,- and \u00a3j meet head-to-head then this is a point through which the river C can drain. As long as the construc-tion edges exist, each face has exactly one drain point. The ordering of construction edges in the standard embedding of M ('\u00bb) the river graph ensures that the edge type for a construction p j g u r e 3 g- J3a(j (a) a n d edge in a river matches the type of its upstream edge on the good (b) medial axes around construction edges same face. Therefore n and m are 1 and we have two chains in the boundary of C. \u2022 It is natural to perform the local tests for T l , T4, and T5 in the same scan of the river graph. However, it is more convenient to find single-face errors with the necessary condition of T l , 31 remove all construction edges from the river graph, and then apply the test for condition T4 to locate subsequent inconsistencies. Removing the construction edges merges adjacent faces, typically two rivers or a river and a lake, which by property D2 leaves an acyclic graph. Keeping in the construction edges when finding the medial axis leads to branches around the construction edges in the final river centreline (solid line in Figure 3.6). By applying the test for condition T4 after construction edges are gone, we still ensure that the lake and river faces are closed, even after they are combined. 3.3 M e d i a l A x i s A p p r o x i m a t i o n A l g o r i t h m The medial axis is a well-studied structure in image analysis and in computational geometry. Image analysis algorithms typically discretize the problem with a rectangular grid of pixels and compute which pixels are (approximately) in the medial axis [30]. This is not a good fit with our input vector data for rivers: thin, meandering polygons defined by sequences of unevenly scattered vertices. Neither is it a fit with the output we desire to exploit: topological connections within the medial axis and back to the original data. Computational geometry has developed optimal algorithms to compute the structure for simple polygons [1, 19]. The medial axis is also a subset of the Voronoi diagram of the polygon edges and, as such, algorithms that compute Voronoi diagrams [11, 37, 43, 69, 75, 97] or constrained Delaunay triangulations [18, 27, 79] can find the medial axis. Unfortunately, implementations of algorithms for the medial axis or the Voronoi diagram for polygons [2] must deal with the parabolic segments in the diagrams. The more successful im-plementations use incremental constructions and topological information [84] or triangulations [43] to resolve degenerate conditions and to avoid relying on tests with parabolic segments to make decisions. While good in practice, incremental algorithms have non-optimal worst-case behaviours. Consequently, we adapt a robust sweep algorithm for the Voronoi diagram of point sites to ap-proximate the medial axis. The sweep algorithm provides a better worst-case time complexity. The restriction to point sites also means that we need not handle parabolic segments in the final diagram. Moreover, an implementation for the Voronoi algorithm for point sites has fewer types of degeneracies to deal with than a comparable implementation for line segments. The Voronoi diagram of a discretized polygon boundary produces a matching Delaunay triangulation for the 32 interior of the polygon and this triangulation guides our approximation to the medial axis. Figure 3.7: The Voronoi diagram of Figure 3.8: The Voronoi diagram of lines and curves from the Voronoi lines and curves diagram for points As the cover of Okabe, Boots, and Sugihara [69] demonstrates (reproduced in Figure 3.7), we can approximate the Voronoi diagram of lines or curves with the Voronoi diagram of points along the lines or curves boundaries. Figure 3.8 shows the Voronoi diagram of lines and curves that Figure 3.7 approximates. By spreading more points along the lines or curves, the Voronoi diagram of the points converges to a superset of the Voronoi diagram for the lines or curves. Thus, given the polygonal contour of a river or lake, we will discretize the boundary of the river, compute the Voronoi diagram of these points, and approximate the medial axis from the result. Discretization always involves trade-offs. The quality of the approximation depends on the degree of discretization for the boundary. As more points discretize the boundary of the polygon, the Voronoi diagram inside the polygon converges to a superset of the medial axis for the polygon. Computationally, adding more points to the boundary adds degeneracies and increases the computation time. We must strike a balance between computation time and fidelity of the approximation. Our solution adaptively discretizes the river boundary until the Delaunay triangulation of the discrete boundary contains all river boundaries. Unlike Nackman and Srinivasan [64], we do not have the medial axis to guide our discretization. Since hydrology data is usually well-sampled, 33 Q <~ {PO, PU \u2022\u2022 ; Pn-l} D f- the Delaunay triangulation Q for i = 0 to n \u2014 1 do \/ * Assert that P from po to p,- is in D *\/ push pi onto S while S is not empty do q <- pop(S) if segment from g to q.next crosses a Delaunay triangle then t <\u2014 midpoint of segment from q to q.next Q ^ Q U {*} update D push the neighbours of t in D onto 5. endif repeat the if statement with the segment from q to q.prev endwhile endfpr Algorithm 1: Algorithm to discretize a polygon boundary we want to avoid splitting every edge as with Saalfeld's algorithm [78]. We treat both rivers and lakes as polygons when approximating the medial axis. In fact, we can eliminate construction edges in the river networks to combine rivers and lakes into larger polygons for the medial axis computation. We only revert to using the type of the edges when we orient the medial axis in Section 3.4. We start with three data structures: \u2022 A doubly-linked list of the vertices for the polygon P. \u2022 A quad-edge storage structure for a triangulation. \u2022 A stack S of points on the polygon P. The n points of the polygon P are po, Pi, \u2022 \u2022 \u2022, Pn-i and will always be referred to as such, even after we subdivide edges of P. The linked list of points for P has a pointer into the quad-edge storage structure so that each point references one edge for which it acts as an endpoint. The quad-edge structure also has a references from each edge to its origin vertex in P. Figure 3.3 presents the incremental algorithm. The algorithm incrementally updates the current triangulation [85] and compares the new Delaunay triangles with the river boundary again. We divide river boundary edges until no Delau-34 nay triangle crosses the river boundary. Upon completion, the Delaunay triangulation decomposes the river interior into triangles (Figure 3.9) and the corresponding Voronoi edges inside the river form a single component. (a) (b) (c) Figure 3.9: Delaunay triangles inside the polygon with the initial points (a), and after one (b) and two (c) decomposition steps Note that discretization by adding points on the edges of a polygon is different from dis-cretization of the whole space as in image processing. The former is more adaptive and does not lose the topology of the output. Lemma 3.3.1 Our algorithm for computing the discretized Voronoi diagram of n lines takes 0(nlogn + k(n + k)) worst case time and 0(nlog n + k) expected time where k points are added to the edges. P r o o f : The initial Delaunay triangulation of the n line endpoints takes O(nlogn) time. The for loop iterates n times, but we will count its work with points inserted into Delaunay triangulation. For any point on the stack, we locate the wedge that contains its edge and test for the edge in D in 0(n) worst-case time and 0(1) expected time. The cost of the test goes to the point itself for the first test on pi and subsequently on the point whose insertion placed q onto the stack S. To add the ith incremental point q, we first split the edge and update Q in constant time, insert q into the triangulation D in worst case 0(n + i \u2014 1) time and expected 0(1) time and push at most 0(n) (and expect 0(1) points onto the stack S. The worst-case time complexity after inserting k points is then 0(nlogn+n+&(n+k)) = 0(n\\ogn-\\-k(n+k)) and expected time complexity of 0(nlogn + n + k) \u2014 0(n\\ogn + k). \u2022 35 Figure 3.10 shows an example by Edelsbrunner and Tan [34] with n vertices and m lines that requires Q(nm) additional points if \u2022 the line segments will appear as Delaunay edges. For lakes and river Figure 3.10: A lower polygons, m \u2014 n so our algorithm, as stated, has an 0(n ) worst-case i 3 0 u n ( j f o r n n e time complexity and an 0(n2) expected time complexity. These are subdivisions, intolerable complexities for GIS applications if they represent the normal behaviour. If these lower bounds were representative of the running time on GIS data then the algorithm would be too slow to be practical. The worst-case scenario, where polygons fold on themselves to produce the configuration of Figure 3.10, are improbable in river data. Even for a serpentine river, we need many digitized points along the river banks to capture the curvature of the river. The main locations for edge refinements are in bulges in the river banks and in narrow inlets to a river. Consequently, the number of points added to the river polygon boundary is proportional to the size of the polygon and the incremental algorithm has an 0(n2) worse-case time behaviour and an 0(ralog?i) expected time behaviour. We have three approximations to the medial axis centreline that come from the Delaunay triangulation of the discrete boundary. The three approximations are different embeddings of an abstraction of the medial axis. Let G be the graph dual of the Delaunay triangulation D for the interior of polygon P; exclude the vertex for the outer face of P in G. Then G is a connected planar-graph. Our approximation is a subgraph of G that connects all tributaries to P. For each tributary of P, designate an adjacent triangle of D as the initial flow direction for the tributary into P. These triangles correspond to a set of vertices T in G. Root G at one vertex t of T. There exists a unique path 7rt(s) in G from every vertex s in T to t. The medial axis approximation A is the subgraph of G induced by these paths: A = [JseT ^t{s)-If P has only one tributary, then this approximation contains only the root of the tree. It does not provide any detail on the shape of the lake. To extend the approximation beyond the root, we add to T the vertex u for the Delaunay triangle farthest away from the tributary. Vertex u extends the approximation from this root so that it captures part of the shape for the lake. This medial axis approximation is an abstract graph that gives the interconnections between tributaries. To see the approximation, we must embed the graph. We consider three straight-line 36 embeddings for A by selecting different positions for a vertex v of A that corresponds to a Delaunay triangle d of D: \u2022 Voronoi approximation: embed v at the Voronoi vertex for d\u2014the centre of the circum-circle for d (Figure 3.11a). \u2022 centroid approximation: embed v at 2\u00b1|\u00b1\u00a3 w n e r e fl) a n ( \\ c a r e the vertices of d (Fig-ure 3.11b). \u2022 midpoint line approximation: embed v at a+b+2 c where a, b, and c are the vertices of d and segment ab is the shortest edge of d (Figure 3.11c). In the Voronoi approximation, the edges remain in-side the river as long as the incremental discretizing forces the triangle angle opposite a river edge to be smaller than 90 degrees. Unfortunately, if the discretized points along a boundary edge are far from one another (relative to the river width) then this approximation has a zig-zag pattern rather than the expected smooth centreline. Figure 3.11: The Voronoi (a) In the centroid approximation, the centroid is a nat- c e n t r o i d (b)> a n d midpoint (c) approximations to the medial axis. ural choice as a representative point for a triangle, but the paths between centroids is not smooth. When the bases of the marked triangles alternate between river banks, the centroid line approximation zig-zags again. The zig-zag is most pronounced when the triangles have one side much smaller than the other two sides. The midpoint line approximation is similar to the centroid line. Geometrically, if p is the midpoint of line segment ab then the point a+b+2 c is the midpoint of the line segment from c to p; this point is always inside triangle abc. With a fine boundary discretization, the shortest edge of the Delaunay triangles are usually along the river banks so line segment cp crosses between river banks. While all three approximations generate a centreline that lies between the river banks, they differ in two respects: convergence to the medial axis and estimation of river area. Let the distance between two adjacent points along the discretized boundary be at most d. Then let Vd, Cd, and. Ld be the Voronoi, centroid, and midpoint line approximations to the medial axis. Lemmas 3.3.2 37 and 3.3.3 prove that each of and Lci converge to the medial axis as d \u2014> 0 and are therefore appropriate for approximating the medial axis. Observation 3.3.4 shows an example of why the lack of convergence of C'd makes it an inappropriate approximation. Lemma 3.3.2 The Voronoi approximation V(i converges to a superset of the medial axis as d.\u2014t 0. Proof: Let p be a point on the medial axis equidistant to points q\\, q2, \u2022 \u2022., qn on the boundary of polygon P. If any two or more points of qi, q2, \u2022 \u2022 ., qn are sites for Vd then p appears in Vd as the bisector of these sites. Otherwise, let ri and r 2 be the neighbours of q\\ and let r 3 and r 4 be the neighbours of qi on the discrete boundary of P (Figure 3.12). As d \u2014> 0, points ri and r 2 both converge to q\u00b1, as points r3 and r 4 both converge to q^. The Voronoi point of r\\, ri, and one of and r 4 then converges to p in Vd. m \\ Lemma 3.3.3 Let T be the set of Delaunay triangles inside a polygon P Figure 3.12: Point configuration for that correspond to junctions in the medial axis of P. Let M be the medial L e m m a 3 3 2 axis of P. Then a subset of the midpoint line approximation, namely Ld(~)T, converges to M DT as d \u2014y 0. Proof: Assume that d is small enough so that the boundary of P appears as edges in the Delaunay triangulation. ' Let p be a point on the medial axis of P that appears in a Delaunay triangle t not in T. As d \u2014> 0, the corners of the shorter edge of t converge to one another and t converges to a shortest chord between opposite boundary points of P. The weighting factors for the representative point of t weigh the ends of the chord equally so the representative point of t converges to p in Ld-If t is in T then t is not subdivided as d \u2014>\u2022 0 so there is no convergence to M in any triangle of T. \u2022 38 Observation 3.3.4 Cd diverges from the medial axis as d \u2014>\u2022 0. More- e-over, the length of Cd can be unbounded (Figure 3.13). ^ ^ ^ ^ ^ ^ With successively finer refinements, the centroid line approximation C ^ L ^ C ^ C ^ C ^ C ^ C ^ C ^ can oscillate between opposite sides of the polygon. The oscillation has an unbounded length as d approaches 0. \\I\\j\\*\\I\\T\\IWW! In Section 3.6 we describe how the river area is derived from each approximation; in short, the Voronoi approximation samples the figure 3.13. Edge subdivision makes the river width to estimate the area whereas the other two approximations centroid line long, use the Delaunay triangulation from which they are defined to compute the river area in a simpler and more direct manner. We use the midpoint line in our work as a good compromise between convergence and ease of area estimation. 3.4 Edge Orientation The medial axis itself solves the river connectivity problem, but does not provide upstream and downstream relations among rivers; we have ignored the direction of water flow along the medial axis edges so far. A correct direction of flow is important to answer queries such as \"What is the river area upstream from a particular point?\" or \"If a particular tributary is polluted, what (downstream) rivers may be affected?\" For rivers, the direction of flow for a medial axis edge matches either the flow direction of a tributary at one end of \"\"^ \"*\\ the edge or the flow direction on the river banks that define ^ (a) the medial edge. This does not apply to medial axis edges inside lakes since the lake shore edges do not encode any flow Figure 3.14: Forced flow (dashed) \u2022 c ,. r , , i \u2022 t i r c edges of the medial axis information. For lakes, a tew simple topological rules suffice \u00b0 in our tests for mountaneous regions. The rules essentially enforce the conservation of flow by preventing water from accumulating at any vertex of the medial axis. In a non-degenerate medial axis and in the midpoint line approximation, every junction vertex involves three edges. The conservation argument says that the edges will split as either two incoming edges and one outgoing 39 edge or one incoming edge and two outgoing edges. When we have identified two incoming or two outgoing edges at a junction, the direction of the third edge is thus determined. If a junction has three outgoing edges then there are many outflows for the lake; Lemma 3.4.1 characterises this condition. The rules, in order of precedence, are 1. if an edge is adjacent to a tributary then the flow of the edge matches the flow of the tributary (Figure 3.14a). 2. if the medial axis edges inside a lake meet at a point then there must be at least one edge that enters and at least one edge that leaves the point (Figure 3.14b). If the medial axis is treated as a tree, then rule 1 orients every leaf edge and rule 2 propagates the edge orientations from the leaf edges to the rest of the tree edges according to a topological order of the tree. When the medial axis combines the incoming and outgoing tributaries of a lake in a special way, there may not be a Unique orientation of the medial axis that conserves flow. Consequently, these two rules are not sufficient to orient all medial axes. Lemma 3.4.1 characterises the instances where the orientation is not unique. Lemma 3.4.1 For any set of vertices S of the medial axis M let Ts be the subtree of the medial axis that connects the vertices of S. Let I be the set of leaves of M that attach to incoming tributaries-. Let O be the set of leaves of M that attach to the outgoing tributaries. Then M can be uniquely oriented to conserve flow if and only ifTj n To does not contain an edge of M. Proof: Assume that Tj and To have at least one vertex each. Figure 3.15: The Suppose that T\/ n To contains no edges. Then M \\ {Tj U To} is medial axis and a path TT. To drain T 7 , all edges of T\/ must be oriented towards ir and flow s u b t r e e s 7r must be oriented from Tj to To- To satisfy the conservation property at every vertex of To, the edges must be oriented away from 7r, otherwise some vertex does not have an incoming edge. This provides a unique orientation for M. 40 Now, suppose that e is an edge of TjdTo- Then each subtree of M\\e contains both an incoming and outgoing tributary. Inductively, each of these trees can be oriented with a conserving flow. Either orientation of e leads to a. conserving flow of M so the orientation is not unique. \u2022 When the propagation of rule 2 fails because of Lemma 3.4.1, we are left with a set of trees F of unoriented medial axis edges where every leaf in F has one incoming edge and one outgoing edge already oriented in the medial axis. The user can assign directions to one or more of the leaves in F and allow rule 2 to continue propagating. Alternatively, we can iteratively assign an arbitrary direction to one leaf in each tree of F and propagate rule 2. Any such assignment yields a consistent flow on the medial axis, though such a consistent flow may divert the bulk of the lake out a small and relatively unimportant outlet of the lake. These edge-orienting rules do not detect lakes that have either no incoming edges or no outgoing edges. Nei-ther of these cases exhibit the subtree intersection property of Lemma 3.4.1. In fact, rule 2 completely orients the me-dial axis edges of these lakes in a self-consistent manner and routes the flow into or out of one centre node of the medial axis. For lakes with no outgoing edges, one consistent ori-entation is as good as any other: all the water remains in the lake. For lakes with no incoming edges, a consistent Figure 3.16: Centerline paths with , , , ,, r , , , , \u2022 \u2022 matching banks orientation can have the bulk of the lake area draining out of a minor tributary. 3.5 Benefits of the Medial Axis The main benefit of the medial axis is, of course, to provide a centreline. We derive benefits beyond the centreline from the association between a medial axis edge and its nearest river bank. This section outlines four benefits: the link between derived data and original data, a definition of opposite banks, estimates for river width and surface area, and network orders for river banks and lake shores. First, the association ties calculations on the centreline to the source data that defined the 41 T T T T > > : - ~ - \u00bb -centreline. The medial axis replaces the river banks in a network to give a single-line river network that undergoes further analysis, such as identifying drainage basins, locating fish spawning habitat, and tracking the run-off of forest cut blocks. River banks inherit attributes and results of the analysis from their associated medial axis edges. Thus, the single-line net-work allows for simpler network analysis than on river banks but does not lose the connection to river banks. Figure 3.16 shows a fish migration path in bold computed from the centreline of the river system. The path is tied back to the nearby river banks and lake shores; habitat analysis can then be performed from the bank characteristics. Second, we can define opposite banks through the association. Since F i g u i e 3.17. Opposite bank a medial axis edge is the bisector of its closest river banks, we consider these relations two banks as opposite one another along the river. We can then combine the attributes of opposite banks, such as slope, elevation, soil type, and vegetation type, into attributes for the centreline or can compare attributes with one another to detect inconsistencies in the data or anomalies in the environment. Some additional refinement to this simple definition may be necessary, since the medial axis is sensitive to bulges in the boundary of rivers. Within a bulge, the medial axis may identify opposite sides of the bulge as opposite banks, as illustrated in Figure 3.17, rather than identifying the sides of the bulge to an edge across the bulk of the river. This is correct behaviour if there is a tributary that flows into the bulge, but may be inappropriate for some applications when there is no tributary. We discuss this further in Section 3.6. Third, once we use the association to define points on opposite banks of a river, we can use these points to define the width of the river at a point or to define the section of a Figure 3.18: A river, its polygonal river that corresponds to a segment of the single-line centreline (a), and its area network. We define the width of a river at a point p to be the estimate (b). length of the line segment that touches opposite river banks; the medial axis identifies these opposite points on river banks at p. Instead of using one nominal river width for the whole river, we can sample the river width anywhere. (b) 42 With the river width, we have two approaches to estimate the surface area of a river. We can refine the approach of single-line rivers: partition the length of the river into short sections, sample a nominal width for each section as in Figure 3.18, and sum the area estimates of length times nominal width. Alternatively, we can close off a section of a double-line river, and compute the area of the resulting polygon. Opposite points along river banks are natural candidates for closing off the river. This second approach has a nice implementation that relies on the Delaunay triangulation of the river (see Section 3.6). Figure 3.19: A river network with its Strahler order (a) and the subnetworks with Strahler numbers greater than 1 (b) and 2 (c). Fourth, the association extends network orderings to wide rivers and lakes for cartographic generalization. The medial axis or centrelines of rivers have, for many years, been used as a replacement for double rivers for display at larger map scales [67]. As mentioned in Section 2.6, network orders on single-line networks, such as the Horton, Strahler, and Shreve orders, have been used to select the mainstems of a river network for general-ization and display at large map scales. Figure 3.19 shows a sample network with its Strahler order and the result of selecting edges of high order from the network. These same orderings can apply to networks that contain lakes or river banks by treating the lakes and wide rivers as their medial axes; this is not surprising. The river bank edges receive an order number from the associated medial edges. Then selecting edges with higher order also extracts the relevant rivers along the path. There are two ways to propagate network orders to lake shores and river banks. The first way assigns the network order of the nearest medial edge to a river bank. The nearest edge assignment is appropriate for ordering river banks since the order of the river may increase as we near the 43 mouth of the river. For lakes, one may prefer to assign the network order of the highest medial edge in the lake to all the lake shores. This assignment treats the lake as a single unit and preserves the visual cues of lake extent and shape. 3 . 6 A r e a G e n e r a t i o n The Voronoi approximation to the medial axis provides river width measurements for finding the river surface area with the technique of Section 3.5: parti-tion the river into sections and sample one width value for each river section. The section lengths and sampled widths represent a better area estimate for the sections than the area from a single nominal river width. Equiv-alently, if the sections all have the same length then the average of the sampled widths yield an overall width Figure 3.20: The area of bays estimate for the river. Two drawbacks of sampling the , , \u2022 , ,. , , ^ & accumulate in nearby medial edges. river width to estimate river area are that the samples can miss small bays and can overestimate the width at river junctions. The centroid line and midpoint line approximations to the medial axis can estimate the river surface areas in two ways. The first way is identical to the method of the Voronoi approximation since both approximations capture the same proximity information. The second way assigns the area of each Delaunay triangle to its representative point (centroid or midpoint). Since the Delaunay triangulation decomposes the interior of the river, the representative points account for all the river area. The river area between two points a and b is the sum of the areas assigned to the representative points between a and b along the medial axis approximation. The second way is not directly applicable for the Voronoi approximation since Voronoi vertices do not necessarily lie inside their corresponding Delaunay triangles. Computing river area from Delaunay triangles has minor drawbacks of its own. First, not every Delaunay triangle has its representative point in the approximate medial axis since the approximation only keeps the triangles that join tributaries. To overcome this drawback, we allocate the areas of inlets and bays that have no tributaries to a nearby representative point to preserve all 44 the area for the feature. This assignment is shown in Figure 3.20. representative point to preserve all of the area for the feature. Second, the granularity of the area estimate depends on the relative sizes of the Delaunay triangles. The area of a triangle in a river branch may be small while the area of a triangle at a river junction may be large. The granularity only becomes an issue for the ends of river sections. 3 . 7 I m p l e m e n t a t i o n L e s s o n s We implemented our medial axis approximation algorithm as a library for the Cause &; Effect GIS of Facet Decision Systems of Vancouver. The implementation highlighted five issues of merging computational geometry algorithms with a GIS. First, a user should always have the ability to check (and and the maps do not align perfectly so the lines do not meet at (dashed) a common point on the boundary. Still another error is well-intentioned simplification that skews the topology of the input data. For example, Figure 3.21 displays two simplified polygons that meet at a vertex and whose outer face (dashed line) is topologically merged with an inner face. The sanity-checks that we applied to our input data before running our algorithms saved us from many debugging headaches. Functions that verify the current content of a data structure against invariants for the data structure, called audit functions, reduced the amount of debugging required in the implementation. Moreover, a liberal use of audit functions finds errors in the data structure when the error is created instead of when the erroneous data is later used. Once the debugging of an implementation is complete, we can remove the audit functions to speed-up the execution time. Second, we must be prepared for inconsistencies from floating point errors. Our implemen-tation used epsilon geometry [45] on near-zero values and simulation of simplicity [33] on vertical edges in a sweep algorithm to avoid floating point inconsistencies. Despite these efforts, there are still cases where our computed results produced topological inconsistencies, such as merging the re-check) the input data and data structures for consistency. With the large amounts of GIS data, there is always an error in raw input data for some algorithm. One type of error is semantic inconsistency as described in Section 3.2. Another error is a bad combination of data, such as lines that cross a map boundary Figure 3.21:Tncorrect polygon topology (solid) and the resulting incorrect face 45 inner and outer faces of a polygon. Letting topological tests guide insertions into data structures, similar to the topological algorithms by Sugihara and Iri for finding Voronoi diagrams of many sites [84], stabilised these inconsistencies and let the data structures adhere to an invariant. Third, the rule-of-thumb for implementations that states \"be lenient in the input data that you accept and concise in the results that you produce\" can be taken too far. When applied, the rule-of-thumb simplifies the task of using the output of one program as the input for another. Our implementation of this rule was the ability to relax the requirement that end points be identical to connect two river polylines; given a user-specified tolerance e, any two points that were within e of one another were considered to be the same point. This flexibility was very useful in combining data that came from different maps and whose endpoints did not match perfectly at the boundary. A consequence of this flexibility is that we treat the endpoints of an edge of length less than e as one point, which then creates topological inconsistencies in data structures that track the adjacencies of polylines. Fourth, we must avoid the common error of reporting results with too much precision. Our implementation made this error when approximating the medial axis. The TRIM data at our disposal has a 1.0 metre precision. The vertices of the medial axis approximation have the precision of a single-precision floating point number. To maintain a consistent level of precision, we should post-process the medial axis to decrease the precision; however, this process must ensure that the reduced precision does not permit the medial axis approximation to cross its river and lake boundaries. Fifth, non-optimal algorithms can be sufficient for GIS problems. The worst-case (or even expected case) time complexity for our medial-axis approximation algorithm does not measure favourably against the optimal algorithms for medial axes or for constrained Delaunay triangula-tions. Nevertheless, the simplicity of the algorithm, especially in dealing with data degeneracies, outweighs the implementation complexity. 3.8 Sample River Systems Our initial tests extracted and directed the centrelines and computed the areas of rivers and lakes in the mountainous interior of British Columbia where lakes have few out-flowing rivers. In the 500 lakes and rivers tested as part of the Horsefly river system, the resulting water flow directions 46 matched the expected flow directions in all of the cases. In the majority of the cases, the lakes only had one outlet and one inlet so there was a single direction of flow to find. Other rivers or lakes, as in Figure 3.22, have a medial axis with a more complex branching structure. With the success of collapsing lakes in the Horsefly watershed, Ian Williams from the De-partment of Fisheries and Oceans at Nanaimo and I created a centreline river network for the Fraser River Basin, from the mouth of the river at Vancouver to the upper reaches of the Driftwood River. The river network used 1:20 000 scale TRIM data. The centreline network is the spatial struc-ture that underlies models in the Integrated Fraser Salmon Model (IFSaM) system that simulates salmon migration in the Fraser River [94, 95]. Figure 3.23 show a subset of the Fraser River network with its connectivity properties through lakes and two-bank rivers. We selected a network edge e near the mouth of the Nechako river; the figure displays the edges upstream from e in red and the edges downstream from e in green. The river sur-face areas from our network for the Fraser and Nechako rivers have also been used in a model that forecasts the river tempera-ture from the surface area and atmospheric temperature. When Figure 3.22: The medial axis of compared against historical data, the model with our area esti- t n e ^ a ^ e boundary does not respect islands mates was more accurate than the same model with older area estimates. A difficulty, which we have not resolved, is over-estimating the lake and river areas because we do not consider islands and sandbars. Sandbars appear along river banks and narrow the effective width of the river. Islands eliminate area from lakes. We expect that a more liberal definition of a river bank can handle most sandbars. In other cases, sandbars cross the width of rivers, which will make automatic handling more troublesome. To negate the effects of islands on the lake areas, we can subtract the area of the islands from the area of the rivers or lakes to which the islands belong. This solution is not entirely satisfactory, since it does not give us an easy method for finding the area of a river between two points on the river banks, and the automatically generated centrelines do not respect the land formations (Figure 3.22). Although the medial axis can be computed (or approximated by our 47 techniques) for polygons that have holes, the resulting diagram contains loops and would require an additional decision process on how to cut the loops to form a single-line network for fisheries analysis. In our salmon migration application, lakes that were large enough to contain islands were interesting primarily for the role they played in overall connectivity; spawning occurs in smaller streams. Thus, we did not implement more general handling of islands. Figures 3.24, 3.26, and 3.25 show additional examples of the centreline approximation. The cen- \\J \\ \\ treline approximation in Figure 3.24 threads through \\ a very narrow section of a lake. Figures 3.25 and 3.26 I narrow rivers. These last two figures are not shown to the same scale; Figure 3.25 is a small river from the upper right corner of Figure 3.26. demonstrate that the approximation can provide good centrelines at different scales and for both wide and Figure 3.23: Centreline network for part of the Fraser River 48 0 1 km 2 km Figure 3.24: Centreline for a narrow portion of the Lower Campbell Lake 0 500 m 1000 m Figure 3.25: Centreline approximation for a winding section of the upper Quesnel River Chapter 4 T e r r a i n In the progression from river networks to watersheds of rivers, we must model the terrain that appears inside the watersheds. This chapter summarises basic terrain representations in Section 4.1 and then focuses on data structures (Section 4.2) and algorithms (Section 4.3) for storing and creating Triangulated Irregular Networks (TINs). For a broader overview of terrains and TINs, refer to Marc van Kreveld's summary article and Mark Kumler's monograph [53]. 4.1 Terrain Representations There are three classic representations for terrains: contour lines, raster grids, and Triangulated Irregular Networks'(TINs). Contour lines are iso-height lines of the terrain. They first appeared in printed topographic maps and have been adopted in digital data, both from digitiz-ing topographic maps and as a familiar representa-tion. Visually, contour lines make it easy to locate mountains and pits because each one generates sets 200 m 400 m 600 m Figure 4.1: Contours of Weaver Creek, 100 metre intervals. of nested contours. For computers, this nesting must be made explicit, whether as links between 50 adjacent contour lines or through a contour tree [90]. Raster grids are more common digital models of terrain. Raster grids subdivide the terrain with a regular grid and measure the elevation at one point, such as the grid centre, within each grid cell. The raster grid has an efficient storage structure. The x and y coordinates of every point are implicit in the location of the cell within the grid. Consequently, we only store the elevation of each grid cell and the number of rows and columns in the grid. The grid also provides a topological structure for the terrain. From any grid cell, we can find the adjacent cells in constant time and every cell has a fixed number of neighbouring cells. With the compact structure and the local topology, terrain algorithms that work with raster grids are efficient. The main disadvantage of raster grids as terrain representations is the storage space. The grid must be fine enough to capture terrain features in rough regions. If there is a flat area in the same terrain, such as a lake or a plateau, then the grid also subdivides the flat area at a fine resolution since the grid is regular. Figure 4.2: Triangulation (left) and perspective view (right) of TINs for Weaver Creek TINs are continuous representations of terrains. They model the terrain with piecewise-planar triangular patches between selected elevation points on the terrain. Unlike raster grids, the elevation points are not necessarily at regular intervals. Consequently, TINs can represent flat regions with few sample points and use a proportionately larger number of sample points in rough landscapes. By allowing the sample points to be in irregular positions, TINs can also have higher detail around important terrain features such as valleys and ridges. The disadvantages of TINs are that every sample point must store its x, y, and elevation values and that the TIN must store face adjacency information. There are two other terrain representations that are not frequently seen. The first is a 51 \"richline\" description of the terrain and the second uses quaternary triangular meshes (QTMs). A richline description [28] encodes the terrain ridges, valleys, pits, and peaks to represent the main characteristics. The richline description pays more attention to the drainage characteristics of the terrain and less attention to fine-grained point elevations everywhere on the terrain. QTMs [31] are an alternate method to latitude and longitude of encoding the location of a point on the surface of the Earth. QTMs model the Earth as a triangulated approximation to a sphere and apply a numbering scheme to identify each triangular face. Each triangular face is then represented by a single point at the triangle centroid. The numbering scheme for QTMs allow us to refine individual triangles by adding more digits of precision to the number for a triangular face, which implicitly subdivides a triangle. As the QTM numbering scheme comes from a triangulated approximation to the Earth, we can approximate terrain with a set of Q T M points in a manner similar to TINs. However, unlike TINs, the adjacencies between faces would be implicit in the number for each QTM point. At present, neither the richline nor the Q T M representations have had much attention for terrain analysis. We use TINs for our terrain model since our data is not gridded and it combines lower density elevation data with river data. The following sections describe algorithms and data structures for TINs and definitions of terrain features on TINs. 4.2 TIN Data Structures A TIN is just a planar embedded graph (a triangulation) to which we attach specific meaning: each triangular face represents part of a terrain. Consequently, we can use the adjacency lists, winged-edges, or quad-edges of Section 2.2 to represent the TIN. Also in parallel with graphs, the trade-off between structures is the time to access neighbouring edges or faces. As with graphs, the choice of representation for the TIN depends on whether our algorithms need to access information about adjacent faces or edges and whether the representation handles these accesses. 52 van Kreveld [89] describes another data structure available for storing TINs. It is similar to the quad-edge data structure in that it stores pointers to both the ver-tices and the faces of the TIN, but it is not edge-centric. '\" i 15 \u00absMo'\"> The TIN is still stored as a graph, but there are graph nodes for every TIN edge, every TIN vertex, and every TIN face. Every TIN edge has a graph edge to the nodes of its end vertices and to the nodes of its adjacent faces. Similarly, every TIN face has a graph edge to the nodes of its three adjacent edges and maintains a consistent order of the edges around the triangle. The TIN vertices only Figure 4.3: A TIN and its network act as the endpoints of TIN edges; they do not act as the structure (from [89]) start of graph edges since the degree of a vertex is arbitrarily large and we want the graph nodes to contain a fixed number of pointers to other graph nodes. This structure allows a constant-time access from any face to its adjacent faces, incident.vertices, and edges. It also allows constant-time access from any edge to its immediately-adjacent edges (through the faces) and to its incident faces and vertices. Of all the data structures, we use quad-edges to store TINs since they represent both TIN vertices and faces in a symmetric manner. 4.3 TIN Algorithms Much of the existing terrain elevation data appears as a set of contour lines, of profiles, or raster grids. There are many algorithms for converting these types of data into TINs [38, 47, 53, 89] that select a subset of the raster grid or contour points as the TIN vertices. Once we have the set of TIN vertices, a Delaunay triangulation of the vertices defines the faces of the TIN. We restrict our attention to the incremental conversion algorithm of Heckbert and Garland. Heckbert and Garland use a user-specified error tolerance between the gridded elevation points and the TIN. They begin with a coarse TIN\u2014a triangulation of the xy bounding box for the terrain. Their algorithm computes the vertical distance between the current TIN and every point in the raster grid. If the furthest point is within the error tolerance then the TIN creation is complete. 53 Otherwise, the algorithm adds the raster point with the furthest distance from the current TIN to the TIN and re-triangulates the TIN vertices to re-establish the Delaunay property. The algorithm iteratively adds points in this manner until the TIN satisfies the user's error tolerance. Heckbert and Garland's algorithm adds points according to their vertical distance from the TIN since it is a convenient and easily-computed value. As with the polyline simplification of Section 2.4, we could apply other criteria, such as changes in terrain curvature or the change in volume below the terrain with the addition of a point [3]. However, these other criteria are often more expensive to compute after each insertion to the TIN. Most TINs use a Delaunay triangulation to define the triangular faces since the triangulation avoids small angles in its triangles. Since the triangulation is indepen-dent of the actual terrain structure, the triangulation can introduce inconsistencies. For example, the triangulation Figure 4.4: A narrow valley (left) and a can insert a triangle that blocks a narrow valley in the b a d triangulation of the valley (right) terrain (Figure 4.4). Known structural lines, such as ridges, valleys, and rivers, are often inserted into the triangulation as breaklines to remedy these problems [17]. The triangulation becomes a constrained Delaunay triangulation so that the breaklines are guaranteed to appear in the final triangulation. Another alternative is to manually alter the TIN so that its characteristics match those of the terrain [93]. 4.4 TIN Features The analysis of TIN drainage characteristics in Chapter 5 focuses on particular features of the TIN: ridges, valleys, peaks, and pits. Given a model of how water flows on a TIN, we use the definitions from Frank et al. [39]: \u2022 A TIN edge e is transfluent if water flows from one of its adjacent triangles, across the edge, and onto the other adjacent triangle. \u2022 A TIN edge e is a valley (or is cofluent) if water from both adjacent triangles flows onto the edge. 54 \u2022 A TIN edge e is a ridge (or is difluent) if no water from either adjacent triangle flows onto the edge. The definitions of pits and peaks are independent of the water flow model: \u2022 a peak is a local maximum of the surface. In a TIN, a peak is a vertex with higher elevation than its incident edges and triangles. \u2022 A pit is a local minimum of the surface. In a TIN, a pit is a vertex with lower elevation than its incident edges and triangles. 55 Chapter 5 W a t e r s h e d s The watershed of a river X is the set of terrain points that drain into river X. It is natural to ask questions about watersheds when analyzing the effects of land-based activities on the environment, on rivers, or on fisheries. Sample activities are logging, road construction, and mining. In this chapter we investigate the properties of watershed boundaries on a TIN. While more complex models exist, we use the assumptions that underlie the majority of the complex models and show that some desirable properties of watersheds are inconsistent with even the simple model. As we expose the inconsistencies, we build a data structure called the watershed graph to represent the watersheds of a terrain. We prove that our data structure represents a structure that is consistent with how water flows on the TIN. Let p be a point on a terrain T and let trickle(p) be the path of steepest descent on T from p. The path trickle(p) is the drain path of p on T. The watershed of point q is then all points whose drain path includes q, that is {p | q \u00a3 trickle(p)}. There are two watersheds that we might consider in a terrain. The first is the watershed of a pit in the terrain. We might want this watershed for an interest in the watershed itself, for a measure of whether or not the pit should be smoothed out of the terrain, or for a measure of how sensitive the current TIN may be to further detail refinements. The second is the watershed of an individual point on the terrain. This latter type of watershed handles the watersheds of rivers by asking for the watershed of the most-downstream point of the river. The chapter begins with a description of previous raster and vector watershed algorithms in Section 5.1. We then outline the approach taken by our algorithm in Section 5.2. Next, Sections 5.3 56 and 5.4 define terrain features and assumptions used by our algorithm. Section 5.5 then presents the degeneracies that can occur in a terrain that cause inconsistencies between the derived watersheds and our expectations of the watersheds and, in parallel, develops our watershed graph to capture the watershed structure of pits on the terrain. We prove that our algorithm is correct in Section 5.6 and analyse the construction complexity for the graph in Section 5.9. Since we want to deal with data that can contain degeneracies, Section 5.7 describes some approaches for handling flat terrain. Section 5.8 describes alternative representations for watersheds. Finally, Sections 5.10 and 5.11 describe an implementation of our algorithm and lessons learned through the implementation. 5.1 Watershed Background Most automated approaches for identifying watershed boundaries simulate rainfall and assume that water always flows along the path of steepest descent. The algorithms can be categorized as contour algorithms, raster algorithms, or vector algorithms. 5.1.1 Contour A l g o r i t h m s Manual techniques for finding watersheds often use contour algorithms. Some computer algorithms emulate these techniques. Tang [87] extracts peaks, pits, saddles, ridges, and drainage lines from contour lines by triangulating the points of the contour lines into a TIN and examining all horizontal edges of the TIN. These horizontal edges go between contours of the same elevation and represent \"specific geomorphological elements\" that are classified as pits, peaks, saddles, ridges, and drainage lines. Tang uses the medial axis between these contours of the same elevation to convert a set of horizontal edges or triangles into terrain features such as ridges. He concludes that this method does not find all terrain features but does identify the geomorphological elements in critical regions. More mathematical work on watersheds characterises the shapes of watersheds. Chorley et al. [20] suggest one loop of a lemniscate of the form p = \u00a3cos k6 in polar coordinates as a description instead of circular measures [63]. For a watershed, they use the length of the watershed as the parameter t and its area to obtain the parameter k of the ideal watershed. They then classify watersheds by and comparing k values or ratios between the lemniscate perimeter and the watershed perimeter. 57 5.1.2 Raster A lgor i thms Historically, most computer algorithms for identifying watersheds are raster algorithms. These algorithms appear in the literature for geomorphology, geology, cartography, computer vision, and civil engineering. Collins [23] allocates one watershed per local minimum in the terrain, examines the terrain points in order of increasing elevation, and assigns a watershed value to each point that matches the watershed of an adjacent point. The computed watershed boundaries vary with the processing order of the points. The method also generates many watersheds\u2014one per pit\u2014that later raster methods try to avoid. Other raster algorithms identify terrain features, such as ridges and channels, and recon-struct the watershed topology from the features. Peucker and Douglas [73] identify drainage lines by labelling the highest raster cell in every 2 x 2 window of the raster. The unlabelled cells are locations where water accumulates and form the drainage lines of the terrain. In this approach (and other feature detection algorithms), it is difficult to prove that the ridge networks are connected and that they correctly delimit the watersheds. Mark [57] simulates rainfall on a raster by allocating one raindrop to each terrain point and letting the raindrops flow on the terrain; terrain points that do not receive any raindrops belong to the watershed boundaries. By recording the final destinations of each raindrop for each cell, we collect the raster cells into watersheds. Tracking the rainfall is proportional to the grid size, which can be large. Moreover, we know which cells belong to the watershed but have no indication of the boundary yet. O'Callaghan and Mark [68] use an approach similar to Peucker and Douglas, but they adapt the algorithm to better handle noise in the data and to only delimit the major drainage lines. Their algorithm works with a 3 X 3 cell window, so each raster cell can be affected by its 8 neighbours. Their algorithm begins by smoothing the raster within each 3 x 3 window and assigning a single drainage direction for each raster cell to one of its 8 neighbours. Each raster cell is then classified by the number of incoming and outgoing drainage edges from its 8 neighbours as a pit, ridge, link, or fork. They create preliminary drainage basin labels for each tree of raster cells that flow together and use the preliminary labels to overflow pits through the lowest saddle point on their boundary; all but the lowest pits are overflowed. Then, for each raster cell c, the algorithm estimates the 58 watershed area by the number of raster cells that drain through c. It then identifies the major drainage lines as those cells whose area estimate exceeds a user-specified threshold. Jensen [51] also allocates one watershed for each local minimum in the terrain, but uses a greedy steepest ascent algorithm to assign terrain points to watersheds. She uses a 3 x 3 window around a grid cell in her search for the steepest direction. The greedy algorithm creates problems when a watershed extends above a saddle point in the TIN. In these instances, the computed watershed can wrap around the implicit water divide at the saddle and can claim the entire cap of a mountain for the watershed when the cap actually drains to more than one location. Douglas [28] proposes an algorithm similar to Jensen's for finding ridges and channels in a raster terrain. To find ridges, he examines each raster cell and flags the lowest corner of each cell. After examining all cells, the unflagged corners form the ridges. Similarly, to find channels, he flags the highest corner of each cell. After either of these operations, the algorithm thins the lines in the terrain (by erosion of boundary contours) until only a skeleton of the original lines remain. The ridges, channels, and other information-rich lines form a \"richline model\" for the terrain that captures the main drainage characteristics for the terrain. While the richline model tends to capture .the watershed characteristics, ridge features may not always meet so the model may not capture the entire watershed boundary. Band [6] uses the network edge detection of Peucker and Douglas [73] and the pit overflow technique of O'Callaghan and Mark [68] to find well-connected drainage networks and ridges in raster grids. He then considers all non-network raster cells as a drainage area and applies a line-thinning algorithm to the drainage area while preserving line connectivity across river junctions. The thinned lines are the watershed boundaries in the raster. A two-pass flood-fill algorithm then labels all the raster cells with the river name in whose watershed they appear. While Band's algorithm allows for watersheds with holes (where a separate pit occurs), the holes are not connected to the outer boundary so further processing must find whether or not one watershed is contained inside another. Martz and De Jong [59] also model watersheds in a raster by determining the flow direction for each raster cell and collecting all the cells that drain to a common depression. For spurious depressions and flat areas, they modify the elevations in the terrain raster so that each raster-cell flows towards the nearest cell with a downward slope in the original raster. Their elevation 59 modifications tolerate nested depressions. Later work between Martz and Garbrecht [41, 58] embeds this technique in a larger system for finding watersheds, drainage networks, sub-watersheds, and network properties such as the Strahler order. Qian et al. [76] use an expert system to guide the connections of local drainage networks by global data. Their system begins with a low-level identification of ridges and valleys in a raster grid similar to Jensen. Unlike Jensen, their process uses a larger raster window to identify ridges and valleys (size 3iV X 3A^ for positive integers N) and restrict themselves to four possible directions for the ridges or valleys (horizontal, vertical, or either diagonal of a cell). Groups of valley cells or ridge cells are thinned into single-cell width segments and the segments are locally connected into a network according to the adjacencies of their endpoints. Connections between the low-level networks are then created with the assistance of an expert system. The expert system characterises potential connections from curvature similarities, segment lengths, elevation and slope differences, segment orientations, ridge intersections, the distance between endpoint, and a set of user-specified tolerances for these criteria. At the end of their work, they mention how to construct watersheds for their drainage networks with a tree-thickening algorithm on the network segments that maintains consistency with the ridge segments found in the lower-level processing. Zhang et al. [98] use an underlying continuous approximation to the raster to identify water-sheds. They approximate the terrain with a surface that passes through every raster point and is simplified through a discrete cosine transform. They estimate the valley and ridge raster cells from the transformed terrain with first-order and second-order partial derivatives. Next, they extend the valleys and ridges (to a limited degree) to close gaps between isolated sections. Finally, they apply a flood-fill naming scheme with unique identifiers for valley edges to label all raster cells that belong to a common watershed. As with the previous feature-detection algorithms, the extensions of valleys and ridges try to close off watersheds, but there is no control over how much extension is necessary. Chorowicz et al. [21] only extract drainage networks from a raster grid, but their techniques could be extended to find ridges and thus provide a basis for another watershed algorithm. Their algorithm identifies well-connected drainage networks and represents areal components, such as wide valleys, in the network. Their algorithm begins with a profile scan of the raster grid. The scan examines the elevation of each raster cell in relation to its horizontal and vertical neighbours 60 (possibly far away when there is little or no change in elevation) and distinguishes cells as sharp valleys, wide valleys, summits, and high flat ground. The next step uses \"hydrological flow mod-elling\" to identify the drainage network. The flow modelling follows paths of steepest descent in the raster from terrain saddle points to find river channels; the channels end at depressions in the terrain. The flow modelling then overflows the depressions through their lowest saddle point to increase the connectivity of the drainage network. The result is a single-line drainage network. Finally, the algorithm overlays the wide valleys and high flat ground features of the profile scan with the network to generate a drainage network with areal components. Fua [40] combines multiple raster views of a terrain, whether stereoscopic photos or stereo-pair photos, with grey-scale images to model 3D terrain and subsequently extract drainage networks. He uses the stereoscopic photos to estimate the curvature of the terrain in steep areas and looks for \"V\"-shaped terrain features as valleys. In lower-slope areas where terrain curvature is harder to estimate, grey-scale images provide additional information for tracing the paths of rivers. In a different context, watershed algorithms are used for feature extraction in images. For example, Najman and Schmitt [65] define a hierarchical ranking of watershed edges and use. the hierarchy to identify salient object boundaries in images. There are difficulties with extracting watershed boundaries from raster algorithms. One example is that boundary edges for the watersheds are not explicitly defined. We must derive the edges and their connections to one another from the discrete raster structure. 5.1.3 Vector Algorithms Vector algorithms generate watershed boundaries with finer resolution than raster algorithms at the expense of using non-local properties of the data. Skea et al. [81] use the Voronoi edges of rivers edges as the watershed boundaries for the rivers. The Voronoi approach is similar to a vector equivalent to Band's line thinning algorithm for finding watershed boundaries [6]. It assumes that water is attracted to the nearest river segment. Skea et al. take the approach one step further by applying a subsequent correction to nearby terrain features. Consequently, in flat areas, the Voronoi edges between rivers act as the watershed boundaries while in more rugged areas terrain ridges act as the watershed boundaries. Palacios-Velez and Cuevas-Renaud [70] identify valleys and ridges from a TIN by looking at 61 how the paths of steepest descent on the adjacent faces to an edge lead towards or away from the edge. They make the observation that the contributing area (watershed) for a river is bounded by alternating paths of steepest ascent\/descent and ridges. They note that Clearly in order to apply this approach for the definition of contributing areas, a good routine, capable of tracking the ridge lines in ascending as well as descending direction should be developed first. However, the existence of segments of 'pure' streamlines alternating with ridge segments would further complicate the procedure. ([70], page 308) but find the watersheds by dividing the TIN faces into small areas and by following the paths of steepest descent from the centre of each small area to a river or pit. Their subdivision approach converges to the watershed, but leaves itself open to potential watershed inconsistencies. However, their suggestions of embedding the steepest ascent and steepest descent paths from ridge lines into the terrain would lead to a correct algorithm. Their later work [71] considers soil types and runoff of the terrain in the watershed partitions. . \u2022 Nelson, Jones, and Miller [35, 66] simplify the raster elevation data with a TIN terrain model, sub-divide triangles of the TIN according to their drainage characteristics, and group all triangles whose paths of steepest descent from their centroids end at a common local minimum in the terrain. These groups are the watersheds. The watershed boundaries are the edges of the TIN whose adjacent triangles belong to different watersheds. Garg and Sen [42] also approximate the terrain with a TIN to find watersheds. They classify each edge of the TIN as a ridge, channel, or flow boundary segment (transfluent edge) and use paths of steepest ascent from the channel edges to identify the watershed cascades for each channel edge (the watershed cascade for an edge is the area of land that drains directly to the edge, but not from another channel edge). They model the watershed cascade for a channel edge as a band on the terrain whose edges are paths of steepest ascent from the channel endpoints. The algorithm assumes that the paths of steepest ascent only cross TIN edges until they reach a ridge or another channel edge. Although the vector algorithms define a set of boundaries, the results are not always the complete watershed boundaries. With the Voronoi diagram of Skea, the accuracy of the boundary depends on how well the Voronoi edges match with the terrain. With the triangle assignment of Nel-62 son et al., the boundary of one watershed can incorrectly degenerate to a collection of unconnected closed curves. Wolf [96] finds watersheds as an application of surface networks [74]. He uses a surface that is twice continuously differentiable. The surface network uses the local maxima, local minima, and the saddles of the surface as the graph vertices and links the vertices by their proximity on the terrain, much like contour trees [90]. Wolf specifies a set of properties that the graph edges must satisfy to be a weighted surface network, but does not indicate how the edges should be found. The watershed of a pit p is then a closed cycle in the weighted network graph that alternates between passes and peaks, that encloses all the river courses and pits \"upstream\" from p, and that does not encircle any other river courses or pits. Unfortunately, the approach does not consider mathematical drainage degeneracies, such as a watershed whose interior is not connected, that can occur even in twice continuously differentiable surfaces. 5.2 Watershed Algorithm Overview Our approach for identifying watersheds focuses on the boundaries instead of on the area inside the watersheds. As with Nelson et al., we model the terrain with a TIN and assume that water always follows the path of steepest descent. We identify a set of edges and paths on the TIN that form a superset of the watershed boundary edges for the pits in the terrain and connect these edges and paths into a graph. We prove that the graph is planar and that each face of the graph is the watershed of one pit on the terrain. Finally, a simple graph traversal algorithm extracts the watershed boundaries as a single closed and connected curve. Most greedy algorithms for constructing TINs from raster grids [47, 53] provide good space savings over a raster terrain representation. As suggested by Fowler and Little [38] and further developed by Little and Shi [56], precomputing structural lines such as ridges and valleys of the raster grid and using these lines as a basis for a constrained triangulation of the terrain can provide an equally good terrain representation with the same error tolerance as the other TINs but use fewer TIN vertices. We can easily question each of our model assumptions, namely that we approximate the terrain with a TIN, and that water always follows the path of steepest descent, on the terrain. For the terrain, it is non-trivial to guarantee that the drainage characteristics of the TIN and terrain 63 match. This aspect leads to a different research area that starts with embedding breaklines into the TIN to correct poor triangulations [17] and can lead to shaping or sculpting the terrain [93]. For the path of water, the assumption implicitly states that water has neither volume nor inertia. Once water reaches a pit, it cannot escape the pit no matter how shallow the depression in the terrain. When water reaches a sharp edge of the terrain, it can change its course by up to 90 degrees. Despite their possible limitations, our assumptions are common for many models of surface water flow. While we limit ourselves to two simple assumptions, models with extensions of our assumptions contain the properties of our model. Our goal is to obtain consistent watershed boundaries for our simple model; once we have consistent boundaries we can better isolate how additional properties affect the watersheds of points. 5.3 Height Profile Function The drainage characteristics at a point depend on the elevations of the surrounding TIN. Define the height profile function ht:P(9) : [0,27r) -) 1 at a point p to be a function of the angle 9 that returns the elevation at which the cylinder {(x \u2014 p.x)2 + (y \u2014 p.y)2 = e 2 , z \u00a3 R } intersects the TIN in a direction 9 from p. This definition can be seen as a continuous version of the 8-neighbor test of O'Callaghan and Mark [68]. We will only talk about the profile around one point, which we assume to be the origin of the coordinate system. The interesting properties of he

2> \u2022 \u2022 \u2022' aPl} 1S * n e s e ^ \u00b0f incondug generalized ridges along faces at p in counter-clockwise order around p starting at up0 \u2022 Up = VPU Ap = {uP1, uP2,..., uPk}. The elements of the set are in counterclockwise order around p starting at upo \u2022 Tp = {tP1,tP2,..., tP2l} are paths of steepest ascent from p that stop as soon as they encounter an element of R. Path t{ is the path of steepest ascent along incoming generalized ridge et[;\/2j-Let b{ be the element of R where the highest point of t{ ends. \u2022 cwp(r) : RpUTp \u2014>\u2022 UPU {upo} is defined as follows: - if r G Rp then cwp(r) is the first element of UPU {UPQ} immediately clockwise from ridge r at p 75 - if r G Tp as r = \u00a3p(2i+i) for an integer i then cwp(r) is the first element of Up U {%>o} immediately clockwise from r at p \u2014 otherwise, r 6 T p as r = \u00a3p(2i) for an integer i so define cu;p(r) as the element of Up that matches a;. Then, the embedded watershed graph G = (V, E) for a normal terrain has vertices V = Ru\\J(UpU{up0}) pes and edges E = U U (cw P(r),r)U (J (cwp(tt-),6i) I p e s \\ r e \u00ab P t ; e T p \/ and abstracts the local drainage characteristics of normal terrains (Lemma 5.5.3). For notational convenience, the definition for the watershed graph edges assumes that we can derive the ridge bi, which is where the path of steepest ascent ends, from The order of the edges around a geomet.ry c u . function watershed topology ^ valley (drain in the given direction) \u2014\u2014\u2014\u2014 incoming ridge incoming generalized ridge on a face outgoing ridge Figure 5.11: Watershed graph topology on a normal terrain. vertex in the watershed graph embedding is the same as the geometric order of the ridges around the point in the TIN. For two paths of steepest ascent \u00a3Pi2i+i and \u00a3Pi2i+2 with the same geometric coordinates, the edge for \u00a3P)2i+i appears counterclockwise from edge \u00a3p,2i+2 at their common upper point b^, by definition of the cwp function, they have different endpoints at their lower endpoint. The watershed graph embedding remains planar (Lemma 5.5.4). 76 Figure 5.12 depicts a triangulation for a terrain (left) with ridges in bold lines and paths of steepest ascent across faces in dashed lines. The corresponding watershed graph appears in the right panel of the same figure. Figure 5.12: A normal terrain (left) and its watershed graph (right). Ridges appear in red and steepest ascent paths appear as dashed lines. Lemma 5.5.3 Let q be a point on a normal TIN in a local neighbourhood N of TIN vertex p. There is a path in the embedded watershed graph between p and q that stays in N iff q drains to p or p drains to q. Proof: We show that the watershed graph captures the drainage characteristics of gener-alized ridges that lie on TIN faces. The analysis for the drainage characteristics of ridges along TIN edges is the same as in the proof of Lemma 5.5.1. Earlier, we described how an incoming generalized ridge was modelled by two ridges that bound an incoming valley. These two ridges correspond to matching steepest ascent paths tp,2i+i and \u00a3p,2i+2 in the watershed graph. Since the paths have different lower end-points, the incoming valley bounded by the paths flows into the point p. For an outgoing generalized ridge across a face, the watershed graph has no corre-sponding edge. Lemma 5.6.9 proves that the outgoing generalized ridges have no effect on the watershed boundary for normal TINs so excluding them from the watershed graph is acceptable. \u2022 Lemma 5.5.4 The watershed graph for normal terrains is planar. 77 Proof: The embeddings for vertices of Up and R remain unchanged from Lemma 5.5.2 of Section 5.5.1 on nice terrains. Graph edges that correspond to ridges of Rp are straight-line embeddings along the ridge edges. Graph edges from T p , which are paths of steepest ascent along faces, follow the paths of steepest ascent in the TIN. The paths of steepest ascent guarantee that edges from different paths of steepest ascent will not cross one another. This leave us to consider the embedding of edges \u00a3P]2i+i and \u00a3Pi2i+2 that originate from a common path of steepest ascent a;. Convolve the path a, with a disk of radius e and embed \u00a3Pi2i+i and \u00a3P)2i+2 along the outer edges of this convolution to guarantee a 2e separation between , tp,2i+i and \u00a3Pi2i+2- \u2022 5.5.3 Nasty Terrain Nasty terrains have no assumptions on where paths of steepest ascent stop. Nasty terrains allow paths of steepest ascent to end at saddle points or endpoints of ridges. This situation requires a precarious alignment of TIN vertices. Nevertheless, the water flow and terrain model descriptions do not forbid this behaviour. The only place where paths of steepest ascent affect the drainage characteristics is at in-coming generalized ridges along TIN faces. The paths of steepest ascent that stop at ridges create \"spikes\" in the watershed boundary and were studied in Section 5.5.2. When the upper end of a spike is a TIN vertex v, all the water that drains to v can drain along the spike. If the watershed for v has a non-zero area then the spike connects two areas into one watershed: it funnels one area into another. In this instance, the interior of the watershed can be disconnected. Let S be the set of TIN points with more than one outgoing generalized valley. Add the TIN points that are the lower endpoints of ridges to S. We have two cases to examine for steepest ascent paths that end at a point of 5: \u2022 the upper end of a steepest ascent path reaches the point of 5 through the drain, and \u2022 the upper end of a steepest ascent path reaches the point of S through an outgoing generalized valley that is not the drain. Consider what happens to a pair of parallel ridges separated by an e-width valley that reach a point p of S. Since the path of steepest ascent reaches the point itself, it must approach the point 78 along an outgoing generalized valley. The difference is whether or not the outgoing generalized valley is the drain for the point. Start with the case where the outgoing generalized valley is the drain of p. Any water that reaches an incoming generalized valley at p must flow through p and out the drain for p. When a path of steepest ascent is converted into a pair of parallel ridges, all water that would normally follow the path of steepest descent is modelled as flowing down the valley between the two parallel ridges. Conversely, if the outgoing generalized valley is not the drain of p then any water in incoming generalized valleys must not enter the valley between the two parallel ridges. When the paths of steepest ascent end at vertices, the two parallel paths that model the steepest ascent path form a funnel; water on the terrain can reach into the valley between the parallel paths. These e-width funnels connect disjoint areas of the terrain. Consequently, the watersheds for nasty terrains can have disconnected interiors as well as spikes. The effects of the funnels on the watershed graph are in the endpoints of graph edges. In normal terrains, the endpoints of graph edges are always from a subset of the graph vertices, namely R, whose elements were in a one-to-one correspondence with TIN features. In nasty terrains, the graph edges can stop at elements of Up. With this change, the watershed graph is no longer bipartite. We present the definitions for sets of the TIN that define the watershed graph. Only those definitions that change from the normal terrains are described below. We consider ridges in the TIN as open edges; they do not contain their endpoints. \u2022 S is the set of TIN points whose height profile function have more than one outgoing gener-alized valley or that are the lower endpoint of a ridge. Let p be a point of S. \u2022 The set of steepest ascent paths Tp is defined as for normal terrains. However, the point bi where a steepest ascent path ends is either the interior point of a ridge in R or is a point of 5. \u2022 Ip is the set of paths in Tq that, for any q \u00a3 5, end at p. \u2022 the domain of the function cwp(r) changes to include Ip. cwp(r) : Rp U Tp U Ip \u2014> Up U {upo} is as follows: - if r \u00a3 Rp or r \u00a3 Ip and r does not enter p along its drain then cwp(r) is the first element 79 of Up U {upo} immediately clockwise from ridge r at p - if r G Tp as r = tpi and i is odd then cwp(r) is the first element of UPU {upo} immediately clockwise from r at p - if r G Tp as r = \u00a3p,- and i is even then cwp(r) is the element of Up that matches a^ij. - if r G 7p enters p along its drain, r = tqi for some q G S and i is odd then ctup(r) = up0 - if r G Ip enters p along its drain, r = tqi for some q G 5 and i is even then cwp(r) = upfc. (Recall that J7P = {uP1, uP2,.. ., wpfc}). The watershed graph G = (V, E) for a nasty terrain has vertices V = RU\\J(UPU {upo}) pes and edges ^=U U ( c w p ( r ) ' r ) U U (cwp(U),bl))U \\J {cwp{ti),cwbi{ti)) P6-5 \\J'G-Rp (;6Tpwhere t;gTpwhere 6,-\u00a3S and abstracts the drainage characteristics of nasty terrains (Lemma 5.5.5). The convention that a path of steepest ascent t{ links to a single element of 5 U R simplifies the notation for edge definitions. geometry r u l ftmotion watershed topology .^ valley (drain in the given direction) \u2014 \u2014 \u2014 incoming ridge incoming generalized ridge on a face outgoing ridge Figure 5.13: Watershed graph topology on a nasty terrain. In the definition of the edges for a TIN point, the first union creates graph edges for adjacent TIN ridges, the second union creates graph edges for incoming generalized ridges whose paths of 80 steepest ascent end at ridges, and the third union creates graph edges for incoming generalized ridges whose paths of steepest ascent end at TIN vertices. Each union differs in the selection of the second endpoint for the graph edge. Figure 5.14 shows a terrain (left) with ridges in bold lines and paths of steepest ascent across faces in dashed lines. In this figure, the interior for one of the watersheds is not simply connected since there are paths of steepest ascent that start from the centre saddle point and reach a saddle point on the left of the terrain. The corresponding watershed graph for this terrain appears in the right panel of the same figure. Figure 5.14: A nasty terrain (left) and its embedded watershed graph (right). Ridges appear as red edges and steepest ascent paths appear as dashed lines. Lemma 5.5.5 Let q be a point on a nasty TIN in a local neighbourhood N of TIN vertex p. There is a path in the embedded watershed graph between p and q that stays in N iff q drains to p or p drains to q. Proof: We concentrate on the drainage effects of paths of steepest ascent whose highest point is a saddle point of the TIN. The remaining topology of the watershed graph is identical to that of normal terrains and its drainage characteristics are covered by the proof of Lemma 5.5.3 for normal terrains. Given a path of steepest ascent up that ends at a point p in the set 5 the path defines a pair of graph edges \u00a3 2,+i and t2i+2- Only one such pair of graph edges can approach a point of 5 from each of its outgoing generalized valleys (Lemma 5.5.6). If the pair approach from 81 an outgoing generalized valley that is not the drain, then the cwp function uses the same point of V as the endpoint of the edges, so the face is closed at p. If the pair approach from the drain of p then the cwp function assigns UPQ as the endpoint of the most-counterclockwise edge and upk as the endpoint of the most-clockwise edge. The two edges close a face only if k = 0, which implies that there are no incoming generalized valleys at p and that p is a peak.\u2022 Lemma 5.5.6 At a point p in a TIN, at most one path of steepest ascent can reach p along each of its outgoing generalized valleys. Proof: Proof by contradiction. Suppose that two paths of steepest ascent A and B reach p along the same outgoing generalized valley. Then A and B follow one another as they leave p. Let q be the first point along the reversal of A and B where A and B differ. Since both paths are paths of steepest ascent, q must have more than one outgoing valleys. Any point with more than one outgoing valley, whether a TIN point or a point on a ridge edge, stops paths'of steepest ascent, so both A and B should have stopped at q. This contradicts A and B ever reaching point p. \u2022 Lemma 5.5.7 The watershed graph for a nasty terrain is planar. Proof: The planar embedding of the watershed graph for nasty terrains is identical to the embedding for normal terrains of Lemma 5.5.4. Lemma 5.5.6 shows that only one pair of graph edges can ever reach a point along an outgoing generalized valley. If tPj2i+\\ and \u00a3p,2;+2 reach a common graph point q then embedding \u00a3p,2\u00bb'+i counterclockwise from tp,2i+2 prevents the edges from crossing one another. The order of the edges at their common upper point is opposite from the order at their common lower point; edge \u00a32;+i is clockwise from edge i2j'+2 at their lower endpoint and is counterclockwise from edge \u00a32;+2 at their upper endpoint. With this embedding, the watershed graph remains planar. \u2022 5.5.4 Watersheds of A r b i t r a r y Terrain Points The watershed graphs of Sections 5.5.1 to 5.5.3 concentrate on the watersheds for pits. More often we want to find the watersheds of other points, such as points on rivers. A simple extension of the watershed graphs for pits allows us to find the watersheds of arbitrary points. 82 The edges inside each face of the watershed graph create a monotonicity property for wa-terflows within the face. \u2022 Let \/ be a face of a watershed graph that contains a pit p. \u2022 Let Cf : [0,1) \u2014\u00bb R 2 be the parameterised boundary of \/ in the watershed graph. \u2022 Let in: f \u2014>\u2022 [0, 2n) be the angle at which water placed at a point of \/ enters p. \u2022 Assume that q = C(0) is the most-clockwise point around C for which in(q) \u2014 0. From these definitions, Lemma 5.5.8 establishes the monotonicity property for the embedded wa-tershed graph. Lemma 5.5.8 The internal edges of the watershed graph faces guarantee that if s and t are points of f where s is in the path of steepest descent from a point s' = Cf(so) and t is in the path of steepest descent from a point c t' = Cf(t'o) and so < to then in(s)