UBC Theses and Dissertations
Automatic determination of puck possession and location in broadcast hockey video Duan, Xin 2011

Full Text

Automatic Determination of Puck Possession and Location in Broadcast Hockey Video

by

Xin Duan

B.S. (Computer Science), Zhejiang University, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Computer Science)

The University of British Columbia (Vancouver)

August 2011

© Xin Duan, 2011

Abstract

The goal is to automatically determine puck possession and location in broadcast hockey video. The task is challenging for several reasons. The puck itself is small, moves rapidly and lacks distinctive local visual features. Players also move and change direction rapidly. Interactions among players on the same team and with players on the opposing team are complex. This thesis demonstrates an automatic system composed of two modules that explore the task from different perspectives. The system takes as input player tracking results that have been registered to a geometric model of the ice surface under a given homography. It analyzes video sequences according to the user-specified module and generates graphical annotation and stored data as output. In the first module, puck candidates are detected directly. An innovative hierarchical graph-based method is used to track puck transitions between players. Based on these puck transitions, a puck state engine is created to determine puck possession and to select the most likely puck trajectory. In the second module, a dense player motion field is constructed to estimate puck location and possession according to motion convergence points. This work is a first attempt to extend automated hockey video analysis to include puck detection and puck possession. It stands as a proof of concept system for the broad area of team sports analysis.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 Motivation
  1.2 Problem Definition
    1.2.1 Data
    1.2.2 Scope
  1.3 Sub-Problems and Vision Challenges
    1.3.1 Puck Tracking
    1.3.2 Motion Field Analysis
    1.3.3 Puck Possession
  1.4 Annotation System
  1.5 Thesis Outline
2 Related Work
  2.1 About Possession
    2.1.1 Sport View
    2.1.2 Ball Possession Automation
  2.2 UBC Hockey Analysis System
    2.2.1 Player Tracking System
    2.2.2 Rink Rectification System
    2.2.3 Play Annotation System
  2.3 Puck/Ball Tracking
    2.3.1 Tracking the Puck
    2.3.2 Tracking the Ball
  2.4 Motion Field
3 System Overview
4 Motion-Based Puck Tracking
  4.1 Candidates Detection and Filtering
    4.1.1 Preliminary Blobs Extraction
    4.1.2 Domain Knowledge Based Filtering
  4.2 Hierarchical Graph-Based Puck Tracking
    4.2.1 Basic Idea
    4.2.2 First-Level Graph Construction
    4.2.3 Second-Level Graph Construction
    4.2.4 Trajectory Candidates Estimation
  4.3 Puck Trajectory and Possession
    4.3.1 Puck State Engine
    4.3.2 Dynamic Annotation and Analysis
5 Motion Field Analysis
  5.1 Dense Motion Field Construction
    5.1.1 Motion Field Window
    5.1.2 Motion Definition
    5.1.3 Radial Basis Function Interpolation
  5.2 Play Evolution and Puck Location Prediction
    5.2.1 Measurement Propagation
    5.2.2 Convergence Point Detection
  5.3 Team Possession Estimation
6 Results
  6.1 Ground Truth
  6.2 Experiments and Error Measurements
    6.2.1 Puck Possession
    6.2.2 Puck Tracking
    6.2.3 Motion Field Analysis
  6.3 System Performance
    6.3.1 Puck Tracking Module
    6.3.2 Motion Field Module
7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
    7.2.1 Video Frame Classification
    7.2.2 Multiple Cameras
    7.2.3 Puck Candidates Detection
    7.2.4 Puck Tracking
    7.2.5 Stability when Real Trajectory is not Detected
    7.2.6 Puck Possession Reasoning
    7.2.7 Motion Field Analysis
    7.2.8 Offline, Online and Real-time
Bibliography

List of Tables

Table 4.1 Transition rules definition for PSE
Table 6.1 Puck player possession results for 1000 frames
Table 6.2 Puck team possession for 1000 frames
Table 6.3 Puck possession error measurements for 1000 frames
Table 6.4 Error measurements for puck transition trajectories
Table 6.5 Puck transition trajectories vs. full trajectory
Table 6.6 Puck team possession for 1000 frames by motion field
Table 6.7 Preprocessing stages rate for 1000 frames test sequence
Table 6.8 Annotation system stages rate for 1000 frames test sequence
Table 6.9 Motion field speed for window size 106x115

List of Figures

Figure 1.1 Single puck trajectory
Figure 1.2 GUI for puck tracking module
Figure 1.3 GUI for motion field module
Figure 3.1 System overview
Figure 3.2 Module A: puck tracking
Figure 3.3 Module B: motion field
Figure 4.1 DoG filter with global threshold
Figure 4.2 Adaptive thresholding
Figure 4.3 Connected components labeling connectivity
Figure 4.4 Size and shape filter
Figure 4.5 NHL official rink model
Figure 4.6 Weighted directed acyclic graph
Figure 4.7 Puck state engine
Figure 5.1 Motion field
Figure 6.1 Puck possession visualization interface
Figure 6.2 Example of puck transition trajectory
Figure 6.3 Puck transition trajectory example error measurement
Figure 6.4 Final puck trajectories i
Figure 6.5 Final puck trajectories ii
Figure 6.6 Interpolated puck final trajectory and ground truth
Figure 6.7 Full puck trajectory and ground truth comparison
Figure 6.8 Motion field play evolution and puck location prediction i
Figure 6.9 Motion field play evolution and puck location prediction ii
Figure 6.10 Motion field puck team possession annotation

Acknowledgments

I express my gratitude to my supervisor Prof. Woodham for his advice, support, inspiration, encouragement, and insightful discussion throughout this research work. Thanks to Prof. Little for being the second thesis reader and sharing valuable suggestions. I would like to thank the many colleagues in the Laboratory of Computational Intelligence who had meaningful discussions with me from both the research and software perspectives, specifically Kenji Okuma and Wei-Lwun Lu, who provided the player tracking results, Ankur Gupta, who generated the homography, and Tomas Hofmann, who shared initial ideas about trajectory scoring. I would also like to thank Dr. Kihwan Kim from the Georgia Institute of Technology for discussion about details of motion field analysis. Thanks to the Department of Kinesiology and Physical Education at McGill University for providing the high quality HD video data. Thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting this work. Finally, I wish to thank my parents, who raised me, love me and always support me. Thanks to my love, who proofread this thesis. To them I dedicate this thesis.

Chapter 1

Introduction

1.1 Motivation

Sports video analysis is a topic of interest in current computer vision research. The methods developed are expected to have wide application beyond the area of sports. Automatic analysis can help to enhance broadcast quality as well as assist coaches and players to improve play strategy. Moreover, it provides intuitive methods for solving general vision problems such as detection and tracking, indexing, object-based encoding and video enhancement. This work focuses on multiplayer team sports analysis, particularly hockey. Due to the complex dynamic scenes involved, automatic analysis is quite challenging.
Plenty of research explores sports analysis from a traditional vision perspective, including detecting and tracking the players, estimating the homography for coordinate system rectification, content-based camera calibration, and play event detection. Beyond such analysis, understanding complex dynamic scenes from both a vision and a sport perspective motivates this work to explore the automatic generation of puck possession and location information in broadcast hockey video. Interest is also motivated by sport concepts that matter to fans, coaches and players alike. Besides hockey, sports such as basketball and soccer share similar concepts.

• Offence-Defence: terms crucial for planning sport strategy. Distinguishing them depends on team puck possession.

• Possession: a term that indicates control of the play. Puck possession in hockey includes team possession and player possession. Both aggregate and temporal data are required.

  In hockey, the team that demonstrates the greatest control and movement of the puck typically is the most successful. — The Art of Puck Possession [3]

• Play Evolution: a general term describing where the play is going. In many cases, it predicts where the puck will be. It can be used to estimate puck location and possession and thus can potentially be used for camera control.

  A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be. — Wayne Gretzky

In most cases, people can distinguish the offensive team from the defensive team and tell which team or player possesses the puck during a hockey broadcast. However, automatically determining this information is challenging. Hockey may be the hardest of all the multiplayer field sports for this task owing to the speed and continuous flow of play. We are not aware of any previous work to automate the determination of puck possession and location in hockey.
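The distinction between aggregate and temporal possession data can be made concrete with a small bookkeeping sketch. This is an illustration only, not the thesis system: the event format, the `possession_summary` function and the team/player labels are invented for the example, and possession is credited by the last-touch convention discussed later in Section 1.3.3.

```python
from collections import defaultdict

def possession_summary(events, end_frame):
    """Aggregate per-team and per-player possession time from touch events.

    events: list of (frame, team, player) tuples, sorted by frame number.
    Possession between consecutive touches is credited to the last toucher;
    the final touch is credited until end_frame.
    Returns (team_frames, player_frames) dicts of frame counts.
    """
    team_frames = defaultdict(int)
    player_frames = defaultdict(int)
    for (frame, team, player), nxt in zip(events,
                                          events[1:] + [(end_frame, None, None)]):
        duration = nxt[0] - frame
        team_frames[team] += duration
        player_frames[player] += duration
    return dict(team_frames), dict(player_frames)

# Toy sequence: team A holds the puck for 40 frames, then team B for 60.
events = [(0, "A", "A-17"), (25, "A", "A-9"), (40, "B", "B-4")]
teams, players = possession_summary(events, end_frame=100)
print(teams)    # {'A': 40, 'B': 60}
print(players)  # {'A-17': 25, 'A-9': 15, 'B-4': 60}
```

The temporal record (the event list itself) answers "who has the puck now?", while the aggregated counts answer "who controlled the play overall?" — both views the bullet above asks for.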
1.2 Problem Definition

The impact of puck possession and location on ice hockey strategy has been articulated by Thomas from a purely sports and statistical perspective [43]. Given today's computer vision technology and the large volume of hockey video data, the potential exists to determine puck possession and location automatically. Given a hockey broadcast video sequence, the problem is stated as follows:

1. Automatically determine puck possession for teams.
2. Automatically determine puck possession for players.
3. Automatically track (when the puck is visible) or infer (when the puck is not visible) the puck location to determine the puck trajectory.
4. Explore the possibility of processing the data on-line, in addition to off-line, after-the-fact analysis.

1.2.1 Data

We use High Definition (HD) broadcast video sequences from National Hockey League (NHL) games obtained from the Department of Kinesiology and Physical Education at McGill University. The video sequences were recorded by a single HD camera mounted approximately over the extended center line. The camera is allowed to pan, tilt and zoom. We use subsets of a full game video. In particular, we use a 1000 frame sequence as the major test sequence for our experiments. The data resolution is 1920 x 970. Our experiments were run on the raw data sequence.

1.2.2 Scope

Here, we discuss the scope of the thesis problem and identify components of the overall system provided by others. Assumptions about the input to our system are:

• Video Data: We assume that the geometric transformation, both between frames and of each frame to the rink model, is a homography. Moreover, we assume our test sequences have no shot boundaries and are sampled evenly in time. Our methods rely on temporal coherence.

• Player Tracking: Player tracking results are pre-generated by Okuma et al. for the test video sequences [32].
Results include the players' bounding box coordinates in the image coordinate system for each frame, labeled by player ID and team. Player tracking is reliable enough for our system, with only a few small geometric and player identification errors.

• Homography: The homographies for each frame that transform between image coordinates and the geometric rink model are pre-generated, based on the work of Gupta et al. [11] [12]. There are occasional frames with registration errors; however, our system is robust to them.

1.3 Sub-Problems and Vision Challenges

To achieve our goal of determining puck possession and location, two ideas are exploited in two separate modules of our system: 1) detect and track the puck location whenever possible, and assign puck possession based on puck location and player trajectories; 2) create a representation for patterns of player motion, and infer which team possesses the puck and where the puck will be. Each of the two modules relates to other active vision research. We identify three sub-problems and discuss their challenges.

1.3.1 Puck Tracking

Research on ball tracking in soccer has been published [16] [35] [49]. However, to the author's knowledge, there are no previous publications about puck tracking in hockey. In general, detecting and tracking the puck raises challenges including erroneous detection, occlusion, and real-time requirements. Specific challenges are:

• The puck is small (25 mm thick, 76 mm diameter) and lacks distinctive local features.
• The puck moves fast (it can reach 170 km/h) and is frequently blurred within the exposure time of a single video frame.
• Fast motion causes apparent puck intensity, shape and size to vary across frames.
• Lighting conditions influence apparent puck intensity in different parts of the rink.
• The puck can be hidden from view (i.e., occluded by a player, by the blade of a stick, or otherwise not in the camera's field of view).
• Puck-like backgrounds (i.e., logos, advertisements, lines, shadows) on the ice may lead to erroneous or missed detection.
• The rink is relatively large (61 meters x 30 meters).
• The puck trajectory is sensitive to camera motion and perspective distortion.
• There are occasional frames with registration errors.

There are three general approaches to object tracking: feature-based, model-based, and motion-based. Feature-based tracking uses features to detect objects of interest within each single frame by methods such as SIFT [27]. Model-based tracking adds semantic analysis and domain knowledge on top of feature-based tracking to increase accuracy and robustness; this has been used for vehicle and human tracking [18]. Motion-based tracking uses temporal consistency and physical motion assumptions over multiple frames to track moving objects.

In hockey video the puck is small and lacks distinctive local features. Simple features (i.e., size, shape, intensity) vary greatly across frames. Moreover, there are many other puck-like features in view that are hard to distinguish from the real puck purely on appearance. In our experience, feature-based and model-based tracking alone cannot reliably track the puck. Thus, we include temporal and spatial information, so that motion-based tracking becomes the kernel of our puck tracking algorithm. Details are in Chapter 4.

If the sequence contains only a single pass between two players, our motion-based puck tracking is expected to generate a reliable puck trajectory; see Figure 1.1. However, longer video sequences contain not only visible puck transitions but also ones that are not visible. Moreover, puck-like objects have a greater chance of creating false alarm trajectories. Thus, generating puck trajectories that represent all the actual puck transitions between players in a long video sequence remains a challenge.
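The core idea behind motion-based tracking — using temporal consistency and a physical motion assumption to link detections across frames — can be sketched with a minimal constant-velocity linker. This is not the hierarchical graph-based method of Chapter 4; it is a simplified illustration, and the function name, gate size and coasting behaviour are all assumptions of the sketch.

```python
import math

def extend_tracks(tracks, detections, gate=8.0):
    """Greedy constant-velocity data association for one new frame.

    tracks:     list of tracks, each a list of (x, y) positions, one per frame.
    detections: list of (x, y) puck-candidate positions in the new frame.
    Each track predicts its next position by linear extrapolation and claims
    the nearest unused detection within `gate` units of the prediction; a
    track with no match is padded with the prediction (a simple coast step).
    """
    used = set()
    for track in tracks:
        if len(track) >= 2:
            (x0, y0), (x1, y1) = track[-2], track[-1]
            pred = (2 * x1 - x0, 2 * y1 - y0)   # constant-velocity prediction
        else:
            pred = track[-1]                     # no velocity yet: stay put
        best, best_d = None, gate
        for i, (x, y) in enumerate(detections):
            if i in used:
                continue
            d = math.hypot(x - pred[0], y - pred[1])
            if d < best_d:
                best, best_d = i, d
        if best is None:
            track.append(pred)                   # no detection inside the gate
        else:
            used.add(best)
            track.append(detections[best])
    return tracks

# A puck moving +5 per frame in x picks up the nearby detection (21, 0.5)
# and ignores the distant clutter point (60, 40).
tracks = [[(10.0, 0.0), (15.0, 0.0)]]
extend_tracks(tracks, [(60.0, 40.0), (21.0, 0.5)])
print(tracks[0][-1])  # (21.0, 0.5)
```

The gating step is what lets motion continuity reject puck-like clutter that appearance alone cannot, which is exactly why motion-based tracking is the kernel of the approach described above.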
1.3.2 Motion Field Analysis

We make our best effort to track the puck directly in order to determine puck location and possession information. However, our motion-based puck tracking is not applicable when the puck is not visible for a long period. Sometimes, we only care about the approximate puck region, play evolution and team puck possession.

Figure 1.1: Single puck trajectory. The puck trajectory shown here is overlaid in green. It represents a single pass between two players. Red dots are other puck candidates. The trajectory plotted on this single frame includes past and future information and is curved owing to camera motion.

Estimating team possession and predicting the possible puck region based on player motion flow has one major benefit: it is applicable to any frame, as long as the frame contains player motion. A downside of this method is low precision compared to direct puck tracking. Our method is based on the assumption that overall player motion in hockey converges either to the current puck location or to where the puck is likely to go. This assumption holds in many cases in multiplayer field sports. However, it does not always hold when less skilled players make a mistake or when skilled teams run plays involving deception. Our goal in motion field analysis is to infer puck location and possession, as best we can, based on motion convergence.

The idea is based on the method in [17], which first extracts the player motion and projects it onto world coordinates, then generates a dense motion field by interpolation, and finally calculates motion flow convergence points to predict ball location and play evolution. The particular challenges with broadcast hockey video are:

• A single camera view shows only part of the rink and some of the players. Visible player motion data is insufficient to provide good interpolation results for the full rink.
• As the camera moves, the motion field needs to be generated dynamically.
• In hockey player tracking, high frequency oscillation in position estimation may influence the dense motion field at its boundaries, due to the relative size of the motion field and the players. Moreover, rink homography estimation is not ideal, so velocity oscillation is inevitable. This increases the difficulty of accurate prediction. Temporal smoothing may be required to maintain stability.
• Hockey strategy is dynamic and evolving, especially at high skill levels. Multiple convergence points are to be expected. The trade-off between global and local propagation needs to be balanced, and the potential benefit of multiple convergence points needs to be explored.
• The puck can move and change direction rapidly, while players have noticeable response time. This needs to be considered when estimating player velocity.
• Real-time processing and visualization also create challenges.

1.3.3 Puck Possession

Puck possession has great impact on hockey strategy [43]. Manual labeling is not practical, especially when dealing with a large amount of video data. Automatic determination of puck possession is of value and is the novel contribution of this thesis.

Consider the definition of puck possession in hockey. In the NHL official rule book [21], puck possession is defined as: the state of a player, other than a goaltender, who is the last one to have come in contact with the puck. The idea for possession determination is accordingly quite straightforward. Assuming there are reliable puck tracking results as well as player tracking results for each frame, possession can be determined according to the last touch of a particular player with the puck. Then, what challenges are associated with determining puck possession?

• A status trigger mechanism is required to update puck possession.
• Player possession analysis is harder than team possession analysis. How can one maintain high accuracy for both?
• We explore the online processing method based on player motion field analysis.
It is hard to accurately determine player possession this way; whether it is possible to determine team possession is an open question.
• Sometimes, players from opposing teams cluster in an area fighting for the puck. It is difficult to distinguish who possesses the puck in these cases.

1.4 Annotation System

This section describes the system graphical user interface (GUI). The implementation environment is mainly Python, with OpenCV and some Matlab code. The system extends the annotation software API created by Okuma et al. [32]. We use their basic functionality, such as video loading and display, and create two new layers corresponding to our two modules. After loading the player tracking results and the homography data, the user selects the module to use for processing:

• The puck tracking layer automatically tracks the puck when there are visible puck transitions between players. It also assigns puck possession and determines the full puck trajectory. As shown in Figure 1.2, the red circle shows puck location and green lines show the puck trajectories. The interpolated full puck trajectory is kept up to date and displayed in the rink model. The rink model is located at the top left corner and displays player trajectories as well. The text box at the top right corner shows puck possession information for both teams and players, as well as system status. Player possession is labelled in red text beside the player and by a black circle in the rink model.

Figure 1.2: GUI for puck tracking module. Top left corner is the rink model. Green lines are puck trajectories. Red circle is puck location. Top right corner is the puck possession information box. Player possession is also labelled beside the player.

• The motion field layer is processed online to construct a dynamic dense player motion field for each frame. As shown in Figure 1.3, the red rectangle on the rink model shows the motion field window. Green arrows represent motion flow.
Convergence points are displayed as red circles, which represent possible play evolution and puck location.

Figure 1.3: GUI for motion field module. Top left corner is the rink model. The red rectangle shows the motion field window. Green arrows show motion flow. Red circles show the possible play evolution and puck location. Top right corner is the puck possession information box.

This chapter has provided an overview of the system. In addition to the problem definition, the GUI visualization makes the task clear and transparent. From Section 1.3, we know that there are challenges. Upcoming chapters describe the system architecture and the methods developed for the two modules. The next section provides the outline.

1.5 Thesis Outline

The thesis is organized as follows. Related work is discussed in Chapter 2; previous research related to the sub-problems in Section 1.3 is summarized from both a sport and a vision research perspective. Chapter 3 presents an overview of our system architecture and the relationship between the two system modules. Chapter 4 describes the first system module, which determines puck possession and trajectory by tracking the puck transitions between players. This chapter describes puck candidate detection, hierarchical motion-based tracking, and the determination of puck possession for teams and players based on a group of puck transition trajectories. Chapter 5 describes the second system module, which estimates play evolution, possible puck regions and team possession via motion field analysis. This chapter introduces the basic theory of dense motion field construction, measurement propagation and convergence point detection. Improvements for the particular challenges of hockey are presented, including a dynamic motion field, local propagation, temporal smoothing and puck team possession estimation. Chapter 6 presents results of our system and compares them with ground truth. Both system accuracy and performance are considered.
Finally, Chapter 7 summarizes our contributions and limitations, and discusses future work.

Chapter 2

Related Work

This chapter reviews related work. Section 2.1 considers research on the impact of puck/ball possession and location in multiplayer field sports from a purely sport perspective; research on automatic ball possession analysis is also considered. One goal is to build a hockey video analysis system for semantic annotation. Section 2.2 therefore surveys some previous hockey analysis systems for player tracking, rink rectification and play reasoning. Corresponding to the two modules of our system, puck tracking and motion field analysis, Section 2.3 reviews puck/ball tracking methods and Section 2.4 discusses motion field construction and critical point detection.

2.1 About Possession

In most multiplayer field sports (i.e., hockey, basketball and soccer), puck/ball possession and location are of interest to fans, coaches and players alike. Several studies consider the relationship between ball/puck possession and sports strategy from a sport and mathematical view; see Section 2.1.1. To the author's knowledge, there is no previous work to automatically generate puck possession and location for hockey. However, there is literature on automatic analysis of ball possession in soccer; see Section 2.1.2.

2.1.1 Sport View

Thomas considers the impact of puck possession and location on hockey strategy [43]. A state space capturing team possession and puck location is created and used to model the game as a Semi-Markov process. The model, in turn, is used to determine the average number of goals scored by a team as a function of the starting state. The model is also capable of producing strategic estimates of the trade-off between keeping puck possession and giving up puck possession to improve location.
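As a toy illustration of the kind of computation such a possession/location state space supports, consider a small absorbing Markov chain over invented play states. The states, transition probabilities and scoring probabilities below are made up for illustration and are not taken from Thomas [43]; the point is only that expected scoring outcomes can be read off as a function of the starting state.

```python
import numpy as np

# Transient states: 0 = "we possess, own zone",
#                   1 = "we possess, offensive zone",
#                   2 = "opponent possesses".
# Absorbing outcomes: (goal for us, goal against us).
# All probabilities below are invented for illustration only.
Q = np.array([[0.50, 0.30, 0.20],     # transitions among transient states
              [0.10, 0.55, 0.30],
              [0.25, 0.05, 0.65]])
R = np.array([[0.000, 0.000],         # transient -> (goal for, goal against)
              [0.050, 0.000],
              [0.000, 0.050]])
# Fundamental matrix: expected visits to each transient state.
N = np.linalg.inv(np.eye(3) - Q)
# Probability of each absorbing outcome from each starting state.
B = N @ R
for name, row in zip(["own-zone possession", "offensive-zone possession",
                      "opponent possession"], B):
    print(f"{name}: P(goal for) = {row[0]:.3f}, P(goal against) = {row[1]:.3f}")
```

With these invented numbers, offensive-zone possession yields a higher probability of scoring first than opponent possession, which is the sort of starting-state comparison the cited model makes rigorous.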
Moreover, Rollins challenges the traditional wisdom of always sacrificing puck control for constant forward motion, and attempts to quantify the length of possessions in relation to scoring chances [38]. As a player and a fan, Bruce discusses the difficulty and importance of maintaining puck possession [3].

Similar interest exists in other multiplayer field sports, especially soccer. Jones et al. point out that the ability to retain possession of the ball is linked to success. Investigating 24 matches from the English Premier League 2001-2002 season, they conclude that successful teams had significantly longer possessions than unsuccessful teams, irrespective of match status [15]. Similarly, O'Shaughnessy analyzes championship data from the Australian Football League 2004-2005 season and proposes to make optimal use of ball possession data by looking at the choices faced by the player in possession that maximize the likelihood of success [33]. From another perspective, Lago-Penas and Dellal examine 380 matches from the Spanish League First Division 2008-2009 season and show, based on linear regression analysis, that team possession is influenced by situation variables related to team status (i.e., winning or losing, home or away). Their results suggest that the best teams maintain a higher percentage of ball possession, with situation variables having only a small influence on it [20].

In basketball, temporal ball location and individual player possession may be more valuable for play strategy than aggregate possession time, due to some specific rules (e.g., the 24 second shot clock in FIBA rules). Kubatko et al. review the basics of basketball analysis and shed light on the relationship between possession and traditional boxscore statistics. They estimate individual possession usage and analyze individual efficiency [19].
2.1.2 Ball Possession Automation

There is no previous publication on automatically generating puck possession and location information for hockey. However, there are examples of team ball possession analysis for soccer. Yu et al. present a framework to automatically analyze team possession from broadcast soccer video [48]. Their framework takes time series curves of ball position as system input. Local extrema of the ball velocity curve and acceleration starting points are extracted, and ball and player locations are also used to infer the touch points. Analysis is in 2D image coordinates. The color histogram of the player who touches the ball is used to recognize his team through a Support Vector Machine. In contrast, our system computes puck trajectories automatically instead of assuming them as input. In hockey, puck motion typically is piecewise linear with constant velocity, and the puck is not always visible, so local extrema of puck velocity are not an appropriate cue.

Another novel approach is proposed by Pallavi et al. to identify team possession in soccer [34]. They segment soccer video frames by Markov Random Field (MRF) clustering, classify players by colour-texture features, and detect players by a combination of static analysis, optical flow analysis and difference image analysis. They track the ball using a circle Hough transform, which works for them since, in soccer, the ball is big enough and slow enough to be detected as a circular region in each frame.

2.2 UBC Hockey Analysis System

In this section, we describe the UBC hockey analysis system, including player tracking in Section 2.2.1, rink rectification and homography estimation in Section 2.2.2, and high-level play annotation in Section 2.2.3. This system is the starting point for the work described in this thesis.

2.2.1 Player Tracking System

Hockey player tracking is hard because motion is fast, player pose is varied, and player interactions are complex.
Further, broadcast coverage includes camera motion and therefore a changing background. Okuma et al. automatically analyze hockey video to estimate parameters of the broadcast cameras, to track hockey players and to display the resulting trajectories [30] [32]. To compensate for camera motion, they first compute homographies using the Kanade-Lucas-Tomasi (KLT) tracker [44] [40], RANSAC [9], and the normalized Direct Linear Transformation (DLT) algorithm [13]. To track hockey players, they combine AdaBoost for object detection with a boosted particle filter (BPF) for multiple-object tracking in dynamic models. Most players are successfully tracked, even when leaving and entering the camera's field of view. Our current system expects, as input, player 2D locations and bounding boxes. Lu et al. automatically track multiple hockey players and simultaneously recognize player actions [28]. They represent the players with Histograms of Oriented Gradients (HOG), and use Switching Probabilistic Principal Component Analysis (SPPCA) to model the appearance of the players by a mixture of local subspaces. The boosted particle filter (BPF) in [30] also is augmented here with a more robust observation model. Player action is recognized by combining the HOG descriptors with a Sparse Multinomial Logistic Regression (SMLR) classifier.

2.2.2  Rink Rectification System

Another input to our system is the set of frame-by-frame homographies required for rink rectification. Okuma et al. automatically compute homographies to compensate for camera pan, tilt and zoom [31]. Local displacements of image features are first computed by the KLT tracker. These features are selected by RANSAC and an initial estimate of the homography is generated. The system is capable of analyzing long consecutive sequences of NHL hockey video and of mapping each frame into rink coordinates. 
Occlusion due to players, camera motion and the lack of reliable keypoints in some camera views all pose challenges for robustness and accuracy. Gupta et al. use line and ellipse features in addition to key-point matches to estimate the homographies [11] [12]. Domain-specific knowledge of the rink's lines and circles incorporates additional geometric features to make the estimation more robust. The system is used to process a sequence of 1000 frames, without error accumulation.

2.2.3  Play Annotation System

In this section, we review the hockey high-level semantic data annotation system of Li and Woodham. It is a proof of concept system to represent and reason about hockey play [24]. The system takes rectified player tracking data as input. The raw motion trajectory data is manually augmented to include knowledge of forward/backward skating, puck possession and specific player attributes (i.e., shoots left, shoots right). A Finite State Machine (FSM) models the state and possible player actions in specific play situations. Some state transitions correspond to specific player actions (e.g., pass, shoot) and have an associated Event Evaluation Function (EEF) to assign a score based on spatial-temporal play evaluation. The system describes what happened in each situation, assesses the outcome, estimates when and where key play choices were made, and attempts to predict whether better alternatives were available. A textual natural language description and simple 2D graphic animation is provided. The system focuses on high-level reasoning associated with hockey strategy, assuming puck possession data as input. Determining puck possession and location automatically represents a significant system enhancement.

2.3  Puck/Ball Tracking

In this section, we review other work related to puck/ball tracking in sport video. Section 2.3.1 describes a previous approach to puck tracking not based on computer vision. 
Section 2.3.2 describes previous work on ball tracking in soccer.

2.3.1  Tracking the Puck

Due to the perception that the puck would be difficult to follow during televised games, the Fox TV network introduced their FoxTrax system. Cavallaro describes how the FoxTrax project started and the technology used [5]. FoxTrax uses a special puck with embedded electronics. Their puck behaves the same as a regular puck but contains a shock sensor and infrared emitters. During the broadcast, infrared pulses emitted by the puck are detected by 20 pulse detectors and 10 modified IR cameras located in the rink rafters. Shuttering of the IR cameras is synchronized to the pulses. Coordinate data are first locally processed by each IR camera and then transmitted to the Puck Truck, which contains computers to superimpose additional graphics to enhance puck visibility in the video output. The electronic devices allow the puck to be tracked robustly in real time, but at high relative cost when the puck leaves the rink surface and is not retrieved.

2.3.2  Tracking the Ball

Previous work on ball tracking in soccer is relevant. We first look at methods to detect the ball separately in each frame. These methods typically involve online processing and could, in principle, be real-time. A second category of methods incorporates temporal information using constraints on motion consistency or trajectory likelihood. These methods typically involve offline processing (i.e., they are not causal) but can be more robust. Finally, we briefly review some other relevant work on ball tracking.

Feature Based Tracking

Compared to a puck, a soccer ball has more distinctive visual features, so feature-based ball tracking methods are feasible. Ancona et al. use a Support Vector Machine classifier, trained on views of the soccer ball, to detect the ball during goal events in a soccer game [1]. 
The camera is mounted on the extended goal line and the classifier is trained specifically to detect the ball when a goal is scored. For this problem, the ball only needs to be detected in a fixed region. D'Orazio et al. use the circle Hough transform as a quick detector of ball candidates, which are then evaluated by a new neural classifier [8]. Since the Hough transform is fast, this makes the computation tractable. Similarly, Leo et al. use the circle Hough transform but combine it with SIFT [27] instead of a neural classifier [23]. Frame by frame puck detection is problematic. But, a feature based detection approach is useful to suggest puck candidates.

Motion/Trajectory Based Tracking

In this section, we review ball tracking based on trajectory or motion consistency. Yu et al. pioneered this approach [49]. They first detect ball candidates, which are further filtered based on features (i.e., size, colour) and used to generate candidate trajectories. The trajectories are improved by filtering and filling the gaps, then selected iteratively by a scoring mechanism. We can review these steps separately.

Candidates Detection.  For ball candidates detection, background subtraction is widely used. The first step is play field extraction by methods based on, for example, mean colour value [46], or on colour histograms [14]. The second step is non-ball region removal by methods such as connected area analysis [46], coarse-to-fine filtering by size, shape and colour [14], player tracking elimination [6], and rule-based candidates detection [36].

Tracking.  For ball tracking, three categories of methods are summarized.

• Graph Based  Pallavi et al. first segment the video into distinct camera shots by zoom level [35]. Short camera shots, normally close-ups, are not considered. For medium camera shots, they simply filter the candidates by motion patterns to identify the ball. 
For long camera shots, a directed weighted graph is constructed for the ball candidates. Each node in the graph represents a candidate and each edge links candidates in a frame with the candidates in the next few consecutive frames. Then, dynamic programming is applied to find the longest path, which generates the actual ball trajectory.

• Kalman Filter  A Kalman filter can be used to predict the location of the ball in consecutive frames [39] [41] [36]. The iterative predictor-corrector nature of the Kalman filter can be helpful for tracking viewed as a dynamic problem. The Kalman filter assigns a likelihood to each candidate based on features (i.e., speed, direction, size, colour), which is used to determine whether the candidate is accepted in the next frame. However, when the ball is in the possession of a player for a long time, the typical Kalman filter fails and error accumulates. To overcome long-period occlusion, Kim and Kim consider a Dynamic Kalman Filter [16].

• Graph Based + Kalman Filter  The two approaches can be combined. Liang et al. use two alternating procedures [25]. For detection, a weighted graph is constructed with each candidate as a node and each edge linking two nodes in adjacent frames. A Viterbi algorithm extracts the optimal path as ball locations. For tracking, a Kalman filter based on template matching is used to track the ball, initialized by the detection results. In each tracking step, the ball location is verified to update the template and to guide ball re-detection. Tong et al. achieve robust and real-time ball tracking [45]. Each ball candidate is tracked by a Kalman filter in successive frames. This results in many initial trajectories, which thereafter are scored and filtered according to their length and relationship in time. 
Then, a distance graph is constructed in which each node represents an initial trajectory and each edge is weighted by the distance between two nodes. The Dijkstra algorithm is used to find the shortest path, yielding a collection of trajectories that do not overlap in time and that are then connected and smoothed by cubic spline interpolation.

Other Ball Tracking Approaches

Some other work in soccer ball tracking is worth mentioning. Liu et al. estimate 3D ball positions from monocular broadcast soccer video [26]. The ball is tracked by a combination of a Viterbi decoding algorithm and a Kalman filter. The method calibrates the camera's position in the stadium via a homography. Calibration is also done for camera zoom and rotation. The method can estimate the ball position in the air without referring to other reference objects with known height. Other work focuses on solving the extreme cases involving overlap and occlusion. Shimawaki et al. consider the case when the ball overlaps with players and lines, exploiting spatial-temporal relationships between players, lines, and the ball [42] [29]. Bennett et al. use spatio-temporal continuity and multiple-occupancy boxes to track the players and the ball in basketball videos [2].

2.4  Motion Field

This section reviews relevant work on motion field analysis to identify critical points in the flow field. Rao and Jain introduce a symbolic descriptor of texture orientation to detect critical points for fingerprint recognition [37]. Corpetti et al. describe a method to detect singular points from synthetic and real dense motion fields generated from optical flow in satellite cloud images [7]. They decompose dense flow into irrotational and solenoidal components and find local extrema of velocity potentials as critical points. Tong et al. detect critical points by decomposing a vector field into curl-free, divergence-free, and harmonic fields based on the Hodge-Helmholtz decomposition [47]. 
The method is stable but requires expensive computation. Linear movements and curved motion happen frequently in sports. Kim et al. describe an approach to detect the locations toward which play will evolve. They extract the ground-level sparse movement of players, generate a dense motion field and detect convergence points of the dense motion field [17].

Chapter 3

System Overview

This chapter presents an overview of our system. In order to analyze and annotate puck location and possession, two separate modules are included. See Figure 3.1. In the puck tracking module (see Figure 3.2), offline processing includes candidates detection, candidates filtering and puck tracking. This is used to generate the puck trajectory candidates. After one pass of online play simulation, transition trajectories are filtered and extended, and the puck possession and final interpolated puck trajectory are generated, which can then be used for annotation. In the motion field module (see Figure 3.3), online processing consists of motion field window computation, dense motion field construction, measurement propagation, convergence point detection and team possession reasoning. This predicts possible puck locations, play evolution and team possession. Puck tracking to generate accurate puck transition trajectories and puck possession is described in Chapter 4. Puck tracking uses both future and past information. This is processed offline (acausal) in the current implementation. We distinguish frame sequences where the puck is continuously visible from those where the puck is not visible from the current camera viewpoint. Player motion field analysis is described in Chapter 5. Motion field analysis has two complementary benefits: 1) It can be applied to any frame, whether or not the puck is visible, provided only that there is player motion. 2) It is estimated online (causal). 
The overall system includes data storage (i.e., log, database files), two processing modules and a GUI (see Section 1.4).

Figure 3.1: System overview. The system includes two major modules: Module A: Puck Tracking and Module B: Motion Field.

Figure 3.2: Module A: puck tracking.

Figure 3.3: Module B: motion field.

Chapter 4

Motion-Based Puck Tracking

This chapter describes the puck tracking module from which puck possession is inferred. Vision-based puck tracking creates challenges (see Section 1.3.1). Our tracking method incorporates both spatial and temporal information. 
In Section 4.1, we apply simple blob extraction and then eliminate blobs unlikely to be the puck using filters based on puck features and domain knowledge. This results in a collection of puck candidates. In Section 4.2, a hierarchical graph-based method is used to generate puck trajectory candidates which might represent puck transitions between players. In Section 4.3, a puck state engine is used to simulate puck state transitions for the entire sequence. This is used to determine puck possession and to rule out false alarm trajectories. Final puck trajectories are generated. Input is the raw video sequence, player tracking results and rink rectification homographies. Output is video annotation of puck possession for teams and players, puck locations and trajectories.

4.1  Candidates Detection and Filtering

Puck detection is challenging. Motion blur causes the shape, size and intensity of the puck to vary across frames. Lighting influences puck and rink intensity. The puck overlaps rink features with similar colour and scale (i.e., lines, logos, players, sticks, shadows). This both complicates detection of the actual puck and introduces additional puck-like candidates as false positives. In this section, we focus on detecting the puck when it is visible to the camera. Full occlusion is not considered here and is left to the tracking algorithms. The goals of this section are: • To detect the puck when it is visible. • To keep the number of false positives manageable. The detection results are called puck candidates. These are the input required in Section 4.2.

4.1.1  Preliminary Blobs Extraction

Puck candidates detection begins with feature detection using standard image processing techniques. Even in HD hockey broadcast video, the puck is small and lacks distinctive local features. 
Motion blur, changing lighting conditions and variation of the ice surface background all make standard methods such as template matching, the Hough circle transform and SIFT unsuitable. What can be said is that the puck is black and normally has low intensity. The puck's visual appearance is deformed due to motion blur and perspective. But, the width-to-height ratio of its bounding box, and its overall size, still are within known limits. We extract preliminary blobs based on intensity, size and shape for further processing.

Grey Scale Based Blob Detection

The puck normally has low intensity, especially when it is moving slowly. Simple grey scale thresholding would be the fastest and most straightforward approach. However, the apparent puck intensity does increase with motion blur and is affected by changes in local lighting conditions. Also, there are other features in the background with similar intensity to the puck (i.e., shadows, line markings on the ice, other logos, and brackets for the protective glass). Global thresholding will both lose the puck, when visible, and introduce many false positives. We detect based on local intensity gradient. Two methods are included in our system. They both work well in our experiments. We use the latter one to generate the results shown in Chapter 6.

Difference of Gaussian Filter with Global Threshold.  The Gaussian filter is widely used to smooth an image. A Difference of Gaussians (DoG) filter is used for edge detection. The following formula shows the centered two-dimensional case, where σ1, σ2 are the standard deviations (blur radii) of the two Gaussian distributions:

f(x, y, σ1, σ2) = (1 / (2π σ1²)) e^(−(x² + y²) / (2σ1²)) − (1 / (2π σ2²)) e^(−(x² + y²) / (2σ2²))    (4.1)

We use a DoG filter to detect blobs with center darker than their surrounding areas. We apply the DoG filter to the grey scale image (generated by the Matlab rgb2gray function) for each frame. Experimentally, we set the standard deviations to σ1 = 2 and σ2 = 1. 
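As a check on Equation 4.1, the DoG kernel can be sampled directly. The following is a minimal pure-Python sketch (the thesis implementation uses Matlab); with σ1 > σ2 the centre weight is negative and the surround positive, so convolution responds strongly to small regions darker than their surroundings, like the puck:

```python
import math

def dog(x, y, s1=2.0, s2=1.0):
    """Centered 2-D Difference of Gaussians, Equation 4.1."""
    g1 = math.exp(-(x * x + y * y) / (2 * s1 * s1)) / (2 * math.pi * s1 * s1)
    g2 = math.exp(-(x * x + y * y) / (2 * s2 * s2)) / (2 * math.pi * s2 * s2)
    return g1 - g2

# Sample the kernel on a small grid with the thesis settings s1=2, s2=1.
# kernel[4][4] (the centre) is negative; weights far from the centre are
# positive, giving a dark-blob detector after convolution.
kernel = [[dog(x, y) for x in range(-4, 5)] for y in range(-4, 5)]
```

The kernel would then be convolved with each grey scale frame; the convolution itself is done with the standard Gaussian-blur routines in the actual system.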
Our implementation does Gaussian blur in Matlab. The puck is almost always successfully highlighted when it is visible. But, also highlighted are other objects (i.e., player body parts, sticks, logos, lines). We threshold the DoG filter results to produce a binary image which sets the puck and other highlighted regions to 1, and background to 0. The threshold value is determined experimentally as 0.015 (within a normalized grey scale range [0, 1]). The threshold value will depend on the test sequence. Ideally, it is big enough to keep highlighted blobs close in size to the real objects and introduce few false positives. Ideally, it also is small enough not to split large connected components into small pieces, which could introduce more puck-like objects. Figure 4.1 shows the grey value image and the binary image generated after DoG filtering and thresholding. In this example, the puck shape and intensity are attenuated by motion blur. Our filter retains the puck pixels in the binary image generated.

Adaptive Thresholding.  Global thresholding uses a fixed threshold to separate expected foreground pixels from the background based on grey value difference. If the grey-level histogram has two well separated peaks, global thresholding can work well. However, it does not account for variations in puck intensity and the strong illumination gradients that do occur in broadcast video sequences. Adaptive thresholding determines a local threshold T(x, y) from the intensities in a local neighborhood of pixels by computing their weighted sum with a Gaussian kernel, then subtracting a constant C. Pixels with grey value smaller than T(x, y) are set to 1, otherwise 0. This highlights regions locally darker than their surroundings. In our experiment, we set the local window size S to 113 since in most cases

Figure 4.1: DoG filter with global threshold. Figure shows (a) the grey scale [0, 1] input image (generated by rgb2gray function in Matlab). 
(b) the binary image generated after DoG filtering and thresholding. The puck is moving. Motion blur attenuates intensity and shape. Blur radii are set to 2 and 1. Global threshold is 0.015. Red rectangle labels the puck.

the size of the puck and its surrounding area is smaller than this value in our test sequence. The constant C is empirically set as 0.1 (within a normalized grey scale range [0, 1]). Puck detection based on grey value alone can still fail when the puck crosses logos and other rink markings. Detection based on colour is a possibility. But, that is not done in the current implementation. In our implementation, if the actual puck is missed continuously for fewer than 3 frames, puck detection can be recovered by tracking, as discussed in Section 4.2. But, if puck detection fails for a longer sequence of frames, as can occur when the puck moves along the blue line, the entire trajectory may fail to be tracked. In our test sequence, if we use the blue channel, the puck can be distinguished from the blue line easily. We apply our filter to the blue channel. Figure 4.2 shows two groups of results, global and adaptive thresholding, using the grey scale image (generated by the Matlab rgb2gray function) and the blue channel image.

Connected Components Labeling

In order to extract blobs from the binary image generated by the above grey scale based blob detection, we use Connected Components Labeling (CCL) to find the connected regions and label the associated blobs. CCL is convenient for computing blob size and centroid. For images discretized on a square grid, connectivity can be defined as edge adjacent (4 connected) or as both edge and corner adjacent

Figure 4.2: Adaptive thresholding. Two groups of results. Left: (blue channel) (a) source image; (b) global thresholding (T = 0.5); (c) adaptive thresholding (S = 113, C = 0.1). 
Right: (adaptive thresholding S = 113, C = 0.1) (d) source image; (e) grey scale image (Matlab rgb2gray) as input; (f) blue channel as input. Red rectangle labels the puck.

(8 connected). See Figure 4.3. We use the bwlabel function in Matlab with 4 connectivity.

Size/Shape Based Filtering

Blobs extracted from the connected components analysis which are too big or too small are rejected. The thresholds for size filtering are determined according to the input image resolution and camera zoom. In our test sequence, camera zoom is relatively constant. The image resolution is 1920 x 970. However, the apparent

Figure 4.3: Connected components labeling connectivity. Blue pixels connect to P. (a) 4 connectivity (b) 8 connectivity

Figure 4.4: Size and shape filter. The actual puck, when visible, is never filtered out in our test sequence. (a) input binary image (b) size filter [30, 150] (c) shape filter (30 x 30 box)

puck size does vary due to motion blur, camera zoom and grey scale thresholding. We assign broad size thresholds. Blobs with area between 30 and 150 pixels are kept. These thresholds keep the actual visible puck and only miss it in 37 frames of our 1000 frame test sequence. But, different thresholds would be required for different camera resolutions and zoom. The apparent shape of the puck varies. We use 30 x 30 as a rough estimate of the maximum puck bounding box for our test sequence. Figure 4.4 shows the blobs after size and shape filtering. The number of remaining puck candidates varies across frames, but it is typically more than 100, sometimes many more.

4.1.2  Domain Knowledge Based Filtering

In Section 4.1.1, blobs are extracted. Thresholds are set so that the puck, when visible, is almost never eliminated. However, the consequence of wide thresholds is the inclusion of many false puck-like candidates. We apply a group of filters based on domain knowledge to filter out false candidates as much as possible. 
These filters typically reduce the number of candidates in each frame from more than 100 to fewer than 20, sometimes even to single digits. We briefly explain each filter.

Rink Image Knowledge

Surrounding Area Analysis.  When the puck is visible and moving on the ice, its surrounding area typically is also ice. Here, we care about the puck when it is a single object isolated from other objects. Surrounding Area Analysis (SAA) helps to eliminate candidates that are part of another object or caused by thresholds that split large objects into small pieces. Algorithm 4.1 shows details of surrounding area selection and examination for each frame. It is applied iteratively to the entire test sequence. We define pixels that have an absolute distance of two to the blob boundary as the circumference enclosing the blob, which, in turn, is used to select the surrounding pixels (pickSurroundingPixels in Algorithm 4.1). To achieve this, we use the connected components labeling results and compute the circumference using two expanding matrices Expand1, Expand2:

exp1 =
0 1 0
1 1 1
0 1 0

exp2 =
0 0 1 0 0
0 1 1 1 0
1 1 1 1 1
0 1 1 1 0
0 0 1 0 0
(4.2)

We identify pixels as ice based on intensity and colour (isPixelAcceptasIce in Algorithm 4.1). A blob is accepted as isolated if at least 90% of its surrounding pixels are identified as ice. Generally, ice has high intensity with known markings such as yellow and blue lines. We accept a pixel as ice if its intensity is at least 0.43 (within a normalized grey scale range [0, 1]), or if the average absolute difference of its RGB values from standard blue (0, 0, 1) or yellow (1, 1, 0) is smaller than 0.3. 
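The circumference construction and the ice test can be sketched in pure Python as follows. This is a minimal stand-in for the Matlab implementation: the cross/diamond shapes of `EXP1` and `EXP2` and the use of the RGB mean as the grey value are assumptions for illustration, chosen so that the selected ring sits at a city-block distance of exactly two from the blob:

```python
def dilate(mask, se):
    """Binary dilation of a 0/1 grid by a symmetric structuring element."""
    h, w, r = len(mask), len(mask[0]), len(se) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                se[dy + r][dx + r] and 0 <= y + dy < h and 0 <= x + dx < w
                and mask[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
    return out

# Assumed shapes: EXP1 reaches city-block distance 1, EXP2 distance 2.
EXP1 = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
EXP2 = [[0, 0, 1, 0, 0], [0, 1, 1, 1, 0], [1, 1, 1, 1, 1],
        [0, 1, 1, 1, 0], [0, 0, 1, 0, 0]]

def circumference(mask):
    """Pixels at distance two from the blob: inside the EXP2 dilation
    but outside the EXP1 dilation, i.e. (conv2 > 0) .* (conv2 == 0)."""
    d1, d2 = dilate(mask, EXP1), dilate(mask, EXP2)
    return [[1 if d2[y][x] and not d1[y][x] else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]

def is_ice(rgb):
    """Ice rule from the text: intensity >= 0.43, or mean |RGB - blue| or
    |RGB - yellow| < 0.3. The RGB mean as grey value is an assumption."""
    r, g, b = rgb
    avg_diff = lambda ref: (abs(r - ref[0]) + abs(g - ref[1]) + abs(b - ref[2])) / 3.0
    return (r + g + b) / 3.0 >= 0.43 or avg_diff((0, 0, 1)) < 0.3 or avg_diff((1, 1, 0)) < 0.3
```

A blob would then be accepted as isolated when at least 90% of the pixels selected by `circumference` satisfy `is_ice`.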
Algorithm 4.1: filterbySurroundingAreaAnalysis
Input: inMask, source image I
Output: outMask, centroids
ccl ← Connected Components Labeling of inMask
outMask ← image.empty(size[inMask])
foreach k ∈ [1, max(max(ccl))] do
    bbk ← ccl[k].boundingbox
    ck ← ccl[k].centroid
    if isBoundingBoxwithinImage(bbk, I) then
        mk ← cropSmallImage(bbk, inMask)
        ik ← cropSmallImage(bbk, I)
        circumference ← (conv2(mk, exp2) > 0) .* (conv2(mk, exp1) == 0)
        surrik ← pickSurroundingPixels(circumference, ik)
        total ← 0
        count ← 0
        foreach pixel ∈ surrik do
            if isPixelAcceptasIce(pixel) then
                count ← count + 1
            end
            total ← total + 1
        end
        if count / total > 0.9 then
            outMask.add(ccl[k])
            centroids.add(ck)
        end
    end
end

Hough Line Transform.  We assume motion based on Newton's first law. Namely, "the velocity of a body remains constant unless the body is acted upon by an external force". Ice has near zero friction. When the puck is moving on the ice and has no contact with other objects, its velocity is approximately constant. By the same argument, blobs arising from straight lines in the original image could create false positive puck trajectories. The Hough line transform is a good technique to detect straight lines given a set of candidate edge points. We use points generated by edge detection and the Hough line transform in OpenCV.

Rink Rectification and Homography

We assume that the puck is confined to lie on the ice surface and seek the transformation between 2D image coordinates and 2D locations on the ice surface. We need a geometric model of the rink to define a world coordinate system. Rink dimensions differ for various hockey organizations (i.e., International Ice Hockey Federation, National Hockey League (NHL)). Here, we use the rink model defined by the NHL rule book [21]. Figure 4.5 shows the rink dimensions and colours of required markings. Rink rectification defines the geometric transformation from 2D points in the image coordinate system to 2D points in the rink model. 
The transformation is a homography. Homogeneous coordinates are used to represent locations of points. Let pi = [xi yi wi]T be a point in the image coordinate system. Let pg = [xg yg wg]T be the corresponding point in the rink model. Equation 4.3 defines the geometric transformation between them, where H (see Equation 4.4) is a 3 × 3 matrix.

pg = H pi    (4.3)

Each frame has its own transformation matrix H. Gupta et al. used point, line and ellipse features to estimate the homography for each frame in our test sequence [11] [12]. We use Gupta et al.'s results for rink rectification. The rectification results have errors in some frames. Our system is intended to be robust to occasional rectification errors. We store the homographies in a database, indexed by frame id, and dynamically load the required transformations at run-time.

H =
h11 h12 h13
h21 h22 h23
h31 h32 h33
(4.4)

The centroids of puck candidate blobs are obtained in image coordinates. All these puck candidates are transformed to rink coordinates for further processing. Some puck candidates transform to coordinates outside the rink surface. This is also used as a filter to eliminate candidates outside of the rink (e.g., audience seats).

Figure 4.5: NHL official rink model. (a) the dimensions of the ice surface extracted from the NHL rule book; (b) the rink model used by our system with the same dimensions as in (a).

Rink Model and Player Knowledge

Player Bounding Box.  Okuma et al. provide the player tracking results in image coordinates as another input. These are also stored in a database, with the player bounding boxes indexed by track id and frame id. Player position is calculated by transforming the foot point of the bounding box to rink coordinates. So far, we have not used the player tracking results. Subsequent analysis will consider puck transitions between players (e.g., passes) (see Section 4.2). 
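The homography mapping pg = H·pi and the out-of-rink filter can be sketched as follows. This is a minimal illustration, not the system's Matlab code; the rink-model length of roughly 341 pixels (168 pixels across the 30 m width, at the same scale for the ~61 m length) is an assumption here:

```python
def apply_homography(H, p):
    """Map an image point p = (x, y) to rink coordinates via pg = H * pi,
    dividing out the homogeneous coordinate wg."""
    x, y = p
    xg = H[0][0] * x + H[0][1] * y + H[0][2]
    yg = H[1][0] * x + H[1][1] * y + H[1][2]
    wg = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xg / wg, yg / wg)

def on_rink(pt, length=341, width=168):
    """Reject candidates mapping outside the rink model (e.g., audience
    seats). The model dimensions used here are assumed for illustration."""
    x, y = pt
    return 0 <= x <= length and 0 <= y <= width

# Example: a purely projective H with a non-unit third row scales the
# result through the homogeneous divide.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 2.0]]
rink_pt = apply_homography(H, (10.0, 20.0))  # -> (5.0, 10.0)
```

In the system, one such H is loaded per frame from the database and applied to every puck candidate centroid before the remaining filters run.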
Lots of puck-like candidates are created by objects that are part of the player (i.e., head, uniform, skates, stick). Tracking the puck when it is in the control of a player is difficult. When a player has control (with feet or stick), puck possession is not in doubt. We filter out candidates which are within a player's bounding box in image coordinates. A side effect is that the puck trajectory segment is truncated at its endpoints near players. However, this is dealt with later, in tracking extension.

Static and Oscillating Candidates.  The filters described remove many false positive puck candidates. The actual puck is rarely lost, provided it is visible and isolated from other objects. There is one other category of false positive puck candidates that we filter. These are stationary puck-like features that persist from frame to frame (e.g., brackets on the protective glass, parts of logos). While stationary in the world, these candidates may not appear entirely stationary due to thresholding effects and errors in the homography. Typically, stationary candidates will oscillate frame to frame within a small radius expressed in rink coordinates. Stationary candidates negatively impact path tracking (see Section 4.2) in two possible ways: 1) we get the wrong answer; 2) the computational cost of graph-based analysis increases considerably. We try to eliminate them here. For each frame, we check its 10 neighbouring frames (5 before and 5 after). If candidates from these frames appear within a radius of 1.5 pixels three or more times, we remove all these candidates. This radius is large enough to identify near-static candidates while avoiding elimination of the real moving puck. Again, this might not remove all the static candidates. But, it removes enough to reduce the number of false alarm puck trajectories. See details in Algorithm 4.2. 
Algorithm 4.2: filterStaticandOscillatingOnes
Input: puck candidates dictionary cDict
Output: cDict
  stCount ← 3
  stRange ← 5
  stRadius ← 1.5
  foreach frame ∈ [1, length of cDict] do
    foreach currnode ∈ cDict[frame] do
      stNum ← 0
      stList ← empty
      foreach f ∈ [frame − stRange, frame + stRange] do
        if f ≠ frame then
          foreach node ∈ cDict[f] do
            if distance(currnode, node) < stRadius then
              stNum ← stNum + 1
              stList.add(node)
            end
          end
        end
      end
      if stNum ≥ stCount then
        cDict.remove(currnode)
        foreach node ∈ stList do
          cDict.remove(node)
        end
      end
    end
  end

4.2 Hierarchical Graph-Based Puck Tracking

Section 4.1 describes how to generate a set of puck candidates for each frame. The next task is to identify the real puck trajectory. This is complicated by the fact that the puck may not be visible in sub-sequences of frames when it is occluded or outside the camera view. We divide puck trajectories into two categories: transition trajectories (visible) and occlusion trajectories (not visible). As a Chinese idiom says, "Saving the nation through twisted means". We track the puck when it is visible to generate puck transition trajectory candidates. We use them to estimate puck possession status and to eliminate false alarm puck trajectories. Puck transition trajectories are combined with player tracking trajectories to interpolate the complete puck trajectory. The goal of this section is to find the puck trajectories which represent puck transitions between players, or passes.

4.2.1 Basic Idea

  Every body persists in its state of being at rest or of moving uniformly straight forward, except insofar as it is compelled to change its state by force impressed. — Isaac Newton

To a first approximation, ice is a frictionless surface and we assume the puck moves with constant speed and direction when passed from one player to another. Accordingly, transition trajectories will be straight lines in the rink coordinate system.
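A hypothetical Python sketch of the static-candidate filter of Algorithm 4.2 (names and the dictionary layout are illustrative, not the system's actual code):

```python
import math

# `candidates` maps frame id -> list of (x, y) rink coordinates.
def filter_static_candidates(candidates, st_count=3, st_range=5, st_radius=1.5):
    to_remove = set()
    for frame, nodes in candidates.items():
        for node in nodes:
            matches = []
            # look at neighbouring frames only, skipping the current one
            for f in range(frame - st_range, frame + st_range + 1):
                if f == frame or f not in candidates:
                    continue
                for other in candidates[f]:
                    if math.dist(node, other) < st_radius:
                        matches.append((f, other))
            if len(matches) >= st_count:
                to_remove.add((frame, node))   # near-static: remove it
                to_remove.update(matches)      # ...and its recurrences
    return {f: [n for n in ns if (f, n) not in to_remove]
            for f, ns in candidates.items()}
```

A feature that reappears at essentially the same rink position across several frames is flagged and removed, while a fast-moving puck never triggers the radius test.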
We extend the trajectory extension approach in [49] to build a weighted graph with edge weights incorporating both the direction and speed differences between consecutive frames. Dynamic programming is used to find the longest paths as trajectory candidates. These trajectory candidates are further filtered based on trajectory score, frame coverage and other domain knowledge to generate the puck transition trajectory candidates.

4.2.2 First-Level Graph Construction

We compute puck candidate coordinates in the rink coordinate system. The centroid of each detected blob defines its location. Each frame generates a collection of puck candidates. A first-level graph G1(V, E) is built to connect puck candidate nodes between consecutive frames. Algorithm 4.3 describes the process. Generally, nodes at frame t connect to nodes at frame t + 1 if the distance between them is within some radius. We define this radius based on knowledge of puck speed (the NHL record is approximately 170 km/h) [22], the standard broadcast frame rate (30 fps), rink width (approximately 30 metres) and the rink model width in our system (168 pixels):

$\text{maxRadius} = \frac{\text{rinkModelWidth}}{\text{groundWidth}} \times \left( \text{maxSpeed} \times \frac{1000}{3600} \times \frac{1}{\text{rate}} \right) = 8.81$  (4.5)

The puck may be missed for a few frames because of occlusion or detection error. For nodes at frame t, we also search the nodes at frames t + 2, t + 3, ..., t + maxFrameJump. In our experiments, we set maxFrameJump to 3, which is a good compromise between filling the gaps and not connecting nodes that are too far away. This produces a graph of possible transitions of the puck from one frame (we assign the transition edge to this frame) to the next few frames.

4.2.3 Second-Level Graph Construction

G1(V, E) represents the possible puck transitions between consecutive frames. We build a second-level graph G2(V, E) which is a weighted directed acyclic graph. In G2(V, E), we create a node called edgeNode corresponding to each edge in G1(V, E).
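The maxRadius value of Equation 4.5 can be checked directly. All constants come from the text: 170 km/h record puck speed, 30 fps broadcast video, a roughly 30 m rink width, and a 168 px rink-model width.

```python
def max_radius(rink_model_width=168, ground_width=30.0,
               max_speed_kmh=170.0, frame_rate=30.0):
    # maximum puck displacement per frame, in metres...
    metres_per_frame = max_speed_kmh * 1000.0 / 3600.0 / frame_rate
    # ...scaled into rink-model pixels (Equation 4.5)
    return rink_model_width / ground_width * metres_per_frame

print(round(max_radius(), 2))  # → 8.81
```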
We create an edge called edgeEdge and assign to it a weight calculated by comparing the speed and direction of the associated pair of edgeNodes according to the following empirical formula:

$\text{weight}(e_1, e_2) = \frac{1}{\left[\lambda(1 + \Delta v) + (1 - \lambda)(1 + 4\Delta\theta)\right]^2}$  (4.6)

where ∆v and ∆θ are respectively the absolute values of the speed (pixels per frame) and direction (radians) differences between the two edgeNodes. The constant λ is used to balance the trade-off between speed changes and direction changes. We assign a bigger weight to the direction term than to the speed term since we want to be more sensitive to direction. That is, we want the trajectory to be as straight as possible on the ice surface. We choose λ = 0.3. Equation 4.6 constrains the edgeEdge weight to be within the range [0, 1]; bigger changes in direction or speed give smaller weight. Algorithm 4.4 shows how we build G2(V, E) based on the connectivity of G1(V, E) and the empirical restrictions placed on ∆v and ∆θ.

Algorithm 4.3: buildGraphforCandidatesNodes
Input: candidates dictionary cDict
Output: G1(V, E): V ← cDict, E ← edgeDict
  maxFrameJump ← 3
  maxRadius ← 8
  minRadius ← 1
  foreach length ∈ [1, maxFrameJump] do
    foreach frame ∈ [1, length of cDict] do
      edgeDict[frame] ← empty
      foreach prevnode ∈ cDict[frame] do
        A ← prevnode
        foreach nextnode ∈ cDict[frame + length] do
          B ← nextnode
          dist ← ‖AB‖ / length
          if dist ∈ [minRadius, maxRadius] then
            edge ← Edge(prevnode, nextnode)
            edge.speed ← dist
            edge.direction ← arctan(y component of AB / x component of AB)
            edge.frameskip ← length
            edge.frame ← frame
            prevnode.edges.add(edge)
            edgeDict[frame].add(edge)
          end
        end
      end
    end
  end

Due to frame-to-frame homography errors, we still allow connecting nodes within a range of direction and speed changes. For direction, we assume ∆θ should be less than 20°, approximately 0.35 rad. For speed, we use the ratio between ∆v and the minimum of the two speeds.
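A sketch of the edgeEdge weight of Equation 4.6 as reconstructed here: with ∆v = ∆θ = 0 the weight is 1, and at the acceptance limits (∆v = 0.5, ∆θ = 0.35 rad) it falls to about 0.22, consistent with the 0.2 minimum accepting weight used later for trajectory scoring.

```python
def edge_weight(dv, dtheta, lam=0.3):
    # dv: absolute speed difference (pixels/frame)
    # dtheta: absolute direction difference (radians)
    denom = lam * (1.0 + dv) + (1.0 - lam) * (1.0 + 4.0 * dtheta)
    return 1.0 / denom ** 2

print(round(edge_weight(0.0, 0.0), 2))   # → 1.0
print(round(edge_weight(0.5, 0.35), 2))  # → 0.22
```

The direction term carries the larger coefficient, so the weight penalizes bends in the candidate trajectory more heavily than speed changes.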
We assume this is less than a threshold (in our case, 0.5). These assumptions are good enough to reflect our physical model while, at the same time, accommodating typical errors in the homography estimate.

Algorithm 4.4: buildGraphforEdgeNodes
Input: G1(V, E)
Output: G2(V, E): V ← edgeDict, E ← edgeEdgeDict
  foreach frame ∈ [1, length of cDict − 1] do
    index the edgeNodes
    edgeEdgeDict[frame] ← empty
    foreach e1 ∈ edgeDict[frame] do
      nextnode ← e1.next
      foreach e2 ∈ nextnode.edges do
        ∆v ← e1.speed − e2.speed
        ∆θ ← e1.direction − e2.direction
        ∆θ ← min(∆θ, 2π − ∆θ)
        if e1.next equals e2.prev then
          if ∆v / min(e1.speed, e2.speed) < 0.5 and ∆θ < 0.35 then
            edgeEdge ← EdgeEdge(e1, e2)
            edgeEdge.∆v ← ∆v
            edgeEdge.∆θ ← ∆θ
            edgeEdge.weight ← weight(e1, e2)
            edgeEdgeDict[frame].add(edgeEdge)
          end
        end
      end
    end
  end

4.2.4 Trajectory Candidates Estimation

Compared to false trajectories, real puck transition trajectories on the ice surface are straighter and have more constant speed. Given two vertices in G2(V, E) such that there exists a path between them, the bigger the path length, the greater the chance that the path is a puck trajectory. Dynamic programming is used to find the longest path between two given vertices. Instead of manually specifying the end vertices, we modify the algorithm to do weight propagation. After tracing back all longest paths, a group of filters is used to select puck transition trajectory candidates.

Figure 4.6: Weighted directed acyclic graph. An example of graph G2(V, E). Each edgeNode belongs to one frame. edgeEdges are created based on the connectivity of G1(V, E). The bigger the change of speed or direction for each edgeNode pair, the smaller the weight value. All weights are within the range [0, 1]. The longest weighted path is highlighted.

Longest Paths Propagation

Figure 4.6 shows an example of a weighted directed acyclic graph G2(V, E).
Each vertex represents an edgeNode that belongs to a frame; each edge represents a weighted edgeEdge. Algorithm 4.5 describes longest paths propagation. For each entry in the propagation table, we trace back its previous vertices iteratively to get a list of transitions (edgeNodes). We then generate the trajectory of the associated puck candidate coordinates in time order. We include start and end frame indices as well as the trajectory score given by the propagation length of the table entry.

Algorithm 4.5: longestPathsPropagation
Input: G2(V, E)
Output: table
  table ← array with |V(G)| elements (length ← 0, prev ← None)
  foreach frame ∈ [1, length of cDict − 1] do
    foreach v ∈ edgeDict[frame] do
      foreach e ∈ edgeEdgeDict[frame] do
        if e.prev equals v then
          w ← e.next
          if table[w].length ≤ table[v].length + e.weight then
            table[w].length ← table[v].length + e.weight
            table[w].prev ← v
          end
        end
      end
    end
  end

Rule-Based Trajectories Filtering

For each trajectory, its end vertex corresponds to an entry in the propagation table. We filter all the trajectories to decide whether to accept or reject each one. First, we assume that any trajectory that represents a puck transition between players will cover a number of frames and have a large enough score. In our experiments, we require that a trajectory covers at least 5 frames and that its score is bigger than 0.2 × 5 = 1. Here, 0.2 is the minimum accepting weight for an edgeEdge (i.e., when ∆θ = 0.35, ∆v = 0.5). A puck transition trajectory is a pass from one player to another player. The distance of the puck from one player to the other will be changing through time. We filter out trajectories where the distance of the puck to one of the players is more or less constant through time (as can happen when the candidate corresponds to the hockey stick). Another filtering step is to examine trajectories that overlap in the temporal dimension. When there is overlap, we keep the one with maximum score and remove the others.
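The propagation of Algorithm 4.5 is a standard longest-path computation on a DAG. A minimal sketch, assuming edges are already listed in topological (frame) order as they are in G2(V, E); names here are illustrative:

```python
def longest_paths(num_vertices, edges):
    """edges: list of (prev, next, weight) tuples in topological order."""
    length = [0.0] * num_vertices
    prev = [None] * num_vertices
    for u, w, wt in edges:
        # relax: keep the heaviest path reaching w
        if length[w] <= length[u] + wt:
            length[w] = length[u] + wt
            prev[w] = u
    return length, prev

def trace_back(prev, end):
    # recover the path by following predecessor links
    path = [end]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]
```

Because every entry of the table is a path endpoint, tracing back from high-score entries yields all candidate trajectories without specifying end vertices in advance.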
Finally, we have the group of puck trajectory candidates used to simulate the play evolution and to trigger puck possession changes. At this stage, there are still false alarms. Further filtering is done dynamically during the play simulation over the entire sequence.

4.3 Puck Trajectory and Possession

We combine the puck transition trajectory candidates described in Section 4.2 with the player tracking provided as input to estimate the puck transitions for the entire test sequence. Section 4.3.1 defines a puck state space and the corresponding transition rules. Section 4.3.2 discusses the annotation information, play simulation, trajectory filtering and extension, and puck possession generation. The final puck trajectories are generated by interpolation.

4.3.1 Puck State Engine

Necessarily, a puck exists and is unique at any time (frame) during hockey play. We define puck states suitable for an entire sequence. Our system uniquely assigns a state to each frame at run-time. We define a Puck State Engine (PSE) including a puck state space and transition rules between states. The PSE is represented by a quintuple (Σ, S, s0, δ, F), where:

• Σ: input sequence (indexed by frame number), puck transition trajectory candidates, player tracking results
• S: {PlayerHold, TrackBegin, OnTrack, TrackEnd, TrackExtend, Init}
• s0: Init, an extra virtual state (video loaded, before play starts)
• δ: subset of (FI, T), where FI stands for frame increase and T are transition rules
• F: final accepting states {PlayerHold, TrackEnd, TrackExtend}

Given as input the video sequence, puck transition trajectory candidates and player tracking results, the PSE steps through each frame in order. Puck state changes according to our transition rules. FI stands for frame increase and T is the set of transition rules (see Table 4.1). Figure 4.7 shows the engine simulation. For each frame increase, {T1, T2, T3, T4} are examined.
When a transition trajectory candidate starts at the current frame, the engine goes to the TrackBegin state and {T5, T6} are examined to accept or reject the trajectory candidate (trajectory filtering). When a transition trajectory candidate stops at the current frame, the engine goes to the TrackEnd state and {T7, T8} are examined to determine possession change or trajectory extension. Similarly, when the engine goes to the TrackExtend state, {T9, T10} are examined to determine possession change or to continue with trajectory extension. See the criteria for {T5, T6, T7, T8, T9, T10} in Section 4.3.2.

Table 4.1: Transition rules definition for PSE

Label  Definition
T1     if a transition trajectory candidate starts at the current frame
T2     if a transition trajectory candidate stops at the current frame
T3     if a transition trajectory candidate is at the current frame, not T1 or T2
T4     if no transition trajectory candidate at the current frame
T5     if the transition trajectory candidate is accepted
T6     if the transition trajectory candidate is rejected
T7     if the transition trajectory candidate end node is close enough to a player
T8     if the transition trajectory candidate end node is far from any player
T9     if the extend node is close enough to the nearest player
T10    if the extend node is far away from any player

Figure 4.7: Puck state engine. Init is the initial state. A small dot labels the current state (starting from Init when a new sequence is loaded). Double circles label final accepting states. Arrows label the transitions.

4.3.2 Dynamic Annotation and Analysis

In this section, we correlate the puck state space with trajectory filtering, trajectory extension and possession reasoning. Possession annotation is real-time and dynamically updated. Trajectory filtering and extension generate the final puck transition trajectories. These are further combined with puck occlusion trajectories to interpolate the full puck trajectory. We create visualizations for puck possession and trajectory annotation.
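A hypothetical skeleton of the PSE's per-frame step, covering only rules T1–T4; the real engine also evaluates T5–T10 inside the TrackBegin, TrackEnd and TrackExtend states, and those criteria are described later in this section.

```python
from enum import Enum, auto

class PuckState(Enum):
    INIT = auto()
    PLAYER_HOLD = auto()
    TRACK_BEGIN = auto()
    ON_TRACK = auto()
    TRACK_END = auto()
    TRACK_EXTEND = auto()

class PuckStateEngine:
    def __init__(self):
        self.state = PuckState.INIT  # s0: video loaded, before play starts

    def step(self, traj_starts, traj_stops, traj_active):
        # FI (frame increase): examine T1-T4 in order
        if traj_starts:        # T1: a candidate starts at this frame
            self.state = PuckState.TRACK_BEGIN
        elif traj_stops:       # T2: a candidate stops at this frame
            self.state = PuckState.TRACK_END
        elif traj_active:      # T3: a candidate spans this frame
            self.state = PuckState.ON_TRACK
        else:                  # T4: no candidate at this frame
            self.state = PuckState.PLAYER_HOLD
        return self.state
```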
Annotation Content

Annotation is divided into two parts: puck possession and puck location. Puck possession includes both team possession and player possession. Puck location includes the current puck location, puck transition trajectories and the dynamically updated full puck trajectory. Specifically, the annotation provides:

• Current puck state
• Name of the team that possesses the puck (i.e., NYR)
• Aggregate possession time for both teams
• Track ID of the player who possesses the puck
• Label for the player who possesses the puck in both rink and image coordinates
• Highlighted current puck location
• Puck trajectories that represent transitions between players
• Dynamically updated full puck trajectory since the start of play

Trajectory Filtering and Extension

In Section 4.3.1, transition rules {T5, T6, T7, T8, T9, T10} rely on criteria for accepting or rejecting a transition trajectory candidate, changing possession or extending a trajectory. Moreover, sometimes a puck transition trajectory candidate is truncated compared to the real transition. For the end near the start node, this is not a serious problem, since the distance from the start node to a player is only used to accept or reject a new trajectory. A coarse distance threshold, 0.3 × rinkModelWidth, works well enough for this case. This is exactly the trajectory filtering criterion used to accept or reject a transition trajectory candidate. However, for the end near the stop node, we use a stricter distance threshold to determine whether the puck is close enough to another player to designate that player as the pass receiver and hence signal a change in puck possession. This is not precisely correct according to the official hockey rules defining puck possession. But it proved a sufficient approximation for our purposes. A better idea is suggested in Section 7.2.
Based on observed stick length (approximately 1.60 m [21]) and human body and arm extension (i.e., 1 m), we set this distance threshold to be:

$\frac{3\,\text{m} \times \text{rinkModelWidth}}{\text{rinkWidth}} = 0.1 \times \text{rinkModelWidth}$  (4.7)

We use our physical model to improve the estimation of a puck transition trajectory near its end point. At the end of the trajectory, we compute the velocity of the puck and use the end node as a seed. We keep extending the trajectory for consecutive frames based on Newton's first law until the extended node is close enough to a receiving player.

Puck Possession and Full Trajectory Determination

As the puck tracking module parses the input sequence, possession changes as required. Additional puck transition trajectory candidates that violate play consistency are filtered out. Puck transition trajectory candidates accepted as actual transitions between players are added to the full trajectory list. These transition trajectories are used to determine puck possession. The time intervals between them are approximated by the trajectories of the player who holds the puck during each interval (occlusion trajectories). During trajectory extension, extended nodes are added to the full trajectory list as well. Thus, for each frame, we have the coordinates of the full puck trajectory up to that frame. We apply Laplacian temporal smoothing to the list of coordinates to generate and smooth the full puck trajectory, which is updated in real-time. At the end of play simulation, we have the full puck trajectory for the entire sequence logged and displayed. Visualization of the possession annotation and the puck transition or interpolated full trajectories is dynamically updated. Algorithm 4.6 describes puck possession and puck trajectory determination, but it does not include details of the visualization.
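The constant-velocity extension step can be sketched as follows. This is an illustrative simplification (the player positions, per-frame structure and threshold value are made up for the example):

```python
import math

def extend_trajectory(end_xy, velocity_xy, players_by_frame, threshold):
    """Extrapolate from the trajectory end node at constant velocity
    (Newton's first law) until an extended node comes within `threshold`
    of a receiving player. `players_by_frame` holds one list of player
    (x, y) rink positions per subsequent frame."""
    x, y = end_xy
    vx, vy = velocity_xy
    extended = []
    for players in players_by_frame:
        x, y = x + vx, y + vy          # one frame of constant-velocity motion
        extended.append((x, y))
        if any(math.dist((x, y), p) < threshold for p in players):
            return extended, True      # reached a receiver: possession changes
    return extended, False             # ran out of frames without a receiver
```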
Algorithm 4.6: puckPossessionTrajectoryDetermination
Input: data, puckTrajs, playersDict, PSE(Σ, S, s0, δ, F)
Output: poss, puckTrajs, fullTraj
  poss, fullTraj ← empty
  foreach f ∈ data do
    state ← PSE.stepRun(puckTrajs, playersDict)
    traj ← getCurrentTraj(puckTrajs, f) or None
    if state equals PSE.TrackBegin then
      if T6 then PSE.PlayerHold ← True; puckTrajs.remove(traj) end
      if T5 then PSE.OnTrack ← True; fullTraj.add(traj.coords) end
    end
    if state ∈ {PSE.PlayerHold, PSE.OnTrack} then
      poss.player ← getPlayerbyID(playersDict[f], poss.player.ID)
      fullTraj.add(poss.player.coord)
    end
    if state equals PSE.TrackExtend then
      extendnode ← computeExtendNodeonGround(x0, y0, vx, vy, f, f0)
      fullTraj.add(extendnode.coord)
      player ← findNearestPlayer(extendnode, playersDict[f])
      if T9 then PSE.PlayerHold ← True; poss.player ← player end
    end
    if state equals PSE.TrackEnd then
      player ← findNearestPlayer(endnode, playersDict[f])
      if T7 then PSE.PlayerHold ← True; poss.player ← player end
      if T8 then
        poss.TrackExtend ← True
        x0, y0, vx, vy, f0 ← computeExtendParameters(traj, f)
      end
    end
    updateTeamPossession(poss.player)
  end

Chapter 5

Motion Field Analysis

This chapter describes the motion field analysis module for puck location prediction and team possession estimation. Our implementation is based on the method in [17]. We apply the method to broadcast hockey video, which presents challenges due to the speed of player motion and a single moving camera that provides only a partial view of the ice surface in each frame. Player tracking results are assumed as input. Section 5.1 introduces the dense motion field construction in rink coordinates. Motion field analysis is applied to two tasks: Section 5.2 discusses play evolution and puck location prediction, and Section 5.3 presents team possession estimation. The motion field module operates online; the methods in this chapter apply iteratively to each frame.
5.1 Dense Motion Field Construction

5.1.1 Motion Field Window

In [17], three synchronized static cameras are mounted on a high scaffold. The top-view image is used to define the motion field. However, in broadcast video, data is streamed from a single view. This provides only a partial view of the ice surface that changes dynamically with camera rotation. Partial views mean that we do not have all the player information at any given time. Trying to calculate a dense motion field for the entire ice surface based on partial player information is problematic. Instead, we estimate the motion field for a rectangular sub window corresponding to the current view. Algorithm 5.1 has two modes; by default, we use the first mode. Figure 5.1 shows the visualization.

Algorithm 5.1: computeMotionFieldWindow
Input: H: homography, I: image, G: rink model, players, mode (default 1)
Output: mfRect
  pts ← empty
  if mode equals 1 then
    offset ← 0.1 × rinkModelWidth
    foreach player ∈ players do
      pts.add(rinkRectification([player.x, player.y], H))
    end
  end
  if mode equals 2 then
    offset ← 0
    corners ← [[0, 0], [I.width, 0], [0, I.height], [I.width, I.height]]
    foreach pt ∈ corners do
      pts.add(rinkRectification(pt, H))
    end
  end
  left ← max(minx(pts) − offset, 0)
  right ← min(maxx(pts) + offset, G.width)
  up ← max(miny(pts) − offset, 0)
  down ← min(maxy(pts) + offset, G.height)
  mfRect ← Rect([left, up], right − left, down − up)

5.1.2 Motion Definition

Okuma et al. provide a tracking box for each player defined in image coordinates [32]. For every tracking box in frame k, we use the centre of its bottom line to define the foot point coordinate $p_i^k(x, y)$, where i is the player index. We compute the 2D location in the local coordinates of the motion field sub window, $\hat{p}_i^k(x, y) \in I_{mf}$, by

$\hat{p}_i^k(x, y) = H_k p_i^k(x, y) + T$  (5.1)
T is a linear transformation matrix from absolute rink coordinates to coordinates in the motion field sub window. As in [17], we define the player motion in the motion field as a velocity vector vki (u, v). To compute this motion vector, we first compute the 2D location for player i in frame k, and then search for the corresponding player location in a previous frame k − α according to the player’s track id. For stability, the integer α is chosen to be greater than 1. We choose α = 5. The velocity vector is defined as the difference between the above two locations: vki = pˆ ki − pˆ ik−α  5.1.3  (5.2)  Radial Basis Function Interpolation  In this section, we drop explicit mention of the frame index, k, to simplify the notation. Given a set of player 2D locations in the motion field {pˆ 1 , pˆ 2 , ...pˆ n }, and the corresponding velocity vector {v1 , v2 , }, we construct a smooth dense motion field that matches all of these velocities. This is a standard 2D interpolation problem. For convenience, we discuss ui , the x component of the velocity ˆ y)), and matching constraints f (pˆ i ) = ui first. Assume we have a function f (p(x, for 1 ≤ i ≤ n. We use Radial Basis Function (RBF) interpolation to express the interpolation function as: n  ˆ = ∑ λi φ ( pˆ − pˆ i ) + λn+1 + λn+2 x + λn+3 y f (p)  (5.3)  i=1  where, φ (r) is the radial basis function. We compared the thin-plate spline: φ (r) = r2 logr and the multiquadric: φ (r) =  1 + (εr)2 (where ε approximates the av-  erage distance between sample points). For us, the multiguadric generated better results. Players typically are irregularly spaced on the ice surface. The thin-plate spline performs poorly when the number of sample points is small and not distributed uniformly. A multiquadric does better in these circumstances. Recall, we define the motion field for a dynamically moving sub window of the ice surface. 
If we try to use the entire ice surface to define the motion field for a given frame, players will cluster in some region. A thin-plate spline generates poor results in this case. Even the multiquadric can have significant oscillation at the boundary, resulting in unexpected flow and false convergence points. To compute the coefficients λi for 1 ≤ i ≤ n + 3, we use the matching constraints $f(\hat{p}_i) = u_i$ for 1 ≤ i ≤ n and the following boundary conditions for 2D RBF interpolation:

$\sum_{i=1}^{n} \lambda_i = 0, \qquad \sum_{i=1}^{n} \lambda_i x_i = 0, \qquad \sum_{i=1}^{n} \lambda_i y_i = 0$  (5.4)

Now we have n + 3 linear equations which can be written as the linear system:

$\begin{bmatrix}
\phi(\|\hat{p}_1-\hat{p}_1\|) & \cdots & \phi(\|\hat{p}_1-\hat{p}_n\|) & 1 & x_1 & y_1 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
\phi(\|\hat{p}_n-\hat{p}_1\|) & \cdots & \phi(\|\hat{p}_n-\hat{p}_n\|) & 1 & x_n & y_n \\
1 & \cdots & 1 & 0 & 0 & 0 \\
x_1 & \cdots & x_n & 0 & 0 & 0 \\
y_1 & \cdots & y_n & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \lambda_{n+1} \\ \lambda_{n+2} \\ \lambda_{n+3} \end{bmatrix}
=
\begin{bmatrix} u_1 \\ \vdots \\ u_n \\ 0 \\ 0 \\ 0 \end{bmatrix}$  (5.5)

Solving this linear system gives the coefficients λi for 1 ≤ i ≤ n + 3 to use as the RBF kernel. For all $\hat{p}_0(x_0, y_0) \in I_{mf}$, the velocity component in the x direction is computed as follows:

$u_0 = \sum_{i=1}^{n} \lambda_i \phi(\|\hat{p}_0 - \hat{p}_i\|) + \lambda_{n+1} + \lambda_{n+2} x_0 + \lambda_{n+3} y_0$  (5.6)

In our implementation, we combine the RBF interpolation algorithm in [4] with the open source Python package scipy.interpolate to compute the RBF kernel. The velocity component in the y direction, $v_i$, is obtained in the symmetric way. We apply the RBF kernel to our sampled grid points $P_s$, with a sample rate of one sample per pixel, to estimate the motion vector at each $\hat{p}(i, j) \in P_s$. The motion vectors are visualized as arrows in our GUI in both rink and image coordinates. See Figure 5.1. The motion field is denoted as $\Omega(\hat{p}) = u\mathbf{i} + v\mathbf{j}$.

Figure 5.1: Motion field.
An example of the motion field Ω, inset at the upper left of the display. Arrows represent the velocity vector at a sub-sample of the grid points. Inverse rectification is used for visualization in image coordinates (green arrows) as well. The red dot is the convergence point, discussed in the next section.

Temporal Smoothing

Due to slight errors in frame-to-frame homography estimation, the dense motion field often has abrupt changes between frames. Small oscillations in player velocity affect the interpolation results. This creates inconsistency in the motion field and in subsequent convergence point detection. To maintain motion field stability, we temporally smooth player velocities and interpolated motion vectors. We apply Laplacian temporal smoothing to the player velocities prior to RBF interpolation. We create a buffer with temporal length 5 (the value of α in Section 5.1.2) to hold the most recent 5 frames of the motion field. A box filter of size 5 is applied to smooth the motion vectors and the result is used as the motion field for the current frame and the current motion field window. Temporal smoothing stabilizes the motion field over time and decreases interpolation oscillation.

5.2 Play Evolution and Puck Location Prediction

This section builds on the dense motion field construction described in Section 5.1. Section 5.2.1 defines and shows how to propagate a confidence measure along the motion field flow. Section 5.2.2 generates an importance table which is used for convergence point detection. We use the convergence points as our prediction of play evolution and puck locations.

5.2.1 Measurement Propagation

We implement measurement propagation as in [17]. Kim et al. use a value, $\rho_{ij}$, as a measurement of confidence in the estimate of the local velocity at point (i, j) in the motion field Ω. An importance table, Φ, is generated by forward propagation of the confidence ρ in the motion field Ω.
Propagation is iterative according to the following equation [17]:

$\Phi(i + u_{ij},\ j + v_{ij}) = \Phi(i + u_{ij},\ j + v_{ij}) + \rho_{ij}$  (5.7)

Forward propagation in [17] tries to find motion flow convergence from a global view, not just from local extrema. Propagation allows locations with larger ρ to influence locations further away than those with smaller ρ. We add an additional constraint, not in [17]. For any absolute location (a, b) in the motion field, closer motion is treated as having greater influence, since we only have a subset of the hockey players in any given camera view. The confidence is initialized with $\rho_{ab}$. During forward propagation, we multiply the confidence by an attenuation factor ε ∈ [0.8, 1.0] so that it decreases along the motion flow. Confidence is propagated to another absolute location (m, n). Iteration stops when the attenuated confidence, proportional to the current $\rho_{mn}$, is smaller than ε. We also limit the number of iterations to 10. The final step is normalization of the accumulated importance table Φ into the range [0, 1], which is required for convergence point detection.

5.2.2 Convergence Point Detection

After normalization of the importance table Φ, table values are in the range [0, 1]. A larger table value indicates a flow merging region in Ω. We threshold the table values to get a collection of 2D coordinates Γ whose corresponding values in Φ are large enough. We use a threshold of 0.9. We use the discrete data set Γ as training data and apply meanshift clustering to get cluster centers as convergence points. The number of clusters can be more than one. These convergence points are added to the video annotation as our prediction of play evolution and as possible locations the puck will go to. In our implementation, we combine the meanshift algorithm in [10] with the open source Python package scikits.learn.cluster.

5.3 Team Possession Estimation

We use the player motion field to estimate puck possession.
Player possession requires accurate puck location and player location. At best, motion field analysis is able to predict play evolution and possible puck location; it is not good enough to determine player possession. In the absence of accurate puck tracking results, our goal here is to estimate team possession based only on the player motion field, player velocity and player location. To allow for online processing, we try to achieve this goal using only information from the current and past few frames. Rather than trying to determine team possession frame by frame, we use the motion field to estimate when team puck possession changes. We compare the motion fields of both teams at the sample grid locations. We estimate the motion field for each team separately, as described in Section 5.1. RBF interpolation requires more than one sample point; otherwise, the matrix given by Equation 5.5 is singular. We only compute the motion field for a team when there is more than one player from that team visible in the frame. When only one team has a computed motion field, we estimate the puck possession to be unchanged from the previous frame. When both teams have a motion field, we compute the angle between the two motion vectors at each sample grid location. These angles are classified as same or opposite according to whether the angle is less than or greater than 90 degrees. If the fraction of same pairs is larger than a threshold (we use 0.9), we estimate the puck possession to be unchanged from the previous frame. Otherwise, we use the direction of player motion and the distance to players on the opposite team to examine the possibility of a change in puck possession. Additionally, when the players visible in a particular frame are all from the same team, we estimate that this team has puck possession. Algorithm 5.2 describes team possession estimation.
Algorithm 5.2: estimateTeamPossessionbyMotionField
Input: H: homography, players, prevteamposs
Output: teamposs
  teamA, teamB ← separateTeams(rinkRectification(players, H))
  mfRect ← computeMotionFieldWindow(teamA + teamB)
  [ΩA, ΩB] ← [empty, empty]
  if X ∈ {A, B} and len(teamX) > 1 then
    ΩX ← rbfInterpolation(teamX, mfRect)
  end
  if X ∈ {A, B} and isNotEmpty(X) and isEmpty({A, B} − X) then
    teamposs ← X
  end
  if X ∈ {A, B} and isNotEmpty(ΩX) and isEmpty(Ω{A,B}−X) then
    teamposs ← prevteamposs
  end
  if isNotEmpty(ΩA) and isNotEmpty(ΩB) then
    [opposite, same] ← [0, 0]
    foreach p(i, j) ∈ sampleGridLocations(length = 5, mfRect) do
      [v1, v2] ← [ΩA(i, j), ΩB(i, j)]
      ∆θ ← arccos(v1 · v2 / (|v1||v2|)) × 180/π
      if ∆θ > 90° then
        opposite ← opposite + 1
      else
        same ← same + 1
      end
    end
    if same / (same + opposite) > 0.9 then
      teamposs ← prevteamposs
    else
      teamposs ← examinePlayerDirectionLocation(teamA, teamB)
    end
  end

Chapter 6

Results

We test our system using the 1000 frame broadcast video sequence described in Section 1.2.1. This chapter presents the experimental results. Section 6.1 describes the ground truth used to evaluate the results of our experiments. Section 6.2 presents the experimental results and error measurements. Finally, Section 6.3 discusses overall system performance.

6.1 Ground Truth

Ground truth is obtained by human labeling of puck possession and puck location. For puck possession, we manually label the player possession and team possession for each frame in the data set. The aggregate possession rate for each team is calculated based on this temporal labeling. Based on the definition in Section 1.3.3, puck possession ground truth normally is easy to determine based on the puck transitions and the time when the puck merges with a player's stick. For puck location, we use our GUI to track the puck frame by frame. Unlike puck possession, puck location labeling may not be possible in every frame.
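The angle test at the heart of Algorithm 5.2 can be sketched as follows. This is a hypothetical standalone function; the two input lists stand for the sampled motion vectors of the two teams at the same grid locations.

```python
import math

def same_direction_fraction(field_a, field_b):
    """Fraction of grid locations where the two teams' motion vectors
    point within 90 degrees of each other ("same" pairs)."""
    same = opposite = 0
    for (u1, v1), (u2, v2) in zip(field_a, field_b):
        cos_t = (u1 * u2 + v1 * v2) / (math.hypot(u1, v1) * math.hypot(u2, v2))
        theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
        if theta > 90.0:
            opposite += 1
        else:
            same += 1
    return same / (same + opposite)
```

When the returned fraction exceeds the 0.9 threshold, both teams are flowing the same way and possession is estimated to be unchanged from the previous frame.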
When the puck is visible, we label its bounding box and add it to the ground truth. When the puck is not visible, we label it only if its location can be reasonably estimated from the current or neighbouring frames. This happens when the puck is temporarily occluded by a small object such as a player's body or a player's stick. However, when the puck is continuously occluded for a significant period (e.g., occluded by the boards, or not in the camera view), we cannot reliably estimate the puck location and no ground truth is generated for these frames. These frames are not taken into consideration for error measurements. For our test sequence, we are able to label ground truth for puck possession in all 1000 frames and for puck location in a subset of 871 frames.

6.2 Experiments and Error Measurements

In this section, we report, in detail, the experimental results for our test sequence. Section 6.2.1 assesses the accuracy of puck possession analysis using the puck tracking method described in Chapter 4. Section 6.2.2 evaluates the puck tracking results both in terms of formal error measurement and in terms of visualization of the estimated puck trajectories. Section 6.2.3 examines the motion field described in Chapter 5 and its ability to predict team possession and puck location.

6.2.1 Puck Possession

Puck possession includes both player possession and team possession. Here, we present the puck possession results for the puck tracking module only. The puck possession results are compared to ground truth. Table 6.1 gives player possession results in the frame intervals generated by our system. We define [α, β] to be the frame interval in which a particular player possesses the puck. The entries δ+, δ− in the errors column indicate the number of possession frames extra (+) or missed (−) for that particular row. Player and team ID are obtained from the player tracking results provided by Okuma et al. [32].
Table 6.1: Puck player possession results for 1000 frames

Team | Player ID | Ground Truth | Our Results | Errors
MTL | 8 | [1, 41] | [1, 43] | 2+
MTL | 11 | [42, 140] | [44, 140] | 2−
NYR | 14 | [141, 272] | [141, 270] | 2−
NYR | 21 | [273, 404] | [271, 404] | 2+
NYR | 29 | [405, 445] | [405, 441] | 4−
NYR | 21 | [446, 515] | [442, 512] | 4+, 3−
NYR | 32 | [516, 559] | [513, 564] | 3+, 5+
NYR | 31 | [560, 666] | [565, 664] | 5−, 2−
NYR | 32 | [667, 771] | [665, 772] | 2+, 1+
NYR | 31 | [772, 815] | [773, 814] | 1−, 1−
NYR | 44 | [816, 860] | [815, 860] | 1+
NYR | 38 | [861, 1000] | [861, 999] | 1−
NYR | 23 | | [1000, 1000] | 1+

Table 6.2: Puck team possession for 1000 frames

Team | Ground Truth | Our Results | Percentage
MTL | 140 | 140 | 14%
NYR | 860 | 860 | 86%

Given the team ID, we count team possession frames for both ground truth and our results. This is shown in Table 6.2. Also shown is the possession percentage for each team. The puck possession visualization interface and the aggregated possession estimation for the final frame are shown in Figure 6.1. Team possession ground truth and our results are identical. Accuracy is 100%. The total player possession error count is 21 frames. Here, accuracy is 97.9%. We count either δ+ or δ− but not both to avoid double counting. Table 6.3 summarizes. Player possession accuracy relies on the accuracy of the puck tracking results. Possession reasoning based on Euclidean distance between puck and player may extend or miss a few frames near the puck transition end points. Since this typically happens for puck transitions between players on the same team, team possession accuracy is not affected. In our test sequence, the final puck trajectories represent all the puck transitions between players with no false positives. Except for errors at frames near the puck transition end points, player possession accuracy is also quite reliable.
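The δ+/δ− bookkeeping can be reproduced by comparing frame sets row by row; a minimal sketch (interval values taken from Table 6.1; the final halving implements counting either δ+ or δ−, not both, since an extra frame for one row is a missed frame for an adjacent row):

```python
def interval_frames(interval):
    """Expand a possession interval [a, b] into a set of frame indices."""
    if interval is None:
        return set()
    a, b = interval
    return set(range(a, b + 1))

def possession_errors(ground_truth, results):
    """Total player possession error frames, paired row by row.

    Summing missed (g - r) and extra (r - g) frames counts every error
    frame twice, so the total is halved to avoid double counting.
    """
    total = 0
    for gt, res in zip(ground_truth, results):
        g, r = interval_frames(gt), interval_frames(res)
        total += len(g - r) + len(r - g)  # missed + extra
    return total // 2

# Rows of Table 6.1 (None: no ground-truth interval for the last row).
gt = [(1, 41), (42, 140), (141, 272), (273, 404), (405, 445), (446, 515),
      (516, 559), (560, 666), (667, 771), (772, 815), (816, 860),
      (861, 1000), None]
res = [(1, 43), (44, 140), (141, 270), (271, 404), (405, 441), (442, 512),
       (513, 564), (565, 664), (665, 772), (773, 814), (815, 860),
       (861, 999), (1000, 1000)]
```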
However, for test sequences involving more complex interactions among players, puck tracking may well include missing or false positive puck transitions. Then, player possession accuracy would decrease. Player possession accuracy necessarily is less than or equal to team possession accuracy. Most puck transitions are between players on the same team. Team possession can be correct even when player possession has errors at the endpoints of puck transitions. For long test sequences, perhaps including an entire game, high overall accuracy still is possible since errors affect only a small percentage of the frames.

Table 6.3: Puck possession error measurements for 1000 frames

Possession Type | Error Count (frames) | Accuracy
Team Possession | 0 | 100%
Player Possession | 21 | 97.9%

Figure 6.1: Puck possession visualization interface. Screenshot of the puck possession interface showing the possession data for the final frame in our test sequence.

6.2.2 Puck Tracking

Puck Transition Trajectories

We generate a collection of puck trajectories to represent the puck transitions between players. Figure 6.2 shows the first puck transition trajectory generated for the 1000 frame test sequence. In order to compare estimated puck locations with ground truth, we create x-t, y-t and dist-t graphs in rink coordinates, as shown in Figure 6.3. The dist-t graph plots the Euclidean distance between the tracked puck locations and the corresponding ground truth. In the entire 1000 frame test sequence, there are a total of 13 puck transition trajectories. Representative frames from the other 12 trajectories are shown in Figure 6.4 and Figure 6.5.

Figure 6.2: Example of puck transition trajectory. Puck transition trajectory from Frame #17 to Frame #41 in green. Red circle is puck location. Trajectory in rink coordinates (upper left) is static. The corresponding trajectory in image coordinates is projected and dynamic according to the camera motion.
Puck trajectories are visualized as green curves in both image coordinates and rink coordinates. Puck location in the given frame is labeled as a red circle. Table 6.4 gives the error measurements for all 13 trajectories, determined as follows. Let the collection of puck trajectories be {T_1, T_2, ..., T_n}. Each T_i is a sequence of puck locations {p^i_1, p^i_2, ..., p^i_{m_i}}. Let T̂_i be the corresponding ground truth puck trajectory with puck locations {p̂^i_1, p̂^i_2, ..., p̂^i_{m_i}}. Table 6.4 reports the error measurements for the 13 puck trajectories that represent a puck transition between players. Recall that the rink model has dimension 394x168. The first column in Table 6.4 is the frame index interval of the trajectory. The second column is trajectory length in frames. The third column is the average distance error, ε_i, which is calculated as the average Euclidean distance between points on the trajectory and the corresponding ground truth points. See Equation 6.1. The fourth column is the average distance error in units of metres. Recall that the rink dimension in the NHL rule book [21] is approximately 61x30 metres.

ε_i = ( Σ_{j=1}^{m_i} euclideandistance(p^i_j, p̂^i_j) ) / T_i.length    (6.1)

Figure 6.3: Puck transition trajectory example error measurement. Graphs for the example puck transition trajectory (green) and ground truth (red): (a) x-t graph (b) y-t graph (c) dist-t graph.

Figure 6.4: Final puck trajectories i (Frame #71 to #83, #126 to #135, #247 to #271, #384 to #399, #432 to #441, #494 to #506). Green lines show the entire puck trajectory from start frame to end frame. The red circle is the tracked puck location in the selected frame.

Figure 6.5: Final puck trajectories ii (Frame #540 to #558, #650 to #665, #745 to #769, #800 to #812, #842 to #857, #978 to #999). Green lines show the entire puck trajectory from start frame to end frame.
The red circle is the tracked puck location in the selected frame.

If we only consider puck trajectories that represent passes between players, the total average distance error for all trajectories is given by:

ε = ( Σ_{i=1}^{n} ε_i × T_i.length ) / ( Σ_{i=1}^{n} T_i.length ) = 81.958597 / 223 = 0.367527    (6.2)

This average distance error in the rink model is 0.367527 pixels. The corresponding average distance error in actual rink dimensions is 0.065630 metres.

Table 6.4: Error measurements for puck transition trajectories

Trajectory Coverage | Total Frames | Average Distance Error (pixel) | Average Distance Error (metre)
[17, 41] | 25 | 0.212931 | 0.038023
[71, 83] | 13 | 0.851974 | 0.152138
[126, 135] | 10 | 0.083623 | 0.014933
[247, 271] | 25 | 0.300697 | 0.053696
[384, 399] | 16 | 0.106464 | 0.019011
[432, 441] | 10 | 0.461295 | 0.082374
[494, 506] | 13 | 0.458183 | 0.081818
[540, 558] | 19 | 0.223218 | 0.039860
[650, 665] | 16 | 0.157077 | 0.028049
[745, 769] | 25 | 0.223764 | 0.039958
[800, 812] | 13 | 0.148134 | 0.026452
[842, 857] | 16 | 1.323969 | 0.236423
[978, 999] | 22 | 0.430706 | 0.076912

Puck Full Trajectory

We generate puck transition trajectories to help us determine the puck possession. During play simulation, we uniquely assign a puck location when the puck is 1) visible or 2) its position can be estimated from surrounding frames. This succeeds for 871 frames in the test sequence. If the puck is held by a player, we use the player's location to approximate the puck location. If the puck location is on a trajectory extension, the position of the extended node is taken as the puck location. Gaps within transition trajectories are already filled in the tracking stage. Our puck tracking module generates unique puck locations for all frames that have ground truth. In our test sequence, 871 frames have ground truth. We interpolate to obtain the final puck trajectories for both tracking and ground truth, as shown in Figure 6.6. Error measurement for the entire test sequence, including x-t, y-t and dist-t graphs, is shown in Figure 6.7.
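The weighted average of Equation 6.2 can be reproduced from the per-trajectory values in Table 6.4; a minimal check (the metres-per-pixel scale is inferred from the 168-pixel rink-model height corresponding to 30 metres):

```python
# Per-trajectory (total_frames, avg_distance_error_px) rows from Table 6.4.
trajectories = [
    (25, 0.212931), (13, 0.851974), (10, 0.083623), (25, 0.300697),
    (16, 0.106464), (10, 0.461295), (13, 0.458183), (19, 0.223218),
    (16, 0.157077), (25, 0.223764), (13, 0.148134), (16, 1.323969),
    (22, 0.430706),
]

def total_average_error(trajs):
    """Equation 6.2: length-weighted mean of per-trajectory errors."""
    weighted = sum(length * err for length, err in trajs)
    frames = sum(length for length, _ in trajs)
    return weighted / frames

# Metres per rink-model pixel: 168 px of rink height spans 30 m.
RINK_SCALE = 30.0 / 168.0
```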
Table 6.5 compares the average distance error for puck transition trajectories to the full puck trajectories obtained by adding trajectory extensions and frames in which the puck is held by a player. The average distance error for the full trajectory is an order of magnitude larger than for puck transition trajectories alone. This is mainly due to the fact that assigning puck location to player location, when the puck is held, is only approximate and itself subject to location errors in player tracking.

Table 6.5: Puck transition trajectories vs. full trajectory

Trajectory Type | Total Frames | Average Distance Error (pixels) | Average Distance Error (metre)
Transitions | 223 | 0.367527 | 0.065630
Full | 871 | 5.778364 | 1.031851

Figure 6.6: Interpolated puck final trajectory and ground truth. The interpolated full puck trajectory (green) and ground truth trajectory (red). Some frames have neither tracked puck location nor ground truth results.

Figure 6.7: Full puck trajectory and ground truth compared. Graphs for the full puck trajectory (green) and ground truth (red): (a) x-t graph (b) y-t graph (c) dist-t graph.

6.2.3 Motion Field Analysis

Motion field analysis is applied to two tasks: 1) predict play evolution and possible puck location; 2) estimate puck team possession. The analysis is based on player motion only. Accordingly, we consider qualitative performance more than quantitative analysis.

Play Evolution and Puck Location Prediction

Dense motion field construction is visualized dynamically in real-time as flow arrows. The flow field accounts for overall player motion trends on the ice surface. For motion field construction, we sample every pixel in rink coordinates. For motion vector visualization, we sample every fifth pixel in rink coordinates, and project these points back to image coordinates as well. Selected frames from our 1000 frame test sequence are shown in Figure 6.8 and Figure 6.9. Motion vectors in the rink coordinates are shown as blue arrows. The motion vectors are also shown in the image coordinates as green arrows. The motion field convergence points are shown as red circles. These approximate play evolution and serve as our prediction of where the puck might go.

Figure 6.8: Motion field play evolution and puck location prediction i (Frames #049, #182, #219, #328, #419, #467). Play evolution and puck location prediction is labeled by red circles. Motion flow vectors are shown in rink coordinates (blue) and in image coordinates (green). The red rectangle in the rink model is the motion field window.

Figure 6.9: Motion field play evolution and puck location prediction ii (Frames #498, #533, #609, #715, #918, #974). Play evolution and puck location prediction is labeled by red circles. Motion flow vectors are shown in rink coordinates (blue) and in image coordinates (green). The red rectangle in the rink model is the motion field window.

A frame can have multiple convergence points. Typically, this reflects multiple pass opportunities. At most one of them can be the target of an actual pass. Our results suggest that predictions of play evolution and puck location are qualitatively consistent with ground truth. Puck location prediction based on motion field analysis is currently not quantitatively accurate enough to reason about puck trajectories as is done in Section 4.3. However, it is potentially useful for applications such as automatic camera control (pan/tilt/zoom).

Figure 6.10: Motion field puck team possession annotation. Motion vectors for each team (blue is NYR, red is MTL). Team possession textual annotation is in the upper right corner. (a) most vectors have a small angle difference; team possession is unchanged. (b) motion vector angle differences indicate a possible change in team puck possession.
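The dense motion fields behind these visualizations are produced by RBF interpolation of sparse player velocities (Section 5.1). A minimal numpy sketch, assuming a Gaussian kernel with an illustrative width (the thesis's actual basis function and parameters may differ):

```python
import numpy as np

def rbf_interpolate(points, velocities, grid, width=20.0):
    """Interpolate sparse 2-D player velocities onto grid locations.

    points: (n, 2) player positions in rink coordinates
    velocities: (n, 2) player velocity measurements
    grid: (m, 2) sample locations inside the motion field window
    """
    pts = np.asarray(points, dtype=float)
    vel = np.asarray(velocities, dtype=float)
    g = np.asarray(grid, dtype=float)
    # Kernel matrix between sample points; for a Gaussian kernel on
    # distinct points this is symmetric positive definite, so the solve
    # is well posed for two or more distinct players.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * width ** 2))
    # Weights chosen so the interpolant reproduces the measurements.
    w = np.linalg.solve(K, vel)                  # (n, 2)
    d2g = ((g[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2g / (2 * width ** 2)) @ w   # (m, 2)
```

Evaluating the interpolant at the sample points returns the measured velocities exactly, which is the defining property of RBF interpolation.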
Puck Team Possession

Puck team possession estimation considers the difference between the motion vectors of the two teams. Figure 6.10 shows the motion vectors of both teams and the team possession. Two sample frames are shown. In one, the team possession stays the same as in the previous frame. The other suggests a possible change in team possession. Table 6.6 gives the overall team possession error measurement. The table shows 58 out of 1000 frames in error. For the test sequence, the accuracy of team possession prediction based on motion field analysis is 94.2%. There are two segments with significant error; they illustrate two typical problems. The first contains 30 consecutive error frames and occurs during a long pass from one team to the other. It is hard to pinpoint the time at which possession changes during a long pass purely from player motion. The second contains 18 consecutive error frames and occurs after the two teams had been moving against each other; we mistakenly estimated that the possession changed. In sum, it is easy to estimate that possession stays the same as in the previous frame, but hard to estimate when possession actually does change.

Table 6.6: Puck team possession for 1000 frames by motion field

Frame Range | Ground Truth | Our Results | Error Count
[1, 110] | MTL | MTL | 0
[111, 140] | MTL | NYR | 30
[141, 508] | NYR | NYR | 0
[509, 514] | NYR | MTL | 6
[515, 515] | NYR | NYR | 0
[516, 516] | NYR | MTL | 1
[517, 794] | NYR | NYR | 0
[795, 795] | NYR | MTL | 1
[796, 796] | NYR | NYR | 0
[797, 814] | NYR | MTL | 18
[815, 815] | NYR | NYR | 0
[816, 816] | NYR | MTL | 1
[817, 817] | NYR | NYR | 0
[818, 818] | NYR | MTL | 1
[819, 1000] | NYR | NYR | 0

6.3 System Performance

Our experiments were run on an Intel Core2 machine with a 2.66 GHz CPU and 3.0 GB RAM, under Ubuntu 10.04+. We create a performance table for the 1000 frame test sequence at resolution 1920x970. The two modules are discussed separately in the following sections.
6.3.1 Puck Tracking Module

The puck tracking module includes: candidates detection and filtering, puck tracking, transition trajectory filtering and possession reasoning. We divide them into two groups based on implementation environment. The first group includes preliminary blob detection and candidates filtering. Candidates filtering is sub-divided into 1) connected component labeling and size filtering and 2) shape filtering and surrounding area analysis, as shown in Table 6.7. The overall processing speed of this group is over two orders of magnitude slower than the frame rate. In order to allow interactive use of other components of the system, the puck candidates log is pregenerated for each data set. An efficient connected component labeling algorithm could break the bottleneck. Currently we use Matlab.

Table 6.7: Preprocessing stage rates for the 1000 frame test sequence

Stage | Average Time Per Frame (seconds) | Environment
Preliminary Blobs | 0.28 | Python
CCL and Size Filter | 6.02 | Matlab
Shape Filter and SSA | 1.98 | Matlab
Total | 8.28 |

The second group includes candidates file parsing, additional domain-knowledge-based candidates filtering, and hierarchical graph-based puck tracking (graph construction and transition trajectories computation), as shown in Table 6.8. These are included in our annotation system written in Python and already run fast enough. The average overall processing speed for all the stages in this group is 33.16 fps, which is faster than the North American broadcast rate (30 fps). Our annotation system can run in real-time provided the puck candidates and player tracking results are available from preprocessing.

Table 6.8: Annotation system stage rates for the 1000 frame test sequence

Stage | Time (seconds) | Speed (fps)
Parsing Candidates File | 2.32 | 431.03
Other Candidates Filters | 24.59 | 40.67
Graphs Construction | 0.83 | 1204.81
Transition Trajectories Computation | 2.41 | 414.94
Total | 30.15 | 33.16
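The throughput column in Table 6.8 is simply the frame count divided by each stage's time; a small sketch reproducing the check (the stage key names are ours, not the system's):

```python
# Stage timings in seconds over the 1000 frame sequence, from Table 6.8.
STAGE_SECONDS = {
    "parse_candidates_file": 2.32,
    "other_candidate_filters": 24.59,
    "graph_construction": 0.83,
    "transition_trajectories": 2.41,
}
FRAMES = 1000
BROADCAST_FPS = 30  # North American broadcast rate

def fps(seconds, frames=FRAMES):
    """Throughput of a stage (or the whole group) in frames per second."""
    return frames / seconds

# The group is real-time capable when its aggregate fps exceeds 30.
pipeline_fps = fps(sum(STAGE_SECONDS.values()))
```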
Transition trajectory filtering, possession reasoning and final trajectory interpolation are done online during real-time play simulation.

6.3.2 Motion Field Module

In the motion field module, the first component is play evolution and puck location prediction, which includes: dense motion field construction, measurement propagation and convergence point detection. Dense motion field construction is sub-divided into: motion field window computation, player velocity computation and RBF interpolation. Since the motion field window is dynamic, its window size, measured in pixels, varies across frames in our rink model (394x168). Motion field computation is online. It uses the current frame with a few previous frames for velocity computation and temporal smoothing. Table 6.9 shows a single frame in our 1000 frame test sequence with motion field size 106x115 and a sample rate of one sample per pixel, to provide a general idea of the processing speed. This speed is related to the motion field size.

Table 6.9: Motion field speed for window size 106x115

Stage | Average Time Per Frame (seconds)
Dense Motion Field Construction | 0.26
Measurement Propagation | 0.25
Convergence Point Detection | 0.09
Total | 0.60

Another component of the motion field module is team possession estimation. The processing cost for this is approximately double the cost of dense motion field construction itself, since both teams require motion field construction with the same window size. Overall, the speed of the motion field module is acceptable but not yet fully real-time. It is sensitive to rink model and motion field size as well as sample rate.

Chapter 7

Conclusion and Future Work

7.1 Conclusion

Annotation of puck possession and location in hockey broadcast video is an example of automatic sports video analysis. Similar problems exist in other multiplayer field sports. In this thesis, two separate modules were demonstrated.
In the first module, we describe motion-based puck tracking in three steps: 1) puck candidates detection and filtering; 2) hierarchical graph-based tracking; 3) puck trajectories filtering and possession reasoning. Our contributions are: to create fast blob extraction and filters to select puck candidates, to develop an innovative graph-based tracking method that transforms puck tracking into a longest path problem in a weighted acyclic graph, and to define a puck state engine and simulate puck transitions. This is the first exploration of puck tracking and puck possession determination in computer vision. In the second module, we modify and implement motion field analysis [17] for hockey: dynamically creating a motion field window, balancing the global and local influence on flow propagation, temporal smoothing to maintain motion field stability, and correlating flow with team possession reasoning. In addition, we extended the UBC hockey system GUI to include annotation for puck possession, puck trajectories, motion flow and convergence points. The results on our test sequence show that puck tracking and possession annotation are robust, even when the homography and player tracking results are of limited absolute accuracy and contain isolated frames with significant errors in rectification. The motion field analysis module is also usable for exploration. The overall implementation is fast enough to be interactive, including the graphical visualization.

7.2 Future Work

7.2.1 Video Frame Classification

Our test sequence contains 1000 frames of continuous play. The camera moves and, in each frame, provides only a partial view of the rink and players. However, over the course of an entire game, an actual hockey broadcast contains many other types of frames. These include switches among multiple cameras, player and crowd close-ups during play stoppages, commercials, slow motion replays, and player jockeying prior to a face off or following a whistle.
Frames during play stoppages are not directly relevant to puck possession and location. It would be useful to first segment the whole game into continuous sequences of play for input to our system. In this thesis, we assume the video stream to have been pre-segmented into suitable sequences. Cameras and camera zoom change even during continuous play. Some parameters of our system are adaptive to the image resolution and rink model. Currently, the camera zoom is not taken explicitly into account. Extending the UBC hockey system to do automatic shot segmentation and estimation of camera parameters, such as zoom, is an area for future work.

7.2.2 Multiple Cameras

Our test sequence provides single-view data at a zoom level that typically includes only part of the rink and some of the players. This, combined with camera motion, increases the difficulty of puck tracking and motion field construction, which in turn makes it more challenging to determine puck possession. Multiple cameras can help in two ways: 1) The puck is more often visible. This would help with puck tracking. 2) If we have tracking results for every player in each frame, the motion field is better defined and its estimation greatly improved. This would provide a better test of its use in play evolution prediction.

7.2.3 Puck Candidates Detection

Missed detections of the visible puck still occur. There is still opportunity to reduce the number of false positive candidates. Colour segmentation and high dynamic range processing are avenues to explore. Currently, detection is offline and slow. A fast implementation of connected component analysis would improve things at a practical level, since it is currently the speed bottleneck.

7.2.4 Puck Tracking

Trajectory Extension

Due to candidate detection and filtering, some puck transition trajectories are truncated near their end points compared to the real transitions from player to player.
We hope to recover the transition trajectories as well as possible with trajectory extension based on Newton's first law of motion. In the current implementation, we compute the puck velocity for a transition trajectory and extend it, based on our physical model, until it is touched by some player. A Kalman filter would be another tool for trajectory extension.

Dynamic Kalman Filter

We use a graph-based method to find trajectory candidates. We do not use a Kalman filter for two reasons. First, the puck may be occluded for a long time. The state vector of a Kalman filter cannot be automatically updated after long periods when the puck is not visible. The puck location and velocity can change significantly during occlusion. Under these circumstances, a Kalman filter will accumulate significant error over time, unless the state vector is re-initialized adaptively. Second, Kalman filter prediction requires the measurements to be correct. In our case, we have multiple puck candidates with similar features and many are false positives. Our work already suggests one way to do dynamic state vector estimation based on puck transition trajectory candidates.

7.2.5 Stability when Real Trajectory is not Detected

In our test sequence, puck possession and puck tracking accuracy are high. We successfully track all the puck transitions between players. We would expect a missed transition to arise for two possible reasons. First, failure to detect the puck over an extended sequence of frames could cause the whole transition to be missed. Second, extremely short distance puck transitions could be lost entirely or subsumed by a false positive trajectory. If such a failure were to occur, the overall accuracy of puck possession and location estimation likely would decrease for multiple subsequent frames. One might consider a redesign of the puck trajectory filtering algorithm independent of play simulation. The idea is to build another graph based on the second-level graph in Section 4.2.3.
Puck transition trajectories and partial player trajectories are the graph nodes. We then compute a three-dimensional distance (both spatial and temporal) between consecutive nodes and use Dijkstra's algorithm to select trajectories. Final puck trajectory interpolation or some other approximation method can be applied to compensate for missed transitions.

7.2.6 Puck Possession Reasoning

Possession reasoning in our system uses the Euclidean distance between the end point of a trajectory and the nearest player. If the distance is small enough, in rink coordinates, we say that the player received the puck. According to the rules of hockey (see Section 1.3.3), a player A retains possession of the puck until another player B touches it. This, of course, is more subtle than Euclidean distance alone. One idea is to determine possession change based on blob connectivity. Possession changes only when the blob associated with the puck merges with some component blob associated with the receiving player. Another idea is to determine touch based on a change in the direction of puck motion. If a puck moving along the ice changes direction, it has interacted with (i.e., been touched by) something. However, it can happen that possession changes even though the puck motion stays in a similar direction.

7.2.7 Motion Field Analysis

Motion field analysis explores the prediction of play evolution online. The goals are to estimate where the puck will go, in future play, and to provide useful information about where the puck currently might be when the puck itself is not directly visible. This exploration is limited if only part of the rink and a subset of the players are visible in each frame. Data covering the entire rink and all the players would be a useful next step in motion field analysis for hockey. We would expect to use domain knowledge about hockey strategy in a global motion field analysis.
Player pose also provides information about puck possession and play evolution. Player tracking that includes information about player pose is something to include in the motion field analysis. Currently, the computational bottleneck in motion field analysis is flow propagation and RBF interpolation. Performance here could be improved, if warranted.

7.2.8 Offline, Online and Real-time

For the puck tracking module, the three identified offline parts are: candidates detection, candidates filtering and puck tracking. Future frames are used to better detect and track the puck in the current frame. It is worth considering the possibility of online or near-online processing. The processing rate for candidates filtering and puck tracking is already, on average, faster than the broadcast video rate. Candidates detection currently is slower. Improvements in the processing frame rate for candidates detection are needed to allow the entire module to be real-time. One can imagine a buffer (of, say, 200 frames) used to dynamically load video sequences during broadcast, with our method applied to the sequence in the buffer. The system would appear both online and real-time to the viewer, with only a small delay from the actual time. For the motion field module, processing is already online since it uses only previous and current information. To achieve real-time performance, implementation on a GPU or other parallel architecture can be considered.

Bibliography

[1] N. Ancona, G. Cicirelli, E. Stella, and A. Distante. Ball detection in static images with support vector machines for classification. Image and Vision Computing, 21:675–692, 2003. → pages 16

[2] B. Bennett, D. R. Magee, A. G. Cohn, and D. C. Hogg. Using spatio-temporal continuity constraints to enhance visual tracking of moving objects. In ECAI'04, pages 922–926, 2004. → pages 18

[3] Bruce. The art of puck possession. Exactsports Article, April 2010. → pages 2, 12

[4] M. D. Buhmann.
Radial Basis Functions: Theory and Implementations. Cambridge University Press, 2003. → pages 50

[5] R. Cavallaro. The FoxTrax hockey puck tracking system. IEEE Computer Graphics and Applications, 17:6–12, 1997. → pages 15

[6] K. Choi and Y. Seo. Tracking soccer ball in TV broadcast video. In F. Roli and S. Vitulano, editors, Image Analysis and Processing ICIAP 2005, volume 3617 of Lecture Notes in Computer Science, pages 661–668. Springer Berlin / Heidelberg, 2005. → pages 17

[7] T. Corpetti, É. Mémin, and P. Pérez. Extraction of singular points from dense motion fields: An analytic approach. Journal of Mathematical Imaging and Vision, 19:175–198, 2003. → pages 19

[8] T. D'Orazio, C. Guaragnella, M. Leo, and A. Distante. A new algorithm for ball recognition using circle Hough transform and neural classifier. Pattern Recognition, 37(3):393–408, 2004. → pages 16

[9] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. CACM, 24:381–395, June 1981. → pages 14

[10] K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. Information Theory, IEEE Transactions on, 21(1):32–40, Jan 1975. → pages 53

[11] A. Gupta. Using line and ellipse features for rectification of broadcast hockey video. Master's thesis, The University of British Columbia, December 2010. → pages 4, 14, 32

[12] A. Gupta, J. J. Little, and R. J. Woodham. Using line and ellipse features for rectification of broadcast hockey video. Canadian Conference on Computer and Robot Vision (CRV), 2011. → pages 4, 14, 32

[13] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, June 2004. → pages 14

[14] Y. Huang, J. Llach, and S. Bhagavathy. Players and ball detection in soccer videos based on color segmentation and shape analysis. In N. Sebe, Y. Liu, Y. Zhuang, and T.
Huang, editors, Multimedia Content Analysis and Mining, volume 4577 of Lecture Notes in Computer Science, pages 416–425. Springer Berlin / Heidelberg, 2007. → pages 17

[15] P. Jones, N. James, and S. Mellalieu. Possession as a performance indicator in soccer. International Journal of Performance Analysis in Sport, 4(1):98–102, August 2004. → pages 12

[16] J.-Y. Kim and T.-Y. Kim. Soccer ball tracking using dynamic Kalman filter with velocity control. Computer Graphics, Imaging and Visualization, International Conference on, 0:367–374, 2009. → pages 4, 17

[17] K. Kim, M. Grundmann, A. Shamir, I. Matthews, J. Hodgins, and I. Essa. Motion field to predict play evolution in dynamic sport scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010. → pages 6, 19, 47, 49, 52, 72

[18] D. Koller, K. Daniilidis, T. Thorhallsson, and H.-H. Nagel. Model-based object tracking in traffic scenes. In ECCV, pages 437–452, 1992. → pages 5

[19] J. Kubatko, D. Oliver, K. Pelton, and D. T. Rosenbaum. A starting point for analyzing basketball statistics. Journal of Quantitative Analysis in Sports, 3, 2007. → pages 12

[20] C. Lago-Penas and A. Dellal. Ball possession strategies in elite soccer according to the evolution of the match-score: the influence of situational variables. Journal of Human Kinetics, 25:93–100, October 2010. → pages 12

[21] National Hockey League. National Hockey League Official Rules 2010–2011. Triumph Books, 2010. → pages 7, 32, 45, 61

[22] S. Leahy. Chara breaks own hardest shot record, hits 105.9 mph. Yahoo! Sports Puck Daddy, Jan 2011. → pages 36

[23] M. Leo, T. D'Orazio, P. Spagnolo, P. Mazzeo, and A. Distante. SIFT based ball recognition in soccer images. In A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass, editors, Image and Signal Processing, volume 5099 of Lecture Notes in Computer Science, pages 263–272. Springer Berlin / Heidelberg, 2008. → pages 16

[24] F. Li and R. J. Woodham.
Video analysis of hockey play in selected game situations. Image and Vision Computing, 27(1–2):45–58, 2009. → pages 15
[25] D. Liang, Y. Liu, Q. Huang, and W. Gao. A scheme for ball detection and tracking in broadcast soccer video. In Y.-S. Ho and H. Kim, editors, Advances in Multimedia Information Processing – PCM 2005, volume 3767 of Lecture Notes in Computer Science, pages 864–875. Springer Berlin / Heidelberg, 2005. → pages 18
[26] Y. Liu, D. Liang, Q. Huang, and W. Gao. Extracting 3D information from broadcast soccer video. Image and Vision Computing, 24(10):1146–1162, 2006. → pages 18
[27] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, November 2004. → pages 5, 16
[28] W. L. Lu, K. Okuma, and J. J. Little. Tracking and recognizing actions of multiple hockey players using the boosted particle filter. Image and Vision Computing, 27(1–2):189–205, 2009. → pages 14
[29] J. Miura, T. Shimawaki, T. Sakiyama, and Y. Shirai. Ball route estimation under heavy occlusion in broadcast soccer video. Computer Vision and Image Understanding, 113:653–662, May 2009. ISSN 1077-3142. → pages 18
[30] K. Okuma, J. J. Little, and D. G. Lowe. Automatic acquisition of motion trajectories: Tracking hockey players. SPIE Internet Imaging V, 5304:202–213, January 2004. → pages 13, 14
[31] K. Okuma, J. J. Little, and D. G. Lowe. Automatic rectification of long image sequences. Asian Conference on Computer Vision (ACCV), January 2004. → pages 14
[32] K. Okuma, A. Taleghani, N. de Freitas, J. J. Little, and D. G. Lowe. A boosted particle filter: Multitarget detection and tracking. European Conference on Computer Vision (ECCV), 3021:28–39, 2004. → pages 3, 8, 13, 34, 48, 56
[33] D. M. O'Shaughnessy. Possession versus position: Strategic evaluation in AFL. Journal of Sports Science and Medicine, 5:533–540, 2006. → pages 12
[34] V. Pallavi, J. Mukherjee, A. K. Majumdar, and S. Sural.
Identification of team in possession of ball in a soccer video using static and dynamic segmentation. International Conference on Advances in Pattern Recognition (ICAPR), pages 249–255, 2006. → pages 13
[35] V. Pallavi, J. Mukherjee, A. K. Majumdar, and S. Sural. Ball detection from broadcast soccer videos using static and dynamic features. Journal of Visual Communication and Image Representation, 19:426–436, October 2008. ISSN 1047-3203. → pages 4, 17
[36] C. Pei, S. Yang, L. Gao, and W. Ma. A real time ball detection framework for soccer video. In 16th International Conference on Systems, Signals and Image Processing (IWSSIP 2009), pages 1–4, June 2009. → pages 17
[37] A. Rao and R. Jain. Computerized flow field analysis: oriented texture fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(7):693–709, July 1992. → pages 19
[38] L. Rollins. Implication of puck possession on scoring changes in ice hockey. Bachelor's thesis, HAAGA-HELIA University of Applied Sciences, 2010. → pages 12
[39] Y. Seo, S. Choi, H. Kim, and K.-S. Hong. Where are the ball and players? Soccer game analysis with color based tracking and image mosaick. In Proceedings of the 9th International Conference on Image Analysis and Processing – Volume II, pages 196–203, London, UK, 1997. Springer-Verlag. → pages 17
[40] J. Shi and C. Tomasi. Good features to track. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pages 593–600, 1994. → pages 14
[41] T. Shimawaki, J. Miura, T. Sakiyama, and Y. Shirai. Ball route estimation in broadcast soccer video. In ECCV Workshop on Computer Vision Based Analysis in Sports Environments, pages 26–37, May 2006. → pages 17
[42] T. Shimawaki, T. Sakiyama, J. Miura, and Y. Shirai. Estimation of ball route under overlapping with players and lines in soccer video image sequence. In 18th International Conference on Pattern Recognition (ICPR 2006), volume 1, pages 359–362, 2006.
→ pages 18
[43] A. C. Thomas. The impact of puck possession and location on ice hockey strategy. Journal of Quantitative Analysis in Sports, 2(1), 2006. → pages 2, 7, 12
[44] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, pages 1–22, 1991. → pages 14
[45] X. Tong, T. Wang, W. Li, Y. Zhang, B. Yang, F. Wang, L. Sun, and S. Yang. A three-level scheme for real-time ball tracking. In N. Sebe, Y. Liu, Y. Zhuang, and T. Huang, editors, Multimedia Content Analysis and Mining, volume 4577 of Lecture Notes in Computer Science, pages 161–171. Springer Berlin / Heidelberg, 2007. → pages 18
[46] X.-F. Tong, H.-Q. Lu, and Q.-S. Liu. An effective and fast soccer ball detection and tracking method. In 17th International Conference on Pattern Recognition (ICPR 2004), volume 4, pages 795–798, August 2004. → pages 17
[47] Y. Tong, S. Lombeyda, A. N. Hirani, and M. Desbrun. Discrete multiscale vector field decomposition. ACM Transactions on Graphics, 22:445–452, July 2003. → pages 19
[48] X. Yu, H. W. Leong, J.-H. Lim, Q. Tian, and Z. Jiang. Team possession analysis for broadcast soccer video based on ball trajectory. In Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, volume 3, pages 1811–1815, 2003. → pages 13
[49] X. Yu, C. Xu, H. W. Leong, Q. Tian, Q. Tang, and K. W. Wan. Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video. In Proceedings of the Eleventh ACM International Conference on Multimedia, MULTIMEDIA '03, pages 11–20, New York, NY, USA, 2003. ACM. ISBN 1-58113-722-2. → pages 4, 16, 36

