UBC Theses and Dissertations

Vision utility framework : a new approach to vision system development Afrah, Amir 2009-02-11

Vision Utility Framework
A New Approach To Vision System Development

by

Amir Afrah
B.ASc., The University of British Columbia, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

February, 2009

© Amir Afrah 2009

Abstract

We are addressing two aspects of vision based system development that are not fully exploited in current frameworks: abstraction over low-level details and high-level module reusability. Through an evaluation of existing frameworks, we relate these shortcomings to the lack of systematic classification of sub-tasks in vision based system development. Our approach for addressing these two issues is to classify vision into decoupled sub-tasks, hence defining a clear scope for a vision based system development framework and its sub-components. Firstly, we decompose the task of vision system development into data management and processing. We then proceed to further decompose data management into three components: data access, conversion and transportation.

To verify our approach for vision system development we present two frameworks: the Vision Utility (VU) framework for providing abstraction over the data management component; and the Hive framework for providing the data transportation and high-level code reuse. VU provides the data management functionality for developers while hiding the low-level system details through a simple yet flexible Application Programming Interface (API). VU mediates the communication between the developer's application, vision processing modules, and data sources by utilizing different frameworks for data access, conversion and transportation (Hive). We demonstrate VU's ability to provide abstraction over low-level system details through the examination of a vision system developed using the framework.
Hive is a standalone event based framework for developing distributed vision based systems. Hive provides simple high-level methods for managing communication, control and configuration of reusable components. We verify the requirements of Hive (reusability and abstraction over inter-module data transportation) by presenting a number of different systems developed on the framework using a set of reusable modules.

Through this work we aim to demonstrate that this novel approach for vision system development could fundamentally change vision based system development by addressing the necessary abstraction and promoting high-level code reuse.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Code Snippets
Acknowledgements
Dedication
1 Introduction
  1.1 Problem
  1.2 Approach
  1.3 Contributions
  1.4 Overview
2 Related Work
  2.1 Frameworks for Vision Data Management
    2.1.1 Access to Image Data
    2.1.2 Image Conversion Frameworks
    2.1.3 Data Transport Middlewares
  2.2 Frameworks for Multimedia Development
  2.3 Frameworks for Vision System Development
  2.4 Conclusion
3 Classification of the Vision Problem
  3.1 Data Management vs. Processing
  3.2 Sub-Components of Data Management
  3.3 Conclusion
4 The Vision Utility Framework
  4.1 VU Overview
    4.1.1 System Development Model
    4.1.2 Modules
    4.1.3 Communication Model
  4.2 Architecture
    4.2.1 Interface Layer
    4.2.2 Core Layer
  4.3 Proof of Concept
    4.3.1 Implementation
    4.3.2 System Development using VU
    4.3.3 Results and Evaluation
  4.4 Conclusion
5 Hive Framework
  5.1 Hive Overview
    5.1.1 System Development Model
    5.1.2 Modules
    5.1.3 Framework Model
  5.2 Architecture
    5.2.1 Interface Layer
    5.2.2 Service Layer
    5.2.3 Manager Service Layer
    5.2.4 Communication Layer
  5.3 Proof of Concept
    5.3.1 Implementation
    5.3.2 System Development Using Hive
    5.3.3 Results and Evaluation
  5.4 Conclusion
6 Conclusion
  6.1 Problems with Current Approaches
  6.2 Contributions of this Work
  6.3 Future Direction
    6.3.1 Extension of Vision Utility Framework
    6.3.2 Extension of Hive
  6.4 Conclusion
Bibliography
Appendices
A Previous Publications
B Statement of Co-Authorship
C The Vision Utility Framework's Application Programming Interface
  C.1 Application API
  C.2 Driver API
D Hive's Application Programming Interface
  D.1 Application API
  D.2 Drone API

List of Tables

4.1 VU application API
4.2 VU driver API
4.3 Image specification used by VU
4.4 OpenCV processor data types

List of Figures

2.1 YARP's communication model
2.2 Player's architecture
2.3 Increased reusability of thin frameworks
2.4 Architecture of RTPM
2.5 Architecture of Scallop
2.6 OpenCV framework components
3.1 Classification of the vision problem
4.1 Scope of VU
4.2 Source, application and processors in VU
4.3 The VU architecture
4.4 OpenCV based VU processor
4.5 Output from four VU sources
4.6 Image transformation results
4.7 VU processor results
5.1 Example Hive system
5.2 Hive's architecture
5.3 Hive's event based communication
5.4 Face detection flow diagram
5.5 Face detection results
5.6 Quality of view flow diagram
5.7 Quality of view analysis results
5.8 Camera calibration flow diagram
5.9 Multiple camera calibration results
5.10 Augmented reality flow diagram
5.11 Augmented reality results

List of Code Snippets

4.1 VU framework device interface
4.2 OpenCV wrapper function
4.3 Application code for source assignment
4.4 Registering incoming image handlers
Acknowledgements

I have had the privilege of receiving guidance and support from some great colleagues and friends during the course of working on this thesis.

To my supervisor, Sid: thank you for your valuable guidance, motivation, and for providing me with great opportunities. I have learned a lot from you.

To Gregor: thank you for your help and friendship. This work would not be what it is without your input.

To Tony: thank you for your guidance, and for always managing to make me feel good with your positivity.

I thank all my friends in the HCT lab for making this journey fun and fulfilling.

Dedication

Dedicated to my parents. Thank you for your love and support in every aspect of my life.

Chapter 1
Introduction

The field of computer vision has been going through an extraordinary metamorphosis within the last several years, which has led to an increasing demand for the development of vision based systems. Vision processing, the automated analysis and manipulation of image and video data through computation, has been around since the 1960s. However, activity in this field has accelerated greatly in the past 15 years. This increase is due to a number of factors, mainly the reduction in price and the increase in accessibility of the tools (sensors and computation platforms), as well as our understanding of the field [15]. The tremendous boost in this field has sparked an increasing demand for end-to-end vision systems for deployment and prototyping of new algorithms from a heterogeneous set of sources and methods.

The task of developing end-to-end vision systems (or the vision problem, as we refer to it in this work) has become increasingly sophisticated and complex as a by-product of recent growth in the field. In addition to general system issues, vision has specific requirements that introduce particular system development challenges. Foremost in the list of issues is the performance requirement of computer vision.
The large volume of data and the computational complexity of current processing algorithms often exceed the performance of readily available computation machines (PCs). Accessing and managing image data is another non-trivial issue that complicates the vision problem. There are numerous data formats and protocols employed by different camera devices. Furthermore, new methods and algorithms in multiple camera research have emerged that employ inputs from several (possibly heterogeneous) cameras and other sensors to perform tasks such as calibration, tracking and 3D reconstruction [22, 25, 29, 40]. These multi-sensor systems amplify the issues associated with vision system development and require tremendous effort in order to access, manipulate, transport and process the data in real-time.

The need for addressing system-level development issues in the current state of computer vision follows the pattern described in [41]. Shaw and Garlan have identified the following pattern for general software engineering: as the complexity of a given problem increases, system-level issues become more than just choosing an algorithm and data structures. Developers face issues such as composition of components, global control structures, communication protocols, synchronization and data access, physical distribution, etc. The vision problem has reached the point where system-level issues are becoming quite significant. A framework that targets these issues in a systematic manner is necessary.

Currently, system-level issues are addressed in the vision community using two approaches: developing in-house solutions and using available frameworks and packages. Development of in-house solutions demands a large amount of effort and resources, as users must implement the needed functionality themselves. Furthermore, due to the high cost of development, these solutions tend to deal with system-level issues minimally, which introduces a number of repercussions.
The two major drawbacks are a lack of robustness and generality, as custom solutions are very application specific with no easy means for reuse. Due to these issues, the negative trade-off that exists between flexibility and development effort is significant enough to deter many developers from this approach. The use of standardized frameworks and packages for addressing system-level issues is generally a superior route. Standardized frameworks decrease development effort by exploiting the redundancy in commonly required functionality, thus providing module reuse.

There exist a number of frameworks and packages that target the vision problem, such as OpenCV, Gandalf and VXL. Although these frameworks address critical computer vision tasks, in their current rendition they have a number of major shortcomings (as discussed in Chapter 2). The approach taken by existing frameworks prevents them from maximizing the potential abstraction over low-level details and the reusability that could be extracted from computer vision.

In this thesis we are targeting the shortcomings of existing frameworks for addressing the vision problem. The approach we present is to provide a conceptual classification of the vision problem into a number of decoupled sub-tasks. We demonstrate that using this approach we can provide a framework with a well defined scope that promotes module reusability and abstraction over low-level infrastructure details. In the remainder of this chapter we present: the specific issues that exist with respect to current vision based system development approaches, our approach for addressing these issues, and an overview of the contributions we have made through this work.
1.1 Problem

The main problem that we are addressing is that current approaches for vision system development do not fully address the need for abstraction over low-level details and for high-level module reuse.

Computer vision is a rich field that consists internally of a number of secondary components in addition to its primary task of processing image data via computation. These secondary issues mainly address the tasks of retrieval, pre-processing and delivery of image data from sources (cameras or image/video files) to the module responsible for performing the processing. Each one of these tasks is non-trivial and introduces a number of issues for developers. Current approaches require vision developers to explicitly deal with these issues (such as data types, communication and access to sources), which leads to a high awareness of the low-level details of these tasks (as discussed in Chapter 2). This awareness complicates the development task, thereby increasing the load on developers.

Existing frameworks provide a function based approach for addressing (what they consider to be) an appropriate sub-set of the vision problem (as discussed in Section 2.3). Using this approach, reusability is achieved within each framework through function reuse. Since no standardized classification of the vision problem currently exists, the functions provided by these frameworks often overlap in scope. As demonstrated by Makarenko et al. in [26], this leads to very poor reusability in the global sense.

1.2 Approach

The approach we have taken in this work is to provide a new methodology for vision based system development in order to address the shortcomings of current frameworks.

Firstly, we present a novel classification of the computer vision sub-tasks that is based on separating data processing from data management.
We further classify the sub-components of data management into three strongly decoupled sub-tasks: data access, data conversion, and data transportation.

Secondly, we present a framework that is based directly on this classification of sub-tasks and demonstrate that there are significant advantages to vision system development using this approach.

Thirdly, we present a framework that implements the data transport sub-component of data management and demonstrate that there are significant advantages to having a transport mechanism for vision based development.

Much of the inspiration for this approach has been derived from the success of a similar approach in the field of Computer Graphics. The separation between graphics programming and display/interaction in Computer Graphics was introduced with the Open Graphics Library (OpenGL) and utility frameworks such as the OpenGL Utility Toolkit (GLUT), which revolutionized the Computer Graphics industry and made it accessible to a wider set of developers.

1.3 Contributions

The work presented in this thesis provides three main contributions with respect to the problems facing computer vision based application development (as described in Section 1.1). The following is an overview of these contributions:

1. We present a novel systematic decomposition of the vision problem into a number of decoupled sub-components.

2. We present the Vision Utility (VU) framework, which is based on this decomposition and provides abstraction over the infrastructure-level tasks of computer vision.

3.
We present the Hive framework, which provides data transportation between vision modules, hence promoting module reusability.

Novel contributions of the Hive framework have been published in the International Conference on Distributed Smart Cameras (ICDSC'08) and the International Conference on Computer Vision Theory and Applications (VISAPP'09) [1, 30].

1.4 Overview

The remaining chapters of this thesis are laid out as follows. Chapter 2 evaluates the previous research and existing frameworks that target the vision problem and other fields as they relate to this research. Chapter 3 presents our decomposition of the vision problem, which provides the foundation for our approach to vision based system development. Chapter 4 presents the API and architecture of the VU framework and the proof of concept system developed using this framework. Chapter 5 presents the transport framework and a detailed description of its architecture and implementation, as well as a set of modules and applications based on this framework. Chapter 6 is the conclusion, which summarizes the problem, the approach and the contributions of this work, and presents a future direction for the continuation of this research.

Chapter 2
Related Work

The vision problem contains many sub-components that extend into a number of different fields. This chapter provides a discussion of the current state of research and of existing frameworks that aim to address the variety of limitations faced in vision system development.

This chapter is organized as follows: firstly, we review existing frameworks that focus on specific infrastructure-level issues of the vision problem. More specifically, we evaluate frameworks that target accessing image data from devices, converting the format of image data, and data transport. Secondly, we present an overview of frameworks for multimedia application development and evaluate them with respect to the requirements of vision based development.
Finally, we evaluate existing frameworks that specifically attempt to target the vision problem in a comprehensive way.

2.1 Frameworks for Vision Data Management

In this section we evaluate existing frameworks that target the infrastructure-level requirements of the vision problem. We focus on existing solutions for addressing the following tasks: accessing image data, conversion of image data, and inter-module transportation of data in systems.

2.1.1 Access to Image Data

The following is an evaluation of several significant frameworks for the standardization of image acquisition from various different sources:

Device Standardization

There have been a number of efforts to create standardized formats for device manufacturers, such as the IIDC 1394-based Digital Camera Specification [3] and the VAPIX network camera communication specification [11]. IIDC standardizes access to camera devices that use FireWire as the camera-to-PC interconnect. The IIDC standard specifies the camera registers, fields within those registers, video formats, modes of operation, and controls. IIDC ensures uniformity of access to devices that adhere to its standards. VAPIX is an HTTP based protocol developed by Axis Communications for communication with network cameras via TCP/IP and the server-client model. Using VAPIX, image data and camera configuration data are sent as HTTP commands to and from the camera device, allowing uniform communication with any network device that implements VAPIX. Although these standard formats present a theoretically valid approach, a convergence of such standards by manufacturers is not likely within the short term, if ever.

Video4Linux

Video4Linux (V4L) is an example of a class of solutions that attempt to provide seamless access to sources via a uniform interface [38]. V4L provides standardized access to video devices by including a kernel interface for video capture.
This approach utilizes Linux's paradigm of treating all input and output communication as reads and writes to a file, and presents imaging devices to users as file handles. V4L defines standard types for devices and video properties, and provides functions for opening and closing devices, changing device properties, setting data formats, and performing input and output, all implemented via system calls. Using these defined types and methods, programmers have access to the sources installed on a particular machine. Although V4L quite effectively abstracts specific camera protocols (e.g. IIDC) away from the user, it has two drawbacks: it is highly platform dependent, and there is a high barrier to adding support for new devices. In order to add support for a new device (or class of devices), a developer needs to write kernel drivers, which is a cumbersome task and eliminates any opportunity for platform independence.

2.1.2 Image Conversion Frameworks

The conversion of image data involves tasks such as image format conversion, resizing, affine transformation, etc. Performing these functions is not trivial, and there have been a number of efforts to provide frameworks that support these tasks. We discuss some of these frameworks in this section.

Open Source Image Format Packages

Currently there exists a variety of open source cross-platform software packages for image conversion and manipulation, such as ImageMagick [19], DevIL [13], and Netpbm [32]. These frameworks provide standard representations of images and a large set of manipulation routines that extend well beyond image format conversions and affine transformations. The main issue with the approach taken by these frameworks is the over-exposure of the low-level details of routines.

As we demonstrate in Chapters 3 and 4, the functionality of these frameworks could be separated into data management and data processing.
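To make the contrast concrete, here is a small sketch of ours (purely illustrative: the `Pipeline` class and step names are hypothetical, not the API of ImageMagick, DevIL, or Netpbm). Conversion steps are declared up front, and the pipeline object, rather than the user, shepherds each intermediate image:

```python
# Illustrative sketch only: a declarative conversion pipeline in the spirit
# of the data-management/processing split argued for here. All names are
# invented; existing packages expose each routine imperatively instead.

class Pipeline:
    """Users declare *what* conversions to apply; the pipeline holds the
    intermediate images (the data-management half of the split)."""
    def __init__(self):
        self.steps = []

    def then(self, step):
        self.steps.append(step)
        return self  # allow chaining: Pipeline().then(a).then(b)

    def run(self, image):
        for step in self.steps:
            image = step(image)  # user never sees intermediates
        return image

# Two toy "processing" steps on a nested-list grayscale image.
def scale_half(img):
    return [row[::2] for row in img[::2]]

def invert(img):
    return [[255 - px for px in row] for row in img]

img = [[0, 255, 0, 255],
       [255, 0, 255, 0],
       [0, 255, 0, 255],
       [255, 0, 255, 0]]

out = Pipeline().then(scale_half).then(invert).run(img)
assert out == [[255, 255], [255, 255]]
```

The user declares the steps and never touches the intermediate representations; that bookkeeping is exactly the data-management responsibility being separated out.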
This separation would allow the framework to provide high-level abstractions over the conversion routines and remove users' awareness of low-level details via a declarative model.

CoreImage and CoreVideo

CoreImage and CoreVideo provide a plug-in based architecture for image manipulation and processing that utilizes graphics cards for hardware acceleration [20]. Although these frameworks provide only a limited set of image manipulation and hardware accelerated processing routines, the model being used is quite successful at providing abstraction over low-level details. The API for these frameworks presents functionality as processing blocks that can be aggregated to form a pipeline that performs one, or a number of, tasks. Image formats and properties are completely abstracted away from users in the intermediate sections of the pipeline. Users simply connect processing blocks together without externally managing images or pixels. This level of abstraction removes the necessity for users to deal with a large overhead of image manipulation.

2.1.3 Data Transport Middlewares

Modularity, code reuse and standard accessibility to system components are issues that have been quite apparent and pronounced in various disciplines, particularly in Robotics and Haptics research and development. Since communication between different sensors, actuators and control algorithms is central to these two fields, there have been several projects that attempt to provide middlewares to support abstraction and module based development. This section provides an evaluation and discussion of the most widely used frameworks, which is helpful in understanding the various decisions that we have made with respect to the development of our communication framework. We discuss the advantages and disadvantages of each framework and their relevance to vision.

Robotics

The most widely used frameworks for robotics development are YARP [28], Player [10], Orca [6], Orocos [7] and CARMEN [31].
We discuss the first three, as they provide the most insight into the different communication paradigms:

Figure 2.1: YARP's communication model with modules. Figure adapted from [28].

Yet Another Robot Platform

YARP provides a flexible communication medium, based on the observer pattern, between different running processes. Figure 2.1 shows the communication of the YARP module with users' code and other modules. The process that produces data opens an "out port" on a specific data type, and the receiving process opens an "in port" to receive that data type. Data transfer takes place between the two ports upon connection. YARP provides an abstraction over the data transport medium and utilizes the network (TCP/UDP) in its current implementation. Although the communication mechanism of YARP is very similar to the communication required by vision systems, some of its features are very specific to robotics. Much of this difference is due to robotics' use of a large number of sensors and actuators, which can often lead to physical exceptions causing instability and unexpected crashes; there is consequently a large emphasis on creating mutually exclusive blocks of processing that minimally interfere with each other. Another fundamental difference is the fact that YARP does not provide any means for controlling modules and is designed to be used strictly as a slave framework. Vision requires the communication framework to be the main backbone of the system, whereas YARP is a more lightweight approach that does not assume control, in order to be compatible with other possible libraries and frameworks in the system.

Player

Player is another software framework that provides a communication means for managing the flow of distributed sensing and control data between the sensors and actuators in robots. Player utilizes a message protocol implemented using multi-threaded TCP sockets.
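The flavor of such a socket based exchange can be sketched in a few lines (a toy of ours, not Player's actual wire format: the device name and message strings are invented). A threaded server accepts client connections and answers requests for device data:

```python
# Toy client-server exchange in the spirit of Player's socket protocol.
# The text messages and the device name "camera0" are invented for
# illustration; Player defines its own binary message schema.
import socket
import threading

def serve_one(server_sock):
    """Handle a single client connection, then exit."""
    conn, _ = server_sock.accept()
    request = conn.recv(1024).decode()
    if request == "READ camera0":
        conn.sendall(b"DATA 640x480")   # pretend sensor payload
    else:
        conn.sendall(b"ERR unknown device")
    conn.close()

# Server binds to an ephemeral localhost port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_one, args=(server,), daemon=True).start()

# Client: transparent network access to the "device".
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"READ camera0")
reply = client.recv(1024).decode()
client.close()
assert reply == "DATA 640x480"
```

Spawning one handler per connection is what makes the server side "multi-threaded" and lets clients run in any language and on any platform, at the cost of funnelling all traffic through the server.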
Player is in essence a set of protocols implemented on a client-server model, which provides transparent network access to the devices employed within a robot. The socket based communication protocol of Player provides: ease of distribution of sensors, actuators and control processes; independence between modules (as each module can be implemented in any language and on any platform); and the convenience of a client-server model for information exchange. Figure 2.2 shows the overall architecture of Player. The middle section of the diagram is the Player framework, which creates the connections between the devices and control routines. The protocol specified by Player is a low-level schema for information exchange between actuators (servers) and control routines (clients). This communication model is not sufficient for vision based systems, which require more flexible peer-to-peer (client-to-client) networking. Player focuses on the data management of many actuator-control pairs. However, it lacks the complex control required for the creation of complex networks of sensors for a single unified application like vision.

Orca

Orca is an open source, general robotics framework that was developed based on Component Based Software Engineering (CBSE) [9]. The aim of the Orca project is to provide an extensible, de-centralized middleware for connecting reusable components that is independent of any particular architecture and communication mechanism.
The project also aims to provide an online repository of useful standalone components that can be used by all robot developers.

Orca identifies and categorizes the following components of robotics: 'objects', which refer to units of data that are communicated between modules; 'communication pattern', which specifies the data transfer model, such as client-push; 'transport mechanism', which specifies the data transfer protocol, such as TCP/IP; and 'components', which refers to the algorithm implementations and hardware interfaces.

In order to build a system using Orca, a developer connects one or several components together using the transport mechanism. Of the frameworks discussed here, Orca is the most suitable as a vision communication mechanism. However, there are fundamental limitations that prevent it from being as effective for vision systems. The concept of a controlling module is missing from Orca, as no one component is designated to set up and configure the other components. Also, the level of abstraction is not sufficient for vision developers, as Orca users are forced to select communication mechanisms and indicate the communication patterns.

Figure 2.2: The scope and architecture of Player. Figure adapted from [10].

Robotics Frameworks Evaluation

Makarenko et al. presented a study of existing robotics frameworks, comparing the levels of abstraction of six existing solutions (Player, Orocos, Carmen, Orca, Yarp and OpenSlam) [26]. Makarenko et al. conclude that although each solution attempts to provide a framework for reusable components and is partially successful within its community, there is only an estimated 4% reusability across these platforms. Through the analysis of available solutions, Makarenko et al. illustrate that the level of abstraction provided within each of the solutions is insufficient.
The authors organize the frameworks into three conceptual layers: Driver and Algorithm Implementations (DA); Communication Middleware (CM); and Robotic Software Framework (RSF). They indicate that while DA is the layer desirable for reusability, current solutions tend to couple it to the CM and in some cases, even worse, to the RSF. The authors suggest that in the future, frameworks should be designed to have a thin middleware and RSF. Further, the levels of abstraction in future frameworks should decouple the inter-dependency between layers in order to allow for a mix and match of layers amongst different frameworks. This decoupling would lead to maximizing code reuse in the community as a whole. Figure 2.3 shows the concept of mixing and matching graphically, indicating how each decoupled component can be used with a number of other choices. We have adopted this concept in our design and implementation throughout this work.

Figure 2.3: Increasing reusability through well defined component scope. Diagonal lines show that each component could be utilized in other frameworks. Figure adapted from [26].

Haptics

Another field that has dealt with issues similar to vision is haptics research in HCI. Haptics research inherently requires the use of many different sensors, and performance is of critical priority, just as in any other Human Computer Interface. Pave et al. have presented the Real-Time Platform Middleware for Prototyping of Haptic Applications (RTPM)[34] in order to address some of these issues. RTPM is a framework for the development of distributed real-time collaborative haptic applications. RTPM provides abstraction over communication between its different distributed haptic, graphics, and control modules based on the Remote Function Call (RFC) model implemented by the Common Object Request Broker Architecture (CORBA). RTPM mediates communication between user processes and the operating system while providing abstractions over communication details (as shown in Figure 2.4).

Figure 2.4: Architecture diagram of RTPM. Figure adapted from [34].

The main problem with the approach taken by RTPM, in regards to communication for vision based applications, is its strong client-server communication paradigm. Vision systems require more flexible communication patterns between components that support the creation of communication pipelines, where the output of a module is directly connected to the input of another module without the mediation of the application. Using a strict client-server paradigm for vision would produce computational bottlenecks. In contrast to RTPM, Hive provides the ability to create complex peer-to-peer connections between modules as well as providing RFC for configuring the modules (as shown in Chapter 5).

Vision

The majority of activity in the area of data transportation with respect to vision systems consists of frameworks for distributed smart cameras, distributed sensor networks, and generalized data communication solutions that developed out of specific applications. The following is an evaluation of the existing research in this field:

Figure 2.5: Architecture diagram of Scallop. Figure adapted from [37].

Scallop  Scallop is an open framework for creating distributed sensor systems using the peer-to-peer connectivity paradigm[37]. Scallop provides a flexible communication middleware for sensor module data transportation. Figure 2.5 shows the architecture of Scallop. The API mediates communication between the user's code, sensors and other modules.

Scallop provides a plug-in mechanism for modules to be used within its framework, which is required by vision systems. Scallop also avoids computational bottlenecks by focusing on a purely decentralized peer-to-peer communication model where each node communicates data directly to other nodes.
The main disadvantage of Scallop is its lack of central control over modules. Due to this limitation there is no way to dynamically reconfigure the processing network, which may be required for vision systems that need reconfiguration based on run-time data.

Cluster Based Smart Camera Framework  Lei et al. present a generalized framework for communication in smart camera arrays that provides control and mediates data transportation between smart cameras[23]. This framework is based on a set of modules (nodes) running on a PC cluster. The nodes represent smart cameras, as they contain a data source and a processing unit. The communication between these nodes is done through two channels: a message communicator and a data communicator. One of the nodes is designated as the master, and it controls the operation of the other nodes (workers) by sending control messages through the message channel. All of the nodes in the framework are connected directly to each other, forming a peer-to-peer network. The framework allows the smart camera array to perform a number of built-in tasks and allows for the extension of these tasks via modification of the nodes. The framework provides abstraction over device access and data communication, and centralized control over the configuration of the nodes.

This framework provides a scalable peer-to-peer communication platform for inter-nodal data transportation as well as a central control mechanism for run-time configuration of the nodes. The main limitation of this framework is the homogeneity of its nodes. All of the nodes are designed to be utilized collectively to perform a single task, yet each node on its own is not a reusable entity. This methodology prevents the framework from supporting extension through a plug-in architecture of its nodes.
All the nodes must be updated for the system to perform additional tasks beyond the default support.

RPV  The RPV framework provides an API for developing vision systems by connecting a number of modules together on a cluster of PCs[2]. RPV provides the abstractions for gathering input data from sources and pipelining the operators that process the data. Using RPV, a programmer can develop distributed multi-camera systems with little overhead.

RPV provides abstraction over peer-to-peer data communication between its nodes and supports a plug-in model for the addition of source and processing nodes. However, RPV fails to address the need for centralized control over the nodes and does not support run-time reconfiguration of the network of nodes.

2.2 Frameworks for Multimedia Development

There currently exist a number of popular frameworks for multimedia application development that target issues similar to those of vision development. Apple's QuickTime™, Sun's Java Media Framework, and Microsoft's DirectShow are three of the main players in this field. These frameworks focus mainly on capturing, decoding and rendering to the screen (playing) video (and audio). However, they provide only a limited set of filters for manipulating this data and no support for data communication. The following is a discussion of these three frameworks in more detail:

QuickTime 7™  QuickTime is a media framework developed by Apple Inc. for managing and handling various multimedia requirements[36]. In addition to its ability to manage audio, animation, and graphics, QuickTime provides functionality for capturing, processing, encoding, decoding, and the delivery of video data through a framework called QTKit. QTKit's view of vision data is based on the concept of video clips, or as QuickTime calls them, 'movies'. QTKit provides a set of classes for accessing vision data from sources (capture devices and files) that provide high-level abstractions over the source's low-level details.
QuickTime also provides a very comprehensive, high-level mechanism for decoding and encoding video in a large number of different formats. There are two limitations to QuickTime's approach with respect to vision based system development. The first issue is QuickTime's lack of support for data processing. Although QuickTime provides the use of the CoreImage and CoreVideo frameworks, which offer a small subset of built-in image manipulation routines, it does not provide any mechanism for advanced vision processing to be integrated into the framework. This limitation forces users to explicitly deal with the overhead involved in transferring image data between the framework and an external processing module, which significantly reduces the effectiveness of QuickTime as a vision based system development framework. The second major limitation of QuickTime is that it does not provide a mechanism for the integration of a data transportation layer to address the communication between distributed tasks. We have addressed these limitations in our approach presented in Chapter 4.

Java Media Framework  The Java Media Framework (JMF)[21] is a cross-platform multimedia framework, similar to QuickTime, that provides capture, playback, streaming and transcoding of multimedia in a number of different formats for Java developers. The architecture of JMF consists of three stages: input, processing and output. The input stage provides routines for accessing video data from capture devices, files and network inputs. The processing stage deals with converting data using different codecs and adding common video effects. The output stage deals with rendering the video data, saving it to disk and sending the data via the network. The fundamental limitations of JMF are similar to those of the QuickTime framework. The processing aspect is simplified to the use of intermediate filters and codecs (although JMF provides limited codec support compared with QuickTime) with no built-in support for extended processing.
Also, JMF's support for data transportation is limited to reads from and writes to the network.

DirectShow  DirectShow[14] is a multimedia framework developed by Microsoft to provide a common interface for managing multimedia across many programming languages. DirectShow is an extensible filter-based framework that provides data capture, filtering, conversion and rendering of video and audio data. DirectShow interfaces with the Windows Driver Model in order to provide access to a large number of capture and filter devices. DirectShow insulates the application programmer from the details of accessing these devices; however, it also suffers from the same drawbacks as the other multimedia frameworks, as it provides no support for complex video processing and data transportation.

2.3 Frameworks for Vision System Development

There have been a number of frameworks designed to target vision processing as a whole. We discuss a few of these frameworks in this section and compare them with our approach.

OpenCV  The Open Computer Vision library (OpenCV)[5] is a comprehensive and widely used vision processing framework. The overall design of OpenCV relies on declaring data type definitions for image and vision entities and providing functions for operating on and extracting data from them. OpenCV provides limited system development support (source access and image manipulation) for developers to easily create vision systems. OpenCV provides a framework for accessing data from cameras installed on the system that utilizes an OS specific framework such as V4L. OpenCV also provides a function based approach to image format conversion and resizing. These functions access images of different formats from disk and convert the data into OpenCV's native image class.
OpenCV also provides routines that allow the programmer to resize images using a number of different interpolation methods, extract sub-regions of images to sub-pixel accuracy, and extract specific channels from a multi-channel image.

OpenCV provides function-level code reuse within the framework; however, it provides no easy way for users to produce higher level blocks that could be reused outside OpenCV. As demonstrated by Makarenko et al.[26], this approach to code reuse is not successful when examined on a scale that extends beyond the OpenCV framework. In addition to code reuse, OpenCV has other shortcomings and drawbacks that make it inadequate for larger scale vision system development. Limitations such as the lack of support for distribution and multithreading, and limited source access and image data manipulation, force developers to create custom frameworks (or utilize other existing frameworks) that employ OpenCV as a complementary framework.

OpenCV is an excellent representation of the current approach to computer vision development. OpenCV provides users with the tools necessary to create end to end vision systems, yet these tools cover only a subset of the vision problem. Figure 2.6 shows the components of the OpenCV framework. The computer vision task has been addressed by three libraries: CxCore, OpenCV and HighGui. CxCore defines a set of data types for representing common entities such as images, points and arrays, and provides the functions that perform operations on these data types, such as element access, copying and arithmetic. OpenCV provides functions that implement computer vision algorithms as well as data access and manipulation. HighGui addresses image and video I/O as well as window management and display.
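To make the function-based style concrete, the following sketch mimics the pattern with plain NumPy arrays. The routines `extract_channel` and `resize_nearest` are hypothetical stand-ins written for this illustration — they are not OpenCV API calls — but they show the usage pattern under discussion: the caller owns the image data and chains standalone functions, with no higher-level reusable component.

```python
import numpy as np

def extract_channel(image, channel):
    """Return one channel of a multi-channel image (hypothetical OpenCV-like routine)."""
    return image[:, :, channel].copy()

def resize_nearest(image, new_h, new_w):
    """Resize with nearest-neighbour interpolation (hypothetical OpenCV-like routine)."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return image[rows[:, None], cols]

# Function-based usage: every operation is a free function over raw image data.
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a captured frame
green = extract_channel(frame, 1)
small = resize_nearest(green, 240, 320)
print(small.shape)  # (240, 320)
```

The point of the sketch is architectural rather than algorithmic: each call is reusable only at the function level, and any larger pipeline built from such calls is application code that cannot easily be lifted out and reused elsewhere.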
Although there is some degree of classification embedded in the categorization of the framework's tasks into these three libraries, the functionality of the libraries (managing and processing data) overlaps in a number of cases, and all of it is presented to the user at the same level through the same function based API. It is clear from this superficial classification that OpenCV does not provide a strong conceptual separation between different classes of vision processing tasks. Consequently, as a result of this lack of conceptual categorization, OpenCV has major limitations and shortcomings preventing its extensive use as the major tool for system development.

Gandalf  Gandalf is a function based computer vision and numerical analysis library that defines a set of data types relating to mathematical and computer vision operations. Gandalf consists of four packages: a Common package that defines simple structures and data types used by the other packages; a Linear Algebra package with a large number of routines for matrix and vector manipulation; an Image package that declares the image structure and provides low-level image manipulation routines; and a Vision package that provides a number of standard image processing, computer vision and geometrical routines.

Gandalf is similar to OpenCV in its approach of providing a function-based set of routines for processing numerical and image data. However, Gandalf focuses more on processing and less on providing functionality for reading, writing, manipulating and displaying image data formats[17]. Gandalf has the same code reusability issue as OpenCV, as it is also a function based framework. Because of its narrow focus and limited scope, Gandalf cannot be used as an all-encompassing vision framework; it does not provide support for the data retrieval and preparation tasks.
In its current rendition, Gandalf could be used as a processing component of the VU framework. However, it would need a much more standardized interface in order to co-exist with the other components of the framework. The VU framework presented in this thesis provides the standardized interface for interaction with frameworks such as Gandalf.

VXL  VXL is a vision development framework that consists of a collection of C++ libraries for computer vision based development[43]. VXL contains four core libraries: numeric processing (VNL), which provides methods for matrices, vectors, decompositions and optimizers; image access and manipulation (VIL); geometry definition (VGL); and other platform-independent vision related functionality (VSL, VBL, VUL). In addition to the core libraries, VXL contains a number of other libraries that cover a variety of vision problems. VXL provides a modular framework where each package may be used as a lightweight unit independent of the others.

The VXL package has an advantage over OpenCV and Gandalf in that it provides different libraries to address the different categories of the vision problem (reflected by its core libraries). The fundamental approach of VXL is, however, the same as the other two: to provide a function based set of libraries. The main disadvantage of this framework is that, unlike the VU framework, it relies on users to explicitly manage and handle the input data preparation, which includes access and manipulation. As we prove in Chapter 4, this level of detail is unnecessary for the application developer.

Matlab  MATLAB is a numerically oriented programming environment that provides easy methods for matrix manipulation, plotting of data, implementation of algorithms and a number of other useful features[27]. MATLAB also contains a package that provides image file access and common image processing and analysis routines.
MATLAB is commonly used for algorithm prototyping by researchers, as its easy interface and readily accessible image processing package allow for quick development. However, MATLAB is inadequate for any serious system development due to its poor computational performance and its inability to create end to end solutions. The scope of MATLAB is quite different from that of VU, as VU targets the development of large deployable vision systems.

2.4 Conclusion

In this chapter we explored the two current approaches to utilizing existing frameworks to address the vision problem: using several standalone frameworks that address different components of the vision problem, and using a single comprehensive framework (perhaps also in conjunction with other standalone frameworks) to address all of the sub-tasks of the vision problem.

We reviewed a number of different frameworks that target specific sub-tasks of vision system development in three categories: image data access, image data conversion, and transportation. Through a discussion of each framework, we showed that frameworks which provide a large set of functionality for data access and image data conversion exist, yet they overlap in scope and expose unnecessary low-level details to users. With respect to data transportation, we showed that there are several frameworks; however, they do not address all the transportation requirements of vision applications.

In the evaluation of the comprehensive frameworks we demonstrated that they have three major shortcomings: they do not provide all of the functionality needed by users; similar to the other frameworks, they force the user to deal with low-level details of all the sub-tasks of the vision problem; and they do not provide adequate support for high-level reusability.

The overall conclusion of the evaluation presented in this chapter is that
the current approaches to vision based system development suffer from a variety of shortcomings which limit reusability and require a large effort on the part of developers.

Chapter 3
Classification of the Vision Problem

This chapter presents an overview of our methodology for addressing vision system development issues. It is based on a systematic classification of the vision problem into a number of decoupled sub-tasks.

As discussed in Chapter 2, current frameworks for vision based application development such as OpenCV, VXL and Gandalf focus on processing image and video data while offering only intermediate support for the retrieval, preprocessing and transportation of image and video data. Furthermore, they fail to provide any strong notion of classification of the different sub-tasks within computer vision. All of the functionality of these frameworks is offered to users at the same level and through the same interface (function based). The lack of classification and abstraction in current approaches to vision based system development leads to limited component reuse and large development efforts due to the unnecessary exposure of low-level development details to developers.

In the following sections we directly target the lack of classification of the vision problem in current frameworks. We present a categorization of the various computer vision sub-tasks into a set of decoupled components which provides two main advantages over the traditional approaches: abstraction over low-level details, and increased reusability. Based on this categorization we define the scope of a framework for vision based development. We present our component based model and discuss the scope of each component.

3.1 Distinction between Data Management and Processing

The main objective of vision systems is to process image data via computational methods in order to perform one of two tasks:

- Extract high-level descriptions from images, such as the location of persons or objects.
We refer to this information as meta-data in this context.

- Manipulate image data, for example applying a smoothing filter or correcting radial distortion.

The fundamental difference between these two types of processing is the type of output they produce. The first task produces high-level meta-data, whereas the second task produces image data. We refer to the task of processing images, which includes both analysis and manipulation of image data, as data processing.

In vision systems, in order to perform the task of data processing, the system developer must address the following issues: retrieve the data from a data source, which could be a camera, an image file, a video file or a number of other devices; deliver the data from the source to the module in charge of performing the actual processing; modify the format of the data to match the format expected by the processing module; and deliver the output from the processing module to the module in charge of storing, displaying or otherwise using the output. We refer to these tasks collectively as data management with respect to the vision problem. As described, data management is composed of a number of different non-trivial, yet necessary, sub-tasks.

Although the data processing task has specific requirements for the format and communication protocol of its inputs and outputs, it is decoupled from the data management task as long as a standard interface is defined for communication between the two. This decoupling allows for the existence of a framework that implements the data management tasks while merely providing a standard interface for communication with modules that perform the processing. A framework with this limited scope allows developers to focus strictly on creating processing modules without addressing any data management.
Using this model for vision system development means that the scope of data processing becomes well defined and quite thin, allowing for greater code reuse as demonstrated by Makarenko et al.[26]. In Chapter 4 we present a framework that is based on the separation between data management and processing, highlighting the direct mapping between this classification and the scope of the framework. Furthermore, we proceed to validate this approach by demonstrating the advantages of the framework in Section 4.3.3.

The approach of separating the management of data from processing has proved successful in other fields, a good example being computer graphics. Modern computer graphics programming is divided into two parts: a language specification such as the Open Graphics Library (OpenGL)[39], and utility frameworks such as the OpenGL Utility Toolkit (GLUT)[18]. OpenGL specifies a graphics language which allows users to perform graphics tasks such as creating and manipulating polygons, shades and textures. Frameworks such as GLUT provide the data management functionality, such as handling interaction from the user and displaying the graphics pipeline's output to the screen. The abstraction provided by utility frameworks allows the language specification to be reusable, completely portable across platforms, and accelerated using different graphics hardware.

The following section describes in more detail the scope of the functionality of the data management task and its sub-components.

3.2 Sub-Components of Data Management

As defined in the previous section, data management is the task of accessing, preprocessing and delivering inputs and outputs to and from the processing task. Data management includes a number of non-trivial sub-tasks with respect to accessing image data from devices and managing data in the vision system.
These tasks can be classified into three categories:

- Data Access
- Data Transportation
- Data Conversion

Figure 3.1 shows our classification of computer vision. As the diagram indicates, there is a strong separation between data management and processing. Furthermore, it can be seen that the three sub-components within data management are also separated from each other, indicating the strong decoupling that exists among data management's internal components.

Although these three components are standalone and completely decoupled from one another, they can collectively be utilized to address the data management requirements of vision systems in an abstracted way. In Chapter 4 we present the VU framework, directly based on this model, which provides vision data management. We discuss its conceptual design and implementation in detail while demonstrating its benefits through a set of example applications developed on it.

The following is an overview of the three internal components of data management.

Figure 3.1: Graphical representation of our classification of the vision problem.

Data Access  There exist a wide variety of devices and other media (e.g. files) that could be used as sources of image data for the processing module. The data access module addresses the task of obtaining data from these devices. More specifically, the data access module deals with the configuration of the source and the retrieval of image data in a standard format. A detailed explanation of the data access component is presented through our discussion of the Unified Camera Framework in Section 4.2.2.

Data Transportation  In many vision systems, especially large scale systems, the components of the system, such as source and processor modules, are often distributed over a network or physically connected to several machines via a communication medium such as a bus.
The data transportation module addresses the need for inter-communication of data and control amongst the different modules of the vision system. We have developed a standalone framework, called Hive, for addressing the issues involved in data transportation for vision. Chapter 5 presents the conceptual design and implementation of this framework in detail.

Data Conversion  Different sources employ a large variety of data formats and compression schemes to represent the image data. In order for modules to communicate data effectively, they need to agree on the communicated data types. The data conversion module addresses the need for conversion of data formats in order to allow devices with different native representations of image data to communicate. A detailed explanation of the data conversion component is presented in Chapter 4.

3.3 Conclusion

In order to address the drawbacks of current frameworks for vision based application development, we presented a new approach that is based on a novel classification of the vision problem. We decomposed vision into two tasks: processing vision data and managing vision data. We further classified the data management task into: data access, which retrieves image data from sources; data transportation, which delivers data between the modules; and data conversion, which converts between the different data formats required by modules.

The classification of computer vision tasks in this way provides two major advantages: firstly, it reduces the workload of developers and researchers by providing high-level abstractions and allowing them to focus on the development and extension of a particular task independently of the others; and secondly, it promotes the development of modularized, reusable code by removing inter-task dependencies via the standardization of a clear interface between tasks.

Chapter 4
The Vision Utility Framework

In this chapter we present the VU framework for vision based application development.
It is directly based on the classification of the vision problem presented in Chapter 3. VU provides the required data management functionality to vision developers via an API that abstracts the details of its sub-components.

The goal of this chapter is to verify that the approach of vision system development through the separation of vision into sub-tasks is valid, and that it provides the necessary abstraction over low-level data management details. More specifically, the framework will be evaluated in Section 4.3.3 in terms of the following criteria:

- Is data capture from sources decoupled from processing?
- Are the image data format details hidden from the user?
- Does the framework provide abstraction over inter-component communication?

We present the VU framework in three parts. We first present an overview of the framework, which includes the development model, the conceptual design, and the components of VU. Second, we discuss the details of the architecture of the VU framework, and finally we present the proof of concept, which includes an implementation of the framework, a vision system based on the framework, and an evaluation of the effectiveness of the framework.

The VU framework in its current build is designed to address a subset of the vision problem, and it is intentionally limited to supporting vision systems that consist of a single source and a single processor. This decision was made in order to simplify the development task. The VU approach, however, generalizes to more sophisticated frameworks that support multiple sources, processors and connection patterns.

4.1 VU Overview

In this section we introduce the VU framework. The VU approach towards vision based system development is fundamentally different from traditional frameworks, as it is based on a novel classification of the vision problem.
We present this new approach by discussing the system development model, the modules of the framework and the communication model used.

4.1.1 System Development Model

VU allows programmers to create vision systems by developing applications that configure and connect sources and processors. In this model, sources are modules that produce image data, processors are modules that perform processing on data, and the application is the control center of the whole system. This development model preserves the separation between the data management tasks performed by the framework (as described in Section 3.2) and the data processing performed by the processing module.

VU provides a system development environment that encapsulates the data management functionality while providing an API that implicitly preserves the natural decoupling between the data management task and the data processing task. Figure 4.1 shows the scope of the VU framework and its relationship to the application, source and processing blocks.

4.1.2 Modules

The following sections describe the scope of the source, processor and application modules with respect to the VU framework.

Source

The representation of image data sources within VU is generalized to a black box with an interface that includes the configuration parameters and the output. Figure 4.2a shows the representation of the source block. In the VU framework all sources are abstracted as configurable virtual cameras. The configuration interface of the source block exposes the internal configuration of the source (such as resolution, exposure settings, white balance and so forth in the case of actual camera devices).

Figure 4.1: System development using the VU framework. The scope of the framework is clearly marked.

Figure 4.2: Representation of the (a) source, (b) application, (c) analysing processor and (d) image based processor modules in the VU framework.
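The development model described in this section — an application that configures a source and a processor and connects them, leaving data movement to the framework — can be sketched roughly as follows. All class and method names here are hypothetical illustrations for this thesis discussion, not the actual VU API.

```python
# Hypothetical sketch of the VU development model: the application is
# the control center; it configures and connects modules but does not
# itself move image data between them.

class Source:
    """A data-producing module, abstracted as a configurable virtual camera."""
    def __init__(self, **config):
        self.config = config            # e.g. resolution, exposure

class Processor:
    """A data-processing module with algorithm parameters."""
    def __init__(self, **params):
        self.params = params
        self.meta_callback = None       # set by the framework on connect

class Application:
    """Control center: configures sources/processors and requests connections."""
    def __init__(self):
        self.links = []

    def connect(self, source, processor, on_meta_data):
        # The framework mediates the source-to-processor data path and
        # delivers meta-data back to the application via the callback.
        processor.meta_callback = on_meta_data
        self.links.append((source, processor))

app = Application()
camera = Source(resolution=(640, 480))
tracker = Processor(threshold=30)
app.connect(camera, tracker, on_meta_data=lambda meta: print(meta))
print(len(app.links))  # 1
```

The sketch only captures the division of responsibility: configuration and wiring live in the application, while capture, conversion and transport stay inside the framework.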
Figure 4.1: System development using the VU framework. The scope of the framework is clearly marked.

Figure 4.2: Representation of the a) source, b) application, c) analysing processor and d) image based processor modules in the VU framework.

Processor

As discussed in Section 3.1 we classify the processing task into two categories: extraction of meta-data and manipulation of data. Within VU the two types of processors are distinguished. The first type is called the image analysis processor and the second type is called the image manipulation processor. Figures 4.2c and d show the difference between the two processors. While they are both presented as black boxes with an input and configuration data, the image analysis processor produces meta-data whereas the image based processor produces image data. The input is image data and the configuration parameters allow customization of the algorithm to accommodate the nature of the input data. Meta-data is a high-level description extracted by the analysis process and the output data is an image. In practice it is possible for a processor device to be a mixture of the two models and provide both meta-data and output image data.

The following simple example demonstrates the concepts behind the processor representation in the VU framework: a basic foreground extraction algorithm works by subtracting input images from a constant background image, flagging the pixels whose difference is greater than a threshold value as foreground, and grouping these pixels by bounding boxes. In this case, the input is a pixel based image, the output is a binary image (foreground = 1, background = 0), the configuration is a threshold value and control over background model selection, and the meta-data is the list of bounding boxes. Since the outputs are both high-level (meta-data such as location and bounding boxes) and low-level (the binary image), this processor is a mixture of the analysis and image based processor.

This simple model for the processing unit can scale to more complicated algorithms by presenting a broader definition of the input set and the parameter set.
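The foreground-extraction example above can be sketched in a few lines. The names and types below are illustrative assumptions, not part of the VU API; the sketch only shows the relationship between input, configuration, output image and meta-data.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the mixed processor described in the text.
// Input and background are grayscale images; the result is a binary
// mask (the low-level output) plus a bounding box (the meta-data).
struct Box { int min_x, min_y, max_x, max_y; };

std::vector<uint8_t> ExtractForeground(const std::vector<uint8_t>& input,
                                       const std::vector<uint8_t>& background,
                                       int width, int height,
                                       uint8_t threshold,  // configuration
                                       Box& bbox)          // meta-data out
{
    std::vector<uint8_t> mask(input.size(), 0);
    bbox = {width, height, -1, -1};  // empty box sentinel
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int i = y * width + x;
            int diff = input[i] - background[i];
            if (diff < 0) diff = -diff;
            if (diff > threshold) {  // flag pixel as foreground
                mask[i] = 1;
                if (x < bbox.min_x) bbox.min_x = x;
                if (y < bbox.min_y) bbox.min_y = y;
                if (x > bbox.max_x) bbox.max_x = x;
                if (y > bbox.max_y) bbox.max_y = y;
            }
        }
    }
    return mask;
}
```

A real device would expose `threshold` and the background model selection through the configuration interface and report the box list through the meta-data channel.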
The overall paradigm of inputs, outputs, parameters and meta-data remains valid even for a processing block requiring a set of images as input and a parameter set selecting amongst multiple algorithms.

Application

The application is developed by the users of the VU framework. The application uses the VU API to access, configure and connect sources to processors and to retrieve meta-data. Figure 4.2b shows the representation of the application block within the framework. The application can receive the output of the source and processor as well as the configuration parameters and meta-data.

4.1.3 Communication Model

The communication model for the VU framework is based on asynchronous callbacks. A discussion of the decision to employ asynchronous callbacks is included in Chapter 5, where we present the architecture of the Hive framework.

VU provides two methods for communicating configuration and data amongst devices: remote function calls for configuration and retrieval of meta-data, and the delivery of device outputs. Sources only support the first method whereas processors support both.

Remote function calls allow the application to read and write data to a device. The data can be configuration parameters or meta-data produced by the device. For source devices the application can call the remote functions at any time; for processor devices, however, there is a synchronous callback which is invoked upon processing a frame of data. This callback blocks and allows the application to exchange data with the device. Through this callback the application can control, modify and verify the resulting data of the processing module, and possibly decide whether to move on to the next frame or reprocess (further process) the current frame.

In addition to the remote function calls, processors allow the transfer of their outputs to applications.
The output here refers strictly to image data, as other outputs are transferred via the remote function call mechanism described above. Limiting the output of the devices to image data is done in order to provide the functionality of an image manipulation framework and to allow for possible extension of the framework to multi-processing devices via cascading of the outputs.

4.2 Architecture

The VU framework provides the data management required for vision based application development through an interface that abstracts the details of the data management operations. The architecture of the VU framework therefore contains two layers: the Interface layer and the Core layer. As shown in Figure 4.3, VU's architecture consists of the interconnection of the following components:

- Interface layer
  - Application Interface
  - Driver Interface
- Core layer
  - Data Transport
  - Data Access
  - Data Convert

Figure 4.3: The VU architecture.

The following sections provide a description of each layer and its components.

4.2.1 Interface Layer

The Interface layer contains two components: the Application Interface, which provides the programmer access to the functionality provided by the Core layer, and the Driver Interface, which provides a contact point for device developers, allowing them to create drivers for existing sources and processors. Since the VU framework is based on asynchronous callbacks, in both instances the interface routines allow the user to register a number of callbacks to perform actions based on arriving events. The following is a discussion of the interface and driver components of the framework.

Application Interface

The Application Interface provides routines that expose the functionality of the VU framework to the vision application programmer. It provides access and abstraction over the functionality of the Core layer to simplify the task of the developer.
The following is a description of the services provided by the Application Interface:

- Device query: allows the programmer to query the devices (sources and processors) that are available and their general properties.
- Device communication: allows the programmer to send and receive data from devices in order to set or get configuration parameters and meta-data.
- Device interconnection: allows programmers to create a vision processing context by connecting sources to processors.
- Callback handling: allows programmers to register handlers for the communication callbacks from each processor.

Driver Interface

The Driver Interface standardizes the communication between devices and VU while providing abstraction over the implementation details of devices. Devices can be virtually anything, ranging from software routines to hardware accelerated implementations via GPUs, as long as they provide a 'driver' that adheres to the defined interface.

Since communication between devices and the VU framework is strictly callback based, the Driver Interface provides the mechanism that allows developers to register their services in order for them to be VU compatible. These services must include handlers for configuration, providing output, and the main processing of the device.

4.2.2 Core Layer

The Core layer of the VU framework performs the data management services. As discussed in Chapter 3, data management consists of the following three tasks:

- Data Access: configuration and retrieval of data from sources
- Data Conversion: conversion and transformation of image data
- Data Transport: data transfer mediation between modules

VU abstracts the details of these tasks from the user through the use of the Core layer.
The following is a discussion of each of the sub-components of the Core layer.

Data Access

Currently there are large variations in the protocols and formats used for accessing image data, mostly due to the diversity of existing image sources. Analogue cameras, digital cameras, image and movie files, as well as imaging sensors based on mediums other than light (e.g. ultrasound) are just several examples of sources that are commonly used. Even within each group there exists a wide variety of devices with different representations in software and different physical interfaces. This variability results in a lot of customized effort to configure and get data from these devices. In addition to the variability within devices, there exist numerous representations of image and movie data that introduce additional complexity.

This diversity has created a need for a general framework that provides uniform access to image sources regardless of access protocol, physical instantiation and native data type. The aim of such a framework is to get image data in native format from any image and video source in a uniform way.

In order to unify access to image data from devices and other mediums, we have created a model that presents image data sources as virtual cameras with a unified interface. We have named this conceptual approach the Unified Camera Framework (UCF). UCF defines and implements two sets of specifications: the image specification, which specifies the format of the image data, and the device communication protocol, which specifies a standard for device communication.

The image specification within UCF is the definition of the meta-data format that provides a complete description of its associated image data. This specification identifies the properties and the format of the image data, allowing the consumer of the data to decode and use the image data.
Typical image properties include resolution and compression type; however, there are many other properties that need to be included in order to provide a comprehensive description of images.

The device communication protocol within UCF specifies the method for communicating data to and from devices. The communicated information consists of image and configuration data. This protocol would be implemented on top of the existing device driver in order to allow that device to be used as a UCF compatible device.

Data Conversion

Manually pre-processing images and converting them into the format required by the processing task is a non-trivial task. Pre-processing image data includes operations such as format conversion, resolution change, colour-space change and affine transformation. These operations require considerable additional effort on the developer's part. The existence of a framework that performs these tasks seamlessly as a pre-processing step is essential to the vision processing community; however, such a framework is still not available. The main reason is the lack of a standard method for representing image data properties: without a standard representation, it is very difficult to provide a framework for conversion between data of different formats.

We propose an image manipulation framework that allows for seamless transformation between data with different properties. This framework relies on the image specification provided by the data access framework explained in the previous section. The reliance on the standard image specification makes this framework conceptually simple, as it can be represented as a black box. The image specification provides a uniform method of identification for the input and output of the framework, and based on the output requirements the framework applies the appropriate conversions and transformations.
It is important to note that there is some overlap between operations that are considered to be part of this framework (data format conversion/transformation) and data processing. However, including this functionality here does not preclude the processing task from performing similar operations. This is demonstrated in Section 4.3.3 with the resizing functionality of the processor.

Data Transport

Transportation of data between different source and processing modules is an important aspect of vision system development. The vision problem is inherently a distributed task due to its employment of multiple sensors, and communication between these sensors and processing modules is not a trivial task. As discussed in Chapter 2, there are no existing frameworks that adequately target the data transport requirement of vision system development. In order to address data transport we have developed an architecture for building distributed modular systems focusing on vision, called Hive. We present the motivation for and the complete description of the Hive framework in Chapter 5.

4.3 Proof of Concept

In this section we present an evaluation of the framework in three parts. In the first part we present the implementation details of the VU framework that we have developed in order to verify our approach towards vision based system development. In the second part we present a system that we developed using the VU framework. In the third and final part of this section, we present an evaluation of the framework and a discussion of the results of the application.

The current version of the VU framework is a proof-of-concept system that illustrates the validity of our approach; therefore, it is not a fully featured framework. The main limitations of this implementation are the following: VU only supports a single pipeline consisting of one source and one processor; and VU relies on sub-component frameworks that are also not fully featured and only provide a subset of the intended features.
Implementing a full version of this framework would require fully featured versions of the components stated above and is currently out of the scope of this thesis.

4.3.1 Implementation of VU Framework

The implementation of the VU framework closely follows the architecture presented above. We have implemented an API that exposes the functionality of VU. In order to provide the needed functionality to users and maintain consistency with the classification presented in Chapter 3, we have developed three frameworks to address the data access, manipulation and transport requirements. The following is a discussion of the API of the framework as well as the implementation details of its components.

Interface Layer

In the VU implementation we created two separate interfaces: an application API and a driver API. The application API exposes the services of the framework to the vision system developer, whereas the driver API allows developers to implement drivers for devices to use in the VU framework. Both APIs use the asynchronous callback model to interface with users. A complete list of the VU API is included in Appendix C, which outlines the description, inputs and outputs of each VU routine.

Application API  The application API supports device communication and control through a set of routines that implement the tasks outlined in Section 4.2.1. Table 4.1 shows the correlation between the requirements of the VU framework and the routines that provide them.

Requirement             Methods
Device query            GetDevices()
Device configuration    SetParameter()
                        GetParameter()
Device interconnection  CreateContext()
                        SetContextSource()
Callback handling       SetIdleFn()
                        RegisterImageOutputCallback()
                        RegisterPostprocessCallback()

Table 4.1: This table shows the methods that provide the VU application functionality.

The GetDevices() routine provides a list of the available devices to the user.
The user can query and set the parameters of these devices using the GetParameter() and SetParameter() routines. Device interconnection is not a single call but two: the first call (CreateContext()) starts a vision processing 'context' associated with a particular processing device, and the second call (SetContextSource()) associates a source device with the 'context', hence producing the connection between the source and the processor. SetIdleFn() allows the user to provide an idle routine that gets called when the framework is not processing events. Using RegisterPostprocessCallback() the user can provide a routine that gets called when the processor has finished processing a frame of data; this allows the user to check the status of the system and possibly configure the processor or source at runtime. Using RegisterImageOutputCallback() the user can provide a routine that accepts a processor's output image in a specific format.

Driver API  The Driver API provides a standard interface to mediate the communication between the VU framework and devices through asynchronous callbacks. Table 4.2 lists the API methods for the driver.

Requirement        Methods
Callback handling  RegisterProcessorFn()
                   RegisterGetConfigurationFn()
                   RegisterSetConfigurationFn()
                   RegisterDataReciever()
Output production  SendOutput()
                   GetParameter()

Table 4.2: This table shows the methods provided to device developers in order to write drivers for various devices.

For a device to be used in the VU framework, it needs to implement a driver that provides handlers for these routines. RegisterProcessorFn() allows the developer to provide the main processing routine of the device, which is called repeatedly by the framework. RegisterDataReciever() allows the developer to specify the properties of the incoming images (the device's input) and provide the routine that receives them.
RegisterGetConfigurationFn() and RegisterSetConfigurationFn() allow the developer to provide routines that respond to the application's requests for retrieval (GetParameter()) and assignment (SetParameter()) of internal parameters. The driver developer also needs to expose the parameter types to the application developer via a header file for these functions to be utilized.

Core Layer

In order to implement the functionality of VU we have developed three frameworks that implement the sub-components discussed in Section 4.2.2. The VU Core layer simply makes calls to these frameworks based on internal events or user requests from the Interface layer. The implementations of the data access and the data convert components are done in a very lightweight manner as proof-of-concept and are covered here. For the data transport component, however, we have developed a full framework that is presented in Chapter 5.

We have defined a light version of the image specification and device protocol. The image specification defines a number of data types that describe the content and the encoding of the image. Table 4.3 summarizes the image specification that we are using in our implementation.

Properties           Options
Image Compression    Raw, JPEG, PNG
Pixel Depth (Bytes)  1, 2, 3, 4
Pixel Type           Grayscale, RGB, BGR, HSV, YUV, CMYK
Image Origin         Top Left, Bottom Right

Table 4.3: This table provides a summary of the image properties included in our proof-of-concept image specification.

As can be seen, we have limited support for data types, but this description can be extended to include many other image and video formats. The device access protocol is based on a uniform set of asynchronous callbacks to the driver.

In order to perform data conversion we have developed a set of libraries to convert between the different image types defined by Table 4.3.
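The property set of Table 4.3 might map to a header type along the following lines. The names here are assumptions for illustration, not the actual UCF definitions; the point is that, for raw images, the buffer size and layout follow directly from the specification, which is what lets a consumer decode a frame without out-of-band knowledge.

```cpp
#include <cstddef>

// Illustrative sketch of a UCF-style image specification built from the
// property set in Table 4.3 (names are hypothetical).
enum class Compression { Raw, JPEG, PNG };
enum class PixelType { Grayscale, RGB, BGR, HSV, YUV, CMYK };
enum class Origin { TopLeft, BottomRight };

struct ImageSpec {
    int width = 0;
    int height = 0;
    Compression compression = Compression::Raw;
    int pixel_depth = 3;  // bytes per pixel: 1, 2, 3 or 4
    PixelType pixel_type = PixelType::RGB;
    Origin origin = Origin::TopLeft;
};

// For raw images the buffer size is fully determined by the spec.
size_t RawBufferSize(const ImageSpec& s) {
    return static_cast<size_t>(s.width) * s.height * s.pixel_depth;
}
```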
We have wrapped the data conversion component in a simple function that takes an input image header, an output image header and input data, and provides data in the format requested by the output header. This is a very simplified view of data conversion; implementing a fully featured package would require further research.

4.3.2 System Development using VU

This section presents the proof-of-concept application that has been developed using the VU framework. In order to create a direct contrast between the VU framework programming paradigm and the current vision application development paradigm, we have chosen to present a subset of OpenCV's functionality in this section.

We describe the components that make up this proof-of-concept system (the sources and the processor) and present the application program that uses these components. We then present the results of the application and contrast this approach with the conventional method of development.

Figure 4.4: OpenCV based VU framework processor.

Sources

For this application we use two sources that employ very different physical protocols: an Axis 207 network camera that employs TCP/IP over Ethernet, and a Logitech QuickCam Pro 3000 that uses USB as a communication medium. We have developed drivers for these two cameras for use as sources with the VU framework. These drivers adhere to the data access image specification and data retrieval standards presented in Section 4.3.1.

Processor

We have designed a VU framework processor that performs a number of different tasks implemented using the OpenCV framework. The following is the functionality we have chosen to include in this processor:

- Resizing
- Smoothing
- Image Subtraction
- Image Thresholding
- Blob Extraction

These sub-processing components have been pipelined and collectively constitute a processor that performs foreground object detection.
However, the user has control over activating components of the pipeline and, by utilizing VU's functionality, can perform any combination of these components on an image by processing it through the pipeline multiple times. Figure 4.4 shows the processor and its components.

In order to be utilized as a VU device, the processor needs to provide three components: a header file that includes the configuration and meta-data types of the device; a set of handler methods for the VU callbacks; and a header file that provides functions allowing the application to access and configure the device's parameters and meta-data.

Code Snippet 4.1 shows the code that registers handlers for the callbacks. This code provides function pointers as handlers for the incoming image, the get and set configuration requests, and the processing routine. Note that the SetDataReceiver call passes the input header object that specifies the properties of the incoming image. The framework takes care of converting the incoming image to match those properties automatically.

Code Snippet 4.1 VU device interface code. This snippet shows the process of registering callback handlers.

    // Declare VUDevice
    VUDevice vu_device(VU_DEVICE_PORT);

    // Configuration of VUDevice
    vu_device.SetProcessorFn(OpenCVProcess);
    vu_device.SetConfigurationFn(GetConfiguration);
    vu_device.SetConfigureFn(SetConfiguration);
    vu_device.SetDataReceiver(input_header, RecieveImage);

Table 4.4 shows the configuration parameters of the OpenCV processor, constituting the second requirement for a VU device. In addition to the control structure (Active Modules) that determines the active components of the pipeline, each component of the pipeline has a set of configuration parameters that is exposed to the application developer.
By manipulating these configuration parameters the pipeline and its components can be configured.

Code Snippet 4.2 shows the third and final requirement of the VU device: an example of a function that utilizes the set and get configuration functionality of the VU application API to modify and access a particular device's internal parameters and meta-data. This code shows the function that sets the status (active or not active) of each component on the device. It takes in the VU object, the processor device and the configuration data, and configures the device using the appropriate 'command' and VU's SetParameter routine.

Parameters         Configurations
Active Modules     Resizing
                   Smoothing
                   Image Subtraction
                   Thresholding
                   Blob Detection
Smoothing          Smoothing Type
                   Smoothing Parameters
Image Subtraction
Thresholding       Threshold Value
Blob Detection     Minimum Size
                   Maximum Size

Table 4.4: This table shows the parameters of the OpenCV processor.

Application

The role of the application module is to configure the source and connect it to the processor. For this application we would like to display the processor's output on the screen under a number of different configurations to test the framework.

Assigning the Source  The application can assign one of the two available sources to the processor. Depending on the user's input, one of the two sources is chosen and configured to provide images at a specific resolution (e.g. 640x480 or 320x240).

Configuring the Processor  In order to fully exercise the processor we have developed the application to take inputs from the user at runtime to configure the processor. Using the GLUT framework we capture key strokes from the user and set appropriate flags that activate or deactivate components of the pipeline upon the invocation of the post-process callback handling routine.

Displaying Processor Output  In order to display images we use the GLUT framework to create a 640x480 RGB window.
To receive the output of the processor in this format, we register a handler for the incoming image and specify the desired image properties (640x480 RGB). The image receiving handler copies the data into the GLUT display buffer and calls the display function.

Code Snippet 4.2 This is an example of wrapper code that sets the activation of OpenCV components.

    bool cvDeviceSetComponents(VU *vu, VU::Device processor,
                               opencvComponents &config_data)
    {
        int psize = sizeof(commandType) + sizeof(opencvComponents);
        byte *pdata = new byte[psize];
        commandType parameter = CV_SET_COMPONENT_STATUS;
        memcpy(pdata, &parameter, sizeof(commandType));
        memcpy(pdata + sizeof(commandType), &config_data,
               sizeof(opencvComponents));
        bool res = vu->SetParameter(processor, pdata,
                                    psize, VU::DeviceConfig);
        delete[] pdata;
        return res;
    }

4.3.3 Results and Evaluation

This section presents an evaluation of the VU framework based on the criteria presented in the introduction of the chapter. Through the following evaluation we demonstrate:

- The details of accessing data from sources are de-coupled from the processing module.
- The details of the image data formats of the source and processor are hidden from the user of the framework.
- The details of the physical connection and communication of the source and processor modules are hidden from the user of the framework.

In the following sub-sections we present and discuss the results in detail as they apply to these three points. In addition to directly evaluating the framework with respect to these three points, we present the results of the overall system in order to verify the VU framework's ability to configure and communicate with the processor at runtime.

Source Access Abstraction

As presented in the previous section, we have developed VU drivers for two different cameras: a USB camera and a network camera.
In order to test the source access abstraction of the VU framework, we have developed an application that connects each source to a processor and displays its output. To simplify the task and focus on source access abstraction, we have configured the processor to perform no operation on the image, so the output of the processor is the same as its input.

The two cameras have different native formats: the USB camera produces raw RGB images whereas the Axis 207 camera produces JPEG images. Although both cameras have a number of different configurable internal parameters, for demonstration we only configure the resolution of the cameras.

Figure 4.5 shows the output of the processor with the different cameras set at different resolutions, and Code Snippet 4.3 shows the code responsible for assigning a source to a processing pipeline. As can be seen from the code, the application programmer does not deal with any low-level details such as memory allocation, addressing or format change. The user simply picks a source from the list, optionally sets its properties, and assigns the source to the processor. This interface remains the same regardless of the source; for example, image and movie files could also be represented as virtual cameras.

Image Detail Abstraction

An important contribution of VU is the abstraction it provides over the image representations inside the system, removing any effort associated with image data manipulation and transformation on the user's part.

In order to evaluate the framework's ability to perform this abstraction, we provide a number of different image resolutions and formats to each component of the pipeline. Since each component of the pipeline (the processor and the application) can only perform operations on a specific data type, the framework should handle the format changes to cater to the needs of each component.
We evaluate the application developer's awareness of the change in image formats throughout the system.

The processor used for this evaluation is designed to strictly accept images with a resolution of 640x480 in RGB format and, depending on its configuration, can produce images of any resolution. The application has been designed to reconfigure the source to produce images with resolutions of 640x480, 352x288 and 320x240 in both raw RGB and JPEG format, and to strictly display images of resolution 640x480 in RGB space.

Figure 4.5: Output of the processor with the a) USB camera at 320x240, b) USB camera at 640x480, c) Axis camera at 320x240, d) Axis camera at 640x480.

We exercised the system by testing it under a number of different scenarios. In the first scenario, the source is configured to provide images of resolution 640x480 in RGB space, and the processor simply passes the same image to the application, which displays it at the same resolution and type, hence eliminating the need for any format change (as shown in Figure 4.6a). In the second scenario, the source produces images of resolution 320x240 in JPEG format while the processor requires raw 640x480 images, hence requiring a transformation by the framework to match the required data type for the processor (as shown in Figure 4.6b). In the last scenario, the source is still producing 320x240 JPEG images and the processor has been configured to convert the images to raw 100x200 RGB format. Since the application accepts only 640x480 RGB, the framework converts the image automatically to match the required format for the application (as shown in Figure 4.6c).

Code Snippet 4.3 VU code for setting up the source of a processing pipeline.

    /* using the QCP camera */
    vu.SetContextSource(devices.at(0));
    QCPSettings qcp_settings;
    qcp_settings.width = 320;
    qcp_settings.height = 240;
    res = QCFSetProperties(vu, devices.at(0), qcp_settings);

    /* using the AXIS camera */
    //vu.SetContextSource(devices.at(1));
    //AxisSettings axis_settings;
    //axis_settings.width = 640;
    //axis_settings.height = 480;
    //res = AXISSetProperties(vu, devices.at(1), axis_settings);

Code Snippet 4.4 shows the code for registering image receiving handlers for both the application and the processor device driver. Using this API, the user simply states the required properties of the incoming image and the VU framework takes care of the conversion to that type. Using this framework all sources and processors are compatible and can be connected using the 'plug and play' paradigm.

Code Snippet 4.4 Code for registering an incoming image handler.

    UCF::ImageProp img_prop;
    img_prop.width = width;
    img_prop.hight = height;
    img_prop.size = width*height*UCF::Byte3;
    img_prop.image_format = UCF::Raw;
    img_prop.pixel_depth = UCF::Byte3;
    img_prop.pixel_type = UCF::RGB;

    // VU Application
    vu_application.RegisterImageReceiver(img_prop, image_receiver);

Figure 4.6: Output of the processor with a) camera and processor raw RGB 640x480, b) camera JPEG 320x240 and processor raw 640x480, c) camera 320x240 JPEG and processor raw RGB 100x200.

Communication Abstraction

To demonstrate VU's ability to provide abstraction over the inter-module communication details, we present and compare the code for two systems developed on the VU framework. The components of the first system (source, processor and application) are on a PC running Linux. For the second system, the source and application are running on the Linux PC and the processor is running on a Windows PC that resides on the same network.

The only difference in the two applications from the developer's perspective is the selection of a different processor from the list. The system developer treats both processing modules identically, as the actual physical connection is completely hidden from the application developer by the data transport component. More discussion regarding data transport is presented in Chapter 5.

Run-time Processor Control and Configuration

To demonstrate the run-time control and configuration of the processor from the application, we present the results of the application under a number of different controlling commands from the user. Figure 4.7 shows screen shots of the system output with a number of different processor components activated.

Figure 4.7: The result of the VU processor. Panels show the raw feed, smoothing, resizing, resizing + smoothing, the background image, the foreground image, subtraction, thresholding, and foreground grouping.

4.4 Conclusion

In this chapter we presented our approach to addressing the vision problem. Our approach is directly derived from the classification of the vision problem presented in Chapter 3. We presented the VU framework, which provides the data management component of the vision task to vision developers while providing abstraction over the low-level details of the data management sub-components.

We presented a proof-of-concept implementation of each data management sub-component and demonstrated how the VU framework utilizes these components and provides transparent access to their functionality to vision based application developers.
We presented an application developed using the VU framework that utilizes two different sources and a processor based on the OpenCV framework.

We demonstrated that by using the VU framework, the user is not required to explicitly address the details of accessing sources, converting image data and transporting data between modules.

Chapter 5

Hive Framework

In this chapter we present the Hive framework as a solution to satisfy computer vision's requirement for data transportation as described in Chapter 3. As presented in Chapter 4, Hive serves as the basis for the transport component of the VU framework. However, Hive is a standalone framework designed to provide the modularity and distributivity needed in vision system development.

The main goal of the Hive framework is to mediate reusability by providing abstraction over inter-module communication in a platform independent way. There is a direct link between support for data transportation and reusability that has inspired the Hive framework. A data transport mechanism is necessary to achieve code reusability: without a standard method for data communication between vision components there is no easy way of defining a standard interface that makes components reusable. Reusability, abstraction and platform independence form the basis of our evaluation at the end of this chapter.

Data transportation for vision based systems is a complex task, and creating an all-encompassing solution would require satisfying a large set of requirements, which is well beyond the scope of this work. We have mainly focused on satisfying a set of key requirements in developing Hive, while not addressing hard real-time or global synchronization requirements. The following is a list of the requirements addressed by Hive:

- Abstraction and Encapsulation: Low-level communication details should be hidden from the user. This feature also decouples the implementation of the framework from its API.
- Plug-in Interface: Hive modules should be standalone (not require any other modules) and reusable, emulating the operating system's successful plug-and-play paradigm.

- Flexible and Low-overhead Communication: The communication protocol needs to provide a low-overhead direct connection between modules while being extensible and flexible enough to accommodate any possible distribution pattern for vision systems.

- Centralized Control: The modules and the connections between them should be controlled centrally to allow for dynamic reconfiguration.

- Cross-platform: The framework needs to be platform independent in order to promote use, portability and interchangeability of modules. This requirement allows heterogeneous sensor systems to be easily developed.

This chapter presents an overview of the Hive framework, the architecture of Hive, and a discussion of the implementation of its components, highlighting how the architecture satisfies the mentioned requirements.

5.1 Hive Overview

In this section we introduce the Hive framework by presenting its application development model, its components and its framework model.

5.1.1 System Development Model

Hive is a modular framework based on the concept of encapsulated and distributed processing. Hive systems consist of a single application and a number of drone modules, which allows developers to rapidly create vision systems by reusing modules. Hive provides control routines to the application and mediates all of the communication between the modules by creating a structured peer-to-peer network. Using Hive's communication model, drones can be set up as a pipeline, a distributed network, or any combination of the two. Figure 5.1 shows a simple Hive system that performs human tracking using a background subtraction and a tracking drone. The image data in this case is provided by the camera drone. The application creates the data connections between the drones for processing and receives the output of the tracker.
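From the application developer's point of view, assembling such a swarm amounts to recording which drone feeds which. The sketch below models only that bookkeeping with stand-in types; `SwarmSketch`, `connect` and the string addresses are our illustration, not the actual Hive API.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a drone's address (Hive uses a universal addressing
// scheme; this string form is illustrative only).
using DroneAddr = std::string;

// Hypothetical application-side view of a swarm: the application
// records which drone feeds which, mirroring Figure 5.1's pipeline
// camera -> background subtracter -> tracker -> application.
class SwarmSketch {
public:
    // Ask the receiver to subscribe to the sender's output.
    void connect(const DroneAddr& from, const DroneAddr& to) {
        edges_[from].push_back(to);
    }
    // Which modules currently receive this drone's output?
    const std::vector<DroneAddr>& receiversOf(const DroneAddr& d) const {
        static const std::vector<DroneAddr> none;
        auto it = edges_.find(d);
        return it == edges_.end() ? none : it->second;
    }
private:
    std::map<DroneAddr, std::vector<DroneAddr>> edges_;
};
```

Three such `connect` calls reproduce the Figure 5.1 swarm; because addressing is uniform, nothing in the sketch cares whether a drone is local or on another host.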
Figure 5.1: An example swarm set up using Hive to accomplish a vision processing task. Adapted from [1].

The following sections present the components of Hive: the application, the drone and swarms.

5.1.2 Modules

The following is a description of the different types of modules that make up Hive systems:

Application

The application module is the control center of Hive based systems. The application provides access to Hive modules to the system developer. The role of the application is to configure and connect drones together to create single or multiple swarms.

Hive supplies an Application API for the system developer that provides access to functions that configure drones' internal parameters and create data connections for drone-to-drone and drone-to-application communication. In addition to the initial configuration and setup, the API allows an application to connect drones to itself to receive data from them. This data may be the results of the vision processing component of the system that is used by the Hive application, or it can be run-time information about the status of each drone. Hive applications can reconfigure drones or the swarm dynamically at run time if needed.

The application accesses drones using a universal addressing scheme. Using this addressing model, drones can reside on the same physical machine as the application or be scattered across the network.

Drone

Drones are independent, reusable modules with a well defined interface that carry out one or several specific tasks. Drones can be based on a physical device, such as a graphics card, or on a software routine. However, regardless of the nature of the drone, the interface remains constant. The drone's interface consists of the specification of the input data types, the internal configuration parameters, and the output data types. The configuration parameters of a drone allow a programmer to customize its function.
For example, a camera drone's parameters can be the resolution and frame rate. Drones can be pure data sources (such as cameras), pure data processors (such as a background subtraction module), or a combination of the two (such as smart cameras).

Hive provides a simple interface for developers to create drone modules that run a software routine or interface with a physical hardware device. The Driver API provides routines for receiving data, performing processing, sending and receiving configuration parameters from applications, and sending data to other modules in the system. This API allows developers to easily create drones which can be connected to any other drone that adheres to its interface.

Swarm

Hive swarms are sets of interconnected drones controlled by a Hive application. Swarms consist of sources that produce data and processors that process the data. They can be set up in a variety of configurations to accomplish a single complex vision processing task or a series of them. An application can set up multiple swarms simultaneously using different drones, or employ a single drone in more than one swarm.

5.1.3 Framework Model

The Hive framework is a hybrid of two well-known architecture patterns: Pipes and Filters (stream processing) and event-driven architecture. Each of these patterns has a set of advantages and disadvantages. By combining them we leverage the benefits of each approach in order to satisfy Hive's requirements.

The Pipes and Filters model for stream processing is a popular model for signal processing that allows for parallel processing and distribution. In this model, filters are processing units with well defined inputs and outputs, and pipes are connectors that transfer stream data among filters. The simple representation of filters (input and output descriptions) in this model provides modularity and easy reusability. Furthermore, as filters are independent units, they may be utilized in order to achieve parallel and distributed processing.
The benefits of Pipes and Filters in relation to computer vision have been demonstrated in [16], where the authors use a modified Pipes and Filters model to successfully develop several vision based systems. The main disadvantage of the Pipes and Filters model with respect to vision based system development stems from the simplicity of the filters. As the model does not provide high-level management of filters, there is no mechanism for interactively accessing and manipulating the filter parameters. We have addressed this issue in Hive by modifying the communication model of the "filters".

Event Driven Architecture (EDA) describes a model of communication between components that is based on the production and consumption of events. An event in this sense is defined as a change in state that is exchanged between a system's components[8]. The use of EDA for vision is not necessary; however, it provides a number of features that make it a very suitable model. The main advantage provided by this model is that system components are very loosely coupled, since the event creator has no knowledge of the recipient(s). Due to the loosely coupled nature of its components, EDA can be easily distributed.

The Hive framework's architecture is based on a modified Pipes and Filters model which utilizes an event-driven architecture for communication between components. This approach provides the advantages of Pipes and Filters in addition to interactive control over the components of the framework. The details of Hive's architecture are presented in Section 5.2.

5.2 Architecture

Hive is based on a layered architecture inspired by the success of other layered designs such as the OSI model[44]. As successfully demonstrated by the OSI model, a layered architecture decouples the services offered by the framework and provides abstraction and encapsulation of the implementation details of each service.
In general, layered architectures promote modularity and reusability, as lower layers can be used by several instances of the upper layers, thereby reducing the complexity of development.

Figure 5.2 shows Hive's layered architecture for both the application and drone. There is a division in the architecture between the application and drone in the Interface and Service layers. This distinction is necessary since Hive provides two APIs with different functionality; however, the abstraction provided by the Communication layer is identical for both modules. In this section we present a detailed discussion of the motivation and design of each layer.

Figure 5.2: The layered architecture of Hive for both applications and drones.

5.2.1 Interface Layer

The Interface layer provides Hive's application programming interface to users via asynchronous callbacks. Callbacks are chosen in this instance because they work well with the asynchronous, event based nature of the system. The Interface layer could also be implemented via blocking or polling methods; however, in these two cases the overhead on the user is increased, as the user would need to manage the blocking call or poll for events manually. Hive provides two different APIs to users: the Application API and the Driver API. Both APIs share a subset of routines for setting up data handlers and the main method, as well as a number of specific routines. The next three sections describe the Interface layer by discussing the APIs for both the application and drones.

Application Specific API

The fundamental difference between applications and drones is the application's ability to manage drones and create swarms. The Application API provides the functionality to set and get the internal configuration parameters of individual drones. This can be done prior to creating a swarm or during the operation of a swarm. To create swarms, the Application API provides methods for connecting drones to each other.
These connections specify the data type and can be either synchronized or streaming. The difference between the two connection methods is that for synchronized connections there is a request each time data is required by the receiving drone, whereas with streaming connections the data is sent whenever the sending drone has it available. There are also methods for connecting drones to the application and vice versa. The application can also send data and notifications to drones directly. The API for the application exposes the functionality provided by the Service (Manager) layer to the application developer.

Driver Specific API

The driver specific API is a thin layer that provides counterpart routines to the Application API and a drone's output. The Driver API allows drones to provide function handlers to respond to set and get configuration requests. These handlers are invoked by callbacks when initiated by the application. The API also allows the drone to provide outputs by creating data of a certain type, which is transported by the Communication layer to any drone that has registered to receive data of that type.

Common API

Since communication for both application and drones is done via asynchronous callbacks, registering callback handlers is the same for both interfaces. There are two types of callbacks for communication with modules: the main routine and incoming data. The main routine is a method that is invoked repeatedly by the Service layer and often carries out the main processing task of the module. For example, a camera drone's main routine is responsible for retrieving a new frame and passing it as output to the Event layer. The incoming data handlers are routines that are registered by the modules to process a specific data type. These routines are invoked by the Service layer whenever data of that type arrives.

5.2.2 Service Layer

The Service layer implements the functionality of the Interface layer for the drones.
The Service layer's functionality can be categorized into the following: using the Communication layer to send appropriate commands and data to other modules in order to achieve the tasks required by the Interface layer; and responding to incoming commands and data by invoking the appropriate handler routines (registered via the Interface layer).

5.2.3 Manager Service Layer

Similar to the Service layer, the Manager Service layer implements the functionality of the API provided by the Interface layer for the application. The Manager layer uses the Communication layer to send and receive data to drones. In order to set and get drone parameters, the Manager layer sends commands to a specific drone and blocks until the expected response event is received. In order to connect (or disconnect) drones, the Manager layer sends commands to the recipient drone, instructing it to request (or cancel the request for) data from the sending drone. The Manager layer also provides routines for sending data and notifications of a specific kind directly to a drone.

Drone Service Layer

The drone specific functionality of the Service layer is minimal; mainly the registration and invocation of the callback routines that handle the configuration of the drone. In addition to configuration, the Service layer also passes the output data to the Event layer for potential delivery to other modules.

Common Services

The Service layer maintains and manages the callback handlers for both the application and drones. It provides routines for registration (from the Interface layer) and invocation (from itself or the Communication layer) of these handlers. Each handler is registered for a specific data type, and is invoked by the Service layer when data of that type is received through the Communication layer.
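This registration-and-dispatch bookkeeping can be sketched as follows. `HandlerRegistry` and its method names are our illustration, not Hive's actual classes, but the per-type handler table and the serialized delivery mirror the behaviour described here.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the Service layer's handler bookkeeping:
// one handler per data type, with incoming events queued so that
// handlers run one at a time.
class HandlerRegistry {
public:
    using Payload = std::vector<unsigned char>;
    using Handler = std::function<void(const Payload&)>;

    // The Interface layer registers a handler for a data type.
    void registerHandler(const std::string& type, Handler h) {
        handlers_[type] = std::move(h);
    }
    // The Communication layer enqueues arriving data.
    void enqueue(std::string type, Payload payload) {
        pending_.push({std::move(type), std::move(payload)});
    }
    // Drain the queue serially: no two handlers run concurrently,
    // so user code needs no synchronization of its own. Returns the
    // number of events actually delivered to a handler.
    std::size_t dispatchAll() {
        std::size_t delivered = 0;
        while (!pending_.empty()) {
            auto ev = std::move(pending_.front());
            pending_.pop();
            auto it = handlers_.find(ev.first);
            if (it != handlers_.end()) { it->second(ev.second); ++delivered; }
        }
        return delivered;
    }
private:
    std::queue<std::pair<std::string, Payload>> pending_;
    std::map<std::string, Handler> handlers_;
};
```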
The callback handlers are queued and called one at a time, to maintain simplicity and prevent synchronization issues on the user's part.

5.2.4 Communication Layer

The Communication layer manages the transfer of data and commands between modules in Hive via a peer-to-peer network using an event based model. Event based communication is suitable for Hive since vision data is represented and processed as discretized packets. The communication model is based on a publish/subscribe model which uses asynchronous messaging. The peer-to-peer model is chosen to avoid the communication bottlenecks that can occur in client-server models, where the data from all components is directed to a central server for processing. Combining the flexibility of an event based mechanism with the efficiency of peer-to-peer communication allows for a system that is loosely coupled and scalable to a very high number of modules.

The Communication layer consists internally of the Event and Transport layers. The following is a discussion of the details of these two layers.

Event Layer

The Event layer provides the functionality needed for managing connections between drones as well as sending and receiving events. The event based communication between drones follows a publisher/subscriber model, where the sender does not explicitly send messages to receivers; rather, the message is delivered to a recipient if it has registered for the message's data type. This paradigm decouples the communicating modules, allowing the sender to operate at its maximum potential regardless of the receiver's performance. The event based communication provides flexibility and extensibility for the Communication layer.

The Event layer handles two types of events: network events and module events. In both cases the Event layer uses the functionality of the Transport layer to deliver events to their destination.
Figure 5.3 graphically demonstrates the flow of data and events through the different layers of the Hive framework.

Network events are commands to manage interconnections between modules. These events are issued by the application module to drones as requests for creating or removing uni-directional data pipelines between two modules. The 'connect' event instructs the recipient drone to register itself as interested in data of a specific type on the drone that will produce the data, hence creating a uni-directional pipeline from the sender to the recipient. There are two types of pipelines: persistent, which transport events continuously; and non-persistent, which transport only a single event.

Module events carry data or notifications between drones. These events can be sent directly from the application to drones or transferred between drones using the uni-directional pipelines (as described above). The data transferred between drones is managed by the Event layer as described previously; drones produce output data of a certain type and pass it to the Event layer, which then delivers the data to any module that has registered for data of that type.

Figure 5.3: The flow across layers associated with sending (b), and receiving (c) data across the network.

Transport Layer

The Transport layer can receive packets from and deliver packets to other modules. The send and receive functions are complementary and implement the delivery of incoming and outgoing events as complete packages. The receive routine is implemented as an asynchronous callback to the Event layer. The Transport layer was designed to implement a peer-to-peer network in a simple and lightweight manner.
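A minimal sketch of such packet framing (our own layout, not Hive's actual wire format): each event travels as a small header carrying the event type and payload length, followed by the payload, so the receiver can always reassemble complete packages from a byte stream.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Frame an event as one complete packet: a 6-byte header (2-byte
// event type, 4-byte payload length) followed by the payload bytes.
std::vector<std::uint8_t> encodePacket(std::uint16_t type,
                                       const std::string& payload) {
    std::vector<std::uint8_t> pkt(6 + payload.size());
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::memcpy(pkt.data(), &type, 2);             // event type
    std::memcpy(pkt.data() + 2, &len, 4);          // payload length
    std::memcpy(pkt.data() + 6, payload.data(), payload.size());
    return pkt;
}

// The receiver reads the header first, then exactly `len` bytes,
// so events are always delivered as complete packages.
bool decodePacket(const std::vector<std::uint8_t>& pkt,
                  std::uint16_t& type, std::string& payload) {
    if (pkt.size() < 6) return false;
    std::uint32_t len = 0;
    std::memcpy(&type, pkt.data(), 2);
    std::memcpy(&len, pkt.data() + 2, 4);
    if (pkt.size() != 6 + static_cast<std::size_t>(len)) return false;
    payload.assign(pkt.begin() + 6, pkt.end());
    return true;
}
```

In the real framework these framed packets would travel over TCP sockets; the framing is what lets the asynchronous receive callback hand the Event layer whole events rather than arbitrary byte runs.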
The communication is based on TCP/IP, to allow modules to run anywhere on the network.

5.3 Proof of Concept

The main objective of Hive is to provide the data transportation and standardization required for vision (and other sensor) based system development, meeting the following requirements:

- Component reusability
- Platform independence
- Abstraction over inter-component communication

In this section we focus on presenting a number of drones and systems developed using Hive to validate the framework by showing how it meets the above requirements. In order to focus on Hive's features, we have isolated the transportation component in the following systems by dealing with image data access and conversion within each drone. Furthermore, we utilize different platforms and sensors to show Hive's platform independence and its ability to uniformly access modules regardless of the underlying hardware or software platforms.

5.3.1 Implementation

We have developed a full version of the Hive framework to evaluate the conceptual VU framework presented in the previous chapter.

The implementation of this version of Hive is in C++ using the Boost library. Boost is a free, peer-reviewed and portable set of C++ source libraries that provides standard APIs for tasks such as networking, thread management and timing[4]. The development of Hive follows the layered architecture closely to provide the same abstraction levels in code. C++ was chosen because it is the most widely used language for vision system development. The Boost libraries allow us to easily provide a C/C++ interface for Hive and supply the platform independence we require.

5.3.2 System Development Using Hive

This section presents a diverse set of reusable modules (drones) implemented using Hive that form the building blocks for Hive systems. Each drone's functionality is described, as well as its inputs, outputs, configuration and processing method.
The drones are categorized into the following three types: data capture, processors, and visualization and storage.

Data Capture

This section presents a number of data capture drones that act as a starting point (sources) for swarms in Hive. We discuss the functionality and the abstraction of these drones. Note that the drones that produce images in this section follow the UCF image specification described in Section 4.2.2.

AXIS Network Camera: AXIS network cameras are standalone units that interface directly to the network via Ethernet and host a web-server that provides access to the camera's internal parameters and image data via the AXIS VAPIX protocol over TCP/IP[11]. The AXIS drone abstracts the VAPIX protocol by implementing the translation from Hive commands and protocol to VAPIX. This drone can run on any machine connected to the network on which the camera resides.

Inputs: None
Outputs: Colour image
Configuration: Camera settings; Output format (JPEG or RGB)

Logitech Quickcam pro 3000: The Quickcam pro 3000 is a USB webcam that has a number of configuration parameters for image quality and resolution. Logitech provides a driver that allows access to the image data and the configuration. The Hive drone for this device implements the translation between the Hive protocol and the device's native protocol. The only difference from Hive's perspective between this source and the AXIS network camera presented previously is the configuration parameters.

Inputs: None
Outputs: Colour image
Configuration: Camera settings; Output format (RGB or HSV)

Image Sequence: Often when testing algorithms, the same sequence is used for evaluation purposes. In some cases the actual data capture device might not be available and pre-recorded data is needed. The image sequence drone provides this functionality by loading an image sequence from disk, thus allowing seamless switching of data sources (e.g. from a live camera to an image sequence stored on any computer on the network).
The drone itself can load any image sequence stored on its local machine. The root name of the sequence and the frame rate at which to supply data are given as configuration parameters.

Inputs: None
Outputs: Colour image
Configuration: Root filename; Frame rate

Video Files: This drone fulfils the same purpose as the image sequence drone, but for video files. It is currently implemented using OpenCV and thus supports the codecs installed on the system. The frame rate used is the same as that of the video file.

Inputs: None
Outputs: Colour image
Configuration: Filename

Fastrak: Spatial position is an important aspect of many vision applications. Vision based algorithms for estimating 3D position require intensive processing and are often inaccurate. This task can be performed easily and accurately using tracking hardware. The Polhemus Fastrak[35] is a magnetic device that performs real-time six-degree-of-freedom tracking of sensors. Fastrak provides the 3D position and orientation (pitch, roll and yaw) of each sensor relative to a base station. The Fastrak drone implements routines for getting and setting the configuration of the tracker, start and stop routines, and the get data routines. The only parameter on the tracker is the number of connected sensors. The device allows up to four sensors to be connected simultaneously. The start and stop routines allow the application to control whether the drone is producing data.

Inputs: None
Outputs: 3D position and orientation for each sensor
Configuration: Number of active sensors

Processing

This section presents the processing drones that we have developed using Hive.

Background Subtracter: Many algorithms in computer vision make use of background subtraction (or foreground extraction) as a precursor to the main computation. For example, silhouette images are required for visual hull construction and are useful for tracking, object detection and virtual environment applications.
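In its simplest form, background subtraction reduces to thresholding the per-pixel difference between the current frame and a reference background. The greyscale sketch below is our own reduction of that idea, not the drone's code.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Frame differencing: a pixel is foreground when it differs from the
// reference background by more than a threshold. Images are greyscale,
// one byte per pixel, stored row-major.
std::vector<std::uint8_t> frameDifference(
        const std::vector<std::uint8_t>& frame,
        const std::vector<std::uint8_t>& background,
        int threshold) {
    std::vector<std::uint8_t> mask(frame.size(), 0);
    for (std::size_t i = 0; i < frame.size(); ++i) {
        int diff = std::abs(int(frame[i]) - int(background[i]));
        mask[i] = diff > threshold ? 255 : 0;  // 255 marks foreground
    }
    return mask;
}
```

The more sophisticated methods the drone offers refine the same idea, maintaining a statistical background model per pixel instead of a single reference image.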
This drone provides eight different methods of background subtraction, ranging from simple frame differencing to more sophisticated techniques[33]. Algorithm selection and parameter setting can be altered via drone configuration.

Inputs: Image
Outputs: Foreground image; Alpha matte
Configuration: Algorithm selection; Parameters of algorithm

Face Detector: Finding faces in an image has become a ubiquitous application, now standard on many compact cameras and also available on some camera phones. This drone makes use of the face detection supplied with OpenCV, which utilizes a cascade of boosted classifiers using Haar-like features[5, 24]. For each input image the drone produces an array of rectangles corresponding to regions possibly containing a face.

Inputs: Image
Outputs: Array of rectangles
Configuration: Algorithm parameters

Colour Point Detector: Locating colour points in images is a useful method for tracking objects of interest. This drone finds the center of a region in an image that corresponds to a certain colour. The image is first thresholded against the required colour and then the remaining pixels are grouped into blobs. The centers of the blobs are then calculated for those that meet the preferred size criteria.

Inputs: Image
Outputs: 2D position of the coloured areas
Configuration: RGB value of point; Min and max size of regions

Convolution: Convolution is a common operation in computer vision that is the basis behind many filters. We have developed two drones: one that implements 7x7 convolution in software and another that implements it using CUDA on a graphics card[12]. The interface to both drones is identical; however, the performance is substantially different.
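The software path is a direct accumulation over the kernel window at every pixel. The sketch below (our illustration, with clamped borders, not the drone's implementation) shows the operation for an arbitrary odd kernel size such as the drones' 7x7:

```cpp
#include <algorithm>
#include <vector>

// Direct convolution of a greyscale image (row-major, w x h floats)
// with a k x k kernel, k odd. Border pixels are handled by clamping
// sample coordinates to the image edge.
std::vector<float> convolve(const std::vector<float>& img,
                            int w, int h,
                            const std::vector<float>& kernel, int k) {
    int r = k / 2;  // kernel radius, e.g. 3 for a 7x7 kernel
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ++ky)
                for (int kx = -r; kx <= r; ++kx) {
                    int sx = std::min(std::max(x + kx, 0), w - 1);
                    int sy = std::min(std::max(y + ky, 0), h - 1);
                    acc += img[sy * w + sx] * kernel[(ky + r) * k + (kx + r)];
                }
            out[y * w + x] = acc;
        }
    return out;
}
```

The quadruple loop is why a 7x7 kernel costs 49 multiply-adds per pixel in software, and why the per-pixel independence of the operation makes it such a good fit for the parallel CUDA variant.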
In the graphics card implementation, the image and kernel are loaded into the graphics card memory and operated on using parallel processors.

Inputs: Image
Outputs: Convolved image
Configuration: Convolution kernel

Visualization and Storing

Displaying and storing image data is a vital part of vision based system development. The following describes the drones set up to accomplish these tasks:

Live Video Viewer: This drone provides a display for incoming images and annotation tools to draw shapes (from other drones such as the Face Detector). Multiple instances of this drone can be tied to different drones, providing real-time feedback at each stage of a swarm's computation, which is useful for debugging during development. For example, in the Face Detection application described in Section 5.3.3, separate viewers can be connected to the camera, the background subtracter and the face detector to monitor algorithm results.

Inputs: Image; Rectangles; Points
Outputs: Video to screen
Configuration: None

Image Sequence Capture: As discussed above for the Image Sequence drone, capturing data from cameras is important for offline processing or algorithm development and testing. This drone accepts images and stores them directly to disk, saving them with a file name given via configuration. The Image Sequence Capture drone also supports saving images to video files instead of image sequences.

Inputs: Image
Outputs: Images to disk
Configuration: Root filename

Video Capture: Video capture works in much the same way as Image Sequence Capture, although the incoming images are saved to disk as a video file. The file name for the video is supplied via configuration, as is the video compression format.

Inputs: Image
Outputs: Video to disk
Configuration: Filename; Video format (codec)

5.3.3 Results and Evaluation

In this section we present a number of systems that have been developed using the Hive framework and the drones presented in Section 5.3.2.
We present the results of these systems, through which we demonstrate the component reusability, platform independence and seamless communication of the Hive framework. The systems in this section emphasize that, given a base set of drones, system prototypes can be constructed quickly and swarms can be dynamically connected to test different configurations.

Face Detection

We have implemented a face detection system using the AXIS Camera, Background Subtracter, Face Detector and Live Video Viewer drones. The system operates in two ways: the first detects faces on the original camera images and the second performs face detection on a foreground extracted image. The results demonstrate the improvement in accuracy and performance gained by incorporating background subtraction. The system itself shows the simplicity and flexibility of development using Hive.

Figure 5.4: Face Detection: Flow charts showing the connections in (a) direct detection and (b) the addition of a Background Subtracter drone. Taken from [30].

Both methods of face detection use a 'dumb' application to connect the various drones together. The application is termed 'dumb' because it does not need to do any computation or result collation itself, as the drones perform all the processing.

The first method uses the application to connect an AXIS Camera to both the Face Detector and the Live Video Viewer, and then connects the Face Detector to the Live Video Viewer (as shown in Figure 5.4(a)). Using Hive and the predefined drones, this amounts to under thirty lines of code (including configuration parameters). To obtain real-time performance the face detector is configured to be less accurate and faster. However, this results in more false positives.
All of the drones and the application module for this demo run on a single PC with Windows XP.

For the second system, a background subtracter is inserted between the AXIS Camera and the Face Detector in order to reduce the number of false positives while maintaining real-time performance. In this instance, the Background Subtracter drone is running on another PC, running Linux, connected via the network. This new system, shown in Figure 5.4(b), removes identified faces from the background (such as photographs) as well as reducing the number of false positives.

The second system demonstrates Hive's platform independence and abstraction over the physical location and connection of drones. The application does not need to address any communication or platform specific details.

Figure 5.5: Face Detection: The first row uses the fast method, the second row the accurate method. Taken from [30].

The results of the two systems are shown in Figure 5.5, with and without background subtraction, and at two levels of accuracy. To obtain real-time performance the Face Detector is set to find faces with low accuracy, which increases the rate of false positives (Figure 5.5(a)). Attaching a Background Subtracter to the system removes large regions of the image (Figure 5.5(b)) where false positives can appear, as well as removing static faces (such as photographs) from the scene. Figure 5.5(c) shows the final result using the second system. The second row of images displays results for the system with the Face Detector in high accuracy mode.

The addition of the Background Subtracter drone to the system is simple and shows how systems can be enhanced or tested by inserting additional processes using Hive.

Quality of View Analysis

This system extends the previous real-time face detection system to analyze the quality of the given views in a multiple camera network. The quality evaluation is set to the number of faces in each view, and the application automatically switches to the view with the most
The quality evaluation is set to the number of faces in each view, and the application automatically switches to the view with the most faces.

Figure 5.6: Quality of View Analysis: The flow chart for the Hive sensor network for analysing the quality of views via face detection. Taken from [30]

This system could, for example, be used for home video podcasting in one-person shows; using multiple webcams, the system will automatically change to the view the presenter is looking at.

The system connections are shown in Figure 5.6. Three AXIS Camera drones are each connected to a Background Subtracter drone, which is in turn attached to a Face Detector drone. The cameras are also connected to the application to provide the images for the chosen view. As the feeds come in from the Face Detectors, the number of faces in each view is compared and the view with the most faces is chosen. Its images are then routed to the application's display.

This example demonstrates the reusability provided by Hive. We show how a sophisticated system can be built quickly using Hive from a set of base drones. Figure 5.7 shows screenshots of the Quality of View Analysis system running for three scenes.

Multiple Camera Calibration

Applications such as tracking, augmented reality, camera calibration and human computer interfaces require a mapping between image pixels and 3D positions in the world. This system uses a magnetic positioning device to obtain the intrinsic and extrinsic camera parameters in order to calibrate cameras. There are various methods for computing camera calibration; we have developed a multiple camera calibration system based on the Tsai calibration method [42].

Using Hive we utilize the Colour Point Detector and the Fastrak drones. For this system we use a green marker on the Fastrak sensor to locate it in the image, giving an image-point to 3D-point correspondence. To perform
calibration, the marked sensor is moved around in the field of view of each camera to produce a data set, which is then processed using the Tsai method to calculate the intrinsic and extrinsic parameters.

Figure 5.7: Quality of View Analysis: Each row represents a snapshot in time from each of the three cameras. The red boxes in the top-left, center and bottom-right images show positive detections and the view chosen by the system. Taken from [30]

Figure 5.8 shows the interconnection of drones in the multi-camera calibration system. The application is connected to one Fastrak and three sets of the Colour Point Detector and AXIS Camera swarms. The application couples the 3D sensor position from the Fastrak drone with the 2D location of the colour point from the Colour Point Detector drone and runs the calibration routine. The resulting calibration parameters are written to disk for each camera. Figure 5.9 shows the annotated images for each camera. Note that extension to more cameras is trivial, requiring an additional swarm for each camera.

Figure 5.8: Drone interconnection for camera calibration. Taken from [30]

Figure 5.9: Feed from cameras 1, 2 and 3 during data point collection for calibration. Taken from [30]

Augmented Reality

The insertion of virtual objects into a real scene has many applications in entertainment, virtual reality and human computer interaction. We have implemented a real-time augmented reality system, using the Fastrak drone and multiple camera drones, that provides jitter-free virtual object insertion which is accurately represented in the different camera viewpoints.

Figure 5.10 shows the interconnection of drones for this system. We use the multiple camera calibration described above to calibrate the cameras to the Fastrak's coordinate system.
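The data-gathering step at the heart of this calibration (and of its reuse here) pairs each Fastrak 3D reading with the matching 2D colour-point detection. A minimal sketch of that pairing follows; the type names and the minimum-point threshold are illustrative assumptions rather than thesis code, and the actual parameter estimation is delegated to a Tsai-method routine not shown here.

```cpp
#include <vector>
#include <cstddef>

// One correspondence: a Fastrak sensor position in the tracker's
// coordinate system paired with the detected marker position in the image.
struct Correspondence {
    double X, Y, Z;  // 3D position from the Fastrak drone
    double u, v;     // 2D colour-point location from the Colour Point Detector drone
};

// Per-camera accumulator, filled while the marked sensor is moved through
// the camera's field of view; the full set is then handed to the Tsai routine.
class CorrespondenceSet {
public:
    void add(const Correspondence& c) { points_.push_back(c); }
    // Tsai's method needs a spread of non-degenerate points; the exact
    // threshold used here is an assumption for illustration.
    bool readyForCalibration() const { return points_.size() >= 12; }
    std::size_t size() const { return points_.size(); }
private:
    std::vector<Correspondence> points_;
};
```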
The calibration system provides the location and orientation of each camera in the tracker's coordinate system, as well as the cameras' intrinsic parameters (focal length and principal point). The calibration data is used to construct a model of the cameras and the coordinate system in OpenGL.

Given this model, a 3D object can be placed in the scene using the correct position and orientation supplied by the Fastrak sensor and rendered in the image plane of the modeled camera. This rendering is superimposed on the actual camera feed to produce the images that contain the virtual object.

Figure 5.10: Drone interconnection for augmented reality. Taken from [30]

Figure 5.11: Original feed from the cameras (Camera 1, Camera 2, Camera 3) vs. augmented reality. Taken from [30]

Figure 5.11 shows the frames from the three cameras before and after the placement of the augmented reality object, as well as the setup used for the calibration and the augmented reality system.

5.4 Conclusion

In this chapter we presented the architecture and the programming model of Hive, a component based framework for mediating the communication and services needed for the development of distributed vision systems.

We presented the Hive API, which allows developers to create reusable source and processing modules called 'drones'. Using control modules called 'applications', the drones can be connected in virtually any configuration to create sophisticated vision systems. We described the architecture of Hive as a set of layered services to provide abstraction and to decouple the architecture from implementation.
We showed that by defining a clear interface for each layer, Hive provides increasingly high-level services that use the functionality of the underlying layers without relying on a particular implementation. At its lowest level, Hive provides a flexible, low-overhead Communication layer that follows the peer-to-peer interconnection model to allow for the creation of systems that are distributed and scalable.

We presented a number of source and processor drones that have been implemented using the Hive framework, and showed that using these drones we can easily create a number of different systems, demonstrating modularity and reusability.

Chapter 6

Conclusion

In this chapter we revisit the material presented in this thesis in three parts. Firstly, we revisit the problems that we identified in the introduction and addressed throughout the thesis. Secondly, we summarize the contributions that have been made with respect to those problems. Thirdly, we propose a number of suggestions for the continuation of this work and the future direction of this area of research.

6.1 Problems with Current Approaches

The major focus of the work presented in this thesis is to address two important requirements that are not fully addressed in current approaches to vision based application development: abstraction over low-level details and high-level module reuse.

We identified that the underlying reason these two issues have been previously overlooked is the lack of classification and conceptual abstraction in current approaches to computer vision based system development.

6.2 Contributions of this Work

In order to address the lack of conceptual abstraction we firstly separate the vision problem into the data management task and the processing task.
We further decompose the data management task into the following decoupled components:

- Data Access: Configuration and retrieval of data from sources
- Data Transformation: Conversion and transformation of image data
- Data Transportation: Data transfer mediation between modules

We proposed that a framework for vision development should provide the data management functionality, which consists of the data access, transformation and transportation sub-tasks.

Based on the above decomposition we presented the VU framework, a framework that provides the data management functionality to vision developers while providing abstraction over low-level details. We demonstrated how the VU framework simplifies the developer's task through abstraction of device access, image data details and communication between components.

In addition to the VU framework, we presented the Hive framework, an event based framework for developing distributed sensor systems that provides simple high-level methods for the communication, control and configuration of reusable components. We discussed the details of the design and architecture of Hive, as well as a number of modules (sources and processors) and applications developed using Hive. We showed that even though Hive is completely independent of the VU framework, it forms the basis of the data transportation component of VU.

6.3 Future Direction

During the course of the research presented above, a number of future directions have emerged for the continuation of this research topic. We summarize these directions here.

6.3.1 Extension of the Vision Utility Framework

We have identified four components of vision processing and presented the functionality and scope of each component. However, we only focused in detail on one of the components (data transportation).
We have proposed an approach for the remaining components (data access and data transformation) and shown that this approach is valid for a subset of the functionality; providing a comprehensive solution, however, is a non-trivial task that still requires in-depth research.

The VU framework in its current rendition has been designed solely as a proof-of-concept framework that supports a limited set of functionality. The main limitation of VU is that it only addresses a single vision context with one source and one processor. In order for this framework to be utilized as a successful vision based application development framework, it would need to support contexts with multiple sources and processors. To achieve this, the API of the framework needs to be extended to fully exploit the transport component by supporting more flexible connection configurations amongst the modules. Another feature that could be added to the VU framework is auto-discovery of devices.

6.3.2 Extension of Hive

There are a number of possible future extensions to improve the Hive framework, both in terms of added functionality and improved performance. We discuss two immediate possible additions to the framework: support for task distribution and an extension to the transport layer.

Hive currently provides the mechanism for transparent communication between modules. A possible extension to Hive that would further exploit this mechanism is to add built-in support for task distribution using the idea of pools of drones. The framework could accommodate dynamic allocation of drones to tasks based on the workload, and could provide the synchronization and control means for distributing the task and gathering results with minimal effort from the application developer.

Currently the sole transportation mechanism of the Hive framework is TCP/IP over Ethernet.
However, the transport layer could be extended to support a number of different mediums such as shared memory and physical buses (e.g. USB and FireWire). Shared memory can drastically increase communication performance between drones running on processors that share memory banks, whereas USB and FireWire could extend the framework to applications that require specific connectivity between components.

6.4 Conclusion

In this thesis we explored the current methodology for computer vision based system development in order to determine the underlying cause of the inadequacy of existing frameworks. We proposed that the fundamental limitation of the current approach offered through various frameworks is the lack of conceptual high-level classification of the vision problem into smaller sub-components.

In order to address the lack of sub-task classification in computer vision we proposed a decomposition of the vision problem into the following decoupled components: data access, which addresses retrieval of image data from sources; data transformation, which addresses format conversion of image data; data transportation, which addresses the communication of data between modules in a vision system; and data processing, which addresses the analysis and manipulation of image data.

Based on the above classification we presented a framework that provides the functionality of the data access, data transformation and data transportation components through an API that abstracts the details of each component from users.
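The three data-management components could be captured as decoupled interfaces behind such an API. The sketch below is purely illustrative — none of these type names come from the VU API (which works with `UCF::ImageProp` and device handles), and the pass-through transformation exists only to make the idea concrete.

```cpp
#include <cstddef>

// A bare image handle; the real VU image properties are much richer.
struct Image { const unsigned char* data; std::size_t size; };

// Data access: configuration and retrieval of data from sources.
class DataAccess {
public:
    virtual ~DataAccess() = default;
    virtual bool configure(const void* params, std::size_t size) = 0;
    virtual Image next() = 0;
};

// Data transformation: conversion between image data formats.
class DataTransformation {
public:
    virtual ~DataTransformation() = default;
    virtual Image convert(const Image& in) = 0;
};

// Data transportation: transfer mediation between modules.
class DataTransportation {
public:
    virtual ~DataTransportation() = default;
    virtual void send(const Image& img) = 0;
    virtual Image receive() = 0;
};

// Trivial concrete transformation, included only so the sketch is testable.
class PassthroughTransformation : public DataTransformation {
public:
    Image convert(const Image& in) override { return in; }
};
```

Keeping the three roles behind separate interfaces is what lets one component (here, transportation via Hive) be swapped or developed independently of the others.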
We described the programming model of this framework and presented an application as a proof of concept to validate this approach.

We focused on the transport component of the above classification and presented Hive, a standalone event based framework for developing distributed vision based systems that provides simple high-level methods for the communication, control and configuration of reusable components. The main objectives of Hive are to promote component reusability, platform independence and abstraction over communication. We presented a set of modules and applications to validate the Hive framework.

The vision system development approach presented in this thesis could fundamentally change the way vision development is approached and could help advance the vision community as a whole through abstraction, standardization and the promotion of code reuse.

Bibliography

[1] Amir Afrah, Gregor Miller, Donovan Parks, Matthias Finke, and Sidney Fels. Hive: A distributed system for vision processing. In Proc. of the Int. Conf. on Distributed Smart Cameras, September 2008.

[2] D. Arita, Y. Hamada, S. Yonemoto, and R. Taniguchi. RPV: a programming environment for real-time parallel vision - specification and programming methodology. In Proceedings of the 15th International Parallel and Distributed Processing Symposium, pages 218–225, 2000.

[3] 1394 Trade Association. IIDC 1394-based digital camera specification. Technical Report 1.3, 1394 Trade Association, 2000.

[4] Boost C++. http://www.boost.org/.

[5] Gary Bradski and Adrian Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 1st edition, October 2008.

[6] Alex Brooks, Tobias Kaupp, Alexei Makarenko, Stefan Williams, and Anders Oreback. Towards component-based robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 163–168, 2005.

[7] H. Bruyninckx. Open robot control software: the OROCOS project.
In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2523–2528, 2001.

[8] K. Chandy. Event-driven applications: Costs, benefits and design approaches. Presentation at the Gartner Application Integration and Web Services Summit 2006, California Institute of Technology, 2006.

[9] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley Longman Ltd., 1998.

[10] T.H.J. Collett, B.A. MacDonald, and B.P. Gerkey. Player 2.0: Toward a practical robot programming framework. In Australasian Conference on Robotics and Automation, 2005.

[11] Axis Corporation. Axis communication application programming interface: http://www.axis.com/. Technical report, AXIS, 2008.

[12] NVIDIA Corporation. NVIDIA CUDA compute unified device architecture: Programming guide. Technical Report 1.0, NVIDIA Corporation, 2007.

[13] DevIL. http://openil.sourceforge.net/.

[14] DirectShow. http://msdn.microsoft.com/en-us/library/ms783354(VS.85).aspx.

[15] David Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2003.

[16] Alexandre R. J. Francois. Software architecture for computer vision. In Emerging Topics in Computer Vision. Prentice Hall, 2004.

[17] Gandalf. http://gandalf-library.sourceforge.net/.

[18] GLUT. http://www.opengl.org/resources/libraries/glut/.

[19] ImageMagick. http://www.imagemagick.org/.

[20] Apple Inc. Core Image programming guide. Technical report, Apple, 2008.

[21] Java Media Framework API. http://java.sun.com/javase/technologies/desktop/media/jmf/.

[22] John Krumm, Steve Harris, Brian Meyers, Barry Brumitt, Michael Hale, and Steve Shafer. Multi-camera multi-person tracking for EasyLiving. In IEEE Workshop on Visual Surveillance, 2000.

[23] Cheng Lei and Yee-Hong Yang. Design and implementation of a cluster based smart camera array application framework. In Proc. of the Int. Conf. on Distributed Smart Cameras, September 2008.

[24] Rainer Lienhart and Jochen Maydt.
An extended set of Haar-like features for rapid object detection. In Proceedings of the International Conference on Image Processing, volume 1, pages 900–903, September 2002.

[25] R. C. Luo, Y. Chin-Chen, and L. S. Kuo. Multisensor fusion and integration: approaches, applications and future research directions. IEEE Sensors Journal, 2:107–119, 2002.

[26] Alexei Makarenko, Alex Brooks, and Tobias Kaupp. On the benefits of making robotic software frameworks thin. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.

[27] MATLAB. http://www.mathworks.com/.

[28] Giorgio Metta, Paul Fitzpatrick, and Lorenzo Natale. YARP: Yet Another Robot Platform. In International Journal on Advanced Robotics Systems, pages 43–48, 2006.

[29] Gregor Miller. High Quality Novel View Rendering from Multiple Cameras. PhD thesis, CVSSP, University of Surrey, Guildford, GU2 7XH, UK, 2007.

[30] Gregor Miller, Amir Afrah, and Sidney Fels. Rapid vision application development using Hive. In Proc. International Conference on Computer Vision Theory and Applications, February 2009.

[31] M. Montemerlo, N. Roy, and S. Thrun. Perspectives on standardization in mobile robot programming: The Carnegie Mellon navigation (CARMEN) toolkit. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2436–2441, 2003.

[32] Netpbm. http://netpbm.sourceforge.net/.

[33] D. H. Parks and S. Fels. Evaluation of background subtraction algorithms with post-processing. In IEEE International Conference on Advanced Video and Signal-based Surveillance, 2008.

[34] George Pava and Karon E. MacLean. Real Time Platform Middleware for Transparent Prototyping of Haptic Applications. In 12th International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pages 383–390, 2004.

[35] Polhemus. Fastrak: http://www.polhemus.com/.
Technical report, Polhemus, 2008.

[36] QuickTime. http://developer.apple.com/QuickTime/.

[37] Pekka Saastamoinen, Sami Huttunen, Valtteri Takala, Marko Heikkila, and Janne Heikkila. SCALLOP: An open peer-to-peer framework for distributed sensor networks. In International Conference on Distributed Smart Cameras, 2008.

[38] Michael H. Schimek, Bill Dirks, Hans Verkuil, and Martin Rubli. Video For Linux v4.12: http://v4l2spec.bytesex.org/v4l2spec/v4l2.pdf. Technical Report 0.24, Linux, 2008.

[39] Mark Segal and Kurt Akeley. The OpenGL Graphics System. Specification 3.0, The Khronos Group Inc., 2008.

[40] A. W. Senior, A. Hampapur, and M. Lu. Acquiring multi-scale images by pan-tilt-zoom control and automatic multi-camera calibration. In IEEE Workshop on Applications of Computer Vision / IEEE Workshop on Motion and Video Computing, 1:433–438, 2005.

[41] Mary Shaw and David Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall, 1996.

[42] Roger Y. Tsai. An efficient and accurate camera calibration technique for 3D machine vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 364–374, 1986.

[43] VXL. http://vxl.sourceforge.net/.

[44] H. Zimmerman. OSI Reference Model - The ISO Model of Architecture for Open Systems Interconnection. In IEEE Transactions on Communications, pages 425–432, 1980.

Appendix A

Previous Publications

The material presented in Chapter 5 (Hive Framework) has been previously published in the International Conference on Distributed Smart Cameras 2008 and the International Conference on Vision Systems and Applications 2009 [1, 30].

Appendix B

Statement of Co-Authorship

Parts of the work presented in this thesis were completed in collaboration with other researchers. The following list outlines my contributions:

- Identification and design of the research program:
  - Collaborated on the classification of the vision problem.
  - Determining the scope of the Vision Utility framework.
- Performing the research:
  - Design and implementation of the Vision Utility framework.
  - Implementation of the VU proof-of-concept application.
  - Evaluation of the VU proof-of-concept application.
  - Scope definition and layered design for Hive's architecture.
  - Implementation of Hive's Event, Service, and Interface layers.
  - Implementation of the Face Detection, Multi-camera Calibration, and Augmented Reality applications and drones.

- Data analysis:
  - Evaluation of the Vision Utility framework.
  - Evaluation of the Hive framework.

- Manuscript preparation:
  - Preparation of the entire thesis document.

Appendix C

The Vision Utility Framework's Application Programming Interface

The following is a description of the routines that constitute the VU API:

C.1 Application API

- ContextID CreateContext(Device processor_id):
  Functionality: This method creates a VU processing pipeline that is associated with a specific processor.
  Arguments: Processor device
  Return: ID of the processing pipeline created

- void SetContextSource(Device source):
  Functionality: This method allows the application to assign a source to a context that has already been created using the CreateContext() method.
  Arguments: Source device
  Return: void

- bool SetParameter(Device dev, void *data, int size, ConfigType type):
  Functionality: This routine allows the application to set the configuration parameters of a device or its driver (determined by the ConfigType argument). The driver parameters control the communication mechanism between the device and the application.
  Arguments: Device ID, configuration data, size of the configuration data, target (driver or device)
  Return: Result of the operation

- bool GetParameter(Device dev, void *data, int size, ConfigType type):
  Functionality: This routine allows the application to get the configuration parameters of a device or its driver (determined by the ConfigType argument).
Similar to SetParameter(), the driver parameters control the communication mechanism between the device and the application.
  Arguments: Device ID, configuration data (to be written into), size of the configuration data, target (driver or device)
  Return: Result of the operation

- void SetIdleFn(IdleFn VU_idle_function):
  Functionality: This routine allows the application to register an idle function that is called by the framework repeatedly when there are no other events being processed.
  Arguments: Idle function routine
  Return: void

- void RegisterPostprocessCallback(VUCallBack fn):
  Functionality: This routine allows the application to register a method that is invoked every time the processor has finished a cycle of processing on an input. The processor blocks until this call returns.
  Arguments: Callback handler routine
  Return: void

- void RegisterImageReceiver(UCF::ImageProp prop, VUCallBack fn):
  Functionality: This routine allows the application to register a handler for incoming images from the processor, and to specify the properties of incoming images. Images are converted to match these properties upon being received by the system.
  Arguments: Incoming image properties, handler routine
  Return: void

- void PostReprocess():
  Functionality: This routine signals the processor of a context to reprocess an input. It is called within the post-process callback by the application.
  Arguments: void
  Return: void

- void PostDone():
  Functionality: Similar to PostReprocess(), however this call signals the vision context to move on to the next input data.
  Arguments: void
  Return: void

- void Start():
  Functionality: This routine starts the vision system.
This routine requires that a vision processing context be set up prior to its invocation.
  Arguments: void
  Return: void

- void MainLoop():
  Functionality: This routine blocks forever and allows the framework to invoke the registered callbacks in response to actions of the vision processing context.
  Arguments: void
  Return: void

- void Cycle():
  Functionality: This is a non-blocking counterpart to the MainLoop() call. This routine relies on the application to call it repeatedly.
  Arguments: void
  Return: void

C.2 Driver API

- void RegisterProcessor(NoArgCallBack fn):
  Functionality: This routine allows the device driver to register the main processing method of the device. The registered method is called by the framework when there is an input awaiting processing.
  Arguments: Processor function
  Return: void

- void RegisterGetConfiguration(DataCallBack fn):
  Functionality: This routine allows the device driver to register the routine that retrieves the configuration of the device. The registered routine is invoked by the application through a callback.
  Arguments: Get-configuration routine
  Return: void

- void RegisterSetConfiguration(DataCallBack fn):
  Functionality: This routine allows the device driver to register the routine that performs configuration setting of the device. The registered routine is invoked by the application through a callback.
  Arguments: Set-configuration routine
  Return: void

- void RegisterDataReciever(UCF::ImageProp img_prop, DataCallBack fn):
  Functionality: This routine allows the device driver to register a method that receives incoming image data, and allows the driver to set the properties of the incoming image. This is an optional routine and is only used for processors.
  Arguments: Image properties, receiver routine
  Return: void

- void SendOutput(UCF::ImageProp image_prop, char *data):
  Functionality: This routine allows a device to send image outputs to the application.
  Arguments: Image properties, image data
  Return: void

- void Wait():
  Functionality: This routine blocks forever. It is used once the driver has registered all the appropriate callback handlers, in order to allow the framework to invoke the appropriate callbacks in response to incoming events.
  Arguments: void
  Return: void

Appendix D

Hive's Application Programming Interface

The following is a description of the routines that constitute the Hive API:

D.1 Application API

- bool SetConfig(ModuleID &id, byte *data, int size):
  Functionality: This method allows the application to set the configuration of drones. The application programmer must have the drone's header file, which specifies the configuration specification for each drone.
  Arguments: Drone's ID, configuration data, size of the configuration data
  Return: Boolean result

- bool GetConfig(ModuleID &id, byte *data, int size):
  Functionality: This method allows the application to get configuration data from a drone. The application programmer must have the drone's header file, which specifies the configuration specification for each drone.
  Arguments: Drone's ID, size of the configuration data
  Return: Boolean result, configuration struct

- bool Connect(ModuleID &target, ConnectOptions &options, Connection::Type type):
  Functionality: This method allows the application to create data pipelines between drones; an overloaded method is also provided to connect drones to the application. The connection options provided by the user supply the method and the data type for the connection. The data type specifies what kind of data should be sent through the pipeline, as a drone could produce a number of different outputs.
The method specifies whether the connection is 'synchronized' or 'streaming'. The difference is that for synchronized connections there is a request each time the data is wanted by the receiving drone, whereas with streaming the data is sent whenever the sending drone has it available.
  Arguments: Source drone ID, destination drone ID, connection options
  Return: Boolean result

- bool Disconnect(ModuleID &target, ConnectOptions &options, Connection::Type type):
  Functionality: This method allows the application to remove existing connections between drones; it is also overloaded for drone-to-application connections. Disconnect() requires the connection type in order to remove only the required connection, as there may be several connections between two modules.
  Arguments: Source drone ID, destination drone ID, connection options
  Return: Boolean result

- void SendDataToDrone(ModuleID &mod, DataType &type, Data &data):
  Functionality: This method (together with SendNotificationToDrone()) allows the application to send data to specific drones. The difference between the two methods is that SendNotificationToDrone() only transfers the data type, whereas SendDataToDrone() also sends the data payload.
  Arguments: Drone's ID, data type, data (SendDataToDrone)
  Return: void

- void SetMainFn(MainFn main):
  Functionality: This method allows the module to register a main handler routine. This routine is called by the service layer repeatedly.
  Arguments: Main routine
  Return: void

- void RegisterHandler(DataType &datatype, HandlerFn handler):
  Functionality: This method allows the module to register a callback routine that is called when the module receives data of a specific type.
The data type is passed as an argument.
  Arguments: Data type, handler function
  Return: void

D.2 Drone API

- void SetConfigFns(ConfigureFn GetConfig, ConfigureFn SetConfig):
  Functionality: This method allows the drone to provide the callback routines that get and set the internal parameters of the drone. These are the functions that are invoked when the application's GetConfig()/SetConfig() is called for a drone.
  Arguments: Handler functions for configuration setting and getting
  Return: void

- void NewData(const DataType &type, const Data &data):
  Functionality: This method is used by the drone to send its output to other modules in Hive. It only requires the data type and data payload, as the destination of the data is determined by the event layer.
  Arguments: Data type, data
  Return: void

- void SetMainFn(MainFn main):
  Functionality: This method allows the module to register a main handler routine. This routine is called by the service layer repeatedly.
  Arguments: Main routine
  Return: void

- void RegisterHandler(DataType &datatype, HandlerFn handler):
  Functionality: This method allows the module to register a callback routine that is called when the module receives data of a specific type. The data type is passed as an argument.
  Arguments: Data type, handler function
  Return: void

