ENERGY EFFICIENT VIDEO SENSOR NETWORKS FOR SURVEILLANCE APPLICATIONS

by

Bambang Ali Basyah Sarif

B.Sc., Bandung Institute of Technology, 1999
M.Sc., King Fahd University of Petroleum & Minerals, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2016

© Bambang Ali Basyah Sarif, 2016

Abstract

Video sensor networks (VSNs) provide rich sensing information and coverage, both beneficial for applications requiring visual information such as smart homes, traffic control, healthcare systems and monitoring/surveillance systems. Since a VSN-based surveillance application is usually assumed to have limited resources, energy efficiency has become one of the most important design aspects of such networks. However, unlike common sensor network platforms, where power consumption mostly comes from wireless transmission, the encoding process in a video sensor network accounts for a significant portion of the overall power consumption. There is a trade-off between encoding complexity and bitrate, in the sense that achieving higher compression performance, i.e., a lower bitrate, requires a more complex encoding process. The coding complexity and video bitrate determine the overall encoding and transmission power consumption of a VSN. Thus, choosing the configuration and parameter settings that lead to optimal encoding performance is of primary importance for controlling power consumption in VSNs. The coding complexity and bitrate also depend on the complexity of the video content, as spatial detail and high motion tend to lead to higher computation costs or increased bitrates. In a video surveillance network, each node captures an event from a different point of view, so each captured video stream has unique spatial and temporal information. This thesis investigates the trade-off between encoding complexity and communication power consumption in a video surveillance network, taking into consideration the effects of video encoding parameters, content complexity, and network topology. To account for the effect of content complexity, we created a video surveillance dataset consisting of a large number of captured videos with different levels of spatial information and motion. We then designed an algorithm that minimizes the video surveillance network's power consumption for different scene settings. Models that estimate the coding complexity and bitrate were proposed. Finally, these models were used to minimize the video surveillance network's power consumption and to estimate, for each node, the encoding parameters that yield the minimum possible power consumption for the entire network.

Preface

This thesis presents research conducted by Bambang Ali Basyah Sarif, under the guidance of Dr. Victor Leung and Dr. Panos Nasiopoulos. Some of the research results have been published as journal articles or conference proceedings, or submitted for peer review. The content of Chapter 2 is from our previous publication [P1]. The content of Chapter 3 appears in two conference proceedings [P2][P3] and one journal publication [P4]. Portions of Chapter 4 appear in two conference proceedings [P5][P6] and one journal publication [P7], while the main body of Chapter 5 is from [P8].
The research ideas and the work presented in all of these manuscripts, including the literature review, mathematical modeling, algorithm design and development, simulations and implementation, analytical evaluations, result compilation, and manuscript preparation, are all the result of my work. Dr. Mahsa T. Pourazad provided high-level consultation and editorial input. Dr. Mahsa T. Pourazad and Dr. Panos Nasiopoulos provided consultation for the experiments conducted in [P7], while Gable Yeung and Dan Miner helped in setting up the equipment for those experiments. My research supervisor, Dr. Victor Leung, provided research guidance, supervisory comments, and corrections during the process of conducting the studies and writing the manuscripts. My research co-supervisor, Dr. Panos Nasiopoulos, helped with discussions, research guidance, supervisory feedback, checking the validity of my analytical results, and proofreading the respective manuscripts. Two of the publications resulting from this thesis [P4][P5] are co-authored by Dr. Amr Mohamed, who provided high-level consultation and feedback. The first and last chapters of this thesis were written by Bambang Ali Basyah Sarif, with editing assistance from Dr. Panos Nasiopoulos and Dr. Victor Leung.

Over the course of his Ph.D., Bambang Ali Basyah Sarif has participated in other research. However, this thesis exclusively includes work related to the energy efficiency of H.264/AVC-based video surveillance networks. The publications resulting from the research presented in this thesis are listed below.

[P1] B. A. B. Sarif, V. C. M. Leung, and P. Nasiopoulos, "Distance Based Heuristic for Power and Rate Allocation of Video Sensor Networks," IEEE Wireless Communications and Networking Conference (WCNC 2012), pp. 1893-1897, April 2012.

[P2] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "Encoding and Communication Energy Consumption Trade-off in H.264/AVC-based Video Sensor Network," 14th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2013), Madrid, Spain, June 2013.

[P3] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "Analysis of Energy Consumption Fairness in Video Sensor Networks," Qatar Foundation Annual Research Conference, Qatar, Nov. 2013.

[P4] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, V. C. M. Leung, and A. Mohamed, "Fairness Scheme for Energy Efficient H.264/AVC-based Video Sensor Network," Human-centric Computing and Information Sciences (SpringerOpen), Feb. 2015, DOI: 10.1186/s13673-015-0025-2.

[P5] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "Analysis of Power Consumption of H.264/AVC-based Video Sensor Networks through Modeling the Encoding Complexity and Bitrate," 18th International Conference on Digital Society (ICDS 2014), Barcelona, Spain, March 2014.

[P6] B. A. B. Sarif, M. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "A New Scheme for Estimating H.264/AVC-based Video Sensor Network Power Consumption," 2015 World Congress on Information Technology Applications and Services, Jeju, Korea, Feb. 2015.

[P7] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "A Study of the Power Consumption Estimation of the H.264/AVC-based Video Sensor Network," International Journal of Distributed Sensor Networks, vol. 2015, Article ID 304787, 2015, DOI: 10.1155/2015/304787.
[P8] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, "Encoding Complexity and Bitrate Models for Resource Limited H.264/AVC Video Applications," submitted for review to IEEE Transactions on Circuits and Systems for Video Technology, April 7, 2016.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
Dedication
1 INTRODUCTION
  1.1 Video Surveillance Network Overview
  1.2 Overview on Video Surveillance Network Power Consumption Estimation
    1.2.1 Video Surveillance Network Power Consumption Model
    1.2.2 Trade-off between Encoding and Communication Power Consumption
  1.3 Video Encoding Configuration and Fairness in a Video Surveillance Network
  1.4 Coding Complexity and Bitrate Models
  1.5 The Effect of Content Complexity on Video Surveillance Network's Power Consumption
  1.6 Research Topic Formulation
    1.6.1 Scope and Limitation of the Thesis
    1.6.2 Research Road Map
  1.7 Summary of Thesis Contributions
2 ENCODING AND COMMUNICATION POWER CONSUMPTION TRADE-OFF IN A VIDEO SURVEILLANCE NETWORK
  2.1 Power-rate-distortion Model in Video Surveillance Networks
  2.2 Network Lifetime Maximization in Video Surveillance Networks
  2.3 Distance Based Heuristics for Energy Efficient Video Surveillance Networks
    2.3.1 Proposed Approach
    2.3.2 Experiments and Results
      2.3.2.1 Performance of the Proposed Technique
      2.3.2.2 Comparison with Other Techniques
  2.4 Summary
3 ENCODING PARAMETERS AND SCENE'S CONTENT COMPLEXITY IN VIDEO SURVEILLANCE NETWORKS
  3.1 Video Encoding Parameters and Node's Power Consumption
    3.1.1 H.264/AVC Coding Performance Analysis
      3.1.1.1 Encoding Complexity Measure
      3.1.1.2 The Effect of GOP Size on Coding Complexity and Bitrate
      3.1.1.3 The Effect of QP on Coding Complexity and Bitrate
      3.1.1.4 The Effect of SR on Coding Complexity and Bitrate
    3.1.2 Coding Complexity and Bitrate Trade-off
    3.1.3 The Effect of Encoder Parameter Settings on Node's Power Consumption
      3.1.3.1 Encoding Power Consumption
      3.1.3.2 Communication Power Consumption
      3.1.3.3 Encoding and Communication Power Consumption Trade-off
    3.1.4 Fairness-based Node's Encoder Configuration Allocation
      3.1.4.1 Simplified Configuration ID Table
      3.1.4.2 Fairness Constraint
      3.1.4.3 Exhaustive Search Fairness-based Algorithm
  3.2 Relation between Video Content, Bitrate and Coding Complexity
    3.2.1 Video Surveillance Dataset
    3.2.2 Effect of Scene Complexity on Coding Complexity and Bitrate
    3.2.3 Scenes' Complexity Measurement and Classification
  3.3 Common Approach
  3.4 Fairness-Based Optimization for Minimum Power Consumption
    3.4.1 Fairness-based CID Allocation for Minimum Power Video Surveillance Network
    3.4.2 Proposed Fairness-based CID Allocation Algorithm for the Test Set
    3.4.3 Experiments Settings
    3.4.4 Performance of the Proposed Technique for the Training Scenes
    3.4.5 Performance of the Proposed Technique for the Test Scenes
  3.5 Summary
4 CODING COMPLEXITY AND BITRATE MODELS FOR H.264/AVC-BASED VIDEO SURVEILLANCE NETWORKS
  4.1 Proposed Model
    4.1.1 H.264/AVC Coding Complexity Model
    4.1.2 H.264/AVC Bitrate Model
    4.1.3 Analysis of the Model
  4.2 Model Evaluation in a Simulated Video Surveillance Test Environment
    4.2.1 Model Evaluation for Videos with Changing Content
    4.2.2 Video Surveillance Network's Power Consumption Estimation
    4.2.3 Experiments and Results
  4.3 Summary
5 SPATIAL AND TEMPORAL COMPLEXITY OF CONTENT AND VIDEO SURVEILLANCE NETWORK'S POWER CONSUMPTION
  5.1 Methodology
    5.1.1 Content Complexity
    5.1.2 Design of Bitrate Model
    5.1.3 Design of Coding Complexity Model
    5.1.4 Estimation of the Model Parameters
      5.1.4.1 Video Sequences
      5.1.4.2 Video Encoding Parameters
      5.1.4.3 Bitrate Model Parameters Estimation
      5.1.4.4 Coding Complexity Model Parameters Estimation
  5.2 Experiments and Results
    5.2.1 Performance Evaluation of the Bitrate Model
      5.2.1.1 Comparison with an Existing Bitrate Estimation Model
    5.2.2 Performance Evaluation of the Coding Complexity Model
  5.3 Power Consumption Estimation using the Proposed Models
  5.4 Summary
6 CONCLUSIONS AND FUTURE WORK
  6.1 Summary of Thesis Contributions
  6.2 Significance and Potential Applications of the Research
  6.3 Future Work
    6.3.1 The Use of HEVC
    6.3.2 Event-triggered Content-Based Video Surveillance Network
    6.3.3 Content Based Frame Rate Adjustment
    6.3.4 Cloud Based Video Surveillance Network
References

List of Tables

Table 2.1: Parameters used
Table 2.2: Average network lifetime and network lifetime offset, normalized to PRO-VSN
Table 3.1: Encoding parameters for the low complexity configuration
Table 3.2: Configuration ID
Table 3.3: Parameters used for encoding and communication trade-off experiments
Table 3.4: Simplified Configuration ID (CID)
Table 3.5: Spatial Information unit (SI) of all videos
Table 3.6: Temporal Information unit (TI) of all videos
Table 3.7: Training and test scenes
Table 3.8: Parameters Used for Fairness-based Algorithm Experiments
Table 3.9: Pnet, Pavg and STD(Pi) of the Training Scenes
Table 3.10: Fairness ratio allocation obtained from each training scene
Table 3.11: Percentage of improvement of the proposed algorithm against the other techniques
Table 4.1: ME complexity level (ML)
Table 4.2: ME complexity level (ML) and ML
Table 4.3: Coding complexity modeling error
Table 4.4: Bitrate modeling error
Table 4.5: Coding complexity estimation error for different values of k
Table 4.6: Bitrate estimation error for different values of k
Table 4.7: Parameters used
Table 4.8: Experiment 1 scenarios
Table 4.9: Experiment 2 scenarios
Table 5.1: Functions of SA and TA to model fI
Table 5.2: Functions of SA and TA to model fP
Table 5.3: ME complexity level (ML)
Table 5.4: Bitrate estimation results
Table 5.5: Bitrate estimation results for select videos
Table 5.6: Comparison with existing model
Table 5.7: Coding complexity estimation results
Table 5.8: Coding complexity estimation results for select videos
Table 5.9: Parameters used
Table 5.10: GOP and ML values obtained for the three selected scenes
Table 5.11: Nodes' maximum power consumption

List of Figures

Figure 1.1: Video Sensor Networks (VSNs)
Figure 1.2: Scope of the research
Figure 1.3: Research roadmap
Figure 2.1: Encoding and transmission power consumption
Figure 2.2: Optimization in PRO-VSN
Figure 2.3: A video surveillance network's zones
Figure 2.4: The distance-based optimization
Figure 2.5: Performance of the proposed heuristic. (a) varying distortion, node=4, width=50m; (b) varying distortion, node=9, width=50m
Figure 2.6: Performance of the proposed heuristic. (a) varying width, node=4, distortion=400; (b) varying width, node=9, distortion=400
Figure 2.7: Performance of the proposed algorithm. (a) maximum network lifetime, (b) the number of zones producing maximum lifetime (zopt)
Figure 2.8: Network lifetime comparison (a) exact values (b) normalized to PRO-VSN
Figure 2.9: Average power consumption profile
Figure 3.1: Complexity measure of different GOP sizes and QP values (a) encoding time, (b) the number of instruction counts
Figure 3.2: Rate-distortion plot for different QP values and GOP sizes
Figure 3.3: The effect of different SR values on bitrate, complexity, and PSNR (a) BQMall 15 fps CIF video, GOP=75, QP=28 (b) Traffic 15 fps CIF video, GOP=75, QP=28
Figure 3.4: Trade-off between coding complexity and bitrate for BQMall 15 fps CIF video, GOP={1,2,4,875}, SR={2,4,8,16}, QP=28
Figure 3.5: Node's power consumption: (a) single-hop (b) multi-hop transmission
Figure 3.6: Camera placement
Figure 3.7: Scene settings
Figure 3.8: Snapshots of the "office" scene in the first activity setting from (a) camera2 and (b) camera6; and snapshots of the third activity setting: (c) camera2 and (d) camera6
Figure 3.9: Snapshots of the "classroom" scene from (a) camera2 and (b) camera4; and snapshots of the "party" scene from (c) camera2 and (d) camera4
Figure 3.10: Complexity, bitrate, and video quality trade-off
Figure 3.11: The complexity and bitrate of some videos from the "party" scene (a) different cameras but similar activity setting (b) the same camera but different activity settings
Figure 3.12: The complexity and bitrate of some videos captured by camera2 at a similar activity setting but different scenes
Figure 3.13: The complexity and bitrate of some videos captured at different scene settings
Figure 3.14: SI(TI) class of office_act1 and classroom_act1 scenes and their corresponding scene's class
Figure 3.15: Scene's classes
Figure 3.16: MaximumFairness CID allocation algorithm
Figure 3.17: Power consumption minimization using CID
Figure 3.18: Power consumption minimization with bounded CID values
Figure 3.19: RecursiveBranchBound procedure
Figure 3.20: Fairness-based minimum energy algorithm
Figure 3.21: Fairness-based with adjustment algorithm
Figure 3.22: Network topology used in the experiments
Figure 3.23: Bitrate allocated to each node in all training scenes obtained by the three algorithms (a) CommonConfig (b) MaximumFairness (c) Proposed
Figure 3.24: Node's power consumption profile for the high-SI training scene obtained by the three algorithms: (a) CommonConfig (b) MaximumFairness (c) Proposed
Figure 3.25: Comparison of the Pnet, Pavg, and STD(Pi) values obtained from all test scenes
Figure 4.1: Normalized CP of the "BQMall" and "Traffic" videos (15 fps, CIF, GOP=2)
Figure 4.2: Fractional increase of normalized CP of the "BQMall" and "Traffic" videos (15 fps, CIF)
Figure 4.3: Bitrate of a P-frame for different ML of the "BQMall" video
Figure 4.4: Camera placements
Figure 4.5: Content changes during a 10s camera1_shot3 video sequence. (a) frame 1 (b) frame 70 (c) frame 100
Figure 4.6: Content changes during a 10s camera2_shot2 video sequence. (a) frame 1 (b) frame 60 (c) frame 110
Figure 4.7: Flowchart of the coding complexity and bitrate estimation error reduction method
Figure 4.8: Actual and estimated coding complexity for different values of k and GOP sizes for the following sequences (a) camera4_shot1 (b) camera1_shot3
Figure 4.9: Actual and estimated bitrate for different values of k and GOP sizes for the following video sequences (a) camera4_shot1 (b) camera1_shot3
Figure 4.10: Nodes' power consumption in experiment 1 (a) scenario 1, (b) scenario 2, (c) scenario 3, (d) scenario 4
Figure 4.11: Nodes' power consumption in experiment 2 (a) scenario 1, (b) scenario 2, (c) scenario 3, (d) scenario 4
Figure 5.1: Actual and estimated bitrate for varying GOP sizes, ML and QP values for the following videos (a) camera1_office_act1, (b) camera4_office_act1, (c) camera5_office_act3, (d) camera2_classroom_act3, (e) camera6_party_act2, and (f) camera3_party_act4
Figure 5.2: Actual and estimated coding complexity for varying GOP sizes, ML and QP values for the following videos (a) camera1_office_act1, (b) camera4_office_act1, (c) camera5_office_act3, (d) camera2_classroom_act3, (e) camera6_party_act2, and (f) camera3_party_act4
Figure 5.3: Video surveillance network layout
Figure 5.4: Video surveillance network power consumption minimization
Figure 5.5: Nodes' power consumption profile for (a) office_act2 (b) classroom_act3 and (c) party_act4 scenes
Figure 5.6: Nodes' video PSNR for the three test scenes analyzed
List of Abbreviations

CPI: Cycles per Instruction
DVC: Distributed Video Coding
FPS: Frames per Second
GOP: Group of Pictures
HD: High Definition video (1920×1080 pixels)
IEEE: Institute of Electrical and Electronics Engineers
IEC: International Electrotechnical Commission
IoT: Internet of Things
ISO: International Organization for Standardization
ITU-T: International Telecommunication Union – Telecommunication Standardization Sector
JPEG: Joint Photographic Experts Group
Kbps: Kilobits per second
MAC: Medium Access Control
MAD: Mean of Absolute Differences
Mbps: Megabits per second
MDC: Multiple Description Coding
ME: Motion Estimation
MAPE: Mean Absolute Percentage Error
MPEG: Moving Picture Experts Group
PCC: Pearson's Correlation Coefficient
PRD: Power Rate Distortion
PRA-VSN: Power Rate Allocated – Video Surveillance Network
PRE-VSN: Power Rate hEuristic – Video Surveillance Network
PRO-VSN: Power Rate Optimal – Video Surveillance Network
PSNR: Peak Signal-to-Noise Ratio
QoS: Quality of Service
QP: Quantization Parameter
RMSE: Root Mean Square Error
SI: Spatial Information
SR: Search Range for motion estimation
TI: Temporal Information
UBC: University of British Columbia
VSN: Video Sensor Network
WSN: Wireless Sensor Network

Acknowledgments

I would like to profoundly thank my supervisors, Dr. Victor Leung and Dr. Panos Nasiopoulos, for their continuous support, encouragement, and guidance throughout the years I spent at UBC. Without their wisdom and valuable suggestions, this work would not have materialized. I am truly honored to have been supervised by a pioneering researcher in the field of wireless communications and multimedia. I would also like to thank Dr. Nasiopoulos for his enthusiasm, motivation, and insightful suggestions, which enlarged my vision not only of science but also of life. He has always been a great mentor and a role model for me. I am also grateful and thankful for the guidance, help, and support that I received from Dr. Mahsa Pourazad during the last four years of my Ph.D. Finally, I would like to thank my supervisory committee members for their contribution to this thesis.

I would also like to extend my sincere thanks to my colleagues at the Digital Media Lab: Dr. Di Xu, Mohsen Amiri, Hamid Reza Tohidypour, Dr. Amin Balitalebi, Sima Valizadeh, Basak Oztas, Maryam Azimi, Ahmad Khaldieh, Pedram Mohammadi, Fujun Xie, Stelios Ploumis, Ilya Ganelin, Timothee Bronner, and all other lab mates. It was a wonderful experience working in such a friendly and enriching environment. I also thank my friends in the X310 lab, Dr. Davood Karimi, Hamid Palangi, and others. I would also like to thank my colleagues at the Wireless Networks and Mobile Systems Labs for their support in the earlier stage of my research: Dr. Ali Al-Shidani, Dr. Amr Al-Asad, and Dr. Sergio Gonzales. I also thank the friends and colleagues I met in Vancouver, as well as the Indonesian community in Vancouver. Each and every one of them supported me in one way or another to complete this work.

To my wife, for her unconditional support that kept me going whenever nothing seemed to work, and to my kids, who brought joy to my life: I cannot describe how thankful I am for their warmth and love, which remind me that my world does not revolve only around my research.

Last but not least, I owe praise and thanks to almighty God, the source of all good, for granting me the wisdom, health, and strength to undertake this task and enabling me to complete this long journey.
Dedication

To my parents, words cannot describe how lucky I am to be your son
To my wife for the love and constant support that keep me going
To my kids for the joy that you brought to my life

1 Introduction

Recent advances in hardware, computing, and communication technologies have enabled the implementation of pervasive computing applications that share the vision of ubiquitous networked devices able to gather and process information from the environment on behalf of users. Wireless sensor networks (WSNs) [1], which can provide a diverse set of context data from the monitored environment to interested clients, form the underlying architecture for such technology. Sensor nodes can sense physical features of the environment, perform simple data processing, and send the information to a central device that usually has unlimited resources. The information gathered at this central device, usually called the sink node, can be used by a human operator or by additional machines/software to perceive the condition of the monitored environment and take action if necessary. Due to the ad-hoc nature of the deployment, the information is usually sent to the sink over multi-hop wireless communication.

Figure 1.1: Video Sensor Networks (VSNs)

Although WSNs were originally used to gather and process simple physical measurements of the environment, such as temperature and humidity, recent trends cover a wide range of WSN applications, including monitoring of public structures such as bridges [2], habitat monitoring [3], healthcare [4], access hatch monitoring [5], fire alarms [6], volcanic eruption monitoring [7], and disaster-hit region surveillance [8]. With the availability of more advanced sensor nodes, there is an increasing demand for using rich context information in the form of multimedia streams, such as images or video, in WSNs [9][10]. This type of sensor network is equipped with tiny camera sensor nodes, embedded processors, and wireless transceivers. The architecture is called a wireless video sensor network (VSN) [11] (see Figure 1.1). Note that some of the literature uses VSN to denote visual sensor networks, i.e., WSNs that are able to perform image or video processing on their nodes [12][13]. Unlike common WSNs, video sensor networks are able to provide richer sensing information and coverage, which are beneficial for applications requiring visual information. Some of these applications include traffic monitoring [14][15], smart homes [16][17], fire alarms [18][19], tracking [20], object classification and feature extraction [21][22], video summarization [23], predictive visual analysis [24], intelligent environments [25], the Internet of Things [26], and surveillance [11], [27], [28]. In the case of surveillance applications, VSNs offer additional benefits compared to existing systems. Firstly, the deployment cost of a VSN-based surveillance application is much lower than that of existing surveillance systems, since there is no wiring cost. In addition, VSNs have the potential to improve our ability to develop user-centric video surveillance applications that monitor and prevent harmful events [29]–[31].

A video surveillance system implemented using VSNs usually has limited resources in terms of energy, network bandwidth, and processing capability.
Unlike the case with common sensor networks, where power consumption mostly comes from wireless transmission, the video encoding process contributes a significant portion of the overall power consumption. However, there is a trade-off between coding complexity and bitrate in encoding a video, such that higher compression, i.e., a lower bitrate, comes at the cost of higher coding complexity. Furthermore, the coding complexity and bitrate entailed by the encoding process depend on the encoding parameters used and on the content features of the captured video. Regarding transmission power consumption, one must also consider that a video surveillance node may need to relay information from other nodes. Thus, it is very important to increase the energy efficiency of both the video encoding and the transmission processes performed in the video surveillance network.

This thesis investigates the energy efficiency of video surveillance networks, exploiting the trade-off between computation and communication power consumption. In particular, the effects of video encoding parameters, content complexity, and network topology are taken into consideration in minimizing the video surveillance network's power consumption. Chapter 2 describes the effect of video encoding configurations on the trade-off between computation and communication power consumption. Chapter 3 analyzes the effect of different video encoding parameter settings on coding complexity and bitrate. To enable this analysis, a video surveillance dataset consisting of a large number of videos with different levels of spatial detail and motion activity was captured. An algorithm that minimizes the video surveillance network's power consumption for different scene settings is also proposed in that chapter. In Chapter 4, we analyze the effect of different encoding parameter settings on power consumption by designing models to estimate the coding complexity and bitrate. Chapter 5 discusses how video content features are incorporated into the models proposed in Chapter 4. The optimization method utilizing the coding complexity and bitrate models to minimize the video surveillance network's power consumption is then explained and analyzed. A summary of the contributions and future work is given in Chapter 6.

The following sections in this introductory chapter provide a literature review and an overview of the topics addressed in each of the research chapters. Section 1.1 provides the literature review on video sensor network technology for surveillance applications. Existing studies on video surveillance networks' power consumption measurement, modeling, and analysis are summarized in Section 1.2. Section 1.3 reviews the effect of video encoding parameters on the video surveillance network's power consumption. In Section 1.4, we provide an overview of coding complexity and bitrate models for H.264/AVC. Section 1.5 discusses the effect of the spatial and temporal information of content on power consumption in a video surveillance network. Section 1.6 concludes the introduction with a summary of the research contributions presented in this thesis.

1.1 Video Surveillance Network Overview

Research in VSNs for surveillance applications started at the beginning of this century. The idea of capturing multimedia data on a sensor network platform was first introduced in [11][12]. One of the earliest prototypes was reported in [32].
Since then, the field of video surveillance networks has received increasing attention from researchers around the world. In [33], a low-end video surveillance network called Cyclops was built using the mica2 sensor mote. It uses its own image processing library and can transfer compressed images at a data rate of 40 kbps. Some general-purpose video surveillance networks were later developed using the Intel XScale PXA255 board [34] and webcams, e.g., Panoptes [35] and Meerkat [36]. These platforms have the ability to encode captured content into a stream of JPEG still images. Since then, an increasing number of sensor nodes capable of video processing have been reported in the literature, including imote2 [36], cmucam3 [38], the XYZ sensor mote [39], CITRIC [40], and DSPCam [41]. With higher processing power, these platforms have the computational capability to perform more advanced video encoding techniques. Although most of these nodes operate on the IEEE 802.15.4-compliant Zigbee module, which has a data rate of 250 kbps, some of the motes can also use the 802.11b WiFi-compliant module, which has a maximum data rate of 11 Mbps.

The increased level of information processed by the nodes in a video surveillance network raises a number of research challenges and, of course, opportunities [9][10][42][43]. Some of the key active research areas are the following: sensor coverage [44]–[48], camera selection [49]–[52], node placement [53]–[55], quality of service (QoS) [56]–[59], and energy consumption (and system lifetime) [60]–[67]. However, because of the underlying concern of limited energy resources, energy efficiency has become one of the most important design aspects of video surveillance networks. In a common sensor network that operates on scalar data, the energy required to process the data is negligible; therefore, the emphasis on energy efficiency is mostly on the wireless transmission process [68]–[75]. Video processing, in contrast, requires extensive resources both for encoding the video and for transmitting the encoded stream. Techniques that focus on the efficiency of wireless transmission alone, e.g., the work in [76], [77], are not sufficient to guarantee the overall energy efficiency of video surveillance networks. The amount of information processed by a video surveillance node is at least one order of magnitude larger than the scalar data handled in traditional sensor networks. The signal processing and video compression techniques performed on video surveillance network nodes impose additional requirements and burdens on the computational capability and energy consumption of the nodes. In this regard, researchers have proposed and analyzed different data compression techniques for video surveillance networks. Still-image compression algorithms such as JPEG [78] were the earliest choices for low-end video surveillance network platforms, and some researchers have analyzed and modified JPEG to suit the requirements of video surveillance networks [11], [79]–[83]. Another technique considered for video surveillance networks is distributed video coding (DVC) [84][85][86]. The technique is based on the information-theoretic bounds established by Slepian and Wolf [87] and later by Wyner and Ziv [88]. It allows multiple correlated sources to be encoded independently and later jointly decoded.
In practice, a regularly spaced subset of frames, called key frames, is coded using a conventional intra-frame coding technique such as JPEG or the one adopted in H.264/AVC. All frames between the key frames are referred to as Wyner-Ziv frames and are intra-frame encoded but inter-frame decoded. Motion-compensated frames are generated using the key frames as side information at the decoder.
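To illustrate this frame arrangement, the short sketch below labels each frame of a sequence as a key frame or a Wyner-Ziv frame for a fixed key-frame period. The period of 4 and the helper name are hypothetical choices for illustration, not parameters of the DVC systems cited above.

# Minimal sketch of the DVC frame arrangement: every `period`-th frame is a
# key frame (conventionally intra-coded); the frames in between are
# Wyner-Ziv frames (intra-encoded, but inter-decoded at the sink using side
# information derived from the surrounding key frames).
def classify_dvc_frames(num_frames: int, period: int = 4) -> list:
    return ["KEY" if i % period == 0 else "WZ" for i in range(num_frames)]

print(classify_dvc_frames(9))
# ['KEY', 'WZ', 'WZ', 'WZ', 'KEY', 'WZ', 'WZ', 'WZ', 'KEY']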
A few examples of DVC-based video surveillance networks have been reported in the literature [89]–[91]. However, although some studies considered DVC as a low-power encoder for video surveillance networks, the authors in [92] showed that the power consumption gain of DVC over the standard H.264/AVC encoder in low complexity mode is not significant and may not warrant the use of the feedback channel required in a DVC implementation.

The availability of more advanced sensor nodes allows us to use more advanced video compression techniques such as H.264/AVC [93], [94]. H.264/AVC is presently the most widely used ITU-T and MPEG video coding standard. Although this video coding standard is computationally more expensive than the other compression techniques mentioned in the previous paragraphs, it offers better compression performance. In addition, the recent integration of sensor networks with Internet of Things (IoT) technologies [96] shows that there are additional benefits in using H.264/AVC. Firstly, H.264/AVC provides excellent portability across different hardware and software platforms. Secondly, since H.264/AVC is a video coding standard, the issues and solutions for transmitting such a coded stream over the Internet are well established compared to other codecs. It is worth noting that the performance of H.264/AVC, in terms of quality, bitrate, and complexity, is determined by a large number of encoding parameters. In a resource-constrained environment such as a video surveillance network, it is very important to choose the configuration and parameter settings that lead to optimal coding performance. The encoder configuration settings of a video surveillance node determine not only the node's power consumption but also the power consumption of the network as a whole. This is discussed in more detail in the following sections.

1.2 Overview on Video Surveillance Network Power Consumption Estimation

Video surveillance networks' computation and communication capabilities, and their performance parameters, depend on the sensor node platform they use [97]. In addition, video surveillance networks are often powered by batteries, whose replacement may be infrequent, undesirable, or even impossible in many applications. Although there has been tremendous improvement in battery technology in the past decades, the capacity of the modern lithium-ion battery is nearing its limit [98]. It is therefore highly important that a video surveillance network be capable of performing sophisticated video processing in an energy efficient way.

One of the earliest studies on the power consumed to encode a video and send the compressed data through a wireless medium is reported in [99]. That study showed that, even for a relatively small picture size, i.e., QCIF (quarter common intermediate format, 176×144 pixels) video, the H.263 encoder consumes about two-thirds of the total power consumption of video transmission in a wireless local area network (WLAN). A characterization of a video surveillance network's power consumption sources on a Meerkat testbed was performed in [61]. The work in [62] highlighted that the two main sources of energy drain in a video surveillance node are computation and communication. That paper studied the computation and communication power consumption of different video encoders, i.e., Motion JPEG2000, H.264/AVC Intra, H.264/AVC Inter main profile, and DVC, implemented on a Stargate platform. The encoding power consumption was calculated as the number of CPU cycles needed to encode the video multiplied by the energy consumption per cycle. It was shown that the computation and communication power consumption depend on the video encoder used and on the encoder configuration settings. Based on these results, the authors claimed that for a single-hop transmission video surveillance network, inter-based H.264/AVC encoding is not suitable for low-energy sensor nodes. The authors in [100] studied the encoding and transmission power consumption of a multi-hop video surveillance network. The video encoders analyzed in [100] include H.264/AVC intra, DISCOVER (a DVC-based encoder presented in [86]), and distributed compressive video sensing (DCVS) [101]. In [65], the trade-offs between computation and communication energy consumption in single-hop and multi-hop communication were compared for the MPEG-4 and H.264/AVC encoders. It was shown that H.264/AVC is more energy efficient than MPEG-4 in a multi-hop video surveillance network environment. All the above studies have shown that different encoders, and even different profiles of the same encoder, lead to different power consumption. In addition, the configuration settings used to encode the video play an important role in controlling the overall power consumption of the video surveillance network. However, although these studies provided some insights into the energy consumption profiles of different video encoders on a testbed, the number of video encoder configuration settings used in these studies is limited. In particular, the performance of the H.264/AVC encoder in low-complexity encoding mode was not properly investigated.

1.2.1 Video Surveillance Network Power Consumption Model

A video surveillance network node's total power consumption is the sum of its encoding power consumption and its transmission power consumption. Both depend largely on the video encoding parameters used to encode the video. This is because the energy consumed to encode a video depends on the coding complexity of the underlying process, while transmitting a lower bitrate requires less energy. As a result, in order to estimate the video surveillance network's power consumption, an in-depth investigation of the effect of encoder parameter settings on coding complexity and the resulting bitrate is needed.

To estimate the nodes' power consumption, the authors in [62] calculated the encoding energy consumption as the product of the encoding elapsed time, i.e., the time needed to encode a video, and the node's energy consumption per cycle. In another study, the encoding energy consumption was calculated as the estimated encoding elapsed time multiplied by the assumed voltage and current of a sensor node [100]. From these studies, it can be seen that the encoding power consumption can be estimated from a coding complexity measure for encoding the video together with some parameters related to the sensor node hardware platform.
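A minimal sketch of such a first-order estimate, in Python, is given below. It assumes the cycle count is obtained from an instruction count and a cycles-per-instruction (CPI) figure; all numeric constants are placeholders for a hypothetical node, not measurements from [62] or [100].

def encoding_energy_joules(instructions: float, cpi: float,
                           clock_hz: float, active_power_w: float) -> float:
    # First-order estimate: cycles = instructions * CPI,
    # encode time = cycles / f_clk, energy = P_active * time.
    # This is equivalent to multiplying cycles by the energy per cycle.
    cycles = instructions * cpi
    encode_time_s = cycles / clock_hz
    return active_power_w * encode_time_s

# Placeholder figures for a hypothetical 400 MHz sensor node:
print(encoding_energy_joules(instructions=2.0e9, cpi=1.2,
                             clock_hz=400e6, active_power_w=0.9))  # ~5.4 J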
The transmission energy consumption, on the other hand, depends on the bitrate of the video, the transmission distance, and some parameters related to the wireless transmission. Moreover, the nodes in a video surveillance network may need to relay the encoded video data of other nodes in the network. Therefore, the encoder parameter settings used at a specific node affect not only that node's power consumption but also the distribution of power consumption across the overall system. In order to optimize the video surveillance network's performance, i.e., minimize its power consumption, we need to take into consideration the encoding and communication power consumption of all the nodes in the network. This is explained in more detail in the following subsection.

1.2.2 Trade-off between Encoding and Communication Power Consumption

Due to the nature of their deployment, video surveillance nodes farther from the sink may need to send their data through intermediate nodes. This increases the communication power consumption of the nodes located closer to the sink [72], [102], [103]. In a battery-powered video surveillance network, the initial energy available at each node and the node's total power consumption determine the system lifetime, which is an important aspect of its performance [104]. In order to determine the relation between the encoding and communication power consumption of a video surveillance network, some researchers have tried to relate the nodes' power consumption to the bitrate of the encoded video. One such model is the power-rate-distortion (PRD) model [60], [105]. According to the PRD model proposed in [60], in order to achieve a specific encoding distortion requirement, one can either increase the encoding efficiency or the bitrate. The power consumption of the encoding process can be controlled by managing three video coding parameters: the frame rate, the number of skipped blocks, and the number of blocks examined for motion estimation [60]. In [105], the authors simplified the PRD model and showed that there is a trade-off between encoding and transmission at which the optimal power consumption of a source node can be achieved. However, the PRD model includes a parameter called the encoding efficiency coefficient, which is presumably specific to the hardware platform.

Based on the PRD model, an optimal encoding power and rate allocation was proposed in [63]. In that study, each node's encoding power consumption and bitrate were controlled such that the power consumption was minimized. Although the technique can allocate the encoding power consumption and bitrate for each node, the method for determining the encoding parameter settings that achieve this encoding power consumption and bitrate was not explained. Nevertheless, a node's encoding power consumption can be estimated from the computational complexity of the encoding task. In addition, the coding complexity and bitrate of a video depend on the encoder parameter settings used to encode it. Therefore, in order to find the encoder configuration settings for each node that lead to minimum power consumption, we need to develop models that relate these encoding settings to coding complexity and bitrate.
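To make this trade-off concrete, the sketch below sweeps the bitrate and evaluates a source node's total power as an encoding term plus a radio term. It assumes one common form of the PRD model, D = sigma^2 * 2^(-lambda*R*P^(2/3)), solved for the encoding power P that meets a distortion target at rate R, together with a first-order radio model whose energy per bit is E_elec + eps_amp*d^2; all constants are illustrative assumptions, not values fitted in [60], [63], or [105].

import math

def encoding_power_w(rate_bps, target_d, sigma2, lam):
    # PRD form D = sigma2 * 2**(-lam * R * P**(2/3)), solved for P.
    return (math.log2(sigma2 / target_d) / (lam * rate_bps)) ** 1.5

def transmission_power_w(rate_bps, dist_m, e_elec=50e-9, eps_amp=100e-12):
    # First-order radio model: energy per bit = E_elec + eps_amp * d**2.
    return (e_elec + eps_amp * dist_m ** 2) * rate_bps

# Low rates force costly encoding power to meet the distortion target,
# while high rates inflate radio power, so an interior minimum appears
# (all numbers below are placeholder assumptions).
sigma2, target_d, lam, dist_m = 2500.0, 40.0, 2e-5, 80.0
best = min((encoding_power_w(r, target_d, sigma2, lam)
            + transmission_power_w(r, dist_m), r)
           for r in range(100_000, 2_000_001, 50_000))
print(f"minimum total power {best[0]:.2f} W at {best[1] / 1e3:.0f} kbps")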
This standard comes with a large number of setting parameters and configurations that can be tailored to suit the application requirements. In order to take full advantage of the flexibility and advanced features of H.264/AVC, one needs to study the effect that different parameter settings have on its performance in terms of computational complexity and bitrate. Note that choosing the right configuration and parameters that lead to optimal encoding performance is of prime importance to video surveillance applications that are constrained in terms of bandwidth and energy resources.

A number of studies on H.264/AVC's performance have been published [106], [107]. However, the focus of the existing studies is to determine the optimal coding configuration without considering the total energy consumed for encoding and transmission. It is worth noting that there is a trade-off between coding complexity and bitrate, such that in order to reduce the bitrate (lower transmission power consumption), more advanced encoding techniques are necessary (higher encoding power consumption). The trade-off between encoding and transmission energy consumption for a limited number of H.264/AVC configuration settings was investigated in [62], [65], [100]. However, H.264/AVC comes with a large number of encoding parameters, and its low complexity performance needs to be studied in more detail.

Furthermore, video nodes in the surveillance network share the same wireless medium in order to send their encoded video to the sink node. Since the bandwidth allocated to a network is limited, there is an issue of fairness in the bitrate allocated to each node. Allocating the same bitrate to each video node guarantees fairness in terms of bitrate and quality of the encoded video, given that each node has the same video and uses the same video encoding parameter settings. However, in many video surveillance deployment scenarios, nodes farther from the sink usually need to relay their data through intermediate nodes. Therefore, the total energy consumption of nodes that are closer to the sink will be greater than that of the nodes that are farther away. More balanced energy consumption among nodes may be achieved by allocating a different fairness ratio to each node in the video surveillance network. However, this needs to be achieved without sacrificing the quality of the transmitted video of any node. This can be done by assigning different combinations of coding complexity and coding efficiency in terms of bitrate to each node, so that the nodes produce compressed videos with almost similar quality in terms of peak signal-to-noise ratio (PSNR). This is in contrast to previous approaches, where all nodes use the same encoding configuration. Note that in this thesis, that approach is called the CommonConfig technique. By assigning a different encoding configuration to each node, different fairness ratios are assigned to the nodes. This affects the distribution of power consumption in the video surveillance network, such that the trade-off between encoding complexity and compression performance can be exploited in order to minimize the overall power consumption. To the best of our knowledge, this idea has not been investigated in the existing literature on video surveillance networks.

1.4 Coding Complexity and Bitrate Models

As mentioned in the previous subsections, the energy consumed to encode a video stream depends on the complexity of the encoding process.
On the other hand, transmitting lower bitrates requires smaller amounts of energy. One approach to increasing the efficiency of energy consumption is to assign a different fairness ratio to each node, as mentioned in the previous subsection. This fairness ratio can then be mapped to coding complexity and bitrate tuples that are generated from some combinations of encoding parameters. However, such an approach is not flexible and is limited by the number of configurations that can be used at each node. Thus, models that can estimate the coding complexity and bitrate are necessary.

Several studies have proposed models that relate the average bitrate of an encoded video to some video encoding parameters, such as the group of pictures (GOP) size, frame rate, and/or different video content features. Ding and Liu proposed one of the earliest rate models in [108], which relates the bitrate to the quantization step-size. In [109], [110], quadratic rate models based on the quantization parameter (QP) were proposed. Another quadratic rate model was proposed in [111], where the bitrate is estimated using the QP and a frame complexity measure called the mean absolute difference (MAD). MAD is calculated using linear predictions from both the actual MAD and the rough MAD, i.e., the MAD between the current frame and the previous reconstructed frame [112]. Furthermore, Ma et al. proposed a bitrate model that considers the QP and the frame rate of the video [113]. This model is based on a multiplicative power function whose parameters are estimated using the following video content features: frame difference, normalized frame difference, motion vectors, displaced frame difference, motion activity intensity, motion vectors normalized by contrast, motion vectors normalized by intensity, and motion vectors normalized by variance. The displaced frame difference is described as the mean absolute difference between corresponding pixels in successive frames, computed using an estimated motion vector, that falls in the top ten percentile. For more information on the features used in [113], please see [114]. The multiplicative power function model was later simplified in [115] by using the spatial information (SI) and temporal information (TI) indices of the video content (these indices are calculated based on [116]). The aforementioned models use the same GOP structure in their analysis. However, the GOP size affects the coding complexity and bitrate trade-off significantly. An elaborate study of the effect of GOP size on the video surveillance network's power consumption is thus necessary.

In terms of an encoding power consumption model, a power-rate-distortion model was used in [105], where the video encoder power consumption is related to the bitrate, the video variance, and a parameter called the encoding efficiency coefficient. A computational complexity management and control strategy of H.264/AVC for low-power devices was proposed in [117], [118]. In these studies, the number of block size candidates used for motion estimation and the frame rate of the video were controlled based on the available processing power of the devices. However, these studies only considered the IPPPPP GOP structure. The authors in [119] proposed a coding complexity model for H.264/AVC for different encoding configurations. However, some of the model parameters can only be obtained by encoding the video.
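Setting aside their individual parameterizations, most of the rate models surveyed above share one basic shape: a multiplicative power function of the encoding parameters with content-dependent coefficients. The sketch below illustrates that general form only; the separable structure, the exponent values, and the QP-to-step mapping are illustrative assumptions rather than the published models of [113], [115].

```python
# A sketch of a multiplicative power-function rate model of the kind surveyed
# above (cf. [113], [115]). The form and the exponents are illustrative.

def rate_kbps(qp, fps, r_max=2000.0, qp_min=22.0, fps_max=30.0, a=1.2, b=0.6):
    """Bitrate estimate as a power function of quantization step and frame rate.

    r_max is the bitrate at (qp_min, fps_max); a and b are content-dependent
    exponents (in [115] the corresponding parameters are estimated from the
    SI and TI indices of the content).
    """
    q_ratio = 2.0 ** ((qp - qp_min) / 6.0)  # H.264 step size doubles every 6 QP
    return r_max * q_ratio ** (-a) * (fps / fps_max) ** b

print(round(rate_kbps(34, 15)))  # e.g., predicted bitrate at QP=34, 15 fps
```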
A model whose parameters can only be obtained by encoding the video, such as that of [119], therefore cannot be used for new, unknown videos.

There are some H.264/AVC encoding parameters that significantly affect the coding complexity and bitrate. These parameters include the GOP size, the GOP structure (whether B-frames are used or not), sub-pixel motion estimation, the motion estimation algorithm, the type and variety of block size candidates used for motion estimation, the QP, and the frame rate. In order to develop coding complexity and bitrate models for a video surveillance network, the effect and significance of those parameters need to be investigated. Furthermore, although the results shown in earlier studies indicate that video content affects the coding complexity and bitrate, there has been no in-depth investigation of the use of spatial and temporal information for estimating the coding complexity and bitrate, let alone of incorporating this information in the power consumption estimation. The following subsection discusses the need to include the effect of spatial and temporal information in estimating the power consumption of a video surveillance network.

1.5 The Effect of Content Complexity on Video Surveillance Network's Power Consumption

One of the common drawbacks of the existing studies on video surveillance network power consumption is the fact that the analysis is performed under the assumption that the same video content is used at all the nodes [60], [63], [65], [100]. However, this assumption is very simplistic, and it does not represent a real video surveillance network deployment, where different nodes capture the scene from different points of view. Therefore, the content complexity of the captured video streams is not the same at all nodes. Note that the performance of a video encoder in terms of computational complexity and bitrate depends on both the encoding parameters and the temporal and spatial complexity of the content. This elevates the importance of determining the trade-off between encoding complexity and data transmission in a video surveillance network that would minimize power consumption.

The coding complexity and bitrate of the videos determine the video surveillance network's power consumption. Furthermore, the coding complexity and bitrate of a video depend on the encoding parameters and on the spatial/temporal information (SI, TI) of the content. Therefore, in order to estimate the power consumption of a video surveillance network, one has to also consider the spatial and temporal complexity of the scene at which the network will be deployed. In order to do that, it is important to have encoding complexity and bitrate models that take into consideration the complexity level of the captured scene. It is important to note that, for the same visual quality, the total bitrate generated by captured scenes that have high spatial and temporal detail will be higher than the one obtained from scenes with lower spatial and temporal detail. Considering that the bandwidth of a video surveillance network is limited, the scene's spatial and temporal detail will affect the visual quality of the encoded video. This means that the compression efficiency will have to change with every scene, depending on its complexity. However, to the best of our knowledge, such conditions have not been explored in the existing studies.
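Because SI and TI recur throughout this thesis, a short sketch of how such indices can be computed is given below. It assumes the usual definition behind [116] (per-frame standard deviation of the Sobel-filtered luma plane for SI, and of the frame difference for TI, maximized over time); normalization and border-handling details may differ from the exact reference implementation.

```python
# A sketch of SI/TI computation in the spirit of [116]; details may differ
# from the exact reference implementation.
import numpy as np
from scipy.ndimage import sobel

def si_ti(frames):
    """frames: iterable of 2-D float arrays holding the luma plane."""
    si_vals, ti_vals, prev = [], [], None
    for f in frames:
        grad = np.hypot(sobel(f, axis=0), sobel(f, axis=1))  # Sobel magnitude
        si_vals.append(grad.std())               # spatial detail of this frame
        if prev is not None:
            ti_vals.append((f - prev).std())     # motion between frames
        prev = f
    return max(si_vals), (max(ti_vals) if ti_vals else 0.0)
```

Thresholding the resulting (SI, TI) pairs is one way to bin scenes into the low, medium, and high complexity categories used later in this thesis.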
Furthermore, in order to be able to develop and test models for power consumption, there is a need for a video surveillance dataset of real-life captured content that represents videos with different spatial and temporal complexities. Unfortunately, such a dataset is not publicly available.

1.6 Research Topic Formulation

The overview provided in Section 1.2 to Section 1.5 raises two important research questions:

1. Can we understand how the encoder parameter settings, the complexity of the content, and the network configuration affect the trade-off between the coding complexity and bitrate of videos captured by the nodes in a video surveillance network?

2. Can we exploit the trade-off between the key factors mentioned in point 1 above to minimize the overall power consumption of a video surveillance network?

The above-mentioned research questions require us to perform an in-depth study of both video compression and wireless networking. Note that these two subjects cover diverse active research areas, with different theories, principles, assumptions and, sometimes, conflicting implementation requirements that are difficult to cover in a single thesis. Therefore, in order to proceed, we need to limit the scope of our research and use some necessary assumptions. The following subsections provide the scope and roadmap of the research presented in this thesis.

1.6.1 Scope and Limitation of the Thesis

A major part of the work presented in this thesis is centered around topics related to the trade-off between encoding complexity and bitrate, based on the analysis of the encoder parameter settings and the content complexity of the videos, as shown in Figure 1.2. In exploiting the trade-off among the factors shown in that figure to minimize the overall power consumption, we focus our analysis on the encoding and communication operations of a video surveillance network. In addition, we also put an emphasis on taking into account a diverse set of video content, to simulate a more practical and realistic deployment of a video surveillance network. In this regard, many aspects of the wireless transmission process of the simulated network were kept simple, to allow a more in-depth analysis of the effect of the spatial and temporal information of the content on the video surveillance network's power consumption. The following are some assumptions that were used in this thesis.

[Figure 1.2: Scope of the research]

1. All the nodes in the video surveillance network encode and transmit videos of the captured event. This simulates the surveillance network under a heavy load, which corresponds to maximum power consumption and minimum system lifetime.

2. A video surveillance network can have a wide variety of network layouts. We use a grid-based topology for the following reasons: (a) our experiments were conducted using a large number of real-life captured videos that we generated, and it is easier to deploy and manage cameras placed in a structured layout such as a grid; (b) a grid layout is one of the common settings used in video surveillance networks. We use nine cameras in our experiments to follow the experiments conducted in [63]. Furthermore, we also assume that each node in the network can only transmit data to its direct neighbor. Note that this assumption has been used in other studies as well [62], [65], [100].

3. The sum of the data rates of the encoded video streams transmitted from all the nodes is constrained, to simulate the resource limitation of a video surveillance network.
The value of this constraint is stated explicitly in each experiment.

4. We assume that all nodes use the same QP, so that the encoded videos from different nodes have almost similar quality. The highest quantization parameter used in this thesis is set equal to 40, to make sure that the encoded video has the minimum acceptable quality. However, different scenes may need different QP values. The minimum QP value, i.e., the one that gives the highest quality, that can accommodate the network bandwidth for the scene will be used.

1.6.2 Research Road Map

Figure 1.3 shows the research roadmap of the thesis.

[Figure 1.3: Research roadmap]

The first effort in our research involved using the existing theoretical model, coming up with a practical implementation, and trying to control the bitrate using variable frame rates. This is achieved by assuming that the video surveillance network's area can be divided into different zones. Nodes in the same zone use the same frame rate and thus have the same bitrate. This research allowed us to better understand the complexity of the encoding process, and showed that, by controlling the video encoding parameters used at each node, we could control the trade-off between encoding complexity and bitrate much more efficiently than by using the frame rate alone. In addition, we established a solid understanding of the relationship between the encoding power consumption, the power consumed to transmit the resulting bitstream, and the dependence on the network layout.

As a result, we focused the next stage of our research on analyzing the effect of different encoder parameters on coding complexity and bitrate. The objective of this effort was to determine the encoder parameters that have the highest impact on coding complexity and bitrate. We learned that the group of pictures (GOP) size and the search range (SR) for motion estimation significantly affect the coding complexity and bitrate. Furthermore, we also learned that video content affects coding complexity and bitrate and thus needs to be taken into consideration. However, a video surveillance dataset consisting of a wide variety of content was not available. In fact, every existing method assumed that every camera in the network had the same video content, something that is far from reality and of course leads to erroneous results. This led to the third stage in our roadmap, which was the generation of a comprehensive real-life video sensor dataset with different scenarios and scenes.

In order to generate such a video surveillance dataset, we installed 9 HD cameras which captured different views of the same scene. We developed different scenario settings and simulated different real-life activities in each scene. In addition to that, we also identified camera placement and video capturing synchronization as two other challenges we face in VSNs. In this regard, we decided to use a grid-connected camera layout that provided easier installation and maintenance. We used the default synchronization provided by the surveillance camera system during the video capturing process. This research effort resulted in a video surveillance dataset consisting of 108 unique videos with a wide range of content complexity. Our next effort was to use this dataset in our analysis and in the development of an overall architecture that minimizes the video surveillance network's power consumption.
In the fourth stage of the roadmap, we assumed that the tuples of encoding configurations, coding complexity, and bitrate for each video could be stored in a look-up table. We used the spatial and temporal information metrics provided by MPEG to measure the video content complexity. These metrics were also used to classify the videos into different levels of complexity. In this regard, we viewed the nodes' encoding configuration problem as a bitrate allocation problem, such that the bitrate fairness allocation of the scenes with the highest content complexity can be used to allocate the nodes' configurations for the other scenes. Although the results of the look-up-table (LUT) based approach were good, this approach is not practical. This led to the final stage of our research. Using the dataset generated in the third stage, and the knowledge about the encoding parameters that significantly affect the encoder performance, we developed coding complexity and bitrate models. We performed more than 30,000 encoding experiments, covering 108 unique videos and 343 different encoding configurations per video, to train and evaluate the models. We then combined the developed models with information about the network layout and the nodes' positions in the network to design a novel architecture for minimizing the video surveillance network's power consumption.

The roadmap explained in the previous paragraphs was used to guide the research presented in this thesis and has resulted in three major contributions that are summarized in the following subsection.

1.7 Summary of Thesis Contributions

Three contributions are presented in this thesis. Each contribution targets specific problems and limitations relevant to the issues described in the previous sections. A summary of this thesis' main contributions is given below:

1. Determined the H.264/AVC encoder parameters that significantly affect the trade-off between encoding and communication power consumption. In order to have a practical implementation of the PRD model [105], in Chapter 2 we propose a method to control the nodes' bitrates by using variable frame rates. This is achieved by organizing the video surveillance network into different zones based on the nodes' distance from the sink. Nodes that are located in the same zone use the same encoding parameter settings. The encoding power consumption allocated to each node is related to the frame rate and to the energy consumption needed to encode one frame, as reported in [62]. The resulting distance-based heuristic [120] was shown to perform somewhat worse than the optimal technique that uses the PRD model [63], but much better than the single-configuration approach. Note that the work described in [63] uses the PRD model to assign a different encoding power consumption (and consequently bitrate) to each node in a VSN, without providing any insight into how a node can achieve such encoding power consumption and bitrate.

However, the frame rate alone is not the best way to control bitrate and coding complexity. There are other encoder parameter settings that can offer better bitrate control and a better quality trade-off. Hence, we continued our research by analyzing the effect of different encoder parameter settings on encoding and communication power consumption. The objective was to determine the encoder parameters that have the highest impact on the trade-off between coding complexity and video bitrate.
Our analysis, presented in Chapter 3, showed that the group of pictures (GOP) size and the search range (SR) for motion estimation have the highest impact on coding complexity and bitrate. The tuples of encoding configuration combinations, coding complexity, and bitrate were then represented as a lookup table that can be used to exploit the coding complexity and bitrate trade-off such that the overall power consumption can be minimized. An additional factor that has a direct effect on coding complexity and bitrate is the video content complexity in terms of texture, color, and motion. Since all existing approaches make the erroneous assumption of using the same video for each camera node, it was essential to generate a comprehensive real-life video surveillance dataset with a wide variety of content for our analysis. This leads to the next contribution.

2. Generated a video surveillance dataset that represents different spatial and temporal complexities. In Chapter 3, we highlighted the fact that each node in a video surveillance network may capture video from a different angle, thus producing video streams with different content. In addition, the scene layout and setting where the surveillance network is deployed can have different levels of spatial detail and temporal complexity. Unfortunately, earlier studies did not consider the significance of video content in their frameworks. In fact, all the studies we found in the literature used the same video for each node in their analysis. In order to incorporate the effect of content, we needed a video surveillance dataset with different levels of content complexity that could be used in our study. However, such a dataset is not available in the literature. In order to proceed, we set up a complete surveillance system and captured a large number of videos of real-life and simulated events in a couple of public buildings. As a result, we generated two video surveillance datasets. The first dataset was compiled using four cameras capturing real-life events in one of the buildings. The other dataset was obtained by installing nine cameras in one of our labs. Each of these cameras was arranged to have a different point of view. Also, in order to obtain different levels of spatial and temporal complexity, we arranged a variety of scene settings and simulated different levels of detail and activity for each scene setting. In total, we captured 108 unique videos that we used for our analysis. The video surveillance dataset is made available to the public for further research [121].

In order to incorporate the effect of content, the spatial and temporal information (SI and TI) metrics used by MPEG were utilized. The video surveillance dataset was further divided into different scene categories according to their SI and TI values. In our attempt to provide a node encoding configuration allocation that minimizes the overall power consumption, we first assumed that the tuples of encoding configuration combinations, along with their corresponding coding complexity and bitrate, could be stored in the form of a lookup table. We then proceeded by assuming that, in order to find the optimal encoder parameter settings for the test set, we can use the bitrate ratio information obtained from the training set. Note that, in this contribution, the scenes that have high content complexity were chosen as the training set.
This is due to the fact that scenes with high content complexity tend to have a higher bitrate than scenes with lower content complexity. We addressed the power minimization issue as a fairness allocation problem, where each node needs to be allocated a certain portion of the overall bandwidth such that the power consumption is balanced among all nodes. Comparison with the existing techniques showed good performance. However, using a look-up table is neither practical nor efficient. We needed to design coding complexity and bitrate models to obtain better performance.

3. Designed a novel optimization framework utilizing models that incorporate video content features to minimize the video surveillance network's power consumption. To proceed with this contribution, we needed to develop models that predict the coding complexity and bitrate of the videos. In Chapter 4, we considered the effect of the GOP size and the number of block size candidates used in motion estimation to design the models. We used the first video surveillance dataset, which contains 20 videos, because at the time we started the work on this contribution, we were still generating the video surveillance dataset described in Chapter 3. The model parameters were obtained by analyzing the effect of these encoding parameters using four training videos. The models were then evaluated against the remaining videos. A power consumption study using the models confirmed that the content of the video affects the video surveillance network's power consumption [122]. In order to include the effect of the spatial and temporal information of the video, we made several modifications to the developed models in Chapter 5. In addition, we also incorporated the effect of the QP, to account for the trade-off between video quality and bitrate. The video surveillance dataset described in Chapter 3 was used to develop and test our new models. Note that a few scenes representing low, medium, and high content complexity (as classified in Chapter 3) were used to train the models. The models were evaluated against the remaining videos, excluding the ones used in the training. We evaluated the models' accuracy in terms of the Pearson correlation coefficient (PCC), root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared values. Comparison with the existing techniques showed the superiority of our proposed models. Furthermore, we combined the models with information about the network layout and the nodes' positions in the network in a novel optimization framework to minimize the surveillance network's power consumption. The objective of the framework is to find the nodes' encoder parameter settings, for the videos of the specific scene used, that minimize the overall power consumption. The performance of the proposed framework for different scenes was shown to be superior to the existing techniques.

Chapter 2: Encoding and Communication Power Consumption Trade-off in a Video Surveillance Network

In this chapter, we propose a distance-based heuristic for energy-efficient VSNs for surveillance applications. The proposed method exploits the trade-off between encoding and communication power consumption by dividing the deployment area of the surveillance network into several zones. The video surveillance network nodes that are located in the same zone are configured to have the same encoding power consumption and bitrate. The encoding power consumption and bitrate in a zone are controlled by using a variable frame rate.
In order to estimate the encoding configuration of each node, we use the PRD model [60], [63]. Note that the objective is to find the optimal number of zones and, correspondingly, the frame rate, i.e., bitrate, for each zone that minimizes the overall power consumption. Sections 2.1 and 2.2 describe the PRD model and the corresponding network lifetime maximization algorithm that will be used as the reference to evaluate the performance of the proposed heuristic. The proposed distance-based heuristic is provided in Section 2.3. Section 2.4 concludes the chapter.

2.1 Power-rate-distortion Model in Video Surveillance Networks

According to the PRD model proposed in [60], [105], the distortion of a compressed video, D, depends on the bitrate R and the encoding power consumption P_e. In this regard, the encoding distortion, measured in terms of the mean squared error (MSE), is estimated using the following formula:

    D = \sigma^2 e^{-\lambda R P_e^{3/2}}                                    (2.1)

where \sigma^2 is the average input variance and \lambda is the encoding efficiency coefficient. Hence, to achieve the encoding distortion requirement, one can either increase the encoding power or the bitrate. Obviously, there is a trade-off at which the optimal power consumption of a source node can be achieved, as shown in Figure 2.1. Note that this figure is generated using the parameters mentioned in [63] and a video distortion requirement of 100 MSE.

[Figure 2.1: Encoding and transmission power consumption]

2.2 Network Lifetime Maximization in Video Surveillance Networks

Network lifetime is one of the most important performance measures of a video surveillance network. Depending on the application requirements, different definitions of network lifetime exist in the literature [74], [104], [123], [124]–[126]. However, for critical applications such as video surveillance, it is common to use the minimum node lifetime as the performance measure. The minimum node lifetime assumes that the system is considered to be in the fail state when at least one of the nodes consumes its entire initial power source.

In order to model the video surveillance network, battery-powered sensor nodes are assumed to be statically deployed in a square area characterized by its width. It is assumed that a standard medium access control (MAC) protocol is applied to resolve the link interference problem. The network is modeled as an undirected graph G(N, L), where N is the set of nodes and L is the set of links. Node i can communicate with node j if a link between those nodes (L_ij \in L) exists. Sensor node i can capture and encode video, generating data traffic with a source bitrate R_i. Furthermore, each node can also relay the traffic from upstream nodes. The flow conservation law at each node is:

    \sum_j r_{ij} - \sum_k r_{ki} = R_i                                      (2.2)

Here, r_{ij} denotes the outgoing rate on L_ij, while r_{ki} denotes the incoming rate on L_ki, with L_ij, L_ki \in L. The total power dissipation at node i consists of the transmission power consumption, the reception power consumption, and the encoding power consumption:

    P_i = P_i^t + P_i^r + P_i^e                                              (2.3)

A general energy consumption model for a wireless communication transmitter and receiver, as presented in [127], is used. The total transmission power consumption of node i is the sum of all power consumed to transmit data to other nodes within its transmission range:

    P_i^t = \sum_j (\alpha + \beta d_{ij}^n) \, r_{ij}                       (2.4)

Here, \alpha and \beta are constant coefficients, n is the path loss exponent, and L_ij \in L.
Also, the total reception power consumption of node i is the sum of all power consumed to receive data from other nodes, as formulated below, where \rho is a constant coefficient and L_ki \in L:

    P_i^r = \sum_k \rho \, r_{ki}                                            (2.5)

The encoding power consumption of node i is determined using the PRD model [105]. For a given distortion requirement, the relation between encoding power consumption and bitrate can be stated as follows:

    P_i^e = \left[ \frac{\log(\sigma^2 / D_i)}{\lambda R_i} \right]^{2/3}    (2.6)

In this equation, D_i is the distortion requirement for node i, \sigma^2 is the average input variance, and \lambda is the encoding efficiency coefficient. It is assumed that the node is powered by a battery with an initial amount of energy B_i. The lifetime of node i is equal to:

    T_i = B_i / P_i                                                          (2.7)

The minimum node lifetime states that T_net = min T_i, i \in N. Maximizing T_net = min T_i is equivalent to maximizing T_net subject to T_net ≤ T_i for all i. Therefore, the objective function can be reformulated as a linear program by introducing the variables q_net = 1/T_net and q_i = 1/T_i, such that P_i = B_i q_i and q_i ≤ q_net. The lifetime maximization of the video surveillance network is formulated as a constrained optimization problem, as shown in Figure 2.2. This algorithm is similar to the centralized power-rate optimization proposed in [63], and it is called the Power Rate Optimal Video Surveillance Network (PRO-VSN) algorithm in this thesis.

[Figure 2.2: Optimization in PRO-VSN]

2.3 Distance Based Heuristics for Energy Efficient Video Surveillance Networks

In intra-only coding mode, the average energy dissipated to encode one frame can be assumed to be constant for a video that has no scene change. In this regard, the encoding power consumption depends mostly on the frame rate of the video. Since the encoding power consumption is known, the bitrate needed to achieve the distortion requirement can be calculated using the PRD model.

Furthermore, the nodes that are located closer to the sink usually need to relay the video data from the upstream nodes that are located farther from the sink. Because of this, the energy consumption of the nodes closer to the sink is usually higher than that of the upstream nodes. However, in order to minimize the overall power consumption and maximize the system lifetime, it is necessary to balance the energy consumption among all nodes [104], [128]. Therefore, the nodes that are closer to the sink should spend less power on encoding than the farther ones. In this regard, in order to achieve the target distortion and also maximize the system lifetime, we should control the bitrate and encoding power consumption of each node, estimated based on its position in the network, i.e., its distance from the sink [120].

2.3.1 Proposed Approach

In order to allocate the nodes' encoding power and bitrate based on their distance from the sink, we propose to divide the network deployment area into different zones, as shown in Figure 2.3. All nodes located in the same zone will be configured to have the same encoding power consumption and bitrate. A zone is defined as a section of the network covering an area whose distance to the sink is greater than r_{m-1} and smaller than or equal to r_m, 1 ≤ m ≤ Z_opt. The number of zones, Z_opt, 1 ≤ Z_opt ≤ Z_max, should be carefully chosen so that the network lifetime is maximized. On the other hand, the value of Z_max depends on the nodes' distortion requirement and the hardware's capability, as will be explained in the following paragraphs.

[Figure 2.3: A video surveillance network's zones]
Note that in Figure 2.3, a sensor node i that is located at a distance d_i from the sink is a member of zone Z_m if r_{m-1} < d_i ≤ r_m. Each node has to be a member of exactly one zone. However, a zone may have zero members.

As stated earlier, for the same target video distortion, the encoding power consumption of nodes that are closer to the sink should be lower than that of nodes farther from the sink. Consequently, the bitrate of nodes closer to the sink will be higher than that of nodes farther from the sink. Therefore, the encoding power consumption and bitrate of any video surveillance network node in zone m (z_m) are dictated by the following rules: P_1^z < P_2^z < ... < P_m^z < ... < P_{Zmax}^z and R_1^z > R_2^z > ... > R_m^z > ... > R_{Zmax}^z. Here, P_m^z and R_m^z are the encoding power consumption and bitrate allocated to z_m. Furthermore, we know from the PRD model [105] that:

    R_m^z (P_m^z)^{3/2} = R_1^z (P_1^z)^{3/2}                                (2.8)

Since the average power dissipation for encoding one frame is assumed to be constant, the encoding power consumption of any node in the first zone can be calculated as follows:

    P_1^z = P_1^{frame} \cdot f_1^z                                          (2.9)

Here, P_1^{frame} is the average power dissipation to encode one frame, while f_1^z is the frame rate of any node in the first zone. Using (2.8) and (2.9), we can calculate the encoding power dissipation and bitrate in zone m:

    P_m^z = P_1^{frame} \cdot (f_1^z + m - 1)                                (2.10)

    R_m^z = R_1^z \left[ \frac{f_1^z}{f_1^z + m - 1} \right]^{3/2}           (2.11)

The value of f_1^z depends on the target distortion and the estimated value of the bitrate in zone one. Since R_1^z ≥ R_m^z, the bitrate in zone one can be estimated as follows:

    \hat{R}_1^z = (1 - \varepsilon) \, W / V                                 (2.12)

Here, W is the bandwidth (maximum data rate) of the network, V is the number of video nodes, and ε is a small number. With D_1^z as the distortion requirement for zone one, the values of f_1^z and R_1^z can then be obtained as follows:

    f_1^z = \frac{1}{P_1^{frame}} \left[ \frac{\log(\sigma^2 / D_1^z)}{\lambda \hat{R}_1^z} \right]^{2/3}    (2.13)

    R_1^z = \frac{\log(\sigma^2 / D_1^z)}{\lambda \, (f_1^z P_1^{frame})^{3/2}}                              (2.14)

Note that, unless specifically mentioned otherwise, it is assumed that the distortion requirement is the same for all nodes, D_1^z = D_i = D.

The value of Z_max is obtained by considering the capability of the central processing unit (CPU) to process the captured video. Based on the results shown in [62], the number of cycles required to process one frame is approximately 20 million. Considering that the maximum CPU frequency of the Stargate platform is 400 MHz [129], the maximum number of frames that can be processed in one second is approximately 16. Thus, the value of Z_max can be calculated as Z_max = 17 - f_1^z.

Recall that our goal is to find the number of zones that minimizes the overall power consumption. Note that the optimal number of zones (Z_opt) is bound to be an integer, and solving a mixed integer linear program is known to be intractable. However, since 1 ≤ Z_opt ≤ Z_max and the value of Z_max is relatively small, we can evaluate the overall power consumption of the video surveillance network for every possible number of zones. The distance-based optimization itself is formulated as the constrained optimization problem shown in Figure 2.4. As can be seen from this figure, the bitrate allocated to node i ∈ z_m is equal to R_m^z, calculated using (2.15). The value of R_m^z depends on the value of the bitrate in the first zone, i.e., R_1^z, which is calculated using (2.16). Furthermore, the encoding power consumption allocated to node i ∈ z_m is calculated using (2.17).
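A minimal sketch of this per-zone allocation, implementing (2.9) to (2.14), is given below. The natural logarithm and the rounding of the zone-one frame rate to an integer are assumptions of this sketch; the evaluation of the total network power for each candidate number of zones, which the PRE-VSN procedure described next performs, is left out because it depends on the routing layout.

```python
import math

# A sketch of the zone allocation of Eqs. (2.9)-(2.14). Units follow Table 2.1:
# rates in Mbps, powers in W, lam in (W^(3/2) * Mbps)^(-1).
def zone_allocation(D1, var, lam, W, V, P_frame, eps, Z):
    """Return (frame rate, encoding power, bitrate) tuples for zones m = 1..Z."""
    R1_hat = (1.0 - eps) * W / V                                  # Eq. (2.12)
    f1 = (math.log(var / D1) / (lam * R1_hat)) ** (2.0 / 3.0) / P_frame  # (2.13)
    f1 = max(1, int(f1))                 # assumed: integer frames per second
    R1 = math.log(var / D1) / (lam * (f1 * P_frame) ** 1.5)       # Eq. (2.14)
    zones = []
    for m in range(1, Z + 1):
        P_m = P_frame * (f1 + m - 1)                              # Eq. (2.10)
        R_m = R1 * (f1 / float(f1 + m - 1)) ** 1.5                # Eq. (2.11)
        zones.append((f1 + m - 1, P_m, R_m))
    return zones

# Example with the parameter values of Table 2.1 and D = 100 MSE:
for zone in zone_allocation(D1=100.0, var=3500.0, lam=55.54, W=1.0, V=9,
                            P_frame=0.06, eps=0.05, Z=3):
    print(zone)
```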
The algorithm that finds the optimal number of zones and, at the same time, allocates the bitrate and encoding power consumption to each node then proceeds as follows:

1. Based on the distortion requirement, estimate the frame rate to be allocated to the first zone, i.e., f_1^z, using (2.18). Set the minimum power consumption (P_min) equal to infinity.

2. For Z = 1 until Z_max, repeat:
   a. Calculate the video surveillance network power consumption (P_net) using the distanceBasedOptimization(Z) procedure.
   b. If P_net is smaller than P_min, then set P_min = P_net and Z_opt = Z.

[Figure 2.4: The distance-based optimization]

For ease of referencing, this approach is called the Power-Rate-hEuristic Video Surveillance Network (PRE-VSN) algorithm.

2.3.2 Experiments and Results

Table 2.1 shows the parameters used in the experiments. It is assumed that the maximum transmission range is 100 m and the network bandwidth is 1 Mbps. It is clear from (2.4) that the transmission power consumption depends on the transmission distance. Accordingly, we assume that each node is able to adjust its transmission power to achieve the required performance at the receiver. The term network width denotes the width of the square area in which the sensor nodes are statically deployed. The rate adjustment coefficient ε is set according to the number of zones in the corresponding experiment. Hence, ε is set equal to 0 when the number of zones is 1, and to ε_max when the number of zones is equal to Z_max. In addition, the sensor nodes are deployed uniformly in a square grid layout, where each node is placed at the center of its grid cell.

Table 2.1: Parameters used

  Parameter   Description                             Value
  B           Initial energy (battery)                100 kJ
  α           Energy cost for transmitting 1 bit      0.5 J/Mb
  β           Transmit amplifier coefficient          1.3x10^-8 J/Mb/m^4
  ρ           Energy cost for receiving 1 bit         0.5 J/Mb
  n           Path loss exponent                      4
  σ²          Average variance of video (MSE)         3500
  λ           Encoding efficiency coefficient         55.54 (W^(3/2)·Mbps)^(-1)
  ε_max       Max. rate adjustment coefficient        0.1
  P_1^frame   Power consumption to encode 1 frame     60 mJ/frame

2.3.2.1 Performance of the Proposed Technique

The performance of the proposed heuristic in terms of network lifetime and optimal number of zones is shown in Figure 2.5. In Figure 2.5(a), the number of nodes is four and the network width is 50 m. The target distortion is varied from 50 to 800. Figure 2.5(b) depicts the network lifetime when the number of nodes is nine, while the network width and the distortion requirement are the same as in Figure 2.5(a).

[Figure 2.5: Performance of the proposed heuristic. (a) varying distortion, nodes=4, width=50m; (b) varying distortion, nodes=9, width=50m]
[Figure 2.6: Performance of the proposed heuristic. (a) varying width, nodes=4, distortion=400; (b) varying width, nodes=9, distortion=400]

These graphs show some interesting insights. Firstly, the value of Z_max depends on the distortion requirement and the network size. For example, when the distortion requirement is 800, the value of Z_max is the same for both cases, i.e., Z_max is equal to four. On the other hand, when the distortion requirement is equal to 200, the value of Z_max is five for the four-node video surveillance network and three for the nine-node video surveillance network. Secondly, the network lifetime graph for low distortion requirements is rather flat. This means that most of the power is consumed for encoding the captured video to achieve the target distortion.
Furthermore, it can be seen from Figure 2.6 that the optimal number of zones is proportional to the network width. When the network is larger, the bitrate should be reduced to lower the communication power consumption. However, to compensate for the reduction in bitrate, the video surveillance network's nodes need to dissipate higher encoding power to achieve the target distortion. Figure 2.6(a) shows the case where the number of nodes is four and the distortion requirement is equal to 400, while the network width is varied from 10 m to 100 m. Similar settings are used in Figure 2.6(b), where the number of nodes is increased to nine. In addition, Figure 2.7(a) shows the maximum network lifetime for these experiments, while Figure 2.7(b) shows the optimal number of zones that achieves those results.

[Figure 2.7: Performance of the proposed algorithm. (a) maximum network lifetime, (b) the number of zones producing maximum lifetime (Z_opt)]

2.3.2.2 Comparison with Other Techniques

Common techniques in sensor networks assume that all nodes are configured to use the same processing power consumption [44], [71]. These approaches can be classified as a special case of the proposed distance-based heuristic where the number of zones is equal to one. For ease of referencing, this technique is called the Power-Rate-Allocated Video Surveillance Network (PRA-VSN).

In order to compare the proposed approach with the existing techniques, we simulate video surveillance networks whose nodes are randomly deployed within an area whose width is varied between 10 m and 100 m. The experiments are repeated 10 times, and the distortion requirement is set equal to 100. Two performance metrics, normalized lifetime and lifetime offset, are used. Normalized lifetime is defined as the ratio of the network lifetime obtained from the examined algorithm to the optimal one produced by PRO-VSN. Lifetime offset is the difference between the optimal network lifetime produced by PRO-VSN and the one obtained from the tested algorithm. The results are shown in Table 2.2 and Figure 2.8.

Table 2.2: Average network lifetime and network lifetime offset, normalized to PRO-VSN

  Network     Normalized Lifetime (%)    Lifetime Offset (hours)
  width (m)   PRA-VSN     PRE-VSN        PRA-VSN     PRE-VSN
  10          90.85       90.85          5.81        5.81
  20          88.72       88.72          6.97        6.97
  30          92.13       92.13          4.13        4.13
  40          94.65       94.65          2.25        2.25
  50          93.72       93.72          2.13        2.13
  60          89.59       92.29          2.82        2.09
  70          84.58       89.72          3.49        2.33
  80          78.83       88.51          4.08        2.21
  90          74.73       89.09          4.13        1.78
  100         70.19       85.77          4.27        2.04

Figure 2.8(a) compares the average network lifetime obtained using the proposed PRE-VSN algorithm with the ones obtained using the existing techniques. It can be seen that the proposed PRE-VSN performs better than PRA-VSN when the network width is greater than or equal to 60 m. In addition, when the network width is smaller than or equal to 30 m, the network lifetime obtained by PRE-VSN is at least 4 hours shorter than that obtained by PRO-VSN. However, the lifetime offset of PRE-VSN can be as little as 1.78 hours; this happens when the network width is greater than or equal to 40 m. Moreover, when the network width is greater than 40 m, the difference between the average network lifetimes obtained by the three algorithms is small.

[Figure 2.8: Network lifetime comparison. (a) exact values, (b) normalized to PRO-VSN]
Hence, in order to get a better understanding of the performance of the proposed algorithm, the network lifetime values were normalized to those of PRO-VSN, as shown in Figure 2.8(b). This figure shows that the network lifetime obtained by the proposed algorithm is between 85% and 94% of the optimal solution obtained by the PRO-VSN algorithm. These results show that the proposed distance-based heuristic can achieve performance comparable to the optimal solution. Note that the PRO-VSN algorithm can provide better results because the encoding power is finely optimized, whereas the proposed PRE-VSN uses the zone-based system, in which the encoding power consumption of each zone is tied to the frame rate of the video.

Figure 2.9 shows the average power consumption of each node when the network width is 60 m and the distortion requirement is 100. It is shown that each algorithm is able to control the power dissipation such that all nodes spend a roughly similar total power. It can also be seen that in PRA-VSN all nodes dissipate a similar encoding power; however, the total power consumption per node obtained by PRA-VSN is not the same. Therefore, it is clear that the proposed PRE-VSN performs better than PRA-VSN, while PRO-VSN has the best performance in terms of minimum total power consumption.

[Figure 2.9: Average power consumption profile]

2.4 Summary

In this chapter, a distance-based heuristic for the allocation of encoding power and bitrate in a video surveillance network was proposed. This is performed by dividing the video surveillance network deployment into several zones. All nodes in the same zone are allocated the same encoding power consumption and bitrate. In order to determine each zone's encoding power consumption and bitrate, a heuristic that allocates the frame rate encoder parameter for each zone was proposed.

The proposed technique shows that, by assigning a different frame rate to each zone, the video surveillance network's system lifetime can be optimized. However, in many video surveillance applications, it is often required that the captured videos are encoded at the same frame rate, to guarantee the same video quality across all nodes. In addition, there are other encoder parameters that significantly affect the trade-off between encoding and communication power consumption in video surveillance networks. Furthermore, the videos captured by the surveillance nodes reflect different points of view of the scene, i.e., each node captures a unique video with different content. The different content captured by each node affects the distribution of power consumption in the video surveillance network. The effects of the encoder parameter settings and of the complexity of the content captured by the various nodes in the video surveillance network are the focus of the discussion in the next chapter.

Chapter 3: Encoding Parameters and Scene's Content Complexity in Video Surveillance Networks

In a resource-constrained environment such as a video surveillance network, it is very important to choose the right configuration and setting parameters that lead to optimal coding performance. On the other hand, the coding complexity and bitrate of videos also depend on the content of the video. This chapter studies the effect of encoding parameter settings and video content on coding complexity and bitrate. An algorithm that minimizes the video surveillance network's power consumption considering the above-mentioned factors is also proposed in this chapter.
3.1 Video Encoding Parameters and Node's Power Consumption

H.264/AVC is a block-based hybrid video codec that reduces the bitrate generated by the encoder by exploiting source statistics. For this purpose, an intra prediction technique is used to reduce redundancy within one frame, while an inter prediction technique with a motion estimation (ME) algorithm is used to exploit redundancies among subsequent frames. In terms of coding complexity, inter prediction is more involved than intra prediction. However, inter-coded frames have a smaller bitrate than intra-coded ones. Some initial observations of the encoder's performance led us to the low complexity encoding configuration shown in Table 3.1 [130]. This configuration will be used as the base parameter settings in this chapter.

Table 3.1: Encoding parameters for the low complexity configuration

  Parameter                                   Value
  Number of reference frames                  1
  Smallest motion compensation block size     8x8
  Entropy coding                              CAVLC
  B slices                                    None
  Subpel motion compensation                  Disabled
  Rate distortion optimization                Off
  Rate control                                Off
  Deblocking filter                           Disabled
  Motion estimation search range (SR)         2
  Group of pictures (GOP) size                1
  Quantization parameter (QP)                 34

Note that, due to the limitations in the energy and processing resources of video surveillance networks, less complex encoder configurations are deployed. To this end, the baseline profile of H.264/AVC, which is suitable for low complexity applications, is used. Therefore, only I and P frames are used. The other encoding settings used in this thesis include context-adaptive variable-length coding (CAVLC) entropy coding, one reference frame, and an SR equal to eight, while rate distortion optimization (RDO), rate control, and the deblocking filter are disabled.

3.1.1 H.264/AVC Coding Performance Analysis

In order to analyze the relation between encoding configuration and power consumption, there is a need to investigate the effect of different encoder parameter settings on the encoder's coding complexity and bitrate. The H.264/AVC reference software, JM version 18.2 [131], is used in all experiments. For our initial analysis, two representative test videos were selected from the dataset provided by MPEG for the HEVC call for proposals [132]: "BQMall" (832x480 pixels, frame rate 60) and "Traffic" (2560x1600 pixels, frame rate 30). These sequences were downsampled to produce 15 frames per second (fps) videos and resized to obtain the following resolutions: 4CIF (4 times the common intermediate format, 704x576 pixels) and CIF (common intermediate format, 352x288 pixels).

3.1.1.1 Encoding Complexity Measure

Coding complexity is usually measured based on the encoder's computation time, which may not be accurate, since it depends widely on the device architecture, on the optimization level of the algorithms used (e.g., whether a GPU or video processors are used), and on whether the CPU is involved with processes other than encoding. To have an accurate measure of the coding complexity, the instruction-level profiler iprof [133], which provides the total number of basic instruction counts to execute a task, is used.

[Figure 3.1: Complexity measure for different GOP sizes and QP values. (a) encoding time, (b) the number of instruction counts]

Figure 3.1(a) and Figure 3.1(b) illustrate the complexity measure using the encoding time and the instruction counts (IC) provided by iprof, respectively. The video sequence used to obtain the plots in the figure is the BQMall 15 fps CIF video.
The video sequence used to obtain the plots in the figure is  Figure 3.1: Complexity measure of different GOP sizes and QP values (a) encoding time, (b) the number of instruction counts 50  the BQMall 15 fps CIF video. Note that when GOP>1 the inter prediction module in the encoder were set to use SR equal to two as stated in Table 3.1. It can be seen from the figure that the number of instruction count provides a more objective and accurate measure of coding complexity than the encoder`s computation time.   3.1.1.2 The Effect of GOP Size towards Coding Complexity and Bitrate The GOP size determines the number of successive inter-coded frames between intra-coded frames within a coded video stream. Increasing GOP size will increase the number of inter-frame coded frames. Figure 3.1(b) shows that the IC is increasing as the GOP size increases. It can be seen that the minimum IC happens when GOP size is equal to one. This is expected since the ME algorithm in the encoder is disabled when GOP=1. As it is observed in Figure 3.1(b), the encoder needs around 16 billion of instructions to encode video with GOP=1 configuration. On the other hand, it needs 22 billion of instructions for GOP=4. Furthermore, Figure 3.2 shows the bitrate and the corresponding peak signal to noise ratio (PSNR) of BQmall 15fps CIF video sequence encoded with the configuration  Figure 3.2: Rate distortion plot for different QP values and GOP sizes 51  settings similar to the one used in Figure 3.1(b). It is observed that the bitrate decreases as the GOP size increases. For example, when QP is equal to 34, the bitrate is around 800 kbps for GOP=1 and 570 kbps for GOP=4. Figure 3.1(b) and Figure 3.2 show that GOP size affects the coding complexity and bitrate significantly. 3.1.1.3 The Effect of QP towards Coding Complexity and Bitrate The quantization parameter (QP) regulates how much spatial detail is saved, affecting the quality of the encoded video. When QP is very small, almost all of that detail is retained, but at the price of higher complexity and bitrate. Figure 3.1(b) shows the effect of QP towards coding complexity. The figure shows that the coding complexity decreases as the QP increases. A similar observation can be seen for the bitrate (please see Figure 3.2), such that the bitrate decreases as the QP increases.  3.1.1.4 The Effect of SR towards Coding Complexity and Bitrate The JM 18.2 software provides a number of ME algorithms that can be used in the encoder. These algorithms differ in the method of exploring the search area, whose size is  Figure 3.3: The effect of different SR values to bitrate, complexity, and PSNR (a) BQmall 15 fps CIF video, GOP=75, QP=28 (b) Traffic 15 fps CIF video, GOP=75, QP=28 52  determined by the SR, for suitable block candidates in the reference frames. The EPZS algorithm that shown to provide the best trade-off in terms of coding complexity and performance is chosen to be used in all experiments [130]. Increasing the SR may provide us with a better prediction, i.e., better compression. However, it comes with the price of increasing coding complexity. Thus, the trade-off between coding complexity and bitrate also depends on the value of SR. Figure 3.3(a) shows the effect of using different values of SR to the bitrate, video quality, and coding complexity of BQMall 15 fps CIF video sequence. On the other hand, Figure 3.3(b) shows the plots obtained from Traffic 15 fps CIF video.  
Figure 3.3 shows that the video quality in terms of PSNR is almost the same regardless of the value of SR used to encode the video. However, the complexity of the encoder increases with increasing SR. On the other hand, the bitrate of the encoded video decreases as the SR increases, especially when the SR is increased from 2 to 8. The bitrate reduction obtained from increasing the SR from 8 to 16 is not significant. However, when the resolution is increased to 4CIF, we notice that increasing the SR up to 16 is still beneficial in terms of bitrate savings [130]. Some interesting observations are thus obtained:

1. For the same value of QP, the video quality is almost the same, regardless of the size of the SR. However, there is a trade-off between coding complexity and bitrate when different values of SR are used;

2. The optimal SR value for a specific video sequence depends on the resolution of the video; and

3. Video content affects the PSNR of the encoded video. The QPs used in Figure 3.3(a) and Figure 3.3(b) are the same, hence the total numbers of instructions used for both video sequences are roughly the same. However, it is observed that the quality of the encoded Traffic sequence is around 1.5 dB lower than that of the BQMall sequence.

3.1.2 Coding Complexity and Bitrate Trade-off

Figure 3.4 shows the coding complexity (shown as IC) and bitrate for different values of GOP and SR for the "BQMall" video. The figure shows that increasing the GOP size results in a higher IC but reduces the bitrate. For the same GOP size, increasing the SR also increases the IC and reduces the bitrate. However, some configuration settings are not optimal, as shown by the bold line in the figure, i.e., points A, B, and C, which occur when GOP > 2 and SR = 2. It is interesting to note that the configuration settings of point B (GOP=8, SR=2) and point D (GOP=4, SR=4) have almost the same IC; however, the bitrate produced by configuration D is smaller than that of configuration B. Removing points A, B, and C from the plot produces the dashed-line plot illustrated in Figure 3.4.

[Figure 3.4: Trade-off between coding complexity and bitrate for the BQMall 15 fps CIF video, GOP={1, 2, 4, 8, 75}, SR={2, 4, 8, 16}, QP=28]

These configuration settings are translated into a tabular format, as shown in Table 3.2. It has to be noted that these settings are used on top of the configuration detailed in Table 3.1. In addition, it is worth noting that the coding complexity and bitrate of a video can be used to estimate the video surveillance network's encoding and communication power consumption. This will be explained in more detail in the following subsections.

Table 3.2: Configuration ID

  Configuration ID (CID)   GOP   Search Range
  1                        1     N/A
  2                        2     2
  3                        2     4
  4                        2     8
  5                        2     16
  6                        4     4
  7                        4     8
  8                        4     16
  9                        8     4
  10                       8     8
  11                       8     16
  12                       75    4
  13                       75    8
  14                       75    16

3.1.3 The Effect of Encoder Parameter Settings on Node's Power Consumption

The total energy dissipation of a video surveillance node consists of the encoding power consumption and the communication power consumption.

3.1.3.1 Encoding Power Consumption

One of the most important parameters in estimating a node's power consumption is the average number of cycles per instruction (CPI) [134]. If the total number of instructions used to execute a task is known, the energy depleted to execute that task can be estimated as the multiplication of the total number of cycles to execute that task and the average energy depleted per cycle.
3.1.3 The Effect of Encoder Parameter Settings on a Node's Power Consumption
The total energy dissipation of a video surveillance node consists of encoding power consumption and communication power consumption.

3.1.3.1 Encoding Power Consumption
One of the most important parameters in estimating a node's power consumption is the average number of cycles per instruction (CPI) [134]. If the total number of instructions used to execute a task is known, the energy depleted to execute that task can be estimated as the total number of cycles to execute that task multiplied by the average energy depleted per cycle. The number of CPU cycles required to execute a task can be calculated as the elapsed time of the encoding task multiplied by the CPU frequency. However, as Figure 3.1 shows, the elapsed time (encoding time) is not as accurate as the instruction count provided by iprof. Therefore, the total number of cycles to execute a task in this thesis is calculated as the instruction count (IC) multiplied by the CPI. The average energy consumption required to encode a frame is then estimated as:

E_f = (IC · CPI · E_ec) / N_f    (3.1)

where IC is the total number of instructions to encode a sequence (provided by iprof), CPI is the average number of cycles per instruction of the CPU, E_ec is the energy depleted per cycle and N_f is the number of frames. With F_r denoting the video frame rate, the encoding power consumption is then calculated as follows:

P_e = E_f · F_r    (3.2)

3.1.3.2 Communication Power Consumption
A node's communication power consumption consists of transmission and reception power consumption (see Equations (2.4) and (2.5)). In the case of multi-hop transmission, the total communication power consumption dissipated for transmitting the encoded video stream from node i to the sink, separated by n hops, is calculated as follows [102]:

P_c = n · P_t + (n − 1) · P_r    (3.3)

3.1.3.3 Encoding and Communication Power Consumption Trade-off
Recall that Table 3.2 shows the configuration IDs (CIDs) producing different pairs of coding complexity and bitrate. The coding complexity and bitrate determine the encoding and communication power consumption of a node in a video surveillance network. Therefore, Table 3.2 can be used to analyze the trade-off between encoding and communication power consumption. The general parameters shown in Table 3.3 are used to analyze this trade-off. In addition, the following parameters were also used: the energy depleted per cycle (E_c) is 1.215 nJ [62], while the average CPI is 1.78 [135].

Table 3.3: Parameters used for encoding and communication trade-off experiments
Description                                   Value
Initial energy (battery), B                   100 kJ
Energy cost for transmitting 1 bit            0.5 J/Mb
Transmit amplifier coefficient                1.3×10^-8 J/Mb/m^4
Energy cost for receiving 1 bit               0.5 J/Mb
Path loss exponent                            4
Average variance of video (MSE), σ²           3500
Encoding efficiency coefficient               55.54 W^(3/2)/Mbps
Max. rate adjustment coefficient              0.1
Power consumption to encode 1 frame, P_1frame 60 mJ/frame

Figure 3.5(a) shows the encoding and communication power consumption for all CIDs shown in Table 3.2 with QP=28 for a single-hop transmission to the sink, assuming a transmission distance of 50 m. The figure shows that for CID=1, the encoding power consumption of the node is 7.2 W, while the transmission power consumption is around 0.87 W. However, the encoding power consumption increases to roughly 10.9 W, while the transmission power consumption is reduced to around 0.49 W, when the CID=14 configuration is used. This figure shows the effect of using different encoder configuration settings, i.e., different CIDs, on a node's power consumption. As observed, a node can save transmission power by performing a more complex encoding process, i.e., by obtaining a lower bitrate. On the other hand, when the node is required to save encoding power, it can afford to spend more energy on transmission.
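A minimal Python sketch of (3.1)-(3.3), using the constants quoted above (CPI = 1.78, E_ec = 1.215 nJ/cycle); P_t and P_r stand for the per-node transmission and reception powers from (2.4) and (2.5), and the instruction count in the example is illustrative:

CPI = 1.78            # average cycles per instruction [135]
E_EC = 1.215e-9       # energy depleted per cycle, J [62]

def encoding_power(ic, n_frames, frame_rate):
    """(3.1)-(3.2): total instruction count -> encoding power in Watts."""
    e_frame = ic * CPI * E_EC / n_frames   # energy per frame, J
    return e_frame * frame_rate

def multihop_comm_power(p_t, p_r, n_hops):
    """(3.3): n transmissions plus (n-1) receptions along the path."""
    return n_hops * p_t + (n_hops - 1) * p_r

# illustrative: ~16e9 instructions for 150 frames at 15 fps
print(encoding_power(16e9, 150, 15))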
Thus, the video surveillance network's nodes can adjust their encoder parameter settings according to the CID table when the need arises.

Figure 3.5: Node power consumption: (a) single-hop, (b) multi-hop transmission

For example, as Figure 3.5(b) shows, the optimal configuration for a node that is one hop away from the sink is the CID=1 configuration. However, when a node is four hops away from the sink, it needs to use the CID=10 configuration in order to reduce its power consumption. These results show that encoder parameter settings affect a node's power consumption. Furthermore, it can also be seen from Figure 3.5(b) that there is an intersection point between the different CIDs. This intersection illustrates the condition at which the computation and communication power consumption trade-off, i.e., due to using different CIDs, reaches a balance. In some applications, the video surveillance network's nodes need to transmit the encoded video stream through a more complex network layout. This creates power consumption and bitrate fairness issues between the nodes in a video surveillance network, which will be discussed in the following sections.

3.1.4 Fairness-based Node Encoder Configuration Allocation
Video nodes in a surveillance network share the same wireless medium in order to send their encoded video to the sink. Since the bandwidth, i.e., the total data rate, allocated for all nodes is usually limited, there is an issue of fairness in the bandwidth allocated to each node. Furthermore, in many video surveillance deployment scenarios, nodes closer to the sink are required to relay the information from the nodes farther from the sink. The total energy consumption of nodes closer to the sink is thus usually greater than that of the nodes farther away. This problem can be alleviated by allocating a different fairness ratio to each node by exploiting the trade-off between encoding and communication power consumption.

3.1.4.1 Simplified Configuration ID Table
As mentioned in Section 3.1.1.4, the effect of SR on coding complexity and bitrate depends on the video content. In order to develop a more general model, the configuration ID (CID) table is simplified by setting SR equal to eight, as suggested by [106], for all CIDs. In addition, since the GOP size has a significant effect on coding complexity and bitrate, the number of GOP size options is increased. The modified CID table is shown in Table 3.4. Note that the values used in the table are chosen such that there is a relation between the GOP size and the CID, i.e., CID = log2(2·GOP) [136].

Table 3.4: Simplified Configuration ID (CID)
CID   GOP
 1     1
 2     2
 3     4
 4     8
 5    16
 6    32
 7    64

3.1.4.2 Fairness Constraint
Using the end-to-end fairness constraint (α), the bitrate allocated to a node is controlled using the following relation [137]:

R_i ≤ α_i · B    (3.4)

where B denotes the network bandwidth of the video surveillance network and α_i is the fairness constraint, i.e., the maximum percentage of the total bitrate that can be sent to the sink by each node. Furthermore, the sum of the transmission rates of all nodes is constrained to be:

Σ_{j=1}^{N} R_j ≤ B    (3.5)

Using (3.4), node i can only generate a flow to the sink that is lower than a fraction α_i of the network bandwidth. It has to be noted that when all nodes use the same fairness constraint, α_i = 1/N, each node is allocated an equal transmission rate. In this condition, the video surveillance network is said to use the maximum fairness scheme.
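The CID-GOP relation and the fairness constraints (3.4)-(3.5) translate directly into code; the following sketch is illustrative only:

import math

def gop_from_cid(cid):
    """Table 3.4 relation: CID = log2(2*GOP)  <=>  GOP = 2**(CID-1)."""
    return 2 ** (cid - 1)

def cid_from_gop(gop):
    return int(math.log2(2 * gop))

def fairness_ok(rates, alphas, bandwidth):
    """(3.4): R_i <= alpha_i * B for every node; (3.5): sum of R_j <= B."""
    per_node = all(r <= a * bandwidth for r, a in zip(rates, alphas))
    return per_node and sum(rates) <= bandwidth

# maximum fairness scheme: alpha_i = 1/N for all nodes
rates, n = [180, 220, 200, 190], 4
print(fairness_ok(rates, [1 / n] * n, bandwidth=1000))  # True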
3.1.4.3 Exhaustive Search Fairness-based Algorithm
In [136], an exhaustive fairness-based search for a minimum-power video surveillance network was proposed. The basic idea of the method is to find the fairness allocation for each node such that the power consumption is minimized. The algorithm assumes that the fairness ratio decreases linearly from the node closest to the sink towards the node farthest from the sink. In other words, the nodes closer to the sink are assigned a smaller CID (which entails a high bitrate but low coding complexity) than the nodes farther from the sink. In order to find the best fairness allocation for each node, an exhaustive search of all possible CID combinations was performed. It was reported that for a linear topology with six nodes, the exhaustive fairness-based algorithm manages to reduce the video surveillance network's energy consumption by 11%. However, the algorithm proposed in [136] also assumed that all nodes have the same video content.

3.2 Relation between Video Content, Bitrate and Coding Complexity
Earlier studies on video surveillance network power consumption assume that all nodes use the same video content. However, this assumption is a very simplistic approach, and it does not represent a real video surveillance deployment, where different nodes capture the scene from different points of view. Therefore, the video content is not consistent over different nodes. Note that the performance of a video encoder in terms of encoding complexity and bitrate also depends on the spatial and temporal complexity of the content. This elevates the importance of determining the trade-off between encoding and communication power consumption in a video surveillance network.

3.2.1 Video Surveillance Dataset
To obtain representative data with different temporal and spatial complexity levels, nine HD (high definition) cameras were installed in one of our labs. The cameras were installed such that each of them has a different field of view (FoV) while there are still overlaps among the cameras' FoVs.

Figure 3.6: Camera placement

The scene arrangement is such that the motion and activities are not centered in the middle of the field, and the complexity level of the content captured by each camera is different. With the assumption that important activities in surveillance applications partially occur around the entrance door, the middle camera (see camera 5 in Figure 3.6) is directed towards the door. Furthermore, we modified the layout of the lab to represent three different scenarios, namely "office", "classroom", and "party".

Figure 3.7: Scene settings

Figure 3.7 shows the room set-up for the different scenarios. To have a representative database with different activity levels, each scene was captured several times, and the captured videos for each scene are categorized into four groups as follows:
1. The level of activity of all the people in the room is high, and the total number of people is between six and eight,
2. Three or more people are moving around the room, while the total number of people in the room is around six,
3. A couple of people are walking around the room, while the total number of people in the room is around five,
4. About four people are walking into the room.

Capturing each scenario ("office", "classroom" and "party") at four different activity levels using nine cameras results in 108 HD video sequences for our study. Each video is 10 seconds long with a frame rate of 60 frames per second (fps). In order to simulate practical low-power video surveillance network applications, the captured videos were downsampled to 416x240 pixels, 15 fps. For ease of referencing, the following video identification is used hereafter: <camera-id_scene-setting_activity-level>. For example, camera2_party_act1 refers to the video captured by camera2 in the "party" scene when the activity level of the scene falls into the first setting mentioned above. These downsampled videos have been made publicly available [121].

Figure 3.8: Snapshots of the "office" scene in the first activity setting from (a) camera2 and (b) camera6; and snapshots of the third activity setting: (c) camera2 and (d) camera6

Figure 3.8 shows snapshots of the "office" scene from camera2 and camera4 when the activity level of the scene falls into the first and third settings. As observed, the video content from the two cameras and the two activity settings is not the same. Figure 3.9 shows snapshots from the "classroom" and "party" scenes when the activity level of the scene falls into the second setting.

Figure 3.9: Snapshots of the "classroom" scene from (a) camera2 and (b) camera4; and snapshots of the "party" scene from (c) camera2 and (d) camera4

3.2.2 Effect of Scene Complexity on Coding Complexity and Bitrate
For the purpose of analysis, the same CID table shown in Table 3.4 is used. Figure 3.10(a) shows the complexity and bitrate plot of the camera2_party_act2 video encoded with different CIDs and QPs. As observed, when the CID is small, the compression performance of the encoder is sacrificed such that the bitrate is high; however, this is compensated by a low coding complexity. On the other hand, a bigger CID means using a higher coding complexity to gain better compression performance.

Figure 3.10: Complexity, bitrate, and video quality trade-off

Figure 3.10(a) also shows that reducing the QP increases both coding complexity and bitrate. Therefore, the bitrate and the complexity of the encoding process depend on the CID and QP used. It has to be noted that although the GOP size gap between CID=6 (GOP=32) and CID=7 (GOP=64) is large, i.e., the GOP sizes differ by 32, the difference in coding complexity (bitrate) between these two CIDs is very small. Furthermore, as mentioned in Section 3.1.4, there is a relation between the GOP size and the CID, i.e., CID = log2(2·GOP). This shows that CID values represent an encoder parameter, i.e., the GOP size, whose value affects the coding complexity and bitrate of the encoder. In terms of video quality, Figure 3.10(b) shows that for the same QP, the PSNR is almost the same, i.e., the difference is less than 0.5 dB, regardless of which CID value is used to encode the video.

3.2.3 Scene Complexity Measurement and Classification
Figure 3.11(a) shows the complexity and bitrate of videos captured by three different cameras in the "party" scene at the same activity level, while Figure 3.11(b) shows the complexity and bitrate of videos captured by camera2 in the "party" scene at different activity levels.

Figure 3.11: The complexity and bitrate of some videos from the "party" scene: (a) different cameras but similar activity setting, (b) the same camera but different activity settings

Figure 3.12: The complexity and bitrate of some videos captured by camera2 at a similar activity setting but different scenes
Figure 3.11(a) shows that the coding complexity and bitrate of videos depend on the camera's point of view of the scene. On the other hand, Figure 3.11(b) illustrates that the coding complexity and bitrate of videos captured of the same scene by the same camera depend on the content's motion activity levels. Furthermore, Figure 3.12 illustrates the coding complexity and bitrate of some videos captured by camera2 at a similar activity setting but different scenes. This figure shows that the coding complexity and bitrate trade-off of the videos depends on the scene settings, i.e., the spatial detail of the scene. Figure 3.11 and Figure 3.12 show that the coding complexity of the videos captured by each camera depends on the spatial (camera views and scene settings) and temporal (motion activity levels) complexity of the content. Moreover, videos with high spatial and temporal complexity tend to have a higher bitrate than videos with lower content complexity. Consequently, the total bitrate generated by captured scenes with high spatial and temporal detail will be higher than that obtained from scenes with low spatial and temporal detail. In this regard, we proceed by assuming that the bitrate allocation information in the worst-case scenario, i.e., when the total bitrate is high, can be used as an initial guide to allocate the configuration settings in the other scenarios. Therefore, the scenes with higher content complexity are used as the training set for the remaining scenes.

Figure 3.13: Spatial and temporal information of all scenes captured

Table 3.5: Spatial Information (SI) of all videos
Scene            cam1     cam2     cam3     cam4     cam5     cam6     cam7     cam8     cam9
office_act1      100.52    97.83    88.92    76.17    84.60    89.95    85.73    95.06    94.98
office_act2       85.83    90.26    85.90    71.24    76.14    84.45    85.20    90.06    88.62
office_act3       87.78    88.68    89.78    74.30    67.62    91.16    89.93    90.80    92.75
office_act4       84.97    87.01    87.96    71.40    61.17    89.97    83.85    87.36    87.43
classroom_act1    94.62    99.18    96.73    89.56    97.46    96.04    92.03    88.12    83.59
classroom_act2    96.30    94.46    96.30    91.46    74.07    88.83    92.74    85.52    84.99
classroom_act3    89.67    94.11    90.04    85.71    77.42    87.03    88.94    85.85    82.55
classroom_act4   101.69   101.21    94.74    92.68    89.08    94.57    91.86    89.19    83.64
party_act1       114.85   112.10   116.85    99.60    81.27   104.73   104.60   110.79   105.97
party_act2       115.40   114.49   107.53    90.61    95.06   101.31   104.56   111.56   101.90
party_act3       113.43   112.04   113.87    99.02    76.15    96.36   107.40   111.86   107.08
party_act4       106.64   103.51   111.48    89.54    93.97    95.22   101.28   102.44   105.30

Table 3.6: Temporal Information (TI) of all videos
Scene            cam1     cam2     cam3     cam4     cam5     cam6     cam7     cam8     cam9
office_act1       20.91    23.88    18.15    25.59    21.83    26.44    14.93    21.35    21.51
office_act2       14.59    19.21    18.46    22.00    19.43    23.26    17.33    15.10    15.47
office_act3       13.10    18.34    18.77    20.77    13.73    20.90    17.04    17.20    16.51
office_act4       14.05    16.76    16.21    17.74     1.74    15.87    14.55    18.08    18.60
classroom_act1    18.05    18.39    18.42    17.47    19.47    19.56    11.04    11.21     7.38
classroom_act2    12.04    16.78    12.75    15.14    12.25    13.82    13.85     7.60    11.58
classroom_act3    15.18    15.88    11.79    13.76    17.64    14.98     9.65    11.07     4.75
classroom_act4    16.52    16.24    13.41    15.65    18.40    17.46    11.22    12.04     6.79
party_act1        22.45    27.82    21.44    27.15    19.59    21.93    17.15    18.27    17.98
party_act2        26.10    26.24    19.76    17.66    21.24    26.86    22.29    22.67    15.76
party_act3        13.75    17.83    14.74    17.05    10.92    15.01    13.93    13.50    11.72
party_act4        16.59    16.78    17.00    13.17    18.25    15.32    11.10    13.31    10.83
In order to find the scenes with higher activity content, a methodology to classify each scene into a content complexity level is needed. For that purpose, the ITU-T recommendation that defines the spatial information (SI) and temporal information (TI) measures [138] is used:

SI = max_time { std_space [ Sobel(F_n) ] }    (3.6)

TI = max_time { std_space [ F_n − F_{n−1} ] }    (3.7)

SI and TI measure the spatial and temporal activity levels of videos. In this regard, Figure 3.13 shows the SI and TI values of camera1, camera2, camera5, and camera9. It can be seen from this figure that, for the same scene, each camera has a different spatial and temporal activity level. Table 3.5 shows the SI values of all videos, while Table 3.6 shows the TI values. Furthermore, in order to classify the scenes into content complexity levels, the following procedure is used:
1. Classify each video from a scene into SI and TI classes using the following thresholds:

t_L^CC = mean(CC) − 0.5 · std(CC)
t_H^CC = mean(CC) + 0.5 · std(CC),    CC ∈ {SI, TI}    (3.8)

2. The values of t_L^CC and t_H^CC are used to classify whether a specific video has a low, medium or high SI or TI. For example, if a specific video's SI is less than t_L^SI, the video is classified as a low-SI video. If the video's SI is higher than t_H^SI, it is classified as a high-SI video. If the video's SI is between t_L^SI and t_H^SI, the video is classified as a medium-SI video. Using (3.8) on the data in Table 3.5 and Table 3.6, the following threshold values are obtained: t_L^SI = 87.85, t_H^SI = 98.95, t_L^TI = 14.32, and t_H^TI = 19.04.
3. Based on the SI (TI) classes of the videos, we can then classify the scene into SI (TI) classes using the following rules:
a. The SI (TI) class of a scene is equal to the majority of the SI (TI) classes of all videos from that scene;
b. If no majority is found, the scene is classified as a medium SI (TI) scene.

Figure 3.14: SI (TI) classes of the office_act1 and classroom_act1 scenes and their corresponding scene classes

Figure 3.14 shows an example of how to determine the office_act1 and classroom_act1 scene classes. Using the rules mentioned above, office_act1 is classified as a medium-SI, high-TI scene, while classroom_act1 is classified as a medium-SI, medium-TI scene. Recall that scenes with high SI produce videos with a higher bitrate than scenes with lower SI. This is especially true for CID configuration one, i.e., when the video is coded with intra prediction only. Therefore, the configuration setting that is suitable for a specific SI class may not be suitable for the other SI classes. Based on this assumption, the scenes are arranged into three sets, namely scenes that have high, medium, and low SI. In each of these sets, the scene with the highest TI is selected as the training scene. For example, in Figure 3.15, the scenes with medium SI are office_act1, office_act3, classroom_act1, classroom_act2, and classroom_act4. Out of these five scenes, the scene that has the highest TI class, i.e., office_act1, is selected as the training scene for the medium SI scenes. If there is more than one candidate for the training scene, the scene that has the highest average TI is selected.
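Under the assumption that scipy's Sobel filter is an acceptable stand-in for the Sobel operator of the ITU-T recommendation, (3.6)-(3.8) can be sketched as follows; frames is an (n, height, width) array of luminance values:

import numpy as np
from scipy.ndimage import sobel

def si_ti(frames):
    """(3.6)-(3.7): spatial and temporal information of a video."""
    f = np.asarray(frames, dtype=float)
    grads = [np.hypot(sobel(x, axis=0), sobel(x, axis=1)) for x in f]
    si = max(g.std() for g in grads)                              # (3.6)
    ti = max((f[i] - f[i - 1]).std() for i in range(1, len(f)))   # (3.7)
    return si, ti

def class_thresholds(values):
    """(3.8): low/high cut-offs at mean -/+ 0.5*std (CC in {SI, TI})."""
    m, s = np.mean(values), np.std(values)
    return m - 0.5 * s, m + 0.5 * s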
Henceforth, the training scene for the high SI scenes is party_act2, the training scene for the medium SI scenes is office_act1, and the training scene for the low SI scenes is office_act2. Note that the training scenes are shown in bold in Figure 3.15.

Figure 3.15: Scene classes

Table 3.7: Training and test scenes
Scene's SI class   Training Scene   Test Scene        Test Scene Label
High               party_act2       party_act1        VS1
                                    party_act3        VS2
                                    party_act4        VS3
Medium             office_act1      classroom_act1    VS4
                                    classroom_act2    VS5
                                    classroom_act4    VS6
                                    office_act3       VS7
Low                office_act2      office_act4       VS8
                                    classroom_act3    VS9

Table 3.7 shows the training scenes and their corresponding test scenes. We will use the training scenes to find the fairness ratio allocation for each node in the video surveillance network that minimizes the overall power consumption. The fairness ratio obtained from the training scenes will then be used as an initial guess for the CID allocation in the test scenes. The following sections explain the algorithms proposed in this thesis.

3.3 Common Approach
A common approach to allocating the encoding configurations to the video surveillance network nodes is to assign the same settings to all nodes. We call this approach the CommonConfig algorithm. This approach was used to analyze the video surveillance network power consumption of the Intra-only configuration and of the Inter Main Profile with a GOP size of 6 and frame-type sequence I-P-B-P-B-P-I in [62][65]. The authors in [100] also used the CommonConfig algorithm for the Intra-only configuration in their analysis. In order to implement the CommonConfig algorithm while still being fair to the implementations reported in the literature, we try to assign the same CID to all nodes such that the bandwidth constraint is not violated.

Another common approach to assigning the nodes' encoding configuration is to allocate the same fairness constraint to each node, that is, α_fair = α_i = 1/N. This algorithm is called the MaximumFairness scheme, shown in Figure 3.16. In this figure, getCID is a procedure that assigns a node a specific CID such that R{CID_i, QP}/B < α_i, where R is the bitrate allocated to node i when using the corresponding CID. Note that f_ratio = {α_1, α_2, ..., α_N}, N is the number of nodes, and α_i = 1/N.

Figure 3.16: MaximumFairness CID allocation algorithm

3.4 Fairness-Based Optimization for Minimum Power Consumption
The encoding power consumption of a video surveillance node depends on the CID value assigned to that node. In this regard, by merging (3.1) and (3.2), the encoding power consumption is estimated as:

P_e^i = (Φ_i · CPI · E_ec · F_r) / N_f    (3.9)

where Φ_i is the total number of instructions to encode the video for node i, CPI is the average number of cycles per instruction of the CPU, E_ec is the energy depleted per cycle, N_f is the number of frames and F_r denotes the frame rate of the video sequence. The value of Φ_i is obtained using the relation Φ_i = IC{CID_i, QP_i}, where IC is the instruction count provided by iprof for the pair of CID and QP values used, 1 ≤ CID_i ≤ 7 and QP_min ≤ QP_i ≤ QP_max. Since we want each node to produce a video with almost similar quality, all nodes have to use the same QP; thus, QP_i = QP for all i ∈ N.
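Before formulating the full optimization, the two baseline schemes of Section 3.3 can be made concrete in a short sketch; r_lookup is a hypothetical stand-in for the measured (CID, QP) -> bitrate table, and get_cid implements the rule R{CID_i, QP}/B < α_i quoted above:

def get_cid(alpha, qp, bandwidth, r_lookup, cids=range(1, 8)):
    """Lowest-complexity CID whose bitrate fits the fairness share."""
    feasible = [c for c in cids if r_lookup[(c, qp)] / bandwidth < alpha]
    return min(feasible) if feasible else None

def maximum_fairness(n_nodes, qp, bandwidth, r_lookup):
    """MaximumFairness: every node gets alpha_i = 1/N."""
    return [get_cid(1.0 / n_nodes, qp, bandwidth, r_lookup)
            for _ in range(n_nodes)]

def common_config(n_nodes, qp, bandwidth, r_lookup, cids=range(1, 8)):
    """CommonConfig: one CID for all nodes within the bandwidth budget."""
    for c in cids:
        if n_nodes * r_lookup[(c, qp)] <= bandwidth:
            return [c] * n_nodes
    return None

# illustrative bitrates (kbps) at QP=28 for CIDs 1..7
R = {(c, 28): b for c, b in zip(range(1, 8), [700, 520, 400, 320, 270, 240, 225])}
print(maximum_fairness(4, 28, 2000.0, R))  # [3, 3, 3, 3]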
On the other hand, some nodes need to relay their data through intermediate nodes. Therefore, a node's communication power consumption depends on both the CID value assigned to that node and the way the encoded data is relayed in the network. Combining the transmission, reception and node power consumption model formulations shown in (2.3), (2.4), and (2.5), the rate flow conservation law (2.2), the bandwidth constraint (3.5) and the nodes' encoding power consumption model (3.9), the task of assigning the nodes' CID allocation for a minimum-power video surveillance network is formulated as the optimization problem shown in Figure 3.17.

Figure 3.17: Power consumption minimization using CID

The objective of the optimization in Figure 3.17 is to find the allocation of the transmission rate on each link and the CID at each node that minimizes the total power consumption. The coding complexity and bitrate of node i, i.e., Φ_i and R_i respectively, in the optimization shown in Figure 3.17 are obtained from the look-up table. Note that the value of CID is constrained to be integral. On the other hand, the values of r_ij and r_ki, which determine the routing of data from and to node i in (2.4) and (2.5), are rational numbers. However, although the CID values represent encoder configuration labels, they are also related to the GOP size, as explained in Section 3.1.4. Also, the bitrate (coding complexity) is monotonically decreasing (increasing) with the increase of the CID value (see Figure 3.10). An optimization problem involving mixed linear and integer variables is NP-complete, and some instances are intractable. However, using the branch and bound algorithm [139], the optimization procedure can be terminated early as soon as a solution that satisfies the stopping criteria is found. Therefore, a feasible, though not necessarily optimal, solution can be obtained. The following section explains the proposed approach.

3.4.1 Fairness-based CID Allocation for a Minimum Power Video Surveillance Network
In this thesis, the branch and bound approach is implemented using the following steps: 1) solve the bounded optimization problem; 2) call a recursive procedure to perform branch and bound until a solution is found or the termination criteria are satisfied. The bounded optimization problem is shown in Figure 3.18. The difference between this algorithm and the minimizePowerCID optimization described in the previous section is that the CID values are given as upper and lower bounds.

Figure 3.18: Power consumption minimization with bounded CID values

Note that since the CID value represents the configuration label, whenever the optimization procedure needs to look up the values of the coding complexity and bitrate, the CID value needs to be rounded to the nearest integer. If the CID provided by the bounded optimization does not satisfy the integrality constraint, the RecursiveBranchBound procedure is called to perform the branch and bound search for the solution.

Figure 3.19: RecursiveBranchBound procedure

Figure 3.19 shows how the branch and bound algorithm is implemented to allocate the rate on each transmission link and the CID for each node such that the total power consumption is minimized. The algorithm proceeds by finding the CID allocation using the optimization procedure shown in Figure 3.18. If the obtained solution does not satisfy the integrality constraint, the problem is divided into two sub-problems by defining new upper and lower bounds, followed by calls to the recursive function. The integrality constraint ε is a small number that bounds the error between the CID and its rounded integral value: a CID is accepted as integral when |CID_i − round(CID_i)| < ε.
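The branch-and-bound recursion can be illustrated with the minimal, self-contained sketch below. The relaxed subproblem here is a synthetic stand-in (a smooth power model solved with scipy) for the full rate/CID optimization of Figure 3.18; only the branching and pruning logic mirrors Figure 3.19:

import numpy as np
from scipy.optimize import minimize

N_NODES = 4
EPSILON = 0.2                  # integrality tolerance (see Table 3.8)

def total_power(cid):
    # Synthetic stand-in: encoding power rises with CID, transmission
    # power falls with it, so the unconstrained optimum is fractional.
    return float(np.sum(1.0 + 0.15 * cid + 2.0 * np.exp(-0.5 * cid)))

def solve_relaxation(lb, ub):
    # Continuous relaxation of the bounded problem (Figure 3.18).
    res = minimize(total_power, (lb + ub) / 2.0, bounds=list(zip(lb, ub)))
    return res.x, float(res.fun)

best = {"cid": None, "power": np.inf}

def branch_and_bound(lb, ub):
    cid, power = solve_relaxation(lb, ub)
    if power >= best["power"]:               # bound: prune this subtree
        return
    frac = np.abs(cid - np.round(cid))
    if np.all(frac < EPSILON):               # integrality satisfied
        best["cid"], best["power"] = np.round(cid).astype(int), power
        return
    i = int(np.argmax(frac))                 # branch on most fractional CID
    hi = ub.copy(); hi[i] = np.floor(cid[i])
    lo = lb.copy(); lo[i] = np.ceil(cid[i])
    branch_and_bound(lb, hi)                 # subproblem: CID_i <= floor
    branch_and_bound(lo, ub)                 # subproblem: CID_i >= ceil

branch_and_bound(np.ones(N_NODES), np.full(N_NODES, 7.0))
print(best)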
3.4.2 Proposed Fairness-based CID Allocation Algorithm for the Test Set
As mentioned in the previous sections, the optimal fairness ratio allocation obtained from the training sets is used as an initial guess to allocate the CIDs for the test videos. However, since the content of the videos in the training set and the test set is not exactly the same, some adjustment may be needed while assigning the nodes' CIDs in the test sets. The fairness-based CID allocation algorithm for the test set is shown in Figure 3.20. In this algorithm, the getCID procedure returns the CID with the highest bitrate (i.e., the lowest coding complexity) that can be allocated to node i with a fairness ratio equal to α_i. For example, if the CIDs that can be allocated to node i are either six or seven, the getCID procedure returns CID equal to six. However, in some cases, the getCID procedure may not be able to find a suitable CID with fairness ratio allocation α_j for node j. In this case, the node is assigned the highest CID, which entails the lowest possible bitrate in our configuration. A variable named overflow is then updated with the difference between the allocated bitrate from the look-up table, R{CID_j, QP}, and the supposed maximum bitrate for that node, i.e., α_j·B. The variable overflow thus quantifies the accumulated bitrate borrowed from the other nodes.

On the other hand, if an appropriate CID is available while the value of the overflow variable is positive, another call to the getCID procedure with a lower fairness ratio is performed to get another CID. This is done so that the outstanding bitrate 'debt' can be paid back, and the overflow variable is updated accordingly. If the overflow variable is still positive after the CID allocation for all nodes has been performed, a procedure checkBandwidthConstraint is called to adjust the CID allocation of each node. Starting from the node farthest from the sink, the procedure checks whether assigning a higher CID to that node can reduce the overflow variable to less than or equal to zero. After that, the nodes' power consumption is calculated using the minimizePower procedure shown in Figure 3.17.

Figure 3.20: Fairness-based minimum energy algorithm

As mentioned in the previous paragraphs, an adjustment procedure can be performed to reduce the power consumption further. Figure 3.21 shows the adjustment procedure performed on two nodes, i.e., the last node and the first node. These two nodes are chosen for the following reasons. Firstly, using a lower bitrate configuration at the last node reduces the transmission power consumption of all the nodes in the video sensor network. Secondly, the extra bitrate made available can be allocated to the first node so that the encoding power consumption at the first node is reduced.

Figure 3.21: Fairness-based with adjustment algorithm

3.4.3 Experiment Settings
Figure 3.22 shows the network topology used in the simulations. In this figure, the dark node is the sink node, while the blank nodes are the video nodes. Each node is given an identification number according to its distance to the sink; therefore, the distance between node1 and the sink is smaller than the distance between node2 and the sink. It is assumed that each video node located at a specific position in the topology illustrated in Figure 3.22 is attached to the camera located at the same position shown in Figure 3.6.

Figure 3.22: Network topology used in the experiments
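As a compact recap of the allocation logic of Figure 3.20 before the experiments, the bitrate 'borrowing' can be condensed into the following sketch; R_TABLE is a hypothetical stand-in for the measured per-CID bitrate look-up (in kbps) at a fixed QP:

B = 2000.0                                   # network bandwidth, kbps
R_TABLE = {1: 700, 2: 520, 3: 400, 4: 320, 5: 270, 6: 240, 7: 225}

def feasible_cid(alpha):
    """Lowest-complexity CID whose bitrate fits within alpha*B."""
    fits = [c for c in R_TABLE if R_TABLE[c] <= alpha * B]
    return min(fits) if fits else None

def allocate_with_overflow(fairness_ratios):
    cids, overflow = [], 0.0
    for alpha in fairness_ratios:
        cid = feasible_cid(alpha)
        if cid is None:                      # borrow: fall back to CID=7
            cid = max(R_TABLE)
            overflow += R_TABLE[cid] - alpha * B
        elif overflow > 0:                   # pay back outstanding 'debt'
            tighter = feasible_cid(alpha - overflow / B)
            if tighter is not None:
                overflow = max(0.0, overflow - (alpha * B - R_TABLE[tighter]))
                cid = tighter
        cids.append(cid)
    return cids, overflow

# node 1's share is too small, so it borrows; node 2 then pays back
print(allocate_with_overflow([0.08, 0.30, 0.12, 0.10]))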
Therefore, node1  Figure 3.22: Network topology used in the experiments 82  will be using the video captured by camera1; node2 will be using the video captured by camera4, and so forth. The H.264/AVC software, JM version 18.2 is used to generate the CID lookup table of encoding complexity and bitrate of all videos. The QP value used in the experiment ranges from 28 until 36.  Two separate sets of experiments were performed. The first set of experiment is conducted on the training set. The objective is to compare the results obtained by the proposed optimization technique with the ones obtained using CommonConfig and MaximumFairness approaches. From this experiment, we obtain the fairness ratios of the training set that minimize the node’s maximum energy consumption. The energy consumption obtained using the proposed approach will be compared to the one obtained using the CommonConfig and MaximumFairness approaches. To this end, the parameters shown in Table 3.8 are used.  Table 3.8:  Parameters Used for Fairness-based Algorithm Experiments Parameters Description value  Energy cost for transmitting 1 bit 0.5 J/Mb  Transmit amplifier coefficient 1.310-8 J/Mb/m4  Energy cost for receiving 1 bit 0.5 J/Mb  Path loss exponent 4 CPI average cycle per instruction 1.78 Ec Energy depleted per cycle 1.215 nJ B Network Bandwidth 2 Mbps d Distance between node 5m  Integrality constraints  0.2   83  3.4.4 Performance of the Proposed Technique for the Training Scenes In order to find the minimum power consumption with the highest possible video quality, we need to find the minimum QP for each content complexity class. To do this, starting from QP equal to 28, a procedure to check the possibility to allocate CIDs to all the nodes is performed using the MaximumFairness approach. The minimum QP suitable for each scene may not be the same because each scene has different SI (TI) complexity class. For example, the minimum QP for scene party_act2 is 36. However, the minimum QP for scene office_act2 is 28. The optimization-based approach can then be performed on the corresponding training scene. The initial solution provided for the optimization algorithm in each run is different, controlled using the following setting. For each setting, the  Figure 3.23: Bitrate allocated per each node in all training scenes obtained by the three algorithms (a) CommonConfig (b) MaximumFairness (c) Proposed  84  algorithm is was run twice. 1. All nodes are initiated to use the lowest CID 2. All nodes are initiated to use the highest CID 3. The CID is allocated in increasing order, starting from the first node 4. The CID is allocated in decreasing order, starting from the first node For the purpose of the analysis, the performance of the algorithm is compared against CommonConfig and MaximumFairness approaches mentioned in Section 3.3. Figure 3.23 shows the bitrate allocated using the compared techniques. Figure 3.23(a) shows the bitrate allocation for the training scenes obtained using the CommonConfig algorithm. The figure shows that the difference between the highest bitrate and the lowest bitrate  Figure 3.24: Node’s power consumption profile for the high SI training scene obtained by the three algorithms: (a) CommonConfig (b) MaximumFairness (c) Proposed 85  allocated in each scene is as follow: 129.85 kbps for the high SI training scene (party_act2), 115.87 kbps for the medium SI training scene (office_act1) and 115.08 kbps for the low SI training scene (office_act2), respectively. 
The CommonConfig algorithm does not regulate the bitrate assigned to each node, since it assigns the same configuration to every node; thus, the bitrate assigned to each node does not follow any trend. The MaximumFairness approach, in contrast, allocates roughly the same bitrate to each node in every training scene, as shown in Figure 3.23(b). Given that the content captured by each camera in each training scene is not the same, there are some variations in the bitrate assigned to each node. However, the difference between the highest and the lowest bitrate allocated in each scene is not significant, i.e., 50.37 kbps for the high SI training scene (party_act2), 63.85 kbps for the medium SI training scene (office_act1) and 41.64 kbps for the low SI training scene (office_act2).

On the other hand, as Figure 3.23(c) shows, the proposed technique allocates a different bitrate to each node, such that the nodes closer to the sink generally have a higher bitrate than the nodes farther from the sink. The difference between the maximum and minimum bitrate allocated in each scene becomes more significant, i.e., 327.73 kbps for the high SI scene, 471.56 kbps for the medium SI scene and 476.59 kbps for the low SI training scene. Figure 3.23 also shows that node2 is allocated a lower bitrate than the other nodes. This is due to the fact that node2 corresponds to camera4 (see Figure 3.6), which according to Table 3.5 and Table 3.6 has a lower content complexity level than the other cameras.

Table 3.9: Fairness ratio allocation obtained from each training scene
Training sequence   node1   node2   node3   node4   node5   node6   node7   node8   node9
party_act2          0.243   0.092   0.160   0.100   0.085   0.090   0.084   0.079   0.068
office_act1         0.296   0.079   0.131   0.082   0.071   0.063   0.090   0.089   0.099
office_act2         0.307   0.068   0.122   0.085   0.078   0.080   0.081   0.089   0.092

Figure 3.24: Node power consumption profiles for the high SI training scene obtained by the three algorithms: (a) CommonConfig, (b) MaximumFairness, (c) Proposed

Figure 3.24 shows the power consumption profile of the high SI scene. The communication power consumption shown in this figure is the sum of the transmission and reception power consumption. Figure 3.24(a) shows that with the CommonConfig algorithm, each node consumes almost the same encoding power. In the MaximumFairness approach (see Figure 3.24(b)), each node is assigned roughly the same bitrate; however, the nodes that are closer to the sink consume more energy because they need to relay data from the other nodes. On the other hand, Figure 3.24(c) shows that the proposed optimization-based approach manages to provide a more balanced power consumption among all nodes in the video surveillance network. This trend is also observed in the medium SI and low SI training scenes.

Table 3.10: Pnet, Pavg and STD(Pi) for the training scenes
Metric    Algorithm          party_act2   office_act1   office_act2
Pnet      CommonConfig       10.73        10.25          9.97
          MaximumFairness    10.95        10.45         10.06
          Proposed            9.73         9.53          9.35
Pavg      CommonConfig        9.4          9.03          8.8
          MaximumFairness     9.39         9.03          8.83
          Proposed            9.31         9.38          9.19
STD(Pi)   CommonConfig        0.54         0.49          0.47
          MaximumFairness     0.64         0.57          0.48
          Proposed            0.18         0.09          0.09

Table 3.10 shows the Pnet (nodes' maximum power consumption), Pavg (average power consumption) and STD(Pi) (standard deviation of the nodes' power consumption) for the three algorithms. It is interesting to see that the CommonConfig algorithm manages to perform better than the MaximumFairness algorithm. This shows that assigning the same bitrate to each node does not help in reducing the video surveillance network's power consumption.
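The three summary metrics used in Table 3.10 are straightforward to compute from a vector of per-node powers:

import numpy as np

def network_metrics(node_powers):
    """Pnet: maximum per-node power, Pavg: mean, STD(Pi): spread."""
    p = np.asarray(node_powers, dtype=float)
    return p.max(), p.mean(), p.std()

# a flatter profile (proposed) vs. a skewed one (relay-heavy nodes)
print(network_metrics([9.5, 9.4, 9.3, 9.2]))
print(network_metrics([10.9, 9.8, 9.1, 8.6]))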
On the other hand, the table also shows that the optimization-based approach manages to achieve lower Pnet, Pavg and STD(Pi) than the other algorithms. This shows that by regulating the bitrate and assigning the corresponding CID to each node, the video surveillance network's power consumption is reduced.

3.4.5 Performance of the Proposed Technique for the Test Scenes
Using the fairness ratios obtained from the training scenes (see Table 3.9), the fairness-based algorithm explained in Section 3.4.2 is used to allocate the nodes' CIDs for all test scenes. We noticed that the fairness-based adjustment algorithm shown in Figure 3.21 provides better results than the proposed fairness-based allocation algorithm [140]. Therefore, from this point forward, the comparison considers only the proposed fairness-based with adjustment algorithm against the existing techniques mentioned in Section 3.3. Correspondingly, Figure 3.25 compares the values of Pnet, Pavg and STD(Pi) obtained by the three algorithms in all test cases.

Figure 3.25: Comparison of the Pnet, Pavg, and STD(Pi) values obtained from all test scenes

Table 3.11: Percentage improvement of the proposed algorithm against the other techniques
              Improvement against CommonConfig (%)   Improvement against MaximumFairness (%)
Test Scenes   Pnet    Pavg    STD(Pi)                Pnet    Pavg    STD(Pi)
VS1           8.33    -0.29   51.92                  10.48   0.72    58.20
VS2           4.24     0.41   26.68                   5.06   0.48    33.54
VS3           9.67     0.63   60.32                  10.30   0.88    63.30
VS4           7.01    -1.29   38.01                  10.23   1.04    46.57
VS5           7.88     0.83   40.86                   7.97   1.29    41.32
VS6           7.50     0.41   41.89                   8.83   1.29    45.48
VS7           4.60     0.63   36.85                   6.28   0.53    47.90
VS8           5.40     0.96   33.65                   6.21   1.05    38.81
VS9           8.12     0.73   42.56                   8.24   1.20    43.33

Furthermore, Table 3.11 shows the percentage improvement in terms of Pnet, Pavg and STD(Pi) obtained by the proposed technique against the common approaches for all test cases. It can be seen that the power consumption reduction obtained by the proposed fairness-based with adjustment technique is in the range of 5.06% to 10.48%, averaging an 8.18% improvement against the MaximumFairness algorithm. The percentage of Pnet reduction against the CommonConfig algorithm is in the range of 4.24% to 9.67%, averaging a 6.97% improvement. The average improvement of the proposed algorithm against the common approaches is thus around 7.58%. This result shows that by using the fairness ratios obtained from the training scenes to allocate the nodes' CIDs, the proposed algorithm reduces the video surveillance network's power consumption by around 7.58%. Since the energy resources of video surveillance network nodes are usually limited, a 7.58% reduction in power consumption equates to increasing the video surveillance network lifetime by 7.58%. In addition, except for test scenes VS1 and VS4, the proposed algorithm also manages to slightly reduce the video surveillance network's average power consumption. The standard deviation of the nodes' power consumption is also reduced by more than 40% on average.

3.5 Summary
This chapter analyzed the effect of encoder parameter settings and content complexity on a video surveillance network's power consumption. A large number of real-life captured videos of simulated video surveillance settings with different activity levels were produced and used in the analysis. The scenes were classified according to their content complexity, and the higher activity level scenes were used as the training set.
It was shown that the fairness ratio allocated to each node affects the distribution of power consumption in a video surveillance network. The fairness ratios obtained from the training sets were then used as the initial guide to allocate the nodes' encoder configurations for the test scenes. The results show that the proposed techniques manage to reduce the power consumption by 7.58% compared to the results obtained by the other techniques. The proposed method depends on the assumption that the nodes have information about the coding complexity and bitrate of the videos in the form of a look-up CID table. Models that estimate the coding complexity and bitrate of videos can alleviate the need for such a table. This is what will be studied in the next chapter.

Chapter 4: Coding Complexity and Bitrate Models for H.264/AVC-based Video Surveillance Networks

In this chapter, models to estimate the coding complexity and bitrate of videos are proposed. The models were developed based on an analysis of the effect of some encoding parameters on coding complexity and bitrate over a set of training videos, and tested against a set of unseen test videos. An approach to reduce the models' estimation error for videos whose content changes within a specific period of time is also proposed in this chapter. A study that estimates a video surveillance network's total power consumption using the proposed models is also presented.

4.1 Proposed Model
In Chapter 3, it was shown that the GOP size, which controls the number of inter-coded pictures in successive frames, is a parameter that significantly affects the coding complexity and bitrate of a video. The other factor that controls the coding complexity and bitrate is the type and variety of block sizes used in the inter prediction process [141]. Increasing the number of block sizes results in better prediction and consequently higher compression performance at the expense of increased complexity. In general, there are seven block sizes defined for inter prediction in H.264/AVC. In this thesis, the complexity of motion estimation (ME) is classified into different levels based on the variety of block size candidates used to encode a video (see Table 4.1).

4.1.1 H.264/AVC Coding Complexity Model
The complexity of the encoding process of a video sequence (C_S) is formulated as follows:

C_S = C_I · n_I + C_P · n_P    (4.1)

where C_I is the complexity to encode an I-frame, C_P is the complexity to encode a P-frame, n_I is the number of I-frames in the sequence and n_P is the number of P-frames in the sequence. For a video sequence with no scene change, the value of C_I can be considered constant. On the other hand, C_P depends on the complexity level of the ME process. The complexity level of the ME process (called ML) is classified based on the block size candidates used in the encoding process, as shown in Table 4.1.

Table 4.1: ME complexity level (ML)
ML   Block size candidates
1    SKIP, 16x16
2    SKIP, 16x16, 16x8
3    SKIP, 16x16, 16x8, 8x16
4    SKIP, 16x16, 16x8, 8x16, 8x8
5    SKIP, 16x16, 16x8, 8x16, 8x8, 8x4
6    SKIP, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8
7    SKIP, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4

The GOP size does not affect the normalized coding complexity of P-frames at each ML [141]. Note that the normalized coding complexity is calculated by dividing the coding complexity at a certain ML by the coding complexity when ML is equal to one. Furthermore, Figure 4.1 shows the plot of the normalized C_P (NC_P) for the "BQMall" and "Traffic" videos. This figure shows that the plots have the same slope, but the length is scaled by a constant.
The normalized C_P range is then defined as the range of normalized C_P values for a specific video. As observed, the value of the normalized C_P for the "Traffic" video is between 1 and 1.485; hence, the normalized C_P range for the "Traffic" video is equal to 0.485. On the other hand, the normalized C_P range for the "BQMall" video is equal to 0.66.

Figure 4.1: Normalized C_P of the "BQMall" and "Traffic" videos (15 fps, CIF, GOP=2)

In this regard, Figure 4.1 also shows that the normalized C_P range of a video is proportional to the value of C_P when ML is equal to one (C_P(ML=1)). Therefore, the range of the normalized C_P of a specific video sequence is modeled as:

δ_1 = a · C_P(ML=1) + b    (4.2)

Furthermore, the fractional increase of the normalized C_P is calculated by scaling the normalized C_P using the following formula:

δ_ML(i) = (NC_P(ML=i) − 1) / (NC_P(ML=7) − 1)    (4.3)

where NC_P(ML=i) denotes the normalized C_P when ML is equal to i.

Figure 4.2: Fractional increase of the normalized C_P of the "BQMall" and "Traffic" videos (15 fps, CIF)

Figure 4.2 shows the δ_ML plot of the "Traffic" and "BQMall" videos. It is interesting to see that the increase of the normalized C_P with respect to ML is almost the same for both videos. By averaging the values of the fractional increase of the normalized C_P from both videos, we obtain the values of δ_ML shown in Table 4.2.

Table 4.2: ME complexity level (ML) and δ_ML
ML   δ_ML
1    0
2    0.13
3    0.26
4    0.54
5    0.67
6    0.81
7    1

Using δ_1, the complexity to encode a P-frame for a specific ML is:

C_P(ML=i) = C_P(ML=1) · (1 + δ_1 · δ_ML(i))    (4.4)

Considering that n_I = N/GOP, where N is the total number of frames, and n_P = N − N/GOP, the average coding complexity per frame is:

C_f = [ C_I + C_P(ML=1) · (1 + (a · C_P(ML=1) + b) · δ_ML(i)) · (GOP − 1) ] / GOP    (4.5)

4.1.2 H.264/AVC Bitrate Model
Following the same approach used to develop the coding complexity model, the size of an encoded video sequence (in bits) is modeled as:

R_S = R_I · n_I + R_P · n_P    (4.6)

where R_I is the average size of an I-frame and R_P is the average size of a P-frame. Note that the value of R_P depends on the ML and GOP used by the encoder. As observed in Figure 4.3, the value of R_P decreases as ML increases. Therefore, for a certain GOP value, R_P is modeled as:

R_P(GOP=i, ML) = R_P(GOP=i, ML=1) · f(ML)    (4.7)

where R_P(GOP=i, ML=1) is the bitrate of a P-frame when GOP=i and ML=1, and f(ML) is modeled as a decay function of ML that matches the plot in Figure 4.3.

Figure 4.3: Bitrate of a P-frame for different ML values of the "BQMall" video

We have analyzed different decay functions and decided to use the generalized logistic function. The logistic function is a widely used sigmoid function for growth/decay modeling where the growth/decay is exponential at first, but eventually slows down and levels off. This matches the way R_P is reduced with the increase of ML. The logistic function f(ML) used in our study is as follows:

f(ML) = p + (q − p) / (1 + e^(r·(ML − s)))    (4.8)

where p and q indicate the minimum and maximum asymptotes of the plot respectively, r is the growth rate, and s signifies the time of maximum growth. Figure 4.3 also shows that the slope of the R_P plot for different GOP sizes is the same.
Therefore, R_P is modeled as:

R_P = R_P(GOP=2, ML=1) · f(ML) · f(GOP)    (4.9)

where f(GOP) is a function that reflects the effect of the GOP size on R_P. To obtain the parameters for f(ML), we used least-mean-square regression on the normalized R_P, using the data from the training video sequences encoded with GOP=2 and ML=1. We used β_2·ln(GOP) to estimate f(GOP) [141], where the value of β_2 is estimated using least-square regression from the R_P values of the training video sequences encoded with different GOP settings. Assuming that the average bitrate of an I-frame is equal to R_I, the average bitrate of a frame (R_f) is estimated as:

R_f = [ R_I + R_P(GOP=2, ML=1) · ( p + (q − p) / (1 + e^(r·(ML − s))) ) · β_2 · ln(GOP) · (GOP − 1) ] / GOP    (4.10)

4.1.3 Analysis of the Model
In order to obtain the parameters for the model, we encode the first two frames of each video sequence. Assuming that there is no scene change in the video sequence, the bitrate of each I-frame will be almost the same. Therefore, for the bitrate model, R_I is assumed to be equal to the bitrate of the encoded first frame, while R_P(GOP=2, ML=1) is equal to the bitrate of the second frame. The parameters for (4.8) are obtained from the training videos, and their values are as follows: p=0.92, q=1, r=−21.36 and s=0.14.

For the complexity modeling, iprof provides us with the complexity of encoding the first two frames of the video sequence, i.e., C_2frames = C_I + C_P(ML=1). In order to obtain the value of C_P(ML=1), we need to estimate the value of C_I. In this regard, we assume that for the I-frame, the value of C_I can be estimated from the value of R_I using a linear regression over the training videos [141], i.e., C_I = 0.09·R_I + 216.97. Furthermore, the value of δ_1 is calculated using (4.2) with the following parameters: a=0.0135 and b=−2.13.

The average percentage errors of the coding complexity and bitrate for GOP={1, 2, 4, 8, 16, 32, 64} and ML={1, 2, 3, 4, 5, 6, 7} for the videos "RaceHorses", "PeopleOnStreet" and "Vidyo1" [132] were calculated. As Table 4.3 shows, the average error in complexity modeling is less than or equal to 3.45% for the abovementioned video sequences, while the average error in bitrate modeling is less than or equal to 11.6%, as reported in Table 4.4.

Table 4.3: Coding complexity modeling error
Test video        Percentage Error (%)
RaceHorses        2.79
PeopleOnStreet    2.05
Vidyo1            3.45

Table 4.4: Bitrate modeling error
Test video        Percentage Error (%)
RaceHorses        11.6
PeopleOnStreet    8.57
Vidyo1            9.55
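Putting the fitted parameters of Section 4.1.3 together, the per-frame models (4.5) and (4.10) can be sketched as below. C_I, C_P1, R_I and R_P1 come from encoding the first two frames; beta2 is the per-training-set regression coefficient, left here as a placeholder value, and the argument scaling of the fitted logistic follows the reconstructed form of (4.8):

import math

A, B_ = 0.0135, -2.13                  # delta_1 fit, equation (4.2)
DELTA_ML = {1: 0, 2: 0.13, 3: 0.26, 4: 0.54, 5: 0.67, 6: 0.81, 7: 1}
P, Q, R, S = 0.92, 1.0, -21.36, 0.14   # logistic f(ML) fit, equation (4.8)

def complexity_per_frame(c_i, c_p1, gop, ml):
    """(4.5): average coding complexity per frame for a given GOP and ML."""
    delta1 = A * c_p1 + B_                          # (4.2)
    c_p = c_p1 * (1 + delta1 * DELTA_ML[ml])        # (4.4)
    return (c_i + c_p * (gop - 1)) / gop

def bitrate_per_frame(r_i, r_p1, gop, ml, beta2=1.0):
    """(4.10): average bits per frame; beta2 is a placeholder value."""
    f_ml = P + (Q - P) / (1 + math.exp(R * (ml - S)))   # (4.8)
    f_gop = beta2 * math.log(gop) if gop > 1 else 0.0   # no P-frames if GOP=1
    return (r_i + r_p1 * f_ml * f_gop * (gop - 1)) / gop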
4.2 Model Evaluation in a Simulated Video Surveillance Test Environment
In order to mimic realistic video surveillance network applications, we captured real-life videos using four cameras in the atrium of a public building. The cameras were installed so that each of them had a different point of view, as shown in Figure 4.4, with the views of some cameras overlapping one another.

Figure 4.4: Camera placements

In order to mimic a practical application, the videos were downsampled to a 416x240 pixel resolution and 15 frames per second (fps). Five shots of video were captured using the four cameras, resulting in a total of 20 different videos. These videos were named using the convention <camera-id_shot-id>; therefore, camera1_shot1 is the video obtained by camera1 in the first shot. The four videos of the fifth shot were selected as the training set for the model, while the remaining videos were used as the test set.

4.2.1 Model Evaluation for Videos with Changing Content
In many real-life captured videos, the content may change during a 10 s video shot. For example, Figure 4.5 shows snapshots of the camera1_shot3 video sequence at frames 1, 70, and 100. It can be seen that the content at the start of the video (frame 1) differs significantly from the content towards the end of the video (frame 100).

Figure 4.5: Content changes during the 10 s camera1_shot3 video sequence: (a) frame 1, (b) frame 70, (c) frame 100

Similarly, Figure 4.6 shows snapshots at frames 1, 60 and 110 of the camera2_shot2 video sequence. Again, we can see that the content at the beginning of the video differs significantly from that captured at a later time, i.e., frames 60 and 110.

Figure 4.6: Content changes during the 10 s camera2_shot2 video sequence: (a) frame 1, (b) frame 60, (c) frame 110

From this observation, it is clear that utilizing model parameters from the first two frames at the beginning of the video may lead to a large estimation error. In order to tackle this problem, we divide the 10 s video into a number of sub-shots, and the coding complexity and bitrate estimation is performed in each sub-shot.

Figure 4.7: Flowchart of the coding complexity and bitrate estimation error reduction method

Figure 4.7 shows the flowchart of the proposed method to reduce the coding complexity and bitrate estimation error. In this figure, the variable frame_num is the current frame number, while k denotes the length of a sub-shot in terms of the number of frames. Note that since the video is divided into N/k sub-shots, the first two frames of each sub-shot are encoded to obtain the required parameters for the model [142]. The estimation error is calculated from the complexity per second (C_ps) and the average bitrate (R_av), defined as follows [122]:

C_ps = (F_r / N) · Σ_{i=1}^{N/k} ω_i · C_f,i
R_av = (F_r / N) · Σ_{i=1}^{N/k} ω_i · R_f,i    (4.11)

Here, F_r is the frame rate, C_f,i and R_f,i are the per-frame coding complexity and bitrate estimates for sub-shot i, and ω_i is calculated as follows:

ω_i = { k,          i ≤ ⌊N/k⌋
      { N mod k,    otherwise    (4.12)

In order to estimate the modeling error, the root mean square error (RMSE) of the coding complexity and bitrate for GOP={1, 2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7} and k={150, 75, 60, 45} is calculated. Recall that in order to implement the complexity and bitrate models, several variables need to be obtained from each video sequence. Therefore, the first two frames of each sub-shot of the video sequence were encoded. For the bitrate model, R_I is assumed to be equal to the bitrate of the encoded first frame of each sub-shot, while R_P = R_P(GOP=2, ML=1) is equal to the bitrate of the second frame. The following parameters for (4.8) are used: p=0.92, q=1, r=−21.36 and s=0.14.
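A short sketch of the sub-shot aggregation in (4.11)-(4.12); per_frame_vals holds the model's per-frame estimates (C_f or R_f) for each sub-shot:

def aggregate(per_frame_vals, n_frames, k, frame_rate):
    """(4.11): weight each sub-shot's per-frame estimate by its length
    and return the per-second aggregate (C_ps or R_av)."""
    n_shots = -(-n_frames // k)                    # ceil(N / k)
    total = 0.0
    for i in range(n_shots):
        # (4.12): full sub-shots weigh k frames, the tail weighs N mod k
        omega = k if (i + 1) * k <= n_frames else n_frames % k
        total += per_frame_vals[i] * omega
    return frame_rate * total / n_frames

# e.g. N=150 frames at 15 fps split into k=60-frame sub-shots
print(aggregate([3.1, 3.4, 2.9], n_frames=150, k=60, frame_rate=15))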
For the complexity modeling, the coding complexity of the first two frames of each sub-shot, i.e., C_2frames = C_I + C_P(ML=1), is provided by the iprof tool. In order to obtain the value of C_P(ML=1), the value of C_I is estimated from the value of R_I using a linear regression over the training videos, with the formula C_I = 0.0637·R_I + 214.56. Furthermore, the value of δ_1 is calculated using (4.2) with the following parameters: a=0.009 and b=−1.178 [122].

Table 4.5: Coding complexity estimation error for different values of k
Test sequence     k=150    k=75     k=60     k=45
camera1_shot1     34.966   35.533   28.523   27.890
camera2_shot1     26.722   26.812   26.823   30.280
camera3_shot1     48.005   38.997   45.258   35.861
camera4_shot1     45.437   36.435   34.667   32.790
camera1_shot2     33.850   32.615   37.589   34.662
camera2_shot2     37.247   29.967   26.934   26.985
camera3_shot2     28.769   36.088   28.459   32.256
camera4_shot2     33.145   27.086   27.538   27.052
camera1_shot3     69.666   30.149   37.266   39.555
camera2_shot3     59.759   29.830   39.279   30.596
camera3_shot3     47.022   37.739   35.961   39.236
camera4_shot3     41.304   33.905   35.581   31.479
camera1_shot4     27.858   32.782   30.906   32.127
camera2_shot4     38.642   38.363   31.962   33.426
camera3_shot4     36.970   36.930   36.914   36.860
camera4_shot4     39.797   39.986   32.818   39.890

Table 4.5 shows the coding complexity estimation error for all test sequences and different values of k. The table shows that, in general, the coding complexity estimation error decreases as a larger number of sub-shots is used, i.e., for smaller k values. We can also see that the proposed method manages to reduce the coding complexity estimation error in 11 out of 16 cases when k is set equal to 45, and in 13 out of 16 cases when k is set equal to 60. In particular, in the case of the camera1_shot3 video, the coding complexity estimation error for k=150 and k=60 is equal to 69.666 and 37.266, respectively, which corresponds to a 46.5% reduction in estimation error.

Figure 4.8: Actual and estimated coding complexity for different values of k and GOP sizes for (a) camera4_shot1 and (b) camera1_shot3

Figure 4.8 compares the measured coding complexity and the estimated coding complexity per second (C_ps) for different values of k and varying GOP sizes. Note that in this figure, the value of ML is set to four. Furthermore, Table 4.6 shows the bitrate estimation error for all test sequences and different values of k. Similar to the coding complexity case, the table shows that, in general, the bitrate estimation error decreases as smaller k values are used. We can also see that the proposed method manages to reduce the bitrate estimation error in 12 out of 16 cases when k is set equal to 45, and in 13 out of 16 cases when k is set equal to 60.
Furthermore, Table 4.6 shows the bitrate estimation error of all test sequences for different values of k.

Table 4.6: Bitrate estimation error for different values of k

Test sequence    k=150   k=75    k=60    k=45
camera1_shot1    34.518  25.418  25.172  13.673
camera2_shot1    19.536  15.413  15.931   7.517
camera3_shot1    16.299   8.806   9.189   8.517
camera4_shot1     6.920   6.256   6.201   4.397
camera1_shot2     8.105   4.326  10.246   8.994
camera2_shot2     1.729   3.986   5.002   8.189
camera3_shot2    16.081  14.566  13.806  12.671
camera4_shot2    23.886   5.123  11.233  11.414
camera1_shot3    75.219  39.312  33.851  33.323
camera2_shot3    46.566  27.678  28.663  19.118
camera3_shot3    11.960   9.755  10.143   9.213
camera4_shot3    12.459   5.488   1.949   3.270
camera1_shot4    10.359  15.375  20.977  19.560
camera2_shot4    18.760  22.383  17.658  15.693
camera3_shot4     9.920   9.854   9.747   9.843
camera4_shot4     6.576   7.030   6.678   7.041

Similar to the coding complexity case, the table shows that, in general, the bitrate estimation error decreases as smaller k values are used. The proposed method manages to reduce the bitrate estimation error in 12 out of 16 cases when k is set equal to 45, and in 13 out of 16 cases when k is set equal to 60. The highest error reduction is obtained in the case of the camera1_shot3 video sequence: the RMSE of the bitrate model for k=150 is equal to 75.219 kbps, whereas for k=60 it is reduced to 33.851 kbps, a 55.7% reduction in estimation error. Figure 4.9 compares the measured bitrate and the estimated average bitrate (Rav) for different values of k and varying GOP sizes; in this figure, the value of ML is set equal to four.

Figure 4.9: Actual and estimated bitrate for different values of k and GOP sizes for the video sequences (a) camera4_shot1 and (b) camera1_shot3

The results analyzed in the previous paragraphs show that dividing a video sequence into a number of sub-shots reduces the model estimation error. The results also show that the estimation error reduction varies from one video to another. Overall, the setting k=60 provides the smallest estimation error. From this point onward, the analysis of the video surveillance network's power consumption is performed under the assumption that k is set equal to 60 frames.

4.2.2 Video Surveillance Network's Power Consumption Estimation

The power consumption of a node in a video surveillance network consists of the encoding power consumption and the communication power consumption. The power consumption for encoding is estimated as follows:

$$P_e = C_{ps} \cdot CPI \cdot E_c \qquad (4.13)$$

where CPI is the number of CPU cycles needed to perform one basic instruction and Ec is the energy depletion per cycle. For a direct transmission to the sink, the transmission power consumption is calculated as:

$$P_t = (\alpha + \beta\,d^{\gamma})\,R_{av} \qquad (4.14)$$

where α is a constant coefficient related to coding and modulation, β is the amplifier energy coefficient, d is the transmission distance, and γ is the path loss exponent.

4.2.3 Experiments and Results

For our analysis, we use the topology shown in Figure 4.4, consisting of four video nodes and a sink. The parameters shown in Table 4.7 are used for the experiments.

Table 4.7: Parameters used

Symbol  Definition                          Value
Fr      Frame rate                          15 fps
N       Number of frames                    150 frames
k       Length of a sub-shot                60 frames
CPI     Average cycles per instruction      1.78
Ec      Energy consumption per cycle        1.215e-9 J/cycle
α       Energy cost for transmitting 1 bit  5e-8 J/b
β       Transmit amplifier coefficient      1e-9 J/b/m^γ
γ       Path loss exponent                  3.5
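A minimal Python sketch of this per-node estimate follows, using the Table 4.7 values as defaults (we pair α with the J/b units and β with the J/b/m^γ units, as the units suggest; the function name and example inputs are ours).

```python
def node_power(C_ps, R_av, d, CPI=1.78, Ec=1.215e-9,
               alpha=5e-8, beta=1e-9, gamma=3.5):
    """Per-node power for direct transmission to the sink, per (4.13)-(4.14).

    C_ps -- coding complexity in instructions per second
    R_av -- average bitrate in bits per second
    d    -- distance to the sink in metres
    """
    P_e = C_ps * CPI * Ec                     # encoding power (4.13)
    P_t = (alpha + beta * d ** gamma) * R_av  # transmission power (4.14)
    return P_e, P_t

# Example: 3.4e9 instructions/s and 500 kbps at 3 m from the sink; this
# yields an encoding power of roughly 7.4 W, in line with the values
# reported below.
P_e, P_t = node_power(C_ps=3.4e9, R_av=5e5, d=3.0)
```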
In order to analyze the effect of different video sources and encoding configurations, two sets of experiments are conducted. In the first experiment, the nodes' encoder parameter settings are the same in all scenarios, while the video sources used in each scenario vary. In the second experiment, on the other hand, the nodes use the same set of video sources in all scenarios, while the nodes' encoding parameter settings and their distance to the sink are varied.

The scenario configuration for the first experiment is shown in Table 4.8. In the first scenario, the nodes use the videos obtained from the first shot: camera1_shot1, camera2_shot1, camera3_shot1, and camera4_shot1. In the second scenario, the videos used are those obtained from the second shot, and so on. Note that, for this experiment, the ML value is set equal to six.

Table 4.8: Experiment 1 scenarios

Scenario  Test sequences used                               Distance to the sink  GOP size
1         camera1_shot1 (node 1), camera2_shot1 (node 2),   3m                    8
          camera3_shot1 (node 3), camera4_shot1 (node 4)
2         camera1_shot2 (node 1), camera2_shot2 (node 2),   3m                    8
          camera3_shot2 (node 3), camera4_shot2 (node 4)
3         camera1_shot3 (node 1), camera2_shot3 (node 2),   3m                    8
          camera3_shot3 (node 3), camera4_shot3 (node 4)
4         camera1_shot4 (node 1), camera2_shot4 (node 2),   3m                    8
          camera3_shot4 (node 3), camera4_shot4 (node 4)

Figure 4.10 shows the estimated nodes' power consumption in the first experiment. The figure shows that the nodes' power consumption differs within each scenario, and that the trend of the nodes' power consumption profile varies from scenario to scenario. For example, Figure 4.10(a), which corresponds to scenario 1, shows that the node with the highest power consumption is node 3, although the difference in total power consumption between node 3 and the other nodes in this scenario is not significant. In the other scenarios (i.e., scenarios 2, 3 and 4), the node with the highest power consumption is node 1. It is interesting to see that the encoding power consumption of each node in each scenario differs, even though all nodes use the same encoding parameter settings in this experiment. In addition, the variance of the nodes' total power consumption differs from one scenario to another. In terms of the video surveillance network's average power consumption, we obtained the following values: 7.756 W (scenario 1), 7.843 W (scenario 2), 7.787 W (scenario 3) and 7.824 W (scenario 4). These results show that the content captured by each camera node affects not only the node's power consumption but also the video surveillance network's average power consumption.

Figure 4.10: Nodes' power consumption in experiment 1: (a) scenario 1, (b) scenario 2, (c) scenario 3, (d) scenario 4

In the second set of experiments, the nodes use the videos from the first shot, while the nodes' distance to the sink and the GOP size are varied. The ML is set equal to six, as in the first experiment. The configuration used in the second experiment is summarized in Table 4.9. Figure 4.11 shows the estimated nodes' power consumption in this experiment. Figures 4.11(a) and 4.11(b) show the nodes' power consumption when the nodes' distance to the sink is equal to 1.5m for scenario 1 and scenario 2, respectively, with the GOP size set equal to 2 (scenario 1) and 16 (scenario 2). It can be seen from these figures that when the distance to the sink is small, using a smaller GOP size reduces the node's power consumption.
The video surveillance network's average power consumption in these two scenarios is 7.388 W (scenario 1, GOP=2) and 7.451 W (scenario 2, GOP=16). The nodes' power consumption can be reduced further if the nodes are configured to use a GOP size equal to one, in which case the video surveillance network's average power consumption is equal to 7.316 W.

Table 4.9: Experiment 2 scenarios

Scenario  Test sequences used                               Distance to the sink  GOP size
1         camera1_shot1 (node 1), camera2_shot1 (node 2),   1.5m                  2
          camera3_shot1 (node 3), camera4_shot1 (node 4)
2         camera1_shot1 (node 1), camera2_shot1 (node 2),   1.5m                  16
          camera3_shot1 (node 3), camera4_shot1 (node 4)
3         camera1_shot1 (node 1), camera2_shot1 (node 2),   5m                    2
          camera3_shot1 (node 3), camera4_shot1 (node 4)
4         camera1_shot1 (node 1), camera2_shot1 (node 2),   5m                    16
          camera3_shot1 (node 3), camera4_shot1 (node 4)

Furthermore, Figures 4.11(c) and 4.11(d) show the nodes' power consumption when d is equal to 5m. Similar to the previous case, the GOP size is set equal to 2 and 16 for scenario 3 and scenario 4, respectively. It can be seen clearly from these figures that the cost of transmitting the encoded video increases tremendously compared with the first two scenarios, where d was smaller. Therefore, when the nodes' distance from the sink is large, the nodes' power consumption can be reduced by using larger GOP sizes. Comparing Figure 4.11(c) and Figure 4.11(d), we observe that the video surveillance network's average power consumption for these scenarios is 14.607 W (scenario 3) and 8.616 W (scenario 4). These results show that a node's power consumption depends on the encoding configuration used and on the distance between the node and the sink.

Figure 4.11: Nodes' power consumption in experiment 2: (a) scenario 1, (b) scenario 2, (c) scenario 3, (d) scenario 4

4.3 Summary

In this chapter, coding complexity and bitrate models were proposed. The models were developed by considering the video encoding parameters that significantly affect the coding complexity and bitrate. Through an adaptive scheme for adjusting the model parameters, it was shown that the model estimation error can be reduced. Using the proposed models, the video surveillance network nodes' power consumption was analyzed under different scenarios involving various video content, encoding configurations, and nodes' distances from the sink. It was shown that a node's power consumption depends on the encoding parameter settings and on the video content captured by the node. In the next chapter, in addition to the encoding parameters, the spatial and temporal complexity of the content will be incorporated into the bitrate and coding complexity models. Furthermore, in order to comply with the bandwidth constraint, the effect of different QP settings will also be considered.

Chapter 5: Spatial and Temporal Complexity of Content and Video Surveillance Network's Power Consumption

In Chapter 3, we showed that the coding complexity and bitrate of videos depend on the encoder parameter settings and the video content. In order to spend energy efficiently, the nodes in a video surveillance network need to be able to estimate the coding complexity and bitrate of the captured content and utilize the models to minimize their power consumption. In this chapter, we extend the models designed in Chapter 4 by incorporating an additional encoding parameter, i.e., the quantization parameter, as well as the complexity of the scene content.
An optimization framework that utilizes the proposed coding complexity and bitrate models to minimize the video surveillance network's power consumption is also proposed.

5.1 Methodology

Our objective here is to model the effect of content complexity and encoding parameters on bitrate and coding complexity. To this end, we consider the effect of the spatial/temporal information of the video, the GOP size, the complexity level of motion estimation, and the QP. The following subsections explain how the models were developed.

5.1.1 Content Complexity

In order to incorporate the effect of content complexity in our models, the average spatial complexity (SA) and temporal complexity (TA) are used. To this end, we calculate SA and TA according to [115], as follows:

$$SA = \mathrm{mean}_{time}\{\mathrm{std}_{space}[Sobel(F_n)]\} \qquad (5.1)$$

$$TA = \mathrm{mean}_{time}\{\mathrm{std}_{space}[F_n - F_{n-1}]\} \qquad (5.2)$$

where mean_time denotes averaging over frames, std_space is the standard deviation over the pixels of a frame, Sobel refers to the Sobel edge detection filter, and F_n denotes the nth frame of the video sequence. The overall complexity of the content is a function of SA and TA, i.e., f(SA, TA). Considering that intra prediction exploits only the redundancy of spatial information within each frame, while inter prediction exploits both temporal and spatial redundancy, the effects of content complexity on intra and inter prediction are investigated separately. For the intra-coded frames, f(.) depends only on SA, while for the inter-coded frames, f(.) depends on both SA and TA. To estimate f(.) for intra and inter-coded frames, a set of candidate functions is presumed, as suggested in [115] (see Table 5.1 and Table 5.2). Subsection 5.1.4 discusses the process of estimating f(.) and finding its parameters ai.

Table 5.1: Functions of SA used to model fI

Functions fI to model intra frames
a1∙SA + a0
a1∙log(SA) + a0

Table 5.2: Functions of SA and TA used to model fP

Functions fP to model inter frames
a1∙TA + a0
a1∙log(TA) + a0
a1∙SA∙TA + a0
a1∙log(SA∙TA) + a0
a2∙SA + a1∙TA + a0
a2∙log(SA) + a1∙log(TA) + a0

5.1.2 Design of Bitrate Model

The average bitrate of a frame in a video sequence can be estimated as:

$$\bar{R} = \frac{n_I\,\tilde{R}_I + n_P\,\tilde{R}_P}{N},\qquad n_I = \frac{N}{GOP},\qquad n_P = \frac{N\,(GOP-1)}{GOP} \qquad (5.3)$$

where R̃_I is the estimated average bitrate of I-frames, R̃_P is the estimated average bitrate of P-frames, N is the total number of frames, n_I is the number of I-frames, n_P is the number of P-frames, and GOP refers to the group of pictures (GOP) size. In general, obtaining higher compression performance requires a more complex and computationally expensive encoding scheme. For instance, increasing the GOP size reduces the bitrate at the cost of higher coding complexity [141].

To model R̃_I in (5.3), we need to take into account that the bitrate of an I-frame depends on the QP as well as on the content complexity. Thus, R̃_I is modeled as follows:

$$\tilde{R}_I = \bar{R}_I \cdot g_I(QP) \qquad (5.4)$$

where R̄_I is the bitrate function related to the content complexity of I-frames and g_I is a bitrate function related to QP. To estimate R̄_I, we follow the approach proposed in [115] and consider the two candidate functions based on the average spatial complexity (SA) shown in Table 5.1. The model parameters a1 and a0 for R̄_I are estimated using least square regression (see subsection 5.1.4).
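Since every candidate function in Tables 5.1 and 5.2 consumes SA and TA as inputs, a short Python sketch of computing them per (5.1)-(5.2) is given below. This is our own illustration: the Sobel-magnitude and normalization details may differ from the reference implementation in [115].

```python
import numpy as np
from scipy import ndimage

def spatial_temporal_complexity(frames):
    """SA and TA per (5.1)-(5.2) for a list of grayscale frames
    (2-D numpy arrays of identical shape)."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    # SA: per-frame spatial std of the Sobel edge magnitude,
    # averaged over time.
    sa = [np.std(np.hypot(ndimage.sobel(f, axis=0),
                          ndimage.sobel(f, axis=1)))
          for f in frames]
    # TA: per-frame spatial std of the frame difference,
    # averaged over time.
    ta = [np.std(frames[n] - frames[n - 1])
          for n in range(1, len(frames))]
    return float(np.mean(sa)), float(np.mean(ta))
```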
For g_I(.), we use a slightly modified version of the power function used in [115]:

$$g_I(QP) = w\left(\frac{QP}{QP_{min}}\right)^{f_I} \qquad (5.5)$$

where QP stands for the quantization parameter used for video compression, w is a model parameter that is estimated using least mean square regression, QP_min refers to the smallest QP value that can still result in decent-quality compressed video in a surveillance system with limited resources, and f_I is a function of SA. The parameter w is restricted such that 1−ε ≤ w ≤ 1+ε, where ε is a small number (ε is set equal to 0.1 throughout this chapter). This restriction ensures that when QP is equal to QP_min, the value of w is close to one. In this study, QP_min is set to 28, which we consider the smallest QP value that can deliver decent-quality video in a low-cost surveillance system. For f_I, the two candidate functions of SA shown in Table 5.1 are considered.

Table 5.3: ME complexity level (ML)

ML  Block size candidates                          δML
1   SKIP, 16x16                                    0.13
2   SKIP, 16x16, 16x8                              0.26
3   SKIP, 16x16, 16x8, 8x16                        0.39
4   SKIP, 16x16, 16x8, 8x16, 8x8                   0.51
5   SKIP, 16x16, 16x8, 8x16, 8x8, 8x4              0.67
6   SKIP, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8         0.81
7   SKIP, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4    1

To model R̃_P in (5.3), we need to consider that the bitrate of the P-frames depends on the temporal content complexity, the spatial content complexity, and the motion estimation, as well as on QP. Thus, we model R̃_P as follows:

$$\tilde{R}_P = \bar{R}_P \cdot h_P(\delta_{ML}) \cdot g_P(QP) \qquad (5.6)$$

where R̄_P is the bitrate function related to the temporal and spatial complexity of P-frames, g_P is a bitrate function related to QP, and h_P is a bitrate function related to motion estimation. R̄_P is estimated by considering the six candidate functions shown in Table 5.2. The bitrate function related to QP, g_P, is modeled similarly to the one used for I-frames:

$$g_P(QP) = w\left(\frac{QP}{QP_{min}}\right)^{f_P} \qquad (5.7)$$

where f_P is a function of the temporal and spatial complexity (i.e., TA and SA); the six candidate functions based on TA and SA shown in Table 5.2 are considered for f_P.

To model h_P in (5.6), we need to investigate the effect of motion estimation on the bitrate. Using a wider variety of block-size candidates for motion estimation results in better prediction and consequently higher compression performance, at the expense of increased complexity [141]. There are seven block sizes defined for inter prediction in H.264/AVC. Depending on the block-size candidates used in the encoding process, motion estimation (ME) can be classified into different levels of complexity (called ML), as shown in Table 5.3 [141]. In this table, δML is defined as the ratio of the normalized complexity increase for each ML with respect to the case where ML is equal to one (see [141] for more details). According to the study in [141], h_P can be modeled using a logistic function as follows:
$$h_P(\delta_{ML}) = \frac{1}{1+\exp\!\big(f_p\,(w_2\,\delta_{ML}+w_3)\big)} \qquad (5.8)$$

where f_p denotes the effect of content complexity, which can be represented by the functions shown in Table 5.2, and w2 and w3 are model parameters obtained using least square regression.

5.1.3 Design of Coding Complexity Model

In a similar way to the bitrate modeling, the coding complexity is modeled as follows:

$$\bar{C} = \frac{n_I\,\tilde{C}_I + n_P\,\tilde{C}_P}{N},\qquad \tilde{C}_I = \bar{C}_I \cdot v_I(QP),\qquad \tilde{C}_P = \bar{C}_P \cdot u_P(\delta_{ML}) \cdot v_P(QP) \qquad (5.9)$$

where C̃_I and C̃_P are the average coding complexity of I-frames and P-frames, respectively, u(.) is a coding complexity function related to the ME process, and v(.) is a coding complexity function related to QP. C̄_I and C̄_P are the complexity functions related to the content complexity of I-frames and P-frames, respectively. C̄_I is estimated from the content complexity functions shown in Table 5.1, while the function v_I is modeled as:

$$v_I(QP) = w\left(\frac{QP}{QP_{min}}\right)^{f_I} \qquad (5.10)$$

Furthermore, C̄_P is estimated using the content complexity functions shown in Table 5.2, while the function v_P is modeled as:

$$v_P(QP) = w\left(\frac{QP}{QP_{min}}\right)^{f_P} \qquad (5.11)$$

Our previous work has shown that the coding complexity increases linearly with ML [141]. However, the coding complexity of P-frames also depends on the content complexity. Therefore, the coding complexity function related to the ME process, u_P, is modeled as follows:

$$u_P(\delta_{ML}) = 1 + \delta_{ML} \cdot f_P \qquad (5.12)$$

5.1.4 Estimation of the Model Parameters

In order to find the model parameters appearing in (5.1) to (5.12), we utilize a set of videos for training and use the least square regression technique to estimate the parameters. To this end, the training videos are encoded with different GOP sizes, ML values and QPs, as explained in the following subsections. In order to select the best content complexity functions for our models, we calculate the root mean square error (RMSE) and Pearson's correlation coefficient (PC); the function providing the lowest RMSE and highest PC is selected for integration into the model. In the training process, we take into account that some data points might be outliers with extreme/illegitimate values during model fitting. The presence of such points can lead to inflated error rates and substantial distortion of the parameter estimates [143]. There are different approaches to dealing with extreme data in the training sample, such as removal, transformation, truncation and robust methods [143]. In this study, we use one of the robust methods proposed in [144], replacing extreme data points with the next closest valid point in the dataset.

5.1.4.1 Video Sequences

To obtain representative data with different temporal and spatial complexity levels, we use the video dataset developed in Section 3.2.1. Our study requires selecting a representative set of videos with different spatial and temporal complexity (SA and TA) levels from the captured database for training the model. In this regard, the following scenes are used as the training set in this study:

1. scene office_act4, which represents a scene with low SA and low TA,
2. scene classroom_act2, which represents a scene with medium SA and medium TA, and
3. scene party_act1, which represents a scene with high SA and high TA.

Each of the above three scenes consists of nine videos; thus, 27 videos are used as the training set. The remaining 81 videos from the other scenes and activity levels are used as the test set for the models.

5.1.4.2 Video Encoding Parameters

Due to the limited energy and processing resources of the video surveillance network application, less complex encoder configurations are deployed in our study. For instance, the baseline profile of H.264/AVC, which only utilizes I and P frames (no B-frames) and is suitable for low-complexity applications, is used. The other encoding settings in this study include context-adaptive variable-length coding (CAVLC) entropy coding, one reference frame, and a search range (SR) equal to eight, while rate distortion optimization (RDO), rate control, and the deblocking filter are disabled. In addition, the QP of the P-frames is set equal to the QP of the I-frames minus one.
The H.264/AVC reference encoder software (JM 18.2) is used in our implementation. To measure the coding complexity, the instruction-level profiler iprof [133] is utilized. iprof reports the number of basic instruction counts (IC) needed to perform an encoding task and has been used to evaluate the performance of different ME algorithms in MPEG video coding standardization activities. The instruction count has been shown to be an objective and accurate measure of coding complexity [106], [133], [141].

5.1.4.3 Bitrate Model Parameters Estimation

If we assume that the GOP size is equal to one, n_P in (5.3) is equal to zero and thus R̄ = R̄_I · g_I(QP). Furthermore, if QP is set to 28, the value of g_I(QP) is close to one. Therefore, the average bitrate of the intra-coded frames, R̄_I, which is represented by one of the functions shown in Table 5.1, is approximately equal to the average frame bitrate of the video when GOP=1 and QP=28. To estimate the I-frame bitrate model, the two functions shown in Table 5.1 are used to estimate the average frame bitrate of the training videos encoded with GOP=1 and QP=28. The parameters of the functions are estimated using least square regression, and the RMSE and PC obtained from each function are calculated. The function with the lowest RMSE and highest PC is selected for the model. With this approach, we found that the function a1∙SA + a0 provides the best estimation, with RMSE=5.5769 and PC=0.9493, for model parameters a1=1.3557 and a0=-40.093. Therefore, R̄_I is modeled as:

$$\bar{R}_I = 1.3557\,SA - 40.093 \qquad (5.13)$$

Having the estimate for R̄_I, g_I(QP) can be estimated using (5.3), (5.4), and (5.5). To this end, we again assume that the GOP size is equal to one, so that R̄ = R̄_I · g_I(QP). Using the average frame bitrate of the training videos (compressed with GOP=1 and QP=28 to 40 with a step size of 2), we use (5.5) to estimate g_I(QP). The least square regression shows that the function a1∙log(SA) + a0 from Table 5.1 provides the best result, with RMSE=0.0476 and PC=0.9801, when the model parameters are w=1.0226, a1=-0.2451, and a0=-2.1131. Thus, g_I(QP) is formulated as follows:

$$g_I(QP) = 1.0226\left(\frac{QP}{QP_{min}}\right)^{-0.2451\log(SA)-2.1131} \qquad (5.14)$$

To estimate the model for R̄_P, we first assume ML=1 and QP=28, so that h_P(δ_ML) ≃ 1 and g_P(QP) ≃ 1, respectively. Then, using the estimated models for R̄_I and g_I(QP), R̄_P can be estimated based on (5.3) and (5.6). Here, the frame bitrate of the training videos (encoded with GOP={2, 4, 8, 16, 32, 64} and QP=28) is used to identify which function from Table 5.2 results in the highest PC and lowest RMSE. The least square regression shows that the best estimate for R̄_P is:

$$\bar{R}_P = 0.0223\,SA \cdot TA + 1.8088 \qquad (5.15)$$

resulting in the minimum RMSE of 2.9867 and the highest PC of 0.9606.

The function h_P(δ_ML) quantifies the effect of using different ML values on the bitrate. Assuming that QP=28, h_P(δ_ML) can be estimated using (5.3), (5.6), (5.8) and (5.15). To this end, the frame bitrate of the training videos (encoded with GOP={2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7}, QP=28) is used to identify which function from Table 5.2 results in the highest PC and lowest RMSE. Note that the value of R̃_I in (5.3) is calculated using (5.4), (5.13) and (5.14). Using least square regression, we obtain the lowest RMSE of 0.2251 and the highest PC of 0.6823 when the function a2∙log(SA) + a1∙log(TA) + a0 is used, with a2=0.012, a1=0.1539, and a0=0.5518.
Therefore:

$$h_P(\delta_{ML}) = \frac{1}{1+\exp\!\big(b\,(5.404\,\delta_{ML} - 0.0394)\big)},\qquad b = 0.012\log(SA) + 0.1539\log(TA) + 0.5518 \qquad (5.16)$$

Note that we have also tried different functional forms for h_P(δ_ML) and found that the logistic function provides the best estimate. Finally, g_P(QP) is estimated using (5.3) and (5.13) to (5.16). Here, the frame bitrate of the training videos (encoded with the following configuration: GOP={2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7}, QP={28, 30, 32, 34, 36, 38, 40}) is used. Using least square regression, we find that the function from Table 5.2 with the lowest RMSE and highest PC is a2∙SA + a1∙TA + a0, with the following model parameters: w=1.1, a2=0.0483, a1=-0.1492, and a0=-6.4872. This provides an estimation with RMSE=0.4704 and PC=0.6242. Thus, g_P(QP) is formulated as:

$$g_P(QP) = 1.1\left(\frac{QP}{QP_{min}}\right)^{0.0483\,SA - 0.1492\,TA - 6.4872} \qquad (5.17)$$

5.1.4.4 Coding Complexity Model Parameters Estimation

As in the bitrate model parameter estimation, if we assume that the GOP size is equal to one, n_P in (5.9) is equal to zero and thus C̄ = C̄_I · v_I(QP). Note that, when QP is set to 28, the value of v_I(QP) is close to one. Therefore, C̄_I can be estimated from the frame coding complexity of the training videos when GOP=1 and QP=28. Recall that C̄_I can be represented by either function shown in Table 5.1. Using least square regression, we found that the function a1∙SA + a0 provides the best estimate, with an RMSE of 0.6725 and a PC of 0.8542. The model parameters for this estimation are a1=0.0889 and a0=208.6018. Therefore, C̄_I is modeled as follows:

$$\bar{C}_I = 0.0889\,SA + 208.6018 \qquad (5.18)$$

With C̄_I modeled using (5.18), v_I(QP) can then be estimated from the frame coding complexity of the training videos compressed with GOP=1 and QP={28, 30, 32, 34, 36, 38, 40}. The least square regression shows that the function a1∙SA + a0 from Table 5.1 provides the best result, with RMSE=0.0031 and PC=0.9048. The model parameters obtained are w=0.9976, a1=-0.0011, and a0=0.0421. Therefore:

$$v_I(QP) = 0.9976\left(\frac{QP}{QP_{min}}\right)^{-0.0011\,SA + 0.0421} \qquad (5.19)$$

Now that C̄_I and v_I(QP) are formulated, we can estimate the coding complexity of a video using (5.9), provided that C̄_P, u_P(δ_ML) and v_P(QP) are known. Note that the value of δ_ML is treated as zero when ML is set to one (see Table 5.3); in this setting, the value of u_P(δ_ML) equals one (see (5.12)). Furthermore, by setting QP equal to 28, the value of v_P(QP) is also approximately one. Following this approach, C̄_P, which can be represented by the functions shown in Table 5.2, is estimated from the frame coding complexity of the training videos compressed with the following configuration: ML=1, QP=28 and GOP={2, 4, 8, 16, 32, 64}. Using least square regression, we found that the best function to represent C̄_P is a2∙SA + a1∙TA + a0. This function provided an estimation with RMSE=0.975 and PC=0.9741.
The model parameters are a2=0.0639, a1=0.8404, and a0=168.1378. Thus, C̄_P is:

$$\bar{C}_P = 0.0639\,SA + 0.8404\,TA + 168.1378 \qquad (5.20)$$

In order to find the model parameters for u_P(δ_ML), we use (5.18) to (5.20) and set QP equal to 28, which makes the value of v_P(QP) close to one. To this end, we use the frame coding complexity of the training videos encoded with the following configuration: GOP={2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7}, and QP=28. The best result, with RMSE=0.0151 and PC=0.9956, is obtained with the function a2∙SA + a1∙TA + a0 from Table 5.2. The model parameters for this function are a2=7.2515×10^-4, a1=0.0084, and a0=0.2936. Thus, u_P(δ_ML) is formulated as:

$$u_P(\delta_{ML}) = 1 + \delta_{ML}\,(7.2515\times 10^{-4}\,SA + 0.0084\,TA + 0.2936) \qquad (5.21)$$

Next, we need to find the model parameters for v_P(QP). This is done using equations (5.18) to (5.21) and the coding complexity data of the training videos with GOP={2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7} and QP={28, 30, 32, 34, 36, 38, 40}. Using least square regression, we found that the model parameters w=0.9913, a2=0.0011, a1=-0.0077, and a0=0.1335 for the function a2∙SA + a1∙TA + a0 provide the best estimation (RMSE=0.2728 and PC=0.76). Thus, v_P(QP) is modeled as:

$$v_P(QP) = 0.9913\left(\frac{QP}{QP_{min}}\right)^{0.0011\,SA - 0.0077\,TA + 0.1335} \qquad (5.22)$$

5.2 Experiments and Results

The performance of the proposed model is analyzed using the 81 test video sequences (the video sequences used for training are excluded). To obtain actual values for bitrate and coding complexity, the test video sequences are first encoded using different configurations: GOP={1, 2, 4, 8, 16, 32, 64}, ML={1, 2, 3, 4, 5, 6, 7} and QP={28, 30, 32, 34, 36, 38, 40}. Then, using the proposed models, the bitrate and coding complexity of the videos for each coding configuration are estimated. The estimated values are compared with the actual values using the RMSE and the mean absolute percentage error (MAPE). MAPE is formulated as follows [145]:

$$MAPE = \mathrm{mean}\left(\left|\frac{y - y'}{y}\right|\right)\times 100\% \qquad (5.23)$$

where y is the actual value and y' is the estimated one. Furthermore, the correlation between the estimated and the actual results is analyzed using Pearson's correlation coefficient and R-squared values. The R-squared value is calculated as:

$$R\text{-}squared = 1 - \frac{\sum (y-y')^2}{\sum \big(y-\mathrm{mean}(y)\big)^2} \qquad (5.24)$$

We also compared the results of the proposed bitrate model with those of an existing scheme, which, similar to ours, takes into account content complexity in addition to coding parameters. Note that for estimating the coding complexity, to the best of our knowledge, there is no existing model in the literature that, like ours, takes into account content complexity in addition to coding parameters. The closest model that can be related to our coding complexity model is the one proposed in [119], which measures coding complexity using the statistical information of compressed content. In other words, to obtain the required parameters for that model, the video content needs to be compressed first. Hence, that approach cannot be used on unknown videos or for real-time applications, and we have therefore excluded it from our comparison study. The following subsections elaborate on the performance evaluation results of our proposed models.
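For reference, a small Python sketch of the error metrics used throughout this section, per (5.23)-(5.24), is given below; the function names are ours.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, per (5.23)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y))) * 100.0

def r_squared(y, y_hat):
    """Coefficient of determination, per (5.24)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1.0 - np.sum((y - y_hat) ** 2)
                 / np.sum((y - np.mean(y)) ** 2))

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```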
5.2.1 Performance Evaluation of the Bitrate Model

To compare the bitrate values estimated by our model with the actual ones, the estimated values are multiplied by the sequences' frame rates, as our model estimates the average bitrate per frame (see (5.3)). Table 5.4 shows the bitrate estimation results for each scene. Note that each scene contains videos from all nine cameras. As observed, the performance of the bitrate model varies over the scenes. The results over all the test sequences confirm a high correlation between the estimated and the actual values (PC=0.984, R-squared=0.949) and a small estimation error (RMSE=38.582 kbps, MAPE=14.8%).

Table 5.4: Bitrate estimation results

Scene            PC     RMSE (kbps)  MAPE (%)  R-squared
office_act1      0.982  33.126       12.481    0.941
office_act2      0.989  24.155       16.254    0.964
office_act3      0.993  20.935       10.156    0.978
classroom_act1   0.983  35.596       13.120    0.949
classroom_act3   0.983  32.172       20.068    0.952
classroom_act4   0.986  31.849       20.026    0.958
party_act2       0.983  44.700       10.896    0.949
party_act3       0.986  54.229       16.332    0.920
party_act4       0.983  54.978       13.867    0.904
all sequences (81 videos)  0.984  38.582  14.800  0.949

Figure 5.1 shows the estimated and actual bitrate for some test videos (captured by different cameras in different scenarios) when the GOP size, ML and QP values change. As observed, there is a linear relationship between the estimated and actual bitrate values, with a negligible error in some cases.

Figure 5.1: Actual and estimated bitrate for varying GOP sizes, ML and QP values for the following videos: (a) camera1_office_act1, (b) camera4_office_act1, (c) camera5_office_act3, (d) camera2_classroom_act3, (e) camera6_party_act2, and (f) camera3_party_act4

To evaluate the performance of the model for a single video, our proposed model is objectively evaluated for camera5_office_act3 and camera3_party_act4, where the performance of the model is the best and the worst, respectively. As Table 5.5 shows, in the case of the camera5_office_act3 test sequence, the highest correlation between the estimated and actual bitrate values (PC=0.99954, R-squared=0.99741) is obtained and the estimation error is quite negligible (RMSE=4.9189 kbps, MAPE=3.8417%). For the camera3_party_act4 video, however, the results are not as good, with PC=0.98679, RMSE=86.345 kbps, MAPE=17.374% and R-squared=0.79784.

Table 5.5: Bitrate estimation results for selected videos

Video sequence        PC     RMSE (kbps)  MAPE (%)  R-squared
camera5_office_act3   0.999  4.919        3.842     0.997
camera3_party_act4    0.987  86.345       17.374    0.798

5.2.1.1 Comparison with an Existing Bitrate Estimation Model

To investigate the performance of the proposed bitrate model, the bitrates estimated by our model are compared with those estimated by the model presented in [115], which utilizes spatial and temporal information. Considering that the model in [115] uses a fixed GOP size equal to the frame rate, we used GOP=16 in our experiments, which is close to the frame rate of the test videos (15 fps). Furthermore, since the technique in [115] does not take into account the effect of the block-size candidates used for motion estimation, we assume that the encoder uses the configuration corresponding to ML=7. Since the GOP size, ML and frame rate are fixed, QP is the only varying parameter (i.e., QP={28, 30, 32, 34, 36, 38, 40}).
Table 5.6: Comparison with the existing model

Performance metric  Proposed bitrate model  Lotterman et al. [115]
PC                  0.966                   0.922
RMSE (kbps)         29.6453                 45.676
MAPE (%)            17.819                  19.201
R-squared           0.915                   0.6709

Table 5.6 compares the performance of the proposed model over the 81 test video sequences, in terms of PC, RMSE, MAPE, and R-squared, against the technique in [115]. Note that, for the technique proposed in [115], the model parameters are set according to the authors' suggestions and QPmin=24 (see [115] for more details). Table 5.6 confirms that our proposed model is superior to the existing technique in all performance measures.

5.2.2 Performance Evaluation of the Coding Complexity Model

To compare the complexity values estimated by our model with the actual ones, the estimate in (5.9) is multiplied by the frame rate in order to obtain the coding complexity per second. Table 5.7 shows the coding complexity estimation results for each scene. Note that each scene contains videos from all nine cameras. As observed, there is a high correlation between the estimated and actual coding complexity values (PC=0.996, R-squared=0.978) and the estimation error is low (RMSE=53.368 million instructions, MAPE=1.224%).

Table 5.7: Coding complexity estimation results

Scene            PC     RMSE (millions of instructions)  MAPE (%)  R-squared
office_act1      0.996  59.274                           1.282     0.971
office_act2      0.997  44.233                           1.090     0.983
office_act3      0.996  44.189                           1.115     0.983
classroom_act1   0.997  54.490                           1.223     0.976
classroom_act3   0.997  49.975                           1.176     0.978
classroom_act4   0.996  50.696                           1.225     0.978
party_act2       0.997  60.291                           1.292     0.978
party_act3       0.997  47.667                           1.147     0.983
party_act4       0.997  65.317                           1.467     0.969
all sequences (81 videos)  0.996  53.368  1.224  0.978

Figure 5.2 shows the estimated and actual coding complexity for some test videos (captured by different cameras in different scenes) when the GOP size, ML and QP values change. As observed, there is a linear relationship between the estimated and actual coding complexity values, with a negligible error in some cases.

Figure 5.2: Actual and estimated coding complexity for varying GOP sizes, ML and QP values for the following videos: (a) camera1_office_act1, (b) camera4_office_act1, (c) camera5_office_act3, (d) camera2_classroom_act3, (e) camera6_party_act2, and (f) camera3_party_act4

To evaluate the performance of the model for a single video, our proposed model is objectively evaluated for camera4_office_act1 and camera1_office_act1, where the performance of the model is the best and the worst, respectively. As Table 5.8 shows, in the case of the camera4_office_act1 test sequence, the highest correlation between the estimated and actual coding complexity values (PC=0.99858, R-squared=0.99086) is obtained and the estimation error is quite negligible (RMSE=33.344, MAPE=0.84909%). For the camera1_office_act1 video, however, the results are not as good, with PC=0.99722, RMSE=87.27, MAPE=1.8634% and R-squared=0.93303.

Table 5.8: Coding complexity estimation results for selected videos

Video sequence       PC     RMSE (millions of instructions)  MAPE (%)  R-squared
camera4_office_act1  0.999  33.344                           0.849     0.991
camera1_office_act1  0.997  87.270                           1.863     0.939

5.3 Power Consumption Estimation using the Proposed Models

In order to estimate the video surveillance network's power consumption, we consider the network layout shown in Figure 5.3. For the experiments, the network is modeled as a graph G(V,L), where V is the set of nodes and L is the set of links. It is assumed that a standard medium access control (MAC) protocol is applied to resolve the link interference problem.
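The per-node bitrate R̄_i and coding complexity C̄_i that drive the power estimation below can be obtained by assembling the fitted components of Section 5.1.4. The following Python sketch does so under our own assumptions: the logs are taken as natural logs, the logistic term follows our reconstruction of (5.16), and the function names are ours, not the thesis implementation.

```python
import math

QP_MIN = 28

def delta_ml(ML):
    """Linearized delta_ML used later in the optimization of Section 5.3."""
    return 0.1704 * ML - 0.1943

def avg_frame_bitrate(SA, TA, GOP, ML, QP):
    """Average frame bitrate from (5.3)-(5.8) with the fits (5.13)-(5.17)."""
    q = QP / QP_MIN
    R_I = (1.3557 * SA - 40.093) \
        * 1.0226 * q ** (-0.2451 * math.log(SA) - 2.1131)
    b = 0.012 * math.log(SA) + 0.1539 * math.log(TA) + 0.5518
    h_P = 1.0 / (1.0 + math.exp(b * (5.404 * delta_ml(ML) - 0.0394)))
    g_P = 1.1 * q ** (0.0483 * SA - 0.1492 * TA - 6.4872)
    R_P = (0.0223 * SA * TA + 1.8088) * h_P * g_P
    return (R_I + (GOP - 1) * R_P) / GOP          # (5.3)

def avg_frame_complexity(SA, TA, GOP, ML, QP):
    """Average frame coding complexity from (5.9)-(5.12) with (5.18)-(5.22)."""
    q = QP / QP_MIN
    C_I = (0.0889 * SA + 208.6018) \
        * 0.9976 * q ** (-0.0011 * SA + 0.0421)
    u_P = 1 + delta_ml(ML) * (7.2515e-4 * SA + 0.0084 * TA + 0.2936)
    v_P = 0.9913 * q ** (0.0011 * SA - 0.0077 * TA + 0.1335)
    C_P = (0.0639 * SA + 0.8404 * TA + 168.1378) * u_P * v_P
    return (C_I + (GOP - 1) * C_P) / GOP          # (5.9)
```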
Let i and j be two nodes in the network. Node i can communicate with node j if a link between those nodes exists (L_ij ∈ L); we assume that the link L_ij exists if i and j are direct neighbors. Node i can capture and encode video, and then generate video traffic with bitrate R̄_i; note that R̄_i is estimated using our proposed model. Each node can also relay the traffic from its upstream nodes. The flow conservation law at each node is then:

$$\sum_j r_{ij} - \sum_k r_{ki} = \bar{R}_i \cdot F_r \qquad (5.25)$$

where Fr is the video frame rate, r_ij denotes the outgoing bitrate on L_ij, r_ki is the incoming bitrate on L_ki, and L_ij, L_ki ∈ L. The sum of the transmission rates of all the nodes is constrained by the available bandwidth (B):

$$F_r \sum_i \bar{R}_i \le B \qquad (5.26)$$

As reported in [140], the network's power consumption is reduced if the nodes that are further from the sink are allocated a smaller bitrate than the nodes that are closer to the sink. However, due to differences in scene complexity, the video captured by node i may have a lower bitrate than that of a node j located further from the sink than node i. Moreover, all data transmissions are assumed to go through the first node. Therefore, we assume that the node closest to the sink has a higher bitrate than the other nodes:

$$\bar{R}_i \le \bar{R}_1 \qquad (5.27)$$

The general model suggested in [127] for the energy consumption of a wireless communication transmitter and receiver is used in this study. Thus, the total transmission power consumption of node i is equal to the sum of the power consumed to transmit data to all other nodes within its transmission range:

$$P_{ti} = \sum_j (\alpha + \beta\,d_{ij}^{\gamma})\,r_{ij} \qquad (5.28)$$

where P_ti is the transmission power consumption of node i, α and β are constant coefficients, γ is the path loss exponent, and d_ij is the distance between node i and node j.

Figure 5.3: Video surveillance network layout

The total reception power consumption of node i is equal to the sum of the power consumed to receive data from other nodes, as formulated below, where ρ is a constant coefficient:

$$P_{ri} = \rho \sum_k r_{ki} \qquad (5.29)$$

To measure the required energy for encoding at each node, we multiply the total number of cycles required for encoding by the average energy depleted per cycle. The total number of cycles needed to execute the encoding can be estimated as the total instruction count multiplied by a parameter called cycles per instruction (CPI). The average encoding power consumption is then estimated as [141]:

$$P_{ei} = \bar{C}_i \cdot F_r \cdot CPI \cdot E_c \qquad (5.30)$$

where C̄_i is the coding complexity of node i estimated using the proposed model, Fr is the frame rate, CPI is the average number of cycles per instruction of the CPU, and Ec is the energy depleted per cycle. Furthermore, the total energy dissipation at a sensor node consists of the encoding power consumption (Pe), the transmission power consumption (Pt) and the reception power consumption (Pr):

$$P_i = P_{ei} + P_{ti} + P_{ri} \qquad (5.31)$$

The objective is to minimize the maximum energy consumption among all nodes, i.e., to minimize P_net, where P_i ≤ P_net, ∀i ∈ V. This optimization problem is shown in Figure 5.4. Our goal is to find the encoding parameter settings, i.e., GOP and ML, for each node that minimize the video surveillance network's power consumption.

Figure 5.4: Video surveillance network power consumption minimization
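A minimal Python sketch of the per-node power evaluation (5.28)-(5.31) used inside this optimization follows; the data layout and names are ours, and the Table 5.9 coefficients (quoted per Mb) are converted to per-bit units.

```python
def node_total_power(C_i, Fr, out_flows, in_flows,
                     CPI=1.78, Ec=1.215e-9,
                     alpha=0.5e-6, beta=1.3e-14, rho=0.5e-6, gamma=4):
    """Total power of one node, per (5.28)-(5.31).

    C_i       -- average per-frame coding complexity (instructions)
    out_flows -- list of (distance_m, rate_bps) pairs for outgoing links
    in_flows  -- list of incoming rates in bps
    Defaults: Table 5.9 values with 0.5 J/Mb -> 0.5e-6 J/b and
    1.3e-8 J/Mb/m^4 -> 1.3e-14 J/b/m^4.
    """
    P_e = C_i * Fr * CPI * Ec                                         # (5.30)
    P_t = sum((alpha + beta * d ** gamma) * r for d, r in out_flows)  # (5.28)
    P_r = rho * sum(in_flows)                                         # (5.29)
    return P_e + P_t + P_r                                            # (5.31)

# The network-level objective of Figure 5.4 is then the min-max problem
#   minimize  max_i  node_total_power(i)
# over the per-node GOP and ML settings, subject to flow conservation
# (5.25), the bandwidth constraint (5.26) and the ordering constraint (5.27).
```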
Note that we assume that all nodes in the surveillance network use the same QP; however, different scenes may need to use different QP values. The minimum QP value that can accommodate the network bandwidth for the scene is used. As Figure 5.4 shows, the encoding power consumption of a node i ∈ V is calculated from the coding complexity at node i, which is itself estimated using the coding complexity model described in the previous sections. In order to estimate the coding complexity of node i, the spatial and temporal information of the video captured at the node, i.e., SA_i and TA_i, as well as the GOP size and ML value variables, are needed. The node's transmission power consumption, on the other hand, depends on the bitrate of the encoded video. As with the coding complexity, the spatial and temporal information of the video and the GOP size and ML value of node i are used to estimate the node's bitrate. Note that, instead of the actual ML value, we use δ_ML when allocating the encoder configuration at each node in the surveillance network. Therefore, in the optimization procedure shown in Figure 5.4, an additional equation is used to calculate δ_ML, i.e., δ_ML = 0.1704·ML − 0.1943. At the beginning of the optimization procedure, each node is assigned an initial GOP value of 16, while ML is set equal to seven. Since the GOP and ML values obtained from the optimization procedure are floating-point numbers, they are rounded to the nearest integers, and the optimization algorithm is then re-run with the GOP and ML values treated as constants, i.e., no longer as variables. If the result obtained from the rounded values violates the bandwidth constraint, we instead apply the ceiling function to the GOP and ML values obtained from the first run of the optimization, followed by a re-run of the optimization with the newly ceiled GOP and ML constants. Table 5.9 shows the parameters used in our experiments.

Table 5.9: Parameters used

Parameter  Description                             Value
α          Energy cost for transmitting 1 bit      0.5 J/Mb
β          Transmit amplifier coefficient          1.3e-8 J/Mb/m^4
ρ          Energy cost for receiving 1 bit         0.5 J/Mb
γ          Path loss exponent                      4
CPI        XScale average cycles per instruction   1.78
Ec         Energy depleted per cycle               1.215 nJ
B          Network bandwidth                       2 Mbps
d          Distance between nodes                  5m

Table 5.10 shows the GOP and ML values of each node for office_act2, classroom_act3 and party_act4, obtained from the optimization procedure.

Table 5.10: GOP and ML values obtained for the three selected scenes

Node  office_act2 (GOP, ML)  classroom_act3 (GOP, ML)  party_act4 (GOP, ML)
1     13, 1                  9, 1                      34, 1
2     22, 1                  40, 3                     64, 1
3     10, 1                  35, 3                     64, 1
4     22, 1                  29, 2                     64, 1
5     34, 1                  64, 3                     64, 1
6     11, 1                  64, 3                     64, 1
7     26, 1                  39, 3                     64, 1
8     23, 1                  36, 3                     64, 1
9     35, 2                  64, 3                     64, 1

In terms of video quality, each video node is assumed to use the same QP; however, the QP used for each scene may differ due to bandwidth restrictions. In this regard, our experiments yield the following QPs for the three scenes analyzed, i.e., office_act2, classroom_act3 and party_act4: 29, 30 and 37, respectively. Furthermore, Figure 5.6 shows the nodes' power consumption profiles for the three scenes analyzed. It can be seen from this figure that the nodes' maximum power consumption differs between the three scenes, because the content captured by each camera node in each scene differs. Note that the communication power consumption shown in Figure 5.6 includes the transmission and reception power consumption.

Figure 5.5: Nodes' video PSNR for the three test scenes analyzed

Figure 5.6: Nodes' power consumption profiles for the (a) office_act2, (b) classroom_act3 and (c) party_act4 scenes
It can be seen from the figure that the nodes' maximum power consumption (Pnet) in the office_act2 scene is 7.55 W, while the Pnet for the classroom_act3 and party_act4 scenes is 7.14 W and 7.36 W, respectively. As observed from the figure, the encoding power consumption of each node in each scene differs.

The video quality of each node, encoded with the above-mentioned QPs and the encoding parameters shown in Table 5.10, is shown in terms of PSNR in Figure 5.5. It can be seen that the PSNR of the encoded video differs from node to node within the same scene. This is due to the fact that the videos captured by the nodes differ, as do the GOP and ML values used to encode them. However, the PSNR difference between these videos is not significant, except for the party_act4 scene. This is because of the variation of SA and TA in that particular scene, where the videos captured by nodes 4, 5 and 6 have lower SA and TA values than the other videos captured in the same scene setting.

Table 5.11 shows the Pnet of each test scene obtained by 1) the proposed optimization approach utilizing the coding complexity and bitrate models (Proposed - estimated), 2) the proposed optimization approach using the measured coding complexity and bitrate of the videos (Proposed - actual), and 3) the technique proposed in [140].

Table 5.11: Nodes' maximum power consumption (W)

Scene            Proposed - estimated  Proposed - actual  Technique in [140]
office_act1      7.548                 7.620              8.772
office_act2      7.553                 7.397              8.341
office_act3      7.239                 7.242              8.404
classroom_act1   7.739                 7.808              8.794
classroom_act3   7.140                 6.876              8.232
classroom_act4   7.257                 7.070              8.412
party_act2       7.910                 7.550              8.841
party_act3       7.377                 7.117              8.757
party_act4       7.359                 7.275              8.924

We observe that the proposed algorithm manages to reduce the video surveillance network power consumption obtained by the technique proposed in [140] by 1.15 W on average. In addition, the table shows that the results obtained by the proposed algorithm utilizing the models described in Section 5.1 are very close to the results obtained using the measured coding complexity and bitrate data; the error ranges between 0.002 W and 0.36 W. Overall, the proposed method manages to estimate the video surveillance network's power consumption within a marginal error of RMSE=0.195 and MAPE=2.23%.

5.4 Summary

In this chapter, we proposed models for estimating the coding complexity and bitrate of H.264/AVC-based video surveillance applications. The proposed models incorporate the content's spatial and temporal complexity as well as the encoder parameter settings that significantly affect compression performance. For the purpose of the analysis, a large number of real-life videos with diverse spatial and temporal information were captured. Some of the captured videos were used as the training set, while the remaining videos were used to test our models. It was shown that the proposed models are able to estimate the coding complexity and bitrate with high accuracy. It was also shown that the proposed models outperform existing techniques in all performance measures, i.e., Pearson's correlation coefficient, root mean square error, mean absolute percentage error, and R-squared values.

We also proposed an optimization procedure utilizing the proposed models.
The objective is to find the nodes' encoding parameter settings that minimize the video surveillance network's power consumption, while considering 1) that the videos captured by each node have different spatial and temporal complexity, and 2) that the nodes are located at different distances from the sink and thus may need to relay information from other nodes. Overall, the proposed method manages to estimate the video surveillance network's power consumption within a marginal error of RMSE=0.195 and MAPE=2.23%.

Chapter 6: Conclusions and Future Work

6.1 Summary of Thesis Contributions

This thesis studies the issues related to the trade-off between video encoding complexity and communication energy consumption for an energy-efficient VSN for surveillance applications. The main goal of the thesis is to investigate the factors affecting a video surveillance network's power consumption and to propose a methodology that exploits the trade-off between these factors in order to minimize the VSN's overall power consumption. The methods proposed in this thesis incorporate various aspects of video encoding and of the video surveillance network's deployment settings to address this challenge. These include the video encoding parameters used at each node, the spatial and temporal complexity of the content, and the network topology. The performance of the proposed methods has been verified through large-scale experiments using a dataset comprised of real-life captured content.

We started our investigation by proposing a practical implementation of the existing power-rate-distortion (PRD) model, such that the nodes' bitrate in the video surveillance network is controlled using a variable frame rate. This was achieved by assuming that the video surveillance network's deployment area can be divided into several zones, where the encoding configuration of the nodes in each zone is related to the video frame rate. This allowed us to regulate the overall power consumption by controlling the video frame rate. It was shown that, in terms of system lifetime, the proposed approach is better than the common single-configuration setting, but inferior to the optimal PRD-based solution reported in [63]. However, controlling the bitrate alone is not enough, since there are other encoding parameters that provide a better trade-off between bitrate control and video quality.

For this purpose, we investigated the effect of various encoder parameters on coding complexity and bitrate in Chapter 3. The objective was to determine the encoding parameters that have the highest impact on the coding complexity and bitrate of the video, as these determine the encoding and communication power consumption of a video surveillance network. We showed that the GOP size significantly affects coding complexity and bitrate. In addition, we showed that tuples containing the encoding configuration settings, coding complexity and bitrate of the videos, stored in the form of a lookup table, can be used to exploit the trade-off between encoding and communication power consumption.

Furthermore, different nodes in a video surveillance network may capture a scene from different angles, such that each view is unique. Video content affects coding complexity and bitrate, and thus it needs to be taken into consideration when minimizing the video surveillance network's power consumption. For that purpose, we needed a video surveillance dataset with a wide range of content complexity.
Unfortunately, such a video surveillance dataset was not available. In order to consider the effect of content, we generated a video surveillance dataset containing a diverse set of content. The dataset was generated by installing nine HD cameras in one of our labs to capture simulated events with varying degrees of spatial detail and motion activity. The cameras were arranged such that their points of view differ. This arrangement provided us with a video dataset consisting of a large number of unique videos covering a wide variety of content complexity. The dataset was made available for public use to facilitate further research in this area [121].

In our first attempt to account for the effect of content complexity on power consumption minimization, we proposed a classification technique to divide the videos into training and test sets. Furthermore, we proposed a look-up table denoting the different encoding configuration IDs that can be used by each node in the video surveillance network. This look-up table was then used in the proposed fairness-based power consumption minimization for each scene setting. Although the look-up table approach showed good performance, it was neither practical nor efficient for a real-life implementation of video surveillance networks. In this regard, we believed that models that estimate the coding complexity and bitrate of videos would perform better than the look-up table approach.

In Chapter 4, models to estimate the coding complexity and bitrate were proposed. These models considered the effect on coding complexity and bitrate of the GOP size and of the type and variety of block-size candidates used in motion estimation. In Chapter 4, the effect of content complexity was represented using data from encoding the first two frames of the videos of each scene. In order to improve the estimation accuracy, in Chapter 5 we improved our models by incorporating the effect of the quantization parameter and of the content complexity in terms of spatial and temporal information. The models were trained and evaluated using the dataset created in Chapter 3. A comprehensive examination of our model using test videos and different error measurement metrics was performed. The comparison of the proposed model with an existing state-of-the-art technique reconfirmed the superiority of our approach. Finally, we combined the proposed models, which consider encoding parameters and content complexity, with the network layout and the nodes' positions in the network in an optimization procedure to minimize the overall power consumption. The objective was to find the nodes' encoding configurations that lead to minimal power consumption. The results obtained from the proposed optimization technique were compared with the work described in Chapter 3. It was shown that, compared to existing techniques, the proposed optimization technique managed to significantly reduce the video surveillance network's power consumption.

6.2 Significance and Potential Applications of the Research

The research presented in this thesis aims at determining the encoder parameter settings for each video surveillance network node that lead to minimum power consumption. Our findings in Chapter 3 are helpful in defining the encoder parameter settings that significantly affect the coding complexity and bitrate of the video.
The same chapter also highlighted the fact that the spatial and temporal complexity of content has a direct impact on coding efficiency and bitrate, something that has unfortunately been neglected by existing works in the video surveillance network field. An approach to classifying videos and events based on content complexity was suggested in Chapter 3, along with a simple look-up table method for minimizing video surveillance networks' power consumption.

The models proposed in Chapter 4 and refined in Chapter 5 have various potential applications, especially for devices or infrastructures with limited resources. Such applications include wireless surveillance, ad-hoc video monitoring, habitat monitoring, video-based Internet of Things, wearable MPEG [146], green MPEG [147] and any technology that allows the proliferation of video applications on pervasive computing platforms. All of the above-mentioned applications and technologies could benefit from the findings investigated in this thesis.

6.3 Future Work

6.3.1 The Use of HEVC

HEVC is the latest MPEG/ITU-T video coding standard, offering increased video compression performance at the cost of almost double the coding complexity of H.264/AVC [148]. Although HEVC is currently not widely used in low-power devices, its future adoption is inevitable. There have been some efforts to analyze HEVC coding complexity and bitrate performance in the literature; unfortunately, their focus has mostly been on the decoding end [149]. Note that HEVC comes with plenty of coding tools and modes that were not available in H.264/AVC. However, the same approach described in Chapter 3 can be followed and extended to apply to the HEVC coding standard in the future.

6.3.2 Event-triggered Content-Based Video Surveillance Network

The model developed in this thesis assumes that all the nodes in the video surveillance network encode and transmit videos of the captured event. This is done in order to simulate the video surveillance network under a heavy load, which corresponds to maximum power consumption and minimum system lifetime. Nevertheless, in order to prolong the video surveillance network's system lifetime, we can utilize the content information captured by the video nodes to regulate the video encoding process in the whole system.
By considering the event’s content complexity, the nodes’ SI and TI values, and the nodes’ remaining energy resources, we can formulate a method to find the optimal video capturing process for each event.

6.3.3 Content-Based Frame Rate Adjustment

In Chapter 2, the effect of the frame rate on video surveillance networks’ power consumption was investigated. The effect of content complexity for videos with the same frame rate, on the other hand, was investigated in Chapters 3, 4 and 5. We can extend the work in this thesis by incorporating both the video frame rate and the content complexity into our model. For example, instead of assigning each video node the same frame rate, we can set the frame rate at each node based on the node’s SI or TI values. It is worth noting that a low video frame rate results in video jerkiness. Nevertheless, the energy saved by using a lower frame rate is significant. Therefore, a study to find the correlation between perceived video quality, frame rate and content complexity needs to be performed.

6.3.4 Cloud-Based Video Surveillance Network

We have shown that the video encoding process at a video node contributes a significant portion of the overall power consumption. Furthermore, it was also shown that it is more beneficial in terms of power consumption to use a simple encoding approach, i.e., intra-only coding, for short-distance communication. However, the use of intra-only coding entails high bandwidth or storage requirements. Therefore, there is a need to transcode the intra-only coded video into another format that is more efficient for storage and later distribution. This can be done by establishing a cloud-based video surveillance network, where the sink is a server placed at the edge of a cloud platform. Assuming that the coming UWB standard removes bandwidth availability issues for a video surveillance sensor network, we can shift the burden of the video encoding process from the nodes to the sink or the cloud. The cloud would transcode the video received from the sensor network nodes from intra-coded into another video format for efficient distribution. The cloud can also perform further processing, such as video analysis and feature extraction, to obtain much more detailed information from the captured event.

References

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A Survey on Sensor Networks,” IEEE Commun. Mag., vol. 40, no. 8, pp. 102–114, Aug. 2002.
[2] N. Xu, S. Rangwala, K. K. Chintalapudi, D. Ganesan, A. Broad, R. Govindan, and D. Estrin, “A Wireless Sensor Network for Structural Monitoring,” in 2nd international conference on Embedded networked sensor systems, New York, 2004.
[3] T. Naumowicz, R. Freeman, H. Kirk, B. Dean, M. Calsyn, A. Liers, A. Braendle, T. Guilford, and J. Schiller, “Wireless Sensor Network for Habitat Monitoring on Skomer Island,” in 2010 IEEE 35th Conference on Local Computer Networks (LCN), 2010, pp. 882–889.
[4] V. Shnayder, Chen B.-r., K. Lorincz, T. R. F. F. Jones, and M. Welsh, “Sensor Networks for Medical Care,” in Proceedings of the 3rd international conference on Embedded networked sensor systems (SenSys 05), New York, 2005, pp. 314–314.
[5] Hoo-Rock Lee, Kyung-Yul Chung, and Kyoung-Son Jhang, “A Study of Wireless Sensor Network Routing Protocols for Maintenance Access Hatch Condition Surveillance,” J. Inf. Process. Syst., vol. 9, no. 2, pp. 237–246, 2013.
[6] Zhen-Jiang Zhang, Jun-Song Fu, Hua-Pei Chiang, and Yueh-Min Huang, “A Novel Mechanism for Fire Detection in Subway Transportation Systems Based on Wireless Sensor Networks,” Int. J. Distrib. Sens. Netw., vol. 2013, 2013.
[7] G. Werner-Allen, J. Johnson, M. Ruiz, J. Lees, and M. Welsh, “Monitoring Volcanic Eruptions with a Wireless Sensor Network,” in Proceedings of the Second European Workshop on Wireless Sensor Networks (EWSN 2005), 2005, pp. 108–120.
[8] T. Miyazaki, R. Kawano, Y. Endo, and D. Shitara, “A Sensor Network for Surveillance of Disaster-hit Region,” in Proceedings of the 4th international conference on Wireless pervasive computing, 2009, pp. 53–58.
[9] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, “A Survey on Wireless Multimedia Sensor Networks,” Comput. Netw. Int. J. Comput. Telecommun. Netw., vol. 51, no. 4, pp. 921–960, Mar. 2007.
[10] I. T. Almalkawi, M. G. Zapata, J. N. Al-Karaki, and J. Morillo-Pozo, “Wireless Multimedia Sensor Networks: Current Trends and Future Directions,” Sensors, vol. 10, no. 7, pp. 6662–6717, Jul. 2010.
[11] C.-F. Chiasserini and E. Magli, “Energy Consumption and Image Quality in Wireless Video-Surveillance Networks,” in Personal, Indoor and Mobile Radio Communications, 2002. The 13th IEEE International Symposium on, 2002, vol. 5, pp. 2357–2361.
[12] K. Obraczka, R. Manduchi, and J. J. Garcia-Luna-Aveces, “Managing the Information Flow in Visual Sensor Networks,” in Wireless Personal Multimedia Communications, 2002. The 5th International Symposium on, 2002, vol. 3, pp. 1177–1181.
[13] S. Soro and W. Heinzelman, “A Survey of Visual Sensor Networks,” Adv. Multimed., vol. 2009, pp. 1–21, 2009.
[14] Z. Shuai, S. Oh, and M.-H. Yang, “Traffic Modeling and Prediction Using Camera Sensor Networks,” in Proceedings of the Fourth ACM/IEEE International Conference on Distributed Smart Cameras, 2010, pp. 49–56.
[15] Xiaoguang Niu, Ying Zhu, QingQing Cao, Xining Zhang, Wei Xie, and Kun Zheng, “An Online-Traffic-Prediction Based Route Finding Mechanism for Smart City,” Int. J. Distrib. Sens. Netw., vol. 2015, 2015.
[16] S. C. Mukhopadhyay, A. Gaddam, and G. S. Gupta, “Wireless Sensors for Home Monitoring - A Review,” Recent Pat. Electr. Eng., vol. 1, no. 1, pp. 32–39, 2010.
[17] Jae Mun Sim, Yonnim Lee, and Ohbyung Kwon, “Acoustic Sensor Based Recognition of Human Activity in Everyday Life for Smart Home Services,” Int. J. Distrib. Sens. Netw., vol. 2015, 2015.
[18] J. C. Yang, “Vision Based Fire/Flood Alarm Surveillance System via Robust Detection Strategy,” in Instrumentation and Measurement Technology Conference Proceedings, 2008. IMTC 2008. IEEE, Victoria, BC, 2008, pp. 1085–1090.
[19] Xufeng Wei, Yahui Wang, and Yanliang Dong, “Design of Fire Detection System in Buildings Based on Wireless Multimedia Sensor Networks,” in Intelligent Control and Automation (WCICA), 2014 11th World Congress on, 2014, pp. 3008–3012.
[20] Panti, Berner, Monteiro, Pedro, Pereira, Fernando, and Ascenso, Joao, “Descriptor-based Adaptive Tracking-by-Detection for Visual Sensor Networks,” in Multimedia Expo Workshops (ICMEW), 2015 IEEE International Conference on, 2015, pp. 1–6.
[21] Civelek, M., Yazici, A., Yilmazer, C., and Korkut, F.O., “Feature Extraction and Object Classification for Target Identification at Wireless Multimedia Sensor Networks,” in Signal Processing and Communications Applications Conference (SIU), 2014 22nd, 2014, pp. 778–781.
[22] Dan, G., Khan, M.A., and Fodor, V., “Characterization of SURF and BRISK Interest Point Distribution for Distributed Feature Extraction in Visual Sensor Networks,” Multimed. IEEE Trans. On, vol. 17, no. 5, pp. 591–602, May 2015.
[23] Shun-Hsing Ou, Chia-Han Lee, Somayazulu, V.S., Yen-Kuang Chen, and Shao-Yi Chien, “On-Line Multi-View Video Summarization for Wireless Video Sensor Network,” Sel. Top. Signal Process. IEEE J. Of, vol. 9, no. 1, pp. 165–179, Feb. 2015.
[24] E. Eriksson, G. Dán, and V. Fodor, “Predictive Distributed Visual Analysis for Video in Wireless Sensor Networks,” IEEE Trans. Mob. Comput., vol. 15, no. 7, pp. 1743–1756, Jul. 2016.
[25] Juan C Augusto, Vic Callaghan, Diane Cook, Achilles Kameas, and Ichiro Satoh, “Intelligent Environments: A Manifesto,” Hum.-Centric Comput. Inf. Sci., vol. 3, no. 12, Jun. 2013.
[26] HyungWon Kim, “Low Power Routing and Channel Allocation Method of Wireless Video Sensor Networks for Internet of Things (IoT),” in Internet of Things (WF-IoT), 2014 IEEE World Forum on, 2014, pp. 446–451.
[27] P.-Y. Chen, W.-S. Lee, and C.-F. Huang, “Design and Implementation of a Real Time Video Surveillance System with Wireless Sensor Networks,” in Vehicular Technology Conference, 2008. VTC Spring 2008. IEEE, Singapore, 2008, pp. 218–222.
[28] C. Pham, “Low Cost Wireless Image Sensor Networks for Visual Surveillance and Intrusion Detection Applications,” in Networking, Sensing and Control (ICNSC), 2015 IEEE 12th International Conference on, 2015, pp. 376–381.
[29] Petri Mähönen, “Wireless Video Surveillance: System Concepts,” in Image Analysis and Processing, 1999. Proceedings. International Conference on, Venice, 1999, pp. 1090–1095.
[30] M. Valera and S. A. Velastin, “Intelligent Distributed Surveillance Systems: A Review,” IEE Proceedings - Vis. Image Signal Process., vol. 152, no. 2, pp. 192–204, Apr. 2005.
[31] T. D. Räty, “Survey on Contemporary Remote Surveillance Systems for Public Safety,” Syst. Man Cybern. Part C Appl. Rev. IEEE Trans. On, vol. 40, no. 5, pp. 493–515, Sep. 2010.
[32] R. Holman, J. Stanley, and T. Ozkan-Haller, “Applying Video Sensor Networks to Nearshore Environment Monitoring,” Pervasive Comput. IEEE, vol. 2, no. 4, pp. 14–21, Oct. 2003.
[33] M. Rahimi, R. Baer, O. I. Iroezi, J. C. Garcia, J. Warrior, D. Estrin, and M. Srivastava, “Cyclops: In Situ Image Sensing and Interpretation in Wireless Networks,” in Proceedings of the 3rd international conference on Embedded networked sensor systems, San Diego, CA, 2005, pp. 192–204.
[34] “Intel® PXA255 Processor Electrical, Mechanical, and Thermal Specification Datasheet.” Intel Corporation.
[35] W.-C. Feng, E. Kaiser, W.-C. Feng, and M. L. Bailif, “Panoptes: Scalable Low-power Video Sensor Networking Technologies,” ACM Trans. Multimed. Comput. Commun. Appl. TOMCCAP, vol. 1, no. 2, pp. 151–167, May 2005.
[36] C. B. Margi, X. Lu, G. Zhang, G. Stanek, R. Manduchi, and K. Obraczka, “Meerkats: A Power-Aware, Self-Managing Wireless Camera Network for Wide Area Monitoring,” in Distributed Smart Cameras Workshop - SenSys06, 2006.
[37] “Imote2 Hardware Reference Manual.” [Online]. Available: http://web.univ-pau.fr/~cpham/ENSEIGNEMENT/PAU-UPPA/RESA-M2/DOC/Imote2_Hardware_Reference_Manual.pdf. [Accessed: 07-Dec-2012].
[38] Anthony Rowe, Adam Goode, Dhiraj Goel, and Illah Nourbakhsh, “CMUcam3: An Open Programmable Embedded Vision Sensor,” Carnegie Mellon University, Robotics Institute, CMU Technical Report CMU-RI-TR-07-13, May 2007.
[39] Dimitrios Lymberopoulos and Andreas Savvides, “XYZ: A Motion-enabled, Power Aware Sensor Node Platform for Distributed Sensor Network Applications,” in Proceedings of the 4th international symposium on Information processing in sensor networks (IPSN ’05), 2005, pp. 449–454.
[40] P. Chen, P. Ahammad, C. Boyer, Shih-I Huang, Leon Lin, E. Lobaton, M. Meingast, Songhwai Oh, S. Wang, Posu Yan, A.Y. Yang, Chuohao Yeo, Lung-Chung Chang, J.D. Tygar, and S.S. Sastry, “CITRIC: A Low-bandwidth Wireless Camera Network Platform,” in Distributed Smart Cameras, 2008. ICDSC 2008. Second ACM/IEEE International Conference on, 2008, pp. 1–10.
[41] A. Kandhalu, A. Rowe, and R. Rajkumar, “DSPcam: A Camera Sensor System for Surveillance Networks,” in Distributed Smart Cameras, 2009. ICDSC 2009. Third ACM/IEEE International Conference on, Como, 2009, pp. 1–7.
[42] X. Ren and Z. Yang, “Research on the Key Issue in Video Sensor Network,” in Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, Chengdu, 2010, vol. 7, pp. 423–426.
[43] M. Al-Nuaimi, F. Sallabi, and K. Shuaib, “A Survey of Wireless Multimedia Sensor Networks: Challenges and Solutions,” in Innovations in Information Technology (IIT), 2011 International Conference on, Abu Dhabi, 2011, pp. 191–196.
[44] S. Soro and W. B. Heinzelman, “On the Coverage Problem in Video-based Wireless Sensor Networks,” in Broadband Networks, 2005. BroadNets 2005. 2nd International Conference on, Boston, MA, 2005, vol. 2, pp. 932–939.
[45] N. Tezcan and W. Wang, “Self-Orienting Wireless Multimedia Sensor Networks for Maximizing Multimedia Coverage,” in IEEE International Conference on Communications (ICC’08), Beijing, China, 2008, pp. 2206–2210.
[46] A. Makhoul, R. Saadi, and C. Pham, “Adaptive Scheduling of Wireless Video Sensor Nodes for Surveillance Applications,” in Proceedings of the 4th ACM workshop on Performance monitoring and measurement of heterogeneous wireless and wired networks, 2009, pp. 54–60.
[47] M. A. Guvensan and A. G. Yavuz, “On Coverage Issues in Directional Sensor Networks: A Survey,” J. Ad Hoc Netw., vol. 9, no. 7, pp. 1238–1255, Sep. 2011.
[48] Florence G. H. Yap and Hong-Hsu Yen, “A Survey on Sensor Coverage and Visual Data Capturing/Processing/Transmission in Wireless Visual Sensor Networks,” Sensors, no. 14, pp. 3506–3527, Feb. 2014.
[49] J. Park, P. C. Bhat, and A. C. Kak, “A Look-Up Table Based Approach for Solving the Camera Selection Problem in Large Camera Networks,” in Proceedings of the International Workshop on Distributed Smart Cameras, 2006.
[50] S. Soro and W. Heinzelman, “Camera Selection in Visual Sensor Networks,” in Advanced Video and Signal Based Surveillance, 2007. AVSS 2007. IEEE Conference on, 2007, pp. 81–86.
[51] S. M. Amiri, P. Nasiopoulos, and V. C. M. Leung, “Collaborative Routing and Camera Selection for Visual Wireless Sensor Networks,” Commun. IET, vol. 5, no. 17, pp. 2443–2450, Nov. 2011.
[52] R. Ghazalian, A. Aghagolzadeh, and S. M. H. Andargoli, “Energy Consumption Minimization in Wireless Visual Sensor Networks Using Convex Optimization,” in Telecommunications (IST), 2014 7th International Symposium on, 2014, pp. 312–315.
[53] Alarcon-Herrera, J. and Chen Xiang, “Graph-Based Visual Sensor Deployment and Optics Optimization,” in Control Conference (CCC), 2014 33rd Chinese, 2014, pp. 486–491.
[54] Halder, S. and Ghosal, A., “Enhancing the Lifespan of Visual Sensor Networks Using a Predetermined Node Deployment Strategy,” in Computers and Communication (ISCC), 2014 IEEE Symposium on, 2014, pp. 1–6.
[55] S. Halder and A. Ghosal, “A Location-Wise Predetermined Deployment for Optimizing Lifetime in Visual Sensor Networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 6, pp. 1131–1145, Jun. 2016.
[56] T. Zahariadis, K. Petrakou, and S. Voliotis, “Enabling QoS in Visual Sensor Networks,” in 48th International Symposium ELMAR-2006 focused on Multimedia Signal Processing and Communications, 2006, pp. 327–330.
[57] Fallahi, A. and Hossain, E., “A Dynamic Programming Approach for QoS-Aware Power Management in Wireless Video Sensor Networks,” Veh. Technol. IEEE Trans. On, vol. 58, no. 2, pp. 843–854, Feb. 2009.
[58] G. A. Shah, W. Liang, and Özgür B. Akan, “Cross-Layer Framework for QoS Support in Wireless Multimedia Sensor Networks,” IEEE Trans. Multimed., vol. 14, no. 5, pp. 1442–1455, Oct. 2012.
[59] Juan R. Diaz, Jaime Lloret, Jose M. Jimenez, and Joel J. P. C. Rodrigues, “A QoS-Based Wireless Multimedia Sensor Cluster Protocol,” Int. J. Distrib. Sens. Netw., vol. 2014, 2014.
[60] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-Rate-Distortion Analysis for Wireless Video Communication Under Energy Constraints,” Circuits Syst. Video Technol. IEEE Trans. On, vol. 15, no. 5, pp. 645–658, May 2005.
[61] C. B. Margi, V. Petkov, K. Obraczka, and R. Manduchi, “Characterizing Energy Consumption in a Visual Sensor Network Testbed,” in Testbeds and Research Infrastructures for the Development of Networks and Communities, 2006. TRIDENTCOM 2006. 2nd International Conference on, 2006.
[62] J. J. Ahmad, H. A. Khan, and S. A. Khayam, “Energy Efficient Video Compression for Wireless Sensor Networks,” in Information Sciences and Systems, 2009. CISS 2009. 43rd Annual Conference on, Baltimore, MD, 2009, pp. 629–634.
[63] Y. He, I. Lee, and L. Guan, “Distributed Algorithms for Network Lifetime Maximization in Wireless Visual Sensor Networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 5, pp. 704–718, May 2009.
[64] I. Lee, W. Shaw, and J. H. Park, “On Prolonging the Lifetime for Wireless Video Sensor Networks,” J. Mob. Netw. Appl., vol. 15, no. 4, pp. 575–588, Aug. 2010.
[65] S. Ullah, J. J. Ahmad, J. Khalid, and S. A. Khayam, “Energy and Distortion Analysis of Video Compression Schemes for Wireless Video Sensor Networks,” in Military Communication Conference, 2011, pp. 822–827.
[66] Ilkyu Ha, Mamurjon Djuraev, and Byoungchul Ahn, “An Energy-Efficient Data Collection Method for Wireless Multimedia Sensor Networks,” Int. J. Distrib. Sens. Netw., vol. 2014, 2014.
[67] Halder, S. and Ghosal, A., “A Location-wise Predetermined Deployment for Optimizing Lifetime in Visual Sensor Networks,” Circuits Syst. Video Technol. IEEE Trans. On, vol. PP, no. 99, pp. 1–1, 2015.
[68] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, “An Application-Specific Protocol Architecture for Wireless Microsensor Networks,” IEEE Trans. Wirel. Commun., vol. 1, no. 4, pp. 660–670, Oct. 2002.
[69] P. Cheng, C.-N. Chuah, and X. Liu, “Energy-Aware Node Placement in Wireless Sensor Networks,” in Global Telecommunications Conference, 2004. GLOBECOM ’04. IEEE, 2004, vol. 5, pp. 3210–3214.
[70] O. Younis and S. Fahmy, “HEED: A Hybrid, Energy-Efficient, Distributed Clustering Approach for Ad Hoc Sensor Networks,” Mob. Comput. IEEE Trans. On, vol. 3, no. 4, pp. 366–379, Dec. 2004.
[71] J.-H. Chang and L. Tassiulas, “Maximum Lifetime Routing in Wireless Sensor Networks,” Netw. IEEE/ACM Trans. On, vol. 12, no. 4, pp. 609–619, Aug. 2004.
[72] J. Haapola, Z. Shelby, C. Pomalaza-Ráez, and P. Mähönen, “Multihop Medium Access Control for WSNs: An Energy Analysis Model,” EURASIP J. Wirel. Commun. Netw., vol. 2005, no. 4, pp. 523–540, Sep. 2005.
[73] M. Esseghir, N. Bouabdallah, and G. Pujolle, “Sensor Placement for Maximizing Wireless Sensor Network Lifetime,” in Vehicular Technology Conference, 2005. VTC-2005-Fall. 2005 IEEE 62nd, 2005, vol. 4, pp. 2347–2351.
[74] D. Jung, T. Teixeira, and A. Savvides, “Sensor Node Lifetime Analysis: Models and Tools,” ACM Trans. Sens. Netw., vol. 5, no. 1, Feb. 2009.
[75] Anuradha Pughat and Vidushi Sharma, “A Review on Stochastic Approach for Dynamic Power Management in Wireless Sensor Networks,” Hum.-Centric Comput. Inf. Sci., vol. 5, no. 4, Feb. 2015.
[76] M. Maimour, C. Pham, and J. Amelot, “Load Repartition for Congestion Control in Multimedia Wireless Sensor Networks with Multipath Routing,” in 3rd International Symposium on Wireless Pervasive Computing, ISWPC 2008, Santorini, Greece, 2008.
[77] Syed Muhammad Asad Zaidi, Jieun Jung, and Byunghun Song, “Prioritized Multipath Video Forwarding in WSN,” J. Inf. Process. Syst., vol. 10, no. 2, pp. 176–192, 2014.
[78] ISO/IEC 10918, ITU-T T.81, “JPEG Standard. ITU-T Recommendation T.81.” Sep-1992.
[79] Taylor, C.N. and Dey, S., “Adaptive Image Compression for Wireless Multimedia Communication,” in Communications, 2001. ICC 2001. IEEE International Conference on, 2001, vol. 6, pp. 1925–1929.
[80] H. S. Aghdasi, M. Abbaspour, M. E. Moghadam, and Y. Samei, “An Energy-Efficient and High-Quality Video Transmission Architecture in Wireless Video-Based Sensor Networks,” Sensors, vol. 8, no. 8, p. 4529, 2008.
[81] A. Mammeri, A. Khoumsi, D. Ziou, and B. Hadjou, “Energy-Efficient Transmission Scheme of JPEG Images over VSN,” in 4th IEEE International Workshop on Performance and Management of Wireless and Mobile Networks P2MNET (2008), Montreal, Quebec, 2008, pp. 639–647.
[82] Zhe-yuan Xiong, Xiao-Ping Fan, Shao-qiang Liu, and Zhi Zhong, “Low Complexity Image Compression for Wireless Multimedia Sensor Networks,” in Information Science and Technology (ICIST), 2011 International Conference on, Nanjing, 2011, pp. 665–670.
[83] B. A. B. Sarif, V. C. M. Leung, and P. Nasiopoulos, “Energy Efficient Multiple Description Coding for Video Sensor Network,” in 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications Workshops, Busan, Korea, 2011, pp. 153–158.
[84] A. Aaron, S. Rane, R. Zhang, and B. Girod, “Wyner–Ziv Coding for Video: Applications to Compression and Error Resilience,” in Proceedings of the Conference on Data Compression, Washington, DC, 2003, pp. 93–102.
[85] Bernd Girod, Anne Margot Aaron, Shantanu Rane, and David Rebollo-Monedero, “Distributed Video Coding,” in Proceedings of the IEEE, 2005, vol. 93, pp. 71–83.
[86] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER Codec: Architecture, Techniques and Evaluation,” in Picture Coding Symposium, Lisbon, 2007.
[87] D. Slepian and J. Wolf, “Noiseless Coding of Correlated Information Sources,” Inf. Theory IEEE Trans. On, vol. 19, no. 4, pp. 471–480, Jul. 1973.
[88] A. Wyner and J. Ziv, “The Rate-Distortion Function for Source Coding with Side Information at the Decoder,” Inf. Theory IEEE Trans. On, vol. 22, no. 1, pp. 1–10, Jan. 1976.
[89] R. Puri, A. Majumdar, and P. Ramanathan, “PRISM: A Video Coding Paradigm With Motion Estimation at the Decoder,” Image Process. IEEE Trans. On, vol. 16, no. 10, pp. 2436–2448, Oct. 2007.
[90] Yaacoub, C., Farah, J., and Pesquet-Popescu, B., “Joint Source-Channel Wyner-Ziv Coding in Wireless Video Sensor Networks,” in Signal Processing and Information Technology, 2007 IEEE International Symposium on, 2007, pp. 225–228.
[91] Jinhong Di, Aidong Men, Bo Yang, Zeyang Yang, and Manman Fan, “Motion-Compensated Refinement Based Distributed Video Coding for Wireless Video Sensor Network,” in Wireless Personal Multimedia Communications (WPMC), 2011 14th International Symposium on, 2011, pp. 1–4.
[92] A. Ukhanova, E. Belyaev, and S. Forchhammer, “Encoder Power Consumption Comparison of Distributed Video Codec and H.264/AVC in Low-Complexity Mode,” in Software, Telecommunications and Computer Networks (SoftCOM), 2010 International Conference on, 2010, pp. 66–70.
[93] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[94] I. E. Richardson, The H.264 Advanced Video Compression Standard, Second Edition. John Wiley & Sons, Ltd, 2010.
[95] Cristina Alcaraz, Pablo Najera, Javier Lopez, and Rodrigo Roman, “Wireless Sensor Networks and the Internet of Things: Do We Need a Complete Integration?,” in 1st International Workshop on the Security of The Internet of Things, Tokyo, Japan, 2010.
[96] ISO/IEC JTC1/SC29/WG11, “Exploration on Media-centric Internet of Things.” MPEG Doc. N15085, Geneva, Feb-2015.
[97] A. Seema and M. Reisslein, “Towards Efficient Wireless Video Sensor Networks: A Survey of Existing Node Architectures and Proposal for A Flexi-WVSNP Design,” Commun. Surv. Tutor. IEEE, vol. 13, no. 3, pp. 462–486, Third Quarter 2011.
[98] Richard Van Noorden, “The Rechargeable Revolution: A Better Battery,” Nature, vol. 507, no. 7490, pp. 26–28, Mar. 2014.
[99] P. Agrawal, J.-C. Chen, S. Kishore, P. Ramanathan, and K. Sivalingam, “Battery Power Sensitive Video Processing in Wireless Networks,” in Personal, Indoor and Mobile Radio Communications, 1998. The Ninth IEEE International Symposium on, Boston, MA, 1998, vol. 1, pp. 116–120.
[100] N. Imran, B.-C. Seet, and Alvis C. M. Fong, “A Comparative Analysis of Video Codecs for Multihop Wireless Video Sensor Networks,” Multimed. Syst., vol. 18, no. 5, pp. 373–389, 2012.
[101] Li-Wei Kang and Chun-Shien Lu, “Distributed Compressive Video Sensing,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, Taipei, 2009, pp. 1169–1172.
[102] Katja Schwieger and Gerhard Fettweis, “Multi-hop Transmission: Benefits and Deficits,” in GI/ITG Fachgespraech Sensornetze, Karlsruhe, Germany, 2004, pp. 26–27.
[103] Junlin Li and Ghassan AlRegib, “Network Lifetime Maximization for Estimation in Multihop Wireless Sensor Networks,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2456–2466, Jul. 2009.
[104] Nok Hang Mak and W.K.G. Seah, “How Long is the Lifetime of a Wireless Sensor Network?,” in Advanced Information Networking and Applications, 2009. AINA ’09. International Conference on, Bradford, 2009, pp. 763–770.
[105] Z. He and D. Wu, “Resource Allocation and Performance Analysis of Wireless Video Sensors,” Circuits Syst. Video Technol. IEEE Trans. On, vol. 16, no. 5, pp. 590–599, May 2006.
[106] H. K. Zrida, A. C. Ammari, M. Abid, and A. Jemai, “Complexity/Performance Analysis of a H.264/AVC Video Encoder,” in Recent Advances on Video Coding, InTech, 2011.
[107] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video Coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, 2004.
[108] W. Ding and B. Liu, “Rate Control of MPEG Video Coding and Recoding by Rate-Quantization Modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 2, pp. 12–20, Feb. 1996.
[109] T. Chiang and Y.-Q. Zhang, “A New Rate Control Scheme Using Quadratic Rate Distortion Model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 246–250, Feb. 1997.
[110] J. Ribas-Corbera and S. Lei, “Rate Control in DCT Video Coding for Low-Delay Communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 2, pp. 172–185, Feb. 1999.
[111] Y. Liu, G. Li, and Y. C. Soh, “A Novel Rate Control Scheme for Low Delay Video Communication of H.264/AVC Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 68–78, Jan. 2007.
[112] Z. G. Li, F. Pan, K. P. Lim, G. N. Feng, X. Lin, and S. Rahardja, “Adaptive Basic Unit Layer Rate Control for JVT,” Pattaya II, Thailand, JVT-G012-r1, Mar. 2003.
[113] Zhan Ma, Meng Xu, Yen-Fu Ou, and Yao Wang, “Modeling of Rate and Perceptual Quality of Compressed Video as Functions of Frame Rate and Quantization Stepsize and Its Applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 5, pp. 671–682, May 2012.
[114] Yen-Fu Ou, Zhan Ma, Tao Liu, and Yao Wang, “Perceptual Quality Assessment of Video Considering Both Frame Rate and Quantization Artifacts,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 3, Mar. 2011.
[115] Christian Lottermann, Alexander Machado, Damien Schroeder, Yang Peng, and Eckehard Steinbach, “Bit Rate Estimation for H.264/AVC Video Encoding Based on Temporal and Spatial Activities,” in 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 3195–3199.
[116] Yang Peng and Eckehard Steinbach, “A Novel Full-reference Video Quality Metric and its Application to Wireless Video Transmission,” in 2011 18th IEEE International Conference on Image Processing, Brussels, 2011, pp. 2517–2520.
[117] C. Sampath Kannangara, Iain E. Richardson, and A. J. Miller, “Computational Complexity Management of a Real-Time H.264/AVC Encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1191–1200, Sep. 2008.
[118] Wonkyun Kim, Jongmin You, and Jechang Jeong, “Complexity Control Strategy for Real-Time H.264/AVC Encoder,” IEEE Trans. Consum. Electron., vol. 56, no. 2, pp. 1137–1143, May 2010.
[119] Yousef O. Sharrab and Nabil J. Sarhan, “Aggregate Power Consumption Modeling of Live Video Streaming Systems,” in Proceedings of the 4th ACM Multimedia Systems Conference, 2013, pp. 60–71.
[120] B. A. B. Sarif, V. C. M. Leung, and P. Nasiopoulos, “Distance Based Heuristic for Power and Rate Allocation of Video Sensor Networks,” in Wireless Communications and Networking Conference (WCNC), 2012 IEEE, 2012, pp. 1893–1897.
[121] Digital Multimedia Lab, UBC, “Video Surveillance Dataset.” [Online]. Available: http://dml.ece.ubc.ca/data/DML-Video-Surveillance/. [Accessed: 16-Jun-2015].
[122] Bambang A.B. Sarif, Mahsa T. Pourazad, Panos Nasiopoulos, and Victor C.M. Leung, “A Study on the Power Consumption of H.264/AVC-based Video Sensor Network,” Int. J. Distrib. Sens. Netw., vol. 2015, Article ID 304787.
[123] I. Dietrich and F. Dressler, “On the Lifetime of Wireless Sensor Networks,” ACM Trans. Sens. Netw., vol. 5, no. 1, Feb. 2009.
[124] Kewei Sha and Weisong Shi, “Modeling the Lifetime of Wireless Sensor Networks,” Sens. Lett., vol. 3, no. 2, pp. 1–10, 2005.
[125] Ananthram Swami, Qing Zhao, Yao-Win Hong, and Lang Tong, Law of Sensor Network Lifetime and Its Applications. Wiley, 2007.
[126] Qi Xue and Aura Ganz, “On the Lifetime of Large Scale Sensor Networks,” Comput. Commun., vol. 29, no. 4, pp. 502–510, Feb. 2006.
[127] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Prentice Hall, 2001.
[128] Haibo Zhang and Hong Shen, “Balancing Energy Consumption to Maximize Network Lifetime in Data-Gathering Sensor Networks,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 10, pp. 1526–1539, Oct. 2009.
[129] “Stargate Datasheet.” [Online]. Available: http://platformx.sourceforge.net/Documents/manuals/6020-0049-02_A_Stargate.pdf. [Accessed: 07-Dec-2012].
[130] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, “Encoding and Communication Energy Consumption Trade-off in H.264/AVC Based Video Sensor Network,” in IEEE World of Wireless, Mobile and Multimedia Networks, WoWMoM’13, 2013, pp. 1–6.
[131] “H.264/AVC JM Reference Software.” [Online]. Available: http://iphome.hhi.de/suehring/tml/. [Accessed: 07-Dec-2012].
[132] ISO/IEC JTC1/SC29/WG11, “Joint Call for Proposals on Video Compression Technology.” MPEG Doc. N11113, Kyoto, Jan-2010.
[133] ISO/IEC JTC1/SC29/WG11, “A Complexity Analysis Tool: iprof (version 0.41).” MPEG Doc. M3551, Jul-1998.
[134] S. Gurun and C. Krintz, “Energy Characterization of the Stargate Sensor Network Gateway,” Department of Computer Science, University of California, Santa Barbara, 2006–8, Jun. 2006.
[135] D. Chinnery and K. Keutzer, Closing the Power Gap between ASIC & Custom: Tools and Techniques for Low Power Design, 1st edition. Springer, 2007.
[136] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, “Analysis of Energy Consumption Fairness in Video Sensor Networks.” 2013 Qatar Annual Research Conference.
[137] B. Krishnamachari and F. Ordóñez, “Analysis of Energy-Efficient, Fair Routing in Wireless Sensor Networks through Non-linear Optimization,” in Vehicular Technology Conference, 2003. VTC 2003-Fall. 2003 IEEE 58th, Orlando, Florida, 2003, vol. 5, pp. 2844–2848.
[138] ITU-T P.910, “Subjective Video Quality Assessment Methods for Multimedia Applications.” Apr-2008.
[139] J. Clausen, “Branch and Bound Algorithms - Principles and Examples,” University of Copenhagen, Mar. 1999.
[140] Bambang A.B. Sarif, Mahsa Pourazad, Panos Nasiopoulos, Victor C.M. Leung, and Amr Mohamed, “Fairness Scheme for Energy Efficient H.264/AVC-based Video Sensor Network,” Hum.-Centric Comput. Inf. Sci., Feb. 2015.
[141] B. A. B. Sarif, M. T. Pourazad, P. Nasiopoulos, and V. C. M. Leung, “Analysis of Power Consumption of H.264/AVC-based Video Sensor Networks through Modeling the Encoding Complexity and Bitrate,” in the 18th International Conference on Digital Society (ICDS 2014), Barcelona, 2014.
[142] Bambang A.B. Sarif, Mahsa T. Pourazad, Panos Nasiopoulos, and Victor C.M. Leung, “A New Scheme for Estimating H.264/AVC-based Video Sensor Network Power Consumption,” in 2015 World Congress on Information Technology Applications and Services, Jeju, Korea, 2015.
[143] Jason W. Osborne, Best Practices in Data Cleaning. California, USA: Sage Publications Inc., 2013.
[144] K. Lane, “What is Robust Regression and How Do You Do It?,” in The annual meeting of the Southwest Educational Research Association, Austin, TX, 2002.
[145] Richard T. O’Connell and Anne B. Koehler, Forecasting, Time Series, and Regression: An Applied Approach, 4th ed. South-Western Pub, 2004.
[146] ISO/IEC JTC1/SC29/WG11, “Exploration on Wearable MPEG.” MPEG Doc. w15200, Geneva, Feb-2015.
[147] ISO/IEC JTC1/SC29/WG11, “Draft Call for Proposals on Green MPEG.” MPEG Doc. N13428, Geneva, Jan-2013.
[148] Mahsa T. Pourazad, Colin Doutre, Maryam Azimi, and Panos Nasiopoulos, “HEVC: The New Gold Standard for Video Compression: How Does HEVC Compare with H.264/AVC?,” Consum. Electron. Mag. IEEE, vol. 1, no. 3, pp. 36–46, Jul. 2012.
[149] E. Nogues, S. Holmbacka, M. Pelcat, D. Menard, and J. Lilius, “Power-Aware HEVC Decoding with Tunable Image Quality,” in Signal Processing Systems (SiPS), 2014 IEEE Workshop on, 2014, pp. 1–6.
