On energy-efficient data gathering in sensor networks by Debojit Dhar B.E., Jadavpur University, India, 2005 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) January 2011 c© Debojit Dhar 2010 Abstract Wireless sensor networks (WSNs) are becoming increasingly important as they provide an ideal platform for monitoring and controlling physical en- vironments. Starting from small match-box sized nodes to tiny coin-like sensors a WSN promises to be the most ubiquitous information gathering system produced. Being tiny enables ubiquitous and pervasive sensing. However, this form factor comes at the cost of severe resource constraints for the sensor nodes. The nodes can accommodate only a certain amount of energy in the form of batteries and can store and process only a small amount of data due to its crippled size. Due to this reason, sensor networks cannot host more so- phisticated applications and the mean time to failure, due to nodes running out of energy, is also low. These are probably the main reasons why sensor networks have not reached their expected potential. This thesis is an effort to alleviate the energy problem of sensor nodes. We attempt to solve the problem using different data and user centric models which can lead to a multi-fold increase of life for the sensors. Most of the research till date has aimed at micro-adjustments in the sensor hardware and software in order to improve performance. These techniques, though beneficial, increase com- plexity, and are sometimes difficult to implement. The thesis demonstrates simple techniques that can significantly improve energy-savings over and above the micro-adjustment techniques. The thesis takes a radical point of view and looks at higher level primitives that can be modified for certain applications. This thesis provides two energy reduction techniques. The first technique involves aggressive duty-cycling of sensors while maintaining connectivity and coverage followed by reconstruction at the base station. Significant gains can be achieved when the sensed environment has correlation between sensor readings. The second technique involves sampling interval schedul- ing depending on the utilization of the sampled data based on user queries. While the first method ensures correct reproduction of the sensed environ- ment, while reducing the burden on individual sensors, the second method provides an optimal sampling frequency that regulates energy consumption ii Abstract depending on user demands. iii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x 1 Introduction and overview . . . . . . . . . . . . . . . . . . . . 1 1.1 Sensor network challenges . . . . . . . . . . . . . . . . . . . . 1 1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Contribution of the thesis . . . . . . . . . . . . . . . . . . . . 2 1.3.1 Duty-cycling and reconstruction . . . . . . . . . . . . 3 1.3.2 Optimal sampling based on user queries . . . . . . . . 4 1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Sleep scheduling and data reconstruction in wireless sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 System model and assumptions . . . . . . . . . . . . . . . . . 8 2.3 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.4 Maintaining connectivity and coverage . . . . . . . . . . . . 10 2.4.1 Decentralized activity planning . . . . . . . . . . . . . 11 2.4.2 Correctness . . . . . . . . . . . . . . . . . . . . . . . . 12 2.4.3 Bounding the sub-optimality of algorithm CC . . . . 12 2.4.4 Energy optimizations . . . . . . . . . . . . . . . . . . 14 2.4.5 Sleep scheduling in the communication disc model . . 14 2.4.6 Simple coverage scheme . . . . . . . . . . . . . . . . . 15 2.4.7 Finding the value of γ . . . . . . . . . . . . . . . . . . 16 2.5 Compressed sensing basics . . . . . . . . . . . . . . . . . . . 17 iv Table of Contents 2.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6.1 Evaluation of connectivity algorithms . . . . . . . . . 18 2.6.2 Evaluation of compressed sensing . . . . . . . . . . . 24 2.7 Software modifications . . . . . . . . . . . . . . . . . . . . . 29 2.8 Limitations of compressed sensing . . . . . . . . . . . . . . . 29 2.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3 Optimal sampling based on user queries in sensor networks 32 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2 System model and assumptions . . . . . . . . . . . . . . . . . 35 3.3 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.4 Problem description . . . . . . . . . . . . . . . . . . . . . . . 37 3.4.1 Problem formulation . . . . . . . . . . . . . . . . . . 37 3.5 Decision making . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.6 Analysis for poisson user query arrival . . . . . . . . . . . . . 40 3.6.1 Multiple poisson arrival in parallel . . . . . . . . . . . 41 3.7 Analysis for hyper exponential user query arrival . . . . . . . 43 3.7.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . 44 3.7.2 Optimal scheduling strategies . . . . . . . . . . . . . 45 3.7.3 Sub-optimal policies . . . . . . . . . . . . . . . . . . . 46 3.8 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.8.1 Numerical results . . . . . . . . . . . . . . . . . . . . 48 3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.2 Summary of contribution . . . . . . . . . . . . . . . . . . . . 53 4.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 v List of Figures 1.1 Optimization areas. . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1 Sensor network with active nodes connected to a base station. 8 2.2 The first figure shows the sensors that were turned on in a particular instant. Blue dots represent inactive sensors and reds indicate active sensors. The first figure represents the first phase of coverage where random nodes are selected for wakeup. The second figure is similar to the first except that the connectivity algorithm has been applied on it, hence a few extra sensors can be found awake. The sensors marked with grey circles indicate some of the sensors that were awaken by the connectivity algorithm. . . . . . . . . . . . . . . . . . . . 20 2.3 The three figures compare the performance of the simple cov- erage scheme and the decentralized activity planning algo- rithms. In all the figures, plots of total number of awake nodes are made against the fraction of nodes which are awake for different values of γavg. All nodes were placed on a 256 by 256 unit length grid with a total of 829 nodes placed randomly. To test the system for various different kinds of deployment, plots have been made for different transmission range for the nodes which results in a different average number of neigh- bours. The first figure has been plotted for average number of neighbours = 9 with transmission range = 2 units. The sec- ond and the third figure have been plotted for transmission range = 4 and 8 units respectively. . . . . . . . . . . . . . . . 22 2.4 Fraction of total nodes vs total number of nodes at different values of γ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.5 Normalized mean square error versus fraction of sensors trans- mitting data for various correlation values. . . . . . . . . . . . 26 2.6 Normalized mean square error versus fraction of sensors trans- mitting data for various value of P (P=1,0.8). . . . . . . . . . 27 vi List of Figures 2.7 Normalized mean square error versus fraction of sensors trans- mitting data for various value of P (P=0.8,0.6). . . . . . . . . 27 2.8 The first figure shows the original data. The second image is the sampled data with 20 percent of the sensors sending back data to the base station. The third figure shows the recovered data using our reconstruction algorithm. . . . . . . . . . . . . 28 2.9 Scaled and reconstructed wifi signals from the Stevens Cam- pus. Only 10 percent of the nodes’ data is used for recon- struction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Our data processing system answers queries from the users and schedules sampling intervals on the data store. . . . . . . 34 3.2 Simple timeline graph demonstrating sampling and event ar- rival times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3 Simple timeline graph for poisson event arrival. . . . . . . . . 40 3.4 Cost vs λ1/λ2 for different ratio between kq and kd . . . . . . 43 3.5 Sampling time with varying values of lambda – poisson dis- tribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.6 Sampling time vs kq/kd – poisson distribution . . . . . . . . 50 3.7 Cost vs lambda – poisson distribution . . . . . . . . . . . . . 51 3.8 Cost vs kq/kd for different hyper-exponential sampling policies. 52 vii Acknowledgements I would like to extend my gratitude to those, without whom, it would not be possible for me to complete my thesis. The name that comes foremost is my wonderful advisor Dr. Sathish Gopalakrishnan. He introduced me to the academic world with full sincerity and devotion. Not only has he guided me through the maze of choosing a proper research topic, he has been the person who had always provided me that extra incentive and motivation to perform during my term at UBC. I am indebted to all the professors whose courses I took – Dr. Sarbjit Sarkaria, Dr. Rachel Pottinger, Dr. Matei Ripeanu, Dr. Eric Wohlstadter and of course my advisor Dr. Gopalakrishnan. All of them were kind enough to listen to my problems and extend their help beyond class hours. I would specially like to thank Dr. Ripeanu and Dr. Pottinger whose valuable suggestions during the course projects have helped me build my thesis. I am thankful to Dr. Bhargava, Dr. Schober and Dr. Gopalakrishnan as committee members of my thesis defense. My sincere thanks to all the members of our CompSys lab, with whom I had a wonderful time over the past two years. I start by thanking Maliha and Sriram who had both been my project mates and I had many a fruitful discussion with them. Theepan, who has always been my buddy for an intriguing dialogue. I would like to thank Karim, who helped me with my second paper. We would discuss strategies to solve the mathematical challenges posed by the problem at hand. I would like to extend my gratitude to all the other members of the lab who made my life in Vancouver a fun one. I would like to sincerely thank Angshul, without whose help I would never be in UBC. He was the one who helped me during my application, took me up as his room-mate as I reached Vancouver and helped me with my research and reviewed my project reports and papers. I would also like to thank all my friends at UBC who have contributed significantly to my enjoyable life in Vancouver. Finally, I would like to thank my mother, my brother and sister-in-law without whose encouragement, I wouldn’t be here today. This thesis presents research conducted by Debojit Dhar, in collabora- viii Acknowledgements tion with Dr. Sathish Gopalakrishnan. Chapter 2 Sleep Scheduling and Data Reconstruction in Wireless Sensor Networks is based on the work conducted by Debojit Dhar who received suggestions and feedback from his supervisor, Dr. Gopalakrishnan. Chapter 3 Optimal Sampling based on User Queries in Sensor Networks is based on the work conducted by Debojit Dhar in collaboration with Karim Rostamzade. Suggestions and feedback were received from from their su- pervisor, Dr. Gopalakrishnan. ix Dedication To my Mother and Late Father who would have been very proud to see this day. x Chapter 1 Introduction and overview 1.1 Sensor network challenges The use of sensor networks has been thoroughly studied in the past decade. Even though considerable success has been made in gaining insight about the computation and communication logic behind the sensors, there has not been widespread implementation of this technology. The most probable rea- son behind that being the failure to sustain a deployment for considerable time spans. Wireless sensor networks are an ideal platform for monitor- ing and controlling physical environments. Such networks, which typically consist of battery-powered sensor nodes with limited computational power, can be used in diverse applications such as wildlife and habitat monitor- ing [33], rainfall assessment [38], health assessment of civil engineering ar- tifacts [31], volcanic and seismic activity detection [58] and transportation monitoring [25]. Some of these applications often require the nodes to be placed in not so easily accessible terrain which makes it hard for application developers to replenish the node batteries once they run out of energy. In such applications, it is critical that the life span of each node is elongated to the greatest possible extent such that they can run for substantial amounts of time before they are either replaced by new nodes or their batteries are replaced/recharged. Most of the applications for sensor networks run the risk of failure due to nodes running out of energy. This problem arises because of the inherent energy constraint the nodes. All nodes in a sensor network communicate to a data processing unit called a sink node wirelessly. Since deployments are large, these nodes have to rely on multi-hop communication to reach the sink. The network forms a tree-like structure with the sink as the root node. The tree structure indicates that failure of a node results in either a partial outage or increased overhead on existing live nodes. It also indicates that failure of a few nodes may stop the network from being operational. The lifetime of each node in the network is hence crucial to sustain the intended life span of the network and makes us think of designs that can be used to elongate the life of the network. 1 1.2. Background 1.2 Background How do we reduce energy consumption and gain higher lifetime from a sen- sor network while maintaining quality of service? Each node in the net- work faces mainly two overheads – computation and communication. It has been shown that communication is the more energy sapping operation for nodes [7] and many methods have been devised to reduce energy lost in communication. Methods such as in-network aggregation [34] where data packets are merged and sent to the sink, energy efficient routing methods [50] where routing paths are selected such that they reduce the energy lost in communicating have been investigated. From a hardware perspective, use of faster processors have been shown to reduce energy lost in the sensors [8]. Many researchers have proposed the use of proper scheduling techniques to reduce energy consumption. These include sleep-wakeup protocols [54], MAC based protocols [42] as well as topology control protocols [40]. Oth- ers stress on data acquisition techniques such as hierarchical sampling [59], viewing the sensor network as a database system [36] and others who design an on-demand data supply system [36] to reduce energy consumption in the network. Most of these methods are effective in reducing the energy consumption of a sensor network. However, some of them are complex and application specific. If we inspect carefully, all these methods either deal with reduc- ing energy consumed by a single node or aim for an efficient protocol for communication or data gathering from the network. Specifically, we can consider them as a tweak to one of the links in the chain of processes for the sensor network and hence we describe them as Micro Optimizations. 1.3 Contribution of the thesis The thesis addresses this most practical sensor network problem, how do we increase the lifespan of a sensor network? As cited before, there has been a lot of research aimed at achieving the goal we wish to achieve. However, the pitfall in most of these methods is that they are targeted towards very specific applications and are sometimes not compatible with other micro op- timizations. Further, most Micro-optimizations lead to increased complex- ity. Low-level optimizations lead to increased complexity in software and are difficult to implement across various sensor network platforms. Debugging applications which use these optimizations is complex [56], making such op- timizations hard to implement and maintain. It has also been shown that 2 1.3. Contribution of the thesis many of the protocol level Micro optimization techniques do not work well in conjunction. The work on Integrating Adaptive Components by Heo et al. [1] demonstrates this fact by providing a simple example where a routing protocol and a MAC protocol prove to ve counter productive. The simplest alternative to these methods seem to be adding extra batteries to the node or invest in replenish-able energy sources such as solar charged batteries. The first plan is interesting, yet it leads to increased size of the node which is not desirable for many applications. The second method is still under research and is expensive as well. In this thesis, we have pointed our attention to- wards broad and high level improvements in the energy management scheme for sensor networks. These schemes are mostly triggered by optimizations at the network and data access levels rather than node level and they ac- complish multi-fold energy improvements over pre-existing schemes. The optimizations do not take into account any node-specific attributes, neither any routing or placement arrangements are taken into consideration. Since we do not consider the internals of the nodes, our techniques are at a higher level of abstraction and we call them Macro Optimization techniques. The interesting option which comes with Macro optimizations is that all Micro optimizations can be made along with these, which result in further energy savings. Figure 1.1: Optimization areas. 1.3.1 Duty-cycling and reconstruction In the thesis, we have provided two Macro perspectives of energy man- agement for sensor networks. Both the perspectives could lead to drastic energy reduction for a sensor network deployment. In the first macro energy management technique, we view the sensor network energy problem from a 3 1.3. Contribution of the thesis data gathering point of view which involves the nodes and the sink. The main energy loss in a sensor network occur when the nodes send data to the sink. While doing so, the nodes exhaust their batteries. If there were a way to reduce communication for individual nodes in the network, their lifespan could be increased. The most common solution to this problem is duty-cycling. However, we identify that duty-cycling is only a part of the whole process. Duty-cycling alone can reduce the load on a sensor network, but increases uncertainty in terms of coverage and connectivity. To counter this, we employ intelligent duty-cycling to achieve energy savings in a dense network deployment. In order to sustain duty-cycling while maintaining coverage and connectivity we come up with our own coverage and connec- tivity algorithms. We show that our connectivity and coverage algorithm is near-optimal and works efficiently in a duty-cycled environment. In addi- tion to duty-cycling in dense deployments, we point our attention to certain applications where duty-cycling can be achieved even when the deployment is not so dense. In such applications, the data sent by the sensors to the sink can be further optimized. In sensing applications which measures physical parameters such as temperature, humidity or Wi-Fi signal strength which have high correlation between data sent by neighbours, we generally have sensor nodes sending redundant information to the sink. Redundant infor- mation in this context refer to neighbour nodes sending similar information or the data following a trend which makes it easy to interpolate. In such a scenario, we have the advantage of duty-cycling a sparse deployment. We demonstrate using compressed sensing techniques that it is possible to in- terpolate the missing data at the base station or sink with high accuracy. Such techniques ensure that a large number of nodes can be switched off every sampling interval and lead to greater energy savings. 1.3.2 Optimal sampling based on user queries In the first macro optimization technique, we successfully reduced the num- ber of nodes sending data while accurately representing the sensed field. However, we were indifferent to the frequency at which the network is sam- pled, which is another deciding factor for the lifetime of the sensor nodes. The frequency of sampling depends on the application’s quality of service demands. In the second Macro technique, we move one level up in ab- straction from our first optimization method and view the network from an user’s perspective. The broad question we are trying to answer here is how frequently should we sample the sensor network to meet quality of service de- mands from the user, assuming accurate representation of the sensing field 4 1.4. Organization from the sensor readings. To solve this problem, we analyse user query in- formation and try to provide users with fresh data while trying to minimize energy loss for the sensor network. Often, sensor networks are deployed for applications which uses only the most recent information aggregated by the sensors. In such cases, more than events occurring on the network end, the time of arrival of user queries become the deciding factor that dictates the sampling interval of the network. We study policies for sampling or querying the network that provide a balance between data freshness and en- ergy consumption. We specially investigate the use of statistical information about the arrival rate of user queries to decide on optimal and sub-optimal sampling schedules. We claim that in most applications, the arrival rate of user queries can be represented as a known mathematical distribution and once that is established, we can find an optimal scheduling strategy for this distribution. If we look at fig. 1.1, optimization-1 refers to our first macro technique where our aim is to reduce communication of redundant information from the nodes to the sink. In the second technique, marked as optimization- 2 in the figure, we attempt to reduce transmission of information that is not relevant to the users of the system. In this scenario, we restrict our operation to interactions between the sink and users. Since each of our optimization technique work at different levels of the hierarchy in sensor networks, they are essentially independent of each other and can be made to work in conjunction without affecting each other. We believe that having either of these techniques in place would result in significant energy savings. In an ideal situation, we would like to view an application which makes use of both our optimization methods as well as some of the micro techniques where applicable. Such an infrastructure would hopefully alleviate the short lifetime problem of the sensor network to a large extent. 1.4 Organization The rest of the thesis is organized as follows. In Chapter 2, we introduce our connectivity and coverage algorithm. To gain further savings from our duty-cycling method, we introduce compressed sensing for reconstruction. Detailed experiments follow demonstrating the combined effectiveness of our algorithm with compressed sensing. Chapter 3 deals with the sampling frequency problem where we show optimal schemes to sample a network when user queries follow a known distribution. We conclude in Chapter 4 where we summarize our contributions and cite extensions of this thesis. 5 Chapter 2 Sleep scheduling and data reconstruction in wireless sensor networks 2.1 Introduction Wireless sensor networks are an ideal platform for monitoring and controlling physical environments. In such networks, nodes communicate with each other and convey data periodically to a base station or fusion centre over multi-hop wireless links. A first-order constraint for some applications is the energy budget because the network may be deployed in harsh, and not easily accessed, terrain. Application users would then prefer to maximize the interval between returning to the sensing field to replace batteries. The lifetime of a wireless sensor network can be maximized in several ways. One obvious solution is to add more batteries to sensor nodes but this approach leads to an increase in the form factor of the nodes, which makes them unsuitable for several applications. An alternative solution is to utilize a high density of nodes and employ only a fraction of these nodes at any given time instant. This approach requires that we determine appro- priate duty cycles for different nodes. This approach is also well suited to the case when nodes may be equipped with rechargeable batteries and the recharge operation (whether through solar energy or other harvesting meth- ods) requires nodes to transition to an inactive state. A further advantage of high-density deployments is fault tolerance because the redundancy present in the system enables correct functioning even when certain nodes fail. As a consequence of the advantages and technological feasibility of high-density sensor networks, we will discuss methods that are well suited to lifetime management in this scenario. In particular, we determine near-optimal methods for duty cycling be- tween sensor nodes so as to preserve coverage of the sensing field and connec- tivity between active nodes and the base station. Decentralized operation of 6 2.1. Introduction sensor nodes is desirable because this limits the effect of failures. We there- fore focus on decentralized methods for ensuring coverage and connectivity. In certain applications where the data observed by different sensors is correlated, as is the case when extracting thermal profiles of a region, one can utilize a limited number of nodes and yet reconstruct the complete profile through compressive sensing [22]. Thus, when data observed by sensor nodes is correlated, duty cycling does not diminish the accuracy of data gathering. We will emphasize the use of compressive sensing when appropriate to extend network lifetime while acquiring data with acceptable error bounds. An advantage of compressive sensing is that we can keep only a few nodes active at every time instant even when node density is not high if the data correlation is high. It is important to note that even when compressive sensing is employed, we do need methods to ensure appropriate coverage and a connected backbone to gather the sensed data. Whereas the idea of duty cycling to save energy is not new, our central contributions are to: • develop a decentralized, and near-optimal, method for duty cycling while preserving connectivity and coverage in the sensing field (Sec- tion 2.4), • establish bounds on the performance of the node activation (sleep scheduling) algorithm (Section 2.4.2), and • utilize compressive sensing for reconstructing missing data in the ap- propriate scenarios (Section 2.5). When describing our technique for selecting nodes to be active we also es- tablish the hardness of obtaining optimal solutions to that problem. Our methods extend the state of the art in sensor sleep scheduling (or duty cy- cling). In designing our techniques, we were guided by the need for simplicity because implementation of algorithms for WSNs are often more challenging than they may appear [56]. Through simulation, we are able to demonstrate that we can extend the lifetime of certain networks by a factor of 5 while ensuring that data quality is sufficiently high. While our methods are useful for a variety of sensor network applications, they are not tailored for some situations such as intruder detection and object tracking [49] that require fast response times and where data correlation between sensors is poor. 7 2.2. System model and assumptions 2.2 System model and assumptions In our model of a wireless sensor network, we consider a set of sensor nodes that can communicate with each other wirelessly. We assume that each sensor node knows is location, either through GPS information or through other well-studied techniques such as localization [23]. We make no assump- tions about the topology of the network. In other words, nodes could be distributed according to a regular pattern such as a grid or be distributed randomly over the sensing region. Figure 2.1: Sensor network with active nodes connected to a base station. Each node can communicate with other nodes that are within its trans- mission region. Nodes within the transmission region of a node are called its neighbours. Each node also has a sensing range, which determines the area that it is able to gather data from. In the network, a particular node is assumed to be the base station or data sink. This node does not have energy constraints because it may be connected to a permanent power source. The base station is computation- ally more powerful than other nodes in the network. This node is, however, subject to the same communication constraints as other nodes in the net- work. Henceforth, we will use the term node to refer to nodes that are not the base station, and we will use the term data sink or base station when 8 2.3. Related work we explicitly refer to this particular node. We assume that nodes can be in one of two states: active or inactive. In the active state, a node senses data and communicates with a subset of its neighbours to ensure that the data is transmitted to the base station, possibly over multiple hops. We do not assume that intermediate nodes fuse data to reduce the size of data transmitted although data fusion of this nature can be accommodated by the methods that we propose. We assume that the network is connected if there is a path from every active node to the base station, and all nodes on the path are also active. The network provides full coverage if the sensing ranges of all active nodes covers the entire sensing region. We do make the assumption that the initial deployment of nodes provides complete coverage when all nodes are active. 2.3 Related work Prior work on data gathering techniques for sensor networks have explored various avenues [35, 36] to reduce the cost of acquiring data from a sensor network. These include energy-efficient routing protocols [50], aggregation techniques [14, 34] and data store abstractions [35, 36]. These schemes did not adopt duty cycling but focused on reducing the energy cost of com- munication, and the gains achieved were limited because application-aware duty cycling can explicitly transition nodes to an inactive state and achieve greater energy savings. Researchers have also developed policies that use transmission power regulation to provide energy savings [24]. These policies require complicated MAC and routing protocols. Since our aim is to build a system which is simple to implement, we try to avoid the use of such complex protocols. Sleep scheduling and duty cycling have been proposed in the past, and the main goal is to preserve some level of connectivity and coverage among nodes that are active. Whereas prior work has demonstrated the value of exploiting low-power sleep states in sensor nodes [54], we have developed a decentralized scheme for sleep scheduling that is configurable in terms of the active neighbourhood size for any node. We also establish performance bounds on our scheme. We gain knowledge from the previous body of work on connectivity and coverage in sensor networks. While some papers stress coverage [9, 51, 55], others stress connectivity [10, 13, 61]. Most of the coverage techniques try to find a relation between the sensing range and communication range of the nodes. Such results are relevant to our proof for bounding the sub-optimality of the CC Algorithm. 9 2.4. Maintaining connectivity and coverage We employ compressed sensing when possible to reconstruct missing data in a duty-cycled WSN. Compressed sensing has been applied in image pro- cessing [19, 53] for recovering dead and unreadable pixels from surrounding pixels. Jarvis et al. [22] were the first to utilize this technique for sensor networks. Jarvis et al. used compressed sensing to conserve bandwidth in a WSN and did not tackle the problem of lifetime maximization. Other work connecting sensor networks to compressed sensing [2, 4, 37, 44] has been directed at devising better reconstruction algorithms and have tried applying compressed sensing to solve different shortcomings in sensor networks. Quer et al. [43] tested these algorithms on real workloads but even they did not articulate concrete implementation metrics and did not measure energy savings. Other researchers have come up with compression schemes that can be adapted efficiently with routing [18]; our work is different from them in the sense that compression and routing are dealt with independently. We present an algorithm that constructs a connected set of active nodes and we assume tree-structured communication [52] to gather data at the data sink. 2.4 Maintaining connectivity and coverage Our initial goal is to ensure that, in a dense deployment of sensor nodes, a sufficient number of nodes remain in the active state at any time instant to ensure sensing coverage and network connectivity. Other nodes in the network can transition to the inactive state and conserve energy. Apart from requiring that nodes know their location (see Model), we assume that nodes employ some light-weight time synchronization mechanism. We divide time into discrete activity intervals and each node, according to a decentralized decision-making policy, decides to be active or inactive in that interval. Only active nodes can participate in sensing, processing, and communication. A node can communicate only with neighbours that are also active. The network has a distinct base station and all nodes should be able to send data to the base station. The base station is assumed to be active during all activity intervals. An active node should either be a neighbour of the base station or should have at least one active neighbour that has a path to the base station. To ensure sufficient coverage of the sensor field, we also require that every node (active or inactive) have at least one neighbouring node that is active in each time interval. We generalize the connectivity and coverage requirements through a pa- 10 2.4. Maintaining connectivity and coverage rameter γ. γ is the number of nodes that must be active in the neighbour- hood of any sensor node v. γ = 1 provides basic connectivity and coverage while γ > 1 is useful when more robust coverage and fault tolerance is re- quired. The value of γ has a direct relationship with the fraction of nodes we want to be kept active at any time instant. In section 2.4.7, we will discuss more about the methods for deriving γ. The properties that should be satisfied by the decentralized activity plan- ning algorithm can now be listed: • Each node v with γv neighbours must have at least min(γ, γv) active neighbours at any time instant. When deployments are dense γv γ. • The algorithm will guarantee node coverage – all active nodes will be connected and every node (active or inactive) will have an active neighbour. • To ensure load balancing among nodes, the set of active nodes will change from one activity interval to the next. 2.4.1 Decentralized activity planning We now present a simple randomized scheme that preserves the properties we desire. Randomization ensures that the set of active nodes changes between activity intervals. The following steps are executed at each node v in the network. Our scheme requires that every node v transitions to the active state at the end of every activity interval, negotiates with its neighbours and then determines its state for the next activity interval. The following steps are followed at every node v: 1. Pick a random ballot bv where bv is a real number in the set [0, 1]. 2. Broadcast bv and receive the ballots from neighbours. The neighbours are denoted by the set γv. Let the set of ballots received be Bv. 3. Broadcast Bv and receive Bw from each w ∈ γv. 4. If |γv| < γ or |γw| < γ for any w ∈ γv, then stay active and terminate the planning phase. 5. Compute Av = {w|w ∈ γv and bw < bv}. 6. Transition to the inactive state if the following two conditions are true, else stay active. 11 2.4. Maintaining connectivity and coverage • Any two nodes in Av are connected either directly themselves or indirectly through some node w within v’s two-hop neighbour- hood such that bw < bv. • Any node in γv has at least γ neighbours in their Av. We shall refer to the algorithm just described as Algorithm CC (connectivity and coverage). We shall now discuss the correctness of this algorithm and then derive bounds on the sub-optimality of the algorithm. 2.4.2 Correctness Algorithm CC ensures that any node v has at least min{γ, γv} active neigh- bours. When γv < γ, then none of v’s neighbours will transition to the inactive state. When γv ≥ γ then we shall establish the result by con- tradiction. Assume that for i ≤ γ the neighbour of v with the ith lowest ballot transitions to the inactive state. Call this neighbour w. Aw will have at most i − 1 nodes that are neighbours of v. Since i − 1 < γ, w cannot transition to the inactive state thus resulting in a contradiction. If the original graph was connected then Algorithm CC results in a con- nected graph. We shall establish this result by contradiction. Suppose that the output graph is disconnected. Add the deleted nodes back in the graph in ascending order of their ballots, and let v be the first node that makes the graph connected. By the time we add v, all members of Av are already present. Moreover, nodes in Av are already connected since they are connected by nodes with ballots at most bv. Let w be a node that was disconnected from Av but gets connected to Av by v. This contradicts the rule that v can sleep only if all its neighbours (including w) are connected to at least γ nodes in Av. 2.4.3 Bounding the sub-optimality of algorithm CC The problem of maintaining coverage and connectivity in a graph (network) even for the special case of γ = 1 is NP-complete. This is seen by noting that the problem when γ = 1 reduces to the minimum dominating set problem that is NP-complete. A natural question to answer then concerns bounds on the sub-optimality of the proposed algorithm. More specifically, if the number of nodes kept active by an optimal algorithm is aopt, we would like to know how far acc, the number of nodes kept active by Algorithm CC, is from aopt. Obtaining a bound on the sub-optimality is difficult in the general set- ting but we will study the particular setting when all nodes can directly 12 2.4. Maintaining connectivity and coverage communicate with all nodes that are within a surrounding disc of radius r and when node density is sufficiently high. For any γ ≥ 1, suppose n nodes are placed, uniformly at random, within a region such that each node has (on average) at least 4(γ+log n) neighbours. Then, with high probability, acc = O(log n)aopt. We shall find a lower bound on aopt and an upper bound on acc so as to estimate the sub-optimality of Algorithm CC. Let G be the graph of all nodes and let ν = 4(γ + log n) be the average node degree in G. Applying Chernoff bounds, with high probability, ν/4 ≤ γv ≤ 4ν. Let Gopt be the graph that is produced by an optimal algorithm. This graph contains all nodes in G but all edges involving two inactive nodes are deleted. In Gopt, each node has at least γ neighbours and therefore the number of edges in Gopt is at least nγ/2. Further, since nodes in Gopt have at most 4ν neighbours, the total number of edges is at most 4νaopt. Therefore, we have 4νaopt ≥ nγ/2aopt ≥ nγ/(8ν). Consider j = Cγn log n/ν for some appropriate constant C and run Algorithm CC on graph G. Let b be the jth smallest ballot selected by a node in G. We apply a lemma (Lemma 2.4.3) and note that all nodes that drew a ballot greater than b are inactive w.h.p.. It is possible that some nodes will ballots less than bj may also be inactive but this claim is sufficient to establish a bound on acc, w.h.p. By this claim, we have acc ≤ (Cγn log n)/ν = (8C log n)·nγ/(8ν) ≤ (8C log n)aopt. Therefore, acc = O(log n)aopt, w.h.p.. We shall now state and prove the lemma that we applied in the proof of the sub-optimality theorem. When we run Algorithm CC on a graph G, for some constant C, let j = Cγn log n/ν and let b be the jth smallest ballot drawn by a node in G. W.h.p., all nodes with ballots greater than b are inactive. We can view the process of constructing G as one of first selecting node ballots and then placing the nodes at random. Let v be a node with a ballot that is not in the bottom j ballots. Let Dv be a disc of radius r around v. The average node degree is ν, and therefore we have that each node is placed in Dv with probability ν/n. Let Av be the active nodes around v and let A ′ v be the nodes in Av with ballots in the top j ballots. Then, |A′v| is distributed according to a binomial(j, ν/n) distribution. Applying Chernoff bounds, w.h.p., there are at least x = jν/(4n) = (Cγ log n)/4 randomly placed nodes in Dv that have ballots less than b (and hence have ballots less than bv). 13 2.4. Maintaining connectivity and coverage Consider any two nodes u and w in Av, and let Du and Dw, respectively, be discs of radius r around these two nodes. Define Dv/2 to be the disc of radius r/2 around v; two nodes in Dv/2 will be neighbours. Because u and w are in Dv, the areas of overlap between Du and Dv/2 and between Dw and Dv/2 are each at least 1/12 of the area of Dv. Using the fact that |A′v| ≥ x is logarithmic size, at least one node u′ (w′) lies within the intersection of Du and Dv/2 (Dw and Dv/2). Additionally, u and w are connected by the path u ↔ u′ ↔ w′ ↔ w. The expected number of nodes in A′v that fall in the intersection of Du and Dv is at least x/3. Hence, using Chernoff bounds again, w.h.p. any node u in Nv has at least γ neighbours from A ′ v. Thus, w.h.p., all the conditions for node v to transition to the inactive state are satisfied. 2.4.4 Energy optimizations The decentralized activity planning method we have proposed is effective in providing coverage and connectivity in a sensor network deployment. It however requires local communication at every decision epoch. Every cycle, a node has to communicate its ballot value bv and the set of received ballots Bv to all its one-hop neighbours γv. In order to restrict this overhead, every node v should be provided with: 1. The set of random number generator seeds Rv that is used to generate ballots bv for all its one-hop neighbours γv. 2. The set of random number generators Rw from each w ∈ γv. This indicates that every cycle, each node can calculate bw and Bw for all its neighbours which relieves it from having to communicate the ballot values. Since computation is a much cheaper operation than communication [7], we end up being energy conservant. 2.4.5 Sleep scheduling in the communication disc model A further optimization in Algorithm CC 2.4.3 can be achieved in case the sensor nodes are placed along plain grids. In such regular distributions, it is easy to divide the area into smaller sub-grids and try to maintain connectivity and coverage in each sub-grid. Since we are now applying our optimizations in smaller sections of known dimension and density, better results can be obtained. In this optimization method, we divide the entire sensed region into smaller sectors. The sector size depends on the communication range of the 14 2.4. Maintaining connectivity and coverage nodes and are designed such that all nodes in the sector would be a one or two hop neighbour to each other. This can be easily ensured using the communication disc logic where each node v has a communication disc Dv. Due of the regular grid pattern, we can ensure a certain number of nodes to be present in each node’s Dv and hence decide on a sector size that asserts our assumption. Since our algorithm maintains coverage and connectivity in each sector, adjacent sectors can be assumed to be connected through one or more nodes. This makes it possible for each sector to act as an autonomous group and optimize behavior within itself. The connectivity and coverage algorithm inside each sector is similar to the decentralized activity planning algorithm with a few simplifications. Decision for the number of nodes to be kept awake in a sector is made based on the value of γ as it was done in the previous algorithm. For each node in the sector, there must be at least γ active neighbours. In this model, each node v 1. Is provided a random number generator seed rv which is used to gen- erate its ballot bv. 2. Keeps track of the seed of all other nodes in its sector, Rv. 3. Calculates the set Bw for each w ∈ γv. 4. If |γv| < γ or |γw| < γ for any w ∈ γv, then stay active and terminate the planning phase. 5. Remaining steps are the same as in the CC algorithm. This distributed algorithm does not involve message exchange between the nodes and hence remains energy efficient. After certain intervals, which are usually long, the nodes exchange heartbeat messages to sync their seed information with other nodes in the same sector. 2.4.6 Simple coverage scheme The previous schemes are valuable in case strict coverage and connectivity between the nodes is necessary. However, if strict coverage is not required, a simpler scheme might prove useful. We call this our Simple Coverage Scheme. In this scheme, each node v: 1. Is provided with a random number generator seed rv, which is used to generate its random ballot bv (a positive integer in this case). 15 2.4. Maintaining connectivity and coverage 2. Is provided with a set of random number seeds from all other nodes in its sector, Rv. (Similar to the sub-grid optimization, the nodes are divided into sectors.) 3. Is provided with λ, λ = 1/f where f is the fraction of nodes desired to be active for the application. Once a node generates its ballot bv, it performs modulo operation with λ to decide whether to wake up or not. (For example, if bv mod λ = 1, wake up.) 4. If bv mod λ 6= 1, • Compute Av = {w|w ∈ γv and rv mod λ = 1}. • Compute Cv = {w|w ∈ Av and |Aw| = 0}. • If |Cv| = 0, then sleep. • If |Cv| > 0, for each w in Cv check all possible neighbours in the sector that can wake up to become a neighbour of w. If bv is the lowest among all neighbours which are not a part of Av, wake up, else go to sleep. The Simple Coverage Scheme reduces computation for active nodes in each cycle compared to the sub-grid optimization. It also involves minimum communication, similar to the sub-grid optimization. However, this algo- rithm can only guarantee basic connectivity, as it does not ensure nodes with multiple active neighbours. Coverage is maintained through random- ization. 2.4.7 Finding the value of γ While using the decentralized connectivity and coverage algorithms, we based our assumption on the fact that the system has knowledge about γ, the number of active neighbours for each sensor node. However, calculating the value of γ is complicated and requires knowledge of several parameters. The value of γ is critical to the amount of energy savings, the accuracy and robustness of the sensor network deployment. To decide on a value of γ, we provide two simple strategies. Depending on the information available during deployment, we can use one of the following techniques to judge and distribute a value of γ to all the sensor nodes. We present detailed description of the strategies that can be used to calculate γ. The average neighbour strategy: In this method, we need knowl- edge of the percentage of nodes that are required to stay active for every 16 2.5. Compressed sensing basics cycle. This fraction value f is generally derived from compressed sensing 2.5 or calculated from the intended lifetime of the sensor network application. In this strategy, we must derive the average number of neighbours for a node in the network. This can be done at the set-up phase, if all nodes broad- cast their one-hop neighbour count γv to the sink. The average number of neighbours, γavg can be calculated from the values of γv transmitted to the sink. γ can then be calculated as: γ = |f × γavg|. Generally, in a deploy- ment which provides a duty-cycling option, we can expect that the value of γ γavg. The fraction neighbour strategy: This is similar to the average neighbour strategy. We must derive the value of f as was done in the previous method. This value of f must then be distributed to all nodes in the network. Each node calculates its own γ from its neighbour count γv as: γ = |f × γv|. An important outcome of this scheme is that it results in different values of γ for different nodes in the network. This is contradictory to our connectivity strategy which assumes that the value of γ is fixed. However the CC algorithm can easily accommodate variable values of γ. All steps in the CC Algorithm remain the same except for the 2nd part of the last step (6) where each node needs to check if each of its neighbour have their individual γ number of neighbours awake for a particular cycle. 2.5 Compressed sensing basics Having discussed how we can maintain connectivity and coverage in conjunc- tion with sleep scheduling, we now consider the use of compressed sensing for data reconstruction in certain sensor network applications. The advan- tage of compressed sensing is that we can reconstruct the reading of the inactive nodes in the sensing field with high accuracy. This allows us to deactivate a larger number of nodes while duty cycling and extract higher energy saving. Compressed sensing [16] relates to the problem of solving a system of under-determined linear equations where the solution is supposed to be sparse, i.e., y = Ax, where x is a sparse signal. In the above equation, y is a compressed version of x. x can be considered to be a one dimension vector of length N while A is a matrix of size M × N , with M < N . In general, an under-determined system has infinitely many solutions but it has been shown that if the solution itself is sparse, it is typically unique. Given enough time and computational resource, it is possible to solve the said problem by a brute force search. Mathematically it is represented as: 17 2.6. Experiments min||x||0 subject to y = Ax Unfortunately this is an NP hard problem [16]. However, solving the inverse problem by this method requires lesser number of equations. Com- pressed Sensing has proved that the said inverse problem can be solved by the following convex relaxation of the NP hard problem, min||x||1 subject to y = Ax The l1-norm is the tightest convex envelope of the NP-hard l0-norm. This optimization problem can be solved by linear programming. It has been shown that intermediate between the NP hard (l0-norm) problem and its convex relaxation (l1-norm) lies the following non-convex relaxation [12], min||x||p subject to y = Ax, 0 < p < 1 It is possible to solve the above non-convex problem and the number of equations required to solve it is intermediate between the NP hard (l0-norm) and the convex (l1-norm) problem. The only issue theoreticians arise is that the non-convex problem may not converge to a global minima. However, practically this had never been the problem and this sort of non-convex methods give much better results than their more popular convex counter- parts. Our evaluation results confirm this fact. Practically, the actual signal is never sparse, but has a sparse represen- tation in the transformed domain (Fourier, DCT, Wavelet etc.). In such a scenario, the inverse problem can be framed as, y = AS’z where z = Sx, x = S’z, S being the sparsifying transform. Therefore the corresponding optimization problem is: min||z||1 subject to y = AS’z This formulation has been the basis for nearly all compressed sensing applications in signal processing. From the data acquisition perspective, the reading from all the N nodes in the network is represented by x. The sink will receive only readings from the sensors that are active, which is the vector y. Since the sink has knowledge about the nodes that has transmitted, it would have information of the transformation matrices and hence can compute the original matrix x from the transmitted values. 2.6 Experiments 2.6.1 Evaluation of connectivity algorithms In order to test the coverage and connectivity algorithms, we built a simple simulator. The simulator accepts the co-ordinates of all the nodes in the 18 2.6. Experiments network and the desired fraction of awake nodes as inputs. To test the CC Algorithm, we select a random placement of nodes over a sensing region. For the sub-grid optimization and the Simple Coverage Scheme, we consider a regular grid structure. We consider average neighbour strategy for neighbour assignment. We ran experiments to verify effectiveness of the Simple Coverage Scheme. The simulator first outputs the initially active nodes and then the nodes that needed to wake up in order to maintain connectivity (Fig. 2.2). We also conducted tests to compare the performance of our different connectivity and coverage algorithms. We found it difficult to compare our scheme with other connectivity algorithms since most of them have different objectives. Different objectives arise from the fact that the parameters we provide to our algorithm is unique and different from any other algorithm. To provide a simple comparison, we considered two different flavours of our algorithm – the CC Algorithm and the Simple Coverage Scheme. Each of them accept similar inputs but work differently. Such a comparison help users to decide which algorithm they want to implement. In order to compare performance, we choose the number of awake nodes produced by each algorithm at the end of a cycle as the metric. For the same input parameters, the algorithm producing the lower number of nodes to maintain connectivity is an obvious winner. However, it is not clear which algorithm performs better as we can see from our experiments with the test data (Fig. 2.3). It seems from the results that the Simple Coverage Scheme generally performs better than its counterpart when the number of neighbour is high. It is upto us to decide the algorithm to be used. How- ever, depending on the topology of the network and the required sparsity, a conscious decision can be made based on the results published here. 19 2.6. Experiments (a) Before Connecting (b) After Connection Figure 2.2: The first figure shows the sensors that were turned on in a particular instant. Blue dots represent inactive sensors and reds indicate active sensors. The first figure represents the first phase of coverage where random nodes are selected for wakeup. The second figure is similar to the first except that the connectivity algorithm has been applied on it, hence a few extra sensors can be found awake. The sensors marked with grey circles indicate some of the sensors that were awaken by the connectivity algorithm. 20 2.6. Experiments (a) Transmission Range=2 (b) Transmission Range=4 21 2.6. Experiments (c) Transmission Range=8 Figure 2.3: The three figures compare the performance of the simple coverage scheme and the decentralized activity planning algorithms. In all the figures, plots of total number of awake nodes are made against the fraction of nodes which are awake for different values of γavg. All nodes were placed on a 256 by 256 unit length grid with a total of 829 nodes placed randomly. To test the system for various different kinds of deployment, plots have been made for different transmission range for the nodes which results in a different average number of neighbours. The first figure has been plotted for average number of neighbours = 9 with transmission range = 2 units. The second and the third figure have been plotted for transmission range = 4 and 8 units respectively. We also tested the bounds of the CC algorithm at different concentration of nodes in the sensing region. The same experiment would also establish the fact that the size and density of the network would stress the performance of our CC algorithm. In order to test our algorithm, we considered a 256 by 256 grid and varied the density or number of nodes in the area. For each node density, we varied the value of γ to see how the CC algorithm would react to varying application demands. As expected, the results (Fig. 2.4) conform 22 2.6. Experiments to an increased number of awake nodes as the deployment becomes sparser. The number of neighbours for a given node, γv decreases with sparsity and we see that from some point, all nodes need to stay awake to support the value of γ set for that network. We also illustrate that the fraction of nodes that are awake also increases as γ is increased. Figure 2.4: Fraction of total nodes vs total number of nodes at different values of γ Connectivity test To prove that the connectivity algorithm is effective in reducing discontinu- ities in the network, we added a module to our simulator to test connectivity. In this test, we consider the same set of nodes that was used to illustrate the connectivity and coverage algorithm. At the end of a given cycle, we randomly pick nodes and test if there is a path to the sink from this node. The algorithm works in accordance to the following steps: • For the randomly chosen node, find all awake neighbours in transmis- sion range. Let the awake neighbours for node v be represented by a set Av. • For all w,w ∈ Av, calculate their distance dw from the sink S. 23 2.6. Experiments • Choose a neighbour node ws such that for all w,w ∈ Av,dw is the minimum. • Replace v with ws • Continue till v or ws is a neighbour of the sink, S. • If there is a break in any of the above steps, connectivity is not main- tained. We ran exhaustive tests on the nodes to prove that all our connectivity algorithm does not leave behind isolated nodes. 2.6.2 Evaluation of compressed sensing Once connectivity and coverage issues are factored in the network, we con- duct experiments to test the effectiveness of compressed sensing for reducing energy consumption in sensor networks. To apply compressed sensing, we test our data acquisition scheme on synthetic as well as real data traces. We first describe how we generate the test data and then apply compressed sensing techniques to interpolate missing data. We present detailed graphs to show how applications would decide on their compressed sensing param- eters. Data generation We use data generation techniques used to obtain synthetic traces. There has been a lot of research aimed at generating accurate sensor network traces since having an actual deployment is not always feasible. We choose one such tool developed by Jindal et al. [28] to build our datasets. The method for generating these synthetic traces have been validated against sensor traces from actual deployments such as the Intel lab data trace [27] and the S-Pol radar dataset [48].The data generation tool follows a mathematical model to capture the spatial correlation in sensor network data. The tool can be used to generate a wide range of synthetic traces showing different levels of correlation. We use different levels of correlation between the network data to simulate a wide range of possible deployments. Correlation among data translates to different physical phenomenon being monitored. As it seems evident, a major advantage of using a synthetic data generator is that it can mimic several types of sensor network applications. Actual data only covers one example of data that it monitors. Apart from this, a synthetic 24 2.6. Experiments generator can create a large or a small network in terms of number of nodes which can test out the compressed sensing algorithm thoroughly. Results We used the synthetic data generator from the previous discussion to gen- erate the sensor data and used compressed sensing to retrieve back missing values from the sensor data. In our experiments, we tried to change the correlation between data in each of the synthetic datasets and also tried changing the value of P during reconstruction to test the accuracy of recon- struction at various values of P . For each of the different values, we plotted the normalized mean square error versus the fraction of sensor values being used for reconstruction. For highly correlated data, the NMSE is similar for different values of P if we have 20 percent of the sensors sending back data to the base station. Beyond this point, P=1 produces increased error compared to lower values of P (Fig. 2.5). For medium and low correlations, we can see that as the correlation of the data decreases, the NMSE increases. For most of the cases, having the value of P below 1 shows lower reconstruction error. Beyond the point where 80 percent of the sensors are missing, P=1 produces much higher reconstruction error compared to lower values of P . We demonstrate the change in error with varying correlation values(Fig. 2.6 and 2.7). It is evident that with higher correlation, the NMSE is lower. Also lower values of P give better results compared to P=1. From the results presented in Fig. 2.8, we can see that though the cor- relation among data points is not too high yet with 20 percent of the nodes sampling, we get a near-exact reconstruction of the original signal. Regardless of the values of P and the correlation factors, one fact is ev- ident from the results. For most sensor network deployments, we can take readings from a small fraction of nodes and reconstruct the missing node values without sacrificing accuracy. The fraction of nodes that is required depends on the error tolerance for the application. However, as we can see from our results, if 20% of the nodes in a deployment send back data to the base station, we can reconstruct the readings from all other nodes with very small reconstruction error, given we have a sufficiently large dataset. With this mechanism, energy consumption in nodes can drop to one-fifth of traditional data acquisition methods in sensor networks. We can expect the life time of each node to be 5 times longer than usual which is enormous compared to current standards.A question that remains unanswered is the overhead at the base station for running our reconstruction algorithm. Is 25 2.6. Experiments it possible to run frequent reconstructions every time the sensors send back data? Overhead at the base terminal is quite critical as it determines the frequency with which the sensors can operate. If the time taken to recon- struct the missing values is really large, it impacts the frequency at which readings can be taken. Since the t ime for reconstructing the data at the base station is proportional to the size of the network, a large network will require larger reconstruction time than a smaller network. Needless to say, reconstruction time will also depend on the processing hardware. The ap- plication developer must decide the sampling frequency of an application while keeping these details in mind. (a) Low Correlation (b) Medium Correlation (c) High Correlation Figure 2.5: Normalized mean square error versus fraction of sensors trans- mitting data for various correlation values. 26 2.6. Experiments Figure 2.6: Normalized mean square error versus fraction of sensors trans- mitting data for various value of P (P=1,0.8). Figure 2.7: Normalized mean square error versus fraction of sensors trans- mitting data for various value of P (P=0.8,0.6). Actual data Testing with synthetic data provides us with very accurate results even when the percentage of active sensors is low. To test the system on actual data, we ran compressed sensing on on actual sensor data. This gave us insight to expected results when compressed sensing is applied on real applications. We tested our algorithms on wifi data collected at the Stevens Campus and published in [43]. Since the actual values of the sensors were not available, we decided to scale down the image to a 32× 32 grid and ran compressed sensing reconstruction on it. Fig. 2.9 shows the actual scaled and the reconstructed versions of the wifi data. The reconstruction is done with 10 percent active 27 2.6. Experiments (a) Original Data (b) Sampled Data (c) Recovered Data Figure 2.8: The first figure shows the original data. The second image is the sampled data with 20 percent of the sensors sending back data to the base station. The third figure shows the recovered data using our reconstruction algorithm. sensors and P = 0.8. Though the mean squared reconstruction error is 14 percent, as the number of sensors is increased to 20 percent, the error drops to 7 percent. It is difficult to compare our improvement in efficiency with other wireless systems. In several other wireless applications, efficiency is measured as throughput of the network per joule of energy consumed [47]. However, such a metric is not appropriate for defining efficiency of our compressed sensing algorithm. In our case, throughput is measured through the accuracy of the reconstructed data while energy cost is measured through the fraction of nodes kept awake. To gain higher throughput or accuracy of reconstruction, a larger fraction of nodes need to stay awake. 28 2.7. Software modifications Figure 2.9: Scaled and reconstructed wifi signals from the Stevens Campus. Only 10 percent of the nodes’ data is used for reconstruction. 2.7 Software modifications To implement our coverage and connectivity algorithm and the compressed sensing reconstruction scheme on an actual sensor network system, we would need to make a few system level changes. The operating system running on the nodes needs to provide added support for the connectivity module while the base station would have to run the reconstruction algorithm every cycle. We studied some of the commonly used WSN operating systems and decided to investigate the added components required in LiteOS [6] to implement our CC Algorithm. Since LiteOS allows configurabilty, it allows different protocols to be implemented as threads running on the kernel. For example, the routing and MAC protocols are implemented as LiteOS threads. To im- plement the CC Algorithm, each node should store its own random number generator seed and the value of γ. The connectivity thread should be able to access the random seed associated with each node and also maintain a neighbour table listing the position and seed of each neighbour. In LiteOS, a neighbour table is maintained by the routing protocol, the connectivity thread could choose to use this table or create its own. Implementing our system on other operating systems should be similar. 2.8 Limitations of compressed sensing We have managed to demonstrate a data management scheme that can lead to huge savings in many sensor network deployments. For WSNs that mon- itor some natural phenomenon with smoothly varying distributions, com- pressed sensing can produce multi fold increase in life time of individual 29 2.9. Extensions nodes. Though most sensing applications fall under this category, a few may not. For example, an intrusion detection system which needs to be highly reliable may not be the best application where compressed sensing can be applied. Also, systems that generate extremely random signals can- not apply compressed sensing due to low correlation in the generated signal. Localization applications which try to track objects [49] do not fit well into the compressed sensing model as it needs to act instantaneously and every data sample is of utmost importance. Further, the compressed sensing system needs a sufficiently large network to work. For example a grid having 20 nodes may not be the best network to test compressed sensing. Since we mainly focus on natural phenomenon, we assume that the network would be sufficiently large to allow our algorithm to select a low percentage of nodes and reconstruct without significant error. 2.9 Extensions We believe that using our coverage and connectivity algorithm in conjunc- tion with compressed sensing would lead to significant gains in most sensor network applications. However there are several extensions that could make our system even more robust and efficient. At the moment we believe that fairness, in terms of distribution of load on the nodes is done through ran- dom assignment. However, a more optimal scheme could be devised in order to make load distribution dependent on energy content of individual nodes. Such energy-aware sampling has already been implemented in IDEA [11] and Peleton [57], where nodes share energy states resulting in efficient sam- pling schemes. Our system could incorporate these ideas as well. If nodes were to store energy states of the neighbours, a selection of energetic nodes could be made over the worn out ones. Further discrepancies creep in when nodes allow energy harvesting (for example, solar charging). In such cases, some nodes would inherently have more energy than others and it should be our aim to utilize the more energetic nodes compared to others. In our scheme, we consider that sampling rate is constant, however for an application which is dependent on event occurrence rates, further optimiza- tions could be made. For example, an animal habitat monitoring system might experience more movement at day time compared to night. In such cases, the sampling rate could be increased during the day and decreased at night. Some applications depend on user query rates, the sampling rate of such applications could be controlled by the sink depending on number of incoming requests. We shall discuss more about this in Chapter 3. 30 2.10. Conclusion While reconstructing the original data using compressed sensing, we ig- nore quantization errors. In many cases, sensors send digital data which means that they can only send data packets in certain quantization steps. Such discrete steps lead to inherent error in the sampled data. Such er- rors add to the reconstruction error when we are computing the NMSE for compressed sensing making our calculatiions erroneous. If we were to consider quantization errors while calculating the reconstruction error from compressed sensing, more accurate results can be obtained. 2.10 Conclusion We have presented a framework that can be used to elongate the life of a sensor network deployment. We have proved that our CC Algorithm is near-optimal and successfully maintains coverage and connectivity in a sen- sor network. Combining our CC Algorithm with compressed sensing, we promise significant energy savings along with robustness over a well con- nected selection of sensor nodes. Experimentally, using simulation and test data, we have demonstrated the effectiveness of our scheme. We have also provided hints at how to employ our system on actual sensor software. The next big step for us would be to test our scheme on an actual system. A large deployment with a significant number of nodes which monitors some phys- ical phenomenon having significant correlation would be the ideal platform to demonstrate our scheme. Two important aspects of our scheme are simplicity and effectiveness. We have made our scheme as simple as possible such that it can be easily adopted. From the point of effectiveness, we point out the amount of energy savings leading to increased lifetime of sensor nodes that can be extracted using a combination of our CC Algorithm and compressed sensing. Such energy savings are not possible using other micro optimization techniques. 31 Chapter 3 Optimal sampling based on user queries in sensor networks 3.1 Introduction Sensor networks are often deployed for applications which use only the most recent information aggregated by the sensors. In such cases, the time of arrival of user queries become the deciding factor that dictates the sampling interval for the network, more than events occurring on the network end. The question that arises is how often should we query the network so that users do not get stale data. However, one should also try to minimize the energy loss associated with querying the network. To answer this question, we investigate the use of statistical information about the inter-arrival time of user queries in order to decide on an optimal sampling schedule. Our sampling policies are optimal from the point of view that they provide a balance between data freshness and energy loss due to querying. Many sensing systems are based on data that is only useful at real time. The performance of such systems will depend on the frequency and dis- tribution of user queries. A simple traffic control system can be used to demonstrate the problem we are trying to solve. Let us consider that traffic is monitored using sensors which send back data to a data processing centre. Car drivers can lookup this information and decide to take a route which is less congested. The main quality of service parameter for such an application is data freshness. If the application delivers stale data, an user may receive information regarding a junction being traffic free which over the delay in- terval has become congested. This shows that stale data makes the system useless. However, if the application starts monitoring traffic conditions at short intervals, it provides information of excellent quality but the energy lost in communicating the traffic states to the data processing center signif- icantly reduces the lifetime of the sensors. Further, if only a few cars are 32 3.1. Introduction interested in the traffic conditions, the energy spent in gathering the precise data is wasted. If the traffic control system were to track the number of user queries and match it with the quality of data, a much more efficient system would result. The same principles are true for several systems where a data collection center aggregates information and users utilize this information. The underlying network can be considered to be a sensor network or any system that involves a certain cost for aggregating and sending data to a centralized data processing unit. Our approach to designing a data collection system, which responds to user queries is a three step process. In the first step, we analyse statistical information regarding the arrival of queries. Next, we associate a cost to data staleness and energy lost in network querying. Finally, we find a trade- off that is optimal for the given system. In order to compare the costs incurred by our system, we associate certain weights to the querying cost and the staleness penalty and try to optimize the sum of these weighted values. The system specific to this project, is a wireless sensor network(WSN) which consists of sensor nodes monitoring a sensing field. The nodes send their reading regularly to a data processing unit or sink which is in turn queried by users. Every time the nodes send data to the sink, they incur a cost which reduces the lifetime of the nodes. The sampling instants are fixed by the data processor. A larger interval allows the node to sleep and save energy, while a smaller interval increases the application’s precision. A key success parameter for our project is the distribution of user queries. In case the user queries follow an unknown distribution, it becomes compli- cated to model and decide on a strategy that yields optimal performance. However, in most cases, the arrival times of user queries can be modelled some known distribution which can be analysed. After analysing the distri- bution, the sink decides on an optimal sampling interval. The intervals are designed such that the delay between data aggregation and query arrival is minimized, leading to a reduction of data staleness. To decide on a sampling interval, the sink evaluates the statistical information related to the arrival time of the user queries and decides on a new sampling interval. The new sampling interval is communicated to all nodes which can turn off their radio and save energy during the interval. While we develop optimal strategies for sampling and querying the network, our energy saving methodology is to- tally orthogonal to other methods that can be used to conserve energy. Such methods which lead to energy savings such as device driver optimizations [30], energy-efficient routing protocols [50] could be used in conjunction with our methods. In order to regulate the emphasis on data freshness, we allow 33 3.1. Introduction application developers to choose a delay penalty factor. If the criticality of fresh data is high, a higher cost is incurred for data staleness. The system then reacts to keep the data staleness cost to a minimum. While optimizing the energy budget for the WSN, we make the following contributions. • Describe a general framework that can be used to make trade-offs be- tween energy lost due to querying and delay penalty cost when replying to user requests. • When user queries follow a simple distribution such as a Poisson dis- tribution, we show that a periodic sampling is optimal. For periodic sampling, we find an optimal sampling rate that minimizes the cost for the system. We extend our solution to the scenario when user queries follow multiple Poisson processes. • We identify the best policy when user queries follow a hyperexponen- tial distribution. We demonstrate methods to take optimal decisions using dynamic programming. However, the way to arrive at an opti- mal decision is computation intensive and hence we present suboptimal strategies. • We demonstrate using experiments that our sub-optimal strategies are nearly as good as the optimal solution in terms of performance. In this chapter, we focus on scheduling sampling intervals for a WSN. The methods identified are general and are applicable to any scheduling problem where there is a cost imposed by delaying an action and a cost involved in taking an action. The WSN is treated as a black box (The data store in fig. 3.1) which incurs energy loss when queried. Any other system that incurs a cost to supply data can be used to replace the WSN. Figure 3.1: Our data processing system answers queries from the users and schedules sampling intervals on the data store. 34 3.2. System model and assumptions 3.2 System model and assumptions Wireless sensor networks are envisioned to be used extensively for several applications such as industrial control [41], traffic condition monitoring [26] and many others. Each WSN consist of several sensor nodes that can com- municate with each other wirelessly. A particular node is assumed to be the base station or data sink. This node does not have energy constraints because it may be connected to a permanent power source. The base sta- tion is computationally more powerful than other nodes in the network. Henceforth, we will use the term node to refer to nodes that are not the base station, and we will use the term data sink or base station when we explicitly refer to this particular node. Energy is a first order constraint for sensor nodes. Since most of the sensor nodes run on batteries, the life span of each node is limited. Applications for WSNs should be designed to keep communication between the nodes to a minimum, since communication is the most energy consuming operation for the nodes [7]. To minimize energy loss due to communication, the nodes send their sampled data back to the sink after certain intervals. During this inter- val, the nodes turn off their radio and transition to a low-power inactive state. The interval at which the nodes need to wake up is decided by the sink, a longer interval results in more energy savings. If the sensors have to sample at fixed periods, the sink notifies the nodes of the sampling in- terval and all the nodes follow this schedule (We assume that the sensors use a time synchronization technique [20].). This limits the communication between the sink and the sensor nodes and hence saves energy. In case the sampling interval keeps varying, the sink notifies the sensors regarding the next sampling instant. We assume that there is some existing protocol for transmitting data from the nodes to the sink which is not of much interest to our scheduling problem. In our model, we assume that all queries are answered by the sink itself and the sink does not issue requests to the nodes to retrieve the latest values. Further, the time at which the sink queries the network is not significant; the only interesting point is when all the data is returned to the sink. We do not consider latency between issuing queries and data retrieval, however it can be incorporated in our model. Our aim is to provide the user with the latest data. We assume that the data for the queries come from a fixed number of nodes in the network. This assumption is important as we consider the querying cost to be fixed and does not vary with user queries. For the application users, the entire network is abstracted to a data store [36] from which data can be retrieved when queried. An important 35 3.3. Related work assumption in our model is that the WSN encounters a fixed energy cost every time it sends data to the sink. All overheads for the network are included in this cost. 3.3 Related work Our work on scheduling sampling intervals in sensor network networks is unique. We do not have knowledge of any previous body of work that uses user query information to schedule sampling intervals for sensor networks. However, we gain insight from scheduling approach used in other fields. We also use information on mathematical distributions and use them to model user query inter-arrival time. Though data freshness is not so well researched in the sensor network community, it is better known in real-time systems. There is an existing body of work on scheduling which consider data freshness as a quality of service parameter [45, 60]. It is important to note that we try to strike a balance between energy consumption and data freshness. Most of the systems till date have tried to optimize for either of these which resulted in a rather different scheduling protocol compared to ours. Liu et al. [32] have devised predictive sensing strategies for the sensors tailored to serve different applications where sensors have to switch from high to low sampling rates in order to save energy. Such switching of sampling rate relates to the frequency of events sensed by the sensors. Our system can be considered to have a similar architecture where we switch depending on the distribution of user queries. The work on query interval scheduling by Gopalakrishnan [21] is relevant to our scheduling problem. While their idea is to predict the arrival time of events to enable prompt sensing, we try to predict user query arrival times to provide them with fresh data. Other work on sampling rates for sensor networks is also relevant to our project. Some of these related projects focus entirely on communication constraints and assume that sensor networks are continually backlogged with data – Bandyopadhay et al.[3]. Work related to real time query scheduling is also related to our project. Chipara et al. [15] in their paper describe how to approach conflict-free transmission schedules for real time queries. Work on sleep scheduling by Keshavarzian et al.[29] and Cao et al. [5] demonstrate strategies useful for energy saving without concerning data staleness. 36 3.4. Problem description 3.4 Problem description We aim to provide users with fresh data subject to the constraint of reduced energy consumption. Users query a data sink which replies with the latest data it has procured. The sink node in turn queries the sensor network and gathers the most up to date data. In order to achieve the best balance between data freshness and energy consumption, we try to model a query scheduling scheme that provides optimized results. 3.4.1 Problem formulation Let us consider a network having a single data sink. Users query the network through this sink which also decides the sampling interval for the network. It can sample the network at regular intervals or could set a new sampling time depending on the frequency of users querying the network. In this model, the sensor network incurs two costs, the querying cost Cq and the delay penalty Cd. Our model is similar to most financial models which try to balance two contrasting costs. This is true for most demand and supply systems. Our system similarly depends on two costs – staleness and energy loss. A balance of these two costs result in optimal behaviour. Fig. 3.2 represents a typical time graph depicting arrival times of user queries and the sampling intervals. User queries are denoted as tui for the ith user query while the sampling instants are represented as tsj for the jth sampling instant. Let us represent the time between successive user query arrivals as τi s.t., τi = tui+1 − tui . To formulate the system cost, let us consider an interval of time t = 0 to t = T . In this interval, let us query/sample the network n times. Each of these sampling instants are represented as tsj , where j = 0,1,2,3... n. Figure 3.2: Simple timeline graph demonstrating sampling and event arrival times. 37 3.4. Problem description The cost for querying the network is given by Ec for each sampling instant and total cost for querying the network over time T is represented as Ec × n. (Given that there are n sampling instants within time T .) Our aim is to come up with an optimal querying interval (tsj+1 − tsj ). Let us represent the jth interval as tj . During this interval, multiple user queries are made to the system which are denoted by the set {tui}. Depending on the staleness of data available to each user query, the system pays a delay penalty proportional to (tui − tsj ). For the interval tsj to tsj+1 , let us consider there were m user queries. The system cost during the interval can then be represented as: Cj = KqEc +Kd ∑m i=1 [ tui − tsj ] (3.1) In the above equation, Kq and Kd are the querying and delay constants. The value of these constants should depend on the application. For example, an application which has low energy budget should choose a high value of Kq while a system which is sensitive to supplying stale data should choose a high value of Kd. By choosing an appropriate value of Kq and Kd, users can provide weigh- tage to the staleness penalty or energy loss for a given system. In fact, we can show that the expected staleness can be bounded within certain limits as well. However, no hard bound on delay can be provided because the optimal sampling rate depends on the user arrival process. If hard bounds on the delay have to be provided, a dynamic approach like ours would not work. One would instead look at simpler sampling strategies such as peri- odic sampling such that for any given distribution of user queries, no query would ever cross the staleness bound. Over the total time span T, the system incurs a cost C(T ) = [KqEc]n+Kd ∑n j=1 ∑m i=1 ( tui − tsj ) (3.2) Now, if we considered the desired optimal querying interval to be a fixed value ts, Number of sampling intervals, n = T/ts. For a given time interval, considering periodic sampling, the cost function can be written as Ci = KqEcT/ts +Kd ∑ T ∑ ts ( tui − tsj ) (3.3) Since the user queries follow some probabilistic model, instead of computing actual values, we use expected values to represent the cost function. The 38 3.5. Decision making expected cost can then be represented as: E [C] = E [KqEc × n] + E [ Kd ∑ T ∑ ts tui ] (3.4) In equation3.4, we have a cost function that can be used to model any probabilistic distribution of user queries. Let us now look at policies that can be used to decide on an optimal sampling interval using the above equations. 3.5 Decision making In the above model, we show that the cost of the system can be considered to be comprising of two components, a delay penalty and the querying cost per cycle. To minimize the system cost, the sink decides the next sampling time at the end of each interval. A decision-making policy must take into consideration the type of distribution and the history of user queries made to the system. If periodic sampling is the optimal policy, the sink decides on the optimal interval and communicates this value to all the nodes. When a new decision needs to be taken at the end of every interval, we choose dynamic programming, which is a natural choice to solve decision making problems. In the dynamic programming model, at the end of every sampling interval ts, a decision is taken regarding the next sampling instant. In the formulation, tu represents the arrival time of an user query. The optimal cost function can be represented as shown below: C∗ (ti) =minli+1≥0 {E [C (ti)] + P (tu ≥ ti+1)C∗ (ti+1)} where li+1 is the length of the i+ 1 th interval. The cost function for a single interval can now be represented as: C (ti) = KqEc +Kd (tu − ti)1 (tu ≤ li+1) (3.5) KqEc is the querying cost for a single interval. ti is the start of a sampling interval. ti+1 is the start of the i + 1 th sampling interval. We use the indicator function 1(.) to represent the scenario when there are no user queries within an interval. In our model, we consider a single user query as a boundary condition. If no user query arrive in a given interval, a new sampling interval has to be enforced. 39 3.6. Analysis for poisson user query arrival 3.6 Analysis for poisson user query arrival When the user query arrival intervals maintain a pattern, we can predict the next sampling interval with high accuracy. For example, if the user queries follow a Poisson distribution with some arrival rate λ, we could arrive at a strategy to optimize the cost of our system. When the inter arrival times τi are exponentially distributed, the query arrival rate λ can be used to completely characterize the distribution. We can also claim that decision making at each stage is independent of the prior stages as the Poisson process is memoryless. Hence the optimal schedule for this distribution defaults to periodic scheduling with intervals of equal length. Once we know that the optimal strategy for this problem is fixed interval sampling, we can use equation 3.3 to solve this problem. Figure 3.3: Simple timeline graph for poisson event arrival. We show a simple example of user query arrivals within a sampling in- terval in Fig. 3.3. In the figure, x is the interval between the first sampling instant and the first user query’s arrival time. Since the user queries follow a Poisson distribution, the interval between any two queries tui and tui+1 follow an exponential pattern with some arrival rate λ. In equation 3.3, we can substitute the expected delay penalty as: E [∑n−1 i=1 tui ] = E[n]× E[tui ] Now, tui = x+ ∑n−1 j=1 Zj where, Zj is an exponential random variable. Therefore, 40 3.6. Analysis for poisson user query arrival E [tui ] = E [ x+ ∑n−1 j=1 Zj ] Since, x is the time from the arrival of the first user query, we can consider that x is also exponentially distributed with the same λ. This means that E [tui ] can be written as: E [tui ] = E [∑n j=1 Zj ] Expected cost of a sum of exponentially distributed random variables can be written as λts× 1/λ and E [n] = λts. The cost function can be now written as: E [C] = KqEcT/ts +Kd × λt2s × T/ts In order to find the minimum cost, we consider the first order differential E [C] ′ = 0 E [C] ′ = −KqEcT/t2s +Kd × λT = 0 ts = √ KqEc Kdλ (3.6) 3.6.1 Multiple poisson arrival in parallel In our initial discussion on Poisson event arrival, we evaluate a situation where user query arrivals follow a single Poisson distribution. A natural extension to this problem is the scenario when event arrivals follow differ- ent Poisson arrival frequencies. If we follow our previous methodology, the optimal strategy would be to have different sampling intervals for each ar- rival rate. However, from an implementation and energy efficiency point of view, a single uniform rate is much more desirable. In this discussion, we will demonstrate that it is better to have a single sampling rate rather than interleaving intervals at different rates. We also show that the single sampling rate performs better in terms of energy efficiency in comparison to the multiple sampling rate strategy. Let us start with the simple assumption that query arrivals follow two independent Poisson distributions with arrival rates λ1 and λ2. Let us ini- tially assume that the delay penalty Kd for each of these distributions also remains the same. Following equation 3.6, we have the sampling intervals for the two processes as: ts1 = √ KqEc Kdλ1 41 3.6. Analysis for poisson user query arrival ts2 = √ KqEc Kdλ2 Let us now consider that we have a single sampling rate. Even if we have event arrivals following multiple Poisson processes, our sampling decision still remains memoryless as each sampling interval is indistinguishable from the other. Hence, we decide on periodic sampling. The energy consumption formulation remains the same, which is: E [C] = E [KqEcT/ts] + E [ Kd ∑ T ∑ ts tui ] where ∑ ts tui is sum of delay caused by user query arrivals from two different Poisson arrivals with inter-arrival rates λ1 and λ2. Since the two event arrival processes are independent of each other, we can use the theory of Linearity of Expectations [39] to show: E [∑n−1 i=1 tui ] = E [∑x j=1 Zj1 ] + E [∑y j=1 Zj2 ] where x and y are the number of events of rate λ1 and λ2 in the sampling interval ts. Also note that x+y = n. Replacing expected values with actual values and considering ts to be the optimal sampling interval, we get E [∑n−1 i=1 tui ] = λ1 × t2s + λ2 × t2s The total energy consumed over time T equates to: E [C] = KqEcT/ts +Kd ∑ T t 2 s {λ1 + λ2} E [C] = KqEcT/ts +KdTts {λ1 + λ2} For an optimum global sampling rate ts, let us consider the first order deriva- tive, E [C] ′ = 0. Solving this equation provides us with the value of ts as: ts = √ KqEc Kd(λ1+λ2) The above expression represents the sampling interval when we have queries following two independent Poisson inter-arrival rates. Generalizing this equation to n different Poisson arrivals would yield ts as: ts = √ KqEc Kd( ∑n i=1 λn) If each of these n arrivals were to be associated with a different delay penalty factor Kdn , it is trivial to prove that: ts = √ KqEc∑n i=1 Kdnλn 42 3.7. Analysis for hyper exponential user query arrival Experimental validation In this section, we will try to prove that the single sampling interval is actually better than the multiple querying interval. To test this we consider a system that has Ec = 1 and we change the values of λ1 and λ2 to check how the performance varies. Figure 3.4: Cost vs λ1/λ2 for different ratio between kq and kd As it is evident from Fig. 3.4, the cost incurred after deciding on a single sampling interval is always better than having multiple different sampling intervals. 3.7 Analysis for hyper exponential user query arrival Though a majority of user query processes can be categorized as being Pois- son, there are definitely a lot of processes that do not conform to a sim- ple exponential distribution and exhibit long-tailed behaviour. To model such processes use a hyper-exponential distribution [46]. The hyperexpo- nential distribution is characterized by parameters λ = {λ1, λ2, λ3, ..λm} 43 3.7. Analysis for hyper exponential user query arrival and q = {q1, q2, q3, .........qm}. This represents an hyperexponential distribu- tion having m phases with the probability of the process being in phase i is given by qi. Events in phase i follow a Poisson distribution with rate λi. 3.7.1 Preliminaries Let us consider that we have a known instant of time, t and the predicted next sampling interval is of length l in time. We assume that the next user query arrives at some time instant τ . When τ follows a hyperexponential distribution, we have P (τt > t) = P (τ > t+ l|τ > t) = P (τ>t+l)P (τ>t) where τt = τ − t = ∑n k=1 qke −λkte−λkl∑n k=1 qke −λkt = ∑n k=1 Zk (q, l) e −λkt where we define Zk (q, l) := qke −λkt∑n j=1 qje −λjt , k = 1, 2, ., n. Following the definition of Zk (q, l), we observe that ∑n k=1 Zk (q, l) = 1 and Zk (q, l) > 0. From this observation and the characterization of τt in the above equation, it follows that τt is a hyperexponentially distributed vari- able with parameters λ and Zk (q, l). This description of τt suggests that we view Z (q, l) as a distribution transform: it transforms a hyperexponen- tial distribution into another hyperexponential distribution with the same rate parameter λ but different phase probabilities. Using the definition of Zk (q, l) we can show that Z (q, l1 + l2) = Z (Z (q, l1) , l2). In other words, z is composable and we can denote Z (Z (q, l) , l) as Z2 (q, l) and, as a natural extension, Zm (q, l) = Z ( Zm−1 (q, l) , l ) . Now, if we define U (k) to be an n-dimensional vector with the kth element taking the value 1 and the re- maining elements being 0. We define I (q) as the smallest index k such that qk > 0. Using these definitions, we state the following: For a fixed q, and with λ1 < λ2 < λ3... < λn lim n→∞zn (q, l) = U (I (q)) [21] 44 3.7. Analysis for hyper exponential user query arrival As a consequence of this proposition, when the gap between successive sam- pling increases, the likelihood of the phase in the hyperexponential distri- bution with the smallest rate grows. Further, the weight, ( element of the q vector) of this phase approaches 1. It seems that after a sufficiently long sampling interval, the optimal strategy would be to schedule sampling peri- ods assuming only the phase of the hyperexponential distribution with the smallest arrival rate. 3.7.2 Optimal scheduling strategies As we can see from the discussion above, for an hyperexponential process, the optimal scheduling strategy corresponds to one for a Poisson arrival process with rate λ1 i.e the smallest arrival rate. However, this decision is only optimal after a sufficiently long time. In the earlier stages, we use dynamic programming to decide the next sampling interval. The first step in finding a solution to the dynamic programming problem for the optimal sampling time, l∗ is to restrict the action space to upper and lower bounds. For this specific reason, we provide bounds on the choice of a sampling period. Proposition.It is sufficient to consider sampling intervals between the interval [0,lmax] where lmax = √ Cmax−KqEc Kdmink λk (3.7) and Cmax = KqEc +Kd maxk λk Proof. In order to derive the maximum cost, we considered a policy that samples the nodes every unit time. The maximum cost for such a scheduling policy per sampling interval could be described as C = KqEc +Kd ∑ t=1 [tui ] For maximum cost, the hyperexponential distribution must be in its highest λ, therefore the maximum cost Cmax = KqEc +Kd maxk [λk] For the given scheduling policy, Cmax is the highest price that can be in- curred by the system. Now for any system, it is trivial to perform better than this maximum cost Cmax. So, in order to find lmax, we use the following formulation. 45 3.7. Analysis for hyper exponential user query arrival Cmax = KqEc +Kd mink [λk] l 2 max The above equation yields lmax = √ Cmax−KqEc Kdmink[λk] Let us now formulate our dynamic programming problem. For a discrete- time MDP over a given state space and considering non-negative costs, we have C∗ (t) =minl≥0 {KqEc +Kd ∑ t tui 1 (τt ≤ l) + P (τt ≥ l)C (t+ l)} In the above equation, τt = tu1− t, or the arrival time of the first user query from the start of sampling. τt is also hyperexponentially distributed with parameters λ and Z (q (0) , t). One can obtain the minimum cost and the optimal scheduling policy using value iteration [17], and by using the fact from our previous proposition that Z (q (0) , t) asymptotically converges to U ( Iq(0) ) . For the value iteration, we use Cj+1 (t) = min l≥0 { KqEc +KdE [∑ t tui ] 1 (τt ≤ l) +P (τt ≥ l)Cj (Z (q (0) , t))} (3.9) and we can use C0 (.) = 0 to initiate the iteration. The iterations asymp- totically converge to the optimum [17] C∗ (t) =limj→∞ Cj (t) Using value iteration at each decision making epoch, however, imposes a high computational cost because we have to explore all possible states and it is not a practical method. We will therefore concentrate on alternate suboptimal strategies that are easier to implement. 3.7.3 Sub-optimal policies Memoryless policies Since optimal results derived from dynamic programming is highly compu- tation intensive, we look for suboptimal strategies to compute the length of 46 3.7. Analysis for hyper exponential user query arrival the next sampling period. In this problem, we consider memory-less policies which take decisions without considering the history of events. Similar to Poisson arrivals, the expected cost function for hyper exponential process arrivals can be represented as: E [C] = E [KqEcT/ts] + E [Kd ∑ T ∑ t tui ] Let n be the number of user queries between sampling and m be the number of phases for the hyperexponential distribution. E [∑n−1 i=1 tui ] = E[n]× E[tui ] Now, tui = x+ ∑n−1 j=1 Zj Here, Zj is an hyperexponential random variable. Similar to the Poisson query inter arrival analysis, we can make the assumption that for a memo- ryless process, the variable x follows the same hyperexponential distribution as Zj . This simplifies the above equation as: E [tui ] = ts∑m K=1 PK λK ×∑mK=1 PKλK E [n] = ts∑m K=1 PK λK E [C] = KqEcT ts + KdtsT∑m K=1 PK λK For optimal ts, we consider E [C] ′ = 0 which yields the value of ts as: ts = √ KqEc ∑m K=1 PK λK Kd (3.10) As the above equation is based on attributes which are constants for a given system, the sampling period can be easily derived. However, it is important to note that memoryless strategies would never achieve equivalent energy savings compared to the optimal dynamic programming model. Two-stage policies In the memory-less policy for deciding the optimal sampling period, we de- cide on a fixed sampling period and adhere to it. However, a better approach is to use a mix of dynamic programming and memoryless strategies. In the two-stage method, the first stage involves a dynamic programming decision. 47 3.8. Evaluation The following stages employ a memoryless strategy. This reduces the cost for the system, bringing it closer to optimum compared to a completely memo- ryless strategy. The computation cost for the two-stage policy is much less compared to the optimal scheme where calculations had to be performed for every stage. In this scheme, we divide the entire sampling space into smaller time seg- ments with n sampling instants in each segment. Let the sampling durations be denoted by {l1, L2, L3....Ln}. where l1 is the sampling interval discovered using dynamic programming and the remaining Lj , j ∈ n follow the mem- oryless policy. The size of n is decided depending on the accuracy vs com- putation tradeoff. A smaller value of n would imply more accurate/efficient sampling intervals in lieu of increased computation. The optimal cost for the first interval is given by: C∗1 (0) =minl1≥0 {c(0, l1) + P (τ > l1)C∗2 (l1)} (3.11) where C∗1 (0) represents the cost for the first stage while C∗2 represents the cost for the second stage which can be found using the memoryless policy. We can use equation 3.11 and dynamic programming to derive the length of the first interval l∗1 and the lengths of the other n− 1 intervals follow the memoryless policy. Thus, the computation overhead of applying dynamic programming is incurred every n − 1 cycles. Depending on the value of n, the system will perform closer to optimality at the cost of increased computation. 3.8 Evaluation In the previous sections, we have discussed several methodologies for schedul- ing sampling intervals for different distribution of user queries. In this section, we compare these policies with the help of some experiments to demonstrate how each scheme performs relative to the other. In the process of experimenting, we also gain insight on how these policies change depend- ing on the rate of arrival of events and how much weight we allocate to delay penalty over sampling cost. 3.8.1 Numerical results In order to test performance of our various sampling techniques, we selected arbitrary values for the query arrival rate and varied them over a large range to show the change in performance. The general methodology that 48 3.8. Evaluation we followed was to compute sampling interval or cost (depending on which was appropriate) while varying the value of λ or the ratio between Kq and Kd. For the cost incurred every sampling instant, we considered a fixed value of Ec. Poisson arrivals For Poisson inter-arrivals, we studied the variation of sampling period as we modify λ for different values of the the ratio kq/kd (Fig. 3.5). As expected, the value of the sampling period decreases as we increase λ. The variation of kq/kd affects the value of λ which is also expected as per equation 3.6 for the optimal sampling period. The trend with expected cost( 3.7) reflects varia- tion of an opposite fashion compared to sampling intervals, when measured across varying values of λ. As the sampling interval decreases, the cost of the system increases. We also conducted experiments to show the trend of optimal sampling interval when the ratio between kq and kd is varied while the value of λ is kept fixed(Fig. 3.6). The interesting observation made here is the increase in system cost when query penalty is high. This leads us to the conclusion that though we associate a cost for delay, the querying cost holds the more important role in deciding the system cost. Figure 3.5: Sampling time with varying values of lambda – poisson distri- bution 49 3.9. Conclusion Figure 3.6: Sampling time vs kq/kd – poisson distribution Hyperexponential arrivals To evaluate the hyper-exponential inter-arrival time, we consider a hyperex- ponential distribution with the following parameters: λ = 0.2, 4, 10 and q = 0.05, 0.3, 0.65 . To measure optimal sampling, we consider a time interval of 100 units and use finite horizon dynamic programming to decide the best sampling strategy. We acknowledge that the policy is computationally in- tensive and difficult to implement, however it provides a benchmark for the other sub-optimal policies. We evaluated the different sub-optimal schemes to figure out which one performs the best. We compared the memoryless policy with the two stage policy (with n = 10) alongside the optimal policy. From the experiments in Fig. 3.8, we can see that the two stage policy per- forms much better compared to the memoryless policy. Compared to the optimal policy, the two-stage policy is not far behind in terms of system cost. 3.9 Conclusion We have demonstrated strategies for sampling a sensor network to balance the cost between data staleness and query scheduling. We show that if user queries follow a certain distribution, the sampling intervals should be 50 3.9. Conclusion Figure 3.7: Cost vs lambda – poisson distribution designed to provide users with fresh data and at the same time, keep the cost of sampling to as low as possible. We show that choosing an arbitrary sampling interval when deploying the system may not take into consideration the pattern of user queries and hence prove inefficient. However, if the system reacts to user queries, a much more efficient system results. We have restricted our study to two different query inter-arrival distri- butions. We try to discover optimal sampling intervals when user query inter-arrival times follow a Poisson or hyperexponential distribution. We believe that these two distributions can be used to model several kinds of distribution of user queries. However, other distributions could also be re- searched into. As an extension of this work, a phase-type distribution could be investigated. A phase type distribution is perhaps the most general form of distribution which can be considered for event arrivals. However, it involves more variables compared to a Poisson or hyperexponential distribu- tion and hence computation becomes complex. We suggest that the method used to come up with a solution is more important than the mathematical distribution itself. The same methods will be applicable for any distribu- tion(including phase type) though the approach to solve the problem might be slightly different. An important additional task would be to look at how to implement 51 3.9. Conclusion Figure 3.8: Cost vs kq/kd for different hyper-exponential sampling policies. this system for an actual application. One of the first steps would be to quantify the energy cost for each sampling query. Though this is not a simple task to perform, an average value can be arrived at. The next step would be to study the pattern of user queries and decide on a representative distribution. This can be done by studying the past patterns and come up with a predictive model. The values of kd and kq are also critical and can be changed at different times of the day in order to accommodate different user query patterns and priorities. Once we analyse the above parameters and set values for each, our system is sure to provide energy savings. In this work, we have outlined the simplest energy balancing strategies to use while deploying a sensor network. It is the first step as a guidance to a system developer. Putting these ideas to practice is probably the next big challenge. The simplicity of this scheme makes us believe that the ideas can be easily implemented. The biggest contribution from this chapter is successfully identifying the cost associated with querying and the cost asso- ciated with delayed response and then finding a way to balance them. This is not only helpful for a sensor network but is also generic from the point of view that it can be applied to any system that incur similar overheads. 52 Chapter 4 Conclusion 4.1 Introduction We started our thesis with the aim of reducing energy consumption in sen- sor networks. We proposed to build high level optimization techniques that would produce a large improvement of lifetime for sensor nodes over present standards. We stress that since micro optimizations have not yielded the life-span expansion required to make sensor networks a more acceptable technology, it is time to pursue radical changes to save energy in sensor net- works. In this thesis, we show two such changes which are mainly overlooked by other micro-optimization techniques. The broad questions that we have answered in this thesis are: • In certain applications, sensors send redundant data to the sink. How do we reduce this? • All the data that is sensed and sent by the sensors may not be useful to the users. How do we reduce needless sensing? From our perspective, more than making an efficient sensing system, reduc- ing unnecessary energy loss can lead to higher gains in the network. We show using our techniques that multi-fold increase in the lifetime of nodes is possible. Another important aspect of our technique is that we try to keep the changes required in existing sensor network systems to the simplest pos- sible. Both our techniques are simple to understand and implement. This makes our scheme adoptable to any software or hardware platform. In fact, our second technique is generic enough to be applicable to applications other than sensor networks. 4.2 Summary of contribution In our first optimization technique, we show that deploying a dense network of nodes and efficiently duty-cycling it could lead to high energy gains. We propose algorithms that maintain connectivity and coverage over a network 53 4.3. Future work of nodes that are being duty-cycled. Such a network ensures energy savings as well as accuracy. In certain cases, where correlation exists between the data sent by neighbouring nodes, further savings can be achieved through compressed sensing. Compressed sensing is used to reconstruct the data from the nodes that were not active in a given cycle. We demonstrate using experiments that in a suitable compressed sensing environment, the sensed values from all the nodes in the network can be recovered with high accuracy even when only 20 percent of the nodes are active. Of course, compressed sensing does not require a dense deployment, it requires a sufficiently large one. When only 20 percent of the nodes are required to represent the entire sensing field, we should be able to increase lifetime of the nodes by a factor of 5 while maintaining accuracy, coverage and connectivity. Of course, these results are idealistic when we consider fair switching of nodes and a suitable compressed sensing environment. In our second optimization technique, we seek to optimize information processing in real time applications for sensor networks. Certain applications which are based on user queries face the choice between quality of service in terms of data freshness and energy consumption. We show that such a system is governed by the rate and distribution of user queries. If the incoming queries can be modelled as a mathematical distribution, an optimal sampling interval or a decision making process can be formulated. We show that having such processes in place would be efficient in decreasing the energy loss in the network due to frequent sampling. Since our optimization techniques work on independent sections of the sensor network, they can be combined to produce the ideal system. Our simple scheme allows most of the micro optimization techniques to be applied in conjunction. Our hope – such a system will elongate the lifespan of sensor networks to acceptable levels such that it can be used in applications which demand a longer lifespan from the nodes. 4.3 Future work We consider that this thesis could be the stepping stone to a new method of energy conservation in sensor network research. As we pointed out before, we have tried to eliminate redundancy and energy wastage while continuing to provide adequate quality of service for WSN applications. The next step for us would be to look at better data reconstruction techniques which can model data evolution. Such techniques would work in scenarios where the correlation between data points are time-varying. Further, for the usage- 54 4.3. Future work centric approach to lifetime improvements, one can seek to model more general usage patterns that could be described using phase-type distribu- tions. We also note that it would be of value to model energy harvesting techniques when planning duty cycles and query schedules. 55 Bibliography [1] Integrating adaptive components: An emerging challenge in performance-adaptive systems and a server farm case-study. In Pro- ceedings of the 28th IEEE International Real-Time Systems Sympo- sium, RTSS ’07, pages 227–238, Washington, DC, USA, 2007. IEEE Computer Society. [2] Waheed Bajwa, Jarvis Haupt, Akbar Sayeed, and Robert Nowak. Com- pressive wireless sensing. In IPSN ’06: Proceedings of the 5th interna- tional conference on Information processing in sensor networks, pages 134–142, New York, NY, USA, 2006. ACM. [3] S. Bandyopadhyay and E.J. Coyle. Spatio-temporal sampling rates and energy efficiency in wireless sensor networks. In INFOCOM 2004. Twenty-third AnnualJoint Conference of the IEEE Computer and Com- munications Societies, volume 3, pages 1728 –1739 vol.3, 2004. [4] E.J. Candes and M.B. Wakin. An introduction to compressive sampling. Signal Processing Magazine, IEEE, 25(2):21 –30, mar. 2008. [5] Qing Cao, Tarek Abdelzaher, Tian He, and John Stankovic. Towards optimal sleep scheduling in sensor networks for rare-event detection. In Proceedings of the 4th international symposium on Information process- ing in sensor networks, IPSN ’05, Piscataway, NJ, USA, 2005. IEEE Press. [6] Qing Cao, Tarek Abdelzaher, John Stankovic, and Tian He. The liteos operating system: Towards unix-like abstractions for wireless sensor networks. In IPSN ’08: Proceedings of the 7th international conference on Information processing in sensor networks, pages 233–244, Wash- ington, DC, USA, 2008. IEEE Computer Society. [7] Qing Cao, Debessay Fesehaye, Nam Pham, Yusuf Sarwar, and Tarek Abdelzaher. Virtual battery: An energy reserve abstraction for embed- ded sensor networks. Real-Time Systems Symposium, IEEE Interna- tional, 0:123–133, 2008. 56 Bibliography [8] Qing Cao, Dong Wang, Tarek Abdelzaher, Bodhi Priyantha, Jie Liu, and Feng Zhao. Energy-optimal batching periods for asynchronous mul- tistage data processing on sensor nodes: Foundations and an mplatform case study. Real-Time and Embedded Technology and Applications Sym- posium, IEEE, 0:101–110, 2010. [9] M. Cardei, M.T. Thai, Yingshu Li, and Weili Wu. Energy-efficient target coverage in wireless sensor networks. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 3, pages 1976 – 1984 vol. 3, 2005. [10] A. Cerpa and D. Estrin. Ascent: adaptive self-configuring sensor net- works topologies. Mobile Computing, IEEE Transactions on, 3(3):272 – 285, 2004. [11] Geoffrey Werner Challen, Jason Waterman, and Matt Welsh. Idea: integrated distributed energy awareness for wireless sensor networks. In MobiSys ’10: Proceedings of the 8th international conference on Mobile systems, applications, and services, pages 35–48, New York, NY, USA, 2010. ACM. [12] R. Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, 31 2008. [13] Benjie Chen, Kyle Jamieson, Hari Balakrishnan, and Robert Morris. Span: an energy-efficient coordination algorithm for topology main- tenance in ad hoc wireless networks. Wireless Networks, 8:481–494, September 2002. [14] Lei Chen, Christopher Olston, and Raghu Ramakrishnan. Parallel eval- uation of composite aggregate queries. In ICDE ’08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pages 218–227, Washington, DC, USA, 2008. IEEE Computer Society. [15] O. Chipara, Chenyang Lu, and G.-C. Roman. Real-time query schedul- ing for wireless sensor networks. In Real-Time Systems Symposium, 2007. RTSS 2007. 28th IEEE International, pages 389 –399, 2007. [16] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289 –1306, apr. 2006. [17] D.P.Bertsekas. Dynamic Programming and Optimal Control,2 ed, vol.1. Athena Scientific, 1996. 57 Bibliography [18] M.R. Duarte, M.B. Wakin, D. Baron, and R.G. Baraniuk. Universal distributed sensing via random projections. pages 177 –185, 2006. [19] M. Elad, J.-L. Starck, P. Querre, and D.L. Donoho. Simultaneous car- toon and texture image inpainting using morphological component anal- ysis (mca). Applied and Computational Harmonic Analysis, 19(3):340 – 358, 2005. Computational Harmonic Analysis - Part 1. [20] Jeremy Elson and Deborah Estrin. Time synchronization for wireless sensor networks. Parallel and Distributed Processing Symposium, In- ternational, 3:30186b, 2001. [21] Sathish Gopalakrishnan. Optimal schedules for sensor network queries. In Proceedings of the 2010 Real-Time Systems Symposium, pages 140– 149, Los Almos, CA, USA, 2010. IEEE Computer Society. [22] J. Haupt, W.U. Bajwa, M. Rabbat, and R. Nowak. Compressed sensing for networked data. Signal Processing Magazine, IEEE, 25(2):92 –101, mar. 2008. [23] Tian He, Chengdu Huang, Brian M. Blum, John A. Stankovic, and Tarek Abdelzaher. Range-free localization schemes for large scale sensor networks. In Proceedings of the 9th annual international conference on Mobile computing and networking, MobiCom ’03, pages 81–95, New York, NY, USA, 2003. ACM. [24] J.L. Hill and D.E. Culler. Mica: a wireless platform for deeply embed- ded networks. Micro, IEEE, 22(6):12 – 24, 2002. [25] Tim Tau Hsieh. Using sensor networks for highway and traffic applica- tions. Potentials, IEEE, 23(2):13 – 16, 2004. [26] Bret Hull, Vladimir Bychkovsky, Yang Zhang, Kevin Chen, Michel Goraczko, Allen Miu, Eugene Shih, Hari Balakrishnan, and Samuel Madden. Cartel: a distributed mobile sensor computing system. In Proceedings of the 4th international conference on Embedded networked sensor systems, SenSys ’06, pages 125–138, New York, NY, USA, 2006. ACM. [27] INTEL. Intel lab data. [28] Apoorva Jindal and Konstantinos Psounis. Modeling spatially corre- lated data in sensor networks. ACM Trans. Sen. Netw., 2(4):466–499, 2006. 58 Bibliography [29] Abtin Keshavarzian, Huang Lee, and Lakshmi Venkatraman. Wakeup scheduling in wireless sensor networks. In Proceedings of the 7th ACM international symposium on Mobile ad hoc networking and computing, MobiHoc ’06, pages 322–333, New York, NY, USA, 2006. ACM. [30] Kevin Klues, Vlado Handziski, Chenyang Lu, Adam Wolisz, David Culler, David Gay, and Philip Levis. Integrating concurrency control and energy management in device drivers. SIGOPS Operatin Systems Review, 41:251–264, October 2007. [31] Jeongyeup Paek Nupur Kothari Sumit Rangwala John Caffrey Ramesh Govindan Erik Johnson Krishna Chintalapudi, Tat Fu and Sami Masri. Monitoring civil structures with a wireless sensor network. IEEE In- ternet Computing, 10:26–34, 2006. [32] Chandra Abhishek Srivastava Jaideep Liu, Haiyang. Pss: Predictive energy-efficient sensing scheduling in wireless sensor networks. 2006. [33] Liqian Luo, Qing Cao, Chengdu Huang, Tarek Abdelzaher, John A. Stankovic, and Michael Ward. Enviromic: Towards cooperative storage and retrieval in audio sensor networks. In ICDCS ’07: Proceedings of the 27th International Conference on Distributed Computing Systems, page 34, Washington, DC, USA, 2007. IEEE Computer Society. [34] Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. Tag: a tiny aggregation service for ad-hoc sensor networks. SIGOPS Operating Syststems Review, 36(SI):131–146, 2002. [35] Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The design of an acquisitional query processor for sensor net- works. In SIGMOD ’03: Proceedings of the 2003 ACM SIGMOD inter- national conference on Management of data, pages 491–502, New York, NY, USA, 2003. ACM. [36] Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. Tinydb: an acquisitional query processing system for sen- sor networks. ACM Transactions on Database Systems, 30(1):122–173, 2005. [37] R. Masiero, G. Quer, M. Rossi, and M. Zorzi. A bayesian analysis of compressive sensing data recovery in wireless sensor networks. pages 1 –6, oct. 2009. 59 Bibliography [38] John McCulloch, Paul McCarthy, Siddeswara Mayura Guru, Wei Peng, Daniel Hugo, and Andrew Terhorst. Wireless sensor network deploy- ment for water use efficiency in irrigation. In REALWSN ’08: Pro- ceedings of the workshop on Real-world wireless sensor networks, pages 46–50, New York, NY, USA, 2008. ACM. [39] Eli Upfa Michael Mitzenmacher. Probability and Computing. Cam- bridge University Press, Cambridge, UK, 2005. [40] Jianping Pan, Y. Thomas Hou, Lin Cai, Yi Shi, and Sherman X. Shen. Topology control for wireless sensor networks. In Proceedings of the 9th annual international conference on Mobile computing and networking, MobiCom ’03, pages 286–299, New York, NY, USA, 2003. ACM. [41] G. Platt, M. Blyde, S. Curtin, and J. Ward. Distributed wireless sensor networks and industrial control systems - a new partnership. In Pro- ceedings of the 2nd IEEE workshop on Embedded Networked Sensors, pages 157–158, Washington, DC, USA, 2005. IEEE Computer Society. [42] Joseph Polastre, Jason Hill, and David Culler. Versatile low power media access for wireless sensor networks. In Proceedings of the 2nd in- ternational conference on Embedded networked sensor systems, SenSys ’04, pages 95–107, New York, NY, USA, 2004. ACM. [43] G. Quer, R. Masiero, D. Munaretto, M. Rossi, J. Widmer, and M. Zorzi. On the interplay between routing and signal representation for compres- sive sensing in wireless sensor networks. pages 206 –215, feb. 2009. [44] Michael Rabbat, Jarvis Haupt, Aarti Singh, and Robert Nowak. Decen- tralized compression and predistribution via randomized gossiping. In IPSN ’06: Proceedings of the 5th international conference on Informa- tion processing in sensor networks, pages 51–59, New York, NY, USA, 2006. ACM. [45] Krithi Ramamritham, Sang H. Son, and Lisa Cingiser DiPippo. Real- time databases and data services. Real-Time Systems, 28:179–215, 2004. 10.1023/B:TIME.0000045317.37980.a5. [46] A. Riska, V. Diev, and E. Smirni. Efficient fitting of long-tailed data sets into hyperexponential distributions. In Global Telecommunications Conference, 2002. GLOBECOM ’02. IEEE, volume 3, pages 2513 – 2517 vol.3, 2002. 60 Bibliography [47] V. Rodoplu and T.H. Meng. Bits-per-joule capacity of energy-limited wireless networks. Wireless Communications, IEEE Transactions on, 6(3):857 –865, 2007. [48] S-POL. S-pol radar data trace. [49] Chatschik Bisdikian Lance M. Kaplan Sadaf Zahedi, Mani B. Srivas- tava. Quality tradeoffs in object tracking with duty-cycled sensor net- works. In Proceedings of the 2010 Real-Time Systems Symposium, pages 160–169, Los Almos, CA, USA, 2010. IEEE Computer Society. [50] C. Schurgers and M.B. Srivastava. Energy efficient routing in wireless sensor networks. In Military Communications Conference, 2001. MIL- COM 2001. Communications for Network-Centric Operations: Creat- ing the Information Force. IEEE, 2001. [51] S. Slijepcevic and M. Potkonjak. Power efficient organization of wireless sensor networks. In Communications, 2001. ICC 2001. IEEE Interna- tional Conference on, 2001. [52] K. Sohrabi, J. Gao, V. Ailawadhi, and G.J. Pottie. Protocols for self- organization of a wireless sensor network. Personal Communications, IEEE, 7(5):16 –27, October 2000. [53] Jean-Luc Starck, E.J. Candes, and D.L. Donoho. The curvelet trans- form for image denoising. Image Processing, IEEE Transactions on, 11(6):670 –684, jun. 2002. [54] Srikanth Sundaresan, Israel Koren, Zahava Koren, and C. Mani Kr- ishna. Event driven adaptive dut cycling in sensor networks. Interna- tional Journal on Sensor Networks, 6:89–100, October 2009. [55] Di Tian and Nicolas D. Georganas. A coverage-preserving node schedul- ing scheme for large wireless sensor networks. In Proceedings of the 1st ACM international workshop on Wireless sensor networks and applica- tions, WSNA ’02, pages 32–41, New York, NY, USA, 2002. ACM. [56] Megan Wachs, Jung Il Choi, Jung Woo Lee, Kannan Srinivasan, Zhe Chen, Mayank Jain, and Philip Levis. Visibility: a new metric for pro- tocol design. In Proceedings of the 5th international conference on Em- bedded networked sensor systems, SenSys ’07, pages 73–86, New York, NY, USA, 2007. ACM. 61 Bibliography [57] Jason Waterman, Geoffrey Werner Challen, and Matt Welsh. Peloton: coordinated resource management for sensor networks. In HotOS’09: Proceedings of the 12th conference on Hot topics in operating systems, pages 9–9, Berkeley, CA, USA, 2009. USENIX Association. [58] Geoff Werner-Allen, Konrad Lorincz, Jeff Johnson, Jonathan Lees, and Matt Welsh. Fidelity and yield in a volcano monitoring sensor network. In OSDI ’06: Proceedings of the 7th symposium on Operating systems design and implementation, pages 381–396, Berkeley, CA, USA, 2006. USENIX Association. [59] R. Willett, A. Martin, and R. Nowak. Backcasting: adaptive sampling for sensor networks. In Information Processing in Sensor Networks, 2004. IPSN 2004. Third International Symposium on, pages 124 – 133, 2004. [60] Ming Xiong, John A. Stankovic, Krithi Ramamritham, and Don Towsley. Scheduling access to temporal data in real-time databases, 1997. [61] Ya Xu, John Heidemann, and Deborah Estrin. Geography-informed en- ergy conservation for ad hoc routing. In Proceedings of the 7th annual international conference on Mobile computing and networking, Mobi- Com ’01, pages 70–84, New York, NY, USA, 2001. ACM. 62
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- On energy-efficient data gathering in sensor networks
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
On energy-efficient data gathering in sensor networks Dhar, Debojit 2010
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | On energy-efficient data gathering in sensor networks |
Creator |
Dhar, Debojit |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Wireless sensor networks (WSNs) are becoming increasingly important as they provide an ideal platform for monitoring and controlling physical environments. Starting from small match-box sized nodes to tiny coin-like sensors a WSN promises to be the most ubiquitous information gathering system produced. Being tiny enables ubiquitous and pervasive sensing. However, this form factor comes at the cost of severe resource constraints for the sensor nodes. The nodes can accommodate only a certain amount of energy in the form of batteries and can store and process only a small amount of data due to its crippled size. Due to this reason, sensor networks cannot host more sophisticated applications and the mean time to failure, due to nodes running out of energy, is also low. These are probably the main reasons why sensor networks have not reached their expected potential. This thesis is an effort to alleviate the energy problem of sensor nodes. We attempt to solve the problem using different data and user centric models which can lead to a multi-fold increase of life for the sensors. Most of the research till date has aimed at micro-adjustments in the sensor hardware and software in order to improve performance. These techniques, though beneficial, increase complexity, and are sometimes difficult to implement. The thesis demonstrates simple techniques that can significantly improve energy-savings over and above the micro-adjustment techniques. The thesis takes a radical point of view and looks at higher level primitives that can be modified for certain applications. This thesis provides two energy reduction techniques. The first technique involves aggressive duty-cycling of sensors while maintaining connectivity and coverage followed by reconstruction at the base station. Significant gains can be achieved when the sensed environment has correlation between sensor readings. The second technique involves sampling interval scheduling depending on the utilization of the sampled data based on user queries. While the first method ensures correct reproduction of the sensed environment, while reducing the burden on individual sensors, the second method provides an optimal sampling frequency that regulates energy consumption depending on user demands. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2011-02-01 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0071598 |
URI | http://hdl.handle.net/2429/31027 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2011-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2011_spring_dhar_debojit.pdf [ 1.53MB ]
- Metadata
- JSON: 24-1.0071598.json
- JSON-LD: 24-1.0071598-ld.json
- RDF/XML (Pretty): 24-1.0071598-rdf.xml
- RDF/JSON: 24-1.0071598-rdf.json
- Turtle: 24-1.0071598-turtle.txt
- N-Triples: 24-1.0071598-rdf-ntriples.txt
- Original Record: 24-1.0071598-source.json
- Full Text
- 24-1.0071598-fulltext.txt
- Citation
- 24-1.0071598.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0071598/manifest