Distributed Dynamic Coalition Formation for Bearings-only Localization in Wireless Sensor Networks by Omid Namvar Gharehshiran B.A.Sc., Sharif University of Technology, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) January 2010 c© Omid Namvar Gharehshiran 2010 Abstract Lifetime maximization is a key challenge in the design of sensor-network-based track- ing applications. In this dissertation, formation of optimal coalitions of nodes is investigated for data acquisition in bearings-only target localization such that the average sleep times allocated to the nodes are maximized. Targets are assumed to be localized with a pre-defined accuracy where the determinant of the Bayesian Fisher information matrix (B-FIM) is used as the metric for estimation accuracy. Coop- erative game theory is utilized as a tool to devise a distributed dynamic coalition formation algorithm in which nodes autonomously decide which coalition to join, while maximizing their feasible sleep times. Nodes in the sleep mode do not record any measurements; hence, save power in both sensing and transmitting the sensed data. The proposed scheme reduces the number of sensor measurements by captur- ing the spatio-temporal correlation of the information provided by the sensors from one side and bounding the localization accuracy to the pre-defined value from the other side. It is proved that if each node operates according to this algorithm, the average sleep time for the entire network converges to its maximum feasible value. In numerical examples, we illustrate the inherent trade-off between the localization accuracy and the average sleep time allocated to the nodes and demonstrate the superior performance of the proposed algorithm via Monte Carlo simulations. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Energy Concerns in Wireless Sensor Networks . . . . . . . . . . . . 1 1.2 Contributions and Results . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Why Cooperative Games? . . . . . . . . . . . . . . . . . . . 4 1.2.2 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.3 Related Publications . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Background and Related Works . . . . . . . . . . . . . . . . . . . . 7 1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2 Power Conserved Target Localization . . . . . . . . . . . . . . . . . 12 2.1 Network Average Lifetime Maximization Problem . . . . . . . . . . 13 2.2 Non-linear Parameter Estimation in Noise . . . . . . . . . . . . . . 16 2.3 Target Localization and Stochastic Observability . . . . . . . . . . . 20 iii Table of Contents 3 The Coalition Formation Game . . . . . . . . . . . . . . . . . . . . . 26 3.1 Cooperative Game Theory . . . . . . . . . . . . . . . . . . . . . . . 26 3.1.1 Coalition Formation Games . . . . . . . . . . . . . . . . . . 27 3.1.2 Solution Concept . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 Sensors Coalition Formation Game for Target Localization . . . . . 35 3.3 Characteristic Function for the Coalition Formation Game . . . . . 36 3.4 Formulation of Constraints on the Characteristic Function . . . . . 39 4 Dynamic Coalition Formation for Target Localization . . . . . . . 41 4.1 Dynamic Coalition Formation Solution . . . . . . . . . . . . . . . . 42 4.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.1.2 Myopic Best-reply Correspondence . . . . . . . . . . . . . . 43 4.2 Distributed Dynamic Coalition Formation Algorithm . . . . . . . . 51 4.3 Convergence of the Distributed Dynamic Coalition Formation Algo- rithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.4 Network Architecture and Implementation Issues . . . . . . . . . . . 59 4.5 Multiple Target Localization Algorithm in Large WSNs . . . . . . . 61 5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.1 Example: Target Localization . . . . . . . . . . . . . . . . . . . . . 66 5.2 Example: Tracking Slow Moving Targets . . . . . . . . . . . . . . . 75 6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . 78 6.1 Summary of Work Accomplished . . . . . . . . . . . . . . . . . . . . 78 6.2 Directions for Future Work . . . . . . . . . . . . . . . . . . . . . . . 79 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 iv Table of Contents Appendices A Derivation of the Characteristic Function . . . . . . . . . . . . . . 89 B Resource Allocation in Cognitive Radio Networks . . . . . . . . . 93 B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 B.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 96 B.3 Distributed Dynamic Coalition Formation . . . . . . . . . . . . . . . 101 B.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 v List of Tables 5.1 Expected Time Before Absorption: Exhaustive Search Method vs. Randomized Search Method. . . . . . . . . . . . . . . . . . . . . . . 69 B.3 Channel Gains for Example 1 . . . . . . . . . . . . . . . . . . . . . . 108 B.4 Channel Gains for Example 2 . . . . . . . . . . . . . . . . . . . . . . 109 vi List of Figures 1.1 Taxonomy of energy conservation schemes in WSNs . . . . . . . . . 8 2.1 Geometry of bearings-only localization in two-dimensional space. . . 21 3.1 Cooperative games in partition form vs. characteristic form. . . . . . 29 3.2 Games in graph form . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 5.1 Effects of the network configuration and prior density of the target on the coalition structure in the core. . . . . . . . . . . . . . . . . . . 68 5.2 Localization of a single target: optimal coalition structure and sleep times in the core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.3 Multiple target localization: optimal coalition structure and sleep times in the core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.4 Average sleep time allocated to the sensors versus number of sensors in the network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.5 Average sleep time allocated to the nodes during a localization task versus required localization accuracy Õk. . . . . . . . . . . . . . . . . 74 5.6 Network configuration for tracking a slow moving target . . . . . . . 76 5.7 Tracking a slow moving target: optimal coalition structure and the sleep times allocated to the sensors in the core . . . . . . . . . . . . 77 B.1 A typical primary and secondary network architecture in CRNs. . . 95 vii List of Figures B.2 Example 1: Coalition structure and profits allocated to the CRs in the core. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 B.3 Example 2: Coalition structure and profits allocated to the CRs in the core. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 B.4 Average individual CR payoff versus number of CRs in the CRN. . . 110 viii Acknowledgments I would like to take this opportunity to express my sincere gratitude to Prof. Vikram Krishnamurthy who encouraged me at every stage of my research. This work could not have been completed without your constant guidance and support. You managed to provide me the freedom to think and explore and the enthusiasm to tackle new problems. Indeed, it is my great fortune to have worked under supervision of a professor as knowledgeable, experienced and tireless as you. My accomplishments to date would not have been achievable without the love and support of my family. All my life, they have always been there to take care, to encourage and to support. Their perpetually reminding me of what can be accomplished if you set your mind to it is what drives my success. Last, but not least, I would like to extend my sincere thanks to all past and present members of the Statistical Signal Processing Laboratory and my friends. I am thankful for their sharing and caring, for the laughters and the friendship. Omid Namvar Gharehshiran ix Dedication To My Parents x Chapter 1 Introduction 1.1 Energy Concerns in Wireless Sensor Networks In recent years, wireless sensor networks (WSNs) have gained increasing attention in a wide range of applications [1]. A WSN is a distributed embedded system comprising a large number of low-cost, low-power and energy-constrained sensor nodes. These sensors communicate over a wireless channel, performing distributed sensing and collaborative data processing tasks for various vital military and civilian applications. Typically, a power source supplies the energy required by the sensors to perform the above tasks. This power source is often made up a battery with a limited energy budget. Even though, in some applications, it might be possible to scavenge energy from the environment (e.g., by using solar cells), external power sources exhibit a non-continuous characteristic which mandates the need for having power supplies such as batteries. In addition, since sensors may be deployed in a hostile or unreachable environment, it could be inconvenient or impractical to replace or recharge the batteries [2]. Hence, the crucial question that arises is: “how to reduce energy consumption in WSNs so that the lifetime of the network is prolonged?” In general, energy expenditure in WSNs can be divided into three main com- ponents: (i) data acquisition (sensing), (ii) data processing, and (iii) data trans- mission. Experimental measurements have shown that data transmission consumes significantly more energy that data processing [3]. However, in many real applica- tions (e.g., [4–7]) the power needed by the sensors is comparable, or even greater 1 1.2. Contributions and Results than, the power consumption by the radio. Furthermore, in some applications (e.g., target detection, target localization, etc.), due to the dense deployment of sensor nodes, sensor observations are highly correlated in the space domain. The nature of the physical phenomenon also establishes the temporal correlation between the consecutive measurements of the sensors in tracking applications [8]. This spatio- temporal correlation results in unneeded sensed data (redundant information) which is unnecessary to be transmitted to the sink. Hence, the benefits from developing efficient data sensing protocols which captures this spatio-temporal correlations is two-fold: (i) by taking less measurements, it reduces power consumption when the sensor itself is power hungry, and (ii) it reduces the unneeded communications even if the cost of sensing is negligible [2]. 1.2 Contributions and Results In this dissertation, the problem of power conservation for data acquisition is studied in a WSN that is deployed to localize multiple targets based on noisy bearing (an- gle) measurements at individual sensors. A novel distributed coalition formation and sleep allocation scheme is proposed to reduce the number of sensor measurements by keeping the localization accuracy within an acceptable level. Since estimating the position of a target in two dimensions needs at least two angle measurements (to per- form triangularization), it is natural for the sensors to form cooperative coalitions. There exists an inherent trade off between battery power and sensing accuracy such that if too few sensors form a coalition, the variance of their collaborative estimate is high. On the other hand, if too many sensors form a coalition, excessive energy is consumed. As an example, when two sensors lie on almost a straight line with the target, they record almost the same bearing information about the target. This redundant data can be avoided by putting one of the sensors in the sleep mode. Sen- sor nodes in the sleep mode do not record observations, and as the result, conserve 2 1.2. Contributions and Results energy both in data acquisition and transmitting the sensed data. Given that target localization requires sensors cooperation, the main idea is to develop a methodology for sensors to dynamically form optimal collaborative coalitions in a distributive manner. The abstract formulation we consider is a non- superadditive cooperative game. The term non-superadditive means that the grand coalition (the coalition comprising all sensors) is not optimal. This is mainly due to the trade off between battery life and the variance of estimates mentioned above. The nodes in each coalition share measurements to localize a particular target, and as a result, are rewarded with sleep times. Two questions that arise are: (i) What are the optimal coalition structures for localizing multiple targets with a pre-defined accuracy? (ii) How can nodes choose the optimal coalitions over time to ensure that the average sleep times allocated to the nodes are maximized (and, hence, battery life is prolonged)? The above questions can be addressed nicely within the framework of coalition formation in a cooperative game. Assuming that the true position of the target is unknown, a lower bound can be derived for the covariance of the estimated target position using the Cramér-Rao lower bound (CRLB), the inverse of which is known as the Bayesian Fisher information matrix (B-FIM). As is commonly used in the tracking literature (e.g., [9, 10]), determinant of the B-FIM is utilized as the met- ric of estimation accuracy. Throughout, this measure is referred to as stochastic observability. Since stochastic observability depends on both the angle of measurements and distances of sensors to the target, it is clear that the optimal coalition does not necessarily comprise the nearest sensors to the target. The optimal coalition struc- ture would typically have some sort of diversity amongst angle measurements of the sensor nodes. Therefore, determining the optimal coalition structure for track- ing multiple targets is a challenging task. Moreover, devising an algorithm that each sensor deploys so that the entire network eventually converges to the optimal 3 1.2. Contributions and Results coalition structure is of significant interest. 1.2.1 Why Cooperative Games? Noncooperative game-theoretic methodologies have been developed for sensor acti- vation in [11, 12]. These approaches are fundamentally different to the cooperative game theoretic framework considered in this work. Cooperative game theory pro- vides an expressive and flexible framework for modeling collaboration in multi-agent systems. This is appropriate for bearings-only localization in which localization is essentially achieved by triangularization as explained above. Non-superadditive coalition formation games, as a main branch of cooperative games, study the com- plex interactions among agents when the equilibrium state comprises several disjoint coalitions. Hence, it conforms to the framework in multi-target tracking where the optimal network structure comprises several coalitions of sensors, each localizing a particular target. Considering the spatio-temporal correlation of the information provided by the sensors, a cooperative game analysis allows us to optimize each of these coalitions in terms of power consumption. 1.2.2 Main Contributions The main contributions of the present work can be summarized as follows: • Formulation of the network average lifetime maximization problem as a non- superadditive coalition formation game: The power conservation problem is formulated for data acquisition in two-dimensional bearings-only target local- ization as a maximization problem for the average sleep time allocated to the sensors. Each node shares its measurements with other nodes in the coalition and, as the payoff, obtains a share from the total sleep time achievable by the coalition under constraints on stochastic observability. In addition, a fairness criteria is defined for the sleep times allocated to the individual sensors. This problem is then formulated as a non-superadditive coalition formation game 4 1.2. Contributions and Results in which the characteristic function gives the maximum total sleep time that can be allocated to each coalition of nodes such that the pre-defined accuracy is satisfied for the corresponding target. The aforementioned fairness criteria guarantees that this total sleep time is divided among the sensor nodes in a fair fashion. We propose to use the modified core as the solution concept for this cooperative game. • The distributed dynamic coalition formation algorithm: In non-super-additive TU games, finding the optimal coalition structure such that the sum of the total payoffs gained by each coalition is maximized is an NP-hard problem. The reason for this complexity is that one needs to search among all possible coalition structures which is given by the N th Bell number [13] in an N-person game. This motivates using randomized algorithms to solve the coalition for- mation problem in non-superadditive games. In this work, a distributed dy- namic coalition formation algorithm (Algorithm 4.1) is proposed in which each sensor (as the player of the game) greedily maximizes its expected sleep time (payoff) for the next period by choosing the optimal coalition whenever it gets the opportunity to revise its strategy. In addition, sensors rarely choose suboptimal coalitions whenever they are aware of a potential increase in their allocated sleep times in future. It will be proved in Sec. 4.3 that if all the sensors follow the proposed algorithm, the entire network eventually reaches the maximum total feasible sleep time constrained on the required localization accuracy which corresponds to the core of the defined non-superadditive coali- tion formation game. Our work generalizes recent results in dynamic coalition formation [14] in the sense that convergence to the core is proved when the full set of blocked nodes is not available at each iteration. A randomized search method (Algorithm 4.2) for the blocked sensors (players) is also combined with the above algorithm to achieve a compromise between the computational cost at each iteration and the convergence rate of the algorithm. 5 1.2. Contributions and Results The proposed algorithm can be applied to a large class of problems in which “players” cooperate to achieve a common goal and the optimal structure com- prises several disjoint coalitions of players (See Appendix B). • Multiple target localization and tracking: Multiple target localization is achieved by employing the proposed distributed dynamic coalition formation algorithm in a sequential-Bayesian framework (Algorithm 4.3). In general, any Bayesian estimator can be utilized; However, in this dissertation, the sequential Markov chain Monte Carlo (particle filter) is selected due to its superior performance in bearings-only localization and tracking [15]. In addition, a pre-processing al- gorithm is proposed which addresses the redundant processing and data trans- missions in WSNs monitoring a large geographical area. It will be proved that the proposed algorithm guarantees reaching an absorbing state in the Markov chain underlying the proposed distributed dynamic coalition formation algo- rithm. This work also incorporates the implementation issues and required network structure for a WSN operating to localize multiple targets using the proposed distributed scheme and based on optimal bearing-measurements. • Numerical examples: Numerical examples illustrate the behavior of the pro- posed algorithm in different scenarios through which the structural results, exploited to devise the pre-processing algorithm, are studied. These scenarios include both target localization and tracking. The inherent trade-off between the energy consumption and the localization accuracy is also demonstrated by comparing the average sleep times required by the nodes to localize targets through Monte Carlo simulations. In addition, the superior performance of the proposed algorithm is demonstrated, in terms of the average sleep time allocated to the nodes, over the heuristic range-based measurement allocation method using Monte Carlo simulations. 6 1.3. Background and Related Works 1.2.3 Related Publications The main body of this dissertation is summarized in the following papers: • O. Namvar Gharehshiran and V. Krishnamurthy, Dynamic Coalition Forma- tion for Efficient Sleep Time Allocation in Wireless Sensor Networks Using Cooperative Game Theory, in Proc. of 12th International Conference on In- formation Fusion (ICIF), Seattle, WA, July 2009. • O. Namvar Gharehshiran and V. Krishnamurthy, Dynamic Coalition Forma- tion for Sleep Time Allocation in Bearings-only Target Localization Using Wireless Sensor Networks, Submitted to IEEE Transactions on Signal Pro- cessing, revised Nov. 2009. • O. Namvar Gharehshiran, and V. Krishnamurthy, On Prolonging Life-time in Wireless Sensor Networks With Application in Localization: A Coalitional- Game Theoretic Approach, Accepted to the 35th IEEE International Con- ference on Acoustics, Speech and Signal Processing (ICCASP), Dallas, TX, March 2010. The results appearing in Appendix B are summarized in the following paper: • O. Namvar Gharehshiran, A. Attar, and V. Krishnamurthy, Dynamic Coali- tion Formation for Resource Allocation in Cognitive Radio Networks, Accepted to the IEEE International Conference on Communications (ICC), Cape Town, South Africa, May 2010. 1.3 Background and Related Works In the past decade, a great body of research has been developed focusing on how to reduce energy consumption of the sensor nodes in WSNs such that the network lifetime can be extended to reasonable times. At a very general level, [2] classifies the 7 1.3. Background and Related Works Energy Conservation Schemes Duty Cycling Data-driven Mobility-based Mobile-sinkMobile-relayEnergy-efficient Data Acquisition Data ReductionTopology Control Sleep\Wakeup Protocols Adaptive Sampling Figure 1.1: Taxonomy of energy conservation schemes in WSNs. main enabling techniques for energy conservation in WSNs as: duty cycling, mobility- based, and data-driven. These approaches have to be exploited simultaneously to achieve the maximal energy conservation. Fig. 1.1 shows the taxonomy of the power conservation schemes in WSNs. Duty cycling is primarily focused on the networking subsystem. Approaches based on duty cycling propose protocols on how to put the radio transceiver of the sensors in the sleep mode when there is no more data to send/receive, and are commonly accompanied by a sleep/wakeup scheduling algorithm to coordinate the nodes’ sleep/wakeup times. Indeed, duty cycling techniques exploit the network redundancy to prolong the network lifetime by adaptively selecting only a minimal subset of sensors to be in active mode such that the connectivity is maintained. Examples of such techniques include: topology control protocols [16], sleep/wake up protocols [17–19], and MAC protocols with low duty cycle [20, 21]. In mobility based techniques, particular nodes which are less energy constrained (e.g., nodes whose battery can be recharged or replaced) are assigned to collect data from the sensors. Hence, ordinary nodes save energy due to the reduction in path length, contention and forwarding overhead. In addition, the nodes which are located in more loaded paths or closer to the sink do not suffer premature power 8 1.3. Background and Related Works depletion. These schemes can be classified as: mobile-sinks [22, 23] and mobile- relays [24, 25]. Mobility plays a critical role in maintaining connectivity in an initially connected network which turn into a set of disconnected subnetworks due to the energy depletion. Data driven techniques, in contrast to the duty cycling and mobility approaches which focus on reducing power consumption in data transmission, are designed to optimize the energy expenditure in both data transmission and sensing subsystems by keeping the sensing accuracy within an acceptable level. In general, these ap- proaches can be classified as: (i) data reduction techniques, and (ii) energy-efficient data acquisition. As examples of data reduction schemes, we refer to: data aggre- gation [26], data compression [27], and data prediction [28, 29]. Energy-efficient data acquisition techniques concentrates on power conservation by decreasing the number of data samples. Adaptive sampling schemes reduces the number of samples by by exploiting spatio-temporal correlations between the information provided by the sensors. The algorithms in this class are mostly application-tailored. As in- stances, we refer to [30] and [31] which consider the adaptive sampling problem in a flood warning system and environmental monitoring scenario, respectively. In this dissertation, we focus on the a distributed game-theoretic adaptive sampling scheme in bearings-only localization which, to the best of our knowledge, has not been investigated in any previous study. In literature, there exist only a few works investigating coalition formation as a dynamical process. Among these works, [32–34] can be regarded as the most relevant ones to the approach presented in this dissertation. In [32], a dynamic social learning model is considered where each player observes a random sample of demand vectors and adjusts his demand based on a best-reply rule in each period. In addition, [32] introduces “mistakes” on the part of players (analogous to the “experimentation” in [14]) and proves that the set of stochastically stable states which can be reached is a subset of the set of the cores of the game. 9 1.3. Background and Related Works The approach proposed here departs from [32] in several aspects. First, the re- sults presented in [32] are restricted to supperadditive game. However, the work pre- sented in this dissertation can be applied to both supperadditive and non-superadditive games. Second, [32] completely abstracts from coalition formation and is focused on allocations. However, the approach considered here explicitly investigates the coali- tion formation process. Allowing players to choose their coalitions (to join) makes this work invaluable for applications where distributed decision making is of interest. Finally, the best-reply rule considered in [32] differs from the one considered here in the sense that the players try to maximize their expected payoff, conditional on the probability that their demands are feasible in a particular coalition. In [33], players are considered to be farsighted and the transition probabilities between different coalition structures follow a Markov chain. Given this Markov chain, players try to maximize the present value of their future expected payoffs. The transition probability from one state to another is positive only if there exists a coalition in which all players achieve larger expected future payoffs. However, in our model, players are myopic which corresponds to the special case of [33] with a discount factor of 0. The main difference between these two works is that when players are farsighted, the core will be reached only if it is the unique limit state of the dynamic process. However, with bounded rationality, a core will be reached even though the process contains several absorbing states which are not in the core. In [34], a generic approach is proposed for coalition formation through simple merge and split operations. These operations take place when they result in an improvement with respect to some given comparison relation, e.g., utilitarian order. Furthermore, conditions are introduced under which every iteration of the merge and split operations yields a unique outcome, which led to a natural notion of a stable partition. This approach, unlike the approach presented in [32], can be utilized in both supperadditive and non-superadditive games. However, it departs from the work presented here in the sense that it is focused on the coalition structure 10 1.4. Thesis Organization generation process and does not investigate the bargaining process. The algorithm devised in this work is based on the approach presented in [14] and focusses on both the allocations and coalition formation for both supperadditive and non-superadditive games. Our work differs from [14] in the sense that we assume not having full information about the set of blocked players at each period. It will be proved that if each sensor (as the player of the game) follows the proposed algorithm, the entire network reaches a point in the set of core states of the game with probability one, in which the maximum average sleep time is achieved and the total sleep time in each coalition is divided among the sensors in a fair fashion. 1.4 Thesis Organization The rest of this dissertation is organized as follows. In Chapter 2, the power conser- vation problem is formulated for data acquisition in bearings-only target localization in two-dimensional space and the stochastic observability is introduced as the lo- calization accuracy metric. This problem is interpreted as a non-supperadditive cooperative game in Chapter 3, where the related concepts and definitions from cooperative game theory are also provided. In Chapter 4, a distributed dynamic coalition formation algorithm is introduced for multiple target localization which converges to the solution to the power conservation problem. In addition, con- vergence proofs are provided and implementation issues are discussed. Numerical examples are provided in Chapter 5 to illustrate the behavior and performance of the proposed solution. Finally, Chapter 6 concludes the thesis and enumerates some possible future extensions. Derivation of the characteristic function is presented in Appendix A and Appendix B demonstrates how the proposed distributed dynamic coalition formation algorithm can be utilized to solve the load-balanced resource allocation in cognitive radio networks (CRN). 11 Chapter 2 Power Conserved Target Localization In this chapter, the power conservation problem is formulated for data acquisition in bearings-only target localization in two-dimensional space and in a scenario where each target is expected to be localized with a pre-defined accuracy. Target local- ization is formulated as a non-linear parameter estimation in a sequential Bayesian framework where, in each period, the prior information about the position of the target is updated using sensors’ bearing measurements (i.e. the posterior distribu- tion) and is regarded as the prior for the next period. To measure the accuracy in localizing targets, stochastic observability is introduced as the estimation accuracy metric. Notation and Terminology : Let N = {1, 2, · · · ,N} denote the set of sensors. Any subset S ⊆ N is called a coalition and can be identified with a vector S = (s1, · · · , sN) ∈ {0, 1}N, where si = 1 if i ∈ S0 if i /∈ S ∀i ∈ N . (2.1) Those subsets that only contain one node are called singleton coalitions, i.e. {i}. In addition, the coalition containing all nodes (i.e. N ) is called the grand coalition. The set of all coalitions which forms a partition on N is denoted by P and is called the coalition structure. Further, the set of all possible coalition structures (i.e, the set 12 2.1. Network Average Lifetime Maximization Problem of all partitions on N ) is denoted by C, the cardinality of which is given by the Nth Bell number [13]. Finally, K = {1, · · · ,K} denotes the set of targets detected in the network. 2.1 Network Average Lifetime Maximization Problem Consider a scenario in which N sensors are to form coalitions to localize K targets in two-dimensional space. Each target k is required to be localized with a pre-defined accuracy denoted by Õk. Coalitions Sk, k = 1, · · · ,K will be formed, each localizing a particular target k, and the sensors which are not assigned the localization task will form singleton coalitions. All sensors in a particular coalition Sk share bearing measurements to localize target k and as the reward receive some sleep time denoted by ti∆. In this formulation, ti ∈ Z∩ [0, T ] and ∆ denotes the time required by each sensor to record a single measurement. Therefore, T − ti determines the number of measurements that each sensor i records from a maximum of T measurements. Sensors seek to reduce their power consumption in data acquisition by maximiz- ing ti’s. The aim is to determine the optimal coalition structure and sleep time allocations such that the average sleep time of the sensors is maximized and, at the same time, all the targets are localized with the pre-defined accuracy. In addition, to prevent premature power depletion of the sensor nodes, each sensor is guaranteed a minimum sleep time equal to τi measurements. The coalition formation problem can be formulated as: maximizet∈[0,T ]N P∈C ∑ Sk∈P (∑ i∈Sk ti ) N (P1) subject to det (JB (Sk, t)) ≥ Õk ∀k ∈ K (C1) ti ≥ τi ∀i ∈ N (C2) where t = (t1, · · · , tN) denotes the sleep time allocation vector and det(JB(Sk, t)) 13 2.1. Network Average Lifetime Maximization Problem denotes the stochastic observability for a coalition Sk localizing target k which will be defined in Sec. 2.3. In (P1), the objective function is defined as the average sleep time allocated to the sensors which is aimed to be maximized over the set of all possible coalition structures C. The constraints in (C1) guarantee that the required accuracy is achieved for all the targets in the network. This formulation establishes a trade-off between the localization accuracy for each target Õk and the average sleep time allocated to the sensors in the localization task. In addition to (C1) and (C2), we introduce the fairness constraints on the sleep time allocations as follows: ∑ i∈Sk ti ≥ g∗ (Sk) ∀Sk ∈ 2N \∅, ∀k ∈ K, (2.2) where g∗ (Sk) = max{ti; i∈Sk} ∑ i∈Sk ti (2.3) subject to det (JB (Sk, t)) ≥ Õk. Intuitively, g∗ (Sk) gives the maximum total sleep time achievable in a coalition Sk such that the pre-defined localization accuracy is satisfied. We set g∗ (Sk) = 0 if the feasible set in (2.3) is empty. The set of K(2N − 1) constraints in (B.6) defines a fairness criteria on the sleep times allocated to the sensor nodes in the sense that the sleep times allocated to the nodes cannot be further improved by forming a new coalition Sk. Therefore, although the sum of the total sleep times for all coalitions ∑ Sk∈P (∑ i∈Sk ti ) is maximized, the total sleep time achievable by each coalition is divided fairly among the sensors in that coalition. In tracking applications, as the target moves, the optimum coalition structure and sleep time allocations evolve over time. Hence, the above optimization problem should be solved repeatedly. To solve the combinatorial optimization problem in (P1), one has to search 14 2.1. Network Average Lifetime Maximization Problem among all possible coalition structures C which is an NP-hard problem (|C| is given by the N th Bell number). In addition, the feasible values for the vector t has to be found such that the constraints (C1)-(C3) are satisfied. These tasks incur an immense computational overhead which have to be accomplished in a centralized fashion considering the limited power and computational resources of the sensors in WSNs. Outline of Main Result: In this dissertation, the above problem is interpreted as a non-superadditive coalition formation game with N constituting the set of players. The characteristic function1 v (S) for this game gives the total sleep time that can be achieved by a particular coalition S such that the following relaxed version of (C1) is satisfied: L (log (det (JB (Sk, t)))) ≥ log ( Õk ) ∀k ∈ K. (2.4) Here, L (·) denotes the lower bound as an operator. We propose a distributed dy- namic coalition formation algorithm in Sec. 4.2 in which, in each iteration, each sensor as a myopic optimizer chooses between the existing coalitions to greedily maximize its expected sleep time as follows: Si (ω) = argmaxSk∈P { v (Sk ∪ {i})− ∑ j∈Sk\{i} tj } with probability 1− ²i Uniform(P) with probability ²i (2.5) where ω = (P, t) and Uniform(·) denote the state of the network and discrete uniform distribution, respectively. In addition, ²i = ² > 0 only when there exists a coalition S ′k comprising sensor i for which ∑ i∈S′ k ti < v (S ′k). As it will be explained in Sec. 4.1, the randomization among the existing coalitions that happens with 1The term characteristic function is as used in cooperative games (see Sec. 3.1) and is completely unrelated to characteristic functions in probability theory. 15 2.2. Non-linear Parameter Estimation in Noise probability ² prevents the nodes being stuck in non-optimal coalition structures. It will be proved in Theorem 4.3.1 that if each sensor follows (2.5), iterations of the above algorithm eventually converges to the solution of the problem resulted from substituting (2.4) in both (C1) and (2.3). Hence, the optimal coalition structure and sleep time allocations are achieved such that the average sleep time for the sensors in the network is maximized. This approach brings about two main advantages: (i) it is performed distributively among the sensors and eliminates the need for a central decision making device, and (ii) in each iteration, unlike the NP-hard problem in (P1), sensors have to solve the non-combinatorial optimization problem in (2.5) for which the computational cost is linear in the number of coalitions (i.e. the number of targets K). 2.2 Non-linear Parameter Estimation in Noise In this section, we start with an abstract measurement model. Later in Sec. 2.3, this abstract formulation is explained in terms of localization in two-dimensional space, based on which stochastic observability is derived as the metric for localization accuracy. Consider a set of sensors denoted by N = {1, 2, . . . ,N}. The parameters that the sensors aim to estimate constitute an L dimensional vector denoted by p̃ ∈ RL. Each sensor i ∈ N records a measurement characterized by zi = hi(p̃) + vi s.t. p̃ ∼ p (p̃) (2.6) where zi ∈ RD denotes the measurement vector for the i-th sensor, hi is an arbi- trary vector-valued but differentiable function of p̃, and vi’s are mutually indepen- dent Gaussian random vectors, each with zero mean and covariance matrix Ri. In addition, p (p̃) denotes the prior distribution for parameter vector p̃. The set of measurements collected by a coalition of sensors S will also be denoted by Z (S). 16 2.2. Non-linear Parameter Estimation in Noise Let p̂ (Z (S)) denote an estimate of p̃ which is a function of the observation vector Z (S). Then, the covariance of p̂ can be expressed as Cp̂ = EZ(S) { (p̂ (Z (S))− p̃) (p̂ (Z (S))− p̃)T } . (2.7) In this work, we adopt a sequential Bayesian framework where the posterior distribu- tion is obtained by updating the prior p (p̃) using the sensors’ bearing measurement. This posterior density is then utilized as the prior for the next period. Hence, the posterior Cramér-Rao lower bound (P-CRLB) theorem [35] can be employed to es- tablish a lower bound on Cp̂. According to this theorem, there exists JB(S, t) given by JB (S, t) = Q+ Ep(p̃) {J (S, t)} (2.8a) Q = Ep(p̃) { [∇p̃ ln (p (p̃))]T [∇p̃ ln (p (p̃))] } (2.8b) J (S, t) = Ep(Z(S)|p̃) { [∇p̃ ln (p (Z (S) |p̃))]T [∇p̃ ln (p (Z (S) |p̃))] } (2.8c) such that Cp̂ ≥ J−1B (S, t) (2.9) where the matrix inequality indicates that Cp̂ − J−1B (S, t) is positive semi-definite. In the above equations, J(S, t) and JB(S, t) denote the Fisher information matrix (FIM) and Bayesian Fisher information matrix (FIM), respectively. Similarly, when p̃ is regarded as a non-random parameter, Q = 0 and J−1(S, t) provides a lower bound for the covariance of p̂. In Proposition 2.2.1, an expression is derived for the B-FIM using the measure- ments Z (S) recorded by a coalition of sensors S aiming at estimating the unknown parameter vector p̃. Proposition 2.2.1. Having a coalition of sensors S which provide a set of bear- ing estimations Z (S), based on the measurement model adopted in (2.22), the B- 17 2.2. Non-linear Parameter Estimation in Noise FIM JB(S, t) can be expressed as JB (S, t) = Q+ ∑ i∈S Ep(p̃) { [∇p̃hi]TR−1i [∇p̃hi] } . (2.10) Proof. Noting that the vectors vi, i ∈ S, are mutually independent, p (Z (S) |p̃) can be expressed as p (Z (S) |p̃) = ∏ i∈S p (zi|p̃) . (2.11) From (2.22), p (zi|p̃) = 1√ (2pi)D det (Ri) exp ( −1 2 (zi − hi (p̃))TR−1i (zi − hi (p̃)) ) . (2.12) Therefore, ∇p̃ ln p (zi|p̃) = − (zi − hi (p̃))TR−1i (∇p̃hi (p̃)) . (2.13) In addition, since the vectors vi, i ∈ S, are zero mean, Ep(zi|p̃) {∇p̃ ln p (zi|p̃)} = 0. (2.14) Subsequently, having (2.14) and substituting (2.13) in (2.8c), J (S, t) can be written as J (S, t) = Ep(Z|p̃) {[∑ i∈S (zi − hi (p̃))T R−1i (∇p̃hi (p̃)) ]T [∑ i∈S (zi − hi (p̃))T R−1i (∇phi (p̃)) ]} = ∑ i∈S (∇p̃hi (p̃))TR−1i Ep(Z|p̃) { (zi − hi (p̃)) (zi − hi (p̃))T } R−1i (∇p̃hi (p̃)) = ∑ i∈S (∇p̃hi (p̃))TR−1i (∇p̃hi (p̃)) . (2.15) 18 2.2. Non-linear Parameter Estimation in Noise Finally, substituting (2.15) into (2.8a), the B-FIM can be written as JB (S, t) = Q+ ∑ i∈S Ep(p̃) { (∇p̃hi (p̃))TR−1i (∇p̃hi (p̃)) } . (2.16) In addition, if the prior density p (p̃) follows a Gaussian distribution with the covariance matrix Cp̃, the following lemma expresses Q (see (2.8b)) in terms of Cp̃. Lemma 2.2.1. If the prior density p(p̃) follows a Gaussian distribution with Ep(p̃) {p̃} = µ̃ and Ep(p̃) { (p̃− µ̃)2 } = Cp̃, Q (as in 2.8b) is given by Q = C−1p . (2.17) Proof. Having Ep(p̃) {p̃} = µ̃ and Ep(p̃) { (p̃− µ̃)2 } = Cp̃, p(p̃) can be written as p(p̃) = 1√ (2pi)L det (Cp̃) exp ( −1 2 (p̃− µ̃)TC−1p̃ (p̃− µ̃) ) . (2.18) Therefore, ln (p (p̃)) = ln 1√ (2pi)L detCp̃ − 1 2 (p̃− µ̃)TC−1p̃ (p̃− µ̃) , (2.19) and ∇p̃ ln (p (p̃)) = − (p̃− µ̃)TC−1p̃ . (2.20) Finally, substituting (2.20) back in (2.8b), Q = Ep(p̃) {[ − (p̃− µ̃)TC−1p̃ ]T [− (p̃− µ̃)TC−1p̃ ]} = C−1p̃ Ep(p̃) { (p̃− µ̃) (p̃− µ̃)T } C−1p̃ = C−1p̃ . (2.21) 19 2.3. Target Localization and Stochastic Observability In the next section, the above abstract formulation is employed in the frame- work of bearing-only localization in two-dimensional space. In addition, we benefit form Proposition 2.2.1 and Lemma 2.2.1 to derive a closed-form expression for the stochastic observability. 2.3 Target Localization and Stochastic Observability In this section, the measurement model is described for bearings-only target local- ization in two-dimensional space. In addition, using the results in Proposition 2.2.1 and Lemma 2.2.1, we fill in the details of the stochastic observability constraints in (P1) (see Sec. 2.1). Consider a coalition of sensors S which are localizing a particular target by each sensor recording noisy bearing measurements of the target relative to a coordinate frame in two dimensions. Let vector p̃ ∈ R2 denote the position of the target that the coalition aims to estimate. Then, each sensor i records a noisy measurement θ̂i = hi (p̃) + ηi s.t. p̃ ∼ p (p̃) (2.22) where θ̂i and ηi ∼ N ( 0, σ2i ) denote the estimated bearing and the error in the estimation of the bearing for sensor i, respectively. In addition, p (p̃) denotes the prior density of the position of the target. Here, comparing with the measurement model in (2.6), L = 2, D = 1 and hi (p̃) = θ̃i, where θ̃i denotes the true bearing from sensor i to the target. Fig. 2.1 illustrates the geometry of bearings-only localization in two-dimensional space. Suppose the target is stationary. Given that the true position of the target and the position of the i-th sensor are denoted by p̃ = [xt, yt] T 20 2.3. Target Localization and Stochastic Observability Figure 2.1: Geometry of bearings-only localization in two-dimensional space. and [xi, yi] T, respectively, hi (p̃) is given by hi(p̃) = arctan ( yt − yi xt − xi ) . (2.23) Here, it is assumed that the measurement intervals are sufficiently long such that ηi’s are independent between the intervals. In addition, in order to take the sleep time of each sensor into account, the noise variance is modeled as σ2i ∝ 1 f(T − ti) (2.24) where f(·) is an increasing function such that f (0) = 0. This implies that the variance of estimation error for each sensor is inversely proportional to the active time of that sensor through an increasing function. Therefore, as the sleep time for a particular sensor decreases, it records more measurements and provides a smaller error in the bearing estimate. We now proceed to define the stochastic observability as the metric for localiza- tion accuracy. Definition 2.3.1. Stochastic observability is defined as det (JB(S, t)), where JB(S, t) 21 2.3. Target Localization and Stochastic Observability denotes the B-FIM. The P-CRLB, as stated in Sec. 2.2, establishes a lower bound on Cp̂ (covariance of the target position estimation), the inverse of which is referred to as the B- FIM matrix. In the literature, various matrix means of the B-FIM have been used as the estimation accuracy metric, e.g. trace, determinant [9, 36]. The choice of determinant in this work is justified as it can be attributed to how accurate an estimate is by noting that it determines the volume of the 1−σ confidence ellipsoid around the estimate [35]. This boundary is defined as the points [x, y]T satisfying [ x− x̂t y − ŷt ] C−1p̂ x− x̂t y − ŷt = 1, (2.25) where [x̂t, ŷt]T denotes the mean of the posterior target distribution. The following proposition provides a closed-form expression for the stochastic observability. Proposition 2.3.1. Consider the measurement model adopted in (2.22)-(2.24). Given a particular coalition of sensors S and sleep time allocation vector t, the stochastic observability det (JB(S, t)) can be expressed as det (JB(S, t)) = det (Q) + ∑ i∈S Ep(p̃) { g (1) i } + ∑ i∈S ∑ j∈S Ep(p̃) { g (2) ij } . (2.26) Here, g (1) i = α ( f(T − ti) r2i σ 2 i )( q11 cos2 ( θ̃i ) + 2q12 sin ( θ̃i ) cos ( θ̃i ) + q22 sin2 ( θ̃i )) , (2.27) g (2) ij = α2 2 ( f (T − ti) r2i σ 2 i )( f (T − tj) r2jσ 2 j ) sin2 ( θ̃i − θ̃j ) . (2.28) In addition, ri = √ (xt − xi)2 + (yt − yi)2 denotes the relative distance of the i-th sensor to the target and α represents the proportionality constant in (2.24). 22 2.3. Target Localization and Stochastic Observability Proof. First, we use (2.23) to write ∇p̃hi (p̃) = 1 ri − sin(θ̃i) cos(θ̃i) . (2.29) Therefore, substituting (2.29) in (2.10), the B-FIM can be written as JB (S, t) = Q+ ∑ i∈S αf(T − ti) r2i σ 2 i sin2(θ̃i) − sin(θ̃i) cos(θ̃i) − sin(θ̃i) cos(θ̃i) cos2(θ̃i) . (2.30) Subsequently, noting that QT = Q and using det a11 a12 a21 a22 = a11a22 − a12a21, det (JB (S, t)) can be expressed as det (JB (S, t)) =det (Q) + det ( Ep(p̃) {J (S, t)} ) + Ep(p̃) {∑ i∈S α ( f (T − ti) r2i σ 2 i ) · ( q11 cos2 ( θ̃i ) + 2q12 sin ( θ̃i ) cos ( θ̃i ) + q22 sin2 ( θ̃i ))} , (2.31) where det ( Ep(p̃) {J (S, t)} ) = Ep(p̃) {∑ i∈S ( αf (T − ti) r2i σ 2 i ) sin2 ( θ̃i )} · Ep(p̃) ∑ j∈S ( αf (T − tj) r2jσ 2 j ) cos2 ( θ̃j ) (2.32) − ( Ep(p̃) {∑ i∈S ( αf (T − ti) r2i σ 2 i ) sin ( θ̃i ) cos ( θ̃i )})2 . In the next step, using the following equalities to change the indices in the sums (∑ i∈S xi )∑ j∈S yj = 1 2 (∑ i∈S xi )∑ j∈S yj + 1 2 ∑ j∈S xj (∑ i∈S yi ) , (2.33) 23 2.3. Target Localization and Stochastic Observability (∑ i∈S xi )2 = (∑ i∈S xi )∑ j∈S xj , (2.34) det (J (s, t)) can be written as det (J (S, t)) = α2 2 Ep(p̃) {∑ i∈S ∑ j∈S ( f (T − ti) r2i σ 2 i )( f (T − tj) r2jσ 2 j ) ( sin2 ( θ̃i ) cos2 ( θ̃j ) + sin2 ( θ̃j ) cos2 ( θ̃i ) − 2 sin ( θ̃i ) sin ( θ̃j ) cos ( θ̃i ) cos ( θ̃j ))} . (2.35) In the last step, having the trigonometric equality sin(θ̃i − θ̃j) = sin(θ̃i) cos(θ̃j)− sin(θ̃j) cos(θ̃i), (2.36) det (J (S, t)) is simplified as det (J (S, t)) = α2 2 ∑ i∈S ∑ j∈S Ep(p̃) {( f (T − ti) r2i σ 2 i )( f (T − tj) r2jσ 2 j ) sin2 ( θ̃i − θ̃j )} . (2.37) Hence, (2.31) together with (2.37) completes the proof. Proposition 2.3.1 will be used in Sec. 3.3 to derive the characteristic function for the coalition formation game. The expectations in (2.26) cannot be evaluated analytically. Although one can utilize Monte Carlo methods, a simpler approach to avoid computing expectations is to approximate the B-FIM as: JB (S, t) ≈ Q+ J (S, t) |p̃=µ̃, (2.38) where µ̃ denotes the mean of the prior density p (p̃). In many practical applica- tions, the P-CRLB is approximated by (2.38), e.g. the covariance of the estimate is approximated in the same way in the extended Kalman filter [15]. In this disser- 24 2.3. Target Localization and Stochastic Observability tation, as will be seen in Sec. 3.3, the above approximation is used to compute the characteristic function for the defined coalition formation game. Throughout this work, it is also assumed that the prior density of the target is approximated by a Gaussian distribution with the covariance given by Cp̃. There- fore, as proved in Lemma 2.2.1, Q = C−1p̃ . (2.39) This assumption helps to reduce the computations in evaluating the characteristic function in Sec. 4.5. 25 Chapter 3 The Coalition Formation Game In this chapter, the power conservation problem presented in (P1) with the relaxed constraints in (2.4) is interpreted as a non-superadditive coalition formation game. The advantage of such an interpretation is that one can use dynamic coalition for- mation algorithms to compute the solution. As it will be explained later in Sec. 3.3, the characteristic function is defined such that larger coalitions of sensors do not necessarily ensure larger sleep times. This is mainly due to the fact that the stochastic observability, depending on both relative angles and distances of sensors to the target, does not necessarily increase as the number of sensor nodes in a coali- tion goes up. We utilize the modified core [37] as the solution concept for this game. This chapter also incorporates the related concepts and definitions from cooperative game theory. 3.1 Cooperative Game Theory Game theory provides a formal analytical framework to investigate the complex interactions among rational players. Throughout the past decades, game theory has been vigorously utilized in a large class of disciplines such as engineering, economics, political science, philosophy, etc [38]. Recently, there has been a significant growth in exploiting game-theoretic approaches to analyze wireless networks. This is mainly due to: (i) the emergence of large-scale, distributed and heterogeneous wireless networks; (ii) dramatic improvement in computation power, which makes it possible for network entities to make independent and rational strategic decisions; and (iii) 26 3.1. Cooperative Game Theory the need for low complexity distributed algorithms that can efficiently represent competitive or cooperative scenarios between network entities [39]. In general, game theory can be classified as follows: (a) Non-cooperative Game Theory: This class provides analytical tools to study the interactions among competing players. Each player chooses it strategy independently such that its own utility improves. In noncooperative games, players cannot make binding commitments. There exists several solution concepts for non-cooperative games, among the most renowned ones are Nash equilibrium (NE) and correlated equilibrium (CE) [40]. (b) Cooperative Game Theory: This class investigates the behavior of rational play- ers when they collaborate. The main branch of cooperative games is focused on the formation of cooperating groups of players, referred to as coalitions [37]. Hence, the game is a competition between coalitions of players, rather than between individual players. 3.1.1 Coalition Formation Games A coalition formation game is uniquely defined by the pair (N , v). N = {1, 2, · · · ,N} denotes the set of players (e.g., network entities,) who seek to form groups in order to collaborate with each other. Any nonempty subset S ⊆ N is called a coalition. Coalitions with |S| = 1, where |X | denotes the cardinality of the set X , are called singleton coalitions and N is called the grand coalition. A coalition represents an agreement between the coalition members to act as a single entity (e.g., pursue the same goal). The set of all coalitions in a game is called coalition structure and is denoted by P. v denotes the coalition value which quantifies the worth of a coalition in a game. Coalition formation games can be categorized into three different classes based on the definition of coalition value as follows: 1. Games in Characteristic Form: This is the most common form of coalition 27 3.1. Cooperative Game Theory formation games in which the value of each coalition S is exclusively determined by the players in that coalition, independently from all other players N\S and how they are structured. Games in characteristic form are classified as: (a) Games with Transferable Utility (TU): This class of coalition formation games was first introduced by Von Neuman and Morgenstern [41]. The value func- tion in these games is defined by a mapping v : 2N → R, where 2N denotes the power set of players. This value function is referred to as characteristic function and asso- ciates with every coalition S ⊆ N the maximal total payoff for that coalition. The TU property implies that the total payoff v(S) gained by a coalition can be dis- tributed in any manner between the coalition members i ∈ S. However, this value is commonly distributed using an appropriate fairness rule. Each player’s share of the total payoff is denoted by xi which forms a vector x = (x1, · · · , xN ) ∈ R|N |. This vector is referred to as payoff allocation or simply allocation vector. In this dissertation, we consider this class of coalition formation games. (b) Games with Non-transferable Utility (NTU): This class of coalition formation games was first introduced by Aumann and Peleg [42]. In an NTU game, there exist rigid constraints on the distribution of the coalition values and the characteristic function represents a mapping to a vector space, i.e, v(S) ⊂ R|S|. Furthermore, the share of the total payoff for each player i is a function of the joint actions of other players, i.e., S\{i}, in that coalition. Therefore, a TU game can be considered as a special case of the NTU game [38]. Coalition formation games in characteristic form (both TU and NTU) constitute one of the most important types of games and its application in wireless networks is progressively increasing. As some instances, the interested reader is referred to [43], [44], and [45]. 2. Games in Partition Form: This class of coalition formation games was first introduced by Thrall and Lucas [46]. In this class, unlike the games in characteristic 28 3.1. Cooperative Game Theory Figure 3.1: Cooperative games in partition form vs. characteristic form. form, the value of each coalition S depends on how other players in the game N\S are structured. The following Definition introduces the notion of collections and partitions on the set of players. Definition 3.1.1. A collection in N is any family of mutually disjoint coalitions C := {S1, · · · , SM}, ∀i 6= j, Si ∩ Sj = ∅. M is called the size of the collection. If additionally ⋃M l=1 Sl = N , the collection C is called a partition of N . Using the above definition, a coalition structure P is a partition of the grand coalition N . In these games, the value of each coalition S ∈ P is defined as v(S,P) which imposes the dependence on the structure of other players in the game. Games in partition form are intrinsically difficult to solve. Fig. 3.1 illustrates the coalitions in a cooperative game. The coalition structures specified by the solid and dotted lines can be expressed as P1 = {{1, 2, 4}, {3}, {5}} and P2 = {{1, 2, 4}, {3, 5}}, re- spectively. If the characteristic function v in the cooperative game ({1, 2, 3, 4, 5}, v) is defined to be in characteristic form, the coalition value for coalition S1 = {1, 2, 4} satisfies v (S1,P1\S1) = v (S1,P2\S1). However, if it is defined to be in partition form, it is possible that v (S1,P1\S1) 6= v (S1,P2\S1). 29 3.1. Cooperative Game Theory (a) Coalition S with graph G1S (b) Coalition S with graph G2S Figure 3.2: Games in graph form: It is possible that v ( G1S ) 6= v (G2S). 3. Games in Graph Form: This class of coalition formation games was first introduced by Myerson [47], [48]. In some applications [49], the players are inter- connected and communicate through pairwise links in a graph. However, the games in characteristic and partition form can not model the inter-connectivity of such graphs. Games in graph form consider this interconnection by defining the coalition values relative to the connectivity structure of the graphs. In these games, given a graph GS with the vertices defined as the coalition members of S and a coali- tion formation game (N , v), the value of S is given by v(GS). Therefore, in this class, a specific coalition S can be assigned different coalition values for different connectivity structures, i.e., v(G1S) 6= v(G2S). These graphs can be both directed or undirected and the value of each coalition can also depend on the connectivity structure of other coalitions GN\S as well. Fig. 3.2 demonstrates how coalition val- ues are dependent on the connectivity graph in coalition formation games in graph form. 30 3.1. Cooperative Game Theory 3.1.2 Solution Concept In coalition formation games, the aim is to find a coalition structure Ps and allo- cation vector xs such that no group of player have the incentive to leave Ps. This coalition structure is referred to as a stable coalition structure and, together with the corresponding allocation vector, is considered as the solution to the game. Hence, a solution concept for a game in characteristic form can be interpreted as a coalition structure, under which the highest gain is achieved, and a fairness criteria which determines how the gains in each coalition are distributed amongst players. In the literature, there exist formal solutions for the class of coalition forma- tion games in characteristic form with supperadditive characteristic function. Here, superadditivity is defined in TU games as a property of the characteristic function. Definition 3.1.2. In a TU game, the characteristic function is referred to as sup- peradditive if v(S1 ∪ S2) ≥ v(S1) + v(S2), ∀S1,S2 ⊆ N , S1 ∩ S2 = ∅. (3.1) Simply put, a TU game is superadditive if cooperation is always beneficial. Thus, it is guaranteed that if two disjoint coalitions join together and form a single coali- tion, they will at least receive the same payoff as by acting separately. In super- additive games, forming the grand coalition will always be to the joint benefit of all players. The above definition can also be extended for NTU games by replacing (3.1) with {x | (xi)i∈S1 ∈ v(S1), (xj)j∈S1 ∈ v(S2)} ⊆ v(S1 ∪ S2). (3.2) The most celebrated solution concept for the coalition formation games with supperadditive characteristic functions is the core [37]. Other prominent solutions include Shapley value, kernel, and Nucleolus. These solution concepts are only de- fined for superadditive TU games (extensions for NTU superadditive games are only 31 3.1. Cooperative Game Theory formalized for the core and Shapley value in literature [38]). The literature dealing with non-supperadditive games usually redefines these solution concepts or presents alternatives which are application-specific [14, 34, 50, 51]. Therefore, unlike the coalitional games with superadditive characteristic functions for which formal so- lutions exist, the solution for a non-superadditive game is not straightforward and mainly depends on the game under study. In order to define the core for a non-superadditive TU game, we need the fol- lowing definitions from the cooperative game theory context. Definition 3.1.3. An allocation vector x is called individually rational if xi ≥ v({i}) for all i ∈ N . In other words, an allocation vector is individually rational if no player can do better by acting alone. Definition 3.1.4. In a superadditive TU game, an allocation x is called feasible if ∑ i∈N xi ≤ v(N ), (3.3) and is called group rational or efficient if the equality holds. Simply put, an allocation is feasible if sum of the payoffs allocated to the players in the grand coalition does not exceed the total gain achievable by the grand coali- tion. suppose an allocation is proposed. If a group of players can form a coalition in which players are guaranteed to achieve higher payoffs, the new coalition will block the proposed allocation. Definition 3.1.5. In a TU game, an allocation x is said to be blocked by a coalition S if ∑ i∈S xi < v(S). (3.4) 32 3.1. Cooperative Game Theory This property is independent of the superadditive characteristic of the game and, as will be seen in Definition 3.1.6, acts as the fairness rule on the payoffs allocated to the players. We now proceed to define the core of a supperadditive TU game. Definition 3.1.6. If the TU game is superadditive, an allocation xc is in the core if it both efficient, i.e., ∑ i∈N xci = v(N ), (3.5) and non-blocking, i.e., ∑ i∈S′ xci ≥ v(S ′) ∀S ′ ⊆ N . (3.6) In other words, the core of a superadditive game constitutes the grand coali- tion N and an allocation vector which guarantees that no group of players can leave the grand coalition. Hence, reaching the core directly implies stability of the grand coalition. Definition 3.1.6 can be modified for superadditive NTU games as follows: If the NTU game is superadditive, an allocation xc is in the core if it is both feasible, i.e. xc ∈ v(N ), (3.7) and non-blocking, i.e., @y ∈ v(S ′) ; ∀i ∈ S ′, yi > xci ∀S ′ ⊆ N . (3.8) In many practical application, the superadditivity of the characteristic function is a quite restrictive requirement. These applications include the scenarios in which a charge is incurred for the information exchange or bargaining process. To deal with these problems, the feasibility concept is redefined for non-superadditive games as follows [14]: 33 3.1. Cooperative Game Theory Definition 3.1.7. In non-superadditive TU games, an allocation x is called feasible if ∑ i∈N xi ≤ maxP∈C ∑ S∈P v(S), (3.9) and is called efficient if the equality holds. In other words, an allocation is called feasible if the total payoffs allocated to the players does not exceed the highest possible outcome. Note that if v is supper- additive, max P∈C ∑ S∈P v(S) = v(N ). (3.10) and (3.9) reduces to (3.3). In many practical applications, players cannot be allo- cated continuous-valued payoffs. Let ∆ be the smallest accounting unit. Then, the players’ allocations are restricted to integral multiples of ∆ in the interval [v({i}),maxP∈C∑S∈P v(S)]. This new set is denoted by Xi. Considering this re- striction, together with the modified definition for feasibility, the definition of the core can be redefined for non-superadditive games as follows: Definition 3.1.8. For any non-superadditive TU game, an allocation vector xc is called to be in the core if ∑ i∈N xci ≤ maxP⊂C ∑ S∈P v(S) < ∑ i∈N xci +∆, x c i ∈ Xi (3.11a) ∑ i∈S′ xci ≥ v(S ′) ∀S ′ ⊆ N . (3.11b) It is worth emphasizing that in non-supperadditive games, unlike superadditive games where the core is reached in the grand coalition, the coalition structure in which the core can be reached constitutes several disjoint coalitions of players. In this dissertation, interpreting (P1) (see Sec. 2.1) as a non-superadditive TU game, we are interested in both the core sleep time allocation and the coalition structure in which this allocation can be reached. 34 3.2. Sensors Coalition Formation Game for Target Localization 3.2 Sensors Coalition Formation Game for Target Localization The power conservation problem for data acquisition in bearings-only localization presented in (P1) (see Sec. 2.1) can be interpreted as a TU coalition formation game defined by the set of sensors N and a real-valued characteristic function v : 2N \∅ −→ R. The characteristic function v associates with any nonempty coalition the maximum total sleep time that can be gained by that coalition such that the required localization accuracy is satisfied for the corresponding target. In other words, v (S) can be interpreted as the reward for sensors collaboration in localizing a particular target. The payoff for each sensor i is a share ti from v (S) that it claims from the coalition S to which it belongs and tries to maximize it. It is worth noting that, as stated in Sec. 3.1.1, the coalition formation games encompass cooperative games where the coalition structure plays a major role. Intuitively, each sensor i is encouraged to join a non-singleton coalition if its feasible sleep time in that coalition is rational, i.e. ti > v ({i}). Comparing with the constraints in (C2), we set v ({i}) = τi. As a result, each sensor’s sleep time is restricted to the integers in the interval [τi, T − 1]. This set represents the possible sleep time values that a sensor can claim from a coalition upon joining it and is denoted by Di throughout this work. Here, T is removed from Di since sensors with sleep times equal to T do not contribute to the localization task and, therefore, to the stochastic observability; hence, they cannot join non-singleton coalitions. It is worth mentioning that this is just a virtual modification in the formulation of the game. In reality, since the sensors in singleton coalitions do not contribute in localization of the targets, they can sleep for the whole period T∆ (i.e. v ({i}) = T ). In this work, the redefined core for non-superadditive TU games is formulated as the solution concept for the defined power conservation game. To make the notations consistent with (P1), we replace x and Xi with t and Di, respectively, 35 3.3. Characteristic Function for the Coalition Formation Game in Definition 3.1.8. Hence, defining the characteristic function v such that (2.4) is satisfied, the modified core for the coalition formation game (N , v) is the solution to the relaxed combinatorial optimization problem in (P1). We now proceed to derive the characteristic function for the game. In the next section, the characteristic function for the above formulated game will be derived and the properties will be investigated. 3.3 Characteristic Function for the Coalition Formation Game To derive the characteristic function for the sensors coalition formation game defined in Sec. 3.2, a lower bound is found for the logarithm of the determinant (log- determinant) of JB (S, t) using the expression given for det (JB (S, t)) in Proposition 2.3.1. As can be seen in (2.28), assuming f (T − ti) to be a linear function of the active time of each sensor (i.e., f (T − ti) ∝ (T − ti)), det (JB (S, t)) turns out to be a bilinear function of the active time of sensors (i.e. T − ti’s). This motivates to take the logarithm of the constraints in (C1). Let k and Sk denote the target and the coalition of sensors localizing it, re- spectively. Assuming f (T − ti) = α · (T − ti) and a diagonal Q, the characteristic functions is defined as: v(Sk) = ⌊ T − 1 log(T ) [ |Sk| log ( 3αT 3 √|Sk| (|Sk| − 1) det(Q) Õk ) + 1 3 ( ∑ i∈Sk v (1) i + 1 |Sk| − 1 ∑ i∈Sk ∑ j∈Sk j 6=i v (2) ij )]⌋+ (3.12) v (1) i = log q11 cos2 ( θµ̃i ) ( rµ̃i )2 σ2i + q22 sin2 ( θµ̃i ) ( rµ̃i )2 σ2i (3.13) 36 3.3. Characteristic Function for the Coalition Formation Game v (2) ij = log 1 2 sin2 ( θµ̃i − θµ̃j ) ( rµ̃i )2 σ2i ( rµ̃i )2 σ2j (3.14) where bxc+ = max {0, bxc} and b·c denotes the greatest integer function. In addition, assuming µ̃ = [xµ̃, yµ̃] T, rµ̃i and θ µ̃ i are given by rµ̃i = √ (xµ̃ − xi)2 + (yµ̃ − yi)2, (3.15) θµ̃i = arctan ( yµ̃ − yi xµ̃ − xi ) . (3.16) This characteristic function evaluates the total sleep time that can be achieved by Sk in terms of multiple integrals of ∆. In addition, as stated in Sec. 2.3, the simplifying assumption in (2.38) is made to compute the estimations in (2.26); As the result, the distances and bearings of the sensors to the target are defined relative to the mean of the prior density µ̃. Detailed derivation of the characteristic function is provided in Appendix A. The same approach can be utilized to derive the characteristic function when Q is non-diagonal. We next investigate the properties of the function given in (3.12)-(3.14). As can be seen in (3.12), the first term goes up as the number of sensors in a specific coalition increases. This translates to the more sleep time that can be allocated to the sensors in more populated coalitions. However, if the sensor nodes provide worthless information, the second term forces (3.12) to decrease. This worthless information can be classified as: • Redundant Information: This worthless information corresponds to the case where two or more nodes lie on almost a straight line relative to the target. Formally, ∃ i, j ∈ Sk : θµ̃i ' θµ̃j ± pi. (3.17) Therefore, sin ( θµ̃i − θµ̃j ) ' 0 and log ( sin ( θµ̃i − θµ̃j )) < 0 in (3.14). 37 3.3. Characteristic Function for the Coalition Formation Game • Imprecise Information: This type of worthless information is provided by the sensors which are located far relative to the target. Formally, if there exists a sensor i with ri À 1 such that: sin2 ( θµ̃i − θµ̃j ) ( rµ̃i )2 σ2i ( rµ̃j )2 σ2j ' 0, (3.18) then log sin2 ( θµ̃i − θµ̃j ) ( rµ̃i )2 σ2i ( rµ̃j )2 σ2j < 0. (3.19) On the other hand, if the prior density shows higher uncertainty in y direction (i.e. (Cp̃)22 À (Cp̃)11) those sensors located on θµ̃i ≈ 0 or θµ̃i ≈ pi (relative to the target) will be more useful in reducing uncertainty in that direction. In this case, considering the fact that Q ≈ C−1p̃ , it is concluded that q11 À q22. As it is clear from (3.13), the characteristic function will also allocate larger sleep times to those coalitions comprising the sensors satisfying θµ̃i ≈ 0 or θµ̃i ≈ pi. Formally, the numerator in (3.13), i.e. q11 cos2 ( θµ̃i ) + q22 sin2 ( θµ̃i ) , increases as cos2 ( θµ̃i ) gets larger. It is worth mentioning that the same property will be observed for the sensors located on θµ̃i ≈ ±pi2 (relative to the target) when (Cp̃)11 À (Cp̃)22. The above discussion implies that larger coalitions of sensors do not necessarily guarantee greater characteristic function values. Hence, the characteristic function exhibits the non-superadditive property as it contradicts the condition in Definition 3.1.2. In addition, the trade-off between the sleep times allocated to the sensors and the localization accuracy for each target Õk can be clearly seen in (3.12), where as Õk goes up, the total sleep time allocated to coalition Sk is reduced. Now, we proceed to discuss the constraints that above formulation imposes on the characteristic function. 38 3.4. Formulation of Constraints on the Characteristic Function 3.4 Formulation of Constraints on the Characteristic Function In the context of cooperative game theory, it is assumed that the characteristic function is a fixed function such that if S1 = S2, then v (S1) = v (S2) [52]. In our formulation, although the characteristic function is fixed, its value changes for a specific coalition as it attempts to localize two different targets. This problem stems from the fact that the relative bearings θµ̃i and distances r µ̃ i , as well as the required accuracy Õ, change as a coalition of sensors tries to localize different targets. Therefore, the values of the coalitions are dependent to the target indices. In order to avoid this inconsistency, it is assumed that targets are included in the coalitions as players of the game achieving zero payoffs. Formally, each coalition is considered as {k,Sk} where k and Sk denote the target and the coalition of sensors localizing it, respectively. Singleton coalitions are also denoted by {0, {i}}. Since our model assumes a separate coalition of sensors to localize each target, for our formulation to be well posed, we need to disallow targets leaving or jumping between coalitions. This requires us to impose the following constraints on the characteristic function (the process of joining and leaving coalitions will be fully explained in Sec. 4.1): 1. In order to prevent targets leave coalitions, we set v({Sk}) = −∞ ∀k ∈ K. (3.20) This forces the sensors to disband and form singleton coalitions when the target leaves the coalition. 2. In order to prevent targets jump between coalitions, we set v({k1, k2,Sk1}) = −∞ ∀k1, k2 ∈ K. (3.21) 39 3.4. Formulation of Constraints on the Characteristic Function If k2 joins the coalition localizing k1, its expected payoff in the new coalition will be −∞. Thus, k2 prefers to stay in its current coalition, in which it achieves zero payoff. 3. When there exists only one sensor in a coalition localizing a target, since no measurement diversity is provided, we set v({k, {i}}) = 0 ∀k ∈ K, ∀i ∈ N . (3.22) Thus, no sleep time is rewarded to the sole sensor. 4. Finally, to avoid construction of futile coalitions, we set v({0,S0}) = ∑ i∈S0 v({0, {i}}) ∀S0 ⊂ N . (3.23) This prevents each singleton coalition join other singleton coalitions. Indeed, there is no motivation for the sensors which are not localizing any target to cooperate. We now proceed to develop a distributed decision-making framework through which the WSN reaches the core of the above formulated coalition formation game by each sensor following a simple best-reply rule. 40 Chapter 4 Dynamic Coalition Formation for Target Localization Having interpreted the combinatorial optimization problem in (P1) as a coalition formation game, in this chapter a distributed dynamic coalition formation algorithm is proposed that converges to the core of the defined game. In each iteration of the proposed algorithm, as briefly explained in Sec. 2.1, sensors which have the opportu- nity to change their “strategies”, greedily maximize their expected sleep time for the next iteration by joining one of the existing coalitions. In addition, sensors which are potential of obtaining larger sleep times in future rarely choose suboptimal strategies to ensure not getting stuck in non-optimal states of the Markov chain underlying the proposed algorithm. It will be proved that iterations of the above simple oper- ations executed by the sensors (as the players of the game) eventually converges to the core of the coalition formation game, which is the solution to the NP-hard opti- mization problem in (P1). The comprehensive cooperative game-theoretic coalition formation and sleep time allocation algorithm is also presented for multiple target localization by integrating the distributed dynamic coalition formation algorithm with a sequential Bayesian estimator. 41 4.1. Dynamic Coalition Formation Solution 4.1 Dynamic Coalition Formation Solution 4.1.1 Overview In Sec. 3.1, it was shown that core allocations can be considered as equilibrium points in the game in the sense that reaching a core allocation and the coalition structure corresponding to it, no player can achieve larger payoff by deviating from it. However, it was not explained how the coalition structure and allocations evolve over time to arrive at such an equilibrium point. The current section addresses these issues. Simply put, we seek answers for the following questions: How do coalitions form and change over time? How do players decide on distributing the total gain determined by the characteristic function? what coalition structure and allocations will the players eventually arrive at? Throughout this section, since the proposed approach can be employed in different scenarios, we refer to the constituents of the game as “players”. Later in this section, we return to the terminology of the problem investigated in this dissertation. In order to answer these questions, we use recent results in dynamic coalition formation in cooperative game theory [14]. Our work generalizes [14] in the sense that convergence to the core is proved when partial information is available about the set of blocked players at each period. In addition, a randomized search method (Algorithm 4.2) for the blocked players is combined with the dynamic coalition formation algorithm which establishes a tradeoff between the computational cost at each iteration and the convergence rate of the algorithm. In a TU game, reaching a certain allocation requires two a priory independent processes on the side of players: 1. Coalition Structure Generation Process: At this point, the set of players learn to partition themselves into disjoint coalitions. The objective of any player i is to join a coalition S which ensures it larger payoff than the singleton coalition. 2. Bargaining Process: For the game being settled in a stable coalition structure, 42 4.1. Dynamic Coalition Formation Solution in which all players are satisfied with their allocated payoffs, we need a way of dividing the total payoff determined by the characteristic function such that the allocation vector satisfies the core stability concept [52]. This setup is very similar to the dynamic learning model in non-cooperative games with local interaction and player mobility. In these games, players can move between several locations and the play of the game takes place only between players at the same location [53]. Thus, players strategies comprise of choosing a location and an action for the game. In the model considered here, each player’s strategy comprises a coalition and a demand for his share of the total payoff gained by the whole coalition. A player joins a coalition based on a best-reply rule: A player switches coalition only if his expected payoff in the new coalition strictly exceeds his current payoff and demands the highest achievable payoff, given the demands of other coalition members, conditional on feasibility. Using the pure best-reply process, the absorbing states of the process generated by all players following the best-reply rule do not necessarily converge to the core allocations. However, if the players are allowed to choose suboptimal strategies with a small probability when there is a potential of gaining more in future, all the absorbing states determine core allocations of the game [14]. 4.1.2 Myopic Best-reply Correspondence In the context of cooperative game theory, the basic assumption is that the players are rational. Rationality means that players always try to maximize their social welfare which is evaluated in terms of the utility value (characteristic function); Therefore, rational players select strategies that maximize their expected utility taking into account the strategies of their opponents. In the proposed approach, it is assumed that the players are bounded rational. Here, this means that the players are myopic. A player which is selected to move, considering the feasibility constraints, tries to maximize its expected payoff only for the next period. 43 4.1. Dynamic Coalition Formation Solution We now proceed to the best-reply process. Time is discrete. At each period n, each player announces his demand for the next period denoted by σn+1i . Allocations in the next period are determined as follows xn+1i = σn+1i if σ n+1 j ≤ v (S)− ∑ j∈S j 6=i xnj v({i}) otherwise ∀i ∈ S, ∀S ∈ P. (4.1) In other words, in a particular coalition S, each player receives his payoff only if the demands of all players in that coalition are feasible. Therefore, the reservation payoff v({i}) can be considered as a disagreement to how the coalition payoff is being divided amongst players. Each player i chooses his demand σni from the interval [v({i}),maxP∈C∑S∈P v(S)]. However, for the sake of mathematical tractability, we restrict the demands to multiples of some small number ∆ in the above interval. This new set of demands is denoted by Xi. Each player’s strategic variables are his choice of coalition and the share of the total payoff gained by the coalition. At period n, given that we are in a specific coalition structure Pn, strategies available to player i for the next iteration are given by: Ξn+1i (Pn) = { (Sn+1i , σn+1i ) | Sn+1i = Sn ∪ {i},Sn ∈ Pn ∪ {∅}, σn+1i ∈ Xi } . (4.2) Players get the chance to revise their strategies according to the following rule: at each time step, each player takes a random draw from a Bernoulli trial with probability ξ and outcomes: “revise strategy” and “keep strategy”. In literature, this trial is referred to as receiving the learn draw [54]. Hence, at each period, a random subset of the players get the chance to revise their strategies. If the outcome of the trial is “revise strategy”, the player decides whether to join any of the existing coalitions S ∈ Pn or to form the singleton coalition, and at the same time, announces his demand for the next period σn+1i . These decisions 44 4.1. Dynamic Coalition Formation Solution are based on the following information from the current period: coalition structure Pn, and allocation vector xn = (xn1 , · · · , xnN). However, due to incompatibility of players’ plans, we may encounter coordination problems. This occurs when a player decides to join a coalition based on the coalition structure in the last period, while other players in that coalition plan to change their strategies for the next period. In order to solve this problem, we assume that the players, having the opportunity to revise their strategies, can always leave their current coalitions and join any other coalition; and no player is forced to stay in any coalition because of any other player’s plan. As stated earlier, players are assumed to be myopic. Hence, players which have the chance to revise their strategies choose the coalitions which assure them the highest expected feasible payoff for the next period. Formally, each player determines his demand and the coalition in which it can be achieved as follows [14] xn+1i (ω) = maxS∈Pn∪{∅} v (S ∪ {i})− ∑ j∈S j 6=i xnj ; x n+1 i (ω n) ∈ Xi, (4.3a) Sn+1i (ω) ∈ argmaxS∈Pn∪{∅} v (S ∪ {i}})− ∑ j∈S j 6=i xnj . (4.3b) Here, xn+1i (ω n) and Sn+1i (ωn) denote the maximum expected payoff for the next period and the coalition in which it can be achieved for player i when the game is in state ωn = (Pn,xn), respectively. If the maximizer coalition in (4.3b) is not unique, i.e., ∣∣∣Sn+1i (ω)∣∣∣ > 1, the player randomizes between all the maximizer coalitions with equal probability 1|Sn+1i (ω)| . The players strategies for the next period are determined by a best-reply rule: A player switches coalition only if its expected payoff in the new coalition is strictly greater than its current allocation, i.e., xn+1i (ω n) > xni , and they demand the most they can get considering the feasibility constraints (see 4.3a). The maximum expected payoff rule, given in (4.3), defines a finite state Markov 45 4.1. Dynamic Coalition Formation Solution chain. This Markov chain is referred to as the best-reply process [14]. Formally, the Markov chain is defined as follows Ω = {ω = (P,x) | P ∈ C, x ∈ ×i∈N Xi} (4.4a) Pωω′ = (∏ i∈Rωω′ ξ Λi (ω ′|ω) ) (1− ξ)N−|Rωω′ | if ω′ 6= ω 1−∑ω′∈Ω ω′ 6=ω Pωω′ if ω′ = ω (4.4b) where Rωω′ denotes the set of players which have to change their strategies in order to move from state ω to ω′. Let S(i) and S ′(i) denote the coalition to which node i belongs in state ω and ω′, respectively. Then, Λi(ω′|ω) is the probability of player i choosing S ′(i) if he is a member of S(i) in state ω and can be formulated as Λi ( ω′|ω) = 1 |Si(ω)| if x ′ i = xi(ω) ∧ S ′(i) ∈ Si(ω) 0 if x′i 6= xi(ω) ∨ S ′(i) /∈ Si(ω) (4.5) where ∧ and ∨ denote the “logical and” and “logical or”, respectively. As stated before, ξ also denotes the probability that each player i gets the chance to revise his strategy. Therefore, ξΛi (ω′|ω) is the probability of player i switching to (x′i,S ′(i)) from (xi,S(i)) in state ω. As the players jump between coalitions over time, they reach a stable coalition structure in which no player has an incentive to move anymore. For this purpose, we define the concept of ergodic sets and absorbing states. Definition 4.1.1. A set F ⊂ F is called ergodic if we have Pωω′ = 0 ∀ω ∈ F , ∀ω′ /∈ F , (4.6) and no nonempty subset of F has this property. If |F| = 1, i.e., for some ω ∈ F we have Pωω = 1, the ergodic set is called absorbing state. The following theorem guarantees reaching an ergodic set in the above formu- 46 4.1. Dynamic Coalition Formation Solution lated Markov chain. Theorem 4.1.1 (Theorem 3.1.1 [55]). In any finite Markov chain, no matter where the process starts, the probability that the process reaches an ergodic state or set of states after n steps tends to one as n tends to infinity. The best-reply process considered in this work may have multiple ergodic sets and/or absorbing states. The above theorem only guarantees that an ergodic set will be eventually reached. However, we are only interested in those absorbing states which constitute the cores of the coalition formation game. The fact that which of these ergodic sets (states) will be finally reached is determined by the initial state. The following example demonstrates how the initial state affects the ergodic state being reached by the best-reply process for a small game. Example 4.1. Consider the game (N , v) with N = {1, 2, 3, 4} and v(S) = 0 if |S| = 1|S|+ 2 if |S| > 1 (4.7) The unique core allocation for this game is given by: x = (2, 2, 2, 2). This allocation can be reached in any coalition structure with the following property: P = {S1,S2} s.t. |S1| = |S2| = 2. (4.8) Suppose the game is started from the initial state ω0 = ({ {1, 2, 3}, {4} } , ( 5 3 , 5 3 , 5 3 , 0 )) . (4.9) Given the coalition structure P0, suppose that only player 4 has the opportunity to revise his strategy. Therefore, following (4.3), x14(ω 0) = 1 and S1i (ω0) = {1, 2, 3}. 47 4.1. Dynamic Coalition Formation Solution The resulting state is: ω1 = ( {1, 2, 3, 4} , ( 5 3 , 5 3 , 5 3 , 1 )) . (4.10) Although ω1 is an absorbing state, x1 is not in the core. All the players, getting the chance to revise their strategies, will stick with their current strategies. Otherwise, they should form a singleton coalition in which they achieve zero payoff. Now, suppose the game is initiated from the following state ω0 = ({ {1, 2} , {3} , {4} } , (2, 2, 0, 0) ) . (4.11) Furthermore, suppose that players 3 and 4 get the chance to revise their strategies at the same time. Therefore, according to (4.3), the resulting state will be ω1 = ({ {1, 2}, {3, 4} } , (2, 2, 4, 4) ) . (4.12) However, noting the characteristic function in (4.7), this state is infeasible and should be fixed as soon as either player 3 or 4 gets the opportunity to revise his strategy. Again, if both players 3 and 4 get the chance to revise their strategies simultaneously, the following absorbing state results: ω2 = ({ {1, 2, 3, 4} } , (2, 2, 1, 1) ) . (4.13) Hence, starting from different initial states, different absorbing states can be reached which are not necessarily included in the core of the game. The fact that absorbing states may comprise non-equilibrium states is also ob- served in the literature of evolutionary models of non-cooperative games [56]. The solution to this problem is to introduce perturbations, i.e., to allow players choose suboptimal strategies with a small probability, which can be interpreted as mis- 48 4.1. Dynamic Coalition Formation Solution takes. The limit distribution for this stochastic process can be reached by letting the probability of mistakes go to zero [32]. In this dissertation, experiments are introduced on the side of players. The difference between trembles and experiments is that experiments are done deliberately. If the game enters an absorbing state which is not the core of the game, there will be a blocked coalition which guarantees some players to achieve higher payoffs than their current allocations. However, this coalition cannot be directly formed. Thus, if the members of the blocked coalition experiment with some positive probability, they can destabilize the absorbing state. To let the players deviate with suboptimal strategies, the best-reply process is modified as follows: in any state ω = (P,x), when there exist a coalition S ′ /∈ P such that ∑ i∈S′ xi < v (S ′), each player i ∈ S ′ chooses the best-reply rule with probability 1− ² and chooses each strategy (Si, xi) ∈ Ξi(P) with probability ²|Ξi(P)| . The best-reply process modified by this convention is called best-reply process with experimentation [14]. Let B (ω) = ⋃S′ /∈P S ′ denote the set of all players in blocked coalitions. Then, the transition probabilities (4.5) of the Markov chain defined in (4.4) can be modified as follows: Λi ( ω′|ω) = 1−²i |Si(ω)| + ²i |Ξi(P)| if x ′ i = xi (ω) ∧ S ′ (i) ∈ Si (ω) ²i |Ξi(P)| if x ′ i 6= xi (ω) ∨ S ′ (i) 6= Si (ω) (4.14) where ²i = ² > 0 if i ∈ B (ω)0 otherwise (4.15) Therefore, if i /∈ B (ω), player i only joins the maximizer coalition Si(ω) and de- mands the feasible sleep time in that coalition σi (ω) with probability 1 (if |Si(ω)| > 1, it randomizes between them with equal probabilities 1|Si(ω)|). However, if i ∈ B (ω), it will join Si(ω) and demands σi (ω) with probability 1 − ² (if |Si(ω)| > 1, it ran- domizes between them with equal probabilities 1−²|Si(ω)|) if it does not experiment and with probability ²|Ξi(P)| if it experiments; Hence, the total probability adds up 49 4.1. Dynamic Coalition Formation Solution to 1−²|Si(ω)| + ²i |Ξi(P)| . The following example demonstrates how the experimentation plays a role in destabilizing the absorbing states of the best-reply process when they are not in the core for the small game considered in Example 4.1. Example 4.2. Consider the game studied in Example 4.1. Suppose that the game has reached the absorbing state: ω2 = ({1, 2, 3, 4} , (2, 2, 1, 1)) . (4.16) Starting from ω2, since S ′ = {3, 4} blocks the current allocations for players 3 and 4, they will experiment with probability ² if they have the opportunity to revise their strategies. Now, suppose that player 3 gets the chance to revise his strategy and experiments by forming a singleton coalition and x33 = 2. The resulting state will be ω4 = ({ {1, 2, 4}, {3} } , (2, 2, 2, 1) ) , (4.17) where the allocation for player 3 is infeasible and must be fixed as soon as player 3 gets the chance to revise his strategy again. However, if player 4 gets the chance to revise his strategy before player 3 does, it will join the coalition S2 = {3} and the core will be reached in the following iteration: ωc = ω5 = ({ {1, 2}, {3, 4} } , (2, 2, 2, 2) ) . (4.18) Hence, experimentation acts as a de-stabilizer for the non-equilibrium states existing in the set of ergodic sets of the best-reply process. Next section introduces the distributed dynamic coalition formation algorithm in the framework of the sensors coalition formation in a multi-target localization scenario formulated in Sec. 3.2. 50 4.2. Distributed Dynamic Coalition Formation Algorithm 4.2 Distributed Dynamic Coalition Formation Algorithm In this section, we return to the notations and terminology used in the the sensors coalition formation game for target localization as formulated in Sec. 3.2. Since our model assumes a separate coalition of sensors to localize each target (see Sec. 3.4), the best-reply process defined in (4.3) must be modified as follows: tn+1i (ω n) = max k∈K∪{0} v ({k,Snk ∪ {i}})− ∑ j∈Snk j 6=i tnj , t n+1 i ∈ Di (4.19a) Sn+1i (ω n) ∈ { argmax k∈K∪{0} v ({k,Snk ∪ {i}})− ∑ j∈Snk j 6=i tnj } . (4.19b) where Di is defined as in Sec. 3.2. Here, it is assumed that each sensor getting the opportunity to revise its strategy will receive the sleep time given by (4.21). Infeasible allocations in a particular coalition (due to incompatibility of sensors’ strategies) are fixed as soon as the next sensor in that coalition gets the chance to change strategy. The following algorithm is being executed independently by each sensor in the network. This algorithm is decentralized in the sense that each node makes a se- quence of decisions independently (without considering other nodes’ decisions at the current period) which ultimately results in the whole network converging to the core of the defined coalition formation game (see Theorem 4.3.1). As explained in Sec. 3.2, reaching the core ensures that the average sleep time allocated to the sensors is maximized under constraints on localization accuracy and following the defined fairness criteria. In what follow, based on the approach explained in Sec. 4.1, the distributed dynamic coalition formation algorithm is presented in pseudo-code format. 51 4.2. Distributed Dynamic Coalition Formation Algorithm Algorithm 4.1: (Distributed Dynamic Coalition Formation) Initialization: At n = 0 select initial coalition structure such that each non-singleton coalition comprises at least two sensors and each sensor can at least achieve its reservation sleep time v({0, {i}}) = τi. Set t0i = v({k,S(i)}) |S(i)| if |S (i)| > 1 v ({0, {i}}) otherwise ∀i ∈ N (4.20) where S(i) denotes the coalition comprising sensor i. Set ω0 = (P0, t0). In addition, let ξ, ² ∈ (0, 1) to be fixed for all sensors in the network. - The following steps are done independently by each node i ∈ N : Step 1—Revision Strategy: Take a random draw from the Bernoulli trial with probability ξ. If the outcome is “keep strategy”, set tn+1i = tni , Sn+1 (i) = Sn (i) and go to Step 5. Otherwise, go to Step 2. Step 2—Evaluation of the Best Strategy for the Next Period: Com- pute tn+1i (ω n) = max k∈K∪{0} v ({k,Snk ∪ {i}})− ∑ j∈Snk j 6=i tnj , t n+1 i (ω n) ∈ Di (4.21) Sn+1i (ω n) ∈ { argmax k∈K∪{0} v({k,Snk ∪ {i}})− ∑ j∈Snk j 6=i tnj } , (4.22) where Sn0 = {∅}. Step 3—Experimentation: 52 4.2. Distributed Dynamic Coalition Formation Algorithm If i ∈ Bn, take a random draw from the Bernoulli trial with probabil- ity ². If the outcome in is “experiment”, choose tn+1i ∈ Di with equal probability 1T−v({0,{i}}) and k ∈ K∪ {0} with equal probability 1K+1 . Go to Step 5. Else, go to Step 4. Step 4—Best-reply Process: Set tn+1i = t n+1 i (ω n) and choose Sn+1 (i) ∈ Sn+1i (ωn) with equal probability 1 |Sn+1i (ωn)| . Step 5—Recursion: Set n← n+ 1 and go to Step 1. In the above algorithm Step 2 to Step 4 correspond to the greedy strategy as in (2.5). Algorithm 4.1 is accompanied by a procedure for detecting the blocked sensors Bn. In this procedure, the set of all possible combinations of coalitions S ′k ∈ 2N \∅ and targets k ∈ K are checked to find those for which the following inequality holds: ∑ i∈S′ k ti < v ({ k,S ′k }) . (4.23) This corresponds to coalitions S ′k for which the constraint (C2) (see Sec. 2.1) is not satisfied. Throughout, this method is referred to as exhaustive search for blocked sensors and, as explained above, requires to check K(2N− 1) different combinations of targets and sensors (K choices for the target and 2N − 1 choices for the set of all nonempty coalitions 2N \∅). As the number of sensors increases, this number goes up exponentially fast. Furthermore, in order to prevent examining a particular coalition S ′k repeatedly, one needs to keep track of the coalitions for which the above inequality has already been checked. This imposes an immense memory, as well as computational, overhead on the network. As a variation to the above procedure, we propose to construct a sample set from the set of all possible coalitions and examine (4.23) only for this set. This sample set is constructed by taking ζ samples from the set 2N \∅ and combining with the set of targets K. This method is referred to as randomized search for blocked sensors 53 4.2. Distributed Dynamic Coalition Formation Algorithm and the resulting set is denoted by Bnrs. Although Bnrs ⊆ Bn, it will proved that by replacing Bn with Bnrs, Algorithm 4.1 still converges to the core of the defined coalition formation game with probability one. In what follows, the randomized search method is presented in pseudo-code format. Algorithm 4.2: (Randomized Search for Blocked Sensors) for i = 1 to ζ do for k = 1 to K do Choose a coalition Sik ∈ 2N \∅ with equal probability 12N−1 . if { k,Sik } /∈ Pn then if ∑ i∈Si k ti < v ({ k,Sik }) then Bnrs → Bnrs ∪ Sik endif endif endfor endfor Remark 4.2.1. It is clearly not necessary to store the sequences ti(ωn) and Si (ωn), as well as tni and Sn (i), for all n. This also holds for the set of blocked sensors Bn (Bnrs). These sequences can be overwritten at each iteration. Using Algorithm 4.2 to detect blocked sensors, the memory requirement at each sensor is reduced to O(N) from O(2N) in the exhaustive search method. This is due to the fact that we do not keep track of the coalitions S ′k for which (4.23) has already been checked. The computational costs at each sensor can also be improved to O(Kζ) from O(K2N). However, this improvement results in slower convergence to the core. Hence, depending on the network architecture and specifications of the sensors deployed in the network, one can compromise between the memory and computational cost and the convergence rate of Algorithm 4.1 changing the size of 54 4.3. Convergence of the Distributed Dynamic Coalition Formation Algorithm the sample set denoted by ζ. In addition, Algorithm 4.1 should be accompanied by a mechanism to update the state of the network as it requires ωn at each period n+1 to compute tn+1i (ω n) and Sn+1i (ωn). This mechanism, as well as the search method for blocked nodes, seem to require a centralized device to accomplish these tasks. However, adopting a hierarchical network architecture, these tasks can be carried out in a distributed manner as will explained later in Sec. 4.4. It is worth mentioning that Algorithm 4.1 exploits the Markov chain Monte Carlo method in the sense that a Markov chain is constructed such that the limiting distribution only assigns probability 1 to the core state. Having constructed such a Markov chain, we form a realization of the chain { ω(0), ω(1), ω(2), · · · } and once the core is reached, the network remains in ωc in the consecutive periods. 4.3 Convergence of the Distributed Dynamic Coalition Formation Algorithm We now proceed to prove that the proposed distributed dynamic coalition formation algorithm accompanied by the randomized search method for blocked sensors guar- antees the maximum average sleep time for the sensors conditional on feasibility and fair sleep time allocations as defined in Sec. 2.1. This will be proved by showing that the best-reply process with experimentation in Algorithm 4.1 converges to the core of the defined coalition formation game. Theorem 4.3.1. Suppose the randomized search method (Algorithm 4.2) is em- ployed to detect blocked sensors. If every sensor follows Algorithm 4.1 and if the core of the game is nonempty, the best-reply process with experimentation converges to the core of the game with probability one. Equivalently, in the Markov chain, lim n→∞P ( ωn = ωc|ω0 ) = 1, ∀ω0 ∈ Ω (4.24) 55 4.3. Convergence of the Distributed Dynamic Coalition Formation Algorithm where ωc = (Pc, tc) denotes the core of the game. Proof. Suppose γ blocked coalitions exist in state ωn. If ζ samples are taken (we assume ζ À γ) from 2N \∅, the probability of detecting % blocked coalitions in state ωn is given by (γ%)( K2N−γ Kζ−% ) (K2NKζ ) . Particularly, the probability of detecting all blocked coalitions is (K2 N−γ Kζ−γ ) (K2NKζ ) . Therefore, there is a positive probability to detect blocked coalitions with only checking the sample set. Note that any two blocked coalitions may comprise overlapping blocked sensors. Hence, the probability of detecting all blocked sensors follows pd ≥ (K2N−γ Kζ−γ ) (K2N Kζ ) . (4.25) Detecting at least one blocked sensor, as stated in Sec. 4.1, will guarantee destabi- lizing non-equilibrium absorbing states with probability ². In [14, Theorem 2] it is proved that the vector of sleep times, allocated in an absorbing state of the best-reply process with experimentation, coincides with the set of core allocations of the game. Therefore, it is implied that if ω = (P, t) is an absorbing state, then t will be a core allocation for the game and P will be the coalition structure in which it can be achieved. Finally, we prove that the best-reply process with experimentation in Algorithm 4.1 converges to an absorbing state with probability one as time tends toward infinity when Algorithm 4.2 is deployed for detecting blocked sensors. This is proved by showing that the process will not get stuck in ergodic sets other than the absorbing states. Suppose that there exists a non-singleton ergodic set Ψ ⊂ Ω such that |Ψ| ≥ 2. [14, Theorem 2] guarantees that none of the states in Ψ involve a core allocation (absorbing states are singleton ergodic sets). As a result, for each ω ∈ Ψ there exists {k,S ′k} /∈ P such that ∑ i∈S′ k ti < v({k,S ′k}). Therefore, some sensors have the incentive to experiment. There is a positive probability that all the sensors in the blocked coalitions (m sensors) are detected. In addition, these sensors can 56 4.3. Convergence of the Distributed Dynamic Coalition Formation Algorithm experiment and form singleton coalitions with some positive probability. Hence, Pn+1 which comprises of: (i) singleton coalitions, and (ii) non-singleton coalitions which have no blocked sensors, can be reached in one step. Since ωn+1 can be reached from ωn with some positive probability, we have ωn+1 ∈ Ψ. Now, using the fact that the core is nonempty and starting with ωn+1, an absorbing state ωc = (Pc, tc) can be reached in one step. All sensors in non-singleton coalitions in ωn+1 do not experiment and all sensors in singleton coalitions experiment. This occurs with probability (ξ²)m. Now, for every Sc ∈ Pc, we fix one sensor denoted by i(Sc). There is a positive probability 1 Ξi(Pn+1) that all the other sensors in Sc experimenting in ωn+1, join Sc and demand tci . The resulting state is ωn+2 = ωc. Therefore, starting from ωn+1 there is a positive probability to reach an absorbing state. This contradicts the assumption that ωt+1 is an element of an ergodic set and completes the proof. Noting that the exhaustive search method is a special case of Algorithm 4.2, in which all the blocked sensor nodes are detected, the convergence of the best-reply process with experimentation can be easily inferred from Theorem 4.3.1. Mean Time to Absorption In order to study the tradeoff mentioned in Remark 4.2.1, we propose to use the mean time before absorption as the convergence rate for Algorithm 4.1. Suppose the core of the game is nonempty. Then, there exists at least one recurrent state in addition to the transient states in the Markov chain defined by the best-reply process with experimentation. By definition, a homogeneous Markov chain with at least one transient and one recurrent state is called absorbing. The state space Ω for an absorbing chain can be decomposed as Ω = T+ ∑ j Rj , (4.26) 57 4.3. Convergence of the Distributed Dynamic Coalition Formation Algorithm where Rj ’s denote the disjoint recurrent classes (states comprising core in this work) and T denotes the set of all transient states. The transition probability matrix can also be block-partitioned as P = R1 · · · 0 0 ... . . . ... ... 0 · · · Rn 0 TR1 · · · TRn Q . (4.27) Here, Ri’s denote the sub-matrices with transition probabilities within each recur- rent class and Q denotes the sub-matrix with transition probabilities within the transient states; Furthermore, TRi’s contain the probabilities of going from each transient state to each state in the recurrent classes. In this work, there exists only one state comprising the core in each recurrent class, i.e., Ri = 1 for i = 1, 2, . . . , n. Therefore, (4.27) reduces to P = In 0 TR Q , (4.28) where In denotes the identity matrix with n equal to the cardinality of the set of cores in the game. We seek to compute the mean time before absorption by a given recurrent class starting from a given transient state. For this purpose, the notion of fundamental matrix S for absorbing chains is employed which is defined by S = ∞∑ n=0 Qn. (4.29) Staring from (4.29) and multiplying both sides by (In−Q), it is straight forward to show that S(In −Q) = In. (4.30) 58 4.4. Network Architecture and Implementation Issues Furthermore, since |T| is finite, S = (In −Q)−1. (4.31) Now, the following theorem provides the mean time before absorption by any of the recurrent classes Rj (core of the coalition formation game) in terms of the above defined fundamental matrix. Theorem 4.3.2. [57, Theorem 6.2] Consider a homogeneous Markov chain with transition probability matrix P as in (4.28). The expected absorption time from each transient state i is given by the i-th element of E(TR) = S1. (4.32) where 1 represents a column vector of ones and TR denotes the first-time visit to one of the recurrent classes Rj after time 0. This measure will be used in Sec. 5.1 to compare the convergence rate of Al- gorithm 4.1 when accompanied with the two different search methods for blocked sensors (see Sec. 4.2). 4.4 Network Architecture and Implementation Issues In order to employ Algorithm 4.1 to perform coalition formation amongst sensors to localize multiple targets, it is important to consider the constraints imposed by the sensor technology. Consider a hierarchical WSN composed of: (i) moderately populated sensor nodes with limited processing power, memory and energy, and (ii) a backbone of sparsely spread sensor nodes which assume the role of coalition heads (CH) and have more computational power and memory and provide larger commu- nication ranges. Assume the CHs are able to communicate with each other. Each non-singleton coalition is assigned to a CH which knows the network configuration 59 4.4. Network Architecture and Implementation Issues (i.e. locations of other nodes in the network) through an initial setup process. Each node, existing in a non-singleton coalition, sets up a bidirectional communication link with the CH. It is also assumed that the sensors are equipped with passive direction-of-arrival (DOA) detectors and use the Zigbee/IEEE 802.15.4 protocol to transmit data. The main computational overhead in Algorithm 4.1 is to detect the blocked sensors. This task is being accomplished by the CHs collaboratively such that each CH searches for the blocked coalitions associated with a specific target. In other words, the CH to which the DOA estimations of a specific target are sent by the sensor nodes which are localizing it searches for the blocked coalitions comprising that target. Then, the blocked sensors’ indices are broadcasted among all CHs. Therefore, the CHs will also be responsible to inform the sensors whenever they are blocked by any other coalition. In addition, since the CHs represent the role of the base station for each coalition, they will be responsible for updating the state of the network after each iteration of Algorithm 4.1 for all sensors in the network. Hence, computing (4.19) can also be turned over to the CHs. Otherwise, the sensor nodes have to incur the communication overhead for receiving the following information from CHs: Qn, Pn and ∑j∈Snk j 6=i tnj for all k ∈ K. In the latter case, the sensors also need to experience an initial setup process to receive the information about the location of all the other sensors in the network. Each sensor node, joining a new coalition, sends a message to inform the new CH. The former CH will also be informed about this move through communication with the new CH at the end of each period. However, if a sensor is leaving a coalition to form a singleton coalition, the former CH should be informed. The overhead for leaving a coalition and joining a new coalition is called the switching cost. This cost only comprises the communication overhead for informing either the new or the old CH. Hence, the switching cost for the sensor nodes is inexpensive and they can jump between coalitions without expending much power. 60 4.5. Multiple Target Localization Algorithm in Large WSNs 4.5 Multiple Target Localization Algorithm in Large WSNs Pre-processing for Large WSNs In large WSNs (comprising large number of sensor nodes), to prevent ineffective sensors taking part in the coalition formation, a pre-processing algorithm is proposed which both reduces the memory and compu- tational costs to a great extent and ensures that the best-reply process with the new set of sensors reaches an absorbing state. The following theorem states the condi- tion under which the existence of at least one absorbing state is guaranteed in the Markov chain underlying the best-reply process in the sensors coalition formation game. Theorem 4.5.1. In a sensor network comprising N sensor nodes attempting to localize K targets, there exists at least one absorbing state in the Markov chain defined by the best-reply process if ∃ k ∈ K s.t. ∑ i∈N v ({0, {i}}) ≤ v ({k,N}) . (4.33) Proof. The proof is very similar to the one presented in [14, Theorem 1]. Suppose that for target k, which satisfies (4.33), the grand coalition {k,N} is formed and each sensor achieves a sleep time tsi ≥ v ({0, {i}}) such that ∑ i∈N tsi = v({k,N}) ≥∑ i∈N v ({0, {i}}). Now, it is easy to show that there is no incentive for any sensor to leave the grand coalition. Each sensor i ∈ N has two choices: (i) join other non-singleton coalitions, (ii) form the singleton coalition. Since tsi > 0 for all i ∈ N , they have no incentive to form non-singleton coalitions where v({k, {i}}) = 0. Furthermore, since tsi ≥ v({0, {i}}), they have no incentive to form singleton coalitions. Hence, ωs = ({k,N} , ts) constitutes an absorbing state for the best- reply process. Four parameters, as explained in Sec. 3.3, affect the total sleep time allocated 61 4.5. Multiple Target Localization Algorithm in Large WSNs to each coalition: (i) number of sensor nodes, (ii) relative distances of the sensors to the target, (iii) relative bearings of the sensors to the target, and (iv) prior density of the target. Considering these four parameters, the set of nodes participating in the algorithm is contracted as follows: The algorithm is initialized with the two nearest sensors to µ̃. Starting the algo- rithm, in each iteration, consider the set of sensors located inside a circle, denoted by Srm , with radius rm and centered at µ̃. If v ({k,Srm}) ≥ v ({ k,Srm−1 }) in- crease rm by ∆r steps; Otherwise, using the structural results presented in Sec. 5.1, eliminate the sensor nodes with the following properties: i ∈ E1 ∪ E2, (4.34) E1 = { ı ∈ N ∣∣∣ (θı − θ ' `pi, ` = 0,±1,±2) ∧ (rı ≥ r) for ı, ∈ Srt , ı 6= } , E2 = { ı ∈ N ∣∣∣ (Cp̃11 À Cp̃22 ∧ θı = ±pi2) ∨ (Cp̃22 À Cp̃11 ∧ θı = 0 or ± pi)} . Continue increasing rm until v ({k,SrM}) < v ({ k,SrM−1 }) even by eliminating the above nodes and define R = rM −∆r. Then, Ôk is defined to replace Õk in (3.12) such that v ({k,SR}) = ∑ i∈SR v ({0, {i}}) . (4.35) Ôk is updated in each iteration of the Bayesian estimator until Ôk > Õk. Here, Ôk is defined since computing the characteristic function using the orig- inal Õk can result in small characteristic function values for which (4.33) cannot be satisfied for any rm. In cases where the prior points to a large uncertainty area (i.e. Cp̃11,Cp̃22 À 1) or the target is required to be localized with very high ac- curacy (i.e. Õk À 1), (3.12)-(3.14) may even produce negative values. Hence, the above pre-processing algorithm is devised to approach the required accuracy Õk in consecutive steps of the Bayesian estimator, at the same time, guarantee the exis- tence of an absorbing in each step. Note that more accuracy in localizing the targets 62 4.5. Multiple Target Localization Algorithm in Large WSNs (i.e. larger Õk) translates to more iterations of the Bayesian estimator to reach Õk. Hence, sensors are required to take more measurements which again establishes the aforementioned trade-off between Õk and the average sleep time allocated to the sensor nodes. The pre-processing algorithm is executed for all targets k ∈ K and the result- ing smaller network comprises the sensors: N ′ = ⋃Kk=1 SRk . If any sensor in the set N\N ′ joins N ′, the assumption in Theorem 4.5.1 is not satisfied anymore; hence, the existence of the absorbing state is not guaranteed. Remark 4.5.1. As explained in Remark 4.2.1, the memory and computational over- head is closely connected to the number of sensor nodes being involved in the dis- tributed dynamic coalition formation algorithm. Particularly, by reducing the num- ber of sensors using the pre-processing algorithm, the computational overhead for finding blocked sensors using exhaustive search method is vastly improved. We now proceed to present the main target localization algorithm by integrat- ing Algorithm 4.1 with a sequential Bayesian estimator. In general, any Bayesian estimator can be utilized. Here, the sequential Markov Chain Monte Carlo (particle filter) is selected due to its superior performance in bearings-only tracking [15]. Algorithm 4.3: (Multiple Target Localization in Large WSNs) Initialization: Set t = 1. Generate particles { p t,l k = [ xt,lk , y t,l l ] , wt,lk }L l=1 for all k ∈ K based on the prior density of the target p (p̃k) where wt,lk denotes the weight of particle l for target k at time t. Step 1— Compute µ̃tk = E { ptk } , (4.36) and 63 4.5. Multiple Target Localization Algorithm in Large WSNs Qtk = var (xtk) 0 0 var ( ytk ) −1 , (4.37) for all k ∈ K. Step 2— Run the Pre-processing Algorithm for each target based on µtk and Qtk. Compute Ô t k according to (4.35) for all k ∈ K. If Ôtk > Õk, set Ôtk = Õk and FLAG = 1. Then, set N ′ = K⋃ k=1 SRk . (4.38) Step 3— Run the Dynamic Coalition Formation Algorithm (Algorithm 4.2 by CHs, as explained in Sec. 4.4, and Algorithm 4.1 by each sensor node) with the initial state ωt,0 and using µtk, Q t k, Ô t k and N ′ to reach the core ωc = (Pc, tc). Step 4— Each sensor i existing in a non-singleton coalition Sck ∈ Pc takes a number of measurements equal to T − tci from target k, transmits the measurements to the corresponding CH and then enters the sleep mode: Zt = {Zt1, · · · ,ZtK}. Step 5— Run the particle filter { p t+1,l k , w t+1,l k }L l=1 = Particle Filter ({ p t,l k , w t,l k }L l=1 ,Ztk ) , ∀k ∈ K. (4.39) Step 6— If FLAG 6= 1: Set ωt+1,0 = ωt,c and t← t+ 1. Go to Step 1. Else, finish. Next chapter, provides numerical examples to demonstrate the behavior and 64 4.5. Multiple Target Localization Algorithm in Large WSNs performance of the approach proposed to save power in data acquisition amongst sensor nodes assigned the localization task in different scenarios. 65 Chapter 5 Numerical Examples In this chapter, examples are provided to illustrate the behavior and performance of Algorithm 4.3. Throughout this section, a standard deviation of 10 degrees is assumed as the measurement error variance for all sensors, i.e. σi = 10◦. It is also assumed that α = 1, T = 10 and Õ1 = Õ2 = 103. Finally, we assume that τi = 5 for all i ∈ N . Hence, sensors receive sleep times in the interval [5, 10] and it is guaranteed that ∑ i∈N t c i N ≥ ∑ i∈N τi N = 5. 5.1 Example: Target Localization Structural Results: In this part, the behavior of the distributed dynamic coali- tion formation algorithm is illustrated in a small network comprising 8 sensors. The small size of the network helps to gain insight on how the prior density of the target and the relative configuration of the network play a role in the optimal coalition structure Pc and sleep times tc allocated to the sensors in the core. Example 1 Consider the network configuration depicted in Fig. 5.1(a). Assume p (p̃) is Gaus- sian with covariance matrix Cp̃ = 100 0 0 100 . Equal variances in the x and y direction are considered to ignore the effects of the prior density of the target. At this point, we only aim at studying the role of the relative configuration of the sensors and target on the coalition structure and allocations in the core. Since the sensor pairs {1, 5}, {2, 6}, {3, 7} and {4, 8} are located on the same line of sight from 66 5.1. Example: Target Localization the target, they provide the same bearing information about the target. However, the information that the sensors in coalition {1, 2, 3, 8} provide about the position of the target (i.e. p̃ = [x, y]T ) is more accurate due to being closer to the target. In addition, the bearings to nodes 1 and 3, as well as nodes 2 and 8, are perpendicular which provide the highest diversity in measurements. Hence, it is expected that the coalition {1, 2, 3, 8} be allocated the largest total sleep time. If any of the sensors in the set {4, 5, 6, 7} joins this coalition, stochastic observability is no further improved and the characteristic function allocates less total sleep time as explained in Sec. 3.3. This is verified by the simulation results in which v({1, {1, 2, 3, 8}}) = 26 and v({1, {1, 2, 3, 4, 8}}) = v({1, {1, 2, 3, 5, 8}}) = 19. In Fig. 5.1(a), mean of the target prior distribution and sensors are depicted by the + and ¤ signs, respectively. Filled squares represent the optimal coalition of sensors localizing the target and ti’s give the sleep times, in terms of multiples of ∆, allocated to the sensors in the core. The solid and dashed ellipses also represent the prior and posterior densities of the target, respectively. As can bee seen, the optimal coalition localizing the target and the sleep time allocations in the core are Sc1 = {1, {1, 2, 3, 8}} and tc = (7, 6, 7, 10, 10, 10, 10, 6), respectively. Table 5.1 gives the expected time before absorption to the core (see Sec. 4.3) for two different values of ² for both exhaustive and randomized search methods. As it can be seen, decreasing the probability that the set of blocked sensors get the chance to experiment, the expected time before absorption increases. The trade off mentioned between the size of the sample set and the expected time before absorption can also be observed in Table 5.1. Example 2 In this example, effect of the prior density of the target p(p̃) is investigated on the optimal coalition structure reached by every sensor following Algorithm 4.1. The prior density is assumed to be a zero-mean Gaussian distribution with its covariance 67 5.1. Example: Target Localization −60 −40 −20 0 20 40 −60 −40 −20 0 20 40 4 t4=10 5 t5=10 6 t6=10 7 t7=10 x coordinate y co or di na te 1 t1=7 2 t2=63 t3=7 8 t8=6 (a) Cp̃1 −60 −40 −20 0 20 40 −60 −40 −20 0 20 40 3 t3=10 4 t4=10 5 t5=10 6 t6=10 7 t7=10 x coordinate y co or di na te 1 t1=5 2 t2=6 8 t8=6 (b) Cp̃2 Figure 5.1: Effects of (a) relative configuration of the target and sensors, and (b) prior density of the target on the optimal coalition structure in the core for local- ization of a single target. 68 5.1. Example: Target Localization Table 5.1: Expected Time Before Absorption: Exhaustive Search Method vs. Ran- domized Search Method. S01 = {1, {1, 3}} and t0i = 5, ∀i ∈ N ξ ² ζ E(TR) Exhaustive Search 0.3 0.6 - 31.9 0.3 0.3 - 49.8 Randomized Search 0.3 0.6 7 46.3 0.3 0.3 7 63.4 given by one of the two following matrices: Cp̃1 = 100 0 0 100 , Cp̃2 = 25 0 0 100 . (5.1) Since Cp̃2 places larger uncertainty on the y coordinate of the target position, it is expected that the optimal coalition structure is formed such that it provides more information about that coordinate. In addition, Fig. 5.1(b) reveals that nodes {1, 5} and {3, 7} provide information only on the y and x coordinates of the target’s position, respectively. However, sensors in the set N\{1, 3, 5, 7} provide information on both coordinates. Since the uncertainty in x direction is small compared to y direction, node 3 may provide redundant information as nodes {2, 8} reduce the uncertainty in x direction. Fig. 5.1(b) justifies the above discussion by showing the optimal coalition localizing the target and allocations in the core: Sc1 = {1, {1, 2, 8}} and tc = (5, 6, 10, 10, 10, 10, 10, 6). Now, consideringCp̃2 since the uncertainty along the x axis is increased, we anticipate that node 3 also takes part in the localization which is verified in Fig. 5.1(a). 69 5.1. Example: Target Localization Target Localization In this part, the behavior of the multiple target localization algorithm (Algorithm 4.3) is investigated for the network configuration depicted in Fig. 5.1. It is assumed that the covariance of the target distribution is given byCp̃1 at t = 0. Here, running the pre-processing algorithm results in N ′ = N . Fig. 5.1(a) shows the optimal coalition structure and sleep times allocated to the sensors when Algorithm 4.1 reaches the core at t = 1. Subsequently, each sensor takes a number of measurements equal to T − ti which results in the posterior distribution depicted by the dash-dot ellipse. This updated distribution will be used as the prior for the next decision epoch. Fig. 5.2(a) and Fig. 5.2(b) demonstrate the core state, as well as the prior and posterior distribution of the target, at t = 2 and t = 3, respectively. As can be seen, although the optimal coalition structure Pc remains the same for t = 1 to t = 3, the optimal sleep times tc allocated to the sensors in the core change. Furthermore, an example is provided to study the behavior of the Algorithm 4.1 for multiple target localization in the network depicted in Fig. 5.3. Here, it is assumed that Cp̃1 = Cp̃2 = [ 100 0 0 100 ]. Running the pre-processing algorithm results in:SR1 = {7, 12, 13, 16, 17, 18, 22, 23} and SR2 = {3, 8, 9, 12, 13, 14, 18, 19}. These sensors are shown with the same color as the corresponding target in Fig. 5.3. Here, SR1 ∩ SR2 = {12, 13, 18}. Hence, there exists a competition between the two coalitions localizing the two targets and these sensors will join the coalition in which they can achieve larger sleep times. Fig. 5.3 demonstrates the optimal coalition structure and sleep times allocated to the sensors in the core for t = 1. Finally, the performance of Algorithm 4.3 is compared with a scenario in which a fixed set of sensor nodes Sr are assigned to localize the target in a Bayesian framework. These sensors are assumed to be the closest sensors to the target, hence providing more accurate measurements comparing to other sensors. In multi-target tracking scenarios, if a sensor i is in the closest sensors set for more than one target (i.e. ∃ k1, k2 ∈ K such that i ∈ Srk1 and i ∈ Srk2), then the sensor node chooses the 70 5.1. Example: Target Localization −60 −40 −20 0 20 40 −60 −40 −20 0 20 40 4 t4=10 5 t5=10 6 t6=10 7 t7=10 x coordinate y co or di na te 1 t1=6 2 t2=73 t3=8 8 t8=7 (a) t = 2 −60 −40 −20 0 20 40 −60 −40 −20 0 20 40 4 t4=10 5 t5=10 6 t6=10 7 t7=10 x coordinate y co or di na te 1 t1=8 2 t2=63 t3=6 8 t8=6 (b) t = 3 Figure 5.2: Localization of a single target: optimal coalition structure and the sleep times allocated to the nodes in the core in term of multiple of ∆ at: (a) t = 2, and (b) t = 3. 71 5.1. Example: Target Localization 0 20 40 60 80 100 120 140 0 20 40 60 80 100 120 140 x coordinate y co or di na te T1 T2 t13=8 t14=7 t8=6 t9=10 t3=10 t19=10 t22=6 t16=6 t17=7 t18=7 t12=7 t23=10 t7=10 Figure 5.3: Multiple target localization: optimal coalition structure and the sleep times allocated to the sensors in the core at t = 1. target to localize randomly. We refer to this method as range-based measurement allocation. These nodes are assumed to be awake for the whole period and take T measurements, i.e. ti = 0 for all i ∈ Sr, after which the prior of the target is updated. Fig. 5.4 shows the average sleep times allocated to the sensors in each method as a function of the number of sensors in the network. Here, 100 random network configurations are generated in two-dimensional space with N sensors and K targets spread uniformly in each network. The sensors are spread around the targets in a 200m × 200m square and the prior densities of the targets are assumed to be Gaussian with covariance matrix Cp̃ = [ 100 00 100 ]. The core is replaced with the absorbing state of the best-reply process without experimentation when the core turns out to be empty. As the number of sensors in the network increases, uniform distribution of the sensors provides more diversity in the relative configuration of the target and sensor nodes; Hence, more diverse bearing measurements are collected and the average sleep times allocated to the sensors increases in both methods. 72 5.1. Example: Target Localization 10 15 20 25 30 0 10 20 30 40 50 60 70 80 90 100 Number of Nodes (N) Av er ag e Sl ee p− tim e pe r N od e (% ) Range−based Measurment Allocation |S| = 3, K = 1 Range−based Measurment Allocation |S| = 6, K = 1 Range−based Measurment Allocation |S| = 3, K = 2 Dynamic Coalition Formation, K = 1 Dynamic Coalition Formation, K = 2 Figure 5.4: Average sleep time allocated to the sensors during a localization task versus number of sensors in the network: distributed dynamic coalition formation vs. range-based measurement allocation. However, as can be seen in Fig. 5.4, the distributed dynamic coalition formation approach demonstrates a significant average sleep time increase compared with the range-based method. Particularly, the average sleep time allocated to the nodes is guaranteed to be larger that ∑ i∈N τi N = 5. The aforementioned trade-off between the required localization accuracy Õk and the average sleep time allocated to the sensors is also demonstrated and compared with the heuristic range-based measurement allocation in Fig. 5.5. Here, we again considered 100 random network configurations with 10 sensors and one target spread uniformly in a 200m × 200m square network. The prior density of the target is also assumed to be Gaussian with covariance matrix as above. Fig. 5.5 illustrates that as Õk goes up, the average sleep time allocated to the sensors decreases in both approaches; However, the average allocated sleep time drops more rapidly in the range-based method. Hence, the distributed dynamic coalition formation approach 73 5.1. Example: Target Localization 102 103 104 0 10 20 30 40 50 60 70 80 90 100 localization accuracy Õk A ve ra g e S le ep -t im e p er N o d e (% ) Distributed Dynamic Coalition Formation Range−based Measurment Allocation |S| = 3 Figure 5.5: Average sleep time allocated to the nodes during a localization task versus required localization accuracy Õk: distributed dynamic coalition formation vs. range-based measurement allocation. provides a better trade-off between Õk and the average sleep time allocated to the sensors. 74 5.2. Example: Tracking Slow Moving Targets 5.2 Example: Tracking Slow Moving Targets In this example, we use Algorithm 4.3 for tracking a slow moving target as follows: Algorithm 5.1: (Coalition Formation in Target Tracking) Require: p(p̃0k) (initial prior density of the targets with mean µ̃ 0 k and covariance C0p̃k) and Õk for all k ∈ K. Set t = 1: Step 1— Run Algorithm 4.3 with p(p̃tk) and Õk. Step 2— Approximate the posterior as a Gaussian distribution p ( p̂tk ) with µ̃tk = E { p f k } and Qtk = var ( xfk ) 0 0 var ( yfk ) −1 for all k ∈ K where { p f,l k = [ xf,lk , y f,l l ] , wf,lk }L l=1 denotes the final set of particles and their weights when Algorithm 4.3 terminates. Step 3— Predict the distribution of the target position for the next period: p ( p̃t+1k ) = Traget Model ( p̂tk,∆t ) . Step 4— Set t← t+ 1. Go to Step 1. In Algorithm 5.1, the basic assumption is that the targets move slowly enough such that the core of the game does not change until Algorithm 4.3 is converged. Considering the fact that several iteration of Algorithm 4.3 are required to reach the required accuracy for localization of the target, it is assumed that the time 75 5.2. Example: Tracking Slow Moving Targets 0 5 10 15 20 25 30 35 40 −25 −20 −15 −10 −5 0 5 10 15 20 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 x coordinate y co or di na te Potential Sensors Figure 5.6: Network configuration for tracking a slow moving target: Dashed circle depicts the set of sensors determined by the pre-processing algorithm. interval ∆t between the consecutive iterations of Algorithm 5.1 is at least one order of magnitude larger than T∆. Sensors enter the sleep mode after transmitting the final measurements (see Step 4 in Algorithm 3.3) and wake up to record and transmit the first set of measurements in the next iteration of Algorithm 5.1. Consider a network of sensors as shown in Fig. 5.6. The target motion model is assumed to follow a random walk to the left on the x axis as follows: the target decides whether to go to the left for 0.2m or stay at its current position every 50∆s. The time interval between the consecutive iterations of Algorithm 5.1 is also assumed to be 1000∆ s. Results are shown in Fig. 5.7 for two iterations of Algorithm 5.1 starting with Cp̃0 = [ 25 00 25 ]. Sensors which form the optimal coalition for tracking the target are depicted with filled squares for the first iteration of Algorithm 4.3. Fig. 5.7 illustrates two iterations of Algorithm 5.1. The solid ellipse and ∗ depict the updated posterior distribution of the target and its mean, respectively, after reaching the required accuracy. 76 5.2. Example: Tracking Slow Moving Targets 20 22 24 26 28 30 32 34 36 −20 −15 −10 −5 0 5 10 15 2 3 6 7 10 11 13 14 x coordinate y co or di na te t3= 7 t11= 6 t14= 6 (a) t = 0 18 20 22 24 26 28 30 32 −10 −5 0 5 10 15 2 3 6 10 x coordinate y co or di na te t2= 9 t3= 9 t6= 7 t10= 8 (b) t = 1000∆ Figure 5.7: Tracking a slow moving target: optimal coalition structure and the sleep times allocated to the nodes in the core in terms of multiples of ∆ for the first iteration of Algorithm 4.3 at (a) t = 0, and (b) t = 1000∆. 77 Chapter 6 Conclusions and Future Work This chapter summarizes the work accomplished in this dissertation, followed by a discussion on the possible future work for further investigation. 6.1 Summary of Work Accomplished This dissertation considered the power conservation problem for data acquisition in WSNs deployed to localize multiple targets. The problem was formulated as combi- natorial optimization problem where the goal is to maximize the average sleep time allocated to the sensors over the set of all possible coalition structures such that a pre-defined localization accuracy is provided for all targets. This NP-hard problem was interpreted as a non-superadditive coalition formation game in a bearings-only localization scenario. We proposed a distributed dynamic coalition formation algo- rithm in which each sensor greedily maximizes its expected sleep time for the next iteration by joining one of the existing coalitions. The notion of the core was rede- fined and used as the solution concept for this game. It was shown that the sleep time allocations and coalition structure in the core is the solution for the aforemen- tioned combinatorial optimization problem. Furthermore, we proposed a distributed dynamic coalition formation algorithm in which each node greedily maximizes its expected sleep times for the next iteration. Experimentation was introduced on the side of sensors (as players of the game) such that each sensor chooses suboptimal strategies whenever it knows the chance of achieving larger sleep times in future. It was shown that if every sensor in the network follows this algorithm, it converges 78 6.2. Directions for Future Work to the core of the coalition formation game, and hence the optimal coalition struc- ture and allocations, with probability one. The main advantage of this algorithm is that the solution to an NP-hard problem is reached distributively by each sensor following a simple best-reply rule. This algorithm was integrated with a sequential Bayesian estimator to localize targets, for which the superior performance over the heuristic range-based measurement allocation method was demonstrated through Monte Carlo simulations. The proposed algorithm can also be employed in range- only tracking scenarios by deriving the appropriate characteristic function. 6.2 Directions for Future Work 1. Numerous applications in wireless networks: The distributed dynamic coalition formation algorithm developed in Sec. 4.2 provides a generic frame- work to solve the problem of coalition formation in multi-agent domain. This approach can be exploited to optimize various performance metrics in wire- less networks, where the network entities collaborate to achieve some common goal, by reaching the core of the underlying game. An example is provided in Appendix B in which the problem of resource allocation with load balancing is investigated in cognitive radio network. In this example, cognitive radios (as players of the game) form collaborative groups to exploit the spectrum available to cognitive base stations most efficiently. Hence, defining the ap- propriate characteristic function for the cooperative scenario under study, the proposed distributed dynamic coalition formation algorithm can be utilized to reach the optimal coalition structure and allocations. 2. Distributed overlapping coalition formation: A possible extension to the present work is to investigate a scenario in which sensor nodes are involved in executing more than one task, e.g. detecting intrusions to the network, record- ing measurements from more than one target at each period, forwarding the 79 6.2. Directions for Future Work collected measurements to the sink in a multi-hop scenario, etc. Therefore, sensors have to distribute their resources between several (not necessarily dis- joint) coalitions. To tackle such scenarios, the cooperative game model devel- oped in Sec. 3.1 should be modified allowing overlapping coalitions. Following this modification, the notion of the core, as the equilibrium state in the game, should be redefined such that once the system reached that state no player can do better off by deviating from it. A distributed decision-making framework can also be developed similar to the one proposed in this dissertation. For this purpose, one can benefit form [58] which introduces a model for coopera- tive games with overlapping coalitions and provides some good insight on the notion of stability in the defined setting. 3. Leftover battery and communication cost: Another possible extension to the present work is to take both the leftover battery energy for each sensor node and the communication cost between the sensors and the CHs into account. For this purpose, one needs to modify the characteristic function defined for the sensors coalition formation game to incorporate both these effects. We expect this modification to increase the expected time elapsed before the first sensor runs out of battery. In addition, it is expected to achieve a trade-off between the relative distances of the sensors to the target and relative distances of the sensors to the CH in the optimal coalition structure reached in the core by incorporating the communication cost. 4. Alternate bounds for stochastic observability: As explained in Ap- pendix A, to derive the characteristic function for the game, a lower bound is found for the stochastic observability (determinant of the B-FIM). The ap- proach used in Appendix A requires this lower bound to be linear in terms of the active times of the sensors. There exists no generic approach to derive a closed-form expression for such a bound; However, one may use heuristic 80 6.2. Directions for Future Work methods to find a tighter lower bound. It is expected to achieve larger char- acteristic function values, hence, larger sleep times for a coalition of sensors compared to the characteristic function proposed in this dissertation if one can derive such tighter bounds. 81 Bibliography [1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” Computer networks, vol. 38, no. 4, pp. 393–422, 2002. [2] G. Anastasi, M. Conti, M. Francesco, and A. Passarella, “Energy conservation in wireless sensor networks: A survey,” Ad Hoc Networks, vol. 7, no. 3, pp. 537–568, 2009. [3] V. Raghunathan, C. Schurgers, S. Park, and M. Srivastava, “Energy-aware wireless microsensor networks,” IEEE Signal Process. Mag., vol. 19, no. 2, pp. 40–50, 2002. [4] I. Akyildiz, T. Melodia, and K. Chowdhury, “A survey on wireless multimedia sensor networks,” Computer Networks, vol. 51, no. 4, pp. 921–960, 2007. [5] G. Simon, M. Maróti, Á. Lédeczi, G. Balogh, B. Kusy, A. Nádas, G. Pap, J. Sal- lai, and K. Frampton, “Sensor network-based countersniper system,” in Proc. of 2nd Intl. Conference on Embedded Networked Sensor Systems (SenSys’04), Nov. 2004. [6] D. Diamond, “Energy consumption issues in chemo/biosensing using WSNs,” in Energy and Materials: Critical Issues for Wireless Sensor Networks Workshop, 2006. [7] G. Werner-Allen, K. Lorincz, M. Ruiz, O. Marcillo, J. Johnson, J. Lees, and M. Welsh, “Deploying a wireless sensor network on an active volcano,” IEEE Internet Comput., vol. 10, no. 2, pp. 18–25, 2006. 82 Bibliography [8] M. Vuran, Ö. Akan, and I. Akyildiz, “Spatio-temporal correlation: theory and applications for wireless sensor networks,” Computer Networks, vol. 45, no. 3, pp. 245–259, 2004. [9] Y. Bar-Shalom, X. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation. Wiley New York, 2001. [10] C. Hue, J. Le Cadre, and P. Perez, “Posterior Cramer-Rao bounds for multi- target tracking,” IEEE Trans. Aerosp. Electron. Syst., vol. 42, no. 1, pp. 37–49, 2006. [11] V. Krishnamurthy, “Self-configuration in dense sensor networks via global games,” IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4936–4950, 2008. [12] V. Krishnamurthy, M. Maskery, and G. Yin, “Decentralized activation in a ZigBee-enabled unattended ground sensor network: A correlated equilibrium game theoretic analysis,” IEEE Trans. Signal Process., vol. 56, no. 12, pp. 6086–6101, 2008. [13] L. Lovasz, Combinatorial Problems and Exercises. Ams Chelsea Pub., 2007. [14] T. Arnold and U. Schwalbe, “Dynamic coalition formation and the core,” Jour- nal of Economic Behavior and Organization, vol. 49, no. 3, pp. 363–380, 2002. [15] B. Ristic and S. Arulampalam, Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, 2004. [16] P. Santi, “Topology control in wireless ad hoc and sensor networks,” ACM Computing Surveys, vol. 37, no. 2, pp. 164–194, 2005. [17] C. Schurgers, V. Tsiatsis, S. Ganeriwal, and M. Srivastava, “Optimizing sen- sor networks in the energy-latency-density design space,” IEEE Trans. Mobile Comput., pp. 70–80, 2002. 83 Bibliography [18] X. Yang and N. Vaidya, “A Wakeup Scheme for Sensor Networks: Achieving balance between energy saving and end-to-end delay,” in IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) 2004, 2004, pp. 19–26. [19] G. Lu, N. Sadagopan, B. Krishnamachari, and A. Goel, “Delay efficient sleep scheduling in wireless sensor networks,” in Proc. IEEE INFOCOM 2005, Mar. 2005. [20] V. Rajendran, K. Obraczka, and J. Garcia-Luna-Aceves, “Energy-efficient, collision-free medium access control for wireless sensor networks,”Wireless Net- works, vol. 12, no. 1, pp. 63–78, 2006. [21] W. Ye, J. Heidemann, and D. Estrin, “Medium access control with coor- dinated adaptive sleeping for wireless sensor networks,” IEEE/ACM Trans. Netw., vol. 12, no. 3, pp. 493–506, 2004. [22] Z. Wang, S. Basagni, E. Melachrinoudis, and C. Petrioli, “Exploiting sink mo- bility for maximizing sensor networks lifetime,” in Proc. of the 38th Annual Hawaii International Conference on System Sciences (HICSS’05), Hawaii, Jan. 2005. [23] J. Luo and J. P. Hubaux, “Joint mobility and routing for lifetime elongation in wireless sensor networks,” in IEEE INFOCOM 2005, vol. 3, Mar. 2005, pp. 1735–1746. [24] A. Somasundara, A. Kansal, D. Jea, D. Estrin, and M. Srivastava, “Control- lably mobile infrastructure for low energy embedded networks,” IEEE Trans. Mobile Comput., vol. 5, no. 8, pp. 958–973, 2006. [25] S. Jain, R. Shah, W. Brunette, G. Borriello, and S. Roy, “Exploiting mobility for energy efficient data collection in wireless sensor networks,”Mobile Networks and Applications, vol. 11, no. 3, pp. 327–339, 2006. 84 Bibliography [26] E. Fasolo, M. Rossi, J. Widmer, and M. Zorzi, “In-network aggregation tech- niques for wireless sensor networks: a survey,” IEEE Wireless Commun. Mag., vol. 14, no. 2, pp. 70–87, 2007. [27] C. Tang and C. Raghavendra, “Compression Techniques for Wireless Sensor Networks,” Wireless Sensor Networks, pp. 207–231, 2006. [28] Y. Le Borgne, S. Santini, and G. Bontempi, “Adaptive model selection for time series prediction in wireless sensor networks,” Signal Processing, vol. 87, no. 12, pp. 3010–3020, 2007. [29] D. Ganesan, A. Cerpa, W. Ye, Y. Yu, J. Zhao, and D. Estrin, “Networking issues in wireless sensor networks,” Journal of Parallel and Distributed Com- puting, vol. 64, no. 7, pp. 799–814, 2004. [30] J. Zhou and D. De Roure, “FloodNet: coupling adaptive sampling with energy aware routing in a flood warning system,” Journal of Computer Science and Technology, vol. 22, no. 1, pp. 121–130, 2007. [31] M. Rahimi, M. Hansen, W. Kaiser, G. Sukhatme, and D. Estrin, “Adap- tive sampling for environmental field estimation using robotic sensors,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS 2005), 2005, pp. 3692–3698. [32] M. Agastya, “Perturbed adaptive dynamics in coalition form games,” Journal of Economic Theory, vol. 89, no. 2, pp. 207–233, 1999. [33] H. Konishi and D. Ray, “Coalition formation as a dynamic process,” Journal of Economic Theory, vol. 110, no. 1, pp. 1–41, 2003. [34] K. Apt and A. Witzel, “A generic approach to coalition formation,” in Proc. of the Int. Workshop on Computational Social Choice (COMSOC), Amsterdam, Netherlands, Dec. 2006. 85 Bibliography [35] H. L. Van Trees, Detection, Estimation and Modulation Theory. John Wiley & Sons, 1968. [36] J. Helferty and D. Mudgett, “Optimal observer trajectories for bearings-only tracking by minimizing the trace of the Cramer-Rao lower bound,” in Proc. of the 32nd IEEE Conf. on Decision and Control, San Antonio, TX, 1993. [37] G. Owen, Game Theory. Acad. Press San Diego, 1995. [38] R. B. Myerson, Game Theory: Analysis of Conflict. Harvard Univ Press, 1991. [39] W. Saad, Z. Han, M. Debbah, A. Hjørungnes, and T. Başar, “Coalitional Game Theory for Communication Networks: A Tutorial,” IEEE Signal Process. Mag., vol. 26, no. 5, pp. 77–97, 2009. [40] T. Başar and G. Olsder, Dynamic Noncooperative Game Theory. SIAM: Clas- sics in Applied Mathematics, Philadelphia, PA, 1999. [41] J. Von Neumann and O. Morgenstern, Theory of Games and Economic Behav- ior. Princeton, 1944. [42] R. Aumann and B. Peleg, “Von Neumann-Morgenstern solutions to cooperative games without side payments,” Bulletin of the American Mathematical Society, vol. 66, pp. 173–179, 1960. [43] W. Saad, Z. Han, M. Debbah, A. Hjørungnes, , and T. Başar, “Coalitional games for distributed collaborative spectrum sensing in cognitive radio net- works,” in Proc. of IEEE INFOCOM, Rio de Janeiro, Brazil, Apr. 2009. [44] S. Mathur, L. Sankar, and N. Mandayam, “Coalitions in cooperative wireless networks,” IEEE J. Sel. Areas Commun., vol. 26, no. 7, pp. 1104–1115, 2008. [45] W. Saad, Z. Han, M. Debbah, and A. Hjørungnes, “A distributed merge and split algorithm for fair cooperation in wireless networks,” in Proc. Intl. Conf. on 86 Bibliography Communications, Workshop on Cooperative Communications and Networking, Beijing, China, May 2008. [46] R. Thrall and W. Lucas, “N-person games in partition function form,” Naval Research Logistics Quarterly, vol. 10, no. 1, 1963. [47] M. Roger, “Graphs and Cooperation in Games,” Mathematics of Operation Research, vol. 2, pp. 225–229, 1977. [48] M. Jackson and A. Wolinsky, “A strategic model of social and economic net- works,” Journal of Economic Theory, vol. 71, no. 1, pp. 44–74, 1996. [49] W. Saad, Z. Han, M. Debbah, A. Hjørungnes, and T. Başar, “A game-based self-organizing uplink tree for voip services in IEEE 802.16j networks,” in Proc. Intl. Conf. on Communications, Dresden, Germany, June 2009. [50] T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohme, “Coalition structure generation with worst case guarantees,” Artificial Intelligence, vol. 111, no. 1, pp. 209–238, 1999. [51] D. Ray, A game-theoretic perspective on coalition formation. Oxford University Press, USA, 2007. [52] R. Aumann and S. Hart, Handbook of Game Theory with Economic Applica- tions, Volume 2: Volume II. North Holland, 1994. [53] T. Dieckmann, “The evolution of conventions with mobile players,” Journal of Economic Behavior and Organization, vol. 38, no. 1, pp. 93–111, 1999. [54] G. Nöldeke and L. Samuelson, “An evolutionary analysis of backward and for- ward induction,” Games and Economic Behavior, vol. 5, no. 425-454, p. 302, 1993. [55] J. Kemeny and J. Snell, Finite markov chains. Springer, 1976. 87 Bibliography [56] M. Kandori, G. Mailath, and R. Rob, “Learning, mutation, and long run equi- libria in games,” Econometrica: Journal of the Econometric Society, pp. 29–56, 1993. [57] P. Bremaud, Markov Chains: Gibbs fields, Monte Carlo Simulation, and Queues. Springer, 1999. [58] G. Chalkiadakis, E. Elkind, E. Markakis, and N. R. Jennings, “Overlapping coalition formation,” in Proc. of the 4th Intl. Workshop on Internet and Net- work Economics (WINE08). Springer, 2008, pp. 307–321. [59] The Federal Communications Commission, “Second Report And Order And Memorandum Opinion And Order,” ET Docket No. 08-260, Nov. 2005. [60] C. Stevenson, G. Chouinard, Z. Lei, W. Hu, S. Shellhammer, and W. Cald- well, “IEEE 802.22: The first cognitive radio wireless regional area network standard,” IEEE Commun. Mag., vol. 47, no. 1, pp. 130–138, 2009. [61] J. Mitola III, “Cognitive radio for flexible mobile multimedia communications,” in IEEE International Workshop on Mobile Multimedia Communications (MO- MUC’99), San Diego, CA, 1999, pp. 3–10. [62] I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty, “NeXt generation/dynamic spectrum access/cognitive radio wireless networks: a survey,” Computer Net- works, vol. 50, no. 13, pp. 2127–2159, 2006. [63] S. Haykin, “Cognitive radio: brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, 2005. [64] T. Chen, H. Zhang, G. Maggio, and I. Chlamtac, “CogMesh: a cluster-based cognitive radio network,” in 2nd IEEE DySPAN 2007, Dublin, Ireland, 2007, pp. 168–178. 88 Appendix A Derivation of the Characteristic Function In this appendix, detailed derivation of the characteristic function presented in (3.12)-(3.14) is provided. In what follows, we benefit from the following inequal- ity: log (∑N i=1Xi N ) ≥ ∑N i=1 log (Xi) N . (A.1) This inequality holds due to the concave property of the logarithm function. As the first step, we remove the expectations in (2.26) using the approximation in (2.38). Then, applying the above inequality in (2.26)-(2.28) repeatedly, a lower bound can be found as follows: log (det (JB (Sk, t))) ≥ 13 [ log (3 det (Q)) + log ( 3 ∑ i∈Sk α (T − ti)( rµ̃i )2 σ2i · ( q11 cos2 ( θµ̃i ) + q22 sin2 ( θµ̃i ))) + log ( 3 ∑ i∈Sk ∑ j∈Sk j 6=i 1 2 α (T − ti)( rµ̃i )2 σ2i α (T − tj)( rµ̃j )2 σ2j sin2 ( θµ̃i − θµ̃j ))] (A.2) 89 Appendix A. Derivation of the Characteristic Function ≥ log (3) + 1 3 [ log (det (Q)) + 1 |Sk| · ∑ i∈Sk log ( |Sk|· α (T − ti)( rµ̃i )2 σ2i · ( q11 cos2 ( θµ̃i ) + q22 sin2 ( θµ̃i ))) + 1 |Sk| (|Sk| − 1) ∑ i∈Sk ∑ j∈Sk j 6=i log ( |Sk| (|Sk| − 1) 2 · α (T − ti)( rµ̃i )2 σ2i α (T − tj)( rµ̃j )2 σ2j sin2 ( θµ̃i − θµ̃j ))] . (A.3) Subsequently, applying ∑ i∈Sk ∑ j∈Sk j 6=i log ((T − ti) (T − tj)) = 2 (|Sk| − 1) ∑ i∈Sk log (T − ti) , (A.4) one can write (A.3) as log (det (JB (Sk, t))) ≥ log ( 3α 3 √ |Sk| (|Sk| − 1) det (Q) ) + 1 3|Sk| ∑ i∈Sk log 1( rµ̃i )2 σ2i ( q11 cos2 ( θµ̃i ) + q22 sin2 ( θµ̃i )) + 1 3|Sk| (|Sk| − 1) ∑ i∈Sk ∑ j∈Sk j 6=i log 1( rµ̃i )2 σ2i 1( rµ̃j )2 σ2j sin2 ( θµ̃i − θµ̃j ) + 1 |Sk| ∑ i∈Sk log (T − ti) . (A.5) Since the arguments T−ti in log(T−ti) are integers in the closed interval [1, T − v({i})] and due to concavity of the logarithm function, log (T − ti) can be lower bounded by log (T − ti) ≥ − log (T ) T − 1 ti + log (T ) . (A.6) 90 Appendix A. Derivation of the Characteristic Function In the next step, applying (A.6) in (A.5), one can write log (det (JB (Sk, t))) ≥ log ( 3αT 3 √ |Sk| (|Sk| − 1) det (Q) ) + 1 3|Sk| ∑ i∈Sk log 1( rµ̃i )2 σ2i ( q11 cos2 ( θµ̃i ) + q22 sin2 ( θµ̃i )) + 1 3|Sk| (|Sk| − 1) ∑ i∈Sk ∑ j∈Sk j 6=i log 1( rµ̃i )2 σ2i 1( rµ̃j )2 σ2j sin2 ( θµ̃i − θµ̃j ) − log (T )|Sk| (T − 1) ∑ i∈Sk ti. (A.7) Finally, applying the relaxed constraints (2.4) in (A.7), the sum of sleep times of the sensors in a particular coalition Sk can be expressed as ∑ i∈Sk ti ≤ |Sk| (T − 1)log (T ) ( log ( 3αT 3 √|Sk| (|Sk| − 1) det (Q) Õk ) + 1 3|Sk| ∑ i∈Sk v (1) i + 1 3|Sk| (|Sk| − 1) ∑ i∈Sk ∑ j∈Sk j 6=i v (2) ij ) , (A.8) where v (1) i = log q11 cos2 ( θµ̃i ) ( rµ̃i )2 σ2i + q22 sin2 ( θµ̃i ) ( rµ̃i )2 σ2i , (A.9) v (2) ij = log 1 2 sin2 ( θµ̃i − θµ̃j ) ( rµ̃i )2 σ2i ( rµ̃i )2 σ2j . (A.10) The right-hand side in (A.8) gives the maximum total sleep time that can be achieved by a coalition of sensors Sk constrained on the required localization accuracy Õk. In this dissertation, the aim is to minimize the energy consumption by maximizing the average sleep time allocated to the sensors. Therefore, we equate the sum of the sleep times of the sensors in Sk to the upper bound provided by the right-hand side. 91 Appendix A. Derivation of the Characteristic Function However, as defined in Sec. 2.1, ti’s are positive integer numbers. As the result, the sum on the left-hand side should also be confined to Z+ (positive integer numbers). Hence, ∑ i∈Sk ti = ⌊ (T − 1) log (T ) ( |Sk| log ( 3αT 3 √|Sk| (|Sk| − 1) det (Q) Õk ) + 1 3 ∑ i∈Sk v (1) i + 1 3 (|Sk| − 1) ∑ i∈Sk ∑ j∈Sk j 6=i v (2) ij )⌋+ , (A.11) where bxc+ = max {0, bxc} and b·c denotes the greatest integer function. This func- tion provides the maximum feasible sleep time for a coalition Sk localizing target k and hence is considered as the characteristic function for the sensors coalition for- mation game defined in Sec. 3.2. 92 Appendix B Resource Allocation in Cognitive Radio Networks Cognitive radio networks (CRN) are promising to enrich the connectivity demand of users through exploitation of under-utilized licensed bands. However, there are few vigorous studies in the literature that provide a holistic view of CRNs in performing tasks such as cognitive radio (CR) admission policy, resource allocation and load balancing. In this work, the load balanced resource allocation problem is formulated in CRNs as a non-superadditive coalition formation game in which the CRs, as players of the game, form collaborative groups to exploit the spectrum available to cognitive base stations (CBS) most efficiently. Upon joining a coalition, each CR receives a payoff that can be translated to the revenue form the achieved throughput subtracted by the cost incurred by occupying one of the available sub-channels. A distributed decision making framework is developed for coalition formation among CRs based on the approach presented in Sec. 4.1, that converges to the core of the defined game corresponding to the maximum average payoff conditional on feasibility and subject to the fairness rule as defined in 2.1. Furthermore, we will elaborate on load balancing properties of the proposed solution and demonstrate its superior performance compared to opportunistic scheduling. 93 Appendix B.1. Introduction B.1 Introduction The end users’ demand to be “always connected” is progressively increasing, man- dating ever more efficient spectrum utilization techniques. This demand for high bandwidth, seamless and heterogeneous connectivity makes the traditional fixed spectrum allocation paradigm unsustainable. Spectrum regulatory bodies have, hence, embraced a change of paradigm towards more flexible spectrum manage- ment methods such as secondary spectrum access (SSA), e.g. the FCC has recently authorized unlicensed access to TV white spaces [59]. A number of standardization efforts including IEEE 802.22 [60] and IEEE SCC41 are already underway to address the market needs with SSA-enabled technologies. Cognitive radio (CR) [61] is a key enabling technology to this end, facilitating intelligent and agile spectrum access while protecting the licensed users of a given band from degrading interference. Consider a wide geographical area covered with several primary transmitters (PT), e.g. TV broadcast towers, each with their dedicated (and licensed) frequency band as shown in Fig. B.1. A cognitive radio network (CRN) comprised of several cognitive base stations (CBS) is also operating in that area, very much in-line with IEEE 802.22 standard assumptions. Due to the lack of dedicated signaling channel for CRN and inherent instability of underlying secondary bands, each CBS should autonomously perform tasks such as CBS-CR association, resource allocation, load balancing and CR hand over (HO). In this appendix, a distributed decision making framework is proposed based on the approach presented in Sec. 4.1, which addresses the aforementioned resource allocation and load balancing problem in a CRN in two steps. In the first step, each CBS admits an optimal set of CRs, where the admissibility criterion is based on finding the optimal coalition structure in each cell such that the sum of utilities for all cells is maximized. Indeed, this step can be conceived as the coalition structure generation process, as explained in Sec. 4.1. Next, the admitted CRs negotiate to 94 Appendix B.1. Introduction Figure B.1: A typical primary and secondary network architecture pertinent to our analysis. reach the optimal payoff allocation in a distributed fashion such that the average payoff of the CRs is maximized, conditional on feasibility and subject to a fairness criterion (see Sec. 2.1), hence, converging to the core of the game. This last step can also be identified as the bargaining process (see Sec. 4.1). In the past decade, since the introduction of cognitive radio into wireless commu- nications lexicon, a great body of research has been developed focusing on various aspects of this promising technology [62], [63]. However, a holistic view of a CRN has only recently gained the attention of researchers [60], [62], [64]. To the best of our knowledge no previous study has investigated the distributed CR admission, load balancing and optimal channel allocation in a CRN rigorously. Using cooperative game theory as a tool to optimize performance metrics in wireless systems has also recently gained attention. In [44], formation of virtual MIMO systems is formulated as a super-additive cooperative game and the stability of the grand coalition is studied for both transmitters and receivers cooperation. 95 Appendix B.2. Problem Formulation A similar problem is investigated in [45], considering the cost (in terms of power) for exchanging data between cooperating transmitters. To this end, the authors developed a simple merge and split algorithm, based on the approach proposed in [34], through which transmitters are able to self-organize and form stable coalitions. As another example, [43] studies collaborative spectrum sensing (CSS) in CRNs in which secondary users interact for improving their sensing performance, while taking into account the false alarm cost. Since the utility represents probabilities, the game is formulated as a non-transferable cooperative game and a distributed algorithm is proposed for coalition formation. The rest of this appendix is organized as follows: In Sec. B.2, the load balanced resource allocation problem is formulated. In Sec. B.3, the problem is interpreted as a cooperative game and a two-step dynamic coalition formation algorithm is presented. Finally, the numerical results are presented and discussed in Sec. B.4. B.2 Problem Formulation Notation Let N = {1, 2, · · · ,N} denote the set of CRs in the multi-cell CRN. Coalitions and coalition structures are denoted by S and P, respectively. The set of all pos- sible coalition structures is also denoted by C. Finally, the set of sub-channels in coalition S is denoted by Rs. Each coalition S is identified by a tuple S = (x,y) where x ∈ {0, 1}N is given by xi = 1, if i ∈ S0, if i /∈ S (B.1) Let {ei; i = 1, · · · ,N} denote the standard basis vectors in RN. Then, x =∑i∈S ei. In addition, the occupied sub-channels in a particular coalition S are identified by 96 Appendix B.2. Problem Formulation a vector y ∈ Z+|Rs| given by yj = i, if sub-channel j is oocupied by CR i0, if sub-channel j is not occupied (B.2) Lastly, we define an indicator function to determine the sub-channel index to which each CR is connected as follows I(i) = j if yj = i0 otherwise (B.3) Channel Model Let F denote the set of licensed frequency bands available in the investigated ge- ographical area. A specific primary cell k, coexisting with the secondary cell k, as depicted in Fig. B.1, exploits a subset of these licensed bands denoted by Fk ⊆ F , where k ∈ K = {1, 2, . . . ,K}. The set of available frequency bands for SSA is then indicated by F\Fk and is denoted by F−k. It is assumed that the CBS in cell k will partition F−k into Mk equal-bandwidth sub-channels denoted by the set Mk. For mathematical tractability of the investigated problem, it is also assumed that each CR requires to access one sub-channel and each sub-channel in cell k will exclusively be allocated to a single CR. We assume a block fading channel model for all sub- channels, where the channel fading distribution in the Mk sub-channels of each cell, as monitored by each CR, are i.i.d. The CSI is assumed to be known at the CRs, and are fed back to CBS, which can be reliably estimated using the periodic pilot signals from each CBS. Pricing Mechanism Consider a scenario where the primary license owner charges the CRN a flat price, say G ($), for the total available frequency for SSA over a fixed and agreed period of 97 Appendix B.2. Problem Formulation time τ (S). This “spectrum lease” period is ideally a multiple times of the resource allocation period of CRN, i.e. τ = T × τ̃ , where T ∈ N+ and τ̃ denotes the dura- tion of CRN resource allocation period. Such an agreement alleviates the need of primary network monitoring the CRN operation compared to the case where spec- trum charges are assumed proportional to the CRN activity. Assuming existence of equal-size secondary bandwidth in all cells, the incurred cost of CRN per cell over one resource allocation period is G̃ = GT×K ($). If secondary bandwidth is not equal in all cells, a proportional cost distribution can be utilized. The CRN, hence, is interested in maximally utilizing the available bands so as to gain a higher utility from the charged frequency bands. In the rest of this appendix, and without loss of generality, we focus on an arbitrary resource allocation period of CRN. To compensate the incurred cost, CBS k is assumed to use a cost function defined by G(‖xk‖) = λ · (exp ln( G̃ λ +1) |Mk| ‖xk‖−1), (B.4) where λ is an arbitrary constant and ‖ · ‖ indicates the L1 norm in RN, defined by ‖xk‖ := ∑N i=1 |xi|. This cost function demonstrates increasing differences such that when user i joins cell k, it will pay a higher price of G (‖xk‖)−G (‖xk − ei‖), compared to existing users in that cell. Therefore, as a particular cell becomes con- gested, CRs will be motivated to join other cells provided that they have access to sub-channels in other cells. This approach amounts to a distributed load balancing mechanism in the CRN which can be parameterized by λ and G̃. The utility func- tion (in dollars) for a coalition of CRs associated with CBS k, for a given resource allocation period, is defined as U(Sk) = C ∑ i∈Sk [ log ( 1 + Pi,k · gi,I(i),k σ2i )]−G (‖xk‖) , (B.5) where Pi,k denotes the allocated power to CR i from CBS k, σ2i denotes the noise 98 Appendix B.2. Problem Formulation power for CR i (assumed to be equal for all sub-channels) and gi,I(i),k is the channel gain for CR i when using sub-channel I(i) in CBS k. The first term in (B.5) returns the revenue (in dollars) as a function of the achieved throughput. Hence, the utility function (B.5) essentially determines the profit that each coalition of CRs gains by occupying a subset of the sub-channels available in a particular cell. Note that while the cost for accessing resources in each cell, i.e. G (‖xk‖) in (B.5), has a readily accessible monetary interpretation, the revenue term may need to be indirectly translated to a monetary regime. The revenue of each CR, for instance, can be envisioned as the dollar equivalent of end user satisfaction proportional to its received service from CBS k. Upon joining a given coalition, CR i will contribute towards the coalition utility and in return will receive a pay-off, denoted by pi. The problem considered in this appendix can then be formally stated as maximizep∈D P∈C ∑ Sk∈P (∑ i∈Sk pi ) N (P2) subject to ∑ i∈Sk pi ≤ U (Sk) ∀k ∈ K (C1) pi ≥ pi ∀i ∈ A (P) (C2) where p = (p1, · · · , pN) denotes the payoff vector for all CRs and A (P) denotes the set of all non-singleton coalitions formed in a particular coalition structure P which will be vigorously defined later. To make the problem mathematically tractable, the set of profit values that can be allocated to the CRs is confined to a finite set. Suppose ∆ ($) is the smallest profit unit. CRs’ demands are then restricted to the integer multiples of ∆ in the closed interval [0,maxxk∈{0,1}N,‖xk‖≤|Mk| U(Sk)]. Throughout, this set is denoted by D. In the combinatorial optimization problem in (P2), the objective function is defined as the average payoff achievable by the CRs forming coalitions to exploit the resources available for SSA in the CRN. This objective function is aimed to 99 Appendix B.2. Problem Formulation be maximized over the set of all possible coalition structures C. The constraints in (C1) guarantee that the total payoff allocations in a particular coalition do not exceed the profit gained by that coalition. Furthermore, as formulated in (C2), it is assumed that a given CR i is interested in cooperation only if its payoff is greater than pi. Let N ′ denote the set of CRs admitted to network in the solution to (P2). A fairness rule is also defined, as in Sec. 2.1, on the vector of payoffs p allocated to the CRs in N ′ as follows: ∑ i∈Sk pi ≥ U (Sk) ∀Sk ⊆ 2N \∅, ∀k ∈ K. (B.6) These constraints guarantee that the total profit in the coalitions are divided among the CRs in a fair fashion such that no CR can gain higher profits by exploiting the resources associated to any other CBS. Next section provides a two-step distributed solution to the above problem fol- lowing the approach presented in Sec. 4.1. Formulation of Constraints In the presented formulation, since the channel gains vary as the CRs connect to different CBSs, the utility function for each coalition, given in (B.5), depends on the CBS to which the CRs in that coalition are connected, as well as the sub-channels . Hence, as explained in Sec. 3.4, CBSs are considered as (virtual) players of the game achieving zero payoffs. Hereafter, to denote this convention, each coalition k ∈ K will be denoted by {CBSk,Sk}. Singleton coalitions will also be denoted by {CBS0, {i}}, where i ∈ N\(∪Sk∈PSk). Finally, for our formulation to be well posed, the following constraints are imposed on the utility function: • To prevent CBSs leaving coalitions, the utility of a coalition of CRs without 100 Appendix B.3. Distributed Dynamic Coalition Formation the corresponding CBS is set to U(Sk) = 0 ∀k ∈ K. (B.7) • To prevent CBSs jump between coalitions, we set U({CBSk, CBSk′ ,Sk}) = 0 ∀k, k′ ∈ K. (B.8) • To allocate each sub-channel exclusively to one CR, we set U({CBSk,Sk ∪ {l}}) = 0, if ∃ i ∈ Sk : I(i) = I(l) = j. (B.9) • Finally, the utility for singleton coalitions is set to U({CBS0, {i}}) = 0, (B.10) and to prevent singleton coalitions joining other singleton coalitions, we set U({CBS0,S0}) = 0, (B.11) where S0 ∈ 2N\∪Kk=1Sk . B.3 Distributed Dynamic Coalition Formation In this section, we propose a two-step algorithm which converges to the solution to problem (P2) by each CR independently following simple best-reply rules, as explained in Sec. 4.1. In the the first step a distributed decision-making framework is developed in the form of two simple join and disjoin rules. It is guaranteed that if each CR follows these rules, coalitions are formed in each cell such that the sum of the profits in all coalitions is maximized. Note that traditional (centralized) 101 Appendix B.3. Distributed Dynamic Coalition Formation scheduling schemes have a myopic view of isolated cells which, as shown in Sec. B.4, will fail to maximize the overall utility of all cells. The second step of our proposed solution ensures reaching a core allocation in the non-superadditive cooperative TU game defined on the set of CRs admitted to the CRN in the first step. We now proceed to elaborate these steps separately. Admission The first step is essentially the coalition structure generation process, as explained in Sec. 4.1, and aims to find a partition on the set of CRs such that the sum of the utility functions for all coalitions is maximized, i.e. maximize P∈C ∑ k∈K U({CBSk,Sk}) (B.12) subject to < xk1 ,xk2 >= 0 ∀k1, k2 ∈ K where < ·, · > denotes the inner product of vectors in RN. Intuitively, the con- straints in (B.12) emphasize that coalitions are disjoint, as defined in Chapter 2; hence, no CR exploits the resources in two different cells simultaneously. This com- binatorial optimization problem can be interpreted as a cooperative game (N , v) = ({1, · · · ,N} ,U), where N denotes the set of CRs in the CRN which collaborate to maximize the profit gained by forming coalitions to utilize resources in the CRN. To reach the optimal coalition structure in the game, a distributed join and disjoin algorithm is proposed. In each iteration of this algorithm, CRs follow simple join and disjoin rules which are similar to the merge and split rules as proposed in [34]. In addition, to prevent incompatibility of CRs’ strategies, it is assumed that CRs get the chance to revise their strategies according to a Bernoulli trial with probability ξ as in Sec. 4.1. Each CR, having the opportunity to revise its strategy, employs the following rules to decide which coalition to join in the next period: 102 Appendix B.3. Distributed Dynamic Coalition Formation Join and Disjoin Rules: Rule 1: CR i joins coalition Sk if di→k := [U({Sk ∪ {i}}) + U({S(i)\{i}})]− [U({Sk}) + U({S(i)}] > 0, (B.13) and k = argmax j∈K ( max l∈Rk yl=0 di→j ) , (B.14) where S(i) denotes the coalition to which CR i belongs. Rule 2: CR i disjoins coalition Sk if U({Sk\ {i}}) > U({Sk}), (B.15) More generally, all CRs i ∈ Sk\S ′k disband if there exists a coalition S ′k such that: U (S ′k) > U (Sk) . (B.16) To simplify the notation, we dropped CBSk when referring to coalition k in the above rules. Note that di→k in effect determines the added profit (or loss when di→k < 0) of CR i leaving its current coalition S (i) to join Sk. Further- more, to prevent getting stuck in non-optimal coalition structures, it is assumed that if there exists a coalition S ′k such that ∑ i∈S′ k di→k > 0, where all i ∈ S ′k are in the range of CBS k, CRs i ∈ S ′k experiment as explained in Sec. 4.1. The following theorem shows the convergence and stability of the partitions resulting from the above operations. 103 Appendix B.3. Distributed Dynamic Coalition Formation Theorem B.3.1. Any iteration of successive join and disjoin operations terminates. In addition, the coalition structure reached PA∞ is stable and PA∞ = argmaxP∈C ∑ Sk∈P U (CBSk,Sk) . (B.17) The stability implies reaching a specific coalition structure P, such that no CR is interested in leaving P through join and disjoin operations. All CRs in non-singleton coalitions N ′ = { ∪k∈KSk;Sk ∈ CSA∞, |Sk| > 1 } are admitted to the network. All other CRs form singleton coalitions and achieve zero payoffs. Dynamic Coalition Formation In the next step, the admitted CRs, denoted by the set N ′, collaborate to form coalitions such that the average payoff allocated to each CR is maximized conditional on the constraints (C1) and (C2) and subject to the fairness criteria (B.6). This problem can be formulated as maximizep∈D P̃∈C̃ ∑ S̃k∈P̃ (∑ i∈S̃k pi ) |N ′| subject to ∑ i∈S̃k pi ≤ U ( CBSk, S̃k ) ∀k ∈ K∑ i∈S̃′ k pi ≥ U ( CBSk, S̃ ′k ) ∀S̃ ′k ⊆ 2N ′\∅, ∀k ∈ K pi ≥ pi ∀i ∈ N ′ The above resource allocation problem can be interpreted as a non-super-additive cooperative TU game (N ′, v), where v(·) = U(·) as defined in (B.5). CRs cooperate in order to exploit available resources most efficiently. Each CR i is assumed to be interested in cooperation only if its profit is greater than pi. Hence, the reservation payoff v (CBS0, {i}) is set to pi, i.e. v (CBS0, {i}) = pi. Those CRs which prefer not to cooperate will simply choose the best available sub-channel and the cost for accessing each sub-channel will be the uniform distribution of the remaining cost 104 Appendix B.3. Distributed Dynamic Coalition Formation between unoccupied sub-channels in each cell in the core. In this step, the distributed dynamic coalition formation algorithm (Algorithm 4.1 accompanied by Algorithm 4.2) is utilized to reach the core of the defined re- source allocation game. For this purpose, Algorithm 4.1 is required to be modified as follows: Algorithm B.1: (Distributed Dynamic Coalition Formation for Re- source Allocation) Initialization: At n = 0 select initial coalition structure such that each non-singleton coalition comprises at least two CRs and each CR can at least achieve its reservation sleep time v({CBS0, {i}}) = pi. Set p0i = v({CBSk,S(i)}) |S(i)| if |S (i)| > 1 v ({CBS0, {i}}) otherwise ∀i ∈ N ′ (B.18) where S(i) denotes the coalition comprising CR i. Set ω0 = (P0,p0). In addition, let ξ, ² ∈ (0, 1) to be fixed for all sensors in the network. - The following steps are done independently by each CR i ∈ N ′: Step 1—Revision Strategy: Take a random draw from the Bernoulli trial with probability ξ. If the outcome is “keep strategy”, set pn+1i = pni , Sn+1 (i) = Sn (i) and go to Step 5. Otherwise, go to Step 2. Step 2—Evaluation of the Best Strategy for the Next Period: Com- pute pn+1i (ω n) = max k∈K∪{0} ( max I(i)∈Yn k v ({ CBSk, S̃k ∪ {i} }) − ∑ l∈S̃k l 6=i pl ) , (B.19) 105 Appendix B.3. Distributed Dynamic Coalition Formation where pn+1i (ω n) ∈ Di, and Sn+1i (ω n) = {( xn+1 (ωn) ,yn+1 (ωn) ) ;xn+1m = 1, y n+1 m′ = i } , (B.20) respectively. m and m′ are also determined by m ∈ { argmax k∈K∪{0} ( max I(i)∈Yn k v ({ CBSk, S̃k ∪ {i} }) − ∑ l∈S̃k l 6=i pl )} , (B.21) m′ ∈ argmax I(i)∈Yn k { v ( {CBSk, S̃k ∪ {i}} ) − ∑ l∈S̃k l 6=i pl } , (B.22) where S̃n0 = {∅} and Ynk denotes the set of unoccupied sub-channels in cell k and in period n. Step 3—Experimentation: If i ∈ Bn, take a random draw from the Bernoulli trial with probability ². If the outcome in is “experiment”, choose pn+1i uniformly in the interval Di. In addition, choose k ∈ K∪{0} and I (i) ∈Mk uniformly with probabilities 1K+1 and 1|Mk| , respectively. Go to Step 5. Else, go to Step 4. Step 4—Best-reply Process: Set pn+1i = p n+1 i (ω n) and choose Sn+1 (i) ∈ Sn+1i (ωn) with equal probability 1|Sn+1i (ωn)| . Step 5—Recursion: Set n← n+ 1 and go to Step 1. In Algorithm B.1, Di denotes the set of integral multiples of ∆ in the closed interval [ pi,maxP∈C ∑ Sk∈P U (CBSk,Sk) ] , where the upper bound is determined by the solution to the admission step. Theorem 4.3.1 proves that if each CR i ∈ N ′ follows Algorithm B.1 and if the core of the game is non-empty, the core of the resource allocation game is reached with probability one which provides the solution 106 Appendix B.4. Numerical Results −60 −40 −20 0 20 40 60 −40 −30 −20 −10 0 10 20 30 40 X Y p2=0 p4=0 p1=11 p5=7 p6=7 p3=2 p7=8 p8=0 p9=8 p10=0 p11=5 p12=5 CBS 1 CBS 2 Figure B.2: Example 1: Coalition structure and profits allocated to the CRs in the core. to the problem (P2). After reaching the core, those CRs which form singleton coalitions will simply choose the best available sub-channel and the cost for accessing each sub-channel will be the uniform distribution of the remaining cost of available resources in each cell between unoccupied sub-channels when the core is reached. B.4 Numerical Results The simulation set-up is chosen pertinent to the IEEE 802.22 standard. The carrier frequency is assumed 700 MHz (TV UHF band) and sub-channel bandwidth is normalized so that the throughput is expressed as Nats/S/Hz. The channel path loss model is considered to be COST 231, with CBS hight of 50 m and CR height of 2 m. We assume a 3 dB log-normal shadowing and Rayleigh fading with unit mean. In addition, the maximum transmit power of each CBS and the cell radius are assumed to be 4 Watts and 30 Km, respectively. Furthermore, it is assumed that λ = 6, G̃ = 2 and C(x) = C · x in (B.5) with C = 3.5 $/MNats. 107 Appendix B.4. Numerical Results Table B.3: Channel Gains for Example 1 1 2 3 4 g1,I(1),1 5.8e-16 1.2e-10 1.3e-27 0.4869 g6,I(6),1 3e-11 1e-27 0.0056 0.7108 To demonstrate the behavior of the proposed algorithm, two examples are pro- vided in Figs. B.2 and B.3. In both examples, a CR is admitted for each sub-channel by reaching the stable coalition structure through the join and disjoin rules proposed in the admission step. The admitted CRs are shown with unfilled squares in color with the corresponding CBS. Non-singleton coalitions are depicted by filled squares and CRs which are not admitted to the network are depicted by black squares. In the first example, we consider the network configuration illustrated in Fig. B.2. It is assumed that the available bandwidth for each cell is equal to other cells and is divided into 4 equal sub-channels. Fig. B.2 shows the coalition structure and payoffs allocated to the CRs in the core using the proposed distributed dynamic coalition formation algorithm. The sub-channel allocations in the core can also be shown with the vectors y1 = [5, 3, 6, 1] and y2 = [7, 11, 9, 12]. As can be seen in Table B.3, sub- channel 4 in cell 1 provides the highest channel gain for CR 6. However, it prefers sub-channel 3 and lets CR 1 to occupy sub-channel 4 so that the total throughput, and as a result, the total cell profit improves. This can be verified noting that v({CBS1, ({1, 5, 6}, [5, 1, 0, 6])}) = 14.6 and v({CBS1, ({1, 5, 6}, [5, 0, 6, 1])}) = 25.3. In the second example, the network configuration shown in Fig. B.3 is considered and it is assumed that the available bandwidth for each cell is divided into 5 equal sub-channels. Fig. B.3 demonstrates the coalition structure and payoffs allocated to the CRs in the core. The sub-channel allocations in the core can also be shown with the vectors y1 = [0, 0, 2, 0, 7], y2 = [0, 0, 1, 3, 8], and y3 = [0, 6, 4, 0, 0]. As can be seen in Table B.4, the channel gain for sub-channel 2 is higher in cell 1 108 Appendix B.4. Numerical Results −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 X Y p2=6 p7=5 p1=8 p3=8 p8=2 p4=6 p6=8 p5=0 CBS 1 CBS 2 CBS 3 Figure B.3: Example 2: Coalition structure and payoffs allocated to the CRs in the core. Table B.4: Channel Gains for Example 2 1 2 3 4 5 g6,I(6),1 5.8e-44 0.1038 7.2e-19 0.0001 2e-34 g6,I(6),3 5.4e-8 0.0599 7.3e-24 8.8e-13 0.0002 than in cell 3. However, as shown in Fig. B.3, due to the load balancing property discussed in Sec. B.2, CR 6 achieves less profit occupying the third sub-channel in cell 1 rather than occupying the second sub-channel in cell 3. This can be verified by comparing the feasible payoff for CR 6 in cell 1, given by v({CBS1, {2, 6, 7}})− v({CBS1, {2, 7}}) = 5.7, and its payoff in the core, p6 = 8. In Fig. B.4, the average individual CR payoff using the proposed distributed dynamic coalition formation scheme is compared to the opportunistic scheduler as the number of CRs in the CRN increases. Each point of the graph is an average over 1000 i.i.d. realizations of the CRN depicted in Fig. B.3. The opportunistic scheduler 109 Appendix B.4. Numerical Results 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 −2 −1 0 1 2 3 4 5 6 N Av er ag e Pa yo ff pe r C R Opportunistic Scheduling Dynamic Coalition Formation Figure B.4: Average individual CR payoff versus number of CRs in the CRN: Dis- tributed Dynamic Coalition Formation vs. Opportunistic Scheduling. in each CBS uses the same utility function as in (B.5) except for the fact that G̃ is uniformly distributed between all sub-channels. As it can be seen in Fig. B.4, the average individual payoff increases as the number of CRs increases in both schemes. However, the distributed dynamic coalition formation approach demonstrates a sig- nificant average payoff increase compared with opportunistic scheduling, achieving up to 95.3% improvement at N = 32. 110
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Distributed dynamic coalition formation for bearings-only...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Distributed dynamic coalition formation for bearings-only localization in wireless sensor networks Namvar Gharehshiran, Omid 2010
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Distributed dynamic coalition formation for bearings-only localization in wireless sensor networks |
Creator |
Namvar Gharehshiran, Omid |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Lifetime maximization is a key challenge in the design of sensor-network-based tracking applications. In this dissertation, formation of optimal coalitions of nodes is investigated for data acquisition in bearings-only target localization such that the average sleep times allocated to the nodes are maximized. Targets are assumed to be localized with a pre-defined accuracy where the determinant of the Bayesian Fisher information matrix (B-FIM) is used as the metric for estimation accuracy. Cooperative game theory is utilized as a tool to devise a distributed dynamic coalition formation algorithm in which nodes autonomously decide which coalition to join, while maximizing their feasible sleep times. Nodes in the sleep mode do not record any measurements; hence, save power in both sensing and transmitting the sensed data. The proposed scheme reduces the number of sensor measurements by capturing the spatio-temporal correlation of the information provided by the sensors from one side and bounding the localization accuracy to the pre-defined value from the other side. It is proved that if each node operates according to this algorithm, the average sleep time for the entire network converges to its maximum feasible value. In numerical examples, we illustrate the inherent trade-off between the localization accuracy and the average sleep time allocated to the nodes and demonstrate the superior performance of the proposed algorithm via Monte Carlo simulations. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-22 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0068953 |
URI | http://hdl.handle.net/2429/19003 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_spring_namvar gharehshiran_omid.pdf [ 1.04MB ]
- Metadata
- JSON: 24-1.0068953.json
- JSON-LD: 24-1.0068953-ld.json
- RDF/XML (Pretty): 24-1.0068953-rdf.xml
- RDF/JSON: 24-1.0068953-rdf.json
- Turtle: 24-1.0068953-turtle.txt
- N-Triples: 24-1.0068953-rdf-ntriples.txt
- Original Record: 24-1.0068953-source.json
- Full Text
- 24-1.0068953-fulltext.txt
- Citation
- 24-1.0068953.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0068953/manifest