UBC Theses and Dissertations

Essays in political economy Cornwall, Thomas Hans Dillon 2018

Essays in Political Economy

by

Thomas Hans Dillon Cornwall

M.A., The University of British Columbia, 2009
B.A., Simon Fraser University, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Economics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

October, 2018

© Thomas Hans Dillon Cornwall 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Essays in Political Economy, submitted by Thomas Cornwall in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics.

Examining Committee:
Robert Heinkel, Business, Chair
Francesco Trebbi, Economics, Supervisor
Patrick Francois, Economics, Supervisory Committee Member
Wei Cui, Law, University Examiner
Joshua Gottlieb, Economics, University Examiner
Gergely Ujhelyi, Economics, External Examiner

Additional Supervisory Committee Members:
Siwan Anderson, Economics, Supervisory Committee Member
Thorsten Rogall, Economics, Supervisory Committee Member

Abstract

Chapter 1 estimates how an individual’s expressed sentiment responds to messages from their social network connections. I use machine learning to code messages for expressions of one type of sentiment: happiness. Because network link formation is not random, I use exogenous shifters to instrument for the message volume of each of a user’s neighboring nodes. Specifically, I interact neighbor daylight with average neighbor sentiment, and aggregate this across neighbors to construct an instrument for viewed messages. A user with neighbors in different places with different average sentiment receives a shock to their feed when light levels differ across those places. I find that a user’s happiness increases by 3.4% when the happiness of incoming messages increases by 10%.

Chapter 2 presents a general framework for estimating the causal effect of social interactions on online social networks. This context presents two challenges for causal estimation beyond the endogeneity problem discussed above. First, social networks are dynamic: users are affected not only by contemporaneous messages but also by past messages. Second, some data is missing. These networks have relatively low levels of clustering, which means that it is computationally infeasible to collect all of the neighbors of a sample of the network. I introduce an estimation strategy for addressing these two challenges. I also construct six new instruments within this framework and compare their strength.

Chapter 3 develops a method for estimating the impact of voter demobilization efforts on voter turnout. We exploit two facts: a) demobilization is typically targeted to avoid the supporters of the intended beneficiary, and b) voting results are available at the sub-district (poll) level. Omitted variables will generally be constant across a district, while the impact of the violations will be decreasing in the level of support for the violating party. Our method is general in the sense that it does not require a natural experiment and is robust to countrywide shifts in voter support (“swing”). We apply this method to allegations of fraud in the 2011 Canadian federal election, and estimate that illegal demobilization efforts reduced turnout by 3.9% in affected districts.

Lay Summary

This thesis consists of three chapters in behavioural economics, the econometrics of networks, and political economy. In Chapter 1, I investigate how an individual’s happiness is affected by the happiness of their neighbors through online networks, using data from Twitter.
I find that a 10% increase in the happiness of the messages a user sees increases the happiness of the messages they send out by 3.4%. In Chapter 2, I build on the identification strategy outlined in Chapter 1 to develop a framework for estimating how individuals respond to their neighbors’ characteristics on online networks. Chapter 3 investigates the effect of robocalls on voter turnout in the 2011 Canadian Election. We find that robocalls reduced the turnout of opposition voters by 3.9% in affected districts.

Preface

Chapters 1 and 2 are my original, unpublished, and independent work. Chapter 3 is joint work with Professor Anke Kessler (SFU). The research question was originally posed by Dr. Kessler. I developed the identification strategy. The analysis and writing were split equally between us.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
Introduction
1 Estimating the Diffusion of Sentiment Through Networks
    1.1 Introduction
    1.2 Related Work
    1.3 Description of the Data
    1.4 Empirical Strategy
    1.5 Results
    1.6 Conclusion
2 The Econometrics of Social Interactions on Online Networks
    2.1 Introduction
    2.2 General Framework
    2.3 New Instruments
    2.4 Results
    2.5 Conclusion
3 Voter Demobilization: Estimating the Impact in Multi-district Elections
    3.1 Introduction
    3.2 Data and Empirical Strategy
    3.3 Results
    3.4 Robustness Checks
    3.5 Conclusion
Bibliography
Appendices
    A Effects on Recipient Volume
    B Proof of Consistency of Corrected Estimator
    C Nonlinear Model Results
    D Matching
    E Electoral District Lists

List of Tables

1.1 List of Variables Used and Definitions
1.2 List of Variables Used and Definitions (cont’d)
1.3 Summary Statistics
1.4 OLS Recipient Sentiment on Sender Sentiment and Sender Volume
1.5 First Stage Feed Sentiment on Light-Sender Sentiment Interaction
1.6 First Stage Feed Volume on Feed Light Levels
1.7 IV: Recipient Sentiment on Feed Sentiment
1.8 IV: Recipient Sentiment on Feed Volume
1.9 Contribution of Network to Well-being
1.10 Effects of Twitter on Mean of Sentiment, Between and Within Users
1.11 Effects of Twitter on Standard Deviation of Sentiment, Between and Within Users
2.1 Summary Statistics
2.2 Effects of Volume Shifters on Feed Size
2.3 Effects of Instruments for Sentiment on Feed Sentiment
3.1 District-Level Summary Statistics
3.2 Cross-Section Regression of Turnout at the District Level
3.3 Within District Regression of Turnout at the Poll Level
3.4 Sensitivity Analysis
3.5 Controlling for Campaign Intensity
3.6 Falsification Tests Using the 2008 and 2006 Elections
A.1 OLS: Recipient Volume on Feed Sentiment and Feed Volume
A.2 IV: Recipient Volume on Feed Sentiment
A.3 IV: Recipient Volume on Sender Volume
C.1 Nonlinear Model Results

List of Figures

1.1 Locations of Recipients (top) and Senders (bottom)
1.2 Geographic Mean Sentiment of Recipients (top) and Senders (bottom)
1.3 Locations and Light Levels for Neighbors of User 516134500
1.4 Locations and Sentiment for Neighbors of User 516134500
3.1 Canadian Federal Electoral Districts
3.2 The District of London North, ON

Acknowledgements

The completion of a PhD requires the efforts of many individuals whose names do not appear on the title page. But even by those standards, I feel I am unusually dependent on support from others. My studies were interrupted partway through by a serious medical incident. So while the completion of this thesis was a considerable endeavour that required the assistance of many mentioned below, recovering my health was a necessary condition for completion and was equally challenging. I’m indebted to the following individuals:

My supervisors, Francesco Trebbi and Patrick Francois. I’m especially grateful for how they responded while I was on medical leave. They never pressured me to return before I was able. When I did come back, they helped me return at my own pace. Had I been less fortunate in my choice of department and supervisors, I would have been forced out of the program years ago. On a less personal note, Francesco, Patrick, and Thorsten Rogall provided outstanding support during the completion of this document.

Maureen Chin, our Graduate Program Advisor. She probably had to file more paperwork for me than for any other grad student in the history of the program, and never once complained or made a mistake.

The team who helped me heal during my leave period, and helped me stay productive for the remainder of my studies: David Bailey, Susan Barlow, Maya Bleiler, Bev Kosuljandic, Lisa Montesi, Deborah Swan, Adrienne Wang, and Wendy Woodger.

The cohort that adopted me on my return: Joao Fonseca, Alastair Fraser, Brad Hackinen, Nouri Nijjar, Jose Pulido Pescador, Ian Snoddy, and Ruoying Wang.

My other classmates at UBC: Nathan Canen, Jacob Schwartz, Rogerio Santarossa, Hugh Shiplett, Gaëlle Simard-Duplain, Adlai Newsom, Neil Lloyd, and Tímea Laura Molnár.
All provided encouragement and friendship when it was needed most.

Siwan Anderson, Matilde Bombardini, Vadim Marmer, Kevin Song, and Munir Squires, who provided suggestions and comments on my research, and on preparing for the job market.

Anke Kessler, my first coauthor. She taught me a great deal about how to write a paper and was patient and generous in doing so.

My parents, Claudia and Gordon, my sister, Talia, and my in-laws, Bob, Davey, Gavin, Karen, and Pat. I’m more grateful than I can say for your support, both emotional and financial, and your encouragement.

Finally, my wife, Leanna. Your unfailing belief in me, your wisdom, and your equanimity sustained me throughout this process. May I give you lighter burdens to carry in the future.

For Leanna

Introduction

The chapters of this thesis cover somewhat different fields in economics. Nevertheless, they share three important similarities. First, they are all concerned with causal estimation on large observational datasets. Second, they introduce identification strategies that exploit particular features of the datasets that would not be possible in more aggregated data. Third, all three chapters examine the transmission of information (in a broad sense) in non-market settings.

Chapter 1 estimates how an individual’s expressed sentiment responds to messages from their neighbors on social networks. I use machine learning to code messages for expressions of one type of extensively studied and readily quantifiable sentiment: happiness. Because network link formation is not random, I use exogenous shifters to instrument for the message volume of each of a user’s neighboring nodes. Specifically, I interact neighbor daylight with average neighbor sentiment, and aggregate this across neighbors to construct an instrument for the content of viewed messages (their feed). A user with neighbors in different places with different average sentiment receives a shock to their feed when light levels differ across those places.
This instrument is broadly applicable to estimating social interactions on online social networks in the presence of endogeneity. I also present other suitable exogenous shifters for sender behavior. I find that a user’s happiness increases by 3.4 percentage points when the happiness of their feed increases by 10 percentage points. In addition, users become 4.7 percentage points happier when the number of messages they receive doubles, holding content constant. I compare the magnitude of these effects with estimates from the literature on the effect of income on well-being.

Chapter 2 presents a general framework for estimating the causal effect of social interactions on online social networks. This context presents three challenges for causal estimation. First, there is endogeneity. This problem is largely addressed in Chapter 1, but Chapter 2 presents a more general formulation and discusses an additional source of endogeneity: recipient selection into sending messages. Second, these networks are dynamic: users are affected not only by contemporaneous messages but also by past messages. Third, some data is missing. Social networks have relatively low levels of clustering, which implies that it is computationally infeasible to collect all of the neighbors of a sample of the network of any reasonable size. In addition, there is a censoring problem: users’ responses to messages are not observed when they do not themselves send messages.

This chapter makes two contributions. First, I introduce an estimation strategy for addressing these three challenges. Second, I construct six new instruments within this framework, based on temperature, precipitation, local time, a user’s historical pattern of tweeting throughout the day, and the long-run trend in a user’s message volume. I compare the strength of these instruments. These two contributions are complementary in two ways.
First, some of the methodological innovations are necessary preconditions for making meaningful comparisons between the instruments. Second, some of the methodological challenges are most readily solved by multiple instruments.

Chapter 3 investigates the practice of voter demobilization: efforts by one party in an election to reduce the turnout of their opponent’s supporters. We develop a method for estimating the impact of voter demobilization efforts on voter turnout. In multi-district elections, estimates of the effect of demobilization efforts on district-level turnout will be biased. This is because both the likelihood of demobilization and turnout are increasing in the perceived closeness of the race, and this is only imperfectly observed. We exploit two facts: a) demobilization is typically targeted to avoid the supporters of the intended beneficiary, and b) voting results are available at the sub-district (poll) level. Closeness of the race and other omitted variables, such as candidate quality, will generally be constant across a district, while the impact of the violations will be decreasing in the level of support the beneficiary enjoys. Our method is general in the sense that it does not require a natural experiment and is robust to countrywide shifts in voter support (“swing”). We apply this within-district approach to allegations of fraud in the 2011 Canadian federal election, and estimate that illegal demobilization efforts reduced turnout by 3.9% in affected districts, or approximately 1,800 votes per district.

Chapter 1
Estimating the Diffusion of Sentiment Through Networks

Whatever is the passion which arises from any object in the person principally concerned, an analogous emotion springs up, at the thought of his situation, in the breast of every attentive spectator.
– Adam Smith, The Theory of Moral Sentiments

1.1 Introduction

This chapter explores the process by which sentiment diffuses through social networks and provides estimates of the parameters governing such diffusion. 81% of Americans have at least one social media account (Greenwood et al., 2016), and the average social media user spends 2.15 hours per day on these networks (Mander, 2016). Diffusion of information through networks is a widely studied topic in economics, both theoretically[1] and empirically.[2]

[1] See Acemoglu et al. (2010) and Jackson (2010), and the references therein.
[2] For a sense of the breadth of fields where information diffusion has been investigated, see Halberstam and Knight (2016) for transmission of information on Twitter, Banerjee et al. (2014) for the effect of diffusion of information on economic opportunities, and Trebbi and Weese (2015) for evidence of coordination among insurgent groups in Iraq and Afghanistan. Mokyr (2007) discusses the importance of the “Republic of Letters,” a network of Europe’s brightest minds, in creating the conditions for the Industrial Revolution.

This chapter focuses instead on diffusion of emotion. I follow Rick and Loewenstein (2008), who divide emotions into two categories: integral (emotions generated by the decision or context at hand) and incidental (emotions generated by unrelated decisions or contexts). While integral emotions are recognized as playing a key role in the decision-making process (Damasio, 1994), this does not imply they are relevant for economists. For example, an individual confronted by an armed group at a voting station may feel fear and run away, but it is not necessary to precisely measure the voter’s fear response to estimate the deterrent effect that armed groups have on voter turnout.

However sound this logic may be for integral emotions, it does not hold when emotions are incidental.
For example, if a voter’s decision to participate is affected by the performance of her favorite sports team (Healy et al., 2010), understanding the sources of incidental emotions becomes relevant for understanding outcomes of interest to political economists (in this case, who wins the election).[3]

[3] Other notable examples of papers demonstrating the existence of incidental emotions include Card and Dahl (2011) on the effect of sports team performance on domestic violence; Reifman et al. (1991) on the link between temperature and violent behaviour; and Chen and Loecher (2016) on the effects of both sports and weather on judicial decisions.

While there are many possible sources of incidental transmission, economists since Smith (1822) have argued that social networks are a key source of incidental emotions. Durkheim (1912) and Canetti (1962) argue that not only are social networks an important source of incidental emotions, but that this transmission is a critical factor in the organizing of one type of social network, crowds, where a large number of interpersonal interactions take place at high frequency. The power of these emotionally charged crowds has been recognized since ancient Greece and Rome (by Plato and Livy, respectively; see Plato et al. (1988) and Livy (1960)). They have been key catalysts for some of the most important political revolutions.[4] Although crowds are now more likely to form online, they continue to be politically relevant today (e.g., the Arab Spring (Howard et al., 2011) and the Orange Revolution in the Ukraine (Beissinger, 2013)).[5]

[4] Crowds have played an important role in the American (Nash, 2009), French (McClelland, 2010), and Haitian (Fick, 1990) revolutions, among many others.
[5] One way to interpret the model of political transitions in Acemoglu and Robinson (2001) is that there has been a move to institutionalize the power of the general public in ways (such as regular elections, secret ballots, and jury trials) that are less uncertain and less volatile than crowds threatening revolution. These institutions are also less amenable to emotional transmission. One effect of online social networks is to create a channel through which “the power of the street” (Acemoglu et al., 2014) can be exercised without these constraints.

Despite their importance, there has been little empirical research done on emotional transmission in crowds or other social networks. This is partly due to practical difficulties in measuring transmission. While researchers have developed techniques for measuring emotions in the lab (Adam et al., 2011), it has been impossible to extend these to in-person social networks.[6] And while social networks have produced written communications amenable to sentiment analysis, they have historically done so on a time scale that is likely too long to detect emotional transmission.

[6] There has been considerable controversy over whether emotions can in fact be measured at all. Scherer (2005) defines emotions to include both feelings (the internal experience of an emotion) and verbal expressions and physical responses to stimulus. While it is impossible to measure the transmission of feelings without relying on self-reports, it is possible in principle to measure transmission of their external manifestations (or lack thereof).

This paper makes three contributions. First, I introduce a general instrument for identifying social interactions on online social networks. There are two econometric challenges I must overcome to consistently estimate these social interactions: reflection[7] and homophily.[8] These challenges are well recognized in the literature on identifying peer effects online (Shalizi and Thomas, 2011), and in the social interactions literature more generally (McPherson et al., 2001).

[7] Reflection refers to a situation where Alice responds to Barbara’s action by taking some action, which Barbara responds to, which Alice then responds to again. This leads to inflated estimates of the effect of Barbara’s actions on Alice.
[8] Homophily is a common property of networks with endogenous linking: people are more likely to befriend individuals with whom they share characteristics (Jackson, 2010). Bollen et al. (2011a) show this is a property of online social networks for sentiment, a finding I confirm in my data. Halberstam and Knight (2016) show it also holds for political affiliation. Note that homophily cannot be dismissed simply by looking at within-user variation because users likely share time-varying unobservable characteristics (e.g., correlated responses to specific types of news events) as well as time-invariant ones.

I proceed by identifying factors that a) shift the volume of messages social network users send, b) vary over time and across space, and c) are unrelated to the contemporaneous characteristics of the network. There are a number of possible candidates: local time, daylight, temperature, and precipitation. All of these factors affect the number of messages that users send.[9] I focus on daylight because it varies for all users and has a quantitatively large impact on the number of messages sent. My measure of daylight is whether the sun is above the horizon for a given user at a specific time of day.[10]

To see how changes in message volume can produce changes in the average content of the messages a user sees (their feed), consider two users, Alice and Barbara, in the same city. Suppose one wants to estimate how they respond to messages that express positive sentiment. Alice follows one positive sender in Los Angeles and one neutral sender in New York. Barbara follows one positive sender in New York and one neutral sender in Los Angeles. On average, sunrise in New York will be two hours and 57 minutes earlier than in L.A.
In the mornings, Barbara will see more positive messages than Alice, and in the evening this will reverse.

[9] People send more messages when it rains and when temperatures are uncomfortably hot or cold, likely because bad weather reduces the attractiveness of their outside options: the term “outside option” has an especially literal meaning in this context. These effects are strongly statistically significant, but substantially smaller in magnitude than those for daylight.
[10] This does not imply the relationship between daylight and tweet volume is causal: light is correlated with local time and may be correlated with weather. However, I do not need to estimate the causal effect of light on tweet volume of the senders of messages in order to use it as an instrument, provided that sender light and the variables it partly proxies for only affect recipients through sender behavior.

I employ a specification in which several dimensions of unobserved heterogeneity are accounted for, through the use of user fixed effects, time fixed effects, and controls for the light level of the recipient and the average of the light levels of the senders a recipient follows. I explain the importance of these controls in greater detail in Section 1.4.

For this instrument to be valid, it must be the case that changes in recipient sentiment are unrelated to the changes in sender sentiment caused by daylight. For example, suppose recipients who were systematically happier earlier in the day were more likely to follow senders who sent more positive messages in the morning, as compared to recipients who were happier in the evenings. Suppose further that the pattern were reversed for both types of recipients in the evening. If the differences in message volume between the two types of senders were caused by daylight, the exclusion restriction would be violated.

Because linking decisions are endogenous, the exclusion restriction can be interpreted as a limitation on how users form links on the network. Any one of the following restrictions will be sufficient: 1) When recipients choose to follow a sender, they lack information about the time profile of the sender’s future messages. 2) The recipient’s preferences over the sentiment of the messages they receive are independent of within-day changes in their own sentiment. 3) The recipients fail to appreciate that the preferences of their future self for message positivity vary predictably over time. However, if recipients are simply more likely to follow senders who tend to tweet when the recipient experiences daylight, regardless of the sender’s sentiment level, this will not present a problem.

Two features of this instrument are worth highlighting. First, there is variation in the instrument for almost all users, and all days. Of course, the structure of the network means that some users have more variable feeds than others, but the estimates I obtain are not identified from the behavior of a small sub-population or an isolated event. This means the local average treatment effect I identify is likely to be close to the average treatment effect for the population. Second, while I use this instrument to investigate the transmission of happiness, the instrument is not specific to this emotion, or even to emotions in general.
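
The Alice and Barbara example can be made concrete with a small numerical sketch. This is an illustration only, not the estimation code used in the chapter. The sender names, the sentiment values (0.8 for a positive sender, 0.5 for a neutral one), the function names, and the choice of a simple mean as the aggregator across followed senders are all assumptions made for this example. The only element taken from the text is the structure of the instrument: a daylight indicator for each sender, interacted with that sender's average sentiment, aggregated over the senders a recipient follows.

```python
from statistics import mean

# Hypothetical long-run average sentiment of four senders (illustrative values).
AVG_SENTIMENT = {
    "la_positive": 0.8,
    "ny_neutral": 0.5,
    "ny_positive": 0.8,
    "la_neutral": 0.5,
}

# Who each recipient follows, mirroring the Alice and Barbara example.
FOLLOWS = {
    "alice": ["la_positive", "ny_neutral"],
    "barbara": ["ny_positive", "la_neutral"],
}


def daylight(ny_sun_up, la_sun_up):
    """Daylight indicator (1 = sun above the horizon) for every sender."""
    return {
        sender: ny_sun_up if sender.startswith("ny_") else la_sun_up
        for sender in AVG_SENTIMENT
    }


def feed_instrument(recipient, light):
    """Aggregate daylight x average sentiment over a recipient's senders.

    Using the mean as the aggregator is an assumption of this sketch.
    """
    return mean(light[s] * AVG_SENTIMENT[s] for s in FOLLOWS[recipient])


morning = daylight(ny_sun_up=1, la_sun_up=0)  # sun is up in New York only
evening = daylight(ny_sun_up=0, la_sun_up=1)  # sun is up in Los Angeles only

for label, light in (("morning", morning), ("evening", evening)):
    print(label, {r: feed_instrument(r, light) for r in FOLLOWS})
```

Under these assumed values the instrument is higher for Barbara in the morning and higher for Alice in the evening, even though both recipients are in the same city and follow senders with the same mix of average sentiment. That within-day reversal is the variation the text describes.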
The same procedure can be used to estimate the impact of greater exposure to any observable characteristic of senders or of messages.[11]

[11] In another paper, I use the same instrument to estimate how white supremacist supporters respond to messages that express anger or discuss violence.

The second contribution of the paper is to use the above instrument to estimate whether individuals are positively or negatively affected by the sentiment of their neighbors on the network. Negative effects are in principle possible if users are motivated by envy (Krasnova et al., 2013; Appel et al., 2015), schadenfreude, or “fear of missing out” (Kim et al., 2009). This question is of intrinsic interest because it provides a measure of the likely stability of observed outcomes on the network. Large, positive parameter estimates imply that small changes in external conditions will produce large changes in observed behavior on the network.

I collect data from the Twitter API[12] covering over 5,000 recipients, over 10,000 senders, over 60 million messages, and over a billion recipient-sender interactions. I focus on a particular dimension of sentiment, the degree of happiness users express in their messages. This is the most studied dimension of sentiment in the literature and also by far the most commonly expressed one on social media. I measure happiness by using machine learning to code words in message text by their similarity to the word “happy”,[13] as measured by the probability of co-occurrence. This is similar to the approach taken by other papers that extract sentiment from social network messages.[14] These approaches have been shown to produce results consistent with survey-based questions that ask respondents about their current level of happiness.[15]

[12] An Application Programming Interface (or API) is a web service that allows for data collection without manually loading the corresponding webpages in a browser and saving the contents. The Twitter API is described in detail at https://developer.twitter.com/en/docs.html. Retrieved July 6, 2018.
[13] This approach assumes that expressions of sentiment in messages are comparable across messages and users, and that message content is fundamentally honest. If some users are sarcastic and become more negative and more sarcastic when exposed to positive messages, this method of interpreting messages will fail. However, this approach is not different from what has been done in a multitude of other studies that use sentiment calculated from text analysis as a dependent variable (see examples in the following footnote). More work is needed on showing the degree to which individuals honestly report their emotional states in social network messages, but this is outside the scope of this thesis.
[14] Bollen et al. (2017) show that average sentiment on Twitter is closely correlated with stock market returns. Baylis (2015) and Baylis et al. (2017) show that weather that is generally considered to be unpleasant (e.g., rain, high temperatures) causes social media users to express less positive sentiment in their messages.
[15] Using Turkish data, Durahim and Coşkun (2015) show that subnational variation in the sentiment expressed in tweets is closely correlated with subnational variation in offline survey measures of well-being.

I find that positive expressions of recipient sentiment increase by 3.4 percentage points when the sentiment expressed by incoming messages increases by 10 percentage points. The magnitude of this effect is quite stable across alternative specifications. To the best of my knowledge, this is the first paper to estimate emotional transmission on social networks from observational data in a way that accounts for reflection and homophily in unobservables with time-varying impact.

These estimates contribute to a small literature on online transmission of emotions.
A number of papers investigate this issue using cross-sectional¹⁶ or panel data,¹⁷ but these approaches fail to address endogeneity. Weinstein (2017) does so by using a lab experiment to investigate the emotional consequences of participating in social media, and finds that it worsens the user’s emotional state. The challenge with lab experiments is that they effectively require the researcher to specify the counterfactual: what social media users would do if they were not on social media.¹⁸ Two papers avoid this issue by using field-experimental (Kramer et al., 2014) or quasi-experimental (Coviello et al., 2014) variation in viewed content to identify the degree of sentiment transmission, but do not agree on the effect size, and do not distinguish between volume and content effects. I contrast my work with these papers in more detail in Section 1.2.

¹⁶ For example, Pittman and Reich (2016) conduct their own survey and find generally positive effects from social media.

¹⁷ Kraut et al. (1998), Kross et al. (2013), and Shakya and Christakis (2017) find negative effects of social network use on well-being in panel data.

¹⁸ There is a broader literature on offline emotional transmission. For example, Eisenberg et al. (2013) show that college students who are assigned depressed roommates are more likely to become depressed. Fowler and Christakis (2008) show that an additional happy friend can increase a person’s own happiness by as much as 25%. However, the papers in this literature have similar methodological issues to those that investigate transmission of emotions in online settings.

The third contribution of this paper is to estimate the total effect of Twitter on expressions of positive sentiment by its users. I combine the above approach for estimating how users react to emotionally laden content from peers with estimates of how users react to emotionally neutral content. As any participant in the academic peer review process can attest, content that itself lacks an expression of emotion can nonetheless provoke strong emotional responses in the recipient. I estimate how recipients respond to changes in the volume of messages they receive, holding message content constant. The logic of the instrumental variables strategy is very similar to that for sentiment: I instrument for sender volume using average sender light levels. Since I control for message content, this can be interpreted as the average emotional response to the non-emotional content of the messages. I find quantitatively small, but statistically significant, positive effects of sender message volume on recipient sentiment. A one standard deviation increase in the number of messages received causes an increase in expressed positive sentiment of 0.13 standard deviations. These estimates are obviously specific to the content in question. However, these aggregate estimates are useful not only for estimating the total effect of network participation on sentiment, but also as a baseline of comparison for future work on estimating emotional transmission for specific topics. An important limitation of this approach is that I do not have estimates of the prevalence and degree of transmission of negative emotions. I discuss this in more detail in Section 1.5.1.

With these estimates, I compare observed data to a counterfactual where all of a recipient’s links are removed from the network graph, so the user no longer receives any messages. I remove the estimated effect of sender sentiment from observed recipient sentiment. I then subtract from this the increase in sentiment due to the volume of incoming tweets. I find that destroying all recipient links would decrease recipient expressions of positive sentiment by 10.6%. According to estimates from Stevenson and Wolfers (2013) for rich countries, a 10.6% decrease in happiness is equivalent to a decrease in GDP of 22%.
When multiplied by the number of US Twitter users and the fraction of time spent on the network, this implies that the total annual social benefit from Twitter in the US alone is $21 billion, or 1.4 times Twitter’s current market capitalization.

The rest of the paper is organized as follows: Section 1.2 differentiates this paper from previous work that estimates diffusion of emotions. It also discusses links to other economic research that involves social media use and sentiment. Section 1.3 discusses the structure of Twitter, the advantages it has for the questions I ask, and my data collection procedure. Section 1.4 provides a detailed discussion of the identification strategy. Section 1.5 presents the results, and Section 1.6 concludes.

1.2 Related Work

1.2.1 Previous Work on Estimating the Transmission of Sentiment

This paper is most closely related to two papers that estimate the effect of peer sentiment on an individual’s own sentiment on online social networks. The first is an experiment conducted by Kramer et al. (2014). The researchers perturbed the Facebook algorithm for selecting user feeds and slightly decreased the proportion of happy messages users received. They found that 53% of the decrease in positive peer messages is passed through to users.¹⁹ Since the experiments manipulated message content but not volume of messages, the authors cannot report volume effects.

This research was extremely controversial (Kahn et al., 2014). A number of individuals and institutions objected to the appearance that Facebook was conducting research without subject consent. Facebook apologized for the way the research was communicated, and the journal issued an “Editorial Expression of Concern”. Given the controversy, it seems unlikely that Facebook or other social media companies will have much interest in the publication of similar research in the near future. Progress on these questions will require other approaches.

Coviello et al. (2014) represent such an approach.
They use rainfall as an instrument for sender sentiment (people are less happy when it rains). After aggregating users to the city level, they find that a 10% increase in sender sentiment increases recipient sentiment by 15%. Unfortunately, there are two probable violations of the exclusion restriction. First, because the authors aggregate their data to the city-day level, their estimates are contaminated by reflection: without knowing how often users communicate with one another (how many reflections there are), the magnitude of the estimated coefficients is not obviously interpretable. Second, a user’s Facebook network is correlated with their offline social network. Since emotions likely transmit through the offline network, this violates the exclusion restriction. Estimating these effects on Twitter avoids this problem because users are much less likely to have offline relationships with their online neighbors. Also, since rainfall affects both message volume and message sentiment, the resulting estimates have to be interpreted as a combination of these two treatments, whereas I am able to separate these two effects.

¹⁹ They did not report the peer effect specifically but it can be reconstructed from the data they did report.

1.2.2 Investigating the Impact of Online Social Networks

More generally, I contribute to a growing literature in economics that examines the consequences of online social networks using two approaches.

The first approach, and the one taken in this chapter, is bottom-up. The papers in this literature pinpoint a specific feature of online social networks and analyze how that aspect of the network functions. This includes papers that examine how information is transmitted over social networks. Halberstam and Knight (2016) show that political information spreads faster in larger groups on Twitter. They also show that Twitter users are more likely to follow one another if they share political views (homophily).
There is also a large empirical literature in computer science that studies information diffusion on social networks. This literature is largely descriptive in nature, although Goel et al. (2015) argue that the observed pattern of retweets is consistent with homophily.

A related set of papers use data from social networks to measure high-frequency time variation in sentiment, as either a dependent or independent variable. Bollen et al. (2011b) show that moods expressed on Twitter predict stock market performance. Baylis (2015) and Baylis et al. (2017) estimate how changes in weather affect well-being, and apply the results to forecasting the hedonic impacts of climate change.

The second is a top-down approach. The papers in this literature examine an online social network in its entirety in a relatively specific setting and measure offline effects. Acemoglu et al. (2014) show that aggregate social network activity predicted street protest during the Arab Spring in Egypt. Qin et al. (2016) show that social media posts are locally predictive of both public protest and corruption charges against public officials in China. Enikolopov et al. (2016) exploit quasi-random variation in the market penetration of Russia’s leading social network to show that online social networks cause increases in street protest, and they may also increase support for the incumbent party. In related work, Enikolopov et al. (2016) show the patterns of protest are consistent with social signaling motivating protest participation. More generally, Manacorda and Tesei (2016) show that the provision of mobile phone service in Sub-Saharan Africa increases the sensitivity of protest to economic conditions and enables coordination.

The advantage of the top-down approach is that it provides information about the types of offline outcomes that social networks can produce. However, there are many online social networks and they differ substantially in terms of demographics, scale, and user interface.
Furthermore, on any given network, these characteristics are evolving rapidly over time. It is difficult to know what outcomes of social networks are generalizable and which are not. The advantage of the bottom-up approach is that it allows for parameterization of specific features of the network so that the consequences of changes to those parameters can be inferred.

Finally, by specifically estimating the hedonic value of Twitter, I contribute to a literature that aims to estimate the consumer surplus created by online activity. Following Goolsbee and Klenow (2006), this literature generally uses time use to measure the value of internet services, because the prices of the goods are very low relative to the amount of time spent using them. Brynjolfsson and Oh (2012) separately estimate the value of different internet services. They estimate Twitter created $1.1 billion of consumer surplus per year in the United States. The difference between their figures and those reported in this paper is likely because Twitter grew by approximately ten times between the respective data collection periods.

Valuing social networks through time use may be even more challenging than valuing other internet services, such as Wikipedia or Google Maps. First, time use surveys may struggle to measure time on social networks because it is often a secondary activity. For example, users often log in to Twitter while watching TV or sporting events to comment in real-time on those other events, or for brief periods at work. This would bias time-use estimates downwards. Second, participation in any social network is strongly strategically complementary (if no one else is a member, then participation is useless). If there are negative externalities to non-participation, people may spend time on the network despite the fact that they do not enjoy doing so.
This would bias time-use estimates upwards.

1.3 Description of the Data

1.3.1 Data Source

The data source for this paper is messages collected from Twitter. Twitter is a popular social network, with 52 million users in the United States (Greenwood et al., 2016).²⁰ Twitter messages are referred to as tweets and are short: 140 characters or fewer.²¹ The messages may include images or links to external sites. Users decide what messages they see by choosing to follow other users. For example, Alice may choose to follow Barbara but this doesn’t imply that Barbara will receive messages from Alice.

In brief, Twitter functions as follows: when users log in to Twitter, they see the messages of the people they follow in chronological order (their feed). Users can create their own messages or forward messages they see to their own followers (retweeting). Users can also reply to messages. If they do so, they have the option of making the reply visible to all their followers, or only to those followers who follow both them and the sender of the original message. Messages may be tagged with hashtags and/or geotags. In addition to looking at messages in their feed, users can search for messages on a particular topic, either by hashtag or with a keyword search.

²⁰ It was the second or third most popular network during the sample period, behind Facebook and, at times, Instagram.

²¹ Twitter worked this way during my period of analysis. Users now have the ability to send 280-character tweets.

For research purposes, Twitter has a number of advantageous features. First, the asymmetric nature of relationships on Twitter means that a very small fraction of users produce a large share of the viewed content. This makes it feasible to analyze the content produced in a way that would not be possible if the content creation was more decentralized. It also means that many of the links on Twitter are between people who have no offline relationship.
Potentially, this increases the diversity of views to which people are exposed, thus increasing the range over which responses can be identified. Second, the shortness of the messages means that each message is similar in terms of the amount of informational content. Finally, the shortness of the messages and the general lack of personal connections between sender and recipient mean that this is quite a demanding environment in which to establish emotional contagion. The estimates obtained are likely to be lower than those one would find on family networks, or between close friends.

1.3.2 Data Collection Procedure

Approximately 68 million Americans, or 21%, have a Twitter account.²² In collecting the data, my goal was to collect the most representative sample of these users possible,²³ subject to two limitations. First, I need to be able to locate users to determine the value of the instrument for them. Second, Twitter limits the number of requests that can be made to its API, and so I need to choose the subset of the data that is likely to have variation in the feed content and expressed sentiment.

²² Washington Post, Jun 27 2017, Twitter lost 2 million users in the U.S. last quarter, retrieved July 15, 2018.

²³ These users tend to be younger and somewhat more concentrated in urban areas than the general population (Mislove et al., 2011). I do not attempt to reweight the accounts in my sample to match the US averages because Twitter does not provide any demographic covariates for its users, although they could in principle be inferred based on language use and other characteristics of speech.

I collect data from the Twitter REST API in three stages. In the first stage, I draw a random sample of Twitter accounts. I do this by exploiting the fact that all Twitter users are assigned a unique ID.
Furthermore, before September 15, 2015, these numbers were ordered by account creation date, and ranged between 1 and 3.5 billion (after this date, much larger numbers are used). However, not all numbers in this range are associated with a valid account. I queried the Twitter API with a random 1/1000 sample of integers between 1 and 3.5 billion. I stratified this query by thousands: one number queried between 1 and 1000 inclusive, one number queried between 1001 and 2000 inclusive, and so on. Of these, 37%, or 1.3 of 3.5 million, were assigned to an actual Twitter account. These requests return each Twitter user’s profile, including information on the user’s language, total number of messages sent, number of followers, and number of accounts followed. I drop all non-English speaking accounts, leaving 691,811 accounts. I also drop all accounts that had created fewer than fifty tweets: if users do not create content, I cannot evaluate their sentiment. This leaves 112,394 accounts. I drop accounts that follow fewer than five accounts, on the grounds that they do not appear to be interested in constructing a feed and therefore are unlikely to pay much attention to it. I drop accounts that follow more than 5,000 accounts, because their feeds are so large they are likely not paying attention to them either. This leaves 87,962 accounts.

In the second stage, I request from the API the most recent 200 messages from the remaining users. I drop any account that does not have at least one tweet with a geocoded location. This leaves 14,370 accounts.²⁴ In the third stage, I collect all tweets from the remaining users, up to 3,200 each.²⁵ I also collect the list of people this sample follows, and any places geotagged in their messages. Tweets sent before January 1, 2011 are dropped (because of the exponential growth of Twitter, observations are very sparse before this point).

²⁴ It would be simpler to request a list of all the users that satisfied these criteria, including having at least one geocoded message within their most recent 3,200, and randomly sample from that list. The Twitter API does not allow for such requests. I have to make rate-limited requests of user profiles in order to get the information I need to make (more tightly) rate-limited requests for candidate users’ most recent 200 messages, which allow me to make (even more tightly) rate-limited requests of the most recent 3,200 messages for the subset of the candidate users who have a geocoded location in their first 200 messages. This sequential process allows the data to be collected in approximately two months.

²⁵ Twitter limits the number of messages that can be downloaded for any user to the most recent 3,200. In some cases I have collected more than 3,200 messages due to repeated sampling.

Twitter does not provide direct access to a user’s feed. In the third stage of data collection, I reconstruct the feeds of the recipients by collecting user profile data for all users who are followed by more than one of my users. A measure of influence is then constructed by multiplying these users’ tweet volumes by their number of followers, and data for the most influential 20,000 users is collected. Weighted by views, these users account for 48% of the total viewed content of my sample (although only 18% of total viewed content could be collected due to rate limiting).

1.3.3 Locating Users

To construct the instrument, which is based on daylight, I must have an estimate of each sender’s location (I also require an estimate of the recipient’s location to construct controls). To do this, I extract the geotagged locations from each user’s tweets. Each geotagged location is either a bounding box²⁶ or a point.

²⁶ A bounding box is an area on the earth’s surface defined by minimum and maximum latitude and longitude coordinates.

In principle, I could estimate a separate location for each user-time bin, using only the messages from that time bin. This would allow for users moving from one location to another. However, the vast majority (>95%) of user messages are not geotagged, and therefore the vast majority of user-time bins would not have geotagged information. Even if all messages were geotagged, I would still need to estimate a location in the periods where no messages were sent in order to calculate the instrument. I proceed by interpreting each user’s reported locations as i.i.d. draws from the distribution of time spent by that user in different locations. This distribution can be different for every user, but I assume that for each user it does not change over time.

I concede this is a strong assumption. It allows for an individual who divides their time between two cities, flying back and forth, but it cannot accommodate a situation where an individual first lived in one city and then moved to another. However, if I estimate user locations in a way that is consistent with this assumption (following the procedure outlined below), violations of the assumption will only produce noise and reduce the power of the instrument.

To proceed, I match the geotagged locations to a 0.75° × 0.75° grid. Grid squares range in size from 1,176 square miles to 2,031 square miles, depending on latitude.²⁷ For point locations, I assign them to the closest grid square. For bounding boxes, I divide the box among the overlapping locations and weight by the population of the intersection.²⁸ This means that my estimate of a tweet’s location is a 1,577-element vector, with one element for each grid square in the Continental US. Each element is an estimate of the probability that the user is in that grid square.
For each user, I then take the simple average of the vectors of the locations they tag in their tweets to obtain their location vector.²⁹

I drop all locations that match grid squares outside the Continental US. To speed computation, I assign all locations with a frequency of less than 5% a probability of zero. This drops only 4.5% of locations weighted by probability. However, it increases the speed of data construction by an order of magnitude, because it drops about 90% of all user-latitude-longitude observations.

²⁷ This choice of grid size is somewhat arbitrary, and larger than what would typically be used in empirical work. In this case, aggregating at this level loses very little of the variation in daylight. In addition, computational time in preparing the dataset is considerable and linear in the number of grid points. This grid was originally used because it is the smallest grid for which weather data is available from the ECMWF. See Chapter 2 for further information.

²⁸ Population counts are obtained from GPW v4, see Lloyd et al. (2017).

²⁹ There is a computer science literature on other strategies for locating users on Twitter. This literature focuses on predicting a user’s most likely location (classification) instead of predicting the distribution over all possible locations. Despite this, I’m currently working on an extension that applies some of these techniques to increase the sample. In addition, while it has not been pursued in this literature, I could estimate how persistent locations were for Twitter users in my sample, and use these estimates to impute a different distribution for each time period. For example, a user who was recorded as being in San Antonio three weeks ago, and in Boston today, would see their location vector change smoothly over those three weeks.

I drop users who do not have any recorded locations, or do not have locations within the Continental US. The final sample contains 7,177 recipients and 8,058 senders.
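The location-vector construction described in this subsection can be illustrated with a minimal sketch. The function name `location_vector`, the four-square toy grid, and the example probabilities are all hypothetical and chosen only for illustration; the actual grid has 1,577 squares. Following the text, squares below the 5% cutoff are set to zero and the vector is not renormalized afterwards.

```python
import numpy as np

def location_vector(tweet_vectors, min_prob=0.05):
    """Average per-tweet grid-square probabilities, then zero out rare squares.

    tweet_vectors: one row per geotagged tweet, one column per grid square.
    A point location is a one-hot row; a bounding box is spread over the
    overlapping squares (population-weighted, assumed precomputed here).
    """
    tweet_vectors = np.asarray(tweet_vectors, dtype=float)
    avg = tweet_vectors.mean(axis=0)      # simple average across tweets
    avg[avg < min_prob] = 0.0             # drop squares below the 5% cutoff
    return avg                            # deliberately not renormalized

# Toy example: one point location and one bounding box over three squares.
tweets = [
    [1.0, 0.00, 0.00, 0.0],   # point assigned to grid square 0
    [0.5, 0.42, 0.08, 0.0],   # bounding box split by population share
]
user_vec = location_vector(tweets)
print(user_vec)
```

Because the vector is left unnormalized, a user whose low-probability squares were dropped has a location vector that sums to less than one, which is consistent with the remark that the daylight measure for such users never reaches one.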
Finally, for computational feasibility, I aggregate the data to three-hour time intervals. The final dataset covers 1.052 billion (recipient, sender, date, three-hour bin) observations. Figure 1.1 shows the distribution of Twitter users in the two samples.

1.3.4 Calculating Daylight

To calculate the value of daylight for a user in a specific time period, I start at the grid square by time level. I calculate the fraction of time the sun was above the horizon in the three-hour time bin for the center of the grid square. I then aggregate this over all grid squares, weighted by the estimated probabilities in the user’s location vector. The value of daylight then ranges between zero and one for all users, and will reach its minimum and maximum values for each user every day.³⁰

³⁰ The maximum value will be less than one for users who report locations outside of the Continental US, or who have some locations that were initially assigned a probability of less than 5% and dropped (see Section 1.3.3). This ensures that the variance of the volume shifter across senders is proportional to the quality of information I have about their locations.

1.3.5 Measuring Sentiment

The basic problem I face is how to collapse the richness of variable-length text data into a single, real-valued measure of sentiment. A secondary problem is to do so in a way that the resulting measure has a clear interpretation beyond “large values express more positive sentiment.”

In principle, I could categorize messages as “happy” if they contained the word “happy”, and as “not happy” otherwise. However, messages that contained words that are closely related to happiness, such as “excited”, “ecstatic”, “joy”, etc. would be categorized as “not happy”. The resulting measure would not be completely uninformative but it would be very noisy.

To smooth out the data, I operationalize the measure of sentiment as the probability that the given message contains the word “happy” based on the other words in the message.
To do this, I need a measure of the probability of co-occurrence between each word in a message and “happy”. As Gentzkow et al. (2016) point out, there is a problem with using sample co-occurrence as estimates of population co-occurrence: as the sample size increases, so does the number of words, so there are always co-occurrences that are imprecisely estimated. To surmount this, I replace each word with a word vector, obtained from Pennington et al. (2014). These word vectors are trained on an extremely large corpus of text (2 billion tweets) in such a way that the co-occurrence probabilities are reasonably accurate even for relatively rare words. The resulting vectors are arranged in a 200-dimensional space so that the cosine similarity between two vectors is a reasonable estimate of the probability of co-occurrence of the corresponding words, adjusted for word frequency. This approach is becoming standard in text analysis, partly because words that have similar meanings, such as “happy” and “joy”, tend to have high probabilities of co-occurrence. This specific implementation is referred to as GloVe (Pennington et al., 2014).

Before converting words to vectors, I split each tweet into words and apply standard processing steps to each word. I convert emoji to words,³¹ collapse emoticons to those in GloVe, remove punctuation, convert to lower case, and split hyphenated words not in GloVe. Finally, I strip out the word “happy” from every tweet, as it is the outcome of interest.³²

For each tweet, I then take the simple average of these word vectors and train a random forest algorithm on the resulting data. Random forests are a popular ensemble classifier: they fit a large number of decision trees (200, in this case)³³ to the data. Each of the decision trees is fit on a subsample of the data that is sampled with replacement, using a subset of the explanatory variables. Each tree is a poor predictor of the outcome. However, the fitting error of each tree is independent across trees. Fitting many trees then averages this out.³⁴

³¹ For example, I replace the tears of joy emoji with five words: “face”, “with”, “tears”, “of”, “joy”. I do this because emoji do not appear in GloVe.

³² To constrain the complexity of the problem, I ignore pictures and links in the messages.

³³ Each decision tree is trained on a subset of the data, sampled with replacement, using a random subset of the total number of explanatory variables (in this case, vectors). Each decision tree estimates the probability that messages contain the word “happy” by dividing the data into a number of groups (eight, in this case). This is done in seven steps. At each step, one of the groups is split into two groups along one of the explanatory variables, in a way that maximizes the chi-square of the split. For example, a decision tree predicting wages might split on “female” and “male”, then might split the male group into “male and high school or less” and “male and some college or more”, and so on. At each level of the tree, the split can be on any variable within the subset and at any cutoff of that variable. The prediction of the outcome for each group is the average of values across that group. So if a decision tree resulted in a group of “female with some college or more younger than 65”, predicted income for individuals in that group would be the average of the sample for that group. On their own, decision trees are prone to overfitting; averaging across trees greatly mitigates this.

³⁴ For more information on random forests, see Breiman (2001).

Random forests perform quite well in comparisons of machine learning algorithms (Caruana and Niculescu-Mizil, 2006) and can be fit in parallel, because each tree is independent of the others. I obtain the predicted probabilities from each decision tree and take the average across all trees as my measure of sentiment. Finally, I bin the sentiment of messages from the same user and three-hour period together using a simple average. I do this to obtain a dataset organized at the user by date-three-hour bin level. In most cases, users send no messages in a three-hour period, and so the data is missing. I discuss how I address this in the next section. Figure 1.2 shows the distribution of sentiment across the United States. While senders are happier on average than recipients, they are heavily concentrated in urban centers, and those senders located outside of urban areas are less happy.

1.4 Empirical Strategy

For convenience, all variables along with their definitions are listed in Tables 1.1 and 1.2.

Ideal Estimating Equation

In the absence of endogeneity issues I would estimate how recipient sentiment is affected by feed sentiment and feed volume as follows:

\[
y_{rt} = \alpha_0 + \alpha_1 \underbrace{\frac{\sum_{s \in N_{rt}} n_{st}\, y_{st}}{\sum_{s \in N_{rt}} n_{st}}}_{\text{feed sentiment}} + \alpha_2 \underbrace{\frac{\sum_{s \in N_{rt}} n_{st}}{|N_{rt}|}}_{\text{feed volume}} + \varepsilon_{rt} \tag{1.1}
\]

User $r$’s sentiment, $y_{rt}$, may be affected by the messages of the set of users $r$ follows, $N_{rt}$, through two channels.³⁵ The first channel is the weighted average of the sentiments expressed by the senders followed by user $r$, weighted by $n_{st} = \ln(T_{st} + 1)$, where $T_{st}$ is the number of tweets sent by user $s$ at time $t$. The concave function captures the idea that repeated messages from the same user likely have diminishing marginal impact. The second channel is the weighted volume of messages received by user $r$ at time $t$. The weighted volume is normalized by the number of senders a recipient follows. I normalize in this way because users who have chosen to follow many accounts are likely less sensitive to new messages than users who have chosen to follow a small number of accounts. The $t$ subscript accounts for the fact that feed sizes vary over time.
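As an illustration of the two regressors in equation 1.1, the following minimal sketch computes feed sentiment and feed volume for a single recipient in a single period. The function name `feed_measures` and the toy numbers are hypothetical. Senders who sent no tweets receive a weight of $\ln(1) = 0$, so their (missing) sentiment does not enter the weighted average; handling them by substituting zero before weighting is an implementation choice made here, not something stated in the text.

```python
import numpy as np

def feed_measures(tweet_counts, sentiments):
    """Feed sentiment and feed volume for one recipient-period.

    tweet_counts: T_st for every sender s in the recipient's feed N_rt.
    sentiments:   y_st for the same senders (NaN when T_st == 0).
    """
    T = np.asarray(tweet_counts, dtype=float)
    y = np.asarray(sentiments, dtype=float)
    n = np.log(T + 1.0)                      # concave weight n_st = ln(T_st + 1)
    y_safe = np.where(n > 0, y, 0.0)         # silent senders carry zero weight
    total = n.sum()
    feed_sentiment = (n * y_safe).sum() / total if total > 0 else np.nan
    feed_volume = total / len(T)             # normalized by feed size |N_rt|
    return feed_sentiment, feed_volume

# Toy feed with three senders: one silent, one with 1 tweet, one with 3 tweets.
fs, fv = feed_measures([0, 1, 3], [np.nan, 0.2, 0.6])
print(fs, fv)
```

In this toy feed the sender with three tweets gets exactly twice the weight of the sender with one tweet (since $\ln 4 = 2 \ln 2$), rather than three times, which is the diminishing marginal impact that the concave weighting is meant to capture.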
This is because recipients may follow senders whose accounts were created after their own.

OLS estimation of the parameters in this equation will be inconsistent for two reasons: reflection and homophily in time-varying unobservables. Reflection occurs when users $r$ and $s$ follow each other. Changes in $s$’s behavior affect $r$, which in turn affects $s$, and so on. While this is less of a concern in Twitter data because over 90% of the links in my data are not mutual, the instrument accounts for it. Homophily in unobservables with time-varying impact is a more serious concern. For example, two economists who are connected on Twitter may be very interested in the Nobel Prize announcement. If they both agree with the judgment of the committee, both will send more happy messages when the prize is announced. This will be true even after conditioning on user and time fixed effects. But it is unclear whether the recipient is happier because they received positive messages from the sender, or because they both received a similar shock.

³⁵ In principle, one would also like to estimate coefficients on a polynomial expansion of these terms, with the order of the expansion going to infinity with the sample size. As the instruments proposed below lack the power to identify these higher-order terms, I omit them here for simplicity.

To address this, I instrument for the volume of messages each recipient receives with the average light levels of the senders that recipient follows. Specifically, the shock to sender $s$ at time $t$, $z_{st}$, is the probability that it is daylight in the location of the sender at that time. In all specifications, light increases the number of tweets sent: people are more active during the day.

\[
n_{st} = \delta_0 z_{st} + \mu_{st} \tag{1.2}
\]

I do not explicitly characterize the channel through which light increases tweet volumes. Daylight may be correlated with temperature, time of day, and month of the year, and it could increase tweet volumes through any of these, or directly.
However, I do not attempt to estimate the causal effect of light levels on tweet volumes. I simply need a variable that shifts the volume of tweets that senders send. There can be many channels through which light affects sender behavior, provided that it only affects recipient behavior through sender behavior.

I aggregate these tweet volume shocks in two different ways to construct instruments for feed sentiment and feed volume.

Feed Volume Instrument Construction

I sum Equation 1.2 across senders in a given feed, and divide by the number of senders in the feed, $|N_{rt}|$:

$$ FVol_{rt} = \delta_0 FLight_{rt} + \mu_{rt} \tag{1.3} $$

where $FVol_{rt} = \frac{\sum_{s \in N_{rt}} n_{st}}{|N_{rt}|}$, $FLight_{rt} = \frac{\sum_{s \in N_{rt}} z_{st}}{|N_{rt}|}$, and $\mu_{rt} = \frac{\sum_{s \in N_{rt}} \mu_{st}}{|N_{rt}|}$. Figure 1.3 shows how the instrument for volume is constructed from the network graph.

Unobserved Tweets

In any period, I do not observe all tweets received by any recipient, for two reasons. First, all recipients follow some senders who are not in the group of the 20,000 most influential senders I collect data on. Second, the senders I do collect data on are not observed in all periods due to the rate-limiting issues discussed in Section 1.3.2. In addition, I do not even observe the count of tweets that are unobserved. I am forced to make assumptions about how the observed components of variables are related to the unobserved components. I discuss this issue more systematically and relax these assumptions in Chapter 2.

I proxy for the sum of observed and unobserved (to the researcher) feed volume, $FVol_{rt}$, with the observed component of feed volume, $FVol^{obs}_{rt} = \frac{\sum_{s \in N_{rt}} n_{st}}{|N^{obs}_{rt}|}$. This is consistent with homophily. The fact that the senders whom I observe are particularly active at a given time suggests that the unobserved senders who are followed by the same recipient will be active as well.
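As a rough illustration of the aggregation behind Equation 1.3, the following Python sketch computes the observed feed volume and the average feed light for a single recipient in a single period. The function name and the input lists are hypothetical and are not part of the estimation code used in this chapter; the sketch assumes that tweet counts $T_{st}$ and daylight probabilities $z_{st}$ are available for each observed sender.

```python
import math


def feed_volume_and_light(tweet_counts, daylight_probs):
    """Return (FVol_obs, FLight_obs) for one recipient in one period.

    tweet_counts[i]   -- T_st, tweets sent by observed sender i in the period
    daylight_probs[i] -- z_st, probability that sender i is in daylight
    Both lists are hypothetical inputs, one entry per observed sender.
    """
    if len(tweet_counts) != len(daylight_probs):
        raise ValueError("one daylight probability is needed per sender")
    n_observed = len(tweet_counts)
    if n_observed == 0:
        raise ValueError("the feed has no observed senders")

    # n_st = ln(T_st + 1): concave, so repeated tweets count for less.
    log_volumes = [math.log(t + 1) for t in tweet_counts]

    fvol_obs = sum(log_volumes) / n_observed       # endogenous regressor
    flight_obs = sum(daylight_probs) / n_observed  # instrument
    return fvol_obs, flight_obs


# Three observed senders: one silent, one with 1 tweet, one with 3 tweets.
fvol, flight = feed_volume_and_light([0, 1, 3], [0.0, 1.0, 0.5])
print(fvol, flight)
```

Both quantities are divided by the same sender count, so the instrument shifts feed volume only through which senders are in daylight, not through feed size.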
I also proxy for the instrument using the observed component, $FLight^{obs}_{rt} = \frac{\sum_{s \in N_{rt}} z_{st}}{|N^{obs}_{rt}|}$: if I observe that a user follows many senders in a particular city, residents of the same city are likely well-represented in the senders I do not observe.[36]

[36] I relax these assumptions in Chapter 2.

This procedure is conservative because if the observed and unobserved components of the instrument are less than perfectly correlated, the estimated magnitude of the effect of volume on sentiment will be biased towards zero. Even if the magnitudes are biased, reported p-values will still be correct because a violation of the exclusion restriction requires the unobserved component of the instrument to be correlated with both feed volume and recipient sentiment. Under the null hypothesis this will not be the case.

Controls

I add four additional sets of controls in all specifications.

First, I want the variation in the instrument to be variation in the volume of tweets the senders in $r$’s feed, $N_{rt}$, send, holding the average sentiment of those messages constant. To achieve this, I add feed sentiment, $FSent_{rt} = \frac{\sum_{s \in N_{rt}} n_{st} y_{st}}{\sum_{s \in N_{rt}} n^*_{st}}$, and the shock to feed sentiment in each period, $F\Delta Sent^{obs}_{rt} = \frac{\sum_{s \in N^{obs}_{rt}} n_{st}(y_{st} - \bar{y}_s)}{\sum_{s \in N^{obs}_{rt}} n_{st}}$. I add these terms separately because the shock to sentiment is measured with error and so the coefficient may differ.

Second, I add $Loc_{rt}$, a vector with twenty-four elements. Each element is the probability that the local time in the location of the recipient is a particular value. It is a probability because locations are probabilistic, but the interpretation is very similar to a vector of dummy variables. I control for recipient local time because recipients tweet more at some times (and are happier at some times) than others.

Third, I need to control for the possibility that sender light levels are affecting recipients through recipient light levels.
This is possible because light affects the recipients in the same way that it affects the senders, and their locations may be correlated. To address this, I control for the value of the instrument for the recipient, $z_{rt}$. This control implies that the validity of the instrument requires senders and recipients to be in different places: if the network were partitioned by location, $z_{rt}$ and $FLight_{rt}$ would be equal. Fortunately, in my sample, the average distance between recipient and sender is 911 km (i.e., the distance between the cities of New York and Cincinnati).

The final set of controls is two groups of time zone controls. First, I add $TZ_{rt} = \frac{\sum_{s \in N_{rt}} TZ_s}{|N_{rt}|}$, where $TZ_s$ is a five-element vector of the probabilities that user $s$ is in each of five continental US time zones.[37] In specifications without user-fixed effects, this ensures that the source of the identifying variation is not the differential average volume across senders in different time zones, which may be correlated with recipient characteristics. This is time-varying only because not all sender accounts exist for all periods. Second, I add $TZESent_{rt} = \frac{\sum_{s \in N_{rt}} TZ_s \bar{y}_s}{|N_{rt}|}$. This ensures that the variation in the instrument is not driven by time-invariant cross-user differences in feed composition. Finally, I add $\bar{y}_r$, the average value of sentiment for the receiver, in specifications without user-fixed effects.

[37] The major time zones are Eastern, Central, Mountain, and Pacific Time. Most of Arizona does not observe Daylight Saving Time. The easiest way to deal with this is to add an element to the vector for Arizona.

Feed Sentiment Instrument Construction

To construct an instrument for feed sentiment, I follow a similar procedure to that outlined for feed volume in Subsection 1.4.
I multiply Equation 1.2 by sender sentiment, $y_{st}$, sum across senders in a given feed, and divide by what the feed size would be in the absence of the instrument, $\sum_{s \in N_{rt}} n^*_{st}$:

$$ FSent_{rt} = \delta_0 FLightSent_{rt} + \nu_{rt}, $$

where $FSent_{rt} = \frac{\sum_{s \in N_{rt}} n_{st} y_{st}}{\sum_{s \in N_{rt}} n^*_{st}}$, $FLightSent_{rt} = \frac{\sum_{s \in N_{rt}} z_{st} y_{st}}{\sum_{s \in N_{rt}} n^*_{st}}$, and $\nu_{rt} = \frac{\sum_{s \in N_{rt}} \nu_{st}}{\sum_{s \in N_{rt}} n^*_{st}}$. The text classification of tweet happiness is imperfect, and so sentiment is measured with error. I add and subtract $\bar{y}_s$, the long-run average sentiment of sender $s$, from $y_{st}$ to create terms I do observe:

$$ FSent_{rt} = \delta_0 FLightESent_{rt} + \delta_1 F\Delta Sent_{rt} + \nu'_{rt}, $$

where $FLightESent_{rt} = \frac{\sum_{s \in N_{rt}} z_{st} \bar{y}_s}{\sum_{s \in N_{rt}} n^*_{st}}$. Let $Feed\Delta\mu_{rt} = \frac{\sum_{s \in N_{rt}} (y_{st} - \bar{y}_s)\mu_{st}}{\sum_{s \in N_{rt}} n_{st}}$. Then $\nu'_{rt} = \delta_2 Feed\Delta\mu_{rt} + \nu_{rt}$. This change is quite demanding of the data because there is much more variation in $y_{st}$ than in $\bar{y}_s$. The reason for doing this is that if I construct the instrument using $y_{st}$, measurement error creates bias that is of uncertain sign. Typically, measurement error creates attenuation bias. However, in this case, measurement errors in $y_{rt}$ and $y_{st}$ are likely correlated, because connected users talk about similar topics.[38] As a result, the sign of the bias introduced by measurement error in $y_{st}$ is uncertain.[39] An increase in the value of the instrument is an increase (decrease) in light levels among senders who express more (less) positive sentiment on average, but this does not necessarily mean they are sending more positive messages in the current period.

[38] For example, a sender-recipient pair both expressing criticism of comedian Joy Behar would likely see their tweets coded as happy because of the presence of the word “Joy”. This would represent transmission, but not of the sentiment of interest.

[39] It is possible to instrument for the time-varying component of sender sentiment using lagged sentiment. Future work will extend the results using this approach.
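The two observable terms on the right-hand side, the light-sentiment instrument and the sentiment shock, can be sketched for one recipient-period as follows. This is a minimal sketch under stated assumptions: the function name and the dictionary keys are hypothetical, and the instrument is normalized by the number of senders (as in the definition in Table 1.1) rather than by the unobservable $\sum n^*_{st}$.

```python
import math


def sentiment_instrument_terms(senders):
    """Return (FLightESent, FDeltaSent) for one recipient-period.

    Each sender is a dict with hypothetical keys:
      tweets        -- T_st, tweets sent in the period
      light         -- z_st, probability the sender is in daylight
      sentiment     -- y_st, or None when T_st == 0 (unobserved)
      avg_sentiment -- long-run average sentiment of the sender
    """
    if not senders:
        raise ValueError("the feed has no observed senders")

    # Instrument: light interacted with long-run sentiment, averaged
    # over senders (assumed normalization: the sender count).
    flight_esent = sum(s["light"] * s["avg_sentiment"] for s in senders)
    flight_esent /= len(senders)

    # Shock: volume-weighted deviation of current from long-run sentiment.
    weighted_dev = 0.0
    total_weight = 0.0
    for s in senders:
        weight = math.log(s["tweets"] + 1)  # n_st; zero for silent senders
        if weight > 0:
            weighted_dev += weight * (s["sentiment"] - s["avg_sentiment"])
            total_weight += weight
    fdelta_sent = weighted_dev / total_weight if total_weight > 0 else None
    return flight_esent, fdelta_sent


feed = [
    {"tweets": 1, "light": 1.0, "sentiment": 0.2, "avg_sentiment": 0.1},
    {"tweets": 0, "light": 0.0, "sentiment": None, "avg_sentiment": 0.05},
]
instrument, shock = sentiment_instrument_terms(feed)
print(instrument, shock)
```

Because the instrument uses only long-run averages, it is defined even for senders who are silent in the period, while the shock term is built from senders with observed tweets only.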
Figure 1.4 shows how the instrument for sentiment is constructed from the network graph.

Unobserved Tweets

As before, I do not observe all tweets received by user $r$. In addition, $y_{st}$ is not observed if $T_{st} = 0$. I proxy for feed sentiment with its observed component. This is consistent with homophily. The fact that the senders whom I observe express more positive sentiment at a given time suggests that the unobserved senders who are followed by the same recipient are likely expressing positive sentiment as well. I also assume this is the case for the instrument. I follow the same procedure as above in proxying for variables using their observed counterparts.

$$ FSent^{obs}_{rt} = \delta_0 FLightESent^{obs}_{rt} + \delta_1 F\Delta Sent_{rt} + \nu''_{rt}, $$

where $FSent^{obs}_{rt} = \frac{\sum_{s \in N_{rt}} n_{st} y_{st}}{\sum_{s \in N_{rt}} n_{st}}$, $FLightESent^{obs}_{rt} = \frac{\sum_{s \in N_{rt}} z_{st} \bar{y}_s}{|N^{obs}_{rt}|}$, and $\nu''_{rt} = \frac{\sum_{s \in N_{rt}} \nu_{st}}{|N^{obs}_{rt}|}$.

Using $|N^{obs}_{rt}|$ as the denominator on the right-hand side is attractive because it does not depend on any time-varying components of sender tweet behavior that might be related to the outcome of interest. The $t$ subscript is simply to account for the fact that some accounts are created at different times than others.

I add additional sets of controls in all specifications. $FELightESent^{obs}_{rt} = \frac{\sum_{s \in N_{rt}} z_{st}}{|N_{rt}|} \cdot \frac{\sum_{s \in N_{rt}} \bar{y}_s}{|N_{rt}|}$ is the interaction between the average value of the light shock and the average sender sentiment, within the feed of recipient $r$ at time $t$. I want the variation in the instrument to be variation in the distribution of light levels among the senders of a feed, holding the average sentiment of those senders constant. This term ensures that this is the case. As above, I add $Loc_{rt}$, $TZ_{rt}$, $TZESent_{rt}$, $z_{rt}$, and $\bar{y}_r$.

1.5 Results

This section presents results for the causal response of recipient sentiment to feed sentiment and feed volume. This section also performs a number of exercises to provide context for the magnitudes of the estimated effects.
In Appendix A, I perform a similar set of regressions for the response of recipient volume to feed sentiment and feed volume.

Summary Statistics

Table 1.3 shows there are fewer observations (2.9 million) of sentiment than most other variables (17.3 million) because of the sampling procedure. The data includes all periods where a recipient sent one or more tweets. For computational feasibility I retain a 20% random sample of the periods where the recipient sent no tweets. In these periods, recipient sentiment is unobserved, but sender variables are generally observed. In a smaller number of cases, I observe no tweets in a recipient’s feed for a given period. Consequently, feed sentiment is not missing for 13.6 million observations.[40]

[40] Estimates are quantitatively similar if I interpolate missing values of the recipient’s feed using lagged values. These results are available on request.

OLS Regressions

Table A.1 shows the OLS estimates of the response of recipient sentiment to both feed sentiment and feed volume. In the simplest specification, (1), an increase in feed positivity of 10% is associated with a 1.6% (Standard Error: 0.1%) increase in recipient sentiment. The coefficient on feed sentiment is positive in all specifications, although it decreases in magnitude as more controls are added. This is consistent with either homophily, positive social interactions, or both. The coefficient on feed volume is generally negative, small in magnitude, and not always significant. In Appendix A, I show that recipients spend more time on the network when their feed is larger. This implies that the negative coefficient of feed volume is consistent with previous research showing a negative correlation between time spent on social media and well-being.

First Stage Regressions

Table 1.5 presents first stage regressions for the instruments of feed sentiment.
In the preferred specification (4), with both user-fixed and time-fixed effects, a 10% increase in the instrument increases feed sentiment by 3.06% (SE 0.32%), and the F-statistic on feed sentiment is 76. This shows weak instruments are not a concern. The coefficients are relatively stable across specifications, and the F-statistics range from 33 to 2817. In the preferred specification (4), with both user-fixed and time-fixed effects, daylight increases feed volume by 3.58% (SE 0.04%), and the F-statistic on feed volume is 73.

Table 1.6 presents first stage regressions for the instruments of feed volume. The IV F-statistic is very strong (20 to 73) unless both user-fixed and time-fixed effects are included. This is intuitive. Unlike the instrument for sentiment, where some users receive positive sentiment shocks at the same time as other users receive negative ones, all users receive positive (negative) volume shocks of the same sign (if not the same magnitude) in the morning (evening), as dawn (dusk) moves across the Continental US. Seasonal variation in which senders are affected in which order means it is in principle possible to estimate these effects with both user-fixed and time-fixed effects, but it places unreasonable demands on the data. Consequently, specification (2) is my preferred specification for estimating volume effects.

Second Stage Regressions

Table 1.7 shows estimates of the effects of feed sentiment on recipient sentiment. In the preferred specification (4), the estimated effect is 0.34. In other words, if the fraction of happy tweets in a recipient’s feed increased by 10 percentage points, the fraction of the tweets sent out by that recipient that were happy would increase by 3.4 percentage points. Estimated effect sizes are quite stable across other specifications. The magnitudes of these effects are quite large because they do not include reflection.
Taking reflection into account will necessarily increase the importance of social interactions, although the degree to which this is important will vary greatly by user. I discuss the total impulse response, including reflection, implied by these estimates in Section 1.5.3.

These effects are considerably larger than the OLS estimates. However, this is not surprising because the endogenous variable is a weighted average of the sentiment of the tweets in a recipient’s feed. The sentiment of these messages is measured with error. Because feeds can be quite small, my measure of feed sentiment is relatively noisy. However, this measurement error is greatly reduced in the instrument, because I interact light levels with the long-run average of the sentiment expressed in senders’ tweets. This average is typically over hundreds or thousands of messages, and so the error is much smaller. The magnitude of the coefficients is slightly smaller than what Kramer et al. (2014) report from their experiment on Facebook (they estimate a 10% increase in feed positivity creates a 5.3% increase in recipient sentiment), but this is consistent with Facebook relationships being closer and more meaningful on average than Twitter links are.

Table 1.8 presents estimates of the effect of feed volume on recipient sentiment. In the preferred specification (2), the estimated effect is 0.047. Since the average number of log tweets received per sender is 0.022, this implies that a doubling of the average number of tweets a user receives would increase the happiness of the messages they send out by 0.0009 units (an increase of about 2% of a standard deviation). Although estimates are not always statistically significant, the coefficients are always positive and of approximately the same magnitude.

1.5.1 Total Effects

Table 1.9 presents estimates for a counterfactual where all links of the recipients in my sample are removed from the network graph. This is a partial equilibrium counterfactual, because I do not take into account how a user would be affected by other users losing their links.[41] In addition, I make the strong assumption that users’ short-run responses to changes in their feed content and volume are the same as what their long-run responses would be.

To construct these estimates, I take the estimate of the effect of peer sentiment on user sentiment from Table 1.7 and multiply that by the difference between feed sentiment and the average sentiment expressed by the recipient. As senders are on average happier than their recipients, this difference is positive on average but it varies across users. To this, I add the estimate of the effect of peer volume on user sentiment from Table 1.8 multiplied by the peer volume. In the preferred specification (2), I find that on average, removing all of a recipient’s links would decrease their expressed happiness by 10.6%. I prefer this specification because, as I describe in the estimation strategy section, the logic of the volume instrument is to some extent incompatible with both user-fixed and time-fixed effects.

To get a sense of the magnitude of the mean effects, I compare this decrease in emotional affect to the effect of income on emotional affect estimated by Stevenson and Wolfers (2013). I choose the most recent estimates for rich countries, which show that a 10% increase in income would increase average reported happiness by 0.468 of a standard deviation.[42] I start by taking the estimates recorded in Table 1.9. To convert these to an annual value, I multiply the effects by the fraction of waking hours (17) that individuals in my sample are definitely on Twitter (0.06). This assumes that individuals are not affected by their Twitter feed when they are not tweeting. This is somewhat conservative because there are times when recipients are on Twitter but do not send any tweets.
I estimate that removing Twitter would decrease annual expressed happiness by 0.106 of a standard deviation (95% CI: -1.87, -0.286).

[41] This is also slightly different from a partial-equilibrium counterfactual in which the users were kicked off Twitter entirely, because users can still search for tweets on a particular topic or look up the trending topics in their area.

[42] The authors present estimates for the impacts of income on a large number of measures of well-being. I focus on the affective measures because they are conceptually most similar to the measure of sentiment in this chapter. However, life-satisfaction is more sensitive to income than is emotional affect. Using estimates for other measures of well-being would lead to lower estimated dollar equivalents of Twitter use.

I divide this by the value from Stevenson and Wolfers, 0.468, to get 22%, the percentage decline in log income equivalent to losing all links on Twitter. I adjust this by median household income in the US, $75,200, divided by average family size of 2.53. This gives a median cost of $613. I multiply this by the number of US Twitter users, 52 million, to get a monetary equivalent of the total annual social benefit of Twitter of $21 billion. This is 1.4 times Twitter’s market capitalization. Taken at face value, these estimates suggest that not only has Twitter been unable to capture most of the social value of participating in the network but also that investors believe it will be unlikely to do so in the future.

Table 1.9 also shows estimates for the effect of removing Twitter from the network on the standard deviation of recipient sentiment. This is important because Do et al. (2008) argue that how individuals perceive how enjoyable an experience was in hindsight (their retrospective well-being) is not the simple integral of their experiences throughout the experience. Instead, individuals tend to overweight extreme events, as well as the final experience of an event.
I cannot do much to investigate final experiences as I do not observe exactly when users log off Twitter. However, I can estimate whether Twitter increases the standard deviation of the sentiment of its users. In general, I find negative effects: Twitter reduces fluctuations in sentiment. This is intuitive: because individual sentiment now depends on the sentiment of others, feelings are smoothed across the network.

The counterfactual described here ignores the transmission of other emotions (i.e., anger, fear, disgust, and sadness). This is for two reasons. First, negative emotions occur less frequently online (Kramer et al., 2014), and are more difficult to measure.[43] Second, even if I did have measures of their occurrence and could calculate how transmission of each emotion affected each other emotion, there are to my knowledge no estimates of the dollar value cost of anger, disgust, etc. A valid interpretation of the effects described here is that they represent gross effects. The total effects of the transmission of all emotions may be smaller or larger.[44]

[43] Happy people often say they are happy or use happy emoticons. Angry people are much less likely to reflect on their anger, while angry

1.5.2 Decomposition into Within-User and Between-User Effects

The decrease in the standard deviation of expressed sentiment of all users does not mean that individual users experience a lower standard deviation in their expressed sentiment. I assume that senders respond homogeneously to the daylight treatment (no defiers), and the recipients respond homogeneously to sentiment and to volume. I decompose the estimated effect of removing Twitter into between-user and within-user effects. Table 1.10 presents results of this decomposition of means. Table 1.11 presents results for the decomposition of variance. I bootstrap the confidence intervals in all specifications because estimating the counterfactuals requires combining estimates of population means and regression coefficients.
I find that Twitter decreases the standard deviation of sentiment across users, and increases the standard deviation of sentiment within users. This is because while Twitter smooths sentiment across the network on average, the values of sentiment a user observes for their friends are likely correlated with any shocks that user receives from outside of Twitter. In other words, Twitter magnifies the salience of news shocks. This is potentially important because it reconciles the anecdotal experience of Twitter users (that participation can lead to large and sometimes unwelcome swings in emotional affect) with the aggregate-level estimates I provide in previous sections.

[44] Sadness to sadness transmission is presumably positive, and its inclusion would reduce the estimate of welfare gains. However, Kramer et al. (2014) find that exposure to positive messages reduces the number of sad messages a recipient sends: happy to sadness transmission is likely to be negative. Therefore, its inclusion would increase the estimate of welfare gains. Since happy messages occur more often than sad messages, the welfare gains of taking other emotions into account could be even larger than what I report here.

1.5.3 User Influence

The estimates presented in Table 1.7 show how recipients respond to changes in the sentiment of their feed. These can also be interpreted as estimates of how a user’s followers respond to the sentiment of their messages.[45] I estimate the average number of accounts a user follows, $N_{avg}$, to be 789, so a one standard deviation increase in the sentiment expressed by a user will raise the average sentiment of each of their followers by 0.00043 of a standard deviation.

However, this does not address the total effect of a sender’s tweets on network sentiment. While a sender’s positive messages will raise the expressed sentiment of their followers, this will also raise the expressed sentiment of their followers’ followers, and so on.
I calculate the total effect of a sender’s tweets on network sentiment as follows. For simplicity, consider the case where all accounts send the same volume of tweets at all times. Then the influence of user $s$ can be defined as:

$$ A_s = 1 + \alpha_0 \sum_{r \in N_s} \frac{A_r}{|N^{fol}_r|}, $$

where $A_i$ is the total effect of a one unit increase in the sentiment of the messages sent by user $i$. By definition, an increase in positive sentiment expressed by sender $s$ increases their own positive sentiment by one unit. For each follower of $s$, $r$, it also increases the sentiment of their followers by $\alpha_0 A_r / |N^{fol}_r|$ units, where $N^{fol}_r$ is the number of accounts $r$ follows, and $\alpha_0$ is the strength of social interaction estimated previously. This can be rewritten as:

$$ \frac{A_s (1 - \alpha_0)}{N_{all}} = \frac{1 - \alpha_0}{N_{all}} + \frac{\alpha_0}{N_{all}} \sum_{r \in N_s} \frac{A_r (1 - \alpha_0)}{|N^{fol}_r|} $$

Conveniently, $A_s (1 - \alpha_0)$ is the definition of PageRank. This is the algorithm used by Google to rank the importance of web pages. To predict the influence of a user from PageRank, Flammini and Menczer (2008) proceed by assuming that $N^{fol}_r$ is the same for every user. They then show:

$$ \frac{A_{st} (1 - \alpha_0)}{N_{all}} \simeq \frac{1 - \alpha_0}{N_{all}} + \frac{\alpha_0}{N_{all}} \frac{N_s}{N_{avg}}, $$

where $N_{all}$ is the network size and $N_{avg}$ is the average number of followers a user has. The approximation error goes to zero as network size goes to infinity. This implies that for the average user (one with $N_s = N_{avg}$ followers), their influence on other users will be $A_{st} - 1 = \frac{\alpha_0}{1 - \alpha_0}$. For $\alpha_0 = 0.34$, this implies $A_{st} - 1 = 0.52$.

[45] The effect of a single sender’s sentiments on one of their followers is small because it must be divided by their feed size. However, since this is summed over all of a sender’s followers to get the total effect for a user’s feed, the feed size enters both the numerator and the denominator and cancels out on average.

To get a sense of how influential the most influential accounts on Twitter are, consider @BarackObama, who has 102 million followers. This is the third most followed Twitter account and the most followed political account.
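The influence recursion in this subsection can be checked numerically on a toy graph. The sketch below is illustrative only: the function names and the three-user ring network are hypothetical, and the fixed-point iteration is a generic way of solving the recursion, not the procedure used to produce the estimates in this chapter. It iterates $A_s = 1 + \alpha_0 \sum_{r \in N_s} A_r / |N^{fol}_r|$ and compares the result with the average-user value $\alpha_0 / (1 - \alpha_0)$.

```python
def influence_scores(follows, alpha, iterations=200):
    """Iterate A_s = 1 + alpha * sum over followers r of s of A_r / |N_r^fol|.

    follows[r] is the list of accounts that user r follows; every user must
    appear as a key (with an empty list if they follow nobody).
    """
    scores = {user: 1.0 for user in follows}
    for _ in range(iterations):
        updated = {user: 1.0 for user in follows}
        for follower, followed in follows.items():
            if not followed:
                continue
            # Each follower spreads alpha * A_r equally over the accounts
            # that they follow.
            share = alpha * scores[follower] / len(followed)
            for sender in followed:
                updated[sender] += share
        scores = updated
    return scores


def average_user_spillover(alpha):
    """A_s - 1 for a user with the average number of followers."""
    return alpha / (1.0 - alpha)


# Toy ring network: a follows b, b follows c, c follows a.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores = influence_scores(ring, alpha=0.34)
print(scores["a"] - 1.0, average_user_spillover(0.34))
```

In the ring every user has exactly the average number of followers, so the iterated score minus one coincides with the closed-form spillover for the average user.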
A completely positive tweet from @BarackObama increases total expressions of positive sentiment on Twitter by 265,000, relative to a counterfactual where he sends a neutral message.

This can be converted into a monetary amount by following a similar exercise to that in Section 1.5.1. The estimates in this chapter imply that the total value of a tweet from Barack Obama that is two standard deviations more positive than a neutral message is $12.01 million.

1.6 Conclusion

I introduce an instrumental variable strategy for estimating social interactions on online social networks. This strategy is robust to the presence of correlated shocks and homophily in time-varying unobservables. It is specific to a single outcome of interest. I apply this strategy to detecting the transmission of one particular dimension of sentiment: happiness. In general, I find that Twitter has positive effects on the expressed sentiment of its users. A 10% increase in the positivity of incoming messages increases the positivity of outgoing messages by 3.4%. A 10% increase in the volume of messages increases the positivity of outgoing messages by 0.47%. Twitter also decreases the inequality in sentiment expressed by users. These estimates have important implications for evaluating the consumer surplus of online social networks. I find that Twitter causes positive sentiment among its users to increase by an amount equivalent to an increase in US income of $21 billion.

In addition, these results constitute a baseline for future work in two ways. First, I have provided estimates for transmission of the most easily observed and best understood dimension of sentiment: happiness. The level of positive sentiment in the population is far from politically irrelevant, as any politician seeking re-election can attest. Nor is it economically meaningless, as the interest given to surveys of consumer and business confidence shows.
But anger and fear are perhaps even more important for determining political and economic outcomes. Understanding how these other dimensions of sentiment transmit through online social networks constitutes an important avenue for future work. Second, I have provided estimates for transmission for the average US Twitter user. While this is a natural place to start, the most interesting implementations of the estimation strategy laid out here are likely for subpopulations that are difficult or impossible to treat experimentally for ethical or practical reasons. For example, my future work will investigate the types of sentiment, and the topics to which white supremacists respond most avidly. This could not be done with field experiments if there was a possibility of stoking white-supremacist attitudes.[46] The instrument outlined here allows for exactly that. Another intriguing possibility involves using the same strategy to estimate how emigrants transmit attitudes about the culture and institutions of their destination countries back home. Running a survey in both the origin and destination country and matching immigrants to their home country links is logistically challenging, and the approach outlined here is well suited to the task.

[46] Specifically, I can combine estimates of location and time-specific topic and sentiment shocks with estimates of the geography of the links on the network. I can then estimate how these shocks will disseminate through white supremacist networks over time, and compare these predictions to the pattern of white supremacist attacks in the US over the same time period.

Figure 1.1: Locations of Recipients (top) and Senders (bottom)

The top panel shows the geographic distribution of log number of recipients. The bottom panel shows the geographic distribution of log number of senders. In both panels purple indicates lower population density and yellow indicates higher, while gray indicates zero users.
Each user at a location gets equal weight: users who send many tweets have the same weight as those who send few tweets.

Figure 1.2: Geographic Mean Sentiment of Recipients (top) and Senders (bottom)

The top panel shows average sentiment of the receivers in each location. The bottom panel shows average sentiment of the senders in each location. In both panels purple indicates less expressed positive sentiment and yellow indicates more, while gray indicates zero users. Sentiment is on the same scale in each plot. In both panels, sentiment estimates were smoothed by estimating a grid square-specific random effect using a Gaussian multilevel model to deal with cells with low user counts. Each user at a location gets equal weight in this calculation: users who send many tweets have the same weight as those who send few tweets.

Table 1.1: List of Variables Used and Definitions

| Variable | Definition | Explanation |
|---|---|---|
| $y_{rt}$ | | Recipient sentiment for $r$ at time $t$ |
| $y_{st}$ | | Sender sentiment for $s$ at time $t$ |
| $N_{rt}$ | | Set of senders followed by $r$ at time $t$ |
| $T_{st}$ | | Number of tweets sent by sender $s$ at time $t$ |
| $n_{st}$ | $\ln(T_{st} + 1)$ | Contemporaneous sender volume |
| $z_{rt}$ | | Probability of daylight for $r$ at time $t$ |
| $z_{st}$ | | Probability of daylight for $s$ at time $t$ |
| $n^*_{st}$ | | Counterfactual tweets sent if $z_{st} = 0$ |
| $FVol_{rt}$ | $\frac{\sum_{s \in N_{rt}} n_{st}}{\lvert N_{rt} \rvert}$ | Average volume of senders in feed of $r$ at time $t$ |
| $FSent_{rt}$ | $\frac{\sum_{s \in N_{rt}} n_{st} y_{st}}{\sum_{s \in N_{rt}} n_{st}}$ | Average sentiment of the feed of $r$ at time $t$ |
| $FLight_{rt}$ | $\frac{\sum_{s \in N_{rt}} z_{st}}{\lvert N_{rt} \rvert}$ | Average daylight for senders in feed of $r$ at time $t$ |
| $FLightSent_{rt}$ | $\frac{\sum_{s \in N_{rt}} z_{st} y_{st}}{\lvert N_{rt} \rvert}$ | Average of sentiment-light interaction for senders in feed of $r$ |
| $F\Delta Sent_{rt}$ | $\frac{\sum_{s \in N_{rt}} n_{st}(y_{st} - \bar{y}_s)}{\sum_{s \in N_{rt}} n_{st}}$ | Difference between Exp. and Obs. feed sentiment |
| $FLightESent_{rt}$ | $\frac{\sum_{s \in N_{rt}} z_{st} \bar{y}_s}{\lvert N_{rt} \rvert}$ | Interaction of sender average sentiment and light for senders of $r$, averaged across the feed of $r$ |
| $FELightESent_{rt}$ | $FLight_{rt} \times \frac{\sum_{s \in N_{rt}} \bar{y}_s}{\lvert N_{rt} \rvert}$ | Interaction of sender average sentiment and the average light of the senders of $r$ |
| $FVol^{obs}_{rt}$ | $\frac{\sum_{s \in N^{obs}_{rt}} n_{st}}{\lvert N^{obs}_{rt} \rvert}$ | Observed proxy for feed volume |
| $FLight^{obs}_{rt}$ | $\frac{\sum_{s \in N^{obs}_{rt}} z_{st}}{\lvert N^{obs}_{rt} \rvert}$ | Observed proxy for feed light |
| $FSent^{obs}_{rt}$ | $\frac{\sum_{s \in N^{obs}_{rt}} n_{st} y_{st}}{\sum_{s \in N^{obs}_{rt}} n_{st}}$ | Observed proxy for feed sentiment |

Table 1.2: List of Variables Used and Definitions (continued)

| Variable | Definition | Explanation |
|---|---|---|
| $F\Delta Sent^{obs}_{rt}$ | $\frac{\sum_{s \in N^{obs}_{rt}} n_{st}(y_{st} - \bar{y}_s)}{\sum_{s \in N^{obs}_{rt}} n_{st}}$ | Observed proxy for shock to feed sentiment |
| $Loc_{rt}$ | | Vector of local time probabilities |
| $TZ_{rt}$ | $\frac{\sum_{s \in N_{rt}} TZ_s}{\lvert N_{rt} \rvert}$ | Vector of time-zone probabilities, averaged across senders of $r$ at time $t$ |
| $TZESent_{rt}$ | $\frac{\sum_{s \in N_{rt}} TZ_s \bar{y}_s}{\lvert N_{rt} \rvert}$ | Interaction of time-zone vectors with average sentiment, averaged across senders of $r$ at $t$ |

Note: I suppress the obs superscript in all tables that report estimates.

Table 1.3: Summary Statistics

| | Mean | St. Dev. | Count |
|---|---|---|---|
| SENTIMENT_rt | 0.085 | 0.051 | 2914577 |
| FSENT._rt | 0.087 | 0.030 | 13610625 |
| E FSENT._rt | 0.086 | 0.020 | 13796901 |
| LIGHT * FSENT._rt | 0.027 | 0.024 | 17298548 |
| SENT. SHOCK_rt | -0.038 | 0.047 | 17316870 |
| LIGHT * EFSENT._rt | 0.026 | 0.024 | 17316870 |
| Obs. Receiver Tweets | 0.041 | 0.210 | 17316870 |
| Feed Size | 0.117 | 0.138 | 17316870 |
| Sender Count | 59.110 | 77.567 | 17316870 |

Estimates of means and standard deviations are weighted by the inverse of the sampling probability. The sampling probability is 1 if a recipient sent at least one tweet in a period, and 0.2 otherwise.

Figure 1.3: Locations and Light Levels for Neighbors of User 516134500

In the top panel, circles represent the locations of the relevant senders that User 516134500 follows.
The shaded area represents the portion of the United States that is in darkness on Feb 2, 2017 at 6pm, Central Standard Time.

In the bottom panel, the thin line represents how the instrument for volume, average feed light, evolves over time, while the thick line shows the values of the instrument when the data is binned in three hour intervals for computational feasibility. All estimates in the chapter use the binned data.

Figure 1.4: Locations and Sentiment for Neighbors of User 516134500

In the top panel, circles represent the locations of the relevant senders that user 516134500 follows, and the colors represent their average sentiment. Yellow represents more positive sentiment than average and dark blue represents less positive sentiment. The shaded area represents the portion of the United States that is in darkness on Feb 2, 2017 at 6pm, Central Standard Time.

In the bottom panel, the thin line represents how the instrument evolves over time, while the thick line shows the values of the instrument when the data is binned in three hour intervals. All estimates in the chapter use the binned data. Each horizontal line indicates the hours in which the sender is in daylight. The y-coordinate of each horizontal line indicates the average sentiment of that user.

Table 1.4: OLS Recipient Sentiment on Sender Sentiment and Sender Volume

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable | SENT._rt | SENT._rt | SENT._rt | SENT._rt |
| FSENT._rt | 0.1605*** | 0.0435*** | 0.1324*** | 0.0232*** |
| | (0.006) | (0.002) | (0.001) | (0.002) |
| VOLUME_rt | -0.0075*** | -0.0006 | -0.0120*** | -0.0017** |
| | (0.002) | (0.001) | (0.000) | (0.001) |

Linear regression of recipient sentiment, $y_{rt}$, on feed sentiment, $FSent_{rt}$, and feed volume, $FVol_{rt}$.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables
For convenience, all variables are defined and explained in Table 1.1.

Table 1.5: First Stage Feed Sentiment on Light-Sender Sentiment Interaction

                               (1)           (2)           (3)           (4)
                        F. SENT.rt    F. SENT.rt    F. SENT.rt    F. SENT.rt
LIGHT*SENT.rt            0.3614***     0.3062***     0.7024***     0.2749***
                           (0.063)       (0.032)       (0.013)       (0.031)
LIGHT*E SENT.rt             0.1238    -0.1420***     0.0744***    -0.1258***
                           (0.064)       (0.033)       (0.014)       (0.031)
SENT. SHOCKrt            0.1996***     0.2895***     0.2050***     0.3182***
                           (0.003)       (0.003)       (0.001)       (0.003)
LIGHTrt                 -0.0182***    -0.0064***    -0.0043***       -0.0006
                           (0.001)       (0.000)       (0.000)       (0.000)
E SENT.r                 0.1598***                   0.1330***
                           (0.009)                     (0.001)
Observations               2515955       2515955       2515955       2515955
IV F stat.                   33.13         92.74       2816.99         76.28
Rcp. TOD Dummies                 X             X             X             X
User Fixed Effects                             X                           X
Time Fixed Effects                                           X             X

Linear regression of feed sentiment, $FSent_{rt}$, on the interaction of sender light and sender long-run average sentiment, $LightSent_{rt}$.
Controls: $LightESent_{rt}$ is average feed light multiplied by average feed sentiment across all senders. $SentShock_{rt}$ is the shock to the sentiment of the recipient’s feed at time t. $RcpLight_{rt}$ is the recipient light level and $RcpESent_r$ is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. Chi Squared reports the standard Chi Squared statistic for a test that the effect of the instrument is zero.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table 1.6: First Stage Feed Volume on Feed Light Levels

                               (1)           (2)           (3)           (4)
                            VOL.rt        VOL.rt        VOL.rt        VOL.rt
F. LIGHTrt               0.0404***     0.0358***     0.0133***        0.0109
                           (0.009)       (0.004)       (0.003)       (0.007)
SENT. SHOCKrt           -0.7480***    -1.3725***    -0.2111***    -1.0006***
                           (0.024)       (0.016)       (0.002)       (0.015)
F. SENT.rt               -0.0821**     0.6351***    -0.1579***     0.4632***
                           (0.032)       (0.014)       (0.004)       (0.013)
LIGHTrt                 -0.0452***    -0.0339***       -0.0002        0.0006
                           (0.005)       (0.002)       (0.001)       (0.002)
E SENT.r                -0.4775***                  -0.2610***
                           (0.072)                     (0.006)
Observations               2515955       2515955       2515955       2515955
IV F stat.                   20.05         72.81         26.46          2.27
Rcp. TOD Dummies                 X             X             X             X
User Fixed Effects                             X                           X
Time Fixed Effects                                           X             X

Linear regression of feed volume, $FVol_{rt}$, on average light experienced by the senders in the feed at time t, $FLight_{rt}$.
Controls: $SentShock_{rt}$ is the shock to the sentiment of the recipient’s feed at time t, $FSent_{rt}$ is the sentiment of the recipient’s feed at time t, $RcpLight_{rt}$ is the recipient light level and $RcpESent_r$ is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. Chi Squared reports the standard Chi Squared statistic for a test that the effect of the instrument is zero.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table 1.7: IV: Recipient Sentiment on Feed Sentiment

                               (1)           (2)           (3)           (4)
                           SENT.rt       SENT.rt       SENT.rt       SENT.rt
F. SENTIMENT.rt            0.2224*       0.3145*     0.0970***       0.3379*
                           (0.101)       (0.126)       (0.027)       (0.141)
F. LIGHT * E SENT.rt       -0.0890       -0.0244     -0.0582**       -0.0174
                           (0.049)       (0.022)       (0.021)       (0.022)
SENT. SHOCKrt             -0.0414*      -0.0882*     -0.0172**      -0.1036*
                           (0.020)       (0.037)       (0.006)       (0.045)
LIGHTrt                    0.0042*       0.0021*     0.0016***      0.0014**
                           (0.002)       (0.001)       (0.000)       (0.001)
E SENTIMENTr             0.9474***                   0.9700***
                           (0.016)                     (0.004)
Observations               2515955       2515955       2515955       2515955
AR Chi Squared                6.36          6.81         13.01          6.37
Rcp. TOD Dummies                 X             X             X             X
User Fixed Effects                             X                           X
Time Fixed Effects                                           X             X

2SLS regression of recipient sentiment, $y_{rt}$, on feed sentiment, $FSent_{rt}$, using the interaction of sender light and sender long-run average sentiment, $FLightSent_{rt}$, as the instrument.
Controls: $FELightESent_{rt}$ is average feed light multiplied by average feed sentiment. $SentShock_{rt}$ is the shock to the sentiment of the recipient’s feed at time t. $RcpLight_{rt}$ is the recipient light level and $RcpESent_r$ is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. AR Chi Squared reports the Anderson-Rubin weak instrument-robust Chi Squared statistic.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table 1.8: IV: Recipient Sentiment on Feed Volume

                               (1)           (2)           (3)           (4)
                           SENT.rt       SENT.rt       SENT.rt       SENT.rt
FVOL.rt                    0.0371*       0.0467*        0.0596        0.1516
                           (0.016)       (0.019)       (0.041)       (0.148)
SENT. SHOCKrt              0.0244*       0.0555*        0.0107        0.1472
                           (0.012)       (0.026)       (0.009)       (0.149)
FSENT.rt                 0.0352***        0.0097     0.0314***       -0.0443
                           (0.002)       (0.012)       (0.007)       (0.069)
LIGHT.rt                 0.0019***     0.0020***     0.0012***        0.0011
                           (0.001)       (0.000)       (0.000)       (0.001)
ESENT.r                  0.9957***                   0.9954***
                           (0.008)                     (0.011)
Observations               2515955       2515955       2515955       2515955
AR Chi Squared                7.27          6.85          2.28          1.99
Rcp. TOD Dummies                 X             X             X             X
User Fixed Effects                             X                           X
Time Fixed Effects                                           X             X

2SLS regression of recipient sentiment, $y_{rt}$, on feed volume, $FVol_{rt}$, using average light experienced by the senders in the feed at time t, $FLight_{rt}$, as the instrument.
Controls: $SentShock_{rt}$ is the shock to the sentiment of the recipient’s feed at time t, $FSent_{rt}$ is the sentiment of the recipient’s feed at time t, $RcpLight_{rt}$ is the recipient light level and $RcpESent_r$ is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. AR Chi Squared reports the Anderson-Rubin weak instrument-robust Chi Squared statistic.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table 1.9: Contribution of Network to Well-being

                               (1)           (2)           (3)           (4)
CI ub.                      -.0191        -.0286         .0323          .268
Point Est.                  -.0852         -.106         -.128         -.313
CI lb.                       -.153         -.187         -.288         -.901
User Fixed Effects                             X                           X
Time Fixed Effects                                           X             X

Estimates of the expected impact on well-being of breaking all of a user’s links on Twitter. CI lb and CI ub are the lower and upper bounds of the 95% confidence interval, respectively. To calculate bounds, I use the upper (lower) bounds of individual estimates instead of seemingly unrelated regression. This means the resulting confidence intervals are conservative. Errors are clustered at the user level unless time-fixed effects are included and user-fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses.

Table 1.10: Effects of Twitter on Mean of Sentiment, Between and Within Users

                               (1)               (2)                   (3)
                            Actual    Counterfactual            Difference
                              mean              mean      mean        p5       p95
Between SENT. SD             0.024             0.053    -0.029    -0.040    -0.010
Within SENT. SD              0.072             0.056     0.016     0.008     0.031
Both SENT. SD                0.051             0.072    -0.022    -0.034     0.001

Estimates of the effects of Twitter on population average sentiment, and the average of user average sentiment. The two means are not the same because the former is weighted by volume and the latter is not. Counterfactuals combine an estimate of sentiment on sentiment effects obtained with user-fixed and time-fixed effects with an estimate of sentiment on volume effects obtained using user-fixed effects only. All coefficients are in levels of positive sentiment expressed.
The first two columns report the mean values of the relevant variables, and their counterfactual values: what they would be in the absence of the network. The third column reports the difference between these columns (actual - counterfactual). The fourth and fifth columns report the bootstrapped 95% confidence intervals for the difference between the actual and counterfactual values. Bootstrapped resampling is done at the user level in all specifications.

Table 1.11: Effects of Twitter on Standard Deviation of Sentiment, Between and Within Users

                               (1)               (2)                   (3)
                            Actual    Counterfactual            Difference
                              mean              mean      mean        p5       p95
Pop. LR. Avg. SENT. SD       0.024             0.053    -0.029    -0.040    -0.010
Pop. Avg. SENT. SD           0.072             0.056     0.016     0.008     0.031
Pop. Avg. SENT. SD           0.051             0.072    -0.022    -0.034     0.001

Estimates of the effects of Twitter on the standard deviation of sentiment, the standard deviation of average user sentiment, and the standard deviation of the shocks experienced by users. Counterfactuals combine an estimate of sentiment on sentiment effects obtained with user-fixed and time-fixed effects with an estimate of sentiment on volume effects obtained using user-fixed effects. All coefficients are in levels of positive sentiment expressed.
The first two columns report the mean values of the relevant variables, and their counterfactual values: what they would be in the absence of the network.
The third column reports the difference between these columns (actual - counterfactual). The fourth and fifth columns report the bootstrapped 95% confidence intervals for the difference between the actual and counterfactual values. Bootstrapped resampling is done at the user level in all specifications.

Chapter 2

The Econometrics of Social Interactions on Online Networks

2.1 Introduction

This chapter proposes a general framework for estimating social interactions on online networks. While some features of online networks facilitate the estimation of social interactions (such as extremely large sample sizes), estimation is subject to some of the same problems (e.g. endogeneity) that arise in the estimation of social interactions in offline networks. In addition, estimation on online social networks poses a specific set of challenges that offline networks largely do not.

This chapter makes two contributions. The first is to propose a general framework for estimating peer effects on online networks. This framework discusses four difficulties that are involved in obtaining causal estimates of peer effects, and proposes strategies for addressing them.

First, naive estimation of the effect of sender speech on recipient speech suffers from an endogeneity problem: both are affected by time-varying shocks. In addition, sender speech is almost always measured with error. Chapter 1 shows how this problem can be overcome by finding variables that shift the volume of messages that users send.

Second, social networks are dynamic. An attractive feature of social networks is that agents can be observed repeatedly over time. However, these observations tend to be closely spaced in time. An individual’s behavior could be affected not only by their peers’ contemporaneous behaviour, but also by their peers’ past behaviour; failing to take this into account will lead to biased estimates. To address this, I explain how to estimate a distributed-lag model in this context.
I also discuss the limits on what can be learned about the dynamic process.

Third, researchers do not generally observe all the messages a user receives. This is because online social networks tend to have one very large cluster that comprises most of the nodes on the network (Kumar et al., 2010, Myers et al., 2014). If a sample is taken of the users in this cluster, even of quite similar users, the total number of neighbours will be orders of magnitude larger than the original sample.[47] Practically, this makes it nearly infeasible for the researcher to collect all of this data. Nor does this problem go away as the number of recipients or the length of time they are observed goes to infinity.

Fourth, data is (heavily) censored. Individuals choose whether or not to send messages, and typically do not. But if the outcome of interest is based on the content of messages, this means that the outcome of interest is typically unobserved. Since factors that change an individual’s message content also likely affect their propensity to send messages, the data presents a selection problem. I show how to estimate the equivalent of a Heckman selection correction when the selection process is negative binomial instead of Bernoulli.

Chapter 1 represents a proof of concept that instrumental variables can be used to estimate peer effects on online networks. This chapter substantially generalizes the methods used in Chapter 1.
The purpose of doing so is to expand the number of questions that can be addressed with this instrumental variables strategy, and to allow for meaningful comparisons to be made across questions and contexts.

However, successfully implementing solutions to the above problems depends crucially on the existence of instruments, and in some cases on multiple instruments. The second contribution of this chapter is to propose six new volume shifters that can be used to construct instruments for feed content. These new volume shifters are 1) the probability that the user experienced rain in the three hour period, 2) the difference between this probability and its 39 year average, 3) the absolute deviation from 13.5° Celsius, 4) the difference between this deviation and its 39 year[48] average, 5) predicted message volume as a function of local time, common across all users, and 6) predicted message volume as a function of Greenwich Mean Time for each user.

[47] While one could in principle use the small clusters as the sample (in which case the relevant senders would be limited to those in the same cluster), it is likely that these users are not representative of those from large clusters. Myers et al. (2014) show that the largest weakly connected cluster on Twitter contains over 99% of users.

While the framework outlined here depends critically on the existence of such volume shifters, it also clarifies what conditions an instrument needs to satisfy in this context. These are slightly different from the typical IV case. I argue that the identification conditions are somewhat less stringent than we would normally expect for instrumental variables.
The framework and the instruments are complements: without the framework, there are conceptual reasons why the instruments generated from different volume shifters would not pass an overidentification test even if they would do so once the above problems were addressed.

While the identification conditions may be weaker, I am forced to make strong assumptions to address the other three issues. To the extent that the specific assumptions I make do not appeal to other researchers, I welcome other approaches. Part of the contribution of this chapter is pointing out where the most serious problems are in estimating social interactions online.

This chapter relates directly to a very small literature that causally estimates social interactions on online networks. The bulk of this literature is experimental. Kramer et al. (2014) estimate the transmission of happiness and sadness. Lewis et al. (2012) estimate how individuals are influenced by their neighbours’ purchasing decisions, and Bond et al. (2012) show how messages that encourage turnout transmit across the network. All three papers used data from Facebook, and the authors partnered with Facebook to change the content users saw. The disadvantage of this approach is that it is limited to topics that Facebook approves of, and each experiment is essentially single-purpose.

[48] This uses the full extent of the ERA-Interim data available from the ECMWF (Dee et al., 2011).

A smaller branch of the literature uses observational data. Coviello et al. (2014) use observational data and attempt to causally estimate peer effects. They argue that rainfall makes people unhappy, and use this to instrument for feed content. I discuss the limitations of this approach in Chapter 1.
Aral and Nicolaides (2017) also use variation in rainfall, but they use it to estimate how physical activity (running) transmits on a social network for runners, Strava.

The rest of the chapter is organized as follows: Section 2.2 lays out the general framework and the four problems, and proposes solutions for them. It also discusses which of these problems are likely to be most serious. Section 2.3 describes the new instruments. Section 2.4 applies the techniques described in Section 2.2 to calculating the strength of the first stage for each instrument. Section 2.5 concludes by discussing the logical next steps for expanding and improving on this framework.

2.2 General Framework

Let $r$ index message recipients, and $s$ index message senders. The data is binned into time intervals, where $t$ indexes the time bin.

The set of senders recipient $r$ follows is $R_r$. For simplicity of exposition, I assume this set is fixed: in practice it is quite persistent (at least on Twitter). However, this is an admittedly strong assumption. I make it for two reasons. First, even on a fixed network, there are significant challenges in estimation that must be addressed. I view the static network as a useful baseline for the more general dynamic case. Second, all of the instruments presented here and in the previous chapter vary much more within-day than they do across days. Viewing the network as static is consistent with the fact that it is generally static over the time horizon of the identifying variation in the instruments.

Define $Y_{rt}$ to be the outcome of interest of user $r$ at time $t$. If the underlying data is irregularly spaced (e.g., messages on Twitter or Facebook), this could be measured as the average of the messages $r$ sends out in a fixed time interval. In other words, the data is binned into set intervals of time. To simplify the exposition, I assume that $Y$ is a measure of happiness for both senders and recipients.
$T_{it}$ is the number of messages sent by user $i$ at time $t$.

I am interested in how users respond to the content of the messages they receive, $Feed_{rt}$, conditional on controls, $W_{rt}$.

$$Y_{rt} = \beta Feed_{rt} + W_{rt}'\theta + \varepsilon_{rt} \tag{2.1}$$

$$Feed_{rt} = \frac{\sum_{s \in R_r} f(T_{st}) Y_{st}}{\sum_{s \in R_r} f(T_{st})} \tag{2.2}$$

$$f(T_{st}) \geq 0, \quad f'(T_{st}) > 0, \quad f''(T_{st}) < 0, \quad f(0) = 0$$

A user’s feed is a weighted mean of the outcome of the users they follow. I assume each additional message from the same user has diminishing impact. While the general idea of concavity is probably uncontroversial, the choice of functional form is somewhat arbitrary. I assume:

$$f(T_{st}) = \ln(T_{st} + 1) \tag{2.3}$$

In what follows, I present each problem as a departure from an idealized setup, and discuss interactions between the problems only where they are relevant. This simplifies the notation and focuses attention on the problem at hand.

2.2.1 Spurious Correlation and Measurement Error

The main problem of endogeneity relates to the nature of online social networks. Network content is essentially discussion of and responses to a series of news shocks. Other researchers have shown that online social networks are characterized by homophily, both in terms of sentiment (Bollen et al., 2017) and political affiliation (Halberstam and Knight, 2016). If this were the only source of spurious correlation, it could be addressed by including user-fixed effects as controls. Alternatively, if everyone responded in the same way to news shocks, relative to their baseline, this could be addressed by including time-fixed effects as controls. The challenge is that users react heterogeneously to news shocks and are likely to follow people who will react in similar ways to themselves. In other words, changes in feed are correlated with changes in the error term, even after including user-fixed and time-fixed effects.

The key insight of Chapter 1 is that it is possible to construct an instrument for the feed by finding variables that shift the volume of messages that senders send.
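Before turning to the instrument, the feed definition in equations (2.2) and (2.3) can be made concrete with a short Python sketch. This is an illustration only, not code from the thesis: the function name `feed` and the example numbers are hypothetical. Senders who sent no messages are given a placeholder sentiment of 0.0, which is harmless because their weight ln(0 + 1) is zero.

```python
import math


def feed(message_counts, sentiments):
    """Weighted mean of sender sentiment, following equations (2.2)-(2.3).

    message_counts: messages T_st sent by each followed sender in the bin.
    sentiments: sentiment Y_st of each sender in the bin (any placeholder
        value for senders with zero messages, since their weight is zero).
    Returns None when no followed sender sent a message.
    """
    # Concave weighting f(T) = ln(T + 1): extra messages from the same
    # sender count for less and less.
    weights = [math.log(t + 1) for t in message_counts]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y for w, y in zip(weights, sentiments)) / total


# Hypothetical example: three followed senders, one of them silent.
counts = [1, 3, 0]
sentiment = [0.2, 0.8, 0.0]
print(feed(counts, sentiment))
```

With these numbers the log weights are ln 2, ln 4 = 2 ln 2, and 0, so the feed is (0.2 + 2 × 0.8) / 3 = 0.6, whereas a mean weighted by raw message counts would be 0.65. The concave weighting damps the influence of the more prolific sender.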
I discuss this again here to make a number of clarifying points about the source of variation of the instrument. Assume:

$$\ln(T_{st} + 1) = \delta_0 V_{st} + \mu_{st} \tag{2.4}$$

I made the same assumption in Chapter 1. It is admittedly quite strong. I assume that the effect of the volume shifter on the user is constant over all periods and all users. This assumption would be violated if users completely disengaged from the network at some times. For example, suppose $V$ is rainfall. If it rained, but the sender was asleep, it might well have a different effect on message volumes than it would if the sender was awake.[49] Then:

$$Feed_{rt} = \frac{\sum_{s \in R_r} Y_{st} (\delta_0 V_{st} + \mu_{st})}{\sum_{s \in R_r} (\delta_0 V_{st} + \mu_{st})}$$

Some of the variation in feed sentiment is driven by shifts in volume, not within-user shifts in the sentiment of the messages. The complication is that there isn’t one single volume shifter that affects the recipient’s feed; there are as many as there are senders. Conceptually, I want the instrument to be the change in the feed that is the consequence of the net effect of all of these volume shifters. That is, the difference between the observed feed and a counterfactual where the volume shifters were removed.

$$Z_{rt} = Feed_{rt} - Feed^{CF}_{rt} \tag{2.5}$$

[49] I discuss this possibility in more detail in Section 2.4.

One way to operationalize this counterfactual is to think about what it would have been had all the volume shocks been realized as their long-run average, $\bar{V}_s$. For simplicity, assume that, $\forall s$, $E[V_{st}] = c$ (if necessary, we can force this to be true by subtracting the mean of $V_{st}$ from the realized values for each sender, so $c = 0$). Define:

$$Feed^{CF0}_{rt} = \frac{\sum_{s \in R_r} Y_{st} (\delta_0 \bar{V}_s + \mu_{st})}{\sum_{s \in R_r} (\delta_0 \bar{V}_s + \mu_{st})} \tag{2.6}$$

There are two problems with this approach. First, the instrument’s effect on feed is nonlinear in parameters, which is not ideal for computational purposes. Second, some of the variation in feed content is being driven by volume shocks that are common within a recipient’s feed at a point in time.
To clarify this, consider an alternative counterfactual, one where each volume shock is replaced with the average of the volume shocks within the same feed, $V^*_{rt} = \frac{1}{\#R_r} \sum_{s \in R_r} V_{st}$, at that time ($\#R_r$ is the number of senders in $R_r$).

$$Feed^{CF1}_{rt} = \frac{\sum_{s \in R_r} Y_{st} (\delta_0 V^*_{rt} + \mu_{st})}{\sum_{s \in R_r} (\delta_0 V^*_{rt} + \mu_{st})} \tag{2.7}$$

Since the sum of the shocks is the same in $Feed_{rt}$ and $Feed^{CF1}_{rt}$, the denominator will also be the same. The difference between the two is:

$$\delta_0 Z'_{rt} = \delta_0 \frac{\sum_{s \in R_r} Y_{st} V_{st} - \sum_{s \in R_r} Y_{st} \frac{1}{\#R_r} \sum_{s \in R_r} V_{st}}{\sum_{s \in R_r} N_{st}} \tag{2.8}$$

This has the advantage of being linear in parameters.[50] It is also conceptually attractive: the variation in the instrument is driven by the interaction between the volume shocks a user’s neighbours experienced, and their average level of sentiment. Note that this counterfactual is not internally consistent. When multiple recipients follow the same sender, this implies that that sender receives different volume shocks in the counterfactual depending on whom they are sending messages to.

[50] Chapter 1 employed a very similar instrument. The difference was that the denominator in Chapter 1 was the number of senders, not the sum of their message volumes. This avoided having to deal with the missing sender problem that I address in Section 2.2.3, at the cost of reduced power.

The difference between these two counterfactuals can then be approximated by taking the first term in the Taylor series of $Feed^{CF1}_{rt}(V^*_{rt})$ at $V^*_{rt} = 0$, and rearranging.

$$Feed^{CF0}_{rt} - Feed^{CF1}_{rt} \approx -\left.\frac{\partial Feed^{CF1}_{rt}}{\partial V^*_{rt}}\right|_{V^*_{rt} = 0} V^*_{rt} = \delta_0 \left( \frac{\sum_{s \in R_r} Y_{st}}{\sum_{s \in R_r} N_{st}} - \frac{\sum_{s \in R_r} Y_{st} N_{st}}{\left(\sum_{s \in R_r} N_{st}\right)^2} \right) V^*_{rt}$$

$$Feed^{CF0}_{rt} - Feed^{CF1}_{rt} \approx \delta_0 \frac{\sum_{s \in R_r} Y_{st} - Feed_{rt}}{\sum_{s \in R_r} N_{st}} V^*_{rt} \tag{2.9}$$

What is the difference between $Feed^{CF0}$ and $Feed^{CF1}$? The first one is constant over time. The second one is not. But, within period, neither has any variation in sender shocks. In other words, the difference between $Feed^{CF1}_{rt}$ and $Feed^{CF0}_{rt}$ is based on changes in feed content driven by a constant level increase (or decrease) in $V_{st}$ for all senders.

How is this possible? In the case of happiness, suppose the happy senders don’t send many messages, and the unhappy senders do. Then a positive shock to all senders’ feed volume will increase the share of the feed coming from happy people.

This is unattractive for two reasons. First, (as Chapter 1 shows) the relationships between both feed sentiment and feed volume and recipient sentiment are endogenous. If the instrument for feed sentiment is driven by changes in feed volume, it will not be possible to disentangle a volume effect from a sentiment effect.

Second, it is heavily reliant on the particular functional form we specified in equation (2.4), as discussed above. For these reasons, we construct the instruments from the volume shifters using (2.5) and (2.7).

Measurement Error/Zero Tweets

I only observe $Y_{st}$ if $s$ sends at least one message at time $t$. However, the instrument must be calculated over all senders, whether or not they sent a message. In addition, even if we observe $Y_{st}$, it is very likely noisily measured.[51] To deal with this, I replace $Y_{st}$ in equation (2.8) with its long-run average, denoted $\bar{Y}_s = \frac{\sum_{Y_{st} > 0} Y_{st} / f(N_{st})}{\sum_{Y_{st} > 0} 1}$ (this is an average of averages, not a simple average), and obtain:

$$\delta_0 Z_{rt} = \delta_0 \frac{\sum_{s \in R_r} \bar{Y}_s V_{st} - \sum_{s \in R_r} \bar{Y}_s \frac{1}{\#R_r} \sum_{s \in R_r} V_{st}}{\sum_{s \in R_r} N_{st}} \tag{2.10}$$

The numerator is the difference between the average sentiment of the senders a recipient follows, weighted by volume shocks, and the average sentiment of those senders multiplied by the average of the volume shock. This is scaled by the size of the feed. This formulation addresses an additional concern. Depending on the choice of $V_{st}$, it may also affect sentiment directly. For example, if the instrument is rainfall, people may tweet more when it rains, but rain may also affect their sentiment directly.

This also changes how we can interpret the results.
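As an illustration of equation (2.10), the following Python sketch computes the instrument for a single recipient and time bin. It is a minimal sketch under stated assumptions rather than the thesis code: the function name `instrument` and its argument names are hypothetical, the common factor δ0 is left out, and the feed weights N_st are taken as given inputs.

```python
def instrument(avg_sentiment, volume_shocks, feed_weights):
    """Instrument Z_rt from equation (2.10), omitting the factor delta_0.

    avg_sentiment: long-run average sentiment of each followed sender.
    volume_shocks: volume shifter V_st for each sender in this time bin.
    feed_weights: N_st for each sender in this time bin (assumed given).
    """
    n_senders = len(avg_sentiment)
    # V*_rt: the average volume shock across the recipient's senders.
    mean_shock = sum(volume_shocks) / n_senders
    # sum(Ybar_s * V_st) - sum(Ybar_s) * V*_rt, written as one sum.
    numerator = sum(
        y * (v - mean_shock) for y, v in zip(avg_sentiment, volume_shocks)
    )
    return numerator / sum(feed_weights)


# Hypothetical example: the less happy sender receives the volume shock.
print(instrument([0.1, 0.3], [1.0, 0.0], [2.0, 2.0]))
# A shock common to all senders leaves the instrument at zero.
print(instrument([0.1, 0.3], [1.0, 1.0], [2.0, 2.0]))
```

The second call shows the property the construction is designed to deliver: a shock shared by every sender in the feed cancels out, so only differences in shocks across senders with different average sentiment move the instrument.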
Divide the feed into two components:

$$Feed_{rt} = ExpFeed_{rt} + FeedShock_{rt} \tag{2.11}$$

$$ExpFeed_{rt} = \frac{\sum_{s \in R_r} N_{st} \bar{Y}_s}{\sum_{s \in R_r} N_{st}}$$

$$FeedShock_{rt} = \frac{\sum_{s \in R_r} N_{st} (Y_{st} - \bar{Y}_s)}{\sum_{s \in R_r} N_{st}}$$

[51] Text analysis generally produces noisy measures, particularly of short messages. Even if the messages were read and coded by human coders, the coders would not necessarily share the particular social context of the sender in question, and may interpret their messages differently from the sender’s followers.

The first term, $ExpFeed_{rt}$, is what the feed would be on average, conditional on knowing the volume of messages that were received from each sender. The second term, $FeedShock_{rt}$, is the difference between the actual feed and the first term; it is the change in feed driven by senders expressing more or less happiness than would be expected based on the long-run average of their sentiment. The instrument is acting entirely through the first term, not the second. The drawback of this is that I am unable to identify differences between how people respond to the different sources of changes in their feed in this framework.[52] I discuss extensions that could address this in Section 2.5.

$$ExpFeed_{rt} = \delta_0 Z_{rt} + W_{rt}'\gamma + \mu_{rt} \tag{2.12}$$

Main Controls

There are two sets of controls that must be included in $W_{rt}$: local-time-of-day fixed effects and time-fixed effects.

Time-fixed effects are required to deal with common shocks to sentiment. To see why, suppose users on the east coast are generally happier than those on the west coast, and it rains on the east coast. Eastern users will tweet more, and the average Twitter user’s feed will be more positive. However, for any one user, their expressed sentiment could be responding not only to their own feed, but also to trending topics, or searches, or even to other social networks. This is because all of these sources of information will be more positive at the same time.
If one were merely interested in establishing that emotional transmission occurred, this doesn’t matter: depending on the volume shifter, the common shocks may still be plausibly exogenous. But in order to estimate the magnitude of transmission, this alternative channel needs to be ruled out.

[52] How important this drawback is will depend on the variable of interest. Suppose one is modeling information transmission on a network of rational agents. Then agents should not respond to changes in $ExpFeed_{rt}$ at all, and the usefulness of this approach will be limited to testing that $\beta_0 = 0$. On the other hand, if the variable of interest is a dimension of sentiment that individuals respond to subconsciously, this is easier to justify.

Local-time-of-day fixed effects are recommended because there are systematic patterns in sentiment throughout the day (people are happier in the mornings), and many of the volume shifters are also subject to daily cycles.

In addition, it makes sense to add user-fixed effects to all specifications. These fixed effects are unlikely to affect the consistency of the estimates (the instrument will be close to mean zero for all users), but they do reduce variance.

Placebo Controls

There are additional transformations of the volume shocks that are worthy of consideration.[53] First, the volume shifter could directly affect the outcome of interest for the senders:

$$Y_{st} = V_{st}'\zeta_1 + W_{st}'\theta + \varepsilon_{st} \tag{2.13}$$

It is plausible that the volume shifter could affect feed. Baylis (2015) shows that sentiment on Twitter is affected by rainfall.[54] However, this is unlikely to substantially bias the estimate of $\beta$, for two reasons. First, the instrument was constructed so it is not driven by changes in the average value of the shock; what matters is how the realization of volume shocks interacts with the average of sender sentiment. Second, the definition of feed does not respond to these short-term fluctuations in sender sentiment.
However, we can plug equation (2.13) into equation (2.2) and calculate:

$$\frac{\partial Feed_{rt}}{\partial \zeta_1} = \frac{\sum_{s \in R_r} N_{st} V_{st}}{\sum_{s \in R_r} N_{st}} \tag{2.14}$$

[53] I mention these partly because they were more important for the estimation strategy used in Chapter 1, where the construction of the instrument was slightly different.

[54] The reader may ask why not use this as an instrument for feed content. This approach is pursued in Coviello et al. (2014), but it is problematic in this context. Social network locations are generally measured with error, and the locations of neighbours are potentially informative about a user’s own location. This creates a potential violation of the exclusion restriction for location-based instruments like rainfall; neighbour rainfall affects a user not through their feed but through being predictive of their own rainfall. It also implies that each outcome of interest requires a different instrument.

Alternatively, we can control for $FeedShock_{rt}$. The disadvantage of this approach is that the control is contaminated with measurement error.

Second, the instrument could be correlated with average feed size, and the sentiment of recipients could be directly affected by feed volume.

$$Y_{rt} = \beta_{vol} \frac{1}{\#R_r} \sum_{s \in R_r} N_{st} + \beta Feed_{rt} + W_{rt}'\theta + \varepsilon_{rt}$$

In theory, the construction of the instrument effectively rules out the instrument shifting volume, but adding $\frac{1}{\#R_r} \sum_{s \in R_r} V_{st}$ as a control is a useful placebo. This control can also be used as an instrument for the effect of feed volume on recipient sentiment, after controlling for (2.9), (2.14), and $V_{rt}$, the volume shock of the recipient. These controls are necessary to isolate a shock to feed volume that doesn’t affect sentiment.

Third, the recipient’s own volume shock, $V_{rt}$, may be correlated with the average of the senders’ volume shocks (because they are in similar locations, for example). Again, there isn’t a strong argument for why the average of the sender volume shocks should be correlated with $Feed_{rt}$.
But since the recipient’s sentiment is likely affected by their own volume shock, this is also a reasonable control.

$$Y_{rt} = V_{rt}\zeta_2 + \beta Feed_{rt} + W_{rt}'\theta + \varepsilon_{rt}$$

In principle, there isn’t a strong argument that any of these controls should matter, but they represent a reasonable placebo test that the instrumental variables strategy is doing what it should. This is particularly true in view of the fact that I am combining multiple fixes to multiple different problems.

One nice feature of an identification strategy where the variables in this section are not necessary for consistency is that it implies that for location-based instruments, the recipient locations only need to be known for the local time-of-day fixed effects. If it turns out that the estimates are robust to including or excluding the local time fixed effects, this implies that recipient locations need not be estimated, even for location-based instruments. This would greatly simplify the selection procedure and reduce concerns that the users that volunteer locations are atypical.[55]

2.2.2 Dynamics

It is very difficult to make an argument that lags of feed cannot affect a recipient’s sentiment, for two reasons. First, while the order in which messages appear in the feed is partly chronological, there is no particular reason why a user could not scroll down to see messages from the previous time bin. If the user in question followed a relatively small number of accounts, it would often be the case that messages from previous periods would be quite close to the top of their feed. Second, messages are likely to have persistent effects. In the case of emotional transmission, the framework necessarily assumes some degree of persistence.
If not, the emotions would revert to baseline between the time the recipient received a message and sent a message of their own, and transmission would always be zero.[56] I model this as follows:

\[ Y_{rt} = \sum_{k=0}^{T} \beta_k\, \mathrm{Feed}^{*}_{rt-k} + W'_{st}\theta + \varepsilon_{rt}, \quad K > 1 \tag{2.15} \]

$Y_{rt}$ is a function of contemporaneous feed and all lags of feed, and $\mathrm{Feed}^{*}_{rt-k}$ is the feed that the recipient observes at time $t-k$ (which is unobserved to the researcher). $T$ is the total number of periods.

Footnote 55: Of course, it is possible that the local time-fixed effects matter only for the users who do not volunteer location information, but it is unclear why this would be the case.

Footnote 56: In the case of informational transmission, we might expect a high degree of persistence, as agents forget what they learn relatively slowly.

\[ \mathrm{Feed}^{*}_{rt-k} = \sum_{j=0}^{T} \kappa_{t-k-j}\, \mathrm{Feed}_{t-k-j}, \qquad \sum_{j=0}^{T} \kappa_{t-k-j} = 1, \quad \kappa_{rt-k-j} \ge 0 \;\; \forall j \tag{2.16} \]

The assumption that the $\kappa$’s are constant over recipients and stationary over time is admittedly quite strong, but greatly simplifies the analysis.

Here the $\beta$’s are the parameters of interest. They describe the degree of emotional persistence over time. The $\kappa$’s are effectively nuisance parameters: they describe how quickly users see the messages in their feed. This implies:

\[ Y_{rt} = \sum_{l=0}^{T} \mathrm{Feed}_{t-l} \sum_{m=0}^{l} \beta_m \kappa_{l-m} \tag{2.17} \]

If I estimate the baseline model, equation (2.1), then

\[ \hat{\beta} = \beta_0 + \sum_{l=1}^{T} \rho_{\mathrm{Feed}_{t,t-l}} \sum_{m=0}^{l} \beta_m \kappa_{l-m} \]

Assuming the correlation between feed and its $k$th lag, $\rho_{\mathrm{Feed}_{t,t-k}}$, is positive, this would overestimate $\beta_0$. However, the parameter of interest is not $\beta_0$; it is the total effect that a shock to feed today will have on recipient sentiment now and in the future, which is $\sum_{l=1}^{T} \sum_{m=0}^{l} \beta_m \kappa_{l-m}$.
Put another way, the coefficient of interest should not depend on how large the time bins I created are.

Consequently, failing to account for the lags here will be conservative; it results in an underestimate of the responsiveness of users to the sentiment of their feed, because $\rho_{\mathrm{Feed}_{t,t-k}} < 1$.

Without any additional information, the $\beta$’s and $\kappa$’s are not separately identified. Fortunately, Twitter data provides key additional information (and other social networks may as well). Twitter users have the option to retweet the messages of the people they follow. When they do so, I observe both the creation and retweet times of the message.[57] If one is willing to assume that if users retweet a message, they do so immediately upon viewing, I can use the pattern of retweets to estimate the $\kappa$’s.

Footnote 57: I restrict my attention to cases where the message the recipient retweets was created by the sender, and is not a retweet of a retweet.

Define $M_{r\tau}$ to be the $\tau$th message sent by user $r$ in the entire sample period (here the subscript $\tau$ indexes messages, not time bins). Define $R(M_{r\tau})$ to be the message that $M_{r\tau}$ is a retweet of. Define $\mathrm{datetime}(M_{r\tau})$ to be the timestamp of message $M_{r\tau}$, in seconds. Then:

\[ \mathrm{gap}_{r\tau} = \mathrm{datetime}(M_{r\tau}) - \mathrm{datetime}(R(M_{r\tau})) \]

\[ \hat{\kappa}_l = \sum_{r \in R} \frac{1}{T_r} \sum_{\tau}^{T_r} I\left(l \le \frac{\mathrm{gap}_{r\tau}}{L} < l+1\right) \]

$\hat{\kappa}_l$ is then the average fraction of retweets that occur at the $l$th lag of the original message ($L$ is the length of a time bin in seconds). I can plug the estimated $\hat{\kappa}$’s into equation (2.17) to get

\[ Y_{rt} = \sum_{l=0}^{T} \mathrm{Feed}_{t-l} \sum_{m=0}^{l} \beta_m \hat{\kappa}_{l-m} \tag{2.18} \]

This cannot be estimated because it relies on having observations for an unrealistic number of lags. However, it is likely that persistence on social networks is relatively low: topics typically change completely from one day to the next, although in rare cases a news story may persist for several days.
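The retweet-gap estimator of the $\kappa$'s, and the step of plugging them into equation (2.17), can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the estimates: the function names `estimate_kappa` and `lag_weights` are hypothetical, the 3-hour bin length in the usage note is only inferred from K = 8 lags covering a full day, and the sketch divides by the number of users so that the result is an average of per-user fractions (a normalization the displayed formula leaves implicit).

```python
import numpy as np


def estimate_kappa(gaps_by_user, bin_seconds, n_lags):
    """Average over users of the share of each user's retweets at lag l.

    gaps_by_user maps a user id to a list of retweet gaps in seconds
    (retweet time minus creation time of the original message).
    Gaps beyond the last lag stay in the denominator, so the result
    may sum to less than one. Dividing by the number of users is an
    assumption of this sketch.
    """
    kappa = np.zeros(n_lags)
    n_users = 0
    for gaps in gaps_by_user.values():
        gaps = np.asarray(gaps, dtype=float)
        if gaps.size == 0:
            continue  # users with no retweets carry no information
        # lag l is assigned when l <= gap / L < l + 1
        lags = np.floor(gaps / bin_seconds).astype(int)
        in_range = lags[(lags >= 0) & (lags < n_lags)]
        counts = np.bincount(in_range, minlength=n_lags)
        kappa += counts / gaps.size
        n_users += 1
    if n_users == 0:
        raise ValueError("no retweets observed")
    return kappa / n_users


def lag_weights(beta, kappa):
    """Weight on Feed_{t-l}: sum over m = 0..l of beta_m * kappa_{l-m}."""
    beta = np.asarray(beta, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    if beta.shape != kappa.shape:
        raise ValueError("beta and kappa must have the same length")
    # The first len(kappa) terms of the full convolution are the sums above.
    return np.convolve(beta, kappa)[: kappa.size]
```

For example, `estimate_kappa({"a": [100, 200, 11000], "b": [12000, 30000]}, bin_seconds=10800, n_lags=3)` averages the two users' lag shares, and `lag_weights` then combines the result with a hypothetical vector of $\beta$'s. Using a discrete convolution makes it easy to see why the $\beta$'s and $\kappa$'s are not separately identified from the lag weights alone.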
I propose estimating:

\[ Y_{rt} = \sum_{l=0}^{K} \mathrm{Feed}_{t-l} \sum_{m=0}^{l} \beta_m \hat{\kappa}_{l-m} \tag{2.19} \]

with $K = 8$ (a full day of lags), and testing against $K = 16$ to ensure that the results do not change too much.[58]

Footnote 58: A high degree of persistence on the network would raise concerns about reflection. While I do not expect this to be the case, there isn’t much to be done about it if it is a problem, except to choose instruments that are not serially correlated.

Dynamics with Endogeneity

With instruments, the relevant correlation is between $Z_{rt}$ and $Z_{rt-k}$, instead of $\rho_{\mathrm{Feed}_{t,t-k}}$. However, provided that the instrument is also positively serially correlated within user, a similar result holds. The seriousness of this problem will depend on the strength of within-user serial correlation. This in turn implies that, if used naively, two volume shifters could produce instruments that fail an overidentification test for no other reason than that they exhibit different degrees of persistence.

It is tempting to try to estimate equation (2.19) with instruments. However, this would require as many instruments as there are lags. In principle, one could use lagged values of the instrument, but serial correlation of the instruments will cause their power to fall quickly as more lags are added.

For some instruments that are based on daily fluctuations, the correlation between the instrument and the eighth lag of the instrument (24 hours ago) is very close to one. The instruments do not really represent shocks to the feed at a particular point in time; they represent shocks to the sequence of the feed at particular points in time. But this is acceptable, as long as the instrument affects the first stage. I constrain equation (2.19) as follows:

\[ Y_{rt} = \sum_{l=0}^{K} \mathrm{Feed}_{t-l}\, \beta \sum_{m=0}^{l} \lambda^{-l} \hat{\kappa}_{l-m} + W'_{st}\theta + \varepsilon_{rt} \]

Typically, $W'_{st}$ will include user-fixed and/or time-fixed effects. In general, nonlinear models with high-dimensional fixed effects are not easily solved.
However, in this case the model is linear in all variables; the nonlinearity is in the parameters. I can apply the Frisch–Waugh–Lovell (FWL) theorem to remove controls, including fixed effects, and then estimate the following, much simpler model using nonlinear least squares.

\[ P_W Y_{rt} = \beta \sum_{l=0}^{K} P_W \mathrm{Feed}_{t-l} \sum_{m=0}^{l} \lambda^{-l} \hat{\kappa}_{l-m} + P_W \varepsilon_{rt} \]

This requires only two instruments per dimension of sentiment. I can then verify that, if the persistence of emotions is described by $\lambda$, the lags of feed more distant than $K$ will have negligible impact on the results. If this is not the case, $K$ will have to be increased, but this will not require estimating additional parameters.

Finally, I make an additional adjustment to the first period in the above equation. $Y_{rt}$ is aggregated over messages sent in the same time bin as messages in $\mathrm{Feed}_{rt}$. Assuming both sender and recipient messages are uniformly distributed over time within a bin (consistent with both being generated as a Poisson process), the probability that a given message sent by the recipient occurs after a given message sent by one of the senders is 0.5.[59] Since recipient messages cannot be affected by messages they have not yet received, it makes sense to deflate the expected magnitude of the effect of contemporaneous (in terms of bins) feeds by this value.

\[ P_W Y_{rt} = \beta \sum_{l=0}^{K} \left(1 - \frac{I(k=0)}{2}\right) P_W \mathrm{Feed}_{t-l} \sum_{m=0}^{l} \lambda^{-l} \hat{\kappa}_{l-m} + P_W \varepsilon_{rt} \]

Note that if $\beta_k = 0 \;\forall k$, we don’t need to worry about the channel of alternate time periods creating a violation of the exclusion restriction. This is a general feature of these social network problems. The size of the tests under the null will be correct, but if $\beta \neq 0$, there are many additional potential channels for the instrument to affect the dependent variable that I need to account for.
This implies it is much more challenging to consistently estimate the magnitude of emotional transmission than it is to detect that it exists.

2.2.3 Missing Data: Senders

This section addresses two limitations of the data collection procedure that create measurement error in the endogenous variables and the instruments. In both cases, the problem is that I do not observe all of a recipient’s feed.

Footnote 59: Admittedly, this ignores volume effects (it assumes none of the recipient’s messages are created in response to sender volume), but it is a reasonable first approximation.

The first limitation is that I may only observe recent messages of a sender, not all the messages they have sent. This is a characteristic of Twitter data (where I receive only the most recent 3200 messages), although other social networks may have similar limitations. Because the influential senders are especially active on Twitter, this restriction is binding in more than half of cases. However, for this group I generally observe the instrument even if I do not observe their contribution to the endogenous variable (the number of messages they send, or the content of those messages).

The second limitation is that I only observe any tweets for a subset of senders. For users not in the influential group, I observe neither the endogenous variable nor the instrument. This limitation is likely to apply to almost any dataset collected from any popular social network (e.g., Twitter, Facebook, Instagram). This is because the only way not to have missing data is to collect data on all senders, which would be petabytes of data. Despite the fact that this limitation is more general, I address the rate-limiting issue first: it is simpler, and the intuition is similar for both issues.

Rate-limited Senders:

Let $[R^i_{rt}]_{i \in [c,l]}$ be the sets of user-time bins that $r$ follows that are collected ($c$) or rate-limited ($l$). So $\forall t$, $R_r = R^c_{rt} \cup R^l_{rt}$.
Even though $R_r$ is fixed, $R^c_{rt}$ varies; the further back in time collection goes, the more users to which rate-limiting applies. Define:

\[ S^i_{rt} = \frac{\sum_{s \in R^i_{rt}} N_{st}}{\sum_{s \in R_{rt}} N_{st}}, \quad i \in \{c, l\} \]

This is the share of user $r$’s feed that is collected ($c$) or rate-limited ($l$) at $t$. Define:

\[ Z^i_{rt} = \frac{\sum_{s \in R^i_{rt}} Z^i_{st}}{\sum_{s \in R_{rt}} N^i_{st}}, \quad i \in \{c, l\} \]

This is the value of the instrument defined over the collected and rate-limited portions of user $r$’s feed. By construction:

\[ \mathrm{Feed}_{rt} = \sum_{i \in \{c,l\}} \mathrm{Feed}^i_{rt} S^i_{rt} \]

Assume:

\[ E\left[\sum_{i \in \{c,l\}} Z^i_{rt} S^i_{rt} \varepsilon_{rt}\right] = 0 \tag{2.20} \]

\[ \mathrm{Feed}^i_{rt} S^i_{rt} = Z^i_{rt} \delta S^i_{rt} + W^{i\prime}_{rt} \gamma S^i_{rt} + \mu^i_{rt} S^i_{rt} \tag{2.21} \]

(2.20) is the standard IV assumption. (2.21) assumes that the instrument affects the rate-limited and collected feeds in the same way. Since they are drawn from the same population, this assumption is not as strong as it may seem.

There are two consequences of ignoring the rate-limited users. First, in some periods, all of the senders a recipient follows will be rate-limited. These observations will be dropped from the estimation and power will be negatively affected. Second, if $Z^c_{rt}$ and $Z^l_{rt}$ are correlated, e.g.:

\[ Z^l_{rt} = \lambda_0 + \alpha_1 Z^u_{rt} + \upsilon_{rt} \tag{2.22} \]

where $\alpha_1 > 0$, $Z^u_{rt}$ will be correlated with $\mathrm{Feed}^l_{rt} S^l_{rt}$ through $Z^l_{rt}$. But since $\mathrm{Feed}^l_{rt} S^l_{rt}$ affects the outcome of interest, the exclusion restriction will be violated, provided $\beta \neq 0$. As before, the rejection rate of the tests that ignore this channel of causation will still be correct.

This is quite plausible because the users in the two groups are drawn from the same sampling procedure. The only way it would not be the case is if the expected within-user covariance between $Y_s$ and $V_{st}$ before the follower network was drawn was zero in all periods. That is, all of the variation in the instrument is driven by the fact that the senders are a “small” sample. As the feed got larger, the variation in the instrument would go to zero by the law of large numbers.

Fortunately, even in cases where we do not observe tweet volumes and sentiment for a sender, we generally still observe the instrument, $Z^l_{rt}$. This is because the instrument aggregates a value for each sender that does not depend on the sender’s volume or sentiment expressed in the current period (e.g. long-run average sentiment multiplied by contemporaneous rainfall).[60]

Footnote 60: An important caveat to this is that the instruments do depend on the aggregate size of the feed. We address this in the following section.

Plugging (2.21) into (2.1) gives the reduced form equation:

\[ Y_{rt} = \sum_{i \in \{c,l\}} Z^i_{rt} \pi S^i_{rt} + \sum_{i \in \{c,l\}} W^{i\prime}_{rt} \Pi S^i_{rt} + \nu_{rt} \]

where $\nu_{rt} = \varepsilon_{rt} + \sum_{i \in \{c,l\}} \mu^i_{rt} S^i_{rt}$. This is analogous to two-sample two-stage least squares. It is not exactly the same because the samples substantially overlap, in the sense that they are over the same $(r, t)$ pairs. But the appropriate value of the instrument in the reduced form equation and the first stage equation for the same observation will only be the same if none of the senders a recipient follows are rate-limited.

In the first stage, we predict $\mathrm{Feed}^i_{rt} S^i_{rt}$ as a function of the instrument and controls on the collected sample only. We then apply these coefficients to predicting the values of the instrument for the rate-limited portion of the sample. Given assumptions (2.20) and (2.21), and $S^c_{rt}$ fixed, $\hat{\beta} = \hat{\pi}/\hat{\delta}$ is a consistent estimator for $\beta$ by the standard proof of the consistency of the two-sample two-stage least squares estimator (see Inoue and Solon (2010)).

If instead we do not know $S^i_{rt}$ but can replace it with $\hat{S}^i_{rt} = E[S^i_{rt} \mid \Omega]$:

\[ \hat{\beta} = \left[\hat{\pi} \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right)' \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right) \hat{\pi}\right]^{-1} \hat{\pi} \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right)' Y_{rt} \]

\[ \hat{\beta} = \left[\hat{\pi} \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right)' \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right) \hat{\pi}\right]^{-1} \hat{\pi} \left(\sum_{i \in \{c,l\}} Z^i_{rt} \hat{S}^i_{rt}\right)' \left(\sum_{i \in \{c,l\}} Z^i_{rt} \pi S^i_{rt} + \nu_{rt}\right) \]

\[ \hat{\beta} = \beta + (\hat{D})^{-1} \sum_{i \in \{c,l\}} Z^i_{rt} \left(\pi S^i_{rt} - \hat{\pi}(\hat{S}^i_{rt}) + \nu_{rt}\right) \]

By assumption, $E[Z^i_{rt} \nu_{rt}] = 0$, and $\hat{\pi} \rightarrow_p \pi$. $\hat{S}^i_{rt}$ does not converge to $S^i_{rt}$, as the fraction of the feed that is sampled does not go to one as the number of users or time periods goes to infinity. But by assumption, $\hat{S}^i_{rt} = E[S^i_{rt} \mid \Omega]$. Since $\hat{S}^i_{rt}$ varies by user, the difference will go to zero as the number of users goes to infinity.

Estimating Feed Shares:

A complication for applying this strategy to address rate limiting is that we do not observe $S^c_{rt}$. More specifically, we do not observe the denominator of $S^c_{rt}$, $(N^c_{rt} + N^l_{rt})^{-1}$, because we do not see $N^l_{rt}$, how many messages were in the feed that were not collected in a given period.

We replace $S^c_{rt}$ with its expectation. Assume:

\[ \forall r, i, t, \quad T^i_{rt} = \sum_{s \in R^i_{rt}} T^i_{st} \sim \mathrm{Poisson}\left(T^{i*}_{rt} \sum_{s \in R^i_{rt}} T_s T_t\right) \]

I could allow each sender within a feed to have a different underlying propensity to send messages, but since the sum of independent Poisson processes is also Poisson and we are only interested in aggregate volumes, this adds little to the analysis.[61] The mean of the process is allowed to vary over time; this is consistent with people sending more messages at some times of day than others, for example, and matches the overdispersion of the raw count data. The mean is also allowed to vary across users, which is consistent with some users sending messages more frequently than others.
$R^i_{rt}$ is allowed to vary within user over time because any user that is rate-limited at some time must be collected at some more recent point in time.

Latent message sending propensity is assumed to be independent of whether the user is in the rate-limited group or not (after accounting for user and time-fixed effects):

\[ \forall r, i, t, \quad T^{i*}_{rt} = T^{i}_{rt} \]

The accounts in these two groups were drawn from the same sampling procedure, with one exception: within a period the rate-limited senders must have sent messages at a higher rate in the time periods after $t$[62], relative to the collected senders, in order to be rate-limited in period $t$. It is reasonable to think these groups behave similarly after adjusting for average tweet volumes, $T_s$ and $T_t$.

This could be validated by testing the robustness of the estimates to dropping the oldest messages for each rate-limited sender (i.e., drop the oldest 10%, 20% and 50% of messages for all rate-limited senders). If the results do not change as the rate-limiting becomes more severe, this is consistent with the results not changing if the rate-limiting was relaxed completely.

To allow $T^{*}_{rt}$ to vary, I fit a negative binomial regression to the observed counts. This assumes the counts are Poisson distributed and the means of the Poisson processes are Gamma distributed. Because the Gamma distribution is the conjugate prior of the Poisson distribution, this allows me to calculate a posterior for $T^{*}_{rt}$, given $T^c_{rt}$ and the average rate at which the senders in the two groups send messages.

Footnote 61: I lack the variation to readily identify it from the aggregated data in any case.

Footnote 62: After, because I collect the data at a later point in time but do so retrospectively, and I can observe only a certain number of messages for each sender.

Assume that within all feeds, $T^l_{st} = 0$ for all but one $s$. That is, one sender accounts for all rate-limited tweets. This isn’t terribly realistic.
However, without this assumption we have to model the distribution of tweets within senders. I discuss how one would relax this assumption in (2.2.3).

Then the expected share of tweets generated by users for which the data is collected can be estimated as:

\[ \widehat{(N^c_{rt} + N^l_{rt})^{-1}} = \sum_{k=0}^{\infty} \frac{1}{\ln(T^c_{rt} + 1) + \ln(k+1)}\, p_{NB}\left(\hat{a}_{post}, \frac{\hat{b}_{post}}{1 + \hat{b}_{post}}\right) \tag{2.23} \]

\[ \hat{a}_{post} = \hat{a} + T^c_{rt} \]

\[ \hat{b}_{post} = \hat{b} + \frac{1}{\#R^i_{rt}} \sum_{s \in R^i_{rt}} T_s T_r \]

where $p_{NB}(\cdot, \cdot)$ is the probability mass function of the negative binomial distribution. This follows from the fact that the negative binomial can be interpreted as a Gamma-Poisson mixture distribution. (2.23) is $(N^c_{rt} + N^l_{rt})^{-1}$ summed over all possible realizations of $T^l_{st}$, weighted by the probability of $k$ successes from a negative binomial distribution with shape $\hat{a}_{post} + T_{rt}$ and scale $\hat{b}_{post}$. These are the appropriate posteriors for a negative binomial distribution given the priors of $\hat{a}_{post}$ and $\hat{b}_{post}$. Since we restrict the sample to observations where $T^c_{st} > 0$, this is well defined.

Unobserved Senders

In Chapter 1, I sampled a subset of the senders on Twitter for computational purposes. However, I did not subset randomly; instead, I chose highly influential senders to maximize the expected share of feed that I could collect for a subset of fixed size. This subsampling is essentially unavoidable (unless the researcher partners with the social network itself) because social networks have a relatively low degree of clustering. For any sample of the users on an online social network of adequate size (say, 5000 accounts), the union of their neighbours will likely be on the order of a million accounts. I denote the highly influential senders “observed” ($o$) and their complement “unobserved” ($u$).[63]

Let $[R^i_r]_{i \in [o,u]}$ be the set of accounts that user $r$ follows that are in the observed and mostly unobserved groups, respectively.
$R_r = R^o_r \cup R^u_r$, $S^i_{rt} = \frac{\sum_{s \in R^i_{rt}} N_{st}}{\sum_{s \in R_{rt}} N_{st}}$, $i \in [o,u]$.

\[ Y_{rt} = \sum_{i \in \{o,u\}} \mathrm{Feed}^i_{rt} S^i_{rt} + W'_{rt}\gamma + \varepsilon_{rt} \]

As before, we assume that:

\[ \mathrm{Feed}^i_{rt} S^i_{rt} = \delta_0 Z^i_{rt} S^i_{rt} + W^{i\prime}_{rt} \gamma S^i_{rt} + \mu^i_{rt} S^i_{rt} \tag{2.24} \]

In this case, the assumption is stronger because these two groups of accounts are quite different. Accounts in the influential group have many followers and are often famous offline as well as online; accounts in the unobserved group tend to have few followers and are more likely to have in-person offline connections to the followers they do have.

Incomplete sampling creates the same problem here as rate limiting does in the previous section: $Z^o_{rt}$ may affect $Y_{rt}$ through $Z^u_{rt}$ and $\mathrm{Feed}^u_{rt}$, which would violate the exclusion restriction.

This is not a function of the sampling procedure; a random sample of the feed would be similarly biased. Again, this is only a problem if transmission is non-zero: the size of the test of $\beta_0 = 0$ will still be correct. But the difference from the rate-limited case is that we do not observe $Z^u_{rt}$ in the periods where we do not observe $\mathrm{Feed}^u_{rt}$. Instead, they are both unobserved.

I address this by exploiting the fact that the “unobserved” senders are only mostly unobserved. In my case, the recipients are a random sample of the social network’s users, so they are also a random sample of the senders of each recipient. This also implies that the assumption in (2.24) is testable if the sample of “unobserved” senders for whom data is collected gets sufficiently large.[64]

Footnote 63: The observed senders represent the union of the rate-limited and collected groups from above.

For the set of users for whom I observe some of their “unobserved” followers, I can calculate a measure of $Z^u_{rt}$. I would like to know the relationship between $Z^o_{rt}$ and $Z^u_{rt}$.
I parameterize this as

\[ Z^u_{rt} = a_0 + a_1 Z^o_{rt} + \upsilon_{rt} \tag{2.25} \]

Assuming that $S^o_{rt}$ is known, and

\[ E\left[Z^{o\prime}_{rt} \mu^o_{rt} \mid S^o_{rt}\right] = 0, \qquad E\left[Z^{o\prime}_{rt} \mu^u_{rt} \mid S^o_{rt}\right] = 0, \qquad E\left[Z^{o\prime}_{rt} \varepsilon_{rt} \mid S^o_{rt}\right] = 0 \tag{2.26} \]

replacing $Z^o_{rt}$ with $Z^o_{rt}\left(S^o_{rt} + \hat{a}_1 (1 - S^u_{rt})\right)$ and $\mathrm{Feed}_{rt}$ with $\mathrm{Feed}_{rt} S^o_{rt}$ allows for consistent estimates of $\beta_0$. For the proof, see Appendix (B).

If we replace $S^o_{rt}$ with its expectation based on observables, following the same steps as in the rate-limited case obtains a consistent estimate of $\beta$.

Footnote 64: If this were not the case, one could randomly sample the unobserved senders at moderate additional cost.

Distributing Feed Shares

The final complication is that there is almost always more than one rate-limited sender. Dividing the rate-limited tweets over multiple senders decreases $\hat{S}^c_{rt}$ because of the concavity of the $f(\cdot)$ function. To address this, I could estimate:

\[ N^c_{rt} = \omega \sum_{k=0}^{\infty} \ln(T^c_{rt} + 1)\, p_{NB}\left(\hat{a}_{post}, \hat{b}_{post}\right) + \eta_{rt} \]

and choose the value of $\omega$ that best fits the data. I would then use this value of $\omega$ to fit:

\[ \widehat{(N^c_{rt} + N^l_{rt})^{-1}} = \hat{\omega} \sum_{k=0}^{\infty} \frac{1}{\ln(T^c_{rt} + 1) + \ln(k+1)}\, p_{NB}\left(\hat{a}_{post}\hat{\omega}, \frac{\hat{b}_{post}}{1 + \hat{b}_{post}}\right) \]

Even this is imperfect. Ideally we would want to do this in a more systematic way. We could estimate the negative binomial on the disaggregated sender data, and integrate up from there. This is computationally very intensive, and comparing the log of the sum of tweet volumes to the sum of the logs for arbitrary groups of recipients suggests that the approximation error is small.

2.2.4 Censored Recipient Sentiment

The problem of censoring on the sender side was discussed in (2.2.1). A corresponding problem occurs on the recipient side:

\[ Y_{rt} = \begin{cases} Y^{*}_{rt} & \text{if } N_{rt} > 0 \\ 0 & \text{otherwise} \end{cases} \]

$Y^{*}_{rt}$ is the sentiment that user $r$ experiences, and $Y_{rt}$ is the sentiment that user $r$ expresses.

\[ P(T \ge k) = 1 - \sum_{j=0}^{k} \frac{\Gamma(j + a^{-1})}{\Gamma(j+1)\,\Gamma(a^{-1})} \left(\frac{a^{-1}}{a^{-1} + \mu_{rt}}\right)^{1/a} \left(\frac{\mu_{rt}}{a^{-1} + \mu_{rt}}\right)^{j} \]

$\Gamma(\cdot)$ is the Gamma function.

\[ \mu_{rt} = T_r + T_t + \delta V_{st} \]

The mean of the negative binomial is a function of a user-fixed effect, $T_r$, a time-fixed effect, $T_t$, and a volume shifter $V_{rt}$.
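The negative binomial probabilities used in the censoring model can be evaluated numerically as in the sketch below. It is illustrative only: the function names `nb2_pmf` and `nb2_upper_tail` are hypothetical, the parameterization (mean $\mu$, dispersion $a$) is the one in the displayed formula, and the tail function reproduces the expression $1 - \sum_{j=0}^{k}$ exactly as written there. Working with log-Gamma values avoids overflow for large counts.

```python
import math


def nb2_pmf(j, mu, a):
    """Negative binomial mass at count j with mean mu > 0 and dispersion a > 0.

    Matches the summand in the text:
    Gamma(j + 1/a) / (Gamma(j + 1) Gamma(1/a))
      * ((1/a) / (1/a + mu))**(1/a) * (mu / (1/a + mu))**j
    """
    r = 1.0 / a
    log_p = (
        math.lgamma(j + r)
        - math.lgamma(j + 1)
        - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + j * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def nb2_upper_tail(k, mu, a):
    """One minus the sum of the mass function over j = 0, ..., k."""
    return 1.0 - sum(nb2_pmf(j, mu, a) for j in range(k + 1))
```

With $a = 1$ the distribution reduces to a geometric distribution, which gives a simple check: `nb2_pmf(0, 1.0, 1.0)` is 0.5 and `nb2_upper_tail(1, 1.0, 1.0)` is 0.25. In an application, `mu` would be built from the fitted user effect, time effect, and volume shifter.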
This assumes overdispersion cannot be accounted for by user-fixed and time-fixed effects alone. This is consistent with what is observed in the data. Let

\[ h(k, \mu_{rt}, a) = 1 - P(T \ge k) \]

\[ \tau_{rt} \sim U[0,1] \]

Then the observed negative binomially distributed data is consistent with:

\[ T_{rt} = k \;\text{ if }\; \tau_{rt} \in \left[h(k, \mu_{rt}, a),\, h(k+1, \mu_{rt}, a)\right] \quad \forall k \in \mathbb{N} \]

This allows the selection problem to be framed as a type II Tobit. Consequently, it is very close to the typical case where a Heckman selection correction would be appropriate. $Y_{rt}$ is the observed variable, and $Y^{*}_{rt}$ is the latent variable. There are two factors that distinguish this from the typical case of selection bias.

First, on a network, latent sentiment does not propagate: observed sentiment does. Even if I am only interested in how average latent sentiment responds to an external shock, for example, I need to understand how observed sentiment responds to feed sentiment, because only observed sentiment propagates through the network. If the effect of feed sentiment on observed sentiment is sufficiently large, the total response to the external shock will be much more sensitive to how observed sentiment responds than it will be to how latent sentiment does.

Second, the selection equation is driven by a negative binomial process, not a binary one. While I could simply collapse the count down to a binary variable for whether it was greater than zero, this approach is inefficient. Typically, a Heckman selection correction requires an instrument: something that is driving selection but not the outcome of interest. In this case, the differences in sentiment between the observations where $T_{rt}$ is greater than zero are potentially sufficiently informative about the sign and seriousness of the selection problem. For example, on my data the correlation between sentiment and the number of tweets sent (on the subset of observations where sentiment is observed) is 0.23.
This implies that selection bias drives estimates of the effect of feed sentiment on recipient sentiment downwards.

The fact that selection is driven by a negative binomial process makes the endogeneity easier to correct than it would be in the traditional case. However, this does mean that I can’t use the traditional Heckman estimator to adjust for selection. Instead, the equivalent of the Inverse Mills ratio is:

\[ \mathrm{IMR}_{rt} = \sum_{k=0}^{\infty} I(T_{rt} = k)\, \frac{1}{h(k+1, \mu_{rt}, a) - h(k, \mu_{rt}, a)} \]

2.3 New Instruments

The instrument I developed in Chapter 1 is based on using daylight as a volume shifter. I introduce six additional volume shifters that can be used to construct instruments.

Weather-based Instruments

First, $\mathrm{DEVTEMP}_{st} = \sum p_i\, |\mathrm{Temp}_i - 13.5|$. This is an estimate of the absolute deviation from 13.5°C experienced by the sender. It is an estimate because locations are imperfectly known. $p_i$ is the probability that sender $s$ is at grid square $i$. I chose 13.5°C as the bliss point because it is associated with the lowest level of message sending among recipients.

Second, $\mathrm{IPRECIP} = \sum p_i\, I(\mathrm{Precip} > 0)$. This is the probability that the sender experienced rain during the time bin in question.

Both these instruments are based on the idea that weather is outside of the user’s control. The intuition for both is that worse weather (extreme temperatures, or rain) discourages people from going outside, and when they are inside, users are more likely to be active on social media. That being said, I emphasize that I am not arguing that the first stage coefficients of these instruments causally estimate the effect of temperature or precipitation on tweet volumes. They could be capturing daylight, or people having more time to tweet at some times of the year than others, or other seasonal effects, such as temperature affecting the rate at which local news shocks arrive (e.g.
temperature affects crime, which affects stories about crime).

The less attractive feature of these instruments is that they combine two sources of variation: relatively high-frequency (1 week) variation in weather, and low-frequency (annual) changes in seasons. It is important to point out that this doesn’t violate the exclusion restriction per se. Suppose sender and recipient locations were correlated, so that recipients were rained on more, on average, when senders were rained on. By construction, the instrument does not depend on this average increase in volume. Even if the recipient’s location was correlated with the locations of the happy people in their feed, but not of the unhappy people, the exclusion restriction would not be violated, because I control for the value of the instrument for the recipient.

If the locations were measured with error and the measurement error on the LHS was correlated with the measurement error on the RHS, differentially for happy senders, this would violate the exclusion restriction. I find it difficult to construct a plausible story for why this would be the case.

However, out of an abundance of caution, I introduce the third and fourth instruments. These are

\[ \mathrm{dDEVTEMP}_{st} = \left(\mathrm{DEVTEMP}_{st} - \frac{1}{Y_W - 1} \sum_{y=1}^{Y_W} \mathrm{DEVTEMP}_{st-y}\right) \]

\[ \mathrm{dIPRECIP}_{st} = \left(\mathrm{IPRECIP}_{st} - \frac{1}{Y_W - 1} \sum_{y=0}^{Y_W} \mathrm{IPRECIP}_{st-y}\right) \]

Each is an estimate of the difference between the original shifter and its average over the previous $Y_W = 39$ years (all that I have data for).[65]

Footnote 65: Weather data is collected from the ERA Interim model of the ECMWF (Dee et al., 2011).

This is essentially a non-parametric estimate of the shock to weather. These volume shifters have two advantages. First, they are not deterministic. If there was a concern that some users are manipulating feeds by anticipating cycles in volume (attempting to send messages when they know their followers receive fewer messages), these instruments could be used to test for that.
Second, these volume shifters are poorly explained by time-fixed effects, so they are potentially more useful for estimating the effect of volume on sentiment.

Cyclic Instruments

The fifth new instrument, $\mathrm{LOCALTIME}_{st}$, is obtained by regressing the sender message volumes on the local time of the sender at the time of the message.

\[ \ln(T_{st}) = \sum_{h=1}^{24} c_h \sum_{tz=-4}^{-8} p_{tz}\, I(tz - t = h) + \varepsilon_{rt} \]

There is a dummy for each local time. I then use the fitted values to construct the instrument. As before, the $p_{tz}$ terms (the probability that the user in question is in a particular time zone) enter because locations are probabilistic. Conceptually, this instrument is similar to daylight. The differences are that (a) it doesn’t assume a dramatic drop-off in message volumes when the sun drops below the horizon, so the decline can be more continuous, and (b) it only varies within the day.

The sixth new instrument is

\[ \mathrm{BARTIK}_{st} = \frac{8}{J} \sum_{j=1}^{J} I\left(\mathrm{mod}(j,8) = \mathrm{mod}(j,8)\right) f(T_{st}) - \frac{1}{J} \sum_{j=1}^{J} f(T_{st}) \]

This is the additional average volume of messages user $s$ sends out at whatever time of day it is at $t$, relative to the rest of the day. This instrument is unique in that it doesn’t depend on the location of the sender. This may be important for extending this approach to networks where no information is available on user locations.

Table 2.1: Summary Statistics

Variable                          Count        Min.       Max.    Mean       St. Dev.
ExpFeed                           12,045,821   -.0502     .169    .0756      .0173
Feed Shock                        11,990,379   -.409      .316    .0113      .0248
Feed Size                         12,151,591   .00148     117     .184       .466
Dev. Temp Reallocation IV         12,151,591   -146       215     .434       2.42
Any Precip. Reallocation IV       12,151,591   -6.28      7.98    .0117      .0798
Temp. Shock. Reallocation IV      12,151,591   -84        85.6    .014       .906
Precip. Shock Reallocation IV     12,151,591   -4.55      5.47    -.0011     .0597
Light Level Reallocation IV       12,151,591   -7.34      11.8    .0223      .13
Bartik Reallocation IV            12,151,591   -10.1      10.2    -.000206   .0604
Local Time Reallocation IV        12,151,591   -.231      .318    .000184    .00771
Long Run Reallocation IV          12,151,591   -2.54      1.67    -.00806    .0252
Dev. Temp, Vol. UnWeighted        12,151,591   .0000258   37.1    3.98       2.12
Any Precip., Vol. UnWeighted      12,151,591   0          1       .107       .111
Temp. Shock., Vol. UnWeighted     12,151,591   -15.8      21.2    .0331      1.76
Precip. Shock, Vol. UnWeighted    12,151,591   -.946      1       -.0102     .11
Light Level, Vol. UnWeighted      12,151,591   0          1       .273       .228
Bartik, Vol. UnWeighted           12,151,591   -23.7      13.4    .132       .159
Local Time, Vol. UnWeighted       12,151,591   -.314      .172    .00797     .0725
Long Run, Vol. UnWeighted         12,151,591   -4.63      17      .104       .159

2.4 Results

Data was collected using the same procedure as Chapter 1, from the same sample. Only the construction of the instrument differs. Table 2.1 presents the summary statistics for the key variables.

Table 2.2 presents the results of using volume as a shifter, and using the average of the volume shocks to instrument for feed size. Standard errors are clustered at the user level.

The results in the first column are not directly comparable to those in Chapter 1 because the instrument is defined in a different way. In general, a number of the instruments are reasonably powerful. The exceptions to this are the local time instrument and the Bartik instrument. Because these instruments are completely explained by daily cycles, much of their variation is absorbed by time-fixed effects. In addition, the instruments based on deviations from climate averages have the wrong sign, a result I do not have intuition for.

Table 2.2: Effects of Volume Shifters on Feed Size

Column   Volume shifter (Vol. UnWeighted)   Coefficient    t statistic   IV F stat.
(1)      Light Level                        0.134***       14.75         217.6
(2)      Dev. Temp                          0.00345***     6.33          40.04
(3)      Any Precip.                        0.0651***      16.26         264.2
(4)      Temp. Shock.                       -0.000253      -0.99         0.989
(5)      Precip. Shock                      -0.0254***     -13.15        173.0
(6)      Bartik                             0.0915**       2.69          7.222
(7)      Local Time                         0.0137         0.60          0.356

Notes: Each numbered column is a separate regression with 11,988,635 observations. All columns include the sentiment-volume IV, the receiver volume shifter, recipient local time dummies, user fixed effects, and time fixed effects. ***, **, and *: significance at the 0.1%, 1%, and 5% levels, respectively. Standard errors are clustered at the user level.

Table 2.3 presents the results obtained by constructing the various instruments and using them to predict the persistent component of the recipient’s feed.

Table 2.3: Effects of Instruments for Sentiment on Feed Sentiment

Column   Instrument (Reallocation IV)   Coefficient       t statistic   IV F stat.
(1)      Light Level                    0.000137          0.58          0.341
(2)      Dev. Temp                      -0.0000941***     -6.68         44.59
(3)      Any Precip.                    -0.00139***       -5.33         28.39
(4)      Temp. Shock.                   -0.0000870***     -7.16         51.33
(5)      Precip. Shock                  0.000376*         2.56          6.561
(6)      Bartik                         0.00160*          2.40          5.764
(7)      Local Time                     0.00597           1.55          2.410

Notes: Each numbered column is a separate regression with 11,988,635 observations. All columns include recipient local time dummies, user fixed effects, and time fixed effects. ***, **, and *: significance at the 0.1%, 1%, and 5% levels, respectively. Standard errors are clustered at the user level.

In general, these results are somewhat disappointing. In particular, there doesn’t appear to be much of a relationship between the instruments that perform well in predicting volume and the instruments that perform well in predicting sentiment. While there are reasons that some instruments might perform relatively better at one or the other, it is more difficult to explain the reversal of signs in several cases between the two tables. I suspect that the reason for this is that (2.4) is misspecified. This misspecification error is likely averaged out more effectively when instrumenting for volume than for sentiment.
I could instead specify:

\[ \ln(N_{st} + 1) = \delta_0 V_{st} + \theta_s + \theta_t + \delta_1 (\theta_s + \theta_t) V_{st} + \mu_{st} \]

This way, senders who were very unlikely to send messages could be estimated to be less likely to be affected by the instrument. They would, in turn, be assigned less weight when calculating the counterfactual. The disadvantage is that this significantly increases computational cost, but it appears to be necessary.

2.5 Conclusion

This chapter presents a framework that addresses important issues in the estimation of social interactions on online networks. I address four problems: endogeneity, dynamics, missing data on the sender side, and unobserved data on the recipient side. The solutions discussed are computationally straightforward. This is important because if there are four problems, each of which requires a complex solution, the combination of these complex solutions is likely to be exponentially more complex. Put another way, part of the contribution of this paper is that the solutions I propose fit well together.

I also present six new volume shifters that can be used to construct instruments for feed content. Unfortunately, the empirical performance of these shifters is somewhat disappointing. I discuss why this might be the case and suggest an improvement that could address this.

While this framework is a significant conceptual advance over what I used previously in Chapter 1, there are two key areas where I believe important progress could be made at an acceptable cost.

The first concerns long-run trends in user behaviour. This paper assumes that Twitter is static: there are no long-run trends in recipient sentiment. This requires two conditions to hold. First, there can be no changes in the network structure over time. Second, there can be no long-run trends in sender message volume or sentiment.
If the first condition did not hold and β ≠ 0, neither would the second: changes in the network structure would change the mix of messages senders were exposed to, and produce persistent changes in sender behaviour. If the second condition did not hold and β ≠ 0, then persistent changes in sender behaviour would induce persistent changes in recipient behaviour. In any case, both conditions are counterfactual.

I maintain this assumption partly because the instrumental variation I explore is largely high-frequency and unsuitable for estimating long-run effects, and partly because even with this simplifying assumption, the data presents numerous complications. However, I concede that some of the most interesting questions in this area are questions about the long run. For example, do exogenous changes in feed content cause recipients to change who they follow to return their feed content to its previous average? Does long-run exposure to a particular type of content sensitize or inure recipients to that message? In principle, I think it is possible to expand the framework to accommodate these issues. Specifically, I could use the long-run trend of a sender’s message volume as a volume shifter to construct an instrument for long-run changes in feed content.

However, this creates additional complications. For example, if senders exhibit persistent changes in message volume, they may also exhibit persistent changes in message content. This implies the assumption that the average of a sender’s sentiment is a good proxy for their current sentiment may not hold, particularly as the instrument is extended back in time to points where the sender is rate-limited. In other words, even the most parsimonious extension that allows for dynamics in network structure in a way that is internally consistent would be a major undertaking.

The second dimension that I would like to explore is variation in sender message content over the short run.
This chapter essentially ignores this variation because it is noisy. But it is certainly possible in some contexts that recipients respond differently to shocks in message content than they do to changes in feed driven by volume shifts.

It may be possible to construct an instrument for sentiment shocks by using the strategy outlined above to instrument for the sentiment of the senders of the senders. This would generate predicted values of sender sentiment that changed in the short run and were purged of measurement error. These predicted values could then be aggregated to construct an instrument for the shock to the recipient’s feed. This is not as difficult as it appears: there is considerable overlap between the senders of the senders and the senders themselves, and so additional data collection would be moderate.

Chapter 3

Voter Demobilization: Estimating the Impact in Multi-district Elections

3.1 Introduction

In the past twenty-five years, interest in the conduct of elections around the world has increased dramatically. There has been particular interest in determining whether elections are “free and fair.” This has been accompanied by a doubling in the number of countries holding regular elections. However, only half of the most recent national elections held met the standard of being free and fair (Bishop and Hoeffler, 2016). The international community has responded to this gap between standards and performance by devoting significant resources to the monitoring of elections in which problems are anticipated. This typically involves sending election observers to the country in question. Some observers may arrive months before the election, and may assist with planning the election and training election officials, but the bulk of them arrive close to the election day. They monitor a random sample of polling stations, documenting (and attempting to prevent) irregularities at those stations.
This effort culminates with the tabulation of a “quick count” of votes from the observed polling stations where violations of electoral fairness like ballot stuffing were hopefully deterred. The announced results can then be compared to the “fair” quick count.

However, observers have been criticized for being overly focused on what happens on election day, and for not devoting sufficient attention to violations of democratic norms that occur before the election day (see Carothers (1997) and Kelley (2010)). This is despite the fact that Bishop and Hoeffler (2016) find that undemocratic acts are more likely to take place in the leadup to the election (what they term freeness violations) than during the actual voting and counting process (what they term fairness violations).

These questions of legitimacy are more common in emerging democracies, but they are not unknown to established ones. For example, states in the American South put in place institutions such as poll taxes and literacy requirements to foster African American disenfranchisement in the late 19th and early 20th century. More recently, since 2011, 19 US states have enacted bills that require voters to prove citizenship or show a photo ID upon casting their ballot, or that reduce early voting periods. These measures were seen by many as a strategy to demobilize parts of the electorate (Bentele and O’Brien, 2013). Other recent instances of vote suppression encompassed deliberate misinformation, often through automated phone calls (“robocalls”). In the 2008 US Presidential election, Democrats in Nevada received robocalls informing them that they could vote on November 5, a day after the election, to avoid long lines. At the same time, Hispanic voters in Nevada received messages saying that they could vote by phone.
Similarly, in the 2010 gubernatorial election, approximately 50,000 registered Democrats in Maryland were encouraged through automated phone calls to “relax and stay at home” although the polls were still open.⁶⁶ A 2006 investigation in Birmingham, England, uncovered systematic manipulations of a postal voting scheme in municipal elections, and Fukumoto and Horiuchi (2011) document widespread fraud in voter registration in Japanese municipal elections.

⁶⁶ Washington Post, November 5, 2010, “Election-night robocall tied to employee of political operative working for Ehrlich,” retrieved January 11, 2014. See also Barton (2011) for a review of the wide range of demobilization tactics that have been observed in the U.S., and further examples.

This chapter proposes a novel method for estimating the impact of specific violations of electoral freeness that can be applied under reasonably broad circumstances. Throughout this chapter, we focus on violations, instead of fraud, because whether a particular violation of electoral norms is fraud may depend on the local legal environment.⁶⁷ We emphasize that this strategy estimates the average impact of a specific violation. It cannot show that violations did or did not occur in a specific district, or identify the person or persons responsible for them. However, identifying this average impact is important for two reasons. First, it provides an estimate of the impact of violations on the overall election results. Second, it indicates how profitable similar violations are likely to be in the future. In general, any such method will have to overcome two main challenges.

First, violations generally do not occur randomly, and those that do (accidents in the administration of elections) are less concerning than those that are part of a deliberate strategy to change the election outcome.
This must be taken into account when estimating the impact of a norm violation: consider a hypothetical election contested across districts, with reports of norm violations in some districts but not others. To fix ideas, suppose that a supporter of party A engaged in malfeasance, and that the impact of the violation was to reduce turnout among supporters of party B. We cannot make a simple comparison between the results in districts where violations were reported and the results in districts where they were not, because the two types of districts are inherently different; in one, the violation was deemed to be worth committing, and in another, it was not.⁶⁸

⁶⁷ Actions that discourage voters from turning out may be technically legal but still of significant concern to international observers.
⁶⁸ For a discussion of the incentives to engage in electoral manipulation, see Rundlett and Svolik (2016).

However, if poll-level election results are available, what we can do is look at the results within districts. If violations are orchestrated at the district level, the selection of districts where they occur will not be random. But since all votes within a district are equally pivotal, we argue that supporters of the non-offending party within a district where violations occur are equally likely to be affected by fraud. This in turn implies that the impact of the violations will be larger in polls where there is a higher proportion of supporters of the non-offending party: if the violations did not differentially affect the supporters of one party, there would be no point in committing them.⁶⁹ In short, the value to the incumbent of norm violations varies across districts, and estimates of their impact that rely on cross-district variation in violations will be contaminated by selection bias.
But the value of deterring voters is constant within a district, so estimating the effect of fraud using within-district variation will lead to consistent estimates.

The second challenge is that since violations are generally illegal (or at least unsavoury), those engaging in them typically make attempts to hide their activities. As a result, data on the extent of voter suppression activities is almost always incomplete. Reports of violations typically capture a small fraction of the total, and the reports themselves may not be random, even conditional on violations occurring. However, provided that the reports are not so biased as to be completely useless, the logic above for addressing selection on violations applies equally well to selection on reports of violations. This means we can treat the incompleteness of reports as classical measurement error: it will bias estimates towards zero, but it will still be possible to estimate a lower bound for the impact of a norm violation.

Our estimation strategy is applicable under the following fairly general conditions: a) parties compete over many districts in the sample election, b) districts are divided into polls, c) election results are available for at least two elections at the poll level, d) poll boundaries are reasonably persistent over time, and e) fraud is orchestrated and reported at the district level.⁷⁰

We also require that there is significant variation, both spatial and temporal, in reports of violations at the district level.
That is, we need it to be true that for some districts, there were no reports of treatment in one election, and then there were some reports in the subsequent election, or vice versa.⁷¹

⁶⁹ If the norm violation was vote buying, the magnitude of the fraud would be proportional to the fraction of non-voters in each poll (or the combined fraction of non-voters and opposition voters), but the essential logic of proportionality would still apply.
⁷⁰ Some types of fraud, such as ballot stuffing, are plausibly orchestrated at the poll level, and our method will not be appropriate for estimating their impact.
⁷¹ We require an additional condition for identification of the effect, discussed below.

The remainder of the paper is organized as follows: Section 3.1.1 presents the example we use to test this method and reviews related literature, Section 3.2 discusses the data and our strategy, Section 3.3 presents the results, Section 3.4 presents robustness tests, and Section 3.5 concludes.

3.1.1 Application: 2011 Canadian Federal Election

We test our method on the results of the Canadian federal election held on May 2, 2011. The incumbent Conservative party won the election and formed a majority government. On February 23, 2012, news broke that Elections Canada, the independent agency responsible for administering and monitoring Canadian elections, was investigating complaints about alleged automated or live phone calls that had attempted to suppress votes for opposition candidates. The investigation, as well as the voter demobilization tactics under scrutiny, subsequently received extensive media coverage, and became commonly known as the “Robocall Scandal”.
Although Elections Canada does not comment on ongoing investigations, it produced a report in 2013 as a response to the scandal which provided some information about the incidents.⁷² In the report, the agency revealed that it had received numerous complaints from voters who had received automated or real-person phone calls, purportedly from Elections Canada, falsely informing recipients that their polling station had moved (this message would have been particularly salient because Canadians may only vote at their local polling station on election day). Other complaints alleged numerous, repetitive, annoying, or sometimes aggressive live or automated calls, as well as calls made late at night. Following the disclosure of the investigation in the media, the Elections Commissioner received a total of over 30,000 communications regarding deceptive and harassing phone calls, including reports from 1,394 individual electors who recalled specific instances of having received calls misinforming them with respect to their correct polling station, or harassing messages made on behalf of a contender in the election for local member of parliament.⁷³ The calls appear to have targeted supporters of the two largest opposition parties, the Liberals and the New Democrats.

⁷² See Elections Canada (2013), Preventing Deceptive Communications with Electors: Recommendations from the Chief Electoral Officer of Canada Following the 41st General Election.

Initially, 18 districts were under investigation, but after the media picked up the story and the public learned about the probe, more voters came forward with complaints and the list of allegedly affected districts grew to 27 by February 26, 2012.
By mid-August 2012, complaints had been recorded from voters in over 200 districts, according to court documents filed by the Commissioner of Elections Canada.⁷⁴ In response to a legal challenge by a group of voters in six of these districts, a Federal Court found that “widespread election fraud occurred” during the 2011 federal election, stating in its ruling that “there was an orchestrated effort to suppress votes during the 2011 election campaign by a person with access to the [Conservative Party’s] CIMS database.” Overall, however, the Federal Court ruled that the evidence was not quantitatively significant enough to warrant overturning the Conservative MPs’ mandated terms.⁷⁵

⁷³ Over 70% of those voters said the calls directed them to a wrong polling station, and roughly 50% of voters said that the callers, live or recorded, claimed to be calling from Elections Canada. See Elections Canada (2013).
⁷⁴ See National Post, August 21, 2012, “Robocall Complaints About the Federal Election Have Doubled Since March: Court Records.” Retrieved January 11, 2014.
⁷⁵ McEwing v Canada (Attorney General) 2013 FC 525 at para 184, 246.

We have chosen this episode of voter suppression as an application for our model for two reasons. First, this is an issue of significant interest to the Canadian public. The scandal has received widespread media attention. But despite the scrutiny, and a two-year-long investigation, details about what happened have been scant. Elections Canada was able to confirm that 7,600 automated calls directing voters to the wrong polling station were made in the district of Guelph, and charged a Conservative Party communications staffer, Michael Sona, in connection with the calls. These calls resulted in only 68 complaints, or 4.8% of the total complaints made to Elections Canada nationally. Prospects for successful investigations outside of Guelph are poor; Elections Canada has acknowledged it is unlikely they will ever be able to produce an accurate account of the events in the days leading up to the 2011 federal election, or that there will be a satisfactory conclusion to the criminal investigation. In sum, it does not appear likely that non-statistical methods will be able to say much more about the extent or impact of this episode of electoral fraud.

Second, this is a demanding test for our model. While the number of voters who received suppressing phone calls may have been high in an absolute sense, the fraudulent activity was not so obvious that it attracted widespread attention as it was occurring. In addition, we do not observe treatment, only reports of treatment. As individuals engaged in fraudulent activity usually attempt discretion, this is likely to be the case for any other situation where this method would be applied.

Related Literature

Our paper relates directly to two literatures: a literature on the detection of electoral fraud, and a related literature on the effectiveness of specific methods of influencing voter turnout.

Detection of Electoral Fraud. The literature on detection of electoral fraud can be divided into three strands. The first consists of observer-based methods for detecting fraud. These typically consist of sending observers to monitor elections in countries where it is expected that fraud may occur. Some observers may arrive months before the election, and may even help the local electoral commission plan the election. This literature may also include collecting reports, media or otherwise, from citizens of the country about electoral problems. These methods have the added benefit that they may deter fraud, as well as detect it. There is mixed evidence on the effectiveness of observers in disciplining electoral participants.
Hyde (2007) uses the random assignment of election observers in Armenia to show that the presence of election observers at a poll reduced the electoral support for the government, and estimates the size of the reduction at 6%. However, Ichino and Schündeln (2012) implement a field experiment in Ghanaian elections to show that parties may respond to observers by reallocating problematic activities from observed polling stations to unobserved polling stations, and thus a direct comparison of treated and untreated polls may overstate their effectiveness.

In general, there are two problems with these methods. The first is that they are expensive. The second is that they effectively rely on observers being able to aggregate all the fraudulent, irregular, or problematic behaviour they have seen into a pass/fail metric. This is especially difficult without estimates of the likely impact of different types of fraud.

A second body of work focuses on fraud where vote counts have been directly manipulated (e.g. ballot stuffing). This work is based on the idea that humans are poor random number generators, so vote counts that have been altered tend to display unusual features, such as too many or too few zeros in the second significant digit (see in particular Mebane (2012), and the references therein).

Lastly, our paper is most closely related to the third strand of the literature on detecting electoral fraud, which is not limited to exposing directly manipulated counts. Fukumoto and Horiuchi (2011) use a natural experiment to show that fraudulent address changes are common in Japanese municipal elections, and estimate their impact on the results. Wand et al. (2001) employ multiple methods, including ordinary least squares, to detect anomalies in the vote for Buchanan in the 2000 Presidential Election. The authors conclude that the butterfly ballot used in the county of Palm Beach, FL, caused over 2,000 Democratic voters to mistakenly vote for Pat Buchanan.
Our paper differs from these studies because we look at polling station level variation (without polling station level treatment).

Klimek et al. (2012) also use poll-level (and district-level) variation in turnout and incumbent support to analyze recent elections in Russia and Uganda. They show that, for both countries, the scatterplot of turnout against incumbent vote share is suspiciously bimodal. In the Russian case, one of the modes is at 100% turnout and 100% support for the incumbent. However, this method can also produce false positives in the presence of strategic voting, although the approach has been modified by Mebane and Kalinin (2014) to mitigate these issues. We differ methodologically from these papers in that we exploit the fact that Canadian poll-level election results are highly persistent over time and control for past results, which increases the precision of our estimator.⁷⁶

Determinants of Voter Turnout. There has been considerable research on the determinants of voter turnout and voter mobilization.⁷⁷ Following the pioneering work of Gerber and Green (2000), a large body of work studies the efficacy of “Get Out the Vote” (GOTV) campaigns based on randomized field experiments.⁷⁸ One finding that is consistent across a number of mobilization experiments is that only personalized messages, delivered in person, through live phone calls (Gerber and Green, 2000) or door-to-door canvassing (Nickerson, 2006; Arceneaux and Nickerson, 2010), are effective in mobilizing voters. In contrast, experiments testing impersonal GOTV methods such as mass email (Nickerson et al., 2007) and robocalls (Green and Karlan, 2006; Ramirez, 2005) find no statistically significant effects on voter turnout.

A second branch of the literature has focused on voter demobilization: the methods and messages used by key players in the electoral process to limit turnout, or to discourage specific (groups of) voters from voting.
One main question of this research has been whether or not negative campaigning depresses voter turnout. The evidence here is somewhat mixed. Ansolabehere et al. (1994, 1999) find that negative campaigning significantly reduces turnout at the polls, while subsequent studies reach more optimistic conclusions, finding no evidence in support of the demobilization hypothesis (see e.g. Clinton and Lapinski (2004)). Other work deals with the historical effect of African American disenfranchisement in the South. Jones et al. (2012) measure the impact of both formal laws and informal modes of voter suppression (poll tax, literacy tests, lynching) on African American political participation, and find that having one more lynching in a county in a year decreases black turnout by 3 to 6 percentage points. Further evidence on the effects of voter disenfranchisement is presented in Naidu (2012), who identifies a causal link between disenfranchisement and reductions in state level spending on black schools, as well as Cascio and Washington (2012), who show that the Voting Rights Act of 1965 significantly increased black voter participation and induced a shift in the distribution of state aid toward localities with large black populations.

⁷⁶ The papers in this literature also make distributional assumptions (i.e. normality of key outcomes in the absence of fraud), which we do not. This additional structure allows for estimation of the incidence and severity of fraud without relying on reports of fraud, which our approach requires. They are also able to measure fraud in contexts we cannot, such as presidential elections.
⁷⁷ See Geys (2006) for a comprehensive survey of the literature.
⁷⁸ See their book Green and Gerber (2004) for additional information and further references.

The present paper is the first study on the impact of deliberate misinformation on voter turnout using data from an actual election.
Due to legal and ethical concerns, there have been no field experiments conducted on whether or not intentionally misleading voters has an effect, and if so, how large that effect is. The only other related contribution we are aware of is Barton (2011), who reports on a framed field experiment where participants in a “mock” gubernatorial election held on a university campus concurrently with the actual gubernatorial election were intentionally misinformed about the timing of the election. He shows that misinformation regarding election timing reduces voter turnout by 50 percent relative to a control group, but that warning voters of potential misinformation beforehand removes this effect.

3.2 Data and Empirical Strategy

The empirical analysis is based on a sample that includes the official election results for the 41st Canadian General Election (May 2, 2011), and the results from the previous three general elections as controls: the 40th General Election (October 14, 2008), the 39th General Election (January 23, 2006), and the 38th General Election (June 28, 2004).⁷⁹ The data are available on the Elections Canada website http://www.elections.ca/. For each election and electoral district, we obtained the number of votes for each candidate running in that district, as well as the number of absentee ballots, the number of rejected (invalid) ballots, the total votes cast, and the number of eligible voters. Importantly, those figures are broken down within electoral districts by polling station, which is central to the identification strategy laid out below.

⁷⁹ We do not use data from previous or subsequent elections because there were substantial changes to district-level boundaries before 2004 and after 2011.
There are 308 districts in total.

Ideally, of course, one would want to indicate ‘treatment’ using the set of districts where Elections Canada confirmed incidents of illegal activities, but the full investigation has not been completed yet, and as mentioned above, Elections Canada has a policy of not disclosing information on ongoing investigations. We therefore can only make use of the information on complaints that have been made public in the media. With the exception of some reports that appear to have been initially investigated by Elections Canada (in a few districts, such as Guelph), those complaints have not been officially verified. Relying on allegations as reported in the press may lead to considerable measurement error in the data. We do not even have a list of the number of verified complaints by district. For this reason, we confine ourselves to a list of 27 districts that was made publicly available through various media and party websites in Canada relatively early into the probe (as of February 26, 2012). This list, which was apparently leaked from a source inside Elections Canada, is primarily composed of districts where reports of robocalls were received before this issue became a national news story. Because media reports of districts where individuals came forward with their recollection of robocalls a week after the news broke are likely subject to even larger measurement error, this early list is likely to be much more reliable. As we will see below, the estimated effect becomes smaller in magnitude and statistically insignificant if instead we use an extended list of 102 districts where the media reported alleged robocalls as late as August 2012.⁸⁰

The dependent variable is voter turnout, defined as the percentage of the registered voters who actually cast their vote in the 2011 federal election.
Figure 3.1 shows a geographic map of all 308 electoral district boundaries, as well as a categorical breakdown of district-level voter turnout; the 27 districts on our treated list are highlighted in red.

⁸⁰ The complete names of all districts, both the original list of 27 and the extended list of 102, can be found in the Appendix.

Figure 3.1: Canadian Federal Electoral Districts

The descriptive statistics are summarized in Table 3.1.

Table 3.1: District-Level Summary Statistics

                                       Robocalls              No Robocalls
                                       Mean      St. Dev.     Mean      St. Dev.
Registered Voters                      80,914    15,142       78,268    17,817
Total Votes Cast                       51,667    9,620        47,692    11,794
Winning Margin (Votes)                 8,734     7,040        12,752    8,774
Number of Polling Stations             202       38.7         204       40.5
District Turnout                       .642      .0467        .609      .057
District Opponent Vote Share (2008)    .558      .112         .542      .173
Observations                           27                     279

Note: Summary statistics of key variables at the electoral district level in the 2011 federal election. Opponent vote share is the combined number of votes for parties other than the Conservatives, divided by the total number of votes cast.

Apart from concerns pertaining to measurement error, the second challenge when estimating the impact of possible misinformation on voter turnout is that the districts that were (allegedly) subjected to the phone calls do not necessarily constitute a random sample. For example, one plausible selection criterion for anyone who deliberately sought to suppress the vote is the expected margin of victory, i.e., those districts where the race was expected to be close (and thus the impact of any calls largest) could have been deliberately targeted. The data support this logic: the average winning margin for districts with no robocall allegations was 10,903 votes or 22.8 percentage points. Ridings where allegations of impropriety have emerged, in contrast, had a margin of victory that was almost 28 percent lower: 8,719 votes or 16.3 percentage points.
At the same time, we know from existing work that some form of “closeness” of the race has a significant and positive impact on voter turnout.⁸¹ Moreover, the treated districts are, by design or by chance, almost all urban in character; average income and education levels of the electorate are thus above those of untreated districts, again increasing expected turnout. Even if there was no causal effect of robocalls on turnout, we would therefore expect a higher turnout in the affected districts. Indeed, in the 2011 election, turnout in allegation-free districts averaged 52.1 percent, compared to 53.3 percent in robocalled districts.

⁸¹ The estimated size of the effect in the literature is such that an increase in closeness by one standard deviation unit increases turnout rates by approximately 0.58 to 0.69 standard deviation units on average. See Geys (2006) for a comprehensive survey.

The primary problem that we need to address is thus one of unobserved variables correlated with the selection of targeted districts that also impact voter turnout. One strategy that naturally presents itself given data availability is a difference-in-differences approach. Difference-in-differences estimates use pre-treatment differences in outcomes between treatment and control group to control for pre-existing differences between groups. That is, they measure the impact of a treatment by the difference between the treated group and the control group in the before-after differences in outcomes. Applied to our context, we would compare the change in voter turnout from the 2008 to the 2011 election in the affected districts (the treatment group) with the change in voter turnout in the unaffected districts (the control group), possibly controlling for other observable covariates at the district level such as (lagged) margin of victory and changes in population demographics.
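To make the difference-in-differences comparison concrete, it can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the turnout rates are made-up values, not figures from our data.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Made-up average district turnout rates (NOT taken from the thesis data):
# "pre" stands in for the 2008 election, "post" for the 2011 election.
effect = diff_in_diff(
    treated_pre=0.62, treated_post=0.63,   # districts with robocall reports
    control_pre=0.58, control_post=0.61,   # districts without reports
)
print(f"difference-in-differences estimate: {effect:+.3f}")
```

In this made-up example, turnout rises in both groups, but by less in the treated group, so the estimate is negative even though treated districts have higher turnout in levels.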
This iden-tification method, however, essentially proxies the unobserved expected differencesin voter turnout between treatment and control group (absent treatment) with actualvoter turnout in the previous election. The exclusion restriction would thus be thatbetween the 2008 and 2011 elections, there was no change in those district charac-teristics on which voters and robocall initiators based their voting and robocallingdecisions. This is obviously would be a strong assumption.82For this reason, we employ a slightly different, and to our knowledge novel,identification strategy: instead of using between-district variation to identify theeffect of alleged misconduct, we use within district variation, taking advantage ofthe fact that Elections Canada breaks the results down at the level of the pollingstation for each district. Studying individual polling station outcomes within elec-toral districts has the advantage that we can employ district fixed effects, whichwill absorb any unobserved heterogeneity at the district level, including the – un-observed – estimated margin of victory just prior to the election and other districtlevel characteristics that may have changed between 2008 and 2011.83Figure 3.2 helps to illustrate our basic idea using the Ontario district of London82Section 3.4 below presents, among other robustness checks, the results using a classic DiDapproach, which yields a statistically significant effect of robocalls that is statistically significant,and twice as large in magnitude as the one we identify.83Naturally, using district fixed effect is also important because polling stations in the same dis-trict may be subject to common shocks, so their outcomes are likely correlated. Because treatment(robocall) status is also uniform within a district, the correlation in turnout outcomes may be mis-takenly be interpreted as an effect of the being robocalled. The district fixed effects eliminate thisconcern.1023.2. 
Data and Empirical StrategyNorth, on our treated list, as an example. The figure shows that there is consid-erable variation at the poll level within each district. This is true both for voterturnout (the map on the left) and, importantly, also for political affiliations of itsresidents (the map on the right): some neighbourhoods within a district tend to leantowards the Conservatives, while others are more inclined to vote for the Liberals,the New Democrats, the Bloc, or the Green Party. Within each district, the vot-ing stations with a higher fraction of non-Conservative voters will typically havea lower turnout rate than usual due to the fact that the Conservatives mobilizedtheir constituency more effectively (it was their best election result since 1988). InFigure 3.2, districts with dark green shading in the map on the bottom (indicatingmore non-Conservative voters) will on average have a lighter blue colour (indicatinglower turnout) in the map on the top, and vice versa. We should thus see a negativerelationship between the share of non-Conservative voters and voter turnout. Ourparameter of interest is the impact of a robocall on the (relative) probability that anon-Conservative voter votes.Without taking a position on who exactly the robocall instigators were, we as-sume they likely called only voters they believed to be supporting other parties (de-termining which voters support them and which do not is an important and uncon-troversial function of Canadian political campaigns, and a relatively large numberof people within a party have access to these records).84 This means proportionatelymore calls would go to polls with more voters the instigators believed to be opposi-tion supporters, and they should experience a more pronounced drop in turnout ofthose polls, relative to the district average, in the treated districts as compared to theuntreated districts. 
Consider the following, idealized estimating equation:

\[
Y_{ijt} = \gamma\,\mathrm{Opp}^{*}_{ijt} + \delta\,(\mathrm{Robocall}_{j} \times \mathrm{Opp}^{*}_{ijt}) + \theta_{jt} + \varepsilon_{ij}. \tag{3.1}
\]

The dependent variable \(Y_{ijt}\) is the log of voter turnout (in percent) in the 2011 federal election at polling station \(i\) in district \(j\), where voter turnout is defined as the absolute number of people voting at polling station \(i\) in district \(j\), divided by the absolute number of registered voters for that polling station and district.85 \(\mathrm{Robocall}_{j}\) is a dummy for whether district \(j\) was robocalled in the 2011 election. \(\mathrm{Opp}^{*}_{ijt}\) is the fraction of voters in poll \(i\) of district \(j\) that the instigator believes to be opposition party supporters, which is unobserved by the researcher. The coefficient on the interaction term \(\mathrm{Robocall}_{j} \times \mathrm{Opp}^{*}_{ijt}\) is the parameter of interest. We cannot estimate this equation because we do not observe \(\mathrm{Opp}^{*}_{ijt}\). However, poll-level party support is relatively persistent over time (the R-squared from a regression of 2011 opposition vote share on 2008 opposition vote share at the poll level is 0.84). It is likely that \(\mathrm{Opp}^{*}_{ijt}\) is close to \(\mathrm{Opp}_{ijt-1}\), the observed fraction of opposition voters in the poll in the previous election.86

84 Elections Canada provides all candidates in a district with a list of registered voters for that district by address, with phone numbers included. Campaigns determine who their supporters are on these lists, either by telephone or door-to-door canvassing, and directly asking voters who they plan to support. Campaigns then contact their supporters repeatedly on election day until the voter votes (confirmation of which is available from party observers stationed within polling stations). Even single contacts of this nature have been shown to increase turnout by 7% (Green et al., 2003).

85 We take the log of the percent turnout, from 0 to 100, plus one.

86 To the extent that \(\mathrm{Opp}_{ijt-1}\) is a noisy measure of the instigator’s beliefs, it will bias the estimated effects towards zero. We cannot use \(\mathrm{Opp}_{ijt}\), the observed opposition vote share in the poll, as a proxy. This is for two reasons. First, it is affected by treatment. Second, it is implausible that the instigator’s supporter lists are perfectly accurate.

After this replacement, we can estimate:

\[
Y_{ijt} = \gamma\,\mathrm{Opp}_{ijt-1} + \delta\,(\mathrm{Robocall}_{j} \times \mathrm{Opp}_{ijt-1}) + \beta_{1} Y_{ijt-1} + \beta_{2} Y_{ijt-2} + \theta_{jt} + \varepsilon_{ij}. \tag{3.2}
\]

As before, the coefficient on the interaction term \(\mathrm{Robocall}_{j} \times \mathrm{Opp}_{ijt-1}\) is the parameter of interest. The right-hand side controls are the combined vote share of all non-Conservative candidates at this polling station in the 2008 election, \(\mathrm{Opp}_{ijt-1}\), log voter turnout at the same polling station in the 2008 federal election, \(Y_{ijt-1}\), and in the 2006 federal election, \(Y_{ijt-2}\), and electoral district fixed effects \(\theta_{jt}\).

If \(\gamma\) is negative, those polling stations with more non-Conservative voters experienced a drop in voter turnout from the 2008 to the 2011 election, whereas turnout at polling stations with more Conservative voters rose between the 2008 and the 2011 election, relative to the district average, controlling for turnout rates in previous elections for which we have data. The coefficient on the interaction term, \(\delta\), now measures whether this effect is stronger in districts affected by the robocalls, i.e., whether the robocall indicator detects a differential impact. A negative and significant value of \(\delta\) thus indicates that the difference between how Conservative voters and voters with a different political orientation turned out at the polls was larger in those districts that were allegedly targeted by calls directed to suppress the (presumably non-Conservative) vote, controlling for voter affiliation and turnout rates in previous years.
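As an illustration of how a specification like (3.2) can be estimated, the following is a minimal sketch on simulated data, not the estimation code used in this chapter. District fixed effects are absorbed by demeaning every variable within its district, after which ordinary least squares recovers the slope coefficients. The function names, the simulated sample sizes, and the "true" coefficient values are assumptions chosen only for the example; the sketch omits the poll weights and the district-clustered standard errors used in the text.

```python
import random


def demean_within(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def within_ols(y, columns, groups):
    """OLS of y on the given columns after absorbing group fixed effects."""
    yd = demean_within(y, groups)
    xd = [demean_within(col, groups) for col in columns]
    k = len(xd)
    xtx = [[sum(p * q for p, q in zip(xd[i], xd[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(xd[i], yd)) for i in range(k)]
    return solve(xtx, xty)


def simulate(seed=0, n_districts=40, n_treated=8, polls_per_district=30):
    """Simulate poll-level data with hypothetical parameter values."""
    rng = random.Random(seed)
    gamma, delta, beta1, beta2 = -0.10, -0.07, 0.57, 0.28  # illustrative only
    y, opp, inter, lag1, lag2, district = [], [], [], [], [], []
    for j in range(n_districts):
        robocall = 1.0 if j < n_treated else 0.0
        theta = rng.gauss(0.0, 0.2)  # district effect, removed by demeaning
        for _ in range(polls_per_district):
            o = rng.uniform(0.2, 0.9)   # lagged opposition vote share
            l1 = rng.gauss(4.0, 0.1)    # lagged log turnout (previous election)
            l2 = rng.gauss(4.0, 0.1)    # lagged log turnout (two elections back)
            eps = rng.gauss(0.0, 0.01)
            y.append(gamma * o + delta * robocall * o
                     + beta1 * l1 + beta2 * l2 + theta + eps)
            opp.append(o)
            inter.append(robocall * o)  # Robocall_j x Opp_{ijt-1}
            lag1.append(l1)
            lag2.append(l2)
            district.append(j)
    return y, [opp, inter, lag1, lag2], district


y, columns, district = simulate()
gamma_hat, delta_hat, beta1_hat, beta2_hat = within_ols(y, columns, district)
print("gamma:", round(gamma_hat, 3), "delta:", round(delta_hat, 3))
```

Note that the robocall dummy itself does not appear as a regressor: it is constant within a district and is therefore absorbed by the fixed effects, so only its interaction with the opposition share is identified.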
The identifying assumption in this strategy is that the incidence of robocalls is unrelated to the potential outcomes at polling stations relative to the district average, conditional on polling station turnout in the previous elections and the fraction of non-Conservative voters. This assumption implies that in the absence of any misconduct, polls with similar (non-)Conservative vote shares within the same district should have seen a similar deviation from the district average turnout, conditional on past deviations from the district average.

The exclusion restriction would be violated if there were other factors that differentially affected turnout that were correlated with robocall reports. For example, if the Conservative campaigns in districts where robocalls were reported used legal means to dissuade opposition voters from voting (e.g., by convincing them that their local candidate could not be trusted), that would confound the effect of robocalls. We investigate this possibility in Section 3.4.2. Alternatively, if opposition voters had demographic characteristics relative to the district average that affected their change in turnout, and those characteristics were differently distributed across treated and non-treated districts, the exclusion restriction would be violated. If, for example, the gap in age between opposition supporters and Conservative supporters was larger in districts where robocalls were reported, and younger people were differentially less likely to vote in 2011 relative to 2008, then declines in opposition voter turnout that should have been attributed to age would wrongly be attributed to robocalls.87

We use the log of turnout in our base specification because the magnitude of the predicted effect should be larger in polling stations with higher turnout (there are more potential voters to deter from going to the polls) and taking logs adjusts for this.

Two treated districts were dropped from the analysis: the first is Portneuf-Jacques Cartier, where no Conservative ran in 2008. The other district is Saanich Gulf-Islands, where fraudulent robocalling was already reported in 2008.88 We also dropped all advance, absentee, and mobile polls, where the logic of the identification strategy does not apply. Similarly, we drop a number of polling stations that do not match up from previous elections to 2011 because they have been split up in some way, or rejoined. Another potential problem is that there are substantial differences in the number of polling stations in each district. If we weighted each observation (polling station) equally, districts with more polling stations would have more influence on the results. We address this problem by weighting the polling stations so that within a district, the weight of each polling station is equal, and the sum of the weights for the polling stations in a district is the same for all districts.89

We do not take a position on whether the robocalling we investigate was coordinated at a central level or by individual Conservative candidates, or by individuals working within specific campaigns without the knowledge of their candidate. These different possibilities would likely imply different selection processes for robocalls at the district level.
However, since the above estimation procedure is robust to district-level selection into treatment, we do not need to know the specifics of the selection procedure.

87 Future research will investigate this alternative.

88 If there had been reports of fraud in multiple districts in 2008 as well as 2011, we could have estimated a model with one interaction term for fraud where fraud occurred in 2011, and another for where fraud occurred in 2008. Identification would then require that the locations where fraud was reported be meaningfully different in the two elections.

89 In general, our results were very robust to alternative weighting functions, including dropping weights entirely. See also Section 3.4 below.

3.3 Results

Tables 3.2 and 3.3 below report the resulting parameter estimates.

Table 3.2: Cross-Section Regression of Turnout at the District Level

                                        Coefficient    Robust Standard Error
Robocalls Reported                       .00407         (.00377)
District Turnout (2008)                  .644***        (.0493)
District Turnout (2006)                  .295***        (.06)
District Margin (2008, 10K)             -.0396***       (.0145)
District Opponent Vote Share (2008)     -.00139         (.00797)
Observations                             306

Note: Superscripts ***, **, and * indicate significance at the 1%, 5%, and 10% level, respectively.

Table 3.2 shows the results of a simple cross-section regression, where 2011 voter turnout in district j is explained by the robocall indicator variable (robocall), controlling for district characteristics through variables from the previous election(s): voter turnout (lagTurnout, lag2Turnout), the percentage margin of victory (lagMargin) and the combined percentage share of non-Conservative votes (lagOppvoteshare).

We see that the dominant determinant of turnout in district j in the 2011 election was turnout in the same district in 2008. Turnout in the 2006 election, however, also explains some of the variation. The coefficient of the winning margin (closeness of the election) in 2011 is negative, as expected.
At the district level, political orientation does not matter much, as indicated by the small (negative) and statistically insignificant coefficient of the lagged opponent vote share. The coefficient on the robocall indicator, though insignificant, is positive. In other words, even if we proxy the expected margin of victory by its value in the last election, and control for historical turnout, the treatment group still has higher turnout, on average, than the control group. However, as discussed earlier, this observed positive correlation could be driven by a change of characteristics of the district between 2008 and 2011: it would emerge, for instance, if treated districts had an expectation of closer races in 2011 than they had in 2008, and if voters in those districts were more likely to go to the polls because they felt that their vote mattered more.

Table 3.3: Within District Regression of Turnout at the Poll Level

                                Coefficient    Robust Standard Error
Opp_{ijt-1} × Robocall          -.0703**        (.0304)
ln(Turnout) (2008)               .574***        (.0151)
ln(Turnout) (2006)               .278***        (.0146)
Opp_{ijt-1}                     -.0987***       (.0145)
Observations                     44750

Note: Opp_{ijt-1} is the 2008 combined vote share of all opposition parties in poll i of district j. The standard errors reported in parentheses are clustered at the district level. Superscripts ***, **, and * indicate significance at the 1%, 5%, and 10% level, respectively.

Our identification strategy addresses this issue in a natural way through introducing electoral district fixed effects, which absorb any (change in) unobserved differences at the district level. Table 3.3 presents results using our baseline regression (3.2), which includes electoral district fixed effects. We see that at the level of a polling station, turnout in previous elections still matters most.
The combined vote share of the non-Conservative candidates in the prior election is now an important determinant of voter turnout: those polling stations with higher margins for the candidates running against the Conservative candidate in 2008 experienced a drop in turnout in the 2011 election relative to the district average. The coefficient on the interaction term signifies that this effect is more severe in districts with alleged misconduct. In other words, relatively more voters from polling stations that were predominantly non-Conservative stayed home in robocall districts. The point estimate of the parameter on the interaction term is -.070, and has a p-value of 0.022.90 The estimate implies that if we compared a (hypothetical) polling station with 100 percent non-Conservative votes in a specific district with another (hypothetical) polling station with 100 percent Conservative votes in the same district, the former had 7 percentage points less turnout in those districts where robocalls were reported, relative to districts where robocalls were not reported.

To better assess the magnitude of this effect, we can take the mean lagged combined vote share of all opponents to the Conservative candidate in the affected districts, which was 55.1 percent, and multiply it by the coefficient estimate.91 The resulting figure of -3.85 gives an estimate of the reduction in voter turnout, measured as a percentage, for the targeted districts. Using the fact that the average targeted district had 80,913 registered voters, this translates into an estimated absolute number of roughly 3,000 fewer voters per district, a substantial number. Of those districts on our list allegedly affected by robocalls, a total of seven had winning margins smaller than that.
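The back-of-the-envelope calculation above can be reproduced in a few lines. This is only a sketch of the arithmetic described in the text; the variable names are hypothetical, and the inputs are the unrounded coefficient from Table 3.3 together with the averages quoted in the paragraph. Because the text appears to work with rounded inputs, the result differs slightly from the reported -3.85 but leads to the same order of magnitude of roughly 3,000 voters per district.

```python
# Inputs taken from the text and Table 3.3 (variable names are illustrative).
mean_opp_share = 0.551       # mean lagged opposition vote share, treated districts
delta_hat = -0.0703          # coefficient on Opp_{ijt-1} x Robocall
registered_voters = 80_913   # average registered voters per treated district

# Implied change in turnout for the average treated district, then expressed
# as a number of voters by scaling with the average number of registered voters.
turnout_effect = mean_opp_share * delta_hat
fewer_voters = -turnout_effect * registered_voters

print("Turnout effect (percent):", round(100 * turnout_effect, 2))
print("Fewer voters per district:", round(fewer_voters))
```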
The lower bound of the 95% confidence interval still implies 450 fewer voters in robocall districts on election day, an amount which exceeds the winning margin in one affected district.

90 We also bootstrap our standard errors as an alternative (non-parametric) method to measure the accuracy of our estimates. To that end, we use the 279 districts where no robocalls were reported, and randomly draw 27 existing districts (with replacement) to replace the dropped districts, and assign robocalls to those districts in the new sample. Finally, we calculate the parameter of interest and the t-statistics of our specification. This procedure is repeated 1000 times to obtain the joint distribution of the t-statistics for both years. The samples were unbiased in both elections. The estimated boundary of the 95% confidence interval for 2011 occurs at t = 1.70, which is considerably lower than the asymptotic value of 1.96. This is possibly because our strategy of clustering errors at the district level is overly conservative.

91 Alternatively, we could use the mean lagged combined vote share at the polling station level, which is also 55.1 percent.

To provide a first check on the plausibility of those estimates, it is instructive to look at the district of Guelph, which is so far the only district where the robocall investigation of Elections Canada has disclosed quantitative evidence: following numerous complaints about telephone calls regarding voting location changes, Elections Canada was able to trace the calls back to a pay-as-you-go cell phone which led the investigators to a voice broadcasting vendor. Records from the vendor show that 7,676 calls were made to Guelph phone numbers within 15 minutes on election day.
The calls to electors were transmitted from the voice broadcasting vendor using VoIP (voice over Internet Protocol) calling technology.92 The list of numbers that were called is consistent with a list of non-supporters of the Conservative Party of Canada, obtained from that party’s database. The vendor charged a total of $162.10 for these calls,93 which illustrates how cheap the technology is. If we take the confirmed number of 7,676 contacted households as a lower bound for the quantitative incidence of the robocalls in Guelph, we can apply our parameter estimate to obtain an estimate of the maximal “success” rate of voter suppression. The average number of persons in a household in Guelph was 2.5 (2011 Census of the Population). Depending on how many voters were affected per contacted household, this would give us a maximal success rate that lies between 16 percent (all household members) and 39 percent (one voter per household). As an upper bound, this does not seem to be unreasonable given that Barton (2011) finds a 50 percent success rate for voter misinformation.

3.4 Robustness Checks

This section presents results from a range of sensitivity exercises that we carried out to ensure the robustness of the basic results; it also discusses possible weaknesses in the identification strategy. In addition, to address the concern of whether the exclusion restriction is satisfied, we perform a number of falsification tests.

92 VoIP calling is computer-generated calling over the Internet to recipients’ telephones. This technology allows a voice broadcasting vendor to program into the call process any calling number its client wishes to be displayed on a recipient’s call display. That number would have nothing to do with the actual call made by the vendor.

93 See the Elections Canada report (2013). A former Conservative campaign worker is facing criminal charges under the Canada Elections Act.
The charge is listed under section 491(3)d of the Canada Elections Act, which prohibits preventing or trying to prevent a voter from casting a ballot. The maximum penalty is a $5,000 fine and five years in prison.

3.4.1 Sensitivity Analysis

Table 3.4 displays the estimated coefficient of the robocall interaction term using variations of our baseline specification from Table 3.3. For ease of reference, row (1) repeats the results. As a first robustness check, we add an additional lag. The estimated equation is

\[
\begin{aligned}
Y_{ijt} ={}& \delta_{1}\,(\mathrm{Robocall}_{j} \times \mathrm{Opp}_{ijt-1}) + \delta_{2}\,(\mathrm{Robocall}_{j} \times \mathrm{Opp}_{ijt-2}) \\
&+ \gamma_{1}\,\mathrm{Opp}_{ijt-1} + \gamma_{2}\,\mathrm{Opp}_{ijt-2} + \beta_{1} Y_{ijt-1} + \beta_{2} Y_{ijt-2} + \theta_{jt} + \varepsilon_{ijt}.
\end{aligned} \tag{3.3}
\]

This specification uses two lags of opposition vote share as a proxy for the instigator’s beliefs about contemporaneous counterfactual opposition support instead of one. In other words, we proxy for \(\mathrm{Opp}^{*}_{ijt}\) using both \(\mathrm{Opp}_{ijt-1}\) and \(\mathrm{Opp}_{ijt-2}\). For ease of comparison, we report the sum of \(\delta_{1}\) and \(\delta_{2}\), but calculate significance using the F-test that they are jointly zero, which is slightly more conservative (F = 3.65, p = 0.02). Removing lags, as we do in row (3), leaves the magnitude of the coefficient largely unchanged, but standard errors do increase slightly. The stability of the coefficient estimates indicates that serial correlation in the error terms should not be a major concern.94

Next, we consider a modified difference-in-differences specification, estimating

\[
Y_{ij,t} = \alpha_{j,t} + \gamma\,\mathrm{Opp}_{ijt-1} + \delta\,(\mathrm{Robocall}_{jt} \times \mathrm{Opp}_{ijt-1}) + \theta_{jt} + \varepsilon_{ij,t}, \tag{3.4}
\]

where \(t = 2011, 2008, 2006\).
This equation states that in the absence of treatment, expected log turnout of poll \(i\) in district \(j\) at time \(t\) is determined by a time-variant district effect \(\theta_{j,t}\), as well as the composition of the electorate in that poll \(ij\) at time \(t\), as proxied by the corresponding non-Conservative vote share of the previous election (time \(t-1\)).

94 Also, the estimated coefficients on the lagged turnout from both the 2008 and the 2006 elections are precisely measured and different from each other at the 1 percent level, suggesting that the underlying data-generating process does not correspond to a difference-in-differences model. We also instrumented the 2011 opposition vote share and the interaction with the 2008 opposition vote share and the interaction in our original regression. The point estimate became even more negative and was significant at the 1% level. Further tests led us to reject both fixed effects and AR(1) specifications for the evolution of turnout and opposition vote share.

Table 3.4: Sensitivity Analysis

                                          Estimated δ    Robust St. Err.    # of Obs.
(1)  original specification               -0.070**       (.0307)             44,750
(2)  saturated lag structure              -0.099**       (.0384)             44,750
(3)  one lag of turnout                   -0.076**       (.0327)             54,080
(4)  DiD specification                    -0.170**       (.071)             144,015
(5)  weight polls by size                 -0.079**       (.016)              44,750
(6)  restrict sample to Ontario           -0.044         (.0303)             16,483
(7)  restrict sample to close races       -0.109**       (.0477)              9,731
(8)  extended list of 102 ridings          0.005         (.0122)             44,750
(9)  current opposition vote share        -0.091***      (.0393)             44,751
(10) absolute value of turnout            -0.038**       (.015)              44,750
(11) gls with logit link                  -0.162***      (.062)              44,742

Note: All entries represent estimates from the specification 3.3, which includes district fixed effects. The standard errors reported in parentheses are heteroskedasticity-robust, and clustered at the district level. Superscripts ***, ** and * indicate significance at 1%, 5%, and 10% respectively.
Note that since this specification includes time-varying district fixed effects, the treatment effect only emerges through the interaction term. As can be seen from row (4), the corresponding estimate of the treatment effect almost doubles, and remains significant at the 10% level.95

The specification in row (5) adjusts the weight given to each poll, so the sum of the weights of the observations from each district is equal, while within a district polls are weighted in proportion to their number of voters. Either weighting is consistent with our strategy of clustering errors at the district level; we are treating each district as a single observation. Alternatively, we could continue to weight each district equally, but assign more weight to polling stations with more votes cast, as the larger sample size in these polls reduces the variance of outcomes.96

95 In contrast to our baseline regression (1), the difference-in-differences specification does not allow for time-dependent variation of turnout within a district other than through the political affiliation of the electorate.

96 Dropping weights altogether also does not alter our findings. See the previous version of this paper for details.

The next two rows, (6) and (7), address the possibility that the effect we measure picks up a trend that is specific to districts in Ontario, or to districts where the race was close, defined as belonging to the lowest quantile of winning margins (below 5,000 votes in total). For instance, it could have been the case that the provincial arm of a federal party in opposition was involved in a scandal between the 2008 and 2011 elections, which may have prompted more non-Conservative voters to stay at home on election day in Ontario relative to other provinces. Since most districts we identify as treated are in Ontario, comparing the outcome in those districts to electoral districts in other parts of the country would falsely attribute part of this effect to our robocall measure.
Similarly, close races could have discouraged opposition-leaning individuals from going to the polls. By restricting the entire sample to districts with those respective characteristics, we ensure that the untreated observations are comparable in this dimension, at the cost of losing observations.97

Row (6) shows that the magnitude of the effect is slightly lower in Ontario, and the coefficient is no longer significant. This is not unexpected, because treatment is imperfectly observed. By isolating our comparison to districts that were geographically close to treated districts, we are both decreasing the sample size and increasing the likelihood that our comparison districts were treated but not initially reported. In the restricted sample with the closest races shown in (7), the absolute value of the estimated effect increases and is still significant at the 5% level.98

97 In Appendix A to this study, we document the results from using a somewhat more sophisticated method, namely propensity score matching, to address this problem more generally. The objective was to determine whether the findings still hold when we mandate that the control group be as similar as possible to the treatment group, where ‘similar’ was determined by a propensity score.

98 Indeed, a closer look at districts with close races reveals that opposition voters turned out in greater numbers there, ceteris paribus. This is of course consistent with the increase in δ in absolute value when restricting the sample to districts with small winning margins. If close races induce a bias, it is likely to be positive and we would thus be underestimating the effect.

However, the same is not true if we use the latest (extended) list of districts that had reported robocalling, as can be seen in row (8). In late March, the media had compiled a list of 102 affected electoral districts. These complaints came from electors directly contacting the media, since Elections Canada would not disclose any information pertaining to the scandal. As discussed earlier, therefore, one would naturally presume there is considerably more measurement error in these later reports, and this outcome is thus not surprising.99

Row (9), row (10), and row (11) present alternative variables and alternative functional forms. Row (9) reports the results of proxying for counterfactual opposition vote share using actual opposition vote share. Because current opposition vote share is affected by treatment, the resulting coefficient will be biased if the true effect is non-zero (and the direction of the bias is not clear). However, the test that the effect of treatment is zero will have the correct size under the hypothesis that there is no effect, and we see that the result is significant. Row (10) replaces logged turnout with turnout on both sides of the equation and reports the OLS results, while row (11) makes the same substitution but reports the result of using a generalized least squares model with a logit link, a specification that is appropriate for data where the dependent variable is a proportion. The results remain significant, and the coefficient estimates are the same or larger in magnitude.100

99 Obtained from The Sixth Estate on March 29, 2012, Irregularities Reported in 200 Ridings During 2011 Election, retrieved April 15, 2012.

100 As an additional robustness check, we verified that our results are not being driven by districts that are geographically distant from those that we flagged as being robocalled. Since approximately 8.9% of polls are in robocalled districts in our sample, we retain in the sample only polls in robocalled districts and the 9% of polls that have the shortest distance to these robocalled districts. Distance between polls is measured as the number of polls one would have to cross to get to a robocalled poll (queen distance). We use this distance measure because geodesic distance is confounded by urban/rural differences.
This measure is negative within districts that were robocalled, and positive outside of those districts. There are 5161 polls within districts that were robocalled (we had to use the subset of polls that match to available shape files). This reduced sample gives a point estimate of -0.046 with a standard error of 0.0158 (the t-statistic is -2.93) for the parameter of interest in our main regression. Since there were 5160 polls within 3 polls of the border to a robocalled district, we restrict the sample to all polls within robocalled districts and those polls outside robocalled districts within 3 polls of the border. The results are a point estimate of -0.051 with a standard error of 0.026 (the t-statistic is -2.33), which is significant at the 5% level. The standard errors are thus slightly larger, which is not surprising considering the procedure removes four fifths of the sample.

3.4.2 Campaign Intensity

One possible concern with our estimation strategy is that the regressions may be picking up some form of unobserved campaigning “intensity”. In particular, suppose the electoral districts that were allegedly targeted by illegal phone calls were also experiencing more campaigning efforts by legal means. As mentioned earlier, the studies that have tried to link (negative) campaigning with turnout have been inconclusive overall, so there is little sound evidence to go on.
Generally, though, it is conceivable that more voters with affiliations to the Liberal Party or the New Democratic Party were discouraged from going to the polls in districts where the Conservative candidate spent more on campaign advertising, canvassing, etc. If Conservative spending is, for whatever reason, correlated with the robocall indicator, estimates based on a model that does not include campaign finance data would be biased upward.

For this reason, we reran our main specification with an interaction term that allows turnout to decline more in polling stations with a larger opposition vote share as a function of campaign spending. The results are listed in Table 3.5. We see that controlling for campaign spending of Conservative candidates leaves both the magnitude and the significance of the coefficient on the robocall indicator unaffected, which is reassuring. There is also no detectable differential effect of spending on how many (opposition-leaning) voters were discouraged from going to the polls. The coefficient on the interaction term of the share of non-Conservative votes in the 2008 election and campaign spending is very small and not significantly different from zero.¹⁰¹

¹⁰¹ We only report our baseline specification here, but this finding was very robust with regard to various alternative models.

Table 3.5: Controlling for Campaign Intensity

                              Coefficient    Robust standard error
Turnout_{ijt−1}               0.575***       (.0150)
Turnout_{ijt−2}               0.279***       (.0146)
Opp_{ijt−1}                   −0.101***      (.0159)
Opp_{ijt−1} × Robocall        −0.067**       (.0316)
Opp_{ijt−1} × Cspending       −0.00002       (.0007)
District Fixed Effects        Yes
Number of Observations        44,750

Note: The standard errors reported in parentheses are clustered at the district level. Superscripts ***, ** and * indicate significance at 1%, 5%, and 10% levels, respectively.

3.4.3 Falsification Tests

Assuming that the alleged incidents of robocalling actually took place, it seems likely that they were targeted in some way. Due to the district-specific fixed effects, our estimation strategy allows for the fact that whoever was behind the calls could have been directing the misinformation towards (opposition) voters in particular districts. In principle, it can also accommodate a selection of targeted voters that reside in particular polls within a district, provided that this selection was not based on a poll-specific characteristic that is correlated with the error term. More generally, our identification strategy provides an unbiased estimate of the treatment effect only if, in the absence of any robocall allegations, polling stations across all districts would have experienced similar turnout changes as a function of their observed characteristics from the 2008 (and 2006) elections, relative to the district average. One possibility that could invalidate this assumption is that polling stations were targeted based on characteristics correlated with turnout. For example, suppose voters differ in their propensity to go to the polls on election day, and that this characteristic varies across the electorate of different polling stations (e.g., because it varies across voters with different demographics). If the instigator of those robocalls somehow targeted those polling stations,¹⁰² the robocall indicator would pick up this effect: we would conclude that robocalls cause a drop in turnout relative to the district average, when in fact the drop would have occurred even in the absence of robocalls because those polling stations happen to have an electorate that is of the easy-to-demobilize kind, and they were selected for that very reason.

¹⁰² During the scandal, there were suggestions in the media that elderly people were specifically targeted. See The Toronto Star from March 09, 2012, Robocalls: Older voters targeted by election day phone calls, Elections Canada believes, retrieved April 11 2014.

To gain some insight on whether or not our robocall variable may be correlated with unobserved, time-invariant characteristics of polling stations (as opposed to districts), we reran the baseline regression using turnout from the 2008 and 2006 general elections as the dependent variable, as a falsification test. The estimated coefficients are reported in Table 3.6.

Table 3.6: Falsification Tests Using the 2008 and 2006 Elections

                              Coefficient    Robust standard error
A. 2008 General Election
Turnout_{ijt−1}               0.538***       (.015)
Turnout_{ijt−2}               0.379***       (.014)
Opp_{ijt−1}                   −0.084***      (.020)
Opp_{ijt−1} × Robocall        0.007          (.086)
District Fixed Effects        Yes
Number of Observations        45,183

B. 2006 General Election
Turnout_{ijt−1}               0.696***       (.011)
Opp_{ijt−1}                   0.056***       (.016)
Opp_{ijt−1} × Robocall        0.054          (.039)
District Fixed Effects        Yes
Number of Observations        45,949

Note: The standard errors reported in parentheses are clustered at the district level. Superscripts ***, ** and * indicate significance at 1%, 5%, and 10% levels, respectively.

The findings provide no evidence of a statistically significant relationship between the turnout at specific polls and the robocall identifier in previous elections. The robocall variable does not approach statistical significance in either year and the point estimate is positive in both years, suggesting that there are no systematic trends in voter turnout that are correlated with our main right-hand side variable of interest. Separate regressions relating changes in turnout from the 2008 to the 2011 election to the corresponding changes in turnout from the 2006 to the 2008 election also indicate that there is no (time-invariant) time trend in turnout.¹⁰³

3.5 Conclusion

This paper has investigated allegations of attempted voter demobilization in the context of the Canadian 2011 federal election.
In 27 of the 308 districts, voters allegedly received automated phone calls containing false information on the location of their election site, or harassing them in the name of one of the contestants. The results suggest that, on average, voter turnout in those districts affected by the demobilization efforts was significantly lower than in the districts where no automated phone calls were reported. The point estimate gives a decrease of 7 percentage points in the number of opposition voters turning out. While this may be a reasonable estimate of the effect of misinformation in the 2011 Canadian federal election, care should be taken in applying the specific value to other contexts, because the effectiveness of voter misinformation is strongly negatively related to the degree to which voters expect it (Barton, 2011).

¹⁰³ See our discussion paper for a more in-depth discussion as well as additional robustness checks using alternate control groups determined by propensity score matching. In summary, the estimated effects are not fully robust to the details of the matching procedure, and the pre-selection of the sample. The main problems with matching estimators in our context are that a) our sample of only 27 affected districts out of 306 is quite small, implying that whether a particular observation is included (matched) or excluded (not matched) can change the coefficient and standard errors significantly, and b) we do not observe “treatment” with certainty. The latter means that if we use a binary variable (whether or not a district appears on a list of robocalled electoral districts that was published early in the scandal) as a proxy for actual treatment, the probit regressions we use to calculate a district’s propensity score provide an alternative estimate of the probability of being treated.

On the other hand, we believe the method outlined in this paper for detecting these effects has broad applicability to other contexts and other violations of democratic norms, provided that elections are contested in small districts, poll-level results are available and useful, there is a benchmark of past election results to compare to, and there is sufficient spatial and temporal variation in reports of norm violations. We also believe that electoral malfeasance is likely to shift away from poll-specific technologies (e.g. ballot stuffing) to more geographically diffuse ones, such as robocalls. This is because automated methods such as robocalls or false information spread by viral marketing campaigns are cheaper, require fewer people to participate in the manipulation, and are more difficult to trace back to their instigator. The technique that we lay out in this chapter points to a way forward for estimating the effects of these types of manipulations.

However, this does not imply that fraud can be determined automatically, or on the basis of a rule. Care must be taken to evaluate reports of violations, and discard those that are not credible or that were made in response to media attention about the violations instead of the violations themselves; a failure to do this in our case would have led to a null finding. Researchers must also have detailed information about the specific example they are investigating. In our case, our results depend on a constant treatment probability among opposition voters within treated districts (a plausible assumption when voters are contacted by phone), but other violations may have a different distribution of treatment. However, if researchers understand what these impacts are likely to be, it should be possible to tailor our method to other circumstances.
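The comparison at the heart of the method (poll-level turnout relative to the district average, as a function of lagged opposition vote share and its interaction with the robocall indicator) can be sketched on simulated data. This is a minimal illustration and not the chapter's estimation code: the data are synthetic, the lagged turnout terms are omitted for brevity, and the function names, sample sizes, and "true" coefficients (-0.10 and -0.07) are assumptions chosen for the example.

```python
import random

def demean_by_group(values, groups):
    """Subtract each group's mean (the within, or fixed-effects, transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def ols_two_regressors(y, x1, x2):
    """OLS without intercept for y = b1*x1 + b2*x2 + e, via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simulate(seed=0, n_districts=40, polls_per_district=50,
             opp_effect=-0.10, robocall_effect=-0.07):
    """Synthetic polls: half of the districts are (hypothetically) robocalled."""
    rng = random.Random(seed)
    district, opp, opp_x_robo, turnout = [], [], [], []
    for j in range(n_districts):
        robocalled = 1 if j < n_districts // 2 else 0
        district_level = rng.gauss(0.6, 0.05)  # absorbed by the district fixed effect
        for _ in range(polls_per_district):
            share = rng.uniform(0.2, 0.8)      # lagged opposition vote share
            noise = rng.gauss(0.0, 0.02)
            district.append(j)
            opp.append(share)
            opp_x_robo.append(share * robocalled)
            turnout.append(district_level + opp_effect * share
                           + robocall_effect * share * robocalled + noise)
    return district, opp, opp_x_robo, turnout

district, opp, opp_x_robo, turnout = simulate()
# Demeaning every variable within district is equivalent to including district dummies.
y = demean_by_group(turnout, district)
x1 = demean_by_group(opp, district)
x2 = demean_by_group(opp_x_robo, district)
b_opp, b_interaction = ols_two_regressors(y, x1, x2)
print(f"opposition share: {b_opp:.3f}, opposition share x robocall: {b_interaction:.3f}")
```

Because the robocall indicator is constant within a district, it is absorbed by the fixed effect; only its interaction with a poll-level characteristic is identified, which is why the coefficient of interest sits on the interaction term.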
Figure 3.2: The District of London North, ON

Bibliography

Daron Acemoglu and James A Robinson. A theory of political transitions. American Economic Review, 91(4):938–963, 2001.

Daron Acemoglu, Asuman Ozdaglar, and Ali ParandehGheibi. Spread of (mis)information in social networks. Games and Economic Behavior, 70(2):194–227, 2010.

Daron Acemoglu, Tarek A Hassan, and Ahmed Tahoun. The power of the street: Evidence from egypt’s arab spring. Technical report, National Bureau of Economic Research, 2014.

Marc Adam, Matthias Gamer, Jan Krämer, and Christof Weinhardt. Measuring emotions in electronic markets. 2011.

Stephen Ansolabehere, Shanto Iyengar, Adam Simon, and Nicholas Valentino. Does attack advertising demobilize the electorate? American Political Science Review, 88(4):829–838, 1994.

Stephen D Ansolabehere, Shanto Iyengar, and Adam Simon. Replicating experiments using aggregate and survey data: The case of negative advertising and turnout. American Political Science Review, 93(4):901–909, 1999.

Helmut Appel, Jan Crusius, and Alexander L Gerlach. Social comparison, envy, and depression on facebook: a study looking at the effects of high comparison standards on depressed individuals. Journal of Social and Clinical Psychology, 34(4):277–289, 2015.

Sinan Aral and Christos Nicolaides. Exercise contagion in a global social network. Nature Communications, 8:14753, 2017.

Kevin Arceneaux and David W Nickerson. Comparing negative and positive campaign messages: Evidence from two field experiments. American Politics Research, 38(1):54–83, 2010.

Abhijit Banerjee, Arun G Chandrasekhar, Esther Duflo, and Matthew O Jackson. Gossip: Identifying central individuals in a social network. Technical report, National Bureau of Economic Research, 2014.

Jared Barton. Keeping out the vote: An experiment on voter demobilization. Technical report, 2011.

Patrick Baylis.
Temperature and temperament: Evidence from a billion tweets. Energy Institute at HAAS working paper, 2015.

Patrick Baylis, Nick Obradovich, Yury Kryvasheyeu, Haohui Chen, Lorenzo Coviello, Esteban Moro, Manuel Cebrian, and James H Fowler. Weather impacts expressed sentiment. arXiv preprint arXiv:1709.00071, 2017.

Mark R Beissinger. The semblance of democratic revolution: Coalitions in ukraine’s orange revolution. American Political Science Review, 107(3):574–592, 2013.

Keith G Bentele and Erin E O’brien. Jim crow 2.0? why states consider and adopt restrictive voter access policies. Perspectives on Politics, 11(4):1088–1116, 2013.

Sylvia Bishop and Anke Hoeffler. Free and fair elections: A new database. Journal of Peace Research, 53(4):608–616, 2016.

Johan Bollen, Bruno Gonçalves, Guangchen Ruan, and Huina Mao. Happiness is assortative in online social networks. Artificial Life, 17(3):237–251, 2011a.

Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, 2011b.

Johan Bollen, Bruno Gonçalves, Ingrid van de Leemput, and Guangchen Ruan. The happiness paradox: your friends are happier than you. EPJ Data Science, 6(1):4, 2017.

Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012.

Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

Erik Brynjolfsson and JooHee Oh. The attention economy: measuring the value of free digital services on the internet. 2012.

Elias Canetti. Crowds and power. Macmillan, 1962.

David Card and Gordon B Dahl. Family violence and football: The effect of unexpected emotional cues on violent behavior. The Quarterly Journal of Economics, 126(1):103–143, 2011.

Thomas Carothers. The observers observed. Journal of Democracy, 8(3):17–31, 1997.

Rich Caruana and Alexandru Niculescu-Mizil.
An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd international conference on Machine learning, pages 161–168. ACM, 2006.

Daniel L Chen and M Loecher. Events unrelated to crime predict criminal sentence length. 2016.

Joshua D Clinton and John S Lapinski. "targeted" advertising and voter turnout: An experimental study of the 2000 presidential election. Journal of Politics, 66(1):69–96, 2004.

Lorenzo Coviello, Yunkyu Sohn, Adam DI Kramer, Cameron Marlow, Massimo Franceschetti, Nicholas A Christakis, and James H Fowler. Detecting emotional contagion in massive social networks. PLoS ONE, 9(3):e90315, 2014.

Antonio R Damasio. Descartes’ error: Emotion, rationality and the human brain. 1994.

Dick P Dee, SM Uppala, AJ Simmons, Paul Berrisford, P Poli, S Kobayashi, U Andrae, MA Balmaseda, G Balsamo, d P Bauer, et al. The era-interim reanalysis: Configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137(656):553–597, 2011.

Amy M Do, Alexander V Rupert, and George Wolford. Evaluations of pleasurable experiences: The peak-end rule. Psychonomic Bulletin & Review, 15(1):96–98, 2008.

Ahmet Onur Durahim and Mustafa Coşkun. # iamhappybecause: Gross national happiness through twitter analysis and big data. Technological Forecasting and Social Change, 99:92–105, 2015.

Emile Durkheim. The elementary forms of the religious life [1912]. na, 1912.

Daniel Eisenberg, Ezra Golberstein, Janis L Whitlock, and Marilyn F Downs. Social contagion of mental health: evidence from college roommates. Health Economics, 22(8):965–986, 2013.

Ruben Enikolopov, Alexey Makarin, and Maria Petrova. Social media and protest participation: Evidence from russia. 2016.

Carolyn E Fick. The making of Haiti: The Saint Domingue revolution from below. Univ. of Tennessee Press, 1990.

Alessandro Flammini and Filippo Menczer. Approximating pagerank from in-degree.
In Algorithms and Models for the Web-Graph: Fourth International Workshop, WAW 2006, Banff, Canada, November 30-December 1, 2006, Revised Papers, volume 4936, page 59. Springer, 2008.

James H Fowler and Nicholas A Christakis. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ, 337:a2338, 2008.

Kentaro Fukumoto and Yusaku Horiuchi. Making outsiders’ votes count: Detecting electoral fraud through a natural experiment. American Political Science Review, 105(3):586–603, 2011.

Matthew Gentzkow, Jesse M Shapiro, and Matt Taddy. Measuring polarization in high-dimensional data: Method and application to congressional speech. Technical report, National Bureau of Economic Research, 2016.

Alan S Gerber and Donald P Green. The effects of canvassing, telephone calls, and direct mail on voter turnout: A field experiment. American Political Science Review, 94(3):653–663, 2000.

Benny Geys. Explaining voter turnout: A review of aggregate-level research. Electoral Studies, 25(4):637–663, 2006.

Ashish Goel, Kamesh Munagala, Aneesh Sharma, and Hongyang Zhang. A note on modeling retweet cascades on twitter. In International Workshop on Algorithms and Models for the Web-Graph, pages 119–131. Springer, 2015.

Austan Goolsbee and Peter J. Klenow. Valuing consumer products by the time spent using them: An application to the internet. American Economic Review, 96(2):108–113, May 2006. doi: 10.1257/000282806777212521. URL http://www.aeaweb.org/articles?id=10.1257/000282806777212521.

Donald Green and Dean Karlan. Effects of robotic calls on voter mobilization. Unpublished Manuscript. Yale University, 2006.

Donald P Green, Alan S Gerber, and David W Nickerson. Getting out the vote in local elections: Results from six door-to-door canvassing experiments. The Journal of Politics, 65(4):1083–1096, 2003.

Shannon Greenwood, Andrew Perrin, and Maeve Duggan. Social media update 2016.
Pew Research Center, 11, 2016.

Yosh Halberstam and Brian Knight. Homophily, group size, and the diffusion of political information in social networks: Evidence from twitter. Journal of Public Economics, 143:73–88, 2016.

Andrew J Healy, Neil Malhotra, and Cecilia Hyunjung Mo. Irrelevant events affect voters’ evaluations of government performance. Proceedings of the National Academy of Sciences, 107(29):12804–12809, 2010.

Philip N Howard, Aiden Duffy, Deen Freelon, Muzammil M Hussain, Will Mari, and Marwa Maziad. Opening closed regimes: what was the role of social media during the arab spring? 2011.

Susan D Hyde. The observer effect in international politics: Evidence from a natural experiment. World Politics, 60(1):37–63, 2007.

Nahomi Ichino and Matthias Schündeln. Deterring or displacing electoral irregularities? spillover effects of observers in a randomized field experiment in ghana. The Journal of Politics, 74(1):292–307, 2012.

Atsushi Inoue and Gary Solon. Two-sample instrumental variables estimators. The Review of Economics and Statistics, 92(3):557–561, 2010.

Matthew O Jackson. Social and economic networks. Princeton University Press, 2010.

Daniel B Jones, Werner Troesken, and Randall Walsh. A poll tax by any other name: The political economy of disenfranchisement. Technical report, National Bureau of Economic Research, 2012.

Jeffrey P Kahn, Effy Vayena, and Anna C Mastroianni. Opinion: Learning as we go: Lessons from the publication of facebook’s social-computing research. Proceedings of the National Academy of Sciences, 111(38):13677–13679, 2014.

Judith Kelley. Election observers and their biases. Journal of Democracy, 21(3):158–172, 2010.

Junghyun Kim, Robert LaRose, and Wei Peng. Loneliness as the cause and the effect of problematic internet use: The relationship between internet use and psychological well-being. CyberPsychology & Behavior, 12(4):451–455, 2009.

Peter Klimek, Yuri Yegorov, Rudolf Hanel, and Stefan Thurner.
Statistical detection of systematic election irregularities. Proceedings of the National Academy of Sciences, 109(41):16469–16473, 2012.

Adam DI Kramer, Jamie E Guillory, and Jeffrey T Hancock. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24):8788–8790, 2014.

Hanna Krasnova, Helena Wenninger, Thomas Widjaja, and Peter Buxmann. Envy on facebook: A hidden threat to users’ life satisfaction? 2013.

Robert Kraut, Michael Patterson, Vicki Lundmark, Sara Kiesler, Tridas Mukophadhyay, and William Scherlis. Internet paradox: A social technology that reduces social involvement and psychological well-being? American Psychologist, 53(9):1017, 1998.

Ethan Kross, Philippe Verduyn, Emre Demiralp, Jiyoung Park, David Seungjae Lee, Natalie Lin, Holly Shablack, John Jonides, and Oscar Ybarra. Facebook use predicts declines in subjective well-being in young adults. PLoS ONE, 8(8):e69841, 2013.

Ravi Kumar, Jasmine Novak, and Andrew Tomkins. Structure and evolution of online social networks. In Link mining: models, algorithms, and applications, pages 337–357. Springer, 2010.

Kevin Lewis, Marco Gonzalez, and Jason Kaufman. Social selection and peer influence in an online social network. Proceedings of the National Academy of Sciences, 109(1):68–72, 2012.

By Livy. The early history of rome (books iv). 1960.

Christopher T Lloyd, Alessandro Sorichetta, and Andrew J Tatem. High resolution global gridded data for use in population studies. Scientific Data, 4:170001, 2017.

Marco Manacorda and Andrea Tesei. Liberation technology: mobile phones and political mobilization in africa. 2016.

Jason Mander. Gwi social summary. Global Web Index, 2016.

John S McClelland. The Crowd and the Mob (Routledge Revivals): From Plato to Canetti. Routledge, 2010.

Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks.
Annual Review of Sociology, 27(1):415–444, 2001.

Walter Mebane and Kirill Kalinin. Geography in election forensics. 2014.

Walter R Mebane. Second-digit tests for voters’ election strategies and election fraud. In Annual Meeting of the Midwest Political Science Association, Chicago, IL, April, pages 11–14. Citeseer, 2012.

Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Jukka-Pekka Onnela, and J Niels Rosenquist. Understanding the demographics of twitter users. ICWSM, 11(5th):25, 2011.

Joel Mokyr. The market for ideas and the origins of economic growth in eighteenth century europe. Tijdschrift voor Sociale en Economische Geschiedenis, 4(1):3, 2007.

Seth A Myers, Aneesh Sharma, Pankaj Gupta, and Jimmy Lin. Information network or social network?: the structure of the twitter follow graph. In Proceedings of the 23rd International Conference on World Wide Web, pages 493–498. ACM, 2014.

Suresh Naidu. Suffrage, schooling, and sorting in the post-bellum us south. Technical report, National Bureau of Economic Research, 2012.

Gary B Nash. The urban crucible: The northern seaports and the origins of the American revolution. Harvard University Press, 2009.

David W Nickerson. Volunteer phone calls can increase turnout: Evidence from eight field experiments. American Politics Research, 34(3):271–292, 2006.

David W Nickerson et al. Does email boost turnout. Quarterly Journal of Political Science, 2(4):369–379, 2007.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. URL http://www.aclweb.org/anthology/D14-1162.

Matthew Pittman and Brandon Reich. Social media and loneliness: Why an instagram picture may be worth more than a thousand twitter words. Computers in Human Behavior, 62:155–167, 2016.

Plato, Stephen Halliwell, Plato, and Stephen Halliwell. Republic 5. Aris & Phillips Warminster, UK, 1988.

Bei Qin, David Strömberg, and Yanhui Wu.
The political economy of social media in china. Technical report, Working paper, 2016.

Ricardo Ramirez. Giving voice to latino voters: A field experiment on the effectiveness of a national nonpartisan mobilization effort. The Annals of the American Academy of Political and Social Science, 601(1):66–84, 2005.

Alan S Reifman, Richard P Larrick, and Steven Fein. Temper and temperature on the diamond: The heat-aggression relationship in major league baseball. Personality and Social Psychology Bulletin, 17(5):580–585, 1991.

Scott Rick and George Loewenstein. The role of emotion in economic behavior. Handbook of Emotions, 3:138–158, 2008.

Ashlea Rundlett and Milan W Svolik. Deliver the vote! micromotives and macrobehavior in electoral fraud. American Political Science Review, 110(1):180–197, 2016.

Klaus R Scherer. What are emotions? and how can they be measured? Social Science Information, 44(4):695–729, 2005.

Holly B Shakya and Nicholas A Christakis. Association of facebook use with compromised well-being: a longitudinal study. American Journal of Epidemiology, 185(3):203–211, 2017.

Cosma Rohilla Shalizi and Andrew C Thomas. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239, 2011.

Adam Smith. The theory of moral sentiments, volume 1. J. Richardson, 1822.

Betsey Stevenson and Justin Wolfers. Subjective well-being and income: Is there any evidence of satiation? The American Economic Review, 103(3):598–604, 2013.

Francesco Trebbi and Eric Weese. Insurgency and small wars: Estimation of unobserved coalition structures. 2015.

Jonathan N Wand, Kenneth W Shotts, Jasjeet S Sekhon, Walter R Mebane, Michael C Herron, and Henry E Brady. The butterfly did it: The aberrant vote for buchanan in palm beach county, florida. American Political Science Review, 95(4):793–810, 2001.

Emily Weinstein. Influences of Social Media Use on Adolescent Psychosocial Well-Being: ’OMG’ or ’NBD’?
PhD thesis, 2017.

Appendix A
Effects on Recipient Volume

This section repeats, with recipient volume as the dependent variable, the exercise carried out for recipient sentiment in Section 1.5.

These results are important for two reasons. First, they present suggestive evidence as to the reasons that Twitter users participate on the social network, and why those reasons may have been misunderstood in the public discourse around Twitter. Second, they are needed to calculate how sensitive Twitter users are to a positive shock to external sentiment.

OLS Regressions

Table A.1 presents OLS results for a regression of recipient volume on sender sentiment and sender volume. Recipient volume is positively associated with sender volume. Recipient volume is negatively related to sender sentiment. In the simplest specification, (1), a shift of 10% of the sender messages from neutral to positive decreases recipient tweets by 2%. This is consistent with the aphorism “if it bleeds, it leads”: people are more active on Twitter when the general climate is less positive.

Table A.1: OLS: Recipient Volume on Feed Sentiment and Feed Volume

                      (1)           (2)           (3)           (4)
                      n_rt          n_rt          n_rt          n_rt
FSent_rt (Imp)        -0.0921***    0.0259***     -0.1227***    -0.0094***
                      (0.00)        (0.00)        (0.01)        (0.00)
FVol_rt               0.0602***     0.0365***     0.0591***     0.0517***
                      (0.00)        (0.00)        (0.00)        (0.00)
Observations          17214831      17214831      17214831      17214831
User Fixed Effects                  X                           X
Time Fixed Effects                                X             X

Linear regression of recipient volume on feed sentiment and feed volume. FSent Imp. is the average sentiment in the feed of each user at the current time, where missing periods are imputed using the most recent available lag. FSize is the average of the log of number of tweets plus one from each sender.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time fixed effects are included and user fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table A.2 presents IV results for a regression of recipient volume on sender sentiment. Relative to the OLS results, the sign is reversed. Recipients are more likely to send messages when they receive positive messages. The estimated coefficients are always positive and strongly statistically significant. They are also larger in magnitude than the OLS estimates. In the preferred specification, (4), a shift of 10% of the messages from neutral to positive increases recipient tweets by 90%. This specification is preferred to maintain consistency with the regressions of sentiment on sentiment. It implies that while users are more active on Twitter at times when less positive sentiments are expressed, they respond more to positive sentiment. It also suggests that part of the reason users participate on Twitter is that they are looking for emotional stimulus.

IV Regressions

Table A.3 presents IV results for a regression of recipient volume on sender volume. The estimated coefficients are always positive and strongly statistically significant. They are also larger in magnitude than the OLS estimates. In the preferred specification, (2), an increase in sender tweets of 20% increases recipient tweets by 22%. This specification is preferred to maintain consistency with the regressions of sentiment on sentiment. The IV estimates may be larger than the OLS estimates because of measurement error: when the instrument predicts that people are tweeting more, they are putting more effort into the tweets they do send, and the observed measure of volume does not capture this increase in the volume of information.
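One way to see how measurement error in a regressor can open a gap between OLS and IV estimates is a small simulation. The sketch below uses classical (additive, independent) measurement error, which is a simplification of the mechanism described above; all variable names, the sample size, and the true coefficient of 0.5 are assumptions made for the illustration and are not taken from the Twitter data.

```python
import random

def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def iv_slope(y, x, z):
    """Just-identified IV estimator with one instrument: cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx

rng = random.Random(1)
true_beta = 0.5   # assumed effect of sender "information volume" on recipient volume
n = 20000

z = [rng.gauss(0, 1) for _ in range(n)]                  # instrument (stand-in for feed light)
x_true = [zi + rng.gauss(0, 1) for zi in z]              # true information volume
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]          # observed volume, measured with noise
y = [true_beta * xi + rng.gauss(0, 1) for xi in x_true]  # recipient response

beta_ols = ols_slope(y, x_obs)     # attenuated toward zero by the measurement error
beta_iv = iv_slope(y, x_obs, z)    # the instrument is unrelated to the measurement error
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  true: {true_beta}")
```

In this setup the OLS slope converges to the true coefficient scaled by the share of the observed regressor's variance that is signal (here 2/3), while the IV slope converges to the true coefficient.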
Table A.2: IV: Recipient Volume on Feed Sentiment

                      (1)           (2)           (3)           (4)
                      n_rt          n_rt          n_rt          n_rt
FSent_rt (Imp)        0.9659*       0.7576*       0.3912***     0.9238*
                      (0.45)        (0.37)        (0.04)        (0.46)
FLightESent_rt        -0.1403       0.1320*       -0.0961***    0.0632
                      (0.16)        (0.06)        (0.02)        (0.06)
F∆Sent_rt             -0.1824       -0.1787       -0.1349***    -0.3567*
                      (0.10)        (0.10)        (0.01)        (0.14)
Avg. FVol             0.0905        -0.3691***    0.0763***     0.0116
                      (0.05)        (0.07)        (0.00)        (0.06)
RcpLight_rt           0.0025        -0.0089***    -0.0047***    -0.0059***
                      (0.01)        (0.00)        (0.00)        (0.00)
Rcp. Avg Volume       1.0245***     0.0000        0.9993***
                      (0.01)        (.)           (0.00)
Observations          17214568      17214568      17214568      17214568
IV AR stat.           5.14          4.41          108.80        4.35
Rcv. TOD Dummies      X             X             X             X
User Fixed Effects                  X                           X
Time Fixed Effects                                X             X

2SLS regression of recipient volume, n_rt, on feed sentiment, FSent_rt, using the interaction of sender light and sender long-run average sentiment, FLightSent_rt, as the instrument.
Controls: FELightESent_rt is average feed light multiplied by average feed sentiment. F∆Sent_st is the shock to the sentiment of the recipient’s feed at time t. RcpLight_rt is the recipient light level and RcpESent_r is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. AR Chi Squared reports the Anderson-Rubin weak instrument-robust Chi Squared statistic.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time fixed effects are included and user fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Table A.3: IV: Recipient Volume on Sender Volume

                      (1)           (2)           (3)           (4)
                      n_rt          n_rt          n_rt          n_rt
FVol_rt               0.2334***     0.2236***     0.5674***     0.4872***
                      (0.02)        (0.02)        (0.03)        (0.09)
FSent_rt              0.0503***     0.0377***     0.0501***     0.0196*
                      (0.01)        (0.00)        (0.00)        (0.01)
RcpLight_rt           -0.0023*      -0.0026*      -0.0060***    -0.0060**
                      (0.00)        (0.00)        (0.00)        (0.00)
Rcp. Avg Volume       1.0201***     0.0000        1.0698***
                      (0.00)        (.)           (0.00)
Observations          13610625      13610625      13610625      13610625
IV AR stat.           105.31        104.21        918.20        43.05
Rcv. TOD Dummies      X             X             X             X
User Fixed Effects                  X                           X
Time Fixed Effects                                X             X

2SLS regression of recipient volume, n_rt, on feed volume, FVol_rt, using average light experienced by the senders in the feed at time t, FLight_rt, as the instrument.
Controls: F∆Sent_st is the shock to the sentiment of the recipient’s feed at time t, FSent_st is the sentiment of the recipient’s feed at time t, RcpLight_rt is the recipient light level and RcpESent_r is the average sentiment of the recipient. Rcp. TOD Dummies are controls for the local time of the recipient and sender time-zone controls. AR Chi Squared reports the Anderson-Rubin weak instrument-robust Chi Squared statistic.
***, **, and *: significance at 0.1%, 1%, and 5% levels, respectively. Errors are clustered at the user level unless time fixed effects are included and user fixed effects are not. In this case they are clustered at the datetime-bin level. Standard errors are in parentheses. For convenience, all variables are defined and explained in Table 1.1.

Appendix B
Proof of Consistency of Corrected Estimator

I would like to run “regular” two stage least-squares. I proceed by looking for a $K_{rt}$ such that if $Z^{o}_{rt}K_{rt}$ is chosen as the instrument, and $X^{o}_{rt}K_{rt}$ is the explanatory variable, we can consistently estimate $\beta$. By construction,
By construction,βˆ =(KrtXo′rt KrtZo′rt(KrtZo′rt KrtZort)−1KrtZo′rt KrtXort)−1KrtXo′rt KrtZo′rt(KrtZo′rt KrtZort)−1KrtZo′rt YrtSince Krt is a scalar, this is equivalent to:βˆ =(Xo′rt Zo′rt(Zo′rt Zort)−1Zo′rt Xort)−1Xo′rt Zo′rt(Zo′rt Zort)−1Zo′rt K−1rt YrtI need to split apart Yrt :Yrt = βXortSort +βXurtSurt + εrtYrt = βXortSort +β (δ1ZurtSurt +µurtSurt)+ εrtYrt = βXortSort +β (δ1 (α1Zort)Surt +µurtSurt)+ εrtYrt = βXortSort +β(α1 (XortSort−µortSort) SurtSort+µurtSurt)+ εrtYrt = βXortSort +β(α1 (XortSort−µortSort) SurtSort+µurtSurt)+ εrtYrt = β (Sort +α1Surt)Xort +β (α1µortSurt +µurtSurt)+ εrtPlugging this into X, we get:βˆ =(Xo′rt Zo′rt(Zo′rt Zort)−1Zo′rt Xort)−1Xo′rt Zo′rt(Zo′rt Zort)−1Zo′rt K−1rt Zo′rt (β0 (Sort +α1Surt)Xort)+β (α1µortSurt +µurtSurt)+ εrtIf K = (Sort +α1Surt),βˆ = β + Dˆ−1β (α1µortSurt +µurtSurt)+ εrtAll terms but the first go to zero by the assumptions made in equation 2.26.136Appendix CNonlinear Model ResultsTable C.1: Nonlinear Model ResultsVariables Robocall RobocallMargin of Victory 2008 -3.06e-05 -2.31e-05(2.55e-05) (2.61e-05)Opposition Vote Share 2008, %, 2008 0.0347*** 0.0378**(0.0134) (0.0188)Turnout 2008, % -0.0743 0.000239(0.0504) (0.0829)log(Con. Expenses), 2011 1.299** 1.950**(0.603) (0.852)Pop. 65 and over, % 0.175*** 0.144**(0.0510) (0.0633)Unemployment Rate, % 0.0648 0.0338(0.0564) (0.0919)Post-secondary Education, % -0.0336 -0.0361(0.0315) (0.0396)Home ownership, % -0.0382*** -0.0491***(0.0123) (0.0188)Median Income 0.000178*** 0.000162***(5.22e-05) (6.09e-05)St. Dev. of Income -2.62e-05 -1.21e-05(0.000171) (0.000192)Distance to Robocalled District -0.448*** -0.149(0.0995) (0.183)Observations 306 135Province Fixed Effects No YesRobust standard errors in parentheses*** p<0.01, ** p<0.05, * p<0.1137Appendix DMatchingAs a final robustness test of our findings, we also ran our main specification regres-sions using alternate control groups. 
The objective was to determine whether the results still hold when we mandate that the control group be as similar as possible to the treatment group, where ‘similar’ was determined by a propensity score. The latter was created with the k-nearest neighbour matching method, using a rich set of controls to predict the incidence of robocalls, including the winning margin, turnout, and opposition vote share in the previous election, as well as Conservative campaign spending, district population statistics based on income, age structure, and education, and a leave-one-out estimator of the distance to the nearest robocalled district. Varying k = 1, 2, 3, we then reran the main regression in Table 3.3, effectively reweighting the untreated observations with the weights assigned to them by the propensity score. In summary, while we find substantial common support for the treated group (robocalled ridings) and the matched untreated group (ridings not on our list), the estimated effects are not fully robust to the details of the matching procedure and the pre-selection of the sample. If we use the full sample of 279 untreated ridings to create the matched control group, the point estimates for our primary parameter of interest range from -0.002 (standard error = 0.044, for k = 2) to -0.098 (standard error = 0.050, for k = 1). If we restrict the matching to Ontario, estimates remain negative but are not statistically significant, consistent with the Ontario-only results without matching in Table 3.

There are two main problems with matching estimators in our context. First, our sample of only 27 affected ridings out of 306 is quite small. Because of the small sample size, whether a particular observation is included (matched) or excluded (not matched) can change the coefficient and standard errors significantly. Second, and perhaps more importantly, we do not observe “treatment” with certainty.
Instead, we use a binary variable (whether or not a riding appears on a list of robocalled electoral districts that was published early in the scandal) as a proxy for actual treatment, i.e., the exposure to misleading phone calls. If the latter were deliberately targeted, the probit regressions we use to calculate a riding’s propensity score provide an alternative estimate of the probability of being treated, which is based on a wide range of observable variables, including some that are highly significant in the regression (such as spending by the Conservative candidate and the fraction of the population over 65). It is quite possible, then, that the propensity score provides additional information regarding actual treatment, and may even predict the probability of actual exposure to robocalls better than the binary measure of treatment we employ. In this case, of course, the main identifying assumption of matching estimators would be violated.

Appendix E
Electoral District Lists

List of 27 electoral districts where Elections Canada received reports of false or misleading phone calls during the 2011 General Election, as released by interim Liberal Party leader Bob Rae on February 26, 2012 [104].

1. Sydney-Victoria (N.S.): Winner: Liberals; Margin of victory: 765 votes
2. Egmont (P.E.I.): Winner: Conservatives; Margin of victory: 4,470 votes
3. Eglinton-Lawrence (Ont.): Winner: Conservatives; Margin of victory: 4,062 votes
4. Etobicoke Centre (Ont.): Winner: Conservatives; Margin of victory: 26 votes
5. Guelph (Ont.): Winner: Liberals; Margin of victory: 6,236 votes
6. Cambridge (Ont.): Winner: Conservatives; Margin of victory: 14,156 votes
7. Hamilton East-Stoney Creek (Ont.): Winner: NDP; Margin of victory: 4,364 votes
8. Haldimand-Norfolk (Ont.): Winner: Conservatives; Margin of victory: 13,106 votes
9. Kitchener-Conestoga (Ont.): Winner: Conservatives; Margin of victory: 17,237 votes
10. Kitchener-Waterloo (Ont.): Winner: Conservatives; Margin of victory: 2,144 votes
11. London North Centre (Ont.): Winner: Conservatives; Margin of victory: 1,665 votes
12. London West (Ont.): Winner: Conservatives; Margin of victory: 11,023 votes
13. Mississauga East-Cooksville (Ont.): Winner: Conservatives; Margin of victory: 676 votes
14. Niagara Falls (Ont.): Winner: Conservatives; Margin of victory: 16,067 votes
15. Oakville (Ont.): Winner: Conservatives; Margin of victory: 12,178 votes
16. Ottawa Orleans (Ont.): Winner: Conservatives; Margin of victory: 3,935 votes
17. Ottawa West-Nepean (Ont.): Winner: Conservatives; Margin of victory: 7,436 votes
18. Parkdale-High Park (Ont.): Winner: NDP; Margin of victory: 7,289 votes
19. Perth-Wellington (Ont.): Winner: Conservatives; Margin of victory: 15,420 votes
20. Simcoe-Grey (Ont.): Winner: Conservatives; Margin of victory: 20,599 votes
21. St. Catharines (Ont.): Winner: Conservatives; Margin of victory: 13,598 votes
22. St. Paul’s (Ont.): Winner: Liberals; Margin of victory: 4,545 votes
23. Sudbury (Ont.): Winner: NDP; Margin of victory: 9,803 votes
24. Wellington-Halton Hills (Ont.): Winner: Conservatives; Margin of victory: 26,098 votes
25. Willowdale (Ont.): Winner: Conservatives; Margin of victory: 932 votes
26. Saint Boniface (Man.): Winner: Conservatives; Margin of victory: 8,423 votes
27. Winnipeg South Centre (Man.): Winner: Conservatives; Margin of victory: 8,544 votes

[104] Source: Yahoo News. See http://ca.news.yahoo.com/blogs/canada-politics/robocall-scandal-could-lead-elections-202108363.html

List of 102 electoral districts, with some additional information, where, according to media sources and reports from the Liberal Party of Canada or the New Democratic Party of Canada, voters received misleading or harassing phone calls during the 2011 General Election. Dated March 29, 2012 [105].

1. Ajax–Pickering, reported by Hill Times.
2. Ancaster-Dundas-Flamborough, reported by National Post.
3. Barrie, reported by National Post (calls impersonating Elections Canada and directing voters to bogus polling locations).
4. Bas-Richelieu-Nicolet-Becancour, reported by National Post.
5. Beaches–East York, reported by Globe & Mail.
6. Beausejour, reported by CBC.
7. Brampton West, reported by National Post.
8. Burnaby-Douglas, reported by Burnaby Now (calls impersonated Elections Canada and misdirected voters).
9. Burnaby-New Westminster, reported by Burnaby Now.
10. Calgary Centre, reported by Hill Times.
11. Cambridge, reported by private citizen (Postmedia: "harassing phone calls").
12. Cariboo–Prince George, reported by Prince George Citizen (harassment calls).
13. Chilliwack-Fraser Canyon, reported by National Post.
14. Davenport, reported by NDP.
15. Don Valley East, reported by National Post.
16. Dufferin–Caledon, reported by Orangeville Banner.
17. Edmonton Centre, reported by NDP (CBC: phone calls misdirected voters to wrong polling stations).
18. Edmonton East, reported by NDP (fake live calls impersonating Elections Canada, misdirecting voters. Postmedia: some live calls originally claimed to be from Elections Canada, then when pressed, said they were actually from a Conservative call centre.)
19. Eglinton-Lawrence, reported by Liberals (Fake Liberal calls targeted Jewish voters on Saturdays, and even accidentally phoned the Liberal riding phone bank, which has sworn out an affidavit.)
20. Egmont, reported by Liberals (Postmedia: live callers pretended to represent Liberal candidate, but mispronounced his name).
21. Elmwood-Transcona, reported by NDP (A formal complaint has been sent to Elections Canada over phone calls claiming voting locations had changed.)
22. Esquimalt-Juan de Fuca, reported by campaign volunteer to Sixth Estate (overnight calls impersonating the Liberal Party).
23. Essex, reported by NDP (National Post: robocalls misdirected voters).
24. Etobicoke Centre, reported by Liberals (a court case will begin in April to hear allegations that Conservatives temporarily shut down a polling station and harassed Liberal voters. See also Global News).
25. Fredericton, reported by private citizen (CBC: Phone number connected to the Conservative Party attempted to misdirect voters to wrong polling station).
26. Guelph, reported by Liberals (Guelph is the centre of most of the allegations; this riding received widespread reports of both hoax night-time phone calls claiming to be Liberals, and election-day calls claiming voting locations had changed.)
27. Haldimand-Norfolk, reported by Liberals (Postmedia: harassing overnight calls impersonated the Liberal Party).
28. Haliburton–Kawartha Lakes–Brock, reported by Liberal candidate.
29. Halton, reported by Elections Canada (election-day robocalls misdirected voters).
30. Hamilton Centre, reported by NDP.
31. Hamilton East-Stoney Creek, reported by Liberals.
32. Kelowna-Lake Country, reported by Conservatives.
33. Kingston and the Islands, reported by Liberals (CBC: Callers impersonating Liberal Party misdirected voters to wrong voting locations on election day.)
34. Kitchener Centre, reported by voting officer ("a lot" of electors were called and told their polling stations had changed).
35. Kitchener-Waterloo, reported by Elections Canada.
36. Kitchener-Conestoga, reported by private citizen (election-day robocalls misdirected voters).
37. Lac Saint Louis, reported by Liberals (Cyberpresse: Voters received misdirection calls.)
38. Lanark-Frontenac-Lennox and Addington, reported by National Post.
39. London North Centre, reported by Liberals (Postmedia: Telephone campaign falsely informed listeners that the Liberal candidate spent half of each year in Africa).
40. London West, reported by Liberals (Local radio: MP3 recording of an alleged hoax robocall attempting to misdirect a voter).
41. Malpeque, reported by PEI Guardian (misleading calls).
42. Markham-Unionville, reported by NDP and by National Post.
43. Mississauga East-Cooksville, reported by Liberals.
44. Mississauga-Streetsville, reported by National Post.
45. Mount Royal, reported by Liberals (CBC: election-day robocalls misdirected voters).
46. Nanaimo-Alberni, reported by NDP (Parksville News: phone calls misdirected voters).
47. Nanaimo-Cowichan, reported by Sixth Estate (calls misdirecting voters).
48. New Westminster-Coquitlam, reported by Royal City Record.
49. Niagara Falls, reported by Liberals (Postmedia: overnight callers impersonated Liberal Party).
50. Nipissing Timiskaming, reported by Liberals (CBC: Calls impersonating Elections Canada misdirected voters to the wrong locations.)
51. North Vancouver, reported by private citizen (Postmedia: election-day robocalls misdirected voters.)
52. Northumberland-Quinte West, reported by Trentonian (misleading and harassing calls).
53. Oak Ridges-Markham, reported by National Post.
54. Oakville, reported by Liberals (Postmedia: callers with "fake accents" pretended to represent Liberal candidate.)
55. Ottawa Centre, reported by NDP.
56. Ottawa Orleans, reported by Liberals (OpenFile: election-day robocalls impersonated Elections Canada and misdirected voters. Ottawa Citizen: fake callers misdirected voters.)
57. Ottawa-Vanier, reported by CBC (misdirection calls and harassment calls).
58. Ottawa West-Nepean, reported by Liberals (Postmedia: election-day calls misdirected voters).
59. Outremont, reported by Sixth Estate.
60. Parkdale-High Park, reported by Liberals and by NDP (Postmedia: overnight callers impersonated the Liberal Party. National Post: robocalls misdirected voters).
61. Perth-Wellington, reported by Liberals.
62. Peterborough, reported by Conservatives.
63. Pierrefonds-Dollard, reported by Liberals (CBC: Election-day calls misdirected voters).
64. Pitt Meadows-Maple Ridge-Coquitlam, reported by private citizen (CBC: Conservative call centre contacted a woman who had previously told them she would be voting NDP, and told her that her polling station had changed.)
65. Prince George–Peace River, reported by Elections Canada (election-day robocalls misdirected voters).
66. Regina-Lumsden-Lake Centre, reported by private citizen (election-day calls misdirected voters).
67. Richmond Hill, reported by Liberal Party (misdirection calls).
68. Saanich-Gulf Islands, reported by Greens (See also Maclean’s. Toronto Star: election-day live calls misdirected voters.)
69. Saint Boniface, reported by Liberals (Postmedia: callers impersonated the Liberal Party).
70. Saint John, reported by private citizen (CBC: calls impersonated Elections Canada and misdirected voters).
71. Sarnia-Lambton, reported by Sun Media (RMG telephone calls misdirected voters to the wrong polling station).
72. Saskatoon-Rosetown-Biggar, reported by Council of Canadians.
73. Sault Ste Marie, reported by National Post.
74. Scarborough Southwest, reported by National Post.
75. Scarborough-Rouge River, voting irregularities reported by Conservative Party.
76. Simcoe-Grey, reported by Liberals.
77. South Shore-St. Margaret’s, reported by NDP (Chronicle-Herald: election-day robocalls misdirected voters).
78. St. Catharines (Conservatives by 8,822; Conservatives by 13,598), reported by Liberals (National Post: alleges live calls misdirected voters).
79. St. Paul’s, reported by Liberals (National Post: robocalls misdirected voters).
80. Sudbury, reported by Liberals and NDP.
81. Sydney-Victoria, reported by Liberals (Chronicle Herald: fake Liberals and anonymous robocallers misdirected voters).
82. Timmins-James Bay, reported by private citizen.
83. Trinity-Spadina, reported by National Post.
84. Thunder Bay-Superior North (CBC: calls misdirected voters to wrong polling stations).
85. Toronto-Danforth, reported by CBC (misleading calls).
86. Vancouver Centre, reported by private citizen (misleading call).
87. Vancouver East, reported by NDP to Elections Canada in June 2011.
88. Vancouver Island North, reported by CHEK TV (election-day calls misdirected self-identified NDP and other voters).
89. Vancouver Kingsway, reported by National Post.
90. Vancouver Quadra, reported by Liberals (Postmedia: late-night phone calls impersonated Liberal Party).
91. Vancouver South, reported by Liberals (CBC: overnight phone calls).
92. Vaughan, reported by iPolitics.ca (financial misconduct and campaign irregularities).
93. Wascana, reported by Liberals (Global News: overnight live calls).
94. West Nova, reported by CBC (election-day calls misdirected voters to nonexistent polling locations).
95. Willowdale, reported by Liberals (CBC: Calls impersonated Liberal Party).
96. Windsor West, reported by Liberals (Windsor Star: "similar" phone calls to other ridings).
97. Windsor-Tecumseh, reported by NDP.
98. Winnipeg Centre, reported by private citizens (Winnipeg Free Press: election-day robocalls misdirected voters).
99. Winnipeg South, reported by NDP.
100. Winnipeg South Centre, reported by Liberals (National Post: robocalls and live calls misdirected voters).
101. York Centre, reported by National Post (misleading calls).
102. Yukon, reported by CBC (calls with false information about polling stations).

[105] Source: The Sixth Estate. See http://sixthestate.net/?p=3646
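
The matching procedure described in Appendix D (estimate a propensity score, match each treated riding to its k nearest untreated ridings, and reweight the untreated observations) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the reported estimates: the data are synthetic, a logit fitted by Newton's method stands in for the probit used in the thesis, the comparison is a weighted difference in means rather than the main regression, and the names fit_logit, propensity, and knn_match_weights are hypothetical.

```python
import numpy as np


def fit_logit(X, d, iters=50):
    """Fit a logit of d on X (with intercept) by Newton's method.

    Assumption: a logit replaces the probit of the text for simplicity.
    """
    Xc = np.column_stack([np.ones(len(X)), X])
    d = np.asarray(d, dtype=float)
    b = np.zeros(Xc.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xc @ b))
        grad = Xc.T @ (d - p)
        # Tiny ridge term keeps the Hessian invertible (sketch-only safeguard).
        hess = (Xc * (p * (1.0 - p))[:, None]).T @ Xc + 1e-8 * np.eye(Xc.shape[1])
        step = np.linalg.solve(hess, grad)
        b = b + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b


def propensity(X, b):
    """Predicted probability of treatment for each row of X."""
    Xc = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xc @ b))


def knn_match_weights(score, treated, k):
    """Weights from k-nearest-neighbour matching with replacement.

    Treated units get weight 1; each untreated unit gets 1/k for every
    treated unit that counts it among its k nearest scores.
    """
    weights = treated.astype(float)
    controls = np.flatnonzero(~treated)
    for i in np.flatnonzero(treated):
        dist = np.abs(score[controls] - score[i])
        nearest = controls[np.argsort(dist, kind="stable")[:k]]
        weights[nearest] += 1.0 / k
    return weights


# Synthetic illustration only: these numbers are not the thesis data.
rng = np.random.default_rng(0)
n = 306
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(-2.2 + 1.0 * X[:, 0] + 0.5 * X[:, 1])))
treated = rng.random(n) < p_true
outcome = 0.5 * X[:, 0] + rng.normal(size=n)  # no true treatment effect

score = propensity(X, fit_logit(X, treated))
k = 2
weights = knn_match_weights(score, treated, k)

# Weighted comparison of treated units with their matched controls.
treated_mean = outcome[treated].mean()
control_mean = np.average(outcome[~treated], weights=weights[~treated])
print("matched difference in means:", treated_mean - control_mean)
```

Because matching is done with replacement, a single untreated observation can be matched to several treated ones, which is one reason small changes in k or in the sample can move the estimate noticeably when there are few treated units.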

