UBC Theses and Dissertations

North Korea : cyber threat perception and metadata analysis Chah, Niel 2014


North Korea: Cyber Threat Perception and Metadata Analysis

by Niel Chah

B.A., The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Political Science)

The University of British Columbia (Vancouver)

August 2014

© Niel Chah, 2014

Abstract

Since the turn of the century, the increasing relevance of the Internet and non-traditional security concerns has been visible in the East Asian context. On the Korean peninsula, there have been starkly different approaches to cyberspace. South Korea, a developed economy and liberal democracy, has made significant strides in adopting the Internet while its northern counterpart still remains largely unconnected. In such a context, this paper uses metadata and big data sources to delve into the American threat perception of North Korean cyberspace. Recent trends indicate that the American government and media have a growing interest in cyber security issues. As the target of historical North Korean cyber attacks, the United States should have considerable interest in the cyber attack capabilities of North Korea. A theoretical framework on threat perception is used to estimate that the American threat perception of North Korean cyber capabilities is high. However, an analysis of data that was collected with Python scripts and web APIs shows that the American government and media often associate the threat from North Korea with nuclear weapons and ballistic missiles rather than cyber warfare. As a result, the use of big data and metadata technologies reveals nuances in the American threat perception of North Korea. For the United States, North Korea’s cyber attack capabilities should be seen as an emerging threat in objective terms, but nuclear weapons and missile capabilities still dominate in threat perceptions.

Preface

This dissertation is the original, unpublished, independent work of the author, Niel Chah.

Table of Contents

Abstract
Preface
Table of Contents
List of Figures
Acknowledgments
Dedication
Introduction
The Metadata Trends
Theoretical Framework: Threat Perception
Characteristics of Cyber Insecurity
  Attribution and Responsibility
  Reporting Rates
  Cyber Attack Types
North Korea Cyber Threat Perception
  Damage Capability
  Probability
Resulting Perceptions
Data Collection and Methodology
  American Media Coverage: The New York Times
  American Government Publications
Discussion: Metadata Analysis
Reconsidering Threat Perception
Conclusion: North Korean Cyber Threat
Conclusion: Metadata, Methodologies, and Potential
Works Cited

List of Figures

Fig. 1. Frequency of Terms in American English Corpus 2009
Fig. 2. Frequency of Search Terms on Security
Fig. 3. Perception of Existential Threat
Fig. 4. North Korean Cyber Threat Perception Possibility 1
Fig. 5. Frequency of Search Terms on North Korean Security Concerns
Fig. 6. North Korean Cyber Threat Perception Possibility 2

Acknowledgments

I offer my gratitude to the faculty, staff, and students in the Department of Political Science at the University of British Columbia.

Dedication

To my family, CHC, KSO, BC, LC, and AK.

Introduction

Since the emergence of cyberspace in the late 1980s and early 1990s, the digital domain has become increasingly relevant for political scientists in the field of security studies. An estimation of the likely damage from North Korea’s cyber capabilities suggests that the American perception of threat should be high. However, an analysis of data from government and media publications indicates that the threat from North Korean cyber attacks is not seen as a significant concern. This paper will show that such data leads to a more nuanced understanding of the American threat perception of North Korea: the American threat perception of North Korea is primarily focused on the country’s nuclear weapons and ballistic missiles rather than its cyber capabilities. The North Korean cyber threat is best characterized as an emerging concern for the United States.

On the topic of cyber security, the non-traditional security literature in the political science discipline provides a well-established theoretical foundation on which to base this paper. This paper will use the theoretical framework set out by one of the prominent schools in the literature (such as the Copenhagen School or Paris School), but not to its full extent. Instead, the focus will be on the existential threat component of the securitization process, a small but significant part of the theoretical framework.
To guide the examination of existential threat perception, certain concepts are borrowed from outside the non-traditional security literature while still maintaining a primary focus on cyber security.

The purpose of focusing exclusively on the threat perception of North Korea’s cyber capabilities is as follows. By examining a narrower portion of the theoretical framework in non-traditional security studies, this paper attempts to contribute to a deeper understanding of the issues at the intersection of cyber security and North Korea, an increasingly significant security area and a politically unique state in the Asia Pacific respectively. By using technical tools that allow the processing of large stores of data from the American media and government, the paper also demonstrates the utility of big data methods. Furthermore, by focusing on the threat originating from cyberspace, it is hoped that the findings in this paper will drive further research into the securitization process of cyber security issues.

The Metadata Trends

This paper begins with an overview of the prominence of various subfields in security studies. Two publicly available big data sources are used to cover different date ranges. For data from 1970 to 2008, the Google Ngrams dataset is utilized. More recent trends from 2008 to 2014 are tracked by the Google Trends data. By using these big data proxies, it will be shown that the American interest in the subfields of non-traditional security studies, such as environmental security, food security, and cyber security, has been steadily increasing over time, to the point that cyber security has been the primary non-traditional security interest for the United States in the most recent years. The high interest in cyber security issues is a relatively recent phenomenon.
Cyber security concerns would not have been on the radar of most states in the early stages of the Internet’s development.[1] It is with the expansion of the Internet on a large scale that new vulnerabilities and threats began to arise. In turn, states began to show greater interest in cyber security as these new threats gained political and military significance. In recent years, the frequency of cyber attacks is estimated to be on the order of millions on a weekly basis (“Dark Side” 265). With such growing insecurity, the United States Department of Defense declared the cyber domain as another domain of warfare in 2011 (Special Report). President Obama has also publicly stated that the “cyber threat is one of the most serious economic and national security challenges we face as a nation” (Cyber Security).

[1] The early peak and plateau in the frequency of the “cyber security” keyword between 1975 and 1985 is an anomaly that required further investigation. The early peak is the product of both an early discourse on cyber security matters and imperfections in the data. This unusual peak in the 1970s and 80s does not detract from the general observable trend of increasing significance for cyber issues. To examine the issue, the raw TSV (tab-separated values) datasets were downloaded from Google. Next, filtered queries were applied to the Google Books database, from which the Ngrams data is derived. The TSV values showed that the underlying raw data was being accurately depicted in the graph generator. However, the Google Books results showed that a few of the book results had incorrect publication date values. As a result, the anomaly is not actually as significant as depicted in the graph.

Fig. 1. Frequency of Terms in American English Corpus 2009 from Michel, Jean-Baptiste, et al. "Quantitative Analysis of Culture Using Millions of Digitized Books." Science 331.6014 (2011): 176-82. Web. 30 July 2014.

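The check described in footnote 1 can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the thesis: the sample rows, the yearly totals, the function names, and the assumed column order (ngram, year, match count, page count, volume count) are all hypothetical stand-ins for the downloaded TSV files.

```python
import csv
import io

# Hypothetical sample rows. The column order (ngram, year, match_count,
# page_count, volume_count) is an assumption about the TSV export layout.
SAMPLE_TSV = (
    "cyber security\t1978\t12\t10\t3\n"
    "cyber security\t1999\t40\t35\t20\n"
    "cyber security\t2008\t900\t700\t300\n"
    "food security\t2008\t1500\t1200\t450\n"
)

# Hypothetical total counts per year, used only to convert raw match
# counts into relative frequencies.
TOTALS = {1978: 1_000_000, 1999: 2_000_000, 2008: 3_000_000}


def yearly_frequency(tsv_text, phrase, totals):
    """Return {year: match_count / total_count} for a single phrase."""
    freq = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for ngram, year, match_count, _pages, _volumes in reader:
        if ngram == phrase:
            y = int(year)
            freq[y] = freq.get(y, 0.0) + int(match_count) / totals[y]
    return freq


def early_years(freq, cutoff=1990):
    """List years before the cutoff that have any matches at all.

    These are the years whose source books would need a manual check
    for incorrect publication dates.
    """
    return sorted(y for y, f in freq.items() if y < cutoff and f > 0)


freq = yearly_frequency(SAMPLE_TSV, "cyber security", TOTALS)
print(early_years(freq))
```

Flagging the early years only narrows the search. Whether a flagged year reflects genuine early usage or a mis-dated book still has to be settled by inspecting the underlying Google Books records, as the footnote describes.
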
In order to measure the interest in cyber security over time, the frequency of keywords in the American English language corpus of books was chosen as a proxy.[2] Although the query results do not include the word frequencies in newspapers, magazines, and other notable sources of published materials, the general trends shown in Figure 1 for the large collection of English language books are indicative of the rising interest in cyber security concerns. As mentioned before, it is to be expected that cyber security concerns begin to establish a greater presence on the graph in the late 1990s and early 2000s. In fact, the most prominent cyber concern in that time period was the Y2K problem, as depicted by the significant spike in the graph. Further data on the word frequencies in books is not available beyond 2008 at the time of this paper’s publication.

[2] The data from the Google Books Ngram Viewer shows the frequency of word usage in books published in the English language corpus from the 1500s to the present. The database consists of the digitized contents of roughly 12% of all books ever published in the English language (Michel et al. 176). A smaller subset of this large dataset was then used by the graph generator to visualize the queries.

Fig. 2. Frequency of Search Terms on Security from "Google Trends." Google Trends. Google, n.d. Web. 28 July 2014.

To examine more recent trends in public interest for the various subfields of security studies, a different large-N dataset is necessary. The Google Trends data in Figure 2 shows the relative frequencies of specific search terms on the Google search engine from 2008 to 2014.[3] As a widely used English language search engine, Google is a suitable source for this search data.
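The relative scale used by Google Trends, which footnote 3 quotes as “normalized and presented on a scale from 0-100”, can be illustrated with a short sketch. The code assumes the simplest possible reading, in which every value is rescaled against the single largest value among the compared terms. Google's actual procedure is only summarized in the source, so the function name, the rescaling rule, and the sample volumes below are illustrative assumptions.

```python
def scale_to_100(series_by_term):
    """Rescale all series so that the single largest value becomes 100.

    Assumption: simple max-scaling across all compared terms. This only
    illustrates why the resulting graph is relative, not absolute.
    """
    peak = max(v for series in series_by_term.values() for v in series)
    if peak == 0:
        return {term: [0 for _ in series]
                for term, series in series_by_term.items()}
    return {
        term: [round(100 * v / peak) for v in series]
        for term, series in series_by_term.items()
    }


# Hypothetical raw search volumes for three consecutive periods.
raw = {
    "cyber security": [20, 40, 80],
    "food security": [24, 32, 40],
}

scaled = scale_to_100(raw)
print(scaled)
# {'cyber security': [25, 50, 100], 'food security': [30, 40, 50]}
```

Because every value is expressed relative to the same peak, the scaled series preserve the ordering and the timing of peaks but say nothing about absolute search volume, which is why the general trends are what matter for this paper.
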
Although the data on relative frequencies of search terms is an entirely different proxy from the earlier n-gram data, both measure the level of public interest in a topic, and on that basis it is possible to observe that cyber security continues to garner greater interest in recent years.

[3] The Google Trends graph presents data as a relative measure instead of depicting absolute search volume. According to the company, the Trends data is “normalized and presented on a scale from 0-100” in order to depict the relative popularity of search terms (“About”). The general trends and peaks in the frequency of search terms over time are most relevant for this paper (“How Trends data is normalized”). Figure 2 indicates that the prominence of “cyber security” outpaces interest in the other subfields of non-traditional security studies.

Theoretical Framework: Threat Perception

The Copenhagen School presents political scientists with the most important components in the theoretical framework on non-traditional security studies: (1) the existential threat, (2) the securitizing actor(s), (3) the referent object(s), and (4) the audience. The interaction between these components leads to an outcome that is characterized as the securitization (or desecuritization) of an issue. Of these many moving components, this paper is most concerned with the first factor: the perception of existential threat from a specific source.

The existential threat component of the securitization process is closely linked to an issue’s threat perception and the level of interest in it. For the purposes of this paper, it is accepted that a higher threat perception is correlated with a higher level of interest in an issue, as measured by such proxies as publication volume, Internet search frequency, and allocation of government funding.
In a sense, the perception of threat influences how actors write about, discuss, and react to an issue. For example, the estimated $300 billion response to the Y2K problem worldwide was primarily driven by the perception of imminent danger as the millennium drew nearer (Y2K). By examining the indicators that measure what is prominent in the public sphere, it is possible to determine what is or is not considered a prominent issue (and a threat) by a group of actors.

It is necessary to explain the determinants of threat perception. In the fields of engineering and risk assessment, the concept of probabilistic risk assessment (PRA) is used to estimate the possible futures and vulnerabilities of a project from a damaging event (“Fact Sheet”). This is particularly relevant for research on natural disasters and nuclear incidents. Borrowing from the disciplines that make frequent use of PRA, this paper uses the basic principles of probabilistic risk assessment to propose that the perception of threat is formed on (1) the expected damage and (2) the likelihood of an event. As shown in the following equation, threat perception is determined by an estimation of expected utility (or expected cost).

Fig. 3. Perception of Existential Threat

Expected damage and probability are worded as they are in order to emphasize the importance of the human perception of these components. It may be the case that there are objective true values for the actual damage and probability of a harmful event. Many efforts at estimating expected costs are attempts to determine those true values as closely as possible. However, in the context of this paper, the human perception of these components (damage and probability) matters more than the actual or real values.
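The relationship summarized in Fig. 3 can be illustrated numerically. The sketch below assumes the simplest expected-cost form, in which perceived threat is the product of perceived damage and perceived probability. The multiplicative form, the function name, and the values are illustrative assumptions rather than the exact content of the figure.

```python
def perceived_threat(perceived_damage, perceived_probability):
    """Expected cost as perceived damage times perceived probability.

    Both inputs are perceptions held by an observer, not objective
    measurements. The multiplicative form is an assumption based on the
    usual definition of expected cost.
    """
    if not 0.0 <= perceived_probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return perceived_damage * perceived_probability


# Hypothetical values in arbitrary units: damage is held constant while
# the perceived likelihood of the event doubles.
damage = 100
low = perceived_threat(damage, 0.25)
high = perceived_threat(damage, 0.5)
print(low, high)  # 25.0 50.0
```

With damage held constant, doubling the perceived probability doubles the expected cost. Under this assumed form, a source with high damage potential can still produce a low perceived threat if observers judge an attack to be unlikely.
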
As in the Y2K example, in the years leading up to the millennium the expected costs of the Y2K bug were perceived as sufficiently high to be deserving of the high level of spending.

While the emphasis is on human perceptions for the purposes of this paper, this is not to say that perceptions are not based on actual events and information in the real world, whether the data are information from expert sources or predictions based on historical events. In fact, the next sections of this paper reference such real information in greater detail as a way to determine the most likely American perception of the North Korean cyber threat. The emphasis on human perceptions is meant to show how there may be discrepancies between a seemingly high threat source and a low threat response to it.

The expected damage (Damage) component of North Korean cyber attack capabilities will be estimated by examining the range of damage capabilities as well as the historical record of damage inflicted. With damage held constant, the perception of higher likelihood (Probability) of an event magnifies the expected cost. The likelihood of damage from North Korean cyber attacks will be estimated by examining the frequency of cyber attacks over recent years.

Characteristics of Cyber Insecurity

This section elaborates on certain characteristics of cyberspace that are relevant for cyber security. Many of these factors increase the uncertainty of cyberspace, whether it is in determining the true source of attack or the actual levels of cyber aggression. Furthermore, this section specifies the methods of cyber attack that are of greatest relevance for this paper.

Attribution and Responsibility

Certain features distinguish the cyber domain from other domains of warfare, namely (1) the anonymity of attacks and (2) the linkages with non-state actors.
The very characteristics that made the Internet into an open commons also created a hospitable environment for cyber crime to flourish undetected as states, international organizations, and the private sector were unable to effectively block the operation of decentralized and mobile cyber criminals. The concept of plausible deniability, whereby aggressors can claim non-involvement in a cyber attack because of the difficulty of attribution, is often mentioned as a feature that maintains such uncertainty in cyberspace. Furthermore, the prominence of non-state actors makes it difficult to hold states culpable for cyber aggression. The decentralized state linkages with the non-state actors of cyberspace, such as cyber criminals and hackers, are a significant source of cyber attack capabilities. In some cases, states such as China and Russia are in a cooperative relationship with domestic cyber criminals who carry out attacks on desired targets (Klimburg 41). A difficulty arises then in identifying what is or is not a state-led cyber attack. Despite the attempts by private cyber security firms to examine the source code used by cyber criminals and attackers, it is difficult to be completely certain of the actors responsible for a cyber attack. Attribution is such a difficulty that in 2000 a “15-year-old Canadian high school student” was identified as being behind a “series of high-profile denial of service attacks [on]...Yahoo!, Amazon.com, Dell, ETrade, eBay and CNN” only after he had bragged about it on an Internet forum (Markoff “Web’s”).

However, the uncertainty in determining the attribution of a cyber attack does not have to prevent a deeper discussion of threat perception. In fact, it is informative to observe how actors will often persistently attribute an attack to a specific source even in the face of uncertainty or contrary data.
It is often stated in the literature that cyber criminals act behind a mask of “plausible deniability” as mentioned above (Klimburg 42). The same principle is open to strategic use by the accuser, such that a cyber attack victim can place responsibility on any plausible aggressor. Observing how states react to a cyber attack in light of attribution difficulties provides interesting insights into threat perceptions.

Reporting Rates

Another factor creating further uncertainty in cyberspace is the incentive for the victims of cyber aggression to underreport incidents of cyber attacks, thereby distorting the true level of cyber insecurity. While the open sharing of possible vulnerabilities is a mutually beneficial outcome for all actors, the reporting of cyber attacks is often a more insulated process. Cyber crimes, which number in the millions on a weekly basis by modern estimates, are not all reported or known to the general public (“Dark Side” 265).

The distorting effects of underreporting are particularly prevalent for private sector organizations that rely on maintaining a good public reputation. Because the profitability of these companies is tied to the perceptions of their consumer bases, the “fear of publicity similarly deters companies from reporting computer crimes” (Hafner and Biggs). The incentive to maintain a favourable public image outweighs the mutually beneficial outcome of open reporting which would reduce “the chances that someone else will attempt to use the same path into a secured system” (Hafner and Biggs).

Underreporting is not likely to be limited to only private sector actors. The same principle of strategic underreporting applies to state actors, particularly as reporting incidents of damaging cyber attacks can act as a signal of cyber vulnerability to the rest of the world.
Once more the mutually beneficial outcome of open reporting is undermined by a dominating preference for each state to be secretive of its own national vulnerability against cyber attacks.

Cyber Attack Types

There are various methods of cyber aggression, ranging from the nuisances of amateur cyber vandalism to the more sophisticated cyber attacks on the level of Stuxnet. Among the most widely used methods of attack are the distributed denial of service (DDoS) attack, the zero-day exploit, and various malicious codes. The full range of cyber attack methods can be categorized according to various systems. Under one system, cyber weapons can be divided into syntactic, semantic, mixed, and electromagnetic weapons (Solce 304). Another less granular system distinguishes between physical attacks, electronic attacks (equivalent to electromagnetic), and computer network attacks (CNA) (Wilson 3). An even more general conceptualization lumps the various methods under the umbrella term of computer network exploitation attack (Klimburg 43).

This paper is most concerned with the cyber attacks that are conducted purely through code and the Internet. This includes syntactic and semantic weapons as well as computer network attacks (CNA). These methods of attack are conducted purely through computer code or digital means. Physical and electromagnetic attacks that primarily affect the physical targets on which the cyber domain is based are not included in the analysis. Such methods of aggression are a significant departure from this paper’s interest in the digital means of cyber attacks.

The distributed denial of service (DDoS) attack is a common tactic in the cyber domain. The DDoS attack has its origins in a relatively harmless feature of the Internet. When multiple users attempt to visit the same Internet domain or website, it is possible for the server to be stalled by an excessively high number of visits.
When used with malicious intent, the number of requests per second is high enough to cause damage to the servers on the other end. This method of attack is relatively easy to use, such that it is available for the “average computer user with the right tools” (Klimburg 42). In the case of websites holding sensitive personal and financial information, national websites, and the software tied to public infrastructure, the damage from DDoS attacks can be significant. As websites and digital infrastructure are brought down virtually, the effects are felt in the real world. The 2007 DDoS attacks on Estonia that “brought down the Web sites of the Estonian President, Parliament, a series of government agencies, the news media, the two largest banks” are often mentioned as a landmark usage of this method of attack (Hansen and Nissenbaum 1168).

Due to the very nature of digital infrastructure, another common source of danger lies in the vulnerability from zero-day exploits. All software programs have technical “bugs” that allow the program to be used in unintended ways. A “zero-day exploit” in the software is the malicious use of a previously undiscovered and unpatched bug. Such attacks cannot be anticipated in advance due to the complexity of even the simplest software. Protective measures are not possible because the existence of the exploit is unknown even to the software developers of the product. As a recent example, a significant zero-day exploit that allows “remote code execution” was discovered in a widely used Internet browser, Internet Explorer, in April 2014 (Musil).

In addition to the DDoS attack and zero-day exploits, cyber attacks can also take the form of trojan horses, worms, logic bombs, and malicious viruses. The diverse programs and codes that are written for these aggressive purposes are numerous and continuously evolving.
Stuxnet is often considered the most significant example of this method of attack due to its high level of detail in comparison to more typical codes (Dark Code 178). It was the most sophisticated cyber weapon at the time, based on the damage it inflicted on the Iranian nuclear program (Farwell and Rohozinski 27).

North Korea Cyber Threat Perception

Damage Capability

Having provided a brief survey of the range of cyber attacks, it is appropriate to turn to an examination of the particularities of North Korean cyber capabilities. While this section presents substantive evidence from government and technical sources, it must be admitted that assessing the North Korean regime’s true cyber warfare capabilities is a difficult task. The lack of outside information due to the closed-in nature of North Korea means much of the information must be derived from defectors and leaked insider sources. A survey of the evidence from these sources indicates that North Korean cyber capabilities are composed of hundreds of trained hackers.

North Korea’s adoption of modern telecommunications has increased in recent years but still remains extremely low compared to other states. The most recent developments in this area have been the “introduction of cell phones into the country and the opening of a new computer lab at the Pyongyang Institute of Science and Technology” (Park and Snyder 101). The government has also maintained a number of official state websites since 2003 and social media pages on Facebook and Twitter since 2010 (Cha and Anderson 108; Ko, Lee, and Jang 294-295). However, the above services are the full extent of North Korea’s public presence on the global Internet.

A number of outside sources looking into North Korea depict the cyber threat from North Korea in an alarming light.
The South Korean National Assembly’s national defense committee reports that “North Korea’s intelligence warfare capability is estimated to have reached the level of advanced countries,” with 500 to 600 hacking staff (Fitfield). The report confirms the suspicion that their “main task is to gather intelligence from or launch a cyber attack on the US, Japan and South Korea” (Fitfield).

More recently in 2014, an Australian Strategic Policy Institute (ASPI) report on the state of cyber security in the Asia Pacific evaluates North Korea’s cyber offensive capabilities as a “concern” (7). Although the country’s rank in the Asia Pacific region based on a cumulative index of “cyber maturity” places it second from the bottom, the North Korean cyber military capabilities are rated as seven on a ten point scale. This places the country behind China, the U.K., and the United States, but ahead of much better connected states such as India and Thailand. According to the metric used in the report, a seven indicates that there are “very well-developed cyber capabilities; well-defined civilian and military cyber roles; [and] some international engagement” (64). Other countries rated seven are Australia, Singapore, and South Korea.

The report goes on to state that North Korea is “believed to have highly developed cyber capabilities and a well-organized and extensive education and research program” (40). Citing historical cyber attacks, it is noted that North Korea has already “successfully infiltrated South Korean government and private sector systems” (40). North Korea only measures significantly on the military aspect of cyberspace, lacking any social or economic uses of the Internet.

It will be important to examine the internal structure of North Korea’s cyber units to grasp its full capabilities. Sources make frequent reference to the specialized government agencies that deal with cyber matters, such as Unit 121 and Mirim College.
These government units and educational institutions make up the cyber military infrastructure of the country. Mirim College, also called the University of Automation, is one of many institutions that act as preliminary training grounds for budding hackers (Park “Hacking”; Yoon). Under the General Bureau of Reconnaissance, Unit 121 is a dedicated hacking unit that employs them (Boo and Lee 96). This dedicated focus on nurturing cyber capabilities is well in line with the military doctrine of Kim Jong-il, who stated that “the wars of the 20th century were those of oil and bullets, but the wars of the 21st century are information wars” (Boo and Lee 96).

Kim, a North Korean defector and former computer science professor, shares that the North Korean educational system is specially geared towards nurturing technical prodigies (Yoon). The former professor states that training is started in the early years and furthered when “students are sent to China or Russia for about one year to solidify their knowledge of hacking and other technical skills” (Yoon). As of 2011, Kim estimates that North Korea’s cyber capacity has grown to a corps of 3,000 hackers (Yoon).

Anecdotal evidence from a former North Korean soldier and former hacker, Jang, corroborates Kim’s statements as he also “estimates the North has some 3,000 troops, including 600 professional hackers, in its cyber-unit” (Park “Hacking”). As a former student of a Pyongyang military college for hackers, Jang reveals that the coordination of such a large number of hackers in a country that is isolated from the Internet at large is possible because they are dispatched undercover to “China, Russia and even Europe, posing as "programmers" keen to learn about developing new commercial programmes” (Yoon). This is an effort to avoid detection, as cyber attacks would never be conducted from within the homeland where attacks would be traced back easily (Yoon).
As of the interview conducted in 2011, Jang states that “there are 600 hackers, two teams of 300, operating overseas… [and that] rotate once every year or two” (Yoon).

Based on these anecdotes from a few years ago, it is likely that North Korea has advanced further rather than stagnated in its development of cyber capabilities. Following an examination of the anecdotal information from defectors who have settled into new lives in South Korea, it is possible to grasp how government reports could rate the North Korean cyber capabilities so strongly. From this initial examination, it seems that the asymmetric cyber warfare capabilities of North Korea are formidable and capable of high damage. The next section will detail the prominent damage that has been dealt by attacks attributed to North Korea.

Probability

An overview of past North Korean cyber attacks will play an important role in determining how perceptions of the likelihood of cyber attacks are formed. A high-end estimate of 70,000 historic North Korean cyber attacks on South Korea to date is given by a South Korean intelligence official (Park “Hacking”). Most of these instances do not reach a sufficient degree of notability to be documented by the media. However, the high-end estimate can be kept in mind as an indication of the high frequency of cyber aggression in the digital age, even if most attacks go unreported or underreported.

The North Korean cyber attacks are primarily aimed at South Korea and the United States. The attacks on South Korea should not be dismissed as an insignificant concern for the United States. Because South Korea is one of the closest American economic, political, and military allies in the Asia Pacific, attacks on it are a significant issue for American interests (Manyin et al. 2). Furthermore, North Korean cyber attacks also target digital entities that are jointly operated by the South Korean and American militaries.
In these ways, North Korea’s cyber attacks have a direct impact on American assets and interests in the Asia Pacific.

South Korean cyberspace has been the target of attacks originating geographically from China since 2004 (Sin 10). From 2004 onwards, attacks have occurred at the frequency of “1,632 in 2006, 870 in 2007, and 1,277 in 2008” according to Korean media sources (Sin 11). These yearly figures do not follow a regular pattern. Instead, it is enough to recognize that North Korea’s cyber attack capabilities date back to the early 2000s. Cyber attacks became more damaging and frequent in the years after 2009.

A symbolic set of cyber attacks against South Korean and American targets occurred on July 4, 2009, as more than 27 government and commercial websites were affected (Choe and Markoff). The South Korean national intelligence agency placed the responsibility for this attack on North Korea. The damage from these attacks was “relatively minor” as the websites were restored to their original state in a few hours (Choe and Markoff). While the attacks did not leave lasting damage, the 2009 attacks were the first to be actively covered by the American mainstream media, raising the alarm on North Korean cyber capabilities.

Shortly after the 2010 shelling of Yeonpyeong Island in South Korea, speculation surrounded the range of actions North Korea would take, as many South Koreans were “convinced that the North will strike again” (Fackler). Among the likely options to be taken by North Korea, a cyber attack was a strong candidate (Fackler). The possibility of a North Korean cyber attack following on the heels of an artillery shelling, which claimed the lives of two South Korean Marines and injured others, indicates how North Korean cyber attacks were now a presence on the radar of security concerns (Hogg).
For ten days beginning on March 4, 2011, a DDoS attack targeted forty South Korean websites as well as South Korean and American military websites (Macdonald). Called the “Ten Days of Rain” by security firm McAfee, this was another landmark cyber attack case (Update 1). Coverage by American media outlets noted that this attack was not “as serious as” the one in 2009 because preparations had been made. On April 12 of the same year, another prominent wave of cyber attacks covered by the American media brought down a widely used banking system in South Korea for days and led to data loss (Harlan and Nakashima). Preparations were ineffective against this round of attacks.

The next wave of serious cyber attacks that made headlines in the American media occurred on March 20, 2013 and June 25, 2013. The former disabled 32,000 computers used by several South Korean television broadcasters and six banks for up to five days (Update 1). While military and government agencies were not damaged in what are now called the “DarkSeoul” attacks, South Korean commerce was frozen for that time period (Kim; Choe “Computer”). The June 25 attacks, on the anniversary of the start of the Korean War, again brought down the “main websites of South Korea's presidential office and some local newspapers” as well as over 60 websites (Finkle; Choe “June”).

A general trend in this brief overview of North Korean cyber attacks is their increasingly frequent and destructive nature. Despite the staggering technological gap in Internet adoption between North Korea on one hand and South Korea and the USA on the other, on a number of occasions the South Korean digital infrastructure has suffered for days at a time. American targets have also suffered from the attacks. Based on these historical events, it is likely that the probability of North Korean cyber attacks would be seen as high or concerning rather than low or negligible.
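The damage and probability assessments developed so far are the two inputs of an expected-cost estimate. A minimal sketch of that relationship follows, under two assumptions that are not stated in the text: that the two inputs combine multiplicatively, and that the symbols D, P, and C (introduced here purely for illustration) stand for damage capability, probability of attack, and expected cost.

```latex
% Assumed symbols (not in the original): D = damage capability,
% P = probability of attack, C = expected cost to the target state.
C = D \times P
% Qualitative reading with both inputs assessed as High:
D = \text{High},\quad P = \text{High} \;\Rightarrow\; C = \text{High}
```

Because the paper works with qualitative levels rather than numbers, the product should be read as an ordering (higher damage or higher probability implies higher expected cost), not as a calculation.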
Resulting Perceptions

Thus far, the concerning nature of the North Korean cyber attack capabilities, as a function of their damage capabilities and likelihood of usage, has been shown. Based on the evidence in the preceding sections, the American perception of the cyber threat from North Korea is most likely to be High, as shown in the following equation. The high damage capabilities and likelihood of North Korean cyber attacks result in a high expected cost to the United States. The high objectively measured expected cost should lead to a high threat perception. In turn, the high threat perception of North Korean cyber security issues should be associated with greater measurable levels of interest in the issue.

For the purposes of this paper, a qualitative estimation of values in the equation is used. The High designation for damage and probability could be replaced by a lower value, Medium, as long as the concerning nature of cyberspace is still recognized. The above sections of this paper have attempted to show that a Low threat perception of North Korean cyber capabilities is not warranted in objective and analytical terms.

Fig. 4. North Korean Cyber Threat Perception Possibility 1

With the theoretical threat perception that has been posed above, the next sections will explore two prominent sources of data from the American government and media to determine the nature and extent of apparent public perceptions. The application of various computer scripts, programs, and web APIs plays an important role in uncovering a number of interesting patterns that are particularly relevant for the North Korean case.

Data Collection and Methodology

American Media Coverage: The New York Times

The New York Times was chosen as a representative American media organization for the following reasons.
The New York Times (hereafter NYT) is widely regarded as a “newspaper of record” in both reputation and quality of journalism (Martin and Hansen 1). In the digital age, it is the most visited newspaper company by the American audience with nearly 30 million unique visitors per month (comScore). With such a wide audience, the publication record of the NYT will be an appropriate proxy to track the public interest in the issues that are covered in this paper.

The data collection process was expedited by scripts coded in the Python programming language by this paper’s author.4 A time constraint filter selected publications with a publication date (pub_date) property later than 1980.5 Each media publication included certain metadata properties: the publication date and time (UTC), the permanent URL, the lead paragraph, and the type of material. The type of material was a useful disambiguator and took on one of twenty values, with the most important ones being “News”, “Blog”, and “Op-Ed”. A total of 481 unique publications were extracted from the 1,676 results collected by the multiple overlapping queries.6 Of these 481 items, a further filter that isolated only the items with “News” as the type of material refined the data to 345 unique publications. This number was reduced further to 170 items for the substantive analysis. Each remaining publication was then coded based on its content for the degree of coverage on (1) cyber security and (2) North Korea. Labels with greater granularity were assigned where applicable.7

An initial glance at the metadata would suggest that there is increased American coverage and interest in the North Korean cyber issues as suggested by the threat perception equation. Based on the aforementioned coding rules, there were only 1 to 2 publications covering cyberspace and North Korea (NK_CYBER and NK_CYBER_RPT) in the years from 2010 to 2012. In 2013, the number of publications jumped to 29 as cyber attacks became more prominent and garnered greater interest. If taken at face value, the numbers alone support the expectation that the American media is giving greater attention to the North Korean cyber threat over time.

4 The web interface is tedious for large-N use because results are capped at ten items per page and one thousand items per query. In order to automate the frequent usage of the API service, a program was written in the Python programming language to send requests, collect the results of the various queries, and finally to store the data into separate .txt files. The data from the New York Times API was in JSON or JavaScript Object Notation format. While JSON data is constructed in a human readable format, it was necessary to organize the large lists of data into a more manageable structure. As multiple files of data were collected, the organization structure was particularly important. A cloud-integrated spreadsheet application known as Google Sheets was utilized. A series of formulas and functions were written into the spreadsheet to parse the raw JSON data into rows of values. Functions also filtered duplicate entries. Each newspaper publication and its metadata would occupy one row in the spreadsheet, allowing for easy comparison over different time ranges.

5 An early start date was intentionally chosen to retrieve as much data as possible. When this time constraint was not applied, meaning data from 1851 to the present was queried, there was not a noticeable difference in the results that were returned.

6 The high number of duplicate items that were detected is to be expected because the multiple queries were meant to ensure no materials on the cyber domain were missed in the data collection process. A variety of different search terms were required because of the relatively unstandardized nomenclature for cyber security issues. Each unique query was constructed from a formula: {korea, north+korea} + {{cyber} OR {cybersecurity, cyber+security} OR {cyberattack, cyber+attack} OR {cybercrime, cyber+crime} OR {cyber+threat} OR {internet} OR {internet+attack}}. The first component is a locational term for the Korean peninsula. The second component queries the data for content on cyber issues. The inclusion of these two components meant that as much data as possible on North Korean cyberspace was retrieved.

7 Ten descriptive tags were used to categorize all 345 articles. An article was assigned a NONE tag when it was not about a cyber security or North Korean matter. The CYBER and NK tags indicated substantive content on cyber security and North Korea respectively. For the purposes of this paper, the NK_CYBER (to describe North Korean and cyber issues in general) and NK_CYBER_RPT (to describe journalistic reporting of cyber attacks) tags were relevant. Interestingly, the analysis of the publications resulted in the need for further descriptors: NK_NUCLR_MSSLE and NK_NUCLR_OTHER_CYBER. The former describes articles that only associate North Korea with a nuclear missile threat and the latter describes the same while cyber security is associated with another non-NK source. The remaining labels (CYBER_OTHER, NK_OTHER, and KOREA_OTHER) covered miscellaneous content not relevant for the paper. Eliminating the non-relevant items (the NONE and OTHER codes) left 170 unique articles that cover cyber security and/or North Korea in substantive detail.

However, examining the publication data in greater detail reveals a number of “false positives” in the data. The false positives are characterized by the presence of content on cyber security and North Korea as detected digitally by the NYT API. Crucially, these two topics are not semantically linked. In other words, the cyber content is distinctly separate in sentence structure and meaning from the content on North Korea. This disconnection comes about because North Korean threat concerns are often associated with the regime’s nuclear and ballistic missile capabilities rather than its cyber capabilities. American cyber concerns are also regularly associated not with North Korea but with China. An additional label, NK_NUCLR_OTHER_CYBER, was created to account for these cases. When cyber security concerns were not mentioned at all and the North Korean threat was explicitly associated with only nuclear capabilities, the NK_NUCLR_MSSLE code was used. Combined, these types of articles made up 33% of the 170 unique relevant articles.

The aforementioned patterns of association are consistently observed. For example, in a 2013 article on the American willingness to retaliate against cyber attacks, significant emphasis is put on the new US Cyber Command and possible responses to attacks from foreign nations. Then, North Korea is mentioned to point out that “American intelligence officials are giving new emphasis to the danger posed by North Korea’s nuclear weapons and missile programs” (Mazzetti and Sanger). The association of North Korea and cyber threat is undone as the nuclear threat is emphasized. Most importantly, such textual evidence shows that the cyber threat from North Korea is not recognized by the American media as a significant concern; the threat is from nuclear weapons and missile programs.

In the few cases where North Korea is mentioned specifically as a cyber security concern, it is with a degree of uncertainty. In particular, the difficulty of tracing cyber attacks and the plausible deniability of cyber attacks are often cited concerns in the publications. As mentioned earlier in this paper, the act of associating cyber attacks with a specific source even in the face of uncertainty of attribution is possible in the cyber context.
The publications that have been examined so far do not forcefully make the connection between cyber attacks and North Korea. This would further suggest that American media sources are not highly aware of or interested in the NK cyber threat.

The final adjusted numbers provide a more nuanced and informative picture. Slicing the data into portions, the NK_NUCLR_MSSLE and NK_NUCLR_OTHER_CYBER articles make up 33% or one third of the 170 unique and relevant news items. Next, at 29%, almost another third of the articles cover issues where cyber security and North Korea intersect (NK_CYBER and NK_CYBER_RPT). These articles cover the North Korean cyber attacks in the form of journalistic reporting without making overt threat perception associations. Within the 29% of articles, a small subset of 11 articles explicitly characterizes North Korean cyber attacks as a concerning threat to the United States. The remaining 38% of articles cover cyber security or North Korea separately.

The small subset of the articles that specifically depict North Korea as the source of a significant cyber threat makes up only 6% or 11 of the total 170 unique items. All 11 articles of this nature were found for the year 2013 (under the subset NK_CYBER). In one article, it is specifically noted that there is a “high level of concern in Washington about the growing danger of computer-network attacks from Iran or North Korea” (Shanker and Sanger). The others echo this concern. However, this kind of publication makes up the smallest portion of the collection of total publications because the North Korean threat is often a nuclear or ballistic missile threat even when cyber security concerns are also concurrently mentioned. Based on the analysis, it can be seen that the degree of American media coverage of the NK cyber threat is greater than zero but secondary to the association of threat with nuclear and missile capabilities.
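The collection steps described for the NYT data (overlapping queries built from two term groups, paging under the ten-per-page and one-thousand-per-query caps, removal of duplicates, and the “News” filter) can be sketched in Python as follows. This is an illustrative reconstruction, not the author's script: the endpoint URL, the parameter names, the `web_url` and `type_of_material` field names, the `19800101` start date, and every function name are assumptions, and the duplicate filtering is done here in Python rather than in the spreadsheet the paper describes. Expanding the query formula as a full cross product yields twenty strings rather than the seventeen queries the paper reports, so the exact combination rule is not recoverable from the text. The sketch builds request URLs but sends nothing.

```python
from itertools import product
from urllib.parse import urlencode

# Term groups from the paper's query formula ("+" written here as a space).
LOCATION_TERMS = ["korea", "north korea"]
CYBER_TERMS = [
    "cyber", "cybersecurity", "cyber security", "cyberattack", "cyber attack",
    "cybercrime", "cyber crime", "cyber threat", "internet", "internet attack",
]

# Assumed endpoint and parameter names; verify against the current API docs.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
PAGE_SIZE = 10        # results per page, as stated in the paper
MAX_RESULTS = 1000    # results per query, as stated in the paper


def build_queries():
    """Pair every locational term with every cyber term (assumed cross product)."""
    return [f"{loc} {cyb}" for loc, cyb in product(LOCATION_TERMS, CYBER_TERMS)]


def page_urls(query, api_key="YOUR_API_KEY"):
    """Build, without sending, one request URL per result page of a query."""
    urls = []
    for page in range(MAX_RESULTS // PAGE_SIZE):
        params = {"q": query, "begin_date": "19800101",
                  "page": page, "api-key": api_key}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


def deduplicate(docs):
    """Keep the first copy of each article, keyed on its permanent URL."""
    seen, unique = set(), []
    for doc in docs:
        if doc["web_url"] not in seen:
            seen.add(doc["web_url"])
            unique.append(doc)
    return unique


def news_only(docs):
    """Keep items whose type of material is "News"."""
    return [d for d in docs if d.get("type_of_material") == "News"]


# Made-up records standing in for results returned by overlapping queries.
sample = [
    {"web_url": "https://example.com/a", "type_of_material": "News"},
    {"web_url": "https://example.com/b", "type_of_material": "Blog"},
    {"web_url": "https://example.com/a", "type_of_material": "News"},
]
unique_docs = deduplicate(sample)
news_docs = news_only(unique_docs)
```

Keying duplicates on the permanent URL is one reasonable choice because the paper lists it among the metadata properties of every item. The manual content coding that reduced 345 news items to 170 is a human step and is not represented here.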
The analysis of the programmatically retrieved NYT articles showed that the American perception of the North Korean cyber threat is more complex than initially thought. While North Korean and cyber security issues are frequently covered by the NYT in journalistic coverage, many of the articles do not explicitly link the topics of North Korea and cyber security into a coherent whole. As a result, it can be seen how the American perception of threat is formed and affected by other sources of danger. It will be worth keeping these findings in mind as the government publication data are examined.

American Government Publications

Publicly available reports from the Department of Defense and the Congressional Research Service were analyzed as representative American government publications.8 The Quadrennial Defense Review (QDR) reports by the Department of Defense and over thirty Congressional Research Service (CRS) reports were included in this stage of the analysis. Python scripts were used again to programmatically extract portions of all government publications where the keywords “North Korea” and “cyber” appeared.9 These digitally retrieved results were then subjected to further human analysis.

8 Publicly available government publications are most relevant for this paper. The documents in the public domain are intentionally used for this paper because such documents act as a reflection of the American government’s public portrayal of security concerns.

9 A Python script was written using the PyPDF2 library to first convert the QDR and CRS reports into text (.txt) files (PyPDF2). Then, the text files were parsed line by line to search for instances of relevant keywords. When a keyword was found, the sentence containing that keyword was written to a separate text file. This process was iterated over the entire collection of American government publications.

Both QDR and CRS documents are suitable for gauging the American government’s perception of security issues. The QDR report is a “legislatively-mandated review of Department of Defense strategy and priorities” that sets out American security concerns for the immediate future (Quadrennial). The CRS documents are another notable source of government information as they are the product of “research and analysis for Congress on a broad range of issues of national policy” (Congressional).

The oldest QDR report from 1997 makes no mention of cyber issues. The inclusion of cyberspace as a security consideration begins with the 2001 QDR. The report warns that “states will likely develop offensive information operations...through cyber space” (7). Although the future of cyberspace is still an uncertainty at this time, the “cyber” keyword appears in five instances of the 2001 report as a potential area of concern.

The possible insecurity from cyberspace is given further consideration in the 2006 QDR. Threats from cyberspace are particularly noted for their usage by terrorists as an “attack on U.S. territory, people, critical infrastructure, or forces” (25). As observed in the media publications, the report flags China’s emphasis on “electronic and cyber-warfare” as a potential cyber threat to American interests. Overall, the 2006 QDR mentions cyber issues in ten instances throughout the report. This is an increase from the previous QDR, but not by a significant margin.

In the pre-2010 QDR reports from 1997 to 2006, a general trend towards increased attention to cyber issues is noticeable. In these reports, none of the cyber concerns were linked to North Korea (which itself occurs as a keyword 13 times in 1997 and 4 times in 2006). This is to be expected because North Korean cyber attacks were not as sophisticated or notable in this early time period.
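The keyword search applied to the government reports (find each sentence containing “North Korea” or “cyber”, and count how often a keyword occurs) can be sketched as below. This is a minimal illustration under stated assumptions rather than the author's script: it starts from text that has already been extracted from the PDF files, so the PyPDF2 conversion step is not shown; the sentence splitter is deliberately naive; all function names are hypothetical; and the sample text is invented, not quoted from any report.

```python
import re

KEYWORDS = ("North Korea", "cyber")  # the two keywords named in the paper


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())  # collapse line breaks left by PDF extraction
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def keyword_sentences(text, keywords=KEYWORDS):
    """Return every sentence containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [s for s in split_sentences(text)
            if any(k in s.lower() for k in lowered)]


def keyword_count(text, keyword):
    """Count case-insensitive occurrences, including inside longer words."""
    return len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))


# Invented sample text for illustration only.
sample = (
    "Adversaries may pursue operations through cyber space. "
    "Regional powers remain a concern.\n"
    "North Korea continues to field ballistic missiles. "
    "Cyber-warfare is an emerging area."
)
hits = keyword_sentences(sample)
cyber_mentions = keyword_count(sample, "cyber")
nk_mentions = keyword_count(sample, "North Korea")
```

Matching on substrings means that “cyber”, “cyberspace”, and “cyber-warfare” are all counted, which may or may not correspond to how the counts reported in this section were produced. As the paper stresses, output of this kind only locates candidate passages; whether a passage actually links North Korea to a cyber threat still requires human reading.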
It will be necessary to examine how cyber and North Korean issues are perceived in the years following the more prominent North Korean cyber attacks that began in 2009.

The 2010 American Quadrennial Defense Review gave a significant amount of attention to cyberspace with 73 mentions throughout the report and its own sub-section (37-39). The main points emphasize the coordinated response under the US Cyber Command as well as increased efforts at training and raising awareness of cyber security issues. As of 2010, the documents indicated that cyberspace was a significant concern for the American government.

In the 2010 QDR, North Korea is mentioned only once. An analysis of the context reveals that while the country is mentioned in the same sentence as Iran, the focus is on the two countries’ ballistic missile capabilities (31). While the cyber domain as of 2010 showed signs of high threat perception by the United States, the North Korean threat was perceived as a non-cyber issue. This lack of attention may be due to the relatively low prominence of the NK cyber issue at the time. The regime launched prominent cyber attacks in mid-2009 and mid-2011, at times when the 2010 QDR would have been unable to fully document the events (Choe and Markoff; BBC). It will be worth comparing these findings to the 2014 QDR.

The most recent Defense Review reverses some of the trends of its predecessor. The 2014 American Quadrennial Defense Review mentions cyber security concerns a total of 45 times, spread throughout the report. The report’s ongoing inclusion of this non-traditional security concern as a “key priority” suggests that the United States is steady in placing priority on cyber issues (32). This is in line with the patterns found in the Google Ngram and Trends data. However, the increased attention to cyber concerns does not spill over into the North Korean domain. The 2014 document has only 6 mentions of the North Korean regime.
Part of this is to be expected for a report that covers a scope as large as the Quadrennial Defense Review. Yet, of the few times that North Korea is mentioned, the issue is specifically framed as one of a nuclear and missile threat rather than a cyber threat. In other words, while the review covers both cyber security and North Korea explicitly, the two concepts never intersect in the discussion.

The thirty CRS documents that were examined are similar to the QDR publications in content. Nearly all of the publicly available CRS reports on North Korea focus on the threat from nuclear weapons and ballistic missiles. A recent exception is a 2014 CRS report that specifically references the March 2013 cyber attacks from North Korea. In this report’s short section on cyber issues, it is noted that North Korea is “investing heavily in improving its military capabilities in the cyber domain” (Chanlett-Avery and Rinehart 20). The same brief attention to the NK cyber issue is found in a 2013 report on North Korea (Military and Security). However, as was observed with the few NYT publications that drew explicit connections between North Korea and cyber security, these documents are the exception rather than the norm.

The findings in the government data are similar to those found for media publications. As in the prior section, the source of threat perception of North Korea is most strongly tied to the nuclear and ballistic missile capabilities at the expense of cyber concerns. While a small fraction of the NYT publications presented the North Korean cyber threat to the United States, even less was detected in the government data.

Discussion: Metadata Analysis

While technological advances have created new vulnerabilities in cyberspace, they have also opened up new methods of inquiry into pressing issues.
Through the use of databases, APIs, and computer programs, there is great added value in analyzing large-N datasets that have hitherto been unavailable to researchers. At the same time, the big data approach must be coupled with a theoretically precise understanding of the information and a closer examination of each datum itself.

The first interesting finding on cyber issues was made even before the first script was run. A total of seventeen unique search queries were necessary to retrieve metadata from the NYT database.10 In order to ensure that no part of the literature was missed, a series of overlapping search terms was intentionally constructed. The highly variable nature of the terms that describe the activities in cyberspace (or cyber space) is a sign of the relatively early development of cyber issues. Cyber security is a field that has grown significantly in a matter of decades. Such rapid development has left open many loose ends in conceptual clarity as new attacks, new defenses, and new terms are invented regularly. In such a fast-changing environment, big data technologies were a welcome and necessary tool.

10 A variety of different search terms were required because of the relatively unstandardized nomenclature for cyber security concerns. Unlike other disciplines where the common vocabulary is set out by standardization or path dependency, there is a lack of unified terminology to describe cyber attacks and new security developments on the Internet. For example, an aggressive act over the Internet may be referred to as a cyber attack, cyberattack, or internet attack with the same intended meaning. The search terms were further improved by the inclusion of search operators (indicating relationships such as “and”, “or”, and “not”) through quotation marks and addition symbols. For example, the search terms [north korea] and [“north korea”] (with quotation marks) would return different results. However, even with the inclusion of search operators, a single search query is not sufficient to capture the full range of written materials on the topic of cyber aggression.

The raw data collected by big data technologies can be misleading when presented without careful examination. If only the numerical patterns in the metadata of media and government publications are reported, such as the frequency of words over time, then there seems to be a wide level of support for the expectation that the American government and media devote greater attention to cyber and North Korean issues over time. While it is indeed the case that cyber and North Korean issues are being increasingly discussed in government and media publications, the two concepts often remain distinct from each other in actual semantic and conceptual usage. Such findings were possible by following the big data trends with a human examination of the data, just as the first anomaly in the Google Ngram data was examined further manually earlier in this paper.

Interesting patterns were also uncovered in the NYT publication record.11 In the NYT publications, it was found that a large portion of the increasing attention to cyber issues was made up of “false positives” which presented the North Korean threat as primarily associated with the country’s nuclear and missile programs. This may change in the future as North Korean cyber attack capacities increase even further. Only a short period of time (less than a decade) has passed since the very first cyber attack was covered by the American media. The beginnings of this shift in perception were even seen in the few NYT articles that explicitly acknowledged the threat of NK cyber attacks.

Likewise, in the government publications, it was also found that North Korea is commonly perceived as a threat due to its nuclear and missile capabilities rather than its cyber warfare capabilities.
Just as the beginning of a change in American perception was seen with the small portion of NYT publications that covered the dangers of North Korean cyber attacks, the official government coverage of North Korean cyber attacks was noted in a single CRS report from early 2014 and a Department of Defense report from 2013. It remains to be seen whether this new acknowledgement of the North Korean cyber threat will continue in future American government reports.

11 The comprehensiveness of the API service should be emphasized. A Web API, otherwise known as an application programming interface, is a web service that provides nearly unlimited access to an organization’s data. The New York Times Article Search API v2 is one of many services available for web developers and programmers (“Article Search”). It is specifically designed to provide open access to all NYT publications from the year 1851 to the present. Although time periods that stretch into the nineteenth century are not as relevant for the analysis of this paper, the wide range of available data is an encouraging feature for a large-N data analysis.

The findings in the government and media publications are further corroborated by a big data source. The Google Trends data on the search term frequencies of various North Korean sources of threat depict the American public’s responsiveness to the various sources of North Korean threat. Figure 3, which draws on the Google Trends data, presents an interesting exception to the general trends that have been observed in the level of interest in cyber security issues. From 2008 to 2014, the graph shows that American attention tracks the “nuclear” and ballistic “missile” activities of North Korea quite closely while responding very little to its “cyber” activities.
This pattern is observable despite the large number of cyber attacks on American and allied targets that have been attributed to North Korea in this time period.

Fig. 5. Frequency of Search Terms on North Korean Security Concerns from "Google Trends." Google Trends. Google, n.d. Web. 28 July 2014.

The above findings have made it possible to gain a deeper understanding of the American perception of North Korean threats. The initial threat perception equation from an earlier section estimated that the high cyber threat would be associated with high levels of observable interest in the topic in government and media sources. While the North Korean cyber threat perception equation is not entirely undermined, the next section presents an improved understanding of cyber threat perception as it is informed by the metadata analysis.

Reconsidering Threat Perception

Based on the evidence from the above sections, it is now appropriate to further qualify the initial threat perception equation with the consideration of threat from nuclear and missile capabilities. Although the objective threat from North Korean cyber attacks, as a function of their expected cost to the United States, is still High, the human perception of the NK cyber threat by the American government and media does not reflect this. The degree of media and government attention to the issue indicates that there is a low human perception of the cyber threat from North Korea. It is important to emphasize the distinction between the human perception of threat and the objective level of threat. The two concepts should not be conflated despite their seemingly similar nature. For example, although the proxies used in this paper show that there is a greater human perception of threat from North Korean nuclear weapons and ballistic missiles, this does not mean that the cyber security threat is no longer an objective concern.
Instead, it is more accurate to say that the American government and media perceive the cyber threat as low, regardless of the actual objective danger from the source. In the securitization literature, on which this paper is based, the human perception of threat is of great importance. According to the theoretical framework, threats are “socially constructed” such that “security threats are not objectively given but instead reflect the development...in which some thing is discursively framed as posing an existential threat to some valued referent object” (Hameiri and Jones 463). Once an issue is first perceived as a threat, it can become securitized through the securitization process. Without the initial threat perception, a crucial step is missing. In the case of the North Korean cyber threat, the larger implication is that the issue is not likely to be securitized by the United States.

Based on the findings in the data, the observable American government and media perception of the North Korean cyber threat is shown in the estimation in Figure 6. This estimation is revised to reflect the observable trends in the data: the North Korean cyber threat is a much lower concern than initially expected from a general overview of the levels of interest in cyber and non-traditional security issues. The main difference between the current estimation (Low threat perception) and the initial estimation (High threat perception) stems from the differences in human perceptions.

Fig. 6. North Korean Cyber Threat Perception Possibility 2

Conclusion: North Korean Cyber Threat

This paper has shown that American media and government attention to the North Korean threat from cyberspace does not display the high level of interest that was found for cyber issues as a whole.
An initial estimation of the American threat perception anticipated a high threat perception based on the high damage capabilities and likelihood of North Korean cyber attacks. Accordingly, it was expected that a high threat perception should be matched by proxies that measure a high level of interest and discussion on the topic. Interestingly, an analysis of the metadata and content of The New York Times and notable government documents revealed that other considerations affect the American threat perception of North Korea.

It was found that the American perception of threat from North Korea is often posed as an issue of nuclear and ballistic missile danger rather than a cyber attack danger. This association of threat was consistently observed in the media and government publications despite the many instances of North Korean attacks on American and allied targets on the Internet. In the NYT publication record, roughly one third of publications followed this threat association while almost another third only went so far as to link North Korea and cyberspace together conceptually through journalistic reporting of the cyber attacks. Only a small subset of those articles explicitly framed North Korean rather than Chinese cyber capabilities as a threat. In American government publications, the threat association with North Korean nuclear and missile capabilities was even more pronounced, with no detectable mention of concern for the NK cyber capabilities beyond those found in two government documents.

To account for this added complexity, the original threat perception equation was revised to present a more nuanced depiction of the human threat perception of North Korean capabilities. North Korean cyber, nuclear, and missile capabilities are seen as threatening to varying degrees. Cyber capabilities are not objectively a low threat, but based on the current data analysis, nuclear and missile capabilities have higher priority in perceptions.
Although cyber capabilities are starting to be seen as a threat in a minority of sources, it remains to be seen whether this is a pattern that will continue into the future.

The different threat perceptions of the various North Korean capabilities suggest that threat perception is not fixed; it can change over time as new threats arise and old threats decline. The first signs of this change in threat perceptions were observed in the few NYT publications and government documents that explicitly described the North Korean cyber attack capabilities as a threat to the United States. In the future, later iterations of this research may find it worthwhile to note whether the American perception of the North Korean cyber threat has changed over time.

Conclusion: Metadata, Methodologies, and Potential

As non-traditional security concerns become increasingly relevant, the need for new methods of data collection also increases. Metadata and big data technologies are particularly applicable for non-traditional security studies because such unconventional and interdisciplinary tools are able to provide new insights into the issue area. Cyber security served as an apt example of the utility of these new technologies as both the security issue and the technology are closely tied to the Internet and computers.

As valuable as the new technologies are, it is equally important to recognize their limitations. The term big data is often misused, as the large size of a dataset is thought to render it immune to statistical errors. In the strongest sense of the term, a big data dataset is one whose size is such that “N = All” (Harford). Its coverage of all data points in the population means that traditional statistical methods can be avoided. Despite the great technological advances in recent years, this is not feasible for even the largest big data technologies.
Whether the data is from Google, Facebook, or Twitter, their users are not so numerous as to be “representative of the population as a whole” in statistical terms (Harford). So long as “N = All” is unattainable, big data and metadata must be wielded with the same considerations that guide the use of all other large-N datasets.

It is helpful to consider one of the first notable applications of big data technologies: Google’s tracking of flu trends. First published in Nature, the big data technology was used to approximate the “current level of weekly influenza activity in each region of the United States” based on the relative frequency of relevant search terms (Ginsberg et al. 1012). The ongoing Google Flu Trends service continues to track the real-time frequency of search terms on influenza and related terms to act as a predictive index for the current spread of influenza around the world (Flu Trends).

A recent publication in Science responds to the Google Flu Trends data by noting the “big data hubris” that is characteristic of these new technological tools when not used cautiously. As mentioned earlier, a large quantity of data does not compensate for the inherent “foundational issues of measurement and construct validity and reliability”. The hubris was exposed in the real world when the Google data consistently predicted a higher “proportion of doctor visits for influenza-like illness (ILI) than the Centers for Disease Control and Prevention” (Lazer et al. 1203). The researchers estimate that the errors came about at the “data-generating” stage because the company “had modified its search results to provide suggested additional search terms” which created biases in the data from the start (1204). In effect, the big data results were based on flawed preliminary data. With such vulnerabilities in mind, precautions were taken to minimize the misuse of data in this paper.
Google Ngrams and Trends data were used in a limited context to visualize general trends in the larger literature. The full record of New York Times publications was thoroughly examined, and various filters and coding rules eventually pared the original dataset down from 1,676 to 170 unique and relevant publications. Likewise, this method of close analysis led to the discovery of false positives in the larger trends of the NYT publications. These patterns in false positives were again observed in the publicly available government data from the most notable American government documents. For the purposes of this paper, the application of technology through these methods yielded new insights.

It is likely that big data and metadata technologies will become increasingly relevant for political scientists and researchers in other disciplines. Computer programs and web APIs are some of the technologies that provide open access to big data resources that were unavailable until recently. This paper demonstrated one possible use of a web API that directly accesses the entire publication record of The New York Times. Processing of the large dataset was also aided by strategic use of computer programs and software. The potential uses of other programs and APIs that are open to researchers are numerous. As the quantity and quality of information that is openly available on the Internet continues to increase over time, the potential application of these and other technologies to future research endeavours in political science and other disciplines will also increase.

Works Cited

"About Trends Graphs." Google Trends Help. Google, n.d. Web. 23 July 2014.
Boo, Hyeong-Wook, and Kang-Kyu Lee. "Cyber War and Policy Suggestions for South Korean Planners." Korean Unification Studies (2012): 85-106.
Branigan, Tania. "South Korea on Alert for Cyber-attacks after Major Network Goes down." The Guardian. Guardian News and Media, 20 Mar. 2013. Web. 26 Apr. 2014.
Cha, Victor, and Nicholas Anderson. "North Korea after Kim Jong Il." North Korea in Transition: Politics, Economy, and Society. By Kyung-Ae Park and Scott Snyder. Lanham, MD: Rowman & Littlefield, 2013. N. pag. Print.
Chanlett-Avery, Emma, and Ian E. Rinehart. North Korea: U.S. Relations, Nuclear Diplomacy, and Internal Situation. Rep. Congressional Research Service, 15 Jan. 2014. Web. 21 July 2014.
Choe, Sang-hun. "Computer Networks in South Korea Are Paralyzed in Cyberattacks." The New York Times. The New York Times, 20 Mar. 2013. Web. 02 July 2014.
---. "South Korea Blames North for June Cyberattacks." The New York Times. The New York Times, 16 July 2013. Web. 03 July 2014.
Choe, Sang-hun, and John Markoff. "Cyberattacks Jam Government and Commercial Web Sites in U.S. and South Korea." The New York Times. The New York Times, 08 July 2009. Web. 01 July 2014.
Choucri, Nazli, and Daniel Goldsmith. "Lost in cyberspace: Harnessing the Internet, international relations, and global security." Bulletin of the Atomic Scientists 68.2 (2012): 70-77. Web. 11 July 2014.
Clarke, Richard. "China's Cyberassault on America." The Wall Street Journal, 15 June 2011. Web. 11 July 2014.
---. "The Coming Cyber Wars." Boston.com. The Boston Globe, 31 July 2011. Web. 11 July 2014.
Coleman, Kevin G. World War III: A Cyber War Has Begun. Rep. Technolytics, Oct. 2007. Web. 11 July 2014.
"ComScore Media Metrix Ranks Top 50 U.S. Desktop Web Properties for April 2014." ComScore, 23 May 2014. Web. 25 June 2014.
"Congressional Research Service Reports." Federation of American Scientists. Federation of American Scientists, n.d. Web. 21 July 2014.
Crandall, Jedidiah R., Masashi Crete-Nishihata, Jeffrey Knockel, Sarah McKune, Adam Senft, Diana Tseng, and Greg Wiseman. "Chat Program Censorship and Surveillance in China: Tracking TOM-Skype and Sina UC." First Monday 18.1 (2013): n. pag. Web.
Cyber Maturity in the Asia-Pacific Region 2014. Rep. Australian Strategic Policy Institute, Apr. 2014. Web. 25 June 2014.
"Cyber Security." The White House. The White House, n.d. Web. 29 July 2014.
Deibert, Ronald. Black Code: Inside the Battle for Cyberspace. Toronto: McClelland & Stewart, 2013. Print.
---. "Canada and the Challenges of Cyberspace Governance and Security." The School of Public Policy at the University of Calgary, SPP Communique 5.3 (March 2013): Web. 11 July 2014.
---. "The Growing Dark Side of Cyberspace (. . . and What To Do About It)." Penn State Journal of Law & International Affairs 1.2 (2012): 260-74. Web.
Deibert, Ronald, and Masashi Crete-Nishihata. "Global Governance and the Spread of Cyberspace Controls." Global Governance 18.3 (2012): 339-61. Web.
Deibert, Ronald, and Rafal Rohozinski. "Contesting cyberspace and the coming crisis of authority." Deibert et al., Access Contested (2011): 21-41.
Erldenson, Jennifer J. "North Korean Strategic Strategy: Combining Conventional Warfare With The Asymmetrical Effects Of Cyber Warfare." Thesis. Utica College, 2013. Ecii.edu, Mar. 2013. Web. 26 Apr. 2014.
Fackler, Martin. "South Koreans Guess at the North's Next Target." The New York Times. The New York Times, 09 Dec. 2010. Web. 01 July 2014.
"Fact Sheet on Probabilistic Risk Assessment." U.S. Nuclear Regulatory Commission. U.S. Nuclear Regulatory Commission, Oct. 2007. Web. 18 July 2014.
Farwell, James P., and Rafal Rohozinski. "Stuxnet and the future of cyber war." Survival 53.1 (2011): 23-40. Web. 11 July 2014.
Finkle, Jim. "Four-year Hacking Spree in South Korea Blamed on 'Dark Seoul Gang'" Reuters. Thomson Reuters, 26 June 2013. Web. 03 July 2014.
Fitfield, Anna. "N Korea’s Computer Hackers Target South and US." Financial Times. Financial Times, 4 Oct. 2004. Web. 25 Apr. 2014.
"Flu Trends." Google. Google, n.d. Web. 31 July 2014.
Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, and Larry Brilliant. "Detecting Influenza Epidemics Using Search Engine Query Data." Nature 457.7232 (2008): 1012-014. Web. 31 July 2014.
"Google Trends." Google Trends. Google, n.d. Web. 28 July 2014.
Hafner, Katie, and John Biggs. "In Net Attacks, Defining the Right to Know." The New York Times. The New York Times, 29 Jan. 2003. Web. 28 June 2014.
Hameiri, Shahar, and Lee Jones. "The Politics and Governance of Non-Traditional Security." International Studies Quarterly (2012): 1-12. Web.
Hansen, Lene, and Helen Nissenbaum. "Digital disaster, cyber security, and the Copenhagen School." International Studies Quarterly 53.4 (2009): 1155-1175.
Harford, Tim. "Big Data: Are We Making a Big Mistake?" Financial Times. Financial Times, 28 Mar. 2014. Web. 31 July 2014.
Harlan, Chico, and Ellen Nakashima. "Suspected North Korean Cyberattack on a Bank Raises Fears for S. Korea, Allies." The Washington Post. The Washington Post, 29 Aug. 2011. Web. 11 July 2014.
Hogg, Chris. "Two South Korean Civilians Died in Attack by North." BBC News. BBC, 24 Nov. 2010. Web. 11 July 2014.
"How Trends Data Is Normalized." Google Trends Help. Google, n.d. Web. 23 July 2014.
Kim, Jean. "Article Search API V2." Times Developer Network. The New York Times, n.d. Web. 18 July 2014.
Kim, Sam. "South Korea: Chinese Address Source of Attack." The Big Story. Associated Press, 20 Mar. 2013. Web. 02 July 2014.
King, Gary, Jennifer Pan, and Molly Roberts. "How Censorship in China Allows Government Criticism But Silences Collective Expression." APSA 2012 Annual Meeting Paper 107.2 (2013): 1-18. Web. <http://ssrn.com/abstract=2104894>.
Klimburg, Alexander. "Mobilising cyber power." Survival 53.1 (2011): 41-60. Web. 11 July 2014.
Ko, Kyungmin, Heejin Lee, and Seungkwon Jang. "The Internet dilemma and control policy: political and economic implications of the Internet in North Korea." Korean Journal of Defense Analysis 21.3 (2009): 279-295. Web. 11 July 2014.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. "The Parable of Google Flu: Traps in Big Data Analysis." Science 343.6176 (2014): 1203-205. Web.
Lee, Sunny. "US Security Strategy toward North Korea’s Cyber Terrorism." Carnegie Institution for Science (2011). Web. 11 July 2014.
Lewis, James A. "The “Korean” Cyber Attacks and Their Implications for Cyber Conflict." Center for Strategic and International Studies (2009). Web. 11 July 2014.
Manyin, Mark E., Emma Chanlett-Avery, Ian E. Rinehart, Mary B. Nikitin, and William H. Cooper. U.S.-South Korea Relations. Rep. Congressional Research Service, 12 Feb. 2014. Web. 11 July 2014.
Markoff, John. "Web's Anonymity Makes Cyberattack Hard to Trace." The New York Times. The New York Times, 16 July 2009. Web. 01 July 2014.
Martin, Shannon E., and Kathleen A. Hansen. Newspapers of Record in a Digital Age: From Hot Type to Hot Link. Westport, CT: Praeger, 1998. Web.
Mazzetti, Mark, and David E. Sanger. "Security Leader Says U.S. Would Retaliate Against Cyberattacks." The New York Times. The New York Times, 12 Mar. 2013. Web. 02 July 2014.
Mcdonald, Mark. "In Cyberattack, Virus Infects 40 Web Sites In South Korea." The New York Times. The New York Times, 04 Mar. 2011. Web. 01 July 2014.
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. "Quantitative Analysis of Culture Using Millions of Digitized Books." Science 331.6014 (2011): 176-82. Web.
Military and Security Developments Involving the Democratic People’s Republic of Korea 2013. Rep. Department of Defense, 2013. Web. 11 July 2014.
Musil, Steven. "New Zero-day Vulnerability Identified in All Versions of IE." CNET. CNET, 27 Apr. 2013. Web. 10 July 2014.
"N Korea 'behind South Bank Hack'" BBC News. BBC News, 3 May 2011. Web. 28 Apr. 2014.
"North Korea Archive." Information Warfare Monitor. Munk School of Global Affairs, n.d. Web. 11 July 2014.
Park, Ju-min. "Hacking Highlights Dangers to Seoul of North's Cyber-warriors." Reuters. Thomson Reuters, 21 Mar. 2013. Web. 02 July 2014.
Park, Kyung-Ae, and Scott Snyder. North Korea in Transition: Politics, Economy, and Society. Lanham, MD: Rowman & Littlefield, 2013. Print.
"PyPDF2 1.22." Python.org. Python Software Foundation, 06 June 2014. Web. 21 July 2014.
"Quadrennial Defense Review." Defense.gov. U.S. Department of Defense, n.d. Web. 21 July 2014.
Quadrennial Defense Review 1997. Rep. Department of Defense, May 1997. Web. 11 July 2014.
Quadrennial Defense Review 2001. Rep. Department of Defense, 30 Sept. 2001. Web. 11 July 2014.
Quadrennial Defense Review 2006. Rep. Department of Defense, 06 Feb. 2006. Web. 11 July 2014.
Quadrennial Defense Review 2010. Rep. Department of Defense, Feb. 2010. Web. 11 July 2014.
Quadrennial Defense Review 2014. Rep. Department of Defense, 04 Mar. 2014. Web. 11 July 2014.
Shanker, Thom, and David E. Sanger. "U.S. Helps Allies Trying to Battle Iranian Hackers." The New York Times. The New York Times, 08 June 2013. Web. 02 July 2014.
Sin, Steve. Cyber Threat Posed by North Korea and China to South Korea and US Forces Korea. Publication. N.p., May 2009. Web. 24 Apr. 2014.
Solce, Natasha. "Battlefield of Cyberspace: The Inevitable New Military Branch - The Cyber Force." Alb. LJ Sci. & Tech. 18 (2008): 293-324. Web. 11 July 2014.
"Special Report: The Cyber Domain." Defense.gov. U.S. Department of Defense, n.d. Web. 07 July 2014.
"Times Topics." The New York Times. The New York Times, n.d. Web. 18 July 2014.
"Update 1 - Cyber Attack on S.Korea Came from Chinese IP-Seoul." Reuters. Thomson Reuters, 20 Mar. 2013. Web. 02 July 2014.
Valeriano, Brandon, and Ryan Maness. "The Fog of Cyberwar: Why the Threat Doesn’t Live Up to the Hype." Foreignaffairs.com. Foreign Affairs, 21 Nov. 2012. Web. 26 Apr. 2014.
"Viruslike Attack Slows Much Internet Traffic." The New York Times. The New York Times, 25 Jan. 2003. Web. 28 June 2014.
Wilson, Clay. Computer Attack and Cyberterrorism: Vulnerabilities and Policy Issues for Congress. Rep. Congressional Research Service, 1 Apr. 2005. Web. 10 July 2014.
"Y2K: Overhyped and Oversold?" BBC News. BBC, 06 Jan. 2000. Web. 27 June 2014.
Yannis, Alex. "U.S. to Face Honduras And Its Fans." The New York Times. The New York Times, 26 Mar. 2001. Web. 28 June 2014.
Yoon, Sangwon. "North Korea Recruits Hackers at School." Al Jazeera. Al Jazeera, 20 June 2011. Web. 10 July 2014.
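
The methodological conclusion describes querying The New York Times through a web API and then paring the results down to unique and relevant publications. The scripts themselves are not reproduced here, so the following Python fragment is only an illustrative sketch of that kind of workflow, not the procedure used in this paper. The endpoint address, the parameter names (q, begin_date, end_date, page, api-key), and the record fields (web_url, headline, snippet) are assumptions based on the public Article Search API v2 and should be checked against the current developer documentation. The keyword filter is a hypothetical stand-in for the coding rules, and the sample records are invented.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Article Search API v2 (verify before use).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(term, begin_date, end_date, page, api_key):
    """Return the request URL for one page of search results."""
    params = {
        "q": term,                 # search term, e.g. "North Korea"
        "begin_date": begin_date,  # assumed format: YYYYMMDD
        "end_date": end_date,      # assumed format: YYYYMMDD
        "page": page,              # zero-based page index
        "api-key": api_key,        # placeholder key supplied by the caller
    }
    return BASE_URL + "?" + urlencode(params)


def unique_relevant(docs, keywords):
    """Drop duplicate records, then keep those mentioning any keyword.

    This is a hypothetical filter: records are treated as duplicates when
    they share a web_url, and as relevant when the headline or snippet
    contains one of the keywords (case-insensitive).
    """
    seen = set()
    kept = []
    for doc in docs:
        url = doc.get("web_url")
        if url in seen:
            continue
        seen.add(url)
        headline = (doc.get("headline") or {}).get("main") or ""
        snippet = doc.get("snippet") or ""
        text = (headline + " " + snippet).lower()
        if any(keyword.lower() in text for keyword in keywords):
            kept.append(doc)
    return kept


# Invented sample records shaped like the assumed API response.
sample_docs = [
    {"web_url": "http://example.com/a",
     "headline": {"main": "North Korea Blamed for Cyberattack"},
     "snippet": "Officials attribute the attack to the North."},
    {"web_url": "http://example.com/a",
     "headline": {"main": "North Korea Blamed for Cyberattack"},
     "snippet": "Officials attribute the attack to the North."},
    {"web_url": "http://example.com/b",
     "headline": {"main": "A Sports Preview"},
     "snippet": "An unrelated article that matched the search term."},
]

print(build_query_url("North Korea", "20090101", "20140731", 0, "DEMO_KEY"))
print(len(unique_relevant(sample_docs, ["cyber"])))
```

A keyword match of this kind can only flag candidate articles. As the discussion of false positives above suggests, the retained records would still need to be read and coded by hand.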

