
UBC Theses and Dissertations


Towards improving the usability and security of Web single sign-on systems
Sun, San-Tsai
2013



Full Text

Towards Improving the Usability and Security of Web Single Sign-On Systems

by

San-Tsai Sun

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

November, 2013

© San-Tsai Sun 2013

Abstract

OpenID and OAuth are open and lightweight web single sign-on (SSO) protocols that have been adopted by high-profile identity providers (IdPs), such as Facebook, Google, Microsoft, and Yahoo, and by millions of relying party (RP) websites. However, average users' perceptions of web SSO and the security guarantees of these systems are still poorly understood. To fill these knowledge gaps, we conducted several studies to deepen the understanding, and improve the usability and security, of these two mainstream web SSO solutions.

First, through several in-lab user studies, we investigated users' perceptions and concerns when using web SSO for authentication. We found that our participants had several misconceptions and concerns that impeded their adoption. These ranged from inadequate mental models of web SSO, to concerns about personal data exposure, to a reduction in the perceived value of web SSO due to their existing password management practices. Informed by our findings, we offered a web SSO technology acceptance model and suggested design improvements.

Second, we performed a systematic analysis of the OpenID 2.0 protocol using both formal model checking and an empirical evaluation of 132 popular RP websites. The formal analysis identified three weaknesses in the protocol. Based on the attack traces from the model checking engine, we designed six exploits and a semi-automated vulnerability assessment tool to evaluate how prevalent those weaknesses are in real-world implementations.
Two practical countermeasures were proposed and evaluated to address the uncovered weaknesses in the protocol.

Third, we examined the OAuth 2.0 implementations of three major IdPs and 96 popular RP websites. By analyzing browser-relayed messages during SSO, our study uncovered several vulnerabilities that allow an attacker to gain unauthorized access to the victim user's profile and social graph on IdPs, and to impersonate the victim on RP websites. We investigated the fundamental causes of these vulnerabilities and proposed several simple and practical design improvements that can be adopted gradually by individual sites.

In addition, we proposed and evaluated an approach for websites to prevent SQL injection attacks, and a user-centric access-control scheme that leverages the OpenID and OAuth protocols.

Preface

The materials in Chapters 3 to 7 of this dissertation have each been either published or accepted for publication. The author of this dissertation conceived of the research ideas and performed all the design and evaluation, except in Chapter 3, where the design and execution of the user studies were shared with the co-authors. He also wrote all the papers resulting from this research, under the supervision of the co-authors, who provided feedback and guidance throughout the research process. Below are the publication details for each chapter.

• Chapter 3: The related materials and a preliminary version of this chapter have been published. A full version of this chapter has been accepted for journal publication. The user studies were approved by UBC's Behavioral Research Ethics Board (Certification number: H10-02345, Project title: OpenID Web Single Sign-On Usability Study).

  San-Tsai Sun, Kirstie Hawkey, and Konstantin Beznosov. Investigating users' perspectives of web single sign-on: Conceptual gaps and acceptance model.
  Accepted for publication in the ACM Transactions on Internet Technology (TOIT), 35 pages, June 2013.

  San-Tsai Sun, Eric Pospisil, Ildar Muslukhov, Nuray Dindar, Kirstie Hawkey, and Konstantin Beznosov. What makes users refuse web single sign-on? An empirical investigation of OpenID. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS), pages 1-20, July 2011.

  San-Tsai Sun, Eric Pospisil, Ildar Muslukhov, Nuray Dindar, Kirstie Hawkey, and Konstantin Beznosov. OpenID-Enabled Browser: Towards usable and secure web single sign-on. In Proceedings of the 29th International Conference on Human Factors in Computing Systems (CHI) Extended Abstracts, pages 1291-1296, May 2011.

  San-Tsai Sun, Yazan Boshmaf, Kirstie Hawkey, and Konstantin Beznosov. A billion keys, but few locks: The crisis of web single sign-on. In Proceedings of the 19th New Security Paradigms Workshop (NSPW), pages 61-72, September 2010.

• Chapter 4: The materials of this chapter have been published in the Elsevier journal Computers & Security.

  San-Tsai Sun, Kirstie Hawkey, and Konstantin Beznosov. Systematically breaking and fixing OpenID security: Formal analysis, semi-automated empirical evaluation, and practical countermeasures. Computers & Security, 31(4):465-483, May 2012.

• Chapter 5: The materials of this chapter have been published in the 19th ACM Conference on Computer and Communications Security.

  San-Tsai Sun and Konstantin Beznosov. The devil is in the (implementation) details: An empirical security analysis of OAuth single sign-on systems. In Proceedings of the 19th ACM Conference on Computer and Communications Security (CCS), pages 378-390, October 2012.

• Chapter 6: The materials of this chapter have been published in the International Journal of Secure Software Engineering.

  San-Tsai Sun and Konstantin Beznosov. Retrofitting existing web applications with effective dynamic protection against SQL injection attacks.
  In International Journal of Secure Software Engineering, pages 20-40, January 2010.

• Chapter 7: A preliminary version of this chapter has been published in two workshops. A full version of this chapter has been published in the 25th Annual Computer Security Applications Conference.

  San-Tsai Sun, Kirstie Hawkey, and Konstantin Beznosov. OpenIDemail enabled browser: Towards fixing the broken web single sign-on triangle. In Proceedings of the Sixth ACM Workshop on Digital Identity Management (DIM), pages 49-58, October 8, 2010.

  San-Tsai Sun, Kirstie Hawkey, and Konstantin Beznosov. Secure Web 2.0 content sharing beyond walled gardens. In Proceedings of the 25th Annual Computer Security Applications Conference (ACSAC), pages 409-418, December 2009.

  San-Tsai Sun, Kirstie Hawkey, and Konstantin Beznosov. Towards enabling Web 2.0 content sharing beyond walled gardens. In Proceedings of the Workshop on Security and Privacy in Online Social Networking, pages 979-984, August 2009.

Note that without my supervisor's guidance and support, this dissertation would not have been possible. I have therefore opted to use the term "we" throughout this dissertation.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Overview of How a Web SSO Works
  1.2 Problem Statement
    1.2.1 Users' Perspectives of Web SSO Systems
    1.2.2 Security of OpenID and OAuth-based Web SSO Systems
  1.3 Contributions
  1.4 Dissertation Outline
2 Background and Related Work
  2.1 Background of Single Sign-On
  2.2 Key Web SSO Protocols
    2.2.1 InfoCard
    2.2.2 SAML Web SSO Profile 2.0
    2.2.3 OpenID 2.0
    2.2.4 OAuth 2.0
  2.3 Related Work
    2.3.1 Web SSO Usability Studies
    2.3.2 Browser-supported Solutions
    2.3.3 Security Analysis of OpenID
    2.3.4 Security Analysis of OAuth
3 Conceptual Gaps, Alternative Design, and Acceptance Model
  3.1 Research Approach
  3.2 Exploratory Study
    3.2.1 Study Protocol
    3.2.2 Findings
    3.2.3 List of Requirements
  3.3 The Identity-Enabled Browser
    3.3.1 IDeB Behind the Scene
    3.3.2 IDeB from User's Perspective
  3.4 Formative Study
    3.4.1 Study Protocol
    3.4.2 Results and Findings
  3.5 Comparative Study
    3.5.1 Participants
    3.5.2 Results
  3.6 Conceptual Gaps
    3.6.1 Triangular versus Linear Data Flow
    3.6.2 The By-Value versus the By-Token Profile-Sharing Model
    3.6.3 The Transient SSO Account versus the Traditional Account
  3.7 The Web SSO Technology Acceptance Model
    3.7.1 Intrinsic Variables
    3.7.2 Extrinsic Variables
  3.8 Recommendations
    3.8.1 Recommendations for RPs and IdPs
    3.8.2 Recommendations for the Web SSO Development Community
  3.9 Limitations
  3.10 Summary
4 Formal Analysis and Empirical Evaluation for OpenID Security
  4.1 Approach and Adversary Model
    4.1.1 Adversary Model
  4.2 The OpenID Protocol
  4.3 Protocol Formalization
    4.3.1 Alice-Bob Formalization
    4.3.2 HLPSL Formalization
  4.4 Attack Vector Evaluations
    4.4.1 Manual Evaluations
    4.4.2 The OpenID Vulnerability Assessment Tool
    4.4.3 Evaluation of Real-world RPs
  4.5 Defense Mechanisms
    4.5.1 The Web Attacker Defense Mechanism
    4.5.2 The MITM Countermeasure
    4.5.3 Reference Implementation
    4.5.4 Limitations
  4.6 Summary
5 Empirical Security Analysis of OAuth SSO Systems
  5.1 How OAuth 2.0 Works
  5.2 Approach
    5.2.1 Adversary Model
    5.2.2 Methodology
  5.3 Evaluation and Results
    5.3.1 Access Token Eavesdropping (A1)
    5.3.2 Access Token Theft via XSS (A2)
    5.3.3 Impersonation (A3)
    5.3.4 Session Swapping (A4)
    5.3.5 Force-login CSRF (A5)
  5.4 Discussion
    5.4.1 Authentication State Gap
    5.4.2 Automatic Authorization Granting
    5.4.3 Security Implications of Stolen Tokens
    5.4.4 Vulnerability Interplays
    5.4.5 Visualization and Analysis of Results
    5.4.6 Limitations
  5.5 Recommendations
    5.5.1 Recommendations for IdPs
    5.5.2 Recommendations for RPs
  5.6 Summary
6 Dynamic SQL Injection Attacks Protection
  6.1 Background
    6.1.1 How SQL Injection Attacks Work
    6.1.2 False Positives
    6.1.3 Existing Countermeasures
  6.2 Related Work
  6.3 Approach
    6.3.1 Token Type Conformity
    6.3.2 Conformity to Intention
  6.4 Implementation
    6.4.1 HTTP Request Interceptor
    6.4.2 Taint Tracker
    6.4.3 SQL Interceptor
    6.4.4 SQL Lexer, Intention Validator and SQLIA Detector
    6.4.5 Design Details Specific to ASP and ASP.NET
  6.5 Evaluation
    6.5.1 Experimental Setup
    6.5.2 Effectiveness
    6.5.3 Efficiency
    6.5.4 Field Evaluation
  6.6 Discussion
  6.7 Summary
7 Secure Content Sharing Beyond Walled Gardens
  7.1 Background and Related Work
    7.1.1 User Content Sharing Practices
    7.1.2 Distributed Authorization and Background of RT
    7.1.3 Related Work
  7.2 Approach
    7.2.1 System Architecture and Data Flows
    7.2.2 OpenPolicy Provider
  7.3 Implementation and Evaluation
  7.4 Identity-enabled Browser
    7.4.1 System Architecture and Data Flow
    7.4.2 OpenIDua Extension
    7.4.3 HTTP OpenIDAuth Scheme
    7.4.4 Log into an OpenIDemail Provider
    7.4.5 Access Protected Content
    7.4.6 Prototype Implementation and Evaluation
  7.5 Summary
8 Discussion
  8.1 Design Challenges
  8.2 Why OAuth 2.0 Succeeds While Others Fail
  8.3 Lessons Learned and Implications
9 Conclusion
  9.1 Future Work
    9.1.1 Further Investigation of Users' Perspectives of Web SSO
    9.1.2 Usable IdP-Phishing Resistant Mechanisms
    9.1.3 Security Analysis of the OpenID Connect Protocol
    9.1.4 Security Analysis of OAuth JavaScript SDK Libraries
    9.1.5 Adding Cryptographic Protection to OAuth without Sacrificing Simplicity
    9.1.6 Human and Organizational Factors Contributing to the Unsecured SSO Implementations
Bibliography
Appendices
A Web SSO Usability Study Documents (Chapter 3)
  A.1 Exploratory Study Documents
    A.1.1 Background Questionnaire
    A.1.2 Task Instructions
    A.1.3 Post-Task Questionnaire
  A.2 Comparative Study Documents
    A.2.1 Task Instructions
    A.2.2 Post-Condition Questionnaire
    A.2.3 Post-Session Questionnaire
B OpenID HLPSL Code (Chapter 4)
C OAuth Access Token Theft Exploit Scripts (Chapter 5)
  C.1 Access Token Theft Exploit Script 1
  C.2 Access Token Theft Exploit Script 2
  C.3 Access Token Theft via window.onerror

List of Tables

3.1 Properties of the selected RPs in the study.
3.2 Participants' demographics in the comparative study.
3.3 A Wilcoxon Signed Rank Test revealed a statistically significant difference between the CUI and the IDeB in the perceived ease of use, security protection, and privacy control for all sub-tasks (z = -2.356 to -4.774, p < .018), with a medium to large effect size (r = 0.28 to 0.57). The median rating scores for the CUI and the IDeB are listed in the last two columns, respectively.
3.4 The main differences between the by-value and by-token profile-sharing models.
4.1 Notation.
4.2 The results of the empirical RP evaluation. The "SSO CSRF" row denotes the percentage of RPs that are vulnerable to at least one variant of SSO CSRF attacks.
5.1 IdP-specific implementation mechanisms. Acronyms: FB=Facebook; GL=Google; MS=Microsoft; MU=Multiple Use; SU=Single Use; MD=Multiple Domain; WL=Whitelist; SD=Single Domain. Notes: 1: prior to the fix; 2: postMessage and Flash; 3: postMessage, Flash, FIM, RMR and NIX; 4: use cookie; 5: whitelist for client and server-flow, but multiple domains for SDK flow; 6: only when an offline permission is requested.
5.2 The percentage of RPs that are vulnerable to each exploit. Legend: T: SSL is used in the traditional login form; S: sign-in endpoint is SSL-protected; A1: access token eavesdropping; A2: access token theft via XSS; A3: impersonation; A4: session swapping; A5: force-login.
5.3 The percentages of RPs that are vulnerable to impersonation (A3) or session swapping (A4) attacks.
5.4 Top 10 permissions requested by RPs. Column "Vulnerable" denotes the percentage of RPs that request the permission and are vulnerable to token theft (i.e., A1 or A2 attacks).
5.5 Recommendations developed for client-flow (C) or server-flow (S) RPs. Each cell indicates whether the suggested recommendation offers no (empty cell), partial, or complete mitigation of the identified attacks (A1-A5).
6.1 SQLPrevent overheads for benign ("detection") and malicious ("prevention") HTTP requests.

List of Figures

1.1 Web SSO login flows and sample dialog forms in the sign-up and sign-in processes.
1.2 Inconsistent web SSO user experience.
2.1 The "single sign-on triangle" model.
2.2 Key web SSO standards and their strengths.
2.3 How InfoCard works.
2.4 SAML message exchange model for achieving web SSO.
2.5 Overview of how OpenID works.
3.1 Login forms of the three RP websites chosen for the study.
3.2 Screenshots of the IdP phishing demo website.
3.3 Main screens of the identity-enabled browser (IDeB): (a) block-out desktop and IdP login form, (b) IdP login form that supports accounts from Google, Yahoo, Microsoft and Facebook, (c) profile sharing consent form, (d) block-out desktop and IdP account selector, (e) IdP account selector, (f) IdP identity indicator, (g) profile sharing setting form.
3.4 The overall Likert-scale ratings from post-condition questionnaires in the formative study.
3.5 The block-out desktop and IdP login form designed and used in the formative study.
3.6 Sample mental model drawings and results.
3.7 The average and standard deviation of Likert-scale ratings from post-condition questionnaires. The differences are statistically significant with a Wilcoxon Signed Rank Test (see Table 3.3).
3.8 The perceived ease of use, security protection, and privacy control ranking results from post-session questionnaires suggest that our design is favored by most participants.
3.9 The login option preferences from the post-session questionnaire indicate that 60% of study participants would use IDeB on the websites they trust.
3.10 The data flows of the system model and the acquired incorrect mental model. The system model has a triangular data flow with two distinct browser sessions with the RP and IdP, while the data flow of the incorrect mental model is linear (i.e., the user's login credential is given to the RP without an active session with the IdP).
3.11 Web single sign-on technology acceptance model. The acceptance factors we found are correlated as antecedent variables to the intermediate factors in TAM, and categorized as intrinsic and extrinsic variables. Solid arrowed lines indicate positive influences, while dashed arrowed lines represent negative influences.
4.1 Overall approach.
4.2 The OpenID protocol sequence diagram.
4.3 The Alice-Bob formalization of the OpenID protocol. The corresponding steps from the sequence diagram are denoted at the end of each step.
4.4 The conceptual model of the HLPSL formalization.
4.5 Main components of OpenIDVAT.
4.6 The revised OpenID protocol in Alice-Bob notation. The changes are shown in boldface.
4.7 The MITM defense mechanism establishes a DH session key (g^ab mod p) between the browser and the RP server during the OpenID authentication process. Here, g is the DH generator, p is the modulus, and a and b are random DH private keys for the browser and the RP server, respectively.
4.8 The MITM defense mechanism in the presence of an MITM attacker between the browser and the RP server. The OpenID authentication protocol will fail if the MITM attacker attempts to interfere with the DH key exchange. If the Auth Response is successfully validated, then the DH key shared by the browser and the RP is unknown to the attacker.
5.1 The server-flow protocol sequences.
5.2 The client-flow protocol sequences.
5.3 Causality diagram: OAuth 2.0 design features that lead to the security weaknesses we found.
5.4 The distribution of the rank of each evaluated RP and its corresponding vulnerabilities (A1 to A5), requested permissions (offline, email, publish streams, publish actions), and the use of SSL on the traditional login form (SSL T) and SSL session (SSL S). Each vertical line in the "Rank" row denotes the rank of an RP that we tested.
6.1 How SQL injection attacks work.
6.2 Main elements of the SQLPrevent architecture are shown in light grey. The data flow is depicted with sequence numbers and arrow labels. Underlined labels indicate that the data are accompanied by the tainted meta-data. Depending on whether an SQL statement is benign or potentially malicious, data may flow to the Intention Validator conditionally.
6.3 A simplified SQL SELECT statement grammar written in Backus-Naur Form (BNF).
6.4 The intention tree of the intention statement from Fragment 6.1. Oval boxes represent nonterminal symbols, square boxes represent terminal symbols, and dash-lined boxes are placeholders. The grammar rules for each placeholder are (from left to right) two id lists, a STRING LIT, and an order clause.
6.5 Design of the evaluation testbed.
6.6 Detection and prevention performance evaluation. t_b and t_m are the round-trip response times with SQLPrevent deployed, measured using benign and malicious requests, respectively.
7.1 User-centric content sharing model.
7.2 The system architecture of the proposed Web 2.0 content sharing solution. The Email2OpenID provider enables web users to use their email addresses to log in to CSPs while still using OpenID URIs for identification. The OpenPolicy provider offers services for Internet users to organize their access policies, and for CSPs to make authorization decisions. Users are free to choose their Email2OpenID and OpenPolicy providers.
7.3 Flow for sharing content, assuming Owner W has logged into her OpenPolicy provider P.
7.4 Flow for accessing shared content.
7.5 Main components of an OpenPolicy provider.
7.6 OpenPolicy performance evaluation results. The worst-case response time was measured by forcing OpenPolicy to enumerate all credential statements on all testing servers. For each run, a different number of credential statements are generated on each server, and 5-20 concurrent authorization requests are submitted.
7.7 System architecture and high-level data flow of the proposed identity-enabled browser.
7.8 Flow for logging into an OpenIDemail provider.
7.9 Flow for accessing protected content.
7.10 Screenshots of the OpenIDemail enabled browser.
8.1 Contradictory requirements between User, RP and IdP. Each requirement is enclosed in an oval, and the conflict between two requirements is denoted by a double-arrowed line with a cross in the middle.

Acknowledgements

I would like to sincerely thank the many people who have supported and helped me during my pursuit of PhD study at the University of British Columbia. This dissertation would not have been possible without them.

My deepest gratitude is to my research supervisor, Professor Konstantin (Kosta) Beznosov, for his unwavering support and encouragement throughout the study. I am indebted to Kosta for his tremendous help, especially at the beginning, when I encountered several obstacles, both in research and in my new life in Vancouver, and decided to discontinue my research. Without his guidance, support, and patience, this dissertation would not have been possible. I have therefore opted to use the term "we" throughout this dissertation.

I would like to gratefully thank Professors Matei Ripeanu, Ali Mesbah, and Karthik Pattabiraman, who served on my supervisory committee and provided insightful comments and constructive criticism to improve this dissertation.
I would also like to thank Professors Sathish Gopalakrishnan and Eric Wohlstadter for serving on my university examination committee, and Professor Paul Van Oorschot, who served as the external examiner of my doctoral examination. They provided many helpful suggestions and constructive comments to further strengthen this dissertation.

My thanks go to all my colleagues from the Laboratory for Education and Research in Secure Systems Engineering (LERSSE). I am very grateful for their friendship and assistance in all aspects. I want to thank them for their valuable feedback and insightful discussions on many parts of my research. Knowing them made my life richer.

Special thanks to my wife, Hsiao-Pei Lan, for her love and support throughout my years as a PhD student. She accompanied me on this journey, sharing my happiness and stress. I also want to thank my son, Tien-Lan Sun, who has always been a wonderful and considerate kid. Thanks for their encouragement and cheering during the stressful times. Their support has meant much to me. I would also like to thank my parents-in-law, Chun-Ming Lan and Hsueh-Ying Yang Lan, for their continuous encouragement and support. They treat me like their own son, and always provide support and advice when I need them. I am fortunate and grateful to have them as my parents-in-law.

Last, and surely not least, I want to thank my parents (An-Chuan Sun and Hsiu-Chih Lu), who offered me unconditional love and support. They did not have a chance to complete their elementary education, and yet strived to save every penny they had so that their descendants could obtain a better education. My father was a coal miner for more than 40 years and unfortunately passed away prior to my study. I hope that this dissertation would make him proud. Thanks to my brother, Chun-Wang Sun, who provided financial support for my master's study abroad 20 years ago, and took great care of my mother throughout these years. I would not have been able to pursue my PhD without him.
I am lucky to have him as my brother. Thank you all.

Chapter 1
Introduction

The proliferation of web applications has caused web users to accumulate a multitude of user accounts and passwords. In 2007, a large-scale study of password habits found that a typical web user had about 25 password-protected accounts and entered approximately eight passwords per day [FH07]. The burden of managing this increasing number of accounts and passwords leads to "password fatigue" [Wik09]. Aside from the burden on human memory, password fatigue may cause users to devise password management strategies (e.g., writing down, reusing, or choosing weak passwords) that could degrade the security of their protected information [GF06, FH07].

Web single sign-on (SSO) systems enable web users to leverage a single account on a service provider to sign on to multiple unrelated websites, reducing the number of passwords and the amount of registration information a user must manage. This is commonly accomplished by having an identity provider (IdP) that manages and authenticates the user's identity information (e.g., Facebook, Google, Yahoo), and then provides the asserted identity to other relying party (RP) websites (e.g., CNN, Sears, Groupon) upon user login through the user's browser. Web SSO solutions were initially developed by various educational institutions in the mid-1990s. Early innovators in the field include Stanford University's WebAuth, Cornell University's SideCar, Yale's Central Authentication System (CAS), and PubCookie from the University of Washington [HHJM08]. Microsoft Passport [Opp04], the predecessor of Windows Live ID, was the first commercial effort to improve web authentication through SSO. However, Microsoft Passport failed to gain widespread adoption beyond Microsoft's own services, for reasons of trust.
In addition to its security flaws [Wan12], the system is proprietary and centralized, and was thus perceived by many potential web users and RP websites as a vehicle that could allow Microsoft to monopolize the online identity landscape [HHJM08].

Because being an identity provider confers strategic competitive advantages in the online identity landscape [SBHB10], a number of proprietary web SSO solutions from high-profile service providers (e.g., Yahoo BBAuth [Yah08], AOL OpenAuth [AOL08], Google AuthSub [Goo08]) have been introduced. However, similar to Microsoft Passport, these systems are proprietary and centralized: the protocol is not standardized, and identity information is maintained and controlled by a single administrative domain. SAML [OAS05]-based federated identity solutions enable cross-domain single sign-on. Many successful SAML implementations exist in industry, government, and academia [WNC+05], and other standards such as the Internet2 Shibboleth project [Int08], Liberty Alliance [Kan02], OASIS Web Services Security (WS-Security) [OAS02], and eXtensible Access Control Markup Language (XACML) [Com05] are based on SAML. Although the SAML framework is highly flexible and extensible and supports varying degrees of identity assurance, the prerequisite of agreements on protocol details among organizations in a federation, and the complexity of XML parsing, signing, and validation, make it difficult to scale to the Internet at large. Information Card [NJ08], known as InfoCard, defines the Identity Selector Interoperability Profile specification [NJ08] underlying Windows CardSpace [Mic09b], deployed in Windows operating systems. InfoCard has important features such as phishing-resistant authentication and IdP-to-RP unlinkability.
However, due to weak adoption by IdPs and RPs, Microsoft discontinued the development of Windows CardSpace in 2011 [Mic11].

OpenID [RF07] and OAuth [HLRH11] are open and lightweight web SSO protocols that have been adopted by high-profile IdPs such as Facebook, Google, Twitter, Yahoo, and Microsoft, and by millions of RP websites. Together, these two protocols offer web SSO to billions of potential users [Ope09, Fac11], and are becoming a key element of the web ecosystem. Besides SSO, the OAuth protocol is employed by major social websites for social login, which offers personalized, web-scale content sharing through social graphs and platform-specific services such as messaging, recommendations, ratings, and activity feeds. This pervasive and in-depth integration provides a clear and compelling business incentive for RPs and IdPs [SBHB10, Gig11, Jan12a]. The enormous number of users on social networks attracts numerous RP websites seeking to reach a broader set of users and integrate their services deep into the users' social contexts [Fac11].

This dissertation focuses on analyzing and improving the usability and security of OpenID- and OAuth-based web SSO systems because together they account for a critical mass of the web SSO population [Jan12b]. Usability and security are two critical factors in the adoption and protection of those billions of user accounts on IdPs and RPs [MR08, DD08]; however, users' perspectives of web SSO and the systems' security guarantees are still poorly understood. Aimed at filling these knowledge gaps, we conducted several studies to further the understanding and improvement of the usability and security of these two mainstream web SSO solutions.

The rest of this chapter is organized as follows: the next section provides an overview of how a typical web SSO system works. In Section 1.2, we describe the problems that motivate this dissertation, and discuss our research goals and their importance.
Section 1.3 summarizes the contributions of this dissertation, followed by an outline of the dissertation's structure in Section 1.4.

1.1 Overview of How a Web SSO Works

Web SSO systems are based on browser redirections in which an RP redirects the user to an IdP that interacts with the user before redirecting the user back to the RP for sign-in. The IdP authenticates the user, identifies the RP to the user, and then prompts the user to grant the RP access to the user's profile information. Once the requested permissions are granted, the user is redirected back to the RP with either an identity assertion (i.e., the user's identity attributes digitally signed by the IdP) that the RP can verify locally, or an access token (i.e., a randomly generated string that represents the scope and duration of the granted permissions) that allows the RP to access the user's profile information through the web service interface published by the IdP. Further details of the various SSO protocols are presented in the next chapter.

Figure 1.1: Web SSO login flows and sample dialog forms in the sign-up and sign-in processes.

Despite differences in protocol details, the interaction flows from the user's perspective are similar: the authentication request and response are passed between the RP and the IdP through the browser. Figure 1.1a illustrates the following steps, which give a high-level view of a sign-up flow when a visitor attempts to log in to an RP website using one of her IdP accounts:
1. A user selects an IdP via a login form presented by an RP. A web SSO-integrated login form typically combines traditional login fields (i.e., username, password) with a list of IdP icons for the user to choose from (see Figure 1.1b for an example).
2. The RP redirects the user to the IdP for authentication.
3. The user authenticates to the IdP by entering her username and password.
Note that this step may be skipped if the user has previously been authenticated by the IdP in the browser session. After authentication, the IdP presents a profile sharing consent form for the user to authorize the release of her profile information (Figure 1.1c). This consent step could be omitted if the requested permissions have already been granted by the user.
4. The IdP redirects the user back to the RP with the requested profile attributes. Before granting access, the RP may prompt the user to complete a registration form to gather additional profile information or to link to an existing account (Figure 1.1d).

For a returning user, only steps 1 to 3 are needed to sign on to the RP website (i.e., selecting an IdP, and entering an IdP username and password if the user has not yet authenticated to the IdP in the browser session).

Figure 1.2: Inconsistent web SSO user experience.

1.2 Problem Statement

Despite the proliferation of OpenID and OAuth adoption by IdPs and RPs, users' perspectives of web SSO and the security properties of these two systems have not been thoroughly examined. First, security mechanisms are only effective when adopted and used correctly by users [WT99, AS99, MAS03, Lam09], and yet there is little understanding of the perceived risks and concerns users face when using web SSO for login, their mental models of web SSO, and how these models influence their perceptions of security and privacy, as well as their intentions to adopt web SSO. Second, the value of those billions of web SSO accounts is clearly attractive to adversaries [BMBR11, Pat11], but whether OpenID- and OAuth-based SSO systems are secure and sound requires further investigation [BHvOS12]. Our research aims to fill these two knowledge gaps.

1.2.1 Users' Perspectives of Web SSO Systems

Analogous to how credit cards reduce the friction of paying for goods and services, web SSO systems are intended to reduce the friction of using the Web.
According to Davis's technology acceptance model (TAM) [DBW89], one of the most widely used models for explaining the factors that affect user acceptance of information technologies, and to prior research that extended TAM to different application domains [Fen98, LMSZ00, MK01, SH03, Pav03, AGS04, WW05], users' perceived usefulness, ease of use, and risks determine user acceptance of a computer technology. Nevertheless, in the context of web SSO, users' perceptions have not yet been thoroughly studied.

There are several recommendations of best practices and design guidelines for implementing usable web SSO user interfaces [Fre08, Sac08, DD08]. However, because RPs and IdPs have diverse needs for authentication and user management, they do not offer a consistent user experience. As illustrated in Figure 1.2, when accessing N RPs using one IdP, the user must visit N+1 possibly different login forms (one for each RP website and one at the IdP), choose an IdP to log in N times via N possibly different interfaces, consent to the release of personal profile information on the IdP N times, and log out N+1 times through N+1 different interfaces. This complex and inconsistent user experience may impose a cognitive burden on average web users [MR08, DD08]. Additionally, many RPs add a sign-up or account linking step (Figure 1.1d) at the end of an SSO process to gather additional profile information required for a new account, or to allow existing registered users to log in using their SSO account. However, users may not understand the purpose of this additional sign-up or account linking step, and the process may confuse them even further. Furthermore, the lack of visibility and feedback for users who use different IdP accounts for RPs that vary in trustworthiness could impose additional memory and cognitive burdens.
Besides remembering which IdP account was used on the visited RP, using multiple IdP accounts in a browser session can make it difficult for the user to determine why an access failed, and whom to contact if a problem is encountered [MR08, DD08].

In addition to ease of use, users' perceived risks could also play a significant role in their preference of login options. First, sharing personally identifiable information with RPs can cause significant privacy concerns [DTO02, MR08, SC09]. Web SSO users may be concerned about spam or misuse of their profile information when signing on to RP websites using their IdP account. In addition, a single point of failure is an inherent risk of using web SSO; one compromised account on an IdP can result in breaches on all services that use this compromised identity for authentication. Moreover, redirection-based web SSO systems may habituate users to being redirected to IdP websites for authentication. If users do not verify the authenticity of these websites before entering their credentials (and they usually do not [WMG06, DTH06, ZECH07, SDOF07, SEA+09, ATO12, Hon12]), IdP account credential phishing attacks are possible. A malicious RP could redirect users to a bogus IdP login form to steal the victim's login credentials, and detecting such an IdP phishing attack relies on the user's cognitive capability. Furthermore, as IdP login forms are initiated from and surrounded by RP websites, it is questionable whether users understand that they are not giving their username and password to the RP websites.

The user-centric design of security mechanisms is imperative to the development of security solutions that are intended to be used by average users [AS99, MAS03, Lam09]. Hence, understanding users' perceptions and addressing their concerns and challenges are essential to the continued adoption and evolution of web SSO systems.
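The IdP phishing risk described above turns on whether the page collecting the IdP password really belongs to the IdP. As a minimal sketch of what that verification involves (our illustration, not a mechanism from the studies in this dissertation), the Python snippet below compares the full origin of a login page (scheme, exact host, and port) with the genuine IdP's origin, here a hypothetical `https://idp.example.com`. Each lookalike URL in the list embeds the IdP's name yet points elsewhere, which is precisely the distinction a habituated user is unlikely to make at a glance.

```python
from urllib.parse import urlsplit

# Hypothetical origin of the genuine IdP login page (an assumption for illustration).
EXPECTED_IDP_ORIGIN = ("https", "idp.example.com", 443)
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return (scheme, host, port) for a URL, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # lower-cased; excludes any user-info and port
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def is_genuine_idp_page(url: str) -> bool:
    """True only if the login page's origin exactly matches the expected IdP origin."""
    return origin_of(url) == EXPECTED_IDP_ORIGIN


candidates = [
    "https://idp.example.com/login",            # genuine IdP login page
    "https://idp.example.com.evil.test/login",  # IdP name used as a subdomain
    "https://evil.test/idp.example.com/login",  # IdP name only in the path
    "https://idp.example.com@evil.test/login",  # IdP name in the user-info part
    "http://idp.example.com/login",             # right host, but no TLS
]
for url in candidates:
    print(is_genuine_idp_page(url), url)
```

A check that merely looks for the IdP's name somewhere in the URL would accept every lookalike in the list; only an exact origin comparison rejects them.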
To improve the usability of OpenID- and OAuth-based web SSO systems, we aimed to further the understanding of the following questions:
• What are the mental models users have?
• How are these mental models formed?
• What are the gaps between users' mental models and the system model?
• How do these gaps affect users' security/privacy perceptions and adoption intentions?
• How can we reduce these conceptual gaps?

1.2.2 Security of OpenID and OAuth-based Web SSO Systems

Given their widespread adoption by IdPs and RPs, a security breach in OpenID- and OAuth-based web SSO systems could compromise the confidentiality, integrity, and availability of billions of user accounts residing on millions of websites. It is thus clearly important to understand whether these systems are secure and sound in order to protect web users from the adversaries present in today's hostile web environment.

From an adversary's perspective, the information guarded by OpenID- and OAuth-based web SSO systems can be attractive and valuable. Through a successful exploit of an uncovered weakness in the protocol or its implementations, an attacker may be able to harvest private user data, such as email addresses, phone numbers, friend lists, and other personal information that has monetary value (e.g., credit cards, online transactions). To an adversary, such data are valuable [BMBR11] and can be used for identity theft, online profiling, and large-scale email spam, phishing, and drive-by-download campaigns [Pat11]. The enormous user bases and growing popularity of these IdP and RP websites could continually lure numerous adversaries into this "lucrative business."

In addition to the threats imposed on the user's private data, a compromised SSO account could also put the victim's friends and family in her social circle at risk. The social graph within an IdP or RP site carries established trust, and is a powerful viral platform for information distribution.
Web users have been conditioned to be wary of links in email, but tend to put more trust in social network messages from their social circles [SCM11]. Adversaries commonly exploit the trust within a social graph to improve the pull of their lures [Pat11, SCM11]. Known attack vectors include compromising existing accounts via phishing [Mil09] and malware [TN10, BCF10], creating fake accounts for infiltration [Sop09, BSBK09, BMBR11], and deploying fraudulent applications [PAC09, SBL09, EMKK11]. Moreover, flaws in the design and implementation of web SSO systems may also allow an attacker to act on behalf of the victim user for fame and monetary gain, an important research area that calls for in-depth investigation.

Furthermore, OpenID- and OAuth-based web SSO systems are built upon the existing web infrastructure, but web application vulnerabilities [OWA10] (e.g., network eavesdropping, cross-site scripting, SQL injection, cross-site request forgery, clickjacking) are prevalent and constantly exploited [Whi11, NIS11]. In addition, as protocol messages and sensitive private data are passed between the RP and IdP through the browser, a vulnerability found in the browser could also lead to significant SSO security breaches. It is therefore vital to investigate whether and how those web vulnerabilities could be leveraged by adversaries to compromise web SSO systems, and how to prevent them in a sound and practical way.

Given the popularity of major IdPs and the proliferation of RP websites, the risk of compromised web SSO implementations can be significant. To enhance the security of OpenID- and OAuth-based SSO systems, we aimed at furthering the understanding of the following questions:
• What are the fundamental security weaknesses in OpenID- and OAuth-based SSO systems?
• How can these weaknesses be leveraged by attackers?
• How prevalent are those weaknesses?
• What are the enabling root causes?
• How can we mitigate them in an effective and practical way?

1.3 Contributions

In this dissertation, we conducted several studies to further the understanding of the usability and security of two key web SSO systems, and proposed improvements and mitigation mechanisms based on insights from our investigations. In summary, this dissertation contributes to the body of knowledge in the domain of web SSO as follows:

• Conceptual gaps and acceptance model. Through several user studies, we evaluate users' experience, investigate their perceptions and concerns when using web SSO for authentication, and explore possible improvements. We found that our participants had several misconceptions and concerns that impeded their adoption. These ranged from their inadequate mental models of web SSO, to their concerns about personal data exposure, and a reduction in their perceived web SSO value due to the employment of password management practices. Informed by our findings, we offered a web SSO technology acceptance model, and suggested design improvements.

• Formal analysis, semi-automated empirical evaluation, and practical countermeasures for OpenID security. We conduct a systematic analysis of the OpenID 2.0 protocol using both formal model checking and an empirical evaluation of 132 popular websites that support the use of OpenID for login. Our formal analysis reveals that the protocol does not guarantee the authenticity and integrity of the authentication request, and it lacks contextual bindings among the protocol messages and the browser. The results of our empirical evaluation suggest that many OpenID-enabled websites are vulnerable to a series of cross-site request forgery (CSRF) attacks that either allow an attacker to stealthily force a victim user to sign into the OpenID-supporting website and launch subsequent CSRF attacks, or force a victim to sign in as the attacker in order to spoof the victim's personal information or mount XSS attacks.
In addition, the adversary can impersonate the victim on many of the evaluated websites by forging the extension parameters during an SSO process. Based on the insights from this analysis, we propose and evaluate a simple and scalable mitigation technique for OpenID-enabled websites, and an alternative man-in-the-middle defense mechanism for deployments of OpenID without SSL.

• Empirical security analysis of OAuth 2.0 SSO systems. We examine the OAuth 2.0 implementations of three major IdPs (Facebook, Microsoft, and Google) and about one hundred popular RP websites that support the use of Facebook accounts for login. Our results uncover several critical vulnerabilities that allow an attacker to gain unauthorized access to the victim user's profile and social graph, and to impersonate the victim on the RP website. Closer examination reveals that these vulnerabilities are caused by a set of design decisions that trade security for implementation simplicity. To improve the security of OAuth 2.0 SSO systems in real-world settings, we suggest simple and practical improvements to the design and implementation of IdPs and RPs that can be adopted gradually by individual sites.

• Dynamic SQL injection attack protection. SQL injection attacks (SQLIAs) are one of the foremost threats to web applications, and can be leveraged by adversaries to directly compromise users' personal data and authentication credentials on IdP and RP websites. We present an approach for retrofitting existing web applications with run-time protection against known as well as unseen SQLIAs without the involvement of application developers. The precision of the approach is also enhanced with a method for reducing the rate of false positives in the SQLIA detection logic, via run-time discovery of the developers' intention for individual SQL statements made by web applications.
The proposed approach intercepts both HTTP requests and SQL statements, marks and tracks parameter values originating from HTTP requests, and performs SQLIA detection and prevention on the intercepted SQL statements.

• OpenPolicy: Secure content sharing beyond walled gardens. By leveraging the OpenID and OAuth protocols, we explore and evaluate a preliminary design of a novel user-centric access control scheme that enables web users to reuse their access control policies across the boundaries of websites. In decentralized environments such as the Web, the content owner and the requester are often unknown to each other, but need to share content in a controlled manner. To support distributed authorization, our proof-of-concept implementation allows content owners to specify access policies based on their existing social contacts, express delegation of relationship authority (i.e., a friend's friend), and denote authorized users using attributes (e.g., friends from a university).

1.4 Dissertation Outline

The rest of this dissertation consists of eight chapters. The next chapter presents background and reviews related work, and Chapter 3 describes three user studies that investigate the challenges and concerns web users face when using web SSO for authentication. Chapter 4 presents our formal and empirical security analysis of the OpenID 2.0 protocol, and in Chapter 5, we examine the security of real-world OAuth-based SSO implementations. The design and evaluation of our proposed SQLIA protection mechanism and user-centric access control scheme are discussed in Chapters 6 and 7, respectively. In Chapter 8, we discuss design challenges that must be met in order for a web SSO solution to succeed in the online identity landscape, followed by how the design of OAuth 2.0 addresses these design challenges, and the implications of these design tradeoff decisions.
We conclude the dissertation in Chapter 9 by discussing the achieved results and outlining future work.

Chapter 2
Background and Related Work

The proliferation of information technology has led to computer users accumulating many redundant user accounts and passwords. To provide access to restricted services, computer systems and applications maintain identity attributes and credentials of users, and require users to prove possession of these credentials to obtain access to the protected resources. As information systems proliferate to support business processes, system users and administrators are faced with an increasingly complicated interface to accomplish their tasks. When interacting with multiple computer systems, users have to go through an equivalent number of sign-on dialogues, each of which may involve different authentication information. Meanwhile, system administrators need to manage user accounts among multiple systems, sometimes even across enterprise boundaries, in a coordinated manner in order to enforce an integrated access-control policy.

The ubiquity of web services exacerbates this problem even further. As web applications become widespread, web users need to handle an increasing number of authentication credentials and profile information to establish security contexts with different web applications. The resulting multitude of user accounts translates to a surfeit of usernames and passwords for users to remember. Managing multiple authentication credentials is annoying for users, and weakens the security of the authentication system, as users tend to pick weak passwords or reuse the same password across different websites, and many keep unencrypted password files to track their accounts.

One approach to reducing the burden on human memory and the overhead of credential management is password managers [ME05].
Password managers typically store encrypted password data in a local database and are able to automatically fill in the login forms of the websites that users visit. The most commonly used password managers are those built into the browser itself (e.g., password autocomplete) [GF06], rather than those implemented as browser extensions (e.g., Password Multiplier [HWF05]). Password managers can reduce a user's memory burden, as the user only needs to remember a single master password. However, users may have difficulty migrating their existing passwords to the system [CvOB06]. Such systems typically have issues with the transportability of passwords between computers [CvOB06], and users may not trust the security of these systems [GF06]. In addition, when using password managers that improve security through custom-generated passwords (e.g., Passpet [YS06], PwdHash [Ros05]), users may be uncomfortable not knowing the actual site passwords [CvOB06].

Another approach to reducing password fatigue is web single sign-on (SSO), which is the focus of this dissertation. Developed and grown from corporate enterprises, SSO systems are evolving into the Internet, and gradually becoming a key element of the web ecosystem. In this chapter, we present background and key web SSO solutions, and discuss related work on improving the usability and security of web SSO systems.

Figure 2.1: The "single sign-on triangle" model.

2.1 Background of Single Sign-On

Single sign-on is an authentication mechanism that allows a user to leverage a single account to access protected resources or services on multiple different computer systems. SSO enables networked services to achieve authentication goals by consuming just-in-time identity data from authoritative sources residing in other systems or organizational domains, at the moment users arrive.
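To make this model concrete, the sketch below shows an IdP packaging an authenticated user's identity into an assertion and an RP verifying that assertion locally before trusting it. It is a minimal illustration under stated assumptions, not the message format of any particular protocol: the IdP and RP are assumed to already share a secret key, HMAC-SHA256 from Python's standard library stands in for the IdP's signature (OpenID 2.0 associations also rely on a shared-secret HMAC, whereas other protocols use public-key signatures), and the claim names, identifiers, and five-minute freshness window are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical values for illustration only.
IDP_ID = "https://idp.example.com"
SHARED_KEY = b"idp-rp-shared-secret"  # assumed to be established out of band
MAX_AGE_SECONDS = 300                 # assumed freshness window


def _mac(payload: bytes) -> str:
    """HMAC-SHA256 over the serialized claims, standing in for the IdP's signature."""
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()


def idp_issue_assertion(user_id: str, rp_id: str) -> dict:
    """IdP side: assert the authenticated user's identity to one specific RP."""
    claims = {
        "iss": IDP_ID,
        "sub": user_id,
        "aud": rp_id,
        "iat": int(time.time()),
        "nonce": secrets.token_hex(16),
    }
    payload = json.dumps(claims, sort_keys=True)
    return {"payload": payload, "mac": _mac(payload.encode())}


class RelyingParty:
    def __init__(self, rp_id: str):
        self.rp_id = rp_id
        self.seen_nonces = set()  # nonces of assertions already accepted

    def accept(self, assertion: dict):
        """RP side: return the user id if the assertion is authentic, fresh, and meant for this RP."""
        payload = assertion["payload"].encode()
        if not hmac.compare_digest(_mac(payload), assertion["mac"]):
            return None  # modified in transit or not produced by the IdP
        claims = json.loads(payload)
        if claims["iss"] != IDP_ID or claims["aud"] != self.rp_id:
            return None  # wrong issuer, or issued for a different RP
        if time.time() - claims["iat"] > MAX_AGE_SECONDS:
            return None  # too old
        if claims["nonce"] in self.seen_nonces:
            return None  # replay of an assertion already used
        self.seen_nonces.add(claims["nonce"])
        return claims["sub"]


rp = RelyingParty("https://rp.example.org")
assertion = idp_issue_assertion("alice", "https://rp.example.org")
print(rp.accept(assertion))  # alice
print(rp.accept(assertion))  # None (the nonce has already been used)
```

The audience and nonce checks matter as much as the MAC check: without them, a valid assertion issued for one RP could be accepted by another, and a captured assertion could be replayed to the same RP.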
SSO solutions come in a diversity of variants, but they all share a common model. From an architectural point of view, an SSO system always involves five logical entities, as depicted in the model in Figure 2.1:
• The identity provider (IdP) is the organization that manages and authenticates the users' identity information, and provides asserted identities to other service providers. Examples of IdPs include enterprises in the private and public sectors that manage their workforce members (e.g., employees, contractors, retirees), or web service providers such as social networks (e.g., Facebook, Twitter) and web email providers (e.g., Google, Yahoo, Hotmail).
• The relying party (RP) is an application that provides services to end users, and relies on the authenticated user identities from IdPs to make authorization decisions. An RP and its IdP can be administered by the same or different organizations. Information systems within enterprises, extranet services provided by business-to-business partners, or consumer websites are examples of RPs.
• The user is a person who assumes a particular digital identity from an IdP to access protected resources and services provided by the RPs. A user can be a workforce member or a web user.
• The user agent is a software application running on the user's personal computer, mobile, or appliance device that interacts with the IdP and RP on behalf of the user, such as a web browser or a mobile application. A user's interactions with IdPs and RPs always take place through an agent, which can either passively allow identity information to flow or actively mediate it.
• The protocol is an agreement on message formats and transport mechanisms among IdPs, RPs, and user agents, which is designed to exchange asserted identities between IdPs and RPs.
An SSO protocol can be an open standard or a proprietary specification.When an authentication architecture separates the identity information?s source from itsusage, every stakeholder in the SSO ecosystem benefits:? Users can leverage one single account to access multiple protected resources and serviceswithout revealing their login credentials to RPs. This could drastically reduce the numberof account and login credentials a user needs to manage, and prevent users from devisinginsecure password management strategies.? RPs can provide users with an enhanced sign-on user experience, offload many account-management and assertion tasks to IdPs, reach a broader set of users from IdPs, andpromote their services through the users? social circles.? IdPs can focus on improving authentication methods, adding attractive features to accountmanagement interfaces, and facilitating personal content sharing through their platformin order to gain the marketing and thought-leadership in the online identity landscape.SSO solutions can be categorized according to the administration domain of the RPs andIdPs participating in the system. Many SSO solutions are single-domain SSO systems in whichboth IdP and RP are controlled by a single enterprise or administrative domain. A single-domain SSO system provides the ability for applications across an organization to rely on ashared user store, and to provide user access to applications with minimal number of sign-ons. Examples of such systems include Kerberos [NT94], CA IdentityMinder [CA 12], EvidianEnterprise SSO [Evi12], Imprivata OneSign [Imp12], and HTTP cookie-based solutions suchas PubCookie [Uni08], CAS [Yal09] (Yale University), and CoSign [Uni09] (The University ofMichigan).Increasingly however, users are accessing external systems which are fundamentally outsideof their domain of control, and external users are accessing internal systems. A cross-domain122.2. 
Key Web SSO ProtocolsFigure 2.2: Key web SSO standards and their strengths.SSO system (also known as ?Federated Identity management System?) lets computer systemsdynamically distribute identity information and delegate identity tasks across security domains,where IdP and RP are administered by different organizations. Some cross-domain SSO sys-tems are IdP-centralized, in which a single authority acts as IdP for all RPs. Proprietarybrowser-based protocols such as Microsoft Passport [Opp04], Yahoo BBAuth [Yah08], AOLOpenAuth [AOL08], Google AuthSub [Goo08] are main solutions in this category. Each ofthese proprietary protocols is developed by a single IdP allowing other websites to accept usercredentials only from their own domain. Due to its closed nature, no other service provider isable to implement the protocol and participate as an IdP.In contrast to IdP-centralized SSO systems, a decentralized cross-domain SSO system canhave more than one IdP, allowing an RP to leverage user accounts from multiple IdPs. De-centralized SSO systems need shared protocols and message formats to exchange identitiesand assertions between IdPs and RPs. Key decentralized SSO protocol specifications includeInforCard[NJ08], SAML Web Browser SSO Profile [OAS05], OpenID [RF07], and OAuth [HLRH11],which we discuss in greater details in the next section.132.2. Key Web SSO Protocols2.2 Key Web SSO ProtocolsSince 2000, four major open web SSO specifications have emerged: SAML Web Browser SSOProfile, InfoCard, OpenID, and OAuth. Any RP or IdP implementation adhering to the openspecification or standard can achieve the full spectrum of use-cases and interoperability providedby these protocols. InfoCard has a Microsoft pedigree, which defines the Identity SelectorInteroperability Profile specification [NJ08] underlying Windows CardSpace [Mic09b] deployedin the Windows operating systems. 
However, due to a lack of adoption by IdPs and RPs, Microsoft discontinued its development in 2011 [Mic11].

In the current web SSO standards landscape, SAML, OpenID, and OAuth play a variety of strategic roles. Figure 2.2 summarizes the commonalities and distinctions among these three web SSO protocols. Among them, SAML is the most mature and comprehensive standard, with versions standardized in 2002 and 2005, and it underpins the majority of organizationally managed identities. OpenID, a lightweight protocol that emerged from a community effort in 2005, provides a unique "dynamic IdP discovery" capability and is supported by major service providers (e.g., Google, Yahoo, Microsoft, AOL) and the US government. Combining the best implementation practices of proprietary industry protocols such as Google AuthSub [Goo08], Yahoo BBAuth [Yah08], and the Flickr API [Fli12], OAuth is a web resource authorization protocol that enables not only web SSO but also web-scale content sharing through social graphs and platform-specific services. The newly emerging OAuth 2.0 is still a draft at the time of writing, but it has already been adopted by Facebook and other key players such as Twitter, LinkedIn, Google, Microsoft, and Salesforce.

Note that although this section discusses four major web SSO protocols, our work focuses on OpenID and OAuth because together they serve a critical mass of web consumers. According to a web SSO trends report from September 2012 [Jan12b], Facebook is the most popular choice for web SSO (50%, using OAuth 2.0), followed by Google (25%, OpenID 2.0), Twitter (10%, OAuth 1.0a), Yahoo (7%, OpenID 2.0), Microsoft (2%, OAuth 2.0), and AOL (2%, OpenID 2.0).

2.2.1 InfoCard

Information cards (known as InfoCard) [NJ08] are personal digital identities that are analogous to real-world identity cards such as passports, driver's licenses, and credit cards.
Each card contains assertions about a user's identity that are either self-issued or issued by an identity provider. When logging into a website, the user selects a card instead of typing a user name and password. Information cards are managed on client computers by a software component called an identity selector (e.g., Windows CardSpace [Mic09b], Higgins Card Selector [The09a]). In June 2008, the Information Card Foundation [The09b] was formed to advance the use of the InfoCard metaphor as a key component of user-centric identity systems. Industry leaders such as Equifax, Google, Microsoft, Novell, Oracle, PayPal, and VeriSign are among the steering members of the Information Card Foundation.

Figure 2.3: How InfoCard works.

To use InfoCard, the user must first create a self-issued card on her own machine using the identity selector, or obtain an IdP-issued card from an IdP. A newly issued InfoCard is an XML document that can be transmitted to the user via e-mail or web download. The following steps and Figure 2.3 illustrate how a user uses InfoCard to log into an RP website:
1. User U uses browser B to make an HTTP request for a protected resource on an InfoCard-enabled relying party RP.
2. RP returns a security policy to B indicating what type of identity and channel security the service requires.
3. B invokes the identity selector S and passes in the security policy received from RP. S shows the user a collection of cards that match the given policy. The match is primarily determined by the type of token and the user-attribute claims required by the service.
4. Once the user selects a card to send, S initiates a security policy exchange with the identity provider IdP that issued the card.
5. IdP returns a security policy to S indicating how the user is supposed to prove her credential.
6. U provides her credential to S (e.g., by entering a user name and password).
7. S requests the required claims from IdP, supplying U's credentials.
8. IdP returns a security token to S for the user to send to the service. This token contains all the claims RP requested.
9. With the user's consent, S passes the security token to B.
10. B, in turn, passes the security token (i.e., the identity attribute assertion) to RP. RP then makes access decisions based on the received security token.

Information cards have important features such as phishing-resistant authentication, IdP-to-RP unlinkability, and real-time user consent. However, in comparison to OpenID or OAuth, InfoCard is a heavyweight protocol. In particular, users need to install an identity selector, and relying parties must have a valid SSL certificate configured to provide secure channels when communicating with identity selectors. Security-wise, because cards are stored in a local repository, they could be stolen and used to impersonate the victim if the user's machine is compromised [Hao09]. Additionally, InfoCard raises privacy issues when it is used on shared or public computers, and it is difficult to use if users switch between multiple computers. InfoCard suffers from adoption problems, as only a few websites support it as an IdP or RP [The09b]. In February 2011, Microsoft announced that it was discontinuing the development of InfoCard [Mic11].

2.2.2 SAML Web SSO Profile 2.0

Security Assertion Markup Language (SAML), a standard specified by the OASIS [OAS12] Security Services Technical Committee, offers an XML-based framework that encodes security assertions and the corresponding protocol messages for exchanging identity assertion information across domain boundaries. The SAML 2.0 standards [OAS05] are widely considered the most robust, extensible, and interoperable choice for enterprise-strength identity federation scenarios.
Many successful SAML implementations exist in industry, government, and academia [WNC+05], and other standards such as the Internet2 Shibboleth project [Int08], Liberty Alliance [Kan02], OASIS Web Services Security (WS-Security) [OAS02], and eXtensible Access Control Markup Language (XACML) [Com05] are based on SAML.

The modular design of the SAML framework allows its components (core, bindings, and profiles) to be combined to support a wide variety of deployment scenarios. The core of SAML consists of assertions and message formats for requesting and returning assertions between security domains. An assertion is an XML document containing statements, signed by an IdP, about the identity holder's identifier, authentication status, and attributes. SAML bindings specify how SAML request/response messages are encapsulated in other common transport protocols such as HTTP or the Simple Object Access Protocol (SOAP). A SAML message is transmitted from one entity to another either by value or by reference. A reference to a SAML message is called an artifact. The receiver of an artifact resolves the reference by sending an artifact resolution request directly to the issuer of the artifact, who then responds with the actual message referenced by the artifact. There are three commonly used SAML bindings: HTTP Redirect, HTTP POST, and HTTP Artifact. To transmit a SAML request/response message from one entity to another by value, the message can be carried directly in the URL query string of an HTTP GET request (i.e., the HTTP Redirect binding) or inside an HTML form (i.e., the HTTP POST binding). Alternatively, to transmit a SAML message by reference, an artifact that denotes the actual stored message is placed as a query parameter of an HTTP GET request (i.e., the HTTP Artifact binding).

Figure 2.4: SAML message exchange model for achieving web SSO.

A SAML profile describes how SAML assertions, request/response messages, and bindings combine to support a defined use case.
The most important SAML profile is the Web Browser SSO Profile, in which a web browser is used as the user agent, and three possible bindings (i.e., HTTP POST, HTTP Redirect, HTTP Artifact) can be used to transfer authentication requests and responses between an IdP and an RP. The basic template for achieving web SSO in SAML is illustrated in Figure 2.4. Within an individual step, there may be one or more actual message exchanges, depending on the SAML bindings chosen for that step.
1. The user U, via a web browser B, makes an HTTP request for a protected resource at the RP without an established security context. RP obtains the location of an endpoint at IdP for the authentication protocol. Note that the mechanism for IdP discovery is implementation-dependent.
2. RP issues an authentication request message to be relayed via B to IdP. The HTTP Redirect, HTTP POST, or HTTP Artifact binding can be used to transfer the message to the IdP through the user agent.
3. U authenticates to IdP by some means outside the scope of the SAML profile.
4. IdP issues an authentication response message to RP via B. Either the HTTP POST or the HTTP Artifact binding can be used to transport the message. The response message may indicate an error or include an authentication assertion.
5. RP either grants or denies access to the resource based on the authentication response from IdP.

The design of SAML is driven by strong requirements for trust, high-value transactions, and privacy. Although the SAML framework is flexible and highly extensible, the prerequisite that organizations in the federation agree on protocol details makes it hard to scale on the Web. In addition, it could be challenging for average web developers to parse and verify the rather complex SAML messages.
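To make step 2 concrete, the following is a minimal Python sketch of how an RP might carry an authentication request by value under the HTTP Redirect binding, assuming the DEFLATE encoding defined in the SAML 2.0 Bindings specification (raw DEFLATE, then base64, then URL encoding into a `SAMLRequest` query parameter, with an optional `RelayState`). The endpoint URL, the abbreviated AuthnRequest, and all function names are hypothetical, and query-string signing (the `SigAlg`/`Signature` parameters) is deliberately omitted.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlparse


def deflate_and_encode(message_xml):
    """Raw DEFLATE (no zlib header/trailer), then base64-encode a SAML message."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    raw = compressor.compress(message_xml.encode("utf-8")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


def decode_and_inflate(value):
    """Reverse of deflate_and_encode: base64-decode, then raw INFLATE."""
    return zlib.decompress(base64.b64decode(value), -15).decode("utf-8")


def redirect_binding_url(idp_sso_url, authn_request_xml, relay_state=None):
    """RP side (step 2): URL the browser is redirected to under the HTTP Redirect binding."""
    params = {"SAMLRequest": deflate_and_encode(authn_request_xml)}
    if relay_state is not None:
        params["RelayState"] = relay_state  # opaque RP state echoed back by the IdP
    # urlencode percent-encodes the base64 characters '+', '/', and '='.
    return idp_sso_url + "?" + urlencode(params)


def read_redirect_binding(url):
    """IdP side: recover the AuthnRequest XML and RelayState from the query string."""
    query = parse_qs(urlparse(url).query)
    xml = decode_and_inflate(query["SAMLRequest"][0])
    relay = query.get("RelayState", [None])[0]
    return xml, relay
```

Compression matters here because the whole message travels in a URL. The example also hints at why developers find SAML demanding: even this transport layer precedes any XML parsing or signature verification.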
Figure 2.5: Overview of how OpenID works.

2.2.3 OpenID 2.0

Whereas SAML is a comprehensive and complex security framework that supports a broad range of deployment scenarios, OpenID [RF07] is a lightweight authentication protocol designed specifically for user-centric web single sign-on, with which users are free to choose, or even set up, their own OpenID IdP server. OpenID's most valuable and unique feature lies in its "dynamic IdP discovery" design, which allows RPs to dynamically discover the endpoints of an IdP and establish shared session keys at runtime. This scalability feature removes the need for RPs to configure a set of known IdPs and to have a pre-established agreement on protocol details with an IdP before initiating the protocol. According to the OpenID Foundation [Ope09], as of September 2009, more than one billion OpenID-enabled user accounts were provided by major service providers (e.g., Google, Yahoo, AOL, Microsoft, PayPal).

In OpenID, a user's identity is a URI (Uniform Resource Identifier), and the OpenID authentication process asserts to the RP that the user controls the content at that URI. Figure 2.5 illustrates the following steps, which give a high-level view of how the OpenID protocol works:
1. User U selects an IdP (e.g., https://yahoo.com/) or enters her OpenID identifier (e.g., http://ece.ubc.ca/alice) via a login form presented by an RP.
2. RP makes an HTTP request from its server side to fetch a document identified by the given OpenID identifier (i.e., https://yahoo.com/ or http://ece.ubc.ca/alice) that contains the IdP's user-authentication endpoint (e.g., https://ece.ubc.ca/openid) and key exchange endpoint (e.g., https://ece.ubc.ca/keyexchange). Optionally, RP can establish a shared session key with IdP using the Diffie-Hellman key exchange protocol [DH76]. The session key can later be used by RP to verify the identity assertion returned from IdP. RP then redirects U to IdP's user-authentication endpoint.
3. U authenticates to IdP (e.g., by entering her user name and password), and then consents to the release of her profile information to RP.
4. IdP validates the user credential and redirects U back to RP along with an identity assertion that RP can verify, either by using the previously established session key or by sending the assertion back to IdP via a direct back-end channel for validation. RP then makes an authorization decision based on the verified identity attributes encoded in the assertion.

OpenID brings dynamic partnering to the lightweight web SSO environment. The openness of OpenID attracts not only major consumer service providers but also the US government. In 2009, the US federal government profiled OpenID (along with SAML) for use as part of its Open Identity Solutions for Open Government initiative. These profiles improve the security and privacy characteristics of the underlying protocols. As part of this initiative, consumer IdPs such as Google and PayPal have been certified as compliant with the OpenID profile, and government agency websites such as PubMed.gov can serve as RPs for users of these IdPs.

One fundamental issue of OpenID is trust: how does an RP know it can trust credentials issued by an arbitrary IdP specified by a web user? This is a business, legal, and social issue that cannot be solved by technology alone. In March 2010, the Open Identity Exchange (OIX) [Ope10] was formed to build trust in the exchange of online identity credentials across the public and private sectors. OIX follows an open market model to provide the certification services needed to deliver the levels of identity assurance and protection required by organizations.

2.2.4 OAuth 2.0

OAuth is a web resource authorization protocol whose original goal was to solve the "password anti-pattern" problem when sharing personal content across websites.
Many websites today are full-fledged platforms that store large amounts of user data and expose data and services through a web API. The web APIs offered by major web service providers (e.g., Facebook, Google, Salesforce) enable third-party developers to build user-centric applications by leveraging user content from those platforms. Nevertheless, in the absence of a secure API authorization protocol, third-party applications must request user credentials in order to access user information stored at the service provider, which is clearly undesirable. This insecure "password anti-pattern" practice is analogous to handing your ATM card and PIN to the waiter when it is time to pay: once they are given out, there is no guarantee that the user's credentials will not be misused or abused.

The OAuth protocol is an open standard for web resource authorization that enables web users to grant third-party applications limited access (e.g., in scope and duration) to their resources stored at a website. The authorization is made without sharing the user's long-term credentials, such as passwords, and allows the user to selectively revoke an application's access to their account. Building on implementation experience with proprietary industry API authorization protocols (e.g., Google AuthSub, Yahoo BBAuth, and the Flickr API), OAuth 1.0 [HLLM+07] was published in December 2007 and quickly became the industry standard for web-based access delegation. A minor revision (OAuth 1.0 Revision A [HLLM+09]) was published in June 2009 to patch a security hole [HL09]. In April 2010, OAuth 1.0 was published as RFC 5849 [HLRH10].

OAuth 2.0 [HLRH11] is the next evolution of the OAuth protocol and is not backward compatible with previous versions. Compared to its predecessors, OAuth 2.0 makes the protocol much simpler for RP developers to implement.
First, it removes the cryptographic requirements (i.e., digital signatures) from the specification and relies on SSL as the default way for the RP and IdP to communicate. This also improves performance, as the protocol becomes stateless and no longer needs to store temporary token credentials. Second, it defines several authorization flows for different security contexts, such as websites, desktop applications, mobile phones, and appliance devices. In particular, in the context of SSO, it supports JavaScript clients, so that the OAuth protocol can be executed entirely within a browser. The OAuth 2.0 specification is still a work in progress within the IETF OAuth working group, but implementations have already been developed and deployed by Facebook, Salesforce, Twitter, LinkedIn, Google, Yahoo, and many others.

OAuth is designed as an authorization protocol, but many implementations of OAuth are being deployed for web SSO, and thus for authentication. In this use case, user identity information hosted on an IdP is authorized by the user and shared as a web resource that RPs use to identify the current SSO user. Unlike OpenID 2.0, which provides an IdP-discovery capability, an OAuth RP website must first register with an IdP to obtain a unique application identifier and a shared secret key. The RP also needs to register a redirect URI or HTTP domain to which the IdP will return authorization responses.

Similar to the SAML Web SSO Profile and OpenID, the protocol messages of OAuth-based SSO systems are passed between an RP and IdP through a browser with two redirections. In the first redirection, the RP redirects the user's browser to the IdP's user-interaction endpoint, at which the IdP authenticates the user and asks the user for permission to grant the RP access to the protected resources. If permission is granted, the IdP initiates the second redirection, which directs the user's browser back to the RP with a random token.
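The two redirections can be sketched in Python as below. This is a minimal illustration under stated assumptions, not an IdP's actual API: parameter names follow RFC 6749, the request asks for the provisional-token variant (`response_type=code`), and the `state` value is RFC 6749's recommended CSRF defence, which the description above does not cover. The endpoint URLs, client identifier, registry, and function names are hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical IdP-side registry: client_id -> redirect URI registered by the RP.
REGISTERED_REDIRECTS = {"rp-client-123": "https://rp.example.com/callback"}


def query_params(url):
    """Return the query string of url as a flat dict (first value per key)."""
    return {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}


def rp_start_login(authz_endpoint, client_id, redirect_uri, scope, session):
    """First redirection: the RP sends the browser to the IdP's user-interaction endpoint."""
    state = secrets.token_urlsafe(16)  # unguessable value bound to this browser session
    session["oauth_state"] = state
    params = {
        "response_type": "code",  # request a provisional token (authorization code)
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return authz_endpoint + "?" + urlencode(params)


def idp_authorize(authz_url, user_consented):
    """IdP side: check the registered redirect URI, then issue the second redirection."""
    q = query_params(authz_url)
    registered = REGISTERED_REDIRECTS.get(q.get("client_id", ""))
    if registered is None or registered != q.get("redirect_uri"):
        raise ValueError("redirect_uri does not match the registered value")
    if user_consented:
        result = {"code": secrets.token_urlsafe(16), "state": q["state"]}
    else:
        result = {"error": "access_denied", "state": q["state"]}
    return registered + "?" + urlencode(result)


def rp_finish_login(callback_url, session):
    """RP side: accept the response only if it carries the state this session issued."""
    q = query_params(callback_url)
    expected = session.pop("oauth_state", None)  # single use
    if expected is None or not secrets.compare_digest(q.get("state", ""), expected):
        raise ValueError("state mismatch: response not initiated by this session")
    if "error" in q:
        raise PermissionError(q["error"])
    # The code is then exchanged, together with the RP's shared secret,
    # for an access token over a direct back channel (not shown).
    return q["code"]
```

The sketch keeps the IdP's redirect-URI check and the RP's state check separate because they defend against different parties: the first stops an attacker from redirecting tokens to an unregistered site, and the second stops an attacker from injecting their own response into the victim's session.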
The token may be (1) an access token that represents the granted permissions and the duration of the authorization, and that allows direct access to the requested resources, or (2) a provisional token that the RP, presenting its shared secret, exchanges for an access token through a direct back-end channel to the IdP. Once the authorized access token is obtained, the RP calls web APIs published by the IdP to access the user's profile attributes, as well as objects in the user's social graph (e.g., people, photos, events, and pages) and the connections between them (e.g., friend relationships, shared content, and photo tags).

2.3 Related Work

Web SSO brings benefits to all stakeholders in the SSO ecosystem. However attractive these benefits are, browser-based solutions impose new and increased security and privacy risks, as they share valuable personal information across domains through web browsers using loosely coupled network protocols. The scale and complexity of web SSO technologies, combined with the privacy and security requirements demanded of them, also create steep design challenges for usability. It is thus challenging to design an easy-to-use, secure, privacy-preserving, and simple-to-implement web SSO system that could motivate adoption by RPs and web users. The challenges involve dependencies, constraints, complex trade-offs, and sometimes even contradictory design requirements. In this section, we discuss related work towards improving the usability and security of web SSO solutions.

2.3.1 Web SSO Usability Studies

To understand the conceptual and usability issues associated with enabling Yahoo OpenID on RP websites, Yahoo OpenID research conducted a usability study [Fre08] with nine female Yahoo users (aged 32 to 39, with a self-declared medium-to-high level of Internet savvy). The study found a number of usability problems that web users faced when using OpenID for authentication.
Based on the results, the authors recommended best practices and design guidelines for implementing usable login interfaces on both RP and IdP websites. For RP login forms, they suggest that RPs clearly indicate that users can choose among different login options. They also recommend promoting the ability to log in using an existing account (e.g., a "Sign in with a Yahoo ID" button, an IdP logo list) rather than the technology itself. The design of most state-of-the-art RP login forms, including those of the RP websites in our study, follows Yahoo's recommendations.

Google OpenID research found that using an IdP icon list as a guide for login has some limitations [Sac08]. Consistent with our findings, the author found that unless the buttons are large, they are only noticed by a subset of end users. However, if the buttons are made large, users can become confused about how they should log in. In addition, if the buttons include IdPs that are not email providers, there is no good way to identify the same person logging in through SSO and through traditional login, which requires an account-linking step. As a result, Google suggests using "email as a key" to hide IdP icons from users completely. However, this approach is not widely adopted by RPs, because not all IdPs are email providers.

Plaxo.com, an online address book provider, conducted a "Two-Click Sign-Up" experiment with Google that enabled Google users (1000 participants) to sign up and import their Google contact lists into Plaxo [McC09]. The result was encouraging: 92% of participants completed the import task. However, the login form was optimized to contain only one "Sign up with my Google Account" button without any other login options, a design that is not applicable to most RP websites.

Shehab et al. [SMH11] propose an extension to the OAuth protocol that lets users grant fine-grained permissions to third-party applications.
They implemented the proposed OAuth extension as a browser extension and conducted a usability study that collected data on user decisions. The extension was installed by 1,286 Firefox users, who installed 1,561 unique Facebook applications. Their results show that users have varying willingness to share different types of private information.

The results of existing usability studies shed some light on users' perceptions of web SSO systems. However, those findings are still insufficient for the web SSO development community to understand the risks and concerns users perceive when using web SSO, what users' mental models are, and how they are formed. If these mental models are poorly aligned with the system model, they could influence users' perceptions of usability, security, and privacy, as well as their willingness to adopt this technology. Our web SSO usability research aims to fill this knowledge gap and to explore how web SSO interfaces can be improved so that users derive a more adequate mental model.

2.3.2 Browser-supported Solutions

Sxipper [Sxi09] is a form manager implemented as a Firefox add-on that helps users fill in web forms during registration or ordering processes. Unlike OpenID- or OAuth-based web SSO solutions that reuse users' profile information from their IdPs, Sxipper requires users to enter and maintain separate copies of their personas in the browser. In addition, Sxipper might not detect forms correctly, and it stores sensitive information such as credit card numbers as plain text on the user's local machine. This poses a security threat if the user's computer is compromised, and it raises portability issues when users switch between computers or want to use a shared or public computer.

VeriSign's Seatbelt [Ver09], a Firefox add-on, is designed to make OpenID more convenient to use by automatically filling in a user's OpenID URL when visiting relying parties.
Seatbelt is easy to use; however, it may not detect OpenID login form fields precisely, because it uses a simple text-matching technique (e.g., openid, oidurl, open-id, open id) to identify them. In addition, it requires Seatbelt-specific configurations from the participating OpenID IdPs.

Weave Identity [Moz09] is a Firefox add-on that leverages the Firefox built-in password manager for single-click and automatic login, and integrates Weave server accounts for automatic OpenID sign-on. Similar to VeriSign's Seatbelt, it might not detect and submit login forms correctly, and its automatic OpenID login support is limited to Weave accounts.

Mozilla's Persona [Moz12b] is a browser-supported web SSO scheme first released in July 2011 and fully deployed by Mozilla on its own websites in January 2012. With Persona, web users can maintain a list of their email addresses in the browser and choose one of them to sign on to RP websites. Each email address is certified by the corresponding email provider through a digital certificate. Using email as the user identifier enhances usability and minimizes personal information disclosure. In addition, Persona stores the user's email certificate and conveys it to RP websites upon user login, which prevents IdPs from tracking the websites a user has visited.
One main challenge Mozilla Persona faces is adoption by RPs. In particular, RPs need a rich set of user data from IdPs to be motivated to adopt Persona [SBHB10].

2.3.3 Security Analysis of OpenID

Several possible threats are documented in the OpenID specification itself, including (1) a phishing attack that redirects users to a malicious replica of an IdP website, (2) the masquerade of an IdP by an MITM attacker between the RP and IdP to impersonate users on the RP, (3) a replay attack that exploits the lack of assertion nonce checking by RPs, and (4) a denial-of-service (DoS) attack that attempts to exhaust the computational resources of RPs and IdPs.

In addition to the aforementioned phishing attack, Tsyrklevich et al. [TT07] demonstrate a series of possible attacks on the OpenID protocol: (1) a malicious user could trick an RP into performing port scans and exploiting otherwise inaccessible internal hosts; (2) an MITM attacker between the RP and IdP could perform two distinct DH key exchanges, one with each party, to sign authentication assertions on behalf of the IdP; and (3) an IdP could track all the websites a user has logged into via the return_to parameter.

Barth et al. [BJM08] introduce the session swapping attack, in which an attacker logs the victim into a site as the attacker by using the victim's browser to issue a forged cross-site login request embedded with the attacker's user name and password. The authors also illustrate how the session swapping attack works in OpenID and in PHP cookie-less authentication. In the case of OpenID session swapping, the attacker first signs into an RP using the attacker's identity, intercepts the authentication response, and then embeds the intercepted response in a web page that victims will visit. Sovis et al. [SKS10] examined the OpenID extension framework and found that, due to improper verification of OpenID assertions, the extension parameter values sent within the OpenID protocol could be manipulated if the channel is not SSL-protected.
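Several of the threats above (unsigned or manipulated extension parameters, and replay through missing nonce checks) come down to checks the RP performs on a positive assertion. The following is a minimal Python sketch of those checks, assuming an HMAC-SHA256 association was established in step 2 of Section 2.2.3 and that accepted nonces are kept in an in-memory set. It follows the OpenID 2.0 rule that the signature covers the Key-Value Form encoding of the fields named in `openid.signed`. The Attribute Exchange alias `ax`, the field values, and all function names are hypothetical.

```python
import base64
import hashlib
import hmac

# Fields an OpenID 2.0 positive assertion must sign; claimed_id and identity are
# required only when present, and this sketch assumes an authentication response
# that carries both.
CORE_SIGNED = {"op_endpoint", "return_to", "response_nonce", "assoc_handle",
               "claimed_id", "identity"}


def kv_form(fields, names):
    """Key-Value Form: one 'key:value\\n' line per signed field, in openid.signed order."""
    return "".join(f"{n}:{fields['openid.' + n]}\n" for n in names).encode("utf-8")


def sign_fields(fields, mac_key, names):
    """Hypothetical IdP-side helper: base64 HMAC-SHA256 over the listed fields."""
    digest = hmac.new(mac_key, kv_form(fields, names), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_assertion(fields, mac_key, seen_nonces, used_extension_fields=()):
    """RP-side checks against unsigned extension data, tampering, and replay."""
    signed = fields.get("openid.signed", "").split(",")
    # 1. Everything the RP relies on, including extension values such as AX
    #    attributes, must be listed in openid.signed and present in the message.
    if not (CORE_SIGNED | set(used_extension_fields)).issubset(signed):
        return False
    if any("openid." + n not in fields for n in signed):
        return False
    # 2. Recompute the signature with the association's MAC key.
    expected = sign_fields(fields, mac_key, signed).encode("ascii")
    if not hmac.compare_digest(expected, fields.get("openid.sig", "").encode("utf-8")):
        return False
    # 3. Reject a response_nonce that has already been accepted (replay).
    #    A real store would also expire entries using the nonce's timestamp prefix.
    nonce = fields["openid.response_nonce"]
    if nonce in seen_nonces:
        return False
    seen_nonces.add(nonce)
    return True
```

Check 1 addresses the extension weakness: a valid signature proves nothing about a parameter the IdP never signed. Check 3 addresses the replay threat listed in the specification.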
Rui et al. [WCW11] found some RP implementations do not check whether theinformation passed through Attribute Exchange extension was signed, which allows an attackerto modify the profile attributes returned from an IdP. Jain et al. summarize existing OpenIDsecurity issues on their OpenID review website,1 and Delft et al. [DO10] present the OpenIDsecurity issues found by others.A formal OpenID model in AVISPA was presented by Lindholm [Lin09], but the formal-ization only models the non-association mode of the OpenID protocol (i.e., no DH shared keybetween the RP and IdP), and it assumes that an MITM attacker controls the communicationbetween the RP and IdP. In a non-association mode, the RP has to send the assertion backto the IdP for validation via a direct communication (i.e., not via browser) and the validationresult is not signed. It is clear?and documented in Section 15.1.2 of the OpenID specificationas well?that an MITM attacker between the RP and IdP could impersonate the victim by1https://sites.google.com/site/openidreview/issues232.3. Related Workreplying to the RP with an unsigned positive assertion. Fundamentally, this adversary modelcontradicts the basic assumption of the OpenID protocol, which requires the communicationbetween RP and IdP to be secured.The existing findings about the security of the OpenID protocol are valuable, but thereis a lack of deeper understanding of the systemic causes of those vulnerabilities found in theOpenID protocol, how prevalent they are, and how to effectively address them. Our work onsecurity analysis of OpenID fills this gap.2.3.4 Security Analysis of OAuthThe ?OAuth Threat Model? [LMH11] is the official OAuth 2.0 security guide that provides acomprehensive threat model and countermeasures for implementation developers to follow. Sev-eral formal approaches have been used to examine the OAuth 2.0 protocol. Pai et al. 
[PSK+11] formalize the protocol using the Alloy framework [Jac10], and their result confirms a known security issue discussed in Section 4.1.1 of the "OAuth Threat Model". Chari et al. [CJR11] analyze the OAuth 2.0 server-flow in the Universal Composability Security Framework [Can11], and the result shows that the protocol is secure if all endpoints of the IdP and RP are SSL-protected. Slack et al. [SF11] use Murphi [DDHY92] to verify the OAuth 2.0 client-flow, and confirm a threat documented in the "OAuth Threat Model" (i.e., a CSRF attack against the redirect URI). However valuable these findings are, because the formal proofs are executed on abstract models of the OAuth protocol, subtle implementation details and browser behaviors might be inadvertently left out. Furthermore, it is unclear whether real implementations actually follow the recommended security guidelines.

Many researchers have examined the security of Facebook Connect, which has since been deprecated and replaced by OAuth 2.0 as the default Facebook Platform authentication and authorization protocol. Unlike OAuth 2.0, Facebook Connect is a proprietary protocol. Miculan et al. [MU11] reverse engineered the Facebook Connect protocol from network traces, formalized the protocol in HLPSL, and verified it using the AVISPA model checking engine [Vig06]. The AVISPA attack trace revealed that an intruder could capture the session credential during a legitimate request and replay it to impersonate the victim user. Hanna et al. [HSA+10] investigate two client-side cross-domain communication protocols, Facebook Connect and Google Friend Connect, that are layered on the HTML5 postMessage API. Their analysis found that the protocol implementations use the postMessage primitive unsafely in several places within the JavaScript libraries, opening the protocols to severe confidentiality and integrity attacks. Wang et al. [WCW12] label and manipulate HTTP messages going through the browser to identify potential impersonation exploit opportunities.
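The CSRF attack against the redirect URI confirmed by Slack et al. works much like the session swapping attack of Barth et al.: the attacker gets the victim's browser to deliver an authorization response obtained for the attacker's own account. A commonly recommended countermeasure is the OAuth 2.0 `state` parameter, bound to the user's session at the RP. The Python sketch below is illustrative only; it assumes the RP has a server-side session identifier and keeps pending values in memory, and the class and method names are hypothetical.

```python
import hmac
import secrets
from typing import Dict, Optional


class OAuthStateGuard:
    """Hypothetical RP-side helper binding each authorization request to one browser session."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # session id -> state value awaiting the redirect

    def begin(self, session_id: str) -> str:
        """Create an unguessable state value to send in the authorization request."""
        state = secrets.token_urlsafe(32)
        self._pending[session_id] = state
        return state

    def finish(self, session_id: str, returned_state: Optional[str]) -> bool:
        """Accept the redirect only if it carries the state issued to this same session."""
        expected = self._pending.pop(session_id, None)  # each value is single use
        if expected is None or returned_state is None:
            return False
        return hmac.compare_digest(expected.encode(), returned_state.encode())
```

In this sketch, an attacker who forges a redirect carrying their own authorization code cannot supply the state value issued to the victim's session, so `finish` rejects the response.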
Wang et al. discovered eight logic flaws in high-profile IdPs and RPs, such as Google OpenID, PayPal Access, Facebook Connect, JanRain, Freelancer, FarmVille, and Sears.com. For Facebook Connect, they found that by luring a victim user to visit a malicious website, the attacker could trick Facebook into delivering the victim's access token to the attack website, simply by setting Flash as the cross-domain communication transport and naming the malicious Flash object with an underscore prefix.

The vulnerability discovery methodology employed by Wang et al. [WCW12] and our work on OAuth security analysis are similar (i.e., both examine browser-relayed messages), but differ in two important aspects. First, we assume a practical adversary model based on existing literature, in which an attacker can eavesdrop on unencrypted traffic between the browser and the RP server, and application and browser vulnerabilities could be leveraged by a web attacker. Second, we focused on OAuth 2.0 rather than generic SSO. This focus allowed us to (1) identify the gaps between the protocol specification and implementations; (2) design semi-automatic assessment tools to examine the prevalence of each uncovered weakness, whereas the work in [WCW12] requires in-depth knowledge from domain experts to evaluate an exploit; and (3) investigate fundamental causes (rather than implementation logic flaws), and propose simple and practical improvements that are applicable to all current OAuth IdPs and RPs (instead of specific websites) and can be adopted gradually by individual sites.

Chapter 3

Conceptual Gaps, Alternative Design, and Acceptance Model

This chapter presents three user studies that investigate users' perceptions and concerns when using web SSO for authentication, and explore possible improvements. We first conducted an exploratory study to better understand users' experiences with web SSO.
After identifying misconceptions and concerns common to most participants, we designed an identity-enabled web browser (IDeB) intended to explore changes in the login flow that could improve web users' SSO experience and their acceptance incentives. The prototype was refined through several iterations of cognitive walkthroughs and pilot studies. We then conducted a formative within-subjects evaluation of the IDeB prototype to confirm the findings of the exploratory study and to further improve the prototype and study design. Through mental model drawings and semi-structured interviews, we identified several conceptual gaps that influenced our participants' perceptions and acceptance intentions. Finally, we conducted a within-subjects study to compare the usability of our IDeB design with the existing interfaces (denoted as CUI hereafter), to see whether the issues identified through the exploratory and formative studies had been addressed in IDeB. These studies were approved by UBC's Behavioral Research Ethics Board (BREB), and the study documents are listed in Appendix A.

Our study revealed that current web SSO user interfaces could be misleading, and that their lack of visibility of system states led users to derive inadequate mental models, which negatively influenced their risk perceptions and adoption intentions. Most participants in our study incorrectly believed that SSO works by giving their IdP login credentials to RPs. This misconception was initially "confirmed" for participants when they saw that the IdP login page was skipped when they were asked to log out and sign back into the first RP in the study. Later, this incorrect belief was "confused" during the subsequent task scenarios, when they could log into another RP and view their IdP profile (e.g., Gmail, Yahoo Mail, Facebook) without an explicit IdP login.
In addition, most participants were uncertain about what types of data were being shared, what actions the RP could perform on their IdP account, and how they could revoke their profile sharing. Many participants did not know that RPs could post messages back to the IdP on their behalf, while some participants expressed reluctance to use web SSO solutions due to a prior surprising and embarrassing experience with RPs having posted their activities to their Facebook update streams. Furthermore, most of our participants did not know that IdPs can track when and which RP websites the user has visited, as well as the services and products in which the user is interested if the IdP is an OAuth-based IdP.

Besides security misconceptions and privacy concerns, our study also identified the following factors hindering the participants' intention to adopt SSO (however, these are difficult to resolve by improving the SSO protocol alone).

• No perceived urgent need for the web SSO that the websites offered: Most participants were "comfortable" with weak or reused passwords, while many used the password manager feature in the browser.
• Single point-of-failure concerns: Over a quarter of the participants identified this inherent property of web SSO, and expressed concerns about it.
• Phishing concerns: Once informed of the possibility of IdP phishing attacks, all participants expressed serious concerns about this common issue of redirection-based web SSO protocols.
• Trust concerns with RPs: Many participants stated that they would not use SSO on RP websites that contain valuable personal information, involve potential monetary loss, or are not trustworthy or familiar.
• Account linking misconceptions: Linking a traditional account to an IdP account allows an existing account on the RP website to sign in using SSO, and ensures that users are still able to log in when their IdP accounts are inaccessible.
However, most participants did not understand the purpose and concept of account linking, and became confused and frustrated when they were prompted for such linking.

Our main contribution lies in a user-centric investigation of web SSO systems, offering informed design recommendations to web SSO development communities. Our study focused on OpenID and OAuth-based web SSO systems because together they account for a critical mass of the web SSO population [Jan12b]. We did not include high-value RPs in the study (e.g., banking, government), as they typically require a high degree of identity assurance to be provided by the SSO protocol and by trust frameworks such as SAML-based identity federation solutions (e.g., Yodlee [Yod12], Shibboleth [Int08], Liberty Alliance [Kan02]). In addition, we chose RPs that reuse software libraries from leading SSO integration providers (i.e., Gigya [Gig11], Janrain [Jan12a]) rather than designing the SSO UI themselves, because those SSO UIs are professionally designed and have been widely used by many popular websites, including many of those listed on the Google Top 1000 websites [Goo11]. Our findings and insights were uncovered and derived mainly from the qualitative data collected through observations of the task scenarios, mental model drawings, and semi-structured interviews during our iterative user-centric process. We found that most of our participants exhibited similar misconceptions and concerns, and data saturation was achieved quickly. We strove to recruit a representative sample of participants to reduce potential sample bias. In addition to balancing age, education, and student/non-student attributes, we recruited participants with a variety of occupations, such as dance teacher, financial planner, dentist, accountant, and full-time housewife. Nevertheless, we acknowledge that a population bias most likely exists, as our participants were highly educated.
It is likely that the proportion of actual web users who have inadequate mental models of web SSO would be higher than the proportion of our participants who do so.

Security mechanisms are only effective when adopted and used correctly by users [WT99]. The user-centric design of security mechanisms is thus imperative for the development of security solutions that are intended for average users [AS99, MAS03, Lam09]. Our finding that dangerous mistakes and adoption concerns occurred due to inadequate mental models of web SSO is indeed yet another observed instance of "Why Johnny Can't Encrypt" [WT99]. In summary, the work presented in this chapter makes the following contributions:

• We identify the mental models users have of web SSO and how these mental models are formed.
• We identify conceptual gaps between the user's mental model and the system model, and analyze how these gaps affect user experience and perceptions with regard to SSO.
• We introduce a web SSO technology acceptance model that explains how each factor we found influences users' acceptance of a web SSO solution.
• We suggest design improvements for RP and IdP websites, and web SSO development communities. We do not claim that our IDeB design is ready for real-world adoption; it served solely as a discovery tool for the recommended design improvements. The recommended design improvements for RPs and IdPs can be implemented without additional support from the browser.

The rest of the chapter is organized as follows: The next section provides an overview of our methodology. Section 3.2 describes the design and findings of the exploratory study, and Section 3.3 presents the design of the identity-enabled browser. The formative and comparative studies and their results are presented in Sections 3.4 and 3.5. Sections 3.6, 3.7, and 3.8 discuss the identified conceptual gaps, the web SSO technology acceptance model, and our recommendations, respectively.
Finally, we discuss the limitations of our research in Section 3.9, and summarize this chapter in Section 3.10.

3.1 Research Approach

To gain an overall understanding of users' perceptions and concerns when using web SSO for login, we first conducted an in-lab exploratory study with nine participants. Participants were asked to sign up and log into three real-world RP websites using their existing account from Google, Yahoo, Microsoft, or Facebook. To obtain objective data, we directly observed participants during task scenarios, recorded qualitative data on the nature of the interaction, and kept notes on particular items of interest to be investigated further in the post-session, semi-structured interview. We found several similar behaviors, misconceptions, and concerns exhibited by most participants, and the results saturated quickly. Based on our findings, we compiled a list of requirements and brainstormed potential solutions to address the issues and concerns.

Our next step was to design and implement an alternative interface intended to provide web users with a consistent, intuitive, phishing-resistant, and privacy-preserving single sign-on user experience. The design process was both incremental and evolutionary, as the prototype was refined and redesigned throughout, and user feedback was iteratively integrated into the design. We implemented a horizontal prototype, and used a "Wizard of Oz" approach for vertical communication functions to make users perceive that the websites in the study had adopted our new design (further discussed in Section 3.3). This functional partition allowed us to compare our design with the existing interfaces using identical RP and IdP websites in order to achieve internal and ecological validity.

Once a working version of the prototype was complete, we conducted a formative, within-subjects study with seven participants to compare the initial IDeB prototype with the existing interfaces.
This study design was chosen rather than a between-subjects design because we expected individual differences to be substantial. Additionally, the comparative comments of research subjects who experienced both conditions were essential for our evaluation. There was particular emphasis on examining the mental models formed for each system, and how they differed. A semi-structured interview was used to obtain additional feedback from the users. Moreover, it was desirable that the new system would compare favorably to traditional login methods, so we also chose to investigate users' preferences among the traditional login, the current user interface (CUI), and our design. We found that participants' misconceptions and concerns in the formative study were consistent with the findings from the exploratory study, and that our design significantly improved their perceived ease of use, security protection, and privacy control. We also identified parts of the prototype and study design that required further improvement.

In the final phase, we modified the prototype and study design to address the noted deficiencies. In particular, (1) we revised IDeB to reuse the existing IdP login forms instead of a customized one, to make IDeB look more trustworthy, and (2) IDeB shrinks the browser before presenting the IdP login form, to reduce the possibility of IdP phishing attacks and to convey a more accurate mental model. We then conducted a comparative within-subjects study with 35 participants to compare the usability of IDeB with CUI, and to determine whether there were any outstanding issues hindering the adoption of the new prototype. Overall, 51 participants took part in these studies, and each participant was included in only one study. Using criteria for theoretical sampling [GS67], we stopped recruiting new participants when we observed no new findings arising in the study, and all comparison results were statistically significant.

Figure 3.1: Login forms of three chosen RP websites in the study.

3.2 Exploratory Study

In the initial stage, our goal was to investigate web users' perceptions, challenges, concerns, and perceived benefits when using their existing IdP account to sign in to real-world RP websites. Our research focuses on OpenID and OAuth-based web SSO systems, because together they provide a critical mass of the web SSO population [Jan12b]. Nevertheless, compared to SAML-based identity federation solutions (e.g., Yodlee [Yod12], Shibboleth [Int08]), OpenID and OAuth provide only level one of identity assurance [BDP06]. Hence, we did not include high-value RP websites in the study, as OpenID and OAuth are not intended to be employed on RPs that require a high degree of identity assurance, such as banking or government websites.

To find a representative sample of RP websites, we went through an RP site directory at MyOpenID.com, and categorized RPs into several groups based on their login form styles. RPs that use a simple OpenID textbox were excluded, as this approach has already been found to be unusable for most web users [Fre08, Sac08]. In addition, RPs designed for a specific community of users, and those that had the potential to make participants feel uncomfortable or embarrassed (e.g., dating or gaming websites), were excluded as well. From the three most popular style groups, we chose one RP website from each group based on the properties listed in Table 3.1. In the order presented in the study, we chose (1) Fox News, a premier news website (www.foxnews.com), (2) ITrackMine, an online collection manager (www.itrackmine.com), and (3) Skitch, an online photo sharing website (www.skitch.com). The login forms of these three RP websites are shown in Figure 3.1.
Note that rather than designing the SSO login flow from scratch, all three RPs reuse software libraries from leading professional SSO integration providers for their SSO login UIs and implementations (Gigya [Gig11], Janrain [Jan12a], and IDSelector [Jan10], respectively).

Property               Fox News   ITrackMine   Skitch
Popup window           Yes        Yes          No
Size of IdP icons      Medium     Large        Small
# of IdPs supported    6          12           12
Additional sign-up     No         Yes          Yes
Account linking        No         Yes          No
Well-known             Yes        No           No

Table 3.1: Properties of the selected RPs in the study.

3.2.1 Study Protocol

We recruited nine participants (six males and three females) from the University of British Columbia (UBC) and the Greater Vancouver area, and conducted a one-hour lab study with each participant. Four participants were 19-24 years old, and five were 25-34 years old. Most participants were fluent in English (eight) and had college or graduate degrees (eight) with a diverse range of majors. All had more than four web accounts, and two participants used a password manager. Five participants had prior SSO experience using the UBC campus-wide login.

After completing a background questionnaire, participants were asked to sign up for, and sign in to, three RP websites using one of their existing accounts from another service provider (i.e., Google, Yahoo, Facebook, or Microsoft). Then the participants were asked to log out of all websites, as if the tasks had been performed on a public computer from which they were about to walk away. We then asked participants to access the IdP account they had used in the study (e.g., Gmail, Facebook). Finally, participants were directed to an OpenID phishing demo website (http://idtheft.fun.de) and told to select Google or Yahoo as the IdP for login.
Before they entered their username and password, we stopped participants and asked them whether they could identify any clues indicating that this was not the real Google or Yahoo sign-in page (see Figure 3.2).

Figure 3.2: Screen shots of the IdP phishing demo website.

Afterwards, the participants completed a questionnaire detailing their experiences with various aspects of these tasks. We then conducted a contextual interview with the participants in order to understand the problems that they encountered, as well as their potential concerns, perceived benefits, and the features they desired in a web SSO system.

3.2.2 Findings

We found that the current web SSO login UI was inconsistent and counter-intuitive, and that participants formed incorrect mental models of the SSO workflow. Of the three websites, we expected that the majority of our participants would be able to sign onto Fox News without any errors or concerns, as this website is well-known, is listed on the Google Top 1,000 websites [Goo11], uses a popup window, and does not require an additional sign-up or account linking process. It only requires three clicks and the username and password to be entered into the IdP login form to sign into the site. Surprisingly, most of the misconceptions and concerns that we found were uncovered when participants were trying to log in to this site. The main problems and concerns identified in our study are listed below. Note that the findings were confirmed again in the formative and comparative studies:

• Misleading affordance: On the Fox News login form, most participants (eight) entered their IdP username and password into the traditional login fields directly. They stated that they believed the website must be integrated with the identity providers (IdPs) in some way, so that they would be able to use their Google or Yahoo email and password directly on the login form to sign in.
They did not know that they needed to click on one of the IdP icons to initiate the login process; three participants thought the IdP icons were advertisements, and two thought the website had teamed up with the IdPs for content sharing.
• Incorrect mental model derived from the login process: Many participants (five) thought that after the login and consent processes, the website knew their Google or Yahoo username and password. We found that this misconception was formed because (a) the user-to-IdP authentication/authorization popup window was initiated from and surrounded by the Fox News website, and (b) when participants were logging back in to Fox News, the popup window simply blinked open and then closed, because the participants had already authenticated to their IdP account in the same browser session.
• Privacy concerns: Most participants (eight) were concerned about spam or misuse of their information when consenting to profile sharing from their IdP account.
• Implicit IdP login concern: Logging into an RP website with an IdP account actually signs the user into both the IdP and the RP. All participants (nine) were surprised that they could access their IdP account without an explicit login after they had logged out of all RP websites. They were very concerned that they had to explicitly log out from the IdP in addition to the RP websites. Participants sometimes used a public computer or shared a computer with their family members, and wanted to prevent others who share the computer from accessing services provided by the IdP.
• Account linking is confusing: The ITrackMine website in the study requires users to sign up for a new account or link to an existing account after they have authenticated to their IdP account, but none of our participants understood the purpose of account linking. Most participants (seven) believed that as soon as they were redirected back from the IdP, they had already logged in to the RP (not true in the case of the ITrackMine website).
• Phishing concerns: Most participants (seven) correctly identified the fake Google or Yahoo website as a fake, based on the URL that appeared in the address bar. However, they expressed concern that in future logins, they might not pay attention to the URL bar and other security indicators. In the subsequent formative and comparative studies, we provided participants with a printout of a fake Google login form that obfuscates the URL (which most phishing websites could do), and found that most participants could not tell whether or not it was a real Google login form.

In the last task of the study, we provided our participants with full step-by-step instructions for removing the RP websites' access to their IdP accounts. On the page that manages RP access, there is a list of granted RPs, each with a list of shared profile attributes and access histories. Most participants stated that they did not know how to remove RPs' access to their IdPs without help. In addition, although the IdP is clearly able to track which websites they have visited and when, none of our participants expressed privacy concerns about this IdP tracking capability.

3.2.3 List of Requirements

Based on these findings and a review of the existing literature, we compiled a list of requirements to inform the future design. To be usable, (R1) the RP login form must provide a clear login affordance that indicates to users that they can sign in using their existing IdP account. (R2) The solution must leverage the login experience that an average web user already has, and transform a negative transfer effect (i.e., being habituated to entering a username and password directly) into a positive one. (R3) It must avoid relying on users' cognitive capabilities to detect phishing sites [WMG06, DTH06, ZECH07, SDOF07, ECH08, SEA+09, Hon12].
(R4) It must provide web users with fine-grained privacy control, as opposed to the all-or-nothing sharing option offered by current IdPs, as well as a central location to manage their overall privacy settings. (R5) The login state of the IdP must be visible to the user. (R6) Future designs should assist users in choosing among different identities for websites that vary in their level of trustworthiness. (R7) Future designs should provide a single logout mechanism that automatically ends all authentication sessions when the user logs out of their IdP account.

Figure 3.3: Main screens of the identity-enabled browser (IDeB): (a) block-out desktop and IdP login form, (b) IdP login form that supports accounts from Google, Yahoo, Microsoft and Facebook, (c) profile sharing consent form, (d) block-out desktop and IdP account selector, (e) IdP account selector, (f) IdP identity indicator, (g) profile sharing setting form.

In addition, asking users to provide large amounts of sign-up information during their first sign-on annoys them. If RP websites provided gradual engagement features that acquire additional user attributes only when there is a reason for the user to provide them, it would increase the website's conversion rate (i.e., the rate of converting anonymous visitors into users).

3.3 The Identity-Enabled Browser

We developed an alternative web SSO interface design by building identity support directly into the browser, thereby unifying and simplifying the interface across websites. In this section, we present the design details of this identity-enabled browser (IDeB). Note that our IDeB design is a mock-up system, and we mainly used IDeB to explore possible improvements.

3.3.1 IDeB Behind the Scenes

In order to build SSO support directly into the browser, we could have adopted our proposed OpenID protocol extensions [SHB10] to perform authentication with IdPs directly in the browser, and convey the authenticated identity to RPs. However, as the websites in our study had not yet adopted the protocol extensions, doing so would have forced us to use different IdPs and RPs for subsequent studies. Because our main evaluation goal was a direct comparison with current web SSO solutions, performing study tasks on different websites could have substantially impacted the participants' impressions and preferences. Thus, we decided to employ a "Wizard of Oz" approach to make it appear to participants that the websites used in the studies had adopted our new approach.

Our IDeB design consists of two parts: a Firefox extension that integrates the websites in the study, and a Windows program developed in .NET Framework 3.5 that implements our alternative designs of the SSO user interface. The Windows program runs as a background process on the study computer, and prompts the participant with the corresponding UI forms based on requests from the Firefox extension.

To integrate with the RPs in the study (i.e., Fox News, ITrackMine), the Firefox extension modifies the event handlers of the login and logout links on the RP websites via dynamic HTML Document Object Model (DOM) modifications. When the participant clicks on the login link, the Firefox extension calls the Windows program to shrink the browser, block out the desktop, and then present the IdP login form (Figure 3.3a) or the IdP account selector (Figure 3.3d). When the logout link on the RP website is clicked, the "look and feel" and the contextual menu options of the IdP indicator (Figure 3.3f) are altered accordingly.

To integrate with the IdPs in the study (i.e., Google, Yahoo, Microsoft, and Facebook), the IdP login form (Figure 3.3b) of IDeB passes the user's username and password to a proxy program that we developed. The proxy program also runs in the background on the study computer.
When the proxy receives the user's IdP username and password, it signs onto the IdP using the provided login credentials, scrapes the user's profile information from the IdP's user profile page, collects cookies issued by the IdP, and then passes the collected profile information and cookies back to IDeB. The collected information is then used by IDeB for subsequent tasks, such as profile sharing consent (Figure 3.3c), shared profile editing and revoking (Figure 3.3g), integrating RPs by replacing the user information displayed on the RP website, and providing participants with access to their email inbox on the IdP using the collected cookies.

3.3.2 IDeB from the User's Perspective

Figure 3.3 shows the main screens of IDeB. When a user first signs onto an RP website in a browser session, IDeB prompts the user to log in using one of their IdP accounts from Google, Yahoo, Microsoft, or Facebook (Figures 3.3a and b). Before it presents any prompts to the user, IDeB freezes and dims the whole desktop (the block-out desktop), and shrinks the browser window (similar to the Windows User Account Control (UAC) prompt). This design could redirect the user's attention to the IdP login form, and convey a more accurate mental model, i.e., that they are giving their credentials only to the IdP (requirement R1). We reused the existing IdP login forms to make IDeB look more trustworthy through positive transfer effects (R2).

The block-out desktop and the shrunken browser can make it difficult for malicious websites to phish users' IdP login credentials with spoofed prompts, because the JavaScript of malicious websites cannot alter UI elements outside the page's content area, such as the browser chrome (e.g., tabs, address bar, toolbars, status bar), or take a screenshot of the current page (R3). Unless the user's browser or computer is already compromised, shrinking the browser with a "zoom in"
animation effect before the login prompts are presented prevents malicious websites from showing a dialog similar to the one presented by IDeB. Note that although only the browser itself can alter the UI elements outside the content area and take a screenshot of the current page, this phishing-resistant feature may still require the user's attention.

If the user uses an IdP account that has never been used to sign in to this particular RP before, a dialog that solicits the user's profile information is presented (Figure 3.3c). The profile sharing form is pre-filled with the user's profile from the IdP, and the user can edit the profile attributes requested by the RP, i.e., a fine-grained privacy control (R4). Once logged in, the user's current login information is shown on an IdP identity indicator located in the left-hand corner of the browser's status bar (Figure 3.3f) (R5). The user can manage her IdP profile and information sharing from the context menu of the IdP indicator. Using the profile sharing setting form (Figure 3.3g), the user can view the last login time (or whether currently logged in) for each RP website, edit the shared profile attributes, and revoke the RP's access to the IdP account (R4).

For subsequent RP login attempts in the same browser session, IDeB prompts the user to select an authenticated IdP account to sign onto the RP (Figures 3.3d and e) (R6). In the IdP account selector (Figure 3.3e), if the IdP account has been used to sign in to the RP website, the last login time to the RP website is shown on the button (i.e., santsaisun@gmail.com in Figure 3.3e). This can serve as a cue for the user to remember which IdP account was used for this RP website. If the user selects an IdP account that has never been used to sign in to the RP (i.e., santsaisun@yahoo.com in Figure 3.3e), a profile sharing consent form similar to the one in Figure 3.3c is presented.
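IDeB was a mock-up, and the thesis does not specify how it stores the state behind the account selector or behind single sign-out (R7). The following Python sketch is only a hypothetical model of that bookkeeping, under the assumption that IDeB keeps, for each IdP account, the last login time per RP (the returning-user cue on the account selector) and the set of RPs with an active session (what single sign-out must end). All class, method, and field names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class IdPAccount:
    address: str                                                   # e.g. "santsaisun@gmail.com"
    signed_in: bool = False
    last_login: Dict[str, datetime] = field(default_factory=dict)  # RP -> last login time
    active_rps: Set[str] = field(default_factory=set)              # RPs with a live session


class IdentityIndicator:
    """Hypothetical bookkeeping behind the account selector and single sign-out."""

    def __init__(self) -> None:
        self.accounts: Dict[str, IdPAccount] = {}

    def sign_on(self, address: str, rp: str, when: datetime) -> bool:
        """Record an RP login; return True if the consent form (Figure 3.3c) should be shown."""
        acct = self.accounts.setdefault(address, IdPAccount(address))
        first_time = rp not in acct.last_login
        acct.signed_in = True
        acct.last_login[rp] = when
        acct.active_rps.add(rp)
        return first_time

    def selector_labels(self, rp: str) -> List[str]:
        """Button labels for the account selector; returning accounts show their last login time."""
        labels = []
        for acct in self.accounts.values():
            if not acct.signed_in:
                continue
            last = acct.last_login.get(rp)
            if last is not None:
                labels.append(f"{acct.address} (last login {last:%Y-%m-%d %H:%M})")
            else:
                labels.append(acct.address)
        return labels

    def single_sign_out(self, address: str) -> List[str]:
        """End every RP session opened with this IdP account, then the IdP session itself."""
        acct = self.accounts[address]
        ended = sorted(acct.active_rps)
        acct.active_rps.clear()
        acct.signed_in = False
        return ended
```

Keeping the per-RP history separate from the set of active sessions lets the returning-user cue survive a single sign-out, while the sign-out itself only has to walk one account's active sessions.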
From the IdP account selector (Figure 3.3e), the user can also click the "Login using a different account" button to sign in to the RP website being visited with a different IdP account via the IdP login form (Figure 3.3b).

When users sign on to RP websites with different IdP accounts, they traditionally have to remember which identities were used to access which RPs, and what profile information is being shared with which websites. To support this, in addition to the visual cue for returning users on the IdP account selector (Figure 3.3e; R6), the IdP indicators change their appearance based on the "signed-up" and "signed-on" status with the website on the current browser tab (Figure 3.3f; R5). With one click on the IdP indicator, users can also log out from all websites that used the selected IdP account for login (i.e., single sign-out), or view and modify their profile-sharing information (R7).

3.4 Formative Study
To confirm the findings from the exploratory study (as only nine participants were interviewed), and to test the prototype, we conducted a formative within-subjects study. The study was designed so that each participant spent only a limited amount of time (about 10 minutes) with each condition, to reduce fatigue effects. We counterbalanced the order in which the interfaces were presented.

3.4.1 Study Protocol
Seven participants with demographics similar to those in the exploratory study were recruited. Each participant was asked to perform a set of tasks similar to those in the exploratory study, using both the current user interface (denoted "CUI") and the IDeB. After completing a background questionnaire, participants were instructed to sign on to two websites (only Fox News and ITrackMine, to reduce fatigue effects), and then to log out of all websites as if the tasks had been performed on a public computer. We then asked the participants to check their email using the IdP account that was used to log in to the RP websites.
At the end of each condition, we provided full step-by-step instructions for the participants to remove Fox News's and ITrackMine's access to their IdP accounts.

After each condition (CUI or IDeB), the participants were asked to draw how they thought information flowed from one location to another during the sign-on process to the Fox News website (their mental model). They were also asked to rate the ease of use, the security, and the level of privacy control of the interface from 1 to 5 (1 = very poor, 5 = excellent).

When both conditions were completed, participants were asked in a post-session questionnaire to compare the usability, security, and privacy of both systems, and to state which login system they would prefer in the future (traditional login was included as an option). After the post-session questionnaire, participants were shown a printout of a fake Google login form and asked whether they could find a way to tell whether or not it was the real Google website. The phishing identification task was placed at the very end of the session to prevent it from influencing participants' responses in the post-session questionnaire. Finally, the researcher conducted a contextual interview with the participants to understand their impressions of both systems, after which participants were debriefed.

3.4.2 Results and Findings
Most participants completed the study tasks successfully with the IDeB design, while exhibiting misconceptions and concerns similar to those seen in the exploratory study when using the CUI. As seen consistently in the post-condition and post-session questionnaires, as well as in the interviews, most participants preferred our IDeB design. Note that we do not claim that the IDeB is ready for real-world deployment; instead, we used it mainly as a study tool to explore possible design improvements. We further discuss the conceptual gaps arising from the current interface (CUI), and how the IDeB reduced them, in Section 3.6.

Figure 3.4: The overall Likert-scale ratings from post-condition questionnaires in the formative study.

Figure 3.4 shows the post-condition questionnaire results for the sub-tasks, where the x-axis represents the tasks and the y-axis the mean rating of the seven participants, with standard deviation bars. The results suggest that our design is easier to use, is perceived to be more secure, and affords more privacy control. In the post-session questionnaire, 29% of the participants (two of seven) stated that they would prefer the traditional login option over a single sign-on system; the remaining 71% would prefer our IDeB design, and none chose the CUI.

There were, however, issues revealed with our interface, in particular with the block-out desktop and the IdP login form, as illustrated in Figure 3.5. First, two participants thought that the IdP login form was popped up by the RP website, and that they were giving their username and password to the website directly. Second, three participants commented that the look and feel of our interface (Figure 3.5) was unfamiliar, and that this affected their trust in the system. Participant feedback suggested that, in addition to usability, the trust a user has in the interface plays a substantial role in the success of the approach. Finally, we found that this version of the IDeB still relied on users' cognitive capabilities to detect IdP phishing attacks, as a malicious RP could dim its own website and then display a spoofed IdP login form.

Figure 3.5: The block-out desktop and IdP login form designed and used in the formative study.

To address these issues, we redesigned the IDeB as follows: (1) The IDeB shrank the browser window before presenting the IdP login form (Figure 3.3a) to prevent malicious RP websites from showing a dialog similar to the one presented by the IDeB.
(2) We reused the existing login forms from IdP websites (Figure 3.3b) instead of presenting a customized one (Figure 3.5), to make the IDeB feel more trustworthy through positive transfer effects. (3) The IDeB made the block-out desktop completely opaque instead of transparent, in order to convey a more accurate mental model (i.e., users are giving their credentials only to the IdP).

3.5 Comparative Study
After revising the prototype to address the deficiencies noted in the formative study, we used the revised interface to conduct a full within-subjects comparative usability study of the CUI and the IDeB.

3.5.1 Participants
We recruited 35 participants from both the University of British Columbia and the general community. All participants were paid $10 CAD for their participation. To ensure diversity, we screened interested participants by email, asking their age, gender, degree and major, occupation, and whether or not they were students. We counterbalanced the order of presentation by dividing participants into two groups: the 18 participants in Group 1 (G1) used the CUI before the IDeB, while the 17 in Group 2 (G2) used the IDeB before the CUI.

Figure 3.6: Sample mental model drawings and results. (a) A representative sample of correct (left) and incorrect (right) mental model drawings. (b) Percentages of participants who developed correct mental models.

Property          Group 1 (N = 18)      Group 2 (N = 17)      Total (N = 35)
Gender (F / M)    10 / 8 (56% / 44%)    6 / 11 (35% / 65%)    16 / 19 (46% / 54%)
Student (Y / N)   8 / 10 (44% / 56%)    10 / 7 (59% / 41%)    18 / 17 (51% / 49%)
Age 19–24         8 (44%)               6 (36%)               14 (39%)
Age 25–34         5 (28%)               5 (29%)               10 (29%)
Age 35–44         5 (28%)               5 (29%)               10 (29%)
Age 45 or over    0 (0%)                1 (6%)                1 (3%)
Table 3.2: Participants' demographics in the comparative study.

Type          Task         z        p (Asymp. Sig.)   r = |z|/√(N × 2)   Median CUI   Median IDeB
Ease of use   Fox News     −3.331   .001              0.40               4            5
Ease of use   ITrackMine   −4.559   .000              0.55               2            5
Ease of use   Revoke       −4.774   .000              0.57               2            5
Security      Fox News     −2.356   .018              0.28               3            4
Security      ITrackMine   −2.725   .006              0.33               2            3
Privacy       Fox News     −3.654   .000              0.44               2            4
Privacy       ITrackMine   −3.643   .000              0.44               2            4
Table 3.3: A Wilcoxon Signed Rank Test (N = 35) revealed a statistically significant difference between the CUI and the IDeB in perceived ease of use, security protection, and privacy control for all sub-tasks (z = −2.356 to −4.774, p ≤ .018), with medium to large effect sizes (r = 0.28 to 0.57). The median (50th percentile) rating scores for the CUI and the IDeB are listed in the last two columns, respectively.

Participants with similar demographics were divided between the two groups to reduce individual differences that might affect the development of their mental models (see Table 3.2 for participant demographics). None of the differences in demographic properties between the two groups was statistically significant (Chi-square test). Participants had a wide range of education levels (from high school to master's degree), and the 17 non-student participants had a variety of occupations, such as teachers, financial planners, dentists, business managers, and IT support technicians.

3.5.2 Results
In the following sections, we present results collected from the post-condition and post-session questionnaires. Throughout, we report results overall (All) and by the two presentation-order groups (G1: CUI first; G2: IDeB first), in order to examine whether the order of conditions affected participants' mental models and preferences.

Mental Model Drawings
As Jonassen et al. [JC08] state, "drawings can be a complementary method of verbal reports" for capturing users' mental models. After each condition, we provided participants with four picture cutouts ("You", "Browser", "Fox News", "Google/Yahoo/Microsoft/Facebook")
and asked them to express how they believed information (their username, password, and profile data) flows from one entity to another when they sign on to the Fox News website. We categorized a mental model drawing as "correct" if the participant clearly indicated that they gave their username and password only to their IdP, and not to the Fox News website. Figure 3.6a illustrates a representative sample of correct and incorrect mental model drawings from our participants. The percentages of participants who developed correct mental models in the study are shown in Figure 3.6b. We further examine the gaps between participants' incorrect mental models and the underlying system model in Section 3.6.1.

Figure 3.7: The average and standard deviation of Likert-scale ratings from post-condition questionnaires: (a) perceived ease of use, (b) perceived security protection, and (c) perceived privacy control. The differences are statistically significant with a Wilcoxon Signed Rank Test (see Table 3.3).

Ratings and Rankings
After each condition (i.e., after completing the study tasks using the CUI or the IDeB), participants were instructed to rate the perceived ease of use (Figure 3.7a), security protection (Figure 3.7b), and level of privacy control (Figure 3.7c) from 1 to 5 (1 = very poor, 5 = excellent). The rating differences between the CUI and the IDeB are statistically significant under a Wilcoxon Signed Rank Test, as shown in Table 3.3.

At the end of the session, participants were asked: "For these two approaches that you used to sign onto different websites in the study, which one is easier for you to use/makes you feel more secure/makes you feel more in control of your privacy?" Figure 3.8 shows the ranking results from the post-session questionnaires, which conform to the post-condition Likert-scale ratings in Figure 3.7. We report only the overall rankings in Figure 3.8, as we observed no significant differences in participants' choices with respect to the order of interface presentation.

Figure 3.8: The perceived (a) ease of use, (b) security protection, and (c) privacy control ranking results from post-session questionnaires suggest that our design is favored by most participants.

Login Option Preferences
In the post-session questionnaire, we asked all participants, "In the future, if you encounter a website that supports using a third-party account to log in (similar to the websites in the study), which approach would you use to login?" The possible options were "CUI", "IDeB", "traditional login", "depends on which website they are logging into", and "Don't know/haven't decided." We then probed the reasons behind their choice. Figure 3.9a shows the participants' preferences for future login. One interesting observation is that about one-third of participants preferred using SSO (IDeB 29%, CUI 3%), another third chose to create a separate username and password on different websites (29%), and most of the rest (36%) based their decision on the type of website they were accessing. Possible factors that influence their adoption intentions are further discussed in Section 3.7.

We asked participants who chose "it depends" (36%) to explain which login options they would prefer to use, and on what kinds of websites. All of them stated that they would not use SSO on websites that contain valuable personal information (e.g., bank, tax, or stock trading websites).
For the other websites, if the website itself is trustworthy (e.g., a website that they are familiar with or that has a good reputation), they would like to use an SSO solution, and would prefer the IDeB (All 85%, G1 88%, G2 80%) because of its ease of use and privacy control; otherwise, they would rather create a separate account on the website to avoid misuse of their IdP account.

Figure 3.9b shows the participants' login preferences for non-valuable but trustworthy websites. The percentage of participants who chose "Depends" in Figure 3.9a is broken down and added to the CUI or IDeB share, based on their indicated preference. For example, the percentage of all participants who preferred the IDeB on websites they trust is calculated as 36% ("Depends") × 85% + 29% = 30.6% + 29% ≈ 60%.

Figure 3.9: The login option preferences from the post-session questionnaire indicate that 60% of study participants would use the IDeB on websites they trust: (a) login option preferences regardless of the type of website; (b) login option preferences for non-valuable but trustworthy websites.

Figure 3.10: The data flows of the system model and the acquired incorrect mental model. The system model has a triangular data flow with two distinct browser sessions, one with the RP and one with the IdP, while the data flow of the incorrect mental model is linear (i.e., the user's login credential is given to the RP without an active session with the IdP).

3.6 Conceptual Gaps
In the exploratory study, we noticed several common misconceptions and concerns among our participants (e.g., the incorrect belief that the RPs knew their IdP login credentials, concerns regarding implicit IdP login, uncertainty about what the RP could do to their IdP account after consent, and confusion about account linking). We revised the design of our formative and comparative studies to better understand the root causes of those confusions.
Through mental model drawings, questionnaires, and post-session interviews, we identified several conceptual gaps between the acquired mental model and the underlying system model. In this section, we examine how those gaps are formed and how they influence participants' perceptions. Note that the statistics reported in this and subsequent sections are based on the comparative study presented in Section 3.5.

3.6.1 Triangular versus Linear Data Flow
Many of our participants developed an incorrect mental model through their interactions with the web SSO systems in the study. As illustrated in Figure 3.10a, the web SSO architecture has a triangular data flow in which the authentication request and response are passed between the RP and the IdP through the browser. There are two separate browser sessions, one with the RP and one with the IdP; the user-to-IdP authentication and authorization (i.e., profile sharing consent) are performed only within the IdP session. The RP and IdP browser sessions are independent of each other, because the browser's same-origin policy [Rud08] prevents a document or script of an RP website from accessing web content served by an IdP site. For each IdP, a user may need to authenticate only once per browser session, as an IdP typically issues an authentication cookie after a successful login and then uses the cookie to authenticate the user on subsequent requests. Through an option on the IdP login form (e.g., "remember me") that causes a persistent authentication cookie to be stored on the user's computer, an authenticated IdP session can be retained across browser sessions (i.e., after the browser is closed).

As opposed to the triangular authentication flow of the system model, the data flow of an incorrect mental model is linear, as illustrated in Figure 3.10b. In this mental model, the user provides the RP with their username and password so that the RP can retrieve profile information from their IdP account.
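The triangular flow, and the reason a repeated login can complete without any IdP prompt, can be illustrated with a toy Python simulation. This is a minimal sketch, not OpenID or OAuth: the classes `ToyIdP` and `Browser`, the example origins, and the dictionary used as an "assertion" are hypothetical, and the browser's cookie jar is simply keyed by origin to mimic the rule that a cookie is sent back only to the site that issued it. Credentials are checked only by the IdP, and the RP receives an assertion rather than a password.

```python
import secrets
from typing import Dict, List, Optional, Tuple


class ToyIdP:
    """Hypothetical IdP that checks passwords and issues authentication cookies."""

    origin = "https://idp.example"

    def __init__(self, passwords: Dict[str, str]) -> None:
        self._passwords = passwords          # username -> password, kept at the IdP
        self._sessions: Dict[str, str] = {}  # cookie value -> username

    def login(self, username: str, password: str) -> Optional[str]:
        """Return a fresh authentication cookie, or None if the password is wrong."""
        if self._passwords.get(username) != password:
            return None
        cookie = secrets.token_hex(16)
        self._sessions[cookie] = username
        return cookie

    def user_for(self, cookie: Optional[str]) -> Optional[str]:
        return self._sessions.get(cookie) if cookie else None


class Browser:
    """Hypothetical browser with a per-origin cookie jar."""

    def __init__(self) -> None:
        self.cookie_jar: Dict[str, str] = {}  # origin -> cookie set by that origin
        self.prompts: List[str] = []          # login forms actually shown to the user

    def sso_login(self, rp_origin: str, idp: ToyIdP,
                  typed: Tuple[str, str]) -> Optional[dict]:
        """Sign in to `rp_origin` through `idp`; `typed` is what the user
        would enter if the IdP showed its login form."""
        # Leg 1: the RP redirects the browser to the IdP. The browser attaches
        # only the IdP's own cookie; the RP never sees it.
        user = idp.user_for(self.cookie_jar.get(idp.origin))
        if user is None:
            # No IdP session yet: the credentials go to the IdP, never to the RP.
            self.prompts.append(f"login form at {idp.origin}")
            cookie = idp.login(*typed)
            if cookie is None:
                return None
            self.cookie_jar[idp.origin] = cookie
            user = idp.user_for(cookie)
        # Leg 2: the IdP redirects back to the RP with an assertion, not a password.
        return {"issuer": idp.origin, "subject": user, "audience": rp_origin}


if __name__ == "__main__":
    idp = ToyIdP({"alice": "s3cret"})
    browser = Browser()
    print(browser.sso_login("https://foxnews.example", idp, ("alice", "s3cret")))
    # Second RP login: the IdP cookie is reused, so no form appears and the
    # (wrong) typed password is never even checked.
    print(browser.sso_login("https://itrackmine.example", idp, ("alice", "typo")))
    print(len(browser.prompts))  # 1
```

In the toy run, only one login form is ever shown, even though two RPs were visited. This silent reuse of the IdP cookie is the behavior that, seen from the user's side, looks like the RP already "knows" the password.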
In addition, in this linear model the user believes they interact solely with the RP, and that the RP can access the user's profile information at any time.

We observed that many participants formed an inaccurate mental model when they initially signed in to the Fox News website, and that this incorrect model was then "confirmed" when they were asked to sign out and log back into Fox News. On the first login attempt, because the user-to-IdP authentication and authorization were performed in a pop-up window that originated from, and was surrounded by, the Fox News website, participants thought they were giving their IdP login credentials to the RP. On the second login attempt, as participants had already authenticated to their IdP in the same browser session, the pop-up window simply blinked open and then closed without prompting them for authentication. Participants expressed that the lack of an IdP login prompt on subsequent RP login attempts reinforced their incorrect belief that the RP website possessed their IdP username and password.

With this incorrectly formed mental model, participants made dangerous mistakes and exhibited surprise and concern:
• On the Fox News login form, many participants (69%) thought that they should enter their IdP username and password directly into the traditional login fields to initiate a login process.
• When logging back into the Fox News website, 29% of participants were surprised that they could sign in without any authentication and authorization (i.e., the pop-up window blinked and then closed).
• On ITrackMine's login form, 29% of participants again entered their IdP username and password directly into the traditional login fields.
• When logging in to the ITrackMine website, 26% of participants were surprised that they were only prompted for profile sharing consent without needing to enter their IdP login credential (because of the authentication cookie); they were confused and wondered where their IdP username and password had been stored.
• When instructed to log out of all websites as if they were going to leave the computer, 71% of participants logged out of the RP websites only. When later asked to check their email, those who had not logged out from their IdP account and had left the browser open (or had closed the browser but kept the "remember me" option checked when logging in to the IdP) were surprised to find that they could access their email without an explicit login.
• 26% of participants expressed concerns about possible misuse of their login credential by RP websites, and stated that they would use SSO only on trustworthy websites.

To help users develop and maintain an adequate mental model that could reduce their security errors and concerns, our IDeB design (1) shrinks the browser in which the RP website is shown and moves it to the top left-hand corner of the desktop before presenting the IdP login form to the user (see Figures 3.3a and b), and (2) prompts the user to select an authenticated IdP account on every subsequent RP login attempt (Figures 3.3d and e). With our design, 69% of participants formed a correct mental model (compared with 33% with the CUI). Notably, 44% of participants in G1 acquired an incorrect mental model when they first used the existing interface, but later developed a correct mental model when using our design.

3.6.2 The By-Value versus the By-Token Profile-Sharing Model
Many participants (40%) were hesitant to authorize the release of their IdP profile information to the RP website, because they were uncertain what the RP could do after consent: What was the scope of the authorization? Were the granted permissions limited to basic identity attributes, or did they also include personally generated content and friend lists? Could the RP post messages or update one's status on the IdP account? How long would the authorization last?
Was the authorization still valid even after they had logged out of the IdP account? Could the authorization be revoked, and if so, how?

The answers to these questions largely depend on the profile-sharing model supported by the web SSO protocol in question, but they also vary subtly among individual IdP implementations. We observed that current web SSO solutions support two different profile-sharing models: by-value and by-token. With the by-value model, a copy of the requested profile information is passed to the RP via the browser when the user is redirected back to the RP website. Once the SSO login process is completed, the user's profile data residing on the IdP is no longer accessible to the RP. In contrast, with the by-token model, instead of the actual profile attributes, an access token that represents the scope and duration of the authorization is passed back to the RP. Using the access token, the RP then requests the user's data through direct communication with the IdP (i.e., not via the browser).

Currently, the by-value profile-sharing model is provided by OpenID with the Simple Registration [HDR06] and Attribute Exchange [HBH07] extensions. Major OpenID providers, such as Google, Yahoo, AOL, MyOpenID, and PayPal, support this sharing model.

model         by-value               by-token
protocol      OpenID                 OAuth
scope         identity attribute     identity, social graph, content, streams, etc.
format        key-value pair         compound data
visibility    explicitly shown       implicitly described
action        read only              read, append, write
duration      one time               one time, time limited, permanent
Table 3.4: The main differences between by-value and by-token profile sharing models.

As opposed to OpenID, which is designed mainly for authentication with profile sharing as an extended function, the OAuth protocol [HLRH11] is primarily designed for authorization, and is commonly used to realize the by-token profile-sharing model.
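The contrast between the two models can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the OpenID or OAuth message formats: `share_by_value`, `AccessToken`, `ToyProfileAPI`, the attribute names, and the numeric expiry times are all hypothetical. The point is that a by-value share is a one-time copy, whereas a token is a handle the RP can keep presenting directly to the IdP until it expires or is revoked, within whatever scope it carries.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


def share_by_value(profile: Dict[str, str], requested: Iterable[str]) -> Dict[str, str]:
    """By-value: copy the requested attributes into the response that travels
    back to the RP via the browser. The RP receives a one-time snapshot."""
    return {name: profile[name] for name in requested if name in profile}


@dataclass
class AccessToken:
    """By-token: a handle representing the scope and duration of an authorization."""
    scope: FrozenSet[str]
    expires_at: Optional[float]  # None models a long-lived token (valid until revoked)
    revoked: bool = False


class ToyProfileAPI:
    """Hypothetical IdP API that the RP calls directly (not via the browser)."""

    def __init__(self, data: Dict[str, object]) -> None:
        self._data = data

    def read(self, token: AccessToken, resource: str, now: float) -> object:
        if token.revoked:
            raise PermissionError("authorization revoked by the user")
        if token.expires_at is not None and now >= token.expires_at:
            raise PermissionError("access token expired")
        if resource not in token.scope:
            raise PermissionError(f"{resource!r} is outside the granted scope")
        return self._data[resource]


if __name__ == "__main__":
    profile = {"email": "alice@example.com", "name": "Alice", "dob": "1990-01-01"}
    print(share_by_value(profile, ["email", "name"]))

    api = ToyProfileAPI({"email": "alice@example.com", "friends": ["bob", "carol"]})
    token = AccessToken(scope=frozenset({"email", "friends"}), expires_at=3600.0)
    print(api.read(token, "friends", now=60.0))  # allowed while the token is valid
    token.revoked = True
    try:
        api.read(token, "friends", now=120.0)
    except PermissionError as err:
        print(err)  # authorization revoked by the user
```

Passing `now` explicitly keeps the expiry check deterministic for illustration; how long a real token lives, and whether it depends on the user's IdP session, is decided by each IdP.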
OAuth enables a user to grant a third-party site access to their information stored with another service provider, without sharing their login credential or the full extent of their data. Major social websites such as Facebook, Twitter, MySpace, Microsoft, Google, and Yahoo employ OAuth to achieve single sign-on and to facilitate user content sharing between websites. Some IdPs, such as Google and Yahoo, support both sharing models; the authentication request from a specific RP determines which sharing model is activated.

The main differences between these two profile-sharing models are listed in Table 3.4. With the by-value model, the user's profile attributes are shared as part of the protocol payload; they are simple key-value pairs that can be shown explicitly on a profile sharing consent form. In contrast, an access token does not contain profile attributes, but possession of the token allows the RP to retrieve, append to, or update the user's profile data by calling API services published by the IdP. With the by-token model, the scope and duration of an authorization can be customized by each IdP implementation. Typically, in addition to identity attributes, compound data such as a user's social graph (i.e., friends with roles), personal content (e.g., photos, videos, blogs), and message streams (e.g., status updates, comments) are shared with RP websites.

The duration of an authorization is another source of confusion among users. Depending on the individual implementation, an authorization may be one-time, limited to a period of time, or long-lived, and may depend on whether the user is logged into the IdP. Some examples:
• Twitter: An access token is long-lived, remaining valid until explicitly revoked by the user.
• Facebook and Google: One hour by default. When offline access permission is explicitly authorized by the user, the RP can perform authorized requests on behalf of the user at any time.
• Microsoft: An access token is valid as long as the user is still signed in to Microsoft Live Connect. A special permission (wl.offline_access) enables an RP to read and update a user's information at any time.
• Yahoo: One hour by default; the access token can be renewed via the OAuth Session extension [TAF08].

Our prior work [SBHB10] found that the by-token sharing model provides RPs with stronger business incentives, because RPs can (1) get access to users' social graphs in addition to their profile data, (2) use platform-specific services such as messaging, and (3) provide a richer user experience through social plug-ins such as recommendations and activity feeds. Nevertheless, results from our study show that users' privacy concerns significantly influence their adoption intention. In addition, we noticed that most participants did not know how to manage their authorizations on each IdP website, and many found it difficult to do so even when told how.

To improve users' perceptions of privacy and control, our IDeB design (1) sets the email address as the only required attribute, with the remaining profile attributes not shared by default, (2) allows users to edit the requested profile attributes before authorization, and (3) provides a central location for users to manage all of their profile sharing. As shown by participants' ratings and rankings, as well as their comments in the interviews, these design features enhanced participants' perceived ease of use and privacy control. Shehab et al. [SMH11] propose an extension to the OAuth authorization protocol that lets users grant fine-grained authorization when granting permissions to third-party applications. They implemented the proposed OAuth extension as a browser extension and collected data on user decisions. The extension was installed by 1,286 Firefox users, who installed 1,561 unique Facebook applications. Consistent with our findings, their results show that users vary in their willingness to share different profile attributes.

3.6.3 The Transient SSO Account versus the Traditional Account
When signing into the ITrackMine website using the current interface (CUI), most participants (94%) could not complete the task without our assistance (i.e., an explanation of why they needed to sign up for a new account or link to an existing one). Most participants thought they had already logged into ITrackMine after being redirected back from their IdP, but the RP website required them to complete an account-linking process before granting access. As the concept of account linking was not clearly conveyed by the RP website, many participants exhibited confusion or frustration: "I thought I had already logged in." "Then, what was the point of signing into Google?" "Are you going to leave me struggling here?" "Argh… I am frustrated."

The purpose of account linking is to gather additional profile information required for a new account, as well as to enable existing users to log in using SSO. Most RP websites integrate SSO after many traditional accounts have already been registered manually through a traditional login approach. A traditional account typically contains a unique username, a password, and a validated email address, along with other profile attributes. On the other hand, an RP website creates a transient SSO account after a successful SSO process, which contains a unique identifier and the user's profile data from the IdP. However, the profile information in the transient SSO account may not be sufficient for granting access (e.g., a missing zip code or date of birth). In particular, some RPs require a unique username and password from every user to ensure that the user can still log in when their IdP account is inaccessible, and a valid email address for password reset and future communications. If any of the required information is missing from the transient SSO account, the RP needs to prompt users either to provide it manually or to log into an existing traditional account that already has the required information. In both cases, the SSO account is linked to a traditional account (a new or existing one), and the RP can check the link in future SSO processes.

An account-linking process migrates a transient SSO account into a traditional account, thereby unifying and simplifying access control after login. However, most of our participants did not understand the purpose and concept of account linking. To provide a frictionless SSO user experience, we suggest that RPs avoid account linking during SSO, assign different levels of privilege to SSO accounts and traditional accounts, and allow the user to perform the task at hand with just a transient SSO account.

3.7 The Web SSO Technology Acceptance Model
One of our research goals is to understand which factors influence users' adoption intentions, and how. To represent users' acceptance of web SSO solutions, we base our model on Davis's technology acceptance model (TAM) [DBW89], one of the most widely used models for explaining the factors that affect user acceptance of information technologies. The TAM posits that users' perceived ease of use and perceived usefulness predict application usage. Several previous research efforts have extended and instantiated the TAM with variables from different application domains, such as internet banking [SH03], mobile commerce [WW05], the World Wide Web [Fen98, LMSZ00, MK01], and enterprise resource planning [AGS04]. Pavlou [Pav03] proposes a model that integrates trust and perceived risk with the TAM to predict consumer acceptance of electronic commerce.
They posit that both trust and perceived risk influence users? intentionsto transact, and that consumer trust positively impacts the perceived usefulness and ease ofuse of a web interface.Using insights from our studies, we introduce a technology acceptance model in the contextof web SSO, as illustrated in Figure 3.11. Our study found several factors that hinder users?adoption intentions; we correlated them as antecedent variables to the intermediate factorsin the existing technology acceptance model. Each identified variable was categorized as anintrinsic variable that could be improved by the design of a web SSO system, or an extrinsicvariable that is difficult to resolve with technology alone. Consistent with Pavlou?s findings, ourstudy shows that users? risk perceptions influence their attitudes towards accepting a web SSOsolution. We further found that, within the context of web SSO, the perceived risk actuallyinvolves users? personal information on both RP and IdP websites, and each is influenced bydifferent variables.503.7. The Web SSO Technology Acceptance ModelFigure 3.11: Web single sign-on technology acceptance model. The acceptance factors we foundare correlated as antecedent variables to the intermediate factors in TAM, and categorized asintrinsic and extrinsic variables. Solid arrowed lines indicate positive influences, while dashedarrowed lines represent negative influences.3.7.1 Intrinsic VariablesAs illustrated in Figure 3.11, users? perceived ease of use and their risk perception influencetheir intention to use a web SSO system. Intrinsic variables are mainly misconceptions, anduncertainties resulting from interactions with SSO systems, as we discussed in Section 3.6. Wefound misleading login affordance and account-linking confusion are the main variables thatimpact the participants? perceived ease of use. Moreover, as participants were asked to usetheir real IdP account for the study tasks, we found that participants? 
risk perceptions with respect to their IdP account were significantly raised by their security misconceptions and privacy concerns.

Our IDeB design was intended to improve the intrinsic variables that hinder participants' adoption intentions. Our participants preferred it to the current user interface. Nonetheless, the post-session interviews also showed that extrinsic variables play a significant role in a user's preferences for future login options.

3.7.2 Extrinsic Variables

Even with a highly usable web SSO system, some participants were reluctant to adopt it on any website, or preferred to use it only on certain websites. The extrinsic variables we found are listed as follows:

• Existing password management strategies: Web SSO solutions could reduce the number of passwords a user needs to manage. However, we found that participants' existing password management strategies reduce the perceived usefulness of this feature. Similar to Herley's findings (2009), as the majority of user experiences indicate that weak passwords typically do not lead to physical asset loss, most users are "comfortable" with weak or reused passwords. Some (23%) participants in our study used the password manager feature in the browser to reduce their memory burden. Password managers are inconvenient when users switch between computers or when they want to use shared or public computers. However, our participants viewed this as an acceptable solution, because they mostly work on one computer, and most websites provide a password recovery mechanism (e.g., a temporary password sent to the registered email account).

• Single point-of-failure concern: One inherent risk of using web SSO is that one compromised account on an IdP can result in breaches on all services that use this compromised identity for authentication. Of those participants who favored traditional login, 90% expressed this concern.
• The value of personal information: The value of personal data on RP websites increases a user's perception of risk. All the participants who chose the "depends on which website they are logging into" (36%) future login option stated that they would not use SSO on websites that contain valuable personal information or involve a potential risk of monetary loss (e.g., banking or stock trading websites). They preferred to create a separate account on those websites. Even though our questionnaires did not explicitly ask our participants about this concern, we believe that it is a general concern applicable to most web users, including participants who chose our design.

• Trust levels with RP websites: Web users need to be confident that their IdP profile will not be misused or abused by RP websites. Most of the participants who chose web SSO as their future login option (60%) stated that they would only use a web SSO solution for websites that are trustworthy or with which they are familiar. For websites they do not trust, they prefer the traditional login option. Two of our participants refused to use any web SSO solution as their future login option due to a prior privacy compromise resulting from using Facebook Connect for login. They mentioned: "I had terrible experiences with websites that I used Facebook for login. They posted what I played there to my wall and make me feel embarrassed because all my friends knew about it." "I used Facebook Connect for a while, but then stopped, because I didn't like that it connects all the places to my Facebook account."

Web SSO TAM provides SSO solution designers with a holistic view of the causal dependencies between a user's adoption intention, the intermediate factors, and the design and environmental factors. We could not measure the level of concern for extrinsic factors, as they were uncovered from the qualitative interview data.
Nevertheless, for a given context or scenario, once each factor is measured and the weight of each causal dependency is computed, the weight-populated SSO TAM can serve three purposes. It can identify the strengths and weaknesses of a given web SSO design, prioritize design improvements, and predict how likely a potential user is to adopt a given SSO solution based on measures taken from a brief interaction with the system.

3.8 Recommendations

Based on our findings, we recommend UI and login flow improvements for RP and IdP websites. To enhance the workflow efficacy of future web SSO solutions, we also offer recommendations for the web SSO development community.

3.8.1 Recommendations for RPs and IdPs

RPs play a substantial role in the success of a web SSO solution. We recommend the following UI and login flow improvements for RPs:

• Provide clear login affordance: Our study found that simply placing a list of IdP icons beside traditional login fields could be misleading. To provide a clear login affordance, our IDeB design separates a traditional login form from SSO login options, with each existing IdP login form located on a separate browser tab (see Figure 3.3b). This design was intended to transform the observed negative transfer effect into a positive one, in which the user's prior knowledge or experience helps the user acquire a correct mental model rather than an incorrect one.

• Provide visual cues for returning users: Web users may use different IdP accounts for RP websites that vary in trustworthiness, in order to preserve privacy and avoid a single point of failure. However, a user might find it difficult to remember which IdP account was used for which RP, to determine why an access failed, or to know whom to contact when a problem arises. To reduce the memory burden on users in this IdP-to-RP mapping problem, we suggest that RPs should provide visual cues for returning users.
One way to accomplish this is by using a persistent browser cookie that encodes the last login-related information (e.g., login option used, selected IdP, username, login time). By checking whether this cookie is present in the HTTP request, the RP could present a customized login screen. That screen would show the username, last login time, login options, and the corresponding IdP icon to guide the returning user through login.

• Practice the principle of gradual engagement: Requiring visitors to sign up for an account before granting access could discourage them from trying out a new web service. When an anonymous visitor consents to use one of their IdP accounts for the visited RP, the RP should grant the user the permissions required for the task at hand without requesting any additional personal information. This is the principle of gradual engagement [Wro08]. It instantly turns the visitor into a marketable lead, identifiable by the unique user identifier issued by the IdP or by their email address. Once the visitor is identifiable, the RP could gradually engage with the user to acquire additional attributes when providing them is of value to the user. Ultimately, the RP may be able to convert the user from simple actions, such as page browsing and commenting, to more desired transactions, such as product purchases or software downloads.

• Avoid account linking in the SSO process: Our results suggest that account linking could diminish the usability gain of SSO. We suggest that RP websites should not include account linking at the end of an SSO login process. Instead, they should follow the principle of gradual engagement, which allows users to perform the task at hand without an additional account linking step.
To enable existing users to log in using SSO, we suggest that RPs should prompt existing users to link to an IdP account during a traditional login if the association has not yet been established, rather than including account linking in an SSO login flow.

In our study, we found that about half of our participants did not have prior SSO experience. Based on our SSO research experience and literature reviews, we suggest that RPs should convey the value of web SSO and promote the SSO login option on their websites, in order to enhance its perceived usefulness to users, as follows:

• Convey the benefits of web SSO: The traditional login approach typically requires new users to fill out a sign-up form and remember their chosen password. In addition to profile information, a sign-up form normally requires a user to choose a unique username, pick a memorable password that conforms to the password policy, validate the provided email address through an activation link, and pass a CAPTCHA challenge. In contrast, a properly designed web SSO implementation allows users to sign up and log in with a few clicks by reusing login credentials and profile attributes from their IdP account. RPs should clearly convey this "two-click sign-up and login" usability gain to web users.

• Promote the web SSO login option: A high conversion rate, where anonymous visitors are turned into users, is desirable for many websites, yet most websites enjoy only small conversion rates. The average online conversion rate is around 3%, with the highest at approximately 9% [Str09]. Compared to the traditional login approach, web SSO could encourage an anonymous visitor to try out a site's services with a few simple clicks. Hence, RPs should promote the web SSO login option by placing it above or to the left of the traditional login option, or by requiring one additional click to reach the traditional login form.

Privacy concerns are another major obstacle that impedes users' SSO adoption.
To minimize users' privacy concerns, we make the following recommendations for IdPs:

• Provide fine-grained privacy control: Our results indicate that users want to control how much of their IdP profile information is disclosed to RP websites, but current IdP implementations lack this control. Profile sharing is currently all or nothing, which IdPs may intend as a way to trade users' privacy for websites' adoption as RPs. To reduce users' privacy concerns, IdPs should provide fine-grained privacy control that allows users to edit the scope and duration of the requested permissions before consenting.

• Explicit user consent: Prompting for user consent on each RP sign-on request could increase users' privacy awareness and control. Automatic authorization granting (i.e., consenting only once for a given RP) should be offered only to RPs that explicitly request it during registration. To encourage RPs to practice the principle of least privilege, IdPs could also prompt for user consent on every authorization request from RPs that ask for extended permissions, such as offline or publish actions.

• Support multi-persona accounts: Four participants who favored our design told us that the IdP account they used in the study was a spare or "garbage" account. They used this account to sign on to untrustworthy websites in order to mitigate their security and privacy concerns. Using a fake account for SSO is a good strategy for users to minimize their risks, but it requires users to remember, and switch to, the corresponding IdP account when visiting an RP website. In particular, if both the fake and the real account are from the same IdP, the user needs to log out of the real account and then log in with the fake one, or use a different browser instance for each account. Participants who used a fake account in the study preferred the IDeB, because it could cue them to switch to an IdP account with one simple click.
Based on this insight, we suggest that IdPs should allow a user to maintain multiple profiles in one account, each profile representing a particular persona of the user. During SSO, a user could choose an appropriate persona for the visited RP website, with the profile previously used for that RP selected by default. Note that this recommendation does not address the risk of IdP phishing attacks. Although it makes things easier for users who want different personas for different websites, the user still needs to enter her master password for the IdP account.

3.8.2 Recommendations for the Web SSO Development Community

Password managers built into browsers have been successful. By analogy, web users would be more likely to trust and adopt a web SSO solution that the browser supports directly. In addition, as shown in our study, an SSO-supported browser could provide users with:
(1) a consistent interface and flow across RP websites that could unify users' SSO experiences and encourage positive transfer;
(2) clear login affordances and visual cues that guide both first-time and returning users through the login process and convey an adequate working mental model;
(3) privacy-preserving features such as fine-grained permission control, an identity selector, and in-browser profile authorization management;
(4) phishing-resistant mechanisms that prevent IdP phishing attacks without relying on users' cognitive capabilities and continuous attention.
Moreover, an SSO-enabled browser could eliminate the need for RPs to design a customized SSO login form, and would reduce the RPs'
integration efforts by providing a unified protocol interface through which they could integrate a diverse range of web SSO protocols.

The browser is the central piece that communicates with all actors in the identity ecosystem. We therefore conjecture that a browser directly augmented with identity support could become a driving force for SSO adoption. An identity-enabled browser could also be more usable in emerging application domains. Current HTTP redirection-based web SSO solutions could be problematic in Web 2.0 mashup applications that aggregate personal data located on multiple websites. For server-side mashups that integrate a user's personal content from different providers, being presented with a login form by each service provider could be annoying and impose a cognitive burden on the user [BKM+08]. For client-side mashups that use AJAX-style web services to acquire user data from several websites, login forms will block such communications. In addition, existing solutions may be more difficult to use on mobile or appliance devices that have limited input capabilities.

After our user study (the related materials and results have been published [SHB10, SPM+11a, SPM+11b]), we were delighted to learn that Mozilla was releasing a new browser-supported SSO proposal (Mozilla Persona [Moz12b]). Mozilla Persona and our IDeB design share the idea of building identity support directly into browsers, but their UI designs are significantly different (as presented in Section 3.3). In addition, Mozilla Persona uses the BrowserID protocol [Moz12a] (previously known as the "Verified Email Protocol"), whereas our IDeB design leverages and extends OpenID 2.0 as the underlying protocol (further discussed in Section 7.4 of Chapter 7).

3.9 Limitations

The design of our study supported a direct usability comparison of our IDeB prototype with current SSO solutions.
However, because of the inherent limitations of this within-subjects study, we could not evaluate the effectiveness of some important features provided by our design (e.g., phishing protection, multiple IdP sessions, in-browser profile editing and sharing, and single sign-out). Nor could we validate the proposed web SSO technology acceptance model. In addition, our empirical study results have the following limitations:

• Generalizability: Participants were primarily young adults, with only one participant over 45 and none under 19. All of the participants reported browsing the web daily or more, and thus might be less prone to errors or misunderstandings while using the interface.

• Ecological validity: The participants were restricted to using the computer and RP websites provided to them during the study. In addition, only the first-time user experience was studied, and we did not examine daily usage behaviors. Expanded (more websites) and longer-term studies are recommended to address this. Furthermore, between the formative study and the comparative study we revised the designs of the block-out desktop and the IdP login form, based solely on our observations and on feedback from our participants. Due to the limitations of this short-term laboratory study, we could not evaluate how users' familiarity with the customized login dialog affects their level of trust; this requires a long-term field study.

• Precision: Carryover and fatigue effects due to the within-subjects format may have affected the study results (although responses were similar between the two groups). A between-subjects study would be required to determine whether those negative effects existed in our study.
Moreover, in the post-session interview, most participants expressed serious concerns about IdP phishing once informed about the issue, and 90% of the participants who preferred traditional login expressed the single point-of-failure concern. These two concerns, as well as the other extrinsic factors, emerged from the analysis of the qualitative interview data. As those factors were unknown to us before the study, we did not measure the level of each concern. The degree of concern regarding the factors we uncovered, however, could be quantified and validated by large-scale surveys or laboratory studies. Likewise, four participants stated in the interview that they use multiple IdP accounts for websites that vary in trustworthiness, and that our IDeB design could help them remember which IdP account they had linked to which RP website. However, those interviews took place midway through the study. We therefore had no chance to ask participants with prior web SSO experience how many IdPs they use on how many RP websites. Such statistics would, however, provide important support for our multiple-persona recommendation.

We also found issues with our IDeB interface that require further improvement. First, most participants did not notice the identity indicator at the bottom left-hand corner of the screen. Second, it was not clear to the participants that the IDeB does not store their password on the local computer, and some participants were consequently concerned that the stored password and profile information could be compromised. Third, some participants thought that they were giving their username and password to the websites directly.
Moreover, we suggest that the account-linking task should be performed during a traditional login rather than at the end of an SSO process. Two research questions remain open, however: how to convey the concept and benefits of account linking, and how to design a usable interface for managing account-linking tasks (e.g., linking to one or several IdP accounts, unlinking, auditing).

3.10 Summary

In this chapter, we present three user studies that investigate users' perspectives on web SSO. Through our empirical investigation, we found that web users could perform SSO correctly once taught, but many of them may not want to trade their privacy and security for usability gains. To reduce users' privacy concerns, we believe it is crucial that RPs practice the principle of gradual engagement and that IdPs provide fine-grained privacy control and an on-login profile switching option. In addition, future research should investigate how to enhance users' security perceptions and mitigate IdP phishing attacks without relying on users' cognitive capabilities. We do not claim that our design is ready for real-world adoption, as only horizontal user interface functions were designed and evaluated. Nevertheless, we hope that our design and study results will inform the design of future web SSO solutions. In summary, the work presented in this chapter makes the following contributions:

• We identify conceptual gaps between the user's mental model and the system model, and analyze how they affect user experience and perceptions with regard to SSO.

• We suggest a web SSO technology acceptance model that explains how each factor we found influences users' acceptance of a web SSO solution.

• We design and implement an SSO-enabled browser, and use it to explore possible improvements in the users' SSO experience.
Our results indicate that with enhanced privacy control and improved affordances that guide users through the SSO login process, more than 60% of study participants would use SSO solutions on the websites they trust.

• We suggest design improvements for RP and IdP websites, and for the web SSO development community.

Chapter 4
Formal Analysis and Empirical Evaluation for OpenID Security

This chapter presents the results of a systematic analysis of the OpenID 2.0 protocol using both formal model checking and an empirical evaluation of 132 popular RP websites. We formalized the OpenID 2.0 protocol in the High Level Protocol Specification Language (HLPSL) and verified the model using the Automated Validation of Internet Security Protocols and Applications (AVISPA) model checking engine [Vig06]. Based on this analysis, three root weaknesses of the OpenID protocol were identified: (1) a lack of authenticity guarantees for the authentication request, (2) a lack of contextual bindings between the authentication messages and the browser, and (3) a lack of integrity protection for the authentication request.

To evaluate how prevalent those weaknesses are in real-world RP websites, we developed six exploits and a semi-automated tool, and evaluated 132 RPs. The results of our empirical evaluation show that many of the tested RPs are vulnerable to at least one variant of SSO CSRF attacks and/or are exploitable through the session swapping attack. Our evaluation also found that after a successful SSO CSRF attack, an adversary could use CSRF attacks to alter the users' profile information on two-thirds of the evaluated RPs. A passive web attacker could launch both the SSO CSRF and the session swapping attack simply by luring the victim to a web page containing the exploit code.
An attacker with the additional, practical capability of intercepting the authentication assertions could impersonate a user on the majority of RPs and gain complete control of the victim's data. In addition, the extension parameters can be forged on almost half of the websites that support the OpenID Simple Registration or Attribute Exchange extension.

The lack of security guarantees in the OpenID protocol requires RPs to employ additional countermeasures. However, our formal analysis and empirical evaluation found that the existing countermeasures and recommendations are piecemeal patches that do not address the root causes of the vulnerabilities. To address the uncovered weaknesses, two countermeasures are proposed and evaluated. Both proposed countermeasures work with the existing OpenID 1.1 and 2.0 protocols, and require no modifications to IdPs or web browsers. We have made the formal protocol specification, the vulnerability assessment tool, and the reference implementation of the countermeasures publicly available [Sun11].

Figure 4.1: Overall approach.

The rest of the chapter is organized as follows. Section 4.1 provides an overview of our approach and the adversary model. The OpenID protocol and its formalization are presented in Sections 4.2 and 4.3. Section 4.4 presents the results of our evaluation of existing RP implementations.
We describe our proposed countermeasures in Section 4.5, and summarize the chapter and outline future work in Section 4.6.

4.1 Approach and Adversary Model

Figure 4.1 illustrates the overall process of our approach, which consisted of three stages: (1) formalizing the OpenID protocol and identifying its vulnerabilities using an automatic security protocol verification tool, (2) designing exploits and tools to evaluate real-world RP websites, and (3) designing and evaluating countermeasures.

To formalize the OpenID protocol, we first interpreted the OpenID specification as a sequence diagram and implemented an RP website. We then validated the sequence diagram by examining the protocol messages with an HTTP proxy. Based on the OpenID sequence diagram, our adversary model, and the weaknesses documented in the OpenID specification, we formalized the protocol in Alice-Bob (A-B) notation, a simple notation commonly used to describe security protocols [CVB05, LSV03, DM00, Low98]. The A-B notation clearly shows the messages exchanged in a normal, successful run of the protocol. This assisted initial analysis, and the notation could later be translated into other protocol specification languages. From the A-B notation, the protocol was modeled in HLPSL and validated using AVISPA [Vig06]. AVISPA is a security protocol verification tool that has been widely employed to validate authentication and key exchange protocols (see http://www.avispa-project.org for the library of examined protocols and related papers). The verification process output possible attack traces on the model of the OpenID protocol.

When a model checking approach is applied to security protocol analysis, one inherent limitation is that the model checker stops its execution once an insecure state is reached or when the computation resources are exhausted.
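The limitation can be illustrated with a toy explicit-state search. The following minimal Python sketch is a generic breadth-first reachability check, not a reproduction of any AVISPA back-end. The `successors` and `is_insecure` functions and the integer state encoding are hypothetical stand-ins for a protocol model. The search returns the first attack trace it reaches and stops, or gives up when a state budget runs out. Any weakness reachable only beyond the first insecure state is therefore never reported.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def check(initial: State,
          successors: Callable[[State], Iterable[State]],
          is_insecure: Callable[[State], bool],
          max_states: int = 10_000) -> Tuple[str, Optional[List[State]]]:
    """Breadth-first search over model states.

    Returns ("attack", trace) at the FIRST insecure state found,
    ("exhausted", None) if the state budget runs out, or
    ("safe", None) if every reachable state was explored.
    """
    parent = {initial: None}      # visited set plus back-pointers for the trace
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_insecure(state):
            # Stop immediately: weaknesses deeper in the model are never explored.
            trace = []
            node = state
            while node is not None:
                trace.append(node)
                node = parent[node]
            return "attack", trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                if len(parent) >= max_states:
                    return "exhausted", None
                parent[nxt] = state
                queue.append(nxt)
    return "safe", None


# Hypothetical toy model: each step adds 1 or 2, and both 5 and 7 are "insecure".
# Only the first insecure state reached (5) is reported.
verdict, trace = check(0, lambda n: (n + 1, n + 2), lambda n: n in (5, 7))
print(verdict, trace)  # prints: attack [0, 1, 3, 5]
```

Excluding already-documented weaknesses from the model, as described next, prevents such an early stop on a known issue from hiding previously unknown ones.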
Thus, to obtain a concise model that avoids the state explosion problem and discovers as many new weaknesses as possible, our formalization excluded all weaknesses already documented in the OpenID specification. Our analysis also assumed the integrity of the user's computer, and that the RP, the IdP, and the channel between them are trusted.

The analysis of the AVISPA attack traces identified three weaknesses in OpenID that could be exploited by several attack vectors. For each attack vector, we designed a corresponding exploit and tested it manually on 20 RP websites. To facilitate the assessment process, we then developed a semi-automatic vulnerability assessment tool. We used it to evaluate 103 RP websites listed on an OpenID directory (footnote 2) and 29 websites from the Google Top 1,000 Websites that accept OpenID for logins.

To eliminate the identified vulnerabilities, potential countermeasures were first modeled in AVISPA to ensure that the proposed solutions address the root causes of the vulnerabilities. To remain simple and scalable, the proposed defense mechanisms are stateless. They use only cryptographic functions (i.e., HMAC and DH key exchange) and data that are readily accessible to RPs. In addition, we designed a scheme that allows the browser and the RP server to derive a DH session key during the OpenID authentication process, to mitigate impersonation attacks after login. Both proposed countermeasures were implemented and tested on an open-source Java web application.

4.1.1 Adversary Model

In this work, we assume that both the RP and IdP are trustworthy and that the users' machines are benign and not compromised. We do not consider attacks that rely on subverting the RP's and IdP's administrative functions or on exploiting vulnerabilities in their infrastructures. In our adversary model, an adversary is not affiliated with an RP or IdP, and its goal is to gain unauthorized access to the user's personal data on an RP's website.
In addition, "perfect cryptography" in the protocol is assumed; that is, an attacker cannot break cryptography without the decryption key. Moreover, the known threats documented in the OpenID specification (e.g., phishing, IdP masquerade, replay, and denial-of-service attacks) are not considered. We consider three adversary types, which differ in their attack capabilities:

• A Web attacker can post comments that include static content (e.g., images, stylesheets) on a benign website, set up a malicious website, send malicious links via spam or ad networks, and exploit web vulnerabilities (e.g., XSS) on benign websites. Malicious content crafted by a Web attacker can cause the browser to issue HTTP requests to other websites using both GET and POST methods, but these requests cannot modify or access HTTP headers.

• A network attacker can sniff and alter traffic between the browser and the RP. This can be done by eavesdropping on messages on an unencrypted network, or by MITM proxying techniques, such as luring the victim onto a rogue wireless access point or using "drive-by pharming" [SRJ07] attacks to alter the DNS server settings on the victim's home broadband router.

• A remote attacker can send and modify HTTP request and response messages to RPs and IdPs from his own browser. The attack goal is to log in as the victim user on the RP website directly from the attacker's machine.

Footnote 2: https://www.myopenid.com/directory

4.2 The OpenID Protocol

OpenID uses a URL or XRI [OAS08] as a user's identifier, and the OpenID protocol asserts to an RP that the user owns the resource identified by that identifier.

Table 4.1: Notation.
Notation      Description
i             OpenID identifier
h             Session handle
k             Session key
n             Nonce
r             RP return URL
x.y           Concatenation of x and y
x ⊕ y         XOR of x and y
E(x, k)       Encrypt x with key k
H(x)          Hash function on x
HMAC(x, k)    HMAC on x with key k
U             User
B             Browser
RP            Relying party
IdP           Identity provider

Figure 4.2: The OpenID protocol sequence diagram.
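The primitives listed in Table 4.1 map onto standard library functions. The following minimal Python sketch assumes SHA-256 for H and HMAC-SHA256 for HMAC. The table itself does not fix these choices; OpenID 2.0 defines HMAC-SHA1 and HMAC-SHA256 association types. Joining x.y with a literal "." is an illustrative encoding, not OpenID's wire format, and E(x, k) is omitted because the table names no cipher. The last lines check the XOR property that lets a key masked as k ⊕ H(z) be recovered by any party that knows z.

```python
import hashlib
import hmac


def concat(*parts: bytes) -> bytes:
    """x.y: concatenation (illustrative '.'-joined encoding)."""
    return b".".join(parts)


def xor(x: bytes, y: bytes) -> bytes:
    """x XOR y: byte-wise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("XOR operands must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))


def H(x: bytes) -> bytes:
    """H(x): hash function (SHA-256 assumed); name mirrors the table's notation."""
    return hashlib.sha256(x).digest()


def HMAC(x: bytes, k: bytes) -> bytes:
    """HMAC(x, k): keyed MAC over x with key k (HMAC-SHA256 assumed)."""
    return hmac.new(k, x, hashlib.sha256).digest()


# XOR masking is its own inverse: (k XOR H(z)) XOR H(z) == k.
k = bytes(range(32))        # hypothetical 32-byte session key
z = b"shared-secret"        # hypothetical value known to both parties
masked = xor(k, H(z))
assert xor(masked, H(z)) == k
```

The uppercase function names deviate from PEP 8 only so that the code reads like the notation in the table.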
In this chapter, the notation described in Table 4.1 is used to denote the protocol messages and entities. In particular, we use a capital letter to denote an entity (e.g., User, RP, IdP) and a lower-case letter to represent data in the protocol. As illustrated in Figure 4.2, the OpenID protocol consists of four phases, each described below.

Phase 1: Initialization and discovery:
1.1 User U selects an IdP (e.g., https://yahoo.com/), or enters her OpenID identifier i into an OpenID login form on an RP. The browser B then sends i to RP (a "Login Request").
1.2 RP makes an HTTP request on i to fetch the document hosted on the ID server. The ID server can be located within the domain of an IdP, or it can be a completely different entity that delegates user authentication to an IdP.
1.3 The ID server responds with either an XRDS [HL08] or HTML document that contains the IdP endpoint URL idp.

Phase 2: Association (optional):
2.1 RP generates a Diffie-Hellman (DH) modulus p, generator g, and a random DH private key a to initiate an association operation that establishes a session key k with IdP.
2.2 RP sends i, p, g, and the DH public key $g^{a} \bmod p$ to IdP.
2.3 IdP generates a new session handle h, a session key k, and a random DH private key b.
2.4 IdP sends $g^{b} \bmod p$, h, and an encrypted session key $k_{enc} = k \oplus H(g^{ab} \bmod p)$ to RP.
2.5 RP computes $k = H(g^{ab} \bmod p) \oplus k_{enc}$ and then stores the tuple (h, k, i) temporarily for the SSO login session.

Phase 3: Authentication request:
3.1 RP sends i, h (optional), and a return URL r to IdP via B to obtain an assertion (an "Auth Request"). The return URL r is where IdP should return the response to RP (via B). If RP omits Phase 2, it must validate the received authentication response through direct communication with IdP in the "Authentication response" phase (Steps 4.4 and 4.5).
3.2 B sends i, r, and h to IdP.
3.3 IdP checks i and h against its own local storage.
If h is not present, IdP generates a new session handle h and a session key k. In addition, if the request carries a cookie previously set after a successful authentication with U, IdP could omit the next two steps (3.4 and 3.5).
3.4 IdP presents a login form to authenticate the user.
3.5 U provides her credentials to authenticate with IdP, and then consents to the release of her profile information (e.g., email, user name).
3.6 Once the user credentials are validated, IdP generates nonce n and signature s = HMAC(idp.i.h.r.n, k). Here, "." denotes concatenation of two values.

Phase 4: Authentication response:
4.1 IdP sends idp, i, h, r, n, and s to the URL specified in r via B (an "Auth Response").
4.2 B redirects the authentication response to RP.
4.3 RP computes s' = HMAC(idp.i.h.r.n, k) over the received idp, i, h, r, and n, and checks whether s' = s. Note that RP can validate s locally only if it has established a shared session key k with IdP in Phase 2.
4.4 If RP omits Phase 2, it sends the authentication response directly to IdP, i.e., not via B.
4.5 IdP answers whether the authentication response is valid.
4.6 If the authentication response is valid, RP signs U in using i as her identifier.

4.3 Protocol Formalization

Since our adversary model assumes that both the RP and IdP are trustworthy and that the integrity of the user machine is guaranteed, the following assumptions are made when formalizing the OpenID protocol:

• Secure discovery process: We assume that the RP knows the endpoint URL of the IdP based on a given OpenID identifier. Thus, the discovery steps (Steps 1.2 and 1.3 in Figure 4.2) are ignored in our model.

• Secure association process: The OpenID protocol uses the DH key exchange protocol to establish a session key between the RP and IdP, but DH is vulnerable to MITM attacks. We do not attempt to address this problem, and thus omit the association steps (Steps 2.1 to 2.5) from the formal model.
Our model assumes that the RP has successfully established a shared key with the IdP and that the authentication response can be validated by the RP locally (i.e., Steps 4.4 and 4.5 are omitted).
• Secure channel between the user and the IdP: We assume that the user-to-IdP communication is protected with SSL, and that the RP redirects the user to the correct IdP for authentication (i.e., phishing attacks are not considered).

These assumptions allow us to derive a concise model that avoids the state explosion problem during verification, and they prevent the known weaknesses from halting the model checker when it reaches an insecure state. In addition, the mechanism for authenticating a user to an IdP (Steps 3.4 and 3.5) is not defined in the OpenID specification. To model user-to-IdP authentication, the following authentication protocol is adopted from the HLPSL documentation:
1. A → B : E(na, k), A sends B a nonce na encrypted with a shared key k
2. B → A : E(nb, k), B sends A another nonce nb, also encrypted with k
3. A → B : E(nb, k1), A computes a new key k1 = H(na.nb) and sends B the value of nb encrypted with k1
The first two messages serve to establish k1, shared between A and B, and the last one serves as a proof that A has the new key k1, so that B can authenticate A using nb. This protocol has been verified to be secure by AVISPA, and thus its use does not affect the outcome of the analysis.

1. UB → RP : i, Login Request (1.1)
2. RP → UB : IdP.i.h.RP, Auth Request (3.1)
3. UB → IdP : IdP.i.h.RP.E(na, kUI), UB-to-IdP authentication (3.2)
4. IdP → UB : E(nb, kUI), k1 = H(na.nb) (3.3 & 3.4)
5. UB → IdP : E(nb, k1), IdP authenticates UB on nb (3.5)
6. IdP → UB : IdP.i.h.RP.n.s, s = HMAC(IdP.i.h.RP.n, kRI), Auth Response (4.1)
7. UB → RP : IdP.i.h.RP.n.s, Assertion Validation (4.2 & 4.3)
Figure 4.3: The Alice-Bob formalization of the OpenID protocol.
The corresponding steps from the sequence diagram are noted at the end of each step.

Figure 4.4: The conceptual model of the HLPSL formalization.

4.3.1 Alice-Bob Formalization

Our formal model combines user U and browser B into a single entity, denoted as UB. Based on the above assumptions and the user-to-IdP authentication protocol, the shared knowledge between the entities is defined as follows: (1) IdP and UB share a secret key kUI and an identifier i, (2) RP shares a secret key kRI and a session handle h with IdP, and (3) RP has no prior knowledge of UB and i.

By removing the omitted steps from the sequence diagram based on our assumptions and using the shared knowledge defined above, we obtain the Alice-Bob formalization illustrated in Figure 4.3. Each step in the A-B notation is annotated with the corresponding steps from the protocol sequence described in Section 4.2. Steps 3 to 5 use the aforementioned authentication protocol to authenticate UB to IdP.

4.3.2 HLPSL Formalization

For a protocol to be verified with AVISPA's back-end model checking engines, it must be encoded in HLPSL, an expressive, modular, role-based formal language that allows for the detailed specification of the protocol in question. An HLPSL model typically includes the roles played in the security protocol, as well as the environment role and the security goals that have to be satisfied.

The conceptual model of our HLPSL formalization is illustrated in Figure 4.4, and the source code and attack traces are listed in Appendix B. Each basic role (i.e., UB, RP, IdP) contains a set of state transition definitions and local variables. Each transition represents the receipt of a message and the sending of a reply message, and the local variables are set during a state transition.
In addition, each basic role contains a set of shared constants defined by the environment role to model the shared knowledge between different roles.

A role in HLPSL uses channels defined by the environment role for sending and receiving messages. As illustrated in Figure 4.4, the message sequences between the roles map one-to-one to the A-B notation defined in Figure 4.3. AVISPA analyzes protocols under the assumption of perfect cryptography and assumes that the protocol messages are exchanged over a network controlled by a Dolev-Yao intruder [DY83]. That is, the intruder can intercept, modify, and generate messages under any party name, but he cannot break cryptography without the decryption key.

The environment role also defines the intruder's initial knowledge, that is, the shared constants initially known to the intruder. In our model, the intruder knows all shared constants except the secret keys shared between basic roles (i.e., kUI between UB and IdP, and kRI between RP and IdP). Starting from this initial knowledge, the intruder gains or derives additional knowledge from the messages he intercepts throughout the execution of the protocol.

An HLPSL model is a state machine, and an AVISPA model checking engine tries to reach all possible states of the protocol to find an insecure state that violates at least one of the protocol's safety properties, referred to as "security goals" in AVISPA. HLPSL supports two types of security goals: secrecy and authentication. Each security goal, declared with a unique constant identifier, is an invariant that must hold in all reachable states. Three special statements in HLPSL are used to specify the conditions of a desired security goal. For secrecy goals, the secret statement specifies which value should be kept secret and among whom; if the intruder learns the secret value, then he has successfully attacked the protocol.
For authentication goals, a pair of statements (witness and request) is used to check that a principal is right in believing that his intended peer is present in the current session and agrees on a certain value. For instance, an authentication goal "A authenticates B on the value of C" can be read as "A believes B is present in the current session and agrees on value C." Typically, C is a fresh value that is unknown to the intruder and unique among concurrent sessions. If an intruder manipulates protocol messages to reach a state in which A and B do not agree on the value of C, or in which the same value C is used in multiple sessions, then the intruder has successfully violated the authentication goal.

Our HLPSL model specifies six security goals based on the Alice-Bob formalization in Figure 4.3. The overall goal of the OpenID protocol is to assert to an RP that the user owns a specific OpenID URL controlled by the IdP. Because the user participates in the authentication process, the OpenID authentication request and response are passed between the RP and the IdP through the user's browser. Thus, when an RP receives an Auth Response, the RP has to assert that the Auth Response was generated by the IdP (goal G1), that the same UB is used for the request and the response (G2), and that the UB has been authenticated by the IdP (G3, G4). On the other hand, when an IdP receives an Auth Request, the IdP has to make sure that the Auth Request originated from the RP (G5), and the RP needs to ensure that the Login Request was initiated by the UB with the user's OpenID identifier (G6).
Therefore, the security goals of our HLPSL model are specified as follows:
G1: RP authenticates IdP on the value of the signature s = HMAC(IdP.i.h.RP.n, kRI).
G2: RP authenticates IdP on the value of UB.
G3: IdP authenticates UB on the value of nb.
G4: The session key k1 = H(na.nb) should be kept secret between UB and IdP.
G5: IdP authenticates RP on the value of the Auth Request (IdP.i.h.RP).
G6: RP authenticates UB on the value of the OpenID identifier i.

A run of the AVISPA model checker found three violated security goals: G2, G5, and G6. The violation of goal G2 reveals that the OpenID protocol lacks contextual bindings between the Auth Request, the Auth Response, and the browser. This means that when an RP receives an Auth Response, the RP cannot assert that the Auth Response was sent from the same browser through which the authentication request was issued. This lack of contextual binding enables many possible attacks when an Auth Response is intercepted by an intruder, such as (1) a session swapping attack that forces the user's browser to initialize a session authenticated as the attacker, and (2) an impersonation attack that impersonates the user by sending the intercepted Auth Response via a browser agent controlled by the attacker. Note that SSL could prevent an MITM attacker from intercepting the Auth Response transmitted over the network, but it could not stop a session swapping attacker who intercepts the Auth Response from his own browser.

The violation of goal G5 indicates that the authenticity and integrity of the Auth Request are not protected by the OpenID protocol. That is, an IdP might accept an Auth Request sent from the intruder, or the Auth Request might be altered during transmission.
This weakness could be exploited in many ways, such as (1) an SSO CSRF attack that forces the victim to log into her RP website by sending a forged Auth Request via the victim's browser, and (2) a parameter forgery attack that manipulates the victim's profile attributes requested by the RP website by modifying the Auth Request within the protocol.

Goal G6 cannot be satisfied either. According to the attack trace, an intruder can initiate a Login Request with the RP and then play role UB for the rest of the communication to violate this goal. This indicates that the authenticity of the Login Request is not guaranteed. This weakness can be exploited with a traditional CSRF technique that initiates a Login Request, using either the GET or the POST method, via the victim's browser to insidiously sign the victim into the RP in order to launch subsequent CSRF attacks.

4.4 Attack Vector Evaluations

In addition to the three variants of SSO CSRF, session swapping, impersonation, and parameter forgery, a replay attack was included in our evaluation in order to assess how many RPs perform the assertion nonce check correctly, as an RP must check the nonce values received from all IdPs. Overall, seven attack vectors were evaluated on 132 real-world RPs.

4.4.1 Manual Evaluations

We describe below how each attack vector was manually evaluated on 20 RP websites. Each evaluation began by selecting an IdP, or entering the OpenID identifier of a test account, on the RP login form to initiate a sign-on process. The protocol messages (i.e., Login Request, Auth Request, and Auth Response) were intercepted by a Firefox extension we designed that allows the investigator to abort or manipulate the intercepted messages.
For attacks that could be launched from a browser agent controlled by the attacker (i.e., replay and impersonation attacks), which allow the attacker to forge and manipulate the HTTP headers including cookies, we designed and implemented a customized browser agent by reusing the GeckoFX web browser control [Sky10]. Note that before each evaluation, all cookies in the browser were removed to reset the browser to its initial state. In addition, a protocol message does not reach the RP if it is "intercepted", but does if it is "captured".

A1: SSO CSRF via Login Request through GET method (exploits G6):
1. Intercept a Login Request and abort the rest of the authentication process.
2. Construct an attack URL from the intercepted Login Request and create an attack page that contains an invisible HTML iframe element whose src attribute is set to the attack URL. If the RP login form uses the HTTP POST method to submit the login request, take the request parameters (key-value pairs) from the HTTP request body and append them to the request URL as query-string parameters to form the attack URL. If the Login Request uses the HTTP GET method, the request URL is used as the attack URL directly. For example:
<iframe style="display:none" src="http://rp.com/login?p1=v1&p2=v2"></iframe>
3. Open another browser and make sure the testing account has logged into the IdP but has not yet logged into the RP. Browse the attack page, and then go to the RP website to check whether the testing account has logged into the RP successfully.
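Step 2 of A1 can be sketched as a small Python helper. This is an illustrative sketch under stated assumptions, not the Firefox extension used in our evaluation: the function names are hypothetical, the intercepted Login Request is assumed to be available as a method, a URL, and a dictionary of body parameters, and the resulting attack only works if the RP also accepts the login parameters over GET.

```python
from __future__ import annotations

from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def attack_url(method: str, url: str, body: dict[str, str] | None = None) -> str:
    """Build the A1 attack URL from an intercepted Login Request (Step 2)."""
    if method.upper() == "GET":
        # A GET Login Request already carries every parameter in its query string.
        return url
    # POST login form: move the body's key-value pairs into the query string.
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((body or {}).items())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


def attack_page(url: str) -> str:
    """Wrap the attack URL in an invisible iframe; '&' is HTML-escaped inside the attribute."""
    return f'<iframe style="display:none" src="{escape(url, quote=True)}"></iframe>'


page = attack_page(attack_url("POST", "http://rp.com/login", {"p1": "v1", "p2": "v2"}))
print(page)  # <iframe style="display:none" src="http://rp.com/login?p1=v1&amp;p2=v2"></iframe>
```

Using urlencode rather than string concatenation keeps parameter values such as OpenID identifiers (which contain ":" and "/") correctly percent-encoded in the attack URL.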
A2: SSO CSRF via Login Request through POST method (exploits G6): The evaluation procedure for this attack is the same as for A1, except that in Step 2 the iframe's src attribute is set to another page, which contains (1) a web form whose action attribute is set to the request URL of the Login Request, with each HTTP query parameter (key-value pair) of the Login Request added to the form as a hidden input field, and (2) a JavaScript snippet that submits the web form automatically when the page is loaded. For example:
<iframe style="display:none" src="http://evil.com/sso_csrf_post.htm"></iframe>
sso_csrf_post.htm:
<body onload="document.forms[0].submit();">
<form action="http://rp.com/login" method="post">
<input type="hidden" name="p1" value="v1">
<input type="hidden" name="p2" value="v2">
...
</form>
</body>

A3: SSO CSRF via Auth Request (exploits G5): Similar to A1, except that an Auth Request instead of a Login Request is intercepted in Step 1. Additionally, in order to reuse the attack URL, the association handle (i.e., parameter assoc_handle) is removed from the intercepted Auth Request before forming the attack URL. Removing the association handle makes the exploit general and reusable, because the association between the RP and the IdP expires after a period of time specified by the IdP, and it might be bound to a specific OpenID identifier.

A4: Parameter Forgery (exploits G5):
1. Capture an Auth Response and log all parameters related to the OpenID Simple Registration or Attribute Exchange extensions. The extension parameters contain profile information of the user, such as email and date of birth.
2. Re-initiate the login process. This time, strip out all extension-related parameters in the Auth Request, and append forged extension parameters to the Auth Response before sending it to the RP website. As a result, the forged extension parameters included in the Auth Response are not signed by the IdP.
When the RP sends the Auth Response to the IdP for validation, the signature will be validated successfully. However, if the RP does not check whether the extension parameters are covered by the signature before using them to identify the current user, then the attacker can log into the RP as the victim user using the forged extension parameters.

A5: Session swapping (exploits G2): The steps for evaluating this attack are the same as for A1, except that this attack intercepts an Auth Response passed from the IdP and uses it as the attack URL in Step 1 (i.e., the Auth Response does not reach the RP), and the testing account has not logged into the RP in Step 3.

A6: Impersonation (exploits G2): Intercept an Auth Response and then send the intercepted Auth Response (including HTTP headers) to the RP via the customized browser agent we designed.

A7: Replay: Similar to A6, except that in this attack the Auth Response is captured instead of intercepted (i.e., the Auth Response reaches the RP).

4.4.2 The OpenID Vulnerability Assessment Tool

To facilitate the vulnerability evaluation process and to enable website developers to assess their RPs, we designed an OpenID vulnerability assessment tool named "OpenIDVAT" in C# .NET. The tool reuses the GeckoFX web browser control [Sky10] for sending HTTP requests and rendering the received HTML content. The original GeckoFX exposes a read-only document object model (DOM) and does not provide the capability to capture and intercept HTTP requests. We modified GeckoFX to provide a writable DOM and to make it capable of observing and blocking HTTP requests.

Figure 4.5: Main components of OpenIDVAT.

Figure 4.5 illustrates the main components of OpenIDVAT. The primary user interface is the GeckoFX web browser control augmented with a writable DOM and an HTTP interceptor. The "Auto Form Filler" component fills and submits the IdP login form automatically using the test account.
It also fills in the OpenID identifier field on the RP login form to reduce the amount of user input. Each vulnerability is assessed by one assessment class, which is a software module that implements a pre-defined interface. The tool can be extended with new assessment classes, which can be implemented by inheriting from an existing module that contains most of the functions related to the assessment tasks.

                          D1        D2        Total
No. of RPs                N=103     N=29      N=132
Fully Secured             0%        10%       2%
Proxy Service             11%       31%       15%
SSL Protected             12%       45%       19%
SSO CSRF                  88%       16%       81%
A2: POST                  73%       14%       67%
A1: GET                   44%       9%        41%
A3: Auth Req              69%       13%       64%
A5: Session Swap          76%       83%       77%
A6: Impersonation         88%       55%       80%
A7: Replay                10%       21%       12%
Support Extension         N=76      N=26      N=102
A4: Parameter Forgery     54%       7%        45%
Table 4.2: The results of the empirical RP evaluation. The "SSO CSRF" row denotes the percentage of RPs that are vulnerable to at least one variant of SSO CSRF attacks.

To assess whether an RP is vulnerable, the evaluator first signs into the RP via OpenIDVAT using a pre-configured testing OpenID account. OpenIDVAT records the mouse clicks that initiate the login process and then captures the protocol messages. Once logged in, the evaluator is instructed to start an assessment process.
For each vulnerability under assessment, OpenIDVAT (1) resets the browser state by removing all cookies from the GeckoFX web browser control, (2) retrieves the captured protocol messages from logs, or replays the mouse clicks to initiate a new login request and then captures or intercepts the required messages, (3) simulates switching to the victim's browser by clearing all cookies, (4) constructs and sends attack messages via GeckoFX, and (5) prompts the user to check whether the account under test has signed into the RP successfully.

4.4.3 Evaluation of Real-world RPs

To find a representative sample of RP websites, we went through the OpenID site directory on myopenid.com (denoted as "D1", 249 entries) and the Google Top 1000 websites ("D2", 1000 entries). We excluded the listed websites that are not written in English (D1: 20, D2: 527), are not relying parties (D1: 88, D2: 442), or are not accessible (D1: 32, D2: 2). Six RP websites appeared on both lists, and they were removed from D1 to avoid double-counting. In total, OpenIDVAT was employed to evaluate 132 RP websites (103 from D1 and 29 from D2). The GeckoFX web browser control does not support popup windows; thus, for RPs that use a popup window during OpenID authentication, the protocol messages were examined manually.

We found that 15% of RP websites use a proxy service (e.g., Janrain engine, http://www.janrain.com/products/rpx; Gigya, http://www.gigya.com/) for OpenID authentication. The proxy service performs the OpenID communication on behalf of the website, requests and stores the users' profile attributes, and then returns an access token that the website uses to retrieve the user's profile data through direct communication with the proxy service (i.e., not through the browser). Further investigation revealed that although the communication between the proxy service and the IdP is secure, the access token returned to the RP may not be protected.
If the token is not SSL-protected, the RP is subject to impersonation and replay attacks (within 5 to 10 minutes) in addition to session swapping and SSO CSRF attacks.

As shown in Table 4.2, the majority of RPs (98%) are vulnerable to at least one attack. A significant percentage of D2 RPs utilize a proxy service (D1 11%, D2 31%) and employ SSL to protect the communication channel (D1 12%, D2 45%). RPs listed in D2 are much more resilient to SSO CSRF and parameter forgery than D1 RPs, but many of them are vulnerable to session swapping, impersonation, or replay attacks due to the lack of protection on the access tokens returned from the proxy service. In addition, we found that 33% of RPs employed a CSRF protection mechanism to protect their login form submitted via the POST method, but 44% of them (D1 61%, D2 13%) failed to prevent SSO CSRF via the GET method or through an Auth Request. Furthermore, 77% of RPs support the OpenID Simple Registration or Attribute Exchange extension, but we found that the extension parameters can be forged on 45% of these websites.

4.5 Defense Mechanisms

The lack of security guarantees in the OpenID protocol means that RP websites need to employ additional countermeasures. We aimed to satisfy the following properties when designing our defense mechanisms:
• Completeness: The countermeasure must address all weaknesses uncovered by our formal model.
• Compatibility: The protection mechanism must be compatible with the existing OpenID protocol and must not require modifications to IdPs or browsers.
• Scalability: Statelessness is a desirable property of the defense mechanism. The countermeasure should not require RPs to maintain additional state on the server in order to be effective.
• Simplicity: The countermeasure should be easy to implement and deploy.
In particular, it should only use cryptographic functions (i.e., HMAC and DH key exchange) and data that are readily accessible to RPs.

To eliminate the uncovered weaknesses, we revised the formal model so that (1) the Auth Request is signed by the RP, with UB included in the signature, and (2) the Login Request is signed by UB. Figure 4.6 illustrates the revised protocol in A-B notation, with boldfaced elements (the tokens t1 and t2) showing the changes. The revised model was encoded in HLPSL and verified by AVISPA to be secure with respect to the weaknesses described in Section 4.4.

1. UB → RP : i.t1, t1 = HMAC(UB.i, kRP), Login Request
2. RP → UB : IdP.i.h.RP.t2, t2 = HMAC(UB.IdP.i.h.RP, kRI), Auth Request
3. UB → IdP : IdP.i.h.RP.t2.E(na, kUI), UB-to-IdP authentication
4. IdP → UB : E(nb, kUI), k1 = H(na.nb)
5. UB → IdP : E(nb, k1), IdP authenticates UB on nb
6. IdP → UB : IdP.i.h.RP.t2.n.s, s = HMAC(IdP.i.h.RP.t2.n, kRI), Auth Response
7. UB → RP : IdP.i.h.RP.t2.n.s
Figure 4.6: The revised OpenID protocol in Alice-Bob notation. The changes are shown in boldface.

In order for our countermeasures to be easily implemented and deployed by RPs, the defense mechanisms were designed based on the revised model, but separated with respect to different adversary models. SSL prevents network attackers from intercepting or altering network traffic, but it cannot stop attacks launched from the victim's browser, such as SSO CSRF and session swapping attacks. Hence, a defense mechanism complementary to SSL is required to mitigate attacks launched by Web attackers.
On the other hand, as SSL introduces unwanted complications, and only 19% of RP websites in our evaluation employed SSL, an alternative to SSL is needed to prevent network attackers from impersonating the victim via a sniffed session cookie or an intercepted Auth Response.

4.5.1 The Web Attacker Defense Mechanism

As a countermeasure complementary to SSL, we propose the following defense mechanism based on the revised model:
1. When rendering a login form, RP generates a token t1 = HMAC(sid, kRP) and appends it to the login form as a hidden form field. Here, sid is the HTTP session identifier, and kRP is an application- or session-level secret key generated by RP. Token t1 is used to ensure that the Login Request originated from the RP itself.
2. Upon receiving a Login Request, RP computes t1' = HMAC(sid, kRP) and checks whether t1' equals the t1 in the request. If it does, RP initiates an Auth Request with the parameter t2 = HMAC(sid.idp.i.h.r, kRP) appended to the return_to URL of the Auth Request.
3. Upon receiving an Auth Response, RP extracts t2 from the return_to URL, computes t2' = HMAC(sid.idp.i.h.r, kRP), and checks whether t2' = t2, in addition to validating the Auth Response signature.

Our Web attacker defense mechanism is stateless and designed to be implemented completely on the RP server side. In addition, all required cryptographic functions (i.e., HMAC) and data (i.e., the Auth Request and the session cookie) are readily accessible to the RP. The mitigation approach uses an HMAC function to bind the session identifier to the protocol messages, which provides contextual binding and ensures the integrity and authenticity of the authentication request. Using an HMAC code as the validation token avoids exposing the session identifier.
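The three server-side steps above can be sketched in Python. This is a minimal illustration under our own assumptions, not the Java reference implementation described later in this chapter: SHA-256 as the HMAC hash, a length-prefixed encoding in place of the "." concatenation (so that fields containing dots, such as URLs, cannot be shifted into one another), constant-time comparison, and all function names and the key variable K_RP are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical application-level secret k_RP; a deployed RP would load a persistent key.
K_RP = secrets.token_bytes(32)


def _hmac(key: bytes, *fields: str) -> str:
    """HMAC-SHA256 over length-prefixed fields, standing in for the '.' concatenation."""
    msg = b"".join(len(f.encode()).to_bytes(4, "big") + f.encode() for f in fields)
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def login_token(sid: str, key: bytes = K_RP) -> str:
    """Step 1: t1 = HMAC(sid, k_RP), rendered as a hidden field of the login form."""
    return _hmac(key, sid)


def login_request_valid(sid: str, t1: str, key: bytes = K_RP) -> bool:
    """Step 2, first check: recompute t1' for the current session and compare."""
    return hmac.compare_digest(login_token(sid, key), t1)


def auth_request_token(sid: str, idp: str, i: str, h: str, r: str, key: bytes = K_RP) -> str:
    """Step 2: t2 = HMAC(sid.idp.i.h.r, k_RP), appended to the return_to URL r."""
    return _hmac(key, sid, idp, i, h, r)


def auth_response_valid(sid: str, idp: str, i: str, h: str, r: str, t2: str,
                        key: bytes = K_RP) -> bool:
    """Step 3: recompute t2' (r excludes t2); the IdP signature is checked separately."""
    return hmac.compare_digest(auth_request_token(sid, idp, i, h, r, key), t2)


req = ("https://idp.example/op", "https://alice.example/", "h1", "https://rp.example/return")
t2 = auth_request_token("attacker-sid", *req)
print(auth_response_valid("attacker-sid", *req, t2))  # True
print(auth_response_valid("victim-sid", *req, t2))    # False: response swapped into another session
```

Binding t2 to sid is what defeats session swapping in this sketch: an Auth Response started in the attacker's session carries a t2 that does not verify under the victim's session identifier.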
In addition, for RPs that support an OpenID extension, the extension request parameters can be included in the return_to URL so that they are protected by the defense mechanism.

Most web application development frameworks support automatic session management, which makes the session identifier readily accessible to the RP implementation. Websites that do not issue a session before authentication need to initiate an "unauthenticated" session (including setting the session cookie) before rendering the login form, and then switch to an authenticated session with a new session identifier after a valid assertion is received. Also note that OpenID 2.0 allows an end user to enter an IdP's OpenID Identifier (e.g., "https://yahoo.com" for Yahoo) instead of her own OpenID. When an IdP Identifier is entered, the i in the Auth Request is a constant string⁵ defined by the OpenID specification, and the i in the Auth Response is the user's OpenID URL. In this case, RP has to use the constant identifier defined by OpenID when initiating the Auth Request in Step 2 and when computing t2' in Step 3.

Our defense mechanism prevents SSO CSRF via Login Request attacks (A1 and A2), as an attacker is not able to compute the validation token t1 without knowing the session identifier and the RP's secret key. SSO CSRF via Auth Request (A3) and session swapping (A5) attacks are mitigated as well, because the session identifier in the attacker's browser session differs from the one in the victim's browser. In addition, the integrity of the Auth Request is guaranteed, which prevents parameter forgery (A4): the Auth Request is accompanied by an HMAC, so any modification to it would be detected in Step 3. Impersonation attacks via an intercepted Auth Response (A6) can be prevented when the communication between the browser and the RP website is SSL-protected. However, SSL imposes unwanted side effects on website owners, such as computation overhead, non-cacheable latency, and mixed content warnings.
In addition, even if the login process is protected by SSL, if the attacker manages to find the session cookie in a subsequent communication that is not secured by HTTPS (e.g., pages, graphics, JavaScript files, style sheets), the attacker could use the eavesdropped session cookie to impersonate the victim for the length of the session. We found that only 19% of RP websites in our evaluation employed SSL, and 84% of those were vulnerable to session hijacking via a session cookie eavesdropped after login. This speaks to the need for an alternative defense mechanism that prevents impersonation attacks without requiring RPs to employ SSL.

⁵ http://specs.openid.net/auth/2.0/identifier_select

4.5.2 The MITM Countermeasure

The stateless nature of the HTTP protocol makes it difficult to be sure whether two HTTP requests originated from the same client. Web applications typically use browser cookies to identify each instance of their browser clients. However, without the confidentiality and integrity protections provided by SSL, browser cookies can be eavesdropped on or altered by network attackers. In the case of the OpenID protocol, an MITM network attacker can intercept an Auth Response together with the corresponding session cookie, and then replay them from a browser agent controlled by the attacker in order to impersonate the victim. Moreover, even if the login process is completely secured by SSL, if the session identifier is revealed in any subsequent HTTP request, a passive network attacker can simply eavesdrop on the session identifier to hijack the session after the user has successfully logged into the RP website.

Figure 4.7: The MITM defense mechanism establishes a DH session key ($g^{ab} \bmod p$) between the browser and the RP server during the OpenID authentication process. Here, g is the DH generator, p is the modulus, and a and b are random DH private keys for the browser and the RP server, respectively.
One intuitive solution to the session identifier eavesdropping problem is to associate web sessions with the user's IP address at the time of session initiation. If a session cookie is then received from a different IP address, the web server can detect it. Unfortunately, many web users' computers are located behind a web proxy server or a Network Address Translator, so they effectively share the same IP address when surfing the web. From the server's point of view, if an attacker has managed to sniff a victim's session cookie behind the same network router, there is no detectable difference between a legitimate HTTP request and one sent by the attacker.

An impersonation attack is difficult to mitigate when there is no shared secret between the browser and the RP server. The OpenID protocol and our Web attacker countermeasure use an HMAC message authentication code to verify both the data integrity and the authenticity of a message. Similarly, an HMAC code accompanying each HTTP request could provide an authenticity and integrity guarantee that prevents impersonation attacks via eavesdropped session cookies. However, an HMAC function needs a secret key, and the main challenge is how to derive a secret shared between the browser and the server in the presence of an MITM attacker. Thus, the goal of our impersonation defense mechanism is to derive a session key shared between the browser and the RP server without requiring the RP to employ SSL. With the shared key, the client can encrypt sensitive information and compute an HMAC code for each subsequent HTTP request to prevent impersonation attacks launched by network attackers. To establish a shared secret between the browser and the RP server during the OpenID authentication process, we propose the following scheme (illustrated in Figure 4.7):
1. Before submitting the RP login form to the server, RP uses client-side JavaScript code to establish a Diffie-Hellman session key with the server, and stores the session key in the browser's local storage (e.g., localStorage in HTML5, userData for IE 5+, and window.globalStorage for Firefox 2+) or as a fragment identifier of the action URL of the login form. A fragment identifier is the portion of a URL that follows the # character; it is never sent over the network, and is only used by the browser to scroll to the identified HTML element. Note that the client-side session key kC might differ from the server-side key kS if an MITM attacker intercepted the DH request and performed two distinct DH key exchanges with the client and the server.
2. Upon receiving a Login Request, RP replies with a page containing an Auth Request and JavaScript code that (1) retrieves kC from the fragment identifier using document.location.hash, or from the browser's local storage, (2) appends a parameter t3 = HMAC(idp.i.h.r, kC) to r, (3) appends kC as a fragment identifier of r if local storage is not supported by the browser, and (4) sends the Auth Request to IdP using window.location or an HTML form submission.
3. Upon receiving an Auth Response, RP computes t3' = HMAC(idp.i.h.r, kS) using kS (excluding t3 from the return_to URL) and checks whether t3' = t3, in addition to validating the assertion signature. Note that t3 is covered by the IdP signature, as it is appended to the return_to URL.

The DH key exchange protocol does not authenticate the communicating parties and is thus vulnerable to an MITM attack. As illustrated in Figure 4.8, an MITM attacker could perform two distinct DH key exchanges with the client and the RP server to derive two session keys (kC and kS), one with each party.
The attacker can then use the derived session keys to decrypt the encrypted messages between the client and the server, or to generate HMAC codes on behalf of each party.

Figure 4.8: The MITM defense mechanism in the presence of an MITM attacker between the browser and the RP server. The OpenID authentication protocol will fail if the MITM attacker attempts to interfere with the DH key exchange. If the Auth Response is successfully validated, then the DH key shared by the browser and the RP is unknown to the attacker.

Since the DH key exchange by itself is vulnerable to an MITM attack, our countermeasure uses the assertion signature generated by the IdP to prevent an MITM attacker from interfering with the DH key agreement protocol. Our defense mechanism is based on the following observation: as the DH private keys a and b of the client and the server are unknown to the MITM attacker, the two session keys are derived from different values if an MITM attacker is present in the key agreement protocol, namely $k_C = g^{ac} \bmod p$ and $k_S = g^{bc} \bmod p$, where c is the attacker's DH private key. In addition, given a message m, if kC and kS differ, then the corresponding HMAC codes differ as well with overwhelming probability (i.e., $\mathrm{HMAC}(m, k_C) \neq \mathrm{HMAC}(m, k_S)$). In our defense mechanism, the client appends a validation token t3 = HMAC(idp.i.h.r, kC) to the return_to URL using kC (Step 2), and the RP verifies the token using kS when an Auth Response is received (Step 3). To pass the token validation performed by the RP in Step 3, the attacker must replace t3 in the intercepted Auth Response with t3' = HMAC(idp.i.h.r, kS). However, replacing t3 will cause the signature validation performed by the RP to fail, as t3 is covered by the signature. Therefore, if the Auth Response is successfully validated, the DH key shared by the browser and the RP is unknown to the attacker.
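The argument above can be checked with a small Python simulation. It is a toy sketch, not the BigInt/jshash client code of the reference implementation, and it rests on several assumptions of ours: the group (the Mersenne prime 2^127 - 1 with generator 3) is far too weak for real use, the HMAC key is assumed to be derived by hashing the DH shared secret with SHA-256, the IdP signature is not modeled (we only assume, as argued above, that the attacker cannot change t3 once it is signed), and all names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Toy DH group: the Mersenne prime 2**127 - 1 with generator 3 (illustration only).
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return a random DH private key x in [2, P-2] and the public value G^x mod P."""
    x = secrets.randbelow(P - 3) + 2
    return x, pow(G, x, P)


def session_key(peer_public: int, own_private: int) -> bytes:
    """Hash the DH shared secret (peer_public^own_private mod P) into a 32-byte HMAC key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def token_t3(key: bytes, idp: str, i: str, h: str, r: str) -> str:
    """t3 = HMAC(idp.i.h.r, k), with length-prefixed fields standing in for '.'."""
    msg = b"".join(len(f.encode()).to_bytes(4, "big") + f.encode() for f in (idp, i, h, r))
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def login_succeeds(mitm: bool) -> bool:
    """Run the key exchange and the Step 2 / Step 3 token check, with or without an MITM."""
    a, pub_a = keypair()  # browser
    b, pub_b = keypair()  # RP server
    if mitm:
        c, pub_c = keypair()  # attacker answers each side with its own public value
        k_client = session_key(pub_c, a)  # g^(ac) mod p
        k_server = session_key(pub_c, b)  # g^(bc) mod p
    else:
        k_client = session_key(pub_b, a)  # g^(ab) mod p
        k_server = session_key(pub_a, b)  # g^(ab) mod p
    request = ("https://idp.example/op", "https://alice.example/", "h1", "https://rp.example/return")
    t3 = token_t3(k_client, *request)  # Step 2: computed in the browser with k_C
    # t3 travels inside return_to and is covered by the IdP signature, so it cannot be swapped.
    return hmac.compare_digest(token_t3(k_server, *request), t3)  # Step 3: checked with k_S


print(login_succeeds(mitm=False))  # True
print(login_succeeds(mitm=True))   # False (with overwhelming probability)
```

In the honest run both sides derive $g^{ab} \bmod p$ and the token verifies; once the attacker splits the exchange, the browser and the RP hold different keys and the RP's check in Step 3 fails.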
Our countermeasure requires the communication between the browser and the IdP to be SSL-protected, to prevent the attacker from replacing t3 with t3' = HMAC(idp.i.h.r, kS) in Step 2. This requirement is feasible because, to the best of our knowledge, all major IdPs support authentication over SSL.

Once the DH session key has been established, it can be used to protect the authenticity, confidentiality, and integrity of subsequent communications after login. To prevent an MITM attacker from impersonating the victim via a sniffed session identifier, RPs could use the DH session key to encrypt sensitive data and to attach a timestamp and an HMAC to every subsequent HTTP request. The RP should respond only to requests that carry a valid timestamp and HMAC authentication code, in addition to a valid session cookie (which may have been sniffed by an attacker). This is similar to SessionLock [Adi08b] and to other web-based APIs, such as the Google and Facebook Platform APIs, but does not require SSL support from the RP websites.

4.5.3 Reference Implementation

To evaluate the proposed defense mechanisms from a computational-complexity point of view, we developed a reference implementation. We first used the OpenID4Java [Buf09] library to add OpenID support to an open-source J2EE web application (the BookStore from http://gotocode.com), and then implemented the countermeasures in that web application. We have made the reference implementation of the countermeasures publicly available [Sun11]. The Web attacker defense mechanism was implemented entirely on the server side, using the javax.crypto.Mac class to compute and validate the HMAC tokens.
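The post-login request protection proposed above (a timestamp and an HMAC keyed with the DH session key on every request) could look roughly like the following Python sketch. It is a hypothetical illustration, not SessionLock's or the reference implementation's code: the canonical request string, HMAC-SHA256, and the 300-second freshness window are all assumptions. An attacker who has sniffed the session cookie but never learned the DH key cannot produce a valid MAC.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # hypothetical freshness window


def request_mac(key, method, path, body, timestamp):
    """HMAC-SHA256 over a hypothetical canonical form of one HTTP request."""
    # Body goes last so it may contain newlines without making the string
    # ambiguous (method and path are assumed to contain none).
    canonical = "\n".join([method, path, str(timestamp), body]).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def rp_verify(key, method, path, body, timestamp, mac, now=None):
    """RP side: serve the request only if the timestamp is fresh and the MAC is valid."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False                          # stale or future-dated: possible replay
    expected = request_mac(key, method, path, body, timestamp)
    return hmac.compare_digest(expected, mac)


k = bytes(32)        # placeholder for the DH session key shared with the browser
ts = 1_700_000_000
mac = request_mac(k, "POST", "/cart/checkout", "item=42", ts)
print(rp_verify(k, "POST", "/cart/checkout", "item=42", ts, mac, now=ts + 5))    # True
print(rp_verify(k, "POST", "/cart/checkout", "item=43", ts, mac, now=ts + 5))    # False: body altered
print(rp_verify(k, "POST", "/cart/checkout", "item=42", ts, mac, now=ts + 900))  # False: stale
```

The session cookie check is omitted here; in the scheme described above it would still be performed alongside the MAC check.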
We used the DH session key exchanged by the browser and the RP server as the key for the HMAC function. Both the Login Request and the Auth Request validation tokens are computed in 10 lines of code (LOC).

For the server-side implementation of our MITM defense mechanism, the Java BigInteger class is used to compute the DH session key with the client (8 LOC). To validate the HMAC token computed by the browser, the Java Mac class is used again (10 LOC). On the client side, the XMLHttpRequest object is used to initiate a DH key exchange with the server, and the following JavaScript libraries were used throughout the reference implementation:

• BigInt (http://leemon.com/crypto/BigInt.html): computes the DH session key kC (7 LOC).
• jStorage (http://www.jstorage.info): stores and retrieves the DH session key in the browser's local storage (1 LOC).
• jshash (http://pajhome.org.uk/crypt/md5/scripts.html): computes the HMAC authentication token for the Auth Request (15 LOC).

4.5.4 Limitations

From a computational-complexity point of view, our Web attacker defense mechanism could be easily implemented by RPs, because the HMAC function and all required data are readily accessible to them. On the other hand, the MITM countermeasure requires JavaScript to be enabled in the browser, and the client-side code needs to be written in a cross-browser manner. In addition, although the MITM attacker cannot impersonate the user by initiating requests on behalf of the victim, the attacker could still read all unencrypted data between the client and the server, and alter the contents of the returned web pages. While this threat exists and is important, its prevention and mitigation are outside the scope of this work.

4.6 Summary

We conducted a formal model checking analysis of the OpenID 2.0 protocol and an empirical evaluation of 132 OpenID-enabled websites. To summarize, the work presented in this chapter makes the following contributions:

• A formal specification and analysis of the OpenID protocol that identifies three weaknesses and correlates them with six types of possible attack vectors.
• A semi-automatic OpenID vulnerability assessment tool.
• An empirical evaluation of 132 OpenID-enabled websites.
• Two proposed and evaluated countermeasures for the attacks that exploit the uncovered weaknesses in the protocol.

For an HTTP-redirection-based protocol in which the protocol messages are passed through the browser, our analysis shows that the RP has to ensure that the authentication request originated from the RP website itself and was not altered during transmission, and that the authentication assertion is passed from the same browser through which the request was issued. We provide a simple and scalable defense mechanism for RPs to ensure the authenticity and integrity of the protocol messages. In addition, for those RPs that find deploying SSL impractical, the MITM countermeasure we recommended can be used as an alternative. This is important because, when the authenticity and integrity of HTTP requests are not protected, impersonation attacks remain possible and easy to launch even after OpenID authentication. Nevertheless, we suggest that future development of the OpenID protocol provide authenticity, confidentiality, and integrity protection directly in the protocol, to free RPs from having to adopt ad hoc defense mechanisms.

Chapter 5
Empirical Security Analysis of OAuth SSO Systems

Millions of web users today employ their Facebook accounts to sign into more than one million relying party (RP) websites. This web SSO scheme is enabled by OAuth 2.0 [HLRH11], a web resource authorization protocol that has been adopted by major service providers. The OAuth 2.0 protocol has been proven secure using several formal methods, but whether it is indeed secure in practice remains an open question.
In this chapter, we present an empirical security analysis of OAuth SSO systems that aims to understand (1) what security weaknesses exist in real-world OAuth-based SSO implementations, (2) their fundamental enabling causes and consequences, (3) how prevalent they are, and (4) how to prevent them in a practical way. These issues are still poorly understood by researchers and practitioners.

To answer these questions, we examined the implementations of three major IdPs (Facebook, Microsoft, and Google) and 96 Facebook RPs listed in Google's Top 1,000 Websites [Goo11] that provide an English-language user experience. Our approach treated IdPs and RPs as black boxes and relied on analyzing the HTTP messages passing through the browser during an SSO login session to explore potential exploit opportunities. In particular, as OAuth-based SSO systems are built upon the existing web infrastructure, we aimed to understand how prevalent, well-known web attack vectors (e.g., network eavesdropping, cross-site scripting (XSS), and cross-site request forgery (CSRF)) can be leveraged by an adversary, individually or collectively, to compromise user accounts on IdPs and RPs. For each uncovered vulnerability, an exploit was designed and tested using a set of semi-automatic evaluation tools that we implemented to avoid errors introduced by manual inspection.

One of our key findings is that the confidentiality of the temporary secret key to the user's accounts can be compromised. In OAuth, an access token, which represents the scope and duration of a resource authorization, is the temporary secret key to the user's accounts on both RP and IdP websites, and any party in possession of an access token can assume the same rights that the resource owner granted to the token. Like a capability, if forged or copied, it allows an adversary to obtain unauthorized access.
Our analysis reveals that, although the OAuth protocol itself is secure, the confidentiality of access tokens can be compromised in several ways.

First, the OAuth protocol is designed specifically to prevent access tokens from being exposed on the network (further discussed in Section 5.1), and yet we found that many access tokens obtained on the browser side are transmitted in unprotected form to the RP server side for the purpose of synchronizing the authentication state. Moreover, to simplify access, IdPs' JavaScript SDKs or RPs themselves store access tokens in HTTP cookies, and hence open the tokens to a wide range of attacks (e.g., network eavesdropping and XSS cookie theft). Surprisingly, our evaluation shows that only 21% of RPs employ SSL to protect SSO sessions, even though about half of the tested RPs protect their traditional login forms with SSL.

Second, and more interestingly, access tokens can be stolen on most (91%) of the evaluated RPs if an adversary can exploit an XSS vulnerability on any page of the RP website. Clearly, an XSS vulnerability on the login page of an RP that obtains access tokens on the browser side (i.e., client-flow) could allow an adversary to steal access tokens during the SSO process. However, our test exploit also succeeded on RPs that obtain access tokens only through direct communication with the IdP (i.e., server-flow, not via the browser), regardless of whether the user had already logged into the RP website, and even when the redirect URL was SSL-protected.
XSS vulnerabilities are prevalent [OWA10, BBGM10], and their complete mitigation has been shown to be difficult [CLZS11, HLM+11, SML11, TLV09, NSS09, RV09].

Third, even if the RP website itself is free from XSS vulnerabilities, cross-site access token theft could be carried out by leveraging certain vulnerabilities in browsers. We analyzed and tested two such exploit scenarios, in which the vulnerable browsers are still used by about 10% of web users [W3C12]. The first exploit executes a token theft script embedded in an image file by leveraging the browser's content-sniffing algorithm [BCS09]. The second one steals an access token by sending a forged authorization request through a script element and then extracting the token via the onerror event handler, which is subject to a cross-origin vulnerability [OSV10].

In addition to access tokens, our evaluation results show that an attacker could gain complete control of the victim's account on many RPs (64%) by sending a forged SSO credential (i.e., data used by the RP's server-side program logic to identify the current SSO user) to the RP's sign-in endpoint through a user-agent controlled by the attacker. Interestingly, some RPs obtain the user's IdP account profile (e.g., Facebook account identifier, email, user name) in the user's browser using OAuth client-flow, and then pass it as an SSO credential to the sign-in endpoint on the server side to identify the user. However, this allows an attacker to log in as the victim on the RP by simply sending the victim's publicly accessible Facebook account identifier to the RP's sign-in endpoint.

Unlike logic flaws, the fundamental causes of the uncovered vulnerabilities cannot simply be removed with a software patch.
Our analysis reveals that the uncovered weaknesses are caused by a combination of implementation simplicity features offered by the design of OAuth 2.0 and by IdP implementations, such as the removal of digital signatures from the protocol specification, the support for client-flow, and an "automatic authorization granting" feature. While these simplicity features can be problematic for security, they are what allow OAuth SSO to achieve rapid and widespread adoption.

We aimed to design practical mitigation mechanisms that could prevent or reduce the uncovered threats without sacrificing simplicity. To be practical, our proposed improvements do not require modifications to the OAuth protocol or to browsers, and can be adopted by IdPs and RPs gradually and separately. Moreover, the suggested recommendations do not require cryptographic operations from RPs, because understanding the details of signature algorithms and how to construct and sign their base strings is a common source of problems for many SSO RP developers [She11].

Figure 5.1: The server-flow protocol sequences.

The rest of the chapter is organized as follows. The next section introduces the OAuth 2.0 protocol, and Section 5.2 provides an overview of our approach. Section 5.3 presents the evaluation procedures and results. In Section 5.4, the implications of our results are discussed. We describe our proposed countermeasures in Section 5.5, and summarize the chapter and outline future work in Section 5.6.

5.1 How OAuth 2.0 Works

OAuth-based SSO systems are based on browser redirection: an RP redirects the user's browser to an IdP that interacts with the user before redirecting the user back to the RP website. The IdP authenticates the user, identifies the RP to the user, and asks for permission to grant the RP access to resources and services on behalf of the user.
Once the requested permissions are granted, the user is redirected back to the RP with an access token that represents the granted permissions. With the authorized access token, the RP then calls web APIs published by the IdP to access the user's profile attributes.

The OAuth 2.0 specification defines two flows for RPs to obtain access tokens: server-flow (known as the "Authorization Code Grant" in the specification), intended for web applications that receive access tokens in their server-side program logic; and client-flow (known as the "Implicit Grant"), intended for JavaScript applications running in a web browser. Figure 5.1 illustrates the following steps, which demonstrate how server-flow works:

1. User U clicks on the social login button, and the browser B sends this login HTTP request to RP.
2. RP sends response_type=code, client ID i (a random unique RP identifier assigned during registration with the IdP), requested permission scope p, and a redirect URL r to IdP via B to obtain an authorization response. The redirect URL r is where IdP should return the response to RP (via B). RP could also include an optional state parameter a, which IdP appends to r when redirecting U back to RP, to maintain state between the request and the response. All information in the authorization request is publicly known to an adversary.
3. B sends response_type=code, i, p, r, and the optional a to IdP. IdP checks i, p, and r against its own local storage.
4. IdP presents a login form to authenticate the user. This step could be omitted if U has already authenticated in the same browser session.
5. U provides her credentials to authenticate with IdP, and then consents to the release of her profile information. The consent step could be omitted if p has been granted by U before.
6. IdP generates an authorization code c, and then redirects B to r with c and a (if present) appended as parameters.
7. B sends c and a to r on RP.
8. RP sends i, r, c, and a client secret s (established during registration with the IdP) to IdP's token exchange endpoint through direct communication (i.e., not via B).
9. IdP checks i, r, c, and s, and returns an access token t to RP.
10. RP makes a web API call to IdP with t.
11. IdP validates t and returns U's profile attributes for RP to create an authenticated session.

The client-flow is designed for applications that cannot embed a secret key, such as JavaScript clients. The access token is returned directly in the redirect URI, and its security is handled in two ways: (1) the IdP validates whether the redirect URI matches a pre-registered URL, to ensure that the access token is not sent to unauthorized parties; and (2) the token itself is appended as a URI fragment (#) of the redirect URI, so that the browser never sends it to the server, which prevents the token from being exposed on the network.

Figure 5.2: The client-flow protocol sequences.

Figure 5.2 illustrates how client-flow works:

1. User U initiates an SSO process by clicking on the social login button rendered by RP.
2. B sends response_type=token, client ID i, permission scope p, redirect URL r, and an optional state parameter a to IdP.
3. Same as server-flow step 4 (i.e., authentication).
4. Same as server-flow step 5 (i.e., authorization).
5. IdP returns an access token t, appended as a URI fragment of r, to RP via B. The state parameter a, if present, is appended as a query parameter.
6. B sends a to r on RP. Note that B retains the URI fragment locally and does not include t in the request to RP.
7. RP returns a web page containing a script to B. The script extracts t from the fragment using a JavaScript command such as document.location.hash.
8. With t, the script could call IdP's web API to retrieve U's profile on the client side and then send U's profile to RP's sign-in endpoint; or the script may send t to RP directly, and RP then retrieves U's profile on the server side.

5.2 Approach

Our overall approach consists of two empirical studies that examine a representative sample of the most popular OAuth SSO implementations: an exploratory study, which analyzes potential threats users face when using OAuth SSO for login, and a confirmatory study, which evaluates how prevalent those uncovered threats are. Throughout both studies, we investigated the root causes of those threats in order to design effective and practical protection mechanisms.

We examined the implementations of three high-profile IdPs: Facebook, Microsoft, and Google. We could not evaluate Yahoo and Twitter, as they were using OAuth 1.0 at the time of writing. For the sample of RP websites, we looked through the list of Google's Top 1,000 Most-Visited Websites [Goo11]. We excluded non-English websites (527), and chose only websites that support the use of Facebook accounts for login (96), because Google's OAuth 2.0 implementation was still experimental and Microsoft's implementation had just been released.

On December 13th, 2011, Facebook released a "breaking change" to its JavaScript SDK. The updated SDK uses a signed authorization code in place of an access token for the cookie set by the SDK library [Cai11]. This change avoids exposing the access token on the network, but it also breaks the existing SSO functions of RP websites that rely on the token stored in the cookie.
This particular event gave us an opportunity to investigate how client-flow RPs handle SSO without access tokens in cookies, and whether their coping strategies introduce potential risks.

5.2.1 Adversary Model

We assume that the user's browser and computer are not compromised, that the IdP and RP are benign, and that the communication between the RP and IdP is secured. In addition, our threat model assumes that the IdP guarantees the confidentiality, integrity, and availability of OAuth-related credentials (e.g., access token, authorization code, client secret). In our adversary model, the goal of an adversary is to gain unauthorized access to the victim user's personal data on the IdP or RP website. We consider three types of adversaries, which vary in their attack capabilities:

• A web attacker can post comments that include static content (e.g., images or style sheets) on a benign website, set up a malicious website, send malicious links via spam or an ad network, and exploit web vulnerabilities at RP websites. Malicious content crafted by a web attacker can cause the browser to issue HTTP requests to RP and IdP websites using both GET and POST methods, or to execute scripts implanted by the attacker.
• A passive network attacker can sniff unencrypted network traffic between the browser and the RP (e.g., on an unsecured Wi-Fi network). We assume that the client's DNS/ARP function is intact, and hence do not consider man-in-the-middle (MITM) network attackers. An MITM attacker can alter the script of a redirect URI to steal access tokens directly, which is an obvious threat that has already been discussed in the "OAuth Threat Model" (Section 4.4.2.4).
• A remote attacker can send and modify HTTP request and response messages to RPs and IdPs from his browser.
The attack goal is to log in as the victim user on the RP website right from the attacker's machine.

5.2.2 Methodology

Academic researchers undertaking a security analysis of real-world OAuth SSO systems face unique challenges. These technical constraints include the lack of access to the implementation code, undocumented implementation-specific design features, the complexity of client-side JavaScript libraries, and the difficulty of conducting realistic evaluations without putting real users and websites at risk. In our methodology, we treated IdPs and RPs as black boxes and analyzed the HTTP traffic going through the browser during an SSO login session to identify exploit opportunities.

In the initial stage, we implemented a sample RP for each IdP under examination to observe and understand IdP-specific mechanisms that are not covered or mandated by the specification and the "OAuth Threat Model". Among other findings, we found that each evaluated IdP offers a JavaScript SDK to simplify RP development. The SDK library implements a variant of client-flow and provides a set of functions and event-handling mechanisms intended to free RP developers from implementing the OAuth protocol themselves. We observed several IdP-specific mechanisms that deserve further investigation, as summarized in Table 5.1: (1) SDKs save access tokens into HTTP cookies, (2) authorization codes are not restricted to one-time use, (3) access tokens are obtained even before the end-user initiates the login process, (4) access tokens pass through cross-domain communication mechanisms, (5) redirect URI restriction is based on an HTTP domain instead of a whitelist, and (6) a token refresh mechanism is absent from Facebook's implementation. The security implications of each observation are further discussed in the denoted sections.

Mechanism (Sections)               FB        GL           MS
1. Token cookie (4.1, 5.1)         Yes (1)   No           Yes
2. Authz. code (4.3, 5.1)          MU        SU           MU
3. Implicit authz. (4.2, 5.2)      Yes       Yes          Yes
4. Cross-domain comm. (5.3)        Yes (2)   Yes (3)      No (4)
5. Redirect URI (4.2, 5.2, 6.1)    MD        WL+MD (5)    SD
6. Refresh token (5.2, 6.1)        No        Yes          Yes (6)

Table 5.1: IdP-specific implementation mechanisms. Acronyms: FB = Facebook; GL = Google; MS = Microsoft; MU = Multiple Use; SU = Single Use; MD = Multiple Domain; WL = Whitelist; SD = Single Domain. Notes: (1) prior to the fix; (2) postMessage and Flash; (3) postMessage, Flash, FIM, RMR, and NIX; (4) uses cookies; (5) whitelist for client-flow and server-flow, but multiple domains for the SDK flow; (6) only when an offline permission is requested.

In the second stage of our exploratory study, we manually recorded and analyzed HTTP traffic from 15 Facebook RPs (randomly chosen from the list of 96 RP samples). The analysis was conducted both before and after the Facebook SDK revision event. From the analysis of network traces, we identified several exploitable weaknesses in the RP implementations. For each vulnerability, a corresponding exploit was designed and manually tested on those 15 RPs.

In the confirmatory study, a set of semi-automatic vulnerability assessment tools was designed and implemented to facilitate the evaluation process and avoid errors from manual inspection. The tools were then employed to evaluate each uncovered vulnerability on the 96 Facebook RPs. For each failed exploitation, we manually examined the reasons.

5.3 Evaluation and Results

To begin an assessment process, the evaluator signs into the RP in question using both the traditional and the SSO options through a Firefox browser. The browser is augmented with an add-on we designed that records and analyzes the HTTP requests and responses passing through the browser. To resemble a real-world attack scenario, we implemented a website, denoted as attacker.com, that retrieves the analysis results from the trace logs and feeds them into each assessment module described below.
Table 5.2 summarizes our evaluation results. We found that 42% of RPs use server-flow and 58% support client-flow, but all client-flow RPs use the Facebook SDK instead of handling the OAuth protocol themselves. In the following sections, we describe how each exploit works, along with the corresponding assessment procedures and evaluation results. Note that, prior to publishing our analysis results, we notified all evaluated websites of the uncovered vulnerabilities.

            RPs          SSL (%)      Vulnerabilities (%)
Flow        N     %      T     S      A1    A2    A3    A4    A5
Client      56    58     21    6      25    55    43    16    18
Server      40    42     28    15     7     36    21    18    20
Total       96    100    49    21     32    91    64    34    38

Table 5.2: The percentage of RPs that are vulnerable to each exploit. Legend: T: SSL is used in the traditional login form; S: sign-in endpoint is SSL-protected; A1: access token eavesdropping; A2: access token theft via XSS; A3: impersonation; A4: session swapping; A5: force-login.

5.3.1 Access Token Eavesdropping (A1)

This exploit eavesdrops on access tokens by sniffing the unencrypted communication between the browser and the RP server. To assess this exploit, the log analyzer traces the access token from its origin and checks whether the token is passed through any subsequent communication between the browser and the RP server without SSL protection. We also implemented an access token network sniffer to confirm the results. According to the OAuth specification, an access token is never exposed on the network between the browser and the RP server. However, our results show that access tokens can be eavesdropped on 32% of RPs.

Initially, we found that the Facebook and Microsoft SDKs store the access token in an HTTP cookie on the RP domain by default, and all client-flow RPs use this cookie as an SSO credential to identify the user on the server side. However, as the cookie is created without the Secure and HttpOnly attributes, it could be eavesdropped on the network, or hijacked by malicious scripts injected into any page under the RP domain.
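For contrast, the following Python sketch shows an RP-issued session cookie that carries the two attributes the SDK cookie lacked. It assumes a design in which the RP keeps the access token on the server and gives the browser only an opaque random identifier; the cookie name rp_session and the in-memory SESSIONS map are hypothetical. Secure keeps the cookie off plain-HTTP connections only if the site, including its sign-in endpoint, is served over HTTPS, and HttpOnly hides the cookie from page scripts without preventing an injected script from mounting other attacks within the page.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side map: opaque session id -> access token


def issue_session_cookie(access_token):
    """Keep the token on the server; give the browser only a flagged, opaque cookie."""
    sid = secrets.token_urlsafe(32)           # unguessable session identifier
    SESSIONS[sid] = access_token              # the token itself never leaves the server
    cookie = SimpleCookie()
    cookie["rp_session"] = sid                # hypothetical cookie name
    cookie["rp_session"]["path"] = "/"
    cookie["rp_session"]["secure"] = True     # browser sends it over HTTPS only
    cookie["rp_session"]["httponly"] = True   # hidden from document.cookie
    return cookie.output()                    # a single "Set-Cookie: ..." header line


print(issue_session_cookie("example-access-token"))
# e.g. Set-Cookie: rp_session=<random>; HttpOnly; Path=/; Secure
```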
To address this issue, Facebook revised its SDK to use a signed authorization code in place of an access token for the cookie [Cai11]. We re-executed the evaluation and found that many RPs save the token into a cookie themselves, or pass the access token as a query parameter to a sign-in endpoint on the RP server side. Surprisingly, even server-flow RPs (7%) exhibit this insecure practice.

SSL provides end-to-end protection and is commonly suggested for mitigating attacks that manipulate network traffic. However, SSL imposes management and performance overhead, makes web content non-cacheable, and introduces side effects that website owners find undesirable, such as browser warnings about mixed secure (HTTPS) and insecure (HTTP) content [Adi08b]. Due to these complications, many websites use SSL only for login pages. We found that 49% of RPs employ SSL to protect their traditional login forms, but only 21% use SSL for their sign-in endpoints. The reason behind this insecure practice is unclear to us, but it might be due to the misconception that the communication channel is SSL-protected by the IdP.

5.3.2 Access Token Theft via XSS (A2)

The IdP's "automatic authorization granting" feature returns an access token automatically (i.e., without the user's intervention) in response to an authorization request if the permissions denoted in the request have previously been granted by the user and the user has already logged into the IdP in the same browser session. The rationale behind this design feature is detailed in Section 5.4.2. This automatic authorization mechanism allows an attacker to steal an access token by injecting a malicious script into any page of an RP website, which initiates a client-side login flow and subsequently obtains the returned token. To evaluate this vulnerability, we designed two exploits in JavaScript (listed in Appendix C).
Both exploits, when executed, send a forged authorization request to the Facebook authorization server via a hidden iframe element. The first exploit uses the current page as the redirect URI and extracts the access token from the fragment identifier. The second exploit dynamically loads the SDK and uses a special SDK function (getLoginStatus) to obtain the access token. In order to conduct a realistic evaluation without causing actual harm to the tested RPs and real users, we used Greasemonkey [LBS12], a Firefox add-on, to execute these two exploits.

To evaluate, the evaluator logs into the IdP and visits the RP in question (without signing in) using a Greasemonkey-augmented browser. Both exploit scripts create a hidden iframe element to transport a forged authorization request to the IdP, and then obtain an access token in return. Once the access token is obtained, the exploit script sends it back to attacker.com using a dynamically created img element. With this stolen access token, attacker.com then calls the IdP's web APIs to verify whether the exploit has been carried out successfully.

Our evaluation results show that 88% of RPs are vulnerable to the first exploit, regardless of the flow they support or whether the user has logged into the RP website. RPs that are resistant to this exploit either framebusted their home pages (i.e., the pages cannot be framed) or used a different domain for the redirect URI (e.g., login.rp.com for www.rp.com).
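Why a separate login host helps can be illustrated with a simplified model of the two checks involved: the IdP's host-based redirect URI check (cf. the MD entry for Facebook in Table 5.1) and the browser's same-origin comparison of scheme, host, and port. The Python sketch below is hypothetical: host names, paths, and the idp_accepts helper are invented for illustration and are not any IdP's actual validation logic, and the model ignores framebusting and the SDK-based exploit.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """(scheme, host, port): the triple the browser's same-origin policy compares."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def idp_accepts(redirect_uri, registered_host):
    """Hypothetical host-based redirect URI check: any path on the host passes."""
    return urlsplit(redirect_uri).hostname == registered_host


xss_page = "http://www.rp.example/forum?id=42"            # page carrying injected script
login_callback = "https://login.rp.example/sso/callback"  # dedicated login host

# Registered host = main site: the injected page itself is a valid redirect URI.
print(idp_accepts(xss_page, "www.rp.example"))      # True
# Registered host = login host: the injected page is rejected as a redirect URI,
print(idp_accepts(xss_page, "login.rp.example"))    # False
# and, being cross-origin, it cannot read a frame holding the callback page.
print(origin(xss_page) == origin(login_callback))   # False
```

When the registered host is the main site, the page carrying the injected script can itself serve as the redirect URI, as in the first exploit; when responses go to a dedicated login host, that page is neither an acceptable redirect URI nor same-origin with the page that receives the token.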
The second exploit succeeded on all evaluated RPs except those that use a different HTTP domain for receiving authorization responses.

Additionally, we examined the feasibility of scenarios in which the browser itself makes token theft possible, instead of relying on the RP website having an XSS vulnerability. We tested two such scenarios, but believe that other current and future exploits are possible. In both test cases, the vulnerable browsers are still used by about 10% of web users [W3C12]. First, we embedded each exploit in a JPG image file and uploaded the files to the RP under test. The evaluator then used IE 7 to view the uploaded image, which caused the XSS payload to be executed due to the browser's content-sniffing algorithm [BCS09]. Second, we designed an exploit script (see Appendix C) that leverages certain browsers' onerror event-handling behavior. In those browsers [OSV10], the URL that triggers the script error is disclosed to the onerror handler. We tested the exploit using Firefox 3.6.3, and it succeeded on all evaluated RPs. The exploit script sends a forged authorization request through the src attribute of a dynamically created script element, and then extracts the access token via the onerror event handler.

                            RPs          SSL (%)      Vul. (%)
Flow      SSO credential    N     %      T     S      A3    A4
Client    code              35    36     14    4      25    4
Client    token             17    17     7     2      15    8
Client    profile           4     4      0     0      3     3
Server    code              24    25     18    7      11    10
Server    token             4     4      1     1      3     1
Gigya     profile           12    13     9     6      6     6
Total                       96    100    49    21     64    33

Table 5.3: The percentages of RPs that are vulnerable to impersonation (A3) or session swapping (A4) attacks.

5.3.3 Impersonation (A3)

An impersonation attack works by sending a stolen or guessed SSO credential to the RP's sign-in endpoint through an attacker-controlled user-agent.
We found that an impersonation attack could be successfully carried out if (1) the attacker can obtain or guess a copy of the victim's SSO credential, (2) the SSO credential is not limited to one-time use, and (3) the RP in question does not check whether the response is sent by the same browser from which the authorization request was issued (i.e., it lacks "contextual binding" validation).

We designed an "impersonator" tool in C# to evaluate this vulnerability. The tool reuses the GeckoFX web browser control [Sky10] for sending HTTP requests and rendering the received HTML content. We modified GeckoFX to make it capable of observing and altering HTTP requests, including headers. Based on the RP domain entered by the evaluator, the tool constructs an exploit request from the SSO credential and sign-in endpoint retrieved from attacker.com, and then sends it to the RP through the GeckoFX browser control. In addition, for RPs that use the user's IdP account profile as an SSO credential, the evaluator replaced the profile information with that of another testing account to test whether the SSO credential is guessable. Table 5.3 shows our evaluation results. Interestingly, several RPs (9%) use the user's IdP account identifier as an SSO credential. This allows a remote attacker to log into the RP as the victim by simply sending the victim's (publicly accessible) Facebook account identifier to the RP's sign-in endpoint.

We also found that 13% of RPs use a proxy service from Gigya [Gig11], and half of them are vulnerable to an impersonation attack because those RPs do not verify the signatures generated by Gigya. The Gigya platform provides a unified protocol interface for RPs to integrate a diverse range of web SSO protocols. The proxy service performs OAuth server-flow on behalf of the website, requests and stores the user's profile attributes, and then passes the user's profile to the RP via a redirect URI registered with the proxy service or through cross-domain communication channels.
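The missing step for the vulnerable Gigya RPs was signature verification at the sign-in endpoint. The Python sketch below shows such a check under an assumed, hypothetical format: HMAC-SHA256 over the string timestamp_uid with a secret shared between the RP and the proxy, plus a 180-second freshness window. Gigya's actual signature scheme may differ, so this is not its API. A bare or altered account identifier, such as the publicly accessible Facebook identifier used in the attacks above, fails the check.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 180  # hypothetical freshness window


def proxy_signature(uid, timestamp, secret):
    """Hypothetical format: HMAC-SHA256 over 'timestamp_uid' with a shared secret."""
    return hmac.new(secret, f"{timestamp}_{uid}".encode(), hashlib.sha256).hexdigest()


def rp_accepts_profile(uid, timestamp, signature, secret, now=None):
    """Sign-in endpoint: log the user in only for a fresh, correctly signed UID."""
    now = int(time.time()) if now is None else now
    if not 0 <= now - timestamp <= MAX_AGE_SECONDS:
        return False                          # expired or future-dated assertion
    return hmac.compare_digest(proxy_signature(uid, timestamp, secret), signature)


secret = b"rp-proxy-shared-secret"            # placeholder secret
ts = 1_700_000_000
sig = proxy_signature("100001", ts, secret)
print(rp_accepts_profile("100001", ts, sig, secret, now=ts + 10))  # True
print(rp_accepts_profile("100002", ts, sig, secret, now=ts + 10))  # False: UID swapped
print(rp_accepts_profile("100002", ts, "", secret, now=ts + 10))   # False: bare identifier
```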
While such a proxy service is useful, we believe that a malicious or compromised one could result in serious security breaches, because RPs need to provide the proxy service with their application secret for each supported IdP, and all access tokens pass through the proxy server.

5.3.4 Session Swapping (A4)

Session swapping is another way to exploit the lack of contextual binding; that is, the RP does not provide a state parameter in an authorization request (Step 2 in Figures 5.1 and 5.2) to maintain state between the request and the response. The state parameter is typically a value bound to the browser session (e.g., a hash of the session identifier), which the IdP appends to the corresponding response when redirecting the user back to the RP (Step 7 in Figure 5.1, and Step 6 in Figure 5.2). To launch a session swapping attack, the attacker (1) signs into an RP using the attacker's identity from the IdP, (2) intercepts the SSO credential on his user-agent (Step 7 in Figure 5.1, and Step 8 in Figure 5.2), and then (3) embeds the intercepted SSO credential in an HTML construct (e.g., img, iframe) that causes the browser to automatically send the intercepted SSO credential to the RP's sign-in endpoint when a victim user views the exploit page. As the intercepted SSO credential is bound to the attacker's account on the RP, a successful session swapping exploit allows the attacker to stealthily log the victim into the RP as the attacker in order to spoof the victim's personal data [BJM08], or to mount an XSS attack as we discuss in Section 5.4.4.

To evaluate this vulnerability, we designed an exploit page hosted on attacker.com. The exploit page takes an RP domain as an input parameter, retrieves from the log the SSO credential and sign-in endpoint for the RP in question as an exploit request, and then sets the exploit request as the src of a dynamically created iframe element.
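The state parameter described at the beginning of this section can be sketched as follows, using the text's own example of a hash of the session identifier. The IdP authorization endpoint, client identifier, callback URL, and helper names are hypothetical; `client_id`, `redirect_uri`, `response_type`, and `state` are the standard OAuth 2.0 authorization request parameters.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def state_for_session(session_id: str) -> str:
    """Binding value derived from the browser's session identifier."""
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, session_id: str) -> str:
    """Authorization request (Step 2) carrying the session-bound state."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "state": state_for_session(session_id),
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


def response_is_bound(callback_url: str, session_id: str) -> bool:
    """Accept the SSO credential only if its state matches this session."""
    query = parse_qs(urlparse(callback_url).query)
    received = query.get("state", [""])[0]
    expected = state_for_session(session_id)
    return hmac.compare_digest(received.encode("utf-8"),
                               expected.encode("ascii"))
```

In a session swapping attack, the intercepted response carries the state derived from the attacker's own session, so `response_is_bound` returns False when the victim's browser delivers it. A plain hash is only safe while the session identifier stays secret; a keyed hash would be a stronger variant.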
Malicious content embedded in the iframe can cause the browser to issue an HTTP request to the RP website using either the GET or the POST method, but the exploit request cannot modify or access HTTP headers. When the RP uses the POST method, the iframe's src attribute is set to another page that contains (1) a web form whose action attribute is set to the URL of the exploit request, with each HTTP query parameter (key-value pair) of the exploit request added to the form as a hidden input field, and (2) a JavaScript snippet that submits the form automatically when the page is loaded.

5.3.5 Force-login CSRF (A5)

Cross-Site Request Forgery (CSRF) is a widely exploited web application vulnerability [OWA10] that tricks a user into loading a page containing a malicious request that can disrupt the integrity of the victim's session data with a website. The attack URL is usually embedded in an HTML construct (e.g., <img src=bank.com/txn?to=evil>) that causes the browser to automatically issue the malicious request when the HTML construct is viewed. As the malicious request originates from the victim's browser, and the session cookies previously set by the victim site are sent along with it automatically, there is no detectable difference between the attack request and a legitimate user request. To launch a CSRF attack, the malicious HTML construct could be embedded in an email, hosted on a malicious website, or planted on benign websites through XSS or SQL injection attacks.

A typical CSRF attack requires that the victim already has an authenticated session with the website, and a force-login CSRF attack can be leveraged by an attacker to achieve this prerequisite. By taking advantage of the "automatic authorization granting"
design feature, a force-login CSRF attack logs the victim user into the RP automatically by luring the victim to view an exploit page that sends a forged login request (Step 1 in Figure 5.1) or authorization request (Step 2 in both Figures 5.1 and 5.2) via the victim's browser. A successful exploit enables a web attacker to actively carry out subsequent CSRF attacks without passively waiting for the victim user to log into the website.

The evaluation procedures for this attack are the same as for A4, except that this attack requires that the victim already has an authenticated session with the IdP, and it uses a login or authorization request as the exploit request. We also noticed that some client-flow RPs in our study (18%) sign users in automatically if the user has already logged into Facebook, but this "auto-login" feature enables an attacker to launch CSRF attacks actively. After a successful force-login attack, we examined whether the user account data on the RP can be altered automatically by a CSRF attack. Our results show that, on 21% of the tested RPs, users' profile information is indeed vulnerable to CSRF exploits.

5.4 Discussion

Surprisingly, we found that the aforementioned vulnerabilities are largely caused by design decisions that trade security for simplicity. Unlike logic flaws, those design features are valuable to RP developers and cannot be fixed with a simple patch. The causality diagram in Figure 5.3 illustrates how simplicity features of the protocol and IdP implementations lead to the uncovered weaknesses. First, OAuth 2.0 drops signatures in favor of SSL for RP-to-IdP communication. This design decision enables the protocol to be "played" by clients that cannot keep their client secret secure (e.g., OAuth JavaScript clients), and thus the provision of the client-flow. Second, to enhance the user experience and reduce client-flow implementation efforts, IdPs offer an "automatic authorization granting" feature and SDK libraries.
These features make the protocol simple to implement, but at the cost of increasing the attack surface and opening the protocol to new exploits.

Figure 5.3: Causality diagram: OAuth 2.0 design features that lead to the security weaknesses we found.

5.4.1 Authentication State Gap

The OAuth client-flow is inherently less secure than the server-flow because of an authentication state gap between the client-side script and the program logic on the RP server. According to the OAuth specification, the client-flow is intended for browser-based applications that execute entirely within a user-agent. However, a web application typically issues authentication sessions from its server side. Hence, when the client-flow is applied to SSO, there is an authentication state gap between the client-side script and the RP server after the authorization flow is completed (i.e., after the access token has been delivered to the client-side script). This gap requires a client-side script to transmit an SSO credential to the sign-in endpoint on the RP server in order to identify the current SSO user and issue an authentication cookie. However, if the sign-in endpoint is not SSL-protected, then SSO credentials, such as the access token, authorization code, and user profile, could be eavesdropped in transit.

Transmitting SSO credentials between the browser and the RP server could also make RPs vulnerable to impersonation and session swapping attacks if the authenticity of the SSO credential is not, or cannot be, guaranteed by the RP website. OAuth SSO systems are based on browser redirections in which the authorization request and response are passed between the RP and IdP through the browser. This indirect communication allows the user to be involved in the protocol, but it also provides an opportunity for an adversary to launch attacks against the RP from his or the victim's browser.
As the exploits are launched from an endpoint of the SSL channel, impersonation and session swapping attacks are still feasible even when both browser-to-RP and browser-to-IdP communications are SSL-protected. In addition, we found that some client-flow RPs use the access token obtained in the browser to retrieve the user's profile through the IdP's graph APIs, and then pass the profile as an SSO credential to the RP's sign-in endpoint. However, this enables an impersonation attack in which a forged sign-in request carries the victim's Facebook identifier.

5.4.2 Automatic Authorization Granting

IdPs offer an "automatic authorization granting" feature to enhance both performance and the user experience, but this feature also enables an attacker to steal access tokens through an XSS exploit. We observed that when the browser loads an HTML page that includes an OAuth JavaScript SDK library, an access token is returned to the library automatically without explicit user consent. This happens when the requested permissions have been granted before, and the user has already logged into the IdP in the same browser session. Further investigation of this undocumented feature revealed that obtaining access tokens in the background is enabled by several design decisions, including (1) for simplicity, OAuth 2.0 removes the signature requirement for an authorization request [HL10]; (2) for usability, a repeated authorization request is granted automatically without prompting the user for consent; and (3) for flexibility, redirect URI restriction is enforced based on a registered HTTP domain as a whole, rather than a whitelist of individual URIs, so that access tokens can be obtained on any page within the RP domain.

Rank   Permission        Requested (%)   Vulnerable (%)
1      email                        71               66
2      user_birthday                44               42
3      publish_stream               39               36
4      offline_access               35               31
5      user_location                27               25
6      basic_info                   20               20
7      user_likes                   10                8
8      publish_actions               9                9
9      user_interests                8                5
10     user_photos                   7                7

Table 5.4: Top 10 permissions requested by RPs. Column "Vulnerable" denotes the percentage of RPs that request the permission and are vulnerable to token theft (i.e., A1 or A2 attacks).

Automatic authorization granting may indeed be useful, but it can be harmful as well. RPs could use this function to eliminate the popup login window that simply blinks open and then closes, and to reduce delays when the user is ready to log in. In addition, we believe that many RPs use this design feature to (1) refresh an access token when it expires, (2) log the user into the RP website automatically, and (3) integrate the user's social context directly on the client side to reduce the overhead of round-trip communication with the RP server. While useful, this function enables an attacker to obtain access tokens via a malicious script executed on any page of an RP website, even when the redirect URI is SSL-protected and the user has not yet logged into the RP. Surprisingly, we found that even server-flow RPs that obtain access tokens through direct communication with the IdP are vulnerable as well.

5.4.3 Security Implications of Stolen Tokens

The malicious activities an attacker can perform with a stolen access token depend on the permissions granted to the token, which are requested by the RP. Table 5.4 shows the top ten permissions requested by RPs. For instance, an attacker can use the email permission for spam, user_birthday for identity theft or answering security questions, or publish_stream to post messages on the victim's wall to distribute phishing or malware messages. A full list of permissions can be found on Facebook's Developer website [Fac13].
Note that 35% of RPs request the offline_access permission, which allows an attacker to perform authorized API requests on behalf of the victim at any time until the authorization is explicitly revoked by the user. Interestingly, 60% of publish_stream and 45% of publish_actions permissions were requested together with offline_access.

Attacking the victim's social graph using compromised tokens can be fruitful for adversaries and hard for IdPs to detect. The social graph within a social network is a powerful viral platform for the distribution of information. According to the designers of the Facebook Immune System [SCM11], attackers commonly target the social graph to harvest user data and propagate spam, malware, and phishing messages. Known attack vectors include compromising existing accounts, creating fake accounts for infiltration, and using fraudulent applications. Compromised accounts are typically more valuable than fake accounts because they carry established trust, and phishing and malware are the two main ways to compromise existing accounts. Yet our work shows that compromised access tokens can be used as a novel way to harvest user data and act on behalf of the victim user. Since this kind of attack makes use of legitimate web API requests on behalf of the victim RP, we believe that it is difficult for an IdP to detect and block the attack, unless it can be distinguished from a legitimate use of the same APIs.

5.4.4 Vulnerability Interplays

One vulnerability could lead to several different exploits. For example, a compromised token could be used to impersonate the victim user on the RP, or to harvest the victim's identity information on the IdP.
In addition, it can be used to infiltrate the victim's social circles to trick other victims into visiting the vulnerable RP, or to bootstrap a drive-by-download exploit. Other exploits are also possible.

Interestingly, we found that a session swapping or force-login vulnerability could be used to overcome an attack constraint in which an authenticated session with the RP is required before launching an XSS token theft attack. Moreover, for RPs in which the user profile (e.g., the user name) is not XSS-protected, a session swapping or force-login attack could be leveraged for token theft. To leverage session swapping, the attacker first appends a token theft script to the user name of his account on the RP website. The attacker then creates a malicious page that uses a hidden iframe or img element to log the victim into the RP as the attacker, which executes the exploit script when the attacker's name is rendered on the page. Our exploit succeeded on 6% of tested RPs. The exploit page could be customized with attractive content and delivered to users through spam emails, malvertising [SE11], in-flight content modification [ZHR+11], or posts on popular websites. To take advantage of a force-login vulnerability, the malicious page stealthily logs the victim into the RP, appends a script to the user's name using CSRF attacks, and then redirects the victim to a page on the RP where the user name is rendered.

5.4.5 Visualization and Analysis of Results

We visualized our evaluation results to explore the correlations between the rank of each tested RP and its vulnerabilities, requested permissions, and use of SSL. The visualization in Figure 5.4 provides an overall view of the distributions of these four related data items. In addition, it allows us to reason visually about certain security properties of each individual RP. For instance, the figure shows that the highest-ranked RP, in the first column, was free from any vulnerability, requested several extended permissions (i.e., offline, email, publish streams), and used SSL for both traditional and SSO login options. This seems to imply that this RP's designers were security-aware (i.e., used SSL) and made it secure (i.e., no vulnerabilities), but the requested permissions might raise users' privacy concerns.

Figure 5.4: The distribution of the rank of each evaluated RP and its corresponding vulnerabilities (A1 to A5), requested permissions (offline, email, publish streams, publish actions), and the use of SSL on the traditional login form (SSL T) and SSL session (SSL S). Each vertical line in the "Rank" row denotes the rank of the RP that we tested.

We found no correlation between rank, vulnerability, and permission. There was, however, a strong correlation between the use of SSL on the sign-in endpoint and whether the RP was resistant to the uncovered vulnerabilities. A comparison of the distribution of vulnerable websites (A1 to A5 individually, and the total number of vulnerabilities) in bins of 100 revealed no statistically significant difference (SSD) from a uniform distribution (F-test, p = .56 to .99). Similarly, the requested permissions were uniformly distributed (p = .60 to .84), and there was no SSD between the number of vulnerabilities found in RPs that used SSL for the traditional login page and those that did not. However, our analysis found that an RP that used SSL for SSO login sessions had a significantly lower chance (31%, p = .001) of being vulnerable to the discovered vulnerabilities, compared with RPs that performed SSO without SSL protection.

5.4.6 Limitations

Our work only examined high-profile IdPs and RPs listed among the top 1,000 most-visited sites, and hence the evaluation results might not generalize to all IdPs and RPs. However, our statistical analysis did not reveal any correlation between websites' popularity rankings and the discovered vulnerabilities.
In addition, due to the inherent limitations of the black-box analysis approach, and because the analysis captures the state of each RP at only one point in time, we acknowledge that the list of vulnerabilities we uncovered is not complete. We believe that other potential implementation flaws and attack vectors do exist.

5.5 Recommendations

We make recommendations that not only close down the discovered vulnerabilities but also meet the following requirements:
• Backward compatibility: The protection mechanism must be compatible with the existing OAuth protocol and must not require modifications to browsers.
• Incremental deployability: IdPs and RPs must be able to adopt the proposed improvements gradually and separately, without breaking their existing functional implementations.
• Simplicity: The countermeasure must not require cryptographic operations (e.g., HMAC, public/private key encryption) from RPs, because avoiding such operations is the main design feature that helped OAuth 2.0 gain widespread acceptance.

Table 5.5 illustrates the summary of our recommendations as described below.

Threats to user's data: A1, A2 (on IdP); A3, A4, A5 (on RP); each for client-flow (C) and server-flow (S) RPs
R1: Explicit authorization flow            ●
R2: Whitelist redirect URIs                ◐ ◐
R3: Support token refresh                  ◐ ◐
R4: Single-use authorization code          ◐ ◐
R5: Avoid token cookie                     ◐
R6: Explicit user consent                  ◐ ◐ ◐ ◐
R7: Explicit user authentication           ● ●
R8: SSO domain separation                  ◐ ◐
R9: Confidentiality of SSO credentials     ● ● ◐ ◐
R10: Authenticity of SSO credentials       ◐ ◐ ● ● ● ●

Table 5.5: Recommendations developed for client-flow (C) or server-flow (S) RPs. Each cell indicates whether the suggested recommendation offers no (empty), partial (◐), or complete (●) mitigation of the identified attacks (A1 to A5).
The recommended improvements were tested (from the security mitigation point of view) on sample IdP and RP websites that we implemented.

5.5.1 Recommendations for IdPs

IdPs should provide secure-by-default options to reduce attack surfaces, and include users in the loop to circumvent request forgeries while improving their privacy perceptions:
• R1: Explicit authorization flow registration: IdPs should provide a registration option for RPs to explicitly specify which authorization flow the RP supports, and grant access tokens only to the flow indicated. This option alone could completely protect server-flow RPs (42% of RPs in our study) from access token theft via XSS attacks.
• R2: Whitelist redirect URIs: Domain-based redirect URI validation significantly increases the RP attack surface. In contrast, whitelisting redirection endpoints allows RPs to reduce the attack surface and dedicate their mitigation efforts to protecting only the whitelisted URIs registered with the IdP.
• R3: Support a token refresh mechanism: Without a standard token refresh mechanism (as described in Section 6 of the specification [HLRH11]) offered by the IdP, RPs need to request the offline_access permission to keep the access token valid, because access tokens are short-lived (e.g., one hour). However, this practice violates the principle of least privilege and increases the chance that users will disallow such a request. Another workaround is to use the "automatic authorization granting" feature on the client side to obtain a new access token periodically. However, this could make access tokens vulnerable to network eavesdropping and XSS attacks.
• R4: Enforce single use of authorization codes: 61% of tested RPs use an authorization code as an SSO credential, but they are vulnerable to impersonation attacks, partially because Facebook does not enforce its single use.
The rationale behind this practice is not documented, but we believe that, due to the lack of a token refresh mechanism provided by Facebook, the authorization code is intended to let RPs obtain a valid access token when one expires.
• R5: Avoid saving access tokens in cookies: At the time of writing, Microsoft's SDK still stores access tokens in cookies. We suggest that other IdPs follow Facebook's improvement of using a signed authorization code and user identifier for the cookie in place of an access token. This is because the consequences of a stolen access token are much more severe than those of a compromised authorization code, since a compromised access token can be used directly to access the victim's IdP account profile information.
• R6: Explicit user consent: Automatic authorization granting should be offered only to RPs that explicitly request it during registration. In addition to preventing token theft, explicit user consent could also increase users' privacy awareness and their adoption intentions [SPM+11b]. To encourage RPs to follow the principle of least privilege, IdPs could also prompt for user consent on every authorization request originating from RPs that ask for extended permissions, such as offline_access or publish_actions.
• R7: Explicit user authentication: Our user study (Chapter 3) shows that many participants incorrectly thought that the RP knew their IdP login credentials, because the login popup window simply blinked open and then closed when the participants had already authenticated to their IdP in the same browser session. The study also shows that prompting users to authenticate with their IdP for every RP sign-in attempt could give users a more adequate mental model and improve their security perceptions. Accordingly, RPs should be able to specify an additional parameter in the authorization request indicating whether explicit user authentication is required, in order to enhance users' trust
with the RP and prevent force-login attacks. We acknowledge, however, that the usability implications of this recommendation need to be properly evaluated.

Furthermore, we recommend that IdPs adopt a more secure type of access token. The "OAuth Threat Model" introduces two types of token: the bearer token, which can be used by any client that has received the token [JHR11], and the proof token (e.g., MAC tokens [HLBA11]), which can only be used by a specific client. We found that, probably for the sake of simplicity, all examined IdPs offer bearer tokens as the only option. As proof tokens can prevent replay attacks when resource access requests are eavesdropped, IdPs should offer proof tokens as a choice for RPs. We also suggest that JavaScript SDKs support an authorization code as a response option, so that server-flow developers can use the SDKs as well.

5.5.2 Recommendations for RPs

Besides verifying signatures on the signed authorization code cookie and from the proxy service, and avoiding the use of the user's profile received from the IdP on the client side as an SSO credential, RPs can further reduce the risks we discovered by following these recommendations:
• R8: SSO domain separation: RPs should use a separate HTTP domain for redirect URIs in order to prevent attacks that exploit token theft vulnerabilities potentially present in the RP's application pages. For instance, an RP can register login.rp.com with the IdP as the redirect URI domain for the www.rp.com domain. All endpoints within this dedicated login domain should be protected with SSL, and input values should be properly sanitized and validated to prevent XSS attacks.
• R9: Confidentiality of SSO credentials: RPs that already have SSL in place should use it to protect their sign-in endpoints (i.e., accept only SSL connections on their sign-in endpoints).
Although the use of SSL introduces unwanted complications, we believe that the negative impact can be negligible, since there is typically only one sign-in endpoint per website, and the sign-in endpoint normally contains only server-side program logic.
• R10: Authenticity of SSO credentials: To ensure contextual binding, RPs could include a value that binds the authorization request to the browser session (e.g., a hash of the session identifier) in the request via the redirect_uri or state parameter. Upon receiving an authorization response, the RP recomputes the binding value from the session cookie and checks whether the binding value embedded in the authorization response matches the newly computed value. For server-flow RPs, the binding token can also prevent force-login attacks if it is appended to the SSO login form as a hidden field. Moreover, the binding token should accompany any HTTP request that alters the user's state on the RP website.

5.6 Summary

As OAuth SSO systems are being employed to guard billions of user accounts on IdPs and RPs, the insights from our work are practically important and urgent, and could not have been obtained without an in-depth analysis and evaluation. To summarize, this work makes the following contributions:
• The first (to our knowledge) published empirical investigation of the security of a representative sample of the most-visited OAuth SSO implementations, and the discovery of several critical vulnerabilities.
• An evaluation of the discovered vulnerabilities and an assessment of their prevalence across RP implementations.
• The development of practical recommendations for IdPs and RPs to secure their implementations.

OAuth 2.0 is attractive to RPs and easy for RP developers to implement, but our investigation suggests that it is too simple to be secured completely. Unlike conventional security protocols, OAuth 2.0 is designed without sound cryptographic protection, such as encryption, digital signatures, and random nonces.
The lack of encryption in the protocol requires RPs to employ SSL, but many evaluated websites do not follow this practice. Additionally, the authenticity of authorization requests and responses cannot be guaranteed without a signature. Moreover, an attack that replays a compromised SSO credential is difficult to detect if the request is not accompanied by a nonce and timestamp. Furthermore, support for the client-flow opens the protocol to a wide range of attack vectors, because access tokens are passed through the browser and transmitted to the RP server. Compared to the server-flow, the client-flow is inherently insecure for SSO. Based on these insights, we believe that OAuth 2.0 in the hands of most developers, without a deep understanding of web security, is likely to produce insecure implementations.

To protect web users in the present form of OAuth SSO systems, we suggest simple and practical mitigation mechanisms that can be incrementally deployed by IdPs and RPs. It is urgent for current IdPs and RPs to adopt those protection mechanisms in order to prevent large-scale security breaches that could compromise millions of web users' accounts on their websites. In particular, the design of OAuth 2.0's server-flow makes it more secure than the client-flow, so it should be adopted as the preferred option, and IdPs should offer explicit flow registration and enforce single use of authorization codes. Furthermore, JavaScript SDKs play a crucial role in the security of OAuth SSO systems; a thorough and rigorous security examination of those libraries is an important topic for future research.

Chapter 6
Dynamic SQL Injection Attacks Protection

SQL injection attacks (SQLIAs) are one of the foremost threats to web applications. According to the OWASP Foundation, injection flaws, particularly SQL injection, were the second most serious type of web application vulnerability in 2010 [OWA10]. In the context of web SSO,
In the context of web SSO,SQLIAs can be leveraged by adversaries to compromise user private data and login credentialsboth on IdPs and RPs. In this chapter, we propose an efficient and effective approach fordynamic SQLIAs protection without accessing application source code.The threats posed by SQLIAs go beyond simple data manipulation. Through SQLIAs, anattacker may also bypass authentication, escalate privileges, execute a denial-of-service attack,or execute remote commands to transfer and install malicious software. As a consequence ofSQLIAs, parts of or entire organizational IT infrastructures can be compromised. An effectiveand easy to employ method for protecting numerous existing web applications from SQLIAsis crucial for the security of today?s organizations. As a case in point, SQLIAs were appar-ently employed by Ehud Tenenbaum, who has been arrested on charges of stealing $1.5M fromCanadian and at least $10M from US banks [Zet09].State-of-the-practice SQLIA countermeasures are far from effective and many web applica-tions deployed today are still vulnerable to SQLIAs [OWA10]. SQLIAs are performed throughHTTP traffic, sometimes over SSL, thereby making network firewalls ineffective. Defensivecoding practices require training of developers and modification of the legacy applications toassure the correctness of validation routines and completeness of the coverage for all sourcesof input. Sound security practices?such as the enforcement of the principle of least privilegeor attack surface reduction?can mitigate the risks to a certain degree, but they are prone tohuman error, and it is hard to guarantee their effectiveness and completeness. 
Signature-based web application firewalls, which act as proxy servers filtering inputs before they reach web applications, and other network-level intrusion detection methods may not be able to detect SQLIAs that employ evasion techniques [MS05].

Detection or prevention of SQLIAs is a topic of active research in industry and academia. An accuracy of 100% is claimed by existing published techniques that use static analysis [HO05, BWS05, SW06, BBMV07], dynamic taint analysis [NTGG+05, PB05], or machine learning methods [VMV05]. However, the requirements for analysis and/or instrumentation of the application source code [HO05, BWS05, SW06, BBMV07], or for acquisition of training data [VMV05], limit the adoption of these techniques in real-world settings. Moreover, a common deficiency of existing SQLIA approaches based on analyzing dynamic SQL statements is that they define SQLIAs too restrictively, which leads to a higher-than-necessary percentage of false positives (FPs). False positives could have a significant negative impact on the utility of detection and protection mechanisms, because investigating them takes time and resources [JD02, WHM+08]. Even worse, if the rate of FPs is high, security practitioners might become conditioned to ignore them.

In this chapter, we propose an approach for retrofitting existing web applications with runtime protection against known as well as unseen SQLIAs without the involvement of application developers. Our work is mainly driven by the practical requirement of web-application owners that a protection mechanism should be similar to a software-based security appliance that can be "dropped" into an application server at any time, with low administration and operating costs. This "drop-and-use"
property is vital to the protection of web applications where source code, qualified developers, or security development processes might not be available or practical.

To detect SQLIAs, our approach combines two heuristics. The first heuristic (labeled "token type conformity") triggers an alarm if the parameter content of the corresponding HTTP request is used in non-literal tokens (e.g., identifiers or operators) of the SQL statement. While efficient, this heuristic leaves room for false positives when the application developer (intentionally or accidentally) includes tainted SQL keywords or operators in a dynamic SQL statement. This case would trigger an SQLIA alarm, even though the query does not result in an SQLIA. For instance, as a common case of result-set sorting, a developer could intentionally include a predefined parameter value in an HTTP request to form an "ORDER BY" clause in an SQL statement. As we explain later in the chapter, existing approaches, as well as detection logic based solely on the first heuristic, would trigger an SQLIA alarm because the keywords "ORDER" and "BY" are tainted, even though the intercepted SQL statement is benign. In this case, the user is supplying input intended by the programmer; she is not injecting SQL.

When a potential SQLIA is detected by the first heuristic, our approach employs the second heuristic (labeled "conformity to intention") to eliminate the above type of false positives. We put forward a new view of an SQLIA: an attack occurs when the SQL statement produced by the application at runtime does not conform to the syntactical structure intended by the application developer. Intention conformity enables runtime discovery of the developers' intention for individual SQL statements made by web applications. Defined more precisely later in the chapter, such a view of an SQLIA requires "reverse engineering" of the developer's intention. Our approach not only "discovers"
the intention but does so at runtime, which is critical for applications that are provided without source code. To discover the intended syntactical structures, our approach performs dynamic taint tracking at runtime and encodes the intended syntactical structure of a dynamic query in the form of an SQL grammar, which we term the intention grammar. Our detection algorithm triggers an alarm if the intercepted SQL statement does not conform to the corresponding intention grammar.

Figure 6.1: How SQL injection attacks work.

To evaluate our approach, we developed SQLPrevent, a software-based security appliance that (1) intercepts HTTP requests and SQL statements at runtime, (2) marks parameter values in HTTP requests as tainted, (3) tracks taint propagation during string manipulations, and (4) analyzes the intercepted SQL statements based on our heuristics. We examined the effectiveness and performance overhead of SQLPrevent in both lab and field environments. For the in-lab evaluation, we employed the AMNESIA [HO05] testbed, which has been used for evaluating several other research systems. We extended the AMNESIA testbed with new requests that could trigger false positives, and added another set of obfuscated attack inputs per application. In our experiments, SQLPrevent produced no false positives or false negatives and imposed little performance overhead (maximum 3.6%, standard deviation 1.4%), with a 30-millisecond response time for the tested applications. For the field evaluation, we deployed SQLPrevent onto a web application administered by the IT department of our institute. SQLPrevent identified nine instances of SQL injection vulnerabilities and imposed negligible performance overhead.

The rest of the chapter is organized as follows. In the next section, we explain how SQL injection attacks and typical countermeasures work. Then we review existing work and compare it with the proposed approach.
We then describe in detail our approach for detecting and preventing SQL injection attacks. Next, we discuss the implementation of SQLPrevent in J2EE, ASP.NET, and ASP, followed by a description of the evaluation methodology and results. Finally, we discuss the implications of the results and the strengths and limitations of our approach before summarizing the chapter and outlining future work.

6.1 Background

In this section, we explain how SQLIAs work, why false positives are possible, and what countermeasures are currently available. Readers familiar with the subject can proceed directly to the next section.

6.1.1 How SQL Injection Attacks Work

For the purpose of discussing SQLIAs, a web application can be thought of as a black box that accepts HTTP requests as inputs and generates SQL statements as outputs, as illustrated in Figure 6.1. Web applications commonly use parameter values from HTTP requests to form SQL statements. SQLIAs may occur when data in an HTTP request are used directly to construct SQL statements without sufficient validation or sanitization. For instance, when S = "SELECT * FROM product WHERE id=" + request.getParameter("product_id") is executed in the web application, the value of the HTTP request parameter product_id is used in the SQL statement without any validation. By taking advantage of this vulnerability, an attacker can launch various types of attacks by posting HTTP requests that contain arbitrary SQL statements. Below is an example of a malicious HTTP request:

    POST /product.jsp HTTP/1.1

    product_id=2; exec master..xp_cmdshell 'net user hacker 1234 /add'

In the case of the above attack, the SQL statement constructed by the programming logic would be the following:

    SELECT * FROM product WHERE id=2; exec master..xp_cmdshell 'net user hacker 1234 /add'

If the injected code is executed by the database server, this attack adds a new user account named "hacker" with the password "1234" to the underlying Windows operating system. More malicious attacks, such as file upload and remote command execution, are also possible with similar attack techniques [Anl02].

To confuse signature-based detection systems, attackers may also apply evasion techniques that obfuscate attack strings. Below is an obfuscated version of the above privilege-escalation attack:

    POST /product.jsp HTTP/1.1

    product_id=2; /* */declare/* */@x/* */as/**/varchar(4000)/* */set/* */@x=convert(varchar(4000),0x6578656320206D61737465722E2E78705F636D647368656C6C20276E65742075736572206861636B6572202F6164642027)/**/exec/* */(@x)

This obfuscation combines hexadecimal encoding, white-space removal, and inline comments. For a sample of evasion techniques employed in SQLIAs, see [MS05].

6.1.2 False Positives

Web application developers typically use string manipulation functions to compose SQL statements dynamically by concatenating predefined constant strings with parameter values from HTTP requests. In such cases, programmers can freely incorporate user inputs into dynamic SQL statements. Without taking developers' SQL-grammatical intentions into account, false positives are possible in all existing dynamic SQLIA approaches. We illustrate this false-positive problem through a running example.

Example 1. Assume there is an HTML dropdown list named "order_by", which consists of three entries: "without order", "by id", and "by name". Each entry and its corresponding value are shown in the following HTML code:

    <select name="order_by">
      <option value="">without order</option>
      <option value="ORDER BY id">by id</option>
      <option value="ORDER BY name">by name</option>
    </select>

Assume a programmer intentionally uses the value of the parameter "order_by" to form an SQL query, as illustrated in the following Java code fragment (note the trailing space after t1, which separates the table name from the appended clause):

    S = "SELECT c1 FROM t1 "
        + request.getParameter("order_by");

Based on the user's selection at runtime (assume the second entry is selected), the SQL statement constructed by the above programming logic would be "SELECT c1 FROM t1 ORDER BY id", where the substring "ORDER BY id" originated from the HTTP request.

Obviously, the above Java code fragment is vulnerable: an attacker can launch an arbitrary attack simply by appending an attack string to the legitimate input "order_by=ORDER BY id". However, during normal operation, the dynamically constructed SQL statements are benign and harmless.

6.1.3 Existing Countermeasures

Because SQLIAs are carried out through HTTP traffic, sometimes protected by SSL, most traditional intrusion-prevention mechanisms, such as firewalls or signature-based intrusion detection systems (IDSs), are not capable of detecting SQLIAs. Three types of countermeasures are commonly used to prevent SQLIAs: web application firewalls, defensive coding practices, and service lock-down.

Web application firewalls such as WebKnight [AQT07], ModSecurity [Bre07], and Security Gateway [SS02] are easy to deploy and operate. They are commonly implemented as proxy servers that intercept and filter HTTP requests before the requests are processed by web applications. However, due to the limitations of signature databases or policy rules, they may not effectively detect unseen patterns or obfuscated attacks that employ evasion techniques. Also, false positives might occur if signatures or filter policy rules are too restrictive.

Defensive coding practices are the most intuitive way to prevent SQLIAs: validating input types, limiting input length, or checking user input for single quotes, SQL keywords, special characters, and other known malicious patterns. Using a parameterized query API (e.g., PreparedStatement in Java and SqlParameter in .NET) is another compelling solution for mitigating SQLIAs directly in code, as parameterized queries syntactically separate the intended structure of SQL statements from data literals.

Service lock-downs are procedures employed to limit the damage resulting from SQLIAs. System administrators can create least-privileged database accounts to be used by web applications, configure different accounts for different tasks, and remove unused system procedures. However, as with defensive coding practices, these countermeasures are prone to human error, and it is difficult to assure their correctness and/or completeness.

Having discussed the state of the practice, in the next section we provide an overview of the state of the art.

6.2 Related Work

Existing research on SQLIA detection and prevention can be broadly categorized by the type of data analyzed or modified by the proposed techniques: (1) runtime HTTP requests, (2) design-time web application source code, and (3) runtime dynamically generated SQL statements. Below, we discuss related work using this categorization, briefly summarize the advantages and limitations of existing approaches, and demonstrate why false positives are possible in some of them. For a more detailed discussion, we refer the reader to the classification of SQLIA prevention techniques in [HVO06].

Web application source code analysis and hardening: WebSSARI [HYH+04] and the approaches proposed in [LL05], [JKK06], and [XA06] use information-flow-based static analysis techniques to detect SQLIA vulnerabilities in web applications, which developers can then fix. These approaches have the advantages of no runtime overhead and the ability to detect errors before deployment; however, they need access to the application source code, and the analysis has to be repeated each time an application is modified.
Such access is sometimes unrealistic, and repeated analysis increases the overhead of change management.

Runtime analysis of SQL statements for anomalies: [VMV05] propose an SQLIA detection technique based on machine learning methods. However, the fundamental limitation of this and other approaches based on machine learning is that their effectiveness depends on the quality of the training data. Training data acquisition is an expensive process, and the quality of the data cannot be guaranteed. Imperfect training data cause such techniques to produce false positives and false negatives.

Static analysis with runtime protection: SQLrand [BK04] modifies SQL statements in the source code at design time by appending a randomized integer to every SQL keyword; an intermediate proxy intercepts SQL statements at runtime and removes the inserted integers before submitting the statements to the back-end database. For our running Example 1, the intercepted SQL statement in SQLrand would read "SELECT[key] c1 FROM[key] t1 ORDER BY id", where [key] represents the random key. This statement would cause a false positive, since the keywords "ORDER" and "BY" do not carry the random key.

SQLGuard [BWS05] provides programmers with a Java library to manually bracket the placeholders of user input in SQL statements. At runtime, SQLGuard compares the parse trees of the dynamically created SQL statement with and without the input values. In the case of Example 1, SQLGuard would compare the parse trees of (1) "SELECT c1 FROM t1 [key]ORDER BY id[key]" and (2) "SELECT c1 FROM t1 [key][key]", where [key] marks the bracketing key, the first query contains the input value, and the second does not. SQLGuard would trigger an alarm for this query, since neither augmented query is a valid SQL statement.

AMNESIA [HO05] builds legitimate SQL statement models using static analysis based on information flow.
At runtime, SQL statements that do not conform to the corresponding pre-built model are rejected and treated as SQLIAs. For Example 1, the model is the token sequence "SELECT c1 FROM t1 β", where β stands for user input and must be a string or numeric constant. Because the automaton for this model would not accept the example dynamic SQL statement, the query from Example 1 would be a false positive in AMNESIA.

WASP [HOM06] prevents SQLIAs by checking whether all SQL keywords and operators in an SQL statement are marked as trusted. To track trusted sources, WASP uses Java bytecode instrumentation to mark as trusted all hard-coded and implicitly created strings in the source code, as well as strings from external sources (e.g., files, trusted network connections, databases). In the case of Example 1, WASP would view the intercepted SQL statement "SELECT c1 FROM t1 ORDER BY id" with only the substring "SELECT c1 FROM t1" marked as trusted. Since the keywords "ORDER" and "BY" are not marked as trusted, the query would be rejected, which is a false positive.

SQLCheck [SW06] detects SQLIAs by observing the syntactic structure of generated SQL queries at runtime and checking whether this structure conforms to an augmented grammar. The main limitation of SQLCheck is that it requires each parameter value to be augmented with meta-characters in order to determine the source of substrings in the constructed SQL statement. This requires manual intervention by the developer to identify and annotate untrusted sources of input, which introduces incompleteness problems and may lead to false negatives. In addition, wrapping meta-characters around each parameter value might cause unexpected side effects. For instance, if the programming logic in a web application performs string comparisons using the augmented parameter value, the result would differ from the case without meta-characters, which would cause unexpected results in business logic (e.g., math operations on two user inputs).
In addition, the generated SQL statement for Example 1 would read "SELECT c1 FROM t1 ⟨ORDER BY id⟩", where ⟨ and ⟩ stand for the special meta-characters added by SQLCheck. This query would be treated as an injection attack if the augmented grammar does not state that user inputs are permitted in the "ORDER" and "BY" keywords.

CANDID [BBMV07] transforms a Java web application by adding a benign candidate variable v_c for each string variable v. When v is initialized from user input, v_c is initialized with a benign candidate value of the same length as v. If v is initialized by the program, v_c is initialized with the same value. CANDID then compares the real and candidate parse trees at runtime. For Example 1, the real and candidate SQL statements would be "SELECT c1 FROM t1 ORDER BY id" and "SELECT c1 FROM t1 aaaaaaaaaaa", respectively. The intercepted SQL statement would be treated as an attack, since the parse trees derived from the two queries differ.

Figure 6.2: Main elements of SQLPrevent architecture are shown in light grey. The data flow is depicted with sequence numbers and arrow labels. Underlined labels indicate that the data are accompanied by the tainted meta-data. Depending on whether an SQL statement is benign or potentially malicious, data may flow to the Intention Validator conditionally.

Runtime analysis of HTTP requests and SQL statements: Approaches employing dynamic taint analysis have been proposed by [NTGG+05] and [PB05]. Taint information refers to data that come from unsanitized or unvalidated sources, such as HTTP requests. Both approaches modify the PHP interpreter to mark tainted data as they enter the application and flow through it. If tainted data have been used to create SQL keywords and/or operators in the query, the call is rejected. For the running example, the intercepted SQL statement would be viewed as "SELECT c1 FROM t1 ORDER BY id", where the substring "ORDER BY id" is tainted. Since the keywords "ORDER"
and "BY" are marked as tainted, the query would be rejected, which is a false positive. Sekar [Sek09] proposed a black-box taint-inference technique that infers tainted data in intercepted SQL statements and then employs syntax- and taint-aware policies to detect unintended use of tainted data. His technique achieves taint tracking without intrusive instrumentation of target applications or modification of the runtime environment. However, false positives and false negatives are possible due to the sub-optimal accuracy of the taint-inference algorithm and taint-awareness policies.

6.3 Approach

Our approach enables retrofitting existing web applications with runtime protection against known as well as unseen SQLIAs. The core of the approach is a software-based security appliance, SQLPrevent, which can be "plugged" into a web server without any modifications to the hosted web applications. As illustrated in Figure 6.2, SQLPrevent consists of the HTTP Interceptor, Taint Tracker, SQL Interceptor, SQL Lexer, Intention Validator, and SQLIA Detector modules. When SQLPrevent is deployed in a web server, the original data flow (HTTP request → web application → database driver → database) is altered. First, the reference to the program object representing an incoming HTTP request is intercepted by the HTTP Interceptor, and data in the request are marked as tainted. Second, the propagation of tainted data is tracked by the Taint Tracker. Finally, the SQL statements issued by web applications are intercepted by the SQL Interceptor and passed to the SQLIA Detector, which applies the two heuristics (token type conformity and conformity to intention) to detect an attack. Token type conformity determines whether an HTTP request is benign or potentially malicious by checking whether tainted data are used only as string or numeric literals in the intercepted SQL statement. The SQL Lexer is used by the SQLIA Detector module to tokenize SQL statements.
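As a rough illustration of the token type conformity check (given formally as Algorithm 1 in Section 6.3.1), the Java sketch below pairs a toy lexer with the per-token taint test. It is a simplification under stated assumptions, not SQLPrevent's lexer: taint arrives as a boolean[] with one entry per character of the statement, and the hypothetical lexer recognizes only single-quoted strings (with '' escapes) and unsigned numbers as literals, treating keywords, identifiers, operators, and delimiters alike as non-literals, since the exact non-literal type does not matter for detection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of token type conformity (Algorithm 1). taint[i] is true iff sql.charAt(i) came from a request. */
final class TokenTypeConformity {

    /** A token is the half-open range [start, end) plus a LITERALS / NON-LITERALS flag. */
    static final class Token {
        final int start;
        final int end;
        final boolean literal;

        Token(int start, int end, boolean literal) {
            this.start = start;
            this.end = end;
            this.literal = literal;
        }
    }

    /** Toy lexer (hypothetical): only distinguishes literals from non-literals. */
    static List<Token> tokenize(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == '\'') {                                   // string literal; '' is an escaped quote
                i++;
                while (i < n) {
                    if (sql.charAt(i) == '\'') {
                        if (i + 1 < n && sql.charAt(i + 1) == '\'') {
                            i += 2;
                            continue;
                        }
                        i++;                                   // closing quote
                        break;
                    }
                    i++;
                }
                tokens.add(new Token(start, i, true));
            } else if (Character.isDigit(c)) {                 // numeric literal
                while (i < n && (Character.isDigit(sql.charAt(i)) || sql.charAt(i) == '.')) i++;
                tokens.add(new Token(start, i, true));
            } else if (Character.isLetter(c) || c == '_' || c == '@') { // keyword or identifier
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i))
                        || sql.charAt(i) == '_' || sql.charAt(i) == '@')) i++;
                tokens.add(new Token(start, i, false));
            } else {                                           // operator or delimiter, one char each
                i++;
                tokens.add(new Token(start, i, false));
            }
        }
        return tokens;
    }

    /** Algorithm 1: returns true if any non-literal token contains a tainted character. */
    static boolean violatesTokenTypeConformity(String sql, boolean[] taint) {
        for (Token t : tokenize(sql)) {
            if (t.literal) continue;
            for (int k = t.start; k < t.end; k++) {
                if (taint[k]) return true;
            }
        }
        return false;
    }

    /** Helper: mask for before + tainted + after, marking only the middle part as tainted. */
    static boolean[] taintedMiddle(String before, String tainted, String after) {
        boolean[] mask = new boolean[before.length() + tainted.length() + after.length()];
        Arrays.fill(mask, before.length(), before.length() + tainted.length(), true);
        return mask;
    }
}
```

Applied to Example 1 (trusted prefix "SELECT c1 FROM t1 " followed by the tainted value "ORDER BY id"), the check reports a violation because ORDER, BY, and id are tainted non-literal tokens; this is precisely the false positive that the second heuristic is meant to clear.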
Most dynamically constructed SQL statements are benign. When a potential SQLIA is detected (i.e., a non-literal token contains tainted characters), the SQLIA Detector passes the tainted SQL statement to the Intention Validator to confirm whether the tainted non-literal tokens were intentionally constructed by developers. If the intercepted SQL statement does not conform to the intended syntactical structure, the SQLIA Detector, depending on the configuration, either triggers an alarm or prevents the malformed SQL statement from being submitted to the database. Note that any HTTP request that violates token type conformity will be flagged as a potential vulnerability. Whereas the SQLPrevent architecture follows the standard approach of implementing a security subsystem as a set of interceptors, our approach is distinguished by its detection logic. The following subsections describe each detection heuristic in detail.

6.3.1 Token Type Conformity

The token type conformity heuristic is based on the observation that SQLIAs always cause a parameter value, or a portion of it, to be interpreted by the back-end database as something other than an SQL string or numeric literal, thus altering the intended syntactical structure of the dynamically generated SQL statement. Therefore, to preserve a statement's intended syntactical structure, parameter values from HTTP requests should be used only as SQL string or numeric literals.

Tracking of Tainted Data

Tainted data are data that originate from an untrusted source, such as an HTTP request. An SQLIA occurs when tainted data are used to construct an SQL statement in a way that alters its intended syntactical structure. To trace the source of each character in an SQL statement issued by a web application, we designed per-character taint propagation using a custom implementation of Java's string-related classes.
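To make per-character taint propagation concrete, here is a minimal sketch under stated assumptions: it uses a hypothetical immutable TaintedString class rather than SQLPrevent's drop-in replacements for String, StringBuffer, and StringBuilder, and it stores the taint meta-data as one boolean per character, propagated through concatenation and substring extraction.

```java
import java.util.Arrays;

/** Hypothetical string wrapper carrying one taint bit per character (not SQLPrevent's classes). */
final class TaintedString {
    private final String value;
    private final boolean[] taint; // taint[i] is true iff value.charAt(i) came from an untrusted source

    private TaintedString(String value, boolean[] taint) {
        this.value = value;
        this.taint = taint;
    }

    /** Trusted data, e.g., a constant hard-coded by the developer. */
    static TaintedString trusted(String s) {
        return new TaintedString(s, new boolean[s.length()]);
    }

    /** Untrusted data, e.g., an HTTP request parameter value. */
    static TaintedString untrusted(String s) {
        boolean[] t = new boolean[s.length()];
        Arrays.fill(t, true);
        return new TaintedString(s, t);
    }

    /** Concatenation: the result's taint array is the two arrays placed end to end. */
    TaintedString concat(TaintedString other) {
        boolean[] t = Arrays.copyOf(taint, taint.length + other.taint.length);
        System.arraycopy(other.taint, 0, t, taint.length, other.taint.length);
        return new TaintedString(value + other.value, t);
    }

    /** Extraction: the substring keeps the taint bits of exactly the characters it copies. */
    TaintedString substring(int begin, int end) {
        return new TaintedString(value.substring(begin, end), Arrays.copyOfRange(taint, begin, end));
    }

    /** Public getter for the taint meta-data of one character. */
    boolean isTainted(int index) {
        return taint[index];
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Because the taint array travels with the characters, a statement assembled from a trusted prefix and a request parameter remembers exactly which positions came from the request. Real string classes expose many more operations (case conversion, replacement, formatting), each of which would need the same treatment.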
Our design (1) contains an additional data structure, referred to as taint meta-data, for tracking the taint status of each character in a string, and (2) implements public methods for setting and getting the taint meta-data. The meta-data are propagated during string manipulations such as concatenation, extraction, and conversion.

Algorithm 1: Token type conformity SQLIA detection algorithm
Input: An intercepted SQL statement string s
Output: A boolean value indicating whether s is malicious
    Δ ← set of tokens in s;
    for every token t in Δ do
        if typeOf(t) ≠ string or number literal and isTainted(t) then
            return true;
        end if
    end for
    return false;

Lexical Analysis of SQL Statements

SQLPrevent performs lexical analysis of SQL statements at runtime in order to identify their non-literal tokens. Lexical analysis is the process of generating a stream of tokens from the sequence of input characters that make up the SQL statement. The goal of lexical analysis in our approach is to generate two sets of tokens: LITERALS and NON-LITERALS. The LITERALS set contains string and number tokens, and the NON-LITERALS set contains tokens of all other types. The exact types of tokens in the NON-LITERALS set are irrelevant for the purpose of our detection logic. This simplified lexical analyzer design makes our approach efficient and more portable across databases. For instance, during the experiments, our SQL lexer worked with MySQL without any modification, even though it was originally designed for Microsoft SQL Server.

Detecting SQLIAs

Combining our heuristic that parameter values should only be used as string or numeric literals in dynamic SQL statements with the mechanisms of taint tracking and SQL lexical analysis, we developed an algorithm for SQLIA detection using token type conformity. The algorithm, shown in Algorithm 1, takes an SQL statement s, with taint information about the characters in s as an implicit parameter.
If tainted characters appear in any non-literal token (e.g., an identifier, delimiter, or operator) of s, the algorithm returns true; otherwise, it returns false. In other words, for each token of an intercepted SQL statement, if the token is not a literal (i.e., not a string or number) and it is tainted, the intercepted SQL statement is potentially malicious.

The token type conformity heuristic was originally inspired by Perl taint mode [Wal07]. In taint mode, the Perl runtime explicitly marks data originating from outside a program as tainted. Tainted data are prevented from being used in security-sensitive functions such as shell commands or database queries. To "untaint" an untrusted input, the tainted data must be passed through a sanitizer function based on regular expressions. However, developers have to untaint user input data manually, and sanitizer functions might not catch all malicious inputs, especially when evasion techniques are employed.

The effectiveness of our approach depends on the precision of taint tracking. However, taint meta-data might be lost due to certain limitations of the tainting implementation. For instance, in Java, string-related classes export character-based functions (e.g., toCharArray) for retrieving the internal characters of a string. The taint-tracking module cannot propagate taint meta-data to primitive types unless a modified JVM is employed. Thus, the taint information would be lost if an application constructed a new string instance from the internal characters of another string.
Nevertheless, based on the experimental results and to the best of our knowledge, retrieving the internal buffer of a string to construct an SQL statement is rare, and it is common coding practice to validate any binary data retrieved from an unsafe buffer [HL03].

6.3.2 Conformity to Intention

To protect the integrity of SQL statements, our token type conformity heuristic and some existing approaches use predefined taint policies, implicitly or explicitly, to specify where in an SQL statement untrusted data are allowed, and then check at runtime whether an intercepted SQL statement conforms to those policies. Based on these policies, the approaches employ various mechanisms to track tainted data and to distinguish them in a dynamic query. However, while these approaches are effective, false positives are possible because they use static taint policies and do not take developers' intentions into account (as we demonstrated in Example 1).

Instead of using predefined taint policies, we take explicit information flow one step further and treat SQLIA detection as the problem of determining whether a given SQL query conforms to the original intention of the application developer. Our second heuristic, which we label "conformity to intention", discovers the intended syntactical structure of a dynamic SQL statement at runtime and validates the SQL statement against that dynamically identified intention. To the best of our knowledge, no other dynamic SQLIA detection and prevention technique employs a concept similar to conformity to intention.

Intention Statement

Web application developers typically specify the intended syntactical structure of an SQL statement directly in code, using placeholders. For instance, the following Java code constructs a dynamic SQL statement by embedding parameter values from an HTTP request (each parameter might also pass through a sanitizer function):

Example 2. Typical Java code for constructing an SQL statement using an HTTP request object:

    statement = "SELECT book_name," + request.getParameter("p1")
        + " FROM " + request.getParameter("p2")
        + " WHERE book_id='" + request.getParameter("p3") + "' "
        + request.getParameter("p4");

The intended syntactical structure of the SQL statement in the above example can be expressed as shown in code Fragment 6.1, where each question mark indicates a placeholder:

    "SELECT book_name,? FROM ? WHERE book_id='?' ?"        (6.1)

We refer to such a parameterized SQL statement as an intention statement. Our approach relies on per-character taint tracking to derive intention statements at runtime. When an SQL statement is intercepted, our taint tracker marks every character in a token as tainted if the token contains one or more tainted characters. Our approach then constructs an intention statement by replacing each consecutive tainted substring in the dynamically constructed SQL statement with a special meta-character. Thus, when the SQL statement "SELECT book_name,price FROM book WHERE book_id='SQLIA' ORDER BY price" is intercepted, our approach substitutes each tainted substring with the placeholder meta-character (?) to form the intention statement shown in code Fragment 6.1.

    select_statement ::= "SELECT" select_list from_clause [where_clause] [order_clause]
    select_list      ::= "*" | id_list
    id_list          ::= ID | ID "," id_list
    from_clause      ::= "FROM" id_list
    where_clause     ::= "WHERE" cond { ("AND" | "OR") cond }
    cond             ::= value OPERATOR value
    value            ::= ID | STRING | NUMBER
    order_clause     ::= "ORDER BY" id_list

Figure 6.3: A simplified SQL SELECT statement grammar written in Backus-Naur Form (BNF).

A placeholder in an intention statement represents an expansion point, where each expansion must conform to the corresponding grammatical rule intended by the developer.
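Constructing an intention statement, as described above, can be sketched as a single left-to-right pass that copies untainted characters and emits one placeholder per maximal run of tainted characters. The sketch assumes the per-character taint mask is already available and does not perform the token-level widening of taint described in the text; the class name, the mask helper (alternating trusted and tainted segments), and the use of '?' as the meta-character are illustrative choices.

```java
/** Sketch: build an intention statement by collapsing each tainted run into one placeholder. */
final class IntentionStatement {
    static final char PLACEHOLDER = '?'; // illustrative meta-character

    static String construct(String sql, boolean[] taint) {
        StringBuilder out = new StringBuilder(sql.length());
        for (int i = 0; i < sql.length(); i++) {
            if (!taint[i]) {
                out.append(sql.charAt(i));   // developer-supplied character: keep verbatim
            } else if (i == 0 || !taint[i - 1]) {
                out.append(PLACEHOLDER);     // first character of a tainted run
            }                                // remaining characters of the run are dropped
        }
        return out.toString();
    }

    /** Helper: taint mask for alternating segments (even index = trusted, odd index = tainted). */
    static boolean[] mask(String... segments) {
        int length = 0;
        for (String segment : segments) {
            length += segment.length();
        }
        boolean[] taint = new boolean[length];
        int pos = 0;
        for (int k = 0; k < segments.length; k++) {
            boolean tainted = (k % 2 == 1);
            for (int j = 0; j < segments[k].length(); j++) {
                taint[pos + j] = tainted;
            }
            pos += segments[k].length();
        }
        return taint;
    }
}
```

With the segments of Example 2 (p1 = price, p2 = book, p3 = SQLIA, p4 = ORDER BY price), construct returns "SELECT book_name,? FROM ? WHERE book_id='?' ?". Appending "; UPDATE users SET password=null" to p4 yields the same intention statement, which is why the detector must still check the full statement against the structure derived from it.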
We call a placeholder's corresponding grammar rule its intention rule; it regulates the instantiation of the placeholder at runtime. Each intention rule maps to an existing nonterminal symbol (e.g., select_list) or terminal symbol (e.g., string literal or identifier) of a given SQL grammar. The collection of intention rules of an SQL statement serves as its intended syntactical structure and can be discovered using an SQL parse tree.

Intention Tree and Intention Grammar

An intention statement is a string without explicit structure. To identify the intention rules of an intention statement, we use an SQL parse tree. Our approach constructs a parse tree (referred to in this chapter as an intention tree) from an intention statement to represent its explicit syntactical structure. Figure 6.4 illustrates an intention tree for the intention statement in Fragment 6.1, based on the simplified sample grammar for SQL SELECT statements shown in Figure 6.3.

Figure 6.4: The intention tree of the intention statement from Fragment 6.1. Oval boxes represent nonterminal symbols, square boxes represent terminal symbols, and dash-lined boxes are placeholders. The grammar rules for the placeholders are (from left to right) two id_lists, a STRING_LIT, and an order_clause.

The sample grammar consists of a set of production rules, each of the form α ::= β, where α is a single nonterminal symbol and β is any sequence of terminals and/or nonterminals. In the grammar of Figure 6.3, select_statement is the start symbol. A parse tree represents the sequence of rule invocations used to match an input stream, and can be constructed by deriving an SQL statement from the start symbol of the given SQL grammar. Each grammar rule α ::= β matched during the derivation process forms a branch in the parse tree, where α is the parent node and β represents the set of child nodes of α. A nonterminal symbol γ in β is in turn expanded by another grammar rule that matches γ, which forms another branch originating from γ. During construction of an intention tree, the placeholder meta-character represents a special type of token that can match any nonterminal or terminal symbol during derivation. In addition, lookahead on the input data corresponding to a placeholder is used to distinguish alternatives. The derivation process continues recursively until all input tokens are exhausted.

In Figure 6.4, oval boxes represent nonterminal symbols, square boxes are terminal symbols, and dash-lined boxes contain placeholders. In an intention tree, a placeholder is an expanding node, and the branch expanded from it must follow the placeholder's intention rule. Given an intention tree, our approach uses the grammar rule of each placeholder's parent node as that placeholder's intention rule. For the intention tree depicted in Figure 6.4, the intention rules of the four placeholders are (from left to right) two identifier lists (id_list), a string literal (STRING_LIT), and an ORDER BY clause (order_clause).

In addition to intention rules, the intended structure of a dynamic SQL statement includes constant symbols that are specified by developers at design time. The intended constant symbols of an SQL statement can be represented by the leaf nodes of an intention tree, excluding placeholder nodes. By walking through all leaf nodes of an intention tree and replacing each placeholder with its intention rule, a new grammar rule can be derived for that specific dynamic SQL statement.

Algorithm 2: IsMaliciousSQL
Input: SQL statement s
Input: taint information t of s
Input: SQL grammar G
Output: A boolean value indicating whether s is malicious
    intention statement: s_i ← construct(s, t);
    intention tree: Y ← parse(s_i, G);
    intention grammar: G_i ← derive(Y);
    if parse(s, G_i) failed then
        return true;
    else
        return false;
    end if
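Algorithm 2 relies on a full SQL parser for both parse(s_i, G) and parse(s, G_i). As a rough sketch only, the leaf walk and the final check can be illustrated for the toy grammar of Figure 6.3, assuming the intention tree's leaves are supplied by hand (adjacent constant leaves grouped into one string, as in the intention grammar of Fragment 6.2). Because the three intention rules used there (id_list, STRING_LIT, order_clause) are regular, this sketch renders each one as an assumed regular expression, so the derived intention grammar becomes a single anchored pattern. It does not build the intention tree and does not apply the baseline policy discussed later.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch of intention-grammar derivation and checking for the toy grammar of Figure 6.3. */
final class IntentionGrammar {

    /** One intention-tree leaf: constant developer text, or a placeholder tagged with its intention rule. */
    static final class Leaf {
        final String text; // constant terminal text; null for a placeholder
        final String rule; // intention rule name; null for a constant

        private Leaf(String text, String rule) {
            this.text = text;
            this.rule = rule;
        }

        static Leaf constant(String text) { return new Leaf(text, null); }

        static Leaf placeholder(String rule) { return new Leaf(null, rule); }
    }

    private static final String ID = "[A-Za-z_][A-Za-z0-9_]*";
    private static final String ID_LIST = "\\s*" + ID + "(?:\\s*,\\s*" + ID + ")*\\s*";

    // Assumed regular renderings of the intention rules used in Figure 6.4.
    private static final Map<String, String> RULES = Map.of(
            "id_list", ID_LIST,
            "STRING_LIT", "(?:[^']|'')*", // characters between the developer's quotes
            "order_clause", "\\s*ORDER\\s+BY\\s+" + ID + "(?:\\s*,\\s*" + ID + ")*\\s*");

    /** Leaf walk: constants must appear verbatim; each placeholder expands to its intention rule. */
    static Pattern derive(List<Leaf> leaves) {
        StringBuilder regex = new StringBuilder();
        for (Leaf leaf : leaves) {
            if (leaf.rule == null) {
                regex.append(Pattern.quote(leaf.text));
            } else {
                String rule = RULES.get(leaf.rule);
                if (rule == null) {
                    throw new IllegalArgumentException("no rendering for rule " + leaf.rule);
                }
                regex.append("(?:").append(rule).append(')');
            }
        }
        // SQL keywords are case-insensitive, so the whole pattern is matched case-insensitively.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Final step of Algorithm 2: malicious iff the whole statement is not recognized. */
    static boolean isMalicious(String sql, Pattern intentionGrammar) {
        return !intentionGrammar.matcher(sql).matches();
    }
}
```

The same pair of methods also clears Example 1: with the leaves "SELECT c1 FROM t1 " and an order_clause placeholder, "SELECT c1 FROM t1 ORDER BY id" is accepted, whereas the same statement followed by "; DROP TABLE t1" is rejected.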
We refer to the grammar rule derived from an intention tree as an intention grammar. For instance, code Fragment 6.2 shows the intention grammar derived from the intention tree in Figure 6.4, where double-quoted strings represent constant terminal symbols (e.g., "SELECT book_name,"), and id_list, STRING_LIT, and order_clause are existing grammar rules.

    "SELECT book_name," id_list " FROM " id_list
    " WHERE book_id='" STRING_LIT "' " order_clause        (6.2)

Detection of SQLIAs

Once an intention grammar is derived, an SQLIA can be detected by parsing the dynamic SQL statement with its intention grammar. If the dynamic SQL statement can be recognized by its intention grammar, it is benign; otherwise, it is malicious. For instance, while the statements in code Fragments 6.3 and 6.4 yield the same intention grammar (shown in code Fragment 6.2), only the statement in Fragment 6.4 is malicious, as it does not conform to the intention grammar.

    SELECT book_name, price FROM book
    WHERE book_id='SQLIA' ORDER BY price        (6.3)

    SELECT book_name, price FROM book
    WHERE book_id='SQLIA' ORDER BY price; UPDATE users SET password=null        (6.4)

Our algorithm for SQLIA detection (Algorithm 2) employs taint tracking and intention grammar derivation. The algorithm takes an SQL statement s, taint information t about s, and an SQL grammar G as arguments, and returns a boolean indicating whether the tainted SQL statement is malicious. The algorithm first constructs an intention statement s_i from s by replacing each consecutive tainted substring in s with a meta-character. It then parses s_i using the SQL grammar G to construct an intention tree Y. Once the intention tree is constructed, the algorithm derives an intention grammar G_i by traversing the leaf nodes of Y.
If s can be parsed by Gi, the algorithm returns false; otherwise, it returns true to indicate that the intercepted SQL statement is malicious.

Intention discovery reduces the rate of false positives in the SQLIA detection logic. However, the intended structure expressed by a developer might itself allow an SQLIA to pass through. To prevent SQLIAs that exploit a programmer's permissive intention, our "conformity to intention" heuristic employs a baseline policy. This policy restricts where in an SQL statement untrusted data are allowed. In our design, in addition to literal tokens, only identifier tokens (e.g., table names, column names) and ORDER BY, GROUP BY, and HAVING clauses are permitted to contain tainted data.

As with all existing SQLIA detection techniques that rely on SQL grammar parsing (e.g., SQLGuard [BWS05], SQLCheck [SW06], CANDID [BBMV07]), grammatical differences between the detection engine and the back-end database could potentially cause false positives. Nevertheless, for "token type conformity", the SQL lexical analyzer in our approach only needs to distinguish between literals and non-literals. Most database vendors develop proprietary SQL dialects (e.g., Microsoft T-SQL, Oracle PL/SQL, MySQL) in addition to supporting standard ANSI SQL. Even so, the lexical analyzer required for our approach can simply treat all non-literal tokens equally. It can therefore disregard the syntactic differences among SQL dialects that stem from the different non-literal tokens they support. For instance, we used SQLPrevent with MySQL without any modification to the SQL lexer, even though the lexer was originally designed for Microsoft SQL Server. For intention discovery, we used the ANSI SQL grammar during evaluation.
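The point that the lexer only has to separate literals from non-literals can be illustrated with a small Python sketch. It assumes that Algorithm 1 (defined earlier in the chapter) flags a statement when any tainted character ends up inside a non-literal token. This is a simplified reading, not the chapter's implementation. The token set, TOKEN_RE, and violates_token_type_conformity are hypothetical.

```python
import re

# Minimal lexer that only separates literals from non-literals (assumed token set).
TOKEN_RE = re.compile(
    r"""
    (?P<string>'(?:[^']|'')*')   # single-quoted string literal
  | (?P<number>\d+(?:\.\d+)?)    # numeric literal
  | (?P<word>[A-Za-z_]\w*)       # keyword or identifier: non-literal
  | (?P<space>\s+)               # whitespace: not a token
  | (?P<other>.)                 # operator or punctuation: non-literal
    """,
    re.VERBOSE | re.DOTALL,
)

LITERALS = {"string", "number"}


def tokenize(sql):
    """Yield (kind, start, end) for every token, skipping whitespace."""
    for m in TOKEN_RE.finditer(sql):
        if m.lastgroup != "space":
            yield m.lastgroup, m.start(), m.end()


def violates_token_type_conformity(sql, taint):
    """True if any tainted character falls inside a non-literal token."""
    return any(
        kind not in LITERALS and any(taint[start:end])
        for kind, start, end in tokenize(sql)
    )
```

Because every keyword, identifier, and operator falls into the same non-literal bucket, dialect-specific keywords need no special handling. The same check also flags a tainted column name in an ORDER BY clause. That intentional use is exactly the case that Algorithm 2 then re-examines against the developer's intention.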
Our implementation of the SQLIA detection module can be configured to use different SQL dialects. We are currently evaluating SQLPrevent with a real-world web application that uses Oracle as its back-end database.

Due to space limitations, we only summarize the complexity-analysis results of the proposed detection logic here. For Algorithm 1, the computational complexity is O(N), where N is the length of the SQL statement in characters. For Algorithm 2, the computational complexity is the same as the worst-case complexity of constructing a parse tree:

$$\begin{cases} O(N) & \text{if } G \text{ is LALR} \\ O(N^2) & \text{if } G \text{ is not LALR but deterministic} \\ O(N^3) & \text{if } G \text{ is non-deterministic} \end{cases}$$

6.4 Implementation

In this section, we explain the implementation of SQLPrevent in J2EE, ASP.NET, and ASP. Our description is organized around the SQLPrevent architecture depicted in Figure 7.2.

6.4.1 HTTP Request Interceptor

For J2EE, the HTTP Request Interceptor is implemented as a servlet filter that intercepts HTTP requests. For each intercepted HTTP request, a separate instance of TaintMark "wraps" the intercepted request. From this point on, on each access to the value of a request parameter, TaintMark performs three steps: it calls the wrapped HttpServletRequest object to get the value, marks the value as tainted, and only then returns it to the caller.

6.4.2 Taint Tracker

The purpose of the Taint Tracker is to mark the source of each character in an intercepted SQL statement as either tainted or not. For J2EE, the Taint Tracker module is implemented as a set of taint-enabled classes, one for each string-related system class, such as String, StringBuffer, and StringBuilder. The Taint Tracker provides dynamic per-character tracking of taint propagation in J2EE web applications. Each taint-enabled class has exactly the same class name and implements the same interfaces as the corresponding Java class. From a web application's point of view, the two are identical.
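The wrapping done by TaintMark (Section 6.4.1) and the per-character taint meta-data of the taint-enabled classes (Section 6.4.2) can be sketched together in Python. Python cannot swap its built-in str the way -Xbootclasspath/p swaps Java system classes. This sketch therefore uses an explicit, hypothetical TaintedStr class that tracks taint only through concatenation. TaintMark here wraps a plain dict, which is an assumption standing in for HttpServletRequest.

```python
class TaintedStr:
    """A string plus one taint flag per character (stand-in for a taint-enabled String)."""

    def __init__(self, text, taint=None):
        self.text = text
        # Taint meta-data: one boolean per character; constants default to untainted.
        self.taint = [False] * len(text) if taint is None else list(taint)
        if len(self.taint) != len(self.text):
            raise ValueError("taint meta-data needs exactly one flag per character")

    @classmethod
    def tainted(cls, text):
        return cls(text, [True] * len(text))

    def __add__(self, other):
        # Concatenation propagates the taint meta-data of both operands.
        if isinstance(other, str):
            other = TaintedStr(other)
        if not isinstance(other, TaintedStr):
            return NotImplemented
        return TaintedStr(self.text + other.text, self.taint + other.taint)

    def __radd__(self, other):
        # Handles "constant" + tainted, where the left operand is a plain str.
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(other) + self

    def __str__(self):
        return self.text


class TaintMark:
    """Wraps a request's parameters; every value read through it comes back tainted."""

    def __init__(self, params):
        self._params = params  # hypothetical: a plain dict of request parameters

    def get_parameter(self, name):
        value = self._params.get(name)
        return None if value is None else TaintedStr.tainted(value)
```

For example, building "SELECT * FROM book WHERE id='" + request.get_parameter("id") + "'" produces a statement whose taint flags are true only for the characters that came from the request.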
To record the taintedness of each character in a string, each taint-enabled class has an additional data structure, referred to as taint meta-data, and a set of functions for manipulating this structure. In the Taint Tracker for J2EE, taint meta-data is implemented as an array of booleans whose size equals the number of characters in the corresponding string. Each element in the array indicates whether the corresponding character is tainted. For taint tracking, the taint-enabled classes propagate taint meta-data during string operations. To replace existing system classes with the Taint Tracker at runtime, the Java Virtual Machine (JVM) must be instructed to load the taint-enabled classes instead of the original ones. For instance, we used the -Xbootclasspath/p:<path to taint tracker> option to configure the Sun JVM to prepend the Taint Tracker library to the bootstrap class path.

6.4.3 SQL Interceptor

The SQL Interceptor for J2EE extends P6Spy [MGAQ03], a JDBC proxy that intercepts and logs SQL statements issued by web application programming logic before they reach the JDBC driver. JDBC is a standard database access interface for Java, and has been part of Java Standard Edition since the release of SDK 1.1. We have extended P6Spy to invoke the SQLIA Detector when SQL statements are intercepted.

Figure 6.5: Design of the evaluation testbed.

6.4.4 SQL Lexer, Intention Validator and SQLIA Detector

The SQL Lexer module is implemented as an SQL lexical analyzer. This module converts a sequence of characters into a sequence of tokens based on a set of lexical rules, and determines the type of each token during scanning. The SQLIA Detector takes an intercepted SQL statement as input, passes it to the SQL Lexer for tokenization, and then performs detection according to Algorithm 1.
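The SQL Interceptor sits between the application and the database driver, consults the detector, and refuses to forward a flagged statement. That role can be sketched in Python with a proxy around a DB-API connection, using sqlite3 as a stand-in for JDBC. The sketch is only loosely modelled on what a JDBC proxy such as P6Spy does. InterceptingConnection and SQLInjectionDetected are hypothetical names. toy_detector is a deliberately naive placeholder, not SQLPrevent's detection logic.

```python
import sqlite3


class SQLInjectionDetected(Exception):
    """Raised to the application instead of forwarding a malicious statement."""


class InterceptingConnection:
    """Proxy in front of a DB-API connection that vets each statement first."""

    def __init__(self, connection, detector):
        self._connection = connection
        self._detector = detector  # hypothetical callable: sql -> True if malicious

    def execute(self, sql, parameters=()):
        if self._detector(sql):
            raise SQLInjectionDetected(sql)
        return self._connection.execute(sql, parameters)

    def __getattr__(self, name):
        # Everything else (commit, close, cursor, ...) is passed through unchanged.
        return getattr(self._connection, name)


def toy_detector(sql):
    """Naive placeholder for the SQLIA Detector: flags stacked statements only."""
    return ";" in sql
```

One design caveat applies to this sketch. Statements issued through cursor() bypass the check, so a complete proxy would also wrap the cursor objects it hands out.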
When a potential SQLIA is detected, the SQLIA Detector passes the intercepted SQL statement to the Intention Validator. The validator uses Algorithm 2 to check whether the query conforms to the syntactic structure the developer intended. If an SQLIA is identified, the detector throws a security exception to the web application instead of letting the SQL statement through.

6.4.5 Design Details Specific to ASP and ASP.NET

SQLPrevent was originally implemented in J2EE, and subsequently ported to ASP.NET and ASP to assess how generalizable and portable our approach is. We also wanted to offer the community a means of protecting legacy ASP applications. The implementations of the SQL Lexer, Intention Validator, and SQLIA Detector are identical across platforms, except for the languages used. In contrast, the design of the HTTP Request Interceptor, Taint Tracker, and SQL Interceptor is specific to each execution environment. In particular, we used the .NET profiling API [Pie01] and Microsoft Intermediate Language rewriting techniques [Mik03] to intercept SQL statements in ASP.NET. For ASP, we used a technique known as universal delegator [Bro99] to intercept SQL statements generated through ActiveX Data Objects.

6.5 Evaluation

We evaluated SQLPrevent both in the lab and in the field. In the lab experiment, SQLPrevent was evaluated using the testbed suite from project AMNESIA [HO05]. We chose this testbed because it gave us a common point of reference with other approaches that have used it for evaluation [HO05, SW06, HOM06, BBMV07, Sek09]. For the field evaluation, we deployed and tested SQLPrevent on one of the web applications at our institution. In this section, we first present our in-lab evaluation procedures and results, and then the field evaluation.

Figure 6.6: Detection and prevention performance evaluation.
tb and tm are the round-trip response times with SQLPrevent deployed, measured using benign and malicious requests, respectively.

6.5.1 Experimental Setup

The experimental setup is illustrated in Figure 6.5. The testbed suite consisted of an automatic testing script in Perl and five web applications (Bookstore, Employee Directory, Classifieds, Events, and Portal), all included in the AMNESIA testbed. Each web application came with an ATTACK list of about 3,000 malformed inputs and a LEGIT list of over 600 legitimate inputs.

In addition to the original ATTACK list, we produced a set of obfuscated attacks to validate SQLPrevent's ability to detect obfuscated SQLIAs. We obscured the attack inputs that came with AMNESIA using hexadecimal encoding, white-space dropping, and inline-comment evasion techniques.

To test whether the intention-validator module can detect SQLIAs without causing false positives, we modified each JSP in the testbed. When an additional HTTP parameter named "orderby" is present, each dynamic SQL statement now intentionally includes user input in an "ORDER BY" clause. We then modified the ATTACK and LEGIT lists by appending the additional parameter to each testing trace.

To test whether the SQL lexer module can perform lexical analysis in a database-independent way, we configured both Microsoft SQL Server and MySQL as back-end databases. SQLPrevent was tested with each of the five applications and each of the two databases, resulting in 10 runs.

6.5.2 Effectiveness

In our experiments, we subjected SQLPrevent to a total of 3,824 benign and 15,876 malicious HTTP requests. We also obfuscated the requests carrying SQLIAs and tested SQLPrevent against them, which doubled the number of malicious requests. We then repeated the experiments using an alternative back-end database. In total, we tested SQLPrevent with over 70,000 HTTP requests.
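The total quoted above follows from the reported counts. This assumes "repeated the experiments" means that both the benign requests and the doubled set of malicious requests were replayed against the second database:

```latex
\begin{aligned}
N_{\text{per DB}} &= 3{,}824 + 2 \times 15{,}876 = 3{,}824 + 31{,}752 = 35{,}576,\\
N_{\text{total}}  &= 2 \times N_{\text{per DB}} = 2 \times 35{,}576 = 71{,}152 > 70{,}000.
\end{aligned}
```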
None of these requests resulted in SQLPrevent producing a false positive or false negative.

6.5.3 Efficiency

We measured the performance overhead of SQLPrevent for two modes of operation: when the web application receives one request at a time, and when multiple web clients access it concurrently. We first describe the experimental setup common to both modes. We then discuss the experiments specific to each mode and their results.

To make sure the performance measurements were not skewed by hardware, we performed them on both low-end and high-end equipment.

For the low-end configuration, the web applications and databases were installed on a machine with a 1.8 GHz Intel Pentium 4 processor and 512 MB RAM, running Windows XP SP2. The automatic test script was executed on a host with a 350 MHz Pentium II processor and 256 MB of memory, running Windows 2003 SP2. These two machines were connected over a local area network with 100 Mbps Ethernet adapters. Round-trip latency when pinging the server from the client machine was less than 1 millisecond on average.

For the high-end configuration, the testing script and web applications were installed on two identical machines. Each was equipped with eight Intel Xeon 2.33 GHz processors and 8 GB of memory, running Fedora Linux 2.6.24.3. Round-trip latency was less than 0.1 millisecond on average in this configuration.

Sequential Access

To measure the performance characteristics of SQLPrevent, we used the nanosecond timer API in J2SE 1.5 and employed two sets of evaluation data.
The first set was used to measure detection overhead, which is the time delay SQLPrevent imposes on each benign HTTP request. To calculate detection overhead, we measured the round-trip response time with SQLPrevent for each benign HTTP request, as shown in Figure 6.6, and applied the following formula:

Detection Overhead = (tr + ts) / tb,

where tr and ts are the time delays of the request interceptor and the SQLIA detector, respectively. tb is the round-trip response time (from A to C in Figure 6.6) when a benign SQL statement is detected.

The second set of data was used to measure prevention overhead, which is the overhead SQLPrevent imposes when it detects and blocks a malicious SQL statement. Prevention overhead shows how fast SQLPrevent can detect and prevent an SQLIA. If either overhead is too high, the system could be vulnerable to denial-of-service attacks that aim at resource over-consumption. To ensure that SQLPrevent would not impose high overhead when blocking SQLIAs, we conducted another performance experiment and used the following formula to calculate prevention overhead:

Prevention Overhead = (tr + ts) / tm,

where tr and ts are the time delays of the request interceptor and the SQLIA detector, respectively. tm is the round-trip response time (from A to B) when a malicious SQL statement is detected and blocked.

For each web application, Table 6.1 shows the average detection overhead and prevention overhead, each with its standard deviation. Averaged over the five tested applications, the maximum performance overhead imposed by SQLPrevent was 3.6%, with a standard deviation of 1.4%. This overhead is relative to an average response time of 30 milliseconds observed by the web client.
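The two overhead formulas can be written as one helper, expressed as a percentage as in Table 6.1. A quick column-wise average also shows how the table's "Average" row relates to the per-application rows. The helper name overhead_percent is hypothetical. The example timings (0.3 ms per component against a 30 ms round trip) are made-up inputs for illustration, not measurements from the experiments.

```python
from statistics import mean


def overhead_percent(t_request, t_detector, t_round_trip):
    """(tr + ts) / t as a percentage; t is tb for benign or tm for malicious requests."""
    return 100.0 * (t_request + t_detector) / t_round_trip


# Per-application values from Table 6.1, in percent:
# (detection avg, detection std dev, prevention avg, prevention std dev)
TABLE_6_1 = {
    "Bookstore":   (1.2, 0.6, 3.4, 1.1),
    "Employee":    (1.7, 0.7, 4.3, 1.5),
    "Classifieds": (1.5, 0.7, 3.6, 1.5),
    "Events":      (3.3, 1.4, 4.2, 2.3),
    "Portal":      (1.9, 0.9, 2.5, 0.5),
}

# Column-wise arithmetic means, rounded to one decimal place like the table.
average_row = [round(mean(column), 1) for column in zip(*TABLE_6_1.values())]
```

With these hypothetical timings, overhead_percent(0.3, 0.3, 30.0) comes to about 2%. average_row reproduces the table's Average row (1.9, 0.9, 3.6, 1.4), so that row is consistent with a plain arithmetic mean of the five applications.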
                     Overhead (%)
                 Detection             Prevention
Subject          Avg    Std Dev        Avg    Std Dev
Bookstore        1.2    0.6            3.4    1.1
Employee         1.7    0.7            4.3    1.5
Classifieds      1.5    0.7            3.6    1.5
Events           3.3    1.4            4.2    2.3
Portal           1.9    0.9            2.5    0.5
Average          1.9    0.9            3.6    1.4

Table 6.1: SQLPrevent overheads for benign ("detection") and malicious ("prevention") HTTP requests.

Concurrent Access

To test SQLPrevent's performance overhead under a high volume of simultaneous accesses, we used JMeter [Apa07], a web application benchmarking tool from the Apache Software Foundation. For each application, we chose one servlet and configured 100 concurrent threads with five loops per thread. Each thread simulated one web client. We then measured the average response time with SQLPrevent and applied the prevention overhead formula to calculate the overhead. During stress testing, SQLPrevent imposed an average performance overhead of 6.9% (standard deviation 1.3%). This is relative to an average response time of 115 milliseconds across all five applications and both databases.

6.5.4 Field Evaluation

To obtain more realistic data on the accuracy and efficiency of SQLPrevent, we collaborated with UBC IT to deploy SQLPrevent on CTConnect2. CTConnect2 is an online course management tool used to administer and manage online WebCT Vista course sections. It is a J2EE web application that uses Oracle as its back-end database. The application is accessible only to faculty administrators.

SQLPrevent was deployed and tested on CTConnect2 from June to September 2009, a period of three months. CTConnect2 is relatively small, with 37 unique accessible URLs. We manually went through all accessible URLs and injected SQL injection payloads into each HTTP parameter. SQLPrevent reported nine instances of SQL injection vulnerabilities among seven unique Java classes. Interestingly, one of the identified injection vulnerabilities was caused by a predefined parameter value in an HTTP request, which the developers intentionally included to form an "ORDER BY"
clause in an SQL statement. This particular vulnerability could trigger a false-positive alarm during benign operation if the developers' intention were not taken into account, that is, without our proposed "conformity to intention" heuristic. In terms of efficiency, we observed that SQLPrevent imposed a negligible performance overhead; web clients observed response times of 30 to 50 milliseconds.

6.6 Discussion

In our evaluation, SQLPrevent produced no false positives or false negatives, and imposed low runtime overhead on the testbed applications. Beyond high detection accuracy and low performance overhead, our technique has two further advantages: it adapts automatically to developers' intentions, and it is easy to integrate with existing web applications.

SQLPrevent can be easily integrated with existing web applications. For instance, to protect a J2EE application with SQLPrevent, the administrator needs to:

(1) deploy the SQLPrevent Java library into the J2EE application server,
(2) configure the HTTP Request Interceptor filter entry in web.xml,
(3) replace the class name of the real JDBC driver with the class name of the SQL Interceptor in the configuration settings, and
(4) configure the JVM to prepend the Taint Tracker library to the bootstrap class path.

For ASP.NET and ASP, deploying SQLPrevent is a matter of copying and registering the binary components.

We ported SQLPrevent to ASP.NET and ASP to assess the generalizability of our approach and to offer protection for legacy web applications. Legacy web applications are natural targets of SQLIAs for two reasons. Most of their vulnerabilities are known to attackers, and the development or administration resources needed for prevention and protection might have been re-allocated to other projects. To the best of our knowledge, none of the existing dynamic SQLIA detection techniques have been ported to ASP.
The lack of support for ASP is mainly due to the lack of a standard mechanism for intercepting SQL statements in ASP. Furthermore, the ASP runtime environment cannot be modified. ASP web applications were the target of waves of massive SQLIAs from October 2007 to April 2008 [Kei08]. As a consequence of these attacks, more than half a million web pages were infected with malicious JavaScript code that redirects visitors of the compromised web sites to download malware from malicious hosts [PMRM08]. Our approach can be integrated into an existing web application with a few configuration changes. Security protection that requires no additional effort from developers and administrators is vital to the protection of legacy web applications.

The approach proposed in this chapter is not a replacement for all other defences against SQLIAs; it offers an alternative point in the trade-off space. For open-source and other applications whose source code the application owners can analyze and, if necessary, modify, approaches that employ static analysis and/or alteration of the source code are viable. For applications where an additional overhead of 2-5% is unacceptable, static identification and elimination of SQLIA vulnerabilities, or even the use of parameterized query APIs, would be more appropriate. Our approach can protect existing applications effectively and efficiently, without depending on application vendors or developers.

The concepts of token type conformity and conformity to intention can be applied to other types of web application security problems, such as cross-site scripting (XSS) and remote command injection. In these problems, the taintedness of tokens can be analyzed and the intended syntactic structures can be dynamically discovered. For instance, a web application can check whether
tainted data is used to construct script elements in the Document Object Model (DOM) of a dynamically generated HTML page, in order to prevent XSS attacks.

6.7 Summary

SQL injection vulnerabilities are ubiquitous and dangerous, yet many web applications deployed today are still vulnerable to SQLIAs. Recent research on SQLIA detection and prevention has successfully addressed the shortcomings of existing SQLIA countermeasures. However, the possibility of false positives and the effort required from web developers have limited the adoption of these countermeasures in some real-world settings. In this chapter, we presented a novel approach to runtime SQLIA protection, as well as a tool (SQLPrevent) that implements our approach. Our experience with and evaluation of SQLPrevent indicate that it is effective, efficient, and easy for web administrators to deploy.

Chapter 7
Secure Content Sharing Beyond Walled Gardens

With Web 2.0, the user is both a consumer and a provider of Web content [Ore07]. It is thus essential for the Web development community to design a usable mechanism that lets web users share their personal content with each other in a controlled manner across website boundaries. Internet-scale adoption of a web SSO solution enables a web user to be uniquely identified on the Web, but a user-centric access control mechanism is still needed.

In this chapter, we explore and evaluate a preliminary design of a distributed access control scheme. The scheme allows access-control policies that a user authors and hosts on one site to be reused on other content-hosting or service-provider (CSP) websites. The proposed sharing mechanism is based on the OpenID and OAuth protocols. It targets use-case scenarios in which the content owner and requester may be unknown to each other.

Today's Web is mostly site-centric: for each service provider, a web user has to maintain a separate copy of identity and access-control policy. In this chapter, we use the term "walled garden"
to refer to such an administrative domain defined by a service provider. Each walled garden controls its own set of users and employs a different access control mechanism to protect personal content. As a result, it is difficult to share personal content beyond walled gardens. To illustrate our discussion, we will use the following scenario of web content sharing as a running example:

Scenario 1. Alice is a Girl Scout in the Colonial Coast Adventures (CCA) club. She took pictures at a scout training event, and would like to use her favorite photo web site, MyPhoto.com, to share those photos online. In CCA, the policy is that pictures of training events can only be seen by CCA troop members and their parents. Alice would like to implement this policy and limit access to her photos accordingly. Jenny is another CCA member, and Mary is her mother. For Mary to access Alice's photos of this event, Mary has to prove that she is Jenny's parent and that Jenny is a CCA member. However, neither Jenny nor Mary is a registered member of MyPhoto.com, and Alice does not know Jenny or Mary.

Personal content sharing is currently available only in limited forms, with two main content sharing mechanisms offered by CSPs. The first is to make user content public, which is obviously inadequate for controlled sharing. The second is the walled-garden approach, in which the user who "owns" content can grant permissions directly to other users (or user groups) within the same CSP. This walled-garden approach is easy to implement and use. Its main limitation is that not all of the intended content users (e.g., Girl Scouts and their parents) are necessarily registered with the corresponding CSP. Users outside of that CSP therefore cannot be granted selective access.
Even within the same walled garden, the resource requester and owner might not know each other. For example, Alice does not know some of the other Girl Scouts, and their parents, who use MyPhoto.com. This makes controlled sharing harder for both the owners and the consumers of content.

To share personal content with unknown users, one possible solution is to adopt a distributed authorization system that supports the notions of trust and delegation. Delegation gives a user a flexible way to hand authority to another user who is in a better position to define attributes of other users. For example, Alice might trust the CCA troop to define its Scout members, even those members who are unknown to Alice. A user might want to use one attribute to make an inference about another attribute (e.g., Alice defines all Girl Scouts of CCA as her friends). A user might also want to delegate to unknown users based on their asserted attributes. For example, Alice trusts CCA to define its member Scouts; she then delegates the authority over the "parent" attribute to those scout members.

There are many existing distributed authorization systems (e.g., KeyNote [BFIK99], SPKI/SDSI [EFL+99], RT [LMW02]), each with a different level of delegation support. However, many challenges remain. Web 2.0 access policies for personal content are authored by users without special technical skills, and are enforced by mutually untrusted walled gardens.

The first challenge is usability. The expressive power of a policy language must be balanced with usability. An average web user must be able to comprehend the language, so that an access policy matches the owner's sharing intention. To be usable, the sharing mechanism must leverage the skills and experience that a web user already has. Interoperability is another challenge.
Access-control policies authored at one policy provider must be applicable to protecting personal content residing on multiple CSPs. In addition, the solution should work under current walled-garden restrictions without requiring CSPs to change their existing access-control mechanisms.

Beyond usability and interoperability, granularity of control and accountability should also be considered. Because content created by web users is diverse and sometimes complex, a content owner should be able to specify access control in fine-grained terms. For example, a content owner might want to protect a photo in an album, an event in a calendar, or even a paragraph within a blog. For accountability, a content owner should be able to know which data is being accessed, by whom, and when. The owner should also be able to revoke an authorization at any time if necessary.

To address these challenges, the work presented in this chapter explores a possible solution based on the following design goals:

• Users are not required to set up a separate account and password on each service provider to view shared content.
• The user is assumed to be equipped only with a web browser. The sharing mechanism should be built upon the existing Internet infrastructure and open standards. It should not require any special software (e.g., a browser plug-in or local proxy) to be installed on end-user computers. Nor should it require users to manage public/secret keys, X.509 certificates, or SPKI/SDSI certificates to perform cryptographic operations.
• CSPs are not required to change their existing access-control mechanisms.

To achieve our design goals, we first reviewed the existing literature to understand user content sharing practices.
The findings most relevant to our work are that email is the most commonly used sharing mechanism [VEN+06, Wha08, ME07], and that users tend to treat socially defined classes (e.g., friends, co-workers, family) uniformly when sharing [VEN+06, Wha08].

We also studied current solutions that CSPs provide to let users share their content beyond walled gardens. Most CSPs (e.g., Google, Flickr, Facebook) support sharing through a secret-link, which is a hard-to-guess URL that uniquely identifies a shared resource (e.g., http://picasaweb.google.com/Alice?sl=Gv1sRgCOzuv). When a resource owner shares personal content using a secret-link, the corresponding CSP creates a special URL for that resource. Anyone who knows the secret-link can access the content. To share specific personal content, a resource owner (sometimes with the aid of the CSP) sends the secret-link via email to selected users. Message recipients view the shared content by clicking on the link.

Secret-links are easy to use for both owners and users, and are easy for CSPs to implement. They provide a certain degree of control over sharing, since only those who obtain (or guess) the link can access the content. However, secret-links are not secure, as they can be forwarded to unauthorized users. Another limitation is that a content owner has to know each recipient's email address explicitly. This might be impractical in some sharing scenarios (e.g., Alice does not know Jenny and Mary).

In this chapter, we propose a new approach for secure Web 2.0 content sharing beyond walled gardens. Based on the findings above, the main ideas behind our approach are to (1) reuse existing email accounts for global identification, (2) extend a user's social circle with the notions of trust and delegation for access control, and (3) leverage the existing secret-link mechanism for content sharing. Our approach has two main components: Email2OpenID and OpenPolicy providers.
An Email2OpenID provider is a service provider that is augmented with an OpenID [RF07] identity service and an email-to-OpenID mapping service. Email2OpenID providers enable web users to log in to CSPs with their email addresses while still using OpenID identifiers for user identification and service discovery. An OpenPolicy provider is a policy-hosting provider. It offers services that let Internet users organize their access policies and let CSPs make authorization decisions.

To evaluate our approach, we implemented a prototype on Facebook that allows Facebook users to share their photo albums with non-Facebook users. With our approach, the content-sharing user experience is similar to that of the existing secret-link mechanism. Content owners use their contact lists hosted on OpenPolicy providers to specify delegation-enabled access policies. Because content requesters use their existing email accounts, they need neither a Facebook account nor any special software to view shared content. Moreover, the content owner and requester may be unknown to each other.

The rest of the chapter is organized as follows. The next section discusses background and related work. Section 7.2 presents the detailed design of our proposed solution, and Section 7.3 discusses the evaluation methodology and results. To make our proposed content sharing scheme more usable for content requesters, we propose an identity-enabled browser in Section 7.4. We summarize the chapter in Section 7.5.

7.1 Background and Related Work

To develop requirements for a mechanism to share personal web content beyond walled gardens, we reviewed the existing literature. Our aim was to understand user content sharing practices and to identify the breakdowns users encounter when sharing. Because we aimed to design a solution that can operate across administrative boundaries, we also studied existing distributed authorization systems.
In this section, we summarize the lessons we learned from existing research and discuss related work.

7.1.1 User Content Sharing Practices

To explore preferences for general information sharing, Olson et al. [OGH05] investigated what content is shared and with whom. They found that participants abstract the details of sharing into high-level classes of recipients and information, which are treated similarly.

Voida et al. [VEN+06] studied sharing practices at a medium-size research organization. They examined the types of content people share and with whom, the mechanisms they use to share, and how much control over the information they grant to recipients. They identified 34 different types of files that are shared among colleagues, friends, and families. One of the most important findings for our work is that email is the most common mechanism for sharing (45%), followed by network folders (16%) and posting content to a web site (11%). The study also identified the breakdowns that users have experienced in their file sharing activities. The main classes of breakdowns are:

(1) difficulty in selecting a sharing mechanism that has the desired features and is available to all sharing participants,
(2) forgetting what files had been shared and with whom, and
(3) problems knowing when new content was made available.

Similarly, Whalen [Wha08] investigated the file sharing practices of about 200 employees at a US research institution, in both work and personal contexts. Most of her results confirm the findings of Voida et al. In addition, Whalen identified the factors that influence the choice of sharing method. She also found that a lack of activity information on shared files (e.g., who accessed them and when) could be problematic for both security and collaboration.

Miller et al.
[ME07] conducted an empirical study of photo sharing practices on Flickr.com. They found that privacy-concerned users primarily use e-mail, supplemented with web galleries, to control the privacy level of different photos. These users perceive an e-mail message as intentional, requiring no setup, and targeted at a specific list of recipients. Miller et al. suggest that a sharing solution should look and feel much like e-mail, but with a more robust underlying framework geared to photo sharing.

7.1.2 Distributed Authorization and Background of RT

In decentralized environments such as the Web, the content owner and the requester are often unknown to each other (e.g., Alice does not know Mary and Jenny). A substantial body of literature addresses the problem of authorization in distributed environments.

PolicyMaker [BFL96] coined the term "trust management" for an access control model with two characteristics. Authorization decisions are based on locally stored security policies and distributed credentials (signed statements). No explicit authentication of a requester's identity and no centralized repository of access rules are needed. Policies and credentials in PolicyMaker consist of programs written in a general programming language such as AWK. Although this approach is general, it makes the overall access policy for a protected resource very hard to understand.

KeyNote [BFIK99], the next version of PolicyMaker, uses a C-like notation and regular expression syntax for describing conditions. SPKI/SDSI [EFL+99] is a digital certificate scheme for authorization. It provides methods for binding authorization privileges to keys, and for localized name spaces and linked local names. A credential in KeyNote and SPKI/SDSI delegates certain permissions from an issuer to a subject. A chain of credentials can be viewed as a capability that authorizes the subject at the end of the chain.
KeyNote and SPKI/SDSI do not support attribute inferencing and attribute-based delegation. RT [LMW02] is a family of languages that adds the notion of RBAC to the concept of trust management systems such as KeyNote and SPKI/SDSI.

In our approach, RT [LMW02] is employed for expressing access-control policies. The RT language is a family of role-based trust-management languages for representing policies and credentials in distributed environments. RT combines the strengths of role-based access control (RBAC) [SCFY96] and trust-management (TM) [BFL96] systems to form a concise and expressive language.

All policy statements and credentials in RT take the form A.r ← exp, where A is an entity, r is a role, and exp is a role expression (a sequence of entities and roles). In this chapter, we capitalize the first character of entities and use lower case to represent roles. An entity in RT is a uniquely identified individual or process that can issue credentials and make requests, and a role is a set of entities who are members of this role. The credential A.r ← exp means that members(A.r) ⊇ members(exp) (i.e., every member of exp is also a member of A.r).

There are four types of credentials in RT, each corresponding to a different way of defining role membership and a different level of delegation:

• Type 1, A.r ← B: A defines entity B to be a member of role A.r. For example, CCA certifies Alice as its Girl Scout member (CCA.scout ← Alice), or Jenny asserts that Mary is her parent (Jenny.parent ← Mary).
• Type 2, A.r ← B.r1: The role A.r is defined to contain every entity that is a member of the role B.r1. This statement can be used to represent a simple delegation from A to B, since B may affect the members of A.r by issuing new credentials. For instance, Alice defines all Girl Scouts in CCA as her scout friends (Alice.scout ← CCA.scout). The members of the Alice.scout role change dynamically as CCA revokes or issues new credentials for its Scout members.
• Type 3, A.r ← A.r1.r2: The role A.r is defined to contain B.r2 for every B that is a member of A.r1. This represents a delegation from A to the members of A.r1. For example, Alice trusts CCA to define its scout members (Alice.scout ← CCA.scout), and then delegates the authority over "parent" to those member scouts (Alice.scout_parent ← Alice.scout.parent).
• Type 4, A.r ← B1.r1 ∩ ... ∩ Bk.rk: A.r is defined to contain the intersection of all the roles B1.r1, ..., Bk.rk. This represents partial delegation from A to B1, ..., Bk. For instance, Alice may share certain content with her close friends who are classmates at Lake-Side Elementary School (LSES) and are from the same scout troop (Alice.close_friend ← CCA.scout ∩ LSES.class_2006).

7.1.3 Related Work

Microsoft Live Mesh [Mic09a] aims to provide a centralized web location for a user to store personal content that can be accessed and synchronized across multiple devices (e.g., computers and mobile phones). The user is able to access the uploaded content through a web-based Live Desktop or her own devices with Live Mesh software installed. Dropbox [Dro09] offers a similar personal content sharing solution. When a user joins a shared folder, the folder appears inside their Live Desktop (or Dropbox) and syncs to their computers and devices automatically. Both solutions are easy to use; however, they accept only users within their own administrative domain, and the sharing is explicit for each individual user (i.e., no grouping or delegation).

YouServ [BJAGS02] enables users to share their content using their personal computers by leveraging technologies in personal web servers, dynamic DNS, proxies, and replication. A user's YouServ content remains available and accessible even if they are using a dynamically-assigned IP address, or when the user's PC is offline (through a peer replicated site) or firewalled (through a proxy site). All YouServ-hosted content is publicly accessible unless it is contained within private folders.
To control access, YouServ provides a single sign-on authentication service for the YouServ community, and content owners whitelist other users in a local file to grant access. ScoopFS [KSC09] is another personal web server-based content sharing solution. The user interface provided by ScoopFS resembles an email client, and each user has a unique mailbox identified by a Web-Key [Clo08] (similar to a secret-link). ScoopFS is designed for ease of use. The main limitation of ScoopFS is that content recipients need to install a copy of ScoopFS and manually exchange their Web-Keys in order to receive the shared content. Mannan et al. [MvO08] proposed a scheme for personal web content sharing by leveraging the existing "circle of trust" in Instant Messaging (IM) networks. This scheme makes an owner's personal data accessible only to her IM contacts. However, both content owner and requester must be on the same IM network, and the proposed system does not support trust and delegation.

Figure 7.1: User-centric content sharing model.

Relationships between a content user and a content owner are intuitive to web users and are commonly used by CSPs to derive authorization decisions. Carminati et al. [CFP06] proposed an access control mechanism for web-based social networks, where policies are expressed as constraints on the type, depth, and trust level of existing social relationships. The proposed system requires a special software module running on an end-user's machine in order to derive access decisions, and delegation is supported only in a limited way (through the same relationships along relationship paths). Lockr [TGS+08] is another access control mechanism based on social relationships. The main limitation of Lockr is the limited expressive power of its access policies, as it simply uses value matching to derive access decisions.
Thus, it cannot express delegation of relationship authority (e.g., a friend's friend), and cannot denote authorized users using shared attributes (e.g., friends from a university). In addition, credentials have to be manually sent from issuer to recipient; to get access, users have to manually search for the appropriate credentials to construct the proof.

7.2 Approach

Our proposed content sharing scheme is framed and guided by a user-centric content sharing model, as illustrated in Figure 7.1. In this model, a user is not only a content owner and consumer, but a credential issuer as well. A user enrolls a set of identities (e.g., user name/password) from multiple identity providers to represent themselves when accessing shared content and constructing access policies. A content owner creates personal content on CSPs and associates that content with access-control policies that are hosted by a policy provider. To access shared content, a content consumer chooses an appropriate identity to make a request. Each request contains the identity provided by the consumer and a corresponding set of context information. Context information is the meta-data of a request, such as user-specific profile attributes, current location, date/time of the request, and user credentials. A credential is an assertion of a certified user attribute from another individual user or an organizational authority. To mediate access, a CSP requests authorization decisions from a policy provider to protect shared content.

Figure 7.2: The system architecture of the proposed Web 2.0 content sharing solution. The Email2OpenID provider enables web users to use their email to log in to CSPs while still using an OpenID URI for identification. The OpenPolicy provider offers services for Internet users to organize their access policies, and for CSPs to make authorization decisions. Users are free to choose their Email2OpenID and OpenPolicy providers.
The policy provider then acts as a policy decision point (PDP), which responds with authorization decisions based on the context of the request and a set of pre-defined credentials and access policies.

User-centric content sharing requires user-centric authentication and authorization mechanisms. For user-centric authentication, users should be able to control their own identities and be free to choose when and where to use them. For user-centric authorization, the content owner is the author of access policies. Access-control decisions are based on the policy associated with the protected content and on credentials issued by multiple trusted individual or organizational users. The content owner is free to choose the policy providers that host her policies and the trusted authorities that issue credentials. In a user-centric Web, access policy follows the user: one access policy hosted by one policy provider should be enforceable to protect shared content residing on different CSPs.

Figure 7.3: Flow for sharing content, assuming Owner W has logged into her OpenPolicy provider P.

7.2.1 System Architecture and Data Flows

As shown in Figure 7.2, our proposed solution contains two additional players, an OpenPolicy provider and an Email2OpenID provider, in addition to the existing actors (Owner, User, CSP) from the secret-link sharing scenario. An OpenPolicy provider provides policy-hosting services for web users to organize their credentials and policies, and a set of web services for CSPs to make authorization decisions. An Email2OpenID provider is an existing email provider that is augmented with both an OpenID identity service and an Email to URL Translation (EAUT) [FN08] service. By combining these two services, our approach allows web users to use their email to log in to CSPs while an OpenID identifier is used for identification. Both OpenPolicy and Email2OpenID are user-centric; users are free to choose their favorite providers.
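Steps 2 and 3 of the Email2OpenID login sequence listed next amount to string handling plus an XML lookup. The following is a minimal Python sketch of just those two steps, not our implementation: the function names are hypothetical, fetching the document over HTTP is left out, and the two EAUT service type URIs are assumptions that should be checked against the EAUT specification.

```python
import xml.etree.ElementTree as ET

XRD_NS = "xri://$xrd*($v*2.0)"  # XRD 2.0 namespace used inside XRDS documents

# Assumed EAUT service type URIs (verify against the EAUT specification).
EAUT_TYPES = {
    "http://specs.eaut.org/1.0/template",
    "http://specs.eaut.org/1.0/mapping",
}


def eaut_discovery_url(email: str) -> str:
    """Step 2: parse the domain d of user@domain and prepend 'http://'."""
    user, sep, domain = email.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return "http://" + domain


def eaut_endpoints(xrds_xml: str) -> dict:
    """Step 3: map each EAUT service type found in an XRDS document to its URI."""
    root = ET.fromstring(xrds_xml)
    type_tag, uri_tag = f"{{{XRD_NS}}}Type", f"{{{XRD_NS}}}URI"
    found = {}
    for service in root.iter(f"{{{XRD_NS}}}Service"):
        types = [c.text.strip() for c in service if c.tag == type_tag and c.text]
        uris = [c.text.strip() for c in service if c.tag == uri_tag and c.text]
        for t in types:
            if t in EAUT_TYPES and uris:
                found[t] = uris[0]  # first URI listed for this service
    return found
```

The returned endpoint (template or mapping service) is what step 4 then uses to translate the email address into an OpenID identifier.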
The following steps illustrate the sequence for a user to log in to a CSP using the Email2OpenID protocol:

1. User U presents her email e to CSP C.
2. C parses the domain d from e (as an email is in the form user@domain) and prepends the string "http://" to d to form an EAUT Discovery Endpoint URL u.
3. C retrieves an XRDS-Simple document [HL08] from u, and looks up the values representing an EAUT Template or Mapping Service Endpoint URL m.
4. C translates or maps e to an OpenID identifier i via m.
5. Once C gets back the corresponding OpenID identifier i, the rest of the steps are the same as in the original OpenID protocol.

Assume a content owner has logged into her OpenPolicy provider using the Email2OpenID protocol and has organized a set of credentials and access policies. To share content, the content owner clicks on the link of the content on a CSP. The CSP generates a secret-link based on the content and redirects the content owner to her OpenPolicy provider to specify a set of roles as the recipients of the shared content. The OpenPolicy provider then sends out the link to each member of the designated roles and calls back the CSP with the designated roles to construct an access-control list for the shared content. Figure 7.3 illustrates the sequence of steps for sharing content, as explained below:

1. Content owner W specifies that content c residing on CSP C should be shared.
2. C generates a secret-link l based on content c.
3. C redirects W to her OpenPolicy provider P with secret-link l and a post-back URL b as part of the payload.
4. P presents a role-selection user interface to W.
5. W specifies a set of roles R as the recipients of c. For instance, Alice specifies Alice.scout and Alice.scout.parent as the roles for recipients.
6. For each role r ∈ R, P sends out l to a set of destination email addresses E = {e | e ∈ members(r)} (e.g., the members of the Alice.scout role) by performing a distributed mailing with other OpenPolicy providers. The details of the distributed mailing protocol are discussed in Section 7.2.2.
7. Once the distributed mailing is completed, P calls b on C with R and l in the payload.
8. C finds content c based on l and then stores the tuple (l, c, R) to serve as the access-control list of c.

Figure 7.4: Flow for accessing a shared content.

We now provide the data flow for accessing shared content. A user requests access by presenting a secret-link to a CSP. The CSP prompts the user for an email account and redirects the user to her Email2OpenID provider for authentication. Once authenticated, the Email2OpenID provider redirects the user back to the CSP with a claimed OpenID identifier and a token that the CSP can verify. After the claimed OpenID identifier is verified, the CSP retrieves the roles associated with the shared content and requests an access decision from the owner's OpenPolicy provider. For each authorization request, the CSP provides the OpenPolicy provider with the user's OpenID identifier and the associated role to determine whether the request should be permitted. Figure 7.4 illustrates the flow for accessing shared content, as explained below:

1. To access a shared resource, user U presents l to CSP C (e.g., by clicking on the link in her email inbox).
2. C prompts U for email e and redirects U to her Email2OpenID provider O with secret-link l.
3. U authenticates herself to O.
4. Once authenticated, O redirects U back to C with an OpenID identifier i and l.
5. Based on l, C looks up the stored tuple (l, c, R) to find content c and roles R.
6. For each role r ∈ R, C requests an access decision from the content owner's OpenPolicy provider P with r and i to determine whether the request should be granted.
7. P performs a distributed containment query for each r with respect to i. A containment query Q takes the form r ⊒ i, and Q is true if i ∈ members(r). Access to c is granted only if Q holds. The details of the distributed containment query algorithm are discussed in Section 7.2.2.
8. C returns c to U if any one of the containment queries returns true.

Figure 7.5: Main components of an OpenPolicy provider.

7.2.2 OpenPolicy Provider

As illustrated in Figure 7.5, the OpenPolicy provider provides (1) a web-based policy editor for users to construct their online credentials and policies, (2) a web-based sharing module for users to associate access policies with shared content, (3) a distributed mailing module to send out secret-links, and (4) a distributed authorization module for CSPs to make access decisions. At its core is a distributed inference engine, which consists of a membership query module and a containment query module. The membership query module takes a goal role A.r and a set of credentials C as inputs and computes the set E of entity members of A.r as an output. For each e ∈ E, the distributed mailing module emails a copy of secret-link l to e. Similarly, the containment query module takes a goal role A.r and a user U as inputs and returns whether U ∈ members(A.r). The containment query result is used by the distributed authorization module to determine whether a request made by a user U should be granted.

Our proposed membership and containment query algorithms are based on the notion of a credential graph, as introduced by Li et al. [LWM03]. A credential graph G is a directed graph that represents a set of credentials C and their relationships. For each credential A.r ← exp ∈ C, there is one node for A.r in G, one node for exp, and an edge exp → A.r that links exp to A.r. A proof graph Gp is a subgraph of credential graph G that is rooted at a given goal role and contains additional nodes derived from Type 3 and Type 4 credential statements.
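To make the proof-graph approach concrete, the following is a minimal, single-process Python sketch of the membership and containment queries. It assumes all credentials are available locally, so the distributed lookups across OpenPolicy servers, the email-to-OpenID mapping, and the exact line structure of Algorithms 3 and 4 are not reproduced. The class and function names (Role, Linked, Inter, ProofGraph, membership_query, containment_query) are hypothetical. Instead of a bare counter, each intersection node records which component roles have reported an entity. This is equivalent to counting when the components are distinct, and it avoids double counting when they are not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:        # A.r
    entity: str
    name: str

@dataclass(frozen=True)
class Linked:      # A.r1.r2, the body of a Type 3 credential
    entity: str
    r1: str
    r2: str

@dataclass(frozen=True)
class Inter:       # f1 ∩ ... ∩ fk, the body of a Type 4 credential
    parts: tuple   # tuple of Role

class Node:
    def __init__(self):
        self.parents, self.solutions = set(), set()
        self.linked_roles, self.intersection_nodes = set(), set()
        self.partial = {}  # intersection nodes only: entity -> component roles containing it

class ProofGraph:
    """Credentials are (head Role, body) pairs; a plain str body is an entity (Type 1)."""
    def __init__(self, credentials):
        self.credentials = credentials
        self.nodes = {}   # expression -> Node (the proof graph)
        self.stack = []   # the proof stack

    def add_node(self, exp):
        if exp not in self.nodes:
            self.nodes[exp] = Node()
            self.stack.append(exp)

    def add_edge(self, child, parent):
        """Edge child -> parent: parent inherits child's current and future solutions."""
        self.add_node(child)
        node = self.nodes[child]
        if parent not in node.parents:
            node.parents.add(parent)
            for d in list(node.solutions):
                self.add_solution(parent, d)

    def add_solution(self, exp, d):
        node = self.nodes[exp]
        if d in node.solutions:
            return
        node.solutions.add(d)
        for linked in list(node.linked_roles):       # Type 3: d in A.r1 adds edge d.r2 -> A.r1.r2
            self.add_edge(Role(d, linked.r2), linked)
        for inter in list(node.intersection_nodes):  # Type 4: d counts toward the intersection
            self.notify(inter, exp, d)
        for parent in list(node.parents):            # ordinary propagation to parents
            self.add_solution(parent, d)

    def notify(self, inter, part, d):
        seen = self.nodes[inter].partial.setdefault(d, set())
        seen.add(part)
        if len(seen) == len(set(inter.parts)):       # d is in every component role
            self.add_solution(inter, d)

    def derive(self, goal, user=None):
        """Process the proof stack; return True early once `user` reaches `goal`."""
        self.add_node(goal)
        while self.stack:
            exp = self.stack.pop()
            if isinstance(exp, str):                  # Type 1 entity node: B solves itself
                self.add_solution(exp, exp)
            elif isinstance(exp, Role):               # expand credentials defining A.r
                for head, body in self.credentials:
                    if head == exp:
                        self.add_edge(body, exp)
            elif isinstance(exp, Linked):             # Type 3 linked role A.r1.r2
                base = Role(exp.entity, exp.r1)
                self.add_node(base)
                self.nodes[base].linked_roles.add(exp)
                for d in list(self.nodes[base].solutions):
                    self.add_edge(Role(d, exp.r2), exp)
            elif isinstance(exp, Inter):              # Type 4 intersection role
                for part in exp.parts:
                    self.add_node(part)
                    self.nodes[part].intersection_nodes.add(exp)
                    for d in list(self.nodes[part].solutions):
                        self.notify(exp, part, d)
            if user is not None and user in self.nodes[goal].solutions:
                return True
        return user is not None and user in self.nodes[goal].solutions

def membership_query(credentials, goal):
    graph = ProofGraph(credentials)
    graph.derive(goal)
    return set(graph.nodes[goal].solutions)

def containment_query(credentials, goal, user):
    return ProofGraph(credentials).derive(goal, user)
```

For example, with the credentials CCA.scout ← Alice, CCA.scout ← Jenny, Jenny.parent ← Mary, Alice.scout ← CCA.scout, and Alice.scout_parent ← Alice.scout.parent, membership_query(creds, Role("Alice", "scout_parent")) returns {"Mary"}: Alice.scout is {Alice, Jenny}, and only Jenny.parent has a member.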
Our algorithms use the proof graph as a helper data structure for computing the members of a given role A.r. To construct a proof graph, our design uses another data structure that we call a proof stack, which is a stack for storing the nodes to be processed during a derivation process.

Algorithm 3 shows the details of a distributed membership query. It takes a goal role and a set of credentials as inputs and computes the set of members of the goal role as an output. Algorithm 3 processes one node in the stack at a time until the stack is empty. Initially, only the goal node (i.e., A.r) is added to the proof graph and pushed onto the stack. A node in the proof stack is the basic processing unit; each node has the following properties:

• exp: the role expression of this node (e.g., B, A.r, A.r1.r2, or f1 ∩ f2 ∩ ... ∩ fk).
• parents: the set of nodes this node is a member of.
• solutions: the set of entity nodes that can reach this node. Solutions are propagated to a node's parents in the following way. When a node e2 is added to the solutions of e1 (e2 → e1), all existing solutions of e2 are appended to the solutions of e1 and then propagated to e1's parents as well. Solution propagation is illustrated in Algorithm 4 (lines 3 to 8).
• linked roles: the set of linked role names. This property is used to process a Type 3 linked role A.r1.r2. The details are discussed below.
• intersection nodes: the set of intersection nodes. This property is used to process a Type 4 intersection role f1 ∩ f2 ∩ ... ∩ fk. The details are discussed below.

For a credential statement A.r ← exp, we define the functions RHS(A.r ← exp) = exp (i.e., right-hand side) and LHS(A.r ← exp) = A.r (i.e., left-hand side). If exp is a linked role (i.e., in the form A.r1.r2), then PrimaryEntityRole(A.r1.r2) = A.r1 and SecondaryRole(A.r1.r2) = r2. If exp is an intersection role (i.e., in the form f1 ∩ f2 ∩ ... ∩ fk), then RoleCount(f1 ∩ f2 ∩ ... ∩ fk) = k.

To process a Type 2 role node A.r in the stack (Algorithm 3, lines 10 to 16), the algorithm finds all credential statements that define A.r. For each credential A.r ← exp, it creates a node for exp in the proof graph if none exists, pushes the newly created node onto the stack (addNode function), and then adds an edge exp → A.r. The addEdge(exp, A.r) function of the proof graph adds A.r to the parents set of exp and propagates exp's solutions to A.r.

To process a Type 1 entity node B (Algorithm 3, lines 7 to 9), the algorithm simply adds B to B's solutions. The solutions of B are then propagated to all of B's parents, as shown in Algorithm 4 (lines 3 to 8).

A Type 3 statement A.r ← A.r1.r2 defines A.r to contain B.r2 for every B that is a member of A.r1. To process a Type 3 linked node A.r1.r2 (Algorithm 3, lines 17 to 24), the algorithm creates a node for A.r1 and adds role name r2 to A.r1's linked roles property. When a new solution B is added to A.r1, Algorithm 4 (lines 9 to 13) creates a node B.r2 and adds an edge B.r2 → A.r1.r2 to the proof graph. Thus, when a solution D is added to B.r2, D is propagated to A.r1.r2 automatically according to Algorithm 4 (lines 6 to 8).

Figure 7.6: OpenPolicy performance evaluation results. The worst-case response time was measured by forcing OpenPolicy to enumerate all credential statements on all testing servers. For each run, a different number of credential statements is generated on each server, and 5 to 20 concurrent authorization requests are submitted.

A Type 4 statement A.r ← f1 ∩ f2 ∩ ... ∩ fk defines A.r to contain the intersection of all the roles f1, ..., fk. To process a Type 4 intersection node f1 ∩ f2 ∩ ... ∩ fk (Algorithm 3, lines 25 to 30), the algorithm creates k nodes, one for each fi, and adds the current node f1 ∩ f2 ∩ ... ∩ fk to the intersection nodes of each fi.
When a solution B is added to the solutions of fi, the current intersection node is notified to add B to its partial solutions property (Algorithm 4, line 16). The partial solutions property of an intersection node maintains a set of potential solutions, each associated with a counter. When the count of a potential solution D reaches the number of roles in the intersection role expression f1 ∩ f2 ∩ ... ∩ fk (in this case, k), an edge D → f1 ∩ f2 ∩ ... ∩ fk is added to the proof graph (Algorithm 4, lines 17 to 19).

Similarly, the distributed containment query takes a goal role A.r, a user U, and a set of credentials as inputs, and returns a boolean indicating whether U ∈ members(A.r). The logic for constructing a proof graph is very similar to that of the membership query algorithm. The only difference is that this algorithm checks whether U ∈ members(A.r) holds after each node is processed; if it holds, the function returns immediately. Each member of A.r is represented in the form of an email, but U can be either an email or an OpenID identifier. When U is an OpenID identifier, the distributed authorization module uses the Email2OpenID EAUT service to map each solution of A.r to an OpenID before checking whether U ∈ members(A.r).

7.3 Implementation and Evaluation

To evaluate our approach, we implemented an Email2OpenID provider and an OpenPolicy provider in J2EE. To support the OpenID protocols, we reused OpenID4Java [Buf09], an open-source Java library that supports implementing OpenID IdPs and RP websites. OpenPolicy uses Apache Tomcat as a web container and stores credential statements in a MySQL database. To validate the design of our prototype implementation, we developed a Facebook application that enables Facebook users to share their private photo albums with non-Facebook users via our proposed sharing architecture.
We employed the OAuth 2.0 server-side flow for sharing access policies between CSPs and OpenPolicy providers.

In addition to validating the correctness of the data flows and inference logic, we were concerned with the runtime latency incurred during authorization decisions. To evaluate the performance characteristics of OpenPolicy, we deployed OpenPolicy on three hosts within our institution's internal network. To evaluate portability and to ensure the performance measurements were not skewed by hardware or operating system, we used a different OS and hardware for each machine. The configurations were as follows: (A) Intel Core 2 Duo 2.4 GHz CPU, 4 GB RAM, running Windows Vista; (B) Intel Core 2 Duo 2.6 GHz CPU, 4 GB RAM, running Mac OS X 10.5.6; and (C) AMD Opteron Processor 142 CPU, 8 GB RAM, running Linux 2.4.27. The testing machines were connected over a local area network with 100 Mbps Ethernet adapters. Round-trip latency was less than 0.1 milliseconds on average in this configuration.

To evaluate the performance characteristics of containment queries, we wrote scripts to create a set of credentials for each OpenPolicy server and then triggered OpenPolicy to perform a worst-case containment query (i.e., A.r ⊒ D, where D is not a member of A.r), which enumerated all credential statements on all testing servers. The performance results are shown in Figure 7.6. For each run, a different number of credential statements (5,000 to 25,000) is generated on each server, and a different number of threads (five to twenty) is invoked to simulate concurrent authorization requests.

To improve authorization response time, OpenPolicy caches proof graphs. When proof graphs are cached, the response time becomes linear in the number of servers involved in the query process. In our testbed, the worst-case response time was less than 3 milliseconds when caching was used.
Proof graph caches can greatly improve the response time of OpenPolicy. However, when the size of the cached graphs exceeds available memory, cache efficiency begins to degrade. As future work, we plan to use other cache strategies to improve cache efficiency. We also want to apply authorization recycling [WCBR08] techniques to derive access-control decisions directly on CSPs based on cached authorization responses from OpenPolicy providers.

7.4 Identity-enabled Browser

One usability challenge imposed by our proposed content sharing scheme is "identity switch": a content requestor may need to switch to an appropriate identity when accessing shared content hosted on different CSPs. To reduce the number of HTTP redirections encountered by a content request, and to make identity switching easier, we propose an identity-enabled browser. Our approach is based on the identity flow metaphor from the design of operating systems (OS). In an OS, a user authenticates to the OS, and that authenticated identity automatically "flows" into all processes launched by the user. Our approach treats a browser as an operating system and each website the user visits as a process. A user enters her existing web user name and password into a browser, and, with the user's consent, that authenticated identity automatically flows into all websites that require an authenticated identity.

Figure 7.7: System architecture and high-level data flow of the proposed identity-enabled browser.

7.4.1 System Architecture and Data Flow

The main actors in our system are an OpenIDemail identity provider, an RP that supports the OpenIDAuth HTTP authentication scheme, and an OpenIDemail-enabled browser. Figure 7.7 illustrates the system architecture and high-level data flow among the actors. An OpenIDemail provider is an Email2OpenID identity provider augmented with an OpenIDua extension.
Web users are not accustomed to using an OpenID URL as an identifier [DTH06, Adi08a]; email addresses, on the other hand, serve as user identifiers for many CSPs [Adi08a]. Our approach is not bound to any specific email-to-OpenID translation service. In our implemented solution, we combine EAUT and OpenID so that web users can log in to IdPs using their email addresses while an OpenID identifier is transparently conveyed to RPs for identification. With EAUT, an OpenIDemail provider is free to implement any custom logic to map or translate an email to an OpenID. OpenIDua is our proposed OpenID extension that allows IdPs to authenticate directly with user agents such as browsers. Details of the OpenIDua extension are discussed in Section 7.4.2.

To "flow" authenticated identities automatically into websites that support OpenID for authentication, we introduce an HTTP access authentication scheme named OpenIDAuth. Similar to the HTTP Basic and Digest authentication schemes, OpenIDAuth is designed to allow a web browser, or another client program, to provide credentials when making an HTTP request. However, instead of using a username and password as credentials, OpenIDAuth uses OpenID and a challenge/response protocol to ensure that a user "owns" the claimed OpenID. Details of the HTTP OpenIDAuth scheme are discussed in Section 7.4.3.

An OpenIDemail-enabled browser is a browser extended with the OpenIDemail protocol. To log in, a user mutually authenticates with her IdP directly in the browser, instead of performing authentication on the IdP's web site. There are three main steps to this login process (Steps 1 to 3 in Figure 7.7):

1. Map an email address to an OpenID: After a user enters her email address into an OpenIDemail-enabled browser, the browser maps it to an OpenID identifier via an email-to-OpenID mapping service and uses that identifier to discover the endpoint of the IdP.
2. Establish a session key with the IdP: The browser uses an extended associate operation (defined in OpenIDua) to exchange a shared session key with the IdP.
3. Mutually authenticate with the IdP: The browser mutually authenticates with the IdP via an extended checkid_immediate operation. This allows the IdP to assert that the user owns the claimed OpenID identifier, and allows the browser to validate the authenticity of the IdP. During this process, the password hash of the corresponding claimed OpenID is used as the shared secret between these two parties.

Once mutual authentication has been successfully completed, the browser and the IdP share a tuple: OpenID i, session handle h, and session key k.

When the user accesses protected content on an RP, the RP responds to the browser with an HTTP 401 "Unauthorized" message with the WWW-Authenticate scheme set to OpenID:session. Three steps are required for the browser and the RP to complete an OpenIDAuth authentication (Steps 4 to 6 in Figure 7.7):

4. Supply a claimed OpenID and the session handle: The browser makes the HTTP request again with the claimed OpenID i and the corresponding session handle h in the request header.
5. Validate the claimed OpenID and the session handle: The RP discovers the authentication endpoint of the IdP based on i and then sends i and h to the IdP to ensure that i and h are valid. The IdP responds to the RP with a validation result comprising a nonce n and a signature s = HMAC(i||h||n, k).
6. Compute a response for a given challenge: If the claimed OpenID is valid, the RP responds to the browser with an HTTP 401 "Unauthorized" message; it includes i, h, and n in the response header and sets the WWW-Authenticate scheme to "OpenID:challenge". The browser computes a signature s' = HMAC(i||h||n, k) based on the stored (h, k, i) tuple and the received nonce n, and sends s' to the RP, which checks whether s' = s. If the check is successful, access is granted.

7.4.2 OpenIDua Extension

To allow an IdP to communicate directly with a browser rather than relying on redirections through an RP, we propose OpenIDua, an OpenID extension that "piggybacks" extra information on the associate and checkid_immediate operation messages. OpenIDua extends the associate operation to prevent man-in-the-middle (MITM) attacks. The original OpenID protocol uses the Diffie-Hellman (DH) key exchange protocol to establish a session key; however, plain DH is vulnerable to MITM attacks. To prevent this attack between a browser and an IdP, our design piggybacks a claimed OpenID on the associate operation. As described in Section 8.2.3 of the original OpenID Specification [RF07], the session key is only XORed with the hash of the DH shared key (i.e., $k_{enc} = k \oplus H(g^{ab} \bmod q)$). An MITM attacker can therefore perform two distinct DH key exchanges, one with each party, which allows the attacker to decrypt and then re-encrypt the messages passed between them. With our extension, the additional claimed_id field can be used by an IdP to find the corresponding password hash, which is known only to the IdP and the browser, but not to the adversary. The function used to hash the password is indicated in the pwd_hash_type field. Based on our extension, the IdP responds to the browser with an encrypted session key, computed by XORing the session key with the password hash and then XORing the result with the hash of the DH shared key: $k_{enc} = (k \oplus H_p(p)) \oplus H(g^{ab} \bmod q)$. Upon receiving the IdP's response, the browser computes the session key as $k = H_p(p) \oplus (H(g^{ab} \bmod q) \oplus k_{enc})$.

OpenIDua extends the checkid_immediate operation with an extra field named enc_pwd_hash that contains a password hash encrypted with the session key k. This information is used by an IdP to check whether a user "owns" the claimed OpenID identifier. The encryption method is indicated in the enc_type field.
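The key-transport algebra of the extended associate operation, together with the HMAC challenge of steps 5 and 6, can be checked with a short Python sketch. It is a toy and not our prototype: the modulus q = 2^127 - 1, the generator g = 3, SHA-256 for H, Hp and the HMAC, the minimal big-endian encoding of the DH value, and all function names are assumptions made for illustration. The sketch shows that only a party holding Hp(p) recovers k from k_enc. By contrast, in plain OpenID's k XOR H(g^ab mod q), anyone who completed the DH exchange with the IdP could recover k.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only; NOT a vetted Diffie-Hellman group.
Q = 2**127 - 1  # prime modulus q
G = 3           # generator g


def H(x: int) -> bytes:
    """Hash of the shared DH value g^ab mod q (SHA-256 and encoding are assumptions)."""
    return hashlib.sha256(x.to_bytes((x.bit_length() + 7) // 8 or 1, "big")).digest()


def xor(a: bytes, b: bytes) -> bytes:
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def password_hash(password: str) -> bytes:
    """Hp(p); the real function is whatever pwd_hash_type names (SHA-256 assumed)."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def idp_associate(g_a: int, stored_hp: bytes):
    """IdP side: pick b, k, h and return (g^b mod q, h, k_enc) plus k."""
    b = secrets.randbelow(Q - 2) + 1
    k = secrets.token_bytes(32)                # session key
    h = secrets.token_hex(8)                   # session handle
    shared = pow(g_a, b, Q)                    # g^ab mod q
    k_enc = xor(xor(k, stored_hp), H(shared))  # (k XOR Hp(p)) XOR H(g^ab mod q)
    return (pow(G, b, Q), h, k_enc), k


def browser_unwrap(a: int, g_b: int, k_enc: bytes, hp: bytes) -> bytes:
    """Browser side: k = Hp(p) XOR (H(g^ab mod q) XOR k_enc)."""
    return xor(hp, xor(H(pow(g_b, a, Q)), k_enc))


def sign(i: str, h: str, n: str, k: bytes) -> bytes:
    """HMAC(i||h||n, k) with SHA-256; plain concatenation mirrors the text's i||h||n."""
    return hmac.new(k, (i + h + n).encode("utf-8"), hashlib.sha256).digest()


# Browser picks a and sends g^a mod q along with the claimed OpenID.
hp = password_hash("alice-password")
a = secrets.randbelow(Q - 2) + 1
(g_b, h, k_enc), k_idp = idp_associate(pow(G, a, Q), hp)
k_browser = browser_unwrap(a, g_b, k_enc, hp)

# Challenge/response: both sides sign (i, h, n) with their copy of k.
n = secrets.token_hex(8)
s = sign("http://ece.ubc.ca/alice", h, n, k_idp)
s_prime = sign("http://ece.ubc.ca/alice", h, n, k_browser)
print(hmac.compare_digest(s, s_prime))  # prints True
```

A real deployment would use a standard DH group and an unambiguous encoding of i, h and n. Plain concatenation is kept here only to mirror the notation in the text.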
As this is a direct communication between a browser and an IdP, the value of the original openid.return_to field should be ignored by the IdP. In our prototype implementation, we set openid.return_to to the browser's User-Agent HTTP header. Below is a sample request from a browser (the openid.ns.ua and openid.ua.* fields are our extension to the message):

[Browser request:]
openid.ns.ua=http://lersse.ece.ubc.ca/openid/ext/ua/1.0
openid.ua.enc_pwd_hash=QAWSDERF412QA
openid.ua.enc_type=AES-256
openid.mode=checkid_immediate
openid.claimed_id=ece.ubc.ca/alice
openid.assoc_handle=123456789
openid.return_to=User-Agent: Mozilla/5.0 ...

The message format of the response from an IdP is identical to the one for the original checkid_immediate operation. This operation is also used by an RP, without the enc_pwd_hash field, to check whether a given claimed OpenID and its corresponding session handle have been authenticated with the IdP when processing a request for protected content.

7.4.3 HTTP OpenIDAuth Scheme

To transmit a claimed OpenID transparently to an OpenID-enabled web site, we introduce OpenIDAuth (Steps 4 and 6 in Figure 7.7), an HTTP access authentication scheme. For a given HTTP request to a protected resource, the RP responds to the browser with the following message in order to solicit a claimed OpenID and the corresponding session handle:

[RP response:]
HTTP/1.1 401 Authorization Required
WWW-Authenticate: OpenID:session realm="*.ubc.ca" auth-domain="www.ubc.ca"

The realm and auth-domain values specified in the response header define the realm of an authentication session, which is the area of protected resources that share the same user credentials. To respond to an OpenID:session message, the browser prompts the user to select an authenticated identity for the specific realm on the RP.
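A browser handling the OpenID:session challenge has to extract the scheme, realm, and auth-domain from the header and decide whether a stored identity applies to the requesting host. A small Python sketch under stated assumptions: the helper names are hypothetical, the wildcard rule for realm matching ("*.ubc.ca" matches any subdomain of ubc.ca but not ubc.ca itself) is our assumption because the text does not define matching, and quoted values are assumed to contain no escaped quotes. The last helper formats the Authorization header shown in the browser request that follows.

```python
import re

# name="value" pairs; separating commas (as in the Authorization example) are optional here.
PARAM_RE = re.compile(r'([A-Za-z][\w-]*)="([^"]*)"')


def parse_challenge(header_value: str):
    """Split a WWW-Authenticate value into (scheme, {param: value})."""
    scheme, _, rest = header_value.strip().partition(" ")
    return scheme, dict(PARAM_RE.findall(rest))


def host_in_realm(host: str, realm: str) -> bool:
    """Assumed rule: '*.example.org' matches any subdomain; anything else must match exactly."""
    host, realm = host.lower(), realm.lower()
    if realm.startswith("*."):
        return host.endswith(realm[1:])  # keep the leading dot so 'evilubc.ca' fails
    return host == realm


def authorization_header(openid: str, session_id: str) -> str:
    """Authorization header value sent once the user has selected an identity."""
    return f'OpenID:session user-id="{openid}", session-id="{session_id}"'
```

Keeping the leading dot in the suffix check prevents a look-alike domain such as evilubc.ca from matching the realm *.ubc.ca.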
The browser can optionally record this identity-to-realm mapping and use it for future requests without prompting the user again (for a period of time designated by the user). Once an authenticated OpenID is selected, the browser makes another HTTP request:

[Browser request:]
GET /private/content.html HTTP/1.1
Authorization: OpenID:session user-id="http://ece.ubc.ca/alice", session-id="1234"

When an RP receives such a request, it sends the supplied OpenID and session id to the IdP via an extended OpenID checkid_immediate operation to ensure that the supplied information is valid. If it is valid, the IdP responds with a positive assertion that consists of a nonce n and a signature $s = \mathrm{HMAC}(i \| h \| n, k)$. The RP then uses n to challenge the browser. To respond to the OpenID:challenge message, the browser computes a signature over the list of fields specified by the signed field (e.g., OpenID i, session handle h, nonce n), and then sends the signature to the RP.

The RP then checks whether the browser's response matches the signature generated by the IdP. If it does, the RP responds to the browser with the requested resource and a logout URL in the header. The browser uses the logout URL to notify the RP when the user triggers a single logout event.

Figure 7.8: Flow for logging into an OpenIDemail provider.

7.4.4 Log into an OpenIDemail Provider

We now provide the sequence of steps for logging into an OpenIDemail provider, as illustrated in Figure 7.8:

1. User U enters email e and password p into browser B.
2. B parses the domain d from e (an email address has the form user@domain) and prepends the string "http://" to d to form an EAUT discovery URL. B retrieves an XRDS document [HL08] from that URL and looks up the entry representing an EAUT service E.
3. B sends e to E (i.e., https://lersse.ece.ubc.ca/eaut/).
4. E maps e to an OpenID identifier i and sends it back to B.
5. B makes an HTTP request on i to fetch the document hosted at I.
6. I responds with either an XRDS or an HTML document that contains the IdP endpoint URL IdP.
7. B generates a Diffie-Hellman (DH) modulus q, generator g, and a random DH private key a to initiate an association operation that establishes a session key k with IdP (Steps 7 to 11).
8. B sends i, q, g, and the DH public key $g^a \bmod q$ to IdP.
9. IdP generates a new session handle h, a session key k, and a random DH private key b. IdP then retrieves the password hash $H_p(p)$ for i from its credential store.
10. IdP sends $g^b \bmod q$, h, and the encrypted session key $k_{enc} = (k \oplus H_p(p)) \oplus H(g^{ab} \bmod q)$ to B. Note that k is XORed with $H_p(p)$ to prevent MITM attacks.
11. B computes $k = H_p(p) \oplus (H(g^{ab} \bmod q) \oplus k_{enc})$ and then stores the tuple (h, k, i).
12. To check whether U owns the claimed OpenID identifier i, B sends i, h, and $E(H_p(p), k)$ to IdP via an extended checkid_immediate operation.
13. IdP decrypts the encrypted password hash using k, and checks whether $H_p(p)$ matches the stored password hash for i.
14. After password verification, IdP sends back i, h, a nonce n, and a signature $\mathrm{HMAC}(i \| h \| n, k)$ to B.
15. B verifies the signature using the session key computed in Step 11 to ensure that IdP holds the same session key. Once the signature is verified, B informs U that the authentication process has completed successfully.

Our design allows users to log in with multiple IdPs. Users are prompted to choose an appropriate identity when accessing protected content on RPs.

Figure 7.9: Flow for accessing protected content.

7.4.5 Access Protected Content

When the login process has completed, the browser and the IdP have been mutually authenticated, and each has established a tuple (h, k, i). We now illustrate the data flow for accessing protected content on an RP (Figure 7.9):

1. B makes an HTTP request r for the protected content.
2. RP responds to B with an HTTP 401 "Unauthorized" and the WWW-Authenticate scheme set to OpenID:session.
3. B presents an identity selection dialog for U to select a claimed OpenID i. B sends r to RP again with i and the corresponding session handle h in the request header.
4. RP makes an HTTP request on i.
5. I responds with either an XRDS or an HTML document that contains the IdP endpoint URL IdP.
6. RP sends i and h to IdP via an extended checkid_immediate operation to check whether i is associated with an authenticated session h.
7. IdP verifies i and h against the stored (h, k, i) tuple. If i and h are valid, IdP generates a nonce n (e.g., 2009-09-15T17:11:51ZUNIQUE) and computes the signature $s = \mathrm{HMAC}(i \| h \| n, k)$.
8. IdP sends i, h, n, and s to RP.
9. RP responds to B with an HTTP 401 "Unauthorized" and the WWW-Authenticate scheme set to OpenID:challenge, using the nonce n as the challenge.
10. B computes the signature $s' = \mathrm{HMAC}(i \| h \| n, k)$ from the stored (h, k, i) tuple and the received nonce n.
11. B sends r to RP again with i, h, n, and $s'$.
12. RP checks whether $s = s'$. If so, RP returns the protected content to B, with a logout URL in the response header.

Once the OpenIDAuth authentication process has completed, the RP can issue a cookie to browser B to represent the current authenticated session. B then includes this cookie in the HTTP request header for future communications with the RP, instead of re-initiating an OpenIDAuth process.

Figure 7.10: Screen shots of the OpenIDemail-enabled browser.

7.4.6 Prototype Implementation and Evaluation

To evaluate our approach, we implemented the proposed protocols in J2EE and developed a Firefox extension that communicates with IdPs and RPs using the new protocols. For OpenIDua, we extended OpenID4Java [Buf09], an open-source Java library that supports implementing OpenID identity providers and relying party websites. We set up an OpenIDemail provider by augmenting an existing email server with EAUT and the OpenIDua extension.
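The challenge-response in Steps 6 to 12 of the protected-content flow (Section 7.4.5) can be sketched in Python as follows. This is an illustration under assumptions, not the J2EE prototype: HMAC-SHA-256 is assumed for HMAC, the fields i, h, and n are joined with newlines so that the concatenation is unambiguous, and the class and function names (IdP, browser_response, rp_accepts) are hypothetical. Note that the RP never learns k; it only compares the IdP's signature s with the browser's s'.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timezone


def sign(i: str, h: str, n: str, k: bytes) -> bytes:
    """s = HMAC(i || h || n, k). SHA-256 and newline-joined fields are assumptions."""
    msg = "\n".join((i, h, n)).encode("utf-8")
    return hmac.new(k, msg, hashlib.sha256).digest()


class IdP:
    def __init__(self):
        self._sessions = {}  # session handle h -> (session key k, OpenID i)

    def register(self, h: str, k: bytes, i: str) -> None:
        self._sessions[h] = (k, i)  # established during login (Steps 7 to 11)

    def check_immediate(self, i: str, h: str):
        """Access Steps 6 to 8: verify (i, h), then return a nonce and its signature."""
        entry = self._sessions.get(h)
        if entry is None or entry[1] != i:
            return None
        k = entry[0]
        # Timestamp followed by a unique suffix, in the style of the Step 7 example.
        n = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + secrets.token_hex(4)
        return n, sign(i, h, n, k)


def browser_response(tuples: dict, i: str, h: str, n: str) -> bytes:
    """Access Step 10: s' = HMAC(i || h || n, k) from the stored (h, k, i) tuple."""
    k, stored_i = tuples[h]
    if stored_i != i:
        raise KeyError("session handle does not belong to this identity")
    return sign(i, h, n, k)


def rp_accepts(s_from_idp: bytes, s_from_browser: bytes) -> bool:
    """Access Step 12: constant-time comparison of s and s'."""
    return hmac.compare_digest(s_from_idp, s_from_browser)
```

A deployed RP would also need to remember used nonces in order to reject replays; that bookkeeping is omitted here.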
Wealso augmented five open-source J2EE web applications (Bookstore, Employee Directory,Classifieds, Events, and Portal) from gotocode.com to become OpenIDemail RPs.Figure 7.10a shows a screen shot of the OpenIDemail Firefox add-on when a user launchesFirefox, but before the UI of the browser is visible to the user. The user enters her email accountand types or ?clicks through? her password to start the login sequence illustrated in Figure 7.7,steps 1 to 3. Once logged in, the user?s current login information will be shown on an iconlocated on the status bar of the browser. The user can click on the icon to log out or sign inwith additional accounts via a popup menu (Figure 7.10b). When the user is browsing to aprotected resource on an RP that supports OpenIDAuth, the Firefox add-on will prompt theuser to choose an identity (Figure 7.10c), before starting the automatic identity provisioningillustrated in Figure 7.7, steps 4 to 6. The add-on also records this action (i.e., ?Remember it?in Figure 7.10c), which allows automatic login for future access and assists users in determiningwhich identity was used for accessing which RP.7.5 SummarySince the beginning of the Web, ?identity wars? have led to service providers building ?walls?to protect their customer-base. However, these ?walls? restrict the evolution of the Web. Inour vision of a truly user-centric Web 2.0, users own their personal content and are free toshare it across and beyond walled gardens. 
Users should also have the freedom to choose their favorite providers for their identities, content, social relationships, and access-control policies. Separating personal content from services lets a service provider focus on providing value-added services to the users it serves, forcing it to be just a service provider and no longer requiring users to compromise their identities or unnecessarily expand their social graphs in order to share personal content.

In this chapter, we proposed and described a preliminary design and implementation of a user-centric access control scheme for secure content sharing beyond walled gardens. Our approach enables web users to reuse their access control policies across website boundaries. On the Web, the content owner and the requester are often unknown to each other, but need to share content in a controlled manner. To support distributed authorization, our proof-of-concept design and implementation enables content owners to specify access policies based on their existing social contacts, express delegation of relationship authority (e.g., a friend's friend), and denote authorized users by attributes (e.g., friends from a university). Note that the work described in this chapter focuses on the overall architectural design and the underlying protocols. Usability studies are required to explore and improve the design of its user interface.

Algorithm 3: queryMembership
1: Input: Goal role R, Credential set C
2: Output: Members of goal role
3: Gp = new ProofGraph(); S = new Stack();
4: addNode(R, Gp, S);
5: while S is not empty do
6:     Node n = S.pop();
7:     if n.type == 1 then  // entity node B
8:         addSolution(n, n, Gp, S);
9:     end if
10:    if n.type == 2 then  // role node A.r
11:        find Cr = { c | LHS(c) = n.exp, c ∈ C };
12:        for each credential c ∈ Cr do
13:            Node n′ = addNode(RHS(c), Gp, S);
14:            Gp.addEdge(n′, n);
15:        end for
16:    end if
17:    if n.type == 3 then  // linked-role node A.r1.r2
18:        n′ = addNode(PrimaryEntityRole(n.exp), Gp, S);
19:        n′.linked_roles.addRole(SecondaryRole(n.exp));
20:        for each solution e ∈ n′.solutions do
21:            n′′ = addNode(e + "." + SecondaryRole(n.exp), Gp, S);
22:            Gp.addEdge(n′′, n);
23:        end for
24:    end if
25:    if n.type == 4 then  // intersection-role node f1 ∩ f2 ∩ ... ∩ fk
26:        for each role expression f in n.exp do
27:            n′ = addNode(f, Gp, S);
28:            n′.intersection_nodes.add(n);
29:        end for
30:    end if
31: end while
32: return Gp.findNode(R).solutions;

Algorithm 4: addSolution
1: Input: Node N, Solution entity node E
2: Input: Proof Graph Gp, Stack S
3: if E ∉ N.solutions then
4:     N.solutions.addNode(E);
5:     for each node n ∈ N.parents do
6:         addSolution(n, E, Gp, S);
7:     end for
8: end if
9: for each role expression r ∈ N.linked_roles do
10:    Node n′ = addNode(E.exp + "." + r, Gp, S);
11:    Node n = Gp.findNode(N.exp + "." + r);
12:    Gp.addEdge(n′, n);
13: end for
14: for each node n ∈ N.intersection_nodes do
15:    addPartialSolution(n, N, E);
16:    if RoleCount(n.exp) = count of E in n's partial solutions then
17:        addSolution(n, E, Gp, S);
18:    end if
19: end for

Chapter 8
Discussion

From a security point of view, designing a web SSO protocol should not be too difficult. After all, cryptographic primitives are mature and well understood. All that protocol designers need to do is specify an agreed format and flow for conveying identity assertions from IdPs to RPs, and then protect the protocol messages with cryptographic primitives. But why is OAuth 2.0 the only web SSO protocol so far that has achieved internet-scale adoption, while other promising web SSO proposals have failed?
This chapter discusses the design challenges that a web SSO solution must meet to succeed in the online identity landscape, how the design of OAuth 2.0 addressed these challenges, and the implications of its design tradeoff decisions.

8.1 Design Challenges

Web SSO is not just a technology; rather, it is a complex business ecosystem involving distinct stakeholders with different needs. Thus, to succeed in the marketplace, a web SSO solution must cater to the needs of the users, RPs, and IdPs that make up the ecosystem. Designing a web SSO solution aimed at internet-scale adoption is challenging because many different requirements from diverse stakeholders need to be satisfied, and some requirements even conflict with each other or are constrained by the underlying web infrastructure. The scale and complexity of web SSO systems, combined with usability, privacy, security, and business requirements, create steep design challenges. Figure 8.1 illustrates the design requirements of each stakeholder and their conflicts. Each requirement conflict between stakeholders (A to F) is discussed in detail below.

A: Web users want minimal disclosure of personal information [MR08, DD08, Gig12, RRJT13], but RPs need a rich set of user data from IdPs to sustain their business [SBHB10, Gig11, Jan12a]. The user data that RPs want includes not only static profile attributes that can be conveyed in identity assertions, but also dynamic and rich data, such as friend lists, photos, posts, blogs, and location information [Gig11, Jan12a]. In addition, RPs want the capability of promoting their products and services among the users' social circles. Social graph integration through SSO can provide compelling business incentives for RPs.
However, this business requirement clearly conflicts with users' privacy concerns, because not only the IdP but also all the friends in the user's social graph could learn the user's status on RP websites, as well as the services and products in which the user is interested.

B: Web users want untraceability [Cam05, MR08, DD08], whereas IdPs need to know user browsing behavior for targeted marketing. From their business point of view, browsing behavior is invaluable: when and which RP websites the user has visited, what services and products the user is interested in, and with whom the activities on RPs have been shared. The collected browsing behavior allows IdPs to provide personalized advertisements and services on their platforms. Nevertheless, this IdP tracking capability can be intrusive from the users' privacy perspective.

Figure 8.1: Contradictory requirements between User, RP and IdP. Each requirement is enclosed in an oval, and the conflict between two requirements is denoted by a double-arrowed line with a cross in the middle.

C: Web SSO protocol messages need to be secured in transit, typically through cryptographic mechanisms. However, what RP developers really care about is the cost of implementation and maintenance, because they are commonly overloaded with other prioritized requirements and project deadlines [BC08, WvO08]. Therefore, RP developers want implementation simplicity: a web SSO solution that can be implemented without requiring them to perform additional complex formatting and parsing, state management, or cryptographic computations [Rec10, Pau10]. Common cryptographic operations, such as protecting integrity and authenticity with a digital signature or message authentication code, ensuring confidentiality via encryption, or preventing replay attacks with nonces and timestamps, can be difficult for average website developers to understand and implement correctly.
Besides their performance overhead, cryptographic computations are a common source of implementation errors [She11].

D: RPs want a protocol that can be consumed by a variety of rich clients (e.g., JavaScript, Flash, mobile applications), but those clients are typically incapable of keeping embedded cryptographic keys secret [HL12]. Unlike cryptographic secrets residing on RP servers, cryptographic keys embedded in those clients can be discovered by attackers through reverse engineering [Kia12]. Unfortunately, without secured cryptographic keys, it is fundamentally difficult to protect the confidentiality, integrity, and authenticity of the protocol messages exchanged among networked entities [HL12].

E: One-click sign-up is a desirable usability gain for SSO users [DD08, Gig11, Jan12a], but many RPs need additional user credentials to ensure service availability. When an IdP's identity service is disrupted or compromised, RPs are the ones that pay for the loss and are liable for resolving the errors [MA10]. To ensure that users can still log in when their IdP accounts are inaccessible, RPs need to prompt new SSO users to sign up for a new account with a unique username and password, or to link to an existing account. In addition, if the IdP does not provide an email address during the SSO process, RPs need the new user to provide a valid email address, and verify it, for password resets and future communications. Nevertheless, as shown in our studies (Chapter 3), the additional sign-up or linking step is not only annoying but also imposes a mental burden on users.

F: Leveraging the browser as a user agent makes it easy for users to use the SSO solution without requiring them to install and configure additional software. However, HTTP redirection-based SSO solutions rely on users' cognitive capability and continuous attention to detect IdP phishing attacks [DTH06, SDOF07].
Phishing attacks on SSO protocols are a looming threat [Lau07, DD08, Mes09]. Redirection-based protocols can habituate users to being redirected to IdP websites for authentication, which makes it trivial for an attacker to launch IdP phishing attacks. A malicious or compromised RP website can easily simulate IdP login forms, including the content and security indicators that an IdP website might present, to capture users' IdP login credentials. To prevent phishing attacks, users must confirm the authenticity of an IdP before entering their credentials; unfortunately, they usually do not [SDOF07, DTH06]. Existing studies suggest that security indicators are ineffective at preventing phishing attacks [DTH06, SDOF07]. Even with improved security indicators, users may still ignore them [WMG06, SDOF07, SEA+09].

The conflicts between user and business needs require web SSO protocol designers to make tradeoff decisions. While most web SSO proposals focus on user needs, the design of OAuth 2.0 caters to the business needs of RPs and IdPs. This strategic design decision has made OAuth 2.0 the most popular web SSO solution to date.

8.2 Why OAuth 2.0 Succeeds While Others Fail

Fundamentally, web SSO systems shift the function of identity collection and authentication from RPs to IdPs. However, websites are unlikely to want to act as RPs if IdPs guard the identities while RPs pay the cost of failure [JZS07, DD08, MA10]. Thus, unless integrating SSO adds significant value, it is inherently difficult to convince websites to take on the business risks of being an RP [SBHB10]. OAuth 2.0 succeeds because its design caters to the business needs of RPs and IdPs.
The resulting widespread adoption by RPs and IdPs creates a thriving social interaction ecosystem, which in turn attracts users and facilitates their adoption.

From an RP's business point of view, migrating to web SSO may not be a worthwhile endeavor. First, few organizations are willing to trust a third-party organization to authenticate their users when they have no recourse in the event of error or attack [MA10]. Second, confusing user experiences could upset users and, as a result, directly impact a website's business [DD08]. New user sign-up is a critical business process, and most websites would not want to risk losing potential customers for the sake of SSO [MR08, DD08]. Third, the usability gain of SSO is marginal when web users decide which websites to engage with [SPM+11b]. Most users are unlikely to choose one RP over another solely because the website supports SSO for login, especially when the same usability gain can be achieved simply by using a browser with automatic password and form fill-in features. Thus, when early adoption does not appear to provide RPs with direct returns or competitive advantages, websites would rather wait until web SSO technology has matured and the cost of user training has been absorbed by other websites [SBHB10].

The primary reason why web SSO solutions prior to OAuth 2.0 (e.g., InfoCard, OpenID, SAML) failed to gain widespread RP adoption is that they do not provide concrete business gains that would overcome websites' resistance to becoming RPs [SBHB10]. The design of those web SSO proposals mainly focused on how to convey identity assertions from IdPs to RP websites in a usable, secure, and privacy-preserving manner. Although these design features are certainly important to users, they offer little, if any, value proposition for RPs. After all, websites can acquire user attributes through their existing sign-up process.
In addition, by authenticating users themselves, websites have full and independent control over user authentication and attribute collection, without relying on third-party organizations with which they have no trust relationship.

Understanding RP and IdP business needs, and catering its design to fulfill those business requirements, is the leading factor behind the success of OAuth 2.0. Unlike other web SSO proposals that center their design around user needs, the design strategy of OAuth 2.0 trades user privacy and protocol security for what RP and IdP businesses need, and then leverages the ubiquity of RPs and IdPs to attract user engagement. What the OAuth 2.0 designers envisioned is a "circle of virtue": a social ecosystem that provides not only compelling business incentives for RPs and IdPs, but also an enriched social interaction and content sharing experience for users.

Imagine a web user, Alice, who visits a restaurant website using her social account (e.g., Facebook) for login. Immediately, Alice notices the ratings and reviews of several meal entries recommended by two of her Facebook friends, Bob and Charlie. After reviewing Bob and Charlie's comments, Alice decides to try the meal suggested by her connections. At the restaurant, Alice takes several photos and posts them with her comments back to Facebook through her mobile phone. Right away, Alice's friends Debbie and Emma see her posts, visit the restaurant website, and decide to try the meal suggested by Alice. When Alice returns to Facebook, several advertisements show similar restaurants and recipes for her to browse. Alice follows the links, and another "circle of virtue" begins.

There are two main design features that allow OAuth 2.0 to create this "circle of virtue" at
First, OAuth 2.0 is a token-based protocol.Unlike claim-based SSO protocols (e.g., SAML Web SSO, OpenID, InfoCard) that can onlycarry limited static user attributes in the assertion claim when the user sign in, the OAuthprotocol uses an access token that acts as a ?temporary key? to the user?s profile on the IdP.This temporary key carries permissions granted by a user to a given RP. As long as the userconsents to what the RP requested, the RP can virtually access any piece of data the userowns or acts on behalf of the user through web APIs published by the IdP, anywhere, any time.For instance, Facebook provides APIs for RPs to access any friend lists the user created, readfrom a user?s Facebook email inbox, post content, comments, and ?likes? to a user?s streamand to the streams of the user?s friends, and even RSVP to events on the user?s behalf. Thistoken-based design removes the limitation on what and when user data RPs can access, as longas the user clicks on the ?Allow? button on the consent form to grant the requested access.Second, the OAuth 2.0 protocol is extremely easy to implement?two simple, ordinaryHTTP requests are sufficient for an RP to verify an SSO user. Technically, OAuth 2.0 is nota ?security? protocol. It has no encryption, hashing, digital signing, or even random numberor timestamp. To reach a critical mass of RP adoptions in a short period of time, OAuth2.0 was catered to average web developers by removing cryptographic requirements from thespecification completely, and even made the use of SSL optional. This simplicity design strategyallows the protocol to be implemented without performing any cryptographic computations orstate management. Removing the cryptographic requirement from the specification also allowedthe protocol to be integrated by ?rich? 
clients that cannot keep cryptographic secrets confidential, such as JavaScript, Flash, or mobile applications.

8.3 Lessons Learned and Implications

This section discusses the insights we drew from observing the design of OAuth 2.0. We also discuss the security, privacy, and user adoption implications of the design tradeoff decisions made by the protocol designers.

Web SSO proposals that function solely as an authentication mechanism will be rejected by most RPs. SSO technology grew from within the corporate enterprise [HHJM08]. The advantage of adopting an intra-organization SSO solution is obvious: SSO reduces operational cost and streamlines users' login experiences [Nov09, Act09]. All an SSO project needs is a cost justification for the identity management project; there are no other business concerns. Similarly, federated SSO provides operational benefits for all mutually trusted organizations in the federation. Each organization can continue to manage its own users while leveraging all users in the "circle of trust." As a result, all organizations in the federation are rewarded for their participation. In contrast, web SSO requires RPs to rely on IdPs without pre-established trust relationships. This change raises serious business concerns, which organizations will only overcome in exchange for direct business benefits. Thus, web SSO proposals that function solely as an authentication mechanism (e.g., OpenID 2.0, InfoCard, Mozilla's Persona) without offering RPs concrete business gains will most likely be rejected by RPs, even if the protocol is secure, privacy-preserving, and usable.

Social login provides compelling business incentives for RPs. The abundance of demographic data, interests, and friends that users maintain on their social network profiles has opened a new world of possibilities for websites. OAuth 2.0 started to take off when social network sites, such as Facebook and Twitter, employed it for social login. When social network sites act as IdPs, OAuth 2.0 provides a great opportunity for RPs to drive users from these social networks to their websites. Sign-on through social login clearly provides much richer and more accurate user data than the traditional account sign-up approach. In addition, social graph data can be used to segment user personas, in turn allowing RPs to provide more relevant user experiences and more effective content that increases the return on advertising spend. Moreover, social sharing enables users to broadcast content and activities from RPs to their social networks, increasing brand advocacy and creating an effective source of qualified referral traffic back to RP websites. A recent study also shows that socially logged-in users spend more time engaging with the website and are more likely to purchase products [Gig09]. These direct business benefits give websites compelling motivations to become RPs.

Social login brings attractive features to users. For web users, SSO reduces their password management burden and streamlines account registration. However, unlike SSO users in enterprise settings, web users have alternative options, and they have different privacy and security concerns. First, without SSO, web users tend to use weak passwords or reuse the same passwords across websites, as choosing strong, memorable passwords is a challenging task. Nevertheless, as there is no direct data or user experience indicating that weak or reused passwords lead to physical asset loss, most users are "comfortable" with this risky behavior [Her09]. Second, many web users already use the password/form manager feature in the browser to reduce the number of passwords and the amount of registration information they need to enter. Password and form managers are inconvenient when users switch between computers or when they want to use shared or public computers.
However, those are occasional events that most users consider tolerable [GF06]. Moreover, web SSO raises privacy and security concerns when personal information is shared across websites. Therefore, to attract users, web SSO solutions need to provide additional benefits that motivate adoption. Unlike authentication-only web SSO proposals, social login provides web users with an immense social experience outside of their social networks. For socially engaged users, social login not only reduces the friction of sign-up and sign-in but, more importantly, extends their social interactions to virtually anywhere on the web through the ubiquitous OAuth RP websites.

OAuth 2.0 is only as secure as HTTP cookies. OAuth 2.0 is not one of the conventional "security protocols" because no cryptographic primitives are built into it. Removing cryptographic protection from the specification makes the protocol easy to implement, but the tradeoff is that it is only as secure as the HTTP cookie architecture. As with HTTP cookies, OAuth 2.0 implementations can be compromised through many prevalent attack vectors, such as XSS, CSRF, and network eavesdropping. Without additional security guarantees offered by the protocol, RPs need to follow general web application security guidelines to secure their implementations. However, security is rarely the top priority for website developers. As our security analysis shows, even popular Top 1000 websites do not follow simple and well-known security guidelines. Thus, we believe that OAuth 2.0 in the hands of most developers, who are overloaded with other priorities or lack a deep understanding of web security, is likely to lead to insecure implementations.

Social login aggregators will be attractive attack targets. OAuth 2.0 is not interoperable: an implementation for one IdP (e.g., Facebook) will not work seamlessly with another IdP (e.g., Google Plus).
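Section 8.2 states that two ordinary HTTP requests suffice for an RP to verify an SSO user, and the point just made is that each IdP exposes its own endpoints. A minimal Python sketch of the server-side (authorization code) flow illustrates both, assuming the parameter names of RFC 6749 and the bearer header of RFC 6750; the endpoint URLs, the IDP table, and the function names are hypothetical, and the code only builds the requests without sending anything. The browser redirect comes first, and the two RP-made requests follow it.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical endpoints for one IdP. Each real IdP publishes its own URLs and
# profile API, so this table (and the code that consumes it) differs per IdP.
IDP = {
    "authorize": "https://idp.example/oauth/authorize",
    "token": "https://idp.example/oauth/token",
    "profile": "https://idp.example/api/me",
}


def authorization_redirect(idp, client_id, redirect_uri, scope):
    """Browser redirect to the IdP (RFC 6749, Section 4.1.1); not an RP-made request."""
    # state is RECOMMENDED, not required, by RFC 6749; the RP must store it and
    # compare it on the callback to defend against CSRF (Section 10.12).
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{idp['authorize']}?{query}", state


def token_request(idp, code, client_id, client_secret, redirect_uri):
    """Request 1: exchange the authorization code for an access token (Section 4.1.3)."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,  # credentials in the body are permitted (Section 2.3.1)
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return "POST", idp["token"], headers, body


def profile_request(idp, access_token):
    """Request 2: fetch the user's profile with the bearer token (RFC 6750)."""
    return "GET", idp["profile"], {"Authorization": f"Bearer {access_token}"}, None
```

Neither RP-made request involves a signature or hash computed by the RP, and the second request carries nothing but the token, so any party holding a leaked token can issue the same call.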
RPs need to tailor their implementations for each supported IdP, because each IdP provides unique API and URL endpoints for accessing its own particular silo. We believe that many RP websites will thus resort to social login aggregators, such as Gigya and Janrain, which provide RPs with a unified interface for communicating with major IdPs. Through social login aggregators, RPs can integrate virtually any web SSO protocol with a single implementation. However, as social login aggregators proxy and store user data from IdPs, a single security breach in their implementation could lead to a significant compromise of user data. Instead of targeting an individual RP or IdP, an attacker may target a social login aggregator where significant user data are stored.

OAuth 2.0 as a new phishing and malware distribution channel. The social graph within a social network is a powerful viral platform for the distribution of information, but it could be a new phishing and malware distribution channel as well. Web users have been conditioned to be wary of links in email, but tend to put more trust in social network messages from their friends and from the social network itself. According to the designers of the Facebook Immune System [SCM11], attackers commonly target the social graph to harvest user data and propagate spam, malware, and phishing messages. Known attack vectors include compromising existing accounts [TN10, BCF10], creating fake accounts for infiltration [Sop09, BSBK09, BMBR11], and maliciously crafted applications [PAC09, SBL09, EMKK11]. Moreover, our empirical security analysis (Chapter 5) suggests that compromised access tokens can be used as another novel attack vector to distribute phishing and malware download messages.
We believe that it is fundamentally difficult for an IdP to detect and block such attacks because, by design, any party possessing an access token can exercise the rights granted to that token, and there are no distinguishing features between API calls made by a legitimate RP and those made by an attacker.

OAuth 2.0 accelerates internet-scale private information disclosure. Compared to traditional website sign-on, and to web SSO solutions that function solely as an authentication mechanism, OAuth 2.0 facilitates and accelerates personal data exposure more than ever before. First, RPs can request nearly any piece of information presented in a user's profile from IdPs. This information includes, but is not limited to, basic user information (name, email, address, gender, birthday, hometown), photos, work and education history, events, interests, documents, relationship status, location, status updates, list of friends, and content generated by friends (status updates, photos, documents). Second, whenever a user logs into an RP website, the IdP learns when and from where the user logs in, what user agent the user is using, and what data the RP is requesting. Third, when an RP posts message streams back to the IdP on behalf of the user, not only the IdP but also all the friends in the user's social circle can learn what activities the user is conducting on the RP website. In many contexts, exposing user activities on RPs to social connections can result in awkward or embarrassing situations (e.g., playing a game in the middle of the night). Fourth, for RPs that use a social login aggregator to proxy their communications with IdPs, the aggregator gets a copy of all personal information passing through it, without the users' knowledge. Fifth, for RPs that use a JavaScript SDK provided by the IdP, the SDK could periodically pass private information located on the RP pages back to the IdP.
Furthermore, the primary privacy protection mechanism in social login is the profile sharing consent dialog. However, existing studies suggest that users become habituated to consent forms and stop paying attention after seeing them multiple times [MR08, DD08]. Asking users to consent to more frequent profile sharing could result in a system that increases, rather than minimizes, the identity information that users are willing to reveal to RPs.

To be, or not to be? That is the question. On the one hand, OAuth 2.0 brings a frictionless and enriched social experience to users; on the other hand, social login can lead to extreme exposure of personal data. So when facing convenience/privacy tradeoffs, what proportion of users is willing to trade their privacy and security for the usability gain that SSO brings them? The evidence is mixed. Egelman [Ege13] performed a controlled experiment to quantify the proportion of users who use Facebook to sign onto various websites. In his study, 70% of participants proceeded to sign onto websites using Facebook Connect. But another study [Gig12] shows that 60% of surveyed users favor the right to control how their data is used or to opt out of online data tracking entirely. In addition, many respondents [Gig12] believed that businesses that acquired social data through social login would sell their social profile data to third parties, or spam their social network friends. Ronen et al. [RRJT13] studied how data exposure to RPs affects the choice of sign-in accounts. Their results suggest that many users are not aware of what types of data they expose to RPs upon sign-in. However, when the tradeoffs behind each sign-in option are clearly presented, many users are willing to change their sign-in option to reduce data exposure. Currently, it is not clear how many web users will opt in to SSO or keep on using it.
Nevertheless, we believe that once users have experienced privacy or security intrusions due to social login, many of them will avoid clicking on the social login button, or abandon SSO entirely. For those who find social login risky, browser built-in password and form managers (e.g., the one in Google Chrome) would be a viable alternative. A browser-supported password/form manager can achieve a comparable level of sign-up and sign-on convenience without raising privacy and security concerns. In addition, it can work with existing websites, and it does not impose an additional learning curve on users. Browser-based password managers have worked effectively for years. However, for form managers, how to detect form fields correctly remains a fundamental challenge, and it is an important research question deserving further investigation.

OAuth 2.0 is the first web SSO protocol that caters to the business needs of RPs and IdPs by trading away the security and privacy that users care about. This design strategy has made OAuth 2.0 the most popular web SSO solution ever; however, the design tradeoffs have significant security and privacy implications. Thus, the web SSO development community should continue to investigate mechanisms that could improve security and enhance privacy without sacrificing implementation simplicity and content sharing flexibility.

Chapter 9
Conclusion

Similar to the way that credit cards reduce the friction of paying for goods and services, web SSO systems are intended to reduce the friction of using the Web. The widespread adoption of major online service providers and social networks as identity providers has attracted millions of supporting websites.
Yet, how to design a usable, secure, and privacy-preserving web SSO solution remains a challenging task for web SSO development communities. Such a solution would need to draw potential users toward embracing it and prevent existing users from abandoning this new technology.

An Internet-scale adoption of web SSO solutions could lay an interoperable authentication foundation on today's web infrastructure. This dissertation presents several works toward the goal of improving the usability and security of web SSO solutions. In Chapter 3, our usability study shows that current implementations of web SSO solutions impose a cognitive burden on web users, and raise significant security and privacy concerns. Moreover, web users do not perceive an urgent need for SSO, and many would only use a web SSO solution on RP websites that are familiar or trustworthy. We designed an alternative SSO user interface to explore possible improvements in the users' SSO experience. We found that many users would use web SSO on the websites they trust, provided the SSO option is clear to them and they have control over the sharing of their profile information. In addition, our results suggest an extension to the technology acceptance model in the context of web SSO. With further validation, the model could be used to explain and predict user acceptance of a web SSO solution from measures taken after a brief period of interaction with the system.

For the security improvement of the OpenID protocol, we present in Chapter 4 a formal model checking analysis of the OpenID 2.0 protocol, and an empirical evaluation of 132 OpenID-enabled websites. The model checking analysis revealed two weaknesses. First, the OpenID protocol does not provide an authenticity or integrity guarantee for the authentication requests. Second, the protocol lacks contextual bindings among the protocol messages and the browser that makes those requests. Our empirical evaluation shows that the uncovered vulnerabilities are prevalent among real-world RP implementations.
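The sketch below illustrates what such a contextual binding could look like. It assumes the RP has already verified the IdP's signature over the positive assertion; in OpenID 2.0 the signed fields include `openid.return_to`. It also assumes the RP embeds an HMAC of the browser's session identifier in the `return_to` URL it sends. The parameter name `rp_sig`, the helper functions, and the URL are hypothetical. This is one possible binding, not necessarily the specific defense evaluated in Chapter 4.

```python
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical RP-wide secret key; a real deployment would load it from configuration.
RP_KEY = secrets.token_bytes(32)


def _sig(session_id: str) -> str:
    # A MAC over the browser's session identifier binds the request to that session.
    return hmac.new(RP_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def make_return_to(base_url: str, session_id: str) -> str:
    """Build the return_to URL placed in the OpenID authentication request."""
    return f"{base_url}?{urlencode({'rp_sig': _sig(session_id)})}"


def accept_assertion(return_to: str, session_id: str) -> bool:
    """Accept an (already IdP-signature-verified) assertion only for the session that requested it."""
    params = parse_qs(urlparse(return_to).query)
    received = params.get("rp_sig", [""])[0]
    return hmac.compare_digest(received, _sig(session_id))


victim_session = "sess-victim"
attacker_session = "sess-attacker"

# The attacker starts an SSO flow in their own session...
rt = make_return_to("https://rp.example/openid/return", attacker_session)

# ...and then tries to complete it inside the victim's browser.
attacker_accepted = accept_assertion(rt, attacker_session)
victim_accepted = accept_assertion(rt, victim_session)
```

In this sketch, an assertion obtained in the attacker's session and then forced into the victim's browser fails the check. The victim's session identifier yields a different MAC.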
We provide a simple and scalable defense mechanism for RPs to ensure the authenticity and integrity of the protocol messages. In addition, RPs that find deploying SSL impractical can use the session-hijacking countermeasure we proposed as an alternative defense mechanism.

In Chapter 5, we investigate how well-known and prevalent web attack vectors can be leveraged, collectively or individually, by an adversary to compromise the security of OAuth-based SSO systems. These vectors include network eavesdropping, cross-site scripting, and cross-site request forgery. We examine the implementations of three major IdPs and about 100 of the most-visited RP websites. Our results suggest that current OAuth SSO implementations are far from secure. Our analysis uncovers several critical vulnerabilities that allow an adversary to:

• harvest a victim user's private data;
• act as the victim on the IdP;
• impersonate the victim user on the RP websites.

We identify the fundamental causes of those weaknesses, and suggest several practical mitigation mechanisms for IdPs and RPs to secure their implementations.

One way to directly compromise users' personal data and authentication credentials is through SQL injection attacks, one of the foremost threats to web applications. In Chapter 6, we propose and evaluate an approach for retrofitting existing web applications against known as well as unseen SQL injection attacks. The proposed approach is based on dynamic taint analysis at runtime. Our experience and evaluation indicate that the approach is effective, efficient, and easy to deploy without the involvement of application developers.
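The sketch below shows the core idea of taint-based SQL injection detection in a simplified setting. It assumes the query is built by string concatenation and that the runtime records, for every character, whether it came from untrusted input. A query is flagged when tainted characters form SQL syntax, such as a keyword, operator, comment, or a quote that ends a literal, instead of staying inside a literal value. The lexer and function names are hypothetical, and the lexer is far simpler than a real SQL grammar. This illustrates the technique; it is not the Chapter 6 implementation.

```python
import re
from typing import List, Tuple

# Simplified SQL lexer: string literals, numbers, words, comments, whitespace, and symbols.
TOKEN_RE = re.compile(
    r"(?P<string>'(?:[^']|'')*')"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<comment>--[^\n]*)"
    r"|(?P<space>\s+)"
    r"|(?P<symbol>.)",
    re.DOTALL,
)


def build_query(parts: List[Tuple[str, bool]]) -> Tuple[str, List[bool]]:
    """Concatenate (text, is_tainted) parts while tracking taint per character."""
    query, taint = "", []
    for text, tainted in parts:
        query += text
        taint.extend([tainted] * len(text))
    return query, taint


def is_injection(query: str, taint: List[bool]) -> bool:
    """Flag the query if tainted characters appear anywhere except inside a literal value."""
    for m in TOKEN_RE.finditer(query):
        kind, start, end = m.lastgroup, m.start(), m.end()
        if kind == "space" or not any(taint[start:end]):
            continue  # untainted tokens are trusted application code
        if kind == "number":
            continue  # a whole numeric literal may come from user input
        if kind == "string" and not taint[start] and not taint[end - 1]:
            continue  # tainted body, but both quotes were written by the application
        return True  # tainted keyword, operator, comment, or quote
    return False
```

For example, `build_query([("SELECT * FROM users WHERE name = '", False), ("x' OR '1'='1", True), ("'", False)])` yields a query that `is_injection` flags, because the attacker's input closes the literal and introduces `OR`.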
Our proposed approach protects existing web applications against SQL injection attacks in settings where source code, qualified developers, or security development processes might not be available or practical.

In addition to using web SSO for authentication, web users need usable mechanisms for sharing personal content in a controlled manner across website boundaries, even with unknown content requestors. By leveraging OpenID and OAuth, we propose a novel user-centric content sharing scheme in Chapter 7. The proposed scheme allows web users to author their access-control policies based on friends or family in their existing social circles. Users can then reuse those policies on content-hosting websites to control access to their personal content. In addition, our proposed scheme supports the notion of trust and delegation. This provides a flexible way for a user to delegate attribute assertion authority to another user who is better positioned to assert attributes of users unknown to the content owner.

Throughout the course of this dissertation research, we observed several important usability and security-related improvements in web SSO technologies. These improvements are aligned with our findings, and some may have been influenced by our insights or recommendations:

• OpenID Connect, the next version of OpenID, embraces the "share-by-token" model and uses OAuth 2.0 as its core authorization protocol [SBHB10].
• Mozilla Persona, a web SSO proposal by Mozilla, is conceptually and architecturally similar to our proposed OpenIDemail-enabled browser: it uses email addresses as identifiers, makes IdP phishing difficult and identity switching easier, and focuses on user privacy protection.
• Major IdPs (e.g., Facebook, Google, Yahoo) support fine-grained profile sharing control in their OAuth user consent forms.
• The Facebook OAuth JavaScript SDK no longer saves access tokens as cookies.
• The Facebook OAuth JavaScript SDK supports the OAuth server-flow.
• The offline permission has been deprecated and removed by Facebook.
• Facebook's OAuth authorization server supports a token refresh mechanism.

We hope that web SSO development communities will adopt other insights and recommendations from this dissertation research to further improve the usability and security of web SSO solutions.

9.1 Future Work

Although this dissertation research has presented several works toward improving the usability and security of web SSO systems, many areas require future research. We detail each future research direction in this section.

9.1.1 Further Investigation of Users' Perspectives of Web SSO

The design of our usability study supported a direct usability comparison of our prototype design with current SSO solutions. However, the within-subjects study approach has inherent limitations. Because of them, we could not evaluate the effectiveness of some important features provided by our design (e.g., phishing protection, multiple IdP sessions, in-browser profile editing and sharing, single sign-out), nor validate the proposed web SSO technology acceptance model. We also found issues in our IDeB interface that require further improvement. First, most participants did not notice the identity indicator at the bottom-left corner of the screen. Second, it was not clear to the participants that IDeB does not store their password on the local computer; some participants were consequently concerned that the stored password and profile information could be compromised. Third, some participants thought that they were giving their username and password to the websites directly.
In addition, we suggest that an account linking task should be performed during a traditional login rather than at the end of an SSO process. Two research questions about this require further investigation. First, how can the concept and benefits of account linking be conveyed to users? Second, how should a usable interface be designed for managing account linking-related tasks (e.g., linking to one or several IdP accounts, unlinking, auditing)? Furthermore, our empirical study results have the following limitations:

• Generalizability: Participants were primarily young adults, with only one participant over 45 and none under 19. All of the participants reported browsing the Web daily or more, and thus might be less prone to errors or misunderstandings while using the interface.
• Ecological validity: The participants were restricted to using the computer provided to them during the study and to accessing the websites specified by the study. In addition, only the first-time user experience was studied; we did not examine daily usage behaviors. Expanded (more websites) and longer-term studies are recommended to address this.
• Precision: Carry-over and fatigue effects due to the within-subjects format may have affected the study results (although responses were similar between the two groups). A between-subjects study will be required to determine whether those negative effects existed in our study.

9.1.2 Usable IdP-Phishing Resistant Mechanisms

Phishing attacks on SSO protocols are a looming threat. OpenID, OAuth, and other similar browser-redirection-based protocols (e.g., Google AuthSub [Goo08], AOL OpenAuth [AOL08], Yahoo BBAuth [Yah08]) may habituate users to being redirected to IdP websites for authentication. Users often do not verify the authenticity of these websites before entering their credentials [SDOF07, DTH06]. When they do not, IdP login credential phishing attacks are possible.
An attacker can lure users to a phony RP website via email or messages posted on social networks, and redirect them to a forged IdP site where they are asked to enter their passwords. To prevent phishing attacks, users must confirm the authenticity of an IdP before entering their credentials.

In our usability study, all participants expressed serious concerns about IdP phishing attacks. During the study, half of the participants, even when prompted, could not find any distinguishing features on a bogus Google login form. Once informed of the possibility of IdP phishing attacks, most of our participants stated that they would not use SSO technology if IdP phishing is possible.

Research on methods of authenticating websites to users includes:

• security indicators [Cor05, HJ08];
• secure bookmarks for known websites [DT05, WMG06, YS06];
• automated detection [ANNWN07, GPCR07, ZHC07, XH09];
• blacklisting of known phishing sites (e.g., Google, Microsoft, PhishTank [Phi11]).

However, studies suggest that security indicators are ineffective at preventing phishing attacks [DTH06, SDOF07, ECH08]. Blacklisting known phishing sites still suffers from high rates of false positives or false negatives [ZECH07, Hon12]. Even with improved security indicators, users may ignore them [WMG06, SDOF07, ECH08, SEA+09]. Our SSO-enabled browser design proposes one way to reduce the possibility of IdP phishing attacks. Nevertheless, the SSO development community should further investigate how to design phishing-resistant mechanisms that prevent IdP phishing attacks without relying on users' cognitive capabilities and continuous attention.

9.1.3 Security Analysis of the OpenID Connect Protocol

One major limitation of the OAuth 2.0 protocol lies in its interoperability. Each supported IdP provides a unique API and endpoints for accessing its own particular silo, so each RP needs to tailor its implementation for every IdP it supports.
In addition, the level of identity assurance is not conveyed in the protocol. At the time of this writing, the OpenID Foundation is drafting the next version of OpenID, named "OpenID Connect" [SBdM+11], which aims to address these two limitations. Technically, OpenID Connect is fundamentally different from the OpenID 2.0 protocol. OpenID Connect leverages OAuth 2.0 as the basic access authorization protocol and adds interoperability and identity assurance features on top of it. The main design features of OpenID Connect include:

1. dynamic IdP endpoint discovery;
2. dynamic RP-to-IdP registration and session key exchange;
3. endpoints for retrieving user profile attributes and for session management;
4. an ID token that contains claims about the user's identifier and the corresponding authentication methods.

These design features enable a single RP implementation to interact with virtually all OpenID Connect IdPs without a tailored configuration, registration, or implementation. Widespread adoption of OpenID Connect may come in the near future, so its security needs to be thoroughly examined.

9.1.4 Security Analysis of OAuth JavaScript SDK Libraries

IdP SDK libraries employ cross-domain communication (CDC) mechanisms for passing access tokens between cross-origin windows of a browser. As demonstrated by several researchers [BJM09, HSA+10, WCW12], passing sensitive information through CDC channels could pose severe security threats. Nevertheless, the security of OAuth JavaScript SDK libraries has not yet been thoroughly examined.

The Facebook SDK uses the HTML5 postMessage API and Adobe Flash for cross-frame interactions. For postMessage, Hanna et al. [HSA+10] found several insufficient checks on the sender's and receiver's origin in the code, which allowed an attacker to steal both tokens and user data. For Flash, Wang et al.
[WCW12] uncovered a vulnerability that allows an attacker to obtain the session credential of a victim user by naming the malicious Flash object with an underscore prefix. Both vulnerabilities were reported to and fixed by Facebook, but they might appear in other IdPs' SDK implementations.

We examined Microsoft's SDK and found that it does not use any CDC mechanism for passing access tokens. Instead, it uses a cookie shared between same-origin frames, which works as follows:

1. The Microsoft SDK requires RPs to include its SDK library on the page of the redirect URI, which is under the RP's domain.
2. The library on the redirect URI page extracts the access token from the URI fragment and saves it to a cookie.
3. The library on the RP login page polls this cookie for changes every 300 milliseconds to obtain the access token.

Using cookies for cross-frame interactions avoids the security threats posed by CDC channels. Nevertheless, HTTP cookies could be eavesdropped in transit or stolen by malicious cross-site scripts.

The Google SDK implements a wide range of CDC mechanisms for cross-browser support and performance enhancement. Those mechanisms include:

• fragment identifier messaging;
• postMessage;
• Flash;
• Resizing Message Relay for WebKit-based browsers (Safari, Chrome);
• Native IE XDC for Internet Explorer browsers;
• FrameElement for Gecko-based browsers (Firefox).

The SDK is separated into five script files and consists of more than 8,000 lines of code. Barth et al. [BJM09] systematically analyze the security of postMessage and fragment identifier messaging. Hanna et al. [HSA+10] empirically examine two JavaScript libraries layered on the postMessage API, Google Friend Connect and Facebook Connect. However, the remaining CDC mechanisms have not been thoroughly analyzed, and this gap might lead to severe security compromises. It is therefore an important research topic requiring further investigation.
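The postMessage flaws reported by Hanna et al. [HSA+10] stem from insufficient checks on the sender's and receiver's origins. The sketch below is written in Python for illustration rather than as browser JavaScript. It contrasts one hypothetical form of insufficient check, a substring match on the host name, with an exact comparison of scheme, host, and port, which is how the same-origin policy defines an origin. The function names and domains are illustrative assumptions. A receiving frame would apply such a check to the origin reported with each message before trusting its contents.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_tuple(origin: str):
    """Normalize an origin such as 'https://rp.example' to (scheme, host, port)."""
    parts = urlsplit(origin)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def naive_check(origin: str, trusted: str) -> bool:
    # Flawed: accepts any origin whose string merely contains the trusted host name.
    return urlsplit(trusted).hostname in origin


def strict_check(origin: str, trusted: str) -> bool:
    # Exact comparison of scheme, host, and port.
    return origin_tuple(origin) == origin_tuple(trusted)
```

With `trusted = "https://rp.example"`, `naive_check` accepts the attacker-controlled origin `https://rp.example.attacker.test`, while `strict_check` rejects it. `strict_check` still accepts `https://rp.example:443`, since 443 is the default HTTPS port.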
9.1.5 Adding Cryptographic Protection to OAuth without Sacrificing Simplicity

Compared to its predecessor and other SSO protocols, OAuth 2.0 is simple for RP developers to implement. It achieves this mainly by removing the digital signature requirements from the specification and relying on SSL as the default way to secure communication between the RP and IdP. However, our investigation suggests that, without cryptographic protection, the protocol is too simple to be secured completely. Based on insights from our analysis, we believe that OAuth 2.0 in the hands of most developers, who lack a deep understanding of web security, is likely to produce insecure implementations.

Unlike conventional security protocols, OAuth 2.0 is designed without sound cryptographic protections, such as encryption, digital signatures, and random nonces. This has several consequences:

• The lack of encryption in the protocol requires RPs to employ SSL, but many evaluated websites do not follow this practice.
• The authenticity of both an authorization request and its response cannot be guaranteed without a signature.
• An attack that replays a compromised SSO credential is difficult to detect if the request is not accompanied by a nonce and timestamp.
• Current major IdP implementations, probably for the sake of simplicity, favor bearer tokens [JHR11] over proof tokens [HLBA11], which makes replay attacks on resource access difficult to detect.

Thus, it is imperative for future research to investigate how to provide sound cryptographic protection to the OAuth protocol without imposing implementation difficulties on RPs.

9.1.6 Human and Organizational Factors Contributing to the Unsecured SSO Implementations

Our OAuth security analysis found several relatively obvious implementation flaws in many popular RP websites listed in the Google Top 1000 Websites. For instance:

• SSL is employed to protect the traditional login form, but it is not used during the SSO process to protect SSO credentials (e.g., access tokens, authorization codes) from network eavesdropping, even though it is available.
• RP websites save access tokens as HTTP cookies without setting the secure and httponly attributes.
• Several tested RPs do not obtain the user profile on the RP server side. Instead, their JavaScript obtains the user's Facebook profile directly in the user's browser, and then transmits the profile data to the RP server to identify the current SSO user. When the publicly accessible Facebook user identifier is used as an SSO credential, this implementation allows an adversary to impersonate a victim RP user. The adversary simply sends a sign-in request with the victim's Facebook user identifier as a parameter.
• The optional "state" parameter, which maintains state between the request and response, is not included in the authorization request. This makes the RP vulnerable to session swapping attacks. This threat is documented in the protocol specification and the "OAuth Threat Model" [LMH11], as well as in the Facebook OAuth developer guide [Fac13]. And yet many evaluated RPs ignore this simple security guideline.
• XSS and CSRF are well-known web vulnerabilities, and yet many tested RP websites are not protected against these two popular attack vectors.

Because those uncovered implementation flaws are so obvious, it is important to understand what actually went wrong in the RPs' development process in order to address this problem effectively. In other words, from developers' perspectives, what factors contribute to the aforementioned insecure SSO RP implementations? Is it due to resource constraints? Time-to-market schedules? Complexity of the protocol? Lack of a sound security engineering process? Development platforms or methods? Misconceptions about the security provided by the protocol or IdPs?
Lack of awareness of OAuth security guidelines? Or even the low-value and low-sensitivity nature of the website itself? Understanding the problems and challenges that RP developers face might show which efforts and improvements, from both organizations and the web SSO development community, could lead to secure SSO implementations. The insights might also be applicable to other similar security engineering domains.

Bibliography

[Act09] ActivIdentity Corp. SecureLogin. http://www.protocom.com/, 2009. [Online; accessed 15-October-2013].
[Adi08a] Ben Adida. EmID: Web authentication by email address. In Web 2.0 Security and Privacy Workshop (W2SP'08), Oakland, California, USA, 2008.
[Adi08b] Ben Adida. Sessionlock: Securing web sessions against eavesdropping. In Proceedings of the 17th International Conference on World Wide Web (WWW'08), pages 517-524, New York, NY, USA, 2008. ACM.
[AGS04] Kwasi Amoako-Gyampah and A.F. Salam. An extension of the technology acceptance model in an ERP implementation environment. Information and Management, 41(6):731-745, 2004.
[Anl02] Chris Anley. Advanced SQL injection in SQL server application. http://www.nextgenss.com/papers/advanced_sql_injection.pdf, 2002. [Online; accessed 15-October-2013].
[ANNWN07] Saeed Abu-Nimeh, Dario Nappa, Xinlei Wang, and Suku Nair. A comparison of machine learning techniques for phishing detection. In Proceedings of the Anti-Phishing Working Groups 2nd Annual eCrime Researchers Summit, pages 60-69, New York, NY, USA, 2007. ACM.
[AOL08] AOL LLC. AOL Open Authentication API. http://dev.aol.com/api/openauth, 2008. [Online; accessed 15-October-2013].
[Apa07] Apache Software Foundation. Apache JMeter. http://jakarta.apache.org/jmeter/, 2007. [Online; accessed 15-October-2013].
[AQT07] AQTRONIX. WebKnight. http://www.aqtronix.com/?PageID=99, 2007. [Online; accessed 15-October-2013].
[AS99] Anne Adams and Martina Angela Sasse. Users are not the enemy. Communications of the ACM, 42(12):40-46, 1999.
[ATO12] Chaitrali Amrutkar, Patrick Traynor, and Paul C. van Oorschot. Measuring SSL indicators on mobile browsers: Extended life, or end of the road? In Dieter Gollmann and Felix C. Freiling, editors, Information Security, volume 7483 of Lecture Notes in Computer Science, pages 86-103. Springer Berlin Heidelberg, 2012.
[BBGM10] Jason Bau, Elie Bursztein, Divij Gupta, and John Mitchell. State of the art: Automated black-box web application vulnerability testing. In Proceedings of the IEEE Symposium on Security and Privacy (SP'10), 2010.
[BBMV07] Sruthi Bandhakavi, Prithvi Bisht, P. Madhusudan, and V. N. Venkatakrishnan. CANDID: Preventing SQL injection attacks using dynamic candidate evaluations. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS'07), pages 12-24, Alexandria, Virginia, USA, October 2007.
[BC08] Konstantin Beznosov and Brian Chess. Security for the rest of us: An industry perspective on the secure-software challenge. IEEE Software, 25(1):10-12, 2008.
[BCF10] Jonell Baltazar, Joey Costoya, and Ryan Flores. The real face of Koobface: The largest web 2.0 botnet explained. http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp_the-real-face-of-koobface.pdf, 2010. [Online; accessed 16-January-2012].
[BCS09] Adam Barth, Juan Caballero, and Dawn Song. Secure content sniffing for web browsers, or how to stop papers from reviewing themselves. In Proceedings of the 30th IEEE Symposium on Security and Privacy (SP'09), pages 360-371, Washington, DC, USA, 2009.
[BDP06] William E. Burr, Donna F. Dodson, and W. Timothy Polk. Electronic authentication guideline. NIST special publication, 800:63, 2006.
[BFIK99] Matt Blaze, Joan Feigenbaum, John Ioannidis, and Angelos D. Keromytis. The KeyNote trust-management system version 2. http://www.ietf.org/rfc/rfc2704.txt, September 1999.
[Online; accessed 15-October-2013].
[BFL96] Matt Blaze, Joan Feigenbaum, and Jack Lacy. Decentralized trust management. In Proceedings of the IEEE Symposium on Security and Privacy (SP'96), pages 164-173, Washington DC, USA, 1996.
[BHvOS12] Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. The quest to replace passwords: A framework for comparative evaluation of Web authentication schemes. In Proceedings of the IEEE Symposium on Security and Privacy (SP'12), pages 553-567, 2012.
[BJAGS02] Roberto J. Bayardo Jr., Rakesh Agrawal, Daniel Gruhl, and Amit Somani. YouServ: A web-hosting and content sharing tool for the masses. In Proceedings of the 11th International Conference on World Wide Web (WWW'02), pages 345-354, New York, NY, USA, 2002. ACM.
[BJM08] Adam Barth, Collin Jackson, and John C. Mitchell. Robust defenses for cross-site request forgery. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS'08), pages 75-88, New York, NY, USA, 2008. ACM.
[BJM09] Adam Barth, Collin Jackson, and John C. Mitchell. Securing frame communication in browsers. Commun. ACM, 52(6):83-91, June 2009.
[BK04] Stephen W. Boyd and Angelos D. Keromytis. SQLrand: Preventing SQL injection attacks. In Proceedings of the Second International Conference on Applied Cryptography and Network Security (ACNS'04), pages 292-302, June 2004.
[BKM+08] P. Austel, S. Bhola, S. Chari, L. Koved, M. McIntosh, M. Steiner, and S. Weber. Secure delegation for web 2.0 and mashups. In Web 2.0 Security and Privacy Workshop (W2SP'08), 2008.
[BMBR11] Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu. The socialbot network: When bots socialize for fame and money. In Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC'11), pages 93-102, New York, NY, USA, 2011. ACM.
[Bre07] Breach Security Inc. ModSecurity. http://www.modsecurity.org/, 2007. [Online; accessed 15-October-2013].
[Bro99] Keith Brown.
Building a lightweight COM interception framework part 1: The universal delegator. Microsoft Systems Journal. http://www.microsoft.com/msj/0199/intercept/intercept.aspx, January 1999. [Online; accessed 15-October-2013].
[BSBK09] Leyla Bilge, Thorsten Strufe, Davide Balzarotti, and Engin Kirda. All your contacts are belong to us: Automated identity theft attacks on social networks. In Proceedings of the 18th International Conference on World Wide Web (WWW'09), pages 551-560, New York, NY, USA, 2009. ACM.
[Buf09] Johnny Bufu. OpenID4Java. http://code.google.com/p/openid4java/, 2009. [Online; accessed 23-August-2011].
[BWS05] Gregory T. Buehrer, Bruce W. Weide, and Paolo A. G. Sivilotti. SQLGuard: Using parse tree validation to prevent SQL injection attacks. In Proceedings of the 5th International Workshop on Software Engineering and Middleware, pages 106-113, Lisbon, Portugal, September 2005.
[CA 12] CA Technologies. CA IdentityMinder. http://www.ca.com/us/user-provisioning.aspx, 2012. [Online; accessed 15-October-2013].
[Cai11] Jerry Cain. Updated JavaScript SDK and OAuth 2.0 roadmap. https://developers.facebook.com/blog/post/525/, 2011. [Online; accessed 16-April-2012].
[Cam05] Kim Cameron. The laws of identity. http://www.identityblog.com/stories/2005/05/13/TheLawsOfIdentity.pdf, 2005. [Online; accessed 15-October-2013].
[Can11] Ran Canetti. Universally composable security: A new paradigm for cryptographic protocols. In Proceedings of Foundations of Computer Science, 2011.
[CFP06] Barbara Carminati, Elena Ferrari, and Andrea Perego. Rule-based access control for social networks. In On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. LNCS, Springer-Verlag, 2006.
[CJR11] Suresh Chari, Charanjit Jutla, and Arnab Roy. Universally composable security analysis of OAuth v2.0. http://eprint.iacr.org/2011/526.pdf, 2011. [Online; accessed 15-October-2013].
[Clo08] Tyler Close. Web-key: Mashing with permission.
In Web 2.0 Security and Privacy Workshop (W2SP'08), 2008.
[CLZS11] Charlie Curtsinger, Benjamin Livshits, Benjamin Zorn, and Christian Seifert. ZOZZLE: Fast and precise in-browser JavaScript malware detection. In Proceedings of the 20th USENIX Conference on Security, Berkeley, CA, USA, 2011.
[Com05] XACML Technical Committee. OASIS eXtensible Access Control Markup Language (XACML) version 2.0. http://www.oasis-open.org/committees/xacml/, February 2005. [Online; accessed 15-October-2013].
[Cor05] CoreStreet Ltd. Spoofstick. http://www.spoofstick.com/, 2005. [Online; accessed 23-August-2011].
[CVB05] Carlos Caleiro, Luca Viganò, and David Basin. Deconstructing Alice and Bob. Electronic Notes in Theoretical Computer Science, 135(1):3-22, 2005.
[CvOB06] Sonia Chiasson, Paul C. van Oorschot, and Robert Biddle. A usability study and critique of two password managers. In Proceedings of 15th USENIX Security Symposium, pages 1-16, Vancouver, Canada, August 2-4, 2006.
[DBW89] Fred D. Davis, Richard P. Bagozzi, and Paul R. Warshaw. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35:982-1003, August 1989.
[DD08] Rachna Dhamija and Lisa Dusseault. The seven flaws of identity management: Usability and security challenges. IEEE Security and Privacy, 6:24-29, 2008.
[DDHY92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol verification as a hardware design aid. In Proceedings of the IEEE International Conference on Computer Design, 1992.
[DH76] W. Diffie and M.E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644-654, 1976.
[DM00] G. Denker and J. Millen. Capsl integrated protocol environment. In Proceedings of DARPA Information Survivability Conference and Exposition, volume 1, pages 207-221, 2000.
[DO10] Bart van Delft and Martijn Oostdijk. A security analysis of OpenID.
In Proceedings of the 2nd IFIP WG 11.6 Working Conference on Policies and Research in Identity Management (IDMAN'10), November 2010.

[Dro09] Dropbox Corp. Sync your files online and across computers. http://www.getdropbox.com/, 2009. [Online; accessed 15-October-2013].

[DT05] Rachna Dhamija and J. D. Tygar. The battle against phishing: Dynamic security skins. In Proceedings of the First Symposium on Usable Privacy and Security (SOUPS'05), pages 77-88, New York, NY, USA, 2005. ACM.

[DTH06] Rachna Dhamija, J. D. Tygar, and Marti Hearst. Why phishing works. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'06), pages 581-590, Montreal, Quebec, Canada, 2006. ACM.

[DTO02] Jerry DeVault, Brian Tretick, and Kevin Ogorzelec. Privacy and independent verification: What consumers want. http://consumerprivacyguide.com/privacy/ccp/verification1.pdf, 2002. [Online; accessed 23-August-2011].

[DY83] Danny Dolev and Andrew Yao. On the security of public key protocols. IEEE Transactions on Information Theory, 29(2):198-208, March 1983.

[ECH08] Serge Egelman, Lorrie Faith Cranor, and Jason Hong. You've been warned: An empirical study of the effectiveness of web browser phishing warnings. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'08), pages 1065-1074, New York, NY, USA, 2008. ACM.

[EFL+99] Carl M. Ellison, Bill Frantz, Butler Lampson, Ron Rivest, Brian Thomas, and Tatu Ylonen. SPKI certificate theory. http://www.ietf.org/rfc/rfc2693.txt, September 1999. [Online; accessed 15-October-2013].

[Ege13] Serge Egelman. My profile is my password, verify me! The privacy/convenience tradeoff of Facebook Connect. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'13), pages 2369-2378, New York, NY, USA, 2013. ACM.

[EMKK11] Manuel Egele, Andreas Moser, Christopher Kruegel, and Engin Kirda. PoX: Protecting users from malicious Facebook applications.
In Proceedings of the 2011 IEEE International Conference on Pervasive Computing and Communications Workshops, pages 288-294, 2011.

[Evi12] Evidian Corp. Evidian Enterprise SSO. http://www.evidian.com/iam/enterprise-sso/index.htm, 2012. [Online; accessed 15-October-2013].

[Fac11] Facebook Inc. Facebook platform statistics. http://www.facebook.com/press/info.php?statistics, 2011. [Online; accessed 09-December-2011].

[Fac13] Facebook Inc. Facebook developers home. https://developers.facebook.com/, 2013. [Online; accessed 15-October-2013].

[Fen98] Tino Fenech. Using perceived ease of use and perceived usefulness to predict acceptance of the World Wide Web. Computer Networks and ISDN Systems, 30(1):629-630, 1998.

[FH07] Dinei Florencio and Cormac Herley. A large-scale study of web password habits. In Proceedings of the 16th International Conference on World Wide Web (WWW'07), pages 657-666, New York, NY, USA, 2007. ACM.

[Fli12] Flickr. Flickr API Documentation. http://www.flickr.com/services/api/, 2012. [Online; accessed 15-October-2013].

[FN08] David Fuelling and Will Norris. Email Address to URL Transformation 1.0. http://eaut.org/specs/1.0/, June 2008. [Online; accessed 15-October-2013].

[Fre08] Beverly Freeman. Yahoo! OpenID: One key, many doors. http://developer.yahoo.com/openid/openid-research-jul08.pdf, July 2008. [Online; accessed 23-August-2011].

[GF06] Shirley Gaw and Edward W. Felten. Password management strategies for online accounts. In Proceedings of the Second Symposium on Usable Privacy and Security (SOUPS'06), pages 44-55, 2006.

[Gig09] Gigya Inc. Social Login: What CMOs should know. http://blog.gigya.com/social-login-what-cmos-should-know-infographic/, 2009. [Online; accessed 15-October-2013].

[Gig11] Gigya Inc. Social media for business. http://www.gigya.com/, 2011. [Online; accessed 15-October-2013].

[Gig12] Gigya Inc. Social privacy survey: Consumers want more transparency. http://info.gigya.com/social-privacy-survey-email.html, 2012.
[Online; accessed 15-October-2013].

[Goo08] Google Inc. AuthSub authentication. http://code.google.com/apis/accounts/docs/AuthSub.html, 2008. [Online; accessed 15-October-2013].

[Goo11] Google Inc. The 1000 most-visited sites on the web. http://www.google.com/adplanner/static/top1000/, 2011. [Online; accessed 12-December-2011].

[GPCR07] Sujata Garera, Niels Provos, Monica Chew, and Aviel D. Rubin. A framework for detection and measurement of phishing attacks. In Proceedings of the 2007 ACM Workshop on Recurring Malcode, pages 1-8, New York, NY, USA, 2007. ACM.

[GS67] Barney Glaser and Anselm Strauss. The discovery of grounded theory: Strategies for qualitative research. Aldine de Gruyter, 1967.

[Hao09] Xu Hao. Attacking certificate-based authentication system and Microsoft InfoCard. In Power of Community Security Conference, 2009.

[HBH07] Dick Hardt, Johnny Bufu, and Josh Hoyt. OpenID Attribute Exchange 1.0 - Final. http://openid.net/specs/openid-attribute-exchange-1_0.html, 2007. [Online; accessed 28-September-2011].

[HDR06] Josh Hoyt, Jonathan Daugherty, and David Recordon. OpenID Simple Registration Extension 1.0. http://openid.net/specs/openid-simple-registration-extension-1_0.html, 2006. [Online; accessed 28-September-2011].

[Her09] Cormac Herley. So long, and no thanks for the externalities: The rational rejection of security advice by users. In Proceedings of the New Security Paradigms Workshop (NSPW'09), pages 133-144, New York, NY, USA, 2009. ACM.

[HHJM08] Jeff Hodges, Josh Howlett, Leif Johansson, and RL Morgan. Towards Kerberizing web identity and services. http://www.kerberos.org/software/kerbweb.pdf, 2008. [Online; accessed 15-October-2013].

[HJ08] Amir Herzberg and Ahmad Jbara. Security and identification indicators for browsers against spoofing and phishing attacks. ACM Transactions on Internet Technology, 8(4):1-36, 2008.

[HL03] Michael Howard and David LeBlanc. Writing Secure Code.
Microsoft Press, Redmond, Washington, 2nd edition, 2003.

[HL08] Eran Hammer-Lahav. XRDS-Simple 1.0. http://xrds-simple.net/core/1.0/, March 2008. [Online; accessed 15-October-2013].

[HL09] Eran Hammer-Lahav. OAuth security advisory. http://oauth.net/advisories/2009-1/, 2009. [Online; accessed 15-October-2013].

[HL10] Eran Hammer-Lahav. OAuth 2.0 (without signatures) is bad for the Web. http://hueniverse.com/2010/09/oauth-2-0-without-signatures-is-bad-for-the-web/, 2010. [Online; accessed 01-April-2012].

[HL12] Eran Hammer-Lahav. OAuth 2.0 and the road to hell. http://hueniverse.com/2012/07/oauth-2-0-and-the-road-to-hell/, 2012. [Online; accessed 15-October-2013].

[HLBA11] Eran Hammer-Lahav, Adam Barth, and Ben Adida. HTTP authentication: MAC access authentication. http://tools.ietf.org/html/draft-ietf-oauth-v2-http-mac-00, 2011. [Online; accessed 15-October-2013].

[HLLM+07] Eran Hammer-Lahav, Ben Laurie, Chris Messina, David Recordon, and Dick Hardt. The OAuth 1.0 protocol. http://oauth.net/core/1.0/, 2007. [Online; accessed 15-October-2013].

[HLLM+09] Eran Hammer-Lahav, Ben Laurie, Chris Messina, David Recordon, and Dick Hardt. OAuth Core 1.0 Revision A. http://oauth.net/core/1.0a/, 2009. [Online; accessed 15-October-2013].

[HLM+11] Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. Fast and precise sanitizer analysis with BEK. In Proceedings of the 20th USENIX Conference on Security, Berkeley, CA, USA, 2011. USENIX Association.

[HLRH10] Eran Hammer-Lahav, David Recordon, and Dick Hardt. The OAuth 1.0 protocol. http://tools.ietf.org/html/rfc5849, 2010. [Online; accessed 15-October-2013].

[HLRH11] Eran Hammer-Lahav, David Recordon, and Dick Hardt. The OAuth 2.0 authorization protocol. http://tools.ietf.org/html/draft-ietf-oauth-v2-22, 2011. [Online; accessed 15-October-2013].

[HO05] William G.J. Halfond and Alessandro Orso. AMNESIA: Analysis and monitoring for neutralizing SQL injection attacks.
In Proceedings of the 20th International Conference on Automated Software Engineering, pages 174-183, Long Beach, California, USA, 2005.

[HOM06] William G. J. Halfond, Alessandro Orso, and Panagiotis Manolios. Using positive tainting and syntax-aware evaluation to counter SQL injection attacks. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 175-185, Portland, Oregon, USA, 2006.

[Hon12] Jason Hong. The state of phishing attacks. Commun. ACM, 55(1):74-81, January 2012.

[HSA+10] Steve Hanna, Eui Chul Richard Shin, Devdatta Akhawe, Arman Boehm, Prateek Saxena, and Dawn Song. The Emperor's new APIs: On the (in)secure usage of new client-side primitives. In Web 2.0 Security and Privacy Workshop (W2SP'10), 2010.

[HVO06] William G.J. Halfond, Jeremy Viegas, and Alessandro Orso. A classification of SQL injection attacks and countermeasures. In Proceedings of the IEEE International Symposium on Secure Software Engineering, 2006.

[HWF05] J. Alex Halderman, Brent Waters, and Edward W. Felten. A convenient method for securely managing passwords. In Proceedings of the 14th International Conference on World Wide Web (WWW'05), pages 471-479, 2005.

[HYH+04] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, D. T. Lee, and Sy-Yen Kuo. Securing web application code by static analysis and runtime protection. In Proceedings of the 13th International Conference on World Wide Web (WWW'04), pages 40-52, 2004.

[Imp12] Imprivata Corp. Imprivata OneSign. http://www.imprivata.com, 2012. [Online; acces