UBC Theses and Dissertations


Personal data curation in the cloud age : individual differences and design opportunities Vitale, Francesco 2020

Personal Data Curation in the Cloud Age
Individual Differences and Design Opportunities

by

Francesco Vitale

B.Sc., University of Milano-Bicocca, 2013
M.Sc., KTH Royal Institute of Technology, 2016
M.Sc. Mention très bien, Université Paris-Saclay, 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

July 2020

© Francesco Vitale, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Personal Data Curation in the Cloud Age: Individual Differences and Design Opportunities

submitted by Francesco Vitale in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Examining Committee:

Joanna McGrenere, Professor, Department of Computer Science, UBC
Supervisor

William Odom, Assistant Professor, School of Interactive Arts & Technology, Simon Fraser University
Supervisory Committee Member

Ivan Beschastnikh, Associate Professor, Department of Computer Science, UBC
Supervisory Committee Member

Margo Seltzer, Professor, Department of Computer Science, UBC
University Examiner

Heather O’Brien, Associate Professor, iSchool, UBC
University Examiner

Additional Supervisory Committee Members:

Karon MacLean, Professor, Department of Computer Science, UBC
Supervisory Committee Member

Abstract

People are creating and storing a growing amount of personal data, from photos and documents, to messages and applications, on a growing number of devices. Storage space, often in the cloud, is cheap or free. But previous research shows that a degree of selectivity and curation is necessary to build personal archives that have value over time.

In this dissertation, we ask: How do different people decide what personal data to keep or discard? What drives their decisions?
And how can data management tools better support individual preferences?

We used a qualitative and design-based approach to conduct four studies consisting of 64 interviews in total and a survey (n=349).

First, we identified a spectrum of tendencies that informed how participants (n=23) decided what to keep or discard, with two extremes: “hoarding” (keeping most data), and “minimalism” (keeping as little as possible). We extended this spectrum with a set of five behavioral styles that capture contextual curation patterns: taking a casual approach to data, feeling overwhelmed, collecting data, purging data, and trying to be frugal. This model of behaviors (based on the 64 interviews) highlights a key role for data curation: what people keep or discard informs how they think about their own identity.

We used these insights to map a design space for data curation and create five design concepts for different user needs, exploring automation and other key design dimensions. Participants’ reactions (n=16) varied: some welcomed technology and automation, others opposed it, with context informing their reactions.

Inspired by these results and using a taxonomy of data types and decluttering criteria based on the survey (n=349), we designed Data Dashboard, a tool that aggregates data from a user’s multitude of devices and cloud platforms, providing customizable functions for different goals. We evaluated a prototype of the system with 18 participants and found that a personalized approach to data curation is promising, so long as it respects users’ boundaries.

Our work outlines key design directions and opportunities that can help envision new tools, prioritize user needs, and redefine our relationship with personal data in a world full of it.

Lay Summary

As people create, save, and share a growing amount of personal digital data (for example, photos, documents, and mobile apps) on their devices and online platforms, how do they decide what to keep or discard?
To find out, we ran 64 interviews and an online survey (349 participants). We identified how some participants tended to keep almost everything, while others tried to discard as much as possible. Participants often took a varied approach based on different categories of items (for example, keeping all photos but discarding most apps). Using these results, we created five design concepts and a prototype data management tool that could help users decide what to keep or discard. We evaluated the concepts and the tool with participants and found that because people’s preferences differ, a good approach is to create tools that can be personalized to individual desires. Our work provides directions to build such tools.

Preface

Research is a collaborative effort. All the studies I present in this dissertation involved a set of collaborators in different capacities.

My supervisor Dr. Joanna McGrenere followed my PhD work from start to finish. After Study 1 (Chapter 4), Dr. William Odom joined as a member of my PhD committee and an unofficial co-supervisor. Dr. McGrenere and Dr. Odom jointly supervised and helped with all the additional studies that make up my PhD (Chapter 5, Chapter 6, Chapter 7, Chapter 8). Here, I list the collaborators involved in each study and detail my role and contributions.

In Chapter 4, I report Study 1, which I conducted as part of my RPE (Research Proficiency Evaluation; see footnote 1) from May to August 2017. I designed the study, choosing data collection and analysis methods, guided by my supervisor Dr. Joanna McGrenere. I conducted and transcribed all 23 interviews. I analyzed the data with help from PhD student Izabelle Janzen and my supervisor Dr. Joanna McGrenere, as described in the chapter. I was the lead author of the final report and of the paper based on it, later published at the ACM CHI 2018 conference [290], where it won a Best Paper Award (top 1% of all paper submissions).
Because the paper has been peer-reviewed, in Chapter 4, I reproduce its contents with a few adjustments:

Francesco Vitale, Izabelle Janzen, and Joanna McGrenere. (2018) Hoarding and Minimalism: Tendencies in Digital Data Preservation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’18).

Footnote 1: The RPE process consists of a four-month research project, followed by an examination from an RPE supervisory committee. More details about the process are available online: https://www.cs.ubc.ca/students/grad/graduate-programs/research-proficiency-evaluation-rpe.

In Chapter 5, I report Study 2, consisting of 7 interviews and an online survey. This study took place in the first half of 2018. I designed the interview study and the survey with the guidance of my co-supervisors Dr. Joanna McGrenere and Dr. William Odom. I recruited participants and then collected and analysed all data, discussing it with my co-supervisors.

In Chapter 6, I report an overarching analysis of the four main studies in my PhD, introducing a set of five behavioral styles or archetypes. I originally had the idea of developing a set of “archetypes” in the summer of 2018, after conducting the first two interview studies and the survey. As the chapter explains, I iterated on the analysis throughout my PhD with help from my co-supervisors Dr. Joanna McGrenere and Dr. William Odom, but I am largely responsible for the analysis process. My co-supervisors helped in structuring the chapter and its contributions. My supervisory committee also helped in framing the structure of this chapter.

In Chapter 7, I report Study 3, a design study that took place in the second half of 2018. I developed the design concepts and defined the study protocol with the guidance of my co-supervisors Dr. William Odom and Dr. Joanna McGrenere. In particular, Dr. Odom suggested the specific methods we used, centred around creating different design concepts and video prototypes.
I recruited participants, then conducted and analysed all of the interviews in the elicitation study. I wrote a full paper based on the study, with input from my co-supervisors Dr. Odom and Dr. McGrenere. The paper was published at the ACM DIS 2019 conference [291], where it received a Best Paper Honorable Mention (top 2% of all paper submissions). Because the paper has been peer-reviewed, in Chapter 7, I reproduce its contents with some adjustments and additions, based on comments from my committee:

Francesco Vitale, William Odom, and Joanna McGrenere. (2019) Keeping and Discarding Personal Data: Exploring a Design Space. Proceedings of the 2019 Conference on Designing Interactive Systems (DIS ’19).

In Chapter 8, I report Study 4, a design study that took place at the end of 2019 (after I spent the summer and early fall of 2019 at Google as a User Experience Research Intern). I designed and developed the bulk of the Data Dashboard prototype in the first half of 2019, before my internship. From September 2019, undergraduate student Janet Chen joined the project to help finish implementing some functions of the prototype. Janet Chen also assisted in running the evaluation study. With input from my co-supervisors Dr. Joanna McGrenere and Dr. William Odom, I decided which methods to use and designed the study protocol. I recruited all participants and moderated all sessions and pilots, except for the last two interview sessions, which were moderated by Janet Chen. Janet Chen also acted as a note-taker in several interview sessions, helped with data analysis, conducted a literature review of past design work in Personal Information Management (PIM) under my guidance, and contributed to writing the Data Dashboard usability report (Appendix A). I wrote a full paper based on the study with input from Janet Chen and my co-supervisors Dr. William Odom and Dr. Joanna McGrenere. The paper was published at the ACM DIS 2020 conference [292].
Because the paper has been peer-reviewed, in Chapter 8, I reproduce its contents with some adjustments and additions, based on comments from my committee:

Francesco Vitale, Janet Chen, William Odom, and Joanna McGrenere. (2020) Data Dashboard: Exploring Centralization and Customization in Personal Data Curation. Proceedings of the 2020 Conference on Designing Interactive Systems (DIS ’20).

Content from the three papers published at CHI and DIS also appears in Chapter 1 and Chapter 2. These chapters also include revised content from an extended abstract accepted to the Doctoral Consortium (a curated track) of the ACM CHI 2019 conference [289]:

Francesco Vitale. (2019) Designing for Long-term Digital Data Management. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI’19 Extended Abstracts).

In the bulk of the dissertation, I write about my work using the first-person plural to acknowledge the collaborative nature of the research studies that I present. I use the first-person singular in Chapter 3 to reflect on my research positionality and take ownership of the research process as a whole.

The UBC Behavioural Research Ethics Board (BREB) approved all studies reported in this dissertation: certificate number H17-00734.

A note on terminology

Throughout the dissertation, we use different expressions to indicate specific user actions that we are interested in studying: data preservation (Chapter 4), decluttering (Chapter 5), data curation (Chapter 6 and Chapter 8), data selection (Chapter 7), keeping and discarding decisions (throughout the whole dissertation). These expressions refer to very similar notions and reflect the evolution of our investigation. We use them on a chapter basis because they reflect our thinking in each specific phase of the research. In Chapter 2, we better explain differences, overlaps, and nuances in meaning. We also provide a definition of each term on a chapter basis.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Supplementary Material
Glossary
Acknowledgments
1 Introduction
  1.1 Context and motivation
    1.1.1 A seductive digital landscape
    1.1.2 The need to focus on personal data curation
    1.1.3 Why discarding data is hard
    1.1.4 Why look at individual differences
    1.1.5 Why studying personal data curation matters
  1.2 Research goals, research questions, and approach
  1.3 Contributions
  1.4 Outline
2 Background
  2.1 Defining personal data and personal data curation
    2.1.1 Framing personal data management
    2.1.2 The moral economy of personal data management
    2.1.3 The personal data curation cycle
  2.2 Practices and challenges in personal data curation
    2.2.1 Keeping data to build an identity and to remember
    2.2.2 The difficulty of discarding data
    2.2.3 Previous insights about discarding decisions
  2.3 Summary and conclusion
3 Methodology
  3.1 Research approach
  3.2 Criteria for quality
  3.3 Research positionality, reflexivity, and transparency
    3.3.1 Research positionality
    3.3.2 Reflection on my analysis choices
    3.3.3 Conflict of interest disclosure
4 Study 1: Identifying a Spectrum of Data Preservation Tendencies
  4.1 Introduction
  4.2 Related work
    4.2.1 Framing data preservation
    4.2.2 Digital hoarding
  4.3 Methodology
    4.3.1 Participants
    4.3.2 Procedure
    4.3.3 Data collection
    4.3.4 Data analysis
  4.4 Results
    4.4.1 Contextualizing information
    4.4.2 Identifying hoarding and minimalism
    4.4.3 Hoarding: keeping to remember
    4.4.4 Minimalism: I am more than my data
  4.5 Discussion and implications
    4.5.1 Tendencies across a spectrum
    4.5.2 Individual variation is common
    4.5.3 Comparing and contrasting hoarding and minimalism
    4.5.4 Reflecting on hoarding and forgetting
    4.5.5 Implications for shaping technologies
  4.6 Summary and conclusion
5 Study 2: Unpacking Decluttering Criteria and Practices
  5.1 Introduction
  5.2 Methodology
    5.2.1 Contextual interviews with self-identified minimalists
    5.2.2 Online survey
  5.3 Results
    5.3.1 General decluttering habits
    5.3.2 A taxonomy of data types and decluttering criteria
    5.3.3 Decluttering practices
  5.4 Discussion
    5.4.1 Physical vs. digital decluttering
    5.4.2 Opportunities for supporting digital decluttering
    5.4.3 Limitations
  5.5 Summary and conclusion
6 Study 1, 2, 3, 4: Synthesizing Behavioral Styles in Personal Data Curation
  6.1 Introduction
  6.2 Related work
    6.2.1 Individual differences in PIM
  6.3 Methodology
    6.3.1 Study 1: exploratory interviews with a broad sample
    6.3.2 Study 2: contextual interviews with “minimalists”
    6.3.3 Study 3: design exploration with a varied sample
    6.3.4 Study 4: system evaluation with a varied sample
    6.3.5 Data analysis
  6.4 Behavioral styles and approaches
    6.4.1 Casual
    6.4.2 Overwhelmed
    6.4.3 Collector
    6.4.4 Purger
    6.4.5 Frugal
    6.4.6 Temporal aspects of data curation
  6.5 Evolution of the behavioral styles descriptions
    6.5.1 Initial, preliminary version
    6.5.2 A first, stable version
    6.5.3 Summarizing the behavioral styles for recruitment
    6.5.4 Validating the behavioral styles with additional studies
    6.5.5 Connecting the behavioral styles to design research
    6.5.6 Final version
  6.6 Discussion
    6.6.1 How the behavioral styles help us understand data curation
    6.6.2 Behavioral styles as a resource for design and research
  6.7 Opportunities for design and research
    6.7.1 Prioritizing user support
    6.7.2 Making curation feel more engaging
    6.7.3 Exploring the role of curation for memory
    6.7.4 Investigating new ways of capturing user variation
  6.8 Limitations
  6.9 Summary and conclusion
7 Study 3: Exploring a Design Space for Selecting Personal Data
  7.1 Introduction
  7.2 Related work
    7.2.1 Existing and proposed design approaches
  7.3 Methodology
    7.3.1 Research approach and design dimensions
  7.4 Design concepts
    7.4.1 Patina
    7.4.2 Data Recommender
    7.4.3 Temporary Folder and Temporary App
    7.4.4 Future Filters
  7.5 Elicitation study
    7.5.1 Recruitment and participants
    7.5.2 Procedure
    7.5.3 Data analysis
  7.6 Thematic analysis results
    7.6.1 Theme 1: Selecting data is a personal responsibility
    7.6.2 Theme 2: Selecting data is a chore
    7.6.3 Theme 3: Context is key
  7.7 Discussion and future directions
    7.7.1 Moving the design space towards personalization
    7.7.2 Finding a space for automation
    7.7.3 Safeguarding automatic and prospective decisions
    7.7.4 Rethinking keeping and discarding actions
    7.7.5 Taking steps towards active data privacy protection
    7.7.6 Reflecting on the broader impact of our work
  7.8 Limitations
  7.9 Summary and conclusion
8 Study 4: Evaluating a Personalized Approach to Personal Data Curation
  8.1 Introduction
  8.2 Related work
    8.2.1 Previous PIM system studies
    8.2.2 Data boundaries
    8.2.3 Personalization and customization
  8.3 The Data Dashboard prototype
    8.3.1 Overview of the prototype
    8.3.2 Rationale and design
    8.3.3 Personalization mechanisms
    8.3.4 Implementation
  8.4 Methodology
    8.4.1 Participants
    8.4.2 Procedure and data collection
    8.4.3 Data analysis
  8.5 Results
    8.5.1 Contextualising information
    8.5.2 Data boundaries drive curation
    8.5.3 Centralization blurs data boundaries
    8.5.4 Customization upholds boundaries
  8.6 Discussion
    8.6.1 Reflecting on the feasibility of Data Dashboard
    8.6.2 Integrating data boundaries into design
    8.6.3 Rethinking the language of personal data
    8.6.4 Centralization as a matter of perspective
    8.6.5 Envisioning a post-cloud future
  8.7 Limitations
  8.8 Summary and conclusion
9 Conclusion
  9.1 Summary of results and contributions
    9.1.1 Primary research contributions
    9.1.2 Secondary research contributions
  9.2 Implications and future work
    9.2.1 Implications for User Interface (UI) and product design
    9.2.2 Implications for future Human-Computer Interaction (HCI) research
    9.2.3 Implications for algorithm design
    9.2.4 Implications for policy and sustainability
  9.3 Closing remarks
Bibliography
A Data Dashboard Usability Report
  A.1 User interface usability
    A.1.1 Activity
    A.1.2 Shared Data
    A.1.3 Explore Your Data
    A.1.4 Quick Actions
    A.1.5 Sidebar filter panel
    A.1.6 Settings
  A.2 Use scenarios
    A.2.1 Scenario 1
    A.2.2 Scenario 2
    A.2.3 Scenario 3
    A.2.4 Scenario 4
    A.2.5 Scenario 5
B Survey Questions and Additional Results
  B.1 Complete set of survey questions
  B.2 Descriptive results about general data practices
C Data Analysis
  C.1 Examples of coding in Study 1
  C.2 Memos from Study 2
  C.3 Example of coding in Study 3
  C.4 Examples of coding in Study 4
  C.5 Examples of coding in the survey responses
D Additional Study Materials
  D.1 Screening surveys for Study 3 and Study 4
E Additional Documents
  E.1 Study 1 call for participation
  E.2 Study 1 recruiting poster
  E.3 Study 1 consent form
  E.4 Study 2 call for participation
  E.5 Study 2 recruiting poster
  E.6 Study 2 consent form
  E.7 Study 3 call for participation
  E.8 Study 3 consent form
  E.9 Study 4 call for participation
  E.10 Study 4 consent form
  E.11 Survey call for participation
  E.12 Survey consent section

List of Tables

Table 7.1 Previous research projects categorized based on their general design approach.
Table 7.2 Previous research projects categorized based on the curation stage they focus on.
Table B.1 Survey results on general keeping and deleting decisions.

List of Figures

Figure 1.1 Overview of the research chapters in the dissertation.
Figure 2.1 The scope of personal data management.
Figure 2.2 The personal data curation cycle.
Figure 4.1 Data history sketches from participants in Study 1.
Figure 4.2 macOS’ advanced storage panel.
Figure 4.3 Google’s Files application on Android.
Figure 5.1 Participants’ homes in Study 2.
Figure 5.2 Our taxonomy of data types.
Figure 6.1 Overview of the four interview datasets used for the behavioral styles analysis.
Figure 6.2 Overview of the five behavioral styles.
Figure 6.3 Illustration of the five behavioral styles.
Figure 7.1 Overview of the design concepts in Study 3.
Figure 7.2 Design concept: Patina on desktop folders.
Figure 7.3 Design concept: Patina for music playlists.
Figure 7.4 Design concept: Data Recommender.
Figure 7.5 Design concept: Temporary Folder.
Figure 7.6 Design concept: Temporary App.
Figure 7.7 Design concept: Future Filters.
Figure 7.8 Overview of participants’ reactions to our design concepts.
Figure 8.1 Data Dashboard prototype: Activity.
Figure 8.2 Data Dashboard prototype: Explore Your Data.
Figure 8.3 Data Dashboard prototype: Quick Actions.
Figure 8.4 Data Dashboard prototype: Settings page.
Figure 8.5 Tools that inspired the design of Data Dashboard.
Figure 8.6 Data Dashboard prototype: customization panel.
Figure 8.7 Overview of participants’ reactions to Data Dashboard.
Figure 8.8 Automatic categories in Gmail’s inbox.
Figure B.1 Survey results on keeping and deleting decisions for different data types.
Figure C.1 Example of coding from Study 1.
Figure C.2 Additional example of coding from Study 1.
Figure C.3 Example of categories and codes from Study 1.
Figure C.4 Examples of codes about “hoarding” in Study 1.
Figure C.5 Examples of codes about “minimalism” in Study 1.
Figure C.6 Example of coding from Study 3.
Figure C.7 Subset of categories and codes from Study 3.
Figure C.8 First example of coding from Study 4.
Figure C.9 Second example of coding from Study 4.
Figure C.10 Third example of coding from Study 4.
Figure C.11 Fourth example of coding from Study 4.
Figure C.12 Example of coding from the online survey.
Figure C.13 Additional example of coding from the online survey.
Figure C.14 Subset of categories and codes from the survey.

List of Supplementary Material

1. Video prototype for the Patina concept from Study 3 (study3-patina.mp4)
2. Video prototype for the Data Recommender concept from Study 3 (study3-recommender.mp4)
3. Video prototype for the Temporary Folder concept from Study 3 (study3-temp-folder.mp4)
4. Video prototype for the Temporary App concept from Study 3 (study3-temp-app.mp4)
5. Video prototype for the Future Filters concept from Study 3 (study3-filters.mp4)
6. Video walkthrough of the Data Dashboard prototype from Study 4 (study4-dashboard.mp4)

Glossary

FM File Management
HCI Human-Computer Interaction
PIM Personal Information Management
UI User Interface

Acknowledgments

I want to start by thanking my main co-supervisor Dr.
Joanna McGrenere, for her constant guidance and support over the years. Patient, encouraging, enthusiastic, and always open to new ideas: I could not have asked for a better supervisor.

Next, I want to thank my co-supervisor Dr. William Odom, from whom I have learned a lot about using design methods to drive a research inquiry. His guidance was essential for shaping much of my work.

I am grateful to my supervisory committee members, Dr. Karon MacLean and Dr. Ivan Beschastnikh, for their comments over the years. In particular, I thank Dr. MacLean for challenging key aspects of my work and pushing for better clarity in describing my ideas. I thank Dr. Beschastnikh for providing a much-needed technical perspective to my work, with engaging discussions around user behaviors and file systems.

I am also thankful to Dr. Anna Cox, external examiner, and Dr. Margo Seltzer and Dr. Heather O’Brien, university examiners, for their thoughtful comments, questions, and input on my work. Their on-point suggestions helped me improve key parts of the dissertation.

I also want to thank Dr. Sandra Mathison and Dr. Marla Buchanan, instructors of UBC’s 2017 winter edition of EPSE 595, for introducing me to key works on qualitative research that I have used in my studies.

A mention goes to all UBC’s MUX lab members (students, post-docs, faculty) between 2016 and 2020 who have provided feedback on my work or who have been around to chat during these years: Izabelle Janzen, Oliver Schneider, Soheil Kianzad, Kamyar Ardekani, Laura Cang, Paul Bucci, Yelim Kim, Taslim Arefin Khan, Shareen Mahmud, Sabrina Hauser, Michael Opperman, Jessalyn Alvina, Emmanouil Giannisakis, Hanieh Shakeri, Ashish Chopra, Kevin Chow, Sang-Wha Sien, Frances Sin, Hayley Guillou, Mohi Reza, Hasti Seifi, Matthew Chun, Dilan Ustek, Mona Haraty, Helen Yin, Zack Wilson, Anna Maria Offenwanger, Dongwook Yoon, Tamara Munzner, Leila Aflatoony.
A special thanks to Janet Chen, who helped me during the Data Dashboard study and who has been a great student to supervise.

I also want to thank the team members of my group project in UBC’s 2017 edition of CPSC 554m: Matthew Chun, Yingying Wang, Haotian Zhang. Our work on collaborative file management inspired a small feature of the Data Dashboard prototype in Chapter 8.

Several people at UBC have also helped me in different roles over the years. I thank Dr. Alla Sheffer for chairing my research proficiency evaluation presentation, Dr. Ronald Garcia for chairing my thesis proposal defense, Dr. Eric Meyers for chairing my PhD defense, and Laura Selander and Joyce Poon for their administrative support.

I thank the organizers, mentors, and fellow participants at the CHI 2019 Doctoral Consortium for their feedback and suggestions on my thesis: Laura Dabbish, Koji Yatani, Duncan Brumby, Jennifer Thom, Walter Lasecki, Florian ‘Floyd’ Mueller, Elena Agapie, Alex Ahmed, Yulia Brazauskayte, Simran Chopra, Karen Cochrane, Michael Ann DeVito, Jessica Feuston, Jane Hoffswell, Suvi Holm, Matthew Hong, Tom Horak, Fannie Liu, Tricia Ngoon, Makuochi Nkwo, Teddy Seyed, Rachel Simons, David Verweij, Jillian Warren, Cara Wilson.

I also want to thank the several anonymous reviewers at CHI, DIS, and CSCW who have taken the time to provide feedback on my submissions over the years. Their comments have contributed to improving my work.

Many people that I have chatted with at CHI, DIS, or on other occasions have also helped my work indirectly.
Among others: Jofish Kaye, Steve Whittaker, Anne Spaa, Jesse David Dinneen, Ofer Bergman.

A thanks also goes to my team and collaborators at Google, from whom I have learned several research tips and techniques that I have used in my last study and beyond: Melinda Klayman, Kerstin Kuehne, Caroline Nivetha.

In Study 3 (Chapter 7) we created the concepts using Sketch and InVision with icons from the Streamline library and The Noun Project (by Abdo, AR Ehsan, Dino, Hare Krishna, and Landan Lloyd). The prototype built in Study 4 (Chapter 8) includes icons from Flat Icon.

All my research, of course, would not have been possible without the many participants in my studies: I thank them for taking the time to help me understand how they curate their data and I hope that my work might help solve some of their problems in the future.

Finally, I am thankful for the support of my family and friends, on one or the other side of the world.

My PhD was supported by the grant NSERC RGPIN-2017-04549 “Highly personalized user interfaces.”

Chapter 1
Introduction

In this chapter, we provide an overview of the dissertation, starting from context and motivation.¹ Then, we outline our research questions, goals, and approach. We introduce our contribution and then conclude with an outline of the dissertation.

1.1 Context and motivation

Why study personal data curation? We argue that the current technological landscape calls for renewed interest in and attention to how people manage their personal data. Here, we explain why.

1.1.1 A seductive digital landscape

The past decade has changed the way people think about personal data. Ten years ago, cloud storage platforms were in their infancy. Dropbox launched in 2008, iCloud in 2011, Google Drive in 2012. Before the advent of these platforms and before the popularity of mobile devices, people largely thought of their data as limited to files and folders stored on local computers.
Now, personal data is a buzzword at the center of political debates and regulatory efforts [1, 265]. Data dominates all aspects of life, from work life to domestic life, so much so that an entire industry relies on the storing, accumulation, and exchange of personal data for profit [50, 264, 319]. This is a “seductive digital landscape” [183] where trends such as lifelogging technologies [85] and cheap storage space [155] allow people to collect and keep a virtually infinite amount of data. It is not surprising that collections of personal digital items are growing [73, 76, 213, 285].

¹ Part of the content in this chapter also appears in the publications mentioned in the Preface to the dissertation.

But these technological trends can lead to questions about the role of data in daily life. How can people extract value and meaning from large quantities of data over time? [133, 157, 229] How can permanent records of personal data allow for forgetting? [11, 254] And how can individual behaviors scale to a growing population? [84, 236, 285]

1.1.2 The need to focus on personal data curation

Researchers in Personal Information Management (PIM) have called for an increased focus on studying and designing for long-term data management practices [144]. Key research questions to address include how to enable meaningful data legacy [111], how to approach automation of data management [144], and how to support data curation [302]. In this dissertation, we focus on personal data curation.

Personal data curation is the process of saving and managing data for later use. It consists of three stages: keeping (where users create or save data, and then decide what to keep or discard), managing (where users organize kept items), and exploiting (where users retrieve and use the information they have kept) [302]. Improvements in search functions have made retrieving items easier. The keeping and managing phases of curation, by contrast, still challenge users.
In particular, users struggle when deciding what items to keep or discard, often opting to keep as a default action [38].

1.1.3 Why discarding data is hard

Three main reasons make discarding data challenging. First, discarding data can lead to a “paradox” [26]: you might want to delete unneeded data to focus your time and attention on what is important. But deciding what to delete will take time and attention. Second, deciding what to keep or not is hard because it often involves predicting the future, something that humans are not always good at or might find stressful. People might decide not to make any intentional decision and keep everything just to avoid missing any useful data in the future [24, 145]. Third, users often perceive the activity of sorting personal data as burdensome because of the poor support that data management tools provide. These challenges often create an unintentional accumulation cycle: the more you keep, the harder it becomes to curate [24], with people becoming increasingly less aware of what they have in the first place [213].

More and more, research studies report how technology users can feel stressed about the accumulation of data [171, 205, 253, 271]. Previous studies show how people might need some degree of selection to make sense of their digital archives, both for data they explicitly create or acquire [166, 182, 183] and also for data created by technology on their behalf [308]. There are situations (e.g., romantic breakups) where people assume specific roles in relation to curating data [247]: deleter, keeper, selective disposer. But outside of these specific situations, a common finding is that people often “loathe” deleting data [205, 308]. These insights into people’s keeping and discarding practices, however, are broad and do not consider individual differences.

1.1.4 Why look at individual differences

Most of the literature on PIM focuses on how people organize or retrieve information.
One recurring finding is that people show substantial individual differences in their behavior. For example, some people choose to “neatly” manage files with organized structures; others let them sit within “messy” piles [127]. Some keep emails in their inbox; others clean them regularly [117]. Some use search to retrieve a piece of information; others rely on navigation [25, 28]. We see some features of modern data management tools as likely informed by this line of research (e.g., macOS’ automatic “Stacks” for creating piles of desktop files [8]). These features show the importance of looking at individual differences within PIM.

However, there is still a lot to understand about the data curation process as a whole. In particular, the literature lacks a deep understanding of behaviors in the initial stage of curation, where people acquire data and then decide what to keep or discard [302]. Studies in Human-Computer Interaction (HCI) and Information Science report how people’s digital archiving strategies are “widely varied” [259]. But despite this knowledge, there is no systematic model of individual behaviors, a need that many previous studies call out [23, 118, 150, 309]. This is the space that this thesis explores. In particular, we look at keeping and discarding decisions in personal data curation.

1.1.5 Why studying personal data curation matters

The need for better curation tools also comes from popular discourse about technology and people’s demand for more control over their data. Privacy is a key concern for people [176]. The advent of the Cloud has imposed a centralized data management model where a few monopolizing corporations (Amazon, Apple, Google, Facebook, Microsoft) aggregate people’s data. But in recent years the pitfalls of this model have come to light: from government-sanctioned surveillance programs [277] and illicit use of personal data for political aims [283], to photo leaks [9] and voice assistants secretly sharing private conversations [172].
These episodes have made people more aware of their own “digital footprint” and the consequences of an economy increasingly based on data. To companies, data is nothing more than a commodity, no matter how private or personal. But people want more control over their data [176]. As the importance of digital data continues to grow, current models face increased scrutiny. This is also why we see a need for better tools to manage and curate personal data.

1.2 Research goals, research questions, and approach

This dissertation has three main goals:

1. Providing a rich, empirical account of user practices in personal data curation, with a focus on keeping and discarding decisions (an under-explored area of curation). To reach this goal, we ask the following research questions: How do people decide what personal data to keep or discard? What drives their decisions? (Chapter 4) What are the types of data that people consider when curating, and what are the criteria and strategies they use to “declutter”? (Chapter 5)

2. Extending previous characterizations of individual differences in PIM by focusing on the first stage of curation and taking a design-focused perspective. To reach this goal, we ask the following research question (Chapter 6): How can we make individual differences in personal data curation actionable for research and design?

3. Exploring and evaluating a range of design approaches for supporting different user needs. To reach this goal, we ask the following research questions: How can we design technologies to support people’s individual decisions around what data to keep or discard? In particular, what different design approaches might be viable to different users and in different situations? (Chapter 7) Can centralization and customization help people decide what personal data to keep or discard?
(Chapter 8)

To answer the research questions above, we conducted four studies (Figure 1.1):

• Study 1 (Chapter 4): an exploratory interview study with 23 participants.
• Study 2 (Chapter 5): a mixed-methods study consisting of 7 contextual interviews and an online survey (349 participants).
• Study 3 (Chapter 7): a design exploration with 16 participants.
• Study 4 (Chapter 8): a system evaluation with 18 participants.

The studies combine different methodological approaches: from a grounded theory approach in Study 1 and Study 2, to a Research through Design approach in Study 3 and Study 4. In Chapter 3, we provide more details on the overall methodology across studies.

1.3 Contributions

The primary contributions of the dissertation fall under three areas: theoretical, design and artifacts, empirical [310].

Chapter 4 (Study 1). Research goal: Characterize user behaviors. Research question: How do people decide what to keep or discard? Approach and methods: Grounded theory, 23 interviews. Contributions: Spectrum of preservation tendencies.
Chapter 5 (Study 2). Research goal: Characterize user behaviors. Research question: What data do people declutter and how? Approach and methods: Grounded theory, 7 interviews, survey (349). Contributions: Taxonomy of data types, decluttering criteria and strategies.
Chapter 6 (Study 1, 2, 3, 4). Research goal: Extend categories of individual differences. Research question: How to make individual differences actionable in design? Approach and methods: Grounded theory, cross-study analysis. Contributions: Behavioral styles in curation, reflexive account of process.
Chapter 7 (Study 3). Research goal: Explore design approaches. Research question: What design approaches are possible? Approach and methods: Research through Design, 16 interviews. Contributions: Set of design dimensions and concepts.
Chapter 8 (Study 4). Research goal: Evaluate design approaches. Research question: Can centralization and customization help? Approach and methods: Research through Design, 18 interviews. Contributions: Design approach for personalized data curation.

Figure 1.1: An overview of research chapters in the dissertation and the studies reported in each.
For each chapter, from top to bottom, we outline: the research goal, the key research question, the research approach and methods, the key contributions.

• Theoretical contributions: a rich description of individual differences in personal data curation consisting of a spectrum of general preservation tendencies (Chapter 4) and a set of five behavioral styles (Chapter 6), with an explanation of their role for identity construction over time.
• Design and artifact contributions: a set of dimensions and concepts for defining and exploring a design space around personal data curation (Chapter 7), a design approach to address key challenges in data curation by combining a unified tool with personalized functions (Chapter 8), and a set of generative design directions and opportunities across all four studies (Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8).
• Empirical contributions: four studies (64 interviews in total and an online survey with 349 participants) that report personal data curation practices with the use of different methods and approaches (Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8).

In addition, the secondary contributions of the dissertation are: a reflexive account of our research and design process across the four studies (Chapter 6); a taxonomy of personal data types and decluttering criteria (Chapter 5); and a description of temporal decluttering practices (Chapter 5).

1.4 Outline

The rest of the dissertation follows this structure. In Chapter 2, we provide some background and definitions for personal data management and curation, framing our work against the two theoretical frameworks that inform it.
We also give an overview of related work on personal data management and outline key insights about people’s keeping and discarding practices.

Chapter 3 presents the overall methodological approach of the thesis, with details on methods and criteria for research quality.

In Chapter 4, we present the results from Study 1, where we conducted exploratory interviews with a broad sample of 23 participants. In this study, we introduce the idea of a spectrum of general tendencies for data preservation with two extremes: “hoarding” (keeping most data) and “minimalism” (trying to keep as little as possible). The spectrum represents a first take on individual differences in personal data curation.

In Chapter 5, we present the results of Study 2, which was a follow-up study extending the results on “hoarding” and “minimalism” using a mixed-methods approach. First, we conducted contextual interviews with 7 self-identified “minimalists.” In parallel, we ran an online survey with 349 participants. We use open-ended survey responses to build a taxonomy of data types with related “decluttering” criteria. We later used these results to inform our design work in the last two studies. Then, based on data from both the survey and the interviews, we describe a set of decluttering practices and selection strategies that we later use to inform both our work on user modelling and our design work.

In Chapter 6, we present one of the key contributions of the dissertation: a set of five behavioral styles in personal data curation (Casual, Overwhelmed, Collector, Purger, and Frugal). To describe the behavioral styles, we draw on all four interview studies that make up the dissertation. First, we explain how we identified the behavioral styles after conducting Study 1 and Study 2.
Then, we bring forward data from the last two studies (described in more detail in Chapter 7 and Chapter 8) to show how we enriched and validated the behavioral styles.

In Chapter 7, we present Study 3, a design study focused on exploring the design space of personal data curation with five concepts that can help users select what data to keep or discard. These are Patina, Data Recommender, Temporary Folder, Temporary App, and Future Filters. We elicited reactions to the concepts from a varied sample of 16 participants (recruited using a short description of the behavioral styles presented in Chapter 6).

In Chapter 8, we present Study 4, a design study focused on evaluating Data Dashboard, a prototype system for curating personal data. The system embodies insights from all previous studies, with some functions taking inspiration from the design concepts and insights presented in Chapter 7, or modeled after the patterns of behaviors from previous chapters. We evaluated the prototype with a varied sample of 18 participants (recruited using the same short description of the behavioral styles from Chapter 6 that we used in the previous design study).

Finally, in Chapter 9, we summarize our results and discuss the implications of our work.

Chapter 2
Background

In this chapter, we give an overview of past studies and general user practices in personal data curation.¹ The next chapters will include more specific related work tied to aspects of each study. In particular, in Chapter 4, we will discuss related work about digital hoarding. In Chapter 6, we will provide more details about individual differences in PIM. In Chapter 8 we will focus on previous work on personalization and customization. In both Chapter 7 and Chapter 8 we will discuss past design and system work in PIM.

2.1 Defining personal data and personal data curation

To define the scope of our work, we first draw some distinctions between “personal data management,” PIM and File Management (FM).
Then, we introduce two frameworks that guide our work.

2.1.1 Framing personal data management

PIM is a broad activity that involves creating, acquiring, organising, interacting with, and searching for personal information [27, 142]. PIM research within the field of HCI goes back to the advent of personal desktop computers [158, 177].

¹ Part of the content in this chapter also appears in the publications mentioned in the Preface to the dissertation.

[Figure 2.1 labels: Personal Data Management; Personal Information Management; File Management]

Figure 2.1: We conceptualize “personal data management” as broader than “personal information management,” which in turn is broader than “file management.” (The size of the circles in this illustration represents how broad the focus of each term is, not their importance.)

Today, the literature and popular discourse largely think of personal information as synonymous with digital information but that was not always the case. Several early PIM studies focus on physical information, such as paper documents, and investigate how to support people’s management practices when moving from physical to digital information [158, 177, 303]. As the bulk of people’s work and daily life shifted to digital, PIM studies honed in on emerging types of information: computer documents [14, 38], emails [117, 306], contacts [305], bookmarks [273].

Compared to PIM, File Management has a narrower focus, limited to files and folders [74]. From a technical standpoint, all personal information resides in files, but files are a debated data structure metaphor [123, 191, 255]. Users might experience visual files and folders as different from information such as contacts or bookmarks, which are more integrated into specific applications.

Recently, HCI studies have started using terms such as “personal data,” “digital data,” and “digital or virtual possessions” in place of “personal information” [111, 211, 288].
This shift signals an expanded focus that includes new types of data: social network data, location data, lifelogging data, metadata, and so on. With new devices such as voice and virtual assistants or IoT (Internet of Things) products coming onto the market, even more types of data will become part of people’s lives and feel increasingly personal.

In this dissertation, we use the term personal data management to take a broad focus that includes but is not limited to FM and PIM (Figure 2.1): personal data is information, but it is not only files or folders, and not all data is information. For example, data can include articles from online magazines, as one participant in Study 1 mentioned (these are not strictly personal since they are public), or mobile applications (better known as “apps,” these are perceived as different from files and folders), or computer logs (these are not necessarily perceived as information).

Our work largely focuses on digital data. In Study 2 (Chapter 5) we touch on key differences between physical and digital items that can inform the design of tools for digital data curation.

Throughout the dissertation, we touch on data captured automatically by technology. However, we do not delve into the background of lifelogging technologies (i.e., Personal Informatics systems), which we see as a related but separate area of work within HCI that previous studies have explored in depth [85, 87, 148, 163, 308].

2.1.2 The moral economy of personal data management

In looking at personal data management, we use two theoretical frameworks to define the scope of our work. The first comes from Vertesi et al. [288], who detail how personal data can take many forms (from photos to files and logs), living in ecosystems made up of multiple devices and relationships, with online cloud storage platforms becoming increasingly prominent. When managing their personal data, people face a tension between sharing with and safeguarding data from others.
People often make decisions based on moral convictions about what they think is the “right way” to manage data. We adopt this moral economy of data management as a backdrop for much of the analysis in the dissertation.

In our work we touch on privacy issues, but do not focus on sharing behaviors. Sharing data with other people is an essential component of modern user practices in this space and undoubtedly affects some curation decisions. However, to scope the focus of our investigation we do not delve into collaborative data management or sharing practices. Studying this related but different aspect of data management would require different methods and samples. We refer to work by Massey et al. [186], Voida et al. [293], and Rader [239] for more details on collaborative user behaviors.

2.1.3 The personal data curation cycle

The second framework helps us narrow the focus of our work. We refer to Whittaker [302] to reframe common PIM actions as part of an “information curation” cycle consisting of three stages (Figure 2.2): keeping (where users acquire or create information and then decide what to keep or discard), managing (where users organize the items they keep using folders or other structures), and exploiting (where users retrieve items by searching for them or navigating in their structures). (In combining the two frameworks, we replace Whittaker’s use of the term “information” with “data,” as explained above.)

We largely focus on the first stage of curation, where users acquire or create data and then decide what to keep or discard. Throughout the studies we also touch on the second stage, management, looking at how keeping decisions intersect with organization. Our focus on the last stage of curation, exploiting, is limited. Information retrieval represents a sub-field of its own, with a large body of work covering strategies for re-finding data and providing new tools for improving user practices.
However, the retrieval stage still informs some of the ideas that we discuss in several studies. Where relevant, we point out specific past work and insights about data retrieval.

Personal data curation cycle
• Keeping: Acquiring or creating data. Selecting what to keep or discard (e.g., archiving, decluttering). Preserving over time (e.g., backing up, storing).
• Managing: Organizing data (e.g., creating folders).
• Exploiting: Retrieving data to use it (e.g., searching or navigating).

Figure 2.2: An illustrative elaboration of Whittaker’s personal data (or information) curation cycle, consisting of three stages [302]: keeping, managing, exploiting. In our work, we focus on the first stage, keeping, elaborating on possible actions that people take. This illustration lists some of the terms and actions that we focus on.

The terminology of personal data curation

One challenging aspect of studying data curation is the lack of standard definitions for overlapping ideas and user actions. Archiving, curating, preserving, managing, organizing, storing, cleaning up, selecting, decluttering: these are all terms that appear in previous work. But often, researchers use them inconsistently and in a way that might differ from users’ interpretations (many participants we talked to, for example, used the word “archiving” to indicate they wanted to hide something, whereas research on “digital archiving” often uses the term to indicate intentional long-term storage). The messy nature of the language of personal data and its consequences for user behaviors is a discussion item that we bring up later in our work, proposing directions to address it (Chapter 7 and Chapter 8). On our part, as we note in the Preface to the dissertation, we use different terms to highlight specific aspects of the curation process that overlap but can be framed as slightly different.
Even though this approach might contribute to the confusion in terminology, we think it is important to capture the nuance of different actions in personal data curation.

Throughout the dissertation, we use the expression keeping and discarding decisions to indicate the first stage of curation, which is the key focus of our work. Within this stage, we elaborate on possible actions that users might take after acquiring or creating data (Figure 2.2). We use the term data selection for the overarching process of selecting specific items to keep or discard. After making this selection, users might engage in data preservation for items they want to keep (i.e., taking additional actions to preserve items in the long term, such as doing backups or storing items in specific places) or data decluttering for items they do not want to keep (i.e., getting rid of items or re-organizing them; here two stages of curation, keeping and managing, are closely linked and often happen simultaneously, because re-organizing makes it clearer what to get rid of). We see archiving as an overarching and somewhat ambiguous action that can be used both to preserve and declutter items depending on what meaning the action has for each person. The concept of a personal archive, instead, is the general idea of a set of items stored somewhere.
Engaging in personal data curation should help make that personal archive feel more meaningful and intentional.

We focus on preservation tendencies in Chapter 4, criteria for decluttering in Chapter 5, support for selection in Chapter 7, and curation as a whole (but still with a stronger focus on keeping and discarding decisions) in Chapter 6 and Chapter 8.

2.2 Practices and challenges in personal data curation

Previous studies report how people engage in personal data curation for several reasons: for example, to build a legacy [111, 112, 149], support memory [140, 153], manage and honor relationships [153, 288], find things later [149], or present their image to others [211, 315]. In some cases people actively acquire or create data [24], sometimes as part of an ongoing collection [89, 299]. In other cases, they accumulate data more passively [24, 307].

2.2.1 Keeping data to build an identity and to remember

One key reason for keeping and wanting to preserve data is to build an identity [66, 68, 149, 211]. As with physical possessions [17, 65], people can form attachments to digital items [18], a phenomenon called “self-extension” [17, 18, 66–68]. A key way for possessions, whether digital or physical, to help shape identity is by supporting memory and reminiscence [149, 153, 229]. Sentimental possessions (e.g., photos in particular [43, 44, 152, 218, 307]) can help people remember key moments and relationships from their past, acting as symbols of key experiences in their life [153, 228]. Much of previous work explores the idea of digital “mementos” [39, 147, 230, 278] or “heirlooms,” [10, 210, 212] investigating ways to incorporate digital data and its potential for reminiscence into everyday life and into people’s homes [217, 218, 226].

However, previous research shows how not all “stuff” is created equal. In the case of digital data, only some items feel like possessions [68].
To determine the importance of digital items people might refer to values such as utility and recency, emotional attachment, craft, and replaceability [68, 104, 153, 210]. Sometimes these values are shared between physical and digital items, but people can perceive digital items as less unique because they lack the material qualities of physical items [104, 210].

The literature on material culture suggests that disposing of possessions is as important a practice as acquiring them [60, 122, 159, 196, 249]. Studies on physical “decluttering stages” [59] or “selection regimes” [140] report useful accounts of selection practices, showing how sorting and discarding physical items can feel enjoyable compared to sorting through digital items [227].

But when it comes to personal data, many users find the act of deleting items “onerous,” a “burden,” [140] to the point of questioning the idea of deleting digital items in the first place because it is against “their nature” [111].

2.2.2 The difficulty of discarding data

The ongoing process of curating data is challenging because it often involves anticipating the future [24, 302]. Many users end up choosing to keep all their personal data as a default [38]. Bergman & Whittaker [24, 302] refer to prospect theory [145] to explain why. In general, people tend to be averse to risk and perceive a potential loss from discarding data as more substantial than any gain (i.e., the possibility of losing information that turns out to be useful outweighs the convenience of finding data more easily).

There is also evidence that digital tools offer poor support for curating data besides going through items one by one [130] and that users can have confused mental models of how deleting works, especially in cloud platforms [240].
The result of these challenges and perceptions is that many technology users “approach long-term preservation [of data] with trepidation” [227].

The prevalence of keeping as a default [38, 151], however, creates an accumulation cycle: the more data there is, the harder it is to select, organize, or find things [24, 307]. An emerging body of research points to people feeling stressed by the accumulation of data [253, 271].

The increasing popularity of cloud platforms further complicates keeping decisions. People can perceive their digital data as being an undefined collection of items without knowing exactly what they possess and where things are. As Odom et al. [213] remark, “the role of curator can become complicated if one does not know what one is curating.” Yet, there is a desire for some form of selection. Previous studies highlight that a meaningful personal archive should focus on “the remarkable” items [166] and that selection is one of the ongoing practices that inform what an archive is [149]. Work on the gaps between actual and desired PIM behaviors highlights how users are “eager” to delete unnecessary information and would like to be more efficient in this practice but often fail to do so because of poor support in data management tools [5].

2.2.3 Previous insights about discarding decisions

Previous HCI studies mention several reasons for deleting or discarding data. For example, users might want to delete data from an ex-partner after a breakup [247], unneeded, unknown or unwanted data [150, 184, 213], data copied online [211], social media accounts or expired accounts [288]. Most of these studies, however, mention these episodes as secondary insights. Our work looks more closely at discarding practices in the context of personal data curation, providing a richer characterization of user behaviors.

Some of the more detailed studies about deleting practices look at the problem from a psychology or security perspective.
For example, Kim [151] reports that people might decide to delete for a variety of reasons and with a variety of approaches: because of limited space on their devices, after buying a new device, by inspecting items one by one, accidentally, or more regularly. Kahn et al. [150], instead, investigate retrospective data management decisions in two prominent cloud storage platforms, Google Drive and Dropbox. They show that most participants would like to delete at least one file from a set of 10 old files when given the opportunity to review them. Ramokapane et al. [240] also look at cloud platforms and list the main reasons for wanting to delete data: among them, privacy protection, storage issues, external company policies, and the perceived value of the data. They also mention how some users decide to delete for the sake of it, to “tidy things up.”

Additional studies look in detail at deleting decisions for emails [202], data on social networks [4, 202, 260, 298], and messages [251]. With email, reasons for deleting tend to focus on storage capacity and information utility, with users reporting deleting emails that are old, unneeded, or taking up space [202]. With social networks, instead, privacy and regret play an important role, with users wanting to remove embarrassing content [4, 202, 260, 298].
A key reason for deleting messages, instead, is to fix and revise exchanges in a conversation (e.g., typos, inappropriate content, obsolete content) or free up space taken up by media content [251]. But emails, social networks, and messages add complexity to deleting decisions because other people are involved: even if a user decides to delete a message, for example, the person they are chatting with might have seen it already [251].

Past work also makes a key point: user interfaces should avoid “one-size-fits-all” solutions for supporting deletion because deleting decisions are contextual and hard to generalize [150, 202].

2.3 Summary and conclusion

Overall, previous work shows that personal data curation is an important activity for people, but the literature lacks a rich and systematic understanding of individual decisions for keeping and discarding data. Previous work pays less attention to individual differences or design efforts in the initial stage of curation, as compared to organization and retrieval. Deciding what to keep or discard is a challenging aspect of curation where users often struggle because of the personal and contextual nature of these decisions, but also because users lack proper support in the tools they use to manage data.

Our work adds to the existing literature by shifting the focus from data organization and retrieval to decisions about what data to keep or discard. We advance existing literature in several ways. First, we detail individual user preferences for discarding decisions in different contexts (Chapter 4, Chapter 6), and categorize criteria and strategies for discarding data (Chapter 5).
Then, we explore how to move beyond a one-size-fits-all approach and incorporate individual preferences both in contextual but separate design solutions (Chapter 7) and with a cohesive, personalized tool (Chapter 8).

In the following chapters, we will touch on additional related work and highlight differences between our work and previous studies.

Chapter 3

Methodology

In this chapter, I discuss my research approach and methodology. I outline criteria for evaluating my work and reflect on my research positionality. Reflexivity is an important component of rigorous qualitative research: it informs the reader about possible biases or influences on the analysis, and it positions the researcher as an active actor in the research process [56]. Because reflexivity emphasizes the role of the researcher, I write this chapter in the first person.

3.1 Research approach

My research approach is, broadly speaking, qualitative. Over the course of my work I combine two philosophical research paradigms: constructivism/interpretivism and pragmatism [197]. Combining approaches and methodologies is common in HCI, a multi-disciplinary field that embraces many perspectives.

Studies 1 and 2 fall largely within a constructivist and interpretive paradigm, where the goal is to understand and interpret a culture-bound reality [197]. Across the two studies, I use a constructivist grounded theory methodological approach [56]. Grounded theory is a methodology for collecting and analyzing data in a “systematic but flexible” way [56]. Grounded theory originates in social science, but it is a common approach in the field of HCI [201] (although HCI studies often only follow a light version of grounded theory, without using the full set of procedures). The goal of grounded theory is to build an understanding of processes and phenomena grounded in research data.
Key steps in grounded theory include: theoretical sampling, for selecting participants based on gaps in an emerging theory rather than to build a representative sample; an initial phase of open coding (often going through interview transcripts line-by-line), followed by axial and selective (or focused) coding (where the researcher focuses on a few key ideas); a process of constant comparison, where examples and experiences from different participants are compared to help better understand the process at hand and its key categories; and memo writing, where the researcher engages in reflections about the analysis. There are different schools of grounded theory, a result of years of practice and debate among scholars, with diverging views on the nature of qualitative research [54, 55]: at a high level, we can distinguish between objectivist grounded theory (as conceptualized by Corbin & Strauss [267]) and constructivist grounded theory (as conceptualized by Charmaz [56] and, before that, in the original conception of grounded theory by Glaser & Strauss in the 1960s [103]). Charmaz summarizes the difference between objectivist and constructivist grounded theory, explaining that “constructivist grounded theory assumes relativity, acknowledges standpoints, and advocates reflexivity” [55]. In my work, I follow Charmaz’s school of grounded theory.

My work moves closer to pragmatism with Studies 3 and 4, where I take a Research through Design (RtD) approach [317]. At a general level, RtD argues for integrating design into research, giving emphasis to creating design artifacts that can drive research inquiries. However, the broad nature of RtD often includes diverging methods and perspectives [96, 318].
In my work (especially in Chapter 7), I borrow more closely from methods such as Speed Dating [72], User Enactments [214], and Experience Prototyping [45].

Speed Dating is a method that argues for exploring a variety of design ideas and dimensions, without requiring full technical implementation [72]. User Enactments, instead, ground the evaluation of possible design ideas in situations and contexts from a potential future, allowing participants to experience alternative scenarios [214]. Finally, Experience Prototyping argues for creating prototypes that can simulate user engagement with a product, often in a set of scenarios that might be difficult to experience directly otherwise [45]. In my design work (Chapter 7, Chapter 8), I combine aspects from these methods and adapt them to the specific requirements of my research inquiry.

In terms of data analysis, I refer to reflexive thematic analysis (TA) [61] in most of my studies. Reflexive thematic analysis differs from other schools of thematic analysis (“codebook” thematic analysis and “coding reliability” thematic analysis) because it gives emphasis to the researcher’s subjectivity and sees coding data as a reflexive, recursive process [41]. As Braun and Clarke explain,

    “Quality [in] reflexive TA is not about following procedures ‘correctly’ (or about ‘accurate’ and ‘reliable’ coding, or achieving consensus between coders), but about the researcher’s reflective and thoughtful engagement with their data and their reflexive and thoughtful engagement with the analytic process.” [41]

Thematic analysis, however, is a method and not a methodology: this is why I use it both when I refer to a grounded theory approach and also when I refer to a research through design approach.

In general, my work moves away from post-positivism and experimental research, where common goals are predictability, statistical generalizability, and representativeness.
Instead, my work is exploratory, holistic, generative, and descriptive:

• Exploratory because I openly and inductively explore a specific topic that the existing literature has not covered in detail (i.e., people’s decisions about what personal data to keep or discard). There are no a priori hypotheses to test here [175, 266].

• Holistic because I take a broad perspective on the topic, letting participants guide me towards many different types of data, devices, platforms, and tools that they consider part of their everyday life. There are no imposed definitions of what should be part of the research. Instead, the focus of my work is grounded in participants’ daily experiences.

• Generative because I use the varied sample of participants and the resulting insights to generate design ideas and open a space for new opportunities. My goal is not to provide a statistically representative account of behaviors that can automatically generalize to a broader population, but rather my analysis is grounded in a specific context that informs its transferability (see below).

• Descriptive because I strive to provide a rich, thick, contextual description of people’s behaviors but I do not claim the ability to precisely predict future behaviors based on my insights [110].

Why do I take this approach? It matches my world view. I believe that methodology should follow from ontology and epistemology. These are key philosophical branches that tell us what we can know about and how we can know about it [63, 197]. They inform research according to a personal stance. My stance is that the phenomena I study should be approached with a perspective closer to social science, rather than natural science [197]. Thus, in my ontology I refer to bounded relativism (i.e., constructions of reality are bound to cultural and social contexts), and in my epistemology I refer to constructionism (i.e., people construct reality).
The resulting philosophical perspective I take is interpretivism [63, 197].

My stance does not imply that a quantitative approach would be wrong or that a qualitative approach is the only possibility for studying the phenomena I look at. Simply, a qualitative approach is what matches my philosophical perspective on the topic. Another researcher could look at the same topic with a different philosophical perspective and then take a different approach.

A qualitative approach also stems from the nature of the research questions I am interested in exploring with my work. The research questions I highlight throughout the different studies focus on new and uncharted research territory. Thus, a qualitative approach is the appropriate methodological fit for my work, where the goal is to develop a rich, descriptive understanding of emerging areas of interest. A qualitative approach helps in revealing and understanding salient issues for future research. Rich, one-on-one user interviews help understand the current state of user behaviors, opening opportunities and directions for more focused work.
Research through Design methods, instead, help in probing future states, anticipating how design can change and influence current situations.

3.2 Criteria for quality

Criteria for this type of qualitative work include rich rigor, credibility, resonance, and sincerity [280].

• Rich rigor considers the complexity of the work, its face validity, and carefully chosen procedures for data collection and analysis [280].

• Credibility considers the “trustworthiness, verisimilitude, and plausibility of the research findings” [280].

• Resonance considers the aesthetic value of the text and its transferability [110, 165] (i.e., the possibility of transferring results to a similar but separate context).

• Sincerity refers to self-reflexivity and transparency in the research process [280] (as outlined below).

3.3 Research positionality, reflexivity, and transparency

A common practice of qualitative research, and constructivist grounded theory in particular, is to outline the researcher’s positionality, providing a reflexive account of key decisions in the research process. An additional ethical requirement in many academic journals is to acknowledge a potential conflict of interest that influences the work reported. These statements are unfortunately rare or non-existent in HCI publications. I include them to lend rigor and transparency to my dissertation.

3.3.1 Research positionality

When the subject of my dissertation comes up, people often assume that I am a hardcore minimalist on a mission to purge everyone’s devices. Hide your hard drives, save your data. But no. I do spend time regularly managing and curating my data, but like many of the participants I talked to, my approach is contextual and varied. For example, on my phone I have collected close to four years of mood tracking and money tracking data. I will probably not delete this information any time soon and I do regular backups, a practice I started after losing all my data more than a decade ago.
On Spotify I have a collection of playlists that I have been putting together month by month for over three years. On the other hand, I strive for a clean email inbox, I structure the organization of my documents, and I try to keep the storage space I use on my computer and cloud platforms to a minimum. I grew up before paying for digital things became an everyday practice and to me digital still means free. I do not want to pay for storing my data. These experiences, practices, and attitudes inform my research and the specific perspective I take in looking at personal data curation. At a general level, my position informs the idea that studying data curation is important, as discussed later on in Chapter 4. It also helps in noticing and observing curation behaviors that differ from my own. Because this is a topic I am fascinated by and I have first-hand experience with, I am alert to individual differences that might go unnoticed or appear meaningless to other researchers.

3.3.2 Reflection on my analysis choices

The qualitative approaches I refer to in my work place great importance on the researcher as the key instrument of investigation. Reflecting on the decisions that drive the research process is key for understanding the resulting analysis. Here, I highlight some of the most important decisions that determined the outcome of my research.

I did not plan my PhD to be about personal data curation. In fact, the original research question of Study 1 was to investigate how users approach backups. I wanted to know whether people still do backups and how. But then, during the study, I found the idea of “hoarding” and “minimalism” as two opposite tendencies more interesting. I decided to follow this lead and shift the focus of the study, influencing everything that came after. I could have recorded the insights about hoarding and minimalism alongside others, given them no particular emphasis, and discussed other themes based on the data collected.
But I made a choice that my position and perspective as a researcher informed.

Similarly, in Study 4, I could have detailed how participants used each section of the prototype we evaluated, how they ranked each and every tool we asked them about. But I did not think those were necessarily the most interesting insights, so when I identified the idea of data boundaries from one specific participant quote, I decided to follow that conceptual lead and structure the analysis around this one key idea. Another researcher might have approached the same data differently. Interviews are rich by nature and capture a lot of information; this is one of the advantages of this method. But at some point, as a researcher, you have to decide the focus of the analysis and decide what story to tell. In this dissertation, you will read the story I have chosen to tell.

Throughout my PhD work (starting from Study 2) I regularly engaged in journal and memo writing, a common practice in grounded theory [56] that helped me reflect on my analysis as I went along. Several ideas I wrote about in my memos eventually made it into the analysis, but others did not. Appendix C includes some examples of memos that might be helpful to gain more insights into how my thinking and process around specific ideas evolved.

Appendix C also includes some coding examples from multiple studies. I provide them for transparency but with an attached disclaimer: a set of codes does not make an analysis. It is tempting to place too much importance on them, and transfer any responsibility in the analysis to “the codes,” maybe going to great lengths to produce a codebook that can be routinely applied to the data. This is not a view of qualitative research I share.
As Charmaz argues [56, 101], codes are a tool to understand and stay close to the data, but at some point the researcher needs to move from codes to a more abstract and conceptual understanding of the data. Concepts and themes are not necessarily captured in a single word or a neat correspondence between a paragraph and a code: that is where the actual analysis comes in. The more interesting themes and concepts often cut across codes, categories, participants. Like Braun and Clarke [41], I do not believe that themes and concepts magically “emerge” from data, as if they were there all along, hidden, waiting to be found. Qualitative analysis requires interpretation. Interpretation requires choices about what to focus on and what to ignore. Coding is the first step in this process but it is not an end in itself.

3.3.3 Conflict of interest disclosure

Finally, I see it as an ethical obligation to acknowledge my internship as a User Experience Researcher at Google in 2019. While my work at Google was entirely independent from my PhD research, Google and its products often come up in my dissertation because of their relevance to the topic of personal data. Throughout my work I keep the position of an independent researcher and keep my experience at Google separate, expressing opinions on related products based only on my work and its implications. However, I am aware that bias is often implicit and having worked at Google is a potential conflict of interest in the context of my PhD work. I hope that this disclosure can make my position more transparent and help readers fairly interpret my work.

Chapter 4

Study 1
Identifying a Spectrum of Data Preservation Tendencies

In this first interview study¹, we focused on exploring general tendencies around what data participants decided to keep or discard over the years. We used the term data preservation to indicate the practice of keeping data in the long-term, and we see this as a key component of the keeping stage in the personal data curation cycle.
This first study details the nuance of people’s practices in this space and the importance of curating data for identity construction: this key idea will come back throughout the dissertation and represents the core underlying theme of my work.

¹ Originally published in Vitale, Janzen, and McGrenere (2018). Hoarding and Minimalism: Tendencies in Digital Data Preservation. Proceedings of the 2018 SIGCHI Conference on Human Factors in Computing Systems (CHI ’18) [290].

4.1 Introduction

Economists argue that digital data has become the most valuable resource of the 21st century [276]. Like oil, it is a resource that big companies are trying to control and extract from people in large quantities, because it drives economic transactions [264]. Every day people produce, store, share, and interact with an increasingly large amount of data, including pictures, texts, files, mobile applications and the data they contain.

Cloud platforms are one of the solutions that leading technology companies have proposed to deal with the increasing amount of data. These platforms often cause confusion [213] and raise privacy concerns [138, 288], but they offer seemingly unlimited storage that requires little maintenance on the user side. This explains why they are an increasingly popular choice to store digital data for everyday users [288]. Storage is either cheap or outright free. Google Photos, for example, offers unlimited space for pictures (although at reduced quality).

This is the “seductive” digital landscape that Marshall [182] predicted a decade ago when studying long-term preservation of digital items. At the time, a similar change was taking place: hard drive storage was becoming cheaper, giving users the option to store nearly “everything” [184]. The pervasiveness of the cloud is once again reinforcing this possibility. Now that people are living in this seductive landscape, how are data preservation practices changing?
It is critical to understand how users are experiencing this new world, as they are just in its foothills. As storage gets cheaper and digital data more of a commodity, how do users deal with this new environment?

In this first chapter, we are interested in the act of preserving (or selecting) data, by which we mean deciding what data to keep and discard. As Whittaker [302] points out, little is known about “when and why people keep or delete different types of information.” Therefore, we focused on a main, broad research question: how do people approach digital data preservation in the cloud age? How do they decide what to keep and discard?

We interviewed 23 participants from diverse backgrounds, focusing on their current and past digital data practices. We asked them what “stuff” they kept through the years and why, how they used it, what they considered important, and how they made sure not to lose it.

We found that participants approached data preservation driven by a range of underlying tendencies, living on a spectrum with two recognizable extremes: hoarding (where participants tended to accumulate a lot of data even if it had little value, rarely deleting it) and minimalism (where they avoided storing too much data or regularly engaged in a cleanup process).

First, we characterize in depth the spectrum and its extremes, showing the nuanced nature of preservation tendencies. Then, we compare and contrast different preservation strategies, focusing on the extremes of the spectrum, elaborating on how they helped participants build their identity, a practice commonly associated with possessing data [149].
Finally, we discuss, among other things, how our categorization relates to previously reported behaviors (e.g., filing and piling [177], email cleaners and keepers [117]) and the broad implications for shaping the current technological landscape.

4.2 Related work

Here we report related work on data preservation and digital hoarding, expanding on some of the studies cited in Chapter 2.

4.2.1 Framing data preservation

We use the expression data preservation to indicate a subset of what is commonly thought of as data management or curation, as others have done before [156, 182]. In the overall process of data curation, preservation looks specifically at deciding what data people might choose to keep.

Although preservation is often overlooked in favor of other curation stages [302], there are clues in previous literature about general practices and values users refer to. However, we argue that there are gaps in the current literature.

Previous studies show that users tend to take a neglectful approach to preservation: they do not think carefully about long-term preservation, expecting data to somehow survive without planning [184]; they have inconsistent strategies for short-term preservation with a mix of “planned” (e.g., doing a manual backup onto an external hard drive) and “unplanned” methods (e.g., emailing documents to other people as part of other activities) [156]; and they make no clear distinction between short-term and long-term preservation, using terms such as “storing,” “archiving” and “backing up” interchangeably [156, 184, 263].

When people preserve data, they do so, among other reasons, to build an identity [66, 68, 149, 211], as we have discussed in Chapter 2. In this chapter, we extend the idea of building an identity and further explore data values that people might refer to².

What is missing in the current literature on data preservation is a holistic understanding that can explain the broader context of these values against changing technologies.
While insightful, previous studies often either focus only on computers [156] or specific populations (e.g., academics [149], photographers [263]), or predate the current technological landscape and its significant changes [149, 182, 184]. With a broader approach, we aim for a more comprehensive understanding of user practices.

4.2.2 Digital hoarding

Digital hoarding is not an entirely new phenomenon, but little is known about it. Coming from a background in psychiatry and neuroscience, van Bennekom et al. [284] present a clinical case of digital hoarding with one patient who suffers from a hoarding disorder that leads him to take 1,000 pictures every day. They define digital hoarding as the “accumulation of digital files to the point of loss of perspective, which eventually results in stress and disorganization.” They also propose to categorize digital hoarding as a sub-type of the hoarding disorder and point to the lack of scientific papers on the subject. The topic is just now gaining interest in the broader scientific research community, as evidenced by additional studies published starting in 2018 [205, 206, 222, 253, 271].

In a review of published literature, Gormley and Gormley [105] discuss in general terms the costs associated with data hoarding and digital clutter: for example, hoarding data can result in costs for storage space and management overhead. However, the research literature on the subject is extremely scarce compared to hoarding of physical objects, a much more widely studied phenomenon, with tools to measure it and diagnose it [95]. In addition, all of these studies are from outside the HCI literature, where these terms are not well recognized and only mentioned in a few studies [126, 127, 250]. Before running our study and encountering hoarding and minimalism, we were not aware of research on the subject.
We note, however, that we refer to hoarding as a set of everyday tendencies, not as a disorder. We are not in a position to diagnose participants.

² In the following studies, and particularly in Chapter 5, we expand on criteria for deciding what data to discard.

4.3 Methodology

4.3.1 Participants

We interviewed 23 participants (16 females, 7 males) in Vancouver, Canada. We used purposeful sampling to gather a relatively varied sample in terms of age, ethnicity, and background. Participants’ ages ranged from 21 to 64 (average: 35.4, median: 30, SD: 12.5). Occupations included business consultant, cook, mental health worker, server, researcher, research coordinator, retired accountant, special educator, social worker, software tester, stay-at-home parent, trader, university-level coordinator, in-between-jobs, in-between study and work, full-time graduate and undergraduate students (8 with backgrounds in Architecture, Archiving, Commerce, Education, Electrical Engineering, Kinesiology, Mechanical Engineering, Organizational Behavior), part-time graduate students who were also working (2 with backgrounds in Arts and Gender studies). The majority of participants (15) had basic technical skills, followed by average (4) and above average (4). Participants were compensated $15 each.

4.3.2 Procedure

After conducting three pilot interviews, we recruited participants through mailing lists and posters in several community centres in the city. We conducted semi-structured interviews, each lasting on average 45 minutes, at a location chosen by participants. One member of the research team conducted all interviews. We asked participants to bring their main interactive devices (e.g., laptop, smartphone, tablet) to the interview. All interviews were conducted in English.
We recorded the audio of the interview, took hand-written notes, and later transcribed them for analysis.

4.3.3 Data collection

After collecting demographic information, we asked participants to talk about and show us their digital data, whether it was files, data from mobile applications, or other examples. Following the example of Vertesi et al. [288], we did not impose a specific definition of digital data. However, unlike Vertesi et al. [288], we asked participants to show us the data, although they were free to choose what to show so as to respect their privacy. Participants gave an overview of their devices and then a more detailed tour of each, explaining what they used them for, what data they had on them, and why they had kept it. Then, we asked participants to imagine what they would want back if their devices broke down or were stolen, focusing on what was the most important data they had on them and why they considered it important.

The second part of the interview revolved around a light version of the life history method, a technique used to relive the life of an individual through their narration [181]. In the context of our study, we asked participants to relive their digital life history, focusing on the devices they used through the years, asking them to remember what they used them for, what data they had on them, whether they kept it or not, and why. We also encouraged them to sketch a chronology of their digital life history on paper to help them think through it (Figure 4.1). Life histories are useful to understand individual experiences in the social context where they take place and how individual understanding evolves through time [181]. During the interviews, we used the sketches as prompts to encourage participants in discussing how their data evolved over time.
Then, during the analysis process, we referred back to the sketches to contextualize participants’ recollections.

We also asked participants to think about their data one and ten years into the future, to know what they anticipated as something worth keeping and why. We concluded by focusing on positive and negative aspects of their data management.

4.3.4 Data analysis

We analyzed the interviews using the Braun and Clarke approach to thematic analysis [61], where “coding is flexible and organic and evolves throughout the coding process” [62]. We did both an inductive and deductive analysis (based on the “data economy” framework [288]). We used open coding with all members of the research team examining and discussing the data collaboratively in an iterative and reflective process. Each member could see how others were coding the data, discuss the interpretations, and propose alternative explanations. Throughout this process, we regularly met for lengthy in-person discussions of the interpretations, making sure they were coherent, comprehensive, reasonable, and reflective of the actual data. We later grouped codes into categories and went back again to the interviews to check for consistency.

Figure 4.1: We asked participants to sketch the history of their data over the years to see what they had kept or discarded. In this figure, examples of sketches from two participants.

We looked at preliminary trends after the first batch of interviews, adjusted our research foci and proceeded with additional interviews until thematic saturation. That is, when we got to P20, we noticed interviews were starting to closely repeat ideas from previous participants, therefore we stopped at P23.
In the later stages of the analysis, we paid attention to the contrast between hoarding and minimalism. These terms came up halfway through the study, when some participants used them to describe their approach to data preservation.

We do not present counts for specific occurrences of behaviors, as we focus on recurring patterns of behaviors across and within participants to characterize hoarding and minimalism. We agree with Braun and Clarke that “frequency does not determine value” [62]. Our goal in reporting is to characterize the essence of hoarding and minimalism, not their distribution, as our methods simply do not allow us to give a distribution.

Epistemological stance and reflexivity

In our analysis, we took a constructivist epistemological stance within a bounded relativist ontology [197]. In the context of HCI, we position ourselves in the so-called third paradigm, where meaning-making is a central focus [124].

Our approach is similar to a constructivist grounded theory approach [56]. Taking a constructivist approach means that we saw interviews as an interactive process of meaning-making: we built knowledge with participants. Therefore, we do not claim absolute truths about people’s behaviours, but a shared understanding grounded in participants’ reasoning and experience, reflective of their broader cultural environment. Focusing on the words used by participants, we arrived at the notion of hoarding and minimalism. These terms are socially constructed in the sense that they embody cultural connotations: we debated whether they were appropriate, reflecting on our assumptions about what they point to. Ultimately, we use them to fairly represent the shared understanding we constructed with participants.

In line with our constructivist approach, we critically reflect on our position as researchers and its influence on the analysis.
Throughout the analysis, we reflected and discussed our own experience with data preservation, since it is something we deal with on a regular basis. In particular, one team member considered themselves to have mostly minimalist tendencies, one had a mix of both, and another one reflected on a tendency to hoard pictures. Additionally, we frame data preservation as a challenging task worth investigating but others might have different perceptions. We also acknowledge that our Western cultural background and its values inform our view. This points to the inherently interpretive nature of our work taking place in the current socio-technical landscape.

4.4 Results

First, we present contextualizing information about the data that participants discussed in the interviews. Then, we focus on the cross-cutting theme of hoarding and minimalism, giving an overview of recurring behaviors across participants.³

4.4.1 Contextualizing information

Similar to what Vertesi et al. [288] found, participants considered a variety of data sources: computers, smartphones, tablets, wearable devices, online platforms, and mobile applications. They talked about files, text conversations, pictures, videos, bookmarks, logs, profile settings. Pictures were consistently regarded as one of the most important pieces of data because of their sentimental value. Participants mentioned how photos served as a tool for remembering and how it would be hard to take them again if they lost them: “I can’t retake those photos [...] I’m emotionally attached to them [...] Music, I can always download again. It seems like photos are less replaceable.” (P3) Other factors determining the importance of data were recency, utility, time invested to craft it, and its role as a record. We are not the first to report these values [104] but we elaborate later on the important role of data as a memento in relation to hoarding.

³ The two other themes were “the importance of other people” and “striving for the right way to manage data.” We decided not to include them in the final analysis because they had substantial overlaps with work by Vertesi et al. [288]. In Chapter 6 we touch on the importance of other people and how it helps in understanding data curation.

4.4.2 Identifying hoarding and minimalism

Halfway through the data collection process we met with Sarah (pseudonym for P13), the participant who introduced us to the approach of data minimalism.

Sarah is a graduate student and mental health worker. She manages all her data on her laptop. She does not use cloud platforms. She has a phone, but it is not a smartphone. It is a “dumb” flip phone. She explains that she grew up in a small community whose members did not use tablets or smartphones. She does not want one, “never”, because “otherwise [she’ll] be on the bus [demonstrates hunching over the phone]” and instead she wants to “look outside and talk to people.”

On her laptop, a small MacBook Air, there is only one main folder simply called “Life.” “Everything is kinda organized,” she explains. “My apartment, inspiration, beautiful photos, photos of my family.” When we ask her why she called it “Life” she takes a moment to think. “Well, I was thinking about it. [...] And I was like, OK, what could this be? Well, it is my life. My family, my school. I mean, my life is so much more than that. But I couldn’t think of a better name.” She then explains how data helps her build an image of herself, an idea that will become important to understand the broader role of preservation tendencies:

“I think humans are always trying to find things outside of themselves to make them feel they’re more than they are. If I like a song, it’s part of me, me kind of building up the image of myself.
So I think it’s me being like ‘Oh yeah, my life, school and this and that.’ We don’t need things outside ourselves, but we are always looking for things to make us feel complete.” (P13)

At the end of the interview she summarizes her approach to deciding what to keep and discard: “It’s very minimal. I try to delete everything that I don’t need as fast as I can. [...] Do most people have a lot of stuff?”

Yes, they did. Compared to what we had observed up to that point in the study, hers was a very different approach. In retrospect, it was clear that until that point we had mostly seen strategies closer to another extreme: hoarding. We thought Sarah might be a “unicorn” and that we would not meet other participants like her. But we did. And then something similar happened when other participants self-identified as hoarders, even though we never used these terms in our questions. In fact, looking back, we saw that some participants had specifically mentioned “hoarding” before we interviewed Sarah, but it had not jumped out to us. Altogether, it became more and more clear that participants adopted a range of data preservation tendencies that lived across a spectrum between hoarding and minimalism.

We start by describing and characterizing the tendencies participants reported, largely grouping them along the two extremes of a spectrum: hoarding and minimalism. Throughout the analysis, we point to instances of nuance within individuals, with some participants being highlighted in both sections on hoarding and minimalism, or displaying interesting exceptions to their general approach. Broadly speaking, some participants were stronger in their tendency towards hoarding or minimalism, while others displayed a much more even mix of both or were not easily classifiable.
However, this is an over-simplistic categorization, given the nuanced nature of the tendencies and the fact that they represent a spectrum of behaviours within two recognizable extremes.

We also touch on the actual organization of data that participants displayed (e.g., being organized or messy, using folder hierarchies or not), showing that it appeared to be orthogonal to their tendencies: some participants were organized, some were not, independent of the tendencies they displayed.

4.4.3 Hoarding: keeping to remember

Hoarding was characterized by the tendency to have large amounts of digital “stuff”, rarely deleting any of it. Participants often kept data even if they described it as having no value. The practice had both an emotional component (where it was a response to the fear of forgetting and letting things from the past go) and a practical component (where it was related to job or external requirements). When discussing hoarding, participants often reported challenges and frustrations with managing and “being on top” of data.

Self-identifying with hoarding

Similar to what happened with Sarah, we were surprised when some of the participants self-identified as “hoarders.” “I am a hoarder, I hoard things,” said P17, explaining why he had a large number of ebooks. Or:

“I consider myself a hoarder because I didn’t delete them, cause I didn’t clean them or delete, and I’ve kept all of them, except a few.” (P20)

“I’m a bit of a hoarder, I just keep all the stuff and nothing ever goes away.” (P12)

However, not all participants were comfortable identifying with hoarding. At one point during the interview, P8 said: “I am not a hoarder, really I am not!” And added that she could delete stuff if needed.
However, she later explained that she “keep[s] everything” because she “like[s] to keep things.” In fact, she had digital data going back to her first computer from when she was 10 years old; she was now 25.

Lots of data, often spanning years

The first point that characterized hoarding tendencies was the large amount of data participants had kept through the years. P12, for example, had a large number of old files on her computer from decades ago that she never looked at and was surprised to occasionally discover. She also had a large number of pictures on her phone and a lot of unrecognized documents on Google Drive. P17’s ebooks were in the thousands. P3, who had recently taken a trip around multiple countries, had around 6,000 pictures just from that one trip on a hard disk, which admittedly was “a lot to deal with.” Although they had kept it for a long time, participants often dismissed most of their data, describing it as not needed. For example, P23 had four external hard drives in which she stored videos taken at public events such as concerts or festivals. She had kept them all since the 1990s:

“I’ve always kept them, I know I don’t need them anymore, but [I just keep them]. I guess I hoard things. At least with data it just takes the drives. It’s not like it accumulates or takes the space in your room. Before I used to [go] shopping to buy clothes and clothes would add up. And then, OK, I’m running out of space! Get rid of the old stuff, right? So I kinda stopped that now. But I guess I switched over to data!” (P23)

This hoarding tendency did not seem limited to videos: her phone had multiple screens full from top to bottom of application folders, each with several applications. However, she reported regularly using only a few.

Sometimes participants even went as far as describing the data they had kept in rather uncomplimentary words; for example P12 said: “Crap. All kind of stuff.
My bills, recipes.” She did not know why she had kept it all through the years: “I don’t know, might need it.” At the end of the interview, she asked if other people did the same: “Are there people who don’t have tons of crap on their devices? Do you get rid of your messages? Do kids do that? Do kids get rid of everything?”

Rarely deleting data

The large amount of stuff participants kept might be explained by the tendency to avoid or to rarely delete data. Participants lamented the effort it takes to curate and delete data: “I don’t think anything is going to go. I’m just going to add more, because it costs so little to add stuff but it takes a lot of time to sort the stuff you want to delete.” (P2)

At the same time, having access to larger storage space than in the past (whether on hard drives or in the cloud) tilted the choice towards inaction:

“I can just put it there and forget about it and don’t have to actually select. If I couldn’t backup to a physical hard drive and I could only backup to the cloud with a limited capacity, that would force me to clean up a little bit of the files. But because I have plentiful storage space, I don’t think about it too much.” (P3)

In cases when storage became an issue, getting additional storage appeared to be the easiest solution. P9, for example, described regularly buying new hard disks to accommodate her growing set of data: “I think the reason I have my second hard disk is because the first is filling up, because I don’t like to delete stuff.” P8 explained a plan to keep everything in the future: “Oh, I’ll keep all of it! Well, I’ll have to get a bigger hard drive [...] And if it doesn’t fit, I will use Google Drive again if I run out of space on Dropbox.”

The emotional value of hoarding

The costs associated with curating data in the first place might explain why participants rarely deleted data.
However, this tendency also appeared to be an emotional response to the underlying fear of letting things from the past go and forgetting, a sentiment that participants often brought up. “I like to keep memories. I don’t like letting go of things,” (P8) “I tend to keep everything. It’s more like, I don’t want to forget things that have happened to me in the past,” (P11) “I have not learned how to let go of things.” (P17)

In fact, while in some cases participants described their data as having no apparent, concrete value, it had a deeper, emotional value. Such is the case with P15, a stay-at-home mother of two, who had over 20,000 pictures on her laptop:

“I’m sentimental. As a mom, both my children, 15 and 18, they encapsulate memories. And sometimes it feels I have to hold on to those because that’s all I got left in some sense. Sometimes it feels like that. So the pictures represent something that’s important to me, that’s precious. The experiences with my children. [...] There’s maybe this impression that things that are digitalized are somehow permanent and maybe it’s an attempt to try and hold onto things, in spite of the passing of time.”

Here hoarding was a proxy to remember life. It provided emotional support, with the large amount of stuff representing a large amount of experiences to go back to.

The practical value of hoarding

Along with an emotional value, hoarding tendencies also had a practical component, related to job requirements or external factors. For example, P14 reported keeping all tax documents for the previous five years, to comply with government regulations. P3 explained that a large number of pictures from a trip could act as a record for other people when looking for a job:

“We really value these pictures, they’re useful for us to keep as memories and also for employment.
When they say ‘Why did you have this 9-month gap in your history?’ We can say ‘This is what we did,’ this [the travel pictures] is proof I wasn’t somewhere else.” (P3)

Another participant, a student in architecture who was close to graduating, explained keeping Autocad files of all of her school projects because they might come in handy when looking for a job:

“Much of the stuff is school work. When we want to apply [for a job], make a portfolio, I’ve heard they ask you to send Autocad for specific projects [...] I will need a job after I graduate.” (P9)

Keeping all files offered assurance that she would have the right piece of work to show when the right moment came. However, P9 was frustrated by how increasingly challenging this practice was: “I think it’s not very efficient: files are getting bigger and bigger, but my hard disks aren’t, except if I get more hard disks.” (P9) She was not alone in expressing frustrations with hoarding.

Challenges when hoarding

The large amount of data that characterized hoarding tendencies often led to frustrations. Participants reported issues in 1) keeping up with their data because of how much they had, 2) knowing what exactly they had, and 3) knowing where they had stored it. For example, P15, who valued the 20,000 pictures stored on her laptop for their emotional role, described also being overwhelmed by the sheer amount: “It’s hard to keep on top of. I wish I was more organised in the beginning ’cause now it’s overwhelming going back and organising things [...] On one hand, you can take 30 pictures and have one that’s good, but 30 pictures take time to go through.” She aspired to become more organized and minimal in her approach, to reportedly do things right [288]: “I’m hoping that by organising I can get rid of things, then I have more space.
I hope it will be more efficient, so that whatever I have, I am valuing it and enjoying it.” The problem was, she did not know how to go about changing her approach.

Hoarding in relation to data organization

The large amount of data also made it difficult to know what exactly participants possessed and where it was stored. This did not appear to be an effect of general disorganization. P6, who was in general methodical with her organization, had six different Google accounts that she used in the past to segment [293] her email usage. She also used them with Google Drive, but she had a large amount of data, so it was hard to know what it was: “I don’t know what’s on everything, just random stuff.” Similarly, P12 had a rather organized computer, making use of folders and sub-folders. Yet she had no idea what she had kept through the years, simply because it was a very large amount of data: “I look at things and I’m like ‘What’s even in there?’ And there might be folders inside those folders.”

Some participants did not appear to be bothered by their approach and characterized themselves as being “just lazy”, displaying a rather care-free attitude: “Occasionally I have some weird stuff here, like this ebook, I don’t know what it’s doing here, this is probably my stuff from 2015. I’m too lazy to move it so I just leave it.” (P6)

4.4.4 Minimalism: I am more than my data

Minimalism was characterized by the tendency to keep a small amount of digital data. Participants used both preventive and reactive strategies to keep as little as possible: they set a limit on the amount of data to acquire, or they regularly went back to cull it and delete it. Participants described minimalism as a way to be in control of data and life, but they also hinted at underlying anxieties behind it, and in some cases they felt detached from their data.

Prevention: limiting the amount of data

Similar to what happened with hoarding, some participants were explicit in calling out their minimalist approach. As an example, P19 had recently switched from
As an example, P19 had recently switched from42a Macbook to a Chromebook, which she found cheaper and more “basic”. Sheexplained how the change affected her data practices: “I am more of a minimalistnow. Really keeping what I need.” (P19) Her minimalist approach encompassedseveral types of data, including, for example, mobile applications on her iPhone:“My phone, again, minimalism. I do not like having tons of apps. And the apps Idon’t really use, I put them here [a folder]. But other apps, I was so happy whenthey [Apple] said you could get rid of them.” (P19) Interestingly, the exceptionto her minimalist approach was a collection of articles from the “New Yorker”magazine: “I am obsessed with The New Yorker, the magazine. I have all differentsections of it. Every time, I download it and then I read it. And I save the ones thatare amazing and I want to re-read in the coming years.” (P19)With minimalism, some participants limited the amount of stuff to keep in thefirst place, and this worked as a self-imposed preventive measure: “I try not to havetoo much stuff here [the desktop] [...] And I try not to download too much, ’causeit is primarily for school.” (P13) Referring to her pictures, P6 explained that shewas selective and therefore chose to not use automatic uploads in the cloud: “Mostpeople auto upload them to Google Photos, but I don’t, because I don’t want to saveevery single photo.” (P6) It is interesting to note that P6, outside of her pictures,displayed a tendency to keep a lot of data.Reaction: cleaning up dataAnother recurring behavior participants displayed was going back to the data sothat they could cull it, clean it up, and delete it as needed: “Every couple months Igo through all the old photos and delete them.” (P21)They articulated a thoughtful process of evaluation based on future utility andpersonal values:“With my phone, I guess I tend to only keep things that I think willbe useful. 
For example, if I went out and took a lot of photographs in a single day, in that evening I might clean up the photographs that I didn’t like or that I wouldn’t think would ever be of interest to anyone else. If I don’t like them, I don’t think others will, and I don’t see the point of keeping them. I’m generally quite clean with what I do.” (P22)

In some cases, getting rid of things was the ultimate goal of being organized, an activity participants sought out: “[I] organize, so that I know what to get rid of.” (P19) But while some participants were very organized, this was not always the case with minimalism. For example, P21, who had a minimalist approach with the data on his phone, said he was “not an organized person whatsoever,” relying entirely on the automatic organization his iPhone provided. Similarly, P16 did not have many documents on her laptop and she stored them only on the desktop: “I have a tendency to keep my stuff on the desktop, all the time. It’s not a good habit in terms of organization but [...] It’s there, it’s easy to find.” Having a limited amount of data might have made it possible to be less organized and still be able to use it efficiently.

Underlying anxieties in minimalism

A minimalist approach was often described as a habit: “I clean it out regularly. I don’t really know why, just a habit I guess. It seems kind of busy. So, I like having clean files I guess.” (P10) However, participants with minimalist tendencies sometimes displayed underlying anxieties behind their approach that we did not see reflected in hoarding.

Curbing the amount of data with preventive and reactive actions appeared to be a way to have control of one’s data and, by extension, life: “It’s probably a way for me to stay clear.” (P13) In some cases the need to be in control extended beyond data:

“For me, being able to see on Gmail that I have less than a hundred emails that are unread and not having to rely on too many apps, it makes me feel calm inside. I do not like clutter. Clutter?
I hate clutter! Visibly, physically, I like clean, I like washing clothes, I like seeing everything clean on the table and house. I’m not like a clean freak, that’s my mother. I’m somewhere in between.” (P19)

When talking about minimalism participants also expressed a need to limit the time spent with technology, displaying a general avoidance of it: “I really don’t like how much time I have to spend on it. I would rather be not staring at the screen for hours.” (P13) They placed greater importance on face-to-face interactions, as if technology was in itself negative: “I like spending time with people one on one, talking, I don’t like chatting.” (P19) This is an attitude that did not surface in participants with stronger hoarding tendencies.

Some participants also reported worries about external factors, such as money. For example, P16 used a very old computer and a four-year-old iPhone, because she was trying to be economical and have a rather frugal lifestyle. She also was in a phase of her life where she did not have large amounts of data in the first place: “I try not to do a lot. I don’t have to do a lot of documentation for school, I’m done with school, so I don’t have a lot of essays.” (P16)

Similarly, P21 had recently “downsized” his digital life: he went from owning a Mac computer to having just an iPhone for all his data. This change, which “did not come from within” (implying again that minimalism was in part a reaction to financial constraints), imposed a limit on the amount of space at his disposal:

“[When] I had more space, I would save almost every stupid photo to the computer and have photos of my background, have photos of my thumb. And now with limited space I have to be more choosy [...] I never needed all those photos [...] I do enjoy this [the iPhone] because it simplifies everything a little more.” (P21)

The exception to his approach was texts.
P21 explained the need to keep all of them because they were important for work and having a record of what people said.

Detachment from data

Minimalism sometimes translated to a level of detachment from the data itself, to the point of being at ease with the possibility of losing it. P21, for example, related how his approach evolved after downsizing:

“After awhile, you know, they’re just photos. And life is ongoing really. There was that big need before to hold on to every little type of thing. And now, you know, it wouldn’t be the end of the world if I lost these things.” (P21)

External factors such as money appeared once again to be important in determining the contextual value of data. This was the case more with minimalism than hoarding:

“I would rather not [lose it] obviously, but I don’t think it would be that critical. I would get over it pretty quickly [...] I would go to some lengths to get it back, but if it was to cost me some money, I’d rather lose the files than money. Money is more important to me I guess.” (P22)

That is not to say that in minimalism data did not have value. Participants reported how the limited data they kept was a part of themselves: “The things I use more frequently are in this file. This is my D&D, I play Dungeons and Dragons. It’s a big part of who I am.” (P22) However, they also reported being at ease with the possibility of losing data. As P16 summarized: “That’s OK, if I lose it, I lose it.”

4.5 Discussion and implications

In introducing a spectrum of tendencies, questions about their nature arise. Are you innately more aligned towards minimalism or hoarding? Can you move across the spectrum? Can you embody aspects of both extremes at the same time?

4.5.1 Tendencies across a spectrum

We start by addressing terminology.
We talk about a spectrum of tendencies with hoarding and minimalism at two extremes, rather than categorizing participants as either “hoarders” or “minimalists.” This is because we saw variation both across and within individuals, and also across data types. P21, for example, had a minimalist approach with most of his data because of external factors: once he sold his computer, he became choosy with what to store, except with texts. Similarly, P6, who was highlighted in both the hoarding and minimalism sections above, displayed tendencies on both extremes throughout her various devices, hoarding the majority of stuff, while also displaying exceptions for specific types of data (e.g., photos). P19, with the strongest minimalist tendencies, displayed an exception in collecting articles from the New Yorker. Several participants shared similar patterns of behaviors, suggesting that the tendencies were context-dependent and not a clear-cut binary. Therefore, our goal was to categorize behaviors across a spectrum, not individuals.

4.5.2 Individual variation is common

A growing body of literature shows how people might segment their digital data into multiple mental places: an account for work stuff and one for personal stuff [293]; a messaging application for friends, one for family [207]. This mental segmentation adds to the idea that there is variation within an individual: users approach data differently depending on the context they build around it. Therefore, a single user can actually incorporate multiple behaviors, influenced by and dependent on the specific context she needs to manage at a specific time. Chapter 6 further explores this idea.

4.5.3 Comparing and contrasting hoarding and minimalism

As two ends of a spectrum, hoarding and minimalism appeared to be radically different, opposite approaches.
But while there were indeed differences in required effort, they both served a similar function in helping participants construct their identity.

Identity construction

Tendencies across both hoarding and minimalism appeared to often have the implicit goal of providing participants with a framework for building their identity. This is a practice closely tied to data preservation [66, 68, 149, 211]. Participants looked at themselves in relation to data, context, and other people. Here we refer to the several quotes where participants asked what other people did with data compared to them: “Do you get rid of your messages?” (P12), “Do most people have a lot of stuff?” (P13) Similar questions were a common occurrence during the interviews. Towards the hoarding side of the spectrum, the large amount of data appeared to provide emotional support against underlying worries and concerns of time passing. Data was a symbol of experiences and memories (I have data therefore I am). On the contrary, limiting the amount of data in minimalism seemed to provide a way to gain independence from technology and detach from data (I am more than my data, paraphrasing P13).

Costs and effort

Tendencies at both extremes came with costs, although at different stages of the preservation process. Hoarding tendencies seemed to have no upfront costs (e.g., P3: “I can just put it there and forget about it”), but later revealed themselves to not be an optimal preservation strategy if the amount of data became too large. Hoarding was a way to offset any upfront costs. Minimalism, on the contrary, required both an initial investment and ongoing dedication: setting a preventive limit and regularly going back to clean up data.
In short, minimalism required ongoing effort, while hoarding seemed to require effort only once problems started arising, if at all.

Past clues about hoarding and minimalism

Previous studies contain clues about the notion of a spectrum of behaviors falling between hoarding and minimalism, but they rarely use these terms. For example, Spurgin [263], in studying photographers, talks about how some people delete all pictures, some do not, most fall in the middle. Henderson [126, 127] talks about filing and piling (two common strategies for organizing documents [177]) and mentions participants who self-identify as hoarders. We also see similarities between minimalism vs. hoarding and cleaning vs. keeping email [117], where participants either cleaned their inbox or let messages accumulate.

All these different categorizations of individual differences are neither in conflict nor duplicates. What we provide is a broader and more comprehensive lens on user behaviours that builds on top of and extends previous categorizations.

By focusing on a broad range of data types, we provide a broader context for previously reported behaviors that come from studies in specific, narrower settings (e.g., personal documents, email). That is, the spectrum of tendencies that we developed appeared to encompass several types of data, suggesting that it represents an overarching phenomenon not limited to a specific domain. We focused on data preservation, but we speculate that these tendencies might play a role in other user behaviors (e.g., tab usage in browsers or notification management).
Further, we believe that looking at the different prior categorizations together in relation to hoarding and minimalism might lead to building an even more comprehensive and exhaustive spectrum of data-related behaviors.

4.5.4 Reflecting on hoarding and forgetting

The tension in using the term “hoarding” is that the word itself often embodies negative cultural connotations, evoking images of people buried alive by their possessions. This might explain, for example, why P8 was so emphatic in saying that she is “not a hoarder!” But we saw how hoarding tendencies had an important role for participants, providing them with emotional support against the fear of forgetting things, an insight that further supports the link between digital possessions and their role for identity shaping. Kaye et al. [149], for example, titled their paper about personal archives “To have and to hold,” highlighting the importance of holding onto things.

It is interesting to compare the emotional need of never forgetting to recent neuroscience studies about memory. Researchers suggest that forgetting is in fact a useful function of the human brain, essential to make decisions [135, 242]. Other studies show how taking pictures of every moment does not actually help in remembering them [129, 268]. There are even specific circumstances (e.g., the breakup of a relationship) where disposing of digital possessions is seen as a necessary act to avoid negative emotions [247]. Considering these insights, a worthy question to ask is whether, and how, attempting to store and keep everything forever still allows space for forgetfulness.

4.5.5 Implications for shaping technologies

Leading technology companies such as Apple, Dropbox, Google, and Microsoft have an interest in encouraging users to move their data onto cloud platforms and accumulate large amounts of it: the more data, the more space users need. The more data, the more possibilities to thrive as a platform [264].
Unlimited storage for pictures on Google Photos might seem a generous offer, but generosity is not necessarily the main motive when considering a larger business model where data is an essential resource for machine learning and AI training [170].

Figure 4.2: macOS Sierra shows users large files on their hard drives, displaying the size and the last time they accessed them.

In this context, how much do technology applications influence data preservation behaviours? Some participants mentioned the amount of storage at their disposal as a decisive factor for keeping large amounts of data, while others were accumulating independent of it. At the same time, some participants gravitated towards a minimalist approach because of the limited storage on their devices. So we do not have a definitive answer to our question, but we do believe that considering the spectrum of tendencies we present can inform design decisions. What we offer are not specific design recommendations for user interfaces, but rather broad implications that could help shape technology.

Figure 4.3: Files is an Android application by Google that gives users recommendations on how to free up storage space on their phone.

Seeing these tendencies as living on a spectrum with two ends leads us to advocate for ways to mitigate the costs that characterize both sides. Some recent changes in interfaces show that mitigation is possible. For example, recent versions of Apple’s macOS provide a panel, although rather hidden, to explore how storage space is used (Figure 4.2). It shows the largest files on a user’s hard drive, their size, and the last time they were used. This is information that the operating system can easily access and that can help users make decisions about data preservation.
Similarly, Google released at the end of 2017 “Files”⁴, an Android application that suggests to users how to free up storage space on their mobile devices by deleting, for example, old apps and temporary files (Figure 4.3). Even though it is not clear how many users are aware of or regularly take advantage of such features, their existence provides some evidence that companies are at least somewhat conscious of the frustrations experienced with the accumulation of large files.

⁴ https://filesgo.google.com

We recommend more user support along these directions, namely, finding ways to increase awareness of these features, and making them more visible during daily usage. In Chapter 7 we will explore similar design concepts to understand how people might perceive them.

4.6 Summary and conclusion

We have shown how participants approached digital data preservation driven by a spectrum of underlying tendencies with two extremes: hoarding (where they accumulate large amounts of data, sometimes considered useless, experiencing in some cases challenges with managing it) and minimalism (where they try to keep as little as possible, preventing or reacting to data as a way to be in control of it). There was nuance and variation within individuals, but tendencies close to both extremes of the spectrum appeared to be a way for participants to build their own identity in relation to data (I have data therefore I am vs. I am more than my data).

The contribution and value of this study lie in: 1) bringing to light a spectrum of tendencies with hoarding and minimalism on two ends, characterizing them in depth, 2) comparing and contrasting different user behaviours, showing their common role for identity construction, 3) putting them in context compared to previously reported behaviors in the literature.

Our analysis sheds light on possible behaviors for preserving digital data, a generally under-explored topic.
Furthermore, our findings have broad implications for shaping technology, opening rich possibilities for future work. Now that people are in the foothills of a new world where seductive cloud storage is pervasive, it is critical to understand what drives people’s behaviors so that we can shape this world in a way that promotes informed decisions and well-being.

In the next chapter, we will complement Study 1 with a mixed-methods study that looks more closely at decluttering data rather than preserving it. Then, in Chapter 6, we will tease out the variation in behaviors across the spectrum and introduce a set of “behavioral styles” that expand more general preservation tendencies.

Chapter 5
Study 2
Unpacking Decluttering Criteria and Practices

In this chapter, we present Study 2, a mixed-methods study (7 interviews and an online survey with 349 participants) that we conducted as a followup to Study 1 (Chapter 4). Here, we focus on decluttering, defining criteria and practices that participants used to declutter their data. These criteria and practices inform the design concepts that we explore in Chapter 7. In addition, we later use the 7 interviews from this chapter as part of the overall analysis we present in Chapter 6. We also build a taxonomy of data types that will play a role in the prototype system we introduce in Chapter 8. Similar to how we used the term preservation in Chapter 4, we use the term decluttering to indicate specific actions that are part of the keeping stage of personal data curation.

5.1 Introduction

Data curation involves both keeping and discarding data. After looking at preservation tendencies in the previous chapter, we now take a closer look at how people decide what to get rid of when curating their personal data. As we have established before, personal data curation is the overall process of keeping data and managing it for future use.
Past work tends to focus on the items that people keep, as we outlined in Chapter 2, pointing to criteria that people might use. However, as we have seen, curation is not only about keeping. It is also about discarding. Less is known about criteria for discarding data.

In this study, we aim to fill this gap by asking: what are the types of data that people consider when curating, and what are the criteria and practices they use to “declutter”? To answer this question, we conducted a mixed-methods study consisting of contextual interviews with 7 participants as a followup to Study 1 (23 interviews on “hoarding” and “minimalism”, see Chapter 4), and an online survey with 349 participants. The goal of the study was to investigate “decluttering” practices with two different samples (narrow and specific in the interviews, broader and more general in the survey) to generate insights that can inform the design of personal data curation tools.

We define decluttering as the act of removing or reorganizing data. We see it as part of the overall process of selecting what data to keep or discard within curation. Decluttering can involve actions from across the curation cycle such as deleting, hiding, archiving, moving, and so on.

Using data from both the interviews and the open-ended responses from the survey, we outline a taxonomy of data types and decluttering criteria, together with a set of decluttering practices, that we use to inform the rest of the thesis work.

5.2 Methodology

To answer our research questions, we used a mixed-methods approach: first, we conducted 7 in-depth contextual interviews; then, we ran a broader survey with 349 respondents. We present details about each method, followed by the combined analysis.

5.2.1 Contextual interviews with self-identified minimalists

In the first phase of the study, we conducted interviews to investigate the process of decluttering possessions, both physical and digital.
Our goal was to identify decluttering practices to use as a basis for design.

Participants and research process

In the interviews, we took a constructivist grounded theory approach [56], and used theoretical sampling to recruit seven self-identified “minimalists” (2 women, 5 men) from a local “Minimalist Meetup” group in Vancouver, Canada. We chose this specific population to expand on Study 1, to complement additional recent literature that focuses more on a “hoarding” perspective [271], and to better explore variation in attitudes among people who take a minimalist approach. Recent work shows that “minimalist” participants can provide inspiration for design [58, 59].

Six participants identified as Caucasian, one as Asian. They all used a computer and a smartphone, except P1, who used a flip phone. P6 also used a tablet. Their occupations and ages varied: P1 (31, receptionist), P2 (36, investment analyst), P3 (19, barista), P4 (35, system analyst), P5 (32, administrator), P6 (49, project manager), P7 (36, life coach). Their living arrangements also varied: most lived in studios or one-bedroom apartments. One lived in a minivan, one in a family house.

The research process was iterative and spanned four months. Data analysis informed data collection. After each interview, we transcribed the audio verbatim and did a first round of open coding. We proceeded until theoretical saturation. We also used data from the survey as a complement. In most cases, interviews took place at the homes of participants (Figure 5.1) and lasted approximately one hour each. One interview took place at the participant’s place of work for scheduling purposes. Another interview took place in part in a minivan and in part in a park near the minivan (the participant’s current living space). Participants were compensated $40.

Interview questions

The interviews were semi-structured. In all interviews we asked participants to give us a tour of their place and their possessions.
Then, we asked them to recall a recent time in which they decluttered possessions and if possible to show us the process. We discussed what clutter meant for them, what was something cluttered in their home, and how they approached minimalism. We asked similar questions when switching to digital possessions. Participants showed us the data they were comfortable with, walking us through their organization and decluttering practices. We also probed the difference between the two domains and any challenges they faced in either or both. In the later interviews we also asked about digital tools to delete data (e.g., Clean My Mac) and whether participants used them.

Figure 5.1: In Study 2 we ran contextual interviews with 7 self-identified “minimalists,” asking them to show us and discuss how they curated their possessions, both physical and digital. In this figure, some of the homes and highlights that participants discussed (from left to right): a small but cherished apartment; a collection of ebooks contrasted with a collection of physical magazines; a frugal apartment with no visible devices; a set of boxes for “purging” items.

Data analysis

We started the analysis with line-by-line open coding using mostly in vivo codes¹. To keep the focus on actions, events, and the underlying process, we used “active codes” with gerunds [56]. We used constant comparison to better understand different contexts and individual attitudes. We later grouped codes into categories. Then, we did a second round of focused coding, and finally a round of selective coding. We also used memos: some became part of the categories we present, others came up as choices in the research process (some examples are available in Appendix C).

5.2.2 Online survey

In parallel to the interviews, we conducted an online survey about data management behaviors and decluttering episodes.
Our goal was to complement the interviews with a larger sample and collect a broader range of decluttering experiences.

¹ “In vivo” codes are based on participants’ words. They are a way to be alert about participants’ language and preserve specific meanings [56].

Survey participants

We recruited survey participants through mailing lists, a university paid studies board, word of mouth, and online postings in Vancouver, Canada. In total, 349 participants took the survey and answered at least one question. 334 participants reported their age: 18-24 (45%), 25-34 (34%), 35-44 (11%), 45-54 (5%), 55-64 (2%), 65-74 (1%), 75+ (1%). 335 reported their gender identity: woman (69%), man (27%), transgender (1%), other (2%). Out of 335, 30% had experience with computer programming. 334 reported the highest level of education: primary or high school (32%), technical training (7%), Bachelor level (44%), Master level (16%), PhD level (0.9%). 319 reported their cultural background and ethnicity, with a wide variety (e.g., African American, Asian, European, North American, South American). 318 reported their current occupation: 127 were students at different levels; others had a variety of occupations (e.g., artist, cook, designer, nurse, project manager, teacher). Each survey question has a slightly different number of responses because only one question was mandatory. Our ethics board prescribed that participants had the right to withdraw at any point.
All participants had a 1/10 chance of winning a $25 gift card.

Selected survey questions

In the survey (administered using Qualtrics), we asked three questions about decluttering digital data: 1) “When was the last time you decluttered some of your digital data, that is, you were in an active session focused mostly on deleting some of your data?” (348 responses) 2) “Can you briefly describe what you decluttered?” (325 responses) 3) “How often do you declutter digital data?” (348 responses). Question 2 was open-ended, the others multiple-choice.

In the survey, we also asked questions about decluttering physical possessions (see Appendix B), but we do not report them here because they overlap with previous work [140, 227]. Similarly, we asked participants about their general approach in keeping or deleting digital data (349 responses), their approach for different data types (e.g., documents, photos, media files, texts, apps, bookmarks, contacts, Facebook friends) (346 responses for the data types on average), and to agree or disagree with a list of statements connected to general data tendencies. We do not elaborate on these questions, because we realised they would not help us in other studies, but we report some descriptive results, together with the complete list of survey questions, in Appendix B.

Data analysis

For closed-ended survey responses, we report descriptive statistics based on participants’ answers. To analyse the open-ended survey responses to question 2 on digital decluttering, we used open coding. We coded responses taking a mostly deductive approach, using words or sentence fragments as the unit of analysis. In the coding, we used the following categories to look for specific aspects of decluttering: what data, when, how, why, where (in terms of devices or platforms).
We also developed additional inductive categories that informed the decluttering practices we describe.

5.3 Results

In this section, we present results from both the interviews and the survey, highlighting at relevant points whether they are based on one or both methods.

5.3.1 General decluttering habits

Survey participants reported decluttering regardless of whether they tended to keep or delete digital data. They decluttered a wide range of data types, with photos and screenshots being the most mentioned type: photos and screenshots (mentioned by 146 participants²); files and documents (107); email (64); audio and video files (53); apps and programs (33); texts and voicemails (19); system data, disk fragments, cookies, cache, logs (such as call or browser histories) (10); contacts (8); games (6); accounts (4); bookmarks (4); Facebook friends (2); reminders (1). They decluttered from phones (58), computers (40), cloud platforms (16), external hard drives (3), and tablets (2). We expand on criteria for different types below.

² The numbers we provide here should be carefully interpreted as indicative. Because responses were open-ended and unstructured, not every participant mentioned both what they decluttered and on which device they decluttered.

When asked about the most recent decluttering session, about a third of survey participants reported decluttering digital items within the last month (30%). In terms of frequency, most reported decluttering data multiple times a year (33%). But a relatively high percentage of respondents reported rarely decluttering digital items (21%).

5.3.2 A taxonomy of data types and decluttering criteria

As mentioned above, survey participants reported decluttering a variety of data types.
We grouped different data types into six macro-areas (Figure 5.2): documents, organization, communication, media, system data, logging data.

• Documents: files and folders, productivity documents (text documents, spreadsheets, presentations, PDFs).
• Organization: tasks, notes and reminders, events, bookmarks.
• Communication: emails, texts and messaging conversations, voicemails, phone contacts, social network contacts.
• Media: photos and screenshots, videos, audio (music, playlists, recordings, voicemails, podcasts), ebooks, web articles, games.
• System data: icons for apps or links on the desktop, applications, passwords, accounts on websites and services, temporary data (e.g., cache, system logs, cookies).
• Logging data: tracking data (location/watching/searching/browsing history), life-logging data (e.g., mood tracking, food tracking, sleep tracking, or money tracking)³.

³ Although survey respondents did not mention specific logging apps, we included life-logging data together with other types of tracking data to encompass a range of technologies that HCI often associates with the study of personal data [85, 87].

Figure 5.2: A taxonomy of personal data types, derived from the kinds of data that survey participants reported decluttering. We note that even though this taxonomy covers a relatively comprehensive set of data types, it is not intended to be fully exhaustive given the nature of our methods.

Then, we looked at decluttering criteria that survey participants mentioned using for different types of data.

• Documents: Empty or broken content; without a file name or with a weird file name; duplicate of or similar to other documents; old versions of the same document; large size; old; unused; backed up on an external location or synced on cloud services; created, shared with, or sent by other people; downloaded from a website.
• Organization: Old; completed; irrelevant; unused (bookmarks); unreachable (bookmarks).
• Communication: Spam or irrelevant content; newsletters; containing sensitive information (e.g., attachment with passport, address, or credit card number); old; group conversations; outdated (for contacts); unused (for contacts).
• Media: Duplicates; large size; blurry photos; unflattering photos; emotional content (e.g., post-breakup); irrelevant content; disliked content; not-safe-for-work content; shared or sent by others; containing sensitive information; used in another document (i.e., as attachment); content already consumed; unused.
• System data: Unused; large size; old; tied to a service that does not exist anymore.
• Logging data: Irrelevant; containing sensitive information.

These insights on data types and decluttering criteria informed the design of the Data Dashboard prototype (Chapter 8). In particular, we used the taxonomy of decluttering criteria as a starting point for populating the automatic categories in Explore Your Data, designing the recommendations in Quick Actions, and providing default options in Settings (these are all key sections in the Data Dashboard prototype, which we explain in more detail in Chapter 8).

5.3.3 Decluttering practices

Both interview and survey participants reported a mix of practices for decluttering⁴. These practices were largely consistent between physical and digital items.
Yet, as we discuss later, the digital domain offered unmet opportunities for simplification and better support.

We characterize decluttering practices based on their temporal nature (routine, serendipitous, and triggered) and the selection strategy involved (mass decluttering or individual inspection). These different practices will play a role in the rest of the dissertation: in Chapter 6 we will discuss how broader temporal attitudes inform behavioral styles in data curation; while in Chapter 7 we will use selection regimes as a key design dimension to explore.

In the analysis, we also touch on the role of external tools for decluttering and key differences between physical and digital decluttering that can inform the design of data management tools.

⁴ In this section, we use the label PX (i.e., Participant X) for interview participants’ quotes and RX (i.e., Response X) for survey participants’ quotes.

Routine decluttering

Routine decluttering happened regularly, as a habit. Participants took some time to go through their data and delete some. For example, P4 decluttered his large Google Photos library every week: “I do it as a routine. Every week I clean up my entire Google Photos to make sure that there are no photos that I do not want. I categorize it. I create the albums and I put them in my trips, so that it’s not everywhere. [...] If I do not this, it’s scattered across the entire timeline and it’s a pain to search.” Routine decluttering could be frequent or infrequent, depending on different data types: “I delete text conversations on a daily basis, my email inbox is always managed so that it’s empty. My photos are sorted weekly. At the end of each month, I declutter files. At the end of each month I declutter friends on social media.” (R34).
Routine decluttering helped participants to mark the passage of time or a transitional period in their life: “I finished the school term and decided to organize my school notes and delete what I didn’t need” (R160).

An important distinction compared to other decluttering practices is that intrinsic motivations drove routine decluttering: respondents, of their own accord, made some time in their everyday life with the explicit goal of decluttering data, without external pressures. A key characteristic of routine decluttering was its longitudinal nature: it took place gradually over time. This points to the importance of visualizing temporal dimensions of data, something that commercially available tools to manage data do not do well. We explore this idea with Patina, one of the design concepts we introduce in Chapter 7. The notion of routine decluttering is consistent with insights from previous work [151, 152, 311].

Serendipitous decluttering

Serendipitous decluttering occurred in the context of another activity. For example, participants were busy with something else and not planning to declutter but, on encountering items that fit decluttering criteria in a serendipitous way, decided to declutter. Serendipitous decluttering could happen while browsing through data: “I was looking through old photos and realized I could delete a bunch of them” (R144). Or, while looking for something specific and failing to find it: “When I try to find something and I can’t find it, I usually go through and filter out what I don’t want/need anymore, which will declutter my digital data” (R170).

The main difference between serendipitous and routine decluttering was the unexpected and unplanned, yet ultimately productive, outcome of the process. The context of items was essential to serendipitous decluttering: items popped up as candidates for decluttering in relation to a broader set.
Once again, decluttering criteria were important to decide what to get rid of, but many of the criteria are essentially invisible in digital items and require careful consideration from users.

Triggered decluttering

Triggered decluttering took place after a relatively infrequent event, such as a breakup [247], buying a new device, or space running low. For digital items, the trigger was often a notification from the operating system: “My mac kept popping up with a notification telling me I only have X GB of storage, your disk is almost full.” (R271) Participants felt forced to act and discard items, even though this might not have been what they wanted to do. For example, P2 had a strongly minimalist approach with physical possessions, but not so much with digital data, rarely finding the time and the will to delete things. On his 32GB smartphone, he had only 1GB of storage left. He explained that he recently had to delete some pictures because of an update: “I deleted some photos [because of] the regular update. It wanted to update the OS and I didn’t have the choice. The pop up message said the update will install when I restart the phone. So I had no choice, right? I had to do it [delete]. So I got prompted again, that’s when I deleted.”

Triggered decluttering shares some aspects with serendipitous decluttering: they both occur unexpectedly. But external factors motivate triggered decluttering while serendipitous decluttering relies on internal motivations.

Triggered decluttering was also time sensitive and often needed to be targeted towards an end goal (e.g., freeing up space). In the digital context, this can be hard, because digital items are often experienced as “placeless, spaceless, and formless” [215], and it might be hard to find the best candidates to achieve the end goal: “My phone would not allow me to download a meditation app I wanted.
It was frustrating getting rid of enough space and I didn’t have a clear idea of what took how much space.” (R48) But digital tools could proactively find items to delete, an idea that we explore with Data Recommender, one of the design concepts we introduce in Chapter 7. Previous mentions of specific triggers to declutter in HCI research include storage space when dealing with shared devices in the home [109], changes in important relationships (e.g., separation, divorce, or death) [249], and changes in social status [151].

Selection strategies

Based on how participants selected items to declutter, we describe two main strategies: mass decluttering and individual inspection. These strategies are consistent with work by Jones & Ackerman [140], who highlight similar selection regimes.

Mass decluttering, or “purging,” involved getting rid of many items at once (a finding consistent with Kim [151]). For example, P6 had a “purging week” at the start of spring every year: “This one-week period in May. [You found me] right in the middle of a purge. [...] I’ll stay here, spend the whole week doing minor repairs on the house, patchwork. And the purge.” Books were among the items he would purge; throughout the year he put aside the ones he did not want to keep so he could donate them to a public library.

Mass decluttering considered items and their merits as part of a larger group or category rather than individually. This strategy was often a way to jump-start the habit of decluttering: participants wanted to get rid of the bulk of clutter so that they could follow on with a more gradual and regular process after it. For example, this is how P4 recalled a digital downsizing he had gone through: “I did a massive cleanup maybe two years ago before I gave my desktop away. [...] I cleaned up my photos in Google Photos and then I thought, let’s clean up other data as well. I started cleaning up my contacts. In my phone, I had around 200 contacts, some people I did not even contact in years! [...]
You know, things like my old dentist, my old doctor, those are just irrelevant. [...] It took long because there was a lot of stuff I had to go through and delete or save.”

A complementary selection regime involved going through items one by one through individual inspection. This was a cross-cutting strategy that often required more time. For example, P1 recalled what happened when she had to move to a smaller apartment (triggered decluttering). She went through clothes and items one by one and decided what to get rid of based on their use: “We had a very short time. [...] The process was really ‘Do I wear this? Do we use this?’ And anything that wasn’t getting much wear and usage was gone” (P1). Several prior studies also mention the need for users to inspect items on an item-by-item basis before deciding what to discard [111, 150, 151, 299].

The role of external tools in decluttering

When it came to external help for digital decluttering, only one survey respondent mentioned using in-person company support to decide what to declutter: “I was running out of space on my computer. I couldn’t figure out what was what, so I had to go to the Apple store genius bar and have them help me discern what was deletable.” (R22) Some interview participants had heard of or had used tools that can help declutter (e.g., CCleaner, AppCleaner, Clean My Mac). But in their view, these tools only supported basic behaviors, failing to address the most relevant data: “[Those tools are] mostly to clean up space and temporary files, probably not main files.” (P5) In Chapter 7 and Chapter 8 we will explore ideas and solutions that can provide more comprehensive support for decluttering and data curation.

5.4 Discussion

5.4.1 Physical vs. digital decluttering

When we recruited self-identified “minimalists” for the contextual interviews, we expected they would have a similar approach for physical and digital possessions. But this was not always the case.
Several interview participants mentioned that decluttering digital items was more difficult, a result consistent with past related work we discuss in Chapter 2.

Several qualities of digital data (e.g., spacelessness [215]) contributed to making digital decluttering a difficult task for many. In addition, the tools that participants reported using did not offer adequate support for the varied and contextual practices that they used. These results further motivate the need to explore new ways for supporting data decluttering and selection, something we do in the rest of the dissertation. Here, we outline the general design directions we derived from this study and later explored in Chapter 7.

5.4.2 Opportunities for supporting digital decluttering

Visualizing temporal aspects of data

In comparing physical and digital items, it was clear that for digital items it is not easy to get a quick, visual sense of the different decluttering criteria that people might want to refer to. Physical objects can give a better sense of them: they often show traces of use and age, for example. If a user wants to know how frequently they have used a digital item in the past month, they might struggle. Other criteria are difficult for current digital systems to capture in the first place. It is difficult for a file browser to know what data a user dislikes, or what data they have little emotional attachment to. Machine learning might offer a possible remedy to this lack of knowledge, but a more immediate solution might be to leverage metadata for visualizing temporal aspects of data. For example, data management systems could show the age of items or the frequency of interactions at a glance, with embedded visualizations. Design work in this area could leverage the idea of a digital “patina”, which is recurrent in HCI literature [99, 167, 211] and has clear ties to “edit and read wear” [132].
This is one of the key design ideas we explore in Chapter 7, connecting to attitudes that different behavioral styles (Chapter 6) might have.

Proactively finding items to declutter

Several technology companies are starting to recognize some of the issues around space optimization and unwanted data that we have surfaced in this chapter. A common approach to address this problem involves creating recommender systems that can proactively help users. For example, in 2018, Windows 10 introduced Storage Sense [35], a new setting that can automatically delete temporary files or files in the Downloads folder that are older than a set time. Similarly, Google Photos provides users with recommendations on how to declutter their cloud-stored pictures. These examples show how recommender systems are gaining popularity in common data management tools. However, it is not clear what tensions they introduce in the space of personal data curation or how different users might perceive and react to them. Will people trust the system to know what to discard? Should these systems aggregate multiple items for mass processing, forgoing individual inspection? These are key questions that we explore later in Chapter 7.

Preventing unwanted data accumulation

We have seen how participants encountered many types of data that they perceived as clutter across their platforms and devices. In many cases, their decluttering practices were a reaction to unwanted accumulation of items that they did not consider important or relevant. If we take the idea of systems proactively recommending items to discard to its extreme, we can imagine systems that instead prevent the accumulation of unwanted data in the first place. Previous work has explored self-destructing data in emails [97] or having a lease on data shared with others [192]. Similar approaches could also apply to a broad range of data types and be more nuanced.
A key opportunity is to explore solutions for capturing this nuance and understanding how different users might experience similar extreme tools. In Chapter 7, we will further explore the potential for this type of approach to data curation.

5.4.3 Limitations

It is important to acknowledge the limitations of this study and in particular of the online survey we conducted. In the survey, we relied on self-reported answers from participants, without being able to see their digital data. One third of the survey respondents also had a Computer Science background or experience in programming. However, when we separated their answers from the rest and conducted a visual inspection, we did not notice any major differences in the descriptive results.

When looking at the survey as a whole, there can be an apparent conflation of decluttering with deleting, but that is not intended. The phrasing of the open-ended question on decluttering suggests that decluttering is mostly about deleting data. Participants nonetheless reported actions other than deleting in their answers, although our wording might have skewed responses towards deleting as opposed to other aspects of decluttering and curating data. We decided to describe decluttering as mostly related to deleting data to keep the question simple and understandable by potential respondents: we distributed the survey in several countries, some of which have English as a second language, so associating decluttering with deleting seemed a reasonable simplification. Terms such as archiving and curating might have been more difficult to understand for participants whose first language was not English.

Overall, we see the survey as largely generative, with its insights informing the design studies that come after it. The limitations in our questions and sample prevent us from making stronger claims about the results.
However, these results can be a starting point for future research.

5.5 Summary and conclusion

In this chapter, we used data from interviews with self-identified minimalists and an online survey to outline a taxonomy of data types and decluttering criteria. We also described a set of temporal practices and selection strategies that participants used to declutter their data. These results suggest opportunities for better supporting curation in data management tools and highlight the contextual nature of user behaviors.

In the next chapter, we will synthesize our work on individual differences and user practices with a set of five behavioral styles that can bridge empirical work with design work.

Chapter 6
Study 1, 2, 3, 4
Synthesizing Behavioral Styles in Personal Data Curation

In this chapter, we take a bird's-eye view of all four main interview studies that make up this dissertation. This is an unusual approach. The five behavioral styles that we present are a direct extension of Study 1 (Chapter 4) and Study 2 (Chapter 5). But they are also informed by Study 3 and Study 4, two design studies that appear later in the dissertation (Chapter 7 and Chapter 8). We choose to present the behavioral styles early in the dissertation because they are necessary to understand the recruitment process in later studies (Chapter 7 and Chapter 8).

6.1 Introduction

In previous chapters, we explained how understanding individual differences in deciding what personal data to keep or discard is one of the key questions in PIM research [144]. We also highlighted how previous research tends to focus on individual differences in organizing and retrieving data, leaving the keeping stage of curation underexplored. Previous studies in HCI and Information Science report how people’s digital archiving strategies are “widely varied” [259]. But despite knowing this, a systematic model of individual behaviors is missing from the literature.
Identifying different types of users and their overarching approach to curation has been a long-standing unaddressed need for the past decade. In discussing a life-cycle approach to personal data curation, Williams et al. [309] ask whether there might exist “archetypes” for digital archiving. Gwizdka & Chignell [118] mention the possibility of “PIM personalities.” Bergman [23] outlines key variables for how individual PIM behaviors differ, encouraging researchers to identify a set of overarching “PIM styles.” And Khan et al. [150] ask for a precise categorization of “archetypes” in relation to keeping and discarding decisions. This body of work shows that categorizing user behaviors is an important step for creating and then evaluating new, personalized curation tools. In this chapter, we propose a set of behavioral styles to address this need.

Our work in this space started with Study 1, an exploratory interview study on general tendencies for keeping and discarding data (Chapter 4). Analyzing 23 interviews with a broad sample, we identified a spectrum of behaviors with two extremes: “hoarding” (keeping most data) on one side, and “minimalism” (keeping as little as possible) on the other. In Study 1, however, we highlighted that there is considerable variation within the spectrum, leaving space for a more detailed categorization.

In this chapter, we pick up on the premise of better categorizing individual differences and turn it into a key research question: How can we make individual differences in personal data curation actionable for research and design? To answer this question, we propose a set of behavioral styles in personal data curation: Casual, Overwhelmed, Collector, Purger, and Frugal.
These behavioral styles differ along a set of behavioral dimensions and individual characteristics, while also pointing to temporal aspects of personal data curation.

To develop the behavioral styles, we used an iterative analysis process that spanned all four studies that make up the dissertation. Study 1 (Chapter 4) was our initial exploratory study that focused on “hoarding” and “minimalist” tendencies in data preservation. Study 2 (Chapter 5) was a follow-up study consisting of 7 interviews with self-identified “minimalists.” (Study 2 also included an online survey, but we do not focus on the survey results here.) Then, in Study 3 (Chapter 7) we conducted 16 interviews as part of a design-led exploration around data curation. Finally, in Study 4 (Chapter 8) we conducted 18 interviews for evaluating a system designed to accommodate the five behavioral styles that we had developed.

After describing the behavioral styles, we chronicle how their descriptions evolved over time with the addition of new insights, and how we used them as a recruiting tool in later studies. To ground our analysis, we reflect back on our design and research practice across the studies. We explain how we developed the first version of the behavioral styles after Study 1 and 2, and later used Study 3 and 4 to validate them (see footnote 1) and enrich them. Then, we discuss how the behavioral styles help us better understand data curation, and how our specific approach to formulating them can inform research practice. We conclude with a range of design opportunities that build on top of the behavioral styles.

In this chapter, we make three main contributions: 1) We present an actionable set of behavioral styles that expand our understanding of personal data curation practices. Designers, practitioners, and researchers can use the behavioral styles in future user studies and product design. 2) We provide a reflexive account of the iterative analysis process that led to the behavioral styles.
Designers and researchers can use this account to inform similar user modeling efforts in other domains. 3) We generate a set of opportunities tailored to the different behavioral styles that can drive future design work on personal data curation.

Footnote 1: When we refer to validation, we are still basing our approach on Grounded Theory procedures and criteria for qualitative research outlined in Chapter 3. Validating the behavioral styles does not mean that we are going to prove them as unequivocally true. Instead, we are checking them against new data, seeing how they hold up, and identifying any theoretical gaps in our current explanation that need iteration.

6.2 Related work

The behavioral styles we present in this chapter advance our understanding of individual differences in PIM with a specific focus on curation. In this section we outline key known differences and categorizations of behaviors in PIM to contextualize our work.

6.2.1 Individual differences in PIM

A recurring insight in PIM studies is that people manage data in different ways. Bergman proposes 15 variables that can account for differences in PIM behaviors and groups them into five categories [23]: “organization, structure, work process, memory, retrieval.” For example, organization can vary from ordered to disordered, while structure can result in large or small collections. Differences along these variables lead to different categories of behaviors. For example, common categorizations for organizing documents are piling (letting documents pile as part of shallow or “disorganized” hierarchies), filing (taking the time to organize documents into hierarchies with moderate structure) and structuring (intensive organization with deep structures) [128, 177]. For emails, cleaners and keepers are two key categorizations [117]. Cleaners tend to remove task-related documents, to-dos, and event reminders from their email inbox, while keepers tend to leave them.
These strategies, however, are not exclusive and people often mix them depending on context [38, 128]. There are additional categorizations such as email prioritizers and archivers [173], or frequent filers, no-filers and spring-cleaners [306], that previous work summarizes in more detail [38, 118, 128, 220].

Work from psychology and cognitive sociology can explain why individual differences in organization exist in the first place. For example, personality traits like conscientiousness or neuroticism can help predict differences between filers and pilers [187]. People also differ in how they think of categories and these differences influence their organization style. Some people perceive categories as fixed, with clear distinctions and boundaries: their file organization reflects this way of thinking, with rigid structures that do not change over time [219]. Others do not perceive clear boundaries between categories and so they do not use any specific structure in their organization (e.g., storing all their files on the Desktop). Most people, though, might be in between the two extremes: they are flexible and can change or remove boundaries over time [219].

People also differ in how they retrieve information [25, 28, 302]. Despite improvements in search functions, users often prefer navigation for retrieving files [28] and search for retrieving emails [25]. The preference for one or the other might be a personal trait (i.e., variable between individuals, but consistent within them) [25] and might change as people age [33].

There are two key takeaways from this body of literature. One, data curation is a subjective process [22, 24, 26, 158]. Any supporting tool for different curation stages will need to rely on models of individual behaviors [23, 219]. Two, people tend to distribute themselves in clusters along spectra of behaviors [23, 219] (and Chapter 4).
Our work adds to the literature on individual differences in PIM by focusing on the first stage of curation, where people decide what data to keep or discard. Other than Study 1 (Chapter 4), we are not aware of models for individual differences about keeping and discarding decisions. We also look at how organization behaviors, the second stage of curation, mix with keeping decisions. On a few occasions we touch on retrieval, the final stage of curation, but we do not consider it a key focus of our work and refer to past work for more details on this stage [24, 28, 274, 302]. In the remainder of this chapter, we use the term curation to largely refer to the first stage focused on decisions about what data to keep or discard.

6.3 Methodology

To develop the behavioral styles, we draw on a total of 64 interviews from the four studies that make up the dissertation (Figure 6.1). Each study had specific research questions and used somewhat different methods, which addressed the evolving nature of our overall investigation: we started with exploratory interviews to understand general behaviors (Study 1), followed by more focused contextual interviews to expand our initial results (Study 2). Then, we moved towards research through design in Study 3 and 4, where we explored design artifacts as a prompting tool for building knowledge [317]. All studies shared an interview component with overlapping questions about what data participants tended to keep or discard, general management practices, and feelings and attitudes about data curation. These are the questions and answers we focus on. Our overarching methodology is in line with constructivist grounded theory [56], with theoretical sampling informing the move from one dataset to the other. After a broad interview sample in Study 1, we moved to a specific and narrow sample in Study 2, and then came back to a broad, but purposefully varied interview sample in Study 3 and Study 4. Below, we summarize the four studies.
All interviews took place in Vancouver, Canada, and lasted between 40 and 100 minutes (on average one hour).

Study 1 (2017), Chapter 4. Interview sample: broad sample (16 women, 7 men, aged 21-64). Focus of the study: exploratory interviews (23) on how participants decided what data to keep or discard.
Study 2 (2018), Chapter 5. Interview sample: focused sample (2 women, 5 men, aged 19-49). Focus of the study: contextual interviews with self-identified minimalists (7) on decluttering strategies.
Study 3 (2018), Chapter 7. Interview sample: varied sample (9 women, 6 men, 1 non-conforming, aged 23-71). Focus of the study: elicitation interviews (16) about five speculative design concepts for data selection.
Study 4 (2019), Chapter 8. Interview sample: varied sample (12 women, 6 men, aged 18-64). Focus of the study: evaluation interviews (18) focused on a prototype system for data curation.

Figure 6.1: An overview of the four interview datasets and the related studies used for the analysis of the behavioral styles. For each study, we report the year, the corresponding chapter, a summary of the general focus, and an outline of the interview sample.

6.3.1 Study 1: exploratory interviews with a broad sample

The first study (Chapter 4) took place in 2017 and consisted of exploratory interviews about general keeping and discarding decisions with 23 participants (16 women, 7 men, aged 21-64), with a broad set of occupations (e.g., consultant, cook, social worker, software tester). The interview questions touched on what data participants had kept through the years on different devices and how they decided what data to keep or discard. Participants showed us how they organized their data, discussed what they considered important to keep, and described their frustrations or challenges with curating data.

6.3.2 Study 2: contextual interviews with “minimalists”

The second study (Chapter 5) took place in the first half of 2018.
It consisted of contextual interviews with 7 self-identified “minimalists” (2 women, 5 men, aged 19-49), who had different occupations (receptionist, investment analyst, barista, system analyst, administrator, project manager, life coach). (The study also included an online survey, but here we focus only on data from the interviews.) The interviews took place at the home or office of participants (Figure 5.1). The interviews were semi-structured and focused on how participants decluttered data. Participants showed us their data and walked through their organization and curation practices.

6.3.3 Study 3: design exploration with a varied sample

The third study (Chapter 7) took place in the second half of 2018 and consisted of semi-structured interviews focused on eliciting reactions to a series of design concepts for curating data from 16 participants (9 women, 6 men, 1 gender non-conforming, aged 23-71), who had a broad set of occupations (e.g., HR specialist, journalist, photographer). We recruited participants using a short description of the behavioral styles in a screening survey, selecting them based on their different general approaches to curation. The short description (included in the Results section) focused on the key aspect of each behavioral style and was largely stable before Study 3. (The complete screening survey is available in Appendix D.) The interview questions touched on how participants organized and curated their data over time. We also showed them some design concepts to help them discuss data curation in more detail, reflecting on their behaviors, goals, and frustrations. The design concepts were also a way to provide support for the different needs and approaches in the behavioral styles: some concepts were reflective and user-controlled in their approach to data curation and selection, others were more extreme and automated.
We thought that different extremes of the design dimensions we explored could be a good match for different styles of behavior. After the study, we used the 16 interviews to support and refine the behavioral styles.

6.3.4 Study 4: system evaluation with a varied sample

The fourth study (Chapter 8) took place in 2019 and consisted of structured sessions with 18 participants (12 women, 6 men, aged 18-64), with varying occupations (e.g., occupational therapist, sales associate, social worker). The study sessions focused on evaluating Data Dashboard, a prototype system for curating personal data, designed to accommodate the different behavioral styles. For recruiting, we used the same short description of the behavioral styles from Study 3. All the details of the prototype and evaluation are in Chapter 8. Here, we focus largely on the introductory interview part of the evaluation, where we asked participants to discuss and show us their personal data curation practices. Participants talked about how they decided what personal data to keep or discard, remembered specific episodes in which they discarded or re-organized a certain number of items, and discussed any specific tools they used to curate data.

6.3.5 Data analysis

To synthesize data from the four studies we used both a deductive and inductive analysis process. Here, we describe the process linearly for clarity, but we later expand on its iterative nature and how the behavioral styles evolved over time.

In all of the studies we started with inductive open coding, as part of a reflexive thematic analysis [61] (Study 1, 3, 4) or a grounded theory approach [56] (Study 1 and 2). We started by using in vivo codes, then grouped codes into categories, and then evolved initial categories into more abstract concepts.
The inductive coding phase was useful to connect participants’ quotes to specific patterns of behaviors. But throughout the studies we also used a more deductive approach, taking a high-level look at study participants and comparing them based on a set of behavioral and design-oriented dimensions (Figure 6.2):

Quantity of data: how much data participants dealt with on a daily basis.
Preservation tendency: how much they tended to keep or discard, from most to a little. (This variable overlaps with Bergman’s collection size [23].)
Organization approach: how they approached their data organization (structured, unstructured, a mix). (This variable overlaps with Bergman’s variables around structure [23].)
Feelings: how they related to data and curation (attentive, satisfied, frustrated, indifferent, etc.).
Thoughts: what their priorities were when curating data.
Pains: what their main pains and frustrations with curating data were.
Goals: what they wanted to achieve when curating data.

The first three dimensions are behavioral variables based on the variation we saw across participants in Study 1 (Chapter 4) and Study 2 (Chapter 5) but also overlap with related work on PIM variables [23]. The last four dimensions borrow from the empathy mapping framework [100]. Empathy maps are a common design tool for building user models. They focus on what users say, think, do, feel, and want to achieve during an activity or when using a product. One possible shortcoming of empathy maps is their static nature [256]. Throughout the analysis and this chapter, we overcome this potential issue in three ways. First, we see the empathy mapping framework as a starting point for coding rather than a prescriptive tool. Second, we ground our analysis in specific experiences and episodes that participants reported. Third, we elaborate on the dynamic and composite nature of the behavioral styles across time and contexts.
This approach combines the practical nature of a tool such as an empathy map with the focus on narratives and user experience of more complex analysis techniques.

With these two parallel approaches, one inductive, one deductive, we focused on recurring patterns among participants using a process of constant comparison for the different dimensions. Then, we clustered the patterns in a set of behavioral archetypes. Behavioral archetypes are similar to personas, another common design tool for user modeling. But they differ in giving emphasis to patterns of behavior over demographics [20]. Archetypes are also more contextual than personas: a person can embody different behavioral styles over time or when changing context [20]. In general, building archetypes or personas is a common, established step of design research [37, 238]. Outside PIM, some examples of similar efforts include studies about privacy [83], games [223, 279], or child-centric design [7].

We named the behavioral styles with labels based on feelings (“overwhelmed”), attitudes (“casual,” “frugal”), or behaviors (“collector,” “purger”). Because the behavioral styles share some aspects or behaviors, the labels highlight a key distinctive trait. For example, both the collector and purger behavioral styles share a disciplined attitude. But choosing a behavior as their label emphasizes the difference between them.

We also decided to label these patterns as “behavioral styles” instead of “archetypes” or similar labels for two reasons. First, the word “behavioral” emphasizes how our descriptions focus on participants’ behaviors. Second, the word “styles” helps explain how to interpret them: these patterns are context-dependent, meaning that different people might have different styles in different situations or at different points in time. As a whole, the set of behavioral styles is useful to identify relevant patterns within individual participants.
A real person who perfectly matches the Collector behavioral style may not exist because individual behaviors are dynamic and contextual. The styles are a cross-cutting synthesis of individual behaviors and only represent a reference point that we check individual people or participants against. Then we can say, for example, that a person or participant partly matches the collector style, partly the overwhelmed style, not at all the purger style, and so on. Later in this chapter, we explain the process we used to check individual participants against the different behavioral styles in Study 3 and Study 4.

6.4 Behavioral styles and approaches

As we explain above, the behavioral styles we present are a synthesis of dynamic and contextual individual behaviors. We see the behavioral styles as dependent on the many data types (Chapter 5) participants had to manage and curate. For example, one person might feel overwhelmed by emails, take a casual approach with work documents, and purge photos. For each behavioral style, we give an overview of key dimensions and behaviors using a high-level description and a short example. Then we discuss temporal aspects of participants’ behaviors.

6.4.1 Casual

Participants who took a “casual” approach to curating their data had no particular worries, having a relaxed attitude to data management and organization. They kept some items and discarded others, with no strict rules or underlying concerns. Similarly, they did not report substantial challenges in finding data when necessary. Sometimes they described themselves as “too lazy” to do things differently.
But they also thought that data curation was not an important activity in their life: in theory, they could have been more systematic about curating and managing data, but they did not see why they should have bothered. Their priorities were elsewhere. Even when discussing privacy and security aspects of their personal data, they did not appear particularly concerned. Some factors that influenced this approach included having newer devices with a significant amount of free space or being willing (and able) to pay for cloud storage space when needed.

[Figure 6.2: five panels, one per behavioral style (Overwhelmed, Purger, Casual, Frugal, Collector). Each panel has four axes: More data / Less data, Keep more / Discard more, More organized / Less organized, Feel more positive / Feel more negative.]

Figure 6.2: A visual overview of how the five behavioral styles map to key dimensions we used in the data analysis: quantity of data (more data - less data), preservation tendency (keep more - discard more), organization approach (more organized - less organized), and feelings (more positive - more negative). The graphs are illustrative and do not represent a precise numerical comparison between the behavioral styles.

For example, one participant explained that he did not take the time to curate ebooks because the time investment to do so was not worth it:

“My digital books [...] I probably have hundreds of epubs and PDFs in here, some of which I read, some of which I haven’t and yeah, I would want to go in here and delete them in principle, but in practice I don’t.
I don’t have six to ten to twenty hours to do this just to save a few megabytes of storage. It doesn’t bother me that these are here. [...] I am not going to get to this. I have more important things to do.” (P6, Study 2)

6.4.2 Overwhelmed

Participants who felt “overwhelmed” tended to acquire or create large amounts of data and keep most of it. However, they did not feel positively about their approach because, in fact, it was not a fully intentional approach and they wanted to change it. Often participants explained how they tried to organize and curate data, but they failed because they did not have enough time to keep up with the amount of data, or they did not have the knowledge required to organize as efficiently as they would have liked to. Participants could be too busy with other responsibilities in life and their interactions at work or in personal life often forced them to deal with large amounts of data on a regular basis, with more data coming in than they were able to deal with. Resigned to accumulating data, they experienced challenges in knowing where to find their digital items or how to better manage them. Several participants who felt overwhelmed thought that proactive, automatic tools could help them overcome their challenges in curating data.

As an example, this is how one participant described feeling “overwhelmed” and resigned when thinking about data from many apps, touching on challenges in the whole curation process, from keeping decisions, to organization and retrieval:

“Sometimes [it is challenging to] just remember where I saved [an item]. Sometimes when I’m looking for a file, I’m a very obsessive note taker. I have a lot of notes apps, I use Evernote, Google Keep, Simplenote. I used to have a lot more notes app but I found it hard to back it up. I remember something [and think]: ‘Where did I save it? Where did I put it?’ because I have it in several places. [...] I just don’t feel very productive.
Sometimes I feel overwhelmed by all this stuff that I keep track of. I think if there’s a system, a better way to keep track. I keep a list of all the things I want to do in the future, but I feel there’s not a way I can organize it better.” (P10, Study 4)

Another participant complained about “being behind” with her data management and emails in particular, due to a lack of time after having a baby. She wished to return to a more manageable state:

“I’m actually behind on my emails as well, what a surprise! So this for me [Gmail inbox with a list of emails] is not a good example, I like to have maybe one page of emails, my inbox is my to-do list. This for me is too much, I like to have five emails normally, and then I put everything in a folder, I have quite a few folders, and I’ve always worked like that. [I haven’t taken care of them] purely [because of] time.” (P7, Study 2)

6.4.3 Collector

Participants who took a “collector” approach tended to acquire or create large amounts of data and keep most of their items, often using multiple devices, cloud accounts, and external hard disks. In terms of amount of data, they were similar to “overwhelmed” participants. But in this case, participants were happy about their approach, sometimes showing pride in their collections, and did not want to change it. They reported being generally organized in how they managed data and feeling “on top of things,” similar to “purger” participants. They felt that curating data was a personal responsibility and were somewhat skeptical of tools that could support this process. They took many actions to make sure that they could preserve and access their data in the long-term, for example, by doing regular backups and archiving items in multiple locations.
Their main worries were about the possibility of losing data (either from a device or in a data leak), the attention required to “sync” data across platforms and devices, and the effort required to manage space efficiently so that they could “keep everything.”

For example, one participant described choosing devices with large storage capacity to hold data collected over the years:

“I get electronics that have a capacity to hold things because I tend to store stuff. I have a 3TB hard drive where I keep my photos. But I also keep them on my laptop, on my phone, all the time. I bought a 128GB [phone] because I have a lot of data that goes and comes and I don’t like to delete it all the time. I do a cleanup of my phone and laptop maybe once a year and that’s only because I get a message like, hi, you need to clean your phone because there is no more space.” (P9, Study 3)

6.4.4 Purger

Participants who took a “purger” approach tended to only keep necessary data, even though they often had a relatively large amount of data to manage. They reported regularly “cleaning up” devices and described themselves as being quite organized. Curating and upkeeping personal data was an important activity for these participants, who spent considerable time decluttering and reorganizing items on a regular basis. Sometimes the limited amount of space on their devices drove their behaviors. But often, they reported purging data because of reasons other than space: for example, they wanted to access data more easily, they disliked keeping unnecessary data, they wanted to protect privacy, and they enjoyed the activity of cleaning up data in itself. Overall, they felt positive about their approach, often saying that they were “on top of things” and they wanted to continue managing their data in a similar way.
In fact, they were interested in tools that would make them more efficient and were often open to the idea of optimizing curation through automation.

One participant, for example, described regularly cleaning up data to find items faster, saying they were “a bit OCD”:

“Every two or three weeks I delete those files I don’t need. [...] If I put a file in Google Drive and I try to find it, I don’t wanna scroll down to everything and check the names. [...] For things I don’t need, I’m like, why keep it? Yeah, I don’t want to keep [unnecessary things] there. I just don’t want to. It’s kinda OCD. For the phone I periodically delete pictures I don’t want and apps I don’t use anymore. But on the phone it’s partly due to storage problems. Which is not that much of a concern with my Mac. But in general I just don’t wanna keep useless things.” (P3, Study 3)

6.4.5 Frugal

Participants who took a “frugal” approach tended to keep little of their data, but, unlike other behavioral styles, they did not have a lot of personal data to begin with (for example, because their job or personal life did not require them to deal with data frequently). These participants felt that they were not necessarily organized, but because of the limited amount of data they managed, this was not an issue. In fact, frugal participants wanted to minimize their personal data so that they could reduce the time necessary to manage and curate items. They cared about privacy and security, and were open to user-controlled tools that could help them reduce their data, although they often did not know or use any in their daily life. On a more general level, they had a somewhat uneasy relationship with technology, trying to avoid spending time or money on digital devices. For example, it was common that participants said they did not want to pay for storage space and often they used old devices, like flip phones.
These participants considered the realm of digital data as secondary and not as important as anything that was physical or disconnected from technology. They talked aspirationally about wanting to live a life free of distractions and intrusions from data and technology.

For example, one participant explained the different approach in curating physical and digital items, saying that she did not feel a connection to digital items at all:

“I feel I don’t have much of a connection to physical things, but I have even less to digital, right? So, it’s like... digital things, I just do not care. Delete, delete, delete. With physical, I would stop and think before I put [them] in the garbage.” (P1, Study 2)

6.4.6 Temporal aspects of data curation

One common aspect across behavioral styles was the ongoing nature of curation as a long-term practice. We found that participants curated their data at different moments and with different strategies or approaches for decluttering data (that we detail in Chapter 5): routinely, if they did it on a regular basis, without urgency; serendipitously, when they happened to encounter data while in the middle of other tasks; and urgently, when they were triggered by events like storage space on their devices running out. Different behavioral styles relied on different strategies more or less often, as we show above.

User strategies often involved looking at time as a dimension of curation, with participants showing both a retrospective and a prospective attitude to data. In a retrospective attitude, they looked back and evaluated items based on their use and other subjective criteria. In a prospective attitude, instead, they looked forward, trying to predict their future data needs. As one participant explained: “I used to have a lot [of yoga albums on my phone] but now only keep six. [...] I had to go through and think whether I would use it in the near future or not. So I only kept six.
I picked the best and I got rid of the rest.” (P4, Study 2)

Over time, participants’ approaches often evolved, influenced by life changes and transitions. Participants expressed a need to use curation as a way to evolve and redefine their identity. This process involved reflecting on the past and anticipating the future. It meant re-considering values and re-negotiating attachments. For example, one participant described her approach to organization and curation as “phasic,” explaining that her data practices changed over time:

“I am not a person who keeps organizing all of what is getting stored, there isn’t the time and attention. Long back I had folders for my professional [things], work. I don’t organize that frequently now. Per need, I do. If I am working on a project, or moving, any documents I collect, I would place them in one folder so that I have ease of access, so it’s more of a phasic organization. [...] Taking time to look at the pictures: we tend to forget, and that’s how the world seems to be moving, click, click, click. Unless you’re posting somewhere. There was a time I used to post on Facebook, which I stopped, for several years. It doesn’t seem that I need to, I am fine. That also stopped me from accessing my photos and organizing them.” (P17, Study 4)

6.5 Evolution of the behavioral styles descriptions

In this section, we reflect on how the behavioral styles evolved over time across the four studies and expand on our process. The analysis process was highly iterative and took several months (in parallel with analysis focused on answering more specific research questions in each study).

After Study 1, we wanted to expand the focus of our investigation and unpack the spectrum of “hoarding” and “minimalist” behaviors. In particular, we decided to tease out the individual variation within “minimalist” tendencies by taking a deeper look at a smaller set of participants. This approach helped us make our initial model of tendencies more complete.
But it was still not enough for moving towards design.

Knowing that we wanted to explore possible, alternative design solutions in the space of data curation, we needed an actionable way of referring to the individual differences we saw across participants. We also realized that we needed a way to recruit participants based on their differences, if we were to fully explore the design space. Recruiting a sample with varied approaches to data curation was essential to validate our design process and the idea of building personalized tools for curation. So, after the first two studies, we decided to take a higher-level look at the interviews we had collected and develop a set of user “archetypes,” behavioral patterns that could summarize individual behaviors, inform design decisions, and help evaluate them. We chose to develop “archetypes” inspired by recent industry work on user modeling [20]. We later renamed them as “user types” and then as “behavioral styles” to avoid an over-simplistic interpretation of these patterns as ideal, fixed categories of behaviors. From the beginning, we avoided thinking of the behavioral styles as personas, as we discuss later.

6.5.1 Initial, preliminary version

The initial work on the behavioral styles took place after Study 2 and a few months before running Study 3. In this first phase, we analysed the 30 interviews from Study 1 and 2 using the process described in Section 6.3.5. The initial set of behavioral styles (at the time still labelled “archetypes”) consisted of six styles instead of five (what would become the “casual” style consisted of two separate but related styles, “inbetweeners” and “untroubled”). Some of the styles also used different labels, for example, “disciplined” instead of “purger”.

6.5.2 A first, stable version

Before running Study 3, we used the initial version of the behavioral styles to discuss the framework within the team and our research lab.
We asked the opinion of people who were familiar with our line of work as a way to test how the set of behavioral styles resonated with an external audience. In this phase, we realized that some of the labels were not communicating the key characteristics of each style, and that two of the styles (“inbetweeners” and “untroubled”) could be merged into one without losing meaning. We went back to the interviews, looked at our codes, and, using constant comparison, we iterated on the labels and segmentation of the styles, arriving at a first, stable version that was very close to the one we present here.

6.5.3 Summarizing the behavioral styles for recruitment

With the five behavioral styles largely stabilized, we used a short description of them in a screening survey to recruit a varied sample of participants in Study 3 (and later Study 4). Respondents could choose one of the five options below, each corresponding to one behavioral style. We used the same set of options in both studies. Note that respondents did not see any label, only the descriptions. The short descriptions capture only the key differentiating aspects of each behavioral style, balancing the need for nuance with the need for simplicity that a screening survey requires:

Casual: I keep some data, delete some, and generally have a pretty relaxed approach to organization. I don’t think or worry much about data management.

Overwhelmed: I keep most of my data, but sometimes feel overwhelmed by it. I am not as organized as I probably should be because it is hard or I don’t have time to do it. I would like to get rid of some data.

Collector: I keep lots of data, I am generally organized, and I am happy about my approach.

Purger: I only keep necessary data.
I am organized and regularly spend time deleting and managing data.

Frugal: I do not have a lot of data to begin with and I try to avoid spending a lot of time with technology.

As we have mentioned before, we do not see a strict one-to-one correspondence between a behavioral style and a person because these patterns are dynamic and contextual: behavioral styles can vary based on data types and over time. However, we asked participants to choose only one option to simplify the survey and also ensure stronger individual variation in the sample.

6.5.4 Validating the behavioral styles with additional studies

With the behavioral styles summarized in short descriptions, we used Study 3 and 4 as a way to validate our first stable version and also enrich it (as we detail below). In both of the last two studies, we had to check the match of participants to the option they picked in the screening survey. We did this by asking them the same interview questions from Study 1 and 2 that we had used to generate the behavioral styles in the first place. Carrying overlapping interview questions across different studies allowed us to check our assumptions with new data and slowly lead to convergence.

In general, most participants in Study 3 and 4 matched what they reported as their main approach when filling out the screening survey. But, as expected, they often had exceptions to their general approach. It was common for participants to display and discuss patterns of behaviors linked to more than one behavioral style, based on different contexts (e.g., work life vs. personal life) and data types (e.g., photos vs. messages). In most cases, participants closely matched one or two behavioral styles. In a few cases, they matched three.

In a few cases the self-reported approach of participants in Study 3 and 4 did not match what they told us and showed us during the study sessions.
Often, it was a matter of having a very diversified approach for many different types of data, as we explain above, with the option chosen in the survey not fully capturing this nuance and diversity of data types (as we expected from having a simplified description of the behavioral styles and a single-choice question). In a couple of cases we had participants reporting being “collectors,” for example, but when talking to them and seeing their data practices, we felt they were closer to being “overwhelmed,” because they used similar expressions and showed similar patterns to other participants we grouped in the “overwhelmed” behavioral style. With the exception of this minority of cases, our screening question seemed a reasonable approximation of actual behaviors, but we later note how it should be seen as a tool that needs checking, rather than a definitive match, especially because different data types often call for a different approach. There might also be individual differences in the degree of reflexivity and confidence that different people have around data curation, and out of hundreds of respondents, a few might have answered the behavioral style question, which appeared last on the survey, without paying close attention to all the options (this is always a possibility in surveys).

Another result of using the behavioral styles as a screening tool for recruitment is that both in Study 3 and 4 we managed to recruit only a few participants who displayed a frugal approach. However, we expected this result given our online recruitment process: we did not expect many people who largely take a frugal approach to frequently check online recruitment advertisements. Further, the frugal behavioral style is based on a smaller number of participants.
Regardless, we felt it was different enough to be part of the analysis.

6.5.5 Connecting the behavioral styles to design research

In Studies 3 and 4 we also used the behavioral styles to drive our design process and, at the same time, we leveraged the resulting design work to enrich the behavioral styles. A key goal of our work across the studies is to provide support for different patterns of behavior in personal data curation. The behavioral styles were essential for defining and scoping these patterns, and then understanding how to inform design decisions. In Study 3 (Chapter 7) we explore different dimensions of curation based on previous work, including automation and system aggressiveness. When exploring how to make these dimensions into concrete design concepts we often referred back to the behavioral styles. For example, the collector behavioral style informed Patina, one of the concepts we explore, a visualization of temporal aspects of data (e.g., frequency of use, age). We imagine that a similar concept might play well with users who tend to keep and manage large quantities of data over time. Similarly, we have concepts (Temporary Folder and App, Future Filters) where data is automatically deleted in the future that were inspired by prospective curation decisions and the need to “purge” items that some behavioral styles referred to.

In Study 4 (Chapter 8), instead, we bring together insights from previous studies into a cohesive prototype system for curating data. Here, we used high-level goals from across the behavioral styles to create a set of filters for sorting through personal data: wanting to find data to purge, wanting to improve organization, wanting to protect sensitive information, and wanting to avoid any potential data loss.
We also used different general attitudes about automation to design two different interfaces for achieving the same goals: one focused on providing information but leaving the ultimate decisions about what to do to users, and one focused on automating the data curation process with recommendations. We detail these diverging attitudes in Chapter 7. Drawing on the behavioral styles, we outline a series of use scenarios for evaluating the prototype and we develop additional insights about data curation, which we detail in Chapter 8.

We used the design work from these last two studies to elicit reactions from participants and add more dimensions to the behavioral styles, considering aspects of personal data curation that we had paid less attention to before. For example, in Study 2 we started to look at how data management tools play a role in supporting (or hindering) data curation practices. We continued to look at tools and different attitudes towards the role of technology in Study 3 and 4. In Study 3 we enriched the behavioral styles descriptions with attitudes about automation, detailing opposing stances toward automated curation tools. In Study 4, instead, we explored aspects of curation related to privacy and security, which we highlight at different points in the behavioral styles descriptions. The shift in methods, from exploratory and contextual interviews (Study 1 and 2) to research through design (Study 3 and 4), allowed us to look at the behavioral styles from different perspectives. We were able to build a deeper understanding of curation leveraging both user narratives [288] (Study 1 and 2) and direct exposure to possible design solutions (Study 3 and 4).

6.5.6 Final version

Between Study 3 and 4 the behavioral styles were largely stable. Several participants from later studies closely matched answers from earlier studies. At this point, we felt the work had reached convergence and the behavioral styles were mature enough.
The version of the behavioral styles that we presented earlier is the result of this overarching, iterative process.

6.6 Discussion

6.6.1 How the behavioral styles help us understand data curation

We started the chapter by asking how to make individual differences in personal data curation actionable. The behavioral styles we described fulfill that role as a resource for research and design. But the idea of categorizing people and assigning them labels can be controversial. Categories can hide implicit judgements about behaviors and risk placing people into a singular box that they may not solely fit within. Additionally, due to the subjective nature of data curation, implicit judgements and categorizations are almost impossible to escape. Participants often described their own respective approach by comparing themselves to other people: “I know people who keep things for years and never go back. I’m not that kind of person” (P15, Study 4), “Unlike many people I use my phone with extreme caution” (P4, Study 4). Such statements show how participants compared and judged others to rationalise and communicate their own approach. Personal data curation helped participants define themselves against other people [139].

Both Study 1 (Chapter 4) and prior work show that curating data helps in identity building [66, 68, 140, 149] and that people hold strong moral views about the best way to manage their data [288]. Study 1 introduced the idea of “hoarding” and “minimalism” as two opposite ways of thinking about data and identity. However, this process is contextual and the labels are grounded in participants’ words. The behavioral styles expand that model, with different styles matching one or the other extreme: the Overwhelmed and Collector behavioral styles tease out variation within the tendency of “hoarding,” the Frugal and Purger behavioral styles tease out “minimalism,” and the Casual behavioral style represents a bridge between the two sides of the spectrum (Figure 6.3).
They tell us what actions people take to fulfill the core function of identity building at the heart of data curation and show us how this process takes place over time. These insights resonate with theories from sociology, consumer behavior, and social psychology that detail how identity building is a process that shifts and evolves over time through social interaction [17, 65, 194]. People imbue tools and objects around them, both physical and digital, with symbolic meaning [17, 18]. Then, these objects and concepts, with their symbolic meaning, play an important role in their evolving sense of self and in relationships with other people. When we connect these ideas to the way participants discussed their approach as evolving (or being stuck) in time, we can see the behavioral styles as a dynamic categorization. The styles are not a set of boxes. Instead, they are a set of patterns that can shift, mix, and evolve, just like people do. Below, we discuss the implications for design and research of conceptualizing behavioral styles as we do.

Figure 6.3: A visual representation of the five behavioral styles as an expansion of more general data curation tendencies: the Overwhelmed and Collector align with “hoarding” tendencies, the Frugal and Purger with “minimalist” tendencies, and the Casual style bridges the two sides.

6.6.2 Behavioral styles as a resource for design and research

A common argument against the type of user modeling we present in this chapter is that people are complex and no behavioral style will ever fully capture their nuanced experience. That said, we argue that the behavioral styles we built can be a useful resource to drive design decisions, inform research recruiting, and help with qualitative analysis. In our work across studies, they were an essential resource that we turned to when we did not have direct access to participants. They were relatively inexpensive and they helped us ground our process.
But it was essential to see them as a tool: something you pick up when you need to, but that you put back when its job is done. To better explain our framing of behavioral styles as a design and research tool, we compare them to personas.

Why did we not use personas? We could have. Personas are a common method in user-centred design, promising to drive decisions and build empathy. We could have created typical personas like “Harry, the Happy Hoarder,” or “Mary, the Minimalist.” But we did not. We decided to avoid thinking of these behavioral patterns as personas because personas often focus on demographics and assume that individual behaviors are static. While recent work has looked at integrating complex identities into personas [180] or making them more participatory [204], the arguments against them are plenty [179]. Personas can be abstract and impersonal [189], they can reiterate gender stereotypes [131] or power relationships [70], they can be removed from empirical data [88], and they are difficult to validate [53].

Compared to personas, behavioral styles helped us focus on contextual patterns of behaviors. We were not trying to help and satisfy a single imaginary person. Instead, we were trying to support behaviors grounded in specific scenarios we observed. This approach influenced our design outcomes and led us to solutions that opened a design space. For example, we could have created a tool that would categorize one person as a typical user or persona based on demographics and behaviors, and then personalize the interface according to fixed, preset criteria. We did not, because we saw behavioral styles as contextual patterns of behaviors that only represent a starting point. Similarly, when we used the behavioral styles as a recruiting tool, we saw them as a starting point for reaching variety in the sample. But they were not an end in themselves.
After recruiting we still had to assess the fit of each participant to one or more behavioral styles, and understand what their exceptions to general patterns of behaviors were. This approach made it possible to generate unexpected design opportunities that build on the intersection of different behaviors and contexts.

Our conception of behavioral styles as a representation of dynamic behaviors and a generative resource calls into question common assumptions about user modeling. Often as designers and researchers we turn to modeling methods such as personas in search of a prescriptive and predictive model that can help design by limiting its focus. Instead, we propose to look at behavioral styles and similar modeling efforts as a generative and descriptive resource. The goal is not to reduce design options, but instead to open new opportunities. In the next section, we outline some opportunities based on our work, expanding on possible design efforts for data curation tools and future research investigations around personal data.

6.7 Opportunities for design and research

6.7.1 Prioritizing user support

A first approach in design could be to prioritize different patterns of behaviors, based on the level of support they need. For example, we see “overwhelmed” users as the ones needing most support. Participants who felt overwhelmed reported feeling stuck and resigned, but wanting to change their approach. In a sense, they felt left behind by technology. Exploring new tools targeted at this style of behavior is a key opportunity. Recently, tech companies have introduced tools that can proactively find and recommend items to declutter from devices or cloud platforms (e.g., Google Photos, Files). This is a reasonable approach, but one that can undermine agency (as we show in Chapter 7). An alternative would be to inform users of different curation practices and help them learn how to do things differently using their current tools.
This approach could take the form of assistive learning tools integrated in existing user interfaces. Automated suggestions could show users possible alternative ways of organizing or curating data based on different patterns of behaviors. This is exactly how the behavioral styles can help generate ideas and integrate patterns of behaviors into design work. For example, the system could show “overwhelmed” users how a user with a “purger” approach would re-organize and declutter their data. These tools could then explain the rationale behind the suggestions, and let users try and explore the proposed approaches before committing to them. A different approach could be an illustrative, informational guide, similar to related efforts about promoting awareness of backups [313] and privacy [200], that would illustrate different approaches and strategies found in PIM studies as a form of direct knowledge-transfer from empirical research to users.

6.7.2 Making curation feel more engaging

Other behavioral styles point to opportunities for making data curation more fun and engaging. This approach could be compelling for some casual users while also matching the need to purge data of other behavioral styles. Some ideas include gamifying the curation process [320]; imagining new ways for repurposing and “recycling” old data (for example, transforming one data type into a different one: texts becoming haikus is a recent example [258]); or allowing users to donate data, willingly, to projects that have some collective benefit (e.g., research projects, a historical archive), with proper mechanisms for anonymization. This last design direction builds on investigations that have explored a similar idea in a different context (e.g., donating personal data after dying) [151].
The emphasis would be on creating engaging experiences that offer benefits beyond optimizing space or organization.

6.7.3 Exploring the role of curation for memory

Yet another opportunity is to better explore the role of personal data for memory and how to make devices or cloud platforms feel like lived-in spaces rather than static repositories. Features like “Remember This Day” on Facebook and “Rediscover this day” on Google Photos seem to address this need [243]. But they also cause tensions around the evolving nature of personal identity [252]. A more subtle approach could focus on segmenting and resurfacing data based on temporal dimensions [57, 209]: for example, grouping data based on calendar-based periods (e.g., songs I listen to on Tuesdays) or phases of the day (e.g., data I interact with on rainy nights). Services like Spotify provide similar personalized re-packaging of music playlists [262], and we believe that similar ideas could be especially relevant for users who take a collector approach. There is still a lot to explore about how to best integrate these ideas in the design of data management tools that include broader types of data [210].

6.7.4 Investigating new ways of capturing user variation

As we mentioned, we decided to condense the behavioral styles in a single question to make the screening survey easier and quicker to answer: we had additional questions to include and we wanted to keep the survey as short as possible. However, investigating new ways of including the behavioral styles in the recruiting process is a potential avenue to explore in future research. Studies in psychology that focus on capturing individual traits often use scales with several questions. A similar approach could focus on wording the different behavioral styles as dimensions that participants can measure themselves against. This approach could make it possible to better capture the composite nature of individual behaviors.
Still, an added complexity when looking at data curation is the way different approaches and different types of data intersect. Building scales with multiple items for multiple data types might increase precision but also add complexity for respondents. Another opportunity would be to capture the behavioral styles in a visual way (for example, showing typical device and cloud platform configurations for each style), letting users choose the one most similar to their own approach. A visual representation of different behavioral styles and strategies could also act as a probing tool during user interviews, helping participants better articulate their own curation strategies by comparing them against established patterns.

6.8 Limitations

We do not see our work as globally valid or as an absolute truth about people’s behaviors. Instead, similar to what we stated in Chapter 4, our analysis reflects the social, technological, and cultural context of the participants we talked to: predominantly working-class folks, all living in a major Western country. These parameters should inform the transferability of our results [110, 165, 280] to another context. Future studies can complement our work by looking at similar research questions in a different context, to unpack differences and overlaps across cultures and generations. We also encourage future studies to take a different epistemological approach for investigating individual differences in personal data curation. We took a constructivist, interpretive, qualitative approach [110, 165], grounding our analysis in participants’ experiences and narratives [288]. Future research can build on our work by collecting quantitative measures of user actions (e.g., number of times a user moves, deletes, or re-accesses data in a given period) and exploring emerging patterns of behaviors. Some previous user modeling studies, for example, identify user clusters using surveys and correlational analysis [223, 279].
A similar line of work could focus on exploring the prevalence of different behavioral patterns in a broader population using quantitative data. Once again, we note that we did not see a rigid one-to-one correspondence between a participant and a behavioral style and we caution against a similar interpretation of our work, but we encourage future studies to elaborate on this conclusion by using different methodologies. Finally, an additional way to build upon and expand our contributions is to take a longitudinal approach and better capture temporal aspects of data over long periods of time (e.g., months or years).

6.9 Summary and conclusion

Personal data curation is an underexplored topic with rich implications for the design of data management tools. As society moves towards a world where digital data will become even more pervasive, it is essential to understand individual user needs and accommodate the different ways people manage their personal data. We have proposed a set of individual user behaviors, describing five behavioral styles and their approach to curation. The behavioral styles represent an actionable resource for design and research, opening the way for new products and future investigations in this domain. Leveraging our work, future research can tailor design efforts to different users and explore innovative ways of thinking about personal data.

In the following chapters, we build upon the behavioral styles to explore some of the design directions we outline. In particular, in Chapter 7 we use the temporal dimensions of personal data curation and different behavioral styles to inform the design dimensions and concepts we explore. Then, in Chapter 8 we focus on prioritizing and personalizing user support for data curation.

Chapter 7

Study 3
Exploring a Design Space for Selecting Personal Data

With this chapter,¹ we enter the second half of the dissertation, where design work takes the stage.
Here, we build upon the insights from previous studies and explore how to support the selection of personal data. We define selection as the practical process of choosing what to keep or discard. As in previous chapters, the action of selecting what to keep or discard is a subset of the keeping stage in personal data curation.

7.1 Introduction

So far, we have looked at personal data curation and the different ways people approach the process, driven by general tendencies (Chapter 4), using several decluttering strategies (Chapter 5) and reflecting different styles of behaviors (Chapter 6).

In this chapter, we want to look specifically at how technology can support data selection, the process of intentionally deciding what personal data to keep or discard. We have argued before how selecting what data to keep or not is a necessary step for “emotionally viable” archival systems: choosing what to keep and what to let go is important so that you can derive value from your personal data over time by keeping things that matter to you [182]. We have also seen how it has become nearly impossible to decide what to keep and discard [24] due to the growing amount of personal data.

¹ Originally published in Vitale, Odom, and McGrenere. (2019) Keeping and Discarding Personal Data: Exploring a Design Space. Proceedings of the 2019 Conference on Designing Interactive Systems (DIS ’19) [291]

How can technology support this selection process? Recent studies on the value of digital data [111] and its longitudinal management [150] point to a growing need for tools that can support keeping and discarding decisions. But we know from previous chapters and previous studies [150] that people show strong individual differences in their practices, so it is unlikely that a single solution could satisfy all users.

With these premises in mind, we ask: How can technologies be better designed to support people’s decisions around what data to keep or discard?
In particular, what different design approaches might be viable for different users and in different situations?

Using a Research through Design approach [317], we created five design concepts as a way to probe people’s reactions, attitudes, and perceptions of the role of technology in supporting personal data selection practices. The concepts intentionally emphasize different design dimensions stemming from related work in PIM. The five concepts are: Patina (a visualization of temporal aspects of data, e.g., age or number of interactions), Data Recommender (a recommender system that uses machine learning to suggest data to take care of), Temporary Folder (a folder with an expiration date), Temporary App (a mobile application with an expiration date), and Future Filters (a mobile application to create advanced filters for deciding in advance what to do with data). For each concept, we created a short video sketch [316] as a prototype that primarily illustrates how it works. Then, we conducted one-on-one interview sessions with 16 participants with varied data management approaches. The interview sessions touched on the potential benefits and drawbacks of the concepts and elicited a range of reactions. We identified contrasting attitudes towards the systems we presented, with the tension between automation and control informing the need for context-based solutions. Drawing on the interview analysis, we critically reflect on our results and outline future design directions to further open the design space.

In this chapter we make three key contributions. First, we outline four design dimensions (selection regime, automation, aggressiveness, and temporality) to define and broaden the design space around keeping and discarding decisions. These can be used as a generative resource for creating new solutions.
Second, we offer five alternative design concepts that we used in an elicitation study with a diverse sample to probe and explore the space, showing where people’s key boundaries around control and automation lie. Third, we discuss future design directions for supporting keeping and discarding decisions focusing on personalization, automation, defining new actions, and targeting data privacy.

7.2 Related work

Below, we review design efforts around personal data management from prior research projects. We use this review to outline a set of design dimensions and approaches to probe on. For more general insights about user practices and challenges in selecting data, we refer to Chapter 2.

7.2.1 Existing and proposed design approaches

Augmenting data management interfaces

Two common user interface paradigms to manage data are: 1) document-centric, with the most dominant desktop metaphor of files and folders (common on personal computers and cloud storage platforms), and 2) application-centric, with the application acting as a bundle for data (common on mobile devices and social media platforms) [13, 301]. Both paradigms have benefits and drawbacks. The file systems community, for example, has criticized the desktop metaphor organized around folders for being too rigid and inadequate as the amount of data grows [255]. But despite their faults, folders still dominate management platforms because they provide valuable functions: they help people control, organize, and structure their work [301]. Enhancing them, rather than replacing them, might be the best design approach [301].

Several projects propose augmentations or alternatives to folders (Table 7.1), using for example metadata [300] or annotations [294]. Other projects choose an alternative activity-centric approach [13], exploring “time-ordered streams” of documents [90], flexible desktop organizations [79, 295], or new metaphors based on places, time, and data provenance [167].
While exploring radical alternatives can help push forward design, our concepts largely focus on augmenting current interfaces so that participants can better relate them to their own experience.

Table 7.1: Previous research projects categorized based on their general design approach.

Design approach | Projects
Document or folder augmentation | BIGFile [168], Facet Folders [300], File Biography [167], Finder Highlights [93], GrayArea [29], Old’n Gray [31], Project Planner [143], Vanish [97], WikiFolders [294]
Filtering or tagging | DMTR [30], Phlat [69], Stuff I’ve Seen [82], Tagtivity [221]
Activity-centric | File Biography [167], Giornata [295], Lifestreams [90], Presto [79], Tagtivity [221], Task Aware Ranking [281]

Using automation to complement selection practices

Most of the design projects reviewed so far focus on data organization or retrieval, with little attention paid to keeping decisions (see Table 7.2). An exception is work by Bergman et al., with several related projects addressing the keeping stage of the curation cycle: GrayArea [29], DMTR [30], and Old’n Gray [31]. These systems use the “demotion” principle [22], an intermediate action between keeping and deleting (which is the most common discarding action afforded by user interfaces). Demotion makes items visually less prominent or hides them in a separate area of the interface. This is a valid compromise between keeping and deleting: unnecessary items do not distract when they are demoted, but they are still there in case they are ever needed.
Although discarding data can mean more than deleting (e.g., demoting), our concepts focus on deleting as a key action to elicit more powerful reactions from participants and understand where their boundaries lie.

Table 7.2: Previous research projects categorized based on the curation stage they focus on.

Curation stage | Projects
Keeping | DMTR [30], GrayArea [29], Old’n Gray [31], Vanish [97]
Organizing | Giornata [295], Lifestreams [90], Presto [79], Project Planner [143], Tagtivity [221], WikiFolders [294]
Retrieving | BIGFile [168], DMTR [30], Facet Folders [300], Finder Highlights [93], GrayArea [29], Lifestreams [90], Old’n Gray [31], Phlat [69], Presto [79], Project Planner [143], Stuff I’ve Seen [82], Tagtivity [221], Task Aware Ranking [281]

The examples by Bergman et al. also highlight two distinct design approaches to solve the “burden of curation” [140]: in GrayArea, users rely on direct manipulation to demote items by dragging them into a separate area of a folder, whereas in DMTR and Old’n Gray the process is automatic. The tension between automation and user control is at the centre of many investigations in HCI [91, 102, 136, 237]. Within PIM, the discussion around automating user management strategies is key [286] and goes back to early studies about email [19, 304]. Some examples of automation or semi-automation focus on selecting photos [164, 208] and audio [195] or on the process of passing down digital data [112]. Jones [142] discusses automatically archiving information that is no longer useful, while Vanish [97] introduces the idea of self-destructing data. Bergman et al. [26], among others, argue for finding a balance between automation and user control in PIM interfaces. Yet, this is a question rarely explored in the specific context of keeping decisions. Understanding which “curating” actions can be automated and which should not is an ongoing open question in PIM research [144].
Our work uses the tension between manual actions and automation as a key design dimension to explore potential new directions.

Using metadata to build awareness of digital items

Most design efforts discussed so far focus on improving data management tasks. A different strand of design work by Odom and colleagues, instead, offers a counter-perspective, and focuses on reflection, reminiscence, and enjoyment [212]. This approach uses metadata for rediscovering kept data through everyday objects [209, 218]. Sas et al. [247, 249] also use a similar, reflective approach by proposing “rituals” for letting go of sentimental digital items. We use this work as inspiration to add possible design choices and incorporate an open-ended, reflective dimension in some of our concepts. While prior work focuses on tangible interactions, we explore using metadata in graphical user interfaces to resurface old digital items or build awareness of their accumulation over time.

Exploring the potential of prospective decisions

Finally, we narrow the focus to email, with work by Gwizdka [114–116] providing additional inspiration. In studying and designing email management tools, Gwizdka introduces the idea of prospective information to support task management by anticipating future needs. Today, several email management tools apply a similar concept through reminders and snoozing functions. Empirical work on more general data selection decisions points to temporal dimensions that consider both the past and the future [140]. For example, Kim [151] mentions one participant having a folder called “to delete,” while Brewer et al. [42] discuss prospective memory for digital reminders. Our own work in Chapter 6 outlines retrospective and prospective dimensions of data curation. Thus, we see the challenge of anticipating future needs with prospective decisions as a major opportunity to expand the design space.
While most data management tools are retrospective, there might be space for prospective decisions and we use some of our concepts to explore this area.

7.3 Methodology

7.3.1 Research approach and design dimensions

Our review of related work shows how designing to support keeping and discarding decisions is a largely under-explored territory, with different potential directions to follow. This multitude of possibilities makes a design-led exploration ideal. Thus, in our work, we took a Research through Design approach [317]. Our inquiry can be seen as parallel to that of Gulotta et al. [112], who use a similar approach to investigate the space around data curation, legacy, and memory.

We started by clustering and mapping insights from prior work and our own empirical work described in previous chapters into four key design dimensions to probe: selection regime, automation, aggressiveness, and temporality. These four design dimensions synthesize both previous related work on digital data and our own work in previous chapters. Below, we describe the four dimensions, explaining what we based them on and how we used them to drive the design process.

Selection regime - The first dimension considers possible “selection regimes” [140] that people use when curating data: whether they consider one item at a time or a collection of items all together. This dimension is based both on past related work and our own empirical work. Past related work on curation [140] introduces the idea of selection regimes, where people consider items either in groups or individually. In our own work, we saw similar user behaviors in Study 2 (Chapter 5). We created variations along this dimension to encompass both individual items and collective categories, probing on the differences in support needed for both.

Automation - The second dimension focuses on the tension between user-initiated data selection and automated data selection.
This dimension is based on past related work on data management: earlier in this chapter (Section 7.2.1), we discussed how past work on PIM explores different approaches to automation, arguing for a balance between automation and user control. Thus, we see automation as a key dimension to explore given the increased potential for automatic data management tools, thanks to machine learning and artificial intelligence. In our design work, we contrasted concepts that took automation to its extremes with others that gave more emphasis to user influence.

Aggressiveness - The third dimension is about the level of aggressiveness of the system. This dimension is inspired by a large body of past related work on digital data and metadata, covered earlier in this chapter (Section 7.2.1), that argues for a calm, open-ended approach to designing for personal data. We used this dimension to contrast concepts that are more open-ended (i.e., they only inform of data to take care of, whether the user notices it or not, letting them decide what to do and when) with others that are more forceful and push the user to decide whether to keep or discard something.

Temporality - The final dimension represents the temporal user mindset in selecting data: either retrospective, looking at items based on past use, or prospective, looking at items based on future use. This dimension is based both on past related work and our own work. Earlier in this chapter (Section 7.2.1), we provide examples of past work that points to prospective data management decisions as a promising area to support. In our own work, we also highlight temporal dimensions of data curation (Chapter 5, Chapter 6), showing a contrast between retrospective and prospective user attitudes.
We used variations along this dimension to probe on the largely underexplored area of prospective decision making to see whether this might be a viable direction compared to the retrospective nature of many traditional management tools.

After defining the design dimensions, we created five concepts that differ along their extremes to explore a design space for data selection. As we already mentioned, our own work informed the design dimensions, but these dimensions do not have a precise one-to-one correspondence to the five Behavioral Styles presented in Chapter 6, because the dimensions synthesize attitudes from across the behavioral styles. Similarly, when we created the design concepts we considered the five behavioral styles and how each of them might relate to a concept, ensuring that we covered a range of attitudes in the design space. Some concepts have a more explicit link to a specific behavioral style (that we point out in their descriptions below), but our goal was not to create or test a rigid one-to-one correspondence, given the contextual and dynamic nature of user behaviors.

For each concept, we created a short video prototype or sketch [316], an illustration of how it works. The videos use a mix of descriptions and user scenarios, depending on what we felt best illustrated each concept. The concepts take inspiration from existing work or systems, as we detail in their respective descriptions. However, to have more control in our elicitation study, we decided to design our own set of video prototypes instead of using existing systems. By creating our own concepts and videos we were able to push the design dimensions in specific directions, often exploring their extremes in new combinations (Figure 7.1). All together, the concepts synthesize a mix of disparate ideas into a cohesive collection, applying existing and proposed design approaches in new contexts.
In the videos, we tailored the user scenarios around key research questions we were interested in exploring with participants. The videos, similar to experience prototypes [45], frame the concepts in a way that offers glimpses into possible futures to provoke and open dialogue with participants about perceived benefits and consequences of each design [178]. Our approach is inspired by and shares similarities with prior work on User Enactments [214] and Speed Dating [72]. These related approaches argue for exploring the potential roles, values, and social boundaries of emerging near-future technology by using more than one design vision. They encourage participants to imagine future interactions and react to them by drawing on their own experiences.

7.4 Design concepts

We now describe the five design concepts, positioning them within the design space and pointing to sources of inspiration.

7.4.1 Patina

The first concept, Patina, is a visualization on top of data in the geometric form of a spiral.² It is inspired by a tree’s growth circles and symbolizes temporal qualities of data.
In the video for Patina, we show two different options for the spiral: on a desktop, it represents the age of folders (Figure 7.2); with a set of music playlists, instead, it represents the number of interactions over a period of time (Figure 7.3). Music playlists provide a good contrast to folders because they are “meant to be enjoyed repeatedly and grown over time” [169].

² Patina’s video is available in the supplementary materials to the thesis.

PATINA
retrospective  • - - - - - - - - - - - - - - -  prospective
pushy          - - - - - - - - - - - - - - - •  open-ended
automatic      - - - - - - - - - - - - - - - •  manual
mass items     - - - - - - - - - - - - • - - -  one by one

DATA RECOMMENDER
retrospective  - - - • - - - - - - - - - - - -  prospective
pushy          • - - - - - - - - - - - - - - -  open-ended
automatic      - - - - • - - - - - - - - - - -  manual
mass items     - - - - - - • - - - - - - - - -  one by one

TEMPORARY FOLDER & APP
retrospective  - - - - - - - - - - - - - - - •  prospective
pushy          • - - - - - - - - - - - - - - -  open-ended
automatic      - - • - - - - - - - - - - - - -  manual
mass items     - - - - • - - - - - - - - - - -  one by one

FUTURE FILTERS
retrospective  - - - - - - - - - - - - - - - •  prospective
pushy          - - - - - - - - • - - - - - - -  open-ended
automatic      • - - - - - - - - - - - - - - -  manual
mass items     • - - - - - - - - - - - - - - -  one by one

Figure 7.1: An overview of the design concepts and how they map to the four design dimensions we used in our exploration.

Figure 7.2: Patina showing the age of desktop folders: the bigger the spiral, the older a folder is. (Each dot stands for a set time amount, e.g., a week.)

Figure 7.3: Patina showing the frequency of use for music playlists: the bigger the spiral, the more times a user has played the playlist.

Patina’s video leaves some aspects as intentionally ambiguous and unexplained (e.g., How is the age of a folder calculated?
How is the interaction period determined?). We wanted to encourage user interpretation and discussion. This choice also emphasizes the open-ended nature of Patina in the design space: it invites reflection and builds awareness, but does not suggest any specific action to take on data. We use this concept to probe on the viability of open-ended designs that leave users in charge of initiating any selection action and on what metadata attributes might be useful for doing so.

The idea of a patina takes inspiration from prior studies that mention its potential [211] or use it with physical [99, 160, 161] and digital objects [137, 188]. The frequency of use in Patina is inspired by Hurst et al. [137] and Matejka et al. [188]. The idea of aging, instead, is informed by Giaccardi et al.’s work on “traces of use” for daily objects [99]. Our work, however, takes place in a different context and uses a different approach designed with data selection in mind. There are also commercial products with visualizations for used space on hard disks (e.g., Daisy Disk,³ Disk Inventory X⁴) but these are separate from the data and use only one type of metadata (size). Instead, we tie the visualization to the data and use two types of metadata (creation date for folders and frequency of use for music playlists).

Patina was also informed by the Collector behavioral style. As we mention in Chapter 6, we imagined that this concept might play well with users who tend to keep and manage large quantities of data over time, providing them with an opportunity to revisit and enjoy their collections. We also thought that a visualization such as Patina could engage people who take a casual approach to their data, providing a fresh and unusual perspective to daily data curation. Finally, we saw Patina as a potential way of directing overwhelmed users and users taking a purger or frugal approach to items that they might want to get rid of.
A similar visualization could easily match routine and serendipitous decluttering practices of different users (Chapter 5).

³ http://daisydiskapp.com
⁴ http://www.derlien.com

7.4.2 Data Recommender

The second concept, Data Recommender (Figure 7.4), notifies users and provides recommendations on data that might need attention, using metadata such as last access, creation date, or size.⁵ Users can decide to trash items, archive them in a central archive, move them to a specific folder, or be reminded of them at another time. Data Recommender will use machine learning to learn from their actions and provide new recommendations. This concept is the closest to existing products. For example, Google Photos⁶ provides recommendations on photos to archive, while Files on Android⁷ gives recommendations on how to free up space.

⁵ Data Recommender’s video is available in the supplementary materials to the thesis.
⁶ https://www.google.com/photos/about/
⁷ https://files.google.com

Figure 7.4: Data Recommender notifies users when they have some data to take care of (top) and provides a list of items (bottom): users can choose to trash, archive, move, or be reminded of items again.

Data Recommender is in the middle between human-driven activity and automation, following a mixed-initiative approach [46, 134]. Using this concept we want to probe the link between data and context, the viability of different selection actions, and the attributes that make items good candidates for disposal.

When creating Data Recommender, we imagined it might be particularly helpful for supporting routine decluttering (Chapter 5) and users who feel overwhelmed (Chapter 6). We also thought it could provide some support to users who take a purger or frugal approach, given their desire to get rid of items.
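As a rough illustration of the kind of metadata-based ranking that Data Recommender implies, the sketch below scores items by idle time, age, and size and returns the strongest candidates for attention. It is a minimal sketch under stated assumptions, not a description of how the concept works: Data Recommender exists only as a video sketch, and the `Item` fields, the weights in `attention_score`, and the `recommend` function are hypothetical names introduced for illustration. A fixed, hand-written score stands in for the machine learning component.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Item:
    """Hypothetical record holding the metadata mentioned in the text."""
    name: str
    created: datetime
    last_access: datetime
    size_mb: float


def attention_score(item: Item, now: datetime) -> float:
    """Higher score = stronger candidate for the user's attention.

    The weights are arbitrary assumptions chosen for illustration only.
    """
    days_idle = (now - item.last_access).days
    days_old = (now - item.created).days
    return 1.0 * days_idle + 0.2 * days_old + 0.5 * item.size_mb


def recommend(items: List[Item], now: datetime, top_n: int = 3) -> List[Item]:
    """Return up to top_n items, ranked by descending attention score."""
    ranked = sorted(items, key=lambda it: attention_score(it, now), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    now = datetime(2020, 7, 1)
    example = [
        Item("notes.txt", datetime(2020, 6, 1), datetime(2020, 6, 30), 0.1),
        Item("tax_2015.pdf", datetime(2016, 1, 10), datetime(2016, 2, 1), 2.0),
        Item("holiday.mov", datetime(2019, 8, 1), datetime(2019, 8, 15), 900.0),
    ]
    for it in recommend(example, now, top_n=2):
        print(it.name)
```

A system along these lines could then offer the four actions from the concept (trash, archive, move, remind) for each recommended item, and adjust the weights based on what the user chooses.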
Prior work [215] also points to recommender systems as a good approach to offload the work of selecting items.

7.4.3 Temporary Folder and Temporary App

The next two concepts come as a couple: Temporary Folder and Temporary App. In this case, we created two videos on two different platforms. The first, Temporary Folder, takes place on a desktop computer (Figure 7.5)⁸: it acts as a standard folder, but users can decide to set an expiration date for it. After the expiration date, the folder will be automatically deleted. The second, Temporary App, takes place on a smartphone (Figure 7.6).⁹ In this case, users can install a mobile application temporarily (e.g., for two weeks). At the end of the preset period, the application will be automatically uninstalled.

⁸ Temporary Folder’s video is available in the supplementary materials to the thesis.
⁹ Temporary App’s video is available in the supplementary materials to the thesis.

Figure 7.5: When creating a Temporary Folder (top) users can pick an expiry date for it (bottom, left). After the expiry date (bottom, right), the folder is automatically deleted.

These temporary concepts fall within the prospective side of the design space, unlike the previous two. Their radical take on automatic deletion was meant to stimulate discussion among our participants about its perceived social acceptability. The purger and frugal behavioral styles (Chapter 6) were a key influence on these two concepts. We imagined that Temporary Folder and Temporary App could provide support for purging habits and the need to prevent data accumulation in the first place. The preventive nature of these concepts could also help overwhelmed users and could support routine decluttering practices (Chapter 5), helping people set prospective routines.

Several commercial products also use automatic deletion in specific contexts. As an example, the messaging application Telegram¹⁰ allows users to set an expiry date for photos, videos, and other files exchanged with contacts. If they do not
If they do notaccess them for a set period of time they are removed from the device (they are still10https://telegram.org111Figure 7.6: When installing a Temporary App users can pick an expiry datefor it. After the expiry date, the app is automatically uninstalled.on Telegram’s cloud though, so they are not completely deleted). Snapchat11 hasinstead popularized the notion of emphemeral information as a default. The notionof different information lifespans also goes back to an early study of desktop usagethat identified three types of information [14]: ephemeral, working, and archived.More recently, Murillo et al. [202] explore the potential of data expiration for sup-porting users’ deleting decisions: one participant in their study mentions the ideaof an email folder that allows users to set an expiration date 12. Temporary Folderand Temporary App explore this idea in two specific contexts: desktop files andmobile applications.7.4.4 Future FiltersThe final concept, Future Filters, is a mobile application that allows users to decidewhat to do with data in the future creating set of rules or filters 13. For example,“delete selfies and downloads that are older than two months when my free space is11https://www.snapchat.com12The study by Murillo et al. was published in 2018, at the same time as we were developing ourconcepts. We were not aware of it when we designed the Temporary Folder concept or when writingthe paper that this chapter is based on. It is nice to see the same idea in two independent studies.13Future Filters’ video is available in the supplementary materials to the thesis.112below 20%,” (Figure 7.7) or “archive shared documents not looked at in 2 years,”and so on. 
Filters use a set of actions (e.g., delete, move, archive, remind me), criteria (size, use, number of copies, source of data, copied on the cloud, etc.), and triggers (a new update available, free disk space is below a certain amount, etc.).

Figure 7.7: Future Filters is a mobile application that lets users create data filters based on actions, data types, data attributes, and triggers.

This final concept has a strong emphasis on prospective decisions and mass processing of items, with a certain degree of automation. The disciplined attitudes to data curation that characterized both the collector and purger behavioral styles influenced Future Filters. We imagined a similar approach could support the need for feeling “on top of things” that participants reported. We also thought this last concept could provide support for triggered decluttering (Chapter 5), helping people identify categories of items quickly.

Future Filters also takes direct inspiration from If This Then That,¹⁴ a platform to create cross-application rules based on triggers, and other products or features that use automatic filters (e.g., File Juggler,¹⁵ Hazel,¹⁶ Gemini,¹⁷ or email filters in Gmail).

We use Future Filters to further explore the viability of prospective decisions, probing on what actions might be more acceptable when considering automation.

7.5 Elicitation study

We used our set of design concepts in an elicitation study with 16 participants. We showed them the videos of the concepts during one-on-one interview sessions that also touched on their general data management practices.

7.5.1 Recruitment and participants

We used purposive sampling to recruit a diverse sample of participants. We advertised the study on a university listing and on Craigslist in Vancouver, Canada. We used a screening questionnaire (see Appendix D) to select participants based on their age, occupation, technical familiarity, and general approach to data curation.
We used brief descriptions of the behavioral styles (Chapter 6) in the screening questionnaire as a closed-ended question (see Appendix D).

We received 177 responses to the screening questionnaire. We contacted 36 respondents and 16 agreed to take part in the study. We stopped recruiting when we reached a diverse set of participants and reactions. Nine participants self-identified as female, six as male, one as gender non-conforming. They were aged 23-71 (average: 36). Occupations included administrative assistant, engineer, HR specialist, journalist, photographer. In terms of general approach to data curation, four largely had a collector approach, five felt overwhelmed, three tended to purge items, four tended towards a casual approach. We note that this was the general self-reported approach, but during the interviews participants elaborated on their approach, displaying some differences among data types and more nuanced behaviors. A few participants also displayed a frugal approach in addition to the main approach they had selected.

¹⁴ https://ifttt.com
¹⁵ https://www.filejuggler.com
¹⁶ https://www.noodlesoft.com
¹⁷ https://macpaw.com/gemini

7.5.2 Procedure

The study sessions consisted of: 1) a short introductory interview on general data management practices, 2) a main elicitation section going over each of the design concepts, 3) a final, longer semi-structured interview discussing and comparing all the design concepts and the ideas behind them. In the introductory interview, we asked participants to discuss how they organized and selected their data over time and on different platforms or devices, asking them to show us examples where possible. Then, for each concept, we first gave a short introduction and then showed the video. After each video, we asked whether something was not clear, providing printouts of the concepts.
Then, we probed on participants’ first impressions, asking them how they felt when watching the video and what they felt about different aspects of the concept. Following Odom et al.’s approach [214], in the final interview, we asked participants to reflect on and across their experiences of all concepts. We asked them to pick the most or least valuable for them, discuss the most positive or negative aspects across all the concepts, and elaborate on the ideas behind them based on how they would fit their needs and experience. Participants used the printouts of the systems to compare and contrast them. One member of the research team conducted all interviews, in English, at our university. Interviews lasted between 37 and 70 minutes (on average: 49 minutes) and were audio-recorded. Participants received $15 as compensation.

7.5.3 Data analysis

To analyze the data, we used Braun and Clarke’s approach to thematic analysis [61]. We transcribed participants’ answers and started analyzing them inductively using open coding. Then, we grouped codes into categories and developed themes across categories. One member of the research team coded the data and discussed the themes and interpretations with other authors during multiple meetings. We also categorized participants’ reactions to each concept as positive, negative, or mixed, and compiled a list of the most and least valuable concepts for each participant.

7.6 Thematic analysis results

In this section, we present the results of our thematic analysis. In general, participants appreciated the idea of getting help in selecting data: they saw it as an important but often challenging task. However, there were striking differences in how they reacted to the concepts (Figure 7.8 provides an overview of participants’ individual reactions).
For example, reactions to Temporary Folder ranged from enthusiastic (P15: “I like that one, a lot!”) to perplexed (P16: “Why would one want a temporary folder?”) and terrified (P1: “I would be terrified to put something in a folder that’s going to be deleted!”). In the first two themes of the analysis, we contrast diverging opinions on what role technology should have in supporting selection practices: some participants preferred to retain full control of the process (Theme 1). Others, instead, welcomed automation and felt comfortable in offloading selection tasks to technology (Theme 2). Then, we synthesize a middle ground between the two different stances in participants’ overall desire for a contextual approach and how this leads to a new perception of keeping and discarding actions (Theme 3). As in Chapter 4, this analysis provides an abstracted, high-level synthesis of cross-cutting individual reactions. The first two themes we present might give off the impression that participants expressed one or the other opinion, but their attitudes and reactions were contextual, as Theme 3 details.
In Chapter 6 we already connected these high-level themes to different behavioral styles.

P    Patina    Recommender  Temp Folder  Temp App  Filters   Most valued concept, least valued concept
P1   mixed     mixed        negative     mixed     mixed     Patina Temp Folder
P2   positive  mixed        positive     positive  positive  Filters Temp Folder
P3   positive  mixed        mixed        positive  mixed     Filters - but retrospective Filters Temp Folder
P4   positive  negative     mixed        positive  negative  Temp App Filters Recommender
P5   mixed     mixed        mixed        positive  positive  Temp App Filters Patina
P6   negative  positive     mixed        positive  positive  Filters Recommender Temp App Patina
P7   positive  negative     mixed        positive  positive  Patina Temp Folder
P8   mixed     negative     positive     negative  mixed     Temp Folder Temp App
P9   positive  positive     negative     positive  positive  Temp App Temp Folder
P10  positive  positive     positive     mixed     positive  Filters Patina Temp Folder
P11  mixed     positive     negative     positive  mixed     Recommender Temp Folder
P12  positive  negative     positive     positive  negative  Temp Folder Temp App Recommender
P13  negative  positive     negative     negative  positive  Recommender Filters Temp App Patina
P14  negative  negative     negative     negative  positive  Filters Temp App
P15  mixed     mixed        positive     positive  negative  Temp Folder Patina Recommender
P16  mixed     negative     negative     mixed     negative  none all

Figure 7.8: An overview of participants’ varied reactions to the five concepts we created (here abbreviated as Patina, Recommender, Temp Folder, Temp App, Filters).
In the first column, the participant number; in the next five columns, participants’ general reactions to a concept categorized as positive, negative, or mixed; in the last two columns, the most and least valued concepts for each participant.

7.6.1 Theme 1: Selecting data is a personal responsibility

The first theme captures opposing reactions towards support in selecting data. Some of the participants were generally against automation, mass processing of items, or aggressive systems. Their reactions to the design concepts highlighted a need for control, a sense of responsibility for their data, and a strong desire for doing things in a specific way (the “right way” [288]), all on their own.

Wanting full control

For instance, P11, a professional photographer who managed thousands of photos between her phone, laptop, and an external hard drive, made it clear that fully automatic tools would not work for her because they crossed an important boundary. She felt that having full control over data and the selection process was essential. Her career depended on properly managing digital data, with no room for mistakes: “I’ve heard the horror stories of photographers not backing up data properly and losing up a whole shoot, and, yeah, it will pretty much just ruin your reputation.” She explained how automatic tools felt intrusive and undermined her sense of control: “For data, and maybe this is my personality or work, but you don’t ever want somebody coming in and telling you ‘this is mine’. Or, ‘get rid of it’. You make the work, you wanna have control over it. That’s why I wouldn’t want something like Future Filters going through my files. [...] I really I don’t appreciate that.”

Thinking independently

Similarly, P13, who used to work as a “programmer of sorts” in a medical imaging company and often felt overwhelmed with data, highlighted that thinking independently and taking care of items without the help of technology was important to feel in control.
In her reasoning, she drew a parallel between some of the concepts and older recommender systems in Word processors (i.e., Clippy): “It’s almost like, you know–there were Word processors that tried to think for you. ‘Oh, it looks like you’re writing a document, let me do such and such.’ And I’m like so mad. I did a lot of Word processing. And I know what I want, I know the spacing I want, I know the editing I want. I know what I’m trying to do and this nuisance tries to think for you. I don’t like that. [...] Personally, I would like more power and control myself.” After seeing Temporary App, she added that automatic tools made her feel lazy, hinting once again at the idea that selecting data is a personal responsibility: “It’s kind of like, you feel lazy. Because how hard is it and throw it into the delete stuff? Are we all that busy that we need [this]?”

Distrusting technology

Underlying many statements from participants was a sentiment of distrust towards technology. For example, participants questioned how the machine learning from Data Recommender would work and whether it would learn the “wrong things.” Similarly, they feared that any function where they did not have full control would eventually go wrong. The result was a lingering feeling of uneasiness. “I tend to like the ones that remind you or prompt you vs automatically doing it for you,” explained P7, an HR specialist who did not trust any cloud platforms for personal data management, preferring to do things on “her own,” with a mix of collecting and purging. She articulated her preference for Patina and Data Recommender in terms of trust and comfort: “I wouldn’t feel comfortable putting in parameters and just having the technology determine for me. I’d prefer to have them notify me or go through them and choose. I liked [Data Recommender].
I’d feel much more comfortable with that vs having things automatically deleted.”

These preferences were not always a direct reflection of differences in general curation approaches (e.g., tending to keep a lot vs. keeping little). Consider the case of P15, working as an administrator in a government agency: she self-described as a “very organized” person who “doesn’t keep a lot of things,” limits her technology use, and deletes photos, videos, files, reflecting a purger and frugal approach. She was on board with discarding data, but trusted herself more than a tool: “I’m not someone who keeps everything. So I’m not at all reluctant to delete things at once, I know people who are. But at the same time, I don’t know if I would trust the computer to delete things if I haven’t reviewed them and made sure I want to delete them.” This explains why she was enthusiastic about Temporary Folder but not Future Filters: “I have control over what I’m putting in the folder. [With Future Filters], you don’t really know exactly what it’s deleting, you’re just trusting that you’re putting things in the right place so I feel there’s more potential for errors with [Future Filters].”

Changing idea and having the final say

The need to have full control and think independently informed a strong preference towards always having the final say in all keeping or discarding decisions. Being in control meant seeing recommendations as nothing more than suggestions that need approval, and leaving room to change one’s mind. When pondering prospective decisions, participants who were generally against automation felt uneasy and wondered what would happen if they changed their mind: “I think I like the concept of [Future Filters] but it’s such broad categories that you might end up deleting data and regret it.” (P8).
Anticipating regret, some participants said that keeping everything “just in case” [271] seemed a better approach, while others saw a safeguard in the possibility of controlling decisions and having a final say. For example, P9, a journalist who used to deal with large amounts of data but who wanted to be “more organized” and “have less,” explained that “Across the board, the review process is really important: before something gets deleted I should know it’s getting deleted, it should not get deleted without me knowing. And I should have the physical option of choosing to delete or not. [...] I should have the final say. [...] Sometimes you change your mind.”

7.6.2 Theme 2: Selecting data is a chore

The second theme captures reactions towards the concepts that contrast with those explored in Theme 1. In this case, participants welcomed automation and generally expressed the need for tools that would take care of things for them, freeing them from the weight of selecting data. They felt tired of the responsibility of the selection process, something they put off or were not good at dealing with, and were happy to offload the process to technology.

Being tired of taking care of data

Participants who had positive views of automation were happy about tools taking care of selecting data, a process that they perceived as tiring and relentless. For example, P14, an HR specialist who showed both collector and overwhelmed behaviors, said she “seriously” loved Future Filters and explained that it would help her deal with things she was tired of: “I am tired of organizing my information and taking care of it every 2-3 months because of the space limit.
It bothers me a lot, so if I can set the filters just once for the majority of things that bother me–and it would be pictures, videos, and music, that’s the main problem–that would be just perfect.”

Needing a push to feel the urgency

Participants noted how they needed a “push” to attend to data: “I thought the most useful [aspect of the system] was that it actually popped up. So you have to actually take action on all of the items, because that forces you to decide what to do with them. I thought that was very useful.” (P10). Selecting items was a task they wanted to engage in, but often put off: “I feel this could be a cool way to do it for me, because it’s something I put off. The last time was probably a year ago or something, so having an app do it for me would be great,” said P2, who largely had a casual approach to data. More automatic and prospective systems felt useful in creating urgency: “I think Patina [was the least valuable]. Even though it sets an indication, it doesn’t create an immediate urgency” (P6).

Desiring a proactive system

Some participants were satisfied enough with recommendations, but several had preferences for stronger intervention, expressing the desire for a proactive system that would think for them: “It’s perfect if the program can think for me in advance. [...] The [Data] Recommender is going to bother me for sure. It means the program advises me to think about something and I want the program to think in advance, give me some kind of solution.” (P14) This preference might have come down to personal style. This is how P10, a student in Education who reported constantly running out of space on his laptop, related tools such as Future Filters to his self-described “lazy” selection style: “I think it depends how organized you are. Future Filters automatically deletes without telling you what. Patina and Data Recommender remind you and you decide what to do. So, if you feel like you need that reminder and you can delete yourself, I think they would be nice.
If you feel you’re too lazy or not organized enough, Future Filters takes care of it for you. I am less organized, that’s why Future Filters is a really good option for me.”

Deciding in advance to not worry later

The enthusiasm for simplifying selection extended also to prospective decisions, with participants relating the idea to their own practice: “There’s definitely things I know I don’t need. You know, pictures from the internet you want to send someone and they stay on your desktop.” (P4). Several participants preferred to decide in advance and not worry later: “I think it’s a good idea in terms of coming up with some parameters for things you know you’re not going to need in the future and it’s better to just automatically delete it and not worry about it” (P7). They perceived such options as a way to limit the constant input required for selection, a process they compared to daily chores: “[Future Filters] might be better because I’ve made a decision and then it will happen and it’s not being dependent on me being... you know, it’s like cleaning or doing dishes. [...] Your input is at the beginning and then it automatically takes care of itself.” (P13) However, as the next theme shows, these decisions still needed to have some safeguards in place or otherwise respect the context of data.

7.6.3 Theme 3: Context is key

In Themes 1 and 2, we have described a range of reactions to the concepts, with two general contrasting stances. The final theme highlights how these reactions were different but never completely polarized, because the context of data played a key role. Participants noted important differences between data types based on their nature (a document you take a lot of time to create vs.
a movie or an app, that you can always download again), their context (work data being generally more important and critical than personal data), and the device under consideration (computers being for serious stuff and smartphones being for less critical stuff). They perceived data as being always somewhere out there, in the cloud or on a device: this had both positive and negative consequences, and informed what we call a post-cloud perception of selection.

Selection decisions are contextual

A recurring thread in participants’ reactions, whether more positive or negative, was that keeping or discarding decisions are highly contextual. Thus, a concept that worked for one type of data might not have worked for others. For example, many participants drew distinctions between work and personal “stuff,” saying that they tended to be more organized and less selective with what to keep at work: “I keep everything for work related, for personal it’s different.” (P7) Similarly, they regarded data on phones as easier to discard and often less important. They perceived smartphones and mobile devices as “fluid, temporary, and more accessible,” compared to laptops, which were “serious, demanding,” and offered more places to hide things away. The difference in storage capabilities between the two types of devices was also an important factor to consider: “For the phone I periodically delete pictures I don’t want and apps I don’t use anymore. But on the phone it’s partly due to storage problems. Which is not that much of a concern with my Mac,” recollected P3 in illustrating their largely purger approach.

Exceptions to general decisions

Participants also remarked that digital data being old or unused did not necessarily mean that they would have liked to get rid of it, as some concepts suggested: “I don’t like the fact it says you haven’t used it because it might say, you haven’t used it in six months, get rid of it.
But that’s not a good idea because sometimes you save files for you future situations. Deleting files because they are older, is that a good idea? Maybe there’s a reason they should be kept.” (P16) They wanted to define exceptions to general decisions and have the option to instruct the system about any item that they might want to keep: “Maybe there’s an option to exclude certain things. Like, all photos that are older than 2 weeks, except these three. That’d be a good option to have, to be able to create exceptions.” (P10). These reactions point to the importance of marking items to keep explicitly, an action often absent from data management tools.

The cloud is as big as the universe

The contrast between different contexts and storing places was particularly evident when comparing Temporary Folder and Temporary App. Several participants explained that automatic or prospective actions were more acceptable with mobile applications because applications are not unique and are not the result of time or effort: “There’s no big risk, you can install the app again” (P5). When expanding the focus within their data ecosystem [288] and discussing the cloud, with places such as Facebook, Google Drive, or Dropbox, participants noted how the change in context changed their attitude, hinting that selection would be less necessary in these storing places: “I never delete [from Facebook] because I imagine their storage is as big as the universe.” (P6)

Data is always somewhere out there

The key role of cloud platforms and the interconnected nature of data ecosystems informed a perception of data as ubiquitous and perennial. Participants in Gulotta et al.’s study [111] saw deleting as being against the nature of digital data. Participants in our study perceived data as never truly deleted because there will always be a copy somewhere, out there.
Perhaps it is a copy on an external hard drive, maybe it is a backup on Facebook or Google, but data never really disappears unless you want it to: “We’ve talked about deleting apps on my phone regularly and things and as long as they’re backed up somewhere, if they’re deleted it’s not a big deal.” (P2). This post-cloud conception of data makes keeping and discarding decisions take on a different meaning: deleting means removing data only from one specific device or removing a specific instance, while having a copy somewhere else: “I don’t think I will put anything in Temporary Folder that I don’t have a backup for, so it’s fine.” (P8). And archiving really means moving or hiding data within a device ecosystem: “I wonder how the archive... maybe it’s like moving, it sounds like a similar function. I [archive] with my emails, but I think it goes to... it’s the same idea as moving.” (P6). Suddenly, automatic or prospective decisions are more acceptable: “Especially in the selfie scenario, you probably already posted it on Instagram or Snapchat or whatever, so there’s a copy of it already in the world, so removing it from your device it’s not a big deal.” (P5). The selection process then becomes a matter of moving data back and forth within an ecosystem and the cloud is the ultimate storage utility.

7.7 Discussion and future directions

7.7.1 Moving the design space towards personalization

The range of reactions we gained from participants supports the idea that decisions around what data to keep or discard are highly personal, as we have seen in Chapter 4 and Chapter 6. As we expected, no single solution was able to resonate with most participants. But these results show how branching into potentially controversial or radical areas of the design space can be fruitful. By inquiring into concepts that appeared risky, we were able to get a better idea of where people’s boundaries lie, what it means to cross them, and how different people may have different boundaries.
The design dimensions we explored can now be used as a generative resource to work towards new solutions. Some of the design dimensions and concepts we explored could be remixed (e.g., providing a list of filters that can feed into recommendations) and modulated to support different user attitudes. There is an opportunity for future work to further investigate this emerging space through designing, developing, and studying more personalized solutions (e.g., customizing default keeping and discarding actions or criteria for recommendations). In the following sections, we articulate some more key directions for future efforts.

7.7.2 Finding a space for automation

In our analysis, we were particularly fascinated by the contrasting attitudes towards automation. Do the strong reactions against some of the concepts mean that we failed as designers to support user needs? We think the key here lies in unpacking the underlying threads of such negative opinions and leveraging them to move towards a more nuanced approach. As highlighted in the related work, the tension between automation and user control is a long-standing issue. A key contribution of our work is revealing that some keeping decisions, under specific circumstances, can likely be automated. In particular, promising initial contexts to pursue automation in design interventions are mobile devices with limited storage space, media files, and distributed data (i.e., data that is not unique or otherwise re-accessible). In other cases (e.g., different devices and types of data), there still is a space for automation, but only with proper safeguards. This principle extends to other dimensions of the design space, as we outline next.

7.7.3 Safeguarding automatic and prospective decisions

Our analysis reveals that there can be a space for both retrospective and prospective actions, manual and automatic, and open-ended or specific.
However, it is essential that future design interventions synthesize these extremes so that any action is reversible and any potential risks are mitigated in advance. This suggests an opportunity to investigate how to design effective safeguards for automatic and prospective actions. The easiest way to design safeguards would be to provide reminders before automatic or prospective actions, something that several participants asked for. Another approach would be to simply promote softer actions over the more radical concept of deletion (e.g., moving, trashing, hiding). Yet another opportunity would be to see the perceived risk and anticipated regret that come up in participants’ words as explicit components of the decision process. For example, systems could visualize a history of prospective or reverted decisions (e.g., how many times a document was marked for trashing and then reverted, or how many times a mobile app was uninstalled and then re-installed over the course of a period of time). Similarly, efforts along the other dimensions of the space could focus on letting users explicitly define potential risks and regrets and then evaluate them at a later point in time.

7.7.4 Rethinking keeping and discarding actions

When reacting to the concepts and the actions they afforded, some participants struggled with understanding what an archive is. For others, archiving was the same as moving or hiding. Similarly, participants reported deleting practices rooted in the importance of context and the availability of a multitude of platforms and devices. Sas et al. [249] argue that “deletion is a crude binary process,” while Ramokapane et al. [240] highlight how cloud platforms in particular provide poor deletion models. We agree that a binary representation of data as either present or deleted does not reflect the majority of our participants’ mindset.
This argument ties to prospect theory [145] and can further explain why keeping decisions are so challenging: if deletion is a binary process, it is more difficult to balance risks and gains. Yet, in most tools deleting is the default discarding action. Although our results show that crude deletion is welcomed in some cases (e.g., with mobile applications), moving towards a mitigated process of deletion might be the way forward to support different contexts. This idea resonates with work by Harper et al. [123] who argue for rethinking actions about owning, copying, and deleting data. Based on these implications, we see two possible directions to follow. The first is a design-focused effort in the line of work by Lindley et al. [167] to explore and define a new grammar of actions around keeping decisions. Harper et al. [123] propose “eradicating” or “withdrawing” files from the cloud, while Bergman et al. [29–31] show examples of “demoting.” This set of actions could be extended to include mirroring (for storing a copy of data from a central repository only temporarily), distributing (to disseminate copies of data around many storing places), warranting (to authorize automatic tools to act only on specific items), locking (to mark items as protected from any discarding action). Providing options for discarding actions to be granular rather than binary will also allow for more personalized solutions. For example, some users might choose “crude” deletion, while others might opt for demoting or distributing items as their primary selection action. A second, research-focused effort would be to further disambiguate actions as used in interfaces and perceived by users with a taxonomy. This would follow the example of Watkins et al. [299], who categorize very specific different types of digital “collections” and collecting practices. Research often refers to the old metaphor of an archive to study and discuss people’s practices.
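To make the idea of granular rather than binary discarding actions more concrete, the following minimal Python sketch shows one way such a grammar of actions could be modeled, together with locking and warranting. It is only an illustration under our own assumptions: the names `DiscardAction`, `Item`, and `apply_action` are hypothetical, and the sketch does not describe any existing tool or prototype.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DiscardAction(Enum):
    """A few discarding actions named in the text (the selection is illustrative)."""
    DEMOTE = "demote"
    WITHDRAW = "withdraw"
    DELETE = "delete"


@dataclass
class Item:
    """A hypothetical data item; all field names are assumptions."""
    name: str
    locked: bool = False      # "locking": protected from any discarding action
    warranted: bool = False   # "warranting": automatic tools may act on this item
    history: List[DiscardAction] = field(default_factory=list)


def apply_action(item: Item, action: DiscardAction, automatic: bool = False) -> bool:
    """Record a discarding action if it is allowed; return True when applied."""
    if item.locked:
        return False  # locked items are never discarded
    if automatic and not item.warranted:
        return False  # automatic tools need an explicit warrant
    item.history.append(action)
    return True


selfie = Item("selfie.jpg", warranted=True)
report = Item("report.pdf", locked=True)

print(apply_action(selfie, DiscardAction.DEMOTE, automatic=True))  # True
print(apply_action(report, DiscardAction.DELETE))                  # False
```

Keeping a per-item history of actions is a deliberate choice in this sketch: it would let a tool count how often a decision was made or reverted, in the spirit of the safeguards discussed in Section 7.7.3.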
It seems like apt timing to take more seriously the question of what an archive is in the post-cloud age, and whether this is the best metaphor to use.

7.7.5 Taking steps towards active data privacy protection

A tangential but important issue that came up in our exploration was the topic of privacy and security. This was not our focus, but we inevitably touched on it. Participants discussed how the concepts could work for managing privacy and security, both on their devices and in the cloud. Their attitudes varied. In general, they perceived the concepts as acceptable if they came from trusted brands, were officially part of the operating system, or were looking at data on devices more than on cloud platforms (possibly because cloud platforms are “curated through use” [315]). But often participants noted how keeping decisions are more delicate and consequential when it comes to privacy. Nudges, reminders, and prospective actions could prevent unwanted issues or “leaks” of sensitive data.

These results resonate with the public’s desire for more control over data [176] in the face of recent data scandals [47, 107]. The advent of the cloud has imposed a centralized data management model where a few corporations (Amazon, Apple, Google, Facebook, Microsoft) aggregate the large bulk of people’s data. But as Mortier et al. argue [198], this approach is “fundamentally flawed” and as the importance of digital data continues to grow, it faces increased scrutiny. We argue that a contextual and user-driven approach to keeping decisions in the cloud can be a concrete step in protecting privacy. In particular, our ideas around safeguarding and rethinking keeping decisions can be extended to a privacy-oriented mindset to provide more control to users. There is an opportunity for future work to target similar ideas exclusively around data privacy management and better explore people’s attitudes. This approach would fall in line with recent work on “design workbooks” for privacy [312].
Another possibility in this domain is to study how to use similar concepts for data created about people (e.g., advertising data). Opportunities include allowing users to create temporary advertising profiles or review and discard any tracking data that companies have on them.

7.7.6 Reflecting on the broader impact of our work

Finally, we reflect on the broader impact of our work for individuals and society [125]. On one side, we hope to inspire positive change in the space of data management and selection, pushing towards a nuanced approach and truly user-centred designs. At the same time, we see how some of the concepts we propose might lead to unintended consequences and be abused to further centralize data management and restrict users’ freedom. For example, Temporary Folder and Temporary App could be used to restrict users’ access to their own cloud-stored data, imposing a subscription model on items that they perceive as their own. Indeed, we see this trend already emerging in several software applications, as P11 lamented when discussing her use of the Adobe Creative suite. Business needs drive these decisions, yet this approach may contribute to eroding people’s sense of ownership and agency in relation to their data. This, in turn, feeds their general distrust of technology. We argue for an alternative, de-centralized data management model, where users’ control is key, privacy comes first, and management actions are context-based. New regulations such as the European Union’s GDPR (General Data Protection Regulation) provide a first step towards more ethical practices in the space of personal data, but regulatory efforts need to be complemented by design changes. It is up to us, as researchers and designers, to ensure that the needs we discuss around personal data are met and that people’s boundaries are respected.

7.8 Limitations

Some limitations in our study point to additional future work.
First, while our sample is meant to be generative and varied in terms of occupations and data management styles, participants had a predominantly Western background. This limitation is an opportunity for future work to focus on participants from different cultures, to see if and how attitudes change. We also did not screen participants for more general attitudes around decision making or psychological traits, because finding correlational links was not the goal of this study. But there might be individual differences in risk aversion and risk seeking that inform people’s attitudes in deciding what data to keep or discard. A previous study by Massey et al. [187] links personality traits to differences in file management behaviors. Similar efforts along this line can complement our work.

7.9 Summary and conclusion

Drawing on previous work on personal data management, we created five concepts to explore a design space around keeping and discarding decisions. By probing different design dimensions, we elicited contrasting attitudes about the role of technology in supporting decisions, finding a common ground in the need for nuanced and contextual support. Our work in this chapter opens possibilities for new tools that, with proper safeguards, have the potential to help users better select what personal data to keep or discard.
We see this as a critical step in addressing our post-cloud world that is overflowing with data.

In the next chapter, we will explore how to include ideas and implications from this study into a cohesive, centralized system, with a focus on personalization.

Chapter 8
Study 4
Evaluating a Personalized Approach to Personal Data Curation

In this final research chapter¹, we build upon the results of all previous chapters. We use the insights on preservation tendencies (Chapter 4), the five curation behavioral styles (Chapter 6), the taxonomy of decluttering criteria (Chapter 5), and the insights on selecting data (Chapter 7) to build Data Dashboard, a cohesive, personalized system that combines different design ideas from across the dissertation. We then use a prototype of the system to explore how this approach can support people’s curation behaviors. Our results are promising and provide support for much of the work that came before, but they also leave space open for future work in this area.

¹ Published in Vitale, Chen, Odom, and McGrenere. (2020) Data Dashboard: Exploring Centralization and Customization in Personal Data Curation. Proceedings of the 2020 Conference on Designing Interactive Systems (DIS ’20) [292]

8.1 Introduction

Throughout the dissertation, we have shown how deciding what personal data to keep or discard can be difficult. Selecting what to keep is a contextual choice and data management tools often offer poor support for this practice, making personal data curation a challenging process. In this chapter, we focus on what we think are the two most pressing challenges that complicate user practices and possible design efforts that might mitigate them.

The first challenge for personal data curation is the growing number of devices and cloud platforms that people use [288].
The distributed, often fragmented nature of personal data undermines awareness of ownership, making curation even more difficult: it becomes hard to curate data if you do not know what you have [213]. One approach to address this issue is centralization, which we define as providing an overview of data from different places in a single central tool for increasing awareness of what a user owns [213, 215]. Our work asks the following questions: Can centralization help people decide what personal data to keep or discard? If so, how should we approach the design of centralized tools? With cloud platforms such as Google Drive and Dropbox moving towards centralization by encouraging users to “sync” all their data to the cloud as a backup, investigating this question can produce insights that will help us understand the consequences of a similar approach.

The second challenge for building curation tools is the subjective nature of personal data management and curation [22, 219]. It is difficult to create a solution that can satisfy different types of users who have different management styles and curation approaches [150]. Our work in previous chapters details the subjective nature of keeping and discarding decisions. A common design approach to deal with individual differences is to turn to personalization or customization: the ability to tailor a system to specific users’ needs and characteristics. From this comes our second set of research questions: Given the personal and subjective nature of data curation, can customization help? How desirable is a personalized approach to data curation? And what are the aspects that make it more or less desirable?

To answer these questions, we designed Data Dashboard, a centralized and customizable system for curating personal data. We evaluated an interactive prototype of Data Dashboard with 18 participants who had different approaches to data curation, asking them to go through five potential use scenarios.
Drawing on previous work around digital data, we use the concept of data boundaries (the idea of conceptual lines that prescribe where to store personal data and how) to understand participants’ reactions to centralization and customization. We show that centralization blurs boundaries and introduces a dilemma around privacy and security, requiring explicit safety guarantees. Customization, on the other hand, is easier to accept because it upholds boundaries. We discuss what these results mean for future data curation tools.

In this chapter, we make three contributions: 1) we provide additional empirical evidence for the role of data boundaries in personal data curation and use it to understand participants’ reactions to design choices (specifically, we show reactions to centralization and customization); 2) we offer an approach to address key challenges in data curation by designing a unified tool with personalized functions; and 3) we outline design and research directions for future data curation tools focused on integrating data boundaries into design, rethinking the language of personal data, and envisioning a post-cloud future.

8.2 Related work

The related work we discuss in this section falls into three main areas: (i) previous PIM system studies; (ii) data boundaries; (iii) personalization and customization. Additional related work appears in the prototype description.

8.2.1 Previous PIM system studies

Several PIM studies propose and evaluate new tools for managing data. For example, some systems focus on management and retrieval of documents [69, 79, 82, 90] or files and folders on desktop computers [29, 143]. Others explore how to manage and curate photos [32, 208, 245, 320], contacts [30], or emails [19].
A related strand of work explores automation in the context of task management [119] or cloud file systems [34], showing that it can help people in their management tasks. However, attitudes towards automation for data curation show individual variation, with some people opposing automation (as we show in Chapter 7).

The majority of past design work in PIM takes a quantitative approach for evaluating prototypes, using lab experiments or usage log analysis in field deployments. These studies largely look at time on task and similar metrics for measuring success. A few studies, instead, use a more qualitative approach, focused on teasing out participants’ attitudes. In our work, we use a qualitative approach in line with Research through Design [317] and reframe design artifacts in Chapter 7 and Chapter 8 as tools to understand broader aspects of data curation. This approach falls in line with past work on digital data that uses systems to prompt discussion with participants [111, 112, 212, 232]. Below, I expand on my methodological approach throughout the dissertation.

8.2.2 Data boundaries

Past work points to the idea of data boundaries as a key lens to understand data curation. At a high level, we define data boundaries as conceptual lines that prescribe where to store personal data and how. Boundaries appear in studies on data management [219, 288, 301], collaboration [293, 295], communication [49, 51, 207], privacy [2, 15, 225, 231], and personal possessions (both physical [17, 58, 77, 270] and digital [154, 166, 211, 299]). We can argue that data management and curation are essentially about establishing and negotiating boundaries.
This process has roots in cognitive models of how the human mind works: to make sense of a continuous world, people build categories and “draw mental boundaries around them” [219].

Research on digital data often looks at the contrast between physical and digital possessions, exploring the boundary between the two domains [108, 111, 140, 149, 153, 211, 215, 224]. The notion of boundaries runs throughout studies on physical possessions, helping us understand how people might experience them. For example, people might use boundaries to mark the unique character of their spaces [77], give meaning to clutter [270], and establish what belongs in their home [58]. These are all practices that help in building an identity [17]. Boundaries also exist in the digital world. Sometimes they are explicit, as is the case when people use folders to create structure in their data [301] or when they build collections with clear criteria for what goes in and what does not [299]. But more often data boundaries are implicit. They are influenced by tools and applications [19, 166, 288, 293], work and personal life [49, 51], group and family relationships [154, 207, 288, 296], activities [295], and context (Chapter 7). A key boundary to understand participants’ reactions to centralization is the one around privacy. Previous studies argue that privacy management is at its essence about negotiating personal boundaries between the public sphere and the private sphere [2, 225, 231]. Privacy boundaries are “permeable” and “murky” with context playing a key role in defining and shaping them [2]. They are subjective and dynamic, evolving over time [15, 225, 231].

Our work provides additional evidence for how boundaries drive personal data curation and shows how to use the concept to understand reactions to centralization and customization.
We did not have in mind the concept of boundaries when we designed Data Dashboard or the study, but we identified it in the analysis and then traced it back to past work.

8.2.3 Personalization and customization

Broadly speaking, personalization is about creating a system that is adapted to the user’s individual preferences and characteristics [12, 36, 119, 190, 193, 269]. There is no standard definition of personalization, but previous work highlights different levels of user involvement [36, 190, 193, 269]. In purely “adaptive” system-controlled personalization, the system does not directly involve users when deciding how to change the content, interface, or functionality [269]. System-controlled personalization is largely implicit. Customization, by contrast, is the term used for user-controlled personalization, also sometimes referred to as “adaptable”: the user is responsible for changing the system through explicit actions. A middle ground approach involves mixed-initiative systems, where personalization is system-initiated but needs to be “approved” by the user [46, 121, 134].

Regardless of level of user involvement, personalized changes in the system can take place at different levels: user interface (layout), content, and functionality. Changes in the user interface include, for example, changing colors, fonts, backgrounds, visible buttons, and so on. These changes are largely concerned with aesthetics [314]. Changes in content are relevant in information-centric systems: they involve showing or hiding specific content based on different user needs. For example, social network feeds show different content to different users [190, 269]. Finally, changes in functionality are about changing how the system works [119, 314]. Examples include creating macros in spreadsheets, extending browser functionality with extensions, or changing system settings. This approach is often labeled “advanced customization or personalization” [119].
In this study we focus on content and functionality changes.

Personalization can help users complete tasks and provide benefits if done well [46, 92, 106], but there are factors that influence and limit its benefits. In general, users seem to prefer mixed-initiative interfaces over purely adaptable interfaces [46], and adaptable interfaces over purely adaptive interfaces [193]. But the time and knowledge required for enacting customization often prevent people from enjoying its benefits [174]. Individual differences [193] and factors such as exposure, awareness, and social influence also determine whether users will customize or not [12, 92]. Several PIM studies have looked at automating information organization, with past work [121] (and Chapter 7) providing a more detailed overview of related work in the area. However, only a few studies have looked explicitly at personalization or customization mechanisms for data management [119] and curation [208, 245], suggesting that this is a promising but underexplored approach.

8.3 The Data Dashboard prototype

We designed Data Dashboard to address the two key challenges of personal data curation outlined in the introduction: the fragmented nature of data across devices and the subjective nature of curation.

8.3.1 Overview of the prototype

Data Dashboard is a centralized system that provides an aggregated overview of data stored on different devices and cloud platforms (e.g., Dropbox, Google Drive, iCloud). The system provides an overview of data and a set of customizable filters for curating data. There are four sections: Activity, Explore Your Data, Quick Actions, Settings.²

² The prototype is online at https://datadashboard.github.io. It works on all desktop browsers, but has some bugs in Safari. Chrome is the best browser to use it with. The prototype is not optimized for mobile devices. We note that the prototype is only an illustration of possible functions: it does not connect to any device or cloud platform. A video walkthrough of the prototype is available in the supplementary materials to the thesis.

Figure 8.1: Activity shows an overview of recent data.

Activity (Figure 8.1) provides an overview of recent data (Google Drive, Dropbox, and macOS provide similar views of recent files). For example, for a given day, it shows documents that users have created or edited, photos and screenshots they have taken, apps they have installed, and so on. Users can filter their activity by time (today, this week, this month, all time, custom period). Activity also has a section aggregating data shared with other people on collaborative services (e.g., Dropbox, Slack), grouped by the person it was shared with.

Explore Your Data (Figure 8.2) shows an overview of all data users have on different devices and platforms, grouped by type of data (e.g., photos, emails, text documents, videos, and so on)³.

³ The complete list of data types we considered is visible in Explore Your Data after clicking the “Sort or hide the types of data you see” button. It includes: photos, screenshots, emails, text documents, presentations, spreadsheets, videos, audio, folders, contacts, applications, messages, cache and logs, bookmarks, ebooks, tracking data, browser history, passwords, notes and reminders, games. This is a comprehensive but not exhaustive list of types, meant to be a starting point in our exploration. Future work could extend our approach to a wider range of types (e.g., digital traces collected unintentionally by technology, such as the time spent on a device, clicks, queries).

Figure 8.2: Explore Your Data shows an overview of all data from different devices and cloud platforms. Users can use filters to see different categories of data for each data type.

Users can sort or hide the different data types displayed.
They can also filter what the system shows them, choosing a filter on the left: “Data I could get rid of,” “Data I could organize,” “Private information,” “Data to back up or sync.” Users can customize how the sidebar filters work in the Settings section.

Quick Actions (Figure 8.3) provides a list of recommended actions for different data items. For example, the system suggests duplicates to remove, documents to rename, or message conversations to archive. As in Explore Your Data, users can filter the recommendations using the sidebar filters (same options as in Explore Your Data). Users can also choose to apply suggested actions automatically to similar items in the future.

Figure 8.3: Quick Actions has recommendations for curating.

In Settings (Figure 8.4), users can customize how the filters for Explore Your Data and Quick Actions work. They can choose to include or exclude some default combinations of data types and criteria or create new, custom combinations with different data types and criteria. They can also add or remove connected devices and cloud accounts.

8.3.2 Rationale and design

Data Dashboard draws on insights from the previous chapters and past work on personal data management [166, 213], while also combining and extending key aspects of existing commercial and research products. The visual design of the system takes direct inspiration from Google Dashboard⁴ and similar privacy dashboards. We wanted Data Dashboard to feel similar to existing systems so that participants could better imagine when to use it. Still, we had to build our own system rather than use existing tools to explore how centralization and customization could work together.

⁴ https://myaccount.google.com/dashboard

Figure 8.4: The Settings page allows users to customize data filters.

The idea of a centralized tool

One key inspiration for Data Dashboard is work by Odom et al. [213, 215] on cloud platforms and the need for more awareness of digital possessions. Odom et al.
suggest creating a “visual inventory” of digital possessions, “a place where ‘my stuff’ can be found, even if, in technical terms, it exists on many different servers, or many applications.” It would be a place from which to quickly go back to where data is originally stored, preserving its context. Lindley et al. [166] explored a similar premise and found that a centralized web archive might not be the ideal solution for users. Data Dashboard, however, is not a central archive in itself, as it only provides links to data stored on different devices and cloud platforms but does not copy any data from other storing places (technically, Data Dashboard would rely on metadata provided by devices and cloud platforms to provide an aggregated overview of different items). Instead, we see it as a tool to curate a “meaningful archive” [183] within the systems that people already use to store and manage data.

Some past PIM projects also explore the idea of centralizing data [69, 82, 143]. But they focus on retrieving data, rather than on helping users curate and decide what to keep or discard. Cloud platforms such as Google Drive and Dropbox, instead, are moving towards centralization by encouraging users to “sync” all the data from their computers to the cloud as a backup [81, 246]. There are also commercial products that offer to unify separate cloud accounts, such as odrive⁵ and MultCloud⁶, but Data Dashboard has specific novel functions for curation (e.g., the data filters and their customization).

Providing a dashboard

A key function of Data Dashboard is to provide an overview of personal data by showing numbers for different categories (e.g., 49 blurry photos). Two systems inspired this approach: 1) Cardinal [75], a research tool that scans a user’s computer and provides counts for the total number of files, breaking down numbers for some popular file types (e.g., photos). We wanted to use the same approach in a user-oriented system.
2) Google Dashboard⁷, a privacy-oriented page that provides an overview of data created and stored in different Google products [233]. (Yahoo⁸ also provides a similar interface for managing privacy settings, and work on GDPR compliance also proposes the idea of a dashboard [241].) Data Dashboard is similar to these tools (Figure 8.5) but extends to entire personal data “ecosystems” [288], bringing together data from more than a single device or platform.

⁵ https://www.odrive.com
⁶ https://www.multcloud.com
⁷ https://myaccount.google.com/dashboard
⁸ https://yahoo.mydashboard.oath.com

Figure 8.5: The design of Data Dashboard took inspiration from similar dashboards available online. Here, two key examples: Google Dashboard (top) and Yahoo’s privacy dashboard (bottom).

Our work in Chapter 7 and “cleaning” tools such as Files⁹, Clean My Mac¹⁰, and CCleaner¹¹ inspired the automatic recommendations for data to curate in Quick Actions. However, we wanted Data Dashboard to be more comprehensive than similar tools. Most of these tools focus on freeing up space by finding items to discard based on their size. Instead, we wanted to help users go beyond freeing up space, and provide them with a broader set of actions.

8.3.3 Personalization mechanisms

Data Dashboard provides both content customization and functional customization. In terms of content, users can customize what data types they see and their order in Explore Your Data (Figure 8.6). In terms of functions, users can personalize how the sidebar filters in Explore Your Data and Quick Actions work, deciding what the system will show (Figure 8.4). This approach is in line with work on advanced personalization for task management [119]. The four options in the sidebar filters (“Data I could get rid of,” “Data I could organize,” “Private information,” “Data to back up or sync”) reflect different behavioral patterns and goals that people might refer to when curating data, based on the insights we presented in previous chapters.
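
As a rough illustration only, the functional customization described here (sidebar filters defined as combinations of data types and criteria, applied to metadata aggregated from several places) could be modeled as in the following Python sketch. It is not the prototype’s code and rests on our own assumptions: the names Item, Filter, and overview, the criterion tags (“blurry,” “duplicate,” “old”), and the sample items are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """Metadata about one item; the data itself stays where it is stored."""
    name: str
    data_type: str                   # e.g., "photos", "text documents"
    source: str                      # e.g., "Dropbox" or a device name
    tags: frozenset = frozenset()    # hypothetical criteria, e.g., "blurry"


@dataclass
class Filter:
    """A sidebar filter as a user-editable set of (data type, criterion) rules."""
    label: str
    rules: set = field(default_factory=set)


def overview(items, flt):
    """Count items matching each rule of a filter, across all sources."""
    counts = Counter()
    for item in items:
        for data_type, criterion in flt.rules:
            if item.data_type == data_type and criterion in item.tags:
                counts[(data_type, criterion)] += 1
    return counts


# Made-up sample metadata from several devices and platforms.
items = [
    Item("IMG_001.jpg", "photos", "iCloud", frozenset({"blurry"})),
    Item("IMG_002.jpg", "photos", "laptop", frozenset({"blurry", "duplicate"})),
    Item("notes.txt", "text documents", "Dropbox", frozenset({"old"})),
    Item("report.docx", "text documents", "Google Drive"),
]

get_rid_of = Filter("Data I could get rid of",
                    {("photos", "blurry"), ("text documents", "old")})
before = overview(items, get_rid_of)

# Customization in Settings: the user excludes one default combination.
get_rid_of.rules.discard(("text documents", "old"))
after = overview(items, get_rid_of)
```

In this sketch, a filter is just a named set of rules, so excluding a default combination or adding a custom one amounts to editing that set, while the counts shown in the overview are recomputed from metadata without copying any of the underlying data.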
In particular, “Data I could get rid of” embodies user goals from three key behavioral styles presented in Chapter 6: Overwhelmed, Purger, Frugal. “Data I could organize” is largely targeted towards the Overwhelmed behavioral style, but is also consistent with the cross-cutting need for re-organizing data in other behavioral styles (Casual, Purger, Frugal, Collector). “Private information” is targeted at the cross-behavioral style goal of protecting data privacy, with some styles (Collector, Purger) reflecting a stronger concern compared to others (Casual, Frugal). Finally, “Data to back up or sync” is largely targeted at the Collector behavioral style, with this pattern reflecting a common worry of losing data.

In addition, the three main sections (Activity, Explore Your Data, Quick Actions) target different types of users and data curation practices. Activity and Explore Your Data target users who prefer to explore data and decide on their own what to do by inspecting items individually. By contrast, Quick Actions targets users who welcome automation and expect an intelligent system to do things for them. Chapter 7 details these contrasting attitudes to data curation.

⁹ https://files.google.com
¹⁰ https://cleanmymac.com
¹¹ https://www.ccleaner.com

Figure 8.6: The panel to sort or hide data types in Explore Your Data.

8.3.4 Implementation

We built the prototype using AngularJS and Bootstrap 3. We designed Data Dashboard as a system that would be quick to turn into a horizontal prototype, that is, a prototype where only top-level functions are implemented to communicate the scope of the whole system [21]. All data in the prototype is “fake.” This is a limitation of our approach, as previous work shows the value of using real participants’ data [111, 112]. However, we took this approach because it was easier and faster to prototype.
Not using participants’ real data also prevented any potential infringement of participants’ privacy and gave all participants the same experience when going through the scenarios. The recommendations in Quick Actions, the data types in Explore Your Data, and the number of items for each type are meant to illustrate the scope of the prototype and largely reflect common distributions of personal collections [73, 76].

8.4 Methodology

We evaluated Data Dashboard in a user study with 18 participants. In the evaluation, we collected participants’ opinions about the Data Dashboard interface, the key ideas behind it, and possible scenarios of use. This specific approach to Research through Design aims to use design artifacts and devices to frame and open prospective conversations with participants, as previous studies show [111, 212, 232, 317]. This approach also explains the relatively minimal, under-designed nature of the prototype: we did not design a costly, high-fidelity prototype because our goal was a design exploration focused on eliciting participants’ reactions. Before recruiting participants we ran four pilot sessions to check for potential issues in our study procedures. We also gathered feedback from other lab members throughout the iterative development of the prototype, going from paper sketches to the interactive version we implemented.

8.4.1 Participants

We recruited participants using a university recruiting list and Craigslist in Vancouver, Canada. We used a screening survey where we asked participants their age, occupation, main approach to data curation (based on Chapter 6), and what devices, cloud platforms, and specific data management tools they used (all from a list of popular options). (The screening survey is available in Appendix D.)

We received 169 responses to the screening survey. We contacted 38 respondents, 25 agreed to take part in the study, and we ran the study with 18 of them (12 women, 6 men, aged 18-64, median age: 33).
Participants’ occupations included accountant, background actor, business contractor, childcare provider, facilitator, occupational therapist, sales associate, social worker, student, postdoc, research manager, retired. Most participants self-reported having average technical skills and no experience in computer science or programming. We recruited for a varied set of participants who used different data curation tools, cloud platforms, and devices. Some participants did not use any tools or cloud platforms. Participants also varied in their approach to curation, based on the behavioral styles we introduced in Chapter 6. We had a roughly equal distribution of participants across four behavioral styles: overwhelmed, collector, purger, casual. Only one participant displayed a frugal approach for some of their data, but we have already discussed in Chapter 6 how we expected a much smaller number of participants with a frugal approach.

8.4.2 Procedure and data collection

The data collection took place over one month, with each study session lasting between 41 and 97 minutes (average: 64 minutes). Whenever possible, two members of the research team conducted the session. One member would ask questions, while the other would take verbatim notes on a computer. In cases where only one member of the team conducted the interview, we later transcribed the audio recording. After each interview, we compiled a debrief document noting key answers and preliminary insights from the interview.

The session had three parts (each lasting 10-30 minutes): an introductory interview, an exploration and scenario-based interaction with the prototype, and a debriefing interview with two short card-sorting activities. Participants interacted with the prototype using the Chrome browser on a MacBook Air 13” laptop.
We audio-recorded the whole session and screen-recorded the interaction with the prototype. Participants received $20 in compensation at the end of the session.

Introductory interview

In the first part of the session, we had an introductory interview focused on data curation practices. First we asked participants to remember and tell us about the last time they decluttered some of their data, by reviewing, organizing, discarding, archiving, or moving several items at once. Then we asked participants to show us examples of how they organized their data on their devices, how they decided what to keep or discard, and how they used specific tools, if any, to manage or curate their data (e.g., settings panels to clean up data, Google Dashboard, and so on, depending on what they mentioned using in the screening survey and in their interview answers).

Use scenarios

In the second part of the session, we introduced participants to the prototype and let them familiarize themselves with it for a few minutes, prompting them to think aloud. After this initial exploration, we asked participants to go through five possible use scenarios with the prototype, once again asking them to think aloud.

Scenario 1 (space running out): “The space on your computer is running out. You want to find some data to discard. You are not sure where to start looking, but you know that you do not care too much about old documents.”

Scenario 2 (taking time for regular data curation): “It is a rainy day. You have set aside some time for doing a regular cleanup of your devices. You usually do this every few months. You want to review your data and make sure everything is organized in your preferred way.”

Scenario 3 (exploring recent data): “You have 5 to 10 minutes in between meetings and errands.
You decide to take a look at your recent data to get a sense of anything that needs taking care of.”

Scenario 4 (protecting data privacy): “You have heard about a data leak from a popular cloud storage platform that exposed personal information to hackers. You want to review what data you have stored on different cloud platforms that might pose a privacy risk in the future.”

Scenario 5 (safeguarding data across devices and platforms): “You are in the process of buying a new computer. You want to make sure that you are not going to lose any of the data you care about. You want to ensure that everything is stored in more than one place.”

The scenarios are based on insights from previous chapters: they consider both different decluttering strategies (Chapter 5) and different behavioral styles in personal data curation (Chapter 6). We wanted to explore how different functions in the prototype can support different scenarios and user attitudes.

More specifically, Scenario 1 (space running out) closely resembles the context behind triggered decluttering (Chapter 5) and reflects goals common to all behavioral styles, with the Overwhelmed style being a stronger target than others (Chapter 6). Scenario 2 (regular data curation) ties back to routine decluttering (Chapter 5) and largely reflects goals from the Purger or Frugal behavioral style (Chapter 6). Scenario 3 (exploring recent data) is mainly targeted towards the Casual behavioral style (Chapter 6) and also reflects a serendipitous decluttering strategy (Chapter 5). Scenario 4 (protecting data privacy) focuses on the cross-behavioral style goal of protecting data privacy. Finally, Scenario 5 (safeguarding data across devices and platforms) reflects one of the key goals for the Collector behavioral style (Chapter 6).

The scenarios were also key for helping participants focus on concrete implications of use rather than low-level details of the prototype (e.g., colors, fonts, buttons).
Some of the scenarios mention specific devices to feel concrete, but we encouraged participants to see them as a starting point to discuss additional devices and broader situations or practices.

Debriefing interview

The third and final part of the session was a debriefing interview about the prototype. Here, we asked participants about their impressions of the system and for clarifications about what they did during the scenarios, and then walked through all four sections of the prototype one by one to gather more specific feedback. Finally, we had a short card-sorting activity where we asked participants to rank the five scenarios by how relevant they were to their own experience. We prompted participants to explain their ranking, elaborate on the match between different data management methods or tools (prototype included) and scenarios, and consider in what other situations they could imagine using the prototype. Participants used small paper printouts of the scenarios. We also asked participants to rank the usefulness of the prototype against any tools they mentioned using in the screening survey or during the interview. Once again, participants used paper printouts with the names of the different tools we had prepared for them.

8.4.3 Data analysis

We used thematic analysis to develop recurring themes and patterns from the sessions [61]. The analysis process took place over three weeks. Two members of the team conducted the bulk of the analysis and later discussed the themes with the other team members. We started with a round of open-coding, where the two team members coded data in parallel, seeing each other’s codes. Then, we grouped codes into categories, and started thinking about themes and patterns across categories. Codes and categories were both inductive and deductive (based on insights from previous studies, and specific aspects or sections of the prototype). We discussed several iterations of possible themes, choosing specific areas of the analysis to focus on.
After identifying the key lens of the analysis (centred around data boundaries and their effect on centralization and customization), we went back to the transcripts and re-coded them using only the three themes to check for consistency and make sure that our interpretation fully captured participants’ experience.

8.5 Results

In this section, we first provide some context for the analysis, based on general reactions to the prototype and how participants used it during the scenarios (Figure 8.7; more details are also available in Appendix A). Then, we delve into the more interpretive part of the analysis. As in Chapter 7, the themes we present are a high-level cross-cutting synthesis of individual behaviors.

8.5.1 Contextualising information

Overall impressions

Most participants (12/18) had positive reactions to Data Dashboard, saying that it was “smart,” “intuitive,” “user-friendly,” and would save their time. Several participants also preferred the system when comparing it to other tools they had used in the past. Some participants went so far as asking if they could have the system installed on their devices after the interview.

But not all participants liked Data Dashboard. Participants who had mixed reactions (3) thought that some aspects of the system were unclear or unnecessary. Participants who had negative reactions (3), instead, said they did not need or want a tool such as Data Dashboard. Some opposed the idea of a system deciding how to curate data. Others did not see data management as something worth their time. These negative reactions are in line with previous work and support the idea that different users have different needs when it comes to personal data curation and technology support, as shown in previous chapters.
Although it was not our goal to find precise correlations among behavioral styles and reactions (the number of participants would not allow it in any case), we noticed that the negative reactions came most commonly from participants who took a collector approach. This result aligns with our expectations, but future studies with a bigger sample can better explore this possible link.

Many participants also reflected on the potential privacy risks of centralizing personal data; we expand on this theme later.

Interaction during the scenarios

Explore Your Data (EYD) and Quick Actions (QA) were the most commonly used sections of the prototype during the scenarios. Participants thought that EYD worked best for occasional, more focused scenarios and they used it more frequently in scenario 1 (getting rid of data to free up space) and scenario 2 (regular scheduled cleanup). Instead, QA would work better for short management episodes (scenario 3, taking a few minutes to look at recent data). Most participants also used Activity at some point, but several participants found it underwhelming and too similar to EYD. Several participants did not notice the Shared Data section or found it confusing. Some thought it would help them when working on collaborative projects.

All participants except one discovered the sidebar filters in EYD and QA on their own, and most used them at some point during the scenarios. Most participants found the filters comprehensive and the idea of automatically clustering data helpful.
Participants did not have clear requirements for how the system should generate suggestions or cluster data, but they expected “a machine learning algorithm that gets better with time.” (P18) Most participants also discovered the Settings page and its link to the sidebar filters but several found it initially confusing. In general, participants thought that it was a good idea to customize the filters because it gave them more control; we expand on this theme later.

ID | General reaction | Most used section | Most relevant scenario | Least relevant scenario
P3 | positive | Explore | (5) Storing on multiple places | (4) Privacy leak
P4 | negative | Explore | (3) Exploring recent data | (4) Privacy leak
P5 | more negative | Explore, Quick Actions | (2) Regular cleanup | (3) Exploring recent data
P6 | positive | Explore, Quick Actions | (1) Space running out | (4) Privacy leak
P7 | positive | Quick Actions | (3) Exploring recent data | (4) Privacy leak
P8 | positive mixed | Quick Actions | (5) Storing on multiple places | (4) Privacy leak
P9 | more negative | Quick Actions | (1) Space running out | (5) Storing on multiple places
P10 | mixed more positive | mix | (3) Exploring recent data | (4) Privacy leak
P11 | positive | Quick Actions | (3) Exploring recent data | (4) Privacy leak
P12 | positive | Quick Actions, Settings | (3) Exploring recent data | (5) Storing on multiple places
P13 | positive | Explore, Quick Actions | (1) Space running out | (3) Exploring recent data
P14 | positive | mix | (1) Space running out | (4) Privacy leak
P15 | positive | Explore | (2) Regular cleanup | (1) Space running out
P16 | positive | Activity | (2) Regular cleanup | (5) Storing on multiple places
P17 | positive | Quick Actions, Explore | (1) Space running out | (3) Exploring recent data
P18 | positive | Explore, Quick Actions | (1) Space running out | (3) Exploring recent data
P19 | positive | Quick Actions | (2) Regular cleanup | (1) Space running out
P20 | mixed | Explore | (3) Exploring recent data | (2) Regular cleanup

Figure 8.7: An overview of participants’ varied reactions to the Data Dashboard prototype. For each participant, we report the general reaction (negative, more negative than positive, mixed, more positive than negative, positive), the most used sections during the scenarios we provided in the evaluation, and the participant’s ranking of the most and least useful scenario. As noted earlier, we excluded P1 and P2 from the final analysis.

Scenario 4 (protecting privacy) and scenario 5 (safeguarding data across devices) presented some challenges for participants. When going through the privacy scenario, several participants scanned the system looking for a way to see all data stored in a specific platform (e.g., iCloud, Dropbox, Google Drive) or device (e.g., “my drive”). Often they could not find what they were looking for. But many participants also said that they prefer not to store private data on the cloud in the first place and that this scenario did not apply to them. In the last scenario, instead, many participants did not use the system and talked about their actual process of backing up files either manually or through completely automatic solutions (e.g., Apple’s Time Machine, Google Photos). Some participants also had trouble with the language used by the system, saying they had no idea what “syncing” data meant: “I know what data to back up means but I don’t know what sync means. I don’t know if they’re related, maybe.” (P16) Overall, these reactions highlight how participants saw privacy protection and backing up as processes that either take place outside of specific tools or require little input from them. Participants most commonly ranked these last two scenarios as the least relevant.

These results show how our scenario-based evaluation prompted participants to explore Data Dashboard and think about its design.
They also suggest that our approach of combining centralization and customization has potential. To unpack its value and potential risks we now turn to the more interpretive part of the analysis, where we delve into key aspects of curation, centralization, and customization.

8.5.2 Data boundaries drive curation

When we analyzed participants’ reactions and interactions with the prototype, one idea became key: data boundaries drive curation. We have briefly discussed what we mean by data boundaries in the related work section. Boundaries are an abstract concept that can explain how people enact curation of their personal data ecosystems [288]. People create implicit or explicit boundaries that separate different categories of information. These boundaries help people build their identity, mark areas of their life, and feel in control. For some people boundaries will be more malleable, for others more strict [219]. Here, we provide additional evidence for how people create, protect, and think of data boundaries. Then, we use the concept of boundaries to explain reactions to centralization and customization.

Creating boundaries

All participants implicitly talked about creating boundaries when deciding what data to store and where to store it. They mentioned rules for what goes where and why: “I have my data compartmentalized: [the] tablet is just for reading. [The] laptop is for everything school related.” (P9) A key boundary was between private and non-private information, with participants choosing where to store private information based on perceived privacy risks of different devices and cloud platforms: “I don’t put any of my private stuff on the computer at all.” (P6) Boundaries also helped participants distinguish and prioritize data based on importance.
For example, P10 (who showed a mix of collector, purger, and overwhelmed behaviors) discussed moving data from one cloud platform to another to separate expendable and essential information: “I started moving everything [from Box] to Dropbox, because I think it’s just more reliable. So everything that I can’t afford to lose, I’ll stick it in Dropbox.”

Breaking boundaries

The centralized approach of Data Dashboard prompted participants to reflect on how they often experienced breaking points in boundaries. Sometimes this was intentional. Reflecting on the suggestions and filters for “private information” in the prototype, one participant explained that sometimes it is necessary to break a boundary around privacy because of convenience: “[Having passport information on Google Drive is] not ideal. But because of frequent travel, I am somewhere and I am filling in a form and I need that information, [so] either through Google Drive or email I was able to locate it. So, it’s more about ease of access to the information that saves my time than anything else. Ideally, I don’t like to have important documents even on email, but I haven’t learned if there is a secure system to store them online.” (P17)

In other cases, boundaries were broken unintentionally. For example, after looking at the different types of data in the prototype, one participant explained the frustration of having WhatsApp photos from other people go automatically into their own photos. In this case, the boundary between “my stuff” and “other people’s stuff” was broken without permission: “I tried to figure out on this device as well, some sort of filter where you can manage whether your images from WhatsApp, a group message thread, are automatically going into your photos or not. It drives me crazy that WhatsApp’s photos go automatically into your photos.
If there’s more of a filtering system to help you organize that, that would be great.” (P12)

Boundaries influence trust

The need to create and protect boundaries also influenced participants’ trust in the systems and platforms to manage data. Several participants asked whether Data Dashboard would be associated with a specific brand because they tended to trust specific brands with their data: “Google has created credibility over years so I know I have trust in that system. [I use] Mac because of [its] ease of use but I don’t always store all my important info over there. They have a very tricky way. If you’re locked out, only they can unlock it. I don’t like that dependency on a third party.” (P15) Others, instead, referred to the boundary between physical hardware and cloud platforms to explain their practices, wondering whether Data Dashboard would be largely local or cloud-based: “You can’t trust anything that you don’t have control on the hardware. Especially with smartphones, because most of it goes through the cloud, you can’t really trust anything,” said P4, who had both a collector and frugal approach with data.

Boundaries lead to fragmentation

A consequence of creating boundaries is data fragmentation [143, 288]. This is one of the key challenges for data curation in an age of multiple devices and cloud platforms. When considering data boundaries, participants explained how fragmentation could be beneficial: “I am isolating my data more and more rather than sharing it. The people who invented Facebook and all these platforms, they did it with good intentions. The problem is these platforms are now abused [...] You can’t really trust their intentions so you have to protect yourself by isolating your data.” (P4) But it could also be costly, leading to confusion and frustration: “I’ve started working for different organizations, you have data from different places and sometimes it gets confusing.
I have some things from my previous experiences and now everything is getting mixed up.” (P3) From a design standpoint, we can ask whether fragmentation is a problem to be solved and how to solve it if that is the case. Next, we look at how centralization and customization intersect with data boundaries.

8.5.3 Centralization blurs data boundaries

Centralization is convenient

Participants saw something positive in centralizing data, with the different sections of Data Dashboard providing avenues to reconcile data from different devices and platforms: “I really like how it’s combining different sources of where these things could be stored.” (P12) They thought that centralizing data provided some clear benefits, like saving time: “This looks delightful. The most important criteria for a service like this for me is the time benefit I get from it [...] I don’t mind giving them access to all my data.” (P3) Or, making it easy to see everything in one place, as was the case in Explore Your Data: “You see everything: your contacts, your bookmarks. That’s really handy. I like that. It just feels very comprehensive.” (P10) “It’s all different types of [accounts]. I like the way it’s set up. It’s quite clear, it looks good. I can see everything at a glance.” (P16) They also hoped that the cards for different types of data they saw in Data Dashboard could break the barriers between different devices: “If it were to show me text messages, it would be good for me because right now I have no way to see them from the computer.” (P11)

Convenience goes hand in hand with risks

However, data centralization introduced a dilemma.
As much as it seemed convenient, it created additional risks: “If I have everything together, [the] ease of use is high but [the potential risk for a] security breach is higher as well.” (P15) Especially when looking at Activity and Explore Your Data, most participants expressed concerns that touched on how centralization blurs data boundaries: “Some people like to collaborate in one place, one app, access [all] email accounts and whatever, but that’s not my preferred way.” (P8) Based on these concerns, they wanted Data Dashboard to respect the boundaries that they created by choosing specific storing places for different types of data (a result consistent with past work [166]): “I already upload most of my files to Google Drive, Dropbox. I don’t feel that bothered. But if this accesses even the files that I specify not to be [accessed for a specific function], that would really bother me.” (P5, whose approach to curation was a mix of collector and casual.)

In particular, when considering privacy data boundaries, several participants felt uneasy about Data Dashboard: “You already have questions about the security of iCloud, Google Drive, etc. and that’s reduced when you go to third-party tools. Especially when you see [that] the algorithm can see private information.” (P18) Some participants imagined the negative consequences of having all data centralized in one system and wondered whether this is the right approach. For example, after seeing a recommendation to review cloud documents containing passport or credit numbers in Quick Actions, one participant wondered about the consequences for privacy: “But there is a link to the file and it says what’s in the file, so what else do I need as a thief? You just gave everything on a platter. It doesn’t make any sense to me.” (P4)

These reactions resonate with the privacy-utility trade-off, a common notion across past work on smart energy systems [282], health data [48], location data [52], and differential privacy [6] among others.
In short, the privacy-utility trade-off considers the tension users might experience with a program that uses data about them to provide a potentially useful function (e.g., recommending what movies to watch based on movies watched in the past).

Some participants also had issues with terms related to privacy in EYD, saying they were unsure of what “tracking data” meant: “Tracking data.... what does it include?.... Not totally sure.” (P12) We used these terms because they are common in similar tools, but these reactions show the importance of language for building trust in the system and building awareness of what private data users might have stored on their devices.

Centralization requires guarantees

Because of the potential to blur and distort boundaries, participants expressed a need for explicit guarantees around centralization: “I just want it to be secure [so that] no one else can get into the system. I don’t know how... I have no idea how you do it. How do you protect something like that?” asked P16, one of the participants who felt overwhelmed with data. Participants wanted to make sure that the system would respect data boundaries and mitigate potential security risks: “It’s the central point that commands all my accounts so it has to be highly secure.” (P7) Participants rationalized the potential adoption of Data Dashboard by referring to the terms and conditions that the system would have and would need to strictly apply (although there was no such thing in the prototype): “If I am using the system, the terms and conditions that are mandatory to be agreed upon, should not include that your data has any potential of being shared by any third party for any commercial use.” (P17) If the terms are clear and the system feels reliable, then centralization becomes acceptable: “It depends on the privacy statement really.
If the agreement seems good enough and it seems a reliable service, I don’t mind using it and it going through my things.” (P3) If it is possible to reinforce a boundary between local data and data in the cloud, then the worries disappear: “If this is an online interface, I’d have problems with it. But if it was offline with no interaction other than backing up, which I control, then I’m completely fine.” (P5) Once again, these reactions highlighted the need for clear explanations around data curation, with participants often blaming themselves for being “not techy”.

Even with potential issues mitigated, some participants wondered about the feasibility of centralization. They reflected on the underlying conflict between their expectations and business practices that impose borders around data: “My question is, can you actually make it? And I’ll be the first guinea pig. The big giants, they have big muscle and they compete. [...] I don’t know if it’s possible because of the conflict of interest of these business things. [...] I am using Google Drive because I have a Google account, iCloud [because I have] an Apple account. Does this [system] need to be linked to a certain company or email account or something? If this is like my online version of my external hard drive, yeah, I would like to have everything consolidated under the one big, safe roof.” (P19) This conflict required a broader guarantee, one that puts user needs above business needs: “If there is such a magnificent creation down the road, it could be under the one roof, but also secure and safe. That’s kind of my dream.” (P19)

8.5.4 Customization upholds boundaries

Customization can make boundaries visible

Most participants had positive reactions to the customization options in Data Dashboard: “I like choices, so I’m all for it.” (P17) Some were confused by the link between the sidebar filters and the Settings page, but in general they thought that customizing was a good idea.
For example, participants liked the option of sorting and hiding data types in Explore Your Data because it helped them see the things they wanted to prioritize: “I’m the sort of person who would limit what I see here. I would keep photos, bookmarks [...] It’s also good that I can reorganize. Minimize what I see and prioritize what I see first.” (P9) Similarly, the option of sorting through data using the personalized filters excited participants because it made it easier to look at their data: “This is great to look at what’s old, what’s inactive, unreachable, unused, these are really good,” said P12, who also described herself as “a purger more than a saver, always looking for ways to free up space.” Together, these options made personal boundaries more visible and tangible, and helped participants navigate different data curation scenarios: “Sometimes I need to see all of the information, sometimes if I am running on short time, I wanna do a narrow search, instead of always seeing everything.” (P17)

Customization allows users to manipulate boundaries

A consequence of perceiving boundaries as more visible was that participants also felt they had more options for manipulating them. Several participants thought that the default options in the system were “novel” and “comprehensive” enough to meet their needs. Choosing what to include or not made them feel more in charge of the system: “I have some kind of authority to make a selection of what I want. This is my priority, I want to check duplicate documents. These options are good because then I also have a sort of selection power.” (P15) But an additional positive aspect was the option of creating your own filters through custom combinations in Settings. While this function was confusing for some participants, they generally saw it as a useful way of setting specific boundaries: “[It is] cool [that] you can make your own filters.
I like that ’cause I can figure out what’s important to me, my own criteria, I like the customization aspect.” (P18)

Customization reinforces control

One possible disadvantage of customization is the time required for it, as we mentioned in the related work section. When discussing the customization options in Data Dashboard, and the Settings page in particular, participants reflected on the time required for setting the filters as they preferred. Several participants imagined it would not take a lot of time because the system provided default options to choose from: “I don’t think it would take that much time.” (P12) Others imagined it would take some planning on their part: “I think it might take me probably a couple of hours to figure this out. I would plan it out before I actually use the system.” (P10) But in the end they thought that investing time in customizing was worth it: “I don’t mind investing the time if this was going to figure out the system, I would get out a piece of paper and think of recent data to get rid of.” (P10) Taking the time to customize would lead to the system respecting their own priorities and boundaries: “I think filters are always useful. I don’t mind [spending time customizing] because everyone has something different they’re looking for so it’s good to have these filter so they have what’s most important to them.” (P3) Participants perceived this trade-off between time and control as necessary to counterbalance any potential risks coming from other aspects of the system. Feeling in control made them feel safer: “When it comes to data, I am very cautious.
So if I have the settings set correctly, hopefully it acts accordingly.” (P19)

Overall, these positive reactions show that customization upholds data boundaries by making them more visible, allowing users to manipulate them, and reinforcing their control and involvement in data curation.

8.6 Discussion

Our results around centralization and customization show the importance of data boundaries for curation. While the centralized nature of Data Dashboard had several advantages for curation, it also had the potential for blurring boundaries. Past work on centralized personal archives [166] similarly found that storing personal data in a central place undermines the “different facets of the self,” ignoring differences between unremarkable and valuable content. Our study suggests that customization can offset the negative aspects of centralization and better help users demarcate different types of content. Customization options, on the other hand, tended to uphold participants’ boundaries. Combining the two approaches is a promising design direction that can balance conflicting user needs. Below, we reflect on some of the evaluation results and outline how to move forward in designing curation tools.

8.6.1 Reflecting on the feasibility of Data Dashboard

Before discussing potential directions inspired by our results, we address a key question: as P19 wondered, is a system such as Data Dashboard feasible? Would limiting its access to metadata be enough to make it viable? Would “the big giants, with their big muscle” allow it? Probably not, because each of them is making the same promise to users: enjoy simplicity by having all your data in one place! Our place. Except, as a user you might end up with several places, competing with each other [288]. At the moment, there is little benefit for cloud storage providers to reduce fragmentation and provide data interoperability, but things might change in the future.
For example, Dropbox has recently started to allow its users to create, open or edit Google Docs files from within the Dropbox interface (but without the permission to move files off Google’s servers) [80]. In turn, Google Drive allows users to edit Microsoft Word, Excel, and PowerPoint documents. In theory, these are steps in the right direction, but similar “integrations” still rely on proprietary formats and depend on the whims of a small set of companies. Is there another way? Yes, and there is a precedent: email standard protocols.

IMAP, SMTP, and POP3 email protocols allow users to access and manipulate emails from a variety of clients, often leaving data on the original cloud server where it is stored. Users can also aggregate multiple accounts into a single client, easing the burden of fragmentation while still enforcing personal boundaries. Similar standards should apply to a much broader set of data types outside email. The continued introduction of new email clients year after year also proves that standards do not limit commercial opportunities. Once access and basic actions become standardized, tool creators can focus on innovative functions that address more interesting, unexplored user needs and make their product unique. Below, we discuss some ideas for future developments.

8.6.2 Integrating data boundaries into design

An implication from our work is that filtering and sorting data into “chunks” based on types or other automatic categorizations can be helpful. The examples in Data Dashboard suggest that working on algorithms that can filter and recognize different types of information is a promising direction. While there are previous technical efforts along these lines [16, 64, 261], more work is necessary to address the functional design of these mechanisms.

Current commercial products for storing and managing personal data also offer some form of support for enacting boundaries, but often in a basic way and with limited scope.
For example, cloud storage platforms such as Google Drive, Dropbox, or OneDrive allow users to set sharing settings for documents. But this type of boundary mechanism is often static, only possible at the level of individual items, and tends to focus on privacy in the context of sharing data with other people. Instead, we have seen that data boundaries are dynamic and contextual, not only about privacy, and most often involve broad mental categories. Similar mechanisms are also often inconsistent across services and require users to set similar settings multiple times in different storing places. This limitation highlights the potential value of a system such as Data Dashboard.

Another example of boundary mechanisms comes from email clients such as Gmail, that allow users to automatically sort emails into different categories (e.g., social, promotions, updates) (Figure 8.8). This is a promising idea. However, these categories are pre-defined and users have little freedom in personalizing them. Overall, we think that support for data boundaries could be more dynamic, expressive, and cohesive.

Figure 8.8: Gmail allows users to automatically sort emails into pre-defined categories.

One possibility is to envision more granular mechanisms that integrate data boundaries in their design and combine the key positive aspects of both centralization and customization. For example, we can imagine users being able to directly review and manipulate boundaries at the level of individual items: they could be looking at a document containing their passport number, and choose how private, important, or relevant they consider that type of information to be. Then, they could set the boundary to be valid only for a certain amount of time and apply it to similar items or just the single item, set it for one platform or multiple, all from a central control point.
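
To make this idea more concrete, the sketch below shows one way such an item-level boundary could be represented. It is a minimal illustration under our own assumptions and was not part of the Data Dashboard prototype: the names `Item`, `Boundary`, and `applies_to`, and all field names, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Item:
    """A stored item as a curation tool might see it (hypothetical model)."""
    name: str
    kind: str       # e.g. "passport", "photo"
    platform: str   # e.g. "Google Drive", "laptop"


@dataclass(frozen=True)
class Boundary:
    """A user-defined boundary set from a single control point (hypothetical)."""
    kind: str                            # type of information the boundary covers
    privacy: str                         # the user's own label, e.g. "private"
    platforms: frozenset                 # one platform or several
    expires: Optional[datetime] = None   # None means the boundary never expires
    only_item: Optional[str] = None      # set to restrict it to a single item

    def applies_to(self, item: Item, now: datetime) -> bool:
        # An expired boundary no longer applies to anything.
        if self.expires is not None and now >= self.expires:
            return False
        # The boundary only covers the platforms the user selected.
        if item.platform not in self.platforms:
            return False
        # Either restrict to the single item, or cover all items of that kind.
        if self.only_item is not None:
            return item.name == self.only_item
        return item.kind == self.kind


now = datetime(2020, 1, 1)
passport_rule = Boundary(
    kind="passport",
    privacy="private",
    platforms=frozenset({"Google Drive", "Dropbox"}),
    expires=now + timedelta(days=30),
)
scan = Item(name="passport-scan.pdf", kind="passport", platform="Google Drive")
print(passport_rule.applies_to(scan, now))
```

In this sketch, the time limit, the choice between similar items and a single item, and the set of platforms each map to one field, so a single rule can be reviewed and changed in one place.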
An explicit boundary management mechanism could allow users to better manage breaking points, defining whether a boundary can be broken and in which scenarios (e.g., “keep passport data off the cloud, unless I am travelling”).

Another potential avenue for integrating boundaries into design is to think of them as objects that users can share and exchange. The initial premise of exploring customization was to see if it would help in the design of a system that can accommodate individual user needs. Our results show that this is possible. However, while each person is different, there might be clusters of users who take a similar approach to data curation. We can envision a way for users to adopt the boundaries of other people, adapting them to suit their own needs. A system similar to Data Dashboard could use a similar sharing infrastructure to provide more personalized defaults based on user preferences. This could reduce the time required for setting personal filters and make the system more appealing to those who are not willing to invest time in customization [120]. Of course, it would be necessary to consider how to create similar functions in a privacy-preserving way, so that they do not break the boundaries they are trying to support.

8.6.3 Rethinking the language of personal data

Another key line of work emerging from our study is about rethinking the language of personal data management and curation. Personal data curation has relevance beyond practical actions such as deleting files, moving documents into folders, or uninstalling mobile applications to free up storage space. Those are concrete actions that need support.
But the broader implications of curating data are about control over personal information and the consequences of producing, storing, and accessing data in platforms and devices that are part of a “surveillance” apparatus [319].

The language used to explain where personal data (from documents to location history) is stored and what happens to it is essential for understanding and controlling its use. In our study, we saw that participants sometimes struggled with some of the terms used in the system, such as “syncing data” or “tracking data” and pointed to the importance of language and terms for building trust in the system. These recurring reactions from a varied sample of “not techy” participants show how it is necessary to work on improving the language of everyday user interfaces, especially in privacy and security-related scenarios. How can users curate and control their personal data if they are not aware of what this category might include and what common actions in everyday tools do? Past work shows how language inconsistencies are common in popular operating systems for actions as simple as deleting data [113]. We argue that simplifying and unifying the language of personal data is a necessary, basic way of supporting user needs. Doing so would be the first step to address the deeper issue of naming the processes behind personal data in a way that makes people more aware of the mechanisms behind its storage and handling. As Zuboff says in describing strategies for fighting surveillance capitalism, naming is the first step in “confronting and taming the unprecedented.” [203]

To help navigate a looming data-driven society [86], privacy and data regulations are paramount [1, 265], but design practice has a role to play too. There is an opportunity for future design initiatives to start addressing the underlying gap in personal data literacy.
This would involve rethinking common technical terms for describing personal data, and creating initiatives, both within tools but also outside them, to build a common vocabulary around personal data. We imagine this as a community effort where designers and researchers explore how user language and the grammar of possible actions within data management are intertwined, an idea brought forward by previous work [123]. One option could be to promote consistency across operating systems, applications, and tools, creating a grounded, standard vocabulary that users are familiar with. This could involve a systematic study of users’ terms that could then be integrated into “personal data standards,” just as there are conventions for common design elements. Another option would be to let users define and teach their own language to the tools through mechanisms that leverage a link between individual items and categories of data. For example, systems could give users an example item, ask them to categorize it using their own words, and then apply this user-generated vocabulary in the interface.

8.6.4 Centralization as a matter of perspective

Another important thread in our work is the contrast between physical devices and cloud platforms, with centralization hinting at a tension between local and online data storage. This tension highlights a gap between users and technology companies, with different perceptions of what centralization means. For users, centralization is largely about data access. For companies, centralization is instead about data storage.

Users tend to perceive local devices as closer to them: after all, personal devices used to be the only place where users could access their personal data. Cloud platforms, instead, feel more peripheral: they have become a key component of personal data ecosystems [288], but because users tend to use more than one, they can feel fragmented, less tangible, and out of sight [213].
Thus, for users, accessing data from the cloud is about bringing items back from a set of far away places to a central device in their hands. This is why the Cloud, to users, can feel decentralized. But from a technical standpoint, cloud computing is in fact a centralizing force [94], pushing all data to be stored not on single devices but on the servers of a few large companies, centralized in a limited number of locations. These companies then promote their cloud platforms by emphasizing the convenience of storing everything into a central place. Except, people’s perception of what is central is different. This divergence in the promised vs. the actual experience of the Cloud reinforces the need for more consistent language around personal data and stronger conventions. It also makes us wonder whether a different paradigm for modern computing is possible.

8.6.5 Envisioning a post-cloud future

Current setups force users to go through a third party for accessing data across devices. There is no practical alternative. This dynamic has consequences for personal data boundaries, with tools creating boundaries of their own and imposing specific structures on data (e.g., having to store files in a Dropbox folder if you want to access them on more than one device, rather than in a local and more personal structure). But what if we imagined an alternative that emphasizes users’ perception of local devices as central, rather than giving priority to the cloud? For example, one where interoperability across data formats and storage devices made it possible to easily synchronize data without the need for a cloud-first structure. The tools and user interfaces would be similar to existing ones, but the functional paradigm would be different. We imagine that future data infrastructures could rely on local, “social” clouds, maybe with a physical device acting as the central node connecting data links across devices.
For example, members of a family, neighborhood, or social group could set up a “local cloud” that allows personal data to move freely among known and trusted members’ devices. Items could be synchronized to different devices on an individual basis rather than by uploading them to a distant server. The local cloud would take advantage of the multiple devices to maximize storage and prevent data loss through redundancy. Such an approach could make data available across devices while still respecting data boundaries and local structures. These ideas are speculative, but not entirely new. Before the advent of commercial cloud computing, investigations into peer-to-peer and decentralized technologies for data access and storage were popular [3, 71, 146, 244]. But then, the Cloud won. Today, the increasingly critical discourse around technology companies, cloud platforms, and privacy might signal a shift in perceptions. Within this evolving landscape, there is renewed interest in decentralized computing [297], cooperative storage [275], and cross-media information spaces [257]. Taken together, current efforts and past explorations point to a promising domain that future design work can help explore [287].

8.7 Limitations

Our sample is meant to be generative and not statistically representative, so it has some limitations around age and gender balance. We did not screen for gender identity, but studies on related topics have similar samples [58, 59] and report no apparent differences in participants’ attitudes based on gender. We screened for age, but we did not get interest from many respondents over 50. Instead, we focused on the variation in data curation approaches. Recent work [5] hints at differences in desired PIM behaviors based on age and gender, providing a complementary perspective to our work. Future studies can further complement our work with a more representative sample to see if and how our results transfer to a broader population.
Future work should also explore quantitative measures of user satisfaction for a design approach similar to ours, how to extend our efforts to other stages of data curation (i.e., retrieval), and what might be the psychological or logistical effects of spending time curating data (e.g., enjoyment, satisfaction, time saved, more efficient retrieval).

8.8 Summary and conclusion

Storing, managing, and curating personal data is a challenging process made more complex by the mix of devices and cloud platforms that people regularly use. In this chapter, we have explored how centralization and customization can support people’s behaviors. We identified how centralization blurs personal data boundaries, while customization upholds them. Using this specific analysis lens helped us outline key challenges for designing data curation tools. As our relationship with data evolves and likely becomes more complicated, we see taking a step back and reflecting on key design decisions as essential for shaping future products. Our work shows that there is great potential for exploring how to design tools that integrate data boundaries as a core part of their functionality and provide new mechanisms for managing them. We hope our work will provide a foundational starting point for innovative tools that can define future paradigms for data management.

Chapter 9

Conclusion

In this dissertation, we have presented the results from four interview studies (total of 64 interviews) and an online survey (349 respondents) on personal data curation behaviors. We have investigated user decisions around what data to keep or discard and characterized individual differences in this space. Starting from an initial spectrum of general tendencies for preserving data (Chapter 4), we have identified five behavioral styles that describe contextual patterns in participants’ curation approaches (Chapter 6), making individual differences actionable for design.
Then, we have used these insights to explore alternative design concepts for selecting data (Chapter 7) and a personalized approach to personal data curation (Chapter 8).

Below, we first summarize these and a few additional contributions in more detail. Then, we discuss implications for future work. We conclude with a few closing remarks to reflect on our work.

9.1 Summary of results and contributions

In this section, we summarize the key results and contributions of the dissertation, distinguishing between primary and secondary contributions.

9.1.1 Primary research contributions

Rich characterization of data preservation tendencies

Before our investigation, the HCI literature lacked a deep, systematic understanding of how people decide what data to keep or discard. Most past work relies on the assumption that people tend to just keep everything. But that is a broad assumption, and in this dissertation we show that there can be substantial individual differences in how users decide what personal data to keep or discard.

Our first key contribution is a rich characterization of general data preservation tendencies and their role in identity construction.

In Chapter 4, using 23 interviews with a broad sample, we identify a spectrum of preservation behaviors with two extremes: “hoarding” on one side (where participants tended to keep most of their data over time) and “minimalism” on the other side (where participants tried to keep as little as possible). We show how these behaviors are contextual, nuanced, and important for identity construction.

Previous studies on personal data and PIM also highlight the tie between data and identity, but our work shows how this connection can be experienced with opposing attitudes (“I am my data” vs. “I am more than my data”).
This distinction is subtle but key to understanding how different people might approach data curation with different strategies.

Five behavioral styles that synthesize personal data curation approaches

Building on the “hoarding” and “minimalism” spectrum, we identify a set of five behavioral styles and approaches to personal data curation, using a total of 64 interviews from the four main interview studies that make up the dissertation (Chapter 6).

The five behavioral styles (Casual, Overwhelmed, Collector, Purger, Frugal) complement previous characterizations of individual differences in PIM research with a focus on keeping and discarding decisions. They also represent a significant advancement of our initial characterization of user behaviors in Chapter 4. They enrich our understanding of personal data curation as a practice tied to identity curation, bringing to light a temporal dimension of the process.

The behavioral styles also represent an actionable and generative resource for design that practitioners and researchers can use.

Design dimensions and design concepts for supporting data selection

In Chapter 7 we bring together insights from previous studies and past work on digital data to explore a design space of possible solutions.
We describe four key design dimensions that can be a helpful resource to generate design ideas: selection regime, automation, system aggressiveness, and temporality.

Then, we use the dimensions to create a set of five design concepts that map the design space and explore alternative approaches for supporting data selection across a range of situations and devices (Patina, Data Recommender, Temporary Folder, Temporary App, Future Filters).

These speculative concepts can be a starting point for more comprehensive products, whether in industry or academic research.

Our methodological approach of creating video prototypes for the concepts can also serve as inspiration for future studies that explore uncharted design spaces where technical constraints limit possible prototypes.

Example of a personalized approach to personal data curation

Our work in Chapter 8 combines insights from previous studies into a cohesive system, Data Dashboard, that focuses on exploring and evaluating two key design approaches for personal data curation: centralization and customization.

Our results show that this approach can be successful in supporting personal data curation behaviors.
However, it needs to respect and integrate data boundaries (permeable, invisible rules that inform where and how people store personal data) as first-class objects for users to interact with.

These insights provide a concrete starting point for building new products and advancing user support for data curation.

Generative design directions based on empirical work

Throughout the dissertation, we propose several generative design directions and opportunities that can help in prioritizing user support, exploring new product concepts, and expanding the focus of personal data management tools.

We see these generative design directions as broader, more open-ended, and less prescriptive than traditional design implications found in many HCI studies. The directions we propose indicate open opportunities but do not necessarily prescribe how to implement them [78, 248].

At a high level, we propose to tailor technology to different tendencies for data preservation, finding ways to offset costs inherent at both ends of the hoarding-minimalism spectrum (Chapter 4). We also highlight design opportunities for supporting data decluttering by visualizing temporal aspects of data, proactively recommending items to discard, and preventing unwanted data accumulation (Chapter 5). Then, we outline ways for prioritizing user support, making curation feel more engaging, exploring the role of curation for memory, and investigating alternative ways of capturing user variation based on the five behavioral styles we present in Chapter 6. We follow with directions for centering design work around personalization, automation, and privacy by exploring safeguarding mechanisms and new data curation actions (Chapter 7).
Finally, in Chapter 8 we discuss ways for integrating data boundaries into design, rethinking the language of personal data, and envisioning a post-cloud future.

These directions can be a starting point for future research and design agendas, paving the way for innovative data management tools that will help shape the upcoming post-cloud landscape (Chapter 8).

9.1.2 Secondary research contributions

Reflexive account of our user modeling process

Our research process in Chapter 6 can inform similar user modeling efforts in other domains. The reflexive account of how the behavioral styles evolved throughout the four studies, and how different methods helped us enrich them, can help researchers or practitioners to carry out a similar modeling process for capturing salient individual differences.

Taxonomy of personal data types and decluttering criteria

In Chapter 5 we build a taxonomy of data types and related decluttering criteria, using an online survey with 349 respondents. We identify six macro-categories of data types (Documents, Organization, Communication, Media, System data, Logging data) and map different decluttering criteria to each of the categories.

These categories and criteria can be used for designing new tools, scoping research studies, or directing algorithm design.

Description of temporal decluttering practices

Finally, still in Chapter 5, we identify a set of temporal decluttering practices (routine, serendipitous, triggered) that can inform design and provide support for the behavioral styles and approaches we present in Chapter 6.

9.2 Implications and future work

Next, we outline implications for future work, touching on how our results can be leveraged for User Interface (UI) and product design, HCI research, algorithm design, policy and sustainability.

9.2.1 Implications for UI and product design

A key direction for future work is to build upon our efforts on differentiating, prioritizing, and personalizing user support.
In particular, we encourage future design work to explore alternative ways of addressing personalization. For example, the Settings in Data Dashboard provide a first step in this direction, but future work should test and evaluate different visual approaches for providing customization options that go beyond a set of check boxes.

Similarly, a key implication of our work is to keep exploring and pushing new design metaphors for conceptualizing personal data. For example, future work could explore a positive take on curation by bubbling up data at the right moment and building sediment into personal items. Or it could explore how concepts such as fossilization and disintegration can apply to personal data. Bubbling data could involve shifting the focus from active curation to passive curation, with systems reducing user effort by resurfacing relevant items based on content similarity or context-matching (e.g., a photo of an event bubbling up in a message conversation when the event is being discussed). Metaphors around sediment, fossilization, and disintegration, instead, could focus on expanding dynamic and temporal qualities of data from a visual standpoint. As an example, a segment of data could become impossible to edit after a set amount of time to visually signify its outdated relevance, thus fossilizing. Such metaphors provide a starting point for exciting new interfaces that move beyond current abstractions, largely tied to office work (as is the case with files and folders) or technical implementations (as is the case with mobile applications).

These ideas, together with the design directions we propose throughout the dissertation, point to the need for rethinking personal data management tools with a long-term perspective.

9.2.2 Implications for future HCI research

From a research perspective, the key implication for future HCI work is to expand the focus of our investigation in terms of users, data types, and methods.

Our work focuses on personal data curation for individuals.
However, a natural extension of our work would be to look at individual differences in curation in a collaborative setting, as we mention in Chapter 4. Future studies can use our work to investigate how different behavioral styles and approaches influence collaboration, both in work and domestic settings. For example, how do individual differences in curation play a role in small, unstructured groups (e.g., freelance knowledge workers, student groups)? How can collaborative tools better accommodate different management and curation styles? How do similar insights shift when considering shared data in a family context? And what are the implications of curating data (or failing to do so) when groups or family ties dissolve?

Our work also largely focuses on personal data that people explicitly create or acquire, for example photos, documents, messages, and applications. These are types of data that people are familiar with and accustomed to interacting with using established visual metaphors and paradigms (Chapter 7). But personal data now includes a growing number of types, with new devices generating more types. Future work can shift the focus from the types of data we explored to less visible data, such as audio recordings stored by voice assistants, data stored on social networks, and interactions recorded within Internet of Things devices.

An additional way for future research to leverage and extend our work is to expand our methodological approach. Our user modeling work is purely qualitative, but, as we mention in Chapter 6, future studies can use a quantitative approach to triangulate our results.
Our design work in Chapter 7 and Chapter 8, instead, relies on prototypes and concepts, but future work could shift the focus to research products (i.e., “inquiry-driven,” polished artifacts meant to be engaged with as they are) [216], to investigate long-term curation practices through field deployments. We can imagine ideas from some of the concepts in Chapter 7 or the prototype of Data Dashboard (Chapter 8) becoming part of a functional research product that can generate new insights about data curation practices over the long term. Longitudinal studies could help better uncover the effects of engaging in personal data curation over time and unpack its temporal dimensions.

9.2.3 Implications for algorithm design

The design ideas we explore in the dissertation highlight the need for a parallel technical effort to make them viable. Data Dashboard in Chapter 8 shows how clustering personal data into automatic categories and types can be helpful, but such an approach will only be as good as the algorithms behind it. Current management systems and platforms can recognize different types of data, but they largely rely on file extensions to do so. Instead, our work shows that often the content of data is the key focus of interest in curation. Improvements in automatic content classification and categorization algorithms can go a long way toward supporting data curation.

However, given the sensitive nature of much personal data, a key challenge is to make such algorithms work entirely offline on local devices. Often online platforms can offer advanced functionalities for recognizing and sorting data because they rely on the cloud infrastructure behind them. But this might not be the best approach for supporting curation, as participants’ reactions to Data Dashboard show (Chapter 8). Companies such as Google are exploring how to make machine learning available locally, entirely independent from cloud infrastructure, to preserve user privacy [162].
We believe that technical efforts for supporting personal data curation should focus on this same challenge.

9.2.4 Implications for policy and sustainability

Our work can also have implications for policy and sustainability.

Recent years have seen an increased focus on regulatory efforts for the storing and processing of personal data [1, 265]. However, our work suggests that any potential tools and policies in this domain should also address the confusing technical language of personal data that everyday users often struggle with (Chapter 8). Additional regulatory efforts should also shift focus from storing and retention practices to the algorithms behind advanced functionalities related to personal data. For example, they could focus on the algorithms for automatic recognition and categorization of personal data that we discuss in the previous section. Yes, we do think that such algorithms can help, but there is a need for stronger regulation around who can use them and for what purposes.

Finally, an indirect but important implication of our work is the need to reflect on the possible environmental effects of storing and preserving large quantities of data. We are not Luddites. We do not advocate for a return to the days of paper-based personal information management. But storing digital data can require large quantities of energy [141, 199]. Although recent work shows that predictions about data centres’ energy use have been largely exaggerated, there is still a need to closely monitor electricity consumption over the coming years and proactively prepare for a growing demand for cloud data storage [185]. Making data storage a carbon-neutral practice is a first step, but often hard to achieve at scale [98, 272]. As the climate crisis worsens, there will be increasing pressure to ensure that data storing practices remain sustainable.

9.3 Closing remarks

Four years ago, the idea of automatically deleting data seemed extreme, a provocation. Now, it is almost an industry standard for products that collect sensitive data such as audio recordings and location history [40, 235].

Four years ago, the term “digital hoarding” was only briefly mentioned in a medical case study. Now, there is a growing body of research about data keeping and discarding practices, with perspectives from computer science, psychology, information science, and health sciences [171, 205, 206, 222, 253, 271].

In the immediate future, people’s personal data will keep growing, with external actors wanting more and more of it, for uses that might not reflect what people want. For sure, this feels like the right time to take more control of personal data. But then, what to keep of it?

Bibliography

[1] Guide to the General Data Protection Regulation, May 2018. URL https://www.gov.uk/government/publications/guide-to-the-general-data-protection-regulation.
[2] A. Acquisti, L. Brandimarte, and G. Loewenstein. Privacy and human behavior in the age of information. Science, 347(6221):509–514, 2015.
[3] A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. SIGOPS Oper. Syst. Rev., 36(SI):1–14, Dec. 2003. ISSN 0163-5980. doi:10.1145/844128.844130.
[4] H. Almuhimedi, S. Wilson, B. Liu, N. Sadeh, and A. Acquisti. Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ’13, pages 897–908, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450313315. doi:10.1145/2441776.2441878.
[5] L. Alon and R. Nachmias. Gaps between Actual and Ideal Personal Information Management Behavior. Computers in Human Behavior, page 106292, 2020.
[6] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, P. Degano, and C. Palamidessi.
Differential Privacy: on the trade-off between Utility and Information Leakage. Mar. 2011. doi:10.1007/978-3-642-29420-4_3. URL https://arxiv.org/abs/1103.5188v3.
[7] A. N. Antle. Child-Personas: Fact or Fiction? In Proceedings of the 6th Conference on Designing Interactive Systems, DIS ’06, pages 22–30, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595933670. doi:10.1145/1142405.1142411.
[8] Apple. How to use Stacks on your Mac, Sept. 2018. URL https://support.apple.com/en-us/HT209101.
[9] C. Arthur. Naked celebrity hack: security experts focus on iCloud backup theory | Technology | The Guardian, Sep 2014. URL https://www.theguardian.com/technology/2014/sep/01/naked-celebrity-hack-icloud-backup-jennifer-lawrence.
[10] R. Banks, D. Kirk, and A. Sellen. A design perspective on three technology heirlooms. Human–Computer Interaction, 27(1-2):63–91, 2012.
[11] L. J. Bannon. Forgetting as a feature, not a bug: the duality of memory and implications for ubiquitous computing. CoDesign, 2(01):3–15, 2006.
[12] N. Banovic, F. Chevalier, T. Grossman, and G. Fitzmaurice. Triggering Triggers and Burying Barriers to Customizing Software. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 2717–2726, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450310154. doi:10.1145/2207676.2208666.
[13] J. Bardram, J. Bunde Pedersen, and M. Soegaard. Support for Activity-based Computing in a Personal Computing Operating System. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 211–220, New York, NY, USA, 2006. ACM. ISBN 1-59593-372-7. doi:10.1145/1124772.1124805.
[14] D. Barreau and B. A. Nardi. Finding and reminding: file organization from the desktop. ACM SigChi Bulletin, 27(3):39–43, 1995.
[15] S. Barth and M. D. De Jong.
The privacy paradox–Investigating discrepancies between expressed privacy concerns and actual online behavior–A systematic literature review. Telematics and Informatics, 34(7):1038–1058, 2017.
[16] R. Bekkerman. Automatic categorization of email into folders: Benchmark experiments on Enron and SRI corpora. 2004.
[17] R. W. Belk. Possessions and the extended self. Journal of Consumer Research, 15(2):139–168, 1988.
[18] R. W. Belk. Extended self in a digital world. Journal of Consumer Research, 40(3):477–500, 2013.
[19] V. Bellotti, N. Ducheneaut, M. Howard, and I. Smith. Taking Email to Task: The Design and Evaluation of a Task Management Centered Email Tool. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’03, pages 345–352, New York, NY, USA, 2003. ACM. ISBN 1-58113-630-7. doi:10.1145/642611.642672.
[20] D. Ben Menachem. Behavioral Archetypes, Feb. 2016. URL https://smashingideas.com/behavioral-archetypes/.
[21] D. Benyon. Designing interactive systems: A comprehensive guide to HCI, UX and interaction design. Pearson Edinburgh, 2014.
[22] O. Bergman. The user-subjective approach to personal information management: from theory to practice. In Human-computer interaction: the agency perspective, pages 55–81. Springer, 2012.
[23] O. Bergman. Variables for personal information management research. In Aslib Proceedings: New Information Perspectives, volume 65, pages 464–483. Emerald Group Publishing Limited, 2013.
[24] O. Bergman and S. Whittaker. The Science of Managing Our Digital Stuff. MIT Press, 2016.
[25] O. Bergman and N. Yanai. Personal Information Retrieval: Smartphones vs. Computers, Emails vs. Files. Personal Ubiquitous Comput., 22(4):621–632, Aug. 2018. ISSN 1617-4909. doi:10.1007/s00779-017-1101-6.
[26] O. Bergman, R. Beyth Marom, and R.
Nachmias. The user-subjective approach to personal information management systems. Journal of the American Society for Information Science and Technology, 54(9):872–878, 2003. doi:10.1002/asi.10283.
[27] O. Bergman, R. Boardman, J. Gwizdka, and W. Jones. Personal Information Management. In CHI ’04 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’04, pages 1598–1599, New York, NY, USA, 2004. ACM. ISBN 1-58113-703-6. doi:10.1145/985921.986164.
[28] O. Bergman, R. Beyth Marom, R. Nachmias, N. Gradovitch, and S. Whittaker. Improved Search Engines and Navigation Preference in Personal Information Management. ACM Trans. Inf. Syst., 26(4):20:1–20:24, Oct. 2008. ISSN 1046-8188. doi:10.1145/1402256.1402259.
[29] O. Bergman, S. Tucker, R. Beyth Marom, E. Cutrell, and S. Whittaker. It’s Not That Important: Demoting Personal Information of Low Subjective Importance Using GrayArea. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 269–278, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-246-7. doi:10.1145/1518701.1518745.
[30] O. Bergman, A. Komninos, D. Liarokapis, and J. Clarke. You Never Call: Demoting Unused Contacts on Mobile Phones Using DMTR. Personal Ubiquitous Comput., 16(6):757–766, Aug. 2012. ISSN 1617-4909. doi:10.1007/s00779-011-0411-3.
[31] O. Bergman, O. Elyada, N. Dvir, Y. Vaitzman, and A. B. Ami. Spotting the Latest Version of a File with Old’nGray. Interacting with Computers, 27(6):630–639, 2015.
[32] O. Bergman, S. Tucker, and S. Dahamshy. The effect of demoting near-duplicate pictures. Proceedings of the Association for Information Science and Technology, 55(1):755–756, 2018.
[33] O. Bergman, T. Israeli, and S. Whittaker. Search is the future? The young search less for files. Proceedings of the Association for Information Science and Technology, 56(1):360–363, 2019.
[34] O. Bergman, S.
Whittaker, and Y. Frishman. Let’s get personal: the little nudge that improves document retrieval in the Cloud. Journal of Documentation, 75(2):379–396, 2019.
[35] J. D. Biersdorfer. New Ways to Delete Old Files - The New York Times, May 2018. URL https://www.nytimes.com/2018/05/17/technology/personaltech/new-ways-to-delete-old-files.html.
[36] J. Blom. Personalization: A Taxonomy. In CHI ’00 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’00, pages 313–314, New York, NY, USA, 2000. Association for Computing Machinery. ISBN 1581132484. doi:10.1145/633292.633483.
[37] A. Blomquist and M. Arvola. Personas in Action: Ethnography in an Interaction Design Team. In Proceedings of the Second Nordic Conference on Human-computer Interaction, NordiCHI ’02, pages 197–200, New York, NY, USA, 2002. ACM. ISBN 1-58113-616-1. doi:10.1145/572020.572044.
[38] R. Boardman and M. A. Sasse. “Stuff Goes into the Computer and Doesn’t Come out”: A Cross-tool Study of Personal Information Management. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’04, pages 583–590, New York, NY, USA, 2004. ACM. ISBN 1-58113-702-8. doi:10.1145/985692.985766.
[39] S. Bowen and D. Petrelli. Remembering today tomorrow: Exploring the human-centred design of digital mementos. International Journal of Human-Computer Studies, 69(5):324–337, 2011.
[40] R. Brandom. Amazon pushes Alexa privacy with new delete options, Sept. 2019. URL https://www.theverge.com/2019/9/25/20883745/amazon-alexa-privacy-hub-security-voice-recordings-echo-devices.
[41] V. Braun and V. Clarke. Reflecting on reflexive thematic analysis. Qualitative Research in Sport, Exercise and Health, 11(4):589–597, 2019.
[42] R. N. Brewer, M. R. Morris, and S. E. Lindley. How to Remember What to Remember: Exploring Possibilities for Digital Reminder Systems. Proc. ACM Interact. Mob.
Wearable Ubiquitous Technol., 1(3):38:1–38:20, Sept. 2017. ISSN 2474-9567. doi:10.1145/3130903.
[43] M. Broekhuijsen, E. van den Hoven, and P. Markopoulos. From PhotoWork to PhotoUse: exploring personal digital photo activities. Behaviour & Information Technology, 36(7):754–767, 2017.
[44] M. J. Broekhuijsen. Curation-in-Action: design for photo curation to support shared remembering. PhD thesis, 2018.
[45] M. Buchenau and J. F. Suri. Experience Prototyping. In Proceedings of the 3rd Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, DIS ’00, pages 424–433, New York, NY, USA, 2000. ACM. ISBN 1-58113-219-0. doi:10.1145/347642.347802.
[46] A. Bunt, C. Conati, and J. McGrenere. Supporting Interface Customization Using a Mixed-initiative Approach. In Proceedings of the 12th International Conference on Intelligent User Interfaces, IUI ’07, pages 92–101, New York, NY, USA, 2007. ACM. ISBN 1-59593-481-2. doi:10.1145/1216295.1216317.
[47] C. Cadwalladr and E. Graham Harrison. Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach | The Guardian, March 2018. URL https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election.
[48] A. Calero Valdez and M. Ziefle. The users’ perspective on the privacy-utility trade-offs in health recommender systems. International Journal of Human-Computer Studies, 121:108–121, Jan. 2019. ISSN 10715819. doi:10.1016/j.ijhcs.2018.04.003. URL https://linkinghub.elsevier.com/retrieve/pii/S1071581918301642.
[49] R. Capra, J. Khanova, and S. Ramdeen. Work and personal e-mail use by university employees: PIM practices across domain boundaries. Journal of the American Society for Information Science and Technology, 64(5):1029–1044, 2013.
[50] N. Carr. I am a data factory (and so are you), 2018. URL http://www.roughtype.com/?p=8394.
[51] M. E.
Cecchinato, A. L. Cox, and J. Bird. Working 9-5? Professional Differences in Email and Boundary Management Practices. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 3989–3998, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450331456. doi:10.1145/2702123.2702537.
[52] S. Cerf, V. Primault, A. Boutet, S. B. Mokhtar, R. Birke, S. Bouchenak, L. Y. Chen, N. Marchand, and B. Robu. PULP: Achieving Privacy and Utility Trade-Off in User Mobility Data. In 2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS), pages 164–173, Sept. 2017. doi:10.1109/SRDS.2017.25.
[53] C. N. Chapman and R. P. Milham. The personas’ new clothes: methodological and practical arguments against a popular method. In Proceedings of the human factors and ergonomics society annual meeting, volume 50, pages 634–636. SAGE Publications Sage CA: Los Angeles, CA, 2006.
[54] K. Charmaz. The search for Meanings–Grounded Theory. In J. A. Smith, R. Harre, and L. Van Langenhove (Eds.), Rethinking Methods in Psychology, pages 27–49, 1996.
[55] K. Charmaz. Constructionism and the grounded theory method. Handbook of constructionist research, 1:397–412, 2008.
[56] K. Charmaz. Constructing grounded theory. Sage, 2014.
[57] A. Y. S. Chen, W. Odom, C. Zhong, H. Lin, and T. Amram. Chronoscope: Designing Temporally Diverse Interactions with Personal Digital Photo Collections. In Proceedings of the 2019 on Designing Interactive Systems Conference, DIS ’19, pages 799–812, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450358507. doi:10.1145/3322276.3322301.
[58] E. Cheon and N. M. Su. “Staged for Living”: Negotiating Objects and Their Values over a Porous Boundary. Proc. ACM Hum.-Comput. Interact., 2(CSCW), Nov. 2018. doi:10.1145/3274305.
[59] E. Cheon and N. M. Su. The Value of Empty Space for Design.
In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 49:1–49:13, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3173623. → pages 15, 55, 165

[60] H. Cherrier and J. B. Murray. Reflexive dispossession and the self: constructing a processual theory of identity. Consumption Markets & Culture, 10(1):1–29, 2007. → pages 15

[61] V. Clarke and V. Braun. Thematic analysis. In Encyclopedia of critical psychology, pages 1947–1952. Springer, 2014. → pages 21, 32, 76, 116, 148

[62] V. Clarke and V. Braun. Questions about thematic analysis - The University of Auckland, n.d. URL https://www.psych.auckland.ac.nz/en/about/our-research/research-groups/thematic-analysis/frequently-asked-questions-8.html. → pages 32, 34

[63] M. Crotty. The foundations of social research: Meaning and perspective in the research process. Sage, 1998. → pages 22

[64] G. Cselle, K. Albrecht, and R. Wattenhofer. BuzzTrack: topic detection and tracking in email. In Proceedings of the 12th international conference on Intelligent user interfaces, pages 190–197. ACM, 2007. → pages 160

[65] M. Csikszentmihalyi and E. Halton. The meaning of things: Domestic symbols and the self. Cambridge University Press, 1981. → pages 14, 91

[66] A. L. Cushing. Self extension and the desire to preserve digital possessions. Proceedings of the American Society for Information Science and Technology, 48(1):1–3, 2011. → pages 14, 29, 47, 91

[67] A. L. Cushing. Possessions and self extension in digital environments: Implications for maintaining personal information. PhD thesis, The University of North Carolina at Chapel Hill, 2012. → pages

[68] A. L. Cushing. “It’s stuff that speaks to me”: Exploring the characteristics of digital possessions. Journal of the Association for Information Science and Technology, 64(8):1723–1734, 2013. → pages 14, 15, 29, 47, 91

[69] E. Cutrell, D. Robbins, S. Dumais, and R. Sarin. Fast, Flexible Filtering with Phlat.
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 261–270, New York, NY, USA, 2006. ACM. ISBN 1-59593-372-7. doi:10.1145/1124772.1124812. → pages 101, 102, 132, 140

[70] K. Cutting and E. Hedenborg. Can Personas Speak? Biopolitics in Design Processes. In Companion Publication of the 2019 on Designing Interactive Systems Conference 2019 Companion, DIS ’19 Companion, pages 153–157, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450362702. doi:10.1145/3301019.3323911. → pages 93

[71] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-Area Cooperative Storage with CFS. SIGOPS Oper. Syst. Rev., 35(5):202–215, Oct. 2001. ISSN 0163-5980. doi:10.1145/502059.502054. → pages 164

[72] S. Davidoff, M. K. Lee, A. K. Dey, and J. Zimmerman. Rapidly exploring application design through speed dating. In International Conference on Ubiquitous Computing, pages 429–446. Springer, 2007. → pages 20, 106

[73] J. D. Dinneen and C.-A. Julien. What’s in people’s digital file collections? Proceedings of the Association for Information Science and Technology, 56(1):68–77, 2019. → pages 2, 144

[74] J. D. Dinneen and C.-A. Julien. The ubiquitous digital file: A review of file management research. Journal of the Association for Information Science and Technology, 2019. → pages 10

[75] J. D. Dinneen, F. Odoni, I. Frissen, and C.-A. Julien. Cardinal: Novel software for studying file management behavior. In Proceedings of the 79th ASIS&T Annual Meeting: Creating Knowledge, Enhancing Lives through Information & Technology, page 62. American Society for Information Science, 2016. → pages 140

[76] J. D. Dinneen, C.-A. Julien, and I. Frissen. The Scale and Structure of Personal File Collections. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, pages 327:1–327:12, New York, NY, USA, 2019. ACM. ISBN 978-1-4503-5970-2. doi:10.1145/3290605.3300557. → pages 2, 144

[77] D. Dion, O. Sabri, and V. Guillard.
Home sweet messy home: Managing symbolic pollution. Journal of Consumer Research, 41(3):565–589, 2014. → pages 133

[78] P. Dourish. Implications for Design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 541–550, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595933727. doi:10.1145/1124772.1124855. → pages 169

[79] P. Dourish, W. K. Edwards, A. LaMarca, and M. Salisbury. Presto: An Experimental Architecture for Fluid Interactive Document Spaces. ACM Trans. Comput.-Hum. Interact., 6(2):133–161, June 1999. ISSN 1073-0516. doi:10.1145/319091.319099. → pages 101, 102, 132

[80] Dropbox. How to create and share Google Docs, Sheets, and Slides in Dropbox, n.d. URL https://help.dropbox.com/installs-integrations/third-party/create-google-docs. → pages 159

[81] Dropbox. How to use selective sync, n.d. URL https://help.dropbox.com/installs-integrations/sync-uploads/selective-sync-overview. → pages 140

[82] S. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin, and D. C. Robbins. Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use. SIGIR Forum, 49(2):28–35, Jan. 2016. ISSN 0163-5840. doi:10.1145/2888422.2888425. → pages 101, 102, 132, 140

[83] J. L. Dupree, R. Devries, D. M. Berry, and E. Lank. Privacy Personas: Clustering Users via Attitudes and Behaviors toward Security Practices. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 5228–5239, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi:10.1145/2858036.2858214. → pages 77

[84] A. K. Dutta and R. Hasan. How much does storage really cost? Towards a full cost accounting model for data storage. In International Conference on Grid Economics and Business Models, pages 29–43. Springer, 2013. → pages 2

[85] C. Elsden, D. S. Kirk, and A. C. Durrant. A quantified past: Toward design for remembering with personal informatics. Human–Computer Interaction, 31(6):518–557, 2016.
→ pages 2, 11, 59

[86] C. Elsden, M. Selby, A. Durrant, and D. Kirk. Fitter, happier, more productive: what to ask of a data-driven life. Interactions, 23(5):45–45, 2016. → pages 162

[87] D. A. Epstein, A. Ping, J. Fogarty, and S. A. Munson. A Lived Informatics Model of Personal Informatics. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’15, pages 731–742, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3574-4. doi:10.1145/2750858.2804250. → pages 11, 59

[88] S. Faily and I. Flechais. Persona Cases: A Technique for Grounding Personas. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 2267–2270, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450302289. doi:10.1145/1978942.1979274. → pages 93

[89] M. Feinberg, G. Geisler, E. Whitworth, and E. Clark. Understanding Personal Digital Collections: An Interdisciplinary Exploration. In Proceedings of the Designing Interactive Systems Conference, DIS ’12, pages 200–209, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450312103. doi:10.1145/2317956.2317988. → pages 14

[90] S. Fertig, E. Freeman, and D. Gelernter. Lifestreams: An Alternative to the Desktop Metaphor. In Conference Companion on Human Factors in Computing Systems, CHI ’96, pages 410–411, New York, NY, USA, 1996. ACM. ISBN 0-89791-832-0. doi:10.1145/257089.257404. → pages 101, 102, 132

[91] L. Findlater and J. McGrenere. A Comparison of Static, Adaptive, and Adaptable Menus. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’04, pages 89–96, New York, NY, USA, 2004. ACM. ISBN 1-58113-702-8. doi:10.1145/985692.985704. → pages 102

[92] L. Findlater and J. McGrenere. Beyond performance: Feature awareness in personalized interfaces. International Journal of Human-Computer Studies, 68(3):121–137, 2010. → pages 135

[93] S. Fitchett, A. Cockburn, and C. Gutwin.
Finder Highlights: Field Evaluation and Design of an Augmented File Browser. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, pages 3685–3694, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2473-1. doi:10.1145/2556288.2557014. → pages 101, 102

[94] A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. Above the clouds: A Berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS, 28(13):2009, 2009. → pages 163

[95] R. O. Frost, G. Steketee, and J. Grisham. Measurement of compulsive hoarding: saving inventory-revised. Behaviour Research and Therapy, 42(10):1163–1182, 2004. → pages 30

[96] W. Gaver. What Should We Expect from Research through Design? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 937–946, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450310154. doi:10.1145/2207676.2208538. → pages 20

[97] R. Geambasu, T. Kohno, A. A. Levy, and H. M. Levy. Vanish: Increasing Data Privacy with Self-Destructing Data. In USENIX Security Symposium, volume 316, 2009. → pages 67, 101, 102

[98] M. Geuss. Five graphics from Google show how carbon-intensive its data centers really are, Oct. 2018. URL https://arstechnica.com/information-technology/2018/10/googles-data-center-carbon-heat-maps-show-the-challenges-of-going-carbon-free/. → pages 173

[99] E. Giaccardi, E. Karana, H. Robbins, and P. D’Olivo. Growing Traces on Objects of Daily Use: A Product Design Perspective for HCI. In Proceedings of the 2014 Conference on Designing Interactive Systems, DIS ’14, pages 473–482, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2902-6. doi:10.1145/2598510.2602964. → pages 66, 108

[100] S. Gibbons. Nielsen Norman Group: UX Research, Training, and Consulting, Jan. 2018. URL https://www.nngroup.com/articles/empathy-mapping/. → pages 77

[101] G. Gibbs.
A Discussion with Prof Kathy Charmaz on Grounded Theory, Feb. 2015. URL https://www.youtube.com/watch?v=D5AHmHQS6WQ. → pages 25

[102] A. Girgensohn, S. A. Bly, F. Shipman, J. S. Boreczky, and L. Wilcox. Home Video Editing Made Easy-Balancing Automation and User Control. In INTERACT, volume 1, pages 464–471, 2001. → pages 102

[103] B. G. Glaser and A. L. Strauss. Grounded theory: Strategies for qualitative research. Chicago, IL: Aldine Publishing Company, 1967. → pages 20

[104] C. Golsteijn, E. van den Hoven, D. Frohlich, and A. Sellen. Towards a More Cherishable Digital Object. In Proceedings of the Designing Interactive Systems Conference, DIS ’12, pages 655–664, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1210-3. doi:10.1145/2317956.2318054. → pages 15, 36

[105] C. J. Gormley and S. J. Gormley. Data hoarding and Information Clutter: The Impact on Cost, Life Span of Data, Effectiveness, Sharing, Productivity, and Knowledge Management Culture. Issues in Information Systems, 13:90–95, 2012. → pages 30

[106] S. Greenberg and I. H. Witten. Adaptive personalized interfaces - A question of viability. Behaviour & Information Technology, 4(1):31–45, 1985. → pages 135

[107] G. Greenwald and E. MacAskill. NSA Prism program taps in to user data of Apple, Google and others - The Guardian, June 2013. URL https://www.theguardian.com/world/2013/jun/06/us-tech-giants-nsa-data. → pages 127

[108] J. Gruning. Displaying Invisible Objects: Why People Rarely Re-read E-books. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 139:1–139:12, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3173713. → pages 133

[109] J. Gruning and S. Lindley. Things We Own Together: Sharing Possessions at Home. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 1176–1186, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-3362-7. doi:10.1145/2858036.2858154. → pages 64

[110] E. G. Guba.
Criteria for assessing the trustworthiness of naturalistic inquiries. ECTJ, 29(2):75, 1981. → pages 22, 23, 96

[111] R. Gulotta, W. Odom, J. Forlizzi, and H. Faste. Digital Artifacts As Legacy: Exploring the Lifespan and Value of Digital Data. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 1813–1822, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1899-0. doi:10.1145/2470654.2466240. → pages 2, 10, 14, 15, 65, 99, 124, 133, 143, 144

[112] R. Gulotta, A. Sciuto, A. Kelliher, and J. Forlizzi. Curatorial Agents: How Systems Shape Our Understanding of Personal and Familial Digital Information. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 3453–3462, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3145-6. doi:10.1145/2702123.2702297. → pages 14, 102, 104, 133, 143

[113] A. Gutmann and M. Warner. Fight to Be Forgotten: Exploring the Efficacy of Data Erasure in Popular Operating Systems. In Annual Privacy Forum, pages 45–58. Springer, 2019. → pages 162

[114] J. Gwizdka. Timely Reminders: A Case Study of Temporal Guidance in PIM and Email Tools Usage. In CHI ’00 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’00, pages 163–164, New York, NY, USA, 2000. ACM. ISBN 1-58113-248-4. doi:10.1145/633292.633383. → pages 103

[115] J. Gwizdka. Supporting Prospective Information in Email. In CHI ’01 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’01, pages 135–136, New York, NY, USA, 2001. ACM. ISBN 1-58113-340-5. doi:10.1145/634067.634150. → pages

[116] J. Gwizdka. Reinventing the Inbox: Supporting the Management of Pending Tasks in Email. In CHI ’02 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’02, pages 550–551, New York, NY, USA, 2002. ACM. ISBN 1-58113-454-1. doi:10.1145/506443.506476. → pages 103

[117] J. Gwizdka. Email Task Management Styles: The Cleaners and the Keepers.
In CHI ’04 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’04, pages 1235–1238, New York, NY, USA, 2004. ACM. ISBN 1-58113-703-6. doi:10.1145/985921.986032. → pages 3, 10, 29, 48, 72

[118] J. Gwizdka and M. Chignell. Individual Differences. In W. Jones and J. Teevan, editors, Personal Information Management, chapter 12, pages 206–220. University of Washington Press, Seattle and London, 2007. → pages 4, 70, 72

[119] M. Haraty and J. McGrenere. Designing for Advanced Personalization in Personal Task Management. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems, DIS ’16, pages 239–250, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450340311. doi:10.1145/2901790.2901805. → pages 132, 134, 135, 142

[120] M. Haraty, J. McGrenere, and A. Bunt. Online Customization Sharing Ecosystems: Components, Roles, and Motivations. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, pages 2359–2371, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450343350. doi:10.1145/2998181.2998289. → pages 161

[121] M. Haraty, Z. Wang, H. Wang, S. Iqbal, and J. Teevan. Design and in-situ evaluation of a mixed-initiative approach to information organization. Journal of the Association for Information Science and Technology, 68(9):2211–2224, 2017. → pages 134, 135

[122] R. Harper and W. Odom. Trusting oneself: an anthropology of digital things and personal competence. Trust, Computing, and Society, page 272, 2014. → pages 15

[123] R. Harper, S. Lindley, E. Thereska, R. Banks, P. Gosset, G. Smyth, W. Odom, and E. Whitworth. What is a File? In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ’13, pages 1125–1136, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1331-5. doi:10.1145/2441776.2441903. → pages 10, 126, 162

[124] S. Harrison, D. Tatar, and P. Sengers. The three paradigms of HCI. In Alt. Chi.
Session at the SIGCHI Conference on Human Factors in Computing Systems San Jose, California, USA, pages 1–18, 2007. → pages 34

[125] B. Hecht, L. Wilcox, J. Bigham, J. Schöning, E. Hoque, J. Ernst, Y. Bisk, L. De Russis, L. Yarosh, B. Anjum, D. Contractor, and C. Wu. It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process, March 2018. URL https://acm-fca.org/2018/03/29/negativeimpacts/. → pages 128

[126] S. Henderson. How do people manage their documents?: an empirical investigation into personal document management practices among knowledge workers. PhD thesis, ResearchSpace@Auckland, 2009. → pages 30, 48

[127] S. Henderson. Personal Document Management Strategies. In Proceedings of the 10th International Conference NZ Chapter of the ACM’s Special Interest Group on Human-Computer Interaction, CHINZ ’09, pages 69–76, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-574-1. doi:10.1145/1577782.1577795. → pages 3, 30, 48

[128] S. Henderson and A. Srinivasan. Filing, piling & structuring: strategies for personal document management. In System Sciences (HICSS), 2011 44th Hawaii International Conference on, pages 1–10. IEEE, 2011. → pages 72

[129] L. A. Henkel. Point-and-shoot memories: The influence of taking photos on memory for a museum tour. Psychological science, 25(2):396–402, 2014. → pages 49

[130] D. Herron, W. Moncur, and E. van den Hoven. Digital Decoupling and Disentangling: Towards Design for Romantic Break Up. In Proceedings of the 2017 Conference on Designing Interactive Systems, DIS ’17, pages 1175–1185, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450349222. doi:10.1145/3064663.3064765. → pages 15

[131] C. G. Hill, M. Haag, A. Oleson, C. Mendez, N. Marsden, A. Sarma, and M. Burnett. Gender-Inclusiveness Personas vs. Stereotyping: Can We Have It Both Ways? In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, pages 6658–6671, New York, NY, USA, 2017.
Association for Computing Machinery. ISBN 9781450346559. doi:10.1145/3025453.3025609. → pages 93

[132] W. C. Hill, J. D. Hollan, D. Wroblewski, and T. McCandless. Edit wear and read wear. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 3–9. ACM, 1992. → pages 66

[133] V. Hollis, A. Konrad, A. Springer, M. Antoun, C. Antoun, R. Martin, and S. Whittaker. What does all this data mean for my future mood? Actionable Analytics and Targeted Reflection for Emotional Well-Being. Human–Computer Interaction, 32(5-6):208–267, 2017. → pages 2

[134] E. Horvitz. Principles of Mixed-initiative User Interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’99, pages 159–166, New York, NY, USA, 1999. ACM. ISBN 0-201-48559-1. doi:10.1145/302979.303030. → pages 109, 134

[135] A. Hsu. How Forgetting Might Make Us Smarter - NPR, June 2017. URL http://www.npr.org/sections/health-shots/2017/06/23/534001592/could-the-best-memory-system-be-one-that-forgets. → pages 49

[136] J. Huh, M. Pollack, H. Katebi, K. Sakallah, and N. Kirsch. Incorporating User Control in Automated Interactive Scheduling Systems. In Proceedings of the 8th ACM Conference on Designing Interactive Systems, DIS ’10, pages 306–309, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0103-9. doi:10.1145/1858171.1858226. → pages 102

[137] A. Hurst, J. Mankoff, A. K. Dey, and S. E. Hudson. Dirty Desktops: Using a Patina of Magnetic Mouse Dust to Make Common Interactor Targets Easier to Select. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST ’07, pages 183–186, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-679-0. doi:10.1145/1294211.1294242. → pages 108

[138] I. Ion, N. Sachdeva, P. Kumaraguru, and S. Čapkun. Home is Safer Than the Cloud!: Privacy Concerns for Consumer Cloud Storage. In Proceedings of the Seventh Symposium on Usable Privacy and Security, SOUPS ’11, pages 13:1–13:20, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0911-0.
doi:10.1145/2078827.2078845. → pages 28

[139] R. Jenkins. Categorization: Identity, social process and epistemology. Current sociology, 48(3):7–25, 2000. → pages 91

[140] J. Jones and M. S. Ackerman. Curating an Infinite Basement: Understanding How People Manage Collections of Sentimental Artifacts. In Proceedings of the 19th International Conference on Supporting Group Work, GROUP ’16, pages 87–97, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4276-6. doi:10.1145/2957276.2957316. → pages 14, 15, 57, 64, 91, 102, 103, 104, 133

[141] N. Jones. How to stop data centres from gobbling up the world’s electricity. Nature, 561:163–166, Sept. 2018. doi:10.1038/d41586-018-06610-y. URL https://www.nature.com/articles/d41586-018-06610-y. → pages 173

[142] W. Jones. Personal information management. Annual review of information science and technology, 41(1):453–504, 2007. → pages 9, 102

[143] W. Jones, P. Klasnja, A. Civan, and M. L. Adcock. The Personal Project Planner: Planning to Organize Personal Information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, pages 681–684, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-011-1. doi:10.1145/1357054.1357162. → pages 101, 102, 132, 140, 153

[144] W. Jones, V. Bellotti, R. Capra, J. D. Dinneen, G. Mark, C. Marshall, K. Moffatt, J. Teevan, and M. Van Kleek. For Richer, for Poorer, in Sickness or in Health...: The Long-Term Management of Personal Information. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA ’16, pages 3508–3515, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4082-3. doi:10.1145/2851581.2856481. → pages 2, 69, 102

[145] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. In Handbook of the fundamentals of financial decision making: Part I, pages 99–127. World Scientific, 2013. → pages 3, 15, 126

[146] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu. Plutus: Scalable Secure File Sharing on Untrusted Storage.
In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, FAST ’03, pages 29–42, USA, 2003. USENIX Association. → pages 164

[147] V. Kalnikaitė and S. Whittaker. A saunter down memory lane: Digital reflection on personal mementos. International Journal of Human-Computer Studies, 69(5):298–310, 2011. → pages 15

[148] V. Kalnikaitė, A. Sellen, S. Whittaker, and D. Kirk. Now Let Me See Where I Was: Understanding How Lifelogs Mediate Memory. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 2045–2054, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605589299. doi:10.1145/1753326.1753638. → pages 11

[149] J. J. Kaye, J. Vertesi, S. Avery, A. Dafoe, S. David, L. Onaga, I. Rosero, and T. Pinch. To Have and to Hold: Exploring the Personal Archive. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 275–284, New York, NY, USA, 2006. ACM. ISBN 1-59593-372-7. doi:10.1145/1124772.1124814. → pages 14, 16, 29, 30, 47, 49, 91, 133

[150] M. T. Khan, M. Hyun, C. Kanich, and B. Ur. Forgotten But Not Gone: Identifying the Need for Longitudinal Data Management in Cloud Storage. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 543:1–543:12, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3174117. → pages 4, 16, 17, 65, 70, 99, 131

[151] S. Kim. Personal digital archives: preservation of documents, preservation of self. PhD thesis, 2013. → pages 16, 62, 64, 65, 95, 103

[152] D. Kirk, A. Sellen, C. Rother, and K. Wood. Understanding Photowork. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 761–770, New York, NY, USA, 2006. ACM. ISBN 1-59593-372-7. doi:10.1145/1124772.1124885. → pages 14, 62

[153] D. S. Kirk and A. Sellen. On Human Remains: Values and Practice in the Home Archiving of Cherished Objects. ACM Trans. Comput.-Hum. Interact., 17(3), July 2010.
ISSN 1073-0516. doi:10.1145/1806923.1806924. → pages 14, 15, 133

[154] D. S. Kirk, S. Izadi, A. Sellen, S. Taylor, R. Banks, and O. Hilliges. Opening up the Family Archive. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW ’10, pages 261–270, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605587950. doi:10.1145/1718918.1718968. → pages 133

[155] A. Klein. Hard Drive Cost Per Gigabyte, July 2017. URL https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/. → pages 2

[156] M. Kljun, J. Mariani, and A. Dix. Toward understanding short-term personal information preservation: A study of backup strategies of end users. Journal of the Association for Information Science and Technology, 67(12):2947–2963, 2016. → pages 29, 30

[157] A. Konrad, E. Isaacs, and S. Whittaker. Technology-mediated memory: Is technology altering our memories and interfering with well-being? ACM Transactions on Computer-Human Interaction (TOCHI), 23(4):23, 2016. → pages 2

[158] M. W. Lansdale. The psychology of personal information management. Applied ergonomics, 19(1):55–66, 1988. → pages 9, 10, 72

[159] J. L. Lastovicka and K. V. Fernandez. Three paths to disposition: The movement of meaningful possessions to strangers. Journal of Consumer Research, 31(4):813–823, 2005. → pages 15

[160] M.-H. Lee, S. Cha, and T.-J. Nam. Patina Engraver: Visualizing Activity Logs As Patina in Fashionable Trackers. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 1173–1182, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3145-6. doi:10.1145/2702123.2702213. → pages 108

[161] M.-H. Lee, O. Son, and T.-J. Nam. Patina-inspired Personalization: Personalizing Products with Traces of Daily Use. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems, DIS ’16, pages 251–263, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4031-1. doi:10.1145/2901790.2901812. → pages 108

[162] T. B. Lee.
Why Google believes machine learning is its future, May 2019. URL https://arstechnica.com/gadgets/2019/05/googles-machine-learning-strategy-hardware-software-and-lots-of-data/. → pages 173

[163] I. Li, A. Dey, and J. Forlizzi. A Stage-based Model of Personal Informatics Systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 557–566, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-929-9. doi:10.1145/1753326.1753409. → pages 11

[164] J. Li, J. H. Lim, and Q. Tian. Automatic summarization for personal digital photos. In Proc. of ICICS-PCM, volume 3. Citeseer, 2003. → pages 102

[165] Y. S. Lincoln and E. G. Guba. But is it rigorous? Trustworthiness and authenticity in naturalistic evaluation. New directions for program evaluation, 1986(30):73–84, 1986. → pages 23, 96

[166] S. E. Lindley, C. C. Marshall, R. Banks, A. Sellen, and T. Regan. Rethinking the Web As a Personal Archive. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 749–760, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2035-1. doi:10.1145/2488388.2488454. → pages 3, 16, 133, 138, 139, 154, 158

[167] S. E. Lindley, G. Smyth, R. Corish, A. Loukianov, M. Golembewski, E. A. Luger, and A. Sellen. Exploring New Metaphors for a Networked World Through the File Biography. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 118:1–118:12, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3173692. → pages 66, 101, 126

[168] W. Liu, O. Rioul, J. McGrenere, W. E. Mackay, and M. Beaudouin-Lafon. BIGFile: Bayesian Information Gain for Fast File Retrieval. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 385:1–385:13, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3173959. → pages 101, 102

[169] C. Lo, J. Cheng, and J. Leskovec. Understanding Online Collection Growth Over Time: A Case Study of Pinterest.
In Proceedings of the 26th International Conference on World Wide Web Companion, WWW ’17 Companion, pages 545–554, Republic and Canton of Geneva, Switzerland, 2017. International World Wide Web Conferences Steering Committee. ISBN 978-1-4503-4914-7. doi:10.1145/3041021.3054189. → pages 107

[170] V. Luckerson. Why Google Is Suddenly Obsessed With Your Photos - The Ringer, May 2017. URL https://www.theringer.com/2017/5/25/16043842/google-photos-data-collection-e8578b3256e0. → pages 50

[171] A. M. Luxon, C. E. Hamilton, S. Bates, and G. S. Chasson. Pinning our possessions: Associations between digital hoarding and symptoms of hoarding disorder. Journal of Obsessive-Compulsive and Related Disorders, 21:60–68, 2019. → pages 3, 174

[172] S. Machkovech. Amazon confirms that Echo device secretly shared user’s private audio - Ars Technica, May 2018. URL https://arstechnica.com/gadgets/2018/05/amazon-confirms-that-echo-device-secretly-shared-users-private-audio/. → pages 4

[173] W. E. Mackay. More Than Just a Communication System: Diversity in the Use of Electronic Mail. In Proceedings of the 1988 ACM Conference on Computer-supported Cooperative Work, CSCW ’88, pages 344–353, New York, NY, USA, 1988. ACM. ISBN 0-89791-282-9. doi:10.1145/62266.62293. → pages 72

[174] W. E. Mackay. Triggers and Barriers to Customizing Software. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’91, pages 153–160, New York, NY, USA, 1991. Association for Computing Machinery. ISBN 0897913833. doi:10.1145/108844.108867. → pages 135

[175] W. E. Mackay and A.-L. Fayard. HCI, Natural Science and Design: A Framework for Triangulation across Disciplines. In Proceedings of the 2nd Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, DIS ’97, pages 223–234, New York, NY, USA, 1997. Association for Computing Machinery. ISBN 0897918630. doi:10.1145/263552.263612. → pages 21

[176] M. Madden and L. Rainie.
Americans’ Attitudes About Privacy, Security and Surveillance - Pew Research Center, May 2015. URL http://www.pewinternet.org/2015/05/20/americans-attitudes-about-privacy-security-and-surveillance/. → pages 4, 127

[177] T. W. Malone. How Do People Organize Their Desks?: Implications for the Design of Office Information Systems. ACM Trans. Inf. Syst., 1(1):99–112, Jan. 1983. ISSN 1046-8188. doi:10.1145/357423.357430. → pages 9, 10, 29, 48, 72

[178] C. Mancini, Y. Rogers, A. K. Bandara, T. Coe, L. Jedrzejczyk, A. N. Joinson, B. A. Price, K. Thomas, and B. Nuseibeh. Contravision: Exploring Users’ Reactions to Futuristic Technology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 153–162, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-929-9. doi:10.1145/1753326.1753350. → pages 106

[179] N. Marsden and M. Haag. Stereotypes and Politics: Reflections on Personas. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 4017–4031, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi:10.1145/2858036.2858151. → pages 92

[180] N. Marsden and M. Pröbster. Personas and Identity: Looking at Multiple Identities to Inform the Construction of Personas. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450359702. doi:10.1145/3290605.3300565. → pages 92

[181] C. Marshall and G. B. Rossman. Designing qualitative research. Sage publications, 2014. → pages 32

[182] C. C. Marshall. Rethinking personal digital archiving, Part 1: Four challenges from the field. D-Lib Magazine, 14(3/4):2, 2008. → pages 3, 28, 29, 30, 99

[183] C. C. Marshall. Rethinking personal digital archiving, part 2: implications for services, applications, and institutions. D-Lib Magazine, 14(3):3, 2008. → pages 2, 3, 140

[184] C. C. Marshall, S. Bly, and F. Brun-Cottan.
The long term fate of our digital belongings: Toward a service model for personal archives. In Archiving Conference, volume 2006, pages 25–30. Society for Imaging Science and Technology, 2006. → pages 16, 28, 29, 30

[185] E. Masanet, A. Shehabi, N. Lei, S. Smith, and J. Koomey. Recalibrating global data center energy-use estimates. Science, 367(6481):984–986, 2020. ISSN 0036-8075. doi:10.1126/science.aba3758. URL https://science.sciencemag.org/content/367/6481/984. → pages 173

[186] C. Massey, T. Lennig, and S. Whittaker. Cloudy Forecast: An Exploration of the Factors Underlying Shared Repository Use. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, pages 2461–2470, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450324731. doi:10.1145/2556288.2557042. → pages 12

[187] C. Massey, S. TenBrook, C. Tatum, and S. Whittaker. PIM and Personality: What Do Our Personal File Systems Say About Us? In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, CHI ’14, pages 3695–3704, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2473-1. doi:10.1145/2556288.2557023. → pages 72, 129

[188] J. Matejka, T. Grossman, and G. Fitzmaurice. Patina: Dynamic Heatmaps for Visualizing Application Usage. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 3227–3236, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1899-0. doi:10.1145/2470654.2466442. → pages 108

[189] T. Matthews, T. Judge, and S. Whittaker. How Do Designers and User Experience Professionals Actually Perceive and Use Personas? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1219–1228, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450310154. doi:10.1145/2207676.2208573. → pages 93

[190] N. Maudet. Dead Angles of Personalization: Integrating Curation Algorithms in the Fabric of Design.
In Proceedings of the 2019 on Designing Interactive Systems Conference, DIS ’19, pages 1439–1448, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450358507. doi:10.1145/3322276.3322322. → pages 134
[191] M. J. May, E. Laron, K. Zoabi, and H. Gerhardt. On the Lifecycle of the File. ACM Trans. Storage, 15(1), Feb. 2019. ISSN 1553-3077. doi:10.1145/3295463. → pages 10
[192] M. L. Mazurek, E. Thereska, D. Gunawardena, R. Harper, and J. Scott. ZZFS: A hybrid device and cloud file system for spontaneous users. 2012. → pages 67
[193] J. McGrenere, R. M. Baecker, and K. S. Booth. An Evaluation of a Multiple Interface Design Solution for Bloated Software. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’02, pages 164–170, New York, NY, USA, 2002. Association for Computing Machinery. ISBN 1581134533. doi:10.1145/503376.503406. → pages 134, 135
[194] G. H. Mead. Mind, self and society, volume 111. Chicago University of Chicago Press., 1934. → pages 91
[195] D. Merritt, J. Jones, M. S. Ackerman, and W. S. Lasecki. Kurator: Using The Crowd to Help Families With Personal Curation Tasks. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’17, pages 1835–1849, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4335-0. doi:10.1145/2998181.2998358. → pages 102
[196] D. Miller. The comfort of things. Polity, 2008. → pages 15
[197] K. Moon and D. Blackman. A guide to understanding social science research for natural scientists. Conservation Biology, 28(5):1167–1177, 2014. → pages 19, 22, 34
[198] R. Mortier, J. Zhao, J. Crowcroft, L. Wang, Q. Li, H. Haddadi, Y. Amar, A. Crabtree, J. Colley, T. Lodge, T. Brown, D. McAuley, and C. Greenhalgh. Personal Data Management with the Databox: What’s Inside the Box? In Proceedings of the 2016 ACM Workshop on Cloud-Assisted Networking, CAN ’16, pages 49–54, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4673-3. doi:10.1145/3010079.3010082. → pages 127
[199] S. Moss.
Pointless emails: they’re not just irritating – they have a massive carbon footprint. The Guardian, Nov. 2019. ISSN 0261-3077. URL https://www.theguardian.com/technology/shortcuts/2019/nov/26/pointless-emails-theyre-not-just-irritating-they-have-a-massive-carbon-footprint. → pages 173
[200] Mozilla. *Privacy Not Included: A Buyer’s Guide for Connected Products, n.d. URL https://foundation.mozilla.org/en/privacynotincluded/. → pages 94
[201] M. J. Muller and S. Kogan. Grounded theory method in HCI and CSCW. Cambridge: IBM Center for Social Software, pages 1–46, 2010. → pages 19
[202] A. Murillo, A. Kramm, S. Schnorf, and A. De Luca. “If I press delete, it’s gone” - User Understanding of Online Data Deletion and Expiration. In Fourteenth Symposium on Usable Privacy and Security (SOUPS 2018), pages 329–339, 2018. → pages 17, 112
[203] J. Naughton. ‘The goal is to automate us’: welcome to the age of surveillance capitalism. The Guardian, Jan. 2019. ISSN 0029-7712. URL https://www.theguardian.com/technology/2019/jan/20/shoshana-zuboff-age-of-surveillance-capitalism-google-facebook. → pages 162
[204] T. Neate, A. Bourazeri, A. Roper, S. Stumpf, and S. Wilson. Co-Created Personas: Engaging and Empowering Users with Diverse Needs Within the Design Process. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450359702. doi:10.1145/3290605.3300880. → pages 92
[205] N. Neave, P. Briggs, K. McKellar, and E. Sillence. Digital hoarding behaviours: Measurement and evaluation. Computers in Human Behavior, 96:72–77, 2019. → pages 3, 30, 174
[206] N. Neave, K. McKellar, E. Sillence, and P. Briggs. Digital hoarding behaviours: Implications for cybersecurity. In Cyber Influence and Cognitive Threats, pages 77–95. Elsevier, 2020. → pages 30, 174
[207] M. Nouwens, C. F. Griggio, and W. E. Mackay. “WhatsApp is for Family; Messenger is for Friends”: Communication Places in App Ecosystems.
In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, pages 727–735, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4655-9. doi:10.1145/3025453.3025484. → pages 47, 133
[208] P. Obrador, R. de Oliveira, and N. Oliver. Supporting Personal Photo Storytelling for Social Albums. In Proceedings of the 18th ACM International Conference on Multimedia, MM ’10, pages 561–570, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-933-6. doi:10.1145/1873951.1874025. → pages 102, 132, 135
[209] W. Odom and T. Duel. On the Design of OLO Radio: Investigating Metadata As a Design Material. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 104:1–104:9, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5620-6. doi:10.1145/3173574.3173678. → pages 95, 103
[210] W. Odom, J. Pierce, E. Stolterman, and E. Blevis. Understanding Why We Preserve Some Things and Discard Others in the Context of Interaction Design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 1053–1062, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-246-7. doi:10.1145/1518701.1518862. → pages 15, 95
[211] W. Odom, J. Zimmerman, and J. Forlizzi. Teenagers and Their Virtual Possessions: Design Opportunities and Issues. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, pages 1491–1500, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0228-9. doi:10.1145/1978942.1979161. → pages 10, 14, 16, 29, 47, 66, 108, 133
[212] W. Odom, R. Banks, D. Kirk, R. Harper, S. Lindley, and A. Sellen. Technology Heirlooms?: Considerations for Passing Down and Inheriting Digital Materials. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 337–346, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1015-4. doi:10.1145/2207676.2207723. → pages 15, 103, 133, 144
[213] W. Odom, A. Sellen, R. Harper, and E. Thereska.
Lost in Translation: Understanding the Possession of Digital Things in the Cloud. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 781–790, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1015-4. doi:10.1145/2207676.2207789. → pages 2, 3, 16, 28, 131, 138, 139, 163
[214] W. Odom, J. Zimmerman, S. Davidoff, J. Forlizzi, A. K. Dey, and M. K. Lee. A Fieldwork of the Future with User Enactments. In Proceedings of the Designing Interactive Systems Conference, DIS ’12, pages 338–347, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1210-3. doi:10.1145/2317956.2318008. → pages 20, 106, 115
[215] W. Odom, J. Zimmerman, and J. Forlizzi. Placelessness, Spacelessness, and Formlessness: Experiential Qualities of Virtual Possessions. In Proceedings of the 2014 Conference on Designing Interactive Systems, DIS ’14, pages 985–994, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2902-6. doi:10.1145/2598510.2598577. → pages 63, 65, 110, 131, 133, 139
[216] W. Odom, R. Wakkary, Y.-k. Lim, A. Desjardins, B. Hengeveld, and R. Banks. From Research Prototype to Research Product. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 2549–2561, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi:10.1145/2858036.2858447. → pages 172
[217] W. Odom, R. Wakkary, J. Hol, B. Naus, P. Verburg, T. Amram, and A. Y. S. Chen. Investigating Slowness as a Frame to Design Longer-Term Experiences with Personal Data: A Field Study of Olly. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450359702. doi:10.1145/3290605.3300264. → pages 15
[218] W. T. Odom, A. J. Sellen, R. Banks, D. S. Kirk, T. Regan, M. Selby, J. L. Forlizzi, and J. Zimmerman. Designing for Slowness, Anticipation and Re-visitation: A Long Term Field Study of the Photobox.
In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, CHI ’14, pages 1961–1970, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2473-1. doi:10.1145/2556288.2557178. → pages 14, 15, 103
[219] K. E. Oh. Types of personal information categorization: Rigid, fuzzy, and flexible. Journal of the Association for Information Science and Technology, 68(6):1491–1504, 2017. → pages 72, 131, 133, 151
[220] K. E. Oh. Personal information organization in everyday life: modeling the process. Journal of Documentation, 75(3):667–691, 2019. → pages 72
[221] G. Oleksik, M. L. Wilson, C. Tashman, E. Mendes Rodrigues, G. Kazai, G. Smyth, N. Milic Frayling, and R. Jones. Lightweight Tagging Expands Information and Activity Management Practices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 279–288, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-246-7. doi:10.1145/1518701.1518746. → pages 101, 102
[222] J. A. Oravec. Virtual Hoarding. In Encyclopedia of Information Science and Technology, Fourth Edition, pages 4306–4314. IGI Global, 2018. → pages 30, 174
[223] R. Orji, G. F. Tondello, and L. E. Nacke. Personalizing Persuasive Strategies in Gameful Systems to Gamification User Types. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450356206. doi:10.1145/3173574.3174009. → pages 77, 96
[224] D. Orth, C. Thurgood, and E. V. D. Hoven. Designing Meaningful Products in the Digital Age: How Users Value Their Technological Possessions. ACM Trans. Comput.-Hum. Interact., 26(5), Aug. 2019. ISSN 1073-0516. doi:10.1145/3341980. → pages 133
[225] L. Palen and P. Dourish. Unpacking “Privacy” for a Networked World. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’03, pages 129–136, New York, NY, USA, 2003. Association for Computing Machinery. ISBN 1581136307. doi:10.1145/642611.642635. → pages 133, 134
[226] S.
T. Peesapati, V. Schwanda, J. Schultz, M. Lepage, S.-y. Jeong, and D. Cosley. Pensieve: Supporting Everyday Reminiscence. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 2027–2036, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605589299. doi:10.1145/1753326.1753635. → pages 15
[227] D. Petrelli and S. Whittaker. Family memories in the home: contrasting physical and digital mementos. Personal and Ubiquitous Computing, 14(2):153–169, 2010. → pages 15, 16, 57
[228] D. Petrelli, S. Whittaker, and J. Brockmeier. AutoTopography: What Can Physical Mementos Tell Us about Digital Memories? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, pages 53–62, New York, NY, USA, 2008. Association for Computing Machinery. ISBN 9781605580111. doi:10.1145/1357054.1357065. → pages 15
[229] D. Petrelli, E. van den Hoven, and S. Whittaker. Making History: Intentional Capture of Future Memories. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 1723–1732, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605582467. doi:10.1145/1518701.1518966. → pages 2, 14
[230] D. Petrelli, S. Bowen, and S. Whittaker. Photo mementos: Designing digital media to represent ourselves at home. International Journal of Human-Computer Studies, 72(3):320–336, 2014. → pages 15
[231] S. Petronio. Boundaries of privacy: Dialectics of disclosure. Suny Press, 2002. → pages 133, 134
[232] J. Pierce and E. Paulos. Counterfunctional Things: Exploring Possibilities in Designing Digital Limitations. In Proceedings of the 2014 Conference on Designing Interactive Systems, DIS ’14, pages 375–384, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450329026. doi:10.1145/2598510.2598522. → pages 133, 144
[233] E. Politou, E. Alepis, and C. Patsakis.
Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions. Journal of Cybersecurity, 4(1):tyy001, 2018. → pages 140
[234] A. Ponsard and J. McGrenere. Anchored Customization: Anchoring Settings to the Application Interface to Afford Customization. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 4154–4165, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi:10.1145/2858036.2858129. → pages 219
[235] J. Porter. Google will soon let you auto-delete your location tracking data, May 2019. URL https://www.theverge.com/2019/5/1/18525384/google-location-tracking-data-auto-delete-history-app-and-activity-data-3-18-months. → pages 174
[236] C. Preist, D. Schien, and E. Blevis. Understanding and Mitigating the Effects of Device and Cloud Service Design Decisions on the Environmental Footprint of Digital Infrastructure. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 1324–1337, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi:10.1145/2858036.2858378. → pages 2
[237] B. A. Price, K. Adam, and B. Nuseibeh. Keeping ubiquitous computing to yourself: A practical model for user control of privacy. International Journal of Human-Computer Studies, 63(1-2):228–253, 2005. → pages 102
[238] J. Pruitt and J. Grudin. Personas: Practice and Theory. In Proceedings of the 2003 Conference on Designing for User Experiences, DUX ’03, pages 1–15, New York, NY, USA, 2003. ACM. ISBN 1-58113-728-1. doi:10.1145/997078.997089. → pages 77
[239] E. Rader. Yours, Mine and (Not) Ours: Social Influences on Group Information Repositories. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, pages 2095–2098, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605582467. doi:10.1145/1518701.1519019. → pages 12
[240] K. M. Ramokapane, A. Rashid, and J. Such.
“I feel stupid I can’t delete...”: a study of users’ cloud deletion practices and coping strategies. In Proceedings of the Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017). USENIX Association, 2017. → pages 15, 17, 126
[241] P. Raschke, A. Küpper, O. Drozd, and S. Kirrane. Designing a GDPR-Compliant and Usable Privacy Dashboard. In IFIP International Summer School on Privacy and Identity Management, pages 221–236. Springer, 2017. → pages 140
[242] B. A. Richards and P. W. Frankland. The Persistence and Transience of Memory. Neuron, 94(6):1071–1084, 2017. → pages 49
[243] R. Rouge. Very Precious Memories: Digital Memories and Data Valorization. Digitalization of Society and Socio-political Issues 1: Digital, Communication and Culture, pages 71–79, 2019. → pages 95
[244] A. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In R. Guerraoui, editor, Middleware 2001, pages 329–350, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg. ISBN 978-3-540-45518-9. → pages 164
[245] W.-J. Ryu, J.-H. Lee, K.-M. Kim, and S. Lee. MeCurate: Personalized Curation Service Using a Tiny Text Intelligence. In Proceedings of the 26th International Conference on World Wide Web Companion, WWW ’17 Companion, pages 269–272, Republic and Canton of Geneva, CHE, 2017. International World Wide Web Conferences Steering Committee. ISBN 9781450349147. doi:10.1145/3041021.3054723. → pages 132, 135
[246] A. Sahney and D. Loxton. Introducing Backup and Sync for Google Photos and Google Drive, July 2017. URL https://blog.google/products/photos/introducing-backup-and-sync-google-photos-and-google-drive/. → pages 140
[247] C. Sas and S. Whittaker. Design for Forgetting: Disposing of Digital Possessions After a Breakup. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 1823–1832, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1899-0. doi:10.1145/2470654.2466241. → pages 3, 16, 49, 63, 103
[248] C.
Sas, S. Whittaker, S. Dow, J. Forlizzi, and J. Zimmerman. Generating Implications for Design through Design Research. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, pages 1971–1980, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450324731. doi:10.1145/2556288.2557357. → pages 169
[249] C. Sas, S. Whittaker, and J. Zimmerman. Design for Rituals of Letting Go: An Embodiment Perspective on Disposal Practices Informed by Grief Therapy. ACM Trans. Comput.-Hum. Interact., 23(4):21:1–21:37, Aug. 2016. ISSN 1073-0516. doi:10.1145/2926714. → pages 15, 64, 103, 126
[250] K. Schiele and M. Ucok Hughes. Possession rituals of the digital consumer: A study of Pinterest. ACR European Advances, 2013. → pages 30
[251] T. Schnitzler, C. Utz, F. M. Farke, C. Pöpper, and M. Dürmuth. Exploring user perceptions of deletion in mobile instant messaging applications. Journal of Cybersecurity, 6(1):tyz016, 2020. → pages 17
[252] S. Schoenebeck, N. B. Ellison, L. Blackwell, J. B. Bayer, and E. B. Falk. Playful Backstalking and Serious Impression Management: How Young Adults Reflect on Their Past Identities on Facebook. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, CSCW ’16, pages 1475–1487, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450335928. doi:10.1145/2818048.2819923. → pages 95
[253] D. Sedera and S. Lokuge. Is Digital Hoarding a Mental Disorder? Development of a Construct for Digital Hoarding for Future IS Research. In Proceedings of the International Conference on Information Systems, ICIS 2018, San Francisco, 2018. → pages 3, 16, 30, 174
[254] A. Sellen and S. Whittaker. Beyond total capture: a constructive critique of lifelogging. Communications of the ACM, 2010. → pages 2
[255] M. Seltzer and N. Murphy. Hierarchical File Systems Are Dead. In Proceedings of the 12th Conference on Hot Topics in Operating Systems, HotOS’09, page 1, USA, 2009. USENIX Association.
→ pages 10, 100
[256] D. Siegel and S. Dray. The map is not the territory: empathy in design. interactions, 26(2):82–85, 2019. → pages 77
[257] B. Signer. Towards Cross-Media Information Spaces and Architectures. In 2019 13th International Conference on Research Challenges in Information Science (RCIS), pages 1–7. IEEE, 2019. → pages 164
[258] V. V. Simbelis, P. Ferreira, E. Vaara, J. Laaksolahti, and K. Höök. Repurposing Bits and Pieces of the Digital. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, pages 840–851, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-3362-7. doi:10.1145/2858036.2858297. → pages 94
[259] D. Sinn, S. Kim, and S. Y. Syn. Personal digital archiving: influencing factors and challenges to practices. Library Hi Tech, 35(2):222–239, 2017. → pages 3, 69
[260] M. Sleeper, J. Cranshaw, P. G. Kelley, B. Ur, A. Acquisti, L. F. Cranor, and N. Sadeh. “i Read My Twitter the next Morning and Was Astonished”: A Conversational Perspective on Twitter Regrets. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 3277–3286, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450318990. doi:10.1145/2470654.2466448. → pages 17
[261] F. Smadja. Automatic categorization of documents based on textual content, Sept. 16 2003. US Patent 6,621,930. → pages 160
[262] Spotify. Introducing Two New Personalized Playlists: On Repeat and Repeat Rewind, Sept. 2019. URL https://newsroom.spotify.com/2019-09-24/introducing-two-new-personalized-playlists-on-repeat-and-repeat-rewind/. → pages 95
[263] K. M. Spurgin. “Three backups is a minimum”: A first look at norms and practices in the digital photo collections of serious photographers. I, Digital: Personal Collections in the Digital Era, Chicago: Society of American Archivists, pages 151–201, 2011. → pages 29, 30, 48
[264] N. Srnicek. Platform capitalism. John Wiley & Sons, 2016. → pages 2, 27, 49
[265] State of California.
California Consumer Privacy Act (CCPA), Oct. 2018. URL https://oag.ca.gov/privacy/ccpa. → pages 1, 162, 173
[266] R. A. Stebbins. Exploratory research in the social sciences, volume 48. Sage, 2001. → pages 21
[267] A. Strauss and J. Corbin. Discovery of grounded theory. 1967. → pages 20
[268] T. Stylianou Lambert, L. A. Henkel, C. M. Weber, and K. Parisi. Museums and Visitor Photography: Redefining the Visitor Experience. Linda A. Henkel, Katelyn Parisi and Carey Mack Weber, “The Museum as Psychology Lab: Research on Photography and Memory in Museums,” in Museums and Visitor Photography: Redefining the Visitor Experience, ed. Theopisti Stylianou-Lambert, Museumsetc, Edinburgh and Boston, 2016, pp. 152-83., 2016. → pages 49
[269] S. S. Sundar and S. S. Marathe. Personalization versus customization: The importance of agency, privacy, and power usage. Human Communication Research, 36(3):298–322, 2010. → pages 134
[270] L. Swan, A. S. Taylor, and R. Harper. Making place for clutter and other ideas of home. ACM Transactions on Computer-Human Interaction (TOCHI), 15(2):9, 2008. → pages 133
[271] G. Sweeten, E. Sillence, and N. Neave. Digital hoarding behaviours: Underlying motivations and potential negative consequences. Computers in Human Behavior, 85:54–60, 2018. → pages 3, 16, 30, 55, 119, 174
[272] B. Tarnoff. To decarbonize we must decomputerize: why we need a Luddite revolution. The Guardian, Sept. 2019. ISSN 0261-3077. URL https://www.theguardian.com/technology/2019/sep/17/tech-climate-change-luddites-data. → pages 173
[273] L. Tauscher and S. Greenberg. How people revisit web pages: empirical findings and implications for the design of history systems. International Journal of Human-Computer Studies, 47(1):97–137, 1997. → pages 10
[274] J. Teevan, R. Capra, and M. P. Quiñones. How people find personal information. Jones et Teevan (2007b), chapitre, 2:22–34, 2007. → pages 73
[275] Y.-Y. Teing, A. Dehghantanha, K.-K. R. Choo, T. Dargahi, and M.
Conti. Forensic investigation of cooperative storage cloud service: Symform as a case study. Journal of forensic sciences, 62(3):641–654, 2017. → pages 164
[276] The Economist. The world’s most valuable resource is no longer oil, but data, May 2017. URL https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource. → pages 27
[277] The Guardian. The NSA files, 2018. URL https://www.theguardian.com/us-news/the-nsa-files. → pages 4
[278] A. Thudt, D.