POLICIES, PRACTICES, AND POTENTIALS FOR COMPUTER-SUPPORTED SCHOLARLY PEER REVIEW

by

Syavash Nobarany

B.Sc., The University of Tehran, 2008
M.Sc., Simon Fraser University, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

June 2015

© Syavash Nobarany, 2015

Abstract

The scholarly peer-review process has been one of the cornerstones of science for centuries, but it has also been the subject of criticism for decades. The peer-review process is increasingly supported by computer systems; however, computer support for peer review has been mostly limited to facilitating traditional peer-review processes and remedying scalability issues. We took a holistic approach to understanding the peer-review process with the goal of devising computer-supported interactions, mechanisms, and processes for improving peer review. We conducted a series of studies to investigate various aspects of the peer-review process, including procedural fairness, anonymity and transparency, reviewing motivations, politeness of reviews, and opinion measurement. In the study of fairness, we learned about researchers' attitudes and concerns about the fairness of the peer-review process. In the study of anonymity and transparency, we learned about the diversity of anonymity policies used by various publication venues. In the study of reviewing motivations, we learned about the many reasons reviewers consider reviewing to be part of their academic activities and what makes a review request more likely to be accepted. In the study of the use of politeness strategies, we learned about reviewers' use of language for mitigating criticisms in a non-blind reviewing environment. In the study of opinion measurement, we iteratively designed opinion measurement interfaces that can enhance elicitation of quantitative subjective opinions. Through these five studies, we expanded the understanding of challenges and opportunities for designing better peer-review processes and systems to support them, and we presented various ways through which computer support for peer review can be enhanced to address the identified challenges.

Preface

All research reported in Chapters 1, 2, 3, 4, 5, and 7, and parts of Chapter 6, was conducted under the supervision of Dr. Kellogg Booth (Department of Computer Science). Dr. Joanna McGrenere (Department of Computer Science) supervised the study reported in Chapter 6, and Dr. Tamara Munzner helped with revising the framing and description of that work. Dr. Gary Hsieh (Department of Human Centered Design & Engineering, University of Washington) co-supervised the study on reviewing motivations (Chapter 4). Dr. Caroline Haythornthwaite (School of Library and Information Science) co-supervised the study on procedural fairness of peer review (Chapter 2).

All research with human participants was reviewed and approved by the UBC Research Ethics Board. The numbers and project titles for the associated Certificates of Approval are:

- H10-03344 Rethinking the Scientific Peer Review Process
- H12-01599 CS HCI Course Projects

I was the primary contributor to all aspects of this research and conducted the studies, except where otherwise specified here. Chapter 6 includes a large portion of a collaborative paper on opinion measurement (the third publication listed below).
Graduate students Louise Oram, Vasanth Kumar Rajendran, Devon Chen, and I collaborated in all aspects of that study. I took a lead role in the research design, implementation, and data analysis, and I was the lead author of the paper. The chapter has additional new material that situates the research within the context of the larger discussion about peer review in the dissertation. For the study of politeness in peer review (Chapter 5), Mona Haraty helped with qualitative coding of the reviews.

Large parts of Chapters 4, 5, and 6 have been published in peer-reviewed journals or conference proceedings:

- Nobarany, S., Booth, K. S., & Hsieh, G. (2014). What motivates people to review papers? The case for the Human-Computer Interaction community. Journal of the Association for Information Science and Technology [to appear]. doi: 10.1002/asi.23469
- Nobarany, S., & Booth, K. S. (2015). Use of politeness strategies in signed open peer review. Journal of the Association for Information Science and Technology, 66, 1048–1064. doi: 10.1002/asi.23229
- Nobarany, S., Oram, L., Rajendran, V. K., Chen, C.-H., McGrenere, J., & Munzner, T. (2012). The design space of opinion measurement interfaces: Exploring recall support for rating and ranking. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems (pp. 2035–2044). New York, NY, USA: ACM.

Throughout this dissertation the plural pronoun "we" is used for stylistic reasons. It refers either to the singular author of this dissertation, Syavash Nobarany, or, for those portions of the dissertation that are derived from previously published co-authored work, to the author of this dissertation and his co-authors, as detailed in the Preface. In every case "we" includes the singular author of this dissertation, especially in the context of recommendations that are made based on the research.

Table of Contents

(Page numbers are omitted because the text is not paginated here.)

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgements
1 Introduction
1.1 A brief history of peer review and scholarly publishing
1.2 Problem statement and research approach
1.3 Summary of research contributions
1.4 Outline of the dissertation
1.5 How to read this dissertation
2 Examining Procedural Fairness of Peer Review
2.1 Related work
2.1.1 Studies of concerns about the fairness of academic peer review
2.1.2 Studies of reviewers' bias in peer review
2.1.3 Studies of consistency and reliability of peer review
2.2 Methods
2.2.1 Data collection
2.2.2 Data analysis
2.3 Results
2.3.1 General: Effect of perceived intentionality on perception of fairness
2.3.2 General: Fairness of procedure vs fairness of outcome
2.3.3 General: Researchers' bias in assessing fairness of their own experiences
2.3.4 Concern: Reviewers' misunderstandings caused by a lack of expertise or careless reading
2.3.5 Concern: Bias for or against certain authors by some reviewers
2.3.6 Concern: Lack of opportunity to voice concerns
2.3.7 Concern: Harsh or impolite treatment of authors
2.3.8 Concern: Long delays in processing submissions
2.3.9 Concern: Breach of confidence
2.3.10 Concern: Inconsistency of decisions
2.3.11 Concern: Subjective, ambiguous, or inappropriate evaluation criteria
2.3.12 Concern: Lack of specific or appropriate policies and expectations
2.3.13 Concern: Conflicting mindsets of "Mentorship and support" and "Survival of the fittest"
2.3.14 Concern: Bias for specific methods and topics
2.3.15 Concern: Judgment based on incomplete or inaccurate representations of research
2.3.16 Root cause: Difficulty of preserving author anonymity
2.3.17 Root cause: Anonymous reviews
2.3.18 Root cause: Difficulty of finding appropriate reviewers
2.3.19 Root cause: Power imbalance and insufficient oversight of reviewers
2.3.20 Root cause: High workload of reviewers and editors
2.3.21 Root cause: Insufficient feedback to reviewers and lack of recognition of reviewing
2.3.22 Root cause: Extremely high bars for acceptance in top-tier venues
2.3.23 Root cause: Pressure for publishing
2.3.24 General: Overall satisfaction with peer review
2.4 Design implications and further discussion
2.4.1 Concern: Reviewers' misunderstandings caused by a lack of expertise or careless reading
2.4.2 Concern: Bias for or against certain authors by some reviewers
2.4.3 Concern: Lack of opportunity to voice concerns
2.4.4 Concern: Harsh or impolite treatment of authors
2.4.5 Concern: Long delays in processing submissions
2.4.6 Concern: Breach of confidence and conflict of interest
2.4.7 Concern: Inconsistency of decisions
2.4.8 Concern: Subjective, ambiguous, or inappropriate evaluation criteria
2.4.9 Concern: Lack of specific or appropriate policies and expectations
2.4.10 Concern: Conflicting mindsets of "Mentorship and support" and "Survival of the fittest"
2.4.11 Concern: Bias for specific methods and topics
2.4.12 Concern: Judgment based on incomplete or inaccurate representations of research
2.4.13 General: Overall satisfaction with peer review
2.5 Conclusion
3 Understanding and Supporting Anonymity Policies in Peer Review
3.1 Background
3.2 Related work
3.2.1 Anonymity in peer review
3.2.1.1 Masking reviewers' identity
3.2.1.2 Masking authors' identity
3.2.1.3 Public disclosure of reviews
3.2.2 Anonymity in online communities
3.3 Methods
3.3.1 Data collection
3.3.2 Data analysis
3.4 Findings
3.4.1 Names
3.4.2 Relationships and conflict of interest
3.4.3 Contributions and roles
3.4.4 Reviews, decisions, and other communications
3.4.5 Reviewers' performance
3.4.6 Properties of disclosure/concealment
3.4.6.1 Level of enforcement of policy
3.4.6.2 Temporality of concealment/disclosure
3.5 Discussion and recommendations for design
3.5.1 Technology and anonymity
3.5.1.1 How computer support affects anonymity
3.5.1.2 How computer support affects transparency
3.5.1.3 Support for flexible policies and negotiation
3.5.2 Implication for communities and future research
3.5.2.1 Untested peer-review processes and parameters
3.5.2.2 Connection between research and the practice of peer review
3.5.2.3 Future research should specify properties of policies
3.5.3 Limitations of the research
3.6 Conclusion
4 What Motivates People to Review Papers? The Case for the Human-Computer Interaction Community
4.1 Related work
4.2 Methods
4.2.1 Materials (questionnaire design)
4.3 Questionnaire design considerations
4.3.1 Participants
4.3.2 Data analysis
4.4 Results
4.4.1 Relative importance of general motivations for participating in the peer-review process
4.4.2 Relative importance of motivations for accepting a specific reviewing request
4.4.3 Factors underlying motivational variables
4.4.4 Effect of experience and demographics on motivations
4.5 Discussion and implications for peer review
4.5.1 Different reviewers have different motivations
4.5.2 Implications for design of peer-review processes and systems that support them
4.5.3 Limitations of our study
4.6 Conclusion
5 Use of Politeness Strategies in Signed Open Peer Review
5.1 Related work
5.1.1 Politeness theory
5.1.2 Analysis of politeness in peer review and other scholarly situations
5.1.3 Studies of anonymity in peer review
5.2 Method
5.2.1 Data collection
5.2.2 Profile of the alt.chi 2007 Reviewers
5.2.3 Data analysis
5.3 Findings
5.3.1 Criticisms with no redressive action
5.3.2 Criticisms mitigated through negative politeness
5.3.3 Criticisms mitigated through positive politeness
5.3.4 Off-record strategy
5.3.5 Combining strategies
5.3.6 No evidence of using an avoidance strategy
5.3.7 Open signed processes create new needs for and offer new ways of using politeness strategies
5.4 Discussion
5.4.1 Potential implications for the design of peer-review processes and support systems
5.4.2 Limitations
5.5 Conclusion
6 Opinion Measurement Interfaces
6.1 Basic concepts and terminology
6.2 Opinion measurement for feedback giving
6.2.1 Feedback anonymity and credibility
6.2.2 Individual differences in coping with feedback
6.3 Designing opinion measurement interfaces
6.4 Empirical studies of opinion measurement interfaces used in peer review
6.4.1 Formative Study
6.4.1.1 Part 1: Interview about experience and motivation for rating
6.4.1.2 Part 2: Rating exercises
6.4.1.3 Summary of findings of the formative study
6.4.2 Analysis of the Design Space
6.4.3 Design process
6.4.3.1 Low-fidelity HTML and paper prototypes
6.4.3.2 Medium-fidelity HTML prototypes
6.4.4 Experimental evaluation of the final prototypes
6.4.4.1 Participants
6.4.4.2 Methodology
6.4.4.2.1 Task and Procedure
6.4.4.2.2 Measures
6.4.4.3 Results
6.4.4.3.1 Quantitative analyses
6.4.4.3.2 Qualitative analysis
6.4.4.4 Discussion and future work
6.5 Summary and conclusions of this chapter
7 Conclusion
7.1 Summary of design implications and the potential future of peer review
7.1.1 Getting the design right
7.1.1.1 An imaginary future scenario in the year 2020
7.1.2 Getting the right design
7.2 Final summation
Bibliography
Appendices
Appendix A Interview script for study of procedural fairness of peer review
Appendix B List of publication venues, briefly reviewed for analysis of anonymity policies
Appendix C Factors underlying motivations for reviewing
Appendix D Interview script for informal formative study of opinion measurement interfaces

List of Tables

Table 2.1 Role structures in peer review
Table 2.2 Overview of participants
Table 2.3 Concerns about the fairness of the peer-review process (1st column) and summary of the discussed sub-themes related to each concern (2nd column)
Table 2.4 Root causes and a summary of their subthemes
Table 2.5 Recommendations for design of peer-review processes and systems that support them, and the concerns and root causes addressed by them
Table 3.1 Publication venues selected for thematic analysis of anonymity policies
Table 3.2 Summary of practices around concealment of identities. In several cases the venues did not state some aspects of their practices; in these cases we assumed that they use the most common practice. Referee refers to the primary decision makers (associate editors in most journals, and program committee members in most conferences)
Table 3.3 Examples of tabular views representing three peer-review processes that differ based on anonymity policies about the visibility of reviews, authors, reviewers, and referees as described in Table 3.2
Table 4.1 Profile of study participants for the six background variables
Table 4.2 General motivations for reviewing papers
Table 4.3 Motivations for accepting a specific review request
Table 4.4 Pairwise comparisons of general motivations for reviewing, using Tukey's HSD test. Motivations followed by the same letter are not significantly different from each other (Piepho, 2004)
Table 4.5 Pairwise comparisons of motivations for accepting a review request, using Tukey's HSD test. Motivations followed by the same letter are not significantly different
Table 5.1 Summary of concepts in Brown and Levinson's theory of politeness
Table 5.2 Profile of the 65 reviewers, the 38 senior authors, and the 40 contact authors
Table 5.3 Relation between review ratings and use of politeness strategies
Table 6.1 A framework for understanding the design space of opinion measurement interfaces: n is the number of levels in the rating system, k is the number of items rated or ranked. Two dimensions provide a preliminary high-level taxonomy

List of Figures

Figure 3.1 Interfaces for eliciting reviewer performance: 1. ScholarOne, 2. EditorialManager
Figure 3.2 Interfaces for managing anonymity: 1. Editorial Manager, 2. Precision Conference System
Figure 4.1 General motivations for reviewing, sorted by frequency of being extremely or very important
Figure 4.2 Participants' top three general motivations for reviewing, sorted by frequency of being participants' top motivation
Figure 4.3 Motivations for accepting a review request, sorted by frequency of being reported to increase motivation
Figure 4.4 Effect of Reviewing experience on reviewing motivations
Figure 4.5 Effect of Level of involvement in the review process on reviewing motivations
Figure 4.6 Effect of Last earned degree on Learning through reflection
Figure 4.7 Effect of Gender on reviewing motivations
Figure 4.8 Effect of Position on reviewing motivations
Figure 5.1 Relation between use of politeness super-strategies and the ratings assigned to papers by the reviewers
Figure 5.2 Relation between reviewers' years of publishing experience and their not using redressive actions. Each point represents a reviewer. Only reviewers with more than five criticisms are depicted (jitter is added to reduce overlap)
Figure 5.3 Relation between categories of criticized issues and politeness strategies
Figure 5.4 Relation between use of hedges and self-reported levels of expertise in the reviews
Figure 5.5 Relation between years of publishing experience of reviewers and authors. Each point represents a review (jitter is added to reduce overlap)
Figure 6.1 Rating interfaces in common use today
Figure 6.2 Answers to the Likert scale questions on motivation, binned into three categories: Agree, Disagree and Neutral (N=7)
Figure 6.3 Low-fidelity HTML prototypes: The user is rating the movie Inception. In part (c) the "Dark Knight" thumbnail is displayed when the user hovers over the fourth star
Figure 6.4 Low-fidelity paper prototypes
Figure 6.5 Medium-fidelity HTML prototypes (all thumbnails were identical in size in the prototypes; they are scaled here only)
Figure 6.6 Rating summaries presented to the participants to evaluate the accuracy of the List interface (left) and Stars/Stars+History interfaces (right). The summary for the Binary interface was similar to the one for the List interface, but without the labels
Figure 6.7 Average ranking of interfaces based on perceived and actual speeds (N=16). Note that the y-axes differ. Shorter bars represent faster speeds. Error bars represent Std. Err.
Figure 6.8 Average ranking of interfaces based on preference, fun, suitability for organization, accuracy, speed, and mental demand (N=16). Shorter bars indicate better ranks

Glossary

Peer-review process: The process of reading, commenting on, and often making recommendations on the suitability of a manuscript or paper for publication in a journal, for presentation at a conference, or for inclusion in the proceedings of a conference.

Post-publication peer review: A peer-review process that occurs after publication of a manuscript or paper, possibly (but not necessarily) after a pre-publication review.

Procedural fairness: The study of subjective evaluations of the fairness of processes and policies that are used in determining decision outcomes, as opposed to the fairness of the outcomes themselves (Levine, 2012; Lind & Tyler, 1988).

Signed peer review: A peer-review process in which the reviewer's identity is not masked from the author of the paper.

Single-blind peer-review process: A peer-review process in which reviewers' identities are masked from the authors.

Double-blind peer-review process: A peer-review process in which both authors' and reviewers' identities are masked from each other.

Design Space: A theoretical construct that attempts to describe and segment all possible designs for a particular problem (or set of problems), given a number of constraints. In this dissertation we use this term to refer to both qualitative segmentation and parametrization of possible designs.
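The signed, single-blind, and double-blind configurations defined above differ only in which identities are visible to which role, so they can be modeled mechanically. The following Python sketch is our own illustration of how a peer-review support system might encode this small taxonomy; the names and structure are hypothetical, not drawn from the dissertation or from any real system:

```python
# Illustrative sketch (ours, not from the dissertation): the glossary's review
# processes differ only in which identities are concealed from whom.
from dataclasses import dataclass

@dataclass(frozen=True)
class AnonymityPolicy:
    authors_see_reviewers: bool  # False: reviewers' identities are masked from authors
    reviewers_see_authors: bool  # False: authors' identities are masked from reviewers

SIGNED = AnonymityPolicy(authors_see_reviewers=True, reviewers_see_authors=True)
SINGLE_BLIND = AnonymityPolicy(authors_see_reviewers=False, reviewers_see_authors=True)
DOUBLE_BLIND = AnonymityPolicy(authors_see_reviewers=False, reviewers_see_authors=False)

def describe(policy: AnonymityPolicy) -> str:
    """Map a visibility configuration back to the glossary's terms."""
    if policy.authors_see_reviewers and policy.reviewers_see_authors:
        return "signed (open) peer review"
    if policy.reviewers_see_authors:
        return "single-blind peer review"
    if not policy.authors_see_reviewers:
        return "double-blind peer review"
    return "author-masked only (uncommon)"

assert describe(SINGLE_BLIND) == "single-blind peer review"
```

Richer policies, such as the temporality and enforcement properties examined in Chapter 3, would add further fields to such a record rather than new process types.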
Acknowledgements

Many people have contributed to this dissertation:

- Dr. Kellogg Booth, my supervisor, supported me, mentored me, and advised me on research matters and other aspects of life.
- Dr. Caroline Haythornthwaite, Dr. David Kirkpatrick, and Dr. Michiel van de Panne, my supervisory committee, provided support and feedback over the course of this research.
- Dr. David McDonald, Dr. Bill Aiello, and Dr. Iain Taylor, my examining committee, asked thought-provoking questions and provided great feedback on my dissertation.
- Dr. Joanna McGrenere supervised the study of opinion measurement reported in Chapter 6. In addition, Joanna, Dr. Tamara Munzner, and Dr. Karon MacLean offered advice over the course of this research as part of the regular MUX meetings.
- Louise Oram and Vasanth Kumar Rajendran contributed significantly to all aspects of the study of opinion measurement interfaces in Chapter 6.
- Dr. Gary Hsieh helped with the study of reviewing motivations (Chapter 4), and along with Dr. Dan Cosley helped me navigate the literature on feedback provision (Chapter 6). Gary, along with Dr. Julie Kientz, Dr. Sean Munson, and all the students in their lab in the Department of Human Centered Design & Engineering at the University of Washington, made me feel welcome there as a visiting researcher.
- Dr. Dan Cosley has provided help, support, and advice throughout my research career since 2007.
- Dr. Jonathan Grudin provided welcome advice over the course of this research. Dr. Tom Erickson provided advice on the study of reviewing motivations. Along with Dr. Hrvoje Benko, Dr. Celine Latulipe, Dr. Tom Moran, Dr. Scooter Morris, Dr. Dan Olsen, Dr. Volker Wulf, and Dr. Shumin Zhai, they helped arrange for access to a number of the people who participated either in the studies reported in this dissertation or in the informal studies that informed this research.
- My friends at UBC and elsewhere supported me throughout the years, and made these years more fun and more rewarding.
- My parents and my brother have been supporting me in every way possible over the last 30 years of my life.
- Mona, my wife, has been supporting me in every way imaginable, and helping me with writing, research methods, presentation, and other aspects of research as well as all other aspects of life.
- The Natural Sciences and Engineering Research Council of Canada (NSERC) and the Graphics, Animation and New Media Network of Centres of Excellence (GRAND NCE) funded my research.

Thank you all!

1 Introduction

As part of the process of research and knowledge creation, any claim of new knowledge should be scrutinized to ensure correctness, originality, and quality. This is done through peer review, which is inherently a social process that has both technical and interpersonal components. Peer review is also used in other contexts, such as the assessment of grant proposals for allocation of research funds and the assessment of employees in a workplace for deciding promotions and salary bonuses. While we do not discuss these forms of peer review, some of the findings of this research may contribute to the body of knowledge on these other forms of peer review.
The peer-review process that is the subject of study in this dissertation is the activity of reading, commenting on, and often making recommendations on the suitability of articles for publication in a journal or for presentation at a conference. This process requires researchers to critique each other's work through questioning and analyzing the goals, methods, and claims that are put forth. In a review of the research and opinion literature on peer review, Jefferson, Wager, & Davidoff (2002) characterized the goal of peer review as "(1) selecting submissions for publication (by a particular journal) and rejecting those with irrelevant, trivial, weak, misleading, or potentially harmful content, and (2) improving the clarity, transparency, accuracy, and utility of the selected submissions." Although publishing research articles is primarily aimed at scholarly communication (Lynch, 2006), it is often viewed as a means to several ends such as obtaining research funding, increasing personal prestige, and promoting career advancement (Swan & Brown, 2005).

The scholarly peer-review process has been one of the cornerstones of science for centuries (Biagioli, 2002), but it has also been the subject of criticism for decades (Armstrong, 1997; Benos et al., 2007; Daniel, Mittag, & Bornmann, 2007). Studying the process of peer review is especially important because we get what we measure: peer review affects what research questions are addressed, and how research is conducted and reported. "The worst kind of clock is a clock that's wrong, randomly fast or slow," said Godin (2012), arguing that if there is no clock we try to find out the right time through other means, but with a wrong clock we are tempted to simply accept what it says. Current peer review seems at times to be a wrong clock: it is randomly wrong, yet we keep relying on it. Richard Horton, editor-in-chief of The Lancet, one of the most prestigious medical journals, has pointed out some of the potential deficiencies of the peer-review process: "We know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong" (Horton, 2000, p. 200). The goal of the research reported in this dissertation is to make sense of some of the most pressing concerns about peer review and to come up with design solutions that might be incorporated into current or future computer systems that facilitate peer review.

The peer-review process has been increasingly supported by computers over the last few decades, starting with the initial use of email by the early 1990s to distribute requests for reviews and to return completed reviews, and extending to the widespread deployment, over the past two decades, of dedicated peer-review support systems and conference management systems that handle the entire process. These systems support a large spectrum of activities including finding reviewers, committee discussion, decision making, and final arrangements for including papers in conference proceedings or journals. Advancements in the design of online collaborative systems have created an unprecedented opportunity for devising and supporting novel peer-review processes that could potentially address various problems discussed in the literature, from slow turn-around times and reviewer overload (Ware & Monkman, 2008) to lack of fairness and perceived thwarting of innovation (Armstrong, 1997; Peters & Ceci, 1982).
The literature on designing computer-supported collaborative systems shows that computer support can facilitate managing complex collaborations, remedy scalability problems, and enable processes that have not previously been possible. While several studies have investigated different types of peer-review process, there has been little research on problems with peer review from a design perspective. This dissertation takes a holistic approach to understanding concerns and attitudes toward the peer-review process in order to devise computer-supported interactions, mechanisms, and processes for improving it.

We start with a brief look at the history of the peer-review process to understand how we arrived at the current way of conducting it, followed by a statement of the problem addressed in this dissertation and our research approach. We end this chapter with a brief description and chapter-by-chapter outline of this dissertation.

1.1 A brief history of peer review and scholarly publishing

Early scientists shared their thoughts and findings with a few others through verbal interactions in meetings (Zwart, 2005), and later through written correspondence. Scientific meetings and occasional larger gatherings were the primary methods of scientific communication. Large-scale scientific gatherings have a very long history; in 550 AD, for example, a scientific symposium was held by physicians of Jundishapur and the debates were recorded (Wiesehofer, 2001). However, such large gatherings were somewhat rare due to the difficulty and cost of transportation.

Formal and informal gatherings of scientists, and correspondence between them, continued to be the primary means of scientific communication and occasionally led to the formation of formal and informal academies, scientific clubs, and societies. Increased availability and ease of written correspondence enabled international scientific communication, which occasionally led to the formation of international academies such as the Académie Parisienne formed by Marin Mersenne in 1635 ("Marin Mersenne (French mathematician) -- Britannica Online Encyclopedia," n.d.).

With the availability of cost-effective printing, publishing became a prominent method of communication. Because printed writings could now reach and influence ordinary citizens, a need was felt to regulate what was distributed to the public. Several scientists, such as Galileo, Servetus, and Vesalius, were tried in religious courts for their writings (Spier, 2002). European academies of science such as the Accademia Secretorum Naturae and the Accademia dei Lincei published scientific books and pamphlets in the late 16th century; however, publication of scientific discoveries suffered from censorship (Biagioli, 2002; Savage, 2010). Two famous examples of such censorship are the bans on reprinting Galileo's and Descartes's works.

On January 5, 1665, Denis de Sallo, a participant in Colbert's clique, which was itself a descendant of Mersenne's academy, published the first issue of the first scientific journal, the Journal des Sçavans (later renamed the Journal des Savants). The first issue included results of scientific experiments, explanations of inventions, and also non-scientific content such as announcements of events and legal reports. However, the printing of the journal was banned after three months due to its clash with the Catholic Church. It then restarted under a different editor in 1666.
From then until 1688, the Académie des Sciences (founded by Colbert in 1666) published only collective works of the academicians, in which reviewing and authorship were hard to distinguish. From 1688, academicians were allowed to publish individual work without mentioning their membership in the Académie. In 1699 the Académie des Sciences was given a legal standing and it was decided that academicians could mention their membership "only after a complete reading in the meetings, or at least only after an examination is made by those the Academy has designated to prepare a report." This was a significant privilege because it did not require approval from the royal board of censors (Biagioli, 2002; Savage, 2010).

In England, scientific clubs and Gresham College, a school where scientists conducted research and shared their findings in public lectures, led to the formation of the Royal Society in 1660. In the meetings of the Royal Society, scientists reproduced and communicated the results of their experiments. Oldenburg, a member of the society, published the first issue of the Philosophical Transactions, the first purely scientific journal, on March 6, 1665. Oldenburg stated that the journal was his business and not that of the Society. Ultimately, in 1752, the Society claimed responsibility for the journal (Biagioli, 2002).

The Philosophical Transactions used peer-review procedures, but unlike the Académie it frequently accepted the work of non-members. For example, a manuscript on the Anatomy of Plants was read by Oldenburg and then passed on to another member, who gave a positive review to the Royal Society urging them to read it (Biagioli, 2002). The phrase "communicated by" appears to have its roots in the process used in that era, when one of the members of the society communicated the findings of non-members after perusal.

Desire for faster dissemination of findings and a need for establishing priority were among the contributing factors to the prevalence of journals. Before journals, scientists used various methods, such as anagrams, to hide their results until their publication in the form of a book. Journals enabled faster dissemination of knowledge in smaller pieces in comparison with books, and wider dissemination in comparison with discussion in meetings. Journals needed an editor: someone credible who could accept the responsibility of putting together articles suitable for the journal based on various criteria, including relevance to science and respect for the values of the society or institute enabling the publication of the journal. In the case of the Philosophical Transactions, the credibility of Oldenburg derived primarily from the credibility of the Royal Society, in which he served as secretary. Since then, journals, instead of books, have become the primary method of disseminating new scientific knowledge.

Over the last few centuries, the decrease in cost and difficulty of transportation has increased the number of international scientific congresses from one or two meetings a year in the 1850s to about 30 a year by the 1900s (Soderqvist & Silverstein, 1994). More recently, conferences have become a primary method of scientific communication, in addition to journals, in various disciplines (Glänzel, Schlemmer, Schubert, & Thijs, 2006), particularly in computer science (Patterson, Snyder, & Ullman, 1999). As discussed above, review of articles by other researchers prior to wide dissemination has been around since the first days of journals; however, the nature of the review process has evolved significantly.
Up to the late 19th century, many journals followed newspapers' editor-centered model, in which editors were responsible for filling the available page space, and sometimes had difficulty doing so (Burnham, 1990). The adoption of a peer-review process, in which editors seek the opinion of experts to decide on publication of articles, took almost a century to evolve. Two primary drivers of the adoption of the peer-review process were the increasing specialization of science and the increasing volume of scientific research (Burnham, 1990). The two trends caused editors in various research areas to seek the opinions of outside experts in order to be able to handle the higher volume of submissions. However, even by the mid-20th century the use of external reviewers was not standard (Burnham, 1990); for example, Albert Einstein criticized John Tate, editor of the Physical Review, for sending out his article to a reviewer instead of publishing it immediately (Kennefick, 2005).

It is worth noting that the peer-review process has been evolving within a larger, complex, and evolving ecosystem. The peer-review process is a key element of the scholarly publishing process. Consequently, many of the players in the domain of scholarly publishing can be considered stakeholders in the peer-review process whose desires and needs affect its evolution. Researchers conduct, evaluate, and consume research, but they are not the only research consumers. Research findings can benefit industry, government, and the general public, so these stakeholders sometimes provide financial support to relevant research. Academic publishers support dissemination of research by providing platforms for peer review and publishing. Professional associations and societies also participate in the dissemination of research, as part of their efforts toward supporting and nurturing areas of research. Libraries often mediate the relationship between readers and publications. Lastly, universities, research institutes, industrial labs, and research funders use outcomes of the peer-review process to assess researchers for various purposes such as funding allocation, hiring, and promotion (Bornmann, 2011; Chubin & Hackett, 1990). Perhaps the complexity of the web of relationships among these stakeholders has contributed to the relatively slow evolution of the peer-review process over the last few centuries.

Since the mid-20th century, the peer-review process has been the subject of empirical studies and has experienced modifications informed by those studies. Chapter 3 presents a detailed description of the current forms of the peer-review process that have been adopted by various research communities. In the rest of this chapter we focus on the goals of this dissertation and our research approach for achieving those goals.

1.2 Problem statement and research approach

Computer-supported peer-review systems have become increasingly popular over the last few decades. Dedicated peer-review support systems such as ScholarOne Manuscripts and Editorial Manager, and conference management systems such as Precision Conference Solutions, ConfMaster, HotCRP, EasyChair, Conference Management Toolkit, and CyberChair, were developed to support the entire review process. However, computer support for peer review has been mostly limited to facilitating traditional reviewing processes and remedying scalability issues.
We believe that HCI studies are needed to make sense of how reviewers and authors interact using existing peer-review systems, and to understand authors' and reviewers' needs, in order to inform the design of future peer-review processes and the systems that will support them.

The main problem addressed in this dissertation is how computer support for peer review can be designed to improve the peer-review process. More specifically, how can we design computer-supported peer-review systems that enable or facilitate processes that are fairer, more accurate, more efficient, and more rewarding for both authors and reviewers? To address this problem we conducted a series of studies on the concerns, attitudes, and behavior of researchers participating in the peer-review process. In addition, we conducted an exploration of the design space of opinion measurement interfaces, a key component of reviewing systems. A distinct feature of our research is that we adopted a human-computer interaction design perspective to analyze our observations, with the purpose of unveiling the potentials of computer-supported peer review by envisioning the functions, features, processes, and policies that are made possible or facilitated by using computer systems. This dissertation contributes primarily to the fields of Computer-Supported Cooperative Work (CSCW) and Human-Computer Interaction (HCI). Our understanding of technology and computer science supported our analysis of possibilities for developing support for human needs; hence some of this research may not directly contribute to basic computer science, and some of the research methods used in this research might be unfamiliar to a computer science audience. Each chapter briefly explains the methods used and provides references for further reading.

1.3 Summary of research contributions

We provide here a high-level description of the five primary contributions of the research presented in this dissertation, and revisit and comment further on them in the concluding Chapter 7.

1. Examining procedural fairness of the peer-review process: We interviewed researchers across disciplines about their experiences with and concerns about fairness of the peer-review process. Some of the key concerns identified in the study are reviewers' lack of expertise, insufficient oversight of reviewers, bias for some methods and topics, bias against some authors, lack of standard appeal processes, inconsistency of judgments, extremely high bars for acceptance, and arbitrary criteria for rejection.

2. Understanding anonymity policies in peer review: We analyzed anonymity policies and guidelines for authors and reviewers provided by publication venues. Based on that analysis, we developed a taxonomy of peer-review processes based on their anonymity policies.

3. Understanding reviewing motivations: We conducted a survey of reviewers of submissions to the International Conference on Human Factors in Computing Systems (CHI 2011) to gain a better understanding of their motivations for reviewing. We found that encouraging high-quality research, giving back to the research community, and finding out about new research were the top general motivations for reviewing. We further found that relevance of the submission to a reviewer's research and relevance to the reviewer's expertise were the strongest motivations for accepting a request to review, closely followed by a number of social factors. Gender and reviewing experience significantly affected some reviewing motivations, such as the desire for learning and preparing for higher reviewing roles.

4. Examining use of politeness strategies in open peer review: We analyzed how politeness strategies were employed by reviewers to mitigate their criticisms in an open peer-review process of a special track of the CHI 2007 conference. We found evidence of frequent use of politeness strategies, and that open peer-review processes hold unique challenges and opportunities for using politeness strategies. We found that less experienced researchers tended to express unmitigated criticism more often than did experienced researchers, and that reviewers tended to use more positive politeness strategies (e.g., compliments) toward less experienced authors.

5. Exploring the design space of opinion measurement interfaces: We provide a preliminary high-level taxonomy for the design space of computer interfaces to measure subjective opinions. We iteratively designed and evaluated several interface alternatives that incorporated different approaches to judging within the design space. Finally, we ran what is, to our knowledge, the first controlled experiment that systematically investigates computer interfaces that measure subjective opinions. We found a significant preference for receiving a history of previous judgments when reviewing a new item, without having to directly compare items.
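To make the fifth contribution concrete: the preferred "history" interfaces let a judge see their previous ratings while rating a new item. The following minimal Python sketch is our own illustration of that idea; the function and its parameters are hypothetical and do not reproduce the dissertation's prototype code:

```python
# A minimal sketch (ours, not the dissertation's prototypes) of history-augmented
# rating: previously judged items are shown next to each scale level as anchors,
# supporting recall without forcing direct item-to-item comparison.
from collections import defaultdict

def elicit_rating(item: str, history: dict, n: int = 5) -> int:
    """Ask for a 1..n rating of `item`, displaying prior ratings as anchors."""
    for level in range(n, 0, -1):
        anchors = ", ".join(history[level]) or "-"
        print(f"{level}: {anchors}")
    while True:
        try:
            rating = int(input(f"Rate '{item}' (1-{n}): "))
        except ValueError:
            continue  # ignore non-numeric input
        if 1 <= rating <= n:
            history[rating].append(item)  # the item becomes a future anchor
            return rating

if __name__ == "__main__":
    history = defaultdict(list)  # rating level -> items previously given that rating
    for paper in ["Paper A", "Paper B", "Paper C"]:
        elicit_rating(paper, history)
```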
1.4 Outline of the dissertation

Each of Chapters 2-6 describes a study that examines one aspect of the peer-review process. An investigation of the related work most relevant to each chapter is included in that chapter. In addition, each chapter discusses the implications of its findings for the design of future peer-review processes and systems to support them.

In Chapter 2, we present our interview study that examined procedural fairness of the peer-review process. This focuses on the question "How do current peer-review practices undermine or enhance perceived fairness of the process?" While each chapter is motivated by the literature or our preliminary studies, Chapter 2 offers further justification for the rest of the studies presented in this dissertation by describing researchers' concerns about peer review and illustrating each of the concerns with a rich set of comments on the experiences described by our participants. As a result, Chapter 2 is the longest chapter of the dissertation and it serves as additional background for the chapters that follow it.

In Chapter 3 we describe a study of anonymity policies in use by various publication venues, and we provide a taxonomy of peer-review processes based on their anonymity policies. The main questions addressed by this chapter are "What is the range of anonymity policies in peer review?" and "How can the design of peer-review support systems better support these policies?"

In Chapter 4 we describe a questionnaire study of CHI 2011 reviewers to address the question "What motivates people to review papers?" More specifically, we looked at two questions: "Why do researchers value reviewing as part of their scientific activities?" and "What motivates them to accept or decline a request to conduct a review?"
We examined the question "How do reviewers interact with and criticize each other in an online open peer-review system?" More specifically, we tried to gain a better understanding of both "What are the politeness strategies that they use?" and "Do junior researchers avoid criticizing senior authors?"

In Chapter 6, we describe the role of opinion measurement interfaces in peer review, and we present prototypes and user studies that explore the design space of opinion measurement interfaces to address the two questions "What are important dimensions of the design space of opinion measurement interfaces?" and "How can we design better opinion measurement interfaces?" Opinion measurement is used in many contexts. The discussion in Chapter 6 applies equally well to many of those contexts; it is not specific to peer review, but is included in the dissertation because of the prominence of opinion measurement in the peer-review process.

In Chapter 7, we conclude the dissertation by providing a summary of the research contributions reported in the earlier chapters, an integrated overview of the implications for the design of future peer-review support systems drawn from those chapters, and a discussion of possible futures for peer review based on the evidence we developed during the research.

1.5 How to read this dissertation

Each of the five central chapters is written so that it can be read on its own. They are largely self-contained. The chapters have been ordered within the dissertation with the expectation that they will collectively present a deepening understanding of the peer-review process and the opportunities that exist to use collaboration technology to improve one or more aspects of the peer-review process. Chapter 2 serves as a good introduction to the problems that are examined in each of the four chapters that follow it, so it is advisable to read that chapter as a prelude to any of the others, even if a reader is interested only in the aspect of peer review discussed in a single one of those chapters. To some extent, the other four chapters can be read in any order without much loss of continuity.

Each chapter, based on our findings, offers recommendations for designing future peer-review processes and systems that support them. These recommendations could serve as potential avenues for future research, as they are conjectures about how to improve the peer-review process. Further studies are needed to evaluate the effectiveness of most of these recommendations.

The Preface provides additional information about any prior publication of parts of the dissertation and indicates the degree to which the previously published material in the chapters has been adapted to present a uniform treatment of the topics within the dissertation. It also indicates the relative contributions of all collaborators and co-authors of publications.

Chapter 2: Examining Procedural Fairness of Peer Review

One of the primary goals of the peer-review process, like other decision-making processes, is to be fair. While the fairness of scholarly peer review is widely criticized, there has been little systematic analysis of what it means for peer review to be fair, or of how current peer-review practices inhibit or contribute to its fairness. This chapter explores the concept of procedural fairness as it applies to peer review and presents the findings from an interview study that was conducted to gain further insight into how well procedural fairness serves as a framework for understanding fairness issues in scholarly peer review.
Fairness of the peer-review process can be studied through outcome-based or process-based lenses. Prior research has shown that people are concerned with the fairness of a decision-making procedure as much as with its outcomes (Tom R. Tyler, 1987). Deutsch (1975) argues that for analyzing the outcomes of any decision process, various measures may be selected as the basis for a system of fairness. This selection depends on the type of social relations and the goal of the decision-making process. Some of the bases that have been considered for fairness include need (i.e., distribution according to needs), equality (i.e., everyone receives equal shares), and equity (i.e., equal outcome-to-input ratios). Need-based allocation of resources can be selected when the focus of a system is on taking care of its members, whereas an equality-based process may be chosen in informal cooperative systems. Equity is often chosen in formal and task-oriented systems, where the goal is to foster healthy competition (Deutsch, 1975).

Peer review is intended to implement an equity-based system of fairness in which the input, a submission, is evaluated according to a set of criteria that often includes originality, correctness, and quality of the submission. The outcome is quality control, operationalized as approval by the publication venue and publication of the submission (or not). The type of publication (e.g., method of presentation) and the prestige of the publication venue can affect how much attention the submission will receive. For example, a paper published in a top-tier journal is likely to receive a larger readership. Similarly, an oral presentation at a conference is likely to reach a larger audience than a poster presentation at the same conference. In other words, the peer-review process attempts to give credit and attract attention to qualified submissions.

According to Equity Theory, proposed by Adams (1963) and summarized by van den Bos (2001), "people judge an outcome as fair when their own outcome-to-input ratio equals some comparative or referent outcome-to-input ratio". However, information about others' outcome-to-input ratios is often unavailable or incomplete (e.g., because of an inability to accurately assess others' contributions). In particular, evaluating research is extremely difficult, even when an expert researcher is fully familiar with the work under scrutiny (Langfeldt, 2006; Lewis et al., 2011; Nielsen, 2011). Hence people rely on the fairness of the decision-making procedure as a heuristic substitute for assessing outcome-to-input ratios (K. van den Bos, 2001).
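To make the comparison concrete, Equity Theory's fairness condition can be written as a simple ratio test (the notation here is ours, added for illustration; it is not Adams's own formalization):

\[ \frac{O_{\mathit{self}}}{I_{\mathit{self}}} = \frac{O_{\mathit{referent}}}{I_{\mathit{referent}}} \]

where \(O\) denotes outcomes (e.g., acceptance, credit, attention) and \(I\) denotes inputs (e.g., the quality of and effort embodied in a submission). An author who sees a comparable submission (similar \(I\)) receive a much better outcome (larger \(O\)) perceives the inequality of the two ratios as unfairness; the difficulty of observing others' \(O\) and \(I\) is what pushes people toward procedural heuristics instead.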
Moreover, the pervasiveness of egocentric biases in assessing fairness, for example in the form of overestimating the quality, importance, or appropriateness of one's own work (Messick, Bloom, Boldizar, & Samuelson, 1985; M. Ross & Sicoly, 1979; Thompson & Loewenstein, 1992), has caused researchers who study fairness to shift their focus from outcome-based to process-based explanations of people's reactions to decisions (Tom R. Tyler & Blader, 2003). Procedural fairness is broadly defined as the study of subjective evaluations of the fairness of the processes and policies that are used in determining decision outcomes, as opposed to the fairness of the outcomes themselves (Levine, 2012; Lind & Tyler, 1988).

According to studies of procedural fairness, if people feel that a process is fair (without knowing its outcome), they are more likely to perceive outcomes of the process as fair (T. R. Tyler, 1990; Kees van den Bos, Vermunt, & Wilke, 1997), to feel respected by the community and the decision makers (Lind & Tyler, 1988), and to behave more cooperatively and feel responsibility toward the community (De Cremer & Blader, 2006; Moorman, 1991; Tom R. Tyler & Blader, 2000). In addition, the legitimacy of institutions and organizations is dependent on their adherence to procedural fairness norms (Tom R. Tyler, 2006).

What affects procedural fairness is still very much a subject of study (Blader & Tyler, 2003). The earliest studies of procedural fairness highlighted the importance of maximizing control over outcomes by the parties involved (Folger, 1977; Thibaut & Walker, 1975). One of the influential frameworks for procedural fairness was proposed by Leventhal (1980) and has been partially validated through subsequent empirical studies (Colquitt, 2001; Lind & Tyler, 1988). Leventhal suggests that individuals assess six rules of fairness (consistency, bias-suppression, accuracy, correctability, representativeness, and ethicality) for each of the following seven components of a procedure:

1. Selection of agents: procedures for selecting decision makers,
2. Setting ground rules: procedures for communicating evaluation procedures and criteria,
3. Gathering information: procedures for collecting the information that will inform the judgment,
4. Decision structure: procedures for decision making and resource allocation based on the evaluations,
5. Appeals: procedures for appealing decisions,
6. Safeguards: procedures for preventing abuse of power, and
7. Change mechanisms: procedures for revising any of the procedures when needed.

Several studies of procedural fairness have used this framework, interpreting it for the context under investigation (Gilliland, 1993; Tom R. Tyler, 1988).
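Because this dissertation is concerned with systems that support peer review, it is worth noting that Leventhal's framework maps naturally onto a machine-readable audit structure. The following sketch is purely illustrative and ours, not Leventhal's; the variable names and the recorded concerns are hypothetical placeholders. It shows one way a peer-review support system could represent the six-rules-by-seven-components assessment space:

```python
# A minimal sketch of Leventhal's framework as an audit checklist:
# each procedural component is assessed against each fairness rule.

FAIRNESS_RULES = [
    "consistency", "bias-suppression", "accuracy",
    "correctability", "representativeness", "ethicality",
]

PROCEDURE_COMPONENTS = [
    "selection of agents",    # choosing decision makers (reviewers, editors)
    "setting ground rules",   # communicating procedures and criteria
    "gathering information",  # collecting information that informs the judgment
    "decision structure",     # making decisions and allocating resources
    "appeals",                # appealing decisions
    "safeguards",             # preventing abuse of power
    "change mechanisms",      # revising procedures when needed
]

# The full assessment space is the cross product: 6 rules x 7 components = 42 checks.
checklist = {(component, rule): None  # None = not yet assessed
             for component in PROCEDURE_COMPONENTS
             for rule in FAIRNESS_RULES}

# Hypothetical example entries, foreshadowing concerns raised in Section 2.3.4.
checklist[("selection of agents", "accuracy")] = "lack of reviewer expertise"
checklist[("gathering information", "accuracy")] = "careless reading of submissions"
```

Later in this chapter, each concern raised by our participants is mapped back onto cells of exactly this kind of grid.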
After Leventhal's initial work (1980), several studies identified overlapping sets of fairness components that describe what factors affect procedural fairness in various contexts. Dolan et al. (2007) reviewed the literature on procedural fairness and suggested that it can be summarized around six dimensions: giving voice, to allow a degree of (perceived) control over decision making; neutrality, or the absence of vested interest; consistency in applying procedures across people and over time; accuracy and up-to-dateness of the information used in decision making; reversibility of decisions and the presence of an appeal process; and transparency, to allow scrutiny of the process.

In this study, these fairness frameworks were used as the basis for an interview study to explore the academic peer-review process and to probe participants' perceptions of the fairness of such peer review. The interview script is provided in Appendix A. After reviewing the related work, we describe an outline of the interview study, as well as participant recruitment and the data analysis methods. We then discuss the findings of the study and implications for peer-review systems.

2.1 Related work

This section reviews studies of concerns about the fairness of academic peer review, studies of consistency of judgments in peer review, and studies of reviewers' biases. Much of the prior research has focused on journal publication, and only a few studies have investigated conference peer review. One of our research goals is to gain a better understanding of the differences between peer review for scholarly publication in journals and in conferences. Our own study, discussed later in this chapter, takes this into account in its design. Consequently, some of the experiences reported by our participants in this chapter relate to peer review as it is practiced in computer science, where conferences are considered important publication venues that often have extensive and high-quality reviewing processes. Therefore, in the following we summarize the role structure in such peer-review processes.

Members of a conference program committee (the PC) are sometimes referred to as associate chairs (ACs) of the PC. Their role parallels that of associate editors in journals. In some conferences, the reviewers are referred to as program committee members, and associate chairs are referred to as area chairs. Most of the venues discussed in this dissertation implement a variation of these role structures, summarized in Table 2.1.

Most journals              | Many conferences                             | Many conferences
Editor or editor-in-chief  | Program chair                                | Program chair
Associate editor           | Associate chair or program committee member  | Area chair or program board member
Reviewer                   | Reviewer                                     | Program committee member

Table 2.1 Role structures in peer review

2.1.1 Studies of concerns about the fairness of academic peer review

One of the studies most closely related to ours was conducted by Gilliland and Beckstein (1996). Using a questionnaire study of the fairness of peer review, completed by 106 researchers, they found that the most important dimensions of procedural fairness for journal peer review were perceived fairness of the outcome, consistency of decisions across reviewers, consistency of process across papers, timeliness of feedback, interpersonal sensitivity, and the provision of explanations for the decisions that are made. In addition, they found that explanation and interpersonal sensitivity could predict an intention to submit to the journal in the future.

Other studies looked at problems that authors encountered in the peer-review process. Bradley (1981) found that 76% of authors reported encountering pressure to conform to the subjective preferences of reviewers, 73% encountered false criticisms, 67% encountered a lack of expertise on the part of reviewers, 43% encountered inappropriate treatment by referees, and 40% experienced careless reading by referees. Bedeian (2003) found that 62% of authors were satisfied with the consistency of editors' and reviewers' comments, 38% encountered revisions recommended by editors or reviewers that were based on an editor's or referee's personal preferences, and about 25% had to make revisions to their manuscripts that they felt were wrong. These findings suggest that many researchers have experienced episodes of (perceived) unfairness, some of which can be attributed to procedural deficiencies of the peer-review process.

2.1.2 Studies of reviewers' bias in peer review

A different group of relevant studies investigated specific biases in peer review. Two groups of biases have been discussed in the literature: bias for or against specific authors or groups of authors with specific characteristics, and bias against specific research. Several studies found that reviewers favour authors from English-speaking countries and from prestigious institutions (Blank, 1991; Peters & Ceci, 1982; J. S. Ross et al., 2006).
Chapter 3 of this dissertation discusses anonymity policies intended to overcome such bias; a more detailed review of the literature on biases related to the concealment or disclosure of the identities of reviewers or authors in peer review is provided there.

Another type of peer-review bias reported in the literature is resistance to discovery and the stifling of innovation. According to Lock (1985), a former editor of the British Medical Journal (BMJ), "peer review favors unadventurous nibbling at the margin of truth rather than quantum leaps." Other studies revealed that many highly cited or Nobel-winning works were initially rejected (Campanario, 1996; Gans & Shepherd, 1994). Other types of bias against specific research include bias in favor of positive results (Mahoney, 1977), results that match the reviewers' stance (Abramowitz, Gomes, & Abramowitz, 1975), or results that corroborate previous work (Ernst & Resch, 1994). Campanario (1998) summarizes evidence for such biases and concludes that peer review is an unreliable quality-control mechanism.

2.1.3 Studies of consistency and reliability of peer review

Several studies looked at the consistency, reliability, and accuracy of the peer-review process in correctly identifying errors and shortcomings in submissions. Mahoney (1977) found that peer review may be affected by whether the experimental results being reported are consistent or inconsistent with the reviewer's theoretical perspective. Several studies have investigated the reliability of peer reviews. Peters and Ceci (1982) resubmitted a set of 12 already-published papers for review; only three were detected as resubmissions by the reviewers, and eight of the remaining nine papers were rejected, often because of "serious methodological flaws". In another study, a secondary review of 50 previously accepted (but not yet published) papers recommended only six of them for publication "as is", while rejecting six and asking for revisions of the rest (Garfunkel, 1990). In the most recent study of this kind, the NIPS (Neural Information Processing Systems) conference, a top-tier conference in computer science, formed two independent program committees to assess 166 submissions. The study aimed for a 22.5% acceptance rate, which is typical for competitive conferences within computer science. Each committee accepted 37 or 38 papers; of these, 21 or 22 were papers that the other committee rejected (Price, 2014).
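A rough back-of-the-envelope reading of the NIPS experiment (the arithmetic here is ours, not the study's own analysis) makes the scale of the disagreement concrete. Taking the midpoints of the reported ranges,

\[ \frac{21.5}{37.5} \approx 57\% \]

of the papers accepted by one committee would have been rejected had the other committee judged them. This is far above what a highly reliable process would produce, though still below the roughly 77.5% disagreement expected if acceptances were drawn independently at random at a 22.5% acceptance rate.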
Several other studies of this kind report similar results (G. W. Brown & Baca, 1986; Gardner & Bond, 1990; Murray, 1988). Campanario (1998) presented a more comprehensive review of peer-reviewing errors in the form of publication of papers that needed further revision, or that should have been rejected outright. Campanario (1996, 2009) also provided an overview of peer-reviewing errors that rejected or resisted publication of highly cited and Nobel-class articles, and attributed them to the conservative nature of scientists and the imperfections of peer review. A meta-analysis of studies on the reliability of peer review revealed poor inter-reviewer reliability (Bornmann, Mutz, & Daniel, 2010).

Poor reliability of peer reviews has been criticized by many researchers; for example, Bornstein (1991) writes, "if one attempted to publish research involving an assessment tool whose reliability and validity data were as weak as that of the peer-review process, there is no question that studies involving this psychometrically flawed instrument would be deemed unacceptable for publication." However, this view has been challenged by others (Eckberg, 1991). Some previous studies found that differences in reviewers' opinions are caused mainly by their focus on different aspects of papers rather than by disagreement on particular points (Bakanic, McPhail, & Simon, 1989; Fiske & Fogg, 1990). However, reviewers do sometimes hold conflicting opinions, caused by their different evaluation criteria and standards, despite the existence of guidelines and standards imposed by journals (Kerr, Tolliver, & Petree, 1977; Siegelman, 1991; Robert J. Sternberg, 1985). Furthermore, it is known that irrelevant or less relevant aspects of papers, such as reading difficulty (e.g., use of complex wording) and better typesetting, may contribute to the acceptance of papers (Armstrong, 1980; Koren, 1986).

Our interview study builds on these findings from prior studies and on theories of fairness. It was designed to expand our understanding of concerns about the fairness of peer review by collecting additional qualitative data from a diverse group of researchers, covering peer review for both competitive conferences and journals, and for a variety of research areas that are primarily quantitatively focused, qualitatively focused, or interdisciplinary in nature.

2.2 Methods

We conducted semi-structured interviews with 19 researchers from a variety of disciplines and then used thematic analysis to examine concerns about the fairness of the peer-review process. Details of our data collection and data analysis methods are described in this section.

2.2.1 Data collection

Our goal was to collect data from a diverse set of researchers in order to gain a broad understanding of their concerns about the fairness of the peer-review process. We compiled a list of ten disciplines that covered a wide spectrum of the social and natural sciences, encompassing disciplines with an emphasis on qualitative and quantitative methods, as well as disciplines that publish primarily at conferences. In addition, we wanted to collect the opinions of researchers at various ranks. To achieve this, we sent invitations to participate in the study in batches. We continued interviewing new participants until saturation was reached. Participants were recruited from three universities and one industrial research lab, all in North America. We sent invitations by email to researchers at various ranks in the departments that we had chosen. Self-selection in responding to our invitations affected the mix of participants.

We do not specify the fields of our participants' research or their institutions, because some of their quotes include information that, in conjunction with the name of their institution or perhaps even the name of their specific discipline, could jeopardize their anonymity. Nineteen researchers (four female, 15 male), including five junior researchers, three mid-career researchers, and 11 senior researchers, participated in the study. Table 2.2 presents a profile of our participants.
In the rest of this chapter, participants are referred to as Area + (number). For example, a natural scientist is referred to as NS2, and a senior medical scientist is referred to as Med1. In some cases we do not fully distinguish between areas (such as the two branches of computer science). Each interview took between 25 and 76 minutes (average 44 minutes, median 42 minutes). All interviews were audio recorded and transcribed by the author, who conducted every interview.

Area                  | Discipline                        | # of participants
Computer Science (CS) | Computer Science: applied         | 2
Computer Science (CS) | Computer Science: theoretical     | 2
Engineering (Eng)     | Electrical & Computer Engineering | 1
Engineering (Eng)     | Mechanical Engineering            | 1
Natural Science (NS)  | Physics                           | 1
Natural Science (NS)  | Biology                           | 3
Natural Science (NS)  | Psychology                        | 1
Medicine (Med)        | Medicine                          | 4
Social Science (SS)   | Sociology                         | 1
Social Science (SS)   | Education                         | 2
Social Science (SS)   | Anthropology                      | 1

Table 2.2 Overview of participants

2.2.2 Data analysis

We used Braun and Clarke's thematic analysis approach (Braun & Clarke, 2006) to analyze the interviews and to organize the issues and concerns about peer review discussed by the participants. Braun and Clarke describe six phases in a thematic analysis:

Phase 1: familiarizing yourself with your data
Phase 2: generating initial codes
Phase 3: searching for themes
Phase 4: reviewing themes
Phase 5: defining and naming themes
Phase 6: producing the report

There are many variants of thematic analysis. Most involve a sequence of steps or stages along the lines of what Braun and Clarke describe. Following their six-phase process, we first transcribed the interviews and established an overview, or high-level understanding, of what had been collected. We then searched for themes using a bottom-up approach, beginning with open coding of the interviews, followed by an iterative search for themes that culminated in a set of named themes. In the next sections we describe the themes we identified in our analysis.

2.3 Results

In this section, we first briefly describe three general themes related to the perception of the concept of fairness. We then describe 20 themes that summarize our participants' perceptions of the fairness of peer review. We divide these themes into two groups, concerns and root causes, where concerns are perceived problems that can directly cause unfairness, and root causes are potential factors that might cause the concerning behaviors. This is not a clear binary distinction, but a spectrum. Nevertheless, we found it helpful in organising the themes that we identified. We end this section with another general theme that describes the overall satisfaction of our participants with peer review. The title of each theme indicates whether it is a general theme, a concern, or a root cause.

Our participants shared a number of concerns about the peer-review process; however, their perceptions differed on whether or not a specific type of concern is actually related to the fairness of peer review. The following two general themes summarize this.

2.3.1 General: Effect of perceived intentionality on perception of fairness

Reviewers' biases were not necessarily considered to be instances of unfairness, because they are not necessarily intentional.
Eng2 said "I haven't had any examples where I feel [a] reviewer is there with a knife … [an] inappropriate set of reviewers might affect quality of the process, but not fairness." Similarly, CS3 felt it is often the case that "the person wasn't competent rather [than] having an axe to grind, to deliberately exclude your paper, they just don't get the point." NS2 mentioned "there are definitely times that it seems some reviewers are hypercritical and times that some reviewers have barely paid attention to the contents or that their main comment is that [to] cite their own work more, I've seen all those, but there was only one case that I felt someone was really going out of their way to be unfair, but a lot of times where I feel like maybe [it's] due to lack of effort or lack of knowledge."

2.3.2 General: Fairness of procedure vs. fairness of outcome

Several participants felt that some of their concerns about the quality of peer review were not necessarily about its fairness. CS1 clearly stated this viewpoint: "Quality [means that] enough people, smart people, looked at it and [have] really thought about it. Still [it] could be completely unfair, could be completely biased, it's a quality decision but everyone [could] still be biased …. It was judged well, I don't know if it's fair." This comment emphasizes that the fairness of a procedure does not necessarily lead to fairness of the outcomes, as discussed in the introduction of this chapter.

2.3.3 General: Researchers' bias in assessing fairness of their own experiences

It is also important to keep in mind authors' potential bias in interpreting reviewers' comments, which was pointed out by some of our participants. NS5 said "I know the personal value of finding that stuff unfair; so I'm reluctant to say that's that." Similarly, Med2 said "I have to be very careful for my capacity for self-delusion. I may say that so they can give fairer judgment; but what I may mean is that they're more likely to be favorable."

With those notes of caution in mind, we describe 20 major themes that summarize factors that could undermine the procedural fairness of peer review, as discussed by our participants.

We start with twelve themes that we grouped as concerns about the fairness of peer review. Some of these concerns are related and some overlap. The concerns include reviewers' misunderstandings caused by a lack of expertise or careless reading; bias for or against certain authors by some reviewers; lack of opportunity to voice concerns; harsh or impolite treatment of authors; long delays in processing submissions; breach of confidence; inconsistency of decisions; subjective, ambiguous, or inappropriate evaluation criteria; lack of specific or appropriate policies and expectations; conflicting mindsets of "mentorship and support" and "survival of the fittest"; bias for specific methods and topics; and judgment based on incomplete or inaccurate representations of research.

2.3.4 Concern: Reviewers' misunderstandings caused by a lack of expertise or careless reading

Two of the common problems that our participants experienced were a lack of expertise on the part of reviewers and careless reading of papers by reviewers. While the two could be classified as distinct concerns, we grouped them as one because in many cases it was not clear to our participants which of the two had caused a reviewer's misunderstanding.
For example, Eng1 said "[It] really annoys me when I get reviews back and someone clearly doesn't know what they're talking about and they pretend that they did. It could be that they didn't read the paper properly, knew what they're talking about, didn't have enough time." Other problems, such as language barriers, can contribute to misunderstandings. For example, NS4 suggested that "There might have been a language issue and that's understandable, or focus, the person didn't focus, because the issues that were raised were actually addressed in the paper."

NS1 expressed concerns specifically about lack of knowledge on the part of reviewers: "[Some reviewers] literally do not have the expertise to make the sort of comments they are making and you can often tell these reviews because they are very fussy they deal with little tiny issues and often, from lack of knowledge, a method that is completely well known in the field, they are asking stupid questions about it."

Communities often comprise multiple subgroups with different expertise and different foci; this can make reviewer assignment challenging, because members of one subgroup might not be able to fully assess the contributions of members of other subgroups. CS2 said "[Some] papers are more mathematical … often times there are people who haven't done the math or do not really understand the implications of the math. They get the high level idea but their reviews are dilute … I think there is a divide in every community. There are people who are more formal … and people who are more qualitative … I'd rather have somebody actually closely read it." SS1 had a similar experience: "Sometimes I do feel like they didn't quite understand how qualitative research works, and what qualitative research is able to say and what it's not … they come at it from a very straightforward model that every paper has to be empirical theory testing."

Some participants mentioned that part of the problem is that those who receive review requests hand the paper to their inexperienced trainees. CS3 said "[Many reviewers] often don't review the paper; they give it to their junior graduate student or whatever, who may or may not be a good person to review it; sometimes we do not have enough experience in the field." Eng1 said "If it was [my] or someone else's MSc student, I absolutely don't want them to review my paper, they just don't have enough experience."

Our participants also mentioned having experienced the problem of lack of expertise as reviewers themselves. Med1 said "The thing that I worry … as a reviewer, is that if I feel I don't have expertise on some aspect of the paper, maybe some aspect of statistics, so I might just be tempted to just ignore that and deal with things I do know. What I try to do and I hope people would do is to say here is what I can comment on, but say to the editors, I don't have enough knowledge of that sophisticated area, the statistics for example, to really comment whether that's legitimate or not. I request that you have somebody look at that." NS2 said "I've certainly occasionally accepted reviews and then realized that I don't really know enough to fully review this thing accurately. … if it's a field that I actually don't have a good sense what already has been done in the field, there's no way that in a reasonable amount of time I can really get a comprehensive sense of what the literature is." Eng1 said "I often get [topic A] papers to review.
I don't know for certain if it's novel, because I don't know the literature completely inside and out, versus the [topic B] I know exactly the entire field; so, I might be able to do a good job on the review, but I'm not necessarily gonna be the person to ask for novelty and evaluation."

Some participants suggested that the problem of lack of expertise is, in part, a result of the difficulty of finding expert reviewers. See Section 2.3.18 for a discussion of this root cause. Mapping this concern to Leventhal's model of fairness components, lack of expertise of reviewers can be attributed to the accuracy of "Selection of agents", and careless reading of submissions can be attributed to the accuracy of "Gathering information" for evaluation.

2.3.5 Concern: Bias for or against certain authors by some reviewers

Bias for or against certain authors (or their institutions) by some reviewers was another common concern. Some communities address this by anonymizing submissions, so that the reviews of reviewers who are biased against certain authors are less likely to be affected by the bias. SS2 expressed concern for fairness if authors are known to reviewers: "How would such a system account for disparities in power, disparities in institutional capacities?" CS3 said "Before, we didn't have anonymous submissions for conferences, and I think people rewarded friends and punished enemies a bit." CS4 thought "I guess it's advantageous to be single blind, because then my name helps me get accepted. In terms of what's good for the community generally it's better to be double blind to correct for the obvious biases."

Some participants discussed how knowing authors' identities affects their reviews. For example, NS5 said "I'd be nicer if I knew that someone of higher status [is] on the paper. I certainly don't mean I wouldn't criticize, but I'm probably more careful not to bother with stuff that [is] maybe not hugely important." NS4 said "When I sometimes get the paper where I see the name of [the] author, I say 'ooh, I'm not quite sure about this guy' and I think that's a bias that should be eliminated. I work very hard to avoid that to the extent that I very often try not to look at the title page." In contrast, CS1 thought "there are lots of reasons to hate a paper, and the name's not gonna save it." NS2 said "I think most of the bias come[s] from just they are really busy and if they recognize the person and already think well of them they'd spend more time on them."

While bias against authors was a common concern, there were also concerns that anonymization to remove this bias might decrease reviewers' ability to judge the quality of papers. Some participants pointed out potential benefits of knowing the identities of authors, because that could reveal information, such as the context of the study and the expertise of the authors, that could facilitate assessing the work under review. One of the benefits was being able to trust authors. CS2 said "There is definitely [a] certain presumption on the part of reviewer that if this is a paper that is coordinated with this person, it should be of quality.
… I think you have to rely on people's reputation to some extent." NS3 said "It could save time, so when you get a paper from a very prominent group, you figure that there is a certain number of things that are probably going to be OK, they're probably not gonna be saying anything ridiculous, and they're probably gonna be aware of what's going on in the field … when you get something from some unknown group … you have to go through a whole set of other questions. Reviewing will be harder. For things like ordinary journals where they are not selective on novelty or how exciting the result is, the extra effort involved both in the mechanics of [anonymizing], but also in the reviewing might not be worth what you're gaining in fairness."

In contrast, there were concerns that too much reliance on authors' reputations might also compromise the peer-review process. Eng1 warned against trusting authors when judging a paper, which argues against disclosing the names of authors and for anonymization: "Paper should be reproducible from itself … So, I have to trust this person knowing what they are doing to trust the results? Not a chance. Everybody makes mistakes no matter how high they are. There is no telling that this senior person that's amazing doesn't have a crappy student who did the work in the first place, and put their name on it." When asked about this concern, NS1 said "Personally, I rewrite every paper that comes out of my lab … it's my academic product, if I don't take extraordinary care with that product I shouldn't put my name on it … [the reviewers] ask you whether you can do controls, and I think after [the] first 500 papers somebody should at least accept we know how to do a controlled experiment."

Another potential benefit of not anonymizing is the efficiency of writing a paper that builds on the authors' own prior work. CS2 said "There are lots of cases where it hurts the authors because you cannot reference your prior work. … You can cite the work, but cannot claim that it's your work, which is important." CS1 said "I've had a reviewer saying you should be citing [CS1's] work; that's me. Or another saying [CS1] did this last year, why does your stuff exist?" CS3 said "You have to be circumspect in how you refer to your own work, and if you are building on a body of work which we usually do, then it's kind of an obstacle to try and pretend that I didn't write that paper … you don't want to repeat everything you said in your previous paper, because that's self-plagiarizing."

There were other concerns about anonymization losing important, perhaps highly nuanced, information that might be important to take into consideration. While, as mentioned by the participants, papers should not need any extra material to be understood, sometimes the context of a study or the identities of the authors matter for ensuring that the work is assessed properly and that it can be trusted. SS3 said "I do a lot of communities' research … the community group is the second author … I think that it's an important signal to the reviewer that it's written in this way." Med2 described an experience where, as a reviewer, the context of the study mattered: "In the middle of the war, in the capital city of a nation in civil war, [authors] thought it's important enough to keep pushing and to study.
Well, you can see probably from the timbre in my voice that to me was a story that needed to be told, so that's where I bridge over that knowing the authorship and the situation did affect in a positive way." CS1 described a case where some information about the authors was needed to ensure that the research had been conducted ethically: "Authors knew something they shouldn't have been able to know and they weren't citing anything. … It was either a breach of confidentiality or you looked inside somebody's study but you're not citing it. … I had to ask 'have you ever been [at this institution]?'"

Mapping this concern to Leventhal's model of fairness components, it can be attributed to bias-suppression in decision making ("Decision structure"), while anonymizing can negatively affect "Gathering information" in the sense that useful information can be withheld from the reviewers due to anonymization.

2.3.6 Concern: Lack of opportunity to voice concerns

While formal appeal procedures are rare, it appears to be common to allow contacting editors to discuss comments or to challenge editorial decisions. However, this option seems to be used only by senior researchers. CS3 said "You can always appeal, and I've seen people going … directly to program chair … I think there's a bit of an old boy network about that; hopefully the decision is not gonna be based on that, but yeah, someone more established might be more willing to go to the chair." NS2 talked about a lack of clear communication of the appeal process: "Because I'm fairly junior, I don't usually know the editors that well … I've heard of people who've had some success with appealing, but I haven't tried it myself because I'm not clear how to do it, and I think it helps if you're someone who has a lot of leverages." SS1 said "It's something that you kind of have to be in the know that it's OK to do it. Certainly as a junior scholar and a new assistant professor, I don't think I knew that it's OK to call up the editor."

NS2 pointed out that the way an editorial decision is framed can affect an author's ability to appeal: "[With top tier journals,] sometimes it just won't get sent out for review. … The professional editor just said we decided not to review this, but that's really not something you can appeal very well, because they just say we don't think there's sufficient interest. So, I suppose if you are a Nobel laureate, you can say I think it's of sufficient interest because of x, y, and z, but I feel like if I were to say that, it probably won't change your mind. … I do see it as a fairness issue."

We learned from NS3 that some publication venues do have formal appeal processes in place, and that they are considered valuable: "One of the responsibilities is to play super-referee for the cases that there seems to be a lot of controversy, either disagreeing referees or authors who strenuously and perhaps coherently make a case against referees, which happens. This is actually one thing that I like about [this venue] that they have a pretty formal process for appeals […] they are fairly open about what their appeal process is and it's actually got quite a few levels, it's like arguing courts to the Supreme Court."

Med3 pointed out that not having an appeal process is less problematic if there are multiple journals to which an author can submit: "If we were in a structure or field that had only one journal and they had draconian control over the field then an appeal process might be really important.
… I think if you've gone through that process and the journal has decided that it's not gonna publish your article, your resources are probably best directed to get it published in a different journal." Similarly, NS4 said "I never ever challenge an editor. … If they made a mistake, everyone makes mistake, move on; get the paper somewhere else. Fighting with an editor is a waste of time for everyone." CS3 compared appealing conference decisions with appealing journal decisions: "[For conferences] that's very rare, usually everyone's so busy, swamped, unless it's an outrageous case … so, there is an appeal procedure possible, but it's more likely with a journal."

Participants pointed out that responding to reviewers' concerns is often part of the standard process for journals, but the short timeline (often one to four months) for conducting the peer-review process for conferences often denies that opportunity. In some conferences, instead of being able to revise and resubmit, authors might just be given a chance to rebut, i.e., to address reviewers' concerns in a short letter, rather than actually making any changes to the paper. Regarding rebuttals, CS1 said "Rebuttals are useless. … It's not gonna change my opinion; you're arguing on a tiny little fact while the big picture is that [the] paper sucks. It has never been a misunderstanding that was a big deal. It has never helped me and I've never seen it help anybody else in the rebuttals. [The revise and resubmit model] is fantastic; that works incredibly well, … My evaluation was I don't think they can do this in time, or that I have found a flaw with his experiment and he'd have to do the whole experiment again … he might have actually done it or could do it and I'm being obnoxious about it." In contrast, CS2 had a positive experience with rebuttals and considered them better than nothing: "I had a [venue A] paper where the rebuttal process was there so we responded and it saved the paper. It was an effective response. After that experience coming back to [another research area], there is no rebuttal … often times you feel like you need to respond."

Some suggested that instead of the current rigid structure around communication between authors and reviewers, a more fluid conversation model could be more helpful. SS1 had an experience where she engaged the editor in a discussion: "She was very friendly, very supportive, very encouraging, and told me on the phone what she really liked about the paper which was very helpful as I was revising it and rewriting it, and also clarified that she was not suggesting we move in the way I thought she had been suggesting." However, SS1 warned against potential problems with too much back and forth between authors and reviewers: "As a reviewer, honestly I would rather my role be over and done with and not have an ongoing dialogue, … rather than have it coming back over and over, or getting emails, or having to revisit the issue, because then I have to go back [to] look at the paper. I might not have thought about it in a month …. As an author, not so sure I would [like it] either, because honestly part of reviewing and the whole peer-review and publishing process is kind of a game, it's kind of like 'OK, I've got this kind of feedback, what do I have to say in order to assuage that person's concerns'." SS2 echoed this feeling: "Because of pressures on publications, there would be endless back and forth on something. […
a conversation might] degenerate into disagreement and then put everybody under pressure." In contrast, CS4 said "The discussion basically just benefits the authors, so by taking things like that away, you're just saying you don't care what the authors think." Regarding the concerns about long discussions between authors and reviewers, SS3 said "I think the spirit of enquiry runs counter to the spirit of productivity sometimes, I think having a conversation between author and reviewer would be hugely valuable."

While having rounds of revision, as in traditional reviewing for journals and some conferences, could be viewed as a conversation, the above comments seem to reflect a possible change in current expectations about feedback from reviewers to authors, particularly in the level of interactivity of the feedback.

Mapping this concern to Leventhal's model of fairness components, it is directly related to "Appeals", in addition to the representativeness and correctability of decisions ("Decision structure"). The ability to voice concerns is also a core component of Thibaut and Walker's (1975) conceptualization of procedural fairness.

2.3.7 Concern: Harsh or impolite treatment of authors

Some participants had experiences where their expected standards of politeness were not met. NS1 said "I've twice had reviewers disqualified for journals because of that. I'm not gonna stand and have somebody insult me for no particular reason." CS1 said "[The reviewer] said the results were banal and I thought that was incredibly rude, like they were boring … 'I'm sorry, did I need to make this exciting for you?'" CS1 added that monitoring reviews can solve the problem for small to medium-sized venues: "I have seen a lot of rude reviews and I made them change them when I was in a place like PC chair … I don't want somebody pissed off at the process because you are a dick and you didn't have to be. … I couldn't do it for [large venues]." Insufficient monitoring of and feedback to reviewers could be one of the root causes of this problem. This root cause is discussed in Section 2.3.21.

SS1 attributed this problem to the norms of interaction within some groups of researchers: "We have unfortunately a bit of a norm among some people, for reviews they get kind of snotty, to be kind of mean and unnecessarily harsh, and the editor said at a recent meeting that he has decided that as editor of this journal if he receives a review that he feels is like that, he's gonna send it back to the reviewer." SS4 said "Especially with the young authors you don't want to be insulting, and I've done it when I was younger. I felt really bad about it. I don't think I would have done it if my name would have been on there. … I think you can do it more politely and it has the same effect; maybe even better effect. I know young authors get back the review, haven't submitted very many things before, and they have a lot of ego in there and you don't want to bruise somebody's ego if you don't have to."

Several participants considered reviewers to be sometimes harsh, but they did not consider it to be impolite. Med4 said "Not inappropriately impolite, blunt perhaps, or reflective of an idiot reviewer perhaps, that's my interpretation." Med1 said "Not impoliteness … harsh reviews definitely, where clearly someone did not like this idea, did not like this results, and you say 'whatever'." CS2 said "[I've] never seen impoliteness. Sometimes they say harsh things, but not offensive.
In [my] community people tend to be really hard on each other. Obviously if you've been working on a project for a year and then someone told you 'that's useless' or 'there's no novelty to it.' … I'm now used to that kind of criticism, but I could see how that could be discouraging, especially to junior students; could be really tough." This comment suggests that familiarity with the norms of interaction in the peer-review process, perhaps in conjunction with developing a thicker skin, could help authors better cope with the harshness of reviews.

Several participants pointed out that dealing with inappropriate use of language is part of an editor's job. NS3 said "I think once or twice I've got reviews on my stuff that I thought was bordering on impolite that maybe [the] editor should have done something … inflammatory wording, to me that's the responsibility of the journal to deal with." NS4 said "Some may say you should reveal your name, so that you don't say stupid things, so you don't attack the paper. Well, that's the editor's job to see that review and discard that review." Eng2 suggested that writing harsher, more candid comments only in the comments to the editor could help avoid the problem: "I'll always try to reply to [an] author very gentle and polite, and the one to [the] editor is honest."

Some thought that harsh, or even perhaps impolite, reviews are not necessarily something to avoid. CS3 said "I've seen reviews that just said this is a complete junk, but it was complete junk. I don't know if that's rude or impolite, saves people a lot of time". Med2 said "People can disagree, and disagree firmly. Pretending to agree when you don't is probably a great breach of etiquette and certainly a greater breach of professionalism."

Mapping this concern to Leventhal's model of fairness components, it can be attributed to the ethicality of decision making ("Decision structure"). Fair and ethical treatment is often discussed under the notion of interactional fairness (Bies & Moag, 1986), which is sometimes considered to be part of procedural fairness (Bies, 2005).

2.3.8 Concern: Long delays in processing submissions

The speed of the peer-review process can affect authors' satisfaction with the process, and even their perceptions of its fairness. Several participants had occasional experiences with long delays. SS2 had an experience where he "had to withdraw because they seemed not be able to process it. In a year most probably you are off to writing something different, … suddenly this paper seems like pre-history, and how fair is it that you need to change all plans for the coming months just to be able to accommodate something that you left behind a while ago. There is a fairness issue, not just a technical matter." Similarly, SS3 said "[delay] is related to fairness, because it impedes that person's publications [and] productivity." NS2 suggested that delay could be perceived as intentional: "[It could be that] your work is being delayed because someone else is trying to do something similar. I don't have any evidence if that's actually specifically happened with me, but I certainly worried about it sometimes."

Several participants talked about causes of delay and potential solutions. CS1 mentioned that "Giving reviewers three or six weeks is useless, because I do them on the last day anyway." SS3 pointed out the role of journals in setting expectations: "It's almost like a cultural discussion: culture of reviewing.
For some journals you know as a reviewer you'd better submit your review on time." NS2 pointed out the reason behind journals' inability to enforce timelines: "A lot of journals say stuff about how the editors expedite the process and we'd like everything done this time and this time, but it doesn't seem like teeth are put into that very often, and then the process just drags on anyway. That requires the editor to be the enforcer and the editor doesn't like chasing after the reviewers." CS3 pointed out the benefit of conference deadlines: "The beauty of conferences is that they have hard deadlines; you've got to do it by certain dates. With journals … it could take 6 months to get a review of your paper sometimes, and in the meantime someone could publish it at a conference and you'd be scooped." Eng2 mentioned that finding appropriate reviewers can take time: "If the subject of the paper is outside my immediate area, then I need to do some research, I get onto Google … I get on their personal webpages to see what they are doing." CS3 pointed out that not all of the people involved perform effectively: "Sometimes people just want to get on these boards, editorial board, or program committee, or whatever, so that they can have it on their CV. They want the credit, but they don't want to do the work." SS3 mentioned that peer-review support systems might alleviate the problem, and SS2 talked about the importance of having "a pool of reviewers that are already committed."

Mapping this concern to Leventhal's model of fairness components, concerns about undue delay, or delays longer than usual, can be attributed to the consistency and ethicality of the decision-making process.

2.3.9 Concern: Breach of confidence

Only a few participants had any experience with a breach of confidence, and they were not particularly concerned about forms of breach of confidence that do not affect the fate of submissions. For example, NS2 said "I have a strong impression that it's quite common for reviewers to not probably quite stand to the level of anonymity or confidentiality they're supposed to. I've certainly had numerous cases that people tell me that 'I'm reviewing a paper about X', probably technically they are not supposed to do that, but it happens all the time." Several other participants discussed examples of revealing to the authors or others that they had reviewed a paper, but they did not think of that as a breach of confidence.

Most types of breach of confidence could easily go unnoticed by authors. However, Eng1 described an experience where a breach was brought to his/her attention: "I submitted to [Venue A] and [Venue A] is double blind. It came back reject, … then I got an email from a colleague of mine when I was in [another country], … he said 'I hear you're working on this area … I'm actually doing something similar, and I'd love to collaborate with you, and we could publish together'. And this thing I'd done was completely new … and he knew about it and no one had seen this. Even the students in the lab hadn't seen it, and this guy's supervisor was on [the] top level committee and he'd given him my paper with my name on it... my paper didn't have my name on it. I'm shocked, completely shocked at the level of corruption. It's unbelievable that somebody would do that at that high level and think it's OK."

CS3 described his concern that reviewers' identities might be revealed by other reviewers or by editors involved in the process:
"People aren't always careful enough about being confidential about this process … If it's a vigorous discussion and you are strongly arguing against some paper with the other reviewers, then your name might be given to the author of the paper as someone who didn't like their work, blocked them, whatever. It has happened to me in the past."

Mapping this concern to Leventhal's model of fairness components, it is most closely related to the ethicality of the decision-making process ("Decision structure").

2.3.10 Concern: Inconsistency of decisions

Some participants expressed concern about the inconsistency of assessments over time and across reviewers and editors. Eng2 said "It's unfair to the community. … Among the papers are wonderful papers that have substantial substance, and on the following page of the same journal is this emptiness." CS4 said "Sometimes you see 2 papers on very similar topics where one gets rejected and the other gets accepted, and that's clearly problematic. That's because you're making local decisions rather than global decisions." SS1 described an experience where she thought "it was very, very close in the first journal which was a top tier journal … but then [it] went to the second but also very top tier journal, and I think it was rejected without being sent out for review."

Inconsistency of decisions was also a concern when an editor's or associate chair's decision was not consistent with the reviewers' recommendations. Eng1 described an experience where an AC "overruled all the reviewers … What's the point of having reviews?" Similarly, CS1 said "Apparently in a lot of cases [ACs] insert their own opinion. I don't like that. I think that's not cool especially when all the other people liked it, and the AC is like 'no, I don't like it', overriding all these other people who liked it." SS1 had such an experience as a reviewer: "The editor basically ignored my comments, and accepted the paper, and I really felt like that was a total waste of my time." In contrast, NS2 thought editors need to take a more active role in decision making: "The most common thing is for the editors to just forward the reviews on without providing any other input or curation of them; sometimes it's because the editor isn't exactly in the field and has a hard time judging it but sometimes the editor probably could do if they wanted to, but I assume they're pretty busy so they don't, or maybe they are instructed not to interfere with the reviews." These two sets of comments suggest that there could be some level of ambiguity about what the roles of editors and ACs are.

Mapping this concern to Leventhal's model of fairness components, it is most closely related to the consistency of the decision-making process ("Decision structure").

2.3.11 Concern: Subjective, ambiguous, or inappropriate evaluation criteria

Some participants pointed out the imprecision of acceptance criteria. CS4 said "The criteria for acceptance is not well defined in a lot of places, it's up to the individual reviewers or associate editors." Eng1 said "It also annoys me when they don't evaluate the science, the actual contribution in the paper. We had papers rejected … where they felt the motivation wasn't good. … That's irrelevant. Fair enough you can [dock] the mark, but you cannot say strong reject and not even mention the actual work we did; drives me mental.
They say ‘we cannot think any way that it could be used, why on Earth are you doing this?’”

NS3 considered subjective evaluation criteria to be a part of the problem. “Some journals lean more on qualitative things like importance or exciting or whatever. Of course these are subjective, and I think that’s where a lot of unfairness can come.” CS2 expressed concerns about subjective assessment of contribution, which is affected by the framing of the contribution. “You have to build [a large artefact], and then you have to argue for the scientific contributions. So, if those are not clearly articulated somehow then your paper gets killed, but even if they’re clearly articulated sometimes it’s hard to pull out the things that the reviewer would really find enticing. For the same [large artefact] you could write many papers so you end up doing that until you get it in and the thing that changes there is not the artefact it’s the wrapping and the scientific framing. … The fact that contributions don’t exactly line up to what the reviewers expect to see shouldn’t be cause for dismissal.”

The existence of prestigious tracks or venues that deliberately ignore one or more aspects of submissions was perceived as fair, because not all types of contributions can meet the standard criteria. For example, CS2 said “[we have a submission track for] new ideas papers and those are shorter, 4-5 pages long … The way I perceive fairness in the process is that often times the reviewers give you a break on that account. They really just evaluate the idea. They explicitly do not consider the evidence in support of the idea.” CS2 added “many of the conferences now have [a special track] for experiences. What they’ve realized is that there are a lot of papers getting rejected that fall into this category of ‘we built this cool thing, here’s a lot of interesting information about it and experiences but maybe we cannot really soften out the scientific contribution per se, maybe because we are lazy, maybe because we are not scientists, we [are practitioners] and maybe this is just a useful artefact that you should know about it.’”

Mapping this concern to Leventhal’s model of fairness components, it is most closely related to communication of decision procedures (“Setting ground rules”), and consistency of the decision-making process (“Decision structure”).

2.3.12 Concern: Lack of specific or appropriate policies and expectations

Some of the participants found policies around expanding papers unclear. Eng2 said “Journals request that paper should be original, and what does original mean … it’s my usual practice to put effort into these [conference] papers, and after presenting the paper, I present the paper for publication to the journal. … I noticed that people differ in the approach to that, saying that look this is presented at the conference, it’s no longer original.” Similarly CS1 said “If you extend a workshop paper that was published in the digital library but not as a real paper, what constitutes acceptable extension to be acceptable as a conference paper, can you just add more content? Do you have to do another experiment? Can you just do more analysis of existing data?”

Reviewers’ misunderstanding (or ignorance) of policies could also be problematic. Eng1 described such an experience. “One of the reviewers said that we were cheating because we altered the font size for references to fit more in the paper.
And we were stunned, because they’d given a strong reject [with only a cursory review of] the content. … So we checked, and [it] absolutely conformed to the guidelines on the website. We were just stunned that this person haven’t checked the guidelines on the website before making this extremely strong and harsh criticism.” Eng1 also described a situation where s/he was about to enforce a policy that did not exist. “I start my review by saying that why on earth did you include your name and affiliation, and then I go back to the website and see that the submission guidelines say that it’s single blind, but they never told us.”

Some suggested better integration of policies with the user interfaces that are used for conducting the process. Eng1 said “The guidelines tend to be on the website for the conference, and it could be something hidden under ‘authors’, and there is often no link to it in the email that you get for a review. … Policies that are integrated in the interface that is used for composing or submitting the review would be less likely to be ignored.” Similarly NS5 praised the venues that are more specific about their expectations from reviewers in the review forms, those that “have [their expectations] in there before you can type anything in the comments box, it includes the list of things: ‘please include the following; don’t include the following.’”

Adding more structure to papers and reviews could help increase the clarity of expectations and policies. CS2 suggested that more structure could help authors and reviewers to be on the same page with respect to a venue’s expectations. “[… Consider the] question of what your contributions are. … When a reviewer makes that kind of comment it’s a kind of a blanket statement over the entire thing … and that’s when you feel powerless … whereas if the reviewer had to explicitly engage with that statement, with that piece, or section of the paper, that would be effective.” SS3 suggested more structured reviewing. “In general it’s a very subjective thing. … I think the more you have [a] specific rubric to follow the fairer it is. Some journals have quite detailed issues that they want you to look at.”

While clarity and communication of policies were desirable, some participants mentioned cases where flexibility in the process or ad-hoc solutions helped (or could have helped) in conducting a better process. CS1 described judgment of a controversial paper using an ad-hoc process. “[I] said [to program committee members] ‘here’s the big issues with the paper, can you all go read the paper and then weigh in, and don’t weigh in if you haven’t read the paper,’ … It worked and actually a lot of people went and we had a huge discussion which was really cool, helped us make the decision.” Other examples of ad-hoc solutions are reviewers revealing themselves to offer help to authors, and authors contacting an editor to seek clarifications, which are discussed in previous sections.

In areas where no policies are in place, all decisions are ad hoc; hence, depending on who makes those decisions, some parties could be disadvantaged. CS1 described his effort to establish a conflict-of-interest policy. “I borrowed some from biology and put them for votes and a lot of them went away, like the ones that said if you have a financial interest if somebody is giving you money for this research you are forced to say so in the acknowledgements.
… but in computer science nobody does that, and I tried to say ‘well, you should do that’, and … they were like ‘you have the freedom to say whether something was helpful or not.’” Mandatory disclosure of such conflicts of interest could help identify potential biases in reported findings. CS1 also tried to establish a policy about authorship. “There was a paper that sounded like two people that they talked to from a company helped them do the card-sort, and I’m like that’s kind of material to the paper. … I want to know who these people are to know whether there is an inherent bias. … They were like ‘no, they didn’t do enough to warrant authorship.’ I wanted to add a bylaw that said what exactly does it take to become an author. … It got rejected, because it should be up to the authors to decide.” One could argue that the lack of such a policy could potentially lead to unfairness toward some of the contributors.

Mapping this concern to Leventhal’s model of fairness components, it is most closely related to communication of decision procedures (“Setting ground rules”), and consistency of the decision-making process (“Decision structure”).

2.3.13 Concern: Conflicting mindsets of “Mentorship and support” and “Survival of the fittest”

Some of the participants pointed out a clash between two different mindsets within academia. SS2 said “Is it just we approach reviewing as a kind of Darwinistic process, where we just accept the best papers … or do we want to create an academe, a more inclusive and open space … How do you ensure that the kind of knowledge they bring [is] not literally destroyed because you say it doesn’t stand the journal’s whatever standard … Would you use your judgment to write your review in such a way that you allow that person to go do more research, think about some problems, revise, and resubmit? ... You can be sitting there within your walls, with all your more sophisticated conceptions of what standards are and willingly or unwillingly become part of a machine for discrimination and exclusion.” SS1 suggested a need for adapting the judgment process to the level of experience of the authors. “I’m curious to know is this a graduate student, because maybe I’d be a little bit more supportive in my comments, making the difference between an R&R and a reject.” Med2 suggested that the mentorship aspect could be more important than the judgment. “I tend to see it as an opportunity to share my views whether they are right or wrong, at least so that the authors have the benefit of that. Rather than my being judge that decides. … If it was just an up/down decision I wouldn’t be particularly interested in it. Seems to me that the function of peer review is to learn.”

CS1 and Med2 expressed concern for unfairness towards non-native-English-speaking researchers and recommended creating support structures.
CS1 said “I think it’s incredibly unfair that we require papers be written in English, there’s the majority of speakers who are not native English speakers, and we force them to write in American or British English … then people reject their papers because they are hard to read, and I think that’s complete bullshit.” CS1 proposed the idea of providing official proofreading support, but received feedback that the idea was unappealing to the community's elders, who felt that poorly written papers containing good research were a small minority, that most poorly written papers had little redeeming value, and that proofreading them would be a waste of people's time. Med2 described a venue that tried to accommodate non-native English speakers. “We would often have articles that were in very poor English … with phraseology that was very clumsy and so it would be a matter of guessing what they were getting at, offering a phrase, rewrite, or suggestion, and sending it back to them saying is this what you intended to say? … There was a culture that develops among the editorial board and core reviewers which we were.”

The best experiences that participants described often involved a reviewer having a “support and mentorship” mindset. SS1 described her best experience as follows. “I had this one great reviewer, I later found out who she was, who actually gave me a really useful idea, and I thought wow, she shouldn’t have done that … almost all of the comments were about improving [the] paper for the better, so they were really constructive.” NS2 described an experience where “both reviewers had read the paper exceptionally carefully and the final paper ended up being way better than what it would have been, because of the reviews. It’s the kind of thing when someone point[s] that out to you, [you] probably would have made them a co-author if they were a colleague or someone you knew who they were.”

This concern is difficult to map to the components of fairness as described by theories of fairness, because it questions the underlying basis of fairness assessment. We discuss this further in the discussion section.

2.3.14 Concern: Bias for specific methods and topics

Participants mentioned that sometimes judgments are colored by individual biases and preferences for specific methods and/or topics. Eng2 said “I have seen reviewers say, ‘look you’re not doing it my way and I don’t like that.’ Maybe that’s a kind of unfairness.” Similarly Med3 said “I think inherent in peer review is what is everybody’s bias for how they want to see things done … It’s whether they feel that [the problem I’m studying] is a significant issue or not.” Med4 said “There have been a number of times when I’ve been unclear as to why a particular topic isn’t considered worthy of publication at all.” CS1 described an experience where a reviewer “was triggered by a word in my abstract … the rest of the review was just like railing against this perceived understanding of what I meant by that word.”

Med3 explained how such biases could be community-wide, rather than individual, biases. “Sometimes there will be themes that at the current time people want to see or hear a lot about … we end up with all these papers on these things, because the review committee had an interest in that.” Med3 added that venues might have mandates for encouraging specific topics.
“Because we are providing health care, and it might be that the government says that in the next five years what’s really important to us is quality of care … so the board … might have to tinker with the selection criteria to get more of those papers on the meeting.”

Mapping this concern to Leventhal’s model of fairness components, it can be attributed to bias-suppression in decision making (“Decision structure”), and perhaps to the “Selection of agents”, because sometimes subsets of research communities are known to have such biases against methods used by members of other subsets of the community.

2.3.15 Concern: Judgment based on incomplete or inaccurate representations of research

Often some of the artefacts or data used in research are not shared in the paper. Some of the participants commented on the importance of collecting such information or artefacts. NS3 expressed concern about insufficient details of research methods and procedures. “Sometimes I’ve asked for details about procedures, that I thought the authors were withholding, and that’s something that I’m pretty sensitive to; I don’t like people who try to have it both ways, where they publish something, but they don’t really disclose how they do it.” CS2 praised a venue where “as part of submission you submit a virtual machine, where you can rerun the experiment and reproduce the results. These are great. Especially in computer science … the paper is a nice summarized version that anyone could read but often times there’s so much more to the work, and being able to distribute and share it is important.” Similarly CS3 pointed to the importance of submitting source code, not for evaluation, but for future research. “It’s really critical in [my area] to have benchmarks, and to be able to test other people’s code on standard benchmarks; people should be publishing their source code and the data that they tested it with, so that other people can repeat it; that’s the essence of science.”

Video is a powerful communication medium for demonstrating visual and temporal information. NS5 and Eng1 pointed out the importance of having videos as part of submissions. Eng1 said “The video becomes a huge aid to say this is what we are trying to demonstrate … This is our [technique] working in real time … so there are certain claims you can completely justify using video.” NS5 described an experience as a reviewer where s/he “kept insisting they show us their videos and they were resisting, and then once we saw the videos we were like you didn’t actually replicate our study.”

Sharing datasets is another effort in the direction of collecting more complete representations of research. Med3 said “By the time you see the paper and the results tables, all that looks pretty clean, but the reality is that it’s never that clean … I guess if it was some landmark thing that was gonna change the direction of a whole area of future endeavor then being able to have someone vet the dataset would be pretty useful.” NS2 thought sharing data could help generate more trust in authors. “When I see that someone has presented a lot of their data, […and] they provide documentation of that process, it definitely makes me more confident. In my experience, people who tend to engage in greater data transparency frequently have done a more careful job, because either they are not trying to hide something, or they are more organized in what they are doing.” In contrast, Med2 cautioned about datasets that could potentially mislead reviewers.
“Datasets and graphs and things like that can be the magician’s hand distracting you over here while some magic is being produced over there.” Several participants pointed out the importance of data for future research. NS4 said “If we go to a library archive and find a paper, and then the supplementary materials like an appendix that has the data in it, that’s a gold mine for us. … We generally use very large databases that we want to mine and use for years, so we’re not gonna give the whole database away, but whatever data used [is] for the paper, that’s a good idea.”

Despite the importance of datasets, our participants often thought it would be too much work to assess them. Med4 said “I wouldn’t have time to look at it. When I go and do a review I do have a fundamental trust for the key issues that people are talking about, they have at least been honest about it.” Eng1 said “In one way it can really prove what you did was true, but in another way do you really expect the reviewer to go through this?” SS1 said “I don’t think the reviewer would ever ask for the dataset and try to redo the analysis of the dataset.” NS1 said “I don’t have the time to download that information and re-analyze it even though I sometimes suspect that the analysis has been done incorrectly, so I’m mostly looking for whether the conclusions are logical based on the information that is there.” Eng2 said “There have been a couple of cases where I would have liked to see the data; but some papers are so poorly written if the data is in the same state of confusion as the writing, I really don’t want to see the data.”

CS1 pointed out the potential IP issues with submitting datasets. “I have seen in reviews where [… the reviewer] were like we shouldn’t accept it because they didn’t reveal the dataset, and I was like dude, they’re not gonna reveal the dataset. … Which would you rather have, the paper that describes the stuff and you just have to trust them because they’re giving you enough data to believe it, or just have an academic conference completely useless to anybody outside the academia.”

While there is a lot of potential for using more complete and accurate representations of research, there are concerns that cannot be addressed by these representations and that perhaps fall outside the scope of peer review. For example, Med1 said “The only issue there would be the question of if there are any problems in the quality of data collection that is not reflected in the paper.”

Mapping this concern to Leventhal’s model of fairness components, it is closely related to accuracy and completeness of information collected for the purpose of evaluation (“Gathering information”).

Table 2.3 summarizes the concerns discussed above and their subthemes.
Reviewers’ misunderstandings caused by a lack of expertise or careless reading:
- Differences in areas of expertise between subcommunities
- Use of inexperienced junior reviewers with little oversight
- Vastness of the expertise needed to evaluate each paper and partial expertise of reviewers

Bias for or against certain authors by some reviewers:
- Bias for friends
- Bias for high-status individuals
- Bias for individuals from high-status institutions
- Difficulty of reviewing anonymized submissions because the reviewers cannot use their knowledge of abilities and experiences of the authors
- Importance of understanding the context of a study, for assessing it

Lack of opportunity to voice concerns:
- Lack of formal appeal processes
- Lack of communication of appeal processes
- Difficulty of responding to ambiguous or overly general comments made by reviewers
- Severity of the problem when there are very few (or only one) comparable venues for publishing
- Short timeline of conferences that might not allow for an appeal process
- Inability to rebut, or revise-and-resubmit, in many conferences

Harsh or impolite treatment of authors:
- Rude or blunt reviews
- Oversight to ensure politeness
- Balancing honesty and efficiency on one side and politeness and careful language use on the other side

Long delays in processing submissions:
- Negative effect on authors’ productivity
- Role of the venue in setting expectations and managing the process

Breach of confidence:
- Little concern for reviewers revealing themselves, or reviewers talking about their reviewing tasks with their colleagues
- Concern for disclosure of papers under review to others
- Concern for divulging a reviewer’s identity to authors by other reviewers (when reviewers are not anonymous to each other)

Inconsistency of decisions:
- Inconsistency between reviewers’ opinions
- Editor or AC overruling reviewers

Subjective, ambiguous, or inappropriate evaluation criteria:
- Lack of clear definition of evaluation criteria
- Use of subjective criteria like importance
- Interest in venues that deliberately ignore some criteria, and focus on others

Lack of specific or appropriate policies and expectations:
- Ambiguity of policies around originality and submission of extended versions
- Insufficient and out-of-context communication of policies
- Possibility of enhancing structure of articles and reviews to reflect expectations from authors and reviewers
- Need for flexible policies to handle special cases
- Lack of policies around authorship, conflict of interest, etc. in some venues

Conflicting mindsets of “Mentorship and support” and “Survival of the fittest”:
- Need for balancing the two mindsets
- Need for more focus on mentorship when reviewing junior researchers’ work
- Discrimination against non-native English speakers

Bias for specific methods and topics:
- Individual biases for certain methods or topics
- Community-wide biases for certain methods or topics

Judgment based on incomplete or inaccurate representations of research:
- Insufficient details of research methods
- Sharing datasets, videos, virtual machines, and other artefacts to improve reproducibility and enable more reliable assessment
- Need for facilitating handling and assessment of datasets
- Need for establishing data quality standards for shared datasets

Table 2.3 Concerns about the fairness of the peer-review process and a summary of the discussed subthemes related to each concern

In the following sections we describe eight themes that we grouped as root causes for the concerns. Some of these root causes are related to, and some contribute to, each other. The root causes include difficulty of preserving author anonymity, anonymous reviews, power imbalance and insufficient oversight of reviewers, difficulty of finding appropriate reviewers, high workload of reviewers and editors, insufficient feedback to reviewers and lack of recognition of reviewing, extremely high bars for acceptance in top-tier venues, and pressure for publishing.

2.3.16 Root cause: Difficulty of preserving author anonymity

Several participants mentioned the difficulty of effective anonymization. Building on one’s own prior work appears to be a common obstacle to proper anonymization. CS3 mentioned that as a reviewer s/he would guess “if [authors] are building on someone’s work, they are probably building on their own work.” Med3 said “Sometimes you can figure out where they are from, because institutions have kind of specific approaches.” NS5 said “[blinding could work] when the work is in the beginning of the work or really unknown, but my sense is that when I submit a paper everyone knows it’s me, because it’s based on my past work.” Eng1 said “We’ve had like personal criticisms in the reviews versus objective criticisms … I had that kind of problem once in single blind but mostly in double blind. Where they can look at your previous work and references and tell who you are. I think anonymizing the work isn’t working to try to prevent either real or perceived bias.” SS3 said “Because [our field] is small, I think it’s very hard that … you really have no idea who this person is. I would say about a third of the time I can say who the person is. … For smaller areas of enquiry it’s impossible to have full arm’s length position and not know the people. So I think sometimes the expectations are unrealistic.”

Several participants mentioned their curiosity and desire for guessing authors’ identity. This curiosity, paired with academics’ desire to enhance the visibility of their research online and offline, could jeopardize efforts to anonymize submissions and bring back potential biases for or against individuals. SS4 said “I always try to figure out who wrote the article.
… I've been wrong sometimes when I thought I know exactly … It’s just curiosity.” Similarly SS1 said “I’m always curious, and I’ll admit that on occasion I have guessed and I have even Googled people.” Interestingly, SS1 also described an experience where “[A journal] asked me to review a paper, and told me the name of the author, and I was really uncomfortable with that … because we have so much of a norm of double-blind.”

Our participants pointed out that ineffective anonymization could occur regardless of who performs it (the authors or the editorial staff). CS1 thought authors are in part to blame for ineffective anonymization. “You’re supposed to anonymize yourself but tons of people don’t do it very well … whether it’s accidental or not … it’s like you’re the only one with a project with this name.” Med4 said “Unfortunately the editor or the folks supporting the editor’s office may miss something like ‘I’d like to acknowledge so and so at [location A].’ This sort of gives you a context of where this is and probably who did it.”

Some of the participants suggested that publishing in the digital age could diminish anonymity. SS3 pointed out the possibility of giving away identity because of identity traces left in digital documents. “I often attach track changes of the article. I don’t know how they get rid of my identifier.” CS4 mentioned the role of preprint servers. “Often when you guess, you guess wrong, unless you actually have explicitly seen [a preprint of] the paper.”

Mapping this root cause to Leventhal’s model of fairness components, it affects bias-suppression in decision making (“Decision structure”).

2.3.17 Root cause: Anonymous reviews

Some of the participants considered anonymity an impediment to fairness. Increasing accountability was the most commonly cited benefit of disclosing reviewers’ identity. CS4 said it is not desirable if “you just send it to this blob of the conference and like there is never any name associated with any decisions that are made.” CS1 said “Revealing yourself makes you more accountable which means that you tend to be polite unless you are an [expletive].” SS1 said “I try to think of a review as if I’m signing my name in the end; sometimes I’ll even put my name at the bottom, write the review and then take my name off, because I do think there’s that tendency to get really nasty, and if I feel myself falling into that, having my name down there even temporarily kind of catches me and makes me feel ‘no I need to be more polite, I need to be more constructive.’” SS4 said “It’s supposed to be anonymous but I usually make myself known because I figured authors will probably have questions. I don’t like anonymous review at all. It works against accountability. In anonymous review you can be really nasty, and some people are, and I know the reason is supposedly, if you’re not anonymous then you won’t criticize, but I write critical reviews all the time with my name on it.” Similarly Eng1 said “I’m critical at people to their face at a conference, I’m not nasty about it … Younger people tend to be worse at that. They may feel it personal and feel [they are] being attacked and hopefully gradually over [a] PhD they realize most people are nice.”

Eng1 recommended disclosure of identities not only to authors but also publicly. “If it’s publicly named then you’re gonna attack biases.
… It’s a transparency thing, like having an open government, there’s always backroom deals, but you want to know about them.” Similarly NS5 praised a venue that discloses reviews publicly. “At [this venue], it’s a back and forth with reviewers and the whole thing gets published … if all of this is made public, then everyone’s gonna try harder to check their biases and be more reasonable … I like the idea that everyone can see the reviews, kind of holds people accountable.”

Several participants talked about situations in which they might prefer to disclose their own identities. SS1 said “If it’s somebody I know and I know their work, and I wrote them a positive review, then there’s no harm in revealing it … it feels weird to keep the secret from that person.” NS2 said “There were a couple of times where I’ve said I’m giving my identity because it’s similar to what I’m working on, but I've never done that where I’m also giving a really negative review … If I reveal my identity when I’m giving a negative review, I’m just gonna antagonize the authors, and if I’m gonna reveal my identity when it’s a positive review, I guess from the perspective of other reviewers and the editor, now I’m trying to game the system by gaining credit with the author.”

SS1 considered gaining credit for giving high-quality and positive reviews as part of the publishing strategy game. “Maybe I earn some brownie points, and maybe they’ll review something of mine later on, if they feel grateful … If somebody had told me that they have reviewed my work and I know that it was a positive review and then I were asked to review their work, I might feel more likely to agree, but I wouldn’t feel more likely to be positive, because I can still keep my anonymity … If they were more likely to write me a more positive review because they happen to know who I am, I’ll take it.” SS1 added that “there is so much informal inequity in publishing, I mean it’s so much, who you know, whether you’re familiar with their work, and how it gets presented, there’s inequality upon inequality, and to live a perfectly wholesome life where you never use anything to your advantage, and you never try to benefit from something that might benefit you, I think it’s kind of useless in a way. … publishing is [a] strategy game, so you try to be strategic about thinking about who might be an appropriate reviewer, how can I tailor the paper to increase the odds that that person is gonna be selected by the editor as one of my reviewers.” CS2 made a similar comment regarding strategically adapting papers to potential reviewers. “I’m much more cognizant now of who will be a potential reviewer. It’s something you eventually learn to do, after you get a lot of harsh reviews, you say it’s gonna be one of these three people because the topic is in their purview; so you learn to cite them, and adapt to them.”

CS4 mentioned cases where perceived credibility of the opinion could significantly increase as a result of the disclosure. “If you see that it’s a highly qualified reviewer you might take it more seriously.” Similarly CS1 said “If Turing was reviewing your paper … and he said like ‘you could do a better job, I know you’, that would be nice, especially if it was like Turing.
You would want to know, and not just because he is somebody famous, but especially if like let’s say you wrote a paper that is [an] evolution of someone else’s work, and that person gets to review it which is pretty typical … it’s kind of neat to know who they are.”

Another potential benefit of disclosing reviewers’ identity is the ability to declare and make explicit personal biases. CS1 said “It should be fine [to self-reveal] if the reviewer is innately biased in some way, let them reveal their bias where bias is not enough to be a conflict.” CS2 said it’s possible to disclose biases without necessarily disclosing identity: “[I might say] my background is the following, so, that’s obviously the perspective I’m gonna be reading from. In a sense it lets them know my biases. … There’s always the option of explicitly revealing yourself. It’s a protection given to you, but you can always refuse it.”

However, whether concealment of identity is mandatory or optional is not always clear. CS1 described an experience as a PC chair where “one of our reviewers wanted to put a license on his review, because he wanted to sign it, and wanted to make sure that his signature will make it to authors … I took it to the steering committee who went ['ballistic'] basically. ‘Oh, my god this is horrible, dump the reviewer we cannot have licenses, nobody can sign the reviews.’ … There [are] lots of places where it’s a good idea to reveal who you are, but these people were super afraid of that. They were like you are somehow breaking down the world order if it was at all revealed.”

Lastly, the ability to disclose one’s identity could present opportunities for collaboration. NS2 said “I chose once to reveal myself, and I did it because the paper was reviewing a line of thinking, and I was interested as well, and in the review I pointed that out and said if the authors are interested they can contact me for further discussion, … they did contact me, and we had a long and interesting discussion.” Similarly SS4 said “Sometimes I see an article that I review and I want to know who this is because this is very interesting and I want to talk to this person. One of my closest colleagues I met that way.” CS2 praised self-revealing as a good gesture. “They wanted to say ‘contact me, if you want to talk to me about this’. … Those are really good gestures, because basically you’re hiding behind the smoke screen rejecting papers for various reasons.”

In contrast, our participants also pointed out that remaining anonymous as a reviewer helps maintain objectivity. Med3 said if “it’s not blind you are not gonna get objective reviews. It’ll be just the friends club.” Med1 said “I would have a hard time, I think, putting my comments down and not tailor them too much; … I think it does free up reviewers by being anonymous to just step back and not have to consider the impact on those relationships.” Similarly NS3 said “You’d think the chances are that this person is going to be on a grant application that I do or something like that, and they might take offence and they might take it out on you in the future.” NS4 said that he declines review requests from the journals that disclose reviewers’ identity because “it can impact your precision as a scientist.
If you criticize someone else’s work and you’ve been seen to criticize, then that person can take offence.” CS4 said “You don’t want to start people getting vendettas against each other.” In addition to more objectivity, CS2 considered expedience an additional benefit of remaining anonymous. “It helps to write the review much more candidly. I think it makes the reviewers’ job easier being able to say things without thinking about social repercussions as much. … They are direct for expedience mostly because I don't want to spend the time to develop rapport.”

Mapping this root cause to Leventhal’s model of fairness components, it affects bias-suppression in decision making (“Decision structure”) as well as perceived accuracy of “Selection of agents”, because the authors are blinded to the identity of decision makers and are not able to judge their suitability for the task.

2.3.18 Root cause: Difficulty of finding appropriate reviewers

As suggested by the participants, lack of expertise of reviewers could be attributed to the challenge of finding expert reviewers. For example, CS1 said “Lots of my papers have gotten rejected because the reviewers had no [expletive] clue what I was talking about and I don’t think it was the paper … it was like they had like no expertise, and had no idea what was going on. You cannot solve that problem because there are just so many people available to review and if they don’t know what you’re talking about then you’re stuck. If the conference scheduling allowed for it, it’d be nice if they could say ‘I have no [expletive] idea what I’m doing, and please take me off that review.’ They cannot do it for a conference, because they cannot get a new one in time.”

Finding reviewers is a hard problem (Charlin & Zemel, 2013), even with computer systems that suggest matches based on topical relevance (extracted from user-defined keywords or prior publications) while balancing the reviewing load on reviewers and avoiding conflict-of-interest concerns, perhaps because some of the concerns and constraints are difficult to formalize; a minimal sketch illustrating this appears below. SS4 pointed out the problem of reviewer selection when you’re not familiar with relationships in the community. “If it’s slightly outside your own sub-sub-subfield, you don’t know that so and so is the student of such and such who is a rival of this and so; so the student will say I don’t want my professor’s rival to be reviewing this because he’s out to get so and so.” Med2 suggested that finding reviewers is a more acute problem in broad-audience journals. “In a more general journal … they wouldn’t necessarily know or have in their pocket a list of reviewers in that particular field. It comes to the issue of fairness, fairness to readers that thoughtful people have reviewed this article and you can rely on it, and fairness to the researchers to not be judged by non-peers.”

NS1, Med1, and CS4 pointed out the large number of reviewing tasks as a potential culprit. NS1 said “There [are] so many journals that it’s hard to find reviewers nowadays.” Med1 similarly said “Because there is such a proliferation of journals, one wonders … how is one going to maintain the quality of peer review given that it’s an onerous process. I’ve declined to do more reviews and then wondered who’s gonna do them.” CS1 and CS4 mentioned that finding reviewers is challenging for conferences because of the burst of reviewing tasks.
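To make the formalization difficulty concrete, consider a minimal sketch of a greedy reviewer-assignment heuristic. This sketch is purely illustrative and rests on assumed inputs: the function and parameter names (assign_reviewers, affinity, conflicts, per_paper, max_load) are invented for illustration, and it is not the method of Charlin and Zemel (2013) or of any particular venue.

    from collections import defaultdict

    def assign_reviewers(papers, reviewers, affinity, conflicts,
                         per_paper=3, max_load=8):
        # Greedy matching: for each paper, take the highest-affinity
        # reviewers who are not conflicted and still have capacity.
        # 'affinity' maps (paper, reviewer) pairs to a relevance score
        # (e.g., keyword overlap); 'conflicts' is a set of (paper,
        # reviewer) pairs that must never be matched.
        load = defaultdict(int)          # reviews assigned to each reviewer
        assignment = defaultdict(list)   # paper -> chosen reviewers
        for p in papers:
            candidates = sorted(
                (r for r in reviewers
                 if (p, r) not in conflicts and load[r] < max_load),
                key=lambda r: affinity.get((p, r), 0.0),
                reverse=True)
            # Papers processed late may be left with low-affinity
            # reviewers, or with fewer than per_paper reviewers.
            for r in candidates[:per_paper]:
                assignment[p].append(r)
                load[r] += 1
        return assignment

Declared conflicts and load caps are easy to encode this way, but the informal constraints participants raised, such as rivalries or advisor relationships, never appear in the affinity scores at all. Even this toy version exhibits the failure mode the participants describe next: once the few high-affinity reviewers reach their load caps, later papers fall to whoever is still available.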
CS4 said “[In] the conference system because they have to review so many papers at once, it will definitely happen that you get unqualified reviewers at times. I have had that experience and I’ve also been assigned papers that I was not qualified to review … with these conferences they have to review 1500 papers in 2 weeks, so they need to just find people who do it.” Similarly CS1 said “Once you’ve started and allocated everything and you are left with a lot fewer people and you’re like, huh, how about this random … guy I don’t know what he does. I’ll just stick him on that review, because he’s available.”

Eng1 suggested that publishing trends are one reason for the sudden need for a large number of reviewers in an area, because many of the authors are not experienced enough to be reviewers. “There tend to be fashions, in every field, which come up. People think ‘oh this is interesting’ or think it’s easier to get it published if they work in this area and so they do some work in this area and then there is an influx.”

Med2 suggested that perhaps paying reviewers, especially practitioners, could be a way of incentivizing reviewing. “It’d be inappropriate for me to get paid for it. But … I’ve got it easy because I’m a university professor, and my salary is there, and what I do with my day doesn’t change what my income is. On the [other] hand there are peers who might be having a private office and so on. … You could be systematically excluding some folks like private physicians who could have very valuable things to contribute to the peer-review process.”

Having to use inexperienced reviewers is only one of the problems caused by the limited reviewing capacity in research communities. For example, depending on the availability of reviewers, the mix of reviewers might become overly harsh or overly lenient. NS5 described one such experience that was properly handled by the editor. “The action editor sent it to reviewers who were gonna be really hard on me and knew that and he, despite the negative reviews, didn’t reject it, and said ‘I’ve seen these reviews in context; they would be negative and here’s the way you can improve the things they say and I accept the paper.’ It took several rounds of reviews, a year and a half or something like that, and it did actually work and I think the paper is a lot better in the end for being able to speak to that other side.”

To address the difficulty of finding suitable reviewers, several journals ask authors to recommend reviewers. SS3 said “I think there is value in allowing authors to recommend reviewers, even though I say our field is small, the diversity of topics and methodologies is quite wide; so, I think that’s quite useful, … Having served on [an] editorial board, we are always searching for appropriate reviewers.” SS4 said “It might be that somebody’s out there that I haven’t heard of and if the author makes a suggestion you can check it out.”

In contrast, some considered asking authors to recommend reviewers useless, or even bogus. SS2 said “I never saw myself either entitled or thinking that it was productive for me to intervene in that. Given that you usually choose journals that are within your expertise, these journals should have by definition people who know what you’re doing.” CS1 said “Suggesting reviewers would seem bogus; like people suggest reviewers for tenure cases too.
I want that guy to review my tenure case because that guy is a friend.” However, NS3 pointed out that editors could manage that potential problem: “the right thing to do [as an editor] is to pick someone from the list of suggestions by the authors and somebody else. Of course you probably will be picking people that you think will be sympathetic to your paper. … We all know that we all have friends out there, even if you cannot quantify what the conflict of interest is, maybe you just met at a conference and had a beer together.” Eng1 suggested an alternative solution where authors can choose relevant groups of reviewers. “Put all the names on the website, so you know your area and you know who is on the list of reviewers. … Say that if you’re submitting with this keyword, these are the reviewers that can get your paper.”

Mapping this root cause to Leventhal’s model of fairness components, it is closely related to “Selection of agents”.

2.3.19 Root cause: Power imbalance and insufficient oversight of reviewers

Some of the participants expressed concern about power dynamics in the peer-review process, which could cause behavior that leads to unfairness. In other words, reviewers’ and editors’ power over publication of manuscripts could taint the discussions between authors and reviewers, or could be exercised to influence decisions toward their personal preferences, especially because there is little oversight of reviewers. SS3 said “I think there’s power operating when you’re a reviewer … sometimes people are abusing their power, or they’re unaware of their own biases that are influencing their review.” SS1 said “There’s nothing to stop … an editor from deciding that I’m not gonna send this for review or who am I gonna send it to or you know once the reviews come back I’m not gonna accept it if they’re kind of lukewarm partly influenced by the reputation of the institution or anything else.” Similarly Eng1 said “You end up with a single person, the AC, is the only one who is gonna argue for your paper, and if they don’t like it, it’s gone … I don’t know if two ACs can help … If you have a dominant personality then your opinion is gonna be heard more than the shy person. … I assume most of them are probably objective and use the reviews to inform their decisions; but I’m convinced that there are some that say ‘no, I don’t like this’.” NS2 offered a different perspective and pointed out that intentional abuse of power is rare. “I’ve seen little [or] no cases of real abuse of power and a lot more cases of not being careful or not paying attention on the part of editors or reviewers.”

Several participants pointed out the occasional problem with lack of oversight of reviewers. Eng1 said “I’ve had reviews saying it’s complete crap … that reviewer should have been kicked off that team for using that kind of subjective language. I’d say [the venue] is a big culprit for this because of the volume of reviewers and complete lack of checking or reviewing the reviewers.” CS4 pointed out that even if we monitor reviewers it’s difficult to discourage bad reviewing: “there’s no disincentive to do bad reviews. In the worst case you won’t be asked to do a review again.”

Participants who had experience with monitoring reviews expressed concerns about the repercussions of formally recording reviewers’ performance. CS1 said “As a PC chair … I went through all the reviews for the last three years of three conferences … to decide which people I’m blackballing … This is the kind of thing you don’t write down.
… You talk to the people running the next year’s conference and you tell them the names. [If you write it] it will always come around to bite you. They are just not asked to be on PC next year.” CS3 mentioned that secretive record keeping is used in some communities. “[It] might be against certain privacy rules, but anyway [program committees] might have a secret file of ratings of reviewers. … This used to happen anyway but it was by word of the mouth.”

Other participants, who did not have experience with recording reviewers’ performance, expected the existence of such a system. NS1 said “I’m fairly certain that editors take note of who their more reliable and more detailed reviewers are that are more helpful in making decisions.” And NS4 said “I’d be very surprised if editors do not track [reviewers’ performance].”

Some of the participants talked about criteria for assessing reviewers’ performance. For example, CS2 suggested using measures that are as objective as possible. “You can rate reviewers’ timeliness, like whether they actually write coherent stuff, detailed, whether they participate in the discussion.” Eng1 suggested crowdsourcing the assessments. “I think every review of every paper should be on the website for everyone to see, keep it anonymous, it means that people can actually see that this paper got these reviews and this review is completely terrible, then we can have all reviewers of the conference moderate or vote on those reviews, something like Slashdot.” Eng1 also argued that it should be clear why one is being added to a blacklist. “If I’m blacklisted by a conference I want to be told and I want to know why. … [if you] have an AC to sponsor you to come back, you get a second chance … Such list if visible to anyone, especially if public, could affect recruitment decisions, which is why it should be taken seriously and a half-assed implementation could be risky.” In contrast, CS3 said “You could argue that anyone should have the right to see anything that’s written about them, but then you wouldn’t write anything.”

While researchers review each other’s work, reviewing tasks and power over submissions are not necessarily distributed uniformly. CS3 pointed out that relationships play a crucial role in assignment of researchers to reviewing tasks, which could then bias the process. “Looking at how program chairs are appointed it’s very much an inbred situation, people tend to appoint their own PhD students to program committees. If you look at the whole structure it’s a self-replicating social system with its own set of norms.”

Several ways through which the power dynamics are managed were mentioned by the participants. For example, CS2 said “You can escalate issues to the chairs if there is an issue.
It’s important to have a chain of command; most conferences do, and [that] makes it pretty clear who to turn to when there is a problem.” Med3 said “The editors are people that are well known in the field and have a long track record of service to the society and the area of interest, and they are altruistic in their mission, so I don’t think they exert control.” Med3 added that “the other reviewers see your review comments and so [does] the editor, and the editor’s inviting you to be the reviewer, so there is some accountability within the process.” Eng2 said “At the end of the day, [the] editor’s power is limited because the author can withdraw a paper from one journal and submit it somewhere else.”

However, several participants mentioned a pressure to publish in specific top-tier venues. NS1 said “There’s some peer-pressure to try to publish in those very highly ranked journals.” SS4 expressed dissatisfaction with such pressure. “They look at [the] impact factor of journals and whether you’re the first author or not, which to me is complete idiocy. You have to read it to see if it’s any good. But it seems to be getting more and more quantified and more and more standardized.” NS2 also talked about the importance of publishing in top venues. “There’s already a huge number of examples where someone made a profound discovery and published it in a good specialist journal and someone else repackaged some variant and published [it] in [a top tier venue] and ended up getting a lot more credit for it, when they are sufficiently close in time. … People don’t often go back and look at the dates of submission, they just kind of say I read this paper in [a famous venue]. … Their work is judged based on where it’s published rather than what it actually contained. I do think it’s a fairness problem.”

Mapping this root cause to Leventhal’s model of fairness components, it is closely related to “Safeguards” and “Appeals”.

2.3.20 Root cause: High workload of reviewers and editors

As discussed before, a lack of reviewing capacity has been cited as a factor contributing to the difficulty of finding suitable reviewers, and consequently to the use of inexperienced reviewers or a problematic mix of reviewers. Another problem caused by the limited reviewing capacity is overloading reviewers. This overload can diminish the quality of reviews, sometimes even intentionally. Eng1 described such an experience. “I was given 10 reviews to do, full papers, and I said ‘look my workload is already high, I cannot do that.’ They apologized and said ‘we have no one to give them to. If you can, just do shorter reviews.’ I replied no. I did a full review for each one, because you shouldn’t have to reduce the quality of review, this is a premiere conference, and you’re really reducing the chance of authors for getting good feedback, and you are letting bad papers to get in because of the shorter reviews. … There were reviews that were one line … less than a paragraph. I was stunned that people actually took this to heart and did tiny reviews.”

Much of the responsibility for the peer-review process is on ACs or editors, including monitoring reviewers and ensuring that they review according to criteria and that the reviews meet scientific and professional standards. However, some of the participants suggested that in the case of conferences and prestigious journals they just might not have the time to do so, because of the high workload.
CS2 said “It’s up to the chairs to supervise the process and to push back on the reviewers … but in practice you get some reviewers who are very stubborn, and maybe the paper chairs aren’t paying as much attention to each individual piece. … If the paper gets two negative reviews, then maybe that kills the paper and the chair never looks at them. I think this issue with dealing with scale makes it difficult to police the reviewers.” Similarly NS3 said “Obviously I see a review that looks like a very bad review to me, but maybe then they are just too busy to be able to pick that up.”

Mapping this root cause to Leventhal’s model of fairness components, it is closely related to “Selection of agents”, accuracy in decision making (“Decision structure”), and “Safeguards”, because part of editors’ role is safeguarding the integrity of the process.

2.3.21 Root cause: Insufficient feedback to reviewers and lack of recognition of reviewing

Insufficient feedback to reviewers and the limited visibility of the peer-review process were cited as the root of some of the concerns, such as reviewers’ occasional preference for expedience and efficiency over careful reading of submissions and careful use of language.

Some of the participants considered the oversight of reviewers as an opportunity for providing feedback and improving the quality of reviewing. Med2 said “Feedback loops help me be a better reviewer, and I don’t know whether people ask me to review because they are absolutely desperate and they know that I’m gonna do it, or if they’re really getting good value from it.” Similarly Med4 said “If [reviewers’ performance] is being recorded I expect to know that it is and be able to access it … this is an opportunity to educate me to be a better reviewer.” NS3 warned that “There is a fine line between helping reviewers to improve and pissing them off.”

Some of the participants suggested that explicit personal thank-you notes could act as a simple (and perhaps less contentious) feedback mechanism to help motivate the reviewers. NS4 said “Sometimes you do get a personal note … I don’t expect it and don’t really want it every single time. … but every now and then a pat on the back is nice.” SS1 suggested that even standard form letters that are sent by individuals, not automatically, can be helpful. “I got a really nice email from the editor saying, ‘Just so you know I thought that was a fabulous review, very thorough and fair, and you really did a great job with it’ and I was really pleased, and I felt really good about myself until maybe a year later, I got the exact same letter from the same editor, I was like OK, that was a form letter, but at least he made the effort to send a nice thank you form letter. Even if it’s a form letter I like knowing that this was a useful or helpful review, there is nothing worse than feeling you are wasting your time.”

Some of the participants highlighted the unsatisfied need for feedback about what happened to the papers they reviewed and why. SS3 said “There was one journal article I reviewed that I sent substantive feedback and I thought it could be strengthened, and then… you know you don’t hear back other than some thanking. I remember this one article got published with almost no changes.” Med3 recommended a more comprehensive form of feedback on reviews: “It’s really helpful to get feedback from other reviewers who are reviewing the same paper.
… Saying here’s the 20 papers you reviewed, here’s your scores and comments, and here’s what other reviewers thought about the same papers, over time. … You’re getting educated on how to be an effective reviewer.” NS3 made a similar suggestion, expressing a desire “to see if you’re reasonable or unreasonable or people are picking up on similar things.” Eng1 echoed this and added that “it’s not necessarily the case of I disagree, or that [an editorial decision] shouldn’t have happened. It’s more the case of where is the footprint of the thought process.”

As mentioned above, a simple feedback loop could be achieved by showing every review on a paper to all reviewers. This can be expanded to become a collaborative process by asking reviewers to discuss their concerns. NS2 pointed out that such a discussion forum, if not anonymous, “would force them to be accountable about what they say and be ready to defend their criticisms in an environment that is power-balanced. … In most of these situations there is, at least three people involved; the editor is usually pretty well-known … I think usually when people try to pull power dynamics they tend to do it when not a lot of other people are watching … If one of the reviews is wrong or mistaken, which is the case a lot, … other reviewers can correct him, and because the reviewers are not anonymous to each other it provides some incentive for them to put at least a minimum amount [of] effort into review, which sometimes is not that common, if it’s totally anonymous.” Eng1 suggested that such discussion forums could be valuable even if anonymous. “It’s good that when I’ve been the expert, non-experts have differed saying that ‘OK, I gave it neutral but this guy has talked about it and made more sense, and it should be given more weight,’ and I’ve done the same thing.” NS2 described the benefits of using the discussion forum for reaching a decision and sharing the discussions with authors. “[The reviewers] have to reach some kind of consensus on what they’re gonna do. … it’s helpful to authors because they get a coherent thing of this is what we are saying rather than three different things.”

This root cause is essentially one of the underlying reasons for the difficulty of finding reviewers. Therefore, mapping this root cause to Leventhal’s model of fairness components, it affects “Selection of agents”. Lack of feedback and recognition could also affect accuracy of decision making (“Decision structure”).

2.3.22 Root cause: Extremely high bars for acceptance in top-tier venues

Several of the participants expressed concern about how top-tier venues treat submissions. NS4 said “Higher rank journals, as soon as they get the slightest negative term they throw the whole paper out. That’s unfortunate as an excuse to get rid of papers, just because they have a mass of submissions.” Similarly NS3 said “Generally you send something to [a top venue] because you feel like it should be in a journal like that, so if they are rejecting it before even reviewing it you feel like that would be unfair … their philosophy is that we get so many submissions that if it all doesn’t go just [all reviewers saying yes] we’ll just reject it, even if they’re wrong in their reasons that they are hesitant for; they just have too many papers to care about.” CS4 suggested that the level of strictness could go even higher if a paper is expected to be influential.
“We had a paper that was very influential, was getting a lot of attention and it got rejected, and I think it was just because it was held to a higher standard because it was already identified as such an influential paper … We put it on arXiv so people around the world knew about it, people were citing it, people were building on it and they actually rejected the paper, which was a strange thing for a foundational paper in a new area to be rejected.” CS1 pointed out that extreme strictness could lower the consistency of judgments: “[When the number of submissions is high] they end up drawing the line between accept and reject where there is actually a lot of good papers below the accept line. And it tends to be a crapshoot whether it gets in or not. … It’s just so unfair to have a venue that’s so important and is the main conference and you cannot get in even if your paper is awesome.”

This root cause might be difficult to map to any single component of Leventhal’s model of fairness, but the nature of the problem could negatively affect the ethicality, correctability, and accuracy of the decision-making process (the “Decision Structure” and “Safeguards” components).

2.3.23 Root cause: Pressure for publishing

The peer-review process is part of a larger process of evaluation of researchers. While pressure for publishing is not directly related to peer review, participants mentioned it as a factor that could affect authors’ sensitivity to decisions made by peer review, and their perceptions of their experiences with it. For example, Eng2 said “[When evaluating researchers] you sit there and count pages, count students, and count this and that. Now there’s a great pressure on academics to put out papers, and the content is unimportant. It has a title and a citation and it has a count. From the point of view of the world’s evaluation of you an empty paper is equal to a full paper.” SS2 said “[It is expected from graduate students] in some disciplines to publish even before they graduate. Add to that the fact that pressures on tenure and promotion [have] become stringent … how do you use your role as the reviewer to identify people who are just starting with new ideas.”

Med4, who is involved in both medical practice and medical research, highlighted that the pressures felt by some academics have little effect on people like him/her: “I have always viewed the opportunity to publish something as an honor rather than a right and my life has never depended really on what I published, at least not in a way that I’m gonna have no job … I guess that sort of biases the way I see it. I wouldn’t be bothering with appeal, I would be saying I want constructive feedback on why this isn’t an appropriate thing to publish.” This attitude appeared to be shared among our participants who were both medical researchers and practitioners.

This root cause might be difficult to map to any of the components of Leventhal’s model of fairness, perhaps because pressure for publishing is a force that is external to peer review but that affects the attitudes of participants in the process.

Table 2.4 summarizes the root causes discussed above and their subthemes.
Theme (Root cause): Difficulty of preserving author anonymity
Subthemes:
- Authors’ building on their own previous work
- Reviewers’ curiosity and desire for guessing authors’ identity
- Imperfect anonymization by authors, or editorial offices
- Digital traces left by authors in papers or on the web

Theme (Root cause): Anonymous reviews
Subthemes:
- Lack of accountability
- Self-disclosure of reviewers’ identity only on positive reviews
- Missing out on collaboration opportunities
- Unknown credibility of reviews from authors’ perspective
- Undisclosed reviewers’ biases
- Positive effect on objectivity of reviewers
- Positive effect on expedience

Theme (Root cause): Power imbalance and insufficient oversight of reviewers
Subthemes:
- Reviewers’ and editors’ power over papers
- Insufficient oversight of reviewers and lack of reliable mechanisms for assessing reviewers’ performance
- Role of relationships in assigning reviewing tasks and other roles in the reviewing hierarchy
- Importance of having a chain of command to avoid abuse of power
- Excessive power of top venues due to the peer pressure to publish in them

Theme (Root cause): Difficulty of finding appropriate reviewers
Subthemes:
- Limited public knowledge of relationships within the community and other concerns and constraints that are difficult to formalize
- Severity of the problem in broad-audience venues
- Severity of the problem in emerging and trendy research areas
- Difficulty of managing bursts of reviewing tasks in conferences
- Increasing number of reviewing tasks as a result of the increasing number of publication venues
- Insufficient incentives for reviewing, especially for practitioners
- Need for balancing reliance on editors’ knowledge of appropriate reviewers with that of authors

Theme (Root cause): High workload of reviewers and editors
Subthemes:
- Negative effect on quality of reviews
- Negative effect on quality of managing and conducting the peer-review process

Theme (Root cause): Insufficient feedback to reviewers and lack of recognition of reviewing
Subthemes:
- Need for (positive) explicit feedback
- Need for recognition of reviewers’ efforts
- Need for communication of the fate of papers to reviewers at the end of the process
- Need for sharing of reviews among reviewers (after completion of the first round of reviews), and enabling discussion among them as a form of implicit feedback

Theme (Root cause): Extremely high bars for acceptance in top-tier venues
Subthemes:
- Little tolerance of top-tier venues even for shortcomings that are easy to fix
- Little concern about satisfaction of authors in top-tier venues due to the high volume of submissions that is guaranteed because of their large audience

Theme (Root cause): Pressure for publishing
Subthemes:
- High sensitivity of authors to criticisms
- Frequent resubmission of rejected manuscripts without necessarily addressing reviewers’ concerns

Table 2.4 Root causes and a summary of their subthemes

2.3.24 General: Overall satisfaction with peer review

Despite having various concerns and occasional problems with peer review, most of our participants were satisfied with how peer review is conducted. Med4 said “Peer review has helped me and helped the work and I suspect has helped the audience reading this.” Med1 said “I’ve been very impressed particularly with how well the editors and journals handle the process, it’s very respectful, you know what’s happening; the communication is really good.” CS2 said “I’m also surprised by how extensive the reviews are. Oftentimes they are very detailed, so that more than makes up for [the problems].
… I think the review process is not going to be perfect, it’s run by humans, but it’s really important. All of my papers have improved as a result of reviews. I’m fairly happy. I think it is worth the effort. … Minor tweaks can be really important, and it can take a very long time to explain what the hell you have actually done.”

2.4 Design implications and further discussion

Participants shared a wide range of concerns they had experienced with the peer-review process. While we cannot comment on how common each of the potential concerns is, because of the relatively small sample size and the pool from which participants were drawn, most of the problems discussed could potentially affect any researcher, because the procedures and contextual parameters that caused those problems were not exclusive to our participants. In other words, the power structure, the complexity of assessing research with respect to various criteria, individual and community-wide biases for research methods and research topics, and the social and psychological factors that affect researchers’ behavior create situations that influence fairness or perceptions of fairness of the peer-review process. Our participants’ concerns revealed how these factors could create concerns and some of the problems that they experienced.

As mentioned by some of the participants and as discussed in the literature, researchers’ views are affected by various psychological factors. For example, they might have emphasized the unfairness of peer review to defend their feelings of self-worth (Major & Schmader, 2001). By contrast, system justification theory states that people are unconsciously motivated to defend and reinforce the status quo, suggesting that researchers could have an innate tendency to view the current peer-review processes as fair and to rationalize unjust decisions even if they are disadvantaged by the current system (Jost, Banaji, & Nosek, 2004). Regardless of how psychological factors distort researchers’ view of peer review, and of the impact of procedural fairness of peer review on its outcomes, perceptions of procedural fairness are important in how individuals interact with the community and participate in the process. In the rest of this section we review the concerns our participants discussed and make design recommendations for addressing or alleviating them. Root causes are also discussed in the context of related concerns.

2.4.1 Concern: Reviewers’ misunderstandings caused by a lack of expertise or careless reading

Concern for reviewers’ misunderstanding, lack of expertise, and careless reading of submissions could be attributed to the difficulty of finding appropriate reviewers and the high workload of reviewers. The concern highlights the importance of ensuring that researchers are motivated to accept reviewing tasks that fall within their purview, which is the focus of Chapter 4. In addition to motivating reviewers, one potential approach to alleviating this concern is to create procedures that allow weighting of reviewers’ opinions based on their expertise. For example, self-evaluations of expertise and a discussion between reviewers could help in estimating the relative expertise of reviewers. Some of the participants mentioned their ability to review some aspects of papers, but not all aspects. The peer-review process should take this into account by determining the suitability of reviewers at a finer granularity; hence, we recommend asking reviewers to self-evaluate their expertise with respect to key aspects of submissions at the time of deciding to accept or decline a review request, as well as when submitting their review. This enables editors and ACs to recruit reviewers who can be expected to complement the expertise of the reviewers already assigned to a paper. Indicating in review request letters the areas that are crucial for the prospective reviewer to cover would also help reviewers decide whether or not their expertise is suited to the task. The iterative recruitment of reviewers that we recommend is inspired by the Spiral Model of software development, where a new round (or spiral) of development is initiated when a potential for improvement is identified or there is a need to resolve a perceived “risk” in the design decisions (Boehm, 1988).
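To make this recommendation concrete, the sketch below shows one way a support system could combine aspect-level scores using reviewers’ self-rated expertise as weights. It is a minimal illustration rather than a description of any existing system: the field names, the 1-to-5 scales, and the choice of an expertise-weighted mean are all assumptions made for the example.

```python
# Illustrative only: combine per-aspect review scores, weighting each score
# by the reviewer's self-rated expertise on that aspect (both on hypothetical
# 1-5 scales).

def weighted_aspect_scores(reviews):
    totals = {}   # aspect -> sum of expertise-weighted scores
    weights = {}  # aspect -> sum of expertise weights
    for review in reviews:
        for aspect, score in review["scores"].items():
            expertise = review["expertise"].get(aspect, 1)  # unrated: low weight
            totals[aspect] = totals.get(aspect, 0) + expertise * score
            weights[aspect] = weights.get(aspect, 0) + expertise
    return {aspect: totals[aspect] / weights[aspect] for aspect in totals}

reviews = [
    {"expertise": {"method": 5, "statistics": 2},
     "scores":    {"method": 4, "statistics": 2}},
    {"expertise": {"method": 1, "statistics": 5},
     "scores":    {"method": 2, "statistics": 4}},
]
print(weighted_aspect_scores(reviews))
# -> {'method': 3.67, 'statistics': 3.43} (rounded)
```

The same per-aspect expertise data could let an editor or AC spot aspects that no current reviewer covers well and recruit a complementary reviewer in the next round, in the spiral fashion described above.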
Lack of oversight, paired with anonymity and high workload, was mentioned as the root cause of several problems and concerns, including insufficient attention to the task. Even in their role as reviewers, our participants expected to receive more feedback than what computer systems automatically send them. We recommend that computer support for peer review facilitate managing the process by reminding editors to provide feedback to reviewers, rather than sending it automatically. The feedback one receives by looking at others’ reviews can help improve one’s own reviewing quality, but explicitly thanking reviewers assures them that their effort has not gone unnoticed. Hence, we recommend sharing reviews among reviewers as a minimum form of feedback, and augmenting it with explicit positive feedback, ideally personalized to the reviewer, to acknowledge high-quality reviews.

2.4.2 Concern: Bias for or against certain authors by some reviewers

Bias for or against authors based on their relationships, institutional affiliations, or status was a common concern. Therefore anonymity of authors, although sometimes ineffective, appeared to be considered desirable by all of our participants. However, several downsides to anonymizing papers were also discussed by our participants. Some of these potential downsides, such as missing opportunities for collaboration, or receiving comments asking authors to cite their own work, could be alleviated if authors’ identities were revealed to reviewers after submission of the first round of reviews. Reviewers could then be allowed to amend their comments (or augment them, if the first-round comments cannot be altered) based on their knowledge of the authors’ identities. This could ensure that the initial set of reviews is not tainted by bias against or for authors, but that all of the information that can be used for proper assessment of research is taken into account, including the reputation and prior work of authors. Studies of research practices demonstrate how scientific facts and procedures are malleable. As our participants mentioned, and as discussed in the literature, what appears in research papers is significantly different from how research is actually done (Leahey, 2008); hence, the possibility that an author’s good reputation would increase their papers’ chance of acceptance is not surprising.
As our participants suggested, the quality of data collection and other aspects of research that cannot be assessed directly within the review process can be guessed, but only based on knowledge of the experience and expertise of the authors. This phenomenon has been observed in the context of code contributions to open source projects, where project managers’ assessment of the trustworthiness of a contributor can increase the chance of the contribution being approved (Pham, Singer, Liskin, Figueira Filho, & Schneider, 2013; Tsay, Dabbish, & Herbsleb, 2014).

Some of our participants felt that they were attacked personally by reviewers, even when their papers were anonymized. In addition, most of our participants mentioned that they often guess who the authors are, even while knowing that they might be wrong. Such guessing could bring about unconscious biases, regardless of its correctness; hence the effectiveness of anonymizing submissions is questionable even if reviewers cannot guess correctly. We suggest that the effectiveness of other approaches to suppressing bias, such as using transparency to force reviewers to be aware of their biases and to manage them, should be examined further. Chapter 3 explores the literature on anonymity and transparency and summarizes the variety of anonymity policies adopted by various venues to improve the peer-review process.

2.4.3 Concern: Lack of opportunity to voice concerns

We found various levels of opportunity for authors to voice their concerns, ranging from no chance to respond to reviewers’ concerns in some conferences in Computer Science, to rebuttals in some conferences, revise-and-resubmit procedures in journals, and standardized appeal processes that allow authors to challenge an editor’s decision in journals. We are aware of one conference within Computer Science that offers a revise-and-resubmit procedure, but it is common to blame the short timelines of conferences for not allowing appeals or even rebuttals. As one of our participants mentioned, even in computer science it is possible to challenge an AC’s decision about a conference paper, similar to challenging journal editors’ decisions. The interesting similarity between conferences and journals was that in both types of venues, appeal procedures appear to be not well documented (or at least not well known) and they are often informal. Perhaps the power imbalance within the research community is one reason for such a procedural deficiency: whenever authors are powerful, an (informal) appeal process exists; hence those who have the power to change the system are not affected by the lack of a formal appeal process. While appealing could be a standard formal feature of peer-review support systems, so far this feature seems to have been implemented only informally, by emails to editors or program chairs.

2.4.4 Concern: Harsh or impolite treatment of authors

We found a wide range of attitudes towards harshness of reviews; some were outraged by it, while others thought that it expedites the process, or that the help that reviewers provide makes up for their occasional harsh words. Several participants mentioned that it is the editors’ job to monitor the process and deal with reviews that do not meet professional standards; however, the high workload of editors was blamed for the occasional problems that do occur. While most editors read the comments made by reviewers, it might be easy to miss instances of harsh language. We recommend designing and using automatic politeness indicators (Burke & Kraut, 2008) to notify editors of potential problems.
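As a purely illustrative sketch of such an indicator, the fragment below flags review sentences that match a small, invented keyword lexicon so an editor can skim only those. A deployed indicator would instead use a trained politeness classifier in the spirit of Burke and Kraut (2008) and route flagged reviews to the editor rather than print them.

```python
# Toy politeness indicator (illustrative only): flag sentences that contain
# words from a small, invented lexicon. A real system would use a trained
# classifier, not a keyword list.
import re

HARSH_TERMS = {"nonsense", "lazy", "worthless", "incompetent", "sloppy"}

def flag_harsh_sentences(review_text):
    sentences = re.split(r"(?<=[.!?])\s+", review_text)
    return [s for s in sentences
            if any(term in s.lower() for term in HARSH_TERMS)]

review = ("The experiment is carefully designed. "
          "However, the related-work section is sloppy and the writing is lazy.")
for sentence in flag_harsh_sentences(review):
    print("Flag for editor:", sentence)
```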
Some participants suggested that concern over impolite reviews is linked to anonymity of reviewers, which could inhibit a sense of accountability. Chapter 5 offers a review of the literature on politeness of reviewers, and describes ways through which reviewers whose identity is disclosed use politeness strategies to mitigate their criticisms and how that might be better supported by peer-review software systems.

2.4.5 Concern: Long delays in processing submissions

Delay, while normal to some, was a concern for others. Long delays could be especially worrying when there is no indication of what is happening. As some participants mentioned, it could raise concern about intentional blocking of one’s progress as a researcher. We recommend adding further transparency to the peer-review process. While knowing the progress of the reviewing process could help in alleviating concerns, it is probably more important to ensure that the process does not take too long. As some of our participants suggested, the majority of the delay in the peer-review process occurs between the time that a reviewer accepts a review request and the time the reviewer actually gets around to starting the review. We think minimizing this time is an important avenue for future research that, to our knowledge, has remained unexplored. We recommend experimenting with giving short deadlines (e.g., 5 to 10 days) to reviewers to assess the effect on the decline rate and on the quality of reviews. In addition, we envision peer-review processes in which authors can submit an early draft or an abstract that would enable searching for qualified reviewers. Those reviewers’ availability could then be collected by asking them to identify their preferred time periods for reviewing. Based on that information, a venue could recommend specific time frames for submitting the final version of the paper, which might significantly reduce delays in reviewing.

2.4.6 Concern: Breach of confidence and conflict of interest

There were two types of breach of confidence about which our participants expressed concerns. One concern was about reviewers disclosing unpublished papers to others. Examples of such problems are rare, perhaps because it is difficult to detect or prove the stealing of ideas unless the stealing involves plagiarism as well (Broad, 1980). Douglas (1992) describes how such stealing of ideas can happen unconsciously: “I have had my ideas stolen many times, at least twice by people who added insult to injury by ridiculing the ideas before stealing them... I honestly believe that they do not realize where their ideas have come from, because I have similarly stolen ideas without being aware of doing so.” Further transparency of the peer-review process could help ensure that those who have access to unpublished articles will not abuse their exclusive early access. The second type of breach of confidence, which one of the participants expressed concern about, was the disclosure of reviewers’ identities by the editor or by other reviewers. Lack of sufficient training for peer review, and the fact that peer review is often conducted somewhat secretively, means that some researchers might not be fully aware of confidentiality expectations (Vardi, 2011). Peer-review support systems could play an important role in communicating policies and expectations to authors and reviewers by embedding that information within the corresponding interfaces.

Ease of transportation and increased online communication over the last few decades may have led to closer relationships between researchers across the world. As a result, the traditional notions of conflict of interest, which are often limited to collaboration or being in the same institution, may not capture newer forms of conflict of interest that might arise in today’s highly connected world. Reviewers may thus have no conflict under the standard definitions of conflict of interest, yet be friends or even collaborators of the authors in ways not covered by existing policies. Peer-review support systems could take advantage of the often rich data that is available about on-line relationships to alert editors, and even reviewers themselves, to the possibility of these newer types of conflict.
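As one sketch of what such an alert might look like, the fragment below flags reviewer-author pairs that sit within two hops of each other in a collaboration graph. The graph, the names, and the two-hop threshold are invented for illustration; a real system would assemble the graph from bibliographic databases and other on-line relationship data, and would treat the result as a prompt for human judgment rather than an automatic disqualification.

```python
# Illustrative conflict-of-interest screen: flag reviewer-author pairs that
# are close in a collaboration graph. All data and thresholds are invented.
from collections import deque

def hops(graph, source, target):
    """Shortest path length between two people, or None if unconnected."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for neighbor in graph.get(person, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

collaborations = {  # hypothetical co-authorship links
    "reviewer_r": ["colleague_c"],
    "colleague_c": ["reviewer_r", "author_a"],
    "author_a": ["colleague_c"],
}

d = hops(collaborations, "reviewer_r", "author_a")
if d is not None and d <= 2:
    print(f"Possible undeclared conflict: reviewer_r is {d} hops from author_a")
```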
2.4.7 Concern: Inconsistency of decisions

The problem with consistency of decisions over time or across reviewers is a clear instance of how the process could work against the basic definition of equity. This problem is caused in part by the fact that a different small set of reviewers handles each paper. Ambiguous and subjective evaluation criteria exacerbate the problem. While it is possible to normalize scores across individuals (for example, by standardizing each reviewer’s scores against that reviewer’s own mean and spread), the small number of reviews by each reviewer lowers the effectiveness of normalization techniques. Furthermore, it is hard to expect even one individual to be consistent in the way they make decisions over the years, much less a whole community of reviewers. We recommend designing better opinion measurement interfaces to help reviewers maintain their consistency over time. Research areas evolve and researchers’ attitudes towards methods and topics change over time; hence what we recommend is to support researchers in maintaining their standards when that is desirable, and to raise awareness of changes that take place in evaluation methods and criteria, rather than to encourage a literal constancy of evaluation criteria and attitudes. Chapter 6 discusses this issue in more detail and compares some new approaches that could be adopted.

Even in their roles as reviewers, the participants in our study felt that the editor’s decision could occasionally appear to be not well justified. Enabling deeper discussion between reviewers and the editor could help clarify how a decision is made. An online discussion could provide a footprint of the decision process and thus a level of transparency that at least helps reviewers understand how their comments have been taken into account. The footprint of the process could be anonymized and also shared with authors to further enhance transparency and the perception of fairness in the process. Chapter 3 describes several variants of these approaches that have been implemented by some publication venues.

Another factor contributing to inconsistency of decisions is the extremely high threshold for acceptance in top-tier venues. As some of our participants suggested, peer pressure for publishing in such venues causes a very high number of submissions, while only a small portion of them can be accepted for publication. Such a level of strictness means that the slightest concerns or disagreement by reviewers could lead to rejections.
While such concerns or disagreements might reflect issues that are easy for the authors to address, some top venues often do not give papers a second chance because of their high submission volumes. In those situations, the set of reviewers assigned to a paper becomes particularly important, because researchers might feel that their high-quality papers are being dismissed while other papers of comparable or lower quality are being accepted.

2.4.8 Concern: Subjective, ambiguous, or inappropriate evaluation criteria

Differences of opinion about evaluation criteria, and about which criteria must be met for acceptance, were among the cited causes of dissatisfaction with the peer-review process. Some of the participants suggested that even an idea that has not been validated could be important to publish, because it could inspire a line of work that would eventually be useful. We recommend that instead of eliminating submissions based on deficiencies, peer-review processes should focus on identifying those deficiencies and attaching descriptive and evaluative tags as meta-data to papers, so that readers and information seekers can determine what aspects of the paper were approved by the peer reviewers and what aspects were not. The meta-data could then be used by consumers of research to search and filter research articles based on their needs. Special venues or submission tracks that focus on specific aspects of submissions, or that ignore specific aspects of submissions, are the current way of tagging submissions as being of a certain type. These submission tracks are often created to address the concerns of a large subset of the community who have difficulty getting their papers accepted because of a mismatch between some of the standards of the field and some inherent property of the type of research they are conducting. What we recommend could address this problem even for smaller groups that may have difficulty in establishing special venues or special submission tracks.

2.4.9 Concern: Lack of specific or appropriate policies and expectations

Lack of specificity and clear communication of policies was a concern raised by the participants. One of the main policies that is not communicated well (and sometimes not communicated at all) is the appeal policy, which we discussed earlier. However, lack of clarity appears to be common in many aspects of peer review. For example, the expectations for authors and for reviewers are not always clear. One approach, suggested by our participants, to alleviate this concern is to increase the structure of both articles and reviews. For research articles, recommending the inclusion of specific sections that address the standard basic expectations of reviewers could help both authors and reviewers in communicating their concerns. Examples of such standardization that have been adopted by some publication venues are requiring or recommending sections such as statement of contributions, practical implications, threats to validity, or limitations. Similarly, some of our participants praised the use of review forms that ask reviewers to comment on specific aspects, and to not comment on other specific aspects. Increasing the transparency of the peer-review process could be another approach to addressing the lack of communication of policies. We recommend designing peer-review support systems that are socially translucent (Erickson & Kellogg, 2000; McDonald, Gokhman, & Zachry, 2012): systems should increase awareness of how other authors and reviewers behave within the system by increasing the visibility of authors’ and reviewers’ behavior. While complete transparency is incompatible with some forms of peer review, translucency (partial transparency) could be implemented by providing relevant representations of what other authors or reviewers do, which can help raise awareness of possibilities within the process without jeopardizing any desired anonymity of authors or reviewers. Showing how many reviewers or authors perform each of the possible actions could help other authors and reviewers become aware of those actions. For example, knowing that some authors appeal editorial decisions and that some of them are successful in reversing a decision could help authors understand the avenues open to them. Similarly, knowing that some papers are rejected for not adhering to formatting requirements or for poor anonymization could raise awareness of the importance of following those policies.
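The kind of aggregate, identity-free summary such a translucent system might surface can be sketched in a few lines. The event log and wording below are invented for illustration; the point is only that counts of actions, with no identities attached, suffice to reveal the avenues that exist.

```python
# Illustrative social-translucence summary: report aggregate counts of
# actions without identities. The event log is invented.
from collections import Counter

events = [  # hypothetical log of actions within the review system
    ("author", "appealed an editorial decision"),
    ("author", "appealed an editorial decision"),
    ("author", "had a decision reversed on appeal"),
    ("reviewer", "signed a review"),
]

counts = Counter(events)
for (role, action), n in counts.items():
    print(f"This year, {n} {role}(s) {action}.")
```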
Some participants talked about the challenge of meeting the originality criterion when submitting extended versions of conference or workshop papers. Such concerns might be more salient now than in the past because of the decreasing cost of publishing and the increasing importance of conferences and workshops, given the wide availability on the Internet of papers presented at those venues. If a journal is always publishing recycled material from conferences, it becomes less useful as the place to go to find out what is new. We recommend better integration of conferences and journals. Grudin (2013) described some of the efforts in creating journal-conference hybrids. In addition, we also recommend more formally connecting papers in the literature that are closely related, so that readers know which papers are extended versions of previously published papers, and whether or not a paper is a preliminary form of another paper. This could also alleviate the known problem of authors submitting multiple versions of a study for the sake of increasing the number of papers they have published.

Several participants pointed out that there are often exceptional cases that cannot be handled appropriately based on standard policies. While there are many reasons for having flexible and adaptable policies, it is important to make sure that the flexibility is communicated to authors and that the flexibility does not become a source of unfairness, which could be the case if individuals were treated differently depending on the situation. Perhaps how flexibility is expected to be used should be well defined as part of the policy. For example, multiple participants mentioned situations where disclosing the identity of authors could be beneficial; however, if we let authors choose to do that without having to seek the approval of the reviewers or the editors, it could lead simply to disclosure of the names of all authors who think that disclosing their identities increases their chance of acceptance, rather than actually making the decision-making process fairer. Chapter 3 has a detailed discussion of anonymity policies adopted by various venues and their level of enforcement.
One thing that we feel is sorely missing in peer-review support systems is provision for ‘articulation work’: activities that do not directly contribute to the outcome but that contribute to designing and conducting the process (Schmidt & Bannon, 1992). Currently, articulation work such as discussing ways of improving current peer-review processes, interpreting policies, and handling special cases is poorly supported by peer-review support systems. Previous work has shown the importance of supporting such activities (Schmidt & Bannon, 1992), especially if we move toward more flexible modes of cooperation (Kriplean, Beschastnikh, & McDonald, 2008; Stvilia, Twidale, Smith, & Gasser, 2008). We recommend supporting such activities within the peer-review support system to increase awareness of policies and norms, to facilitate the evolution of policies, and to enable more flexible modes of peer review. Leventhal (1980) included “change mechanisms” as one of the seven components of procedural fairness; however, framing this concern as a need for supporting articulation work brings into perspective a more comprehensive set of interactions, and hence is a more productive framing for guiding the design of future peer-review support systems.

2.4.10 Concern: Conflicting mindsets of “Mentorship and support” and “Survival of the fittest”

Studies of procedural fairness often focus on just one basis of fairness: equity. Some of our participants suggested that the role of reviewer is perhaps beyond simply judging who is better than others, or, as described by one of our participants, simply being a cog in the Darwinistic process of survival of the fittest. Perhaps a reviewer can act as a mentor to help those who need more help. As discussed in the introduction to this chapter, level of need is another potential basis for fairness, which can be appropriate in the context of peer review if we look at the peer-review process as an educational process aimed at helping members of a research community flourish. However, current review processes are designed around equity-based fairness, not need-based fairness. Some arguments for opening up the peer-review process emphasize nurturing colleagues, supporting research communities, and the joy and value of social interactions (W. Lipworth, Kerridge, Carter, & Little, 2011). An important aspect of social interaction that is lost in peer review due to anonymity is being adaptive to the status and needs of authors. Junior authors might react to criticism differently; hence some of our participants thought junior authors should be treated with a greater focus on support and mentorship. The desire to treat junior researchers differently is but one indication of a need for more adaptive treatment of authors. Like any educational method, one could argue that peer review should be personalized to the needs of the learner.

This concern about the role of the reviewer raises a bigger question: Is a research community that distributes resources based on equity more productive than one that does so based on needs? Is there a balance that needs to be struck? Can multiple venues that use different bases for fairness co-exist in one research community? We think this is an important avenue for future research. Wright (2001) argues that relationships within and between human societies are evolving toward nonzero-sum (i.e., cooperative rather than competitive) relationships.
Envisioning such a future for academic research might lead to designing peer-review systems that are optimized for the effectiveness of feedback and the provision of support to researchers, rather than for simply selecting the best articles.

2.4.11 Concern: Bias for specific methods and topics

Several participants pointed out that reviewers’ or editors’ preferences for specific methods and topics could lead to unfairness, because of the level of power that they have over submissions. Some suggested that a transparent process could force reviewers and editors to better manage their biases. We think another potential approach to alleviating this concern is to collect the opinions of a larger group of researchers on each submission. More discussion of papers in editorial board meetings, or program committee meetings, could be one such solution; however, often only one or two editors or ACs actually read each submission, which makes such meetings less helpful for this purpose. Post-publication peer review, where over time various researchers can express their opinion about papers, could be another solution. The benefit of having the opinions of a larger group is that a single individual’s biases would not be sufficient to reject a paper. However, we should note that it is still possible that a community as a whole might be biased against certain methods or topics, which is a more difficult problem to address.

2.4.12 Concern: Judgment based on incomplete or inaccurate representations of research

While several participants talked about the limitations of research articles in providing reliable representations of research, all of the participants mentioned the difficulty of assessing datasets, code, and other forms of materials that could be submitted for peer review along with research articles. A significant portion of scientific analysis is conducted using computer programs instructed by scripts that are applied to data, which itself is sometimes collected using other computer programs or scripts. Buckheit and Donoho (1995, p. 59) argue that an article in computational science is “merely advertising of scholarship” and that “the actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” Lack of such details not only inhibits replication of studies, but also inhibits verification (Donoho, Maleki, Rahman, Shahram, & Stodden, 2009) and accurate peer review of the research. Misconduct and honest mistakes alike go unnoticed (Campanario, 1998), and contradictory findings and findings that wear off over time are not uncommon (Ioannidis, 2005; Lehrer, 2010). We recommend integration of peer-review support systems with data analysis tools to enhance the process of verifying research, by facilitating data inspection and by running the code that was used by the authors for data analysis. Integration with services that allow running virtual machines remotely can help reviewers easily try out submitted code and manipulate data without having to go through the process of replicating an execution environment, or even having to run a virtual machine on their own computer systems.
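A minimal sketch of that kind of integration, assuming a container runtime is available to the review system, is given below. The image name, paths, and script name are placeholders, and a production system would add resource limits, network isolation, and provenance logging.

```python
# Illustrative sketch: run a submission's analysis script in a disposable
# container so reviewers can inspect its output without rebuilding the
# authors' environment. Image, paths, and script name are placeholders.
import subprocess

def run_submitted_analysis(submission_dir, image="python:3.10",
                           script="analysis.py"):
    result = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{submission_dir}:/submission:ro",  # mount read-only
         "-w", "/submission",
         image, "python", script],
        capture_output=True, text=True, timeout=600)
    return result.returncode, result.stdout, result.stderr

# Example (commented out; requires Docker):
# code, out, err = run_submitted_analysis("/reviews/submission_42")
```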
2.4.13 General: Overall satisfaction with peer review

Several of the participants praised peer review for helping authors to improve their work, and for helping them perfect the representation of their research that will be frozen in time as a publication. Digital publishing challenges the assumption that once something is published it is “out there” and cannot be changed. In the age of digital publishing, more dynamic publishing processes are possible: papers can be expanded and revised over time, and version control systems can be used to keep track of every change and the reasons behind it. Many theories have been disproven over the years, and the limitations of some published techniques have become clear, but inaccurate statements all too often remain untouched in the published literature because academic publishing has not fully taken advantage of the malleability of digital documents. Even old-style documents can be augmented with links to future findings that question, confirm, or disconfirm their results. It is difficult to assess the correctness of cutting-edge research, but decades later, subsequent research clarifies whether or not that research was correct or useful (De Millo, Lipton, & Perlis, 1979). We recommend viewing the peer-review process as a first step in a continual process of assessment. From this perspective, perhaps the peer-review process, instead of dismissing a large portion of research due to errors or imperfections, should focus on guiding future researchers in taking advantage of that research.

Table 2.5 summarizes the design recommendations discussed above.

Recommendation: Eliciting self-evaluations of expertise at a finer granularity at the time of recruitment, and forming a team of reviewers incrementally to complement the expertise of the team.
Related concerns or root causes:
- Concern: Reviewers’ misunderstandings caused by a lack of expertise or careless reading
- Root cause: Difficulty of finding appropriate reviewers

Recommendation: Automatically reminding editors to provide feedback to reviewers, and facilitating the process of feedback provision, rather than automatically sending “thank you” notes.
Related concerns or root causes:
- Root cause: Insufficient feedback to reviewers and lack of recognition of reviewing

Recommendation: Using transparency to force reviewers to be aware of their biases and to manage them.
Related concerns or root causes:
- Concern: Bias for or against certain authors by some reviewers
- Root cause: Difficulty of preserving author anonymity
- Root cause: Anonymous reviews

Recommendation: Formal support for appeal processes.
Related concerns or root causes:
- Concern: Lack of opportunity to voice concerns
- Root cause: Power imbalance and lack of oversight of reviewers

Recommendation: Using automated politeness indicators to notify editors of potential problems.
Related concerns or root causes:
- Concern: Harsh or impolite treatment of authors
- Root cause: High workload of reviewers and editors

Recommendation: Providing authors with a more detailed view of the status of the review process (e.g., whether or not each reviewer has provided feedback, and what the schedule looks like).
Related concerns or root causes:
- Concern: Long delays in processing submissions

Recommendation: Experimenting with short deadlines for reviewers and assessing the effect on decline rate and review quality.
Related concerns or root causes:
- Concern: Long delays in processing submissions

Recommendation: Minimizing the reviewing timeline by determining preferred submission times based on the availability of reviewers.
Related concerns or root causes:
- Concern: Long delays in processing submissions

Recommendation: Embedding information about policies, expectations, and norms within the corresponding user interfaces of peer-review support systems.
Related concerns or root causes:
- Concern: Breach of confidence and conflict of interest
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria
- Concern: Lack of specific or appropriate policies and expectations

Recommendation: Analyzing the rich data that is available about on-line relationships to alert editors and reviewers themselves to the possibility of newer types of conflict.
Related concerns or root causes:
- Concern: Breach of confidence and conflict of interest

Recommendation: Designing better opinion measurement interfaces to help reviewers maintain their consistency over time.
Related concerns or root causes:
- Concern: Inconsistency of decisions

Recommendation: Enabling deeper online discussion between reviewers and the editor to elucidate how a decision is made, and providing a footprint of the process so that authors can understand the basis of decisions.
Related concerns or root causes:
- Concern: Inconsistency of decisions

Recommendation: Focusing on identifying papers’ deficiencies and attaching descriptive and evaluative tags as meta-data to papers, so that readers and information seekers can determine what aspects of the paper were approved by the peer reviewers (instead of eliminating submissions based on deficiencies).
Related concerns or root causes:
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria
- Concern: Inconsistency of decisions

Recommendation: Moving toward more structured research articles and reviews.
Related concerns or root causes:
- Concern: Lack of specific or appropriate policies and expectations

Recommendation: Increasing awareness of how authors and reviewers behave within the system, by increasing the visibility of their behavior (social translucence).
Related concerns or root causes:
- Concern: Lack of specific or appropriate policies and expectations

Recommendation: Formally linking papers in the literature that are closely related, so that readers know which papers are extended versions of previously published papers.
Related concerns or root causes:
- Concern: Lack of specific or appropriate policies and expectations
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria

Recommendation: Incorporating well-defined flexibility in policies to ensure that special cases are not ignored, and that the flexibility is not abused.
Related concerns or root causes:
- Concern: Lack of specific or appropriate policies and expectations
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria

Recommendation: Supporting articulation work within the peer-review support system.
Related concerns or root causes:
- Concern: Lack of specific or appropriate policies and expectations
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria

Recommendation: Augmenting pre-publication peer review with post-publication peer review, where over time various researchers can express their opinions about papers.
Related concerns or root causes:
- Concern: Subjective, ambiguous, or inappropriate evaluation criteria
- Concern: Inconsistency of decisions

Recommendation: Integrating peer-review support systems with data analysis tools to enhance the process of verification of research by facilitating data inspection, and with services that allow running virtual machines remotely to facilitate running submitted code and manipulating data.
Related concerns or root causes:
- Concern: Judgment based on incomplete or inaccurate representations of research

Table 2.5 Recommendations for the design of peer-review processes and the systems that support them, and the concerns and root causes addressed by them

2.5 Conclusion

Our interviews shed light on a variety of concerns about fairness and the quality of peer review. The key difference between this research and previous studies of the fairness of peer review was the broad range of disciplines that we included in our study, which allowed us to look at peer review in both journals and conferences in a variety of disciplines in both the natural and social sciences. We found some differences between concerns related to conferences and those related to journals, which also reflect differences across disciplines in the types of publication venues they value. Some of the concerns discussed by the participants were related to specific parameters of the design of the peer-review process that can be manipulated, such as the anonymity of authors and reviewers. Other concerns were related to incorporating appropriate procedures, for example procedures for appealing decisions, or procedures for monitoring reviewers to assess the quality of reviewing. Yet other concerns were about specifying and communicating policies and evaluation criteria to the research community. Finally, some concerns raised questions about the competitive nature of research communities and the existence of peer pressure for publishing in highly selective venues. This broad range of concerns points to several avenues for future research, some of which we have explored in more depth through studies that are described in the chapters that follow in this dissertation.

Chapter 3: Understanding and Supporting Anonymity Policies in Peer Review

In the course of examining the peer-review process, we heard in town hall meetings, and read in published opinion pieces, several discussions that proposed “new” ways of conducting the peer-review process that involved changes in the way that information about the identities of the participants (authors, reviewers, and even associate editors or program committee members) is disclosed. We decided that we needed to develop an understanding of the current landscape of how peer-review processes deal with anonymity, to help communities get beyond suggesting “new” processes that have already been adopted by other research communities. We thought about whether this should be framed as a review of the literature or as a formal study of policies. After looking at previous research on analyzing policies with the purpose of informing technology development (such as in Butler et al. (2008), Jackson et al. (2014), and Kriplean et al. (2008)), we decided that adopting a more formal approach would lead to a more comprehensive and reliable view of the diversity of anonymity policies in peer review.
Our contribution in this chapter is the picture, or preliminary taxonomy, that we created of this diversity and of the nuanced differences that exist within it, not just the fact that the diversity exists. More specifically:

1) We catalog what processes already exist, and thus what can be studied for “lessons learned”, rather than proposing the same solutions yet again as if they had never been tried.

2) We identify processes that have not yet been tried, and thus perhaps need to be experimented with in an appropriate sub-community where those processes might address a problem or issue the sub-community is known to be concerned about.

3) We make suggestions about how to design better computer support for systems that are likely to have to deal in the future with two trends we observed: more nuanced varieties of anonymity and much more demand for transparency.

3.1 Background

The design of peer-review support systems is shaped by the policies that define and govern the peer-review process. An important component of these is the set of policies that deal with anonymity: the policies that govern the concealment and transparency of information related to the identities of the various stakeholders (authors, reviewers, editors, and others) involved in the peer-review process. Anonymity policies have been a subject of debate for several decades within scholarly communities. Because of widespread criticism of traditional peer-review processes (Benos et al., 2007; Campanario, 1998; Smith, 2006), a variety of new peer-review processes have emerged that manage the trade-offs between disclosure and concealment of identities in different ways.

As one of the initial steps in this dissertation, we set out to map the current landscape of peer-review processes in terms of differences in how disclosure and concealment of identities is managed. In doing so, our primary goal was to understand the breadth of practice as it relates to anonymity. This chapter presents our findings, which form the basis for some of the discussions in subsequent chapters; it was itself guided in part by some of the findings in the previous chapter, where many issues of fairness have a component that relates to how anonymity is handled.

Designing a peer-review process and a system to support it is an example of designing for adversarial collaboration, where stakeholders have conflicting goals. Each group of stakeholders would like to promote its own interests, and in doing so stakeholders strategically disclose information to, or conceal it from, other stakeholders (Cohen, Cash, & Muller, 2000). Often, publication venues design their peer-review process and the policies around it to better serve the goals of the system as a whole, while still being considerate of the different interests of the parties involved. Policy decisions can be affected by the power structure within a research community (W. Lipworth & Kerridge, 2011) and by the various goals that the publication venue tries to achieve, including earning a reputation for quality by attracting the best articles and the best reviewers. Usually the policies include rules for whether the identities of each type of stakeholder will be known to other stakeholders.

A traditional definition of anonymity is “the condition of not being identifiable to the other person” (Derlega & Chaikin, 1977, p. 109). However, absolute anonymity is rare in peer review, where it is common for members of research communities to be familiar with each other’s areas of research and approaches to research; hence, both reviewers and authors might be able to guess each other’s identities even when those identities are concealed. Anonymity is thus a relative term, and might best be defined probabilistically, as the complement of the probability that an outsider will correctly identify the individual of interest (Reiter & Rubin, 1999).
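One way to write this down, using our own notation rather than a formula taken from Reiter and Rubin, is:

```latex
% Degree of anonymity of a participant i: Pr[guess = i] is the probability
% that an observer's best guess of the concealed identity is in fact i.
% A(i) = 1 is perfect anonymity; A(i) = 0 is certain identification.
\[
  A(i) \;=\; 1 - \Pr[\,\text{observer's best guess} = i\,]
\]
```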
Studying anonymity policies is one lens through which we can understand the variety of peer-review processes in order to guide the design of future peer-review support systems. Identity management is one of the most important aspects of a peer-review process; indeed, the anonymity policies adopted for a peer-review process are commonly used to label that process. For example, processes may be referred to as blind (the identities of reviewers are not known to authors) or double blind (neither reviewers nor authors know each other’s identity), depending on the type of anonymity policies implemented in the process.

We conducted a study of anonymity policies within peer-review processes. After an initial review of the anonymity policies of 112 publication venues (including both journals and competitive conferences), we chose 25 for detailed analysis. We then used thematic analysis to gain a better understanding of the anonymity policies by identifying key themes. We present a preliminary taxonomy of anonymity policies in peer-review processes, and discuss the appropriate role for information technology and computer support in managing the concealment and disclosure of identities during peer review.

3.2 Related work

We first summarize the findings of previous studies of anonymity in peer review, and then briefly review studies of support for anonymity in online computer-mediated communication systems used in contexts other than peer review.

3.2.1 Anonymity in peer review

Numerous scholars have argued for or against concealing the identity of reviewers or authors, and the discussions continue unabated even today. It is beyond the scope of this work to offer a comprehensive review of the wide range of opinions and arguments. Instead, we review selected studies of anonymity management in the context of the peer review of manuscripts. Shatz (2004) offers a comprehensive analysis of arguments for and against the anonymity of authors and reviewers for those who are interested in why the various management strategies might be used.

3.2.1.1 Masking reviewers’ identity

There is a fundamental underlying belief that transparency promotes accountability, while anonymity encourages frankness. Both accountability and frankness of reviewers are desirable. When there is a lack of transparency, abuse of editorial power is difficult to detect. When such abuse or hints of it do occur, it is rarely publicized because of a fear of legal action (Altman, Chalmers, & Herxheimer, 1994; McCarty, 2002). If an ombudsman or other adjudicator intervenes, a significant percentage of appeals lead to the reversal of editorial decisions (Campanario, 1998; Simon, Bakanic, & McPhail, 1986).
A survey of academic psychologists revealed that many have reported encountering as authors “strictly subjective preferences of the reviewers (76%), false criticisms (73%), inferior expertise (67%), concentration on trivia (60%), treatment by referees as inferior (43%), and careless reading by referees (40%)” (Bradley, 1981). Walsh et al. (2000) found that signed reviews were more polite, of higher quality, and more likely to recommend acceptance, whereas van Rooyen et al. (1998) did not find any effect of signing reviews on their quality. Van Rooyen et al. also found that unmasking reviewers increased the chance of reviewers declining review requests (23% vs. 35%) and that most authors preferred knowing the identity of reviewers (55% in favor vs. 26% against). In a survey, Melero and López-Santoveña (2001) found that 75% of reviewers were in favor of masking reviewers and 17% were against it.

Revealing the identity of reviewers might reduce some of the possible misbehaviors over the course of the peer-review process, such as editors requesting reviews from competing researchers or from researchers with conflicting stances, or delaying the publication of research that competes with a reviewer’s research (Campanario, 1998). However, these problems are probably inevitable in narrow research areas where only a few researchers are qualified to serve as reviewers (Riggs, 1995). According to Weicher’s (2008) review of the literature, anonymity of reviewers encourages candor, honesty, and impartiality, and promotes freedom of expression; however, it can also protect reviewers from the consequences of their actions, and hence it intensifies concerns about reviewer bias, negligence, favoritism, self-interest, enforcement of disciplinary orthodoxy, advancement of political agendas, and hidden conflicts of interest.

3.2.1.2 Masking authors’ identity

Anonymity of authors may reduce reviewers’ biases, whereas revealing authors’ names may increase accountability for what authors submit, and may help reviewers communicate more effectively based on a more accurate knowledge of authors’ experience and background. Several studies have investigated the effects of the anonymity of authors on the peer-review process.

Most previous studies found no effect of masking the identity of authors on review quality or on time spent writing reviews (Isenberg, Sanchez, & Zafran, 2009; Justice, Cho, Winker, Berlin, & Rennie, 1998; van Rooyen, Godlee, Evans, Black, & Smith, 1999; van Rooyen et al., 1998). One exception was a study by McNutt et al. (McNutt, Evans, Fletcher, & Fletcher, 1990), which found that blind reviews (those in which reviewers do not know the identity of the authors) were of higher quality. However, several studies found that masking authors’ identity affected recommendation scores (Godlee, Gale, & Martyn, 1998; Isenberg et al., 2009). In a retrospective study of reviews of anonymized manuscripts, Isenberg et al. (2009) found that reviewers gave lower recommendation scores when they had no idea who the authors were than when they suspected or knew the identities of the authors. Similarly, in a controlled study, Blank (1991) found that anonymized papers received lower scores and were treated more critically. In contrast, in another controlled study, Godlee et al. (1998) found that anonymized papers were less likely to be rejected.
Several studies found that when the identity of authors was not masked, reviewers favored authors from English-speaking countries and from prestigious institutions (Blank, 1991; Peters & Ceci, 1982; J. S. Ross et al., 2006). While such biases are easy to operationalize and study, the potential effects of other types of bias, such as biases in favor of positive results (Mahoney, 1977), of results that match the reviewers’ stance (Abramowitz et al., 1975), or of results that corroborate previous work (Ernst & Resch, 1994), are largely unknown.

Requiring anonymity does not guarantee anonymity. In a study of submissions to two radiology journals, editors found that 34% of the submissions contained information that could help reveal the identity of the authors, such as authors’ initials, references to work “in press,” and references identified within the text as the authors’ previous work (Katz, Proto, & Olmsted, 2002). In 74% of those cases the editors were able to successfully identify the authors. In another study, editors were able to identify the authors of over 45% of anonymous submissions, while wrongly believing they had identified another 5% (Blank, 1991). Mulligan et al. (2012) found that despite the difficulties in implementing double-blind peer review, 76% of researchers consider it to be effective.

3.2.1.3 Public disclosure of reviews

Public disclosure of reviews has moved into the spotlight over the last decade. Van Rooyen et al. (2010) found that telling reviewers that their signed reviews might be published (on the web) did not affect the quality of reviews, but it increased the time spent on composing reviews. Bornmann and Daniel (2010) analyzed public reviews submitted to the journal Atmospheric Chemistry and Physics and found that the level of inter-reviewer reliability was low, comparable to that of traditional peer-review processes. Bingham et al. (1998) found that post-publication reviews by online readers can provide valuable feedback, but those reviews are often short and specific, and thus do not adequately replace full editorial peer review. The journal Nature found similar results and received little interest in this type of review from either authors or reviewers; only 5% of papers were made available by their authors for open review (sometimes for fear of being scooped or of compromising patent applications), only 54% of those received any comments, and most of the comments were not of editorial value (“Nature’s trial of open peer review,” 2006).

The numerous studies of anonymity in peer review look at perceptions of various anonymity settings and individuals’ behavior in them. To our knowledge, our study is the first to provide a comprehensive overview of the anonymity policies used by various publication venues across disciplines.

3.2.2 Anonymity in online communities

Numerous studies have looked at the effects of anonymity in online communities and computer-mediated interactions. A complete review of those studies is beyond the scope of this chapter; in this section we offer only a brief overview. In a study of an online community of soldiers, Kilner and Hoadley (2005) found that removing anonymity options decreased antisocial comments, but that the community was divided in its attitudes toward offering anonymity options. Several other studies have similarly found that anonymity can facilitate violation of social norms (Millen & Patterson, 2003; Postmes, Spears, & Lea, 1998; Siegel, Dubrovsky, Kiesler, & McGuire, 1986; Vrooman, 2002).
Diakopoulos and Naaman (2011) studied the quality of comments on online news and found that editors felt anonymity of comments was an important cause of low-quality comments; however, about 40% of commenters reported that they would not have commented if their real names were associated with the comments. Other studies found that "flaming" (expressing inflammatory opinions to others) is a more severe challenge in anonymous interactions (Siegel et al., 1986; Vrooman, 2002). Similarly, antisocial behavior in online games has been found to be associated with anonymity (Chen, Duh, & Ng, 2009). On the other hand, anonymity has been found to encourage self-disclosure and honesty (Bargh, McKenna, & Fitzsimons, 2002). Kang et al. (2013) interviewed users who had experienced anonymity online and identified several pros and cons of anonymity. Among the benefits to individuals of anonymity found in the study were avoiding embarrassment, feeling comfortable, honesty, having control over one's personal image, and freedom. In contrast, some of the benefits of being identified were reputation building, stronger connections, feeling real, and avoiding irresponsible behavior. Based on previous research, Stuart et al. (2012) proposed a framework that characterizes the effects of transparency of identities, contents, and interactions. They concluded that requiring users to disclose their identities leads to higher information accuracy (due to increased accountability and desire for building reputation) and lower creativity (due to increased conformity).

Anonymity policies are adopted to reconcile the various concerns of different parties. These concerns are context-specific; we therefore decided to conduct a study of anonymity policies in the context of peer review to shed light on how various communities balance concerns through such policies.

3.3 Methods

We used thematic analysis (Braun & Clarke, 2006) to study anonymity policies in the peer-review processes of various publication venues. In this section we describe the details of our data collection and data analysis methods. Results are presented in the next section. Some of the data collection and data analysis methods might be unfamiliar to a computer science audience, so the methods are described briefly, and references for further reading are included.

3.3.1 Data collection

We were interested in understanding the range of anonymity policies used in peer review, so we used a purposive sampling method, Maximum Variation sampling (Patton, 1990, p. 169; Teddlie & Yu, 2007). We first briefly reviewed 120 different journals and conferences across a number of fields and selected the 25 of them that we felt offered the most diversity for further analysis. The initial list of 120 publication venues was assembled from venues mentioned in the literature for their novel peer-review processes, from an online search for alternative peer-review policies, and from lists of the top five journals based on impact factors in various disciplines spanning the natural, formal, and social sciences, including medicine, biology, math, physics, chemistry, psychology, neuroscience, cognitive science, business, economics, law, history, and sociology. Appendix B has the complete list. Because conferences are often primary publication venues in computer science, we also included the top 20 conferences in computer science based on Microsoft Academic Search Field Ratings.
We used three key characteristics for constructing our sample for detailed analysis: use of unconventional anonymity policies (policies other than single blind or double blind), use of unconventional ways of implementing a policy, and the presence of a detailed justification or description of the anonymity policies. We used a combination of these criteria to choose a diverse sample that includes deviant cases. We decided that saturation had been reached after choosing 25 venues. Our corpus included guidelines and instructions by editors or program committee chairs that were publicly available on the Internet. Table 3.1 shows the list of the publication venues we selected.

- Atmospheric Chemistry and Physics (ACP)
- alt.chi, a submission track in the ACM SIGCHI Conference on Human Factors in Computing Systems (alt.chi)
- Behavioral and Brain Sciences (BBS)
- Biology Direct (Bio Direct)
- British Medical Journal (BMJ)
- British Medical Journal Open (BMJ Open)
- Bulletin of the Seismological Society of America (BSSA)
- Computer Communication Review (CCR)
- Cochrane Reviews (Cochrane)
- European Molecular Biology Organization (EMBO) Journal (EMBO)
- Faculty of 1000 (F1000)
- Frontiers (Frontiers)
- GigaScience (GigaScience)
- Harvard Law Review (Harvard LR)
- International Conference on Computer Vision (ICCV)
- International Conference on Software Engineering (ICSE)
- International Joint Conference on Artificial Intelligence (IJCAI)
- IEEE Information Visualization (InfoVis)
- Leonardo (Leonardo)
- Neuron (Neuron)
- Neural Information Processing Systems (NIPS)
- PeerJ (PeerJ)
- PLOS Biology (PLOS Bio)
- PLOS Medicine (PLOS Med)
- Stanford Law Review (Stanford LR)

Table 3.1 Publication venues selected for thematic analysis of anonymity policies (short names in parentheses)

3.3.2 Data analysis

The thematic analysis approach described by Braun and Clarke (2006) was the foundation for our analysis, which aimed to provide a detailed account of the anonymity of authors and reviewers. Section 2.2.2 in Chapter 2 provides a summary of Braun and Clarke's approach. The goal of our analysis was to develop our understanding of the ongoing evolution of the anonymity policies deployed in peer review. We identified themes using an inductive, "bottom up" method, primarily focusing on extracting semantic (as opposed to latent) themes from the corpus. We selected publication venues with the aim of capturing a diversity of practices, not of providing a balanced representation of the space of journals and conferences. For this reason, counts of how often a given policy is deployed are not informative, and thus no quantitative data are reported.

3.4 Findings

The main themes identified in the corpus of anonymity policies correspond to various aspects of the behavior or identity of individuals that can be concealed or disclosed. The themes shared two primary dimensions: the timing of the disclosure/concealment and the level of enforcement of the disclosure/concealment. These dimensions are described after the overview of the aspects of identity that we uncovered in our study.

3.4.1 Names

Almost all peer-review processes have policies that address the way they handle the names of authors and reviewers during the process. ACP, BBS, and Frontiers use multi-phase peer-review processes, where the first phase follows traditional blind peer reviewing and the second phase uses a more open discussion forum. ACP allows both anonymous and named comments by reviewers. BBS's post-publication commentaries include the names of reviewers and are treated as publications.
Frontiers allows close collaboration between authors and reviewers in a discussion forum for reaching a decision on a manuscript; however, it does not disclose the identity of reviewers until after acceptance of the manuscript. Several venues disclose the identity of reviewers to authors (e.g., BMJ, BMJ Open, Bio Direct, alt.chi, F1000), and sometimes to the general public after the completion of the peer-review process (e.g., BMJ Open, F1000, alt.chi, Bio Direct, Frontiers).

Some venues had distinct policies about the identities of editors and program committee members. For example, the PLOS Bio policy states that "Academic Editors retain anonymity unless a paper is accepted for publication. The name of the Academic Editor is noted on each published paper." Cochrane's policy is more complex, involving both identified and anonymous editors: "Two editors, one with primary responsibility for supporting the authors and one anonymous, provide ongoing support and peer review." Most double-blind venues such as NIPS emphasize anonymity of authors to reviewers but not to program committee members. The NIPS policy states that "area chairs do know the author identities, to avoid accidental conflicts of interest and to help determine novelty." Similarly, the InfoVis policy states that "When submitting your paper you will be asked to provide a complete list of authors even when submitting an anonymized version of the manuscript. This is required to avoid potential conflicts of interest when assigning reviewers." However, IJCAI and ICCV are two conferences that conceal the identity of authors even from area chairs (the equivalent of associate editors in journals).

As mentioned in our review of the literature, the effectiveness of concealing authors' identity is limited. This problem has been exacerbated by the traces that people leave behind on the Internet. For example, according to ICCV, publication of technical reports on arXiv.org or in institutional repositories is not a violation of anonymity. To alleviate this problem, ICCV and NIPS discourage or prohibit reviewers from searching the Internet to discover the identity of authors.

Concealing the identities of program committee members from the authors of the papers they manage can be challenging if the authors (or those who are in conflict) are committee members as well, because committee members often attend face-to-face program committee meetings in person, where they might sit next to each other. Sometimes program committee chairs (e.g., at ICSE) assign seats so that no program committee member sits next to someone who is in conflict with a paper that the member manages. This can be done manually or with the help of constraint-satisfaction software.
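To make the seating constraint concrete, the following minimal sketch (in Python; the member names and conflict pairs are invented for illustration) expresses it as a constraint-satisfaction problem and solves it by brute-force search over permutations. A real committee-management tool would use a proper constraint solver with heuristics, but the constraint itself is this simple.

```python
from itertools import permutations

# Hypothetical committee members and conflict pairs: a pair (a, b) means that
# b is in conflict with a paper managed by a, so a and b should not be seated
# next to each other while that paper is discussed.
members = ["Ana", "Ben", "Chen", "Dara", "Eli"]
conflicts = {("Ana", "Chen"), ("Ben", "Eli"), ("Chen", "Dara")}

def conflicted(a, b):
    return (a, b) in conflicts or (b, a) in conflicts

def valid(seating):
    # The constraint: no two adjacent seats hold a conflicting pair.
    return all(not conflicted(x, y) for x, y in zip(seating, seating[1:]))

def assign_seats(members):
    # Exhaustive search; fine for a sketch, infeasible for a 40-person
    # committee, where an off-the-shelf CSP or SAT solver would be used.
    for seating in permutations(members):
        if valid(seating):
            return seating
    return None  # no conflict-free arrangement exists

print(assign_seats(members))
```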
Some publication venues use online program committee discussion boards to replace or complement physical program committee meetings. In our sample, ICSE was a conference that had recently changed from a single-tier reviewing process (a committee but no use of external reviewers) to a two-tier model (a higher-level committee that shepherds the process and makes the final decisions, and external reviewers who each review one or more papers). An online discussion board is used to reach consensus among the reviewers of each paper, and a face-to-face physical meeting of the higher-level committee finalizes decisions. The online discussions were not anonymous, just as the face-to-face committee discussions in the earlier single-tier process had not been. The program co-chairs of the ICSE 2014 conference, in their summary of experiences with the online discussion board, mentioned that "Some 'power games' took place among reviewers and we should consider keeping reviewers anonymous." In this case, the move away from a single-tier process that involved a physical program committee meeting to a two-tier process in which initial decisions were made online, without face-to-face physical meetings, introduced behavior that worried the program co-chairs. The program co-chairs suggested that making the online discussion board anonymous might inhibit the worrisome behavior. This type of anonymity is not an option in physical committee meetings, but it might not be necessary there because of social norms in face-to-face meetings that might not be present in online discussion boards.

None of the venues that we analyzed allowed anonymity of authors after publication, except for Harvard LR, which imposed anonymity on its student contributors because "many members of [Harvard LR], besides the author, make a contribution to each published piece."

Some venues have policies that require disclosure of biographical information or resumes of authors. For example, Stanford LR provides a field in its submission form for uploading authors' resumes.

3.4.2 Relationships and conflict of interest

Most publication venues have policies governing disclosure of relationships between authors and reviewers, or between authors and their research. The alt.chi peer-review system asks reviewers to sign an agreement that includes "As reviewer, I agree to provide full information on my relationship to the authors of each submission I review, as well as any other information that may influence my objectivity." For 2007, alt.chi disclosed the competing interests of reviewers publicly.
GigaScience also asks for disclosure of “any non-financial competing interests that may cause them embarrassment were they to become public after the publication of the manuscript.” As mentioned in the examples of NIPS and InfoVis in the previous section, disclosing identities of authors to editors or program committee members for processes in which authors are anonymous is argued to be necessary for the purpose of identifying conflict-of-interest issues. 120  Use of processes where reviewers’ names and their relationships with authors are publicly visible (e.g. alt.chi 2007) could be more effective, because it allows the research community to verify that full disclosure taken place and to read the reviews and the papers in light of the disclosure(s). 3.4.3 Contributions and roles Another aspect of identity is the role of the people and organizations involved in a research project. The F1000 policy state that “individual contributions of each author to the manuscript should be detailed.” PLOS Bio, F1000, and GigaScience ask for disclosure of identities of all those who contributed to a submission. For example, the PLOS Bio policy states that “People who contributed to the work, but do not fit the criteria for authors should be listed in the Acknowledgments, along with their contributions. You must also ensure that anyone named in the acknowledgments agrees to being so named.” Interestingly, the two parts of the policy (inclusiveness and consent) could be in conflict; however PLOS Bio appears to put more emphasis on consent rather than inclusiveness (“must” vs “should”). In contrast, NIPS asks for concealment of any acknowledgments until a paper is accepted for publication to preserve the anonymity of authors. In the process conducted by PLOS Med and F1000 “All authors will be contacted via e-mail at submission to ensure that they are aware of and approve the submission of the manuscript, its content, authorship, and order of authorship.” Regarding the role of research sponsors, BMJ asks for disclosure of “the role of the study sponsor(s) or funder(s), if any, in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.” In 121  addition to asking for acknowledgment of the funding agency that has supported the research under review, F1000 asks authors to declare which specific grant has supported the work and who is responsible for the grant. 3.4.4 Reviews, decisions, and other communications What happens during the peer-review process is sometimes disclosed to authors or to the general public. Sometimes only aggregate information is disclosed. For example, the preface to the proceedings of IJCAI 2011 mentions that they “received 1,325 papers in total, and accepted 400 papers (30.2 percent), 227 for oral and poster presentation (17.1 percent) and 173 for just poster presentation (13.1 percent)... Each paper received at least three reviews from the Program Committee. Where issues remained, additional reviews were commissioned. One paper (which was eventually accepted) received a record of six reviews.” In contrast, sometimes the process that each individual submission has gone through is disclosed. PeerJ, EMBO, BMJ Open, ACP, and alt.chi all provide a complete record of communications during the peer-review process. 
For example, BMJ Open publishes "all previous versions of the manuscript [and] the reviewers' comments and authors' replies to those comments." BMJ asks authors to provide copies of previous reviews if a manuscript was initially submitted to another journal. Publishers such as Cell Press, EMBO, and BioMed Central allow authors to transfer the reviews they receive from one of their journals to another of their journals. Neuron discloses reviews to the other reviewers of a manuscript; alt.chi, BMJ Open, EMBO, and ACP disclose the reviews to the general public. The NIPS conference publishes anonymous reviews. CCR, instead of disclosing the original reviews, publishes a "public review" that points out "the contributions and interesting aspects of the paper, mentioning perceived shortcomings." BMJ occasionally publishes commentaries to "help readers interpret the research or place it in context." In addition, BMJ publishes rapid responses to publications and then archives them on the BMJ website. BBS invites commentaries and, if accepted, treats them as archival publications. The journal Frontiers also accepts commentaries and reviews them before publication. Some journals, such as BMJ Open, PLOS Med, PLOS Bio, and PeerJ, allow readers to post informal comments on publications. In contrast, Stanford LR "will never reveal their review to the authors," to ensure the anonymity of reviewers.

Confidentiality of communications during peer review is not limited to the reviews and the contents of the paper. For example, the PLOS Bio policy states that "we regularly confer with potential reviewers before sending them manuscripts to review. […] even these initial messages or conversations contain confidential information which should be regarded as such." Even though BMJ discloses the identities of authors and reviewers to each other, its policy nevertheless states that "all queries should still be directed through the editorial office," perhaps to ensure that all communications are disclosed to the editorial office. Neuron mentions that some communications between reviewers and editors can be concealed from authors: "If some specific aspects of the report seem inappropriate for presentation to the authors, they can be sent as comments for the editors' eyes only." In contrast, the EMBO policy states that "To further facilitate transparency, The EMBO Journal has removed the 'Confidential Comments' field from our referee reporting forms," but EMBO still allows confidential emails to the editor to communicate concerns about ethical violations.

3.4.5 Reviewers' performance

As discussed in the previous section, venues such as PeerJ, EMBO, BMJ Open, alt.chi, and ACP publish reviews with the reviewers' names. These records could be considered one form of record of a reviewer's performance; however, the full content of a review is not required for recording an assessment of a reviewer's performance. In a guide for program committee chairs, Alan Bundy, program chair of IJCAI 1983, mentioned the recording of information about late reviews by IJCAI chairs (Bundy, 1983), well before the advent of the electronic peer-review support systems that have significantly facilitated such record keeping. In an editorial note, four editors of computer science journals warn about the consequences of more elaborate record keeping:
"Electronic manuscript systems easily provide time data for reviewers and some offer rating scales and note fields for editors to evaluate review quality. Many of us (editors) are beginning to use these capabilities and, over time, we will be able to have systematic and persistent reviewer quality data. Graduate students, faculty, chairs, and deans should be aware that these data are held" (Marchionini et al., 2007).

While basic review assessment mechanisms are offered by some peer-review support systems (see Figure 3.1), BSSA was the only venue that we found encouraging the use of those features, stating that rating reviewers "is an important step that will help us identify reviewers who regularly do a poor job so they can be avoided in the future." Surprisingly, BSSA explains how to rate reviewers in its instructions for associate editors, but its instructions for reviewers do not mention that reviewers are rated.

Figure 3.1 Interfaces for eliciting reviewer performance: 1. ScholarOne, 2. Editorial Manager

Another form of disclosure of reviewers' performance is the awards given to the best reviewers ("Outstanding referees program," 2013). For example, NIPS gives "up to 100 Reviewer Awards to reward high quality reviews." While many publication venues focus on the positive side of the spectrum to avoid potential defamation claims, one could argue that being in a community that offers such awards and not winning any after years of participation could itself signal poor performance.

3.4.6 Properties of disclosure/concealment

Each disclosure/concealment policy identifies the information that is disclosed or concealed, as well as the individuals or groups to or from whom the information will be disclosed or concealed. We tabulated these to gain insight into the range of policies that exist. In addition, we identified two properties of those policies: the level of enforcement of the policy and the temporality of the disclosure/concealment.

3.4.6.1 Level of enforcement of policy

Not every disclosure/concealment request must be fulfilled. Policies ranged from being agnostic about disclosure/concealment, through various shades of encouragement, to coercion and threats of punishment. Even if a venue is agnostic about the concealment of a piece of information, it might state that explicitly to avoid confusion. For example, InfoVis is agnostic about concealing the identity of authors: "The choice of complete anonymity is optional. Authors can reveal their names and affiliations in the first round of the review cycle if they choose not to anonymize their work." BSSA offers similar instructions regarding the anonymity of reviewers, stating that "If you choose to waive anonymity, please include your name in the 'Reviewer Blind Comments to the Author'." Regardless of reviewers' choice of anonymity, BSSA removes identifying information from documents submitted by reviewers to avoid unintended disclosure of their identities: "The online system removes identifying information from Microsoft Office files including Word, Excel, and PowerPoint files as well as Adobe PDF files." Stanford LR provides a field in its submission form for uploading authors' resumes, but mentions that "Resumes and biographical information are not required," without expressing any preference.

Often publication venues take a stance regarding the disclosure/concealment of information. An example of encouragement to adopt a practice is how PeerJ approaches concealment of the identity of reviewers: "Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review." In addition, PeerJ mentions the benefit of adhering to this practice:
"If they agree to provide their name, then their personal profile page will reflect a public acknowledgement that they performed a review (even if the article is rejected)." NIPS encourages reviewers not to try to discover the authors' identity: "by searching the Internet, a reviewer may discover (or think he/she may have discovered) the identity of an author. We encourage you not to actively attempt to discover the identities of the authors." In contrast, ICCV explicitly mandates against the behavior: "Reviewers must not seek the identity of the authors; authors must not bias the review process by suggesting their identities."

Another way venues communicate their preference is by setting a default practice while offering an option to opt out. For example, the GigaScience policy states that "As a default, we will pass a reviewer's name on to the authors along with the comments. However, if reviewers do not wish to have their name revealed, we will honor that request." The default treatment of reviewers' identity is reversed at PLOS Bio and Leonardo. For example, the PLOS Bio policy states that "Unless reviewers have explicitly requested to be made known, we do not release their names either to authors or to other reviewers of the manuscript." Sometimes venues ask only for best effort. For example, ICCV asks reviewers to "make all efforts to keep their identity invisible to the authors. Don't say, 'you should have cited my paper from 2006!'"

Some practices are required and are enforced by venues. The BMJ policy states that "reviewers have to sign their reports, saying briefly who they are and where they work." Some venues go further and threaten punishments, especially when it is difficult to validate whether a policy is fully adhered to. For example, the PLOS Bio policy states that "Failure to declare competing interests at submission […] may result in immediate rejection of the paper. If a competing interest comes to light after publication PLOS journal will issue a formal correction or retraction of the whole paper, as appropriate." NIPS warns that it refuses to review papers that do not properly conceal the identity of authors.

Preserving the anonymity of one party can require the cooperation of multiple parties. For example, enforcing the anonymity of authors often requires the cooperation of reviewers as well. NIPS writes, "We encourage [reviewers] not to actively attempt to discover the identities of the authors […] if you believe that you have discovered the identity of the author, we ask that you explain how and why…"

3.4.6.2 Temporality of concealment/disclosure

Another important parameter that defines a policy is its dynamic, or temporal, aspect. Anonymity can change at various stages of a process. For example, in processes that mask the identity of authors, if a paper is accepted, the reviewers (in fact, everyone) will eventually know the identity of the authors. At PLOS Bio, the concealment of the identity of editors (but not of reviewers) is not permanent: "Academic Editors retain anonymity unless a paper is accepted for publication." NIPS 2013, instead, made the publication of reviews depend on the fate of the paper:
"Anonymous reviews of accepted papers will be made public."

In summary, each anonymity policy involves an aspect of identity (name, role, relationship, communication, performance) that is governed (disclosed or concealed) by the policy, a group that is blinded (or not) to that aspect of identity, a level of enforcement of the policy, and temporal properties governing when the policy is in effect. In addition, policies can differ in how they are communicated, how they are implemented, and who makes the disclosure/concealment decision when a policy is flexible or indeterminate. While it is difficult to represent all the differences between anonymity policies, Table 3.2 summarizes some of the key differences between publication venues. Based on these key attributes, a peer-review process can be represented, as shown in Table 3.3, by a table that summarizes the information that each group of stakeholders sees, in some cases at the different stages of the peer-review process.

Public's view after the peer-review process:
- Authors' identity: disclosed (many venues); concealed (Harvard LR, student papers)
- Reviews: disclosed (NIPS, EMBO, BMJ Open, F1000, alt.chi, Bio Direct, ACP); preferably disclosed (EMBO, PeerJ); concealed (many venues)
- Reviewers' identity: disclosed (BMJ Open, F1000, alt.chi, Bio Direct, Frontiers); preferably disclosed (PeerJ, GigaScience); agnostic or both ways (ACP); concealed (many venues)
- Referees' identity: disclosed (PLOS Med, PLOS Bio, ACP, alt.chi); preferably disclosed (PeerJ); concealed (many venues)

Author's view during the peer-review process:
- Other authors' identity: disclosed (F1000, alt.chi, Cochrane, ACP); concealed (many venues)
- Reviews: disclosed (many venues); agnostic or both ways (Leonardo); concealed (Stanford LR, Harvard LR)
- Reviewers' identity: disclosed (BMJ, BMJ Open, Bio Direct, alt.chi, F1000); preferably disclosed (PLOS Med, PeerJ, GigaScience); agnostic or both ways (ACP, BSSA); preferably concealed (PLOS Bio, Leonardo); concealed (many venues)
- Referees' identity: disclosed (many venues); agnostic or both ways (Cochrane); concealed (many venues)

Reviewer's view during the peer-review process:
- Authors' identity: disclosed (many venues); agnostic or both ways (InfoVis); concealed (many venues)
- Other reviews: disclosed (IJCAI, alt.chi, F1000, NIPS, ACP, Neuron); concealed (many venues)
- Reviewers' identity: disclosed (IJCAI, alt.chi, F1000, ICSE); agnostic or both ways (ACP); preferably concealed (PLOS Bio); concealed (many venues)
- Referees' identity: disclosed (many venues)

AE/PC's view during the peer-review process:
- Authors' identity: disclosed (many venues); concealed (IJCAI, ICCV)
- Reviews: disclosed (many venues)
- Reviewers' identity: disclosed (many venues)
- Other referees' identity: disclosed (many venues)

Table 3.2 Summary of practices around concealment of identities. In several cases the venues did not state some aspects of their practices; in these cases we assumed that they use the most common practice. "Referee" refers to the primary decision makers (associate editors in most journals, program committee members in most conferences).

[Table 3.3 consists of three example matrices, one each for alt.chi, PeerJ, and Stanford LR, with rows for the public, author, reviewer, and referee views and columns for the visibility of authors' identities, reviews, reviewers' identities, and referees' identities; each cell holds a symbol from a five-level key: disclosed, preferably disclosed, no preference, preferably concealed, concealed.]

Table 3.3 Examples of tabular views representing three peer-review processes that differ based on anonymity policies about the visibility of reviews, authors, reviewers, and referees, as described in Table 3.2.
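The tabular representation of Table 3.3 also suggests a natural machine-readable encoding. As a rough sketch only (in Python; the field names, stages, and the simplified alt.chi-style entries are illustrative, not a specification), a peer-review support system could parameterize a venue's anonymity policy as data rather than hard-coding it:

```python
from enum import Enum

class Level(Enum):
    DISCLOSED = "disclosed"
    PREFER_DISCLOSED = "preferably disclosed"
    NO_PREFERENCE = "no preference"
    PREFER_CONCEALED = "preferably concealed"
    CONCEALED = "concealed"

# One cell of a Table 3.3-style matrix: who is looking (viewer), at which
# aspect of identity, at which stage of the process. This is a highly
# simplified rendering of an open process in the spirit of alt.chi.
policy = {
    # (viewer, aspect, stage): level
    ("public",   "authors_id",   "post_review"): Level.DISCLOSED,
    ("public",   "reviews",      "post_review"): Level.DISCLOSED,
    ("public",   "reviewers_id", "post_review"): Level.DISCLOSED,
    ("author",   "reviewers_id", "during"):      Level.DISCLOSED,
    ("reviewer", "authors_id",   "during"):      Level.DISCLOSED,
}

def visibility(policy, viewer, aspect, stage):
    # Defaulting unstated cells to concealment mirrors the assumption of
    # "most common practice" made when compiling Table 3.2.
    return policy.get((viewer, aspect, stage), Level.CONCEALED)

print(visibility(policy, "public", "reviews", "post_review").value)
```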
3.5 Discussion and recommendations for design

We found substantial diversity in how identities are treated in various peer-review processes. In this section we review how computer support for peer review has affected, and can continue to affect, anonymity policies and their implementation. In addition, we discuss some of the implications of our findings for future research on peer review and on the systems that support it.

3.5.1 Technology and anonymity

We found several direct and indirect references to how technology affects anonymity practices. We used our findings as a basis for further reflection on how digital publishing and computer-supported peer review can affect anonymity policies; therefore, some of our design recommendations are not directly tied to our findings. In other words, we use this section as an opportunity to extrapolate: to envision potential future ways of supporting anonymity and transparency that arose from thinking about how solutions to the issues in our findings could generalize to other situations involving anonymity.

3.5.1.1 How computer support affects anonymity

As discussed in the review of the literature in Section 3.2, the effectiveness of masking authors' and reviewers' identities is still questionable. Technology currently in use by researchers can undermine the desire of publication venues for the anonymity of authors, reviewers, or other stakeholders. For example, people intentionally or unintentionally leave traces of what they do on the Internet, and a double-blind venue such as NIPS can only "encourage you not to actively attempt to discover the identities of the authors". In addition, authors and reviewers may leave automatically generated traces in the documents that they submit, sometimes in the form of document properties or metadata. As pointed out by BSSA, some peer-review support systems alleviate the problem by removing such metadata. However, various types of digital footprint on the Internet, outside the peer-review support system, can also affect anonymity. For example, many researchers make their publications available on their personal websites, some of which track visitors. Reviewers accessing such publications could then be tracked by authors, who could in turn form a guess about who the reviewers of a new submission might be. Rainie et al. (2013) found that while most Internet users are aware of such digital footprints, it is difficult not to leave such traces behind.

It has been shown that it is possible to identify with high certainty the authors of a given paper using various features of manuscripts, such as use of language (Basu Roy, 2013; Nanavati, Taylor, Aiello, & Warfield, 2011). In research communities that employ both blind and open reviewing, a corpus of open reviews can be used to train algorithms for identifying the writers of anonymous reviews.
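A minimal sketch of such a stylometric attribution pipeline (in Python with scikit-learn; the tiny corpus of "signed reviews" is a stand-in for real data, and real studies use richer features and validation) illustrates the idea:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for a corpus of openly signed reviews and their authors.
signed_reviews = [
    "The evaluation lacks a baseline; moreover, the statistics are thin.",
    "Nicely motivated work, however I would have liked a larger study.",
    "The related work omits several baselines; moreover, Figure 2 is unclear.",
    "A nice contribution, however the writing could be tightened.",
]
review_authors = ["r1", "r2", "r1", "r2"]

# Character n-grams capture habits of punctuation and phrasing that persist
# across topics, which is why they are a common choice in stylometry.
attributor = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
attributor.fit(signed_reviews, review_authors)

# Rank candidate authors of an anonymous review by model probability.
anonymous_review = "The evaluation omits a baseline; moreover, it is unclear."
probs = attributor.predict_proba([anonymous_review])[0]
print(sorted(zip(attributor.classes_, probs), key=lambda p: -p[1]))
```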
Automation can help decrease the number of individuals who need to know the identity of authors, hence enabling increased anonymity. Instead of relying on the manual work and knowledge of editors and program committees for reviewer assignment and for handling conflicts of interest, several venues automate (often only partially) the reviewer assignment process by asking reviewers to enter their preferences, or by identifying their expertise based on their publications as tracked by online bibliography systems. Such automation, in conjunction with automatic detection of conflicts of interest, allows publication venues to conceal the identity of authors from program committee members and editors. However, automatic detection of conflicts of interest can identify only a limited subset of the relationships that constitute a conflict; hence, we recommend that automated conflict-of-interest detection adopt a community-powered approach in which community members (perhaps only program committee members) are asked to confirm the absence of conflicts of interest. If it is desired to conceal the review assignment from those involved in assessing conflicts of interest, only a subset of the identified author-reviewer pairs with no conflict of interest could be used in the actual reviewer assignment, to ensure that those who verify the absence of a conflict of interest will not know whether the reviewer is actually used in the final assignment. This could be operationalized in a number of ways: verify twice as many potential reviewers as will be used, so that verifiers cannot guess who is asked to review, or perhaps have multiple verifiers who each verify a few reviewers but do not know whether any of their reviewers are chosen. If there is a concern about the ability of verifiers to detect conflicts, multiple verifiers could be used for each reviewer, and the reviewer would not be used if any verifier determined there was a conflict. Inspired by the "reCAPTCHA" technique (Von Ahn, Maurer, McMillen, Abraham, & Blum, 2008), to assess verifiers' reliability a system could automatically identify individuals who are in conflict with particular authors (e.g., based on co-authorship, or based on relationships previously identified by other users) and include them in the list to be verified as a way to test the accuracy of the verification process: if known conflicts are not caught, the system has evidence that the process is not working as it should. The reCAPTCHA technique mixes challenge questions with known and unknown answers and uses a respondent's answers to the questions with known answers to assess the validity of the unknown answers. We propose doing the same for the identification of reviewing conflicts.
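The following minimal sketch (in Python; the author and reviewer names and the set of already-known conflicts are invented for illustration) shows how known conflicts could be seeded among the pairs sent to a verifier, with a missed seed flagging an unreliable verification:

```python
import random

def build_verification_batch(candidate_pairs, known_conflicts, n_seeds=2):
    """Mix (author, reviewer) pairs of unknown status with seeded pairs whose
    conflict status is already known, in the spirit of reCAPTCHA."""
    seeds = random.sample(sorted(known_conflicts), n_seeds)
    batch = candidate_pairs + seeds
    random.shuffle(batch)
    return batch, set(seeds)

def assess_verifier(flagged_pairs, seeds):
    """Trust a verifier's answers only if every seeded conflict was flagged."""
    missed = seeds - set(flagged_pairs)
    return len(missed) == 0, missed

# Hypothetical data: pairs awaiting verification, plus conflicts already
# known from co-authorship records or earlier rounds of verification.
candidates = [("author_a", "rev_1"), ("author_b", "rev_2"), ("author_c", "rev_3")]
known = {("author_d", "rev_4"), ("author_e", "rev_5")}

batch, seeds = build_verification_batch(candidates, known)
flagged = [("author_d", "rev_4")]  # suppose the verifier flags only this pair
trusted, missed = assess_verifier(flagged, seeds)
print(trusted, missed)  # False, because one seeded conflict was missed
```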
It is possible to envision anonymity going beyond what is practiced today. All potentially identifying information can be removed automatically from submissions, and papers can be assigned automatically based on their relevance to the publications of members of the research community and on automatic assessment of conflicts of interest based on co-authorships and affiliations, all of which can be extracted from online bibliography databases or prior submissions. Any information suggesting the identity of reviewers can be stripped out of the reviewer reports, and decisions can be made automatically based on the assessments of reviewers, weighted according to their expertise and reputation as measured by their productivity, and according to an automatic assessment of their judgment quality as measured by the correlation between their previous judgments and the number of citations that their recommended papers received after publication.

To date, computer-supported peer review has already significantly facilitated complex collaboration structures for conducting the peer-review process. For example, a higher level of anonymity of authors has been achieved at one venue by ensuring that those who handle conflicts of interest, and hence need to know the identity of authors, are different from those who make acceptance decisions. Currently, some publication venues, such as law reviews, avoid giving reviewer reports to authors to ensure that authors cannot identify reviewers based on their use of language. Computer support can alleviate this fear by enabling collaborative composition of reviews (e.g., in wiki-like interfaces) to decrease the chance of identifying individual reviewers, or by adding a post-processing step (automatic, or manual with the help of crowd-powered systems (Bernstein et al., 2010)) to reword the reviews.

3.5.1.2 How computer support affects transparency

We found several instances of technologies enabling increased transparency of the peer-review process.

Because most of the communications related to the peer-review process are conducted on the web, it is easy to automatically combine all communications, including reviews and responses, into a well-formed document that can be published on the Internet. The level of transparency that is currently offered by journals such as EMBO, F1000, and BMJ Open was hard to achieve prior to online peer-review support systems. Similarly, open peer reviewing, where anyone can decide to review an article for a venue (e.g., alt.chi, or the second phase of ACP), would have been practically impossible without online peer-reviewing systems.

One area in which we found that online peer-review support systems have taken a very different approach than most online communities is organizational memory. In online communities where users need to rely on the performance of other users, it is common to offer a reputation system that keeps track of users' performance based on feedback received from other users. For example, couchsurfing.com offers multiple mechanisms for the expression of trust between members (Adamic, Lauterbach, Teng, & Ackerman, 2011), and StackOverflow (Mamykina, Manoim, Mittal, Hripcsak, & Hartmann, 2011a) offers an elaborate reputation system in which users' reputations represent their effectiveness in both posing and answering questions. Automatic tracking of some aspects of reviewers' performance (e.g., timeliness, rate of declining reviews) and allowing for the assessment of reviewers by commenting on or rating their reviews could offer more transparency into reviewers' behavior. While some peer-review support systems offer these functionalities, they seem to be rarely used in practice. As we mentioned in our findings, research communities sometimes rely instead on secretive approaches such as maintaining blacklists of reviewers. This could be an area where lack of transparency leads to abuse of power. In one case we observed that policies related to the assessment of reviewers were clearly communicated to editors, but not to reviewers.
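One way to make such record keeping visible rather than secretive is to attach an explicit audience to every recorded attribute and let reviewers inspect the result. The following minimal sketch (in Python; all field names and values are hypothetical) illustrates the idea and anticipates the recommendation that follows:

```python
from dataclasses import dataclass, field

@dataclass
class RecordedAttribute:
    value: object
    visible_to: set  # roles that are allowed to see this attribute

@dataclass
class ReviewerRecord:
    reviewer: str
    attributes: dict = field(default_factory=dict)

    def view(self, role):
        # What a given role may see; showing reviewers their own "reviewer"
        # view tells them exactly what is recorded about them and for whom.
        return {name: attr.value for name, attr in self.attributes.items()
                if role in attr.visible_to}

record = ReviewerRecord("rev_42", {
    "median_days_to_review": RecordedAttribute(18, {"editor", "reviewer"}),
    "decline_rate": RecordedAttribute(0.4, {"editor", "reviewer"}),
    "editor_quality_rating": RecordedAttribute(3.5, {"editor"}),
})
print(record.view("reviewer"))  # excludes the editor-only quality rating
print(record.view("editor"))    # sees everything recorded here
```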
We recommend designing peer-review support systems to provide mutual awareness of activities, also referred to as social translucence (Erickson & Kellogg, 2000; McDonald et al., 2012). Specifically, we recommend designing peer-review support systems that allow authors and reviewers to know what is being recorded about them and who is able to see the recorded information. Such systems can raise awareness of how the process is run without necessarily disclosing the identities of those involved. In addition, we think that investigating ways of recording reviewers' performance that balance concerns about defamation and liability with concerns about the effectiveness of the peer-review process is an important avenue for future research.

In the venues that we analyzed, the selection of reviewers and the final decision making were rarely transparent. The alt.chi venue was one of the most open processes among those that we analyzed: it allowed reviewers to self-select (ACP and BBS allowed that only in the second phase or in post-publication commentaries). We propose experimenting with open peer-review systems that allow people to decide how they want to contribute to the peer-review process, for example by suggesting improvements, pointing out problems, offering help with specific aspects such as statistics or copyediting, making judgments, or any combination of the tasks that are needed during the peer-review process. Many online knowledge exchange systems offer such environments, and previous research has shown that people do choose specific (informal) roles based on their interests and expertise (Thom-Santelli, Muller, & Millen, 2008; Welser et al., 2011; Ye & Fischer, 2007).

We suggest exploring the design of collaborative research-curation systems that incorporate peer reviewing and dissemination, two major functions of publication venues, as part of informal exchanges, replacing complex formal procedures and policies for the management of roles and permissions with informal roles accompanied by a transparent reward system that evolves over time based on community members' perceptions of which activities are needed. Similar approaches have been used by online knowledge exchange systems such as StackOverflow (Mamykina et al., 2011a), where people can earn privileges by contributing to the system.

Lastly, digital publishing has allowed pushing the limits on the amount of material that can be published. Some publication venues include the complete record of the review process, the roles and levels of involvement of authors and reviewers, and disclosures of conflicts of interest (in addition to datasets and other materials that are outside the scope of this dissertation).

3.5.1.3 Support for flexible policies and negotiation

One important trend that we identified is the rise of publication venues that allow flexible, self-administered policies. These venues empower authors or reviewers to decide whether or not to disclose their identities during the process. This is an important trend, because many of the concerns about peer-review processes do not apply to all situations. For example, if a less experienced reviewer fears the repercussions of criticizing a senior author, allowing the reviewer to opt out of the default disclosure of their identity could solve the problem without requiring everyone else to conceal their identities. Similarly, a reviewer may feel that the authors' knowledge of her identity may facilitate communication and trust.
There are also situations in which authors may feel that their reputation (or lack thereof) may be used against them. Each of these situations could raise concerns about fairness, which need to be addressed through appropriate decision processes that include communication between the affected parties, rather than being based solely on one party's decision.

The need for flexibility in the management of roles, privacy, and permissions in CSCW systems has been discussed at length in the CSCW literature (Dourish & Anderson, 2006; Stevens & Wulf, 2002). Dourish suggests looking at privacy as a process of negotiating boundaries rather than as a set of predefined rules (Dourish & Anderson, 2006). We suggest adopting this view in the management of identities in peer-review support systems. Current interfaces for supporting peer review use lists of permissions or a traditional access-control matrix of subjects, objects, and operations (Lampson, 1974) for representing and managing permissions (Figure 3.2). We think peer-review support systems need to enable the implementation of flexible anonymity policies by supporting the negotiation of identity disclosure by reviewers, authors, and editors on a case-by-case basis.

Figure 3.2 Interfaces for managing anonymity: 1. Editorial Manager, 2. Precision Conference System

Digital publishing also allows temporary anonymity. For example, it is possible to allow reviewers to decide to disclose their identities or reviews after a cooling-off period, similar to the declassification of old once-secret documents by governments. This would help create a more complete record of scientific progress and of the contributions of individuals in future digital libraries, while preserving the anonymity of the process when it matters the most, which is the present.
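A minimal sketch of such a cooling-off mechanism (in Python; the function and parameter names are illustrative): an identity is stored together with a disclosure date and is released only once that date has passed.

```python
from datetime import date, timedelta

def visible_identity(reviewer_name, review_date, cooling_off_years=5, today=None):
    """Return the reviewer's name only after the cooling-off period has
    elapsed, analogous to the declassification of once-secret documents."""
    today = today or date.today()
    disclosure_date = review_date + timedelta(days=365 * cooling_off_years)
    return reviewer_name if today >= disclosure_date else "Anonymous"

# A review written in 2010 with a five-year cooling-off period:
print(visible_identity("R. Example", date(2010, 6, 1), today=date(2013, 1, 1)))
# -> Anonymous (the process is still recent)
print(visible_identity("R. Example", date(2010, 6, 1), today=date(2016, 1, 1)))
# -> R. Example (the record is now part of the scholarly archive)
```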
3.5.2 Implications for communities and future research

In this section we describe some of the implications of our research for research communities and for future research on the peer-review process.

3.5.2.1 Untested peer-review processes and parameters

Despite the wide variety of anonymity policies that we observed, some potential variations appear to be missing. For example, we found no publication venue that conceals authors' identity but discloses reviewers' identity. As discussed earlier, concealing the identity of authors could suppress potential biases, and disclosing reviewers' identity could encourage accountability. While there are downsides to this policy, as there are to any other anonymity policy (as discussed in our review of the literature), we recommend experimenting with this approach to expand our understanding of the effects of anonymity policies and of the interactions between policies.

In addition, we found no venue that allowed anonymity of authors after publication. Over the centuries, some authors have decided to publish their books anonymously or under pseudonyms. In some cases, the authors' identities were disclosed later by the authors themselves (e.g., The Autobiography of an Ex-Colored Man by James Weldon Johnson) or by others (e.g., Primary Colors by Joe Klein), but sometimes the authors remained anonymous until their death (e.g., A Woman in Berlin by Marta Hillers). Similar to anonymous books that enable authors to write about controversial issues, one could argue that anonymous authorship of research articles should be an option for authors who do not wish to be associated with them, either because the topic of the study is controversial or because knowing the identity of the authors would lead to the uncovering of confidential information or the identification of anonymous subjects of a research study.

3.5.2.2 Connection between research and the practice of peer review

As presented in our review of the literature, previous studies suggest that knowing the authors' identity affects acceptance decisions much more strongly than it affects the quality of reviews. It therefore seems crucial to conceal the identities of authors from the editors and program committee members who make decisions, rather than (or in addition to) from reviewers. Yet most of the double-blind peer-review processes that we analyzed concealed the identity of authors only from reviewers, rather than from those who actually make the final acceptance decisions. If these approaches are adopted, it is crucial to make sure that the implementation and communication of the policies are consistent. Most venues that describe their process as double blind do not explicitly state who has access to the hidden identities.

Numerous editorial notes and blog posts by researchers argue for or against various practices and propose new types of processes. While these discussions are valuable for understanding concerns and finding solutions, most of the proposed solutions are already implemented in other research communities. We therefore think it is more important to collect empirical data about how our current practices perform, and to understand how the numerous types of process already implemented in various research communities perform in practice. Considering that the majority of activities in the peer-review process are conducted using online systems, we suggest recording reviewers' and authors' behavior and sharing anonymized datasets with researchers for evaluating the various processes. BMJ does an impressive job of experimenting with the peer-review process (Godlee et al., 1998; van Rooyen et al., 2010): it informs authors and reviewers of its ongoing editorial research, and it allows authors and reviewers to opt out of the experiments.

3.5.2.3 Future research should specify properties of policies

Deciding on the most appropriate peer-review process for a specific community with specific goals in mind is a design problem that requires policy design. To help with the process of policy design, researchers need to study the effects of each of the design parameters and the interactions between them. Numerous interview studies have elicited opinions from researchers about various types of processes without clarifying all the related parameters and interacting policies. For example, one of the largest surveys on peer review offered "open peer review" as an option for a question asking respondents for their most preferred mode of reviewing, without clarifying whether the reviews would be visible to the public (Ware & Monkman, 2008). This shortcoming was rectified in a follow-up survey a few years later (Mulligan et al., 2012). However, there is still significant variation in how a policy can be implemented. For example, some of the venues allow reviewers or authors to opt out of anonymity policies. Such subtleties can affect both perceptions and the reality of the process.
One implication of our findings for future research is that we need to elicit opinions about specific implementations of a policy to understand study participants' attitudes more accurately. Similarly, findings of peer-review experiments are sometimes overly generalized to a type of process (rather than to a specific implementation of it). We argue that implementations of a type of process can vary widely and that appropriate caveats must be noted when drawing generalizations.

3.5.3 Limitations of the research

An important limitation of our study is that we relied only on the explicitly published practices of publication venues. Much of what happens in research communities is not documented, and some of what is documented is available only to insiders. However, we hope our study offers a first step in understanding the information practices of peer-review processes in various research communities.

Another limitation of our study is that we looked at only a tiny fraction of all scholarly journals. While we hope to have captured much of the variety in how scholarly peer review is implemented, it is extremely difficult to cover all the different peer-review processes and practices that are employed by thousands of scholarly journals, considering the nuances in the implementation of various policies. In addition, what we reported in this chapter summarizes anonymity policies as of mid-2013. Many publication venues are evolving, and our descriptions may not apply to how they conduct the peer-review process even a short time after we conducted our study.

Lastly, peer-review processes can also vary in selectiveness (sometimes measured by acceptance rate), as discussed by Grudin (2013), in method of administration (offline vs. online), in number of steps and phases (revisions, rebuttals, etc.), in role structure (tiers of the program committee, or use of external reviewers), and so on. The scope of this research was limited to differences in anonymity and transparency policies. Further research is needed to analyze, understand, and design for the numerous other aspects of peer review.

3.6 Conclusion

After decades of criticism of the traditional scientific peer-review process, several publication venues have successfully implemented variations of the process, many of which would not have been practical without the help of computer-supported information and communication systems. We found that some publication venues have increased the level of anonymity, while others have pierced or completely lifted the veil of anonymity. We developed a preliminary taxonomy of anonymity policies in peer review, aiming to advance our understanding of ongoing developments in the design of peer-review processes and thus inform the design of future peer-review support systems.

While the CSCW and HCI communities have been actively involved in trying out new peer-review processes, understanding the needs of the various peer-review processes and designing systems to support peer review have received little attention in CSCW research. Another goal of this research was therefore to bring this domain to the attention of CSCW researchers as a topic of research by showing how various online information systems have contributed to the development of novel peer-review processes.

Much more research needs to be done to make sense of how the variety of peer-review processes compare and of how computer support can increase their effectiveness. CSCW researchers could play a significant role in future developments of the peer-review process.
The findings conveyed in this chapter are intended to illustrate the many ways that peer-review support systems should, and could, be flexible and customizable in their use of anonymity, to allow research communities to better tailor them to their needs and policies. By reviewing the range of anonymity policies that are already in use, and those that might come into use if they can be supported by technology, we gain an understanding of how systems to support peer review should be parameterized so that they can be customized or tailored to the specific needs of a community without re-programming. As our study showed, there are sometimes subtle interactions between anonymity settings that can constrain the combinations of choices that can be made in some situations, which means the parameterization will not be straightforward.

What Motivates People to Review Papers? The Case for the Human-Computer Interaction Community

The ability to recruit expert reviewers contributes to the quality and fairness of peer review, as well as to the ease of conducting it, because all peer-review processes rely on the expertise of the researchers performing the reviews (Bloom, 1999; Eisenhart, 2002; Finke, 1990; Tite & Schroter, 2007; Tsui & Hollenbeck, 2009). Informal observations at journal editorial board meetings and conversations we had with authors, reviewers, and editors suggested that recruiting qualified reviewers is a common concern. Several journal associate editors expressed concern about the increasing rate of declined review requests and the lack of response from potential reviewers. Declined review requests have two consequences: they may delay the review process (in the case of journals), and they may lead editors to recruit reviewers who do not appreciate the value or potential impact of the work under review or who are not familiar enough with the latest related research to evaluate the work effectively. In the first case the authors are dissatisfied; in the second case it is the scholarly community as a whole that may be dissatisfied.

We studied both motivations for accepting a specific review request and general motivations for reviewing (what makes researchers value reviewing as part of their scientific activities?). Previous studies have investigated reviewing motivations in various research communities and have often provided quantitative descriptions (Snell & Spencer, 2005; Tite & Schroter, 2007; Ware & Monkman, 2008). In contrast, we sought to understand how reviewing motivations differ across reviewers within a community.

We chose the human-computer interaction (henceforth HCI) community as the focus for our study. We chose HCI because previous studies might not fully generalize to it due to HCI's unique characteristics: interdisciplinarity, a relatively high level of involvement by practitioners, and significant contributions from students. Interdisciplinarity is known to increase the difficulty of identifying and finding suitable reviewers because of the smaller pool of qualified reviewers familiar with a specific mix of disciplines (J. T. Klein, 2008; Nowotny, Scott, & Gibbons, 2003). One might also expect that practitioners and students would have different attitudes toward reviewing, and they might be less experienced in the peer-review process, which could impose constraints on reviewer selection.
Another consideration is that, as in many branches of computer science, much HCI research is published in conference proceedings rather than in journals (Patterson et al., 1999), creating different temporal and social dynamics, such as episodic peaks of reviewing tasks right after conference submission deadlines and the concurrent involvement of a large portion of the community in reviewing and review-assignment tasks for thousands of simultaneous submissions (Mackay, Baudisch, & Beaudouin-Lafon, 2013). The premier conference in the HCI community is the Association for Computing Machinery's annual International Conference on Human Factors in Computing Systems (ACM CHI, henceforth just CHI). For a CHI conference, reviewers perform double-blind reviews; associate chairs (ACs, also referred to as program committee members) assign papers to reviewers, write single-blind meta-reviews, participate in face-to-face program committee meetings where most papers are discussed, and make acceptance decisions (usually with the involvement of subcommittee chairs and other ACs); program co-chairs and subcommittee chairs are primarily involved in organizing the reviewing process and overseeing the program committee meetings.

With the HCI community as our primary target, we designed and conducted a survey questionnaire to elicit reviewers' opinions on reviewing motivations. We had four primary goals. First, we were interested in the motivations reviewers have for participating in the peer-review process. Second, we wanted to investigate what influences the decision to accept a specific review request. Third, we were curious about how various background and demographic variables affect reviewing motivations. Fourth, we were interested in how different motivations relate to each other and in the various dimensions of motivation for reviewing papers, if such dimensions do indeed exist.

4.1 Related work
Reviewing motivations have been studied in various research communities. Snell and Spencer (2005) conducted a survey in the medical education community and found that the most common reasons for reviewing were staying up-to-date, enjoying it, and considering it a responsibility. Similarly, Kearney et al. (2008) found that keeping up-to-date was the primary motivation of reviewers for nursing journals; they also found that lack of time was the main reason for declining review requests. Francis (2013) found that helping the profession, followed by keeping up-to-date, were the primary reasons library and information science researchers gave for reviewing. Lipworth et al. (2011) conducted interviews with editors and reviewers of biomedical journals and found quality control, communal obligation, and self-interest in learning and networking opportunities to be important reviewing motivations. In another study of medical reviewers, Tite and Schroter (2007) found that the most important motivations for accepting review requests were the contributions of papers, the relevance of papers to the reviewer's own work, and the opportunity to learn something new; conflict with other workload was by far the most important reason for declining specific requests. Ware and Monkman (2008) found that playing a part as a member of the community, improving the quality of papers, and seeing new work ahead of its publication were the most important reasons for reviewing. They also found that free subscriptions, acknowledgement in the journal, and payment in kind by the journal
(e.g., waiver of color or other publication charges) were the most preferred incentives. Mulligan, Hall and Raphael (2012) found that not being expert in the topic of the paper and not having time were important reasons for declining reviews, that most reviewers enjoyed reviewing, and that about half of the reviewers were against disclosing their names or their reviews to the public. Motivations for reviewing may not fully generalize across research communities because of communities' different characteristics, but we took these previous findings as starting points for our investigation of motivations to serve as a reviewer within the HCI community. No previous study we know of examined motivational differences between reviewers or the effect of background variables such as experience, gender, job, or education on reviewing motivations. We suspected that individual differences affect reviewing motivations and that understanding the diversity of those motivations could help in the design of the processes used for peer review and of the systems that support peer-review processes. In particular, it should be possible to exploit various motivations to facilitate the recruitment of a wider variety of reviewers.

In addition to the literature on peer review, we also looked at other areas where motivation to contribute to a community has been studied. Reviewing papers can be considered volunteer knowledge work, so understanding motivations for doing knowledge work, such as participation in open-source projects or in platforms like Wikipedia, is relevant. Lakhani and Wolf (2005) found that a personal sense of creativity, personal needs for new software, intellectual stimulation derived from programming, and improving programming skills were strong motivations for developers participating in open-source projects. In addition, Roberts and Slaughter (2006) found that status-based motivations enhance intrinsic motivations for open-source software developers. Nov (2007) found that ideology (believing that "information should be free") and fun are the most important motivations of Wikipedia contributors, although ideology was not correlated with contribution levels. Oreg and Nov (2008) compared Wikipedia contributors and open-source developers and found that, although the two groups were not significantly different in terms of the values they held, open-source developers were motivated more by reputation and self-development and less by altruism than Wikipedia contributors. Studies of question-answering systems found that helping the community, learning, a sense of mastery, and earning reputation points were the most effective motivations (Mamykina, Manoim, Mittal, Hripcsak, & Hartmann, 2011b; Nam, Ackerman, & Adamic, 2009). McLure et al. (2000) studied motivations for participating and sharing knowledge in online communities of practice and found that the desire to have access to a community of practice, giving back to the community, and altruism were the strongest motivations. Lampe et al. (2010) found that receiving value from providing information, a sense of belonging, and self-efficacy were significant predictors of intention to contribute to the Everything2 online community. Lastly, Brzozowski et al. (2009) and Lampe and Johnston (2005) highlighted the role of providing feedback to contributors of online communities. Like scholarly peer review, all of these communities rely on volunteers' contributions and expertise.
These and many other studies on motivations provided a foundation for our research. Based on their findings, we included in our own survey questions related to the joy of intellectual stimulation, reputation, self-development, being heard, altruism, connecting with and belonging to a community, and reciprocity.

4.2 Methods
We describe the questionnaire and recruitment process, provide a profile of survey participants, and summarize our approach to data analysis.

4.2.1 Materials (questionnaire design)
Our questionnaire had three sections. The first, outlined in Table 4.1, solicited background information about participants. LEVEL OF INVOLVEMENT is the number of types of reviewing roles held. For example, the LEVEL OF INVOLVEMENT of someone who has experience as a reviewer (for journals or for conferences), as a journal editor, and as an AC would be 3.

POSITION: Full Professor 40; Associate Professor 39; Assistant Professor 56; Post-doctoral Fellow 23; Research Associate 18; Industry Researcher 50; Industry Practitioner 20; Student 48; Other 13.
REVIEWING EXPERIENCE: 1 year or less 18; 2-5 years 47; 6-10 years 50; 11-15 years 94; 16-20 years 50; 21-25 years 29; 26 years or more 18; <missing> 1.
GENDER: Female 106; Male 197; <missing> 4.
LEVEL OF INVOLVEMENT: 1 (only reviewing) 155; 2 63; 3 47; 4 24; 5 (all reviewing roles) 4; <missing>¹ 14.
LAST EARNED DEGREE: Bachelors 12; Masters 60; Doctoral 235.
AREA OF EDUCATION: Computer Science/Engineering 181; Cognitive Science and Psychology 36; Social Sciences 25; Other Engineering 17; Other 48.
Table 4.1 Profile of study participants for the six background variables.
¹ Due to a technical problem, fourteen participants (5%) who started the questionnaire within 20 minutes of the release of the survey were unable to answer the background question on LEVEL OF INVOLVEMENT.

The second section of the questionnaire was about general motivations for participating in the peer-review process, not specific to a particular reviewing request. Participants were asked how important they consider each of 12 potential motivations (Table 4.2) using a 5-point scale ("not at all important" to "extremely important"). Participants were also asked to name any motivations that were not included in the list and to rank their top three general reviewing motivations.

Prompt: "Please indicate how important you consider each of the following as your reason for reviewing papers." Each item is given below in the full long form presented in the questionnaire, followed in parentheses by the short form used for analysis.
I learn about how to write more effectively through the process of reflecting on papers and coming up with suggestions. (LEARNING THROUGH REFLECTION)
I learn about how to write more effectively by learning more about the review process. (LEARNING THROUGH THE PROCESS)
I want to help authors improve their work. (IMPROVING THE WORK)
I want to know what is new in my field. (AWARENESS)
I want to ensure that other researchers will be exposed only to valuable research. (GATE KEEPING)
I want to influence my field of research and my research community. (INFLUENCING MY FIELD)
Editors or program committee members ask me to review and I don't want to say no. (RELUCTANCE TO SAY NO)
I want to encourage good research. (ENCOURAGING QUALITY)
I want to establish or maintain a good reputation in my field. (REPUTATION)
I receive reviews from the community, so I feel I should review for the community. (GIVING BACK)
I want to gain experience and prepare for higher positions in the review process (AC, editor-in-chief, etc.). (PREPARING FOR HIGHER ROLES)
I enjoy critically reading and reflecting on papers. (ENJOYING CRITICAL READING)
Table 4.2 General motivations for reviewing papers.

The third section of the questionnaire was about what influences decisions to accept a specific reviewing request (15 items, Table 4.3), using a 5-point scale ("greatly reduces" to "greatly increases"). Participants were also asked to name any factors that were not included in the list.

Prompt: "Please indicate how much each of the following influences your motivation for reviewing a paper." Each item is given in the full long form presented in the questionnaire, followed in parentheses by the short form used for analysis.
Knowing that my review will influence the final decision about the paper. (INFLUENCING DECISIONS)
Knowing that the authors will apply my suggested changes. (AUTHORS LISTEN)
Personally knowing the authors (assuming that I am sure about who they are). (KNOWING THE AUTHORS)
Knowing that the paper was written by well-known authors (assuming that I am sure about who they are). (FAMOUS AUTHORS)
My being an expert in the specific topic of the paper. (BEING EXPERT)
Being asked by a well-known researcher (who is an editor or a PC member) to review the paper. (FAMOUS AC)
Being asked by a friend (who is an editor or a program committee member) to review the paper. (FRIEND AC)
Being asked to review the paper for a high-profile publication venue. (HIGH-PROFILE VENUE)
Having free time. (FREE TIME)
Close relation of the paper and my current research. (RELEVANCE TO OWN RESEARCH)
Close relation of the paper to another paper that I have read (e.g., two papers on different aspects of a project). (FAMILIAR PAPER)
Feeling that I am part of the community related to the publication venue that I am asked to review for. (SAME COMMUNITY)
Being asked to review a well-written paper. (WELL-WRITTEN PAPER)
The length of the paper being short relative to other papers I have reviewed. (SHORTER PAPER)
The length of the paper being long relative to other papers I have reviewed. (LONGER PAPER)
Table 4.3 Motivations for accepting a specific review request.

4.3 Questionnaire design considerations
We developed an initial set of questions based on our review of related work and on informal conversations with researchers we knew who had reviewing experience. This set was refined over several rounds of discussion with both experienced and novice reviewers to establish content validity for the instrument.

We chose to limit the number of items in the questionnaire because a longer survey might have negative consequences, such as a lower response rate due to longer completion time, or inaccurate responses due to fatigue. The population we targeted is quite busy, so we balanced comprehensiveness against completion time. We avoided questions about incentives with which most participants would have no experience. For example, we did not ask about financial incentives because, to our knowledge, they are not used within the HCI community or in nearby communities. Moreover, we focused on motivations at a level that could help us better understand why people review papers and how people differ in their motivations. For example, instead of asking about broader motivations, such as a general enjoyment of reviewing or a feeling of responsibility to the community of scholars, we asked about specific social, ethical, and professional reasons that might underlie a sense of enjoyment or responsibility.
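Because the analyses that follow treat these responses as ordered data, it may help to see one way the two 5-point scales could be coded. The sketch below, in Python, is a hypothetical illustration only: the file, column, and variable names are ours, not from the study, and the intermediate labels of the influence scale are assumptions, since only its endpoints are quoted above.

    import pandas as pd

    # 5-point importance scale for the 12 general motivations (Table 4.2).
    IMPORTANCE = ["not at all important", "slightly important",
                  "moderately important", "very important", "extremely important"]

    # 5-point influence scale for the 15 request-specific items (Table 4.3).
    # Only the endpoints appear in the text; the middle labels are assumed.
    INFLUENCE = ["greatly reduces", "somewhat reduces", "has no effect",
                 "somewhat increases", "greatly increases"]

    df = pd.read_csv("responses.csv")  # hypothetical export of the raw answers

    # Convert one labelled column to ordered integer codes 1..5, preserving the
    # ordinality that the later analyses rely on. (Labels not in the category
    # list would become code 0 here and would need separate missing-data handling.)
    df["REPUTATION"] = pd.Categorical(df["REPUTATION"], categories=IMPORTANCE,
                                      ordered=True).codes + 1

The same mapping would be applied to every item column before the analyses described next.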
The estimated completion time was less than 15 minutes, which we mentioned in the recruitment letter in the hope that this would encourage responses. We attempted to minimize social desirability bias (the tendency of participants to claim socially desirable traits), which can arise either through other-deception, where respondents purposely misrepresent the truth to make a positive impression, or through self-deception, where respondents believe their responses to be true even though they are actually inaccurate (Nederhof, 1985). Other-deception was mitigated through anonymity and self-administration (Nederhof, 1985). To cope with self-deception we used socially neutral wordings as much as possible. For example, PREPARING FOR HIGHER ROLES was worded in a way that combined both the socially desirable motive of gaining experience and the less socially desirable motive of a desire for power, as was INFLUENCING MY FIELD, which could be interpreted either as pushing an agenda (negative sense) or as contributing to the betterment of the community (positive sense). Like other methods of coping with social desirability bias, this approach is not free from problems (Nederhof, 1985); we did in fact discover that our wording was sometimes slightly confusing for respondents who could see two interpretations of a question.

4.3.1 Participants
Survey participants (n=307) were recruited by an invitation email to all 1952 reviewers who had reviewed at least one submission for CHI 2011. To encourage participation, invitations were sent by the SIGCHI Conference Management Committee on behalf of our research team. Over the 60 days the survey was open, 307 reviewers participated (a 16% response rate). Only one questionnaire failed our validity check and was not included or counted in our analysis. Table 4.1 summarizes the profiles of the 307 participants.

4.3.2 Data analysis
We performed three analyses using data from the survey questionnaires.

Analysis 1: Relative Importance of Motivations. We collected data through two different opinion measurement strategies: (1) absolute ratings of general motivations for reviewing and of motivations for accepting a specific review request, and (2) rankings of the top three general reviewing motivations. We analyzed relative importance by applying Tukey's HSD to the absolute ratings.

Analysis 2: Effect of Experience and Demographics on Motivations. To find out which of the background variables best predict reviewing motivations, we examined the effect of the six background variables REVIEWING EXPERIENCE, LEVEL OF INVOLVEMENT in peer review (reviewing roles taken), AREA OF EDUCATION, LAST EARNED DEGREE, POSITION (job title or function), and GENDER on each of the motivation variables using multiple ordinal logistic regression analyses. Categories with a low number of observations decrease the reliability of regression models. We therefore simplified the nine categories for POSITION to just five: faculty member, industry researcher, industry practitioner, student, or other. We similarly combined participants with a LEVEL OF INVOLVEMENT of 4 or 5 into a single group because only four participants had an involvement level of 5 (being a reviewer, a program committee member, a program committee chair, a journal editor, and an editor-in-chief). The simplified categories were used for all analyses. Because only 12 participants had a Bachelor's as their highest degree, for the predictor LAST EARNED DEGREE we merged them with those whose highest degree was a Master's.
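Continuing the hypothetical sketch above, Analysis 1's pairwise comparisons could be run along the following lines using the Tukey HSD implementation in statsmodels, after reshaping the coded ratings to long format. The item column names are illustrative stand-ins, not the study's actual variable names.

    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Illustrative stand-ins for the 12 coded general-motivation columns.
    items = ["GIVING_BACK", "AWARENESS", "REPUTATION"]  # ...and the other nine

    # One row per (participant, motivation item) rating.
    long = df[items].melt(var_name="motivation", value_name="rating").dropna()

    # Tukey's HSD compares mean ratings across every pair of items while
    # controlling the family-wise error rate (cf. Tables 4.4 and 4.5 below).
    print(pairwise_tukeyhsd(endog=long["rating"].astype(float),
                            groups=long["motivation"], alpha=0.05).summary())

One of Analysis 2's per-motivation models might then be sketched with statsmodels' OrderedModel (an ordinal logistic regression, available in statsmodels 0.12 or later). The predictor column names are again hypothetical, and dummy coding with drop_first is a simplification: the study used the largest category as the baseline, which would require reordering each categorical variable's levels first.

    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    y = df["REPUTATION"].astype(int)  # one ordinal outcome, codes 1..5

    # Ordinal predictors stay as integer codes; categorical ones are dummy-coded.
    X = pd.get_dummies(
        df[["experience", "involvement", "degree", "position", "area", "gender"]],
        columns=["position", "area", "gender"], drop_first=True).astype(float)

    fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
    print(fit.summary())

The p-values collected across these exploratory models could then be adjusted for false discovery rate, for example with statsmodels' multipletests(pvals, method="fdr_bh"); note that this Benjamini-Hochberg procedure is a common stand-in, whereas the study used the method of Storey et al. (2004).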
We treated all 27 Likert-scale response variables (outcomes) as ordinal in multiple ordinal logistic regression analyses to assess the effects of the six predictor variables. Treating the response variables as continuous could have simplified the interpretation of the results, but because doing so produced several differences in the resulting regression models, we decided to treat only some of the ordinal predictors (and not the outcomes) as continuous.

Three of the six background demographic and experience predictor variables were ordinal; the other three were categorical. Treating ordinal predictor variables as continuous can facilitate interpretation and increase statistical power (Agresti, 2007, p. 119). We considered the possibility of treating two ordinal predictors (years of REVIEWING EXPERIENCE and LEVEL OF INVOLVEMENT) as continuous. Whenever this simplification of the predictor variables resulted in a different regression model (as determined by an ANOVA with a threshold of p<.1), we took the safer interpretation and used the more complex model (Agresti, 2007, p. 118). For the categorical variables POSITION and AREA OF EDUCATION, we used the largest category as the baseline for regression to assess the relative significance of the different categories. The most common POSITION was faculty member, and the most common AREA OF EDUCATION was Computer Science and Engineering. Variance inflation factors (VIFs) were used to examine multicollinearity; none of the predictors exceeded the acceptable level (VIF<5) (Hair, Black, Babin, & Anderson, 2009; Menard, 2002). Including interaction terms did not improve the fit of any of the regression models. Due to the exploratory nature of this study, p-values were adjusted using a false discovery rate method (Storey, Taylor, & Siegmund, 2004).

Analysis 3: Factor Analysis. Although volunteering to be a reviewer and considering it part of one's scholarly activities is a different behavior from accepting or declining a specific review request, we suspected that some of the underlying motivational factors would be the same, so we included all 27 questions in a single principal axis factor analysis.

Factor analysis was used to identify the potential dimensions of reviewing motivations. It is common practice to apply factor analysis methods based on Pearson correlations to identify the factors underlying a set of Likert-scale questions. However, this can lead to incorrect conclusions, such as overestimating the number of factors that are required (Olsson, 1979b). We therefore used polychoric correlation, which is suggested for computing the correlation between ordinal variables with underlying continuous variables (Olsson, 1979a). The polychoric correlation matrix was calculated, and the overall significance of the correlation matrix was confirmed using a Bartlett test (p<0.001). Factorability of the items was assessed using the measure of sampling adequacy (MSA), which assesses how well a variable correlates with the other variables. The only item with an unacceptable MSA (MSA < 0.5) was FAMOUS AUTHORS. This item was omitted and the MSA values were recalculated. The MSA for every remaining item was in the acceptable range, and the overall MSA was 0.64. A high negative correlation between the desire for a SHORTER PAPER and the desire for a LONGER PAPER resulted in one factor just for the length of the paper. This was not desirable for our purposes, so we decided to include only one of these two items in the factor analysis.
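For readers unfamiliar with these techniques, the two core computations can be sketched compactly. The Python sketch below is an illustrative reconstruction under stated simplifications, not the authors' code: it estimates a pairwise polychoric correlation by a two-step approach (thresholds from the marginal proportions, followed by a one-parameter likelihood search per pair of items) and then applies iterated principal axis factoring to the resulting correlation matrix. The extraction, retention, and rotation choices are discussed in the remainder of this section.

    import numpy as np
    from scipy.stats import norm, multivariate_normal
    from scipy.optimize import minimize_scalar

    def thresholds(v, levels):
        # Normal quantiles of the cumulative marginal proportions of codes 1..levels.
        # (Simplification: assumes every level is observed at least once.)
        p = np.cumsum(np.bincount(v, minlength=levels + 1)[1:])[:-1] / len(v)
        return np.concatenate(([-8.0], norm.ppf(p), [8.0]))  # +/-8 stands in for infinity

    def polychoric(x, y, levels=5):
        """Two-step ML estimate of the polychoric correlation between two
        ordinal variables coded 1..levels (after Olsson, 1979a, simplified)."""
        cx, cy = thresholds(x, levels), thresholds(y, levels)
        counts = np.zeros((levels, levels))
        for a, b in zip(x, y):
            counts[a - 1, b - 1] += 1

        def neg_loglik(rho):
            mvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
            ll = 0.0
            for i in range(levels):
                for j in range(levels):
                    if counts[i, j] == 0:
                        continue
                    # Probability mass of cell (i, j) under the bivariate normal.
                    pij = (mvn.cdf([cx[i + 1], cy[j + 1]]) - mvn.cdf([cx[i], cy[j + 1]])
                           - mvn.cdf([cx[i + 1], cy[j]]) + mvn.cdf([cx[i], cy[j]]))
                    ll += counts[i, j] * np.log(max(pij, 1e-12))
            return -ll

        return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x

    def principal_axis(R, n_factors, iters=50):
        """Iterated principal axis factoring of a correlation matrix R, starting
        from squared multiple correlations as initial communality estimates."""
        comm = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
        for _ in range(iters):
            Rr = R.copy()
            np.fill_diagonal(Rr, comm)
            vals, vecs = np.linalg.eigh(Rr)
            top = np.argsort(vals)[::-1][:n_factors]
            load = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
            comm = (load ** 2).sum(axis=1)
        return load  # unrotated loadings; an oblique rotation would follow

Building the full item-by-item matrix is then a double loop of polychoric(items[:, i], items[:, j]) over item pairs. In practice one would reach for a vetted implementation, such as R's psych package; the sketch is only meant to make the computations concrete.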
In order to decide which item to keep for the analysis, we compared the discrimination power of the two items. Based on the type of scale used and its response format, Samejima's (1969) graded response model was applied, which revealed that the item asking about the desire for a SHORTER PAPER had slightly higher discrimination power (0.147 vs. 0.123).

The next step was to select the right factor extraction method. We followed Costello and Osborne's (2005) recommendation to use principal axis factoring to avoid normality assumptions. In order to decide the number of factors to retain, we applied several methods, including parallel analysis (which suggested a maximum of 12 factors), Cattell's scree test (10 factors), the Kaiser rule (10 factors), and Very Simple Structure (6 factors) (Revelle & Rocklin, 1979; Zwick & Velicer, 1986). We looked at the structures suggested for 6 to 12 factors. Ultimately it appeared that 10 factors could be extracted to achieve a clean structure. While it is generally preferable to retain too many factors rather than too few (Fabrigar, Wegener, MacCallum, & Strahan, 1999), extracting more than ten factors did not result in additional clarity.

Factor rotation methods are used to simplify the factor structure and facilitate its interpretation. The use of oblique rotations is suggested to reveal the latent structure more accurately (Costello & Osborne, 2005). After experimenting with both orthogonal (varimax) and oblique (promax and direct oblimin) rotations, direct oblimin rotation was selected based on the clarity and ease of interpretation of the suggested structure. Factor loadings below 0.3 were eliminated, as suggested in the literature (Hair et al., 2009). We experimented with various other rotations, factor extraction methods, and correlation methods and found that the results were fairly consistent each time. We ultimately made the decisions outlined above based on their statistical appropriateness, best practices, and the clarity and ease of interpretation of the results. Our factor analysis resulted in ten factors that accounted for 50% of the total variance among the items included in the analysis (Appendix C).

4.4 Results
We analyzed general motivations for reviewing and motivations for accepting a specific reviewing request, and we conducted a factor analysis on all of the motivations.

4.4.1 Relative importance of general motivations for participating in the peer-review process
Interestingly, each motivation was considered very or extremely important by over a quarter of the participants (but not always the same quarter), as shown in Figure 4.1. Two of the top three reasons for reviewing (Figure 4.2), GIVING BACK to the community and AWARENESS of new research, had previously been identified as top motivations by Snell and Spencer (2005), Ware and Monkman (2008), Kearney et al. (2008), and Francis (2013). Table 4.4 shows a pairwise comparison of the importance of reasons for reviewing.

Figure 4.1 General motivations for reviewing, sorted by frequency of being extremely or very important.
Encouraging quality            A
Giving back                    A B
Awareness                        B C
Gate keeping                       C D
Improving the work                 C D
Maintaining reputation             C D
Influencing my field                 D
Enjoying critical reading            D
Learning through the process           E
Learning through reflection            E
Reluctance to say no                   E
Preparing for higher roles             E
Table 4.4 Pairwise comparisons of general motivations for reviewing, using Tukey's HSD test. Motivations followed by the same letter are not significantly different from each other (Piepho, 2004).

Figure 4.2 Participants' top three general motivations for reviewing, sorted by frequency of being participants' top motivation.

Participants were asked to name other general motivations for reviewing. In addition to pointing out variations of the 12 general motivation items, participants mentioned the following: reviewing being a tenure requirement (four participants), and three more motivations that were each mentioned by a single participant: empathy with ACs, training graduate students (who collaborate in reviewing), and serendipity.

4.4.2 Relative importance of motivations for accepting a specific reviewing request
Figure 4.3 shows the distribution of ratings for each of the 15 potential motivations for accepting individual review requests. Table 4.5 shows pairwise comparisons of motivations for accepting a review request.

Figure 4.3 Motivations for accepting a review request, sorted by frequency of being reported to increase motivation.
In addition to pointing out variations of the motivations included in the questionnaire, participants mentioned the following: novelty, quality, or importance of the paper (15 participants), monetary incentives (7 participants), acknowledgment and feedback from the editor/AC (6 participants), being able to see other reviews to compare and learn (4 participants), knowing that other reviews’ quality is high (3 participants), openness (3 participants), if a paper cites the reviewer’s work (3 participants), receiving feedback from authors and possibility of interaction with them (3 participants), paying back a favor (2 participants), and 15 more motivations that were each mentioned by a single participant – public acknowledgement, receiving personal email before the automated one, clearly knowing the expectation of the conference, multiple rounds of reviewing to see the effect of review on the paper, helping junior authors or non-native authors, knowing that the process is fair, being anonymous, prospect of a dialogue with other reviewers, “bonus 164  points,” guidance, simple process, templates and structure, not having to use a specific template for reviews, being able to choose papers, and the editor's wording suggesting being desperate. Participants were asked to name additional demotivators other than the lack of the aforementioned motivations. Participants noted the following: poor English (14 participants), subsequent submission of a previously reviewed paper without revising it (4 participants), last minute requests (3 participants), too much structure or impersonality in the reviewing process (3 participants), high acceptance rates (3 participants), and nine more demotivators that were each mentioned by a single participant – too much math, too much theory, papers by people who don't contribute back to the system, having to discuss with other anonymous reviewers (i.e. not knowing the level of expertise and experience of the other reviewers was a demotivator), too many rounds of reviewing required, disagreeing with the review policies/guidelines, receiving unfair treatment from the venue on one’s own paper(s), receiving frequent rejections from the venue, and receiving poor reviews from the venue on one’s own paper(s). 4.4.3 Factors underlying motivational variables We analyzed the factorial structure of reviewing motivations using exploratory factor analysis, specifically, principal axis analysis of polychoric correlations of the motivational variables. Following the literature (Hair et al., 2009) we considering loadings that were less than 0.4 to be weak and loadings less than 0.3 were ignored. Weak loadings in a factor suggest that the factor might be relatively less coherent than other factors. Our factor analysis indicated ten factors. Appendix C provides details of the factor analysis results. We provide in this section quantitative and qualitative descriptions of the motivation items used in the factor analysis, grouped by the 165  factors identified in that analysis. We adopt the convention that motivational variables (items) are in ITALIC SMALL CAPS and factors are in BOLD ITALIC SMALL CAPS. For each factor, the loadings for items comprising it are given within parentheses after the item names.  FACTOR 1. 
FACTOR 1. LEARNING: Three items loaded onto this factor: learning to write more effectively through learning about the review process (LEARNING THROUGH THE PROCESS, 0.87) and through reflecting on papers (LEARNING THROUGH REFLECTION, 0.80), and PREPARING FOR HIGHER ROLES in the review process (0.36).

LEARNING THROUGH REFLECTION was rated highly (i.e., as extremely or very important) by 32% of the participants, and was the top motivation for 3% of them. One participant who rated it as extremely important wrote that "This is by far, the most important motivation for me. Reviewing encourages one to be critical which feeds back into one's own work." Another noted that "Being able to read the reviews of others is key … this process is not especially beneficial without being able to compare my review to those of others." Seven participants mentioned that it was a more important motivation earlier in their careers. Four participants pointed out that the high number of badly written papers makes reviewing less useful for this purpose. However, two of them still considered it an opportunity for learning; one said "Reading badly written papers really makes me try to avoid writing such papers myself, as I realize again and again how painful they are to review." One participant, who considered this motivation not at all important, explained that "it's hard to transfer critiques of others' work into ability to do better oneself."

LEARNING THROUGH THE PROCESS was rated highly by 31% of the participants, and was the top motivation for 4%. Seven participants mentioned that it was a more important motivation earlier in their careers. One wrote "I actually believe that this IS very important, but at this point, I think I know all that [I] care to about the review process!" One participant who rated this motivation as extremely important mentioned that "This was more true for the first few times I reviewed. Now, I am more interested in learning what other members in my community think about specific pieces of research." A participant who rated it as very important wrote "I didn't intend this when I started out but I certainly have gained perspective by 'seeing how the sausage is made'." Two participants talked about learning the norms of the community; one wrote "reading other reviewers' reviews and discussions on the reviews helps to understand what is considered a flaw in a paper and, in a way, help see one's paper with someone else's eyes..." while the other preferred to avoid conformity: "I try not to conduct my research so that I can write to fit a pre-described format or style, although that may be more effective (but less innovative IMO)." PREPARING FOR HIGHER ROLES is discussed under FACTOR 2.

FACTOR 2. REPUTATION: Three items loaded onto this factor: maintaining and establishing REPUTATION (0.84), gaining experience and PREPARING FOR HIGHER ROLES in the review process (0.43), and INFLUENCING MY FIELD of research and community (0.42).

Maintaining and establishing REPUTATION was rated highly by 59% of the participants, and was the top motivation for 9%. A participant who rated this as very important noted that "when someone thinks you will be a good fit, it seems like a good opportunity to help. The person collecting reviews may notice my high quality review." Another participant mentioned the possibility of adding reviewing to his curriculum vitae. Two participants who did not feel that reviewing contributed to their reputations pointed out the anonymity and confidentiality of reviews.
PREPARING FOR HIGHER ROLES was rated highly by 29% of the participants, and was the top motivation for 8%. One participant who rated this motivation as extremely important said "I hope to be an AC at some point" and another who rated it as very important noted that "it is unclear how one progresses 'up the chain'." Two participants who rated it as not at all important mentioned their experience with higher positions in the process; one wrote "I've been to the mountain already, there's nothing there."

INFLUENCING MY FIELD of research was rated highly by 49% of the participants, and was the top motivation for 8%. Two participants pointed to the role of ACs, saying that "this is better achieved from the position of program committee member than reviewer," and that "Chairs often over-rule even unanimous reviewer recommendations." Another said "I see the role of reviewers as helping to shape the field." One participant similarly wrote that "As sometimes the reviews can also depend on taste and opinion to some extent, I want to put in my vote for the kind of work that I see as important." While the wording of the question was intended to be neutral, some participants thought it had a negative connotation and rated INFLUENCING MY FIELD as slightly or not at all important; one called it "dishonest and sleazy," and another considered it to have an "ego-centric connotation."

FACTOR 3. QUALITY CONTROL AND INFLUENCE: Three items loaded onto this factor: GATE KEEPING (0.82), ENCOURAGING QUALITY (0.71), and INFLUENCING MY FIELD of research and community (0.35).

GATE KEEPING, or exposing researchers only to valuable research, was rated highly by 55% of the participants, and was the top motivation for 6%. One participant commented "It's not only other researchers I am concerned about but also students. I do not want students to get the wrong impressions about human factors in computer systems or about the profession's standards for acceptable research practices." Another pointed out that "This is important because conferences are expensive and should meet a certain quality." One participant who did not consider it to be an important motivation mentioned that "I don't see myself as a filter for protecting other researchers." Another pointed out that "what is valuable is sometimes a matter of opinion as well, which often shows in how reviews on a paper can really differ a lot." Two participants noted that the value of papers cannot be determined at the time of review; one wrote "I don't really think it's my place to decide what's 'valuable', I decide what is good science and well written, whether that's valuable is up to the reader."

ENCOURAGING QUALITY research was rated highly by 83% of the participants and was the top motivation for 12%. One participant who disagreed with the importance of this motivation stated that "good research is entirely subjective." INFLUENCING MY FIELD is discussed under FACTOR 2.

FACTOR 4. PRESTIGE/SIGNIFICANCE OF THE REVIEW: Being asked by a FAMOUS AC (0.71) and the desire for reviewing requests from a HIGH-PROFILE VENUE (0.67) loaded positively onto this factor.

Receiving the review request from a FAMOUS AC was reported to increase motivation by 87% of the participants.
Five participants mentioned that they feel honored to receive such a request; one wrote that "When I know my review will be read by someone I greatly respect and they will know that I wrote it, that is a strong motivator." Two others mentioned that they feel more obliged in this situation. The participants who did not consider this to have any effect on them mentioned reasons such as not caring about impressing others, caring only about how qualified they themselves feel they are, or that it depends on who the person is.

Receiving the review request for a HIGH-PROFILE VENUE was reported to increase motivation by 84% of the participants. Two participants mentioned that the submissions tend to be of higher quality, and two others mentioned that it is an honor. Another participant noted that "It increases my motivation since I know that the paper will [be] read [by] a wider audience if accepted."

FACTOR 5. SOCIAL OBLIGATION: Two items loaded onto this factor: having higher motivation when the associate chair is a friend (FRIEND AC, 0.87), and RELUCTANCE TO SAY NO (0.48).

Receiving the review request from a friend (FRIEND AC) was reported to increase motivation by 89% of the participants. Four participants mentioned reciprocity as their reason; one wrote: "In the economy of reviewing, it means I may be able to armtwist that researcher to review a paper for me in the future." Others mentioned reasons such as helping friends, sympathy, and the difficulty of refusing a friend's request.

RELUCTANCE TO SAY NO to review requests was rated highly by 26% of the participants, and was the top motivation for 8%. Two participants who rated this motivation as very or extremely important mentioned reputation concerns; one mentioned that "Turning down review requests can also get you branded," and the other wrote "I think the editors/P.C. members will think I am foolish if I do not accept (it's a great learning and service opportunity) and may not want to ask me to review again if I decline this time around. I want them to know they can depend on me." Three participants mentioned that it depends on who asks; one wrote "sometimes [I] need to repay a favor or might want a favor from this person in the future." Three participants mentioned the importance of helping the process; one wrote "One often feels obliged to take on reviews to help people out."

FACTOR 6. SCIENTIFIC ABILITY AND MATCH WITH RESEARCH INTERESTS: Two items loaded onto this factor: BEING EXPERT in the specific field of the paper (0.72) and RELEVANCE TO OWN RESEARCH (0.31).

BEING EXPERT in the topic of the paper was reported to increase motivation by 97% of the participants. Six participants mentioned that it makes it easier to understand and review the paper, and four mentioned that it helps them be more confident in helping the authors or making meaningful comments. Two participants mentioned the difficulty of finding expert reviewers; one said "It also greatly increases my guilt to say no to a review request when I know I am one of the most appropriate people who could have been asked." On the other hand, another participant noted that "sometimes I grow tired of a subject and prefer to review in a topic that I want to learn about." RELEVANCE TO OWN RESEARCH is discussed under FACTOR 7.
All of these items reflect a desire for accepting reviews that require less effort or impose less inconvenience.  Relevance of the paper to the reviewer’s current research (RELEVANCE TO OWN RESEARCH) was reported to increase motivation by 98% of the participants, which corroborates the findings of Tite and Schroter (2007). Two participants emphasized that a paper should be related to their current work, not their past work: “Close relation of the paper and my ‘future or ongoing’ research will increase my motivation to review the paper, but close relation of the paper and my ‘past’ research will decrease my motivation to review the paper.” Four participants mentioned that the goal is to stay up-to-date, and two mentioned that it makes it easier to understand and review the paper. On the other hand, one participant noted that “I am uncomfortable when a paper is too close to my current research, because I don’t want to be influenced by the ideas of others before they are published (and therefore citable).” Having FREE TIME was reported to increase motivation by 65% of the participants. Thirty-two out of 37 participants who commented on this question indicated having no free time and suggested using different wording for the question, such as using the term “flexible” instead of “free” or 172  using a converse wording such as not “feeling overcommitted.” One wrote “I don’t know that anyone in academia ever feels like they have ‘free time’. As commitments come along, you carve out time for them.” One of the participants who thought having free time increases his motivation mentioned that “Certainly, reviewing is one of the first things to be dropped when pressed for time.” On the other hand, one who thought it does not have any effect wrote “I tend to think that reviewing for conferences is a great opportunity, probably because I am early in my career and lack experience. So, reviewing usually gets placed at the top of my priorities list.” Close relation of the paper to another paper that the reviewer has read (FAMILIAR PAPER) was reported to increase motivation by 60% of the participants. Six participants commented that they have not had such an experience. Two others who found it motivating mentioned that it facilitates the review, and two others mentioned that it helps them to write a more considered review. Two participants said that it depends on how good the original paper was and how incremental the new one is. The length of the paper being relatively short (SHORTER PAPER) was reported to increase motivation by 51% of the participants and to decrease it by 2%. On the other hand, a paper being long was reported to increase motivation by 2% of the participants, and decrease it by 52%. Two participants noted that they prefer shorter papers because they are already too busy, and two explained it was “Not because I am lazy, I simply think good short papers are more elegant and often deliver more value.” One participant said that this is his reason for not reviewing journal papers. Two participants pointed out that reviewing long papers is so much work that the reviewer should be acknowledged. One wrote “Reviewing takes lots of time and if I do it right I 173  feel that I should be almost a co-author or get acknowledged in the paper” and the other pointed out that “If it’s too long relative to papers submitted for that venue, I tend to assume it’s poorly written or premature. 
I don't like authors using reviewers as editors (or collaborators!)." Two others preferred longer papers, because they contribute more and because it is easier to take reviewers' comments into account in revisions (in contrast with short papers).

Receiving a request to review a WELL-WRITTEN PAPER was reported to increase motivation by 82% of the participants. Ten participants pointed out that they do not know whether a paper is well written when deciding to accept a review request, and four others mentioned that well-written papers are rare. One participant noted that "I also feel strongly that those papers that are not well-written may need the most attention, so I would not want to neglect the authors."

FACTOR 8. CONTENT BENEFIT: Three items loaded onto this factor: ENJOYING CRITICAL READING (0.60), the desire to know what's new in the field (AWARENESS, 0.36), and not caring about the (long) length of the paper (SHORTER PAPER, -0.47).

ENJOYING CRITICAL READING was rated highly by 46% of the participants, and it was the top motivation for 8% of them. One of the participants who rated it as very important said that "I feel that regularly reviewing keeps one's critical skills sharp." Two participants (one who rated it moderately important, the other slightly important) mentioned that being busy inhibits the enjoyment; one wrote "I'm so overwhelmed with reviewing & other responsibilities that it is always a burden. I do often enjoy it, but it always feel like there are more important things I'm not doing when I'm reviewing." Two other participants who rated it as slightly or not at all important mentioned that reading accepted papers can serve this purpose. Three participants noted that they enjoy it when they receive a good paper for review.

AWARENESS of new research was rated highly by 63% of the participants, and it was the top motivation for 14%. One participant noted that "Being a program committee member gives an even better vantage point to see what's new in the field." Among the participants who considered this motivation of little importance, two pointed out that reading published materials would be more helpful for this purpose: "you get that anyway from actual publications, a few months later, and with a broader view." SHORTER PAPER is discussed under FACTOR 7.

FACTOR 9. RECOGNITION OF CONTRIBUTION: Two items loaded onto this factor: knowing that the review will influence the decision about the paper (INFLUENCING DECISIONS, 0.84) and knowing that the authors will apply the suggestions (AUTHORS LISTEN, 0.48). It reveals a desire for recognition of the reviewer's effor