DO INFANTS PREFER PROSOCIAL OTHERS? A DIRECT REPLICATION OF HAMLIN & WYNN (2011)

by

MIRANDA JANE SITCH

B.Sc., The University of Washington, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Psychology)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

September 2018

© Miranda Jane Sitch, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Do infants prefer prosocial others? A direct replication of Hamlin & Wynn (2011), submitted by Miranda Sitch in partial fulfillment of the requirements for the degree of Master of Arts in Developmental Psychology.

Examining Committee:
Kiley Hamlin, Developmental Psychology (Supervisor)
Darko Odic, Developmental Psychology (Supervisory Committee Member)
Andrew Baron, Developmental Psychology (Supervisory Committee Member)

Abstract

Within the last five years, the social sciences, especially psychology, have seen problems with replicability and reproducibility. A growing body of evidence suggests that low-powered studies, undisclosed statistical flexibility, and a lack of pre-specified study standards are all contributing factors to a low rate of replicability. Within the realm of infant social evaluation, a topic of both theoretical interest and empirical controversy, both replications and non-replications exist. Given this, and the movement toward replication in psychology in general, this paper will present results from a pre-registered direct replication of Hamlin & Wynn's (2011) "box scenario," which examines whether preverbal infants prefer prosocial to antisocial others. The box scenario is a puppet show that involves three puppet characters: a prosocial dog ("the Helper"), an antisocial dog ("the Hinderer"), and the protagonist, a horse. The scene begins with the Helper and Hinderer puppets sitting in the corners of the stage. In the centre of the stage is a clear plastic box that contains a brightly coloured rattle. The horse enters the stage, runs over to the side of the box, leans down to peer inside, and pops back up with excitement. The horse then jumps on top of the box and attempts to open it but is unable to do so. On the final attempt, the Helper or the Hinderer intervenes. During the prosocial event, the Helper assists the horse by pulling the lid open. Alternatively, in the antisocial event, the Hinderer jumps on top of the box and slams it shut, keeping the horse from getting the toy. The prosocial and antisocial events are presented in alternation until the infant is habituated. After watching the show, the infant is presented with the Helper and the Hinderer and asked to pick between them. Our pre-registered sample did not replicate the original finding. However, when including all infants tested in the box scenario, the finding did indeed replicate. Our findings add to the scientific understanding of infant social evaluation and provide an important opportunity to add to the replicability movement, thereby helping foster a more robust science and better confidence in our results.

Lay Summary

Within the last five years, social sciences, especially psychology, have seen problems with replicability and reproducibility.
A growing body of evidence suggests that low powered studies, undisclosed statistical flexibility and lack of pre-specified study standards all contribute to a low rate of replicability. Within the realm of infant social evaluation, a topic of both theoretical interest and empirical controversy, both replications and non-replications exist. Given this, and the movement toward replication in psychology in general, this paper will present results from a pre-registered study that directly replicates Hamlin & Wynn’s (2011) “box scenario”, which examines whether infants prefer prosocial to antisocial others. Our pre-registered sample did not replicate the original finding. However, when including all infants tested in the box scenario, the finding did indeed replicate. Overall, our results provide a better scientific understanding of infant social evaluation and an important opportunity to add to the replicability movement.         vi Preface This thesis is original, unpublished, independent work by the author, Miranda Sitch.                vii Table of Contents Abstract ......................................................................................................................................... iii Lay Summary.................................................................................................................................. v Preface........................................................................................................................................... vi Table of Contents.......................................................................................................................... vii   List of Figures ............................................................................................................................. viii Acknowledgements......................................................................................................................... x Introduction………………………………………………………………………………………. 1 Methods for Pre-registered Sample………………………………………………………...…... 11 Results and Discussion for Pre-registered Sample……………………………………...……… 16 Methods for Entire Sample……………………………………………………………...……… 18 Offline Coding for Entire Sample………………………………………………………………. 20 Results and Discussion for Entire Sample……………………………………………………… 23 General Discussion …………………………………………………………………………….. 32 Bibliography………………………………………………………………………………….… 37    viii List of Figures Figure 1. Examples of helping and hindering events……………………………………..…..,... 23 Figure 2. Example of a clear choice……………………………………………….................…. 24 Figure 3. Results of manual choice test phase for pre-registered sample………………….….... 25 Figure 4. Examples of jump size ratings…………………………………………….………...... 31 Figure 5. Results of manual choice test phase for entire sample……………………………….. 33 Figure 6. Frequency distribution for choice and rating for helper’s running speed..................... 34 Figure 7. Frequency distribution for choice and rating for hinderer’s running speed………….. 34 Figure 8. Frequency distribution for choice and rating for protagonist’s running speed………. 35 Figure 9. Frequency distribution for choice and size of the box opening during helping……… 35 Figure 10. Frequency distribution for choice and size of the box opening during hindering…... 36 Figure 11. Frequency distribution for choice and protagonist’s last two struggles…...………... 36 Figure 12. Frequency distribution for choice and protagonist’s jump size………...…………… 37 Figure 13. 
Frequency distribution for choice and hinderer’s jump size…………..…..………... 37 Figure 14. Frequency distribution for choice and helper’s jump size for the entire sample...….. 38 Figure 15. Frequency distribution for choice and helper’s jump size………...………………… 39  ix Figure 16. Poster parameter correlation plot for helper’s jump size and bias for the helper...…. 40 Figure 17. Trace plot for infants’ bias for the helper………………………...……………….… 41                x Acknowledgements Thank you to the parents and infants who participated, and the members of the Centre for Infant Cognition at the University of British Columbia. I also want to thank my advisor, Kiley Hamlin, for her support and guidance. Lastly, thank you to all my fellow lab mates and graduate students for helping see these studies through.     1 Introduction Being able to successfully navigate our complex social world requires determining who we should associate with and who are better to avoid. Accordingly, it best to interact with those who cooperate, who are “prosocial” and stay away from those who do not, who are “antisocial”. When do we learn to identify and evaluate potential social partners? Researchers have been investigating this question for over a decade. This line of work has shown that, infants distinguish prosocial from antisocial agents (Behne, Carpenter, Call, & Tomasello, 2005; Dunfield & Kuhlmeier, 2010; Premack, 1990) and prefer prosocial to antisocial others as early as 3 months after birth (Hamlin & Wynn, 2011; Hamlin, Wynn, & Bloom, 2007; Hamlin, Wynn, & Bloom, 2010; for a recent review see Holvoet, Scola, Arciszewski, & Picard, 2016 and Margoni & Surian, 2018).  One of the first studies exploring infants’ social evaluations of prosocial and antisocial others was conducted by Hamlin et al. (2007) using the “hill paradigm”, a live puppet show where a protagonist (circle with eyes) attempts to climb a hill and is aided by the “helper” (either triangle or square with eyes) who helps push the protagonist up the hill and impeded by the “hinderer” (either triangle or square with eyes) who pushes the protagonist down the hill. After being habituated to alternating helping and hindering events, infants were presented with the helper and hinderer and asked to choose one. Infants reliably reached for the helper over the hinderer, suggestive that they positively evaluated the helper and/or negatively evaluated the hinderer.  Further studies used the hill paradigm to explore whether infants negatively evaluate hindering, positively evaluate helping or are capable of doing both, by pairing a hinderer or a helper with a neutral character. Results indicated that by 6 months of age, infants prefer helpful      2 to neutral but neutral to unhelpful individuals, suggestive that both positive and negative evaluations emerge within the first year (Hamlin et al., 2007). Researchers have also assessed infants’ sensitivity to prosocial and antisocial actions in other helping and hindering scenarios aside from the hill paradigm (Hamlin & Wynn, 2011). In the” box show” a protagonist puppet attempts to open a clear box with an attractive toy inside and is aided the prosocial puppet who helps open the box and the impeded by the antisocial puppet who shuts the box. In the” ball show” a protagonist puppet accidentally drops their ball and is aided by a prosocial puppet who returns the ball and impeded by an antisocial puppet who steals the ball away. 
Both manual choice tasks and looking time measures revealed that infants preferred the prosocial over the antisocial puppet in these new contexts. Furthermore, when infants were shown physically matched control conditions of the box and ball puppet shows, they displayed no preference for a prosocial or antisocial character when the protagonist was an inanimate entity (Hamlin & Wynn, 2011).  Outside of the domain of helping versus hindering, research suggests that infants’ evaluations apply to characters engaging in several distinct kinds of social scenarios and experimental paradigms, including comforting others versus aggressing against them and dividing resources fairly versus unfairly (Buon et al., 2014; Geraci & Surian, 2011; Ziv & Sommerville, 2017). For example, one study examined 10-month-old infants’ preference for an aggressor versus a victim after watching physical interactions where the aggressor repeatedly attacked the victim (Kanakogi, Okumura, Inoue, Kitazaki, & Itakura, 2013). They found that infants significantly preferred the victim over the aggressor, as well as a neutral character over an aggressor. In control conditions where the characters performed the same physical acts, but did      3 not make contact, infants did not display a preference, suggesting that, as in helping/hindering studies, infants’ preferences are driven by the social aspects of the events. It may be possible that infants simply like those who make good things happen and dislike those who make bad things happen. However, this isn’t always the case, there is evidence to suggest that infants’ also take into account the mental state of the prosocial and antisocial characters (Choi & Luo, 2015; Kanakogi et al., 2017; Hamlin, Ullman, Tenenbaum, Goodman, & Baker, 2013; Woo, Steckler, Le, & Hamlin, 2017). For example, Hamlin (2013) examined infants’ preferences for characters who made failed attempts to help or hinder in a box show scenario. 8-month-olds preferred the helper (positive intentions) over the hinderer (negative intentions) regardless of the outcome thereby demonstrating a sensitivity to the characters’ mental states. Taken together, this research would suggest that infants’ preference for prosocial agents is robust. However, within the realm of infant social evaluation, a topic of both theoretical interest and empirical controversy, both replications and non-replications exist.  All positive evidence suggests that infants prefer prosocial others and a recent meta-analysis concurs with this view (Margoni & Surian, 2018). This meta-analysis called attention to the fact that many of studies put forth were conducted by Hamlin and collaborators. To date, the Hamlin lab is only group that has carried out successful direct replications of their original studies investigating infants’ preference for prosocial others. For example, Steckler, Woo and Hamlin (2017) found that 9-month old infants were more likely to choose the prosocial puppet after being familiarized to the box show scenario used in Hamlin and Wynn (2011). Additionally, two other studies from Hamlin and colleagues found that infants preferred the helper over the hinderer after being exposed to the box show (Hamlin, 2013; Tasimi & Wynn, 2016) . Other labs have executed successful conceptual replications. For instance, Scola and      4 colleagues (2015) found that infants were more likely to choose the prosocial giver after watching an animated version of the ball show (see Hamlin & Wynn, 2011). 
Additionally, Chae and Song (2018) found that infants were more likely to prefer the helper over a neutral agent and neutral agent over a hinderer after watching a modified version of the hill paradigm.  Alongside these successful replications failures also exist, all of which do not come from Hamlin and colleagues. For example, Salvadori and colleagues (2015) attempted both a direct and a conceptual replication of the box show. Unlike Hamlin and Wynn (2011), they found that across both studies, 9-month-old infants were equally likely to pick the prosocial and antisocial puppets. In addition, Cowell and Decety (2015) exposed infants modified version of the hill paradigm and unlike Hamlin et al. (2007), found that  infants’ choices between the prosocial and antisocial characters did not vary from chance. Lastly, when Scarf and colleagues (2012) showed infants an adapted version of the hill paradigm, they found that in the absence of bouncing motion, there was not a significant preference for the helper over the hinderer. In the above studies, all but one had significant methodological differences which will be discussed below. Social evaluation has seen both successes and failures, which harkens to issues present in psychological research as a whole, specifically the occurrence of inconsistent replicability. Thus, the goal of this project reported here is to conduct a direct pre-registered replication of the box show paradigm (see Hamlin & Wynn, 2011) in attempt to assess the replicability of infants’ preference for prosocial others and more broadly, help to uncover and better understand replicability issues within psychological science and more specifically within the field of developmental psychology.  Within the last five years, the social sciences, especially psychology, have seen problems with replicability and reproducibility. For example, the seminal study conducted by the Open      5 Science Collaboration (OSC, 2015) found that after attempting to replicate 100 studies in social and cognitive psychology, 97% of which initially reported significant findings, only 36%  replicated successfully. These and other high-profile replication failures (e.g. Galak, LeBoeuf, Nelson, & Simmons, 2012) have led to wide-spread concern amongst psychologists, who wish to discover the roots of these failures in order to improve replicability rates going forward.  A growing body of evidence suggests that low-powered studies, lack of pre-specified study standards, undisclosed statistical flexibility and publication biases are all contributing factors to a low rate of replicability. Statistical power refers to the probability of rejecting a null hypothesis when it is false. Most measures in psychological research contain some random error that can lead to differences in the outcome (Frank et al., 2017).  For example, Stanley and Spence (2014) used computer simulations to create thousands of replications using the same participants, in which the only difference between each simulated replication result was random measurement error. They found that the replication findings varied considerably as a result of random error alone. When studies lack appropriate statistical power (e.g., small sample sizes), they become more vulnerable to spurious findings due this random error and consequently have a reduced chance of detecting a true effect (Button et al., 2013; Simmons, Nelson, & Simonsohn, 2011). 
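To make this point concrete, the following is a minimal simulation sketch in R (not taken from the thesis; the assumed "true" preference rate of 70% and the sample sizes are illustrative assumptions). It estimates how often a two-sided binomial test at alpha = .05 would detect a genuine 70% preference for one character at different sample sizes:

# Estimate power by simulating many studies of n infants, where each infant
# independently chooses the "helper" with probability true_rate.
set.seed(1)
power_for_n <- function(n, true_rate = 0.70, n_sims = 10000) {
  helper_counts <- rbinom(n_sims, size = n, prob = true_rate)
  p_values <- sapply(helper_counts,
                     function(k) binom.test(k, n, p = 0.5)$p.value)
  mean(p_values < .05)   # proportion of simulated studies reaching p < .05
}
sapply(c(16, 32, 64, 128), power_for_n)
# Small samples frequently miss an effect that is genuinely present; larger
# samples detect it far more reliably, even though random error never vanishes.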
This becomes problematic for developmental research because we often have small sample sizes because testing young participants is both cumbersome and costly. For example, infants can spontaneously disengage during study, they may unexpectedly become fussy and therefore unable to continue.  Secondly, a replication attempt may also fail due to differences in procedures, measures, or samples between the original and replication samples (Frank et al., 2017; Kessler & Meier,      6 2014).  These differences might be quite subtle and so difficult to pinpoint; for example, Harless (1992) found that just a slight difference in the presentation of a choice problem significantly changed the observed effect of X on Y.  This highlights the importance of exact (or direct) replications.  If exactly replicating the procedures of the original study fails to yield significant results, then it may be reasonable to conclude that the results of the original study are due to a type I error. However, this can be challenging because many labs employ similar methods but vary in their individual set-ups, how they are displayed across age-groups, the mediums used present stimuli (e.g. video versus live display), etc. Developmental populations are even more unpredictable than adult populations because they cannot be given explicit instructions to follow, making it all the more challenging to employ standardized methods. Lots of communication is implicit, experimenters may be “saying” different things. For instance, how good an experimenter is with children can impact how they interact with their participants and those who are less skilled may encounter more difficulty. Scientists strive to discover truths about the world and attempt to do so accurately, but inevitably errors are made. For instance, the pressure to compile data and build up one’s CV can cause researchers to neglect the quality of their science and instead focus on publishing studies, resulting in “questionable research practices” (QRPs). These QRPs can lead to improper statistical inferences like “p-hacking” (Simmons et al., 2011). P-hacking occurs when research scientists collect or select data or statistical analyses until significant results are found (Head, Holman, Lanfear, Kahn, & Jennions, 2015). An example of p-hacking would be collecting and analyzing multiple conditions, but only reporting those that yield significant findings. Conducting multiple analyses and only reporting significant results, would be another example of p-hacking. This kind of selective reporting paints an inaccurate picture, thus making      7 replication all the more challenging. Other practices that can compromise the integrity of the literature include but are not limited to “HARKing” (hypothesizing after results are known) (Kerr, 1998). Presenting a hypothesis that is based on or informed by one’s results prevents valuable information from being shared and thereby also increases the likelihood of failed replications (Gonzales & Cunningham, 2015; John, Loewenstein, & Prelec, 2012).  A recent study by Eason, Hamlin and Sommerville (2017) helped shed light on the prevalence of QRPs within the field of developmental psychology. They surveyed developmental labs around the world on five practices that are typically a part of data collection: pilot work, sample size, condition assignment, blinding practices, inclusion and exclusion criteria, statistical analysis, and training. 
The QRPs associated with these practices were assigned into two categories: “clearly problematic practices” that should not be tolerated and “risk-permeable practices” that in certain situations, could comprise the integrity of the data. They found that “clearly problematic practices” were uncommon; however, a substantial amount of risk-permeable practices was found. Although researchers are motivated to follow a priori policies, infant research is difficult in that decisions must be made case-by-case, leaving space for potential inconsistencies in practice. Ultimately, this can make successful replication challenging and more importantly, our science more vulnerable to reporting “false positive” findings. Furthermore, developmental research might be even more prone to QRPs. For example, given that data collection can be very long and challenging process, researchers are hesitant to throw out data and could be making decisions under biased conditions (e.g. deciding to exclude and then replacing babies). Not only do scientists want to publish papers, they also want to uncover truths about the world and as indicated, this endeavor is not easy nor is it perfect.       8 Lastly, attempting to publish novel and significant data has led to a bias in publications that can result in the use of questionable practices and decreased likelihood of replicability (e.g. Ioannidis, 2005; Ioannidis, Munafo, Fusar-Poli, Nosek, & David, 2014; Kühberger, Fritz, & Scherndl, 2014). For example, OSC (2015) found that the majority of original study effect sizes were much larger than the replication effect sizes. They argue that the inflated effect sizes are due to a combination of reporting bias, and low-power research designs due to publication bias. Margoni and Surian (2018) not only discovered an overall preference for prosocial agents, they also uncovered evidence of a publication bias that suggests that the effect size in published studies is likely to be overestimated. Moreover, they found that when they compared studies that showed infants helping/hindering or giving/taking events, whether the research was conducted by Hamlin and collaborators or by another independent lab had a significant effect on the estimated effect size, such that Hamlin and colleagues’ studies reported larger effect sizes (Margoni & Surian, 2018). If effect sizes are overestimated, it makes it difficult to appropriately power replication studies.  In response to the “replicability crisis”, research has been making attempts to further examine the variation in replicability and look for ways to produce more credible science. One foray in this endeavor is the Many Labs Project, a group of 36 labs that used standardized procedures to explore the replicability of 13 classic and contemporary psychological effects, allowing for variation to be studied with considerable statistical power (Klein et al., 2014). Some of the effects were known to be highly replicable and for the others, replicability was unknown. The project replicated eleven of the thirteen chosen studies, confirming robust reproducibility of the previously found results in variable samples and contexts.  The authors suggest that future research questions could be explored with collective effort across labs to better understand and      9 better power future work. In an effort to answer that call, the Many Babies Project was created (Frank et al., 2017). 
The aim is to explore an initial finding, specifically the preference for infant directed speech, that has previously demonstrated robust replicability. This effect is being tested in labs across the world using different languages, methodologies and age groups. The data collected will serve as a basis to gather a preliminary information about less biased-estimates, variability that can be accounted for in future studies and will help identify the best practices for infancy research.  Aside from multi-lab projects (Frank et al., 2017; Klein et al., 2014) and surveys to help identify and avoid QRPs (e.g. Brandt et al., 2014; Eason et al., 2017), is the movement toward transparency. A practice that is becoming increasingly integrated is pre-registration. It requires researcher to submit a research rationale, hypothesis, design, and analytic strategy to be completed before data collection begins (Gonzales & Cunningham, 2015; Klein et al., 2018). Pre-registration keeps scientists accountable by forcing us to plan ahead of time and to keep official record of these plans. Furthermore, the hope is more transparency will decrease the motivation to p-hack and search for significant results based on collected data, minimize research publication bias and ultimately strengthen research as a whole.   As mentioned previously, publication bias is a pervasive issue in the field of psychology. Another way research scientists have tried to combat this problem is with the use of Bayesian Statistics. At the core of publication bias is the desire to find significant effects, but unlike frequentist methods, Bayesian methods can provide significant justification for the null hypothesis (Savalei & Dunn, 2015; Simmons et al., 2011). Such that Bayesian methods can provide evidence for or against an effect. Therefore, it allows the incorporation of background      10 knowledge instead of repeatedly testing a null hypothesis, ignoring the lessons of previous studies. In contrast, frequentist methods look to see if there is an effect, that “something is happening”. If a null result is found using frequentist methods, it only demonstrates that there is not an effect present. Simply put, that “nothing is happening”.  An addition to testing for effect significant effects, publication bias also poses issues with effect size and power estimates. The bias in statistical power is the difference in the average statistical power across the sampling distribution and the nominal level used for each power calculation to determine sample size. Blindly using effect size estimates in statistical power calculations, as previously indicated, can be flawed and result in bias. Therefore, sampling variability, how much an estimate varies between each sample, should be accounted for. Maxwell and colleagues (2015) point out that frequentist methods cannot account for sampling variability and therefore, replication attempts can run the risk of having underpowered studies. Bayesian methods, on the other hand, can account for sampling variability because it uses the entire distribution over a single value point for the effect size estimation (Kruschke, 2013; Maxwell et al., 2015). Developmental work, like other fields in psychology, fall victim to publication bias, are often underpowered and have to handle non-normal parameters. Therefore, Bayesian statistics may be valuable tool in infant research that can be used to battle against these issues (Van de Schoot et al., 2014). 
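As a minimal illustration of this contrast, the sketch below compares a frequentist binomial test with a conjugate Beta-Binomial analysis of a hypothetical choice proportion in R (the counts and the uniform Beta(1, 1) prior are assumptions made for illustration; this is not an analysis from the thesis):

# Hypothetical data: 20 of 32 infants choose the prosocial character.
chose_helper <- 20
n <- 32

# Frequentist: a non-significant p value only means the null was not rejected;
# it is not evidence that "nothing is happening".
binom.test(chose_helper, n, p = 0.5)

# Bayesian: with a Beta(1, 1) prior on the preference rate theta, the posterior
# is Beta(1 + 20, 1 + 12). Background knowledge could be built into the prior,
# and the full posterior can quantify evidence for or against chance responding.
a <- 1 + chose_helper
b <- 1 + (n - chose_helper)
qbeta(c(.025, .975), a, b)   # 95% credible interval for theta
1 - pbeta(0.5, a, b)         # posterior probability that theta exceeds .5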
Given the mixed results seen infant sociomoral evaluations and the movement toward replication in psychology in general, this paper will present results from both a pre-registered sample and a larger sample that directly replicates Hamlin & Wynn’s (2011) box scenario. This particular puppet show has seen both successful (by Hamlin and colleagues) and failed      11 replications (by other labs). Therefore, it seems like an appropriate choice for a replication attempt such that it can provide more insight on infants’ preference for prosocial others and the issues with replicability more broadly.  In the box scenario, infants are presented a series of interactions between three puppets wherein which a protagonist puppet attempts to open a box with an attractive toy inside, while two other puppets watch. During alternating events, infants see one of the two puppets (the helper) help the protagonist fulfill his goal by opening the box; whereas the other puppet (the hinderer) prevents the protagonist form achieving his goal by closing the box.  Pre-Registered Sample Methods Participants- All participants were full-term, typically developing infants aged between 7 months, 6 days and 8 months, 15 days recruited from a database of parents who said they were interested in having their child participate in research at a large research institution in the Western Canada. All infants will be awarded a small gift (t-shirt or bath toy) for their participation in the study. In the pre-registered sample, there were 32 infants (14 females; average age = 7;28). An additional 20 infants began the experiment but were not included in the final sample due to fussiness (8), failure to choose a puppet (1) procedural error (5), parental interference (5) or technical issue (1). Procedures- The Behavioural Research Ethics Board of the University of British Columbia approved all procedures. Infants were seated on the lap of one parent in an      12 experimental room in front of a puppet show stage created by a black table surrounded on three sides with black cloth curtains; the directly in front of the infant could be lowered to occlude the puppet stage between events. A single puppeteer performed all events by placing her hands underneath the rear curtain; she wore a long-sleeved black shirt to cover her arms and was thus entirely hidden from the infants’ view. Parents were instructed to sit quietly with their infants, to not attempt to influence their attention in any way and to close their eyes during all puppet shows. After the infant and parent were seated, the curtain in centre of the stage was raised twice, both times following the squeak of a toy, to warm up infants to the sounds and movements. Parents were then asked to see if the infant was comfortable with the process and are then reminded to close their eyes before the raising the curtains again. Throughout the study, infants’ attention was recorded online by a coder who watched the infant through a live video feed in another room; coders were blind to the puppet show events. The study consisted of three phases: box familiarization, puppet show habituation and manual choice.  Box Familiarization- Infants were presented with a clear plastic box containing a rattle that will be presented during the puppet show. An experimenter held the box in front of the infant, saying “Look!” (shaking box), “Look!” (grabbing edge of the box), “Look!” (opening box), and “Ooh!” (lifting the rattle out of box and shaking it). Then she said, “Should we put it in again? 
Look!” (putting rattle back in box), “Look!” (closing lid), “It’s in there again!” (shaking box). She then said, “Should we take it out again?”. The familiarization procedure was repeated twice.  Puppet Show Habituation Events- The infants watched a puppet show that involves three hand plush puppet characters: a prosocial dog (the Helper), an antisocial dog (the Hinderer), and the protagonist, a horse. The scene begins with the Helper and Hinderer puppets sitting in the far      13 corners of the stage, facing slightly inwards. In the centre of the stage is a clear plastic box with a hinged lid, attached to the table via Velcro and inside the box is a brightly coloured toy (same as the box shown during familiarization). At the start of each trial, the horse appears on the stage through the centre of the back curtain and runs over to the side of the box that is opposite of the puppy who will also be acting in the scene. He leans down and peers into the box twice, each time popping back up with excitement. The horse then jumps on top of the box and leans down to pull the lid up to open the box. The protagonist makes four attempts to open the box but fails each time because the lid is too heavy: in the first two tries, he lifts the lid up slightly and trembles in effort to pull the lid open, in the second two, he lifts the lid up slightly and falls back down. On the fifth attempt, the Helper or Hinderer will run to the box to engage in the scene.   In the prosocial event, the Helper jumps on top and assists the horse by pulling the lid open. Once the lid opens and the horse jumps to the toy and leans down to grab it. The Helper jumps off the box and runs off stage.  In the antisocial event, on the protagonist’s fifth attempt, the Hinderer jumps on top of the box and slamming it shut, leading the protagonist to jump off the box and dropping his face on the table, as if he were sad/disappointed. Then the Hinderer jumps off the box and runs off stage.  In both situations, the horse remained in his final position until the infant had looked at the frozen scene for up to thirty seconds or had looked away for two consecutive seconds. The prosocial and antisocial events are presented in alternation for a minimum of six trials and a maximum of 14 trials. An experimenter in another room, blind to the order of the events and the identity of the prosocial and antisocial puppets, coded the infant’s attention to the paused scene to understand how they are processing the events via a customized coding program, JHab      14 (Casstevens, 2007). This experimenter was also responsible for prompting the end of a trial and terminating the experiment prematurely if the infant appears distressed, informing the presenter via microphone. Additionally, the show did not continue if the parent thought that the infant was uncomfortable (See Fig. 1).    Figure 1. Examples of both helping (right) and hindering events (left).   Manual Choice Test Phase- The second part of the study involved the infant making a choice between the prosocial and antisocial puppets. During this portion of the experiment, parents were asked to move away from the puppet show stage and place their feet on pre-set red tape line on the floor. The parents were then instructed to have the child sit on their knees and support the infant below his/her arms, around the ribs and, so they were free to reach forward. Parents were then reminded again to keep their eyes closed.  
An experimenter with whom the infant had been familiarized earlier presented the two puppets to the child. The experimenter held the puppets behind their back, out of the infant's sight, knelt in front of the infant, said "Hi," and waited for the infant to look at them. Next, the experimenter said, "Look!" and brought the two puppets out, equidistant from the infant and just out of reach. After the infant had looked at both puppets, the experimenter said "Hi" again until the infant looked at him or her, and then asked, "Who do you like?" while pushing both puppets forward to reaching distance. Infants were given up to two minutes to decide on a puppet. Those who did not make a choice within the two-minute time frame were excluded from the final sample. A choice was recorded if the infant looked at a puppet first and then touched it. The experimenter who executed the manual choice test phase was blind to the identity of the puppets (see Fig. 2).

Figure 2. Example of a participant making a clear choice during the manual choice test phase.

Counterbalancing - The following aspects of the experiment were counterbalanced for all infants: 1. social identity of the hinderer/antisocial character (dog puppet in blue or orange shirt), 2. order of hindering events during habituation (first or second), 3. side of the hinderer/antisocial puppet during habituation events (left or right), and 4. side of the hinderer/antisocial character during the manual choice test phase (left or right).

Results and Discussion

All planned analyses were pre-registered at osf.io/z4tqu. Infants habituated in an average of 9.16 trials (SD = 2.91). A paired samples t-test demonstrated that infants looked equally to prosocial (M = 43.45, SD = 15.43) and antisocial (M = 41.78, SD = 17.99) events; t(31) = .71, p > .48. Chi-square tests verified that there were no influences caused by factors such as age, gender, side preferences, and order of events (all p's > .33). During the manual choice test phase, infants did not significantly prefer the prosocial over the antisocial puppet (18 out of 32 infants, or 56%; binomial test, p > .59; Fig. 3).

Figure 3. The number of infants who reached for the helper (18) and the hinderer (14) in the pre-registered sample. Infants did not display a preference for either puppet (18 of 32 infants chose the helper, p > .59).

Unlike the original study, where the majority of infants preferred the prosocial puppet over the antisocial puppet, only a little over half (56%) of the infants in this replication attempt showed such a preference. Our failure to replicate may have occurred because the effect size is smaller than originally projected and/or because the methodology deviated from the original protocol. Salvadori and colleagues (2015) failed to replicate the box scenario even though their studies were appropriately powered given the effect size projected from the original study. Thus, it may be the case that the effect is smaller than we originally thought and our study was not adequately powered, thereby leading to non-significant results (i.e., we did not have a large enough sample to capture this effect). Alternatively, the puppet shows may not have matched those shown to infants in the original study, causing a failure to replicate due to discrepancies between the original and replication methodologies.
For example, Hamlin (2015) found that inconsistencies in methodology can act as a potential confound such that differences in the protagonist’s pupil location impacted the results. Specifically, infants were more likely to choose the helper only when the climber’s gaze was directed to the top of the hill (consistent with the goal). In addition, when utilizing a live design, there is always inevitable variation. That being said, there was variability in our puppet shows that we were not aware of. In attempt to help elucidate these findings, we ran some exploratory analyses on the entire collection of infants tested over the last two years, including the sample of 32 infants from the pre-registration.  Entire Sample Our pre-registered sample did not replicate the original finding. This failure came as a surprise to us because previously, albeit not ideal, data collection using the box show looked promising. To avoid making any QRPs, we used this data to inform our methodological decisions. All of the data we collected is technically “usable”, but it is not QRP free. However,      18 this does not preclude us from doing exploratory analyses to help us better understand why our pre-registered sample did not replicate.   Over the past two years, as a master’s student, prior to this last pre-registered sample, I had run quite a few babies in the box show in order to: learn the paradigm, establish the best way to perform the paradigm such that the relevant information was salient to our infant participants, and correct for a colour and side biases. Ultimately, this resulted in a large collection of infants that were exposed to a similar version of Hamlin and Wynn's (2011) box show. Accordingly, we ran exploratory analyses on the entire collection of infants tested (entire sample) to see if the effect held for a larger, more powerful sample size. However, it is important to note that without knowing how variation can impact data, it was hard for us to know what to stop doing during our live puppet shows. So, we searched for and identified aspects of the data that we thought warranted exploration. We observed that the puppet running speed, puppet jump size, size of the box opening, and the protagonist’s last two struggles appeared to vary considerably between participants. Thus, we decided to see if any of these variations made an impact on infants’ choices. Methods Participants- All participants were full-term, typically developing infants aged between 6 months, 20 days and 8 months, 26 days recruited from a database of parents who said they were interested in having their child participate in research at a large research institution in the Western Canada. All infants will be awarded a small gift (t-shirt or bath toy) for their participation in the study. In the entire sample, there were 115 infants (51 females; average age = 7;29). An additional 50 infants began the experiment but were not included in the final sample due to      19 fussiness (22), failure to choose a puppet (8), procedural error (17), parental interference (12) or technical issue (1). Procedures- The procedures shown to all infants were modeled off the original study but included some variation: 1. identity of the protagonist, 2. shirt colours of the prosocial and antisocial puppets, 3. inclusion of the box familiarization, and 4. location box familiarization procedure. Infants observed a protagonist portrayed by either a cow (66 infants) or horse (49 infants) hand puppet. 
We elected to use the horse puppet in the pre-registered sample because the eyes on the cow puppet are slightly occluded and we wanted to ensure that the protagonist’s goals were salient. Secondly, the majority of the sample saw the prosocial and antisocial puppets wearing orange and green shirts (83 infants), while the pre-registered sample saw the prosocial and antisocial puppets wearing orange and blue shirts (32 infants). In an initial pre-registered sample, infants were frequently choosing the puppet in the orange shirt (12 infants) versus the puppet in the green shirt (5 infants). So, we paused testing the pre-registered sample and piloted infants’ choices (n = 10) when given an option between a puppet in a blue shirt versus a puppet in an orange shirt and found that infants were equally likely to choose either colour (binomial test p > .4). Therefore, we opted to use the blue and orange shirts in the pre-registered sample. Additionally, a portion of the sample (49 out of 115 infants) were exposed to the box familiarization procedure. Although several published conditions have not utilized a box familiarization since the original Hamlin & Wynn (2011) publication, we included the box familiarization event in both of our pre-registered samples (our final sample of 32 and our sample with the colour bias) to ensure that our protocol was as similar to the original study as possible. Lastly, infants who were exposed to the box familiarization saw it performed either off to the side of the puppet show stage (17 infants) or from behind the puppet show stage (32      20 infants). In the original study, the box familiarization was performed in the lobby. We opted to perform the box familiarization in the testing room given that the lab tests multiple participants a day.  Infants who were in the initial pre-registered sample saw the procedure performed off to the side of the stage and we noticed that many of the infants were choosing the puppet on their right side (11 out of 17 infants). The experimenter was close, almost within reaching distance, to the infants and we noticed that many infants were extending their hand(s) toward the box. This may have biased their choices during test given they were facing the same direction for both the box familiarization and manual choice procedures. Consequently, we opted to perform the box familiarization at the puppet show stage to avoid biasing infants’ behaviors during choice by both increasing the distance between experimenter and infant (to suppress infants’ reaching behaviour) and changing the direction (to avoid familiarity and potential perseveration).  Offline Coding  Predicted variation- After reviewing the videos of the puppet shows from the entire group of infants, we noticed that there was variation between the helping and hindering events shown to some infants versus others. In attempt to perform the cleanest and most salient versions of the prosocial and antisocial events, we frequently discussed details of the procedures, which appeared to lead to certain features of the shows shifting over time. As a result, we decided to isolate said features and create an offline coding scheme around them.   We coded for variances for the following variables: 1. Running speed of all three characters (Rated 0-2), 2. Jump size of all three characters (Rated 0-3), 3. Size of the box opening during helping and hindering events (rated 0-2), and 4. Whether or not the protagonist released the box on the last two struggles (Yes or No). 
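The sketch below illustrates one way such offline codes could be organized and related to infants' choices, mirroring the Mann-Whitney U and chi-square tests reported in the next section; the data frame, its column names, and the toy values are hypothetical assumptions, not the lab's actual coding files:

# Hypothetical per-infant coding sheet: one row per infant, holding the
# averaged offline ratings and the choice made at test (toy values only).
coding <- data.frame(
  choice            = c("helper", "hinderer", "helper", "helper", "hinderer", "helper"),
  helper_jump       = c(2.0, 1.0, 2.5, 1.5, 0.5, 3.0),   # rated 0-3, averaged over events
  helper_run_speed  = c(1.0, 1.0, 2.0, 1.0, 0.0, 1.5),   # rated 0-2
  released_last_two = c("Yes", "No", "Yes", "Yes", "No", "Yes")
)

# Ordinal ratings compared across the two choice groups with a Mann-Whitney
# U test (wilcox.test in R), and the Yes/No code with a chi-square test.
wilcox.test(helper_jump ~ choice, data = coding)
chisq.test(table(coding$choice, coding$released_last_two))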
An independent coder, unaware of the choice the infant had made during the test phase, provided an average rating across all events shown to each individual participant for the first three variables and noted the presence or absence of the fourth variable. The details of the coding are provided below:

1. Running speed of all three characters was rated from "0" to "2":
a. "0" - The puppet ran slowly across all events: each character entered the scene to engage with the box and/or protagonist at a very slow pace.
b. "1" - The puppet ran at an average speed across all events: each character entered the scene quickly, but at a slow enough pace for an infant to process it.
c. "2" - The puppet ran at a fast speed across all events: each character entered the scene so quickly that it may have been difficult for an infant to process it.

2. Jump size of all three characters was rated from "0" to "3" (see Fig. 4):
a. "0" - Little to no jump present: the puppet barely jumped, if at all, onto the box either to open it (protagonist and helper) or to close it (hinderer).
b. "1" - Small jump present: the puppet jumped just high enough to get onto the box either to open it (protagonist and helper) or to close it (hinderer).
c. "2" - Large jump present: the puppet performed a large rainbow jump to get onto the box either to open it (protagonist and helper) or to close it (hinderer).
d. "3" - Very large jump present: the puppet performed a very large rainbow jump that placed it well above the box before opening it (protagonist and helper) or closing it (hinderer).

Figure 4. Examples of jump size ratings: 3 rating (top left), 2 rating (top right), 1 rating (bottom left), and 0 rating (bottom right).

3. Size of the box opening during both helping and hindering events was rated from "0" to "2":
a. "0" - Little to no box opening present: the protagonist appeared to barely, if at all, open the box to get the interesting toy inside it.
b. "1" - Small box opening present: the protagonist opened the box an inch or so, but never appeared to be able to get the interesting toy inside it.
c. "2" - Large box opening present: the protagonist opened the box very wide and almost appeared to be able to get the interesting toy inside it.

4. Whether or not the protagonist released the box on the last two struggles was coded as "Yes" or "No":
a. "Yes" - The protagonist let go of the box's lid during its final two attempts before the helper or hinderer intervened.
b. "No" - The protagonist held onto the box's lid during its final two attempts before the helper or hinderer intervened.

Results and Discussion

We conducted the same pre-registered analyses on the entire sample. A binomial test was used to verify whether a significant number of infants preferred one of the two characters. Chi-square tests were used to verify that there were no influences caused by factors such as age, gender, side preferences, and order. Lastly, a paired samples t-test was run to detect differences in looking time between the prosocial and antisocial events, to probe whether attention during the puppet show influenced choice. Infants habituated in an average of 8.71 trials (SD = 2.63). A paired samples t-test revealed that infants looked equally to prosocial (M = 39.89, SD = 15.81) and antisocial (M = 40.07, SD = 17.27) events; t(114) = -.14, p > .89.
Chi-square tests verified that there were no influences caused by factors such as age, gender, side preferences, and order of events (all p's > .29). In addition, the differences in protagonist identity, inclusion of the box familiarization, and the location of the box familiarization (for those participants who saw it) also did not have an influence on the data (chi-square tests, all p's > .28). During the manual choice test phase, infants were significantly more likely to choose the prosocial over the antisocial puppet (68 out of 115 infants, or 59%; binomial test, p = .05; Fig. 5). With a much larger sample size, we were able to detect infants' preference for a prosocial over an antisocial puppet.

Figure 5. The number of infants who reached for the helper (68) and the hinderer (47) in the larger sample. Infants displayed a preference for the helper over the hinderer (68 of 115 infants chose the helper, *p = .05).

Offline Coding Results - Neither the puppets' running speed (Fig. 6, 7 & 8) nor the size of the box opening during helping and hindering events (Fig. 9 & 10) significantly impacted infants' choice (Mann-Whitney U tests, all p's > .39). Likewise, whether or not the protagonist released the box on the last two struggles (Fig. 11) did not significantly affect infants' choice (chi-square test, p > .55). In addition, the jump size ratings for the protagonist (Fig. 12) and the hinderer (Fig. 13) did not significantly affect infants' choice (Mann-Whitney U tests, all p's > .48). In contrast, ratings of the helper's jumps were significantly higher for infants who chose the helper than for those who chose the hinderer (the mean ranks of the infants who chose the helper and the hinderer were 59.1 and 47.8, respectively; U = 1113.0, p < .03; Fig. 14). Specifically, infants' tendency to prefer the helper increased when it performed a larger jump.

Figure 6. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the helper's running speed, for the entire sample (left) and the pre-registered sample (right), p > .39.

Figure 7. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the hinderer's running speed, for the entire sample (left) and the pre-registered sample (right), p > .39.

Figure 8. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the protagonist's running speed, for the entire sample (left) and the pre-registered sample (right), p > .39.

Figure 9. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the size of the box opening during helping trials, for the entire sample (left) and the pre-registered sample (right), p > .39.
Figure 10. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the size of the box opening during hindering trials, for the entire sample (left) and the pre-registered sample (right), p > .39.

Figure 11. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by whether or not the protagonist released the box on the last two trials, for the entire sample (left) and the pre-registered sample (right), p > .55.

Figure 12. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the protagonist's jumps, for the entire sample (left) and the pre-registered sample (right), p > .48.

Figure 13. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the hinderer's jumps, for the entire sample (left) and the pre-registered sample (right), p > .48.

Figure 14. The frequency distributions for the number of infants who chose either the helper or the hinderer, broken down by the rating score given for the helper's jumps. The helper's jumps received higher ratings for infants who chose the helper than for infants who chose the hinderer, U = 1113.0, *p < .03.

Given that these variables were selected because we thought they may have shifted over time, it seemed appropriate to look at how the helper's jump score impacted infants' choices when grouped by those who belonged to the pre-registered sample and those who did not. Results indicated that the effect did not hold for infants in the pre-registered sample (Mann-Whitney U test, p > .39; Fig. 15). However, infants who did not belong to the pre-registered sample (n = 77) were more likely to choose the helper when its jumps were rated higher (the mean ranks of the infants who chose the helper and the hinderer were 43.1 and 32.9, respectively; U = 523.5, p = .031; Fig. 15). The significant result for the larger sample thus appears to be driven by those participants who did not belong to the pre-registered sample. Therefore, it looks like the difference in the shows over time may have influenced infants' decisions during the manual choice test phase.

Figure 15. The frequency distributions for the number of infants who chose the helper or the hinderer, broken down by rating and grouped by infants in the pre-registered sample (left) and infants not in the pre-registered sample (right). The helper's jumps were rated higher for infants who chose the helper only among infants who did not belong to the pre-registered sample, U = 523.5, *p = .031.

Bayesian Analyses - With the recently published warning against inference based on p values by the American Statistical Association (Wasserstein & Lazar, 2016) and the rise in the use of Bayesian methods in developmental work, we wanted to see whether the main exploratory findings held using Bayesian statistics.
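For readers unfamiliar with this class of model, the following is a schematic sketch of how the two analyses reported next (an intercept-only logistic regression of infants' choices, and a model adding the helper's jump rating as a predictor) could be set up in R. The thesis states only that the models were fit in R, so the brms package, the simulated data frame, the default priors, and the variable names are assumptions made purely for illustration:

library(brms)

# Hypothetical data: one row per infant; chose_helper is 1 if the infant
# picked the helper, and helper_jump is the averaged offline jump rating.
set.seed(2018)
dat <- data.frame(
  chose_helper = rbinom(115, 1, 0.6),   # simulated choices, not real data
  helper_jump  = runif(115, 0, 3)       # simulated ratings on the 0-3 scale
)

# Model 1: intercept-only Bayesian logistic regression, the analogue of a
# binomial test of infants' baseline preference for the helper.
m1 <- brm(chose_helper ~ 1, data = dat, family = bernoulli())

# Model 2: does the helper's jump rating shift the probability of choosing
# the helper over and above the baseline preference?
m2 <- brm(chose_helper ~ 1 + helper_jump, data = dat, family = bernoulli())

summary(m2)   # posterior means, 95% credible intervals, and Rhat diagnostics
plot(m2)      # trace plots and marginal posterior densities for each parameter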
We first ran a Bayesian logistic regression as the equivalent of a binomial test in R and found that infants have a baseline preference for the prosocial over the antisocial puppet, β = .40 with 95% credible interval [.003, .75], replicating our frequentist analysis. To see whether the helper's jump score impacted infants' choice during test, we ran a Bayesian logistic regression with two predictors: infants' baseline preference for the helper and the helper's jump score rating. Results indicated that infants' baseline preference for the helper is augmented by the helper's jump score, such that the larger the helper's jump, the more likely the infant was to pick the helper (β = .50 with 95% credible interval [.02, .93]; Fig. 16). Both analyses yielded parameter estimates whose 95% credible intervals excluded zero.

Figure 16. Posterior parameter correlation plot demonstrating that infants' baseline preference for the helper is augmented by the helper's jump score, such that infants were more likely to pick the helper when his jumps were rated higher (β = .50 with 95% credible interval [.02, .93]).

In order to evaluate how well the model fits our data, we first created trace plots to visually examine model convergence. If the trace plot is fuzzy and doesn't have any big gaps, breaks, or large spikes, it suggests that the chains have converged (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012). These chains come from Markov chain Monte Carlo (MCMC) sampling; MCMC methods generate samples from a target distribution, and each sample depends on the previous one, hence the name "chain." From our trace plots, it appears that our Bayesian inference based on the MCMC samples is valid: the Markov chains converged, and the samples were drawn from the desired posterior distribution. More specifically, they indicate that there is a baseline bias for the helper (see Fig. 17). Another diagnostic metric is the Rhat statistic, which compares the variances between chains (Gelman & Rubin, 1992; Sorensen & Vasishth, 2015). Specifically, the Rhat value is the ratio of the average variance of the draws within each chain to the variance of the pooled draws across chains. This value should be around 1.0 ± 0.1; if Rhat is 1, then the chains are in equilibrium (i.e., the chains have converged). Both analyses yielded Rhat values of 1, indicating that the chains converged to the posterior distribution and supporting our posterior estimates of the baseline bias for the helper and of the effect of the helper's jump score rating.

Figure 17. Trace plot indicating that the MCMC sampler achieved convergence, such that our posterior values indicate a bias for the helper after 4,000 iterations. Different colours denote different chains.

General Discussion

Our pre-registered sample did not replicate the original finding seen in Hamlin & Wynn (2011). However, when including all infants tested in the box scenario, the finding did indeed replicate. Additionally, Bayesian methods indicated that infants have a baseline bias for the prosocial over the antisocial puppet. As mentioned previously, the failed replication may be due to a small effect size and/or variation in methodology. Our pre-registered sample of 32 did not display a significant preference for the helpful puppet, but when we analyzed the data from 115 infants, we were able to capture the effect.
General Discussion

Our pre-registered sample did not replicate the original finding seen in Hamlin & Wynn (2011). However, when including all infants tested in the box scenario, the finding did indeed replicate. Additionally, Bayesian methods indicated that infants have a baseline bias for the prosocial over the antisocial puppet. As mentioned previously, the failed replication may be due to a small effect size and/or variation in methodology. Our pre-registered sample of 32 did not display a significant preference for the helpful puppet, but when we analyzed the data from all 115 infants we were able to capture the effect. This may be because the effect size is smaller than originally thought. In the original study, 75% of the sample of 9-month-old infants and 73% of the sample of 5-month-old infants preferred the prosocial puppet (Hamlin & Wynn, 2011). A successful replication found that 94% of 9-month-old infants also preferred the prosocial puppet (Steckler et al., 2017). However, two unsuccessful replication attempts, both testing 9-month-old infants, found that 62.5% (Study 1) and 50% (Study 2) preferred the prosocial over the antisocial puppet (Salvadori et al., 2015). Furthermore, even if Salvadori and colleagues (2015) had collapsed across both studies, they would not have obtained a significant result (27 out of 48 infants, binomial test, p > .47). We found that 56% of our unsuccessful pre-registered replication and 59% of our entire sample displayed a preference for the prosocial puppet. All three unsuccessful replication attempts and our entire sample showed similar preference percentages, but a significant result was found only in a sample more than triple the size of the other studies. Therefore, the effect may be smaller than anticipated (see the power-simulation sketch below). However, it should be noted that all of these studies deviated from the original methodology in ways that could have affected their results.

One difference shared by all three unsuccessful replication attempts and our entire sample is that parents were asked to close their eyes during the habituation events. A recent study found that infants can "catch" their mothers' affective states: mother-infant dyads' physiological responses synchronized after mothers engaged in either a low-arousal positive/relaxation task or a high-arousal negative/stress task (Waters, West, Karnilowicz, & Mendes, 2017). Given this evidence that infants can pick up on parents' positive and negative affective states, parents' responses to the puppet shows could have affected infants' own responses. We therefore cannot rule out the possibility that parents inadvertently influenced infants' perception of the puppet shows. A follow-up study should be run to see whether allowing parents to watch the habituation events affects infants' behavior.

Our post-hoc exploratory analyses provided some insight into how variations in the puppet show presentation may affect infants' preferences. Using both frequentist and Bayesian methods, we found that infants were more likely to prefer the prosocial puppet when the helper's jump score was higher. One possible reason is that infants rely only on low-level perceptual cues and simply choose whichever puppet jumps higher. To rule out this possibility, we checked whether infants were more likely to choose whichever puppet received the higher jump score rating. We found that the hinderer's jumps were rated higher than the helper's for all but 5 participants (for whom the two puppets' jumps received equal ratings). Given that not all, or even most, infants preferred the hinderer, infants did not appear to choose based on jumping alone. As mentioned previously, in the original paper infants did not display a significant preference for the prosocial or antisocial puppet when the protagonist was an inanimate entity (Hamlin & Wynn, 2011). Therefore, it appears unlikely that infants' preferences are based purely on the low-level physical aspects of the displays.
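As a rough illustration of the effect-size point raised above, the simulation below estimates the power of a two-sided binomial test (α = .05) at the sample sizes and preference rates reported in this thesis and in Hamlin & Wynn (2011). This is only a sketch: the function name is ours, and the true population preference rate is of course unknown.

    # Estimated power of a two-sided binomial test (alpha = .05) to detect a
    # true helper-preference rate p_true with n infants, by simulation.
    set.seed(2018)
    binom_power <- function(n, p_true, alpha = 0.05, reps = 10000) {
      pvals <- replicate(reps, binom.test(rbinom(1, n, p_true), n, p = 0.5)$p.value)
      mean(pvals < alpha)
    }

    binom_power(n = 32,  p_true = 0.75)  # pre-registered n at the rate reported by Hamlin & Wynn (2011)
    binom_power(n = 32,  p_true = 0.56)  # pre-registered n at the rate observed in our pre-registered sample
    binom_power(n = 115, p_true = 0.59)  # full box-scenario sample at the rate observed overall

If the true preference rate is in the mid- to high-50s rather than 75%, the simulation suggests that even the full sample has modest power, which is consistent with the mixed pattern of results across replication attempts.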
Returning to the jump-score finding, another possibility is that infants like the jumping movement itself while also finding the hindering aversive, so that a liking for jumping competes with their preference for the helper. For example, Scarf and colleagues (2012) adapted their own version of the hill stimuli (based on Hamlin, Wynn, & Bloom, 2007) in which they modified certain physical aspects of the display (removing or adding bouncing movements). They found that 10-month-old infants' preferences were affected by the presence of the climber's bouncing behavior, such that infants consistently chose whichever agent was linked to the climber bouncing. It may therefore be that infants enjoy seeing the jumping behavior, and in instances where the helper does not jump as high as the hinderer, they may be less inclined to choose the helper.

The richest interpretation would be that infants evaluate the helper more positively when its physical actions better match its intention to help. As mentioned previously, goal encoding plays an important role in infants' sociomoral evaluations (Hamlin, 2015). That is, when the helper makes little or no jump, it may be difficult for the infant to understand what the helper is trying to accomplish. Furthermore, the helper may not appear motivated to help because so little effort is put into completing the task at hand. When the helper jumps higher, by contrast, his goal may be more salient to the infant, and he may look more motivated to help, as if he is there to "save the day".

Lastly, the fact that the prosocial and antisocial characters' jumps did not match for most infants is a potential confound. As previously mentioned, the slightest variation can affect infants' perception, and if the physical aspects of the display, specifically the prosocial and antisocial puppets' jumps, do not match, it becomes more challenging to know what information is driving infants' choices. The results of the exploratory analyses may be spurious, but this does not change the fact that certain aspects of the show shifted over time. If the puppets' jumps were equally high, it would help rule out the possibility that differences in jump height influence infants' evaluations.

Either way, future work should continue to explore the conditions under which infants do and do not prefer prosocial others. Specifically, a replication study should be run in which parents are allowed to watch the puppet shows, to see whether parental influence, albeit likely unconscious, affects infants' preferences. Another study, in which the jumps of the prosocial and antisocial puppets are equated, should be run to see whether the original findings replicate under these conditions; if so, it would suggest that the differences in jump height are indeed a confound. These post-hoc analyses should be taken with a grain of salt, as the proposed confirmatory work still needs to be done. Yet they may suggest that the box scenario effect is not as strong as previously thought and is therefore more susceptible to noise in the data. Overall, our findings add to the scientific understanding of infant social evaluation and provide an important opportunity to contribute to the replicability movement, thereby helping foster a more robust science and greater confidence in our results.

References

Behne, T., Carpenter, M., Call, J., & Tomasello, M. (2005). Unwilling versus unable: Infants' understanding of intentional action. Developmental Psychology, 41(2), 328.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., … Van't Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224.
Buon, M., Jacob, P., Margules, S., Brunet, I., Dutat, M., Cabrol, D., & Dupoux, E. (2014). Friend or foe? Early social evaluation of human interactions. PloS One, 9(2), e88612.
Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365.
Casstevens, R. M. (2007). jHab: Java habituation software (Version 1.0.2) [Computer software]. Chevy Chase, MD.
Chae, J. J. K., & Song, H. (2018). Negativity bias in infants' expectations about agents' dispositions. British Journal of Developmental Psychology.
Choi, Y., & Luo, Y. (2015). 13-month-olds' understanding of social interactions. Psychological Science, 26(3), 274–283.
Collaboration, O. S. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Cowell, J. M., & Decety, J. (2015). Precursors to morality in development as a complex interplay between neural, socioenvironmental, and behavioral facets. Proceedings of the National Academy of Sciences, 112(41), 12657–12662.
Dunfield, K. A., & Kuhlmeier, V. A. (2010). Intention-mediated selective helping in infancy. Psychological Science, 21(4), 523–527.
Eason, A. E., Hamlin, J. K., & Sommerville, J. A. (2017). A survey of common practices in infancy research: Description of policies, consistency across and within labs, and suggestions for improvements. Infancy, 22(4), 470–491. https://doi.org/10.1111/infa.12183
Frank, M. C., Bergelson, E., Bergmann, C., Cristia, A., Floccia, C., Gervain, J., … Levelt, C. (2017). A collaborative approach to infant research: Promoting reproducibility, best practices, and theory-building. Infancy, 22(4), 421–435.
Galak, J., LeBoeuf, R. A., Nelson, L. D., & Simmons, J. P. (2012). Correcting the past: Failures to replicate psi. Journal of Personality and Social Psychology, 103(6), 933–948. https://doi.org/10.1037/a0029709
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Geraci, A., & Surian, L. (2011). The developmental roots of fairness: Infants' reactions to equal and unequal distributions of resources. Developmental Science, 14(5), 1012–1020. https://doi.org/10.1111/j.1467-7687.2011.01048.x
Gonzales, J. E., & Cunningham, C. A. (2015). The promise of pre-registration in psychological research. Psychological Science Agenda, 29(8).
Hamlin, J. K. (2013). Failed attempts to help and harm: Intention versus outcome in preverbal infants' social evaluations. Cognition, 128(3), 451–474.
Hamlin, J. K. (2015). The case for social evaluation in preverbal infants: Gazing toward one's goal drives infants' preferences for Helpers over Hinderers in the hill paradigm. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.01563
Hamlin, J. K., & Wynn, K. (2011). Young infants prefer prosocial to antisocial others. Cognitive Development, 26(1), 30–39.
Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450(7169), 557.
Harless, D. W. (1992). Actions versus prospects: The effect of problem representation on regret. The American Economic Review, 82(3), 634–649.
Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLOS Biology, 13(3), e1002106. https://doi.org/10.1371/journal.pbio.1002106
Holvoet, C., Scola, C., Arciszewski, T., & Picard, D. (2016). Infants' preference for prosocial behaviors: A literature review. Infant Behavior and Development, 45, 125–139. https://doi.org/10.1016/j.infbeh.2016.10.008
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Ioannidis, J. P., Munafo, M. R., Fusar-Poli, P., Nosek, B. A., & David, S. P. (2014). Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Sciences, 18(5), 235–241.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Kanakogi, Y., Inoue, Y., Matsuda, G., Butler, D., Hiraki, K., & Myowa-Yamakoshi, M. (2017). Preverbal infants affirm third-party interventions that protect victims from aggressors. Nature Human Behaviour, 1(2), 37.
Kanakogi, Y., Okumura, Y., Inoue, Y., Kitazaki, M., & Itakura, S. (2013). Rudimentary sympathy in preverbal infants: Preference for others in distress. PloS One, 8(6), e65292.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
Kessler, J. B., & Meier, S. (2014). Learning from (failed) replications: Cognitive load manipulations and charitable giving. Journal of Economic Behavior & Organization, 102, 10–13. https://doi.org/10.1016/j.jebo.2014.02.005
Kiley Hamlin, J., Ullman, T., Tenenbaum, J., Goodman, N., & Baker, C. (2013). The mentalistic basis of core social cognition: Experiments in preverbal infants and a computational model. Developmental Science, 16(2), 209–226.
Kiley Hamlin, J., Wynn, K., & Bloom, P. (2010). Three-month-olds show a negativity bias in their social evaluations. Developmental Science, 13(6), 923–929.
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Mohr, A. H., … Frank, M. C. (2018). A practical guide for transparency in psychological science. Collabra: Psychology, 4(1).
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PloS One, 9(9), e105825.
Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC Press.
Margoni, F., & Surian, L. (2018). Infants' evaluation of prosocial and antisocial agents: A meta-analysis. Developmental Psychology.
Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does "failure to replicate" really mean? American Psychologist, 70(6), 487.
Premack, D. (1990). The infant's theory of self-propelled objects. Cognition, 36(1), 1–16.
Salvadori, E., Blazsekova, T., Volein, A., Karap, Z., Tatone, D., Mascaro, O., & Csibra, G. (2015). Probing the strength of infants' preference for helpers over hinderers: Two replication attempts of Hamlin and Wynn (2011). PLOS ONE, 10(11), e0140570. https://doi.org/10.1371/journal.pone.0140570
Savalei, V., & Dunn, E. (2015). Is the call to abandon p-values the red herring of the replicability crisis? Frontiers in Psychology, 6, 245.
Scarf, D., Imuta, K., Colombo, M., & Hayne, H. (2012). Social evaluation or simple association? Simple associations may explain moral reasoning in infants. PloS One, 7(8), e42698.
Scola, C., Holvoet, C., Arciszewski, T., & Picard, D. (2015). Further evidence for infants' preference for prosocial over antisocial behaviors. Infancy, 20(6), 684–692.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Sorensen, T., & Vasishth, S. (2015). Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. arXiv preprint arXiv:1506.06201.
Stanley, D. J., & Spence, J. R. (2014). Expectations for replications: Are yours realistic? Perspectives on Psychological Science, 9(3), 305–318.
Steckler, C. M., Woo, B. M., & Hamlin, J. K. (2017). The limits of early social evaluation: 9-month-olds fail to generate social evaluations of individuals who behave inconsistently. Cognition, 167, 255–265.
Tasimi, A., & Wynn, K. (2016). Costly rejection of wrongdoers by infants and children. Cognition, 151, 76–79.
Van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & van Aken, M. A. (2014). A gentle introduction to Bayesian analysis: Applications to developmental research. Child Development, 85(3), 842–860.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108
Waters, S. F., West, T. V., Karnilowicz, H. R., & Mendes, W. B. (2017). Affect contagion between mothers and infants: Examining valence and touch. Journal of Experimental Psychology: General, 146(7), 1043–1051. https://doi.org/10.1037/xge0000322
Woo, B. M., Steckler, C. M., Le, D. T., & Hamlin, J. K. (2017). Social evaluation of intentional, truly accidental, and negligently accidental helpers and harmers by 10-month-old infants. Cognition, 168, 154–163. https://doi.org/10.1016/j.cognition.2017.06.029
Ziv, T., & Sommerville, J. A. (2017). Developmental differences in infants' fairness expectations from 6 to 15 months of age. Child Development, 88(6), 1930–1951.
