Structured Quantitative Inquiry Labs: Developing Critical Thinking in the Introductory Physics Laboratory

by

Natasha Grace Holmes

B.Sc., The University of Guelph, 2009
M.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Physics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

December 2014

© Natasha Grace Holmes 2014

Abstract

Many undergraduate labs engage students in experimentation without developing critical thinking or scientific reasoning skills, especially about measurement and data. In this thesis, I present a pedagogical framework for developing students' critical thinking behaviours in a first-year undergraduate physics lab. The main critical thinking behaviours assessed were for students to reflect on their data collection and analyses, iterate to improve their measurements and methods, and evaluate the experiments and theoretical models. The pedagogy uses structured comparisons between measurements and models, with a critical focus on understanding measurement and uncertainty at a conceptual level and applying the concepts to quantitative analysis of data. Implementation involved scaffolded instructions and support for reflection and iteration that were dynamically faded throughout the course. Through analysis of students' written lab materials, I evaluated their engagement in reflection, iteration, and evaluation, comparing them with a previous version of the course that did not include the critical thinking scaffolding. Students in the new course structure not only transferred the previously scaffolded reflection and iteration behaviours to unscaffolded experiments, but also spontaneously evaluated theoretical models, a behaviour that was never explicitly structured.
While the previous version of the course supported students in data analysis at a procedural, 'plug-and-chug' level, the new course structure significantly improved students' critical thinking behaviours, shifted students into more expert-like epistemological frames, and improved their motivation and attitudes about experimental physics.

Preface

This dissertation is an original intellectual product of the author, N.G. Holmes. This research has been approved by the full Behavioural Research Ethics Board at the University of British Columbia, certificate number H12-01241. Excerpts from lab books and interviews are included with written permission from the participants.

All data presented throughout the thesis was analyzed by N.G. Holmes; Dhaneesh Kumar also analyzed 10% of the data in Chapters 4, 5, and 6 for validation purposes. Parts of the data presented in these same chapters are currently being prepared for publication. For that publication, all data was analyzed by N.G. Holmes and the manuscript is being written by N.G. Holmes and D.A. Bonn.

Analysis of the Index of Refraction data in Year 1, presented in Chapter 5, has been published in the 2013 PER Conference Proceedings [Portland, OR, July 17-18, 2013], edited by P. V. Engelhardt, A. D. Churukian, and D. L. Jones (Holmes & Bonn, 2013). The data was collected and analyzed by N.G. Holmes; the manuscript was written by N.G. Holmes and D.A. Bonn, and the project was supervised by D.A. Bonn.

The Achievement Goal Questionnaire items used in Section 7.1 were developed in collaboration with Daniel Belenky, who also supervised the data analysis, though the full analysis was carried out by N.G. Holmes.

The material presented in Section 7.2 has been accepted for publication in the 2014 PER Conference Proceedings [Minneapolis, MN, July 30-31, 2014], edited by P. V. Engelhardt, A. D. Churukian, and D. L. Jones (Holmes, Ives & Bonn, 2014). The survey was administered by N.G. Holmes and Joss Ives; survey responses were collated by H.J.
Lewandowski and Takako Hirokawa; data was analyzed by N.G. Holmes; the manuscript was written by N.G. Holmes, Joss Ives, and D.A. Bonn, and the project was supervised by D.A. Bonn.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
1 Introduction
1.1 Measurement and uncertainty
1.2 Inventing the data analysis toolbox
1.3 Critical thinking in the lab
1.4 Pedagogy in the introductory lab
2 Structured quantitative inquiry labs
2.1 Introduction to experimental physics course
2.1.1 Learning goals
2.1.2 The student population
2.2 Key features of the SQILabs
2.3 Introducing the t′-score to students
2.4 The SQILab framework in practice: pendulum for pros
2.5 Scaffolding in the SQILab
2.6 Summary
3 Methods
3.1 Participants
3.2 Course experiments
3.2.1 Pendulum experiments
3.2.2 Pendulum for pros experiments
3.2.3 Analog and digital measurements
3.2.4 Light intensity I
3.2.5 Light intensity II
3.2.6 Radiation shielding
3.2.7 Mass on a spring experiments
3.2.8 Standing waves
3.2.9 Index of refraction
3.2.10 Diffraction
3.2.11 RC circuits experiments
3.2.12 LR circuits
3.3 Other logistical changes
3.4 Data sources
4 Reflection
4.1 Methods
4.2 Results
4.2.1 LR circuits
4.2.2 Year-long effects
4.2.3 Tool-based reflective comments
4.2.4 Quality of reflective comments
4.3 Conclusions
5 Iteration
5.1 Index of refraction
5.1.1 Methods
5.1.2 Results
5.1.3 Iteration behaviours
5.1.4 Discussion
5.2 Year-long effects
5.2.1 Methods
5.2.2 Results
5.3 Motivation to iterate
5.4 Conclusions
6 Evaluation
6.1 LR circuits experiment
6.1.1 Methods
6.1.2 Results
6.2 Other examples of evaluation
6.3 Conclusions
7 Motivation and attitudes in the lab
7.1 Achievement goals and motivation
7.1.1 Results
7.2 Attitudes and epistemologies
7.2.1 Methods
7.2.2 Results
7.2.3 Discussion
7.3 Interviews
7.4 Summary
8 Conclusions
8.1 Advice for implementation
8.2 Future directions
Bibliography

List of Tables

2.1 Decision tree of possible behaviours based on the outcomes of a least-squares fit between a student's data and a model.
2.2 Decision tree of possible behaviours based on the outcomes of a least-squares fit between a student's data and a model.
2.3 Sample data sets produced by a group of students in the Pendulum experiment. The three iterations represent their progression through the activity as they attempt to improve their measurement quality.
2.4 Summary of students' behaviours each year during similar experiments comparing the period of the pendulum at different amplitudes. Data represent mean values within the class samples and standard uncertainties in the means.
2.5 Timeline of experiments and associated scaffolding in Year 2. Comparisons were made either between individual measurements or between data and models. Issues such as common errors or limitations of models were built into many experiments. A high level means the item was explicitly instructed and there were marks for it in the grading scheme. A low level means the item was present only in the grading scheme.
3.1 Timeline of the statistical tools introduced to students throughout each year.
3.2 Accurate values of n for each of the three measurements in the IoR experiment and the approximate values of the systematic effects for SL and TIR, as measured by the instructor and TAs in the lab course. Uncertainties are defined by the precision of the measuring instrument.
4.1 Number of analytic tools available and appropriate for use in each experiment each year.
4.2 ANOVA table for the number of tools students used in each experiment as a fraction of the tools available to them in that experiment. *** p<.001.
4.3 ANOVA table for the number of tools students commented on in each experiment as a fraction of the tools available to them in that experiment. *** p<.001.
4.4 ANOVA table for the number of tools students commented on in each experiment as a fraction of the tools used in that experiment. *** p<.001.
4.5 ANOVA table for the maximum reflective comment level reached by students in three experiments. *** p<.001.
5.1 Accurate values of n for each of the three measurements and the approximate values of the systematic effects for SL and TIR. Uncertainties are estimated from the precision of the protractor.
5.2 Mean and standard uncertainty in the mean of students' reported uncertainties for each measurement across years in the IoR experiment.
5.3 Results from the logistic regression comparing students' iteration behaviours each year across five unscaffolded experiments. ** p<.01. *** p<.001.
6.1 χ² tests of independence for students' interaction with the intercept in the fit. *** p<.001.
7.1 The 9 items on the AGQ and their associated achievement goal orientations. Students were asked to rank their agreement on a 5-point Likert scale from Strongly Disagree to Strongly Agree.
7.2 ANOVA table for the three achievement goal orientations. Bonferroni correction was applied to account for the multiple comparisons (α=.02). ** p<.01. *** p<.001.
7.3 ANOVA table for students' personal and expert beliefs on the E-CLASS items across courses and time. ** p<.01. *** p<.001.

List of Figures

2.1 Four contrasting cases of distributions for the t′-score Invention activity. The four pairs of graphs highlight features that affect one's confidence in how different the distributions are, with respect to their means and standard deviations.
2.2 One student's conclusion at the end of the Pendulum experiment discusses the conflict between the outcome they expected and the results they obtained.
3.1 Sample data set from a student in the RS experiment, showing the decay model of the Sr-90 source as a function of aluminum shielding. The change in model between the exponential decay and the constant background is distinct.
3.2 Schematic of the plexiglass and protractor orientation for the SL measurement. The protractor has two 0° to 180° scales. The protractor and the beam are fixed, and the plexiglass itself is rotated to obtain the desired incidence angle.
3.3 Circuit diagram for the LR Circuit experiment, where an inductor (L) and a resistor (R) are in series with the AC function generator.
3.4 Example of the data collected by a student in the LR experiment, with different possible fit lines. The solid, red line shows a one-parameter y=mx fit, while the dashed, green line represents a two-parameter y=mx+b fit. Although the theoretical model recommends the one-parameter fit, the data suggest that a two-parameter model is better, due to additional resistance in the other circuit components.
4.1 A student's reflections from the LR experiment provide a clear sample of the coding scheme. The student makes a level 1 comment about applying χ² to their data, then notes that this value is high (level 2). A level 3 statement describes considering a different model, and the student finally evaluates the new model by describing the much lower χ² value.
4.2 A student's reflections on a variety of tools in the LR experiment, first with level 1 comments about χ² and the inductance, then analyzing the fit line compared to the model (level 2). They then comment on χ² being small, attributing it to large uncertainties (level 3). They justify their uncertainty by citing limitations of the measurement equipment (level 4). Finally, they provide further suggestions for improvement (additional level 3).
4.3 Mean number and fraction of tools used, and of tools both used and reflected on, in the LR experiment each year. The fractions are relative to the tools available for use, which differed each year. Error bars represent standard uncertainties in the mean.
4.4 Distribution of the maximum comment levels reached by students in the LR experiment each year, showing students in Year 2 making significantly higher-level reflections. Error bars represent 95% confidence intervals on the proportions.
4.5 Distribution of the maximum comment levels reached by students in the LR experiment on their χ² values. "Used" corresponds to students using χ² values without any associated comments; Level 1-4 comments are application, analysis, synthesis, and evaluation comments. Error bars represent 95% confidence intervals of the proportions.
4.6 The distribution of students' tool use in each experiment each year, as (a) the number of tools used or (b) the fraction of tools used relative to those available for use. Error bars represent standard uncertainties in the mean.
4.7 The distribution of students' reflective tool-based comments in each experiment each year, as (a) the number of tools commented on or (b) the fraction of tools commented on relative to those available for use. Error bars represent standard uncertainties in the mean.
4.8 Distribution of the number of tools used and commented on in each experiment. Error bars represent standard uncertainties in the mean. The P3, MS1, and MS2 experiments were not analyzed in Year 1.
4.9 Distribution of the fraction of students using (faded colour) and reflecting on (solid colour) each tool (x-axis) in a variety of experiments (right-hand y-axis) across the lab course each year. Error bars represent 95% confidence intervals of the proportions.
4.10 The distribution of students' maximum comment level across experiments each year, where error bars represent 95% confidence intervals of the proportions. Level 0 means no comment was made, Level 1 comments are application of tools, Level 2 comments analyze application of tools, Level 3 comments synthesize multiple ideas, and Level 4 comments are evaluative reflections.
5.1 Word cloud displaying key words that students submitted in response to the question, "Why didn't we all get the exact same value for the period of the pendulum?" The size of each word is proportional to the frequency with which it was submitted.
5.2 Excerpt from a student lab book showing n changed from 2.09 to 1.49 in the SL experiment after the angle measurement was corrected.
5.3 The average n values reported by students each year for the SL (top) and TIR (bottom) measurements. Initial and final values were recorded if students made changes to their measurements (if no changes were made, the same value was recorded for both). More students in Year 2 than in Year 1 corrected a measurement error in the SL measurement, but few students in either year corrected an error in the TIR measurement.
5.4 The distribution of the types of measurements made in the IoR experiment. Measurements were either initially accurate, initially incorrect but then corrected, or inaccurate and never corrected. Error bars represent 95% confidence intervals of the proportions.
5.5 The distribution of the methods through which students made changes. Either values were crossed out and replaced with new ones without justification, or students provided clear descriptions and explanations for the changes made. Error bars represent 95% confidence intervals of the proportions.
5.6 An excerpt from a student lab book shows an example of proposed measurement changes in the Pendulum for Pros experiment.
5.7 Distribution of students making or proposing changes to their experimental methods across experiments each year. Error bars represent 95% confidence intervals of the proportions.
5.8 Flowchart produced by one group of students describing how we do experiments in physics. All students' flowcharts included forms of iterative loops, such as this one, to reflect on and improve data, precision, methods, or models.
6.1 An excerpt from a student lab book evaluating the given model in the LR experiment.
6.2 The distribution of student evaluation behaviours during the LR circuits experiment each year shows a shift towards more expert-like evaluation behaviours. More students in Year 2 included an intercept in their fit, commented that it did not match the theoretical model, and physically interpreted the intercept as being due to additional resistance in the circuit. Error bars represent 95% confidence intervals on the proportions.
6.3 The distribution of graphical analyses made by students by the end of the LR circuits lab in Year 1 and Year 2 and within the first two hours of the lab in Year 2. Error bars represent 95% confidence intervals on the proportions. They are larger at the Year 2 two-hour mark, since only groups, rather than individuals, were assessed. Bars in each year or time group may add to more than 1, since students may have created any number of the three graphs.
6.4 Samples from student lab books evaluating theoretical models in the (a) Pendulum and (b) RS experiments.
6.5 Sample from a student lab book evaluating the theoretical model in the MS2 experiment.
7.1 Changes in motivation orientation over time for three achievement goal orientations: Mastery Approach (MAp), Performance Approach (PAp), and Performance Avoidance (PAv). Error bars represent standard uncertainties of the mean.
7.2 Changes in students' personal and expert beliefs on the E-CLASS items for a traditional physics lab course and the SQILab. Error bars represent standard uncertainties in the mean.

List of Abbreviations

AC: Alternating Current
ANOVA: Analysis of Variance
BA: Brewster's Angle
E-CLASS: Colorado Learning and Attitudes about Science Survey for Experimental Physics
IoR: Index of Refraction
LI1: Light Intensity I
LI2: Light Intensity II
LR: Inductor-Resistor Circuit
M: Mean
MAp: Mastery Approach
MS1: Mass on a Spring I
MS2: Mass on a Spring II
P1: Pendulum I
P2: Pendulum for Pros
P3: Pendulum for Pros II
PAp: Performance Approach
PAv: Performance Avoidance
RC1: Resistor-Capacitor Circuit 1
RC2: Resistor-Capacitor Circuit 2
RS: Radiation Shielding
SD: Standard Deviation
SL: Snell's Law
SQILab: Structured Quantitative Inquiry Lab
SW: Standing Waves
TA: Teaching Assistant
TIR: Total Internal Reflection
Y1: Year 1
Y2: Year 2

Acknowledgements

I would first like to thank my research supervisor, Dr. Doug Bonn, who took me on as a graduate student in Physics Education Research despite having (several) large and successful research groups in condensed matter physics and several years ahead of him as department head. He modelled what it meant to be a good scientist, experimentalist, and academic. He taught me how to work independently, to lead, to juggle an ambitious to-do list, and to balance work and life. He has been an incredible role model for my future career. Thank you for taking the time to teach and work with me through this exciting project.

I would also like to acknowledge the support of Dr. Carl Wieman, who was my first introduction to the world of PER and who encouraged me to pursue it for my graduate degree. He and Dr. Sarah Gilbert have always provided me with support, insight, and ideas at the precise moments that I needed them.
Carl also told us that our goal was much too lofty: that we could never get students to reflect in a first-year course. It was a challenge Doug and I enthusiastically accepted.

I am indebted to a number of mentors who have helped me along the way. Dr. James Day, my conference buddy, sounding board, and presentation master. Dr. Ido Roll, the invention activity guru, who offered a constant supply of references, feedback, and enthusiastic discussions. Jim Carolan, the PER historian who was always ready with an article or a story to back up any discussion. Joss Ives, who in a very short time became an incredible resource, support, and friend. Thank you all for all your help and support over the last five years.

I would also like to acknowledge the hard work and support of Dr. Dan Belenky, Dhaneesh Kumar, Dr. Heather Lewandowski, and Dr. Ben Zwickl, who helped with various elements of the data analysis, collection, and interpretation. My supervisory committee members, Drs. Ian Affleck, Simon Bates, and Deb Butler, have been extremely helpful in encouraging me to think harder about what I was doing, why I was doing it, and what it would show me, and in providing new ideas, interpretations, and resources for my work. To all the professors, peers, STLFs, members of the PHASER group (Dr. Georg Rieger and Dr. Jared Stang), and others I have forgotten who have taught and inspired me along the way, thank you.

This research would not have been possible without the teaching assistants in Phys107/109, who were incredibly patient with us as we developed materials at the last minute and as we balanced our research goals with our pedagogical goals. They provided ideas and feedback on every aspect of the course over the years, which undoubtedly contributed to its success. Of course, my thesis would not be a reality without the students in the lab: my data. Thank you for being patient with us as we worked out kinks in the course.
Thank you for signing up for interviews, for baring your souls, being honest, and telling me like it is. Thank you for learning.

I would not be where I am today without the support of my friends and family. My immediate family, Sylvia, Brien, and Melanie, constantly inspire and motivate me to learn and to challenge what I think I understand. And Brad, who made sure I ate healthily and exercised, kept me grounded and focused, and supported me the whole way. Thank you all for getting me here!

Chapter 1

Introduction

One of the goals of running lab courses in conjunction with lectures in university science courses is for students to explore the course content through scientific inquiry (Galvez & Singh, 2010). The American Association of Physics Teachers (AAPT) put together a list of goals for the undergraduate physics lab organized around themes that span both inquiry processes and physics content, namely "The Art of Experimentation," "Experimental and Analytical Skills," "Conceptual Learning," "Understanding the Basis of Knowledge in Physics," and "Developing Collaborative Learning Skills" (AAPT, 1998). In K-12 education, the Next Generation Science Standards in the US have shifted the emphasis in learning science away from facts and content and towards scientific inquiry processes and skills, with one of the three core dimensions of the framework focusing on scientific practices (Stage, Asturias, Cheuk, Daro & Hampton, 2013). Traditionally, however, many physics labs focus explicitly on reinforcing physical concepts presented in lecture and, at best, only implicitly on learning about experimentation.

A goal of post-secondary education more generally, however, is the development of critical thinking.
In the sciences, or indeed any field that depends on data or experimentation, a crucial component of critical thinking is the ability to handle data, which inevitably has some uncertainty, and, while acknowledging that uncertainty, to make comparisons to models or to other experimental results. Data, and conclusions based on data, are in fact encountered every day by any citizen, so critical thinking about data is essential in contemporary life. An important and often misunderstood feature of critical thinking in the sciences is its iterative nature. Experts reflect on experimental results and comparisons to models and then act on those comparisons in various ways, such as adjusting or discarding models, improving data, or devising wholly new experiments. Much research has shown that students struggle in all of these areas (Kanari & Millar, 2004; Kumassah, Ampiah & Adjei, 2013; Kung & Linder, 2006; Ryder, 2002; Ryder & Leach, 2000; Séré, Journeaux & Larcher, 1993).

One of the first hurdles to teaching students about this iterative experimentation process is that students view the classroom environment as distinct from a research environment (Buffler, Lubben & Ibrahim, 2009; Séré, Fernandez-Gonzalez, Gallegos, Gonzalez-Garcia, Manuel, Perales & Leach, 2001). Research into student epistemologies and attitudes has demonstrated that while students hold relatively expert perceptions about how scientists do physics, they do not hold those same beliefs personally (Gray, Adams, Wieman & Perkins, 2008).
These attitudes and epistemologies (students' beliefs about how knowledge is created) can influence their learning (Lising & Elby, 2005), especially when the epistemologies and attitudes that support learning differ from those that support earning a good grade in a course (Elby, 2001).

In an introductory physics lab, these epistemologies often concern what counts as scientific measurement and experimentation, and they can affect how students carry out experiments, especially since students often assume that they, as students, are not capable of conducting high-quality experiments (Buffler et al., 2009; Séré et al., 2001). Typical introductory lab activities can aggravate this separation. Experiments that do not work well reinforce students' feelings that they make 'errors' relative to experts (Allie, Buffler, Campbell, Lubben, Evangelinos, Psillos & Valassiades, 2003; Evangelinos, Psillos & Valassiades, 2002). Experiments that involve comparisons to well-known results reinforce the separation between students' own results and the textbook values produced by professional scientists (Allie, Buffler, Campbell & Lubben, 1998; Buffler, Allie & Lubben, 2001). Rushed experiments with too many, often implicit, learning goals do not afford students an opportunity to think critically about their data (Holmes & Bonn, 2013; Lippmann, 2003). Underlying many of these issues is a need to develop students' fundamental understanding of measurement and uncertainty (Volkwyn, Allie, Buffler & Lubben, 2008).

1.1 Measurement and uncertainty

Research into students' understanding of measurement and uncertainty has suggested two conceptual paradigms: a set paradigm and a point paradigm (Buffler et al., 2001, 2009). Buffler and colleagues define these terms as follows:

"The point paradigm is characterized by the notion that each measurement could in principle be the true value. ...
The set paradigm is characterized by the notion that each measurement is only an approximation to the true value and that the deviation from the true value is random.” (Buffler et al., 2001, p. 1139)

The point paradigm emphasizes the importance of any single piece of data, placing special value on the individual measured value. In contrast, the set paradigm emphasizes the importance of a collection of data as a whole, recognizing that an individual measured value is only an estimate of the physical quantity being measured. In the lab, this distinction manifests itself in a variety of ways. For example, the use of the word ‘error’, used synonymously with ‘uncertainty’, reinforces point-like thinking (Allie et al., 2003; Evangelinos et al., 2002). Expert use of the term ‘error’, as encountered in ‘error bars’ or ‘error propagation’, differs from the colloquial and student understanding of the term (interpreted as meaning mistake). As Buffler and colleagues describe, students may hold a point-like understanding of measurement, with the view that measurement ‘errors’ are actual mistakes that have caused the measured value to differ from the ‘true value’ (Buffler et al., 2009). Thus, point-like thinking involves students’ interpretation of measurement errors (used synonymously with uncertainties) as measures of accuracy, rather than precision. In fact, many students are introduced to the percent-error equation in high school:

$$\%\,\text{error} = \frac{\lvert \text{Actual} - \text{Measured} \rvert}{\text{Actual}} \times 100\%, \tag{1.1}$$

which compares the measured value with a ‘true’ or ‘actual’ value from an authority, such as a textbook. The word ‘error’ in Equation 1.1 indeed refers to a quantitative measure of accuracy or a systematic deviation.
One can use Equation 1.1 to compare two uncertainty-less values, but the word ‘error’ in this case is not synonymous with the word ‘uncertainty.’ It is clear, then, how students subsequently misinterpret errors, intended as meaning uncertainties, as literal errors expressing the deviation of a measurement from its ‘actual’ value (Séré et al., 2001).

Even as a measure of accuracy, however, Equation 1.1 is problematic, since neither the ‘actual’ nor the ‘measured’ value has an associated uncertainty. The same problem arises in Pearson’s χ² test,

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \tag{1.2}$$

where $O_i$ and $E_i$ (the observed and expected frequencies, respectively) are introduced without associated uncertainty. In both of these equations (Equations 1.1 and 1.2), special importance (or weight) is placed on the theoretical or actual value, instead of on the measurement uncertainty. Weighting by the actual value, rather than by the measurement uncertainty, is an extreme reinforcement of point-like thinking, since it ignores the probability distributions that underlie both values.

The combination of the emphasis on ‘true values’ and the terminology of ‘errors’ reinforces the point-like reasoning that zero error (synonymous with uncertainty) corresponds to a single, perfect measurement of the true value. Indeed, if the actual value is equal to the measured value in Equation 1.1, the percent error is zero. As such, students believe that measurement ‘errors’ (again, meaning uncertainty) can be reduced to zero and that a perfect measurement of the true value could be made, presumably by scientists in a lab (Buffler et al., 2009; Leach, Millar, Ryder, Séré, Hammelev, Niedderer & Tselfes, 1998; Séré et al., 2001). It is impossible to make a perfect measurement with no uncertainty, so we can never know the true value exactly. In many cases of authentic experimental research, a theoretical value does not yet even exist.
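To make the point concrete, the short Python sketch below writes Equations 1.1 and 1.2 as functions (the function names are illustrative only). Neither function has any way to accept a measurement uncertainty, and the ‘actual’ or expected value sets the scale of the comparison.

```python
def percent_error(actual, measured):
    """Equation 1.1: deviation from an 'actual' value, as a percentage of that value.

    No measurement uncertainty enters the calculation.
    """
    return abs(actual - measured) / actual * 100


def pearson_chi2(observed, expected):
    """Equation 1.2: squared deviations weighted by the expected frequencies E_i,
    not by any uncertainty in the observed frequencies O_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# A measured value equal to the 'actual' value gives exactly zero percent error,
# however imprecise the measurement was.
print(percent_error(9.81, 9.81))
print(pearson_chi2([10, 20, 30], [15, 20, 25]))
```

Because only the central values enter Equation 1.1, two measurements with very different uncertainties but the same central value receive identical percent errors.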
There are, of course, theoretical or predicted values, but those, generally, also come with an associated probability distribution. Exclusively using the term uncertainty in place of error is a small step towards shifting the interpretation of measurement uncertainties into more set-like paradigms (Allie et al., 2003). It could also be argued that classrooms should avoid terminology such as ‘true’ or ‘actual’ values. Unfortunately, many experts in a variety of fields commonly use this language.

In statistics, one is often attempting to estimate the true population value of a parameter from a sample of measurements. In this treatment, there is a theoretical true value, for example, the mean value over every individual in the population. In most physics measurements, however, there is no finite population from which to draw samples. As such, we can only ever estimate the value through many high-precision measurements. Fortunately, even the population mean in statistics has an associated frequency distribution (from the finite set of measurements from the population) and, therefore, an associated uncertainty.

Beyond the percent error equation, the traditional treatment of comparing multiple measurements with uncertainty also poses pedagogical problems (Etkina, Karelina & Ruibal-Villasenor, 2008; Kung & Linder, 2006; Volkwyn et al., 2008). When students enter the first-year lab, most of their experience comparing two measurements involves checking whether their uncertainty ranges overlap. This dichotomous comparison (either the ranges overlap, indicating agreement, or do not overlap, indicating disagreement) differs significantly from authentic scientific practice. In the behavioural sciences, two means are only said to be statistically different if no more than 5% of their probability distributions overlap (which would be about a difference of 2 units of uncertainty, or a 2σ-level difference).
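The link between a 5% criterion and a roughly 2σ difference can be checked numerically. The Python sketch below assumes that the difference between two values is Gaussian-distributed with a known combined standard uncertainty, and it reads the 5% criterion as a two-sided tail probability; both are assumptions made here for illustration. It uses only the standard library’s `statistics.NormalDist`.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def two_sided_tail(k):
    """Probability that a Gaussian quantity lies more than k standard
    uncertainties from its mean, on either side."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(k))


def k_for_tail(p):
    """Number of standard uncertainties at which the two-sided tail probability equals p."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p / 2.0)


for k in (1, 2, 3):
    print(f"{k} sigma: two-sided tail probability = {two_sided_tail(k):.4f}")
print(f"5% two-sided threshold = {k_for_tail(0.05):.2f} sigma")
```

The 1σ, 2σ, and 3σ tails come out near 0.32, 0.046, and 0.0027, and the 5% threshold falls at about 1.96σ, which is why a 5% criterion is commonly described as a difference of about two standard uncertainties.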
In particle physics, a discovery is not claimed unless the difference reaches the 5σ level. While the threshold for agreement may differ depending on the research question or field (Krzywinski & Altman, 2013), rarely is the comparison a binary one at the 1σ level. The binary comparison also reinforces point-like thinking, as it ignores the probability distribution that characterizes each measurement. This issue is reinforced by the standard use of error bars, which provide a discrete range on either side of the measured value. While experts may be able to overlay the continuous probability distribution, this visual representation reinforces the notion that the ‘true’ value must exist within that range.

The mathematical notation itself (using the ± symbol) is guilty of reinforcing this idea. It also supports more extreme versions of point-like thinking, especially if the language (plus or minus) is interpreted literally, that is, that the value is either the measured value plus its uncertainty or the measured value minus its uncertainty. This places the importance on the extremes of the range (again, a 1σ range) rather than on the central value. This interpretation of the ± symbol is analogous to its use when solving the quadratic equation, with which students are much more familiar from high school. In that case, the ± symbol is used to determine the two different roots of the equation, and the solution is one or the other extreme (or both), rather than the values in between.

There have been a number of recommendations for shifting students’ reasoning from point-like paradigms towards set-like reasoning. A probabilistic treatment of uncertainty
has been shown to vastly improve students’ set-like reasoning (Buffler, Allie & Lubben, 2008; Buffler, Allie, Lubben & Campbell, 2007; Kung, 2005; Pillay, Buffler, Lubben & Allie, 2008). Even with this understanding, however, students still struggle to make set-like comparisons between measurements (Etkina et al., 2008; Kung, 2005; Kung & Linder, 2006), a critical procedure for evaluating experimental procedures and theoretical models. In addition to this fundamental conceptual understanding, there are also a number of data handling skills and tools that one must understand to further interpret and evaluate experimental outcomes. In the next section, I will discuss the issues with developing the procedural and conceptual components of this toolbox, and then discuss the critical thinking skills necessary to apply the measurement and analysis concepts to experiments in the lab.

1.2 Inventing the data analysis toolbox

Once students have developed a foundation of measurement and uncertainty concepts, they then need to apply these to data analysis techniques, such as least-squares fitting of data to models. Coming into an introductory physics lab, students are generally ill-equipped to process raw data (Day & Bonn, 2011; Slaughter, 2012). In fact, few introductory physics courses make significant gains on assessments of data handling skills, with scores on diagnostics remaining static until students reach graduate school (Day & Bonn, 2011; Slaughter, 2012). The only gains observed in these studies were in courses where data processing skills were explicit learning goals (Slaughter, 2012).

These sorts of statistical concepts are often viewed as rote procedures (Day, Nakahara & Bonn, 2010), which are to be applied to data in a recursive plug-and-chug stance (Tuminaro & Redish, 2007).
Students often miss the underlying structure of the statistics concepts. Inquiry activities such as Invention activities can help students deconstruct the equations into conceptual components, promoting deeper, long-term understanding and transfer (Day, Adams, Wieman, Schwartz & Bonn, 2014; Holmes, Day, Park, Bonn & Roll, 2014; Roll, Holmes, Day & Bonn, 2012; Schwartz & Martin, 2004). In an Invention activity, students must invent a solution to a complex problem for which they do not yet know the right answer. Using carefully crafted contrasting cases (Bransford, Franks, Vye & Sherwood, 1989; Day et al., 2014; Schwartz & Bransford, 1998; Schwartz, Chase, Oppezzo & Chin, 2011), individual features of the concept become exposed through pairwise comparisons of the cases. With the conceptual features in mind, students build a general model or solution to apply across the cases. Though few students invent the expert solution, the Invention process prepares them to learn from subsequent instruction and to transfer to new situations better than if they had received the instruction first (Day, Holmes, Roll & Bonn, 2013; Day et al., 2010; Holmes et al., 2014; Schwartz et al., 2011; Schwartz & Martin, 2004). The Invention process provides a Productive Failure experience (Holmes et al., 2014; Kapur, 2008, 2012; Kapur & Bielaczyc, 2011) in which students recognize their knowledge gaps, thus preparing them for instruction. In addition to improved learning (Holmes et al., 2014; Schwartz & Martin, 2004) and inquiry behaviours (Holmes et al., 2014; Roll et al., 2012), Invention activities have been found to improve students’ motivation orientation relative to procedural problem-solving tasks (Belenky & Nokes-Malach, 2012), promoting motivation goals focused on mastering content.

While the Invention activities help students understand the functional (procedural) components of the tools, there are questions regarding how students use them. Do students know when to use each tool?
What do they do after they have applied the tool to their data? In a physics lab, we want our students to be using this toolbox to explore the nature of scientific measurement, which involves critically reflecting on the data they have collected.

1.3 Critical thinking in the lab

Although the fundamental issues with understanding measurement and uncertainty described above mean that students lack a foundation for thinking critically about data, the critical thinking skills themselves also need to be addressed. In the lab, critical thinking often involves reflecting on the outcomes of data collection and analysis, designing follow-up experiments or ways to improve measurements based on that reflection, and ultimately evaluating the results in light of theoretical models. Indeed, research shows that students do not spontaneously reflect on their data (Séré et al., 1993), iterate their experimental procedures to repeat or improve measurements (Kanari & Millar, 2004; Séré et al., 2001), or use theoretical models to support their analysis and evaluation (Leach et al., 1998; Ryder & Leach, 2000).

Telling students to engage in these behaviours is, of course, insufficient, especially since direct instructions are unlikely to be transferred to new situations (Bransford et al., 1989; Bransford & Schwartz, 1999). Instead, students need to see how the effects of such actions would be useful to them (Butler, 2002). Providing students with contrasting experiences has been shown to be beneficial for learning and transfer, as just discussed in relation to the data handling toolbox, but students need support in using the contrasts effectively (Bransford et al., 1989; Gick & Holyoak, 1980; Holmes et al., 2014; Roll et al., 2012).
For example, when students are given two problems with dissimilar surface features but very similar underlying problems and solutions, they will not spontaneously use the solution of one problem when solving the analogous problem (Gick & Holyoak, 1980). Often they lack the metacognitive and self-regulated learning skills to apply critical thinking and reasoning skills independently and spontaneously (Butler, 2002; Schoenfeld, 1987).

Structuring students’ development of these self-regulated learning skills into a large university course can be problematic (Butler, 2002, 2003). Successful instruction involves building up scaffolding of carefully structured prompts and then slowly fading or deconstructing that scaffolding over time (Butler, 2002). Providing individualized feedback and support is also important (Butler, 2002; Lepper & Woolverton, 2002), but can become troublesome with large numbers of students. Labs have some advantages here. Firstly, class sizes are often much smaller than in lectures and there are, typically, multiple teaching assistants (TAs) who circulate to assist students. The group work also means that students can use each other as tutors. Lab periods are typically 2 to 3 hours long, much longer than typical lecture sessions. Even in labs, however, developing higher-level reasoning skills through careful scaffolding of activities and assessment requires significant time (Etkina et al., 2008).

Indeed, moving students from applying data analysis tools procedurally to using those tools to think critically about the physical phenomena under investigation is non-trivial, but very important (Hoskinson, Couch, Zwickl, Hinko & Caballero, 2014; Zwickl, Finkelstein & Lewandowski, 2013b). Students often fail to use theoretical models to support their interpretation of data (Ryder & Leach, 2000), treating theory and data as distinct entities (Leach, 1999).
Recently, work in advanced physics labs has aimed to incorporate “model-based inquiry” into lab design (Zwickl, Finkelstein & Lewandowski, 2013a). Modeling Instruction (Wells, Hestenes & Swackhamer, 1995) has been implemented in high school physics instruction, but has diffused less into university courses (Brewe, 2008). The pedagogy, which involves iterative cycles of using experiments, representations, and other activities to promote and evaluate student-developed models, has been shown to improve students’ conceptual understanding (Brewe, 2002; Wells et al., 1995), attitudes towards science (Brewe, Kramer & O’Brien, 2009), and problem-solving behaviours (Brewe, 2002). It is unclear, however, how this approach should be combined with goals for developing concepts of measurement and uncertainty. Some pedagogical approaches aim to make this connection; these are discussed in the following section.

1.4 Pedagogy in the introductory lab

A number of lab courses have been designed to support student development of these and other scientific reasoning abilities. The Investigative Science Learning Environment (ISLE) is a lab philosophy that develops students’ conceptual knowledge and scientific abilities through experimental design and reflection (Etkina & Heuvelen, 2007; Etkina, Karelina, Ruibal-Villasenor, Rosengrant, Jordan & Hmelo-Silver, 2010). In a typical session, students observe a new phenomenon and invent or propose possible ideas or mechanisms to explain the observation. They then design experiments to test these ideas, ensuring that they can predict the necessary outcomes of the experiment according to each of the possible explanations (if one explanation is true then this should happen, but if another explanation is true then this other thing should happen). After carrying out the experiment, they reflect on the outcomes of the experiment in relation to their predictions, in an attempt to narrow
down the possible explanations, and iterate as necessary (Etkina & Heuvelen, 2007).

This structure is somewhat analogous to that of Invention activities (Schwartz & Martin, 2004), described above, in that students invent explanations or models and compare the outcomes of the experiments with these models, with an ultimate goal of discovering the appropriate physical model. In Invention activities, however, it is not expected that students will discover the expert model; rather, the activity prepares them to learn about the expert model. This raises an important issue for labs that are not closely tied to the goings-on of the lectures, since the students ought to engage in discovery of the physical concepts prior to instruction. This format works well in course structures that integrate the various components of a course (lectures, labs, tutorials or recitations), such as the SCALE-UP program (Beichner, Saul, Abbott, Morse, Deardorff, Allain, Bonham, Dancy & Risley, 2007; Beichner, Saul, Allain, Deardorff & Abbott, 2000) or studio physics approaches (Cummings, Marx, Thornton & Kuhl, 1999; Wilson, 1994), but becomes problematic in traditional structures where the various components are disconnected and subsets of students experience the content presented in lectures, labs, and tutorials (or recitations) in different sequences.

The benefit, however, is that, beyond concept development, students also acquire a variety of scientific abilities through carefully crafted marking rubrics and scaffolding (Etkina, Heuvelen, White-Brahmia, Brookes, Gentile, Murthy, Rosengrant & Warren, 2006; Etkina et al., 2010).
These abilities include “the ability to represent physical processes in multiple ways; the ability to devise and test a qualitative explanation or quantitative relationship; the ability to modify a qualitative explanation or quantitative relationship; the ability to design an experimental investigation; the ability to collect and analyze data; the ability to evaluate experimental predictions and outcomes, conceptual claims, problem solutions, and models, and the ability to communicate” (Etkina et al., 2006, p. 1), each of which has additional, more specific sub-abilities. Developing these abilities takes time and repeated exposure with scaffolding prompts (Etkina et al., 2008), but students learn and are able to transfer these skills much better than students in traditional, cookbook labs (Etkina et al., 2006). Some abilities are more fragile and context-dependent than others, with time constraints in the lab being a significant limitation on students fully presenting each ability in their written reports (Etkina et al., 2008). Consistent with the research described in Section 1.1, evaluating experimental uncertainties was one of the hardest abilities for students to develop (Etkina et al., 2006, 2008).

Labs that explicitly focus on teaching concepts of measurement and uncertainty, rather than just the calculations associated with them, improve students’ understanding of these ideas, with shifts towards the set paradigm (Buffler, Allie, Lubben & Campbell, 2003; Kung, 2005; Lippmann, 2003; Pillay, 2006; Pillay et al., 2008). The focus on experimentation skills, especially when applied to experiments that the students themselves design, also promotes students’ self-efficacy, metacognition, and sensemaking behaviours (Etkina et al., 2010; Kung, 2002; Lippmann, 2003). Many of these courses, however, do not continue to develop students’ data processing abilities with higher-level techniques, such as least-squares fitting.
The application and conceptual understanding of more complicated analysis techniques are rarely discussed in introductory labs, though they are important for preparing students for subsequent lab courses, especially in physics degree programs.

Finding an effective pedagogy for developing the plethora of desirable skills must also account for myriad logistical issues that are unique to labs. Time is a critical issue in most lab courses, as mentioned previously, especially if students submit their work at the end of the lab session. Students often put sensemaking aside as they rush to hand in their lab reports (Holmes & Bonn, 2013; Lippmann, 2003). In one study, it was observed that, after students had many rushed experiences in the lab, suddenly providing them with more than enough time to complete a shorter experiment resulted in many students leaving early, many even with poorly completed lab reports and results (Holmes & Bonn, 2013).

To avoid students’ default conclusion that disagreements with theoretical models are due to ‘human error’, the equipment and experiments themselves must be of sufficiently high quality (Holmes & Bonn, 2013; Séré et al., 2001) and the measurement system sufficiently understood and modelled (Zwickl et al., 2013b). Obtaining measurements that differ significantly from values in textbooks instills epistemological beliefs that physics in the instructional lab differs from physics done by expert physicists, and that students doing the experiments differ from ‘actual’ scientists. It also encourages inauthentic behaviours, such as inflating measurement uncertainties so that student-measured values contain the expert value (Holmes & Bonn, 2013). When the instructors understand the measurement system thoroughly, the students have the opportunity to understand it as well.
This can be challenging, however, with advanced experimental set-ups that use ‘black box’ equipment.

The final logistical component is that, at many large universities, undergraduate labs are primarily facilitated by graduate TAs. There has been much research on understanding and developing TAs’ pedagogical and pedagogical content knowledge in physics classrooms (Goertzen, Scherr & Elby, 2009; Holmes, Martinuk, Ives & Warren, 2013; Maries & Singh, 2013; Seung & Bryan, 2010; Spike & Finkelstein, 2011), so this is not the focus here. The unique issue relevant to our discussion is that, since many undergraduate degree programs insufficiently address many of the experimentation skills and concepts that we are trying to teach the undergraduate students, the graduate TAs are not experts in this content. Of course, graduate TAs are expert learners and so can develop the conceptual understanding more quickly than the introductory students. It is important, however, that instructors not assume that TAs have an expert-level understanding of the content presented in labs, especially with a focus on measurement, uncertainty, data analysis, and critical thinking.

In this thesis, I aim to describe a course, the Structured Quantitative Inquiry Labs (SQILabs), that gathers elements of many of the successful pedagogies described above to promote students’ conceptual understanding of measurement and uncertainty, to develop a suite of data handling tools, and to engage students in critical thinking and reflection about those tools and the physical models being investigated. I will also address many of the logistical issues mentioned.
The primary research questions are:

a) Does the SQILab pedagogy engage students in quantitative reflection on data analysis and results, with and without scaffolding?
b) Does the SQILab pedagogy engage students in iterating to improve their measurements, with and without scaffolding?
c) Does the SQILab pedagogy prepare students to evaluate theoretical models based on their own data?
d) How does the SQILab pedagogy affect students’ attitudes and epistemologies?
e) How many scaffolded SQILab sessions does it take for students to then use the behaviours independently without scaffolding?
f) What elements of the SQILab are important for promoting students’ independent critical thinking, high-level data analysis and reflection, and conceptual understanding of measurement and uncertainty?

In the following chapters, I will describe the new course structure (Chapter 2) and provide an overview of the general research methods (Chapter 3). The research questions above will be addressed independently in the following chapters (Chapters 4, 5, 6, and 7 cover questions a–d, respectively, while questions e and f are answered throughout). I will summarize the project and the implementation and provide further research questions in the conclusions (Chapter 8).

Chapter 2
Structured quantitative inquiry labs

As discussed in the introduction, there are many desirable goals for introductory physics labs and many subtle issues involved in getting students to develop scientific reasoning skills and think critically about their experimentation process. In this chapter, I will discuss the structure and pedagogies involved in the Structured Quantitative Inquiry Labs (SQILabs), designed to address these issues. Before getting into the SQILab structure, I will provide details of the learning environment in which the work took place.
The details therein also apply to the previous version of the lab course, which will be referred to as the Year 1 course in subsequent sections, with the SQILab referred to as Year 2.

2.1 Introduction to experimental physics course

The lab course being studied is an introductory physics lab course, called “Introduction to Experimental Physics” (hereafter referred to as “the lab course”), at the University of British Columbia. The lab course takes place across two semesters, each semester consisting of eight to ten weeks of lab sessions. Each lab session is three hours long, and all student work is submitted at the end of the three hours. Students work in pairs or groups of three, but submit individual work as written notes in a lab book. The lab course does not involve any homework, though some resources are provided to the students for reference outside of the lab.

2.1.1 Learning goals

The lab course has over 40 learning goals¹ that focus exclusively on developing skills related to measurement, uncertainty, graphing, and statistical analysis. Acquiring these skills takes place in the context of conducting and analyzing physics experiments, but there is no expectation that students will learn physics concepts through the lab. In the past few years, much research has been conducted investigating the quality of different learning strategies for teaching these concepts (Day, Holmes, Roll & Bonn, 2014; Day et al., 2010; Holmes, 2011; Holmes et al., 2014; Roll et al., 2012). Many of these concepts are taught using Invention activities, where students invent solutions to problems prior to receiving instruction on the expert solution (see Chapter 1 for descriptions of the benefits of Invention activities to learning). One of the first Invention activities used in the lab course has been previously published (see Day et al., 2010).

¹The learning goals can be found at www.phas.ubc.ca/~phys109/LearningGoals.html
The specific data handling skills explicitly developed using Invention activities are:

• Histograms;
• Standard deviation and standard uncertainty in the mean;
• Unweighted χ² for least-squares fitting;
• Uncertainty in the slope of a best-fitting line;
• Weighted averages; and
• Weighted (or reduced) χ² for least-squares fitting.

Four of the Invention activities (unweighted χ², uncertainty in the slope, weighted average, and weighted χ²) are presented to students using the Invention Support Environment (Holmes, 2011), a computer-based learning environment to support Invention and subsequent instruction and practice activities. Techniques for linearizing exponential and power-law data (especially using ln-ln and semi-ln plots) are also developed in the lab course, but were not found to work well in Invention activities. Instead, these concepts are taught through worked examples (Atkinson, Derry, Renkl & Wortham, 2000), which give students samples of expert solutions that develop and use the concepts, with embedded self-explanation prompts (Chi, Leeuw, Chiu & Lavancher, 1994). In each lab where a new statistical tool is introduced, students apply the concept to the data analysis of that session’s experiment. In a few cases, students retroactively applied the new tool to data collected in a previous week’s experiment. These elements of the lab course did not change between Year 1 and Year 2 and will be further elaborated in the next chapter.

2.1.2 The student population

Students in the lab course are, generally, concurrently enrolled in one of two introductory calculus-based physics lecture streams, either through the Enriched Physics sequence or the Science One Program. In both courses, students are generally of high academic ability and entered the university having previously taken grade 12 physics. The lab is distinct from both lecture components, with different instructors and TAs for each.
Approximately 150–180 students are enrolled in the lab course each year, with a maximum of 48 students in each of four lab sections. The TA-to-student ratio is, at worst, 1:24 in each lab section. Many of the students in the Enriched Physics sequence plan to major in physics, astronomy, engineering physics, mathematics, or computer science, while the majority of the students in the Science One program plan to major in the life sciences. This demographic provides a unique population of students who come from a diverse collection of backgrounds with a diverse set of goals. It should be noted that, despite the high academic ability of the students in the lab course, their understanding of data processing, uncertainty, and measurement coming into the lab course is on par with that of students in other first-year physics courses at our institution and others (Day & Bonn, 2011).

2.2 Key features of the SQILabs

For a number of years, the lab course has been focusing on developing students’ understanding of measurement and uncertainty, as well as their experimentation and data processing skills. There were, however, five higher-level learning goals that were being insufficiently addressed. These skills are:

• To offer a plausible modification or further tests when confronted by a disagreement with an expected model;
• To devise experiments to search for and correct hidden systematic effects when confronted by a disagreement with an expected model;
• To develop a new experiment that further tests a successful model after having drawn a conclusion from an experiment;
• To apply tactics for efficient data collection, including covering a wide variable range quickly, when possible, and evaluating data early and adjusting choices ‘on-the-fly’;
• To recognize whether numbers with an associated uncertainty are in agreement with one another or not.

These inquiry process skills are certainly important ones for conducting authentic scientific investigations.
The common feature of these skills is that they require students to think critically about the data collection process and the data itself. A common issue in introductory labs, especially traditional “cookbook” labs, is that students work through the instructions with little to no independent sensemaking. Once they complete the items in the instructions, they hand in their work without spending the time reflecting on what they have just done or attempting to improve the quality of their work (Holmes & Bonn, 2013; Kanari & Millar, 2004; Séré et al., 2001). The five goals above, and the goals for the SQILab structure, centre on improving students’ reasoning and sensemaking in the lab by focusing on reflection and iteration behaviours.

The SQILab structure involves an iterative inquiry model for conducting experiments, based in the quantitative analysis of data. First, students conduct a somewhat prescribed experiment, the analysis of which attends to a particular new data handling skill, which may have been previously developed using an Invention activity. By prescribed, I mean that the measurement to be made is not selected by the students, though the way in which the data is collected is at the students’ discretion. After collecting data, students reflect on their data and results and design a follow-up investigation based on these outcomes. The cycles of reflecting and iterating naturally lead to opportunities to evaluate the experiment or the physical models under investigation. While these are desirable behaviours, it is important to motivate this cycle and to promote its spontaneous use when instructions to reflect and iterate are removed.

The key here is for the reflection to lead authentically into the follow-up experiment. Consider a case where the outcome of the initial experiment is some form of comparison, either between two individual measurements or between a set of co-varying data and a theoretical model.
Quantitatively, the latter is generally done using the weighted χ² equation in least-squares fitting, described earlier as one of the statistical tools already used in the lab course. The equation,

$$\chi^2_w = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(y_i - f(x_i)\right)^2}{\delta y_i^2}, \qquad (2.1)$$

takes the average, over the N data points, of the squared difference between each measured dependent-variable value, y_i, and the corresponding model value, f(x_i), divided by the square of the measurement uncertainty, δy_i. With a given model form, f(x) (for example, linear, exponential, or power-law), χ²_w is minimized with respect to the model parameters to determine the best-fitting parameter values associated with that model form.

| χ²_w value | Interpretation | Follow-up action |
|---|---|---|
| 0 < χ²_w < 1 | Good fit, or uncertainty may be overestimated | Improve measurements, especially to reduce uncertainty |
| 1 < χ²_w < 9 | Unclear whether fit is good | Improve measurements, especially to reduce uncertainty |
| 9 < χ²_w | Model is a poor fit to the data | Improve measurements, perhaps to correct systematic effects, or evaluate the appropriateness, limitations, or assumptions of the model |

Table 2.1: Decision tree of possible behaviours based on the outcomes of a least-squares fit between a student's data and a model.

The χ²_w value itself also provides useful information to the experimenter about the suitability of the model. A χ²_w value around one means that the measured values, y_i, are, on average, about one unit of uncertainty away from the model. Mathematically, this is derived as follows:

$$\chi^2_w = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(y_i - f(x_i)\right)^2}{\delta y_i^2} = 1 \;\Rightarrow\; \sum_{i=1}^{N}\frac{\left(y_i - f(x_i)\right)^2}{\delta y_i^2} = N.$$

Because the sum of N ones is also N, each term is, on average, approximately one:

$$\frac{\left(y_i - f(x_i)\right)^2}{\delta y_i^2} \approx 1 \;\Rightarrow\; \left|y_i - f(x_i)\right| \approx \delta y_i.$$

Therefore, χ²_w values around one suggest that the variability of the data about the model is on the order of the size of the uncertainties, suggesting a good fit. Values less than one may suggest that the uncertainties have been overestimated, since the variability in the measurements is less than the given uncertainties.
Values greater than nine mean that the measurements differ from the model, on average, by more than three units of uncertainty, suggesting a poor fit. Values between one and nine are in tension, encouraging an improved measurement to obtain a more definitive result. The particular scale used here is not absolute. It was chosen as a reasonable scale to support conceptual understanding of the index and to match the quality of the students' measurements in the given experiments. Indeed, the particular cut-offs for agreement or disagreement vary between fields and researchers (Krzywinski & Altman, 2013). With an appropriate scale in mind, follow-up investigations become apparent: either improve the quality of the measurement to reduce uncertainty, or evaluate the model choice, identify unjustified assumptions, or address possible systematic effects (see Table 2.1).

In previous iterations of the lab course, weighted least-squares fitting was not introduced to students until half-way through the second semester. This would have meant that these iteration behaviours could not be motivated until near the end of the course. Fortunately, comparisons between individual measured values can support the same set of behaviours. Instead of the point-like, dichotomous comparisons of overlapping uncertainty ranges typically used in introductory physics labs (Lippmann, 2003), in which overlap means agreement and no overlap means disagreement, one can examine how many units of standard uncertainty separate the values (that is, 1σ, 2σ, or 3σ differences):

$$1\sigma:\; A - B \approx \delta_{A,B}, \qquad 2\sigma:\; A - B \approx 2\,\delta_{A,B}, \qquad 3\sigma:\; A - B \approx 3\,\delta_{A,B},$$

where A and B are individual measurements and δ_{A,B} is a combined uncertainty. This continuous scale maps nicely onto the χ²_w structure: a 3σ difference corresponds to a squared, uncertainty-scaled difference of nine, which is the cut-off in Table 2.1. The scale is also much more consistent with set-like reasoning, as it highlights the continuous probability distributions that uncertainties represent (Buffler et al., 2008, 2007).
In order to define the particular δ_{A,B} values, we suggest using a combined uncertainty consistent with the recommendations of the International Organization for Standardization in its Guide to the Expression of Uncertainty in Measurement (GUM) (BIPM, IEC, IFCC, IUPAC, IUPAP & OIML, 2008). That combined uncertainty is the uncertainty of the difference A − B, obtained for independent measurements by adding the two uncertainties in quadrature. As such, we have defined the t′-score:

$$t' = \frac{A - B}{\sqrt{\delta_A^2 + \delta_B^2}}, \qquad (2.2)$$

where A and B are two independently measured values and δ_A and δ_B are their associated uncertainties. We use the letter t because of the structural similarities with Student's t. The statistic differs significantly from Student's t, however. Student's t compares sample means according to the sample standard deviations, and in its simplest form it also assumes samples of the same size drawn from populations with the same variance. The t′-score, by contrast, applies to any measured value with its associated uncertainty. As such, we do not try to interpret t′-scores using the t-distribution or to make inferences about probabilities. Indeed, if the values being compared are means of samples of measurements, then the t′-score approximates Welch's t or Student's t, depending on the sample sizes and population variances. (Welch's t is an adaptation of Student's t that does not assume homogeneity of variance.)

The modified t′-score from Equation 2.2 provides the desired continuous scale for comparing measured values while taking their uncertainties into account. This avoids the dichotomous agree-or-disagree language of overlapping uncertainty ranges. It also requires both values to have measurement uncertainty, moving students further away from point-like thinking and comparisons to ‘true’ values. Students could, however, compare one measurement with uncertainty to a theoretical value without uncertainty, so that only one uncertainty would appear in the denominator.
This structure is more similar to that of the χ²_w value (Equation 2.1), where one compares the measured data values, which have uncertainty, to theoretical model values, which do not. While this comparison is valid, it encourages point-like thinking by analyzing how different the measurement is from a true value. This places the emphasis back on the ‘actual’ value and reinforces the definition of ‘error’ as a mistake, undermining progress toward an expert-like interpretation of measurement uncertainties. For these reasons, this type of comparison is deliberately avoided in the SQILab until χ²_w analyses are reached, to discourage point-like reasoning and hunting for a specific, ‘true’ value.

Roughly speaking, the t′-score describes, in units of uncertainty, the degree to which the two results differ from each other. The regimes mirror the χ²_w structure in Table 2.1 (Table 2.2):
• a t′-score above three suggests that the two values are probably different;
• a t′-score near one suggests that the two values may be the same;
• a t′-score between one and three is in tension.
Again, the cut-offs for each regime are not absolute. They were chosen to support conceptual understanding of these comparisons and to suit the quality of the measurements being made. A higher cut-off for disagreement may be suitable in other situations or other fields (Krzywinski & Altman, 2013). Regardless of the chosen cut-off value, a large t′-score could suggest a systematic effect arising from the experimental method. It could also suggest that the model through which the comparisons are being made is inappropriate, is limited in this circumstance, or involves an unjustified assumption. A small t′-score primarily motivates follow-up experiments that aim to achieve higher precision by decreasing the size of the measurement uncertainties.
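The t′-score of Equation 2.2 and the regimes of Table 2.2 can be sketched in a few lines of Python. This is a minimal illustration, not the course's own tool, and it rests on three assumptions:
• the function names are hypothetical;
• the magnitude |A − B| is taken, since only the size of the difference is interpreted;
• a value falling exactly on a cut-off is assigned to the higher regime, because the table leaves the boundaries unspecified.

```python
import math


def t_prime(a, da, b, db):
    """t'-score (Equation 2.2): difference between two independent measured
    values in units of the uncertainty of their difference.

    Assumption: the magnitude is returned, since only its size is interpreted.
    """
    return abs(a - b) / math.sqrt(da**2 + db**2)


def interpret_t_prime(t):
    """Map a t'-score onto the regimes of Table 2.2 (cut-offs of 1 and 3).

    Boundary values are assigned to the higher regime (an assumption).
    """
    if t < 1:
        return "uncertainty may be overestimated: improve measurements, especially to reduce uncertainty"
    if t < 3:
        return "unclear whether the values are different: improve measurements, especially to reduce uncertainty"
    return "measurements are likely different: check for systematic effects or evaluate the model's limitations and assumptions"


# Third attempt in Table 2.3: periods (s) measured at 10 and 20 degrees.
t = t_prime(1.8303, 0.004, 1.851, 0.004)
print(f"t' = {t:.2f}")  # t' = 3.66
print(interpret_t_prime(t))
```

The same function can be applied to Figure 2.1, treating each standard deviation there as the uncertainty. It then gives roughly 8.5, 1.4, 0.71, and 3.8 for panels (a) to (d), which reproduces the intended ordering of confidence that each pair differs.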
Regardless of the outcome of an initial measurement, there are always options for follow-up measurements, and these differ from group to group. Students therefore always have both the option and the structure to keep working on and improving their measurements. Students have some autonomy in their choice of measurement method and engage in a constrained experimental design process. Opportunities for autonomy through student-designed experiments mean that students are personally invested in the experiment and begin to engage with authentic scientific inquiry behaviours and resources (Etkina et al., 2010). This also gives students an Invention opportunity, as they invent a new procedure to obtain a desired outcome, namely an improved measurement (Schwartz & Martin, 2004). Of course, if uncertainties are as small as possible, a small t′-score could also indicate a high level of agreement between the measurements. There is evidence, however, that students tend to overestimate or inflate their uncertainties, especially in attempts to obtain agreement between measurements (Holmes & Bonn, 2013). Regardless of the size of the t′-score, students are always encouraged to:
• double-check their measurements and calculations;
• try different measurements or methods;
• compare their results and methods with their peers.
All of these tactics are regularly used in experimental research.

| t′-score | Interpretation | Follow-up action |
|---|---|---|
| 0 < t′ < 1 | Uncertainty may be overestimated | Improve measurements, especially to reduce uncertainty |
| 1 < t′ < 3 | Unclear whether the values are different | Improve measurements, especially to reduce uncertainty |
| 3 < t′ | Measurements are likely different | Improve measurements, perhaps to correct systematic effects, or evaluate why the values may differ based on possible limitations or assumptions of the model |

Table 2.2: Decision tree of possible behaviours based on the outcomes of a t′-score comparison between two measured values.
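For comparisons between data and a model, Equation 2.1 and the cut-offs of Table 2.1 can be sketched for the simplest case, a straight line f(x) = a + bx. This is a hedged illustration, not the course's fitting software:
• the function names and the data points are hypothetical;
• the closed-form weighted least-squares solution is the standard one for a linear model;
• χ²_w is divided by N exactly as in Equation 2.1, rather than by the N − 2 degrees of freedom often used elsewhere.

```python
import math


def chi2_w(x, y, dy, model):
    """Weighted chi-squared of Equation 2.1: the mean over N points of
    ((y_i - f(x_i)) / dy_i)**2. Divided by N, as in Equation 2.1."""
    terms = [((yi - model(xi)) / di) ** 2 for xi, yi, di in zip(x, y, dy)]
    return sum(terms) / len(terms)


def weighted_line_fit(x, y, dy):
    """Return (a, b) minimizing chi2_w for f(x) = a + b*x, using the
    closed-form weighted least-squares solution with weights 1/dy_i**2."""
    w = [1.0 / di**2 for di in dy]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx**2
    a = (sxx * sy - sx * sxy) / delta
    b = (s * sxy - sx * sy) / delta
    return a, b


def interpret_chi2_w(c):
    """Map chi2_w onto the regimes of Table 2.1 (cut-offs of 1 and 9);
    boundary values go to the higher regime (an assumption)."""
    if c < 1:
        return "good fit, or uncertainty may be overestimated"
    if c < 9:
        return "unclear whether fit is good"
    return "model is a poor fit to the data"


# Hypothetical data: four points with equal uncertainties of 0.1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.2]
dy = [0.1, 0.1, 0.1, 0.1]
a, b = weighted_line_fit(x, y, dy)
c = chi2_w(x, y, dy, lambda xi: a + b * xi)
print(f"a = {a:.2f}, b = {b:.2f}, chi2_w = {c:.2f} -> {interpret_chi2_w(c)}")
```

For these hypothetical points the sketch gives a ≈ 0.99, b ≈ 2.04 and χ²_w ≈ 1.05. That value falls in the "unclear" band of Table 2.1, so the decision tree would point towards reducing uncertainty. Points lying exactly one uncertainty from the model give χ²_w = 1, as in the derivation following Equation 2.1.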
Introducing the t′-score first prepares students for using χ²_w values with larger data sets later in the semester. At this point we have not generalized t′ for comparing more than two values. Instead, students can conduct multiple pairwise comparisons when more than two measurements need to be compared.

In summary, this framework offers students a language with which to discuss a continuous and quantitative degree of confidence in comparisons between measurements and models, and to reflect on the outcomes of an experiment. This language provides a number of benefits, including:
• setting agreement on a set-like, continuous scale, rather than the point-like binary notion of agreement (agree or disagree);
• providing a generic scaffolding upon which students can base their reflections in any lab situation;
• driving students towards improved measurements and high-quality data in all lab situations; and
• approaching real-world statistical usage in scientific experimentation.
In the remaining sections of this chapter, I will describe how this framework was introduced to students and supported throughout the lab course, with detailed examples of specific lab experiments.

2.3 Introducing the t′-score to students

While students in Year 1 were taught weighted least-squares fitting through the χ²_w equation, they were not introduced to t′-scores. In Year 2, we first introduced the t′-score to students during the second week of the lab course. At the start of the lab session, students learned about standard deviation and standard uncertainty in the mean (as measures of uncertainty) through an Invention activity (see Day et al., 2010). They then applied these tools to data collected the previous week from measurements of the period of a pendulum. Students were then given another Invention activity to introduce the t′-score concept.
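The two measures of uncertainty introduced at the start of this session can be computed from repeated pendulum timings as follows. This is a minimal sketch under stated assumptions:
• the helper name and the timings are hypothetical;
• the sample standard deviation (N − 1 denominator) is used;
• a timing that spans several swings is divided by the number of swings to give a period.

```python
import math
import statistics


def period_estimate(timings, swings_per_timing=1):
    """Return (mean period, standard deviation, standard uncertainty in the
    mean) from repeated stopwatch timings, each covering swings_per_timing
    swings. Needs at least two timings."""
    periods = [t / swings_per_timing for t in timings]
    mean = statistics.mean(periods)
    sd = statistics.stdev(periods)        # sample SD, N - 1 in the denominator
    sem = sd / math.sqrt(len(periods))    # standard uncertainty in the mean
    return mean, sd, sem


# Hypothetical single-swing timings (s).
mean, sd, sem = period_estimate([1.80, 1.92, 1.75, 1.88, 1.83])
print(f"T = {mean:.3f} +/- {sem:.3f} s (SD = {sd:.3f} s)")
```

The standard deviation describes the spread of individual timings. The standard uncertainty in the mean, in contrast, shrinks as 1/√N with the number of trials, which is one reason repeated trials appear as a follow-up strategy in the pendulum experiment.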
In Year 1, students received the same instruction about standard deviation and standard uncertainty in the mean, but did not receive the t′-score instruction. Instead, they worked through an activity in a later lab session that focused on overlap of uncertainty ranges and described agreement on a dichotomous scale (overlap means agreement, no overlap means disagreement).

As described previously, central to an Invention activity are deliberately chosen contrasting cases (Holmes et al., 2014; Schwartz et al., 2011), which, when compared pairwise, should draw out features of the target concept. The contrasting cases for the t′-score aimed to highlight three main features:
• that agreement depends on the relative means of two data sets or measurements,
• that agreement depends on the relative widths of the distributions being compared, and
• that there are degrees of agreement (that is, one may be more or less confident that two measurements differ compared to another pair of measurements).
Thus, the contrasting cases given to students involved four pairs of measurement distributions with their associated means and standard deviations (see Figure 2.1).

In one pair, both distributions had the same spread, but their means were such that the distributions clearly did not overlap (Figure 2.1a). Another pair had the same spread as in the previous case, but the means were closer, so that the distributions did overlap (Figure 2.1b). The third pair had the same means as in the previous case, but a larger spread (Figure 2.1c). An additional pair was included with different means and different spreads (Figure 2.1d), making it slightly more ambiguous than the other cases.
This pair was provided to students last, after they had gone through the other pairs one at a time.

Students were asked, for each pair, to record how confident they were that the measurements were different on a scale of one to three. On this scale, one meant that they were not confident that the distributions were different, and three meant that they were very confident that the measurements were different. They were then asked to invent an equation that, when applied to the measured values and their associated uncertainties, would quantify their confidence in the disagreement between two measurements. After Invention, the instructor led a class-wide discussion about the main features of the problem, which were then mapped onto quantitative components to build up Equation 2.2. The instructor then presented the follow-up behaviours and the rationale behind them, as described in the previous section (see Table 2.2).

[Figure 2.1 panels, each plotting two distributions over values from 0 to 100: (a) M = 20, SD = 5 and M = 80, SD = 5; (b) M = 35, SD = 5 and M = 45, SD = 5; (c) M = 35, SD = 10 and M = 45, SD = 10; (d) M = 20, SD = 5 and M = 80, SD = 15.]
Figure 2.1: Four contrasting cases of distributions for the t′-score Invention activity. The four pairs of graphs draw out how one's confidence that two distributions differ depends on their means and standard deviations.

To put this structure into context, I will describe the first experiment to which students applied the t′-score and its associated behaviours, Pendulum for Pros.

2.4 The SQILab framework in practice: pendulum for pros

The Pendulum for Pros lab occurred in the second week of the lab course, immediately following the instruction on standard deviation, standard uncertainty in the mean, and t′-scores. Standard deviation was conceptualized by combining the whole class's measurements of the period of a common pendulum.
Students developed a histogram of this data and calculated the mean and standard deviation. Through additional measurements and discussions of the sources of uncertainty in the measurement, the class established that reaction time was the dominant source of uncertainty. This uncertainty was estimated at 0.1 s (the standard deviation of the distribution). In subsequent measurements, therefore, many students used this value as the uncertainty in any single measurement, with appropriate corrections for the number of trials if using a mean of several measurements. To put these skills to use, students were asked to measure and compare the period of a pendulum through timing measurements at two different amplitudes (angles of 10° and 20°). The equation for the period usually presented to students in class is

$$T = 2\pi\sqrt{\frac{L}{g}}, \qquad (2.3)$$

where T is the period, L is the length of the pendulum, and g is the acceleration due to gravity. This equation suggests that the period is independent of the amplitude of the swing, and thus the measurements at 10° and 20° should not differ. The derivation of this equation, however, assumes that sin θ ≈ θ, an approximation that is only valid for small angles. In this experiment, students were not given Equation 2.3, but many students expressed familiarity with the equation from previous or current physics courses. Even students who were aware of the approximation were often unsure where the equation is valid in terms of measurement: practically speaking, what constitutes a small angle? With reasonable precision (uncertainties around 0.1% of the measured values), the two measurements at 10° and 20° are, in fact, distinct, with the 20° angle being ‘large enough’ to depart measurably from the approximation. Obtaining the required 0.1% precision was a surprising achievement for the students compared to their previous measurement experience.
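A rough, idealized estimate shows why about 0.1% precision is needed. The sketch below compares the exact period of an ideal simple pendulum (no friction, point mass) with the small-angle result of Equation 2.3. It uses the standard identity T/T₀ = 1/AGM(1, cos(θ₀/2)), where AGM is the arithmetic-geometric mean; this is equivalent to the complete elliptic integral of the first kind. The helper names are hypothetical. The timing model is an assumption for illustration, not the students' analysis: 0.1 s per stopwatch timing, divided by the number of swings timed and by √N for N independent trials.

```python
import math


def period_ratio(theta0_deg):
    """Exact period of an ideal simple pendulum divided by the small-angle
    period 2*pi*sqrt(L/g), via T/T0 = 1 / AGM(1, cos(theta0 / 2))."""
    a, b = 1.0, math.cos(math.radians(theta0_deg) / 2)
    for _ in range(60):          # AGM converges quadratically; 60 is ample
        if abs(a - b) < 1e-15:
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return 1.0 / a


def timing_uncertainty(swings, trials, per_timing=0.1):
    """Assumed uncertainty (s) in a period from `trials` timings of `swings`
    swings each, with per_timing seconds of reaction-time uncertainty."""
    return per_timing / (swings * math.sqrt(trials))


# Fractional difference between the 20 and 10 degree periods (about 0.57%).
frac = period_ratio(20) / period_ratio(10) - 1
period = 1.83                       # roughly the measured period (s)
dT = frac * period                  # predicted difference, about 0.01 s
print(f"fractional difference = {frac:.4%}, predicted difference = {dT:.4f} s")

for swings, trials in [(1, 10), (10, 5), (20, 5), (40, 3)]:
    d = timing_uncertainty(swings, trials)
    expected_t = dT / math.sqrt(2 * d**2)    # t'-score if both angles share d
    print(f"{swings:>2} swings x {trials:>2} trials: dT = {d:.4f} s, expected t' = {expected_t:.1f}")
```

Under these assumptions the predicted difference between the two angles is only about 0.01 s. Single-swing timings therefore give an expected t′ well below one, while timing 20 or more swings per trial pushes it above three. This idealized estimate is not meant to reproduce Table 2.3; it only indicates the scale of precision required.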
Many students reported that, in the past, they had considered 1% precision or accuracy a good target.

In the lab, students were instructed to make an initial round of measurements of the period at 10° and 20°, compare the two values using a t′-score, reflect on and interpret the meaning of the t′-score, and then conduct follow-up measurements based on the t′-score comparison. While they were told to make measurements at two prescribed angles, they were not told how to make the measurements, nor what follow-up measurements to make.

To illustrate how the process works, I will describe one pair of students' progress through the experiment. The pair chosen was typical in data quality, but strong in explanation and reasoning. Table 2.3 shows their three measurement attempts, although only two attempts were explicitly requested in the lab instructions.
• First, they made 10 single-period trials and calculated the mean and the standard uncertainty in the mean for each angle. Comparing these using a t′-score gave a value of 0.11. Based on the structure provided, this suggests that the values may be the same, but that they should improve their measurement to reduce their uncertainties.
• Without instruction on how to improve their data, they designed a measurement that let the pendulum swing 10 times between measurements. They therefore started and stopped the stopwatch once every 10 swings rather than once per swing, decreasing their uncertainty by a factor of 10. This increased their t′-score to 2.35, which is in the region of tension, suggesting that they should further improve the measurement to confirm whether the values differ.
• For the third data set, they increased the number of swings per measurement to 20, reducing their uncertainty relative to the previous measurement by a factor of 2. This increased their t′-score to 3.66, at which point they could conclude that the values are different.

| Attempt | Number of swings (students' design) | Number of trials (students' design) | Period at 10° (s) | Period at 20° (s) | t′ |
|---|---|---|---|---|---|
| 1 | Single period | 10 trials | 1.83 ± 0.08 | 1.81 ± 0.08 | 0.11 |
| 2 | 10 periods | 5 trials | 1.823 ± 0.008 | 1.850 ± 0.008 | 2.35 |
| 3 | 20 periods | 5 trials | 1.8303 ± 0.004 | 1.851 ± 0.004 | 3.66 |

Table 2.3: Sample data sets produced by a group of students in the Pendulum experiment. The three iterations represent their progression through the activity as they attempt to improve their measurement quality.

Analysis of all students' submitted work demonstrated that the majority of students reported a t′-score less than 2 after their first comparison. In the second round of measurements, however, 28% of students reported a final t′-score greater than 3 and, on average, they decreased their uncertainties by 37%. Nearly all students reduced their uncertainties through iterations and, on average, these uncertainties went down by a factor of 4. Students improved their measurement uncertainty either by increasing the number of trials or by increasing the number of swings in each timing measurement.

A similar version of this experiment was also used in Year 1 during a lab session that focused on making comparisons between measured data points using overlapping uncertainties. Students were asked to measure and compare the periods of the pendulum at three different angles (5°, 10°, and 25°).
In this essentially unscaffolded experience, nearly all students made a single trial of ten-swing measurements of the pendulum, achieving an average uncertainty of 0.017 ± 0.002 s. This was also the first measurement made by the example group in Year 2 (Table 2.3), suggesting it is a common default measurement for students. By the end of the session in Year 2, students' measurements involved, on average, three trials of 40-swing measurements, achieving an average uncertainty of 0.003 s. Table 2.4 summarizes these distinctions. This data was extracted from a large sample of students' lab books each year (Year 1, n = 121; Year 2, n = 90).

| Year | Number of trials | Number of swings | Average uncertainty |
|---|---|---|---|
| Year 1 | 1.008 ± 0.008 | 10 ± 0 | 0.017 ± 0.002 s |
| Year 2 | 3.0 ± 0.2 | 40 ± 2 | 0.0035 ± 0.0005 s |

Table 2.4: Summary of students' behaviours each year during similar experiments comparing the period of the pendulum at different amplitudes. Data represent mean values within the class samples and standard uncertainties in the means.

It is clear, then, that the explicit instructions to iterate and improve measurements autonomously improved the quality of students' data. How students interpret and evaluate these results is also important. Figure 2.2 shows the final conclusion written by one of the students in the example group. After the third comparison, the student immediately acknowledges (confesses) that this result is the opposite of what he had expected. He expected, from class, that the pendulum periods should be the same. He does, however, recognize the limitations of the model, namely that it is only valid for small angles.
He then begins to tie together the theoretical model, the mathematical approximation, and the reality of experimental measurement as he discusses how 20° is not a ‘small enough’ angle when precision is high.

Figure 2.2: One student's conclusion at the end of the Pendulum experiment discusses the conflict between the outcome they expected and the results they have obtained.

Through this experience, the students have seen that they can make precise measurements that are better than the approximations they see in class. That is, they can observe and measure physical phenomena more accurately than the idealized models presented in class describe them. This is a non-trivial experience, given that many students expect to make poor-quality measurements, and in particular measurements poorer than those of expert physicists (Allie et al., 1998; Holmes & Bonn, 2013; Séré et al., 2001). This immediately removes ‘human error’ as an acceptable limitation for an experiment. The student also begins to explore the limitations of theoretical models, recognizing that the physical world is often more complicated than what is presented. This experience sets students up for future model-based discussions about the approximations that are made, in either the physical models or the measurement system (Zwickl et al., 2013a). It also suggests to students, somewhat implicitly, that they should attempt high-quality measurements in the lab, since the results may not be as expected.

Through this activity, the students worked iteratively through the three desired phases of critical thinking:
• reflecting on the outcomes of measurements;
• iterating to improve their measurements;
• evaluating the experiment and a theoretical model.
The comparisons act to structure these cycles, allowing students free agency within designed constraints. The cycles are defined by the outcomes of each student's own experiments rather than by the instructor or the written instructions.
In this way, students obtain individualized, self-regulated learning experiences (Butler, 2002), though instructors and TAs are present to support them. The benefit of this cycle is that the iterative inquiry process does not depend on instructor or TA support. The cycle of measurement, comparison, and follow-up action provides a more authentic laboratory experience that can have a positive impact on student behaviour and motivation. This holds provided that careful attention is paid to giving students the tools to succeed and sufficient time to reflect and revise. The process is deeply rooted in the concepts of uncertainty and measurement and is facilitated by the application of quantitative analysis tools. Because students regularly make comparisons that involve measured quantities (rather than theoretical quantities), uncertainty moves from an abstraction to a real issue that they must manage. The cycles provide students with a language with which to reflect on their data. This language facilitates comparisons better than overlapping uncertainty ranges, which have been shown to be challenging for students (Kung & Linder, 2006; Volkwyn et al., 2008).

In this activity, however, the students were instructed to make comparisons between their measurements and to iterate. One of the aims of this work is for students to engage in these behaviours independently. To achieve this, we used a faded scaffolding system across the year (Butler, 2002).

2.5 Scaffolding in the SQILab

The aim of the new structure was for students to reflect on the outcomes of their measurements, iterate their experiments to improve their measurement quality, and evaluate the experiments and theoretical models. While students in the SQILab (Year 2) had the t′-score (or χ²_w) decision tree available to them, they would not necessarily apply it to their measurements spontaneously. So, to engage students in these behaviours, we built a scaffolding system into the beginning of the lab course and faded it over time.
The primary modes of structure were explicit written instructions, marks in the marking scheme, and TA or instructor support in class.

As an example of scaffolding through written instructions, I will summarize the instructions for one of the Light Intensity experiments (the details of the experiment are elaborated on in Chapter 3). By the end of this experiment, students should have produced a high-quality data set of the light intensity of a light bulb as a function of the number of translucent sheets blocking the light. This data set was to include two graphs: light intensity versus number of sheets, and a semi-log plot of light intensity versus number of sheets. They were given two explicit learning goals:
• to use semi-log plots to reveal exponential behaviour in scatter plots;
• to reflect on data while it is collected, in order to continuously improve a scatter plot.

The explicit instructions were as follows:

THE EXPERIMENT
To develop a model relating intensity to the number of plastic sheets blocking the light, make measurements using a wide range of number of sheets. To get started:
• Write down a plan for your first 10 measurements.
• Make measurements with several different numbers of sheets (about 10 measurements) in front and remember to:
  – adjust the light's distance so that the intensity reading is 500 lux without any sheets in front of it
  – regularly remeasure the intensity with no sheets in the way to check that it is near 500 lux. Adjust distance slightly if it has changed more than 1%.
  – check your meter zeroing occasionally
  – plot the data on a linear-linear plot as you go along
REFLECTING ON YOUR DATA
Once you have an initial set of data, consult with another group and decide what to do next to improve the quality of the data set.
This could include, but is not limited to:
• trying more thicknesses of sheets
• trying more extreme thicknesses of sheets
• retaking some data points (but do not throw data out)
Based on your discussion, write down what you think is wrong with your data so far, and what you plan to do next, and then take more data.
THE SEMI-LOG PLOT
Produce a semi-log plot of your data so far to qualitatively confirm whether or not it shows exponential decrease. Again, confer with others about what extra data you might need to improve the semi-log scatter plot.
Again, write down what you think needs improvement and what measurements you need to do to improve things. Continue until you are satisfied with your final result.
You need to include uncertainty in these plots and we will show you how to handle that when rescaling graph axes.
Once you are satisfied with the data, and are satisfied with your semi-log plot, write down the model that describes intensity versus number of sheets.
Marking Scheme
• 2 marks for description of experimental technique for your first measurements
• 2 marks for critique of first data and plan for more data-taking
• 2 marks for critique after your second set of data and plan for more data-taking
• 3 marks for total, final data set with uncertainty
• 2 marks for semi-log plot of data
• 2 marks for conclusion, including estimate of the coefficient in the exponential (what model describes the data, including estimates of the coefficients)

As this example shows, explicit instructions to reflect and improve measurements are provided at key comparison moments and are further supported in the marking scheme. Since students' marks for the lab were determined solely by the evaluation of their lab books, the marking schemes were deliberately structured so that course assessment aligned with the lab course goals.
Marking schemes, therefore, are considered part of the scaffolding. Indeed, students' study and learning habits are often motivated by grades (Elby, 1999). So, if grades need to be assigned, it is important for those grades to be aligned with the desired learning behaviours (Butler, 2003; Elby, 1999, 2001). In addition to the explicit written support (marking schemes and instructions), the TAs and instructor would discuss each of the cycle phases with the student groups, focusing on asking what the students were doing and why they were doing it.

Early experiments began with all of these support systems in place, and written instructions were faded first (keeping marks in the marking scheme and TA support). The marking scheme was faded next, to a non-specific set of marks for the entire experiment. In the LR circuits example (elaborated on in Chapter 3), the marking scheme was simply:

11 marks: includes high quality measurement of time constant versus resistance, plotting and fitting a model to the data, and conclusions

A number of experiments also had TA and instructor support removed to observe how students would behave independently. Student behaviour when all scaffolding is removed, therefore, provides a picture of the behaviours that had become habitual for students, as well as those that they deemed to be expected. That is, even though explicit instructions and grades were not present, students may have thought they were expected to engage in those behaviours. Determining the expected behaviours that have proved successful in the past is an important trait of self-regulated learning (Butler, 2002).

Scaffolding for comparison, reflection, and iteration can each be defined on three levels:
• high scaffolding involved explicit instructions and marks in the marking scheme;
• low scaffolding involved only marks in the marking scheme;
• no scaffolding did not include any support.
The type of comparison to be made is also an important feature: students either compared individual data points or compared data sets to models, especially graphically. The types of comparisons and levels of scaffolding across experiments in Year 2 are listed in Table 2.5. A thorough description of each of the experiments used in the lab course can be found in Chapter 3.

| Term | Experiment | Week | Comparison type | Issue built in | Scaffolding: comparison | Scaffolding: reflection | Scaffolding: iteration |
|---|---|---|---|---|---|---|---|
| 1 | Pendulum I | 1 | None | No | None | None | None |
| 1 | Pendulum for Pros I | 2 | Measurements | Yes | High | High | High |
| 1 | Pendulum for Pros II | 3 | Measurements | Yes | High | Low | None |
| 1 | Analog & Digital Uncertainty | 4 | Measurements | No | None | None | None |
| 1 | Light Intensity I (LI1) | 5 | Model | No | High | High | High |
| 1 | Light Intensity II (LI2) | 6 | Model | Yes | High | High | High |
| 1 | Radiation Shielding I (RS1) | 7 | Measurements & Model | Yes | High | Low | Low |
| 1 | Radiation Shielding II (RS2) | 8 | Model | Yes | None | None | None |
| 1 | Pendulum Exam | 9 | Measurements | Yes | None | None | None |
| 2 | Mass on a Spring I (MS1) | 1 | Model | No | None | Low | None |
| 2 | Mass on a Spring II (MS2) | 2 | Measurements | Yes | High | None | None |
| 2 | Standing Waves (SW) | 3 | Measurements & Model | No | Low | Low | None |
| 2 | Index of Refraction (IoR) | 4 | Measurements | Yes | None | None | None |
| 2 | Diffraction | 5 | Measurements | Yes | High | None | None |
| 2 | RC Circuits I (RC1) | 6 | Model | No | None | None | None |
| 2 | RC Circuits II (RC2) | 7 | Model | No | High | None | None |
| 2 | LR Circuits (LR) | 8 | Model | Yes | None | None | None |
| 2 | Hydrogen Spectrum | 9, 10 | Measurements & Model | No | None | None | None |

Table 2.5: Timeline of experiments and associated scaffolding in Year 2. Comparisons were either made between individual measurements or between data and models. Issues such as common errors or limitations of models were built into many experiments. A high level means the item was explicitly instructed and there were marks in the grading scheme. Low level means the item was only present in the grading scheme.
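The analysis requested in the Light Intensity instructions quoted in this section asks for a semi-log plot followed by an estimate of the coefficient in the exponential. That analysis can be sketched as a weighted straight-line fit to ln I versus the number of sheets, n. This is a minimal sketch, not the course's analysis, and it makes three assumptions:
• the model is I = I₀e^(−kn);
• each intensity uncertainty is carried onto the log axis as δ(ln I) = δI/I (standard first-order propagation);
• the names and the synthetic data are hypothetical.

```python
import math


def semilog_fit(n_sheets, intensity, d_intensity):
    """Fit I = I0 * exp(-k * n) via a weighted line fit to ln(I) versus n.

    Returns (I0, dI0, k, dk). Uncertainties are propagated onto the log axis
    as d(ln I) = dI / I, and the weights are 1 / d(ln I)**2.
    """
    y = [math.log(i) for i in intensity]
    dy = [di / i for i, di in zip(intensity, d_intensity)]
    w = [1.0 / d**2 for d in dy]
    s = sum(w)
    sx = sum(wi * ni for wi, ni in zip(w, n_sheets))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * ni * ni for wi, ni in zip(w, n_sheets))
    sxy = sum(wi * ni * yi for wi, ni, yi in zip(w, n_sheets, y))
    delta = s * sxx - sx**2
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    # Standard weighted least-squares uncertainties in slope and intercept.
    d_slope = math.sqrt(s / delta)
    d_intercept = math.sqrt(sxx / delta)
    i0 = math.exp(intercept)
    return i0, i0 * d_intercept, -slope, d_slope


# Synthetic, noise-free data from I = 500 lux * exp(-0.3 n), with an assumed
# 2% + 1 lux uncertainty on each reading.
n = [0, 1, 2, 3, 4, 6, 8, 10]
intensity = [500 * math.exp(-0.3 * m) for m in n]
d_intensity = [0.02 * i + 1 for i in intensity]
i0, di0, k, dk = semilog_fit(n, intensity, d_intensity)
print(f"I0 = {i0:.1f} +/- {di0:.1f} lux, k = {k:.4f} +/- {dk:.4f} per sheet")
```

Because the synthetic readings lie exactly on the model, the fit returns I₀ = 500 lux and k = 0.3 per sheet to within rounding. With real readings, the residuals on the log axis could also be passed to a χ²_w calculation (Equation 2.1) to judge the fit.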
2.6 Summary
In this chapter, I described the main elements of the pedagogy employed in the SQILabs. The main features are for students to make structured comparisons between measurements or models, reflect on those comparisons, and iterate to improve the quality of their measurements. The comparisons were supported using quantitative tools such as χ²_w and t′-scores. These behaviours were highly structured at the beginning of the year through explicit instructions in the lab materials, assigned marks in the grading scheme, or TA and instructor support. This support was faded across the year. In the next chapters, I will present evidence of students' improved behaviours in the new lab structure, focusing on reflecting, iterating, and evaluating theoretical models, as well as examining their attitudes and motivation. These behaviours will be compared to a previous implementation of the lab course that did not include the SQILab framework, but did include many of the same experiments and statistical tools. The differences between these two years will be described more thoroughly in the next chapter, which will also present the overall assessment methods.

Chapter 3
Methods
While the analysis methods will be explained more explicitly in each data chapter, this chapter provides an overview of the full research methodology, which can generally be described as action research.

3.1 Participants
Participants were approximately 300 students enrolled in an introductory physics lab course at the University of British Columbia across two academic years. Different numbers of participants were involved in different sets of analyses, so the numbers are specified for each analysis. The students in the lab course were mostly first-year students. Students were enrolled in the lab course through one of two course streams, both of which are considered enriched curriculum courses, meaning the content is above that typically presented in a first-year physics course.
Both courses are calculus-based, and students enter the courses with mean scores on the Force Concept Inventory (Hestenes, Wells & Swackhamer, 1992) above 75%. Approximately 25% of the students intended to major in physics or astronomy, though nearly all students intended to major in a science discipline.

Most data throughout the thesis is presented either as aggregate data from all participants each year or as aggregates of random samples of participants each year. Where individual student data is presented (either through images of student lab work or excerpts from student interviews), the data was deliberately selected to illustrate a particular case. This work is included with permission from the students.

3.2 Course experiments
Throughout the thesis, students' written work in a number of experiments was analyzed and compared across years. To understand students' written work, and the distinctions between the two years, it is important to have context for each experiment, which I provide here.
For each experiment, I will draw attention to the new statistical or analytic focus of the experiment (summarized in Table 3.1), the differences in the experiment between Year 1 and Year 2, and the level of scaffolding in Year 2 (already summarized in Section 2.5).

Statistical tool | Year 1 experiment | Year 1 week | Year 2 experiment | Year 2 week
Histograms | Pendulum I | 1.1 | Pendulum I | 1.1
Mean, σ, σ_x̄ | Pendulum for Pros I | 1.2 | Pendulum II | 1.2
t′-score | na | na | Pendulum II | 1.2
Overlap | Comparing & Predicting | 1.5 | na | na
Semi-log plots | Light Intensity I | 1.7 | Light Intensity I | 1.5
Log-log plots | Light Intensity II | 1.6 | Light Intensity II | 1.6
Residuals | na | na | Light Intensity II | 1.6
Unweighted chi-squared | Standing Waves II | 2.2 | Radiation Shielding II | 1.8
Uncertainty in slope of unweighted fit line | LR Circuits | 2.7 | Mass on a Spring | 2.2
Weighted average | Index of Refraction | 2.4 | Index of Refraction | 2.4
Weighted chi-squared | RC Circuits II | 2.6 | RC Circuits I | 2.6

Table 3.1: The table shows the timeline of the statistical tools introduced to students throughout each year. Weeks are given as term.week (for example, 1.2 is week 2 of Term 1).

3.2.1 Pendulum experiments
The first experiment of the year introduces students to histograms through an Invention activity, which students then apply to class-wide measurements of the period of a single pendulum (P1).
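The pendulum sessions introduce the mean, standard deviation, uncertainty in the mean and, in Year 2, the t′-score (Table 3.1). The Python sketch below shows one way these quantities might be computed. It assumes the t′-score is the difference between two values divided by the quadrature sum of their uncertainties; the function names and the period values are hypothetical and are not course data.

```python
import math
import statistics


def summarize(trials):
    """Return the mean, sample standard deviation, and uncertainty in the mean."""
    mean = statistics.mean(trials)
    sd = statistics.stdev(trials)        # sample standard deviation (N - 1 in the denominator)
    sem = sd / math.sqrt(len(trials))    # uncertainty (standard error) in the mean
    return mean, sd, sem


def t_prime(a, da, b, db):
    """Assumed t'-score form: difference of two values in units of their combined uncertainty."""
    return (a - b) / math.sqrt(da ** 2 + db ** 2)


if __name__ == "__main__":
    # Hypothetical period trials (s) at two amplitudes; illustrative only.
    periods_10 = [2.007, 2.011, 2.004, 2.009, 2.012, 2.006]
    periods_20 = [2.016, 2.021, 2.013, 2.019, 2.018, 2.017]

    m10, sd10, e10 = summarize(periods_10)
    m20, sd20, e20 = summarize(periods_20)
    print(f"10 deg: mean {m10:.4f} s, sd {sd10:.4f} s, uncertainty in mean {e10:.4f} s")
    print(f"20 deg: mean {m20:.4f} s, sd {sd20:.4f} s, uncertainty in mean {e20:.4f} s")
    print(f"t' = {t_prime(m10, e10, m20, e20):.1f}")
```

The standard deviation describes the spread of individual trials, while the uncertainty in the mean shrinks as 1/√N; the sketch compares the two means using the latter.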
The P1 session was the same both years.

In Year 1, however, the second week of the lab course introduced students to the mean, standard deviation, and uncertainty in the mean through a follow-up Invention activity. They then applied these skills to their measurements of the period of the pendulum from the previous week and compared the standard deviation and uncertainty in the mean each week. In Year 2, the same Invention activities were used to develop these concepts (mean, standard deviation, and uncertainty in the mean), but an additional Invention activity developed the t′-score, and the rest of the session involved the Pendulum for Pros experiment (part 1).

3.2.2 Pendulum for pros experiments
In Year 1, the Pendulum for Pros experiment (P2) involved measuring the period of the pendulum at three different amplitudes: 5°, 10°, and 25°. The statistical focus was to compare the measurements using overlap of uncertainty ranges, and students were explicitly instructed to make the comparisons between the three angles. The experiment took place in week five of the first term.

In Year 2, the Pendulum for Pros experiment was spread across two lab sessions during weeks two and three of the first term. The first half (P2), described earlier, involved comparing measurements of the period of the pendulum at 10° and 20°. The statistical focus was on calculating the mean, standard deviation, and uncertainty in the mean of a set of trial measurements, and on comparing values based on t′-score calculations. Scaffolding included explicit instructions to compare their measurements using t′-scores, reflect on that comparison, and iterate to improve their measurement based on the reflection.

The following week (P3), students measured the period of the pendulum at five different angles from 5° to 25°. With reasonable precision and accuracy, students could observe the second-order (quadratic) dependence of the period on the amplitude when graphed. In this experiment, scaffolding was removed and students were instructed to make high quality measurements of the period of the pendulum, improving over the previous week's measurements. They were not, however, explicitly instructed to iterate within the session to improve their measurements. This experiment, therefore, provides an example of the quality of student work when the scaffolding is immediately removed. This experiment did not introduce a new statistical tool and, instead, allowed students to practice the tools they had learned thus far, though it was their first graphing experience.

3.2.3 Analog and digital measurements
This experiment taught students about Type B evaluations of uncertainty (Allie et al., 2003; BIPM et al., 2008), which depend exclusively on non-statistical information about the measurement instrument, for example, the precision of the scale or the instrument specifications. In both years, students explored these concepts using Invention activities and then applied them to a series of different measurements that involve this uncertainty. In Year 2, this session did not involve any reflective or iterative behaviours, and so will not be discussed in other parts of the thesis. It is important, however, for the reader to be aware of this conceptual treatment of uncertainty, and we recommend the work of Allie, Buffler, Lubben, and colleagues for a more thorough treatment (see Allie et al., 2003; Buffler et al., 2008, 2003, 2007; Pillay, 2006; Pillay et al., 2008). In future iterations of the lab course, we plan to enhance this session by exploring the limitations of measurement devices and further exposing the underlying measurement models of equipment (Zwickl et al., 2013a).

3.2.4 Light intensity I
This experiment (LI1) involved measuring the exponential decay of the light intensity of a bulb as it is filtered through translucent sheets. Students were taught how to linearize exponential data using semi-log scaling through a worked example.
They took measurements of the light intensity of the bulb as a function of the number of sheets and plotted them on linear-linear axes (intensity versus number of sheets) and semi-log axes (natural logarithm of intensity versus number of sheets). The aim of the experiment was to determine the relationship governing the decay.

In Year 2, students were explicitly instructed to iterate their measurements throughout the process. The experiment was broken down into several steps. First, students were to take a few measurements across a wide range of values and make an initial plot on linear-linear axes. They were then asked to reflect on the quality of that plot and identify ways to improve their measurement. After carrying out those improvements, students were told to linearize the data using semi-log axes, reflect and improve again, and then conduct a final reflection. There were no common systematic effects, model adjustments, or model limitations in this experiment, though it did aim to target efficient data collection methods.

3.2.5 Light intensity II
In this experiment (LI2), students measured the power-law decay of light intensity as a function of distance from a sensor. Students were taught how to linearize power-law data using log-log scaling through a worked example. In Year 1, this experiment was carried out over two sessions, with the first session also involving another measurement. In the first week, they made two measurements of the light intensity of the bulb, at 20 cm and 40 cm from the sensor. They were then asked to predict the light intensity at 80 cm, then measure it and compare it to their prediction. This gave students the opportunity to explore the non-linear behaviour of the light intensity. They then collected data at a wide range of distances, and the second week focussed on the analysis of this data.
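The Year 1 prediction task can be sketched as follows, assuming a power law I = A·r^p: the exponent is the slope between the two points on log-log axes, and extrapolating along the power law gives a very different answer from a straight line drawn through the two points on linear axes. Because Year 2 students were directed to compare with a fixed 1/R² model and to inspect residuals, the sketch also includes a one-parameter inverse-square fit and its residuals. All readings and function names are hypothetical.

```python
import math


def power_law_exponent(r1, i1, r2, i2):
    """Exponent p of I = A * r**p through two points (the slope on log-log axes)."""
    return math.log(i2 / i1) / math.log(r2 / r1)


def extrapolate(r_known, i_known, r_new, p):
    """Extrapolate an intensity along the power law I = A * r**p."""
    return i_known * (r_new / r_known) ** p


def inverse_square_residuals(r, intensity):
    """Fit I = A / r**2 (amplitude A only) by least squares; return A and the residuals."""
    x = [1.0 / ri ** 2 for ri in r]
    a = sum(xi * yi for xi, yi in zip(x, intensity)) / sum(xi * xi for xi in x)
    return a, [yi - a * xi for xi, yi in zip(x, intensity)]


# Hypothetical readings (arbitrary units) at 20 cm and 40 cm.
i20, i40 = 400.0, 100.0
p = power_law_exponent(20, i20, 40, i40)
power_guess = extrapolate(40, i40, 80, p)
linear_guess = i40 + (i40 - i20) / (40 - 20) * (80 - 40)  # straight line through both points
print(f"p = {p:.2f}, power-law I(80 cm) = {power_guess:.1f}, linear I(80 cm) = {linear_guess:.1f}")

# Hypothetical far-field data: residuals scattered about zero support the 1/R^2 model,
# whereas a systematic trend at small distances would flag near-field behaviour.
distances = [20, 30, 40, 60, 80]
readings = [401.0, 177.0, 99.5, 44.6, 25.1]
amplitude, residuals = inverse_square_residuals(distances, readings)
print(f"A = {amplitude:.0f}", [round(res, 2) for res in residuals])
```

The negative value from the linear extrapolation is a compact way to see why the non-linear behaviour matters for the prediction.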
They were also asked to identify any systematic deviations in their data, which aimed to elicit potential issues with near-field effects, where the theoretical 1/R² behaviour breaks down. In Year 1, this experiment came before the Light Intensity I experiment.

In Year 2, students were, again, explicitly instructed to iterate their measurements throughout the process. As in the Light Intensity I experiment, there were specific instructions to reflect on the quality of the data and to iterate and improve their measurements after plotting each of three graphs (linear-linear axes, log-log axes, and finally power-law scaled axes). Students were also told that the process is generally governed by a whole-integer power law, encouraging them to compare their data to a 1/R² relationship rather than fitting to the best-fit power law. Students were also asked to analyze the near-field data separately, taking measurements at close distances only after the rest of the analysis had been conducted. They were instructed to compare the behaviour of the near-field data with the far-field data. In Year 2, students were also introduced to plots of residuals: the difference between the measured and model values of the dependent variable, plotted as a function of the independent variable.

3.2.6 Radiation shielding
In both years, the Radiation Shielding experiment (RS) immediately followed the two Light Intensity experiments. This experiment involved measurements of the exponential decay of the radiation count rate from a Sr-90 source as a function of the thickness of aluminum shielding. Once the aluminum shielding becomes sufficiently thick to shield the Sr-90 radiation, the data is dominated by unshielded background radiation. As such, the linearized semi-log plot of count rate versus thickness appears to show two distinct model behaviours: one with a negative slope representing the shielding and one of a constant background (see a sample data set in Figure 3.1).
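Because radioactive counts follow Poisson statistics, a count N carries an uncertainty of √N, which propagates to the count rate and to its natural logarithm on the semi-log plot. The Python sketch below assumes fixed counting intervals and shows background subtraction only as one possible way of handling the background (in the course, students decided this for themselves). It also reuses an assumed t′ form, difference over quadrature-summed uncertainty, for the background comparison with a peer. All counts and function names are hypothetical.

```python
import math


def count_rate(counts, seconds):
    """Count rate and its Poisson uncertainty, using sigma_N = sqrt(N)."""
    return counts / seconds, math.sqrt(counts) / seconds


def subtract(a, da, b, db):
    """Difference of two independent values, with uncertainties added in quadrature."""
    return a - b, math.sqrt(da ** 2 + db ** 2)


def log_with_uncertainty(value, sigma):
    """Natural log of a positive value; sigma_ln = sigma / value."""
    return math.log(value), sigma / value


def t_prime(a, da, b, db):
    """Assumed t'-score form: difference divided by the combined uncertainty."""
    return (a - b) / math.sqrt(da ** 2 + db ** 2)


# Hypothetical 600 s background counts from two neighbouring groups.
bkg, d_bkg = count_rate(240, 600)
peer, d_peer = count_rate(262, 600)
print(f"background {bkg:.3f} +/- {d_bkg:.3f} /s, t' vs peer = {t_prime(bkg, d_bkg, peer, d_peer):.2f}")

# Hypothetical shielded measurement: 1500 counts in 60 s.
gross, d_gross = count_rate(1500, 60)
net, d_net = subtract(gross, d_gross, bkg, d_bkg)
ln_net, d_ln_net = log_with_uncertainty(net, d_net)
print(f"ln(net rate) = {ln_net:.3f} +/- {d_ln_net:.3f}")
```

For thick shielding the net rate approaches zero, so its logarithm becomes very uncertain or undefined; this is one reason the background regime calls for a deliberate decision.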
The statistical focus was on introducing students to the Poisson distribution, which governs the randomness of the decay process and the uncertainty in the process. In both years, students' attention was drawn to the presence of the background radiation during the experiment.

In Year 2, students were asked to first make a high-precision measurement of the background radiation and compare it, using a t′-score, to a measurement obtained by a nearby peer. They then commenced their measurements of count rate as a function of shielding, with no explicit scaffolding to iterate or reflect. In both years, students were not given a theoretical model to test and thus had to deduce the relationship through the linearization tactics developed in the previous weeks. They also had to decide on their own how to deal with the background radiation. Following two weeks of high scaffolding in the previous experiments in Year 2, this experiment provides an example of the quality of student work when scaffolding is removed. In the following week, students were introduced to the unweighted chi-squared equation, which they applied to their RS data from the previous week.

[Figure 3.1: ln(Count Rate) versus Al Thickness (mm), with the decay (Decay) and background (Bkg) regions labelled.]
Figure 3.1: Sample data set by a student in the RS experiment, demonstrating the decay model of the Sr-90 source as a function of aluminum shielding. The change in model between the exponential decay and the constant background is distinct.

3.2.7 Mass on a spring experiments
The Mass on a Spring experiments (MS1 and MS2) took place during the first two weeks of the second term of the lab course in Year 2 only. Students in Year 1 conducted this experiment as their final lab of Term 1, but the conditions differed too significantly from Year 2 for comparisons to be made. The first experiment (Mass on a Spring I) involved measuring the extension of a spring as a function of a downwards force.
Students were given the governing theoretical relationship, Hooke's Law, which states that the force exerted is directly proportional to the extension, where the constant of proportionality is the spring constant, k.

In the following week (Mass on a Spring II), the students were introduced to the equation for the uncertainty in the slope of a best-fit line through an Invention activity, which they used to obtain the uncertainty in their spring constant from the previous week. They then measured the spring constant through measurements of the frequency of oscillation of the spring with a mass attached. With sufficient precision, students would find that their two k-values disagree if they did not account for the mass of the spring in their oscillation measurement. In this experiment, students received a faded level of scaffolding that did not instruct them to iterate and improve, but did instruct them to make comparisons between their two values of k.

3.2.8 Standing waves
The Standing Waves experiment (SW) occurred in week three of the second term in both years. It followed other experiments using the Standing Waves equipment in Year 1 and the Mass on a Spring experiments in Year 2. This experiment involved measurements of the resonant frequency of a wire as a function of tension. The wire oscillated due to interactions between a current in the wire and a magnetic field around the wire. There were no new statistical tools during this experiment, nor any common systematic effects or model corrections. As such, this experiment was another opportunity to reinforce students' statistical tools and analysis behaviours.
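The uncertainty in the slope of an unweighted best-fit line, introduced in Mass on a Spring II, can be sketched with the standard least-squares expressions, estimating the scatter of the points from the residuals with N − 2 degrees of freedom; the course's Invention activity may have arrived at a different but equivalent form. The sketch fits extension against force, so the slope is 1/k. For the oscillation measurement it uses the common uniform-spring approximation that adds one third of the spring's mass to the hanging mass. The data, masses, and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Unweighted least-squares y = m*x + b; returns m, b, and the uncertainty in m."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    delta = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / delta
    b = (sxx * sy - sx * sxy) / delta
    # Scatter of the points about the line, with N - 2 degrees of freedom.
    s2 = sum((v - (m * u + b)) ** 2 for u, v in zip(x, y)) / (n - 2)
    dm = math.sqrt(n * s2 / delta)
    return m, b, dm


def k_from_period(mass, period, spring_mass=0.0):
    """Spring constant from T = 2*pi*sqrt(m_eff / k), with m_eff = m + m_spring / 3."""
    m_eff = mass + spring_mass / 3.0
    return 4 * math.pi ** 2 * m_eff / period ** 2


# Hypothetical static data: extension (m) versus applied force (N).
force = [0.49, 0.98, 1.47, 1.96, 2.45]
extension = [0.021, 0.040, 0.062, 0.081, 0.100]
slope, intercept, d_slope = fit_line(force, extension)
k_static = 1.0 / slope
dk_static = d_slope / slope ** 2  # propagate through k = 1 / slope
print(f"static k = {k_static:.1f} +/- {dk_static:.1f} N/m")

# Hypothetical oscillation: 0.200 kg hanging mass, 0.020 kg spring, period 0.56 s.
print(f"dynamic k, spring mass ignored  = {k_from_period(0.200, 0.56):.1f} N/m")
print(f"dynamic k, spring mass included = {k_from_period(0.200, 0.56, 0.020):.1f} N/m")
```

Comparing the two dynamic estimates against the static value, within their uncertainties, mirrors the comparison students were asked to make.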
Students were, once again, not instructed to iterate and improve, but in Year 2 they were instructed to compare the linear mass density of the wire (as obtained through their modelling and analysis) with a value calculated from given measurements of the length and mass of the wire.

3.2.9 Index of refraction
The Index of Refraction experiment (IoR) asked students to determine the index of refraction (n) of a plexiglass prism through the application of Snell's Law (SL), the critical angle for Total Internal Reflection (TIR), and Brewster's Angle (BA). The experiment, instructions, and support were identical both years.

Snell's law
For the first measurement, students were asked to use an incident angle of θ_incident = 60°, measure the angle of the refracted beam, θ_refracted, and then determine n and an estimate of its uncertainty from Snell's Law:

$$ n = \frac{\sin\theta_{\text{incident}}}{\sin\theta_{\text{refracted}}} \tag{3.1} $$

The approximate orientation of the apparatus and relative positions of the beams were as in Figure 3.2. The incoming beam entered the flat side of the prism along the 0°-line of the protractor. For an incident angle of 60°, the position of the normal line is rotated to align with the 60°-line of the protractor. As such, the refracted beam exited the plexiglass along the 24.3° ± 0.2° mark of the protractor, setting the refracted angle at 35.7° ± 0.2° (relative to the normal line). Using Equation 3.1, n would be 1.48 ± 0.01. The fixed position of the incident beam (rather than the normal line) along the 0°-line of the protractor led many students to incorrectly measure the refracted angle as 24°, giving n around 2.13.

Figure 3.2: The diagram represents a schematic of the plexiglass and protractor orientation for the SL measurement. The protractor has two 0° to 180° scales. The protractor and the beam are fixed such that the plexiglass itself is rotated to obtain the desired incidence angle.

Total internal reflection
For the second measurement, students used the properties of TIR to determine n. They were asked to measure the critical angle of incidence, θ_critical, beyond which the incident beam is only reflected, with no refracted beam, and then determine n and an estimate of its uncertainty from:

$$ n = \frac{1}{\sin\theta_{\text{critical}}} \tag{3.2} $$

In the SL measurement, the refracted beam is approximately 1 mm wide. As the incident angle approaches the critical angle, however, the refracted beam spreads to nearly 10 cm in width. With appropriate identification of θ_critical as the point at which the centre of the beam is about to disappear, approximately 42.5° ± 0.5° relative to the normal, Equation 3.2 gives n of 1.48 ± 0.01. Instead, many students took θ_critical as the point where the refracted beam had entirely disappeared, around 45°, giving n of 1.41.

Brewster's angle
For the third measurement, students used a polarizer to determine n and its associated uncertainty using Brewster's angle, θ_Brewster, the angle of incidence at which the reflected beam is completely polarized:

$$ n = \tan\theta_{\text{Brewster}} \tag{3.3} $$

This measurement did not include any common systematic effects and, in general, students were able to accurately measure θ_Brewster at 56° ± 1°, with Equation 3.3 giving n of 1.48 ± 0.02.

The values and their uncertainties are summarized in Table 3.2. For all three measurements, the uncertainties are determined through analysis of the precision of the measuring device, namely a protractor to measure the angles involved in the measurements, and propagating those uncertainties to obtain the uncertainty of n in each case.

Measurement | Accurate values | Inaccurate values
Snell's Law | 1.48 ± 0.02 | ∼2.13
Total Internal Reflection | 1.48 ± 0.01 | 1.41 ± 0.01
Brewster's Angle | 1.48 ± 0.02 | na

Table 3.2: The table shows the accurate values of n for each of the three measurements in the IoR experiment and the approximate values of the systematic effects for SL and TIR, as measured by the instructor and TAs in the lab course. Uncertainties are defined by the precision of the measuring instrument.
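The three determinations of n from Equations 3.1 to 3.3, and the weighted average used in this experiment, can be reproduced with a short Python sketch. It assumes angles in degrees, propagates only the uncertainty in the measured angle (treating the 60° incident angle as exact), and uses inverse-variance weights for the weighted average; the function names are hypothetical. With the angles quoted in the text it gives n ≈ 1.48 for the correctly identified angles, about 2.13 for the misread 24° refracted angle, and about 1.41 for a critical angle of 45°.

```python
import math


def n_snell(theta_incident, theta_refracted):
    """Equation 3.1: n = sin(theta_incident) / sin(theta_refracted)."""
    return math.sin(math.radians(theta_incident)) / math.sin(math.radians(theta_refracted))


def dn_snell(theta_incident, theta_refracted, d_refracted):
    """Uncertainty from the refracted angle only: |dn| = n * cot(theta_r) * d_theta_r."""
    n = n_snell(theta_incident, theta_refracted)
    return n * math.radians(d_refracted) / math.tan(math.radians(theta_refracted))


def n_tir(theta_critical):
    """Equation 3.2: n = 1 / sin(theta_critical)."""
    return 1.0 / math.sin(math.radians(theta_critical))


def dn_tir(theta_critical, d_critical):
    """|dn| = n * cot(theta_c) * d_theta_c."""
    return n_tir(theta_critical) * math.radians(d_critical) / math.tan(math.radians(theta_critical))


def n_brewster(theta_brewster):
    """Equation 3.3: n = tan(theta_Brewster)."""
    return math.tan(math.radians(theta_brewster))


def weighted_average(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


print(f"SL:  n = {n_snell(60, 35.7):.3f} +/- {dn_snell(60, 35.7, 0.2):.3f}")
print(f"SL, misread 24 deg:  n = {n_snell(60, 24):.2f}")
print(f"TIR: n = {n_tir(42.5):.3f} +/- {dn_tir(42.5, 0.5):.3f}")
print(f"TIR, 45 deg:  n = {n_tir(45):.2f}")
print(f"BA:  n = {n_brewster(56):.3f}")

# Weighted average of the accurate values listed in Table 3.2.
mean, sigma = weighted_average([1.48, 1.48, 1.48], [0.02, 0.01, 0.02])
print(f"weighted average n = {mean:.3f} +/- {sigma:.3f}")
```

Inverse-variance weighting lets the most precise measurement (TIR, in this table) dominate the combined value.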
The apparatus had sufficient precision that the systematic effects were identifiable and could be resolved by an expert. In both years, there was a statistical focus on calculating the weighted average of a set of measurements with uncertainty.

3.2.10 Diffraction
In Year 2, students used the laser from the IoR experiment to measure the spacing of a diffraction grating. They were instructed to determine the grating spacing using first- and second-order measurements separately and then compare. There was no scaffolding of iteration or reflection, however, and there was no new statistical tool. This experiment was not used in Year 1.

3.2.11 RC circuits experiments
Two RC Circuits experiments (RC1 and RC2) occurred during weeks 6 and 7 of the second term, the weeks preceding the LR Circuits experiment, in both years. The first experiment involved reproducing the voltage decay curve across the resistor in a parallel RC (resistor-capacitor) circuit, primarily to become familiar with the oscilloscope and equipment, but also to observe the exponential nature of the decay. The RC2 experiment involved measuring the time constant of this voltage decay as a function of resistance.

In Year 1, students were introduced to weighted least-squares fitting using the weighted χ² equation. In Year 2, students had already learned this tool in the previous week, during RC1. In Year 2, there were no new statistical tools in the RC2 experiment and no common systematic effects or model changes; students were not instructed to iterate or improve their measurements, but they were instructed, in the marking scheme only, to compare the value of the capacitance from their analysis with the value set on the decade box.
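Weighted least-squares fitting with the weighted χ² can be sketched for the RC2 analysis, assuming the relation τ = RC, so that the slope of τ against R estimates the capacitance. The Python sketch weights each point by 1/σ² and reports χ² divided by the degrees of freedom (N − 2), which is one common normalization and may differ from the course's. It then compares the fitted capacitance with a decade-box setting using an assumed t′ form (difference over quadrature-summed uncertainty). The data, the 100 nF setting, its tolerance, and the function name are hypothetical.

```python
import math


def weighted_fit(x, y, sigma):
    """Weighted least-squares y = m*x + b with weights 1/sigma**2.

    Returns m, b, their uncertainties, and chi-squared divided by (N - 2).
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * swxx - swx ** 2
    m = (sw * swxy - swx * swy) / delta
    b = (swxx * swy - swx * swxy) / delta
    dm = math.sqrt(sw / delta)
    db = math.sqrt(swxx / delta)
    chi2 = sum(((yi - (m * xi + b)) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return m, b, dm, db, chi2 / (len(x) - 2)


# Hypothetical RC2 data: resistance (ohm), time constant (s), uncertainty in tau (s).
resistance = [1000, 2000, 4000, 6000, 8000, 10000]
tau = [1.05e-4, 2.02e-4, 3.98e-4, 6.07e-4, 7.96e-4, 1.003e-3]
d_tau = [0.05e-4, 0.05e-4, 0.1e-4, 0.15e-4, 0.2e-4, 0.25e-4]

c_fit, offset, dc, d_offset, reduced_chi2 = weighted_fit(resistance, tau, d_tau)
c_box, dc_box = 100e-9, 1e-9  # hypothetical decade-box setting and tolerance (F)
t_prime = (c_fit - c_box) / math.sqrt(dc ** 2 + dc_box ** 2)  # assumed t' form
print(f"C = {c_fit * 1e9:.1f} +/- {dc * 1e9:.1f} nF, reduced chi2 = {reduced_chi2:.2f}, t' = {t_prime:.2f}")
```

Points with small uncertainties pull the line most strongly, which is the key difference from the unweighted fit used earlier in the year.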
3.2.12 LR circuits
Near the end of term, the students each year performed an experiment using an Inductor-Resistor (LR) series circuit, shown in Figure 3.3, where an inductor (L) and a resistor (R) are connected in series with a square-wave, alternating current (AC) function generator as the voltage source (V_o). The voltage across the resistor, V_R, was measured using an oscilloscope. In a macro view, V_R shows a square wave pattern similar to that of the source, V_o, but a closer look at the decay edges shows a non-instantaneous drop-off. That is, there is a non-zero time constant for the voltage across the resistor, which students were to measure. Since students had completed the RC circuits experiments in the weeks prior, they had some familiarity with the equipment and measurement process. The instructions and support during this experiment were identical each year.

[Figure 3.3: circuit diagram with V_o, L, R, and V_R labelled.]
Figure 3.3: Circuit diagram for the LR Circuit experiment, where an inductor (L) and a resistor (R) are in series with the AC function generator.

In this lab, they were given the theoretical relationship between the time constant of the decay across the resistor and the components, namely:

$$ \tau = \frac{L}{R} \tag{3.4} $$

They were asked to make at least 10 measurements of the time constant for different resistance values between 10 and 2000 Ω and to produce a plot of 1/τ as a function of R to check the relationship. The goal provided to the students was "...to check this theory and determine the value of the inductance L." The range of resistance values provided to them was deliberately chosen such that, for large values of the resistance, the relationship appears to be a straight line with an intercept of zero. A systematic deviation from the model emerges at low resistances, however, if one ignores the resistance of the additional components in the circuit. The linearized axes selected made this deviation most pronounced, as it emerges as a non-zero intercept in the plot.
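The contrast between the one-parameter model of Equation 3.4 and a two-parameter fit can be sketched in Python. The sketch assumes that the extra circuit resistance r simply adds to R, so that 1/τ = (R + r)/L; under that assumption the slope of 1/τ against R is 1/L and the intercept is r/L. The synthetic data are generated with hypothetical values L = 5 mH and r = 60 Ω and a uniform uncertainty; they are not student data, and χ² here is the unnormalized weighted sum.

```python
def fit_through_origin(x, y, sigma):
    """Weighted fit of y = m*x (one parameter); returns m and chi-squared."""
    w = [1.0 / s ** 2 for s in sigma]
    m = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(wi * xi * xi for wi, xi in zip(w, x))
    chi2 = sum(((yi - m * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return m, chi2


def fit_with_intercept(x, y, sigma):
    """Weighted fit of y = m*x + b (two parameters); returns m, b, and chi-squared."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * swxx - swx ** 2
    m = (sw * swxy - swx * swy) / delta
    b = (swxx * swy - swx * swxy) / delta
    chi2 = sum(((yi - (m * xi + b)) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return m, b, chi2


# Synthetic data: 1/tau in 1/us for L = 5 mH and r = 60 ohm (hypothetical values).
L_TRUE, R_EXTRA = 5e-3, 60.0
resistance = [10, 50, 100, 250, 500, 1000, 1500, 2000]
inv_tau = [(r + R_EXTRA) / L_TRUE * 1e-6 for r in resistance]
sigma = [0.005] * len(resistance)

m1, chi2_one = fit_through_origin(resistance, inv_tau, sigma)
m2, b2, chi2_two = fit_with_intercept(resistance, inv_tau, sigma)
L_one = 1e-6 / m1  # slope equals 1e-6 / L when 1/tau is expressed in 1/us
L_two = 1e-6 / m2
r_est = b2 / m2    # intercept / slope recovers the extra resistance r
print(f"y = mx:     L = {L_one * 1e3:.2f} mH, chi2 = {chi2_one:.1f}")
print(f"y = mx + b: L = {L_two * 1e3:.2f} mH, r = {r_est:.0f} ohm, chi2 = {chi2_two:.2g}")
```

Forcing the line through the origin biases the inductance and inflates χ², which is the kind of evidence students could use to question the given model.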
In contrast, the theoretical model provided to them would be a straight line with an intercept of zero on the provided axes. An example of the data collected by a student in the class is shown in Figure 3.4. The single-parameter fit to the data (with no intercept) is given by the solid (red) line, and the two-parameter fit is given by the dashed (green) line. In Year 1, students learned the equation for calculating the uncertainty in the slope of a best-fit line at the beginning of this experiment, whereas students in Year 2 had already learned it during the MS2 experiment.

[Figure 3.4: 1/τ (1/µs) versus R (Ω), with fit lines labelled mx and mx+b.]
Figure 3.4: Example of the data collected by a student in the LR experiment, with different possible fit lines. The solid red line shows a one-parameter y = mx fit, while the dashed green line represents a two-parameter y = mx + b fit. Although the theoretical model recommends the one-parameter fit, the data suggests that a two-parameter model is better, due to additional resistance in the other circuit components.

3.3 Other logistical changes
A few additional changes were made to the lab course in Year 2. All labs were thoroughly tested and debugged by the lab course instructor and the TAs. Three students also volunteered to be part of a pilot group who met a week in advance of the rest of the class to pilot each experiment. This helped ensure that the experimental setup, design, and instructions permitted students to achieve a sufficiently high quality set of data in the time allotted.

As in previous years, students conducted each lab in pairs or groups of three, with groups changing each week. In Year 1, all students' books were marked each week. In Year 2, only one student's notebook per group was marked each week (chosen randomly by the TA).
This primarily encouraged students to work collaboratively during the experiment, with the additional benefit of reducing the number of books the TAs needed to mark, thus providing more time per book (Lippmann, 2003). Students also submitted their spreadsheet calculations and graphs electronically rather than printing them.

TA training was also changed in Year 2 to better highlight the pedagogical purpose of the lab, beyond the conceptual goals. TAs were provided with detailed notes about the content of the lab in advance of weekly preparation meetings. In each meeting, TAs conducted the experiment in pairs and added to the notes issues that they thought students might have and ideas for what they should be doing to support the students at each step. These were then discussed as a group, and the head TA (me) sent the TAs the summary before the lab week began.

3.4 Data sources
Most of the analysis throughout this thesis was carried out on the contents of student lab notebooks. Students' written work was coded through a number of different measures, which will be elaborated in each of the subsequent chapters. In all cases, I coded all the data and another rater also coded 10% of the data to assess inter-rater reliability. The development of the coding schemes as well as the results of reliability tests will also be specified in each results chapter. The analysis of students' written work was also supplemented by informal observations of students in class and by interviews with me outside of class. Several student volunteers participated in 1-hour interviews each year. The interviews did not use a formal interview protocol and so are included only as anecdotal support of the quantitative results. The interviews did, however, always begin with an open-ended question that probed students' perceptions of the goals of the lab course. Responses to this question, therefore, can be used for qualitative comparisons of students' overall opinions of the lab course each year.

Chapter 4
Reflection
The first targeted critical thinking behaviour was for students to use a number of statistical tools in their analysis and then to reflect on their application to further inform their methods, results, and conclusions. Rarely, however, do students spontaneously reflect on their analysis (Ryder & Leach, 2000; Séré et al., 1993). Whether a student will choose to reflect on their calculations depends strongly on their motivation and on their epistemological beliefs, that is, their beliefs about how knowledge is created. In a physics course, students draw on these epistemological beliefs as they choose resources to apply to a learning situation (Hammer & Elby, 2003). This choice is often guided by instructional practices (Hammer & Elby, 2003) and may even compete with choices related to earning a good grade (Elby, 2001).

In mathematical problem solving, many students adopt rote reasoning (Redish, 2014) or 'recursive plug-and-chug' (Tuminaro & Redish, 2007) epistemological stances, with little to no sensemaking or metacognitive activity about why they are doing what they are doing (Schoenfeld, 1987). Employing this metacognition is crucial for students to monitor their own progress and to support higher-level behaviours such as iterating and evaluating. While previous versions of the lab course had a significant focus on developing data handling tools and calculations, in the SQILabs, we have provided students with a conceptual and procedural framework with which to reflect on their data analysis procedures. We hypothesize that this framework will support students in combining their conceptual knowledge (about measurement and physics) with their symbolic knowledge (about the statistical calculations) (Kuo, Hull, Gupta & Elby, 2013), shifting them out of rote reasoning frames towards more reflective and metacognitive frames. We measure this via students' use of reflective comments in their lab notebooks in a variety of experiments.

4.1 Methods
Here we evaluate students' reflective comments associated with their use of a number of statistical tools for data analysis. The set of tools used throughout the lab course was:
• least-squares fitting and χ² values (weighted and unweighted);
• residual graphs;
• t′-scores;
• comparing measurements through overlapping uncertainty ranges;
• comparing the relative size of measurements not including uncertainties (difference in numbers);
• consulting with peers;
• plotted data (analyzed visually);
• theoretical models (evaluating them in light of the data);
• uncertainty in the fitting parameters;
• graphical scaling techniques (ln-ln or semi-ln plots, in particular);
• weighted averages;
• measurement uncertainties.
These tools are all useful for interpreting and evaluating the results of experiments. Tools were introduced at different times each year, and some tools were not appropriate for use in a particular experiment. Table 4.1 shows the distribution of the available and appropriate tools for each experiment each year. Residuals and t′-scores were not introduced to students in Year 1, and overlapping uncertainty ranges were replaced with t′-scores in Year 2.

For eight different experiments in the lab course, a random subset of 20-30 student notebooks from each year was analyzed. In three of the experiments (P2, IoR, and LR), the full class each year was analyzed, since these experiments were analyzed in more detail for additional assessment measures.
Three additional experiments in Year 2 that were not used in Year 1 were also analyzed.

In each case, the specific statistical tools used by the students were recorded from their notes, and any reflective comments associated with those tools were also coded. That is, for each statistical tool, students received a code of 0 for not using the tool, 1 for using the tool, or 2 for using and reflecting on the use of the tool.

Year 1
Tool P2 P3 LI I LI II RS MS1 MS2 SW IoR RCII LR
χ² X X X
Residuals
Model X X X X X X X
Plot X X X X X X
ln-ln X X X X X
semi-ln X X X X X
dm/db X
Unc X X X X X X X X
t′-score
Overlap X X X X X X X X
Diff X X X X X X X X
Peers X X X X X X X X
WA X
Num. Tools 5 na 7 7 8 na na 9 5 9 10

Year 2
Tool P2 P3 LI I LI II RS MS1 MS2 SW IoR RCII LR
χ² X X X X X
Residuals X X X X X X X
Model X X X X X X X X X X
Plot X X X X X X X X X
ln-ln X X X X X
semi-ln X X X X X
dm/db X X X X
Unc X X X X X X X X X X X
t′-score X X X X X X X X X X
Overlap
Diff X X X X X X X X X X X
Peers X X X X X X X X X X X
WA X
Num. Tools 5 6 7 8 9 7 9 11 5 11 11

Table 4.1: List of the number of analytic tools available for use and appropriate to use in each experiment each year.

Students’ reflective comments were also analyzed at a finer-grained scale for three experiments (P2, IoR, LR), this time including all students in the lab course. The reflective comments were coded using four classes based on Bloom’s Taxonomy (Anderson & Sosniak, 1994). Figures 4.1 and 4.2 provide samples of this coding applied to student work. The four comment levels were:

1. Application - a written reflection statement that demonstrates the use of the tool, for example, “The χ² value is 2.1.” These comments were distinct from procedural statements, such as, “Then we calculated the chi2 value.” Keywords: apply, demonstrate, compute, use.
2. Analysis - a written reflection statement that analyzes the use of the tool, for example, “Our χ² value is 0.84, which is close to one, indicating that our model fits the data well.” Keywords: interpret, compare, categorize, infer.
3. Synthesis - a written reflection statement that synthesizes multiple ideas, tool analyses, or reflections to propose a new idea. This could include suggestions for ways to improve the measurement (e.g. “we will take more data in this range, since the data is sparse”) or model (e.g. “there should be an intercept in the equation”), as well as making comparisons (e.g. “The χ² value for the y=mx fit was 43.8, but for the y=mx+b fit χ² was 4.17, which is much smaller.”). Keywords: create, propose, design, invent.
4. Evaluation - a written reflection statement that evaluates, criticizes, or judges the previous ideas presented. Evaluation can look similar to analysis, but the distinction is that evaluation must follow a synthesis comment. For example, after a synthesis that compared two different models and demonstrated that adding an intercept lowered the χ² value, an evaluation could follow as, “...the intercept was necessary due, most likely, to the inherent resistance within the circuit (such as in the wires).” Keywords: justify, defend, judge, evaluate.

Level 1 comments were not considered reflective, so in the coarser coding scheme described above they count only as tool use (a code of 1). Level 2, 3, and 4 comments, however, correspond to a code of 2 in that scheme (used and reflected on).

Figures 4.1 and 4.2 demonstrate how the coding scheme is applied to three excerpts from students’ books in the LR experiment. Each level builds on the ones below it, so a student making a level-4 evaluation statement would also have made lower-level statements, though Level 1 comments (application) need not be present. The levels also did not need to build linearly in the text.
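As a bookkeeping sketch of how the two coding schemes relate, the Python below assumes each coded comment is stored as a hypothetical (tool, level) pair; the record format and the function names `per_tool_max`, `max_reflection_level`, and `coarse_code` are illustrative, not the tooling actually used for this analysis. It shows the per-tool maximum, the overall maximum level, and the mapping onto the coarse 0/1/2 code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record format: (tool name, comment level), where
# 1 = application, 2 = analysis, 3 = synthesis, 4 = evaluation.
Comment = Tuple[str, int]


def per_tool_max(comments: Iterable[Comment]) -> Dict[str, int]:
    """Highest comment level reached on each tool."""
    best: Dict[str, int] = {}
    for tool, level in comments:
        best[tool] = max(best.get(tool, 0), level)
    return best


def max_reflection_level(comments: Iterable[Comment]) -> int:
    """Highest level reached on any tool; 0 if no comments were made."""
    return max((level for _tool, level in comments), default=0)


def coarse_code(used: bool, levels: List[int]) -> int:
    """Coarse scheme: 0 = not used; 1 = used (level-1 comments count only as use);
    2 = used and reflected on (any comment at level 2 or higher)."""
    if not used:
        return 0
    return 2 if any(level >= 2 for level in levels) else 1


# Loosely modelled on the Figure 4.2 description: chi-squared ends at level 3,
# uncertainties at level 4.
example = [("chi2", 1), ("chi2", 3), ("unc", 4)]
print(per_tool_max(example))          # {'chi2': 3, 'unc': 4}
print(max_reflection_level(example))  # 4
```

Taking the maximum rather than the mean mirrors the argument that reaching evaluation on one or two tools is what matters.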
Again, reflective comments were associated with specific tool use, so the student in Figure 4.1 would have received a level 4 code on the χ² tool for this excerpt. The student in Figure 4.2 would, in the end, have received a level 3 code on χ² and a level 4 code on uncertainties.

While it is important that students reflect on several of the tools used, they are likely to reach the highest level of reflection on only one or two tools. That is, if they reach evaluation through reflection on χ² use, it is unnecessary to also reach evaluation through reflection on the plot, residuals, and uncertainty in the fitting parameters, especially since the synthesis phase often draws on a number of tools. As such, it is appropriate to look at the maximum reflection level a student reached, rather than the average or a tool-by-tool level. Ultimately, what matters is whether students reach evaluation at all.

Figure 4.1: A student’s reflections from the LR experiment provide a clear sample of the coding scheme. The student makes a level 1 comment about applying χ² to their data, then analyzes that this value is high (level 2). A level 3 statement describes considering a different model, and the student finally evaluates the new model by describing the much lower χ² value.

For all of the analysis in this chapter, I coded all items and another rater coded approximately 10% of the items. Inter-rater reliability analysis using Cohen’s κ statistic was performed to evaluate consistency between raters; values greater than 0.6 were considered substantial agreement. Several rounds of coding and comparison were carried out before settling on the final coding scheme (described above). Through the initial rounds, discussion of discrepancies between raters allowed the coding scheme to be refined and further specified.
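Cohen’s κ compares the observed agreement between two raters, p_o, with the agreement expected by chance from each rater’s marginal code frequencies, p_e, as κ = (p_o − p_e)/(1 − p_e). A minimal Python sketch follows; the function name is hypothetical and the codes in the usage lines are made up for illustration, not data from this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who assigned categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    # Undefined (division by zero) if both raters used one identical code throughout.
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up 0/1/2 tool codes for eight items, for illustration only.
rater_1 = [0, 1, 2, 2, 1, 0, 2, 1]
rater_2 = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.619
```

Here the raters agree on 6 of 8 items (p_o = 0.75), but chance agreement is already 0.344, which is why κ is noticeably lower than raw percentage agreement.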
The inter-rater reliability on the tools used (whether or not each tool was used) was κ = 0.926, p < .001, and on the reflective tool-based comments (whether each tool was used or commented on reflectively) was κ = 0.862, p < .001. For the quality of reflective comments (the maximum reflection level a student reached during the experiment), the inter-rater reliability was κ = 0.659, p < .001.

4.2 Results

To support the discussion of the reflective comments, we first examine the reflective behaviours of students in the LR experiment and then look more broadly across the year.

4.2.1 LR circuits

The LR Circuits experiment was described in Section 3.2.12. For the current discussion, it is important to highlight that this experiment occurred at the end of the term in both years, and that the instructions and TA support were identical in both years. This experiment also involved an issue with the theoretical model due to additional resistance in the circuit, which was revealed as a non-zero intercept in the data, whereas the model predicted a zero intercept.

Figure 4.2: A student’s reflections on a variety of tools in the LR experiment, first with level 1 comments about χ² and the inductance, then analyzing the fit line compared to the model (level 2). They then comment on χ² being small, attributing it to large uncertainties (level 3). They justify their uncertainty by appealing to limitations of the measurement equipment (level 4). Finally, they provide further suggestions for improvement (additional level 3).

Tools used and reflected on

The number of tools students used and reflected on in the LR experiment was averaged across each year. Since students in Year 2 had an additional tool at their disposal (residuals), the averages were also normalized by the number of tools available to them (10 in Year 1 and 11 in Year 2).
Two independent-samples t-tests were used to compare, between years, the fraction of tools used and the fraction of tools used and reflected on. There were significant effects for the fraction of tools used, t(241) = 10.11, p < .001, and the fraction of tools reflected on, t(257) = 13.19, p < .001, with students in Year 2 consistently using and reflecting on more tools. Figure 4.3 shows the mean numbers and fractions of tools used without reflection and tools used with reflection each year. The height of the bars represents the total number (or fraction) of tools used, while the shading shows the amount of reflection beyond application of tools.

Figure 4.3: Mean number and fraction of tools used, and of tools used and reflected on, in the LR experiment each year. The fractions are relative to the tools available for use, which differed each year. Error bars represent standard uncertainties in the mean.

Quality of reflection

Beyond whether students were reflecting, the quality of students’ reflective comments is also of interest. The maximum comment level for each student in the LR experiment was averaged each year, and an independent-samples t-test was used to evaluate the difference in means across years. Students in Year 1, on average, reached a maximum comment level of 1.82 ± 0.08, while students in Year 2 reached a maximum comment level of 3.35 ± 0.07, a significant difference, t(251) = 14.14, p < .001. Figure 4.4 shows the distribution of maximum comment levels reached by students each year. The distributions clearly differ, with most students focused on low-level application of tools in Year 1 and most students focused on high-level reflection in Year 2.

As an example of this reflection, we break down the comment levels for a specific tool, the χ² value (Figure 4.5).
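The comparisons in this subsection rest on two standard test statistics: an independent-samples t-test on per-student means, and a χ² test of independence on a table of counts (for the χ² tool, two years by six categories: Not Used, Used, Levels 1-4, giving (2 − 1)(6 − 1) = 5 degrees of freedom, consistent with the χ²(5) reported). The sketch below computes both with the standard library only. The excerpt does not say whether the pooled-variance or Welch form of the t-test was used, so Welch’s form here is an assumption; function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def welch_t(sample_1: Sequence[float], sample_2: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic (mean of sample_2 minus mean of sample_1) and Welch-Satterthwaite df."""
    n1, n2 = len(sample_1), len(sample_2)
    m1, m2 = sum(sample_1) / n1, sum(sample_2) / n2
    v1 = sum((x - m1) ** 2 for x in sample_1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in sample_2) / (n2 - 1)
    a, b = v1 / n1, v2 / n2  # squared standard uncertainties of each mean
    t = (m2 - m1) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def chi2_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and df for a table of counts (rows = years).
    Columns with a zero total must be dropped first, or the expected count is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


# Hypothetical per-student values, for illustration only.
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (1.0, 8.0)
```

Converting these statistics to p-values needs the t and χ² distributions; `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.chi2_contingency(table)` provide both tests directly.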
A χ² test of independence on the distribution of students’ use of χ² in their analysis showed significant differences across years: χ²(5) = 100.57, p < .001. Figure 4.5 shows that students in Year 1 primarily calculated χ² (used), with some including a written statement about that application (Level 1). Very few students in Year 1 went beyond basic application of the tool. In Year 2, in contrast, many more students made high-level reflection comments interpreting what that value means with regard to their data and the theoretical model. Only a small fraction of students stopped at the application level.

This tool in particular demonstrates the strength of the SQILab and, more specifically, of the t′-score framework for making comparisons. In Year 1, χ² fitting was primarily a procedural tool for determining the best-fit line. In Year 2, it became a useful reflective tool for understanding one’s data and results.

Figure 4.4: Distribution of the maximum comment levels (0-4) reached by students in the LR experiment each year, showing students in Year 2 making significantly higher-level reflections. Error bars represent 95% confidence intervals on the proportions.

While there are some dramatic differences in students’ behaviours in this end-of-the-year, unscaffolded experiment, it is important to examine these elements across the full year before reaching further conclusions about which components of the SQILab are responsible for these effects.

4.2.2 Year-long effects

We now look at a similar set of analyses to those above, but spread over a number of experiments across the year.

Tools used

The number of tools students used during each experiment was averaged over each year (Figure 4.6a). Since students in Year 2 were introduced to more tools (i.e.
t′-scores, residuals) at different times, we normalize the number of tools used by the number of tools available for them to use in that experiment (Figure 4.6b). A univariate Analysis of Variance (ANOVA) was used to compare the mean fraction of tools used (over tools available) between years and across experiments. The results of the ANOVA are presented in Table 4.2, where df refers to the degrees of freedom in the analysis, F is the F-statistic comparing the means, η is the effect size for the variable, and p is the probability of observing an effect at least this large if the null hypothesis (that there are no differences between years or across experiments) were true. All subsequent ANOVA tables are presented in this way without definition. Table 4.2 shows significant effects for both variables (a significant difference in tool use between years and a significant difference in tool use across experiments), as well as a significant interaction (the tool use in each experiment also differed by year). This interaction is better understood by examining the results in Figure 4.6b.

Figure 4.5: Distribution of the maximum comment levels reached by students in the LR experiment on their χ² values. “Used” corresponds to students using χ² values without any associated comments; Levels 1-4 are application, analysis, synthesis, and evaluation comments. Error bars represent 95% confidence intervals of the proportions.

Fraction of tools used
Effect             df        F        η      p
Year               1, 1009   149.25   0.13   <.001∗∗∗
Experiment         7, 1009   24.71    0.15   <.001∗∗∗
Experiment*Year    7, 1009   23.52    0.14   <.001∗∗∗

Table 4.2: ANOVA table for the number of tools students used in each experiment as a fraction of the tools available to them in that experiment.
∗∗∗ p<.001.

From Figure 4.6a, we see that students in Year 2 began the year using a similar number of tools to students in Year 1, but there is a significant change at the RS experiment. In Year 2, this experiment followed two experiments that introduced students to residuals and had explicit scaffolding to reflect on different graphical representations of data (linearized and non-linearized) and to iterate to improve their data. While there was no explicit iteration or reflection scaffolding, the RS experiment in Year 2 did prompt students to compare their measurement of the background radiation with a peer’s, as well as with a measurement of the Sr-90 count rate shielded by more than 3 mm of aluminum. This immediately engaged students with two of their available tools, peers and t′-scores, and drew their attention to the model issue, which may be partly responsible for the increase in tool use. Additional tools may also have been used because all reflection scaffolding was removed and students were accessing their full toolbox to resolve the issues with the background. Indeed, the proportional tool use stays constant (though slightly lower) over the course of the year as the iteration and reflection scaffolding remains low.

Figure 4.6: The distribution of students’ tool use in each experiment (P2, LI1, LI2, RS, SW, IoR, RC2, LR) each year, as a) the number of tools used or b) the fraction of tools used relative to those available for use. Error bars represent standard uncertainties in the mean.

The sharp drop-off in the number of tools used in the IoR experiment occurs because the experiment did not involve any graphical analysis. The fractional use of available tools is constant and consistently above that of students in Year 1.
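The η values in Table 4.2 (and in Tables 4.3 to 4.5) agree, to the two decimals shown, with partial η² computed from each F statistic and its degrees of freedom, η²_p = F·df_effect / (F·df_effect + df_error). The short Python check below uses only the numbers printed in Table 4.2; that exactly this formula produced the tabulated values is an inference from the arithmetic, not something the excerpt states, and the function name is hypothetical.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    return f_stat * df_effect / (f_stat * df_effect + df_error)


# (F, df_effect, df_error) as printed in Table 4.2.
table_4_2 = {
    "Year": (149.25, 1, 1009),
    "Experiment": (24.71, 7, 1009),
    "Experiment*Year": (23.52, 7, 1009),
}
for effect, (f, d1, d2) in table_4_2.items():
    print(effect, round(partial_eta_squared(f, d1, d2), 2))
# Year 0.13 / Experiment 0.15 / Experiment*Year 0.14
```

Because the error df is large (over 1000), even modest effect sizes like these correspond to very small p-values.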
The next section further examines which tools were being used in each experiment, in combination with the associated reflective comments.

4.2.3 Tool-based reflective comments

In addition to whether tools were used in students’ analysis, the number of tools students reflected on during each experiment was averaged over each year (Figure 4.7a) and normalized over the tools available (Figure 4.7b). A reflective comment was categorized as any written statement that analyzed the use of a tool.

Figure 4.7: The distribution of students’ reflective tool-based comments in each experiment each year, as a) the number of tools commented on or b) the fraction of tools commented on relative to those available for use. Error bars represent standard uncertainties in the mean.

A univariate ANOVA across years and experiments on the fraction of tools commented on (out of tools available) showed significant effects for both variables, as well as a significant interaction between the two (see Table 4.3). Compared to tool use, the reflective comments across experiments show a very different picture (Figures 4.7a and 4.7b). While we saw previously that students in both years used approximately the same number of tools in the first two experiments, students in Year 2 reflected on more of their available tools than students in Year 1.
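The normalized measures in Figures 4.6b and 4.7b divide each student’s count by the number of tools available in that experiment (Table 4.1), and their error bars are standard uncertainties in the mean; other figures in this chapter instead show 95% confidence intervals on proportions of students. The sketch below computes both. The excerpt does not name the interval method used for proportions, so the Wilson score interval here is an assumption, and the counts and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fraction_mean_and_sem(counts: Sequence[int], tools_available: int) -> Tuple[float, float]:
    """Mean of per-student fractions (count / tools available) and its standard uncertainty."""
    fractions = [c / tools_available for c in counts]
    n = len(fractions)
    mean = sum(fractions) / n
    sd = math.sqrt(sum((f - mean) ** 2 for f in fractions) / (n - 1))  # sample standard deviation
    return mean, sd / math.sqrt(n)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical: four students commenting on 2, 4, 3 and 3 of 10 available tools,
# and 30 of 100 students reaching a given comment level.
print(fraction_mean_and_sem([2, 4, 3, 3], 10))
print(wilson_interval(30, 100))
```

The Wilson interval is used here rather than the simpler p ± 1.96√(p(1 − p)/n) because it stays within [0, 1] for proportions near 0 or 1, which occur for some tools and levels.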
Once again, there is a significant change during the RS experiment in Year 2, for many of the reasons previously discussed regarding tool use. At the start of the second term (SW onwards), the amount of reflection seems to drop, though much more sharply in Year 1 than in Year 2.

It is possible that students in Year 2 reflected on more tools only because they were using more tools, as we saw in the previous section. If we divide the number of tools reflected on by the number of tools used, however, we still see significant differences across years and across experiments (Table 4.4). Figure 4.8 helps to visualize the interplay of tool use and reflection more clearly. This figure also includes additional experiments analyzed in Year 2 that were not used in Year 1.

Effect             df        F        η      p
Year               1, 1032   174.80   0.14   <.001∗∗∗
Experiment         7, 1032   39.88    0.21   <.001∗∗∗
Experiment*Year    7, 1032   8.10     0.05   <.001∗∗∗

Table 4.3: ANOVA table for the number of tools students commented on in each experiment as a fraction of the tools available to them in that experiment. ∗∗∗ p<.001.

Effect             df        F        η      p
Year               1, 1032   111.46   0.10   <.001∗∗∗
Experiment         7, 1032   79.45    0.35   <.001∗∗∗
Experiment*Year    7, 1032   8.33     0.05   <.001∗∗∗

Table 4.4: ANOVA table for the number of tools students commented on in each experiment as a fraction of the tools used in that experiment. ∗∗∗ p<.001.

From the height of each bar in Figure 4.8, one can see the total statistical tool use in each experiment, while the shading of the bar shows whether tools were used mostly in a procedural (faded colour) or reflective (solid colour) context. Looking at the amount of reflection alone might suggest that, as new tools were introduced, unsupported reflection became more difficult. The number of tools being used in Year 1, however, remained quite steady towards the end of the lab course, though the amount of reflection decreased significantly.
It appears, then, that students in Year 1 progressed into a procedural, plug-and-chug stance, where they were using the tools but thinking less about their use. That is, the lack of reflection in Year 1 cannot be attributed solely to there being too many tools, since the tool use in, for example, the LI2 experiment is the same as that in the SW, RC2, and LR experiments (heights of the bars), though the amount of reflection decreases significantly (shading).

Figure 4.8: Distribution of the number of tools used and commented on in each experiment (P2, P3, LI1, LI2, RS, MS1, MS2, SW, IoR, RC2, LR) each year. Error bars represent standard uncertainties in the mean. The P3, MS1, and MS2 experiments were not analyzed in Year 1.

Effect             df        F        η      p
Year               1, 1054   177.48   0.14   <.001∗∗∗
Experiment         7, 1054   33.31    0.18   <.001∗∗∗
Experiment*Year    7, 1054   18.72    0.11   <.001∗∗∗

Table 4.5: ANOVA table for the maximum reflective comment level reached by students in each experiment. ∗∗∗ p<.001.

Looking at the behaviours in Year 2, one can isolate three main features that supported reflection: scaffolding that instructed students to reflect and iterate (as in the P2, LI1, and LI2 experiments), opportunities to make comparisons between two measured values (as in the P2, P3, RS, and SW experiments), and an accessible model feature to be evaluated (as in the P2, P3, RS, and LR experiments). Once again, there appears to be a shift in behaviours triggered by the RS experiment. There is, however, a decrease in reflection thereafter. This may be because students found some tools more useful than others for understanding and interpreting data.
That is, they may not have found it necessary to reflect on a variety of tools if one or two tools were proving particularly useful.

Figure 4.9 shows the distribution of tool use and reflection on each statistical tool in the lab course, for the experiments common to Year 1 and Year 2. Other than in the P2 experiment in Year 2, uncertainties were generally used procedurally, including the uncertainty in the slope. In the P2 experiment, students in Year 2 were exploring how their uncertainties changed with different methods, leading to reflection about these changes. The t′-score reflection structure, introduced earlier in the lab course, maps easily onto the weighted χ²_w value later in Year 2, as demonstrated by the significant reflection on both of those tools in Year 2. Residuals were also a consistently useful reflective tool in Year 2. Filling a toolbox with a set of useful reflection tools did increase the amount of reflection, but students in Year 2 were also reflecting more on their plots (see LI1, LI2, RS, and LR). This figure more clearly demonstrates how students in Year 1 progressed into a ‘plug-and-chug’ stance, while students in Year 2 maintained reflective behaviours.

4.2.4 Quality of reflective comments

To evaluate the quality of reflection, the reflective comments were broken down into the four categories described in the Methods section. This analysis was performed on samples of student work from multiple experiments each year. A univariate ANOVA across years and experiments on the average maximum comment level reached by students showed significant effects for both variables, as well as a significant interaction between the two (see Table 4.5). Figure 4.10 shows that, apart from the IoR experiment, students in Year 2 were consistently making more higher-level comments than students in Year 1.
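To make the t′-score to χ²_w mapping concrete, here is a minimal Python sketch of the kind of comparison quoted in the Methods (a y = mx fit versus a y = mx + b fit, as in the LR circuit). It assumes χ²_w is the sum of squared uncertainty-weighted residuals divided by the number of points N, so that values near one indicate agreement within uncertainty; dividing by N minus the number of fit parameters is an equally common convention, and this excerpt does not restate which one the course used. The data and function names are hypothetical, with points generated on a line that has a non-zero intercept.

```python
from typing import List, Sequence, Tuple


def _weights(sigma: Sequence[float]) -> List[float]:
    return [1.0 / s ** 2 for s in sigma]


def fit_through_origin(x, y, sigma) -> Tuple[float, float]:
    """Weighted least-squares slope for the model y = m x (intercept fixed at 0)."""
    w = _weights(sigma)
    m = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(wi * xi ** 2 for wi, xi in zip(w, x))
    return m, 0.0


def fit_with_intercept(x, y, sigma) -> Tuple[float, float]:
    """Weighted least-squares slope and intercept for the model y = m x + b."""
    w = _weights(sigma)
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx ** 2
    return (s * sxy - sx * sy) / delta, (sxx * sy - sx * sxy) / delta


def chi2_w(x, y, sigma, m, b) -> float:
    """Sum of squared weighted residuals divided by N (assumed normalization)."""
    return sum(((yi - (m * xi + b)) / si) ** 2 for xi, yi, si in zip(x, y, sigma)) / len(x)


# Hypothetical data lying exactly on y = 2x + 1, each point with uncertainty 0.1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
sigma = [0.1] * 5

for name, fit in [("y = mx", fit_through_origin), ("y = mx + b", fit_with_intercept)]:
    m, b = fit(x, y, sigma)
    print(f"{name}: m = {m:.3f}, b = {b:.3f}, chi2_w = {chi2_w(x, y, sigma, m, b):.2f}")
# y = mx gives chi2_w = 18.18; y = mx + b gives chi2_w = 0.00 for these data.
```

Read through the t′ lens, the large χ²_w for y = mx says the model and data disagree by many uncertainties, which is the cue to propose and then evaluate an intercept rather than stop once the fit is minimized.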
Figure 4.9: Distribution of the fraction of students using (faded colour) and reflecting on (solid colour) each tool (x-axis: χ², residuals, model, plot, ln-ln, semi-ln, dm/db, Unc, t′-score, Overlap, Diff, Peers, WA) in a variety of experiments (right-hand y-axis; Term 1: P2, LI1, LI2, RS; Term 2: SW, IoR, RC2, LR) across the lab course each year. Error bars represent 95% confidence intervals of the proportions.

Figure 4.10: The distribution of students’ maximum comment level across experiments (P2, LI1, LI2, RS, SW, IoR, RC2, LR) each year, where error bars represent 95% confidence intervals of the proportions. Level 0 means no comment was made, Level 1 comments are applications of tools, Level 2 comments analyze the application of tools, Level 3 comments synthesize multiple ideas, and Level 4 comments are evaluative reflections.

In the P2 experiment, students in Year 2 were, on average, reasoning at a higher level than students in Year 1, primarily because more students in Year 2 reached evaluation (Figure 4.10). This is probably attributable to the explicit scaffolding instructing students to compare the pair of measurements, reflect on the comparison, and improve their measurements based on the reflection.
Many students in both years reached synthesis, primarily by offering suggestions to improve their measurements or to explain discrepancies between the angles (without evaluating the model). Students who stopped at analysis (level 2) typically only analyzed whether their measurements agreed, without further interpretation.

Many students in the LI1 experiment in Year 1 reached synthesis (level 3), primarily because they took a set of data, attempted to linearize it, realized they had collected it incorrectly, and proposed a method for fixing their data. This analysis of their data during linearization, leading to a designed change in their measurement process, constitutes a synthesis. In contrast, students in Year 2 were instructed to reflect on and improve their data at multiple points during their analysis, so they were making deliberate improvements to their data beyond fixing errors, and evaluating those improvements. This evaluation behaviour in Year 2 continued in the LI2 experiment, which was similarly scaffolded. Without a systematic effect in the LI2 experiment, students in Year 1 were primarily analyzing their linearized graphs (concluding that the model is an exponential relationship because the semi-log graph is linear).

In the RS experiment in both years, students were primed to consider the competing effects of the background and the source radiation, which increased the chances that students would evaluate their models accordingly. In Year 2, however, most students reached this evaluation phase. The SW experiment, in contrast, did not involve common systematic effects or observable model limitations.
In Year 1, this meant that most students were applying tools in a rote-reasoning frame, while students in Year 2 were still analyzing comparisons (there was an embedded comparison in Year 2 but not in Year 1) and proposing measurement improvements.

Reflection during the IoR experiment did not differ significantly between years and, in general, few students made reflective comments beyond analysis. We attribute this to a number of issues. First, the statistical focus of the experiment, the weighted average, was a procedural and non-reflective tool. That is, compared to tools such as t′-scores, residuals, and χ² values, the weighted average offers little information about the quality of data, conclusions, or results. With a procedural goal to calculate a weighted average, students focused on making their measurements and combining them into a weighted average, rather than on analyzing their measured data. The intention was that students would compare their three measurements for internal consistency, but we suspect that the introduction of the weighted average distracted students from this reflective goal. Second, informal observations in the class in Year 2 suggested that students were unclear about how to use t′-scores to compare three different values, since the single equation only compares pairs of measurements.

Most students in Year 2 made suggestions for how to improve measurements in the RC2 experiment (level 3 synthesis statements), which often stemmed from frustrations with old oscilloscopes. In Year 1, however, students were mostly analyzing their graphs. As we have already seen, the effects in the LR experiment are dramatic, with students in Year 2 primarily making level 4 evaluation statements, while most students in Year 1 were only making application-level statements.

Overall, these data suggest that students’ reflection behaviours depend significantly on the structure and features of the associated experiments.
This dependence also seems to overshadow the time progression, though there is evidence that students in Year 1 progressed into a plug-and-chug frame (mostly level 1 and 2 comments) over time. High-level reflection behaviours correspond to opportunities to iterate and improve measurements or to evaluate models, which are discussed further in Chapters 5 and 6. It appears, however, that the SQILab framework successfully supported high-level reflection behaviours throughout the year when tied to experiments that gave students meaningful evaluation opportunities.

4.3 Conclusions

Through analysis of students’ written comments, we find that students in Year 2 were, in general, reasoning at a much higher level during the experimentation process throughout the year. These effects were expected in experiments where the lab instructions scaffolded reflection, but they persisted when the scaffolding was removed. An exception was the IoR experiment, where there was a procedural focus on a statistical tool that did not support reflection.

These differences do not arise from students in each year performing different analyses with different tools. In the LR experiment, for example, nearly all students in both years performed least-squares fits and calculated corresponding χ² values. Very few students in Year 1, however, included a written analysis of that fit or value. This suggests that students in Year 1 were in a rote-reasoning or plug-and-chug stance (Redish, 2014; Tuminaro & Redish, 2007), where they framed the task only as a mathematical one, not bringing to bear other resources, in particular their physical or experimental knowledge. Many more of the Year 2 students were in epistemological stances that brought meaning to the mathematics (Tuminaro & Redish, 2007), combining the analytic procedures with their physical intuition and understanding of experimentation. Of course, students in Year 1 may have been trying to bring meaning to the mathematics, but they did not write it down.
Since the content of the lab notebook is the sole determinant of a student’s mark in the lab course, choosing not to record reflective comments suggests that the behaviours students associated with earning a good grade did not include reflection (Elby, 2001).

The reflection data also suggest that students were taking different meaning from these tools each year. When taught on its own as an analysis tool, the χ² value is an object that must be minimized to find the best-fit model parameters. Under this definition, once the value is minimized, the analysis is complete. With the SQILab structure, however, the χ²_w value is part of a critical thinking toolbox that informs follow-up analysis and measurement. The framework provided by the t′-score was mapped onto the χ²_w value, and students understood that performing the fit is not the end point of their analysis. They became comfortable dealing with disagreements and seeking out practical solutions to resolve them (as we will see in the next chapters).

From these results, we can establish some of the key features of the intervention that promoted reflection beyond the scaffolding instructions. One of these is how students interpret the goals of the experiments, a reminder that the actual goals of an activity must be clear and deliberate. When an experiment included procedural goals that did not map onto the reflection strategies (for example, learning to calculate a weighted average or the uncertainty in the slope), students’ attention was diverted away from the reflection strategies and towards the procedural goal.

Of course, the comparison and reflection scaffolding did support students’ critical thinking during their analysis. From the data, it is unclear how much scaffolding was necessary to invoke the behaviours, since, as new tools were introduced, students’ reasoning continued to improve with or without scaffolding.
Providing a reflection framework that can be applied to a variety of analysis tools may be sufficient to engage students in critical reflection. This is not the case for some of the other critical thinking behaviours, though, as we will see in the following chapters.

Chapter 5
Iteration

A significant element of the intervention was to engage students in deliberate reflection leading to iteration and improvement of their methods. Students rarely repeat experiments or measurements spontaneously (Kanari & Millar, 2004; Séré et al., 1993), and when they do so, it is often out of automatic routine (without self-driven purpose) or to correct mistakes (Kanari & Millar, 2004). One study found that students repeat measurements in an attempt to get closer to the ‘true value’ (Allie et al., 1998). Since a major goal of repetition in measurement is to reduce uncertainty or correct systematic effects, students’ limited understanding of measurement uncertainty and variability may be a root cause (Kanari & Millar, 2004). The problem is compounded by students’ assumptions that their measurements are inherently low quality compared to scientists’ measurements (Holmes & Bonn, 2013; Séré et al., 2001). If students obtain agreement, they assume they are done, and if they obtain disagreement, they blame it on bad equipment or ‘human error’ (Holmes & Bonn, 2013; Séré et al., 2001). We provide an example from an activity in Year 2 that highlights these issues.

At the beginning of Year 2, in the first Pendulum experiment after learning about histograms, students each measured the period of the same pendulum and we displayed the class results in a histogram.
We then asked students, “Why didn’t we all get the exact same value for the period of the pendulum?” They submitted key-word answers using Learning Catalytics², a web-based program to support peer instruction (Crouch & Mazur, 2001) using a variety of free- or closed-response question types (not limited to multiple choice). Students’ responses are represented in the word cloud in Figure 5.1, where the size of the words represents the frequency with which they were submitted.

Though some students identified sources of uncertainty such as reaction time, the word cloud highlights that many students attribute the difference in measurements to ‘human error’ (Séré et al., 2001). This activity triggered a full class discussion about the nature of measurement and uncertainty and a clear definition of the term ‘human error.’ In the lab course, we define ‘human error’ as an error (mistake) made by a human that should be corrected. While the main source of uncertainty in the pendulum measurement is variation in reaction time (when the measurer perceives the pendulum starting and ending a period, and the delay in starting and stopping the stopwatch, for example), students associate this with an error made by humans. As demonstrated elsewhere, students here are interpreting the word error as meaning mistake, rather than uncertainty (Evangelinos et al., 2002). When errors are interpreted as mistakes, one could, with sufficiently high quality equipment and techniques, reduce these errors to zero and make a perfect measurement (Evangelinos et al., 2002; Leach et al., 1998).

² www.learningcatalytics.com

Figure 5.1: Word cloud displaying key words that students submitted in response to the question, “Why didn’t we all get the exact same value for the period of the pendulum?” The size of each word is proportional to the frequency with which it was submitted.

Through this example, we see that the students entering the lab course hold novice ideas about uncertainty.
As suggested in Chapter 2, we believe the iterative comparison cycles, which underpin the SQILab framework, help shift students’ ideas towards the set-like paradigm, making uncertainties less abstract and more useful. We also expect the comparison cycles to encourage students to iterate and make changes throughout the experiments, with and without scaffolding. We have already seen how students can greatly improve their measurements in scaffolded experiments (the P2 experiment in Section 2.4). In this chapter, we will first examine a specific, unscaffolded experiment for these elements and then look across the full year for evidence of iteration and purposeful improvement.

5.1 Index of refraction

The IoR experiment was described in detail in Section 3.2.9. In summary, the experiment involved measuring the index of refraction of a piece of plexiglass using Snell’s Law (SL), Total Internal Reflection (TIR), and Brewster’s Angle (BA). The SL and TIR measurements involved common systematic effects that led to the discrepancies in Table 5.1.

Measurement                  Accurate value   Inaccurate value
Snell’s Law                  1.48 ± 0.02      ~2.13
Total Internal Reflection    1.48 ± 0.01      1.41 ± 0.01
Brewster’s Angle             1.48 ± 0.02      n/a

Table 5.1: The accurate values of n for each of the three measurements and the approximate values produced by the common systematic effects in the SL and TIR measurements. Uncertainties are estimated from the precision of the protractor.

5.1.1 Methods

Students’ values for n from the SL and TIR measurements were extracted from their lab books. Books from all students in both years were analyzed (Year 1, n=136; Year 2, n=145). If students made changes to their values, their original and final values were discernible either through clearly described changes or through initial values crossed out and replaced with final ones. A sample from a lab book in Figure 5.2 shows that the student first made the SL error and measured an angle of 24.5°, giving n of 2.09.
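The two values in this excerpt are consistent with Snell’s law, n = sin θ_i / sin θ_r, if the angle of incidence in air was 60°. That angle is our inference from the reported numbers, not something stated in the excerpt, so treat the sketch below as an illustration of the arithmetic under that assumption.

```python
import math

# Assumption: 60 degree angle of incidence in air (n_air taken as 1),
# inferred from the numbers in the excerpt, not stated in the lab book.
THETA_INCIDENT_DEG = 60.0

def snell_index(theta_refracted_deg, theta_incident_deg=THETA_INCIDENT_DEG):
    """Index of refraction from Snell's law, n = sin(theta_i) / sin(theta_r)."""
    return (math.sin(math.radians(theta_incident_deg))
            / math.sin(math.radians(theta_refracted_deg)))

misread_angle = 24.5                     # angle as first read off the protractor
corrected_angle = 60.0 - misread_angle   # 35.5 degrees after the correction

print(f"initial n = {snell_index(misread_angle):.2f}")    # initial n = 2.09
print(f"final n   = {snell_index(corrected_angle):.2f}")  # final n   = 1.49
```

A misreading of roughly 11° in the refracted angle is enough to push n from about 1.49 to about 2.09, which is why this error stands out so clearly against the other two methods.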
They subsequently corrected it and changed the measurement to 60°−24.5°=35.5°, giving n of 1.49. In this example, 2.09 would be recorded as the student’s initial SL measurement (before iteration) and 1.49 as their final SL measurement (after iteration). If no changes were made, the student’s initial and final values were the same. Students’ values were then also coded as initially correct, incorrect but later corrected, or incorrect and not corrected.

Three student measurements in Year 1 were removed from the TIR analysis. These values, near n ≈ 2, differed significantly from all other measurements and cannot be explained by the common measurement errors. Including them would have misleadingly skewed the Year 1 TIR data upward, suggesting a shift in n towards the expert value and above the inaccurate value. It has been previously demonstrated that students place special importance on their first measurement (Séré et al., 1993), so we suspect that these students manufactured their TIR and BA values to match their SL measurement, which was n ≈ 2 due to the common measurement error.

In addition to their values, the method through which changes were made was recorded. Changes were made either by crossing out values and replacing them with new ones without explanation (as in the example in Figure 5.2), or through explained and justified changes.

In both years, the TAs provided limited in-class support during the IoR experiment, and all scaffolding (explicit written instructions and marking-scheme instructions to compare, reflect, or iterate) was removed for this experiment in Year 2. The marking rubrics and written instructions provided to the students were identical between years.

Figure 5.2: Excerpt from a student lab book demonstrating a change from n of 2.09 to 1.49 in the SL experiment due to a correction of the angle measurement.

5.1.2 Results

Figure 5.3 shows the mean n values from the SL and TIR measurements each year, before and after any iteration that students made. In the SL experiment, many students in both years made and corrected the measurement error, with more students initially making the error in Year 2. We suspect this is because students in Year 1 had previous experience with the equipment, having used it for a small part of an earlier experiment. That earlier experiment was not used in Year 2, making students much less familiar with the measurement apparatus. Since the SL error results from incorrectly reading the protractor, it is reasonable that experience measuring angles on the apparatus would reduce the likelihood of making the error.

Students’ reported values of n for each measurement were compared through a repeated measures analysis of variance (rANOVA), with within-subjects effects comparing the values of n before and after iteration and between-subjects effects comparing Year 1 and Year 2. Within-subjects effects indicate whether the values of n before and after iteration differ significantly independent of year, representing a significant shift towards the expert value. Between-subjects effects indicate whether students’ values of n (both before and after iteration) differ significantly between years, independent of whether they iterated. This also provides a measure of whether students’ values were, in general, more or less accurate (close to the expert value) each year. Finally, an interaction between the two variables (iteration and year) indicates whether the amount of iteration differed by year, which is the most interesting comparison for our purposes.

[Figure 5.3 panels: Snell’s Law n (top) and TIR n (bottom), each at Initial Value and Final Value; legend: Expert Value, Year 1, Year 2.]
Figure 5.3: The average n values reported by students each year for the SL (top) and TIR (bottom) measurements. Initial and final values were recorded if students made changes to their measurements (students who made no changes have the same number recorded for both). More students in Year 2 corrected a measurement error in the SL measurement than in Year 1, but few students in either year corrected an error in the TIR measurement.

Within-subjects effects

Independent of year, the TIR measurements did not differ significantly before and after iteration; that is, there was no significant shift in the class towards the expert value: F(2,276)=0.06, p=.801. In contrast, many students iterated to improve the SL measurement, with a significant shift towards the expert value: F(2,279)=16.43, p<.001.

There are several reasons students might not correct the TIR error. First, the difference between the accurate and inaccurate values of n for the TIR measurement is small relative to the SL error. That is, even though the sizes of the two effects are similar in units of uncertainty, the absolute difference between the values is much larger for SL (ignoring uncertainty, 1.41 is more similar to 1.47 than 2 is to 1.47). Through interviews with students, we found that many considered the TIR difference to be negligible, especially if they had previously corrected the much larger SL error. This suggests that students are still in point-like paradigms when comparing measurements, focusing on the magnitude of the numbers rather than the difference in units of uncertainty (Buffler et al., 2001).

This brings up the second point: the difference between the TIR error value of n and the other values may not be statistically significant if one’s uncertainties are inappropriately large.
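The second point can be checked by expressing the difference in units of uncertainty. The sketch below assumes the t′-score takes the form t′ = (A − B)/√(δA² + δB²) for measurements A ± δA and B ± δB, and uses interpretation thresholds of |t′| < 1 (agree), 1 to 3 (unclear), and > 3 (disagree); both the form and the cut-offs are our assumptions for illustration. It compares the inaccurate TIR value with the BA value from Table 5.1 under three sets of uncertainties: the protractor-limited ones from Table 5.1 and the mean reported uncertainties for each year from Table 5.2.

```python
import math

def t_prime(a, da, b, db):
    """Difference between a +/- da and b +/- db in units of combined uncertainty."""
    return (a - b) / math.sqrt(da**2 + db**2)

def verdict(t):
    """Assumed interpretation thresholds, for illustration only."""
    t = abs(t)
    if t < 1:
        return "agree"
    if t <= 3:
        return "unclear"
    return "disagree"

TIR_WRONG, BA = 1.41, 1.48  # TIR value with the systematic effect; accurate BA value

cases = {
    "protractor-limited (Table 5.1)": (0.01, 0.02),
    "Year 1 mean uncertainties (Table 5.2)": (0.074, 0.132),
    "Year 2 mean uncertainties (Table 5.2)": (0.028, 0.059),
}
for label, (d_tir, d_ba) in cases.items():
    t = t_prime(TIR_WRONG, d_tir, BA, d_ba)
    print(f"{label}: |t'| = {abs(t):.2f} -> {verdict(t)}")
# protractor-limited (Table 5.1): |t'| = 3.13 -> disagree
# Year 1 mean uncertainties (Table 5.2): |t'| = 0.46 -> agree
# Year 2 mean uncertainties (Table 5.2): |t'| = 1.07 -> unclear
```

With uncertainties as large as the Year 1 averages, the systematic effect is invisible to this comparison. With the smaller Year 2 uncertainties it is at least flagged as unclear.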
Since the main source of uncertainty was the precision of the protractor, students used their own discretion to determine their uncertainty in reading the scale. Many students described assigning large uncertainties to their angle measurements (as large as ±3°) so that they could be “safe” rather than precise (safe meaning that their values were more likely to agree). This tactic still would not produce agreement with the SL error value, but it would cause the TIR measurement to agree with the other accurate measurements.

In Year 2, however, students reported significantly smaller measurement uncertainties, on average, than in Year 1: F(3,276)=3.38, p=.019 (Table 5.2). This may be a direct result of earlier scaffolding in Year 2 to iterate to improve measurements and reduce uncertainty. This result demonstrates a shift in students’ epistemologies in Year 2 away from artificially inflating uncertainties to account for potential errors (inaccuracies) and towards valuing high precision to better understand the measurements. While in Year 1 a successful experiment may have been defined by encapsulating the ‘true’ value (leading students to inflate uncertainties), in Year 2 successful experiments involved precise measurements. Students’ accuracy, however, does not tell the same story, as we will see through the between-subjects effects.

Measurement        Year 1          Year 2
SL uncertainty     0.083 ± 0.01    0.049 ± 0.01
TIR uncertainty    0.074 ± 0.02    0.028 ± 0.005
BA uncertainty     0.132 ± 0.02    0.059 ± 0.01

Table 5.2: The mean and standard uncertainty in the mean of students’ reported uncertainties for each measurement across years in the IoR experiment.

Between-subjects effects

The TIR measurements showed no differences between years, F(2,276)=0.34, p=.561, and no interaction, F(2,276)=1.06, p=.304.
This is surprising since, as mentioned above, students had significantly smaller uncertainties in Year 2, which increased their chances of obtaining disagreement between the inaccurate TIR measurement and the other values. The SL measurements, on the other hand, did show significant differences between years, F(2,279)=7.58, p=.006, with a significant interaction, F(2,279)=28.28, p<.001. These results suggest not only that more students in Year 2 left the lab with an accurate SL measurement, but also that more students in Year 2 iterated to correct the measurement error (Figure 5.3). This effect is amplified by the fact that more students in Year 2 reported an initially inaccurate measurement (seen as the much higher n value before iteration in Year 2), as we have already discussed. It is possible, of course, that students in Year 1 were simply not recording their initially incorrect values. Closer examination of how they documented these changes sheds some light on this.

5.1.3 Iteration behaviours

Though the distribution of measured values itself explains much about students’ behaviours, there are other actions to examine. Figure 5.4 shows the distribution of students’ behaviours when their measurements are categorized as accurate, corrected, or inaccurate, based on the values and actions in their lab books. Accurate values were measured accurately at the beginning and not changed. Corrected values were initially measured inaccurately, but then recognized and corrected. Inaccurate values were initially measured inaccurately, but not fixed. χ² tests of independence showed significant effects for the SL measurement, χ²(2)=14.48, p<.001, and the TIR measurement, χ²(2)=7.32, p=.026. For the SL measurement, Figure 5.4 demonstrates that more students in Year 2 iterated to correct their measurement, consistent with the between-subjects effects discussed in the previous section.
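For readers who want to reproduce this kind of test, the sketch below computes a Pearson χ² test of independence for a 2 × 3 table (year × measurement category) using only the Python standard library. The counts are hypothetical placeholders, not the study’s data. With (2−1)(3−1) = 2 degrees of freedom the upper-tail p-value has the closed form p = e^(−χ²/2), and with 1 degree of freedom (as in the 2 × 2 comparison of change types) p = erfc(√(χ²/2)). The final lines check these closed forms against the statistics reported in this section.

```python
import math

def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def chi2_p_value(chi2, dof):
    """Upper-tail p-value; closed forms exist for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if dof == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only dof = 1 or 2 handled in this sketch")

# Hypothetical counts: rows = Year 1, Year 2; columns = accurate, corrected, inaccurate.
counts = [[60, 20, 50],
          [50, 55, 40]]
stat, dof = chi2_independence(counts)
print(f"chi2({dof}) = {stat:.2f}, p = {chi2_p_value(stat, dof):.4f}")

# Consistency check against the reported statistics.
print(f"{chi2_p_value(7.32, 2):.3f}")   # 0.026
print(f"{chi2_p_value(14.48, 2):.4f}")  # 0.0007
```

For tables with more degrees of freedom, a library routine for the χ² survival function would replace the closed forms.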
For the TIR measurement, however, Figure 5.4 suggests that students in Year 1 made more initially correct measurements and fewer left with incorrect measurements. Again, we primarily associate this last behaviour with the students in Year 1 having previously made this measurement.

[Figure 5.4 panels: Y1 and Y2, SL and TIR; x-axis: Measurement Type (Accurate, Corrected, Inaccurate); y-axis: Fraction of students.]
Figure 5.4: The distribution of the types of measurements made in the IoR experiment. Measurements were either initially accurate, initially incorrect but then corrected, or inaccurate and never corrected. Error bars represent 95% confidence intervals of the proportions.

Finally, we can also examine the method through which students made the changes. Analysis of student books showed that this involved either crossing out initial measurements and replacing them with updated values, as in Figure 5.2, or clear descriptions explaining and justifying the changes. Figure 5.5 shows the distribution of these methods for both measurement types, clearly demonstrating that most students in Year 1 crossed out their values and replaced them without explanation, writing off initial errors (mistakes). This was a significant difference in the SL measurement, χ²(1)=23.39, p<.001, but there was insufficient data to infer significance in the TIR measurement (very few students made changes). This result demonstrates that not only did more students in Year 2 iterate to improve their measurements, but they also recognized the value of the iteration process, opting not to hide their mistakes but to justify them and demonstrate their attempts to improve. This suggests a significant and important attitudinal shift.

5.1.4 Discussion

This single experiment provided a case study of students’ iteration behaviours in an experiment that did not involve limitations to a theoretical model.
The results shed light on the conditions necessary to encourage students to reflect on their experimental results and act on the reflection to make improvements. Preceding instruction that included tools for meaningful reflection and scaffolding for making comparisons and iterating on those comparisons resulted in more students iterating to correct gross systematic effects in the IoR experiment in Year 2. This focus on iterating and improving measurements also resulted in students reporting significantly smaller and more expert-like measurement uncertainties, as well as more students describing and justifying the changes made to their measurements, rather than crossing them out in attempts to hide them. This demonstrates an extensive shift in students’ expert-like attitudes and behaviours due to the SQILab structure.

[Figure 5.5 panels: Y1 and Y2, SL and TIR; x-axis: Change Type (Describe, Cross-out); y-axis: Fraction of students.]
Figure 5.5: The distribution of the methods through which students made changes. Values were either crossed out and replaced with new ones without justification, or students provided clear descriptions and explanations for changes made. Error bars represent 95% confidence intervals of the proportions.

In Year 1, we observed that some of the students who identified their errors did so by conferring with their peers. When this was suggested to other students as a potential resource in the classroom, many responded that they had thought sharing results and ideas with their peers was considered cheating, as in an exam setting. In Year 2, discussing with peers was deliberately, and sometimes explicitly, added to the possible reflection behaviours. Indeed, more students in Year 2 described conferring with their peers in the IoR experiment than in Year 1 (see Figure 4.9 in Chapter 4).

The intervention, however, was insufficient for students to recognize and improve the TIR measurement.
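How subtle the TIR effect is can be seen from the relation n = 1/sin θ_c, where θ_c is the critical angle inside the plexiglass (with air, n ≈ 1, outside). The sketch below converts the accurate and inaccurate TIR values from Table 5.1 into critical angles and estimates how much n changes per degree of misjudged critical angle. It illustrates the formula only; it does not model the beam spreading that causes the error.

```python
import math

def n_from_critical_angle(theta_c_deg):
    """Index of refraction from the critical angle (outside medium: air, n = 1)."""
    return 1.0 / math.sin(math.radians(theta_c_deg))

def critical_angle(n):
    """Critical angle in degrees for a medium of index n against air."""
    return math.degrees(math.asin(1.0 / n))

accurate, inaccurate = 1.48, 1.41  # TIR values from Table 5.1
shift = critical_angle(inaccurate) - critical_angle(accurate)
print(f"theta_c for n = {accurate}: {critical_angle(accurate):.1f} deg")      # 42.5 deg
print(f"theta_c for n = {inaccurate}: {critical_angle(inaccurate):.1f} deg")  # 45.2 deg
print(f"shift in critical angle: {shift:.1f} deg")                            # 2.7 deg

# Sensitivity near the accurate angle: change in n per degree of misjudged theta_c.
theta = critical_angle(accurate)
dn_per_deg = n_from_critical_angle(theta) - n_from_critical_angle(theta + 1.0)
print(f"n changes by about {dn_per_deg:.3f} per degree")
```

The whole systematic effect corresponds to less than 3° of critical angle, comparable to the ±3° uncertainties some students assigned to their angle readings.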
The magnitude of the systematic effect for TIR was much more subtle than that for SL. In interviews, students mentioned that the TIR measurement seemed ‘close enough’, especially if they had found and corrected the much more extreme SL error. This error also required a more nuanced understanding of the measurement. Since the error comes from spreading of the beam as one approaches the critical angle for total internal reflection, students must define the critical angle correctly as the point when the centre of the beam is on the verge of ‘disappearing.’ Most students instead defined the critical angle as the point at which the refracted beam has completely disappeared. Adjusting this definition is not supported by the tools we provided to the students.

This is reinforced by issues described in Chapter 4, where it was found that relatively few students were comparing their three measurements (through overlapping uncertainty ranges or t′-scores) and were instead focusing on the non-reflective procedural goal of combining the three measurements through a weighted average. In future iterations of this course, we plan to move the weighted averages activity to an earlier lab and give students experience comparing more than two measurements earlier in the year. We expect this will increase the number of students who correct both systematic effects.

5.2 Year-long effects

Students’ written lab books were analyzed for any instances of changes made or proposed to their methods across the year. In experiments such as the Pendulum for Pros I (P2) experiment, students in Year 2 were instructed to iterate and improve their methods, so we expect the majority of the class to engage in this behaviour. In the following week, however, in the Pendulum for Pros II (P3) experiment, this scaffolding was removed.
It is important to observe whether these behaviours persist when the scaffolding is removed and, if not, what duration, frequency, or levels of scaffolding are necessary for students to engage in these behaviours independently. Students may independently iterate to improve measurements once it has become a habit, once they recognize its value in experimentation, or once they recognize that it is an expectation in the lab course. As an example, Figure 5.6 shows a sample of student work during the Pendulum for Pros experiment in which the student suggests ways to improve their measurements of the period of a pendulum.

Figure 5.6: An excerpt from a student lab book shows an example of proposed measurement changes in the Pendulum for Pros experiment.

5.2.1 Methods

For 8 different experiments in the lab course common to both Year 1 and Year 2, a random subset of 20–30 student notebooks was analyzed from each year. In three of the experiments (P2, IoR, and LR), the full class each year was analyzed, since these experiments were analyzed in more detail for additional assessment measures. Three additional experiments in Year 2 that were not used in Year 1 were also analyzed.

To analyze the changes to methods, any instances of proposing or carrying out changes to measurements were coded. I coded all items and another rater coded approximately 10% of the items. Inter-rater reliability analysis using Cohen’s κ statistic was performed to evaluate consistency between raters, with values greater than 0.6 considered substantial agreement. Only one round of inter-rater reliability analysis was necessary, since the inter-rater reliability for the codes of proposing or carrying out changes to measurements was κ = 0.714, p<.001, after the first round.
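Cohen’s κ corrects raw percent agreement for the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected from each rater’s marginal frequencies. A minimal standard-library sketch for two raters’ binary codes follows; the code lists are hypothetical, not the study’s ratings.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    p_expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes: 1 = change proposed or made, 0 = no change.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.58
```

In this made-up example the raters agree on 80% of items, yet κ falls just below the 0.6 threshold, because 52% agreement would be expected by chance alone.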
A logistic regression was performed to compare how often students made or proposed changes (combined for analysis) across five experiments in which the scaffolding to do so had been removed in Year 2.

5.2.2 Results

The distribution of the frequency of students making or proposing changes to methods across experiments is shown in Figure 5.7. The logistic regression model was statistically significant, χ²(5)=307.94, p<.001. The model explained 50.4% of the variance (Nagelkerke R²) in students’ method changes and correctly classified 78.6% of cases (see Table 5.3). The odds of students in Year 2 making or proposing changes to their methods were about 52 times those of students in Year 1. There were also significant differences across the different experiments, χ²(4)=15.0, p=.005.

[Figure 5.7 panels: Y1 and Y2; x-axis: Experiment (P2, P3, LI1, LI2, RS, MS1, MS2, SW, IoR, RC2, LR); y-axis: Fraction of Students; legend (Method Changes): Proposed, Proposed and Changed.]
Figure 5.7: Distribution of students making or proposing changes to their experimental methods across experiments each year. Error bars represent 95% confidence intervals of the proportions.

Category                     Prediction
Sensitivity                  28.8% (true positives)
Specificity                  49.9% (true negatives)
Positive predictive value    90.9% (correct vs observed positives)
Negative predictive value    72.9% (correct vs observed negatives)

Variables in the Equation    B       S.E.    Wald z    p           e^B
Year                          3.95   0.35    11.45    <.001***    51.72
Lab SW                       -1.52   0.56    -2.70     .007**      0.22
Lab IoR                      -1.30   0.45    -2.87     .004**      0.27
Lab RC2                      -0.67   0.60    -1.11     .267        0.51
Lab LR                       -0.67   0.45    -1.48     .139        0.51

Table 5.3: Results from the logistic regression comparing students’ iteration behaviours each year across five unscaffolded experiments. ** p<.01. *** p<.001.

From Figure 5.7 it is clear that, in general, students in Year 2 carried out or proposed more changes to their measurement procedures than students in Year 1.
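The e^B column of Table 5.3 converts each log-odds coefficient into an odds ratio. A Wald z converts to a two-sided p-value through the standard normal distribution. The sketch below recomputes both from the rounded values in the table; small discrepancies (e.g. e^3.95 ≈ 51.9 versus the tabulated 51.72) come from B being rounded to two decimals. The last function makes explicit that e^B multiplies odds, not probabilities. Its 20% baseline is a hypothetical value chosen for illustration.

```python
import math

def odds_ratio(b):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(b)

def wald_p(z):
    """Two-sided p-value for a Wald z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def apply_odds_ratio(p_baseline, ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1 - p_baseline) * ratio
    return odds / (1 + odds)

# (B, Wald z) pairs as reported in Table 5.3.
table_5_3 = {
    "Year": (3.95, 11.45),
    "Lab SW": (-1.52, -2.70),
    "Lab IoR": (-1.30, -2.87),
    "Lab RC2": (-0.67, -1.11),
    "Lab LR": (-0.67, -1.48),
}
for name, (b, z) in table_5_3.items():
    print(f"{name:8s} e^B = {odds_ratio(b):6.2f}   p = {wald_p(z):.3f}")

# A hypothetical 20% baseline rate becomes about 93%, not 52 x 20%.
print(f"{apply_odds_ratio(0.20, 51.72):.2f}")  # 0.93
```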
This was never an explicit requirement in Year 1, and so the students did not engage in this behaviour. The peak in changes in Year 1 during the LI1 experiment was primarily due to students making gross errors in their data collection process and needing to redo their full data set. In Year 2, when the lab instructions and assessment (scaffolding in the Pendulum and Light Intensity experiments) explicitly required students to make multiple attempts at their measurement, nearly all students made changes to their method, with many more proposing additional changes.

There are some interesting highlights to these behaviours. First, it appears that the explicit instructions to iterate need to be present for several weeks before students will iterate independently. This is demonstrated by the sequence of behaviours during the first two weeks of the term. The scaffolding introduced to students in the second lab of Year 2 was removed the following week, which resulted in very few students iterating their experimental process (the shift from P2 to P3). In contrast, the LI1 and LI2 experiments both had high levels of iteration scaffolding and, when it was removed the following week in the RS experiment, more than 75% of the students continued to repeat measurements to improve them. After the winter break, the scaffolding remained absent and students’ iteration behaviour settled around 50%, with three interesting peaks.

Two of these peaks, during the MS2 and LR experiments, correspond to experiments that involved a need to change the theoretical model. We propose that this is due to students trying to reconcile the disagreement with the theoretical model. That is, their first instinct is that their measurements are wrong (Holmes & Bonn, 2013; Séré et al., 2001), and so they try to improve their measurement. When an improved measurement still disagrees with the theoretical model, they are eventually convinced that the issue lies with the model instead. We will examine how students evaluate models further in the next chapter. The third peak, during the RC II experiment, is for proposed changes to the experiment, rather than carried-out changes. We speculate that this is due to frustrations using aged electrical equipment, especially oscilloscopes, from which data must be extracted manually from the screen. The proposed changes often involved ways to improve the precision of the experiment, especially by getting better equipment. These proposals continued during the following week, in an experiment that uses the same equipment.

5.3 Motivation to iterate

There is one cautionary tale for the scaffolding to promote iteration, but I offer a solution as well. In an interview with a student in Year 2 at the end of term 1, it was revealed that some students were making deliberately flawed initial measurements to ensure they had something to improve in subsequent attempts. Since marks were allotted for demonstrated improvements and changes, students made sure they had something to change. At the beginning of the second term, we addressed this issue through a full-class discussion about the nature of the experimentation process.

Before the discussion, students worked in small groups to create a flowchart describing how we do experiments in physics. A sample of one of these flowcharts can be found in Figure 5.8. All the flowcharts produced by students included some form of iterative loop to reflect on their data and improve their data, precision, models, or methods depending on the outcome. In Figure 5.8, for example, there are reflective steps to determine, “Does the data make sense?” and “Is the data accurate/precise?” If the answer to either question is no, there are loops back to redesign the experiment or collect more data.
There is also a loop to determine, “Does the model fit?” and, if not, to cycle back to evaluate the model.

These expert-like flowcharts conveniently set the stage to discuss why we iterate during the experimentation process, which brought up issues of validity, systematic effects, limitations and assumptions of models, and many other issues concerning the authentic nature of scientific measurement. The lab course instructor also specifically addressed the gaming behaviour that had been identified (deliberately making poor-quality initial measurements to ensure improvement was possible). Students were reminded that this was not worth their time, since there were always many ways to improve their measurements, and, if they truly made a perfect measurement initially, they did not need to iterate. This also prompted removing the iteration scaffolding from the marking rubric to decrease the incentive for ‘gaming.’ We recommend that this issue be addressed outright early in the year, with frequent reminders about why it is useful to improve measurements. We also recommend providing students with productive and efficient examples of iterating their measurements beyond simply repeating the whole experiment, ensuring that the iteration process is indeed productive and useful.

Figure 5.8: Flowchart produced by one group of students describing how we do experiments in physics. All students’ flowcharts included forms of iterative loops, such as this one, to reflect on and improve data, precision, methods, or models.

5.4 Conclusions

The data presented in this chapter demonstrates that students in Year 2 engaged significantly more often in iterating and improving their measurements, with and without instructions to do so. Indeed, if students are critically reflecting on their analysis, as was found in Year 2 in the previous chapter, they are more likely to productively identify avenues for improvement.
What is also interesting is that, beyond the changes being made, students in Year 2 were more likely to describe and explain those changes, rather than just crossing out the first set of measurements and replacing them with updated ones. When students in Year 1 did spontaneously iterate their measurements, it was mostly due to gross errors that needed to be corrected, consistent with other literature (Kanari & Millar, 2004).

The key elements of the intervention, we believe, centre on a more authentic experience in the lab, one that rewards improvement rather than punishing mistakes. In the year preceding the intervention, Year 1, students framed their behaviours in the lab on the assumption that their measurement abilities differed from those of expert scientists. Students in Year 1 perceived the plexiglass prism as a “lump of clay” and assumed that they would obtain low-quality results. As such, they were prone to inflating their uncertainties to encapsulate potential errors (that they could not control) in an attempt to get agreement. In contrast, the SQILab structure gave students experiences where they could improve their measurements with the given equipment, especially by reducing uncertainties, and this often uncovered new physics or exposed assumptions or approximations in the physical models. For example, the Pendulum for Pros experiments (P2 and P3) exposed the small angle approximation for the period of a pendulum when measurement precision was sufficiently high. In the Mass on a Spring experiments (MS1 and MS2), the mass of the spring was no longer negligible as students performed progressively higher-quality measurements. This deep connection between measurement quality and modelling develops a more practical understanding of uncertainty, moving it from an abstract number that students must calculate to a useful tool that students can quantify and manage. The t′-score and χ²_w values give them follow-up actions and things to do.
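The size of the effect that the P2 and P3 experiments exposed can be estimated from the exact period of a simple pendulum, T = T0 / AGM(1, cos(θ0/2)). Here T0 = 2π√(L/g) is the small-angle period, θ0 is the release amplitude, and AGM is the arithmetic-geometric mean. The sketch below uses a hypothetical 1.0 m pendulum and g = 9.81 m/s². It shows the correction growing from about 1 ms at 5° to roughly 15 ms at 20°, so it would only stand out once repeated timing brings the uncertainty in the period down to a comparable level.

```python
import math

def period_small_angle(length_m, g=9.81):
    """Small-angle period, T0 = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / g)

def period_exact(length_m, amplitude_deg, g=9.81):
    """Exact period via the arithmetic-geometric mean: T = T0 / AGM(1, cos(theta0/2))."""
    a, b = 1.0, math.cos(math.radians(amplitude_deg) / 2)
    for _ in range(20):  # AGM converges quadratically; 20 steps is ample
        a, b = (a + b) / 2, math.sqrt(a * b)
    return period_small_angle(length_m, g) / a

# Hypothetical 1.0 m pendulum released from several amplitudes.
t0 = period_small_angle(1.0)
for amp in (5, 10, 20, 30):
    t = period_exact(1.0, amp)
    print(f"{amp:2d} deg: T - T0 = {1000 * (t - t0):.1f} ms "
          f"({100 * (t / t0 - 1):.2f}%)")
```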
The SQILab framework shifts students towards more expert-like epistemologies, with the attitude that they, as students, are engaging in more authentic, high-quality scientific experimentation.

From the flowcharts students produced about how we conduct experiments in physics, it is clear that students understood the purpose of iterating measurements, but there was some misalignment with their beliefs about earning good grades in the lab course (as evidenced by the ‘gaming’ behaviours). This is consistent with a conflict between their epistemological frames and their goals for succeeding in the lab course (Elby, 2001). While it is important for the lab course assessments to align with the desired behaviours (that is, we gave students marks for iterating), instructors must be careful with how students’ performance goals (wanting to succeed in the lab course) interact with their mastery goals (wanting to learn the material and do well in the experiment). These epistemological and attitudinal considerations will be further discussed in Chapter 7.

Nonetheless, the results in this chapter have demonstrated key features necessary for engaging students in iteration when the scaffolding to do so is removed. First, it takes a number of experiences with instructions to iterate and change measurement procedures before iteration becomes a habit for students. There also seems to be a pattern of iterating measurements being tied to issues in the experiment, especially theoretical model changes. That is, the unscaffolded experiments with the most students iterating all involved some form of systematic effect or theoretical model issue. This suggests that, beyond reflecting on their data and results, students require an authentic reason to manipulate and iterate their experimental methods. If the results come out agreeing and making sense, students are not prompted to go back and make changes, even though agreement could be a result of imprecise measurements (inflated uncertainties).
Giving students the impetus to improve their measurements when results seem satisfactory appears to be the challenge, as evidenced by the IoR results.

As discussed in the introduction, one issue with this analysis is that the written comments may not map directly onto students’ behaviours in the lab. For example, while the number of students who recorded making changes to their data increased in Year 2, this does not necessarily mean that fewer students in Year 1 actually made changes. It is possible that students in both years made changes that they did not record. This is especially true for the IoR experiment, where students recorded single values. Observations of students in class, for example, found that some groups of students held animated discussions regarding the precise definition of the critical angle, but no trace of this was found in their books. It is fair to conclude from these results that students in Year 2 produced much more detailed lab notes that better reflected their process and behaviours in the lab. While this may simply be attributed to the intervention itself (that is, much of the scaffolding instructed or rewarded students for providing evidence of reflection and iteration), the improved experimentation outcomes suggest that their behaviours also improved. Future research should involve observational studies of students’ behaviour and discourse in the two types of labs to confirm these conclusions. Nonetheless, it is clear that the SQILabs promote iteration as a useful and productive activity, with students demonstrating this behaviour significantly more.

Chapter 6
Evaluation

Evaluation is the highest level of sophistication in Bloom’s Taxonomy (Anderson & Sosniak, 1994), and we have already seen that students were making many more evaluative reflection statements in Year 2 (Chapter 4).
Beyond evaluating their tool use, reflection and iteration often uncovered issues with theoretical models, such as limitations or unjustified assumptions. In this way, theoretical models are deeply intertwined with experimental procedures (Hoskinson et al., 2014; Zwickl et al., 2013a), though students rarely use theoretical models to help interpret experimental data (Ryder & Leach, 2000).

While evaluating theoretical models was never an explicitly scaffolded behaviour, reflecting on results and iterating to improve measurements leads naturally into evaluation, as has been previously demonstrated. Many of the experiments in the SQILab were also intentionally designed such that reasonable precision and accuracy could expose specific limitations or approximations in the associated model. The first example of this is the pendulum period measurement, where comparing and reflecting on pairs of measurements and iterating to reduce the measurement uncertainty revealed the small-angle approximation in the period of the pendulum (Section 2.4). In this chapter, I will examine students' evaluation of physical models throughout the lab course and provide a specific example by comparing the LR experiment between the two years. Once again, we will begin with the specific example and then expand into the year-wide effects.

6.1 LR circuits experiment

The LR circuits experiment has been previously described in Section 3.2.12. The main features were that the theoretical model,

$$\tau = \frac{L}{R}, \qquad (3.4)$$

suggested that a plot of 1/τ against resistance (R) would produce a straight line through the origin. Additional resistance in the circuit, however, results in a small intercept in the plot (see Figure 3.4).
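As an illustration of why the intercept matters, the Python sketch below generates synthetic data for a circuit with an extra, unmodelled series resistance r, so that 1/τ = R/L + r/L. The values of L and r and the 0.2% noise level are invented, and the fits use standard unweighted least-squares formulas rather than any specific tool from the course. The sketch fits both a one-parameter line through the origin (the given model) and a two-parameter line with a free intercept; dividing the intercept by the slope recovers r, which is the kind of physical interpretation discussed in this section.

```python
import numpy as np

# Hypothetical circuit: inductance L plus an unmodelled series resistance r
# (for example, the inductor's winding resistance). Values are invented.
L_true = 0.010   # henry
r_true = 20.0    # ohm
R = np.linspace(100.0, 1000.0, 10)  # chosen resistor values, ohm

# Model with extra resistance: 1/tau = (R + r)/L = R/L + r/L, in 1/s
rng = np.random.default_rng(seed=1)
inv_tau = (R + r_true) / L_true
inv_tau = inv_tau * (1 + 0.002 * rng.standard_normal(R.size))  # 0.2% noise

def fit_through_origin(x, y):
    """One-parameter least-squares fit y = m x; returns (m, dm)."""
    m = np.sum(x * y) / np.sum(x * x)
    s2 = np.sum((y - m * x) ** 2) / (x.size - 1)  # residual variance
    return m, np.sqrt(s2 / np.sum(x * x))

def fit_with_intercept(x, y):
    """Two-parameter least-squares fit y = m x + b; returns (m, dm, b, db)."""
    n, xbar = x.size, x.mean()
    sxx = np.sum((x - xbar) ** 2)
    m = np.sum((x - xbar) * (y - y.mean())) / sxx
    b = y.mean() - m * xbar
    s2 = np.sum((y - (m * x + b)) ** 2) / (n - 2)  # residual variance
    return m, np.sqrt(s2 / sxx), b, np.sqrt(s2 * (1 / n + xbar**2 / sxx))

m1, dm1 = fit_through_origin(R, inv_tau)
m2, dm2, b2, db2 = fit_with_intercept(R, inv_tau)

print(f"mx fit:   L = {1e3 / m1:.2f} mH")
print(f"mx+b fit: L = {1e3 / m2:.2f} mH, intercept = {b2:.0f} +/- {db2:.0f} 1/s")
print(f"extra resistance r = intercept / slope = {b2 / m2:.1f} ohm")
```

With these invented values, forcing the line through the origin pushes the slope up to absorb the missing intercept, so the one-parameter fit underestimates L by roughly 3%, while the two-parameter fit recovers both L and r.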
A simple unit analysis could bring students to the physical interpretation of the intercept without a significant understanding of the circuit elements.

The written instructions provided to the students were the same in both years, and TAs provided very limited support to the students, other than helping with low-level technical issues.

Figure 6.1: An excerpt from a student lab book evaluating the given model in the LR experiment.

Given that students chose the form of the model to fit, we will examine whether students fit the theoretical model to their data or fit the best-fitting straight line to their data (that is, a one- or a two-parameter fit, respectively). Regardless of the fit, did they recognize the disagreement between their data and the theoretical model and, if so, how did they reconcile it? We have already seen that students in Year 2 were engaging in higher-level reflection on their data and analyses (Chapter 4) and iterating to improve their measurements (Chapter 5) during this experiment. Here we show that these improved behaviours also led to more students evaluating the theoretical model.

6.1.1 Methods

All students' lab books and analysis spreadsheets were analyzed (Year 1: n = 130; Year 2: n = 136) to assess whether they had included an intercept in their fit, whether their conclusions recognized the non-zero intercept (whether or not it was included in the fit), and whether they had associated the intercept with an additional resistance in the circuit. Figure 6.1 shows an excerpt from a student lab book describing this evaluation process (identifying that the given model does not have an intercept and then attaching physical meaning to the intercept). I coded all items and another rater coded approximately 10% of the items. The inter-rater reliability across coding items was found to be κ = 0.881, p < .001, on the first round of coding.

The raters were not blind to condition, since the intervention itself affected what was written and included in the lab books.
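The agreement statistic quoted in the Methods can be illustrated with a minimal Python sketch of Cohen's kappa for two raters. That Cohen's kappa (rather than another κ statistic) was the measure used is an assumption, and the coded items below are hypothetical; the sketch only shows how agreement expected by chance is removed from the raw percentage agreement.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    Compares observed agreement with the agreement expected by chance
    from each rater's own category frequencies."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical binary codes ("intercept included?") for 20 lab books;
# the two raters disagree on one book.
first_rater = [1] * 10 + [0] * 10
second_rater = [1] * 9 + [0] * 11
print(round(cohens_kappa(first_rater, second_rater), 3))  # 0.9
```

Here the raters agree on 95% of items, but because half of that agreement would be expected by chance, kappa is 0.9 rather than 0.95.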
The most significant difference between the two years' lab books was that in Year 1 students provided printed copies of their spreadsheet calculations in their books, whereas in Year 2 students submitted the spreadsheet files electronically. Since the contents of the spreadsheet needed to be analyzed to determine whether an intercept was included, the two groups of students were distinguishable. Because the inter-rater reliability demonstrated almost perfect agreement between raters, the lack of blindness to condition is not expected to affect the results.

Figure 6.2 (bars: intercept included, comment on intercept, intercept due to resistance; y-axis: fraction of students; Year 1 vs. Year 2): The distribution of student evaluation behaviours during the LR circuits experiment each year shows a shift towards more expert-like evaluation behaviours. More students in Year 2 included an intercept in their fit, commented that it did not match the theoretical model, and physically interpreted the intercept as being due to additional resistance in the circuit. Error bars represent 95% confidence intervals on the proportions.

Effect | χ² | df | p
Intercept included | 88.70 | 1 | < .001***
Comment on intercept | 89.70 | 1 | < .001***
Associate intercept with extra resistance | 37.07 | 1 | < .001***
Table 6.1: χ² tests of independence for students' interaction with the intercept in the fit. *** p < .001.

6.1.2 Results

The fraction of students who included intercepts in their fit, wrote comments in their lab books about the presence of the intercept, and/or associated the intercept with additional resistance in the circuit can be found in Figure 6.2.
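Table 6.1 reports only the test statistics, so the Python sketch below uses hypothetical counts (only the class sizes, 130 and 136, come from Section 6.1.1) to show how a χ² test of independence between years is computed for one behaviour. It assumes an uncorrected Pearson statistic on a 2×2 table (df = 1); whether a continuity correction was applied is not stated. For one degree of freedom, the p-value follows from the complementary error function, p = erfc(√(χ²/2)).

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Rows are groups (e.g. Year 1, Year 2); columns are 'showed the
    behaviour' and 'did not'. No continuity correction is applied.
    Returns (chi2, p) for df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1, the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: (included intercept, did not) for each year
year1 = (20, 110)   # n = 130
year2 = (100, 36)   # n = 136
chi2, p = chi2_independence_2x2((year1, year2))
print(f"chi2(1) = {chi2:.2f}, p = {p:.2g}")
```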
χ² tests of independence between years showed that significantly more students in Year 2 engaged in all three of these behaviours than students in Year 1 (see Table 6.1).

These results surely stem from the fact that students in Year 1 were not reflecting on or making sense of their data or analyses, since a visual analysis of the plot would demonstrate that one's data did not fit the one-parameter model. More detailed analysis of the quality of students' reflective comments has already been discussed in Chapter 4. Specific to this experiment, the most prominent reflective comment by students in Year 1 was a statement identifying that their data showed a linear relationship between 1/τ and R, and an invalid inference that this verified the theoretical model. This result further suggests that students in Year 1 were reluctant to make changes to the theoretical model and still held an unproductive view of authority. In Year 2, in contrast, students had had myriad experiences where they could identify the limitations and assumptions of a theoretical model, which laid the foundation for an enhanced trust in their own abilities as experimenters.

Time on task

One confounding issue in this experiment is that students in Year 1 worked through a computer activity (an Invention activity) at the start of the lab and used the tool from that activity (uncertainty in the slope) to reanalyze the previous week's data. As such, the students in Year 1 spent approximately two hours on the LR circuits lab, whereas the Year 2 students had the full three hours. Not having enough time to reflect and act on that reflection may explain the different outcomes observed above. As a precautionary measure, I observed students in Year 2 two hours into the lab session to evaluate what fit they had analyzed by that time.
Two hours into each lab section, I visited each group and recorded whether the group had already produced a one-parameter mx fit, a two-parameter mx + b fit, or a ln-ln plot.

The results, shown in Figure 6.3, demonstrate that if the students in Year 2 had been given the same amount of time on task as students in Year 1, more of them still would have made the modification to the model and included an intercept in their fit. Given additional time, however, even more students were able to think critically about the task and make better sense of their data. From this result, we conclude that the effects seen in this experiment are still primarily due to students' overall improved behaviours. Indeed, the effect is much larger due to the additional time, which is an important feature of the intervention itself. It takes time for students to engage deeply in a task, think critically, and solve any problems that arise (Hofstein & Lunetta, 2004). Comparing students in Year 2 at the 2-hour mark and at the final 3-hour mark demonstrates the striking effect that an extra hour can have on students' productivity.

I should note that the number of single-parameter mx fits and ln-ln plots decreased slightly between the 2-hour observations and the final submitted materials. This would be expected if students recognized that these fits were not helpful in understanding their data, due to the additional intercept required. This is interesting to note in light of the limitations discussed in the previous chapters, where it was noted that analyzing lab books can only keep track of recorded activity and that many behaviours may have occurred without record. The result that some students created additional plots and then did not submit them at the end of the lab period demonstrates that students in Year 2 still may have engaged in additional reflective and iterative behaviours beyond what was recorded. Differences between Year 1 and Year 2, then, are unlikely to be attributable to students in Year 2 simply recording more while engaging in the same behaviours as students in Year 1.

Figure 6.3 (bars: mx + b fit, mx fit, ln-ln plot; y-axis: fraction of students; groups: Year 1 final, Year 2 at the 2-hour mark, Year 2 final): The distribution of graphical analyses made by students by the end of the LR circuits lab in Year 1 and Year 2 and within the first two hours of the lab in Year 2. Error bars represent 95% confidence intervals on the proportions. They are larger for the Year 2 2-hour mark, since only groups, rather than individuals, were assessed. Bars in each year or time group may add to more than 1, since students may have created any number of the three graphs.

The Invention activity provided to the students in Year 1 just before the LR circuits lab may, however, have narrowed the focus of students' analysis. That is, the activity introduced students to the uncertainty in the slope of a one-parameter best-fitting line (that is, with the intercept fixed at the origin). As such, it could be argued that these students were more likely to fix the intercept at the origin so that they could apply the learned formula. The Invention activity, however, also included a task that introduced the uncertainty in the slope of a two-parameter best-fitting line (intercept not fixed), so students did have access to both options. They also could have used their analysis to identify the issue even if they did not change their fit.

6.2 Other examples of evaluation

Other experiments involved opportunities to evaluate models. In the RS experiment, for example, this could involve identifying that beyond a certain thickness the detector only measured unshielded background radiation, and so the shielding model was limited.
I have distinguished here evaluating limitations of given or developed models (as in the RS experiment) or identifying unjustified assumptions (for example, assuming the mass of the spring was negligible in the MS2 experiment) from systematic measurement errors (as in the IoR experiment).

As an example of this evaluation process early in the lab course, we will return to a sample explanation already examined in Chapter 2. In this example (Figure 6.4a), the student is reasoning through his results, trying to reconcile the disagreement between the two pendulum period measurements, which was "the opposite of the expected" outcome. The student has combined his reflection on tools ("t > 3, so the measurements are different") and the improvements through iteration of measurements ("If you can make a precise enough measurement..."), to better understand the model and the physical nature of approximations ("...you can show that the [equation] for a pendulum is just a 'good approximation' and reality is slightly more complicated.").

In the RS experiment, the model change involved recognizing the limitation of their own model (rather than a given model) for decay rate as a function of radiation shielding. Once the shielding reaches a large enough thickness, the measured decay rate reaches a constant level of background and the model no longer applies. Figure 6.4b shows an example of a student coming to this conclusion. In this experiment, students were primed to the presence of the background radiation at the start of the experiment through explicit instructions to compare the measurement of background radiation (with no source) to a measurement of count rate at a large enough thickness that the source radiation was mostly shielded.
While this was not an explicit instruction to evaluate the theoretical model, it did prepare students to deal with this limitation of their model.

In the mass on a spring experiment, the model change involved an unjustified approximation, where one assumes that the mass of the spring is negligible in the oscillation measurements. When the measurement quality becomes sufficiently high, this is no longer the case, resulting in a disagreement with the value of the spring constant from the Hooke's law measurements. This model issue was much more subtle than in other experiments, requiring very careful measurements of the oscillations, which few students obtained by the end of the experiment. Figure 6.5 shows an example of a student reaching this conclusion. This group had also forgotten to include the mass of the holder, initially obtaining a t′-score of 33 between their two measurements of the spring constant. This model change did require significant TA and instructor support, as students were generally frustrated with counting oscillations to obtain sufficiently high-quality data.

By the time students conducted the LR experiment in the second term, they were more willing to adjust the theoretical model based on their data. The high rates of evaluation in the RS and LR experiments also suggest that the mode through which the model issue becomes exposed may contribute to whether students make the evaluation. This involves both the precision of measurement required and whether the issue is revealed graphically (as in the RS and LR experiments) or through individual measurement comparisons (as in the Pendulum and Mass on a Spring experiments).

Figure 6.4: Samples from student lab books of evaluating theoretical models in the (a) Pendulum and (b) RS experiments.

Figure 6.5: Sample from a student lab book of evaluating the theoretical model in the MS2 experiment.

6.3 Conclusions

In this chapter, we have seen how the SQILab structure increased the likelihood of students evaluating theoretical models. This is an important result, since evaluating models was never a scaffolded behaviour in the SQILab. This tells us that scaffolding structured comparisons, reflection on analyses, and iteration of experimental methods led to long-term changes in whether students evaluated the theoretical models, a behaviour that is generally not observed in undergraduate physics labs (Ryder & Leach, 2000).

It has already been shown that students in Year 1 were focusing on using the technical data handling skills they had developed to analyze their data, without using higher-level skills to make sense of those tools (Chapter 4). Students were much more often in a rote-reasoning or plug-and-chug epistemological stance (Redish, 2014; Tuminaro & Redish, 2007), fitting their data to the given model and extracting the desired quantities from the fit. Without that reflection, it makes sense that students were unlikely to evaluate the theoretical model, since they had not thoroughly made sense of the data that supports the evaluation.

In addition, students in Year 1 still held novice epistemologies regarding the nature of scientific measurement. As discussed earlier in this thesis, introductory students often are of the mindset that their own measurements are of poor quality (Allie et al., 1998; Buffler et al., 2009; Séré et al., 2001) and, consequently, explicitly aim to conclude that their data is in agreement with theory (Holmes & Bonn, 2013). Without a sense of trust in their data, it is unlikely they would evaluate a theoretical, authoritative model (Ryder & Leach, 2000).
The time progression of the model changes in Year 2, excluding the RS experiment, demonstrates a breakdown of students' initial resistance to evaluating the theoretical models based on their data (Ryder & Leach, 2000). Of course, the experiments must be carefully chosen and designed to give students this opportunity. Without experiments that allow students to obtain sufficiently high-quality data to expose the limitations and unjustified assumptions of the theoretical models, students would never be able to do so.

We have not examined a rigorous time development of evaluation behaviours here, mostly due to challenges coding this kind of behaviour throughout the year. Future research should examine this time development, but also examine how students evaluate models. For example, are students more likely to evaluate models when they come across different types of issues (be it model limitations or unjustified assumptions)? Also, does the analysis process support model evaluation (be it identifying issues graphically or through quantitative disagreements)?

The fact that students in Year 2 evaluated a given model in an experiment where all scaffolding was removed (including comparison, reflection, iteration, and even instructor support) demonstrates the substantial shift in epistemological frames in the SQILab. The intervention, it seems, provided students with much greater confidence in their measurement ability and a clearer perception of the limitations of theoretical models. They were much more willing to adapt the theoretical model based on their own measurements and, more importantly, to translate this analytic change into a physical phenomenon (namely, additional resistance in the LR circuit). One could conclude, then, that having a better understanding of the measurement process and experimental data provided students better opportunities for understanding the physical concepts at play.
This is indeed supported by the fact that evaluating theoretical models was never a scaffolded behaviour in the lab course.

As we have already identified in earlier chapters, conducting experiments with these sorts of measurable model issues improves students' spontaneous reflection and iteration. The experiences identifying them, however, bring an interesting affective benefit, which we will examine in the next chapter.

Chapter 7
Motivation and attitudes in the lab

In addition to the behaviours and actions of students in the lab, it is important to evaluate the impact of the framework on student attitudes, epistemologies, and motivation. I have used two tools to evaluate these affective components of learning: the Achievement Goal Questionnaire and the Colorado Learning and Attitudes about Science Survey for Experimental Physics.

7.1 Achievement goals and motivation

Students' motivation for learning, or their goals for completing a learning task, influences their learning behaviours (Dweck, 1986). Achievement goals, the goals that students wish to achieve from cognitive tasks, are often categorized as being either mastery- or performance-oriented. Mastery orientation relates to motivation focused on mastering skills or concepts, while performance orientation focuses on demonstrating competence or outperforming others. These goals can have either an approach orientation (as in "I want to do well") or an avoidance orientation (as in "I do not want to do poorly"), and they are not mutually exclusive. That is, an individual can want to master the content material while also wanting to perform well, though these combinations may differ in different contexts. These orientations have been shown to affect students' learning strategies (Ames & Archer, 1987, 1988).
In particular, students with higher mastery orientation are often found to demonstrate more effective learning and study strategies (Ames & Archer, 1988; Elliot & McGregor, 2001) and to better transfer their learning to a new situation (Belenky & Nokes-Malach, 2012).

We aimed to determine whether the intervention affected students' achievement goal orientation across the year using a nine-item survey adapted from items used by Elliot & McGregor (2001) and Belenky & Nokes-Malach (2012). The students were asked to respond to how much they agreed with the statements in relation to the lab course (on a 5-point Likert scale from strongly disagree to strongly agree). The items and their associated goal orientations can be found in Table 7.1.

Of the nine items, three are Mastery Approach (MAp) oriented statements (items 1, 6, and 8), three are Performance Approach (PAp) oriented (items 2, 4, and 7), and three are Performance Avoidance (PAv) oriented (items 3, 5, and 9). Mastery Avoidance orientation items were not included in the questionnaire, since it was unclear how, in a full-course context, one would be concerned that they had not mastered all of the material.
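The item grouping above, combined with averaging the three items in each orientation into a single score out of five (as described below), can be expressed as a short Python sketch. The ratings belong to a hypothetical student, and because the handling of missing items is not described, the sketch simply requires all nine.

```python
from statistics import mean

# Item numbers from Table 7.1, grouped by achievement goal orientation
ORIENTATION_ITEMS = {
    "MAp": (1, 6, 8),  # Mastery Approach
    "PAp": (2, 4, 7),  # Performance Approach
    "PAv": (3, 5, 9),  # Performance Avoidance
}

def agq_scores(responses):
    """Average a student's 1-5 Likert ratings within each orientation.

    responses: dict mapping item number (1-9) to a rating from 1 to 5.
    Returns a dict of orientation -> mean rating (a score out of 5)."""
    missing = set(range(1, 10)) - set(responses)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    if any(not 1 <= r <= 5 for r in responses.values()):
        raise ValueError("ratings must lie between 1 and 5")
    return {goal: mean(responses[i] for i in items)
            for goal, items in ORIENTATION_ITEMS.items()}

# Hypothetical student responses
student = {1: 5, 2: 3, 3: 2, 4: 3, 5: 2, 6: 4, 7: 2, 8: 5, 9: 3}
print(agq_scores(student))
```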
Item | Statement | Orientation
1 | My aim is to completely master the material presented in this class | MAp
2 | I am striving to do well compared to other students | PAp
3 | I am striving to avoid performing worse than others | PAv
4 | My aim is to perform well relative to other students | PAp
5 | My goal is to avoid performing poorly compared to others | PAv
6 | I am striving to understand the content of this course as thoroughly as possible | MAp
7 | My goal is to perform better than the other students | PAp
8 | My goal is to learn as much as possible | MAp
9 | My aim is to avoid doing worse than other students | PAv
Table 7.1: The 9 items on the AGQ and their associated achievement goal orientations. Students were asked to rank their agreement on a 5-point Likert scale from Strongly Disagree to Strongly Agree.

Achievement goal orientation | Comparison type | df | F | ηp² | p
MAp | Time | 1, 236 | 39.16 | 0.14 | < .001***
MAp | Year*Time | 1, 236 | 12.13 | 0.049 | < .001***
PAp | Time | 1, 236 | 13.69 | 0.05 | < .001***
PAp | Year*Time | 1, 236 | 7.65 | 0.03 | .006**
PAv | Time | 1, 234 | 2.06 | 0.01 | .152
PAv | Year*Time | 1, 234 | 0.001 | 0.00 | .971
Table 7.2: ANOVA table for the three achievement goal orientations. Bonferroni correction was applied to account for the multiple comparisons (α = .02). ** p < .01. *** p < .001.

Students' responses on the three items in each orientation type were averaged so that each student received one score out of five for MAp, PAp, and PAv, separately. Students in both years were given the survey during the first and last lab sessions of the year (that is, September and April, the beginning of term 1 and the end of term 2) along with a 10-item data handling diagnostic (the Concise Data Processing Assessment; Day & Bonn, 2011). Students in Year 1 also received different versions of this survey at additional time points during the lab course, but this data was not included in the study.

7.1.1 Results

Only students who completed both the pre- and post-questionnaire were included in the analysis (Y1: n = 117; Y2: n = 121).
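As a consistency check on Table 7.2, the effect sizes it lists agree, to the precision shown, with partial eta squared recovered from each F statistic and its degrees of freedom, ηp² = F·df_effect/(F·df_effect + df_error). The Python sketch below performs that conversion; reading the column as partial eta squared is an inference from this agreement rather than something stated in the text.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

# F values and degrees of freedom as listed in Table 7.2
table_7_2 = {
    ("MAp", "Time"): (39.16, 1, 236),
    ("MAp", "Year*Time"): (12.13, 1, 236),
    ("PAp", "Time"): (13.69, 1, 236),
    ("PAp", "Year*Time"): (7.65, 1, 236),
    ("PAv", "Time"): (2.06, 1, 234),
    ("PAv", "Year*Time"): (0.001, 1, 234),
}

for (goal, effect), (f, df1, df2) in table_7_2.items():
    print(f"{goal:4s} {effect:10s} partial eta^2 = {partial_eta_squared(f, df1, df2):.3f}")
```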
Using repeated measures ANOVAs for the three different orientations (with a Bonferroni correction, significance was assessed at the α = .02 level), there were significant changes across the year and significant interactions for the MAp and PAp orientations, but not for PAv (see Figure 7.1 and Table 7.2).

Across all orientations, motivation generally decreased over the course of the year (though not significantly for PAv). The significant interactions demonstrate that the motivation scores for the students in Year 2 did not decrease as dramatically as for students in Year 1. That is, even though we saw gaming behaviours with students attempting to get high marks (making deliberately flawed measurements to do so), students in Year 2 left the lab course with higher mastery and performance approach orientation than did students in Year 1.

Figure 7.1 (panels: MAp, PAp, PAv; mean score at pre and post; Year 1 vs. Year 2): Changes in motivation orientation over time for three achievement goal orientations: Mastery Approach (MAp); Performance Approach (PAp); and Performance Avoidance (PAv). Error bars represent standard uncertainties of the mean.

There is evidence that inquiry tasks, such as Invention activities (Schwartz & Martin, 2004), encourage the adoption of mastery goals, while procedural tasks do not (Belenky & Nokes-Malach, 2012). In that study, students completing procedural tell-and-practice tasks were focused on reproducing the procedures they had learned, rather than on creatively generating their own solutions. Clearly, the results here map onto similar behaviours, as has already been discussed in previous chapters. In Year 1, we saw students in the lab focused on carrying out the statistical calculations for each lab, in rote-reasoning or plug-and-chug frames. In Year 2, students were deeply engaged in scientific inquiry, in more expert-like epistemological frames.
The motivation data here further supports those interpretations. It also further supports the idea that the iteration scaffolding in the SQILab acts as an Invention opportunity, with students inventing (designing) procedures to accomplish a desired outcome (usually to improve their measurement quality).

To further understand how the motivation maps onto students' epistemologies, we will look at students' attitudes about experimental physics, this time comparing to a more traditional introductory physics lab.

7.2 Attitudes and epistemologies

It has been described that students' epistemological stances, their beliefs about how knowledge is created, can influence their learning (Lising & Elby, 2005). Unfortunately, many introductory physics courses observe a decline in students' expert-like attitudes and epistemologies (Adams, Perkins, Dubson, Finkelstein & Wieman, 2004; Redish, Saul & Steinberg, 1998). Positive attitudinal shifts have been observed, however, in courses with a focus on developing epistemologies, whether explicitly (Elby, 2001) or implicitly (Lindsey, Hsu, Sadaghiani, Taylor & Cummings, 2012), and on teaching the process of science (Brewe et al., 2009; Otero & Gray, 2008). Introductory labs, therefore, could be the key opportunity for developing students' attitudes and epistemologies through the experimentation process.

As has already been discussed, many students disassociate the activities of expert scientists from their own experiences in a lab (Buffler et al., 2009; Redish et al., 1998), suggesting that the experimentation involved in traditional labs is not engaging students authentically in the scientific process. In a recent study, these beliefs were mapped onto inauthentic scientific inquiry behaviours such as artificially inflating uncertainties or not correcting systematic errors in an experiment (Holmes & Bonn, 2013).
This is unsurprising when one considers the large set of desired learning outcomes in traditional labs, which are often not made explicit to the students. These span learning physics concepts, developing scientific reasoning abilities, acquiring technical lab skills, understanding measurement and uncertainty, applying real-world connections to in-class material, shifting student attitudes and epistemologies regarding the nature of measurement and science, and much more (AAPT, 1998; Zwickl et al., 2013a). Trying to accomplish this diversity of desired outcomes in a single lab course can lead to cognitive overload, resulting in exposure to many ideas but mastery of none. Indeed, many students find there is simply not enough time to focus on all aspects of the labs (Holmes & Bonn, 2013).

We have already seen how the SQILab framework described in Chapter 2 engages students in more authentic experimentation behaviours with a more explicit focus on scientific process, especially the reflective and iterative nature of scientific experimentation and development of models. In this section, we aim to evaluate whether the SQILab impacted students' perceptions and attitudes about experimental physics, as measured by the Colorado Learning and Attitudes about Science Survey for Experimental Physics (E-CLASS; Zwickl, Finkelstein & Lewandowski, 2012). We compare the SQILab to a more traditional introductory physics lab, one meant to synchronize with and support the physics topics covered in lectures. This traditional lab targeted the development of a large number of skills, but had no explicit goals to develop expert-like scientific reasoning. We thus aim to examine whether activities that deliberately target authentic scientific reasoning, supported by associated assessment, have a positive impact on students' attitudes and epistemologies about experimental physics.
7.2.1 Methods

Participants were 580 students in two introductory physics courses at the University of British Columbia. Students included in the analysis met three criteria: they wrote both the pre- and post-survey E-CLASS; did not leave more than 10 items blank throughout both the pre- and post-surveys; and correctly responded to the item, "We use this statement to discard the survey of people who are not reading the questions. Please select 'Agree' for both questions to preserve your answers."

The traditional lab course (n = 453) belongs to a single-semester calculus-based physics course, where the lab is a required component of the course, involving six three-hour lab sessions during the term. For the SQILab (n = 127), we included only the first semester of the lab course, consisting of eight three-hour lab sessions, to make it more comparable to the traditional course. As previously described, the SQILab is taken by students seeking an enriched physics curriculum, and approximately 25% of the students intended to major in physics or astronomy, compared to only 2% in the traditional lab. In both courses, 97% of students intended to major in a science discipline. Since students in the SQILab opted to take a more challenging course, it is likely that they entered the lab course with more expert-like attitudes. Shifts in students' attitudes from pre- to post-survey between the courses are, therefore, the important comparison to be made.

Students completed the E-CLASS at the beginning and end of the semester in each course. Students in the SQILab completed the survey during lab time and were offered a small amount of course credit for completion.
Students in the traditional lab were asked to complete the survey outside of class time and were offered a small amount of course credit for completion.

The E-CLASS tool

The E-CLASS poses questions about 30 concepts in three ways (Zwickl et al., 2012):
• Students' personal attitudes and beliefs: "What do you think?"
• Students' views of experts: "What would an experimental physicist say about their research?"
• Students' views of the importance: "How important for earning a good grade in this class was...?"

All 30 concepts were posed as personal and expert beliefs questions on both the pre- and post-surveys, while 23 concepts were also posed for level of importance in the lab course on the post-survey only.

Scores for the personal and expert attitudes were converted from a 5-point Likert scale measure of agreement (where 1 is strongly disagree and 5 is strongly agree) to a binary scale of whether the response was favourable or unfavourable, since concept items were phrased such that favourable expert-like responses varied between agreement and disagreement. The favourable responses for each concept item have been previously validated by surveying a sample of experimental physicists during the survey development (see Zwickl et al., 2012). Students were then given a single score for the number of favourable responses for personal and expert attitudes, separately, each out of 30.

Beliefs type | Comparison type | df | F | ηp² | p
Personal | Course | 1, 578 | 31.20 | 0.05 | < .001***
Personal | Time | 1, 578 | 31.74 | 0.05 | < .001***
Personal | Course*Time | 1, 578 | 10.16 | 0.02 | .002**
Expert | Course | 1, 578 | 13.05 | 0.02 | < .001***
Expert | Time | 1, 578 | 0.01 | 0.00 | .931
Expert | Course*Time | 1, 578 | 1.74 | 0.003 | .188
Table 7.3: ANOVA table for students' personal and expert beliefs on the E-CLASS items across courses and time. ** p < .01. *** p < .001.
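The conversion to a favourable/unfavourable score can be sketched in Python as follows. The answer key (whether the expert-like response to each item is agreement or disagreement) comes from Zwickl et al. (2012) and is not reproduced here, so the four-item key and responses below are hypothetical. Two further choices are assumptions rather than details given in the text: a neutral response (3) and a blank item are both counted as not favourable.

```python
def is_favourable(rating, expert_agrees):
    """Collapse one 1-5 Likert rating to 1 (favourable) or 0 (not favourable).

    expert_agrees: True if the expert-like answer is agreement (4 or 5),
    False if it is disagreement (1 or 2). Neutral (3) and blank (None)
    responses are counted as not favourable (an assumption)."""
    if rating is None:
        return 0
    return int(rating >= 4) if expert_agrees else int(rating <= 2)

def favourable_score(ratings, key):
    """Number of favourable responses; with the full 30-item key this is out of 30."""
    if len(ratings) != len(key):
        raise ValueError("ratings and key must have the same length")
    return sum(is_favourable(r, k) for r, k in zip(ratings, key))

# Hypothetical four-item example
key = [True, False, True, True]
ratings = [5, 2, 3, None]
print(favourable_score(ratings, key))  # 2
```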
For the items related to importance in the course, students’ Likert ratings were averaged across the items to give a single score for how important the items were for earning a good grade in the course.

7.2.2 Results

Two sets of analyses were carried out on students’ responses. First, two repeated-measures ANOVAs were carried out, one on the personal beliefs items and one on the expert beliefs items. Second, an independent-samples t-test was used to compare students’ views of how important the concepts were for earning a good grade in each course. The next sections present these analyses individually.

Personal and expert beliefs items

Figure 7.2 shows the average fraction of favourable responses by students at pre- and post-survey on both the personal and expert items. Results of the repeated-measures ANOVAs can be found in Table 7.3. There were overall significant effects of course on both the personal and expert beliefs, which is in line with the population differences previously described. There was also a significant shift over time and a significant interaction between course and time on students’ personal beliefs. Figure 7.2 shows that the interaction represents a significant drop in personal beliefs for the traditional students, but a neutral shift for the SQILab students. The expert beliefs were stable over time for both courses, which has also been observed in other courses surveyed with the E-CLASS (Zwickl et al., 2012).

[Figure 7.2 plot: panels “Personal” and “Expert”; x-axis Assessment Time (Pre, Post); y-axis Mean Score (0.5 to 1.0); legend Course: SQILab, Traditional.]
Figure 7.2: Changes in students’ personal and expert beliefs on the E-CLASS items for a traditional physics lab course and the SQILab.
Error bars represent standard uncertainties in the mean.

Importance for the course items

When comparing the average Likert score for how important the concepts were for obtaining a good grade in the course, the average score of the SQILab students (M = 3.86, SD = 0.45) was significantly higher than that of the traditional lab students (M = 3.67, SD = 0.58), t(254) = 4.12, p < .001. Because students in the SQILab found the items to be more important for earning a good grade in the course, this suggests that the SQILab assessment addressed the E-CLASS items more successfully.

7.2.3 Discussion

This study examined the effects of two lab courses on student attitudes and beliefs about experimental physics. Students in a traditional lab demonstrated significant negative shifts on personal attitudes. This is consistent with other measures showing student attitudes deteriorating over the course of an introductory physics term (Adams et al., 2004; Redish et al., 1998). Positive shifts in epistemologies have been found in courses that emphasized scientific process (Brewe et al., 2009; Elby, 2001; Lindsey et al., 2012; Otero & Gray, 2008). In contrast, the results presented here suggest that simply engaging in experimentation, as in the traditional course, does not produce the same effect. From this result, one could conclude that traditional lab experiments do not effectively engage students with the scientific process. Without explicit support for the scientific reasoning goals, students’ attention is not exclusively focused on them.

The goals and structure of the SQILab went beyond measurement and uncertainty by also emphasizing reflection, iteration, and improvement of experimental methods and physical models. These elements are vital for developing expert-like epistemologies and attitudes towards experimental physics. Through these cycles, students saw physics models adapt and change as the quality of their measurements improved.
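The t-tests in this section report degrees of freedom well below the pooled value n₁ + n₂ − 2 = 578. Examples are t(254) above and non-integer values such as t(236.31) in the item-level comparisons that follow. An unequal-variance (Welch) t-test produces this pattern. The section does not name the test, so this is an inference. Here is a minimal sketch of that test computed from summary statistics, assuming two independent groups:

```python
import math
from typing import Tuple


def welch_t(m1: float, s1: float, n1: int,
            m2: float, s2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    m, s, n are each group's mean, standard deviation, and sample size.
    """
    v1 = s1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = s2 ** 2 / n2  # squared standard error of group 2's mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded importance scores reported above, assuming (hypothetically) that
# all 127 SQILab and 453 traditional students answered every item.
t, df = welch_t(3.86, 0.45, 127, 3.67, 0.58, 453)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With these inputs the degrees of freedom come out at roughly 255, close to the reported 254. The t value (about 3.9) does not reproduce the reported 4.12. Exact agreement should not be expected, because the means and SDs are rounded and the per-item sample sizes are not reported.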
The item “When doing a physics experiment, I don’t think much about sources of systematic error” was addressed in experiments such as the IoR and Pendulum experiments. In these experiments, students had to confront issues in their measuring process (such as misreading the protractor or miscounting pendulum periods) to resolve disagreements between measurements. By the end of the term, more students in the SQILab responded favourably on this item (from 73% at pre-survey to 82% at post-survey), while fewer students did in the traditional lab (from 67% to 55%). Students in the SQILab also found this item more important for earning a good grade in the course (M = 4.38, SD = 0.81) than did students in the traditional lab (M = 3.83, SD = 0.96). This difference was statistically significant, t(236.31) = 6.50, p < .001, and reflects the intentional alignment of course goals with course assessment in the SQILab.

The cycles also turned uncertainties from abstract calculations required by the instructor into important tools for understanding data. Indeed, the item “Calculating uncertainties usually helps me understand my results better” is a key goal targeted by the reflection and iteration cycles described earlier, but it might be regarded as an implicit goal in traditional courses as well. This item showed an increase in favourable responses in the SQILab (from 61% of students responding favourably to 77%) and a decrease in the traditional lab (from 64% to 47%). Students in the SQILab again found this item more important for earning a good grade in the course (M = 4.56, SD = 0.67) than did students in the traditional course (M = 4.04, SD = 1.00), t(300.76) = 6.77, p < .001.

Finally, the course had a large focus on data handling and analysis procedures.
The item “If I don’t have clear directions for analyzing data, I am not sure how to choose an appropriate analysis method” showed an increase in favourable responses in the SQILab (from 34% to 45%) and no shift in the traditional lab (25% favourable responses at both pre- and post-survey). Once again, students in the SQILab found this item more important for earning a good grade in the course (M = 4.39, SD = 0.78) than did students in the traditional lab (M = 3.95, SD = 0.91), t(231.80) = 5.38, p < .001.

Course elements such as these provided students in the SQILab with more authentic scientific reasoning experiences. Even with this focus, however, students in the SQILab did not improve significantly on their expert-like beliefs overall. This may be because the lab course did not target a subset of the E-CLASS items. For example, the lab course deliberately does not involve much experimental design, and students do not conduct experiments that come from their own research questions. The reflect, iterate, and improve cycles gave students opportunities to design improvements to their measurements. Nevertheless, the item “When doing an experiment I usually think up my own questions to investigate” was deliberately not targeted. Indeed, favourable responses on this item decreased in both the SQILab (from 37% to 32%) and the traditional course (from 32% to 20%). Students in the SQILab found this item just as important for earning a good grade in the course (M = 2.91, SD = 1.15) as did students in the traditional lab (M = 2.86, SD = 1.22), t(211.61) = 0.36, p = .718. This is further evidence that students’ attitudes in each course were tied to the perceived goals of the course (and the associated assessment of those goals).

Another possible reason for the lack of overall significant upward shifts in the SQILab may be that we did not explicitly confront students’ attitudes and epistemologies.
Elby (2001) suggests that attitudinal or epistemological goals must be targeted explicitly, rather than implicitly through the activities. Our course engaged students in the scientific process more thoroughly than a traditional lab, but it did not explicitly address their personal beliefs and attitudes. The Physics of Everyday Thinking curriculum has been shown to produce positive shifts in student attitudes and epistemologies. It explicitly asks students to reflect on the nature of science, on their own learning, on the learning of their peers, and on the learning of scientists (Otero & Gray, 2008). The latter, in particular, would help address epistemological issues and connect students’ experiences in physics class with the nature of physics more broadly. While our course appears to have shifted students’ epistemologies, we may not have made students sufficiently aware of this shift.

We must be careful, however, when comparing this study to those mentioned above. In particular, epistemologies in the context of a lab are certainly a distinct experience for students, compared with those in lecture courses. The epistemological concepts addressed by the E-CLASS also differ significantly from those in the surveys used in the other studies. Moreover, the E-CLASS is a newer instrument, so there is not yet a large database of results from other courses and other populations.

Regardless, the distinction between engaging students in authentic scientific process in a lab and engaging students in experimentation is clear. Simply performing physics experiments does not improve students’ attitudes and epistemologies about experimental physics. The SQILab demonstrates an approach that yields improvement over the traditional course, but deliberate and explicit attention to student attitudes and epistemologies may be required for further improvement. The study also demonstrates the importance of aligning assessment with the desired learning outcomes.
Students in the SQILab viewed the items on the E-CLASS as more important for earning a good grade in the course than did students in the traditional course. With the goals aligned with assessment, students knew where to focus their attention, which maintained their attitudes in those areas. A number of student interviews were also conducted to further understand the results from these surveys.

7.3 Interviews

Each year, a number of interviews were conducted with students to supplement the data collected from students’ lab books and documentation. The interviews always began with an open-ended question asking for the student’s opinions about the lab course overall. Clear distinctions between the years emerged from this question alone. The comments in both years were generally positive, but in Year 1 students focused on the technical, procedural aspects and the data handling skills. For example, one student commented:

“I liked how they were different from other labs I’ve done before. I’ve found that with other labs, you don’t really learn anything. You just have to do something and it’s kind of a hassle to write it up and stuff. ... I also found it really beneficial because you actually learn a technique about how to analyze data and how to interpret it and how to change it so you can use your data in whatever way you wanted, so I found that that was something more helpful than other labs.”

One student in Year 1 did identify one of the largest limiting factors on their critical thinking behaviours:

“I think ... the labs are too long. ... It becomes counter-learning after a certain length, because if you can finish in 2 hours comfortably, then you have extra time to ask questions, to ask these higher questions that students should be asking. So I need more time to be like, ‘Why, why would that happen?’ But when it’s super time pressured you just - you don’t care.
You want to get the lab done, you want to get it handed in.”

In Year 2, however, students opened the discussion with comments such as:

“[The lab has] kind of refined my scientific thinking and scientific procedure ... I think ‘What am I doing?’ then, ‘ok, How am I going to do it?’ ... then you get data and you look at it and then write down how are you going to get better data? What does it tell you? What more could you do to it?”

“It’s [the lab is] good, because we get to actually practice, like, real science rather than just filling out [forms] and finding answers ... We get to scrutinize and criticize what we’re actually getting and trying to understand what the measurement itself is, is more important than getting the right answer in the very end.”

When asked what the goals of the lab were, one student replied, “to be a good scientist on our own.” Other students provided evidence of their changing epistemological stances:

“It’s actually pretty helpful, because it gives me a perspective of thinking about things - thinking about data in a way I wouldn’t have if I hadn’t taken this course ... What I’m [doing] actually helps me develop a scientific mind.”

“In a physics class, we use equations to solve, theoretically, what the answer’s supposed to be, whereas in the lab, it shows us what it actually is, with everything considered that is usually not part of the equations. It makes it more realistic, because I can connect with it more.”

“In physics, there’s lots of simplifications and approximations and things that we can ignore. When we do the experiments ourselves we can see why physicists would do that.”

For some students, the lab course affected their images of themselves as scientists:

“When I’m reading about something or solving physics problems or just reading about physics concepts, the idea of me being a physicist in that sense is very far fetched ...
it [the lab] helped me think about a bunch of data that I have in front of me, that looks like chaos, in a more scientific way. ... It integrates everything so much more and it helps me see myself as a scientist way more than all my other classes, because those are just putting information - giving me information, rather. It helps me actually reach in and realize, ‘oh, this makes sense! I can actually do this too,’ rather than just memorize a textbook.”

7.4 Summary

In this chapter, we have examined the effects of the SQILab on student attitudes, motivation, and epistemologies. Elsewhere in the thesis, the SQILab was shown to improve students' critical thinking behaviours. The results in this chapter demonstrate that SQILabs also improved a number of attitudinal outcomes, from mastery-orientation attitudes to epistemologies.

Students iterated to improve measurements and evaluated models to determine the assumptions and limitations of theoretical equations. These experiences greatly enhanced their understanding of the nature of scientific measurement. Structured comparisons were key, but they were never done in a confirmatory way, to verify values or equations in textbooks. The comparisons were always made to validate measurements or models: to explore and to do science. This distinction is important for providing an authentic scientific inquiry environment, where students truly feel they are doing real science, as evidenced by the interview data. This authentic experience is thought to be key to improving student epistemologies (Brewe et al., 2009; Otero & Gray, 2008). Creativity and inquiry are said to trigger more mastery-oriented motivation than procedural activities do (Belenky & Nokes-Malach, 2012). Students in this course had experiences where their measurements of a physical phenomenon outperformed the approximations they were taught in class.
These experiences not only engaged them in more authentic scientific investigations but also made them feel that they were becoming scientists, rather than just being given information that scientists had already developed.

Chapter 8
Conclusions

This thesis has explored the impacts of the SQILab structure on students’ behaviours, attitudes, and epistemologies in a first-year physics lab. The main elements of the pedagogy involve scaffolding students’ reflection on structured comparisons and quantitative analysis of measurements, leading to iteration and improvement of experimental procedures. To reflect on their data successfully, students relied on an analytic toolbox and a quantitative framework with which to think critically. Both of these were critical resources for shifting students into productive epistemological frames (Hammer & Elby, 2003).

In particular, the analytic data handling tools that supported the comparisons, the t′-score and χ²_w tools, grounded the cycles in a quantitative framework. They also supported the use of measurement uncertainty as an important, concrete tool that students need to understand and manage. With a language and framework for reflection in place, follow-up actions emerge from the comparisons. With careful guidance and support, these follow-up actions engage students in experimental design as they invent ways to improve their measurements. This iteration reminds students that science does not end after data collection and analysis. Instead, analysis introduces new questions and measurements to test.

With more complicated data, such as data used to verify a theoretical model, this reflection process can lead to refining the theoretical model to map it onto the physical system and measurement tools (Zwickl et al., 2013a). It also provides a constrained space for experimental design and reflection, which supports improved development of many scientific reasoning abilities (Etkina et al., 2010).
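The t′-score is not defined in this section. Here is a minimal sketch, assuming the form commonly used to compare two independent measurements A ± δA and B ± δB: t′ = (A − B)/√(δA² + δB²). The |t′| thresholds of 1 and 3 are illustrative assumptions, as are the follow-up prompts attached to them. They are meant only to show how a comparison can point to a next iteration.

```python
import math


def t_prime(a: float, da: float, b: float, db: float) -> float:
    """Difference of two measurements in units of their combined uncertainty.

    Assumes da and db are independent standard uncertainties.
    """
    combined = math.hypot(da, db)  # sqrt(da**2 + db**2)
    if combined == 0:
        raise ValueError("at least one uncertainty must be non-zero")
    return (a - b) / combined


def suggest_next_step(t: float) -> str:
    """Map |t'| onto an illustrative (hypothetical) reflect-and-iterate prompt."""
    if abs(t) < 1:
        return "indistinguishable: reduce the uncertainties and compare again"
    if abs(t) < 3:
        return "unclear: collect more or better data"
    return "distinguishable: look for a measurement issue or a model limitation"


# Hypothetical measurements: 10.0 +/- 0.3 and 9.0 +/- 0.4 (same units).
t = t_prime(10.0, 0.3, 9.0, 0.4)
print(round(t, 3), suggest_next_step(t))
```

In this example t′ = 2.0. Neither "agree" nor "disagree" can be concluded, so the sketch suggests improving the measurements rather than ending the experiment.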
As the quality of measurements improves, limitations of and unjustified approximations in theoretical models become exposed, leading students naturally to evaluate those models. Through this iterative process, students engage in authentic scientific inquiry experiences that traditional ‘cookbook’ labs do not support. In some experiences, students' own measurements outperform the approximations seen in lecture. These experiences enhance students’ epistemological frames and their beliefs and attitudes towards the nature of science, scientific measurement, and learning.

If we return to the list of explicit learning goals for the “Introduction to Experimental Physics” course, the majority of them focus on rote procedures for handling and analyzing data from measurement. Indeed, this has been the focus for a number of years. No tools were removed from the list in Year 2, though two were added that help with comparing individual measurements and verifying models. We also have evidence that deliberate efforts were improving students’ understanding of the various procedures (Day et al., 2014; Holmes et al., 2014). The procedures were, however, hoops that the instructor and TAs had asked the students to jump through. Without the reflective structure, the procedural focus of the lab put students in epistemic games (Tuminaro & Redish, 2007) such as the “plug-and-chug” stance. In this stance, students worked through the procedures without mindfully reflecting on their meaning or interpreting the associated outcomes (Buffler et al., 2001). They had not been shown how the analytic procedures themselves, or repeated measurements, could be used to improve data and models or to explain physical concepts at a deeper level. The results in this thesis have shown that the SQILab structure did not limit students to just learning the procedures. Instead, it taught them how the procedures act as tools that inform scientific inquiry.
Students moved from rote reasoning frames in Year 1 to “mapping meaning to mathematics” frames in Year 2, as they reflected on the tools to make improvements, modifications, and novel conclusions.

In Year 2, this shift in instruction helped to develop self-regulation behaviours, in line with Strategic Content Learning (SCL) pedagogies (Butler, 2002). In SCL, self-regulated learners work through an iterative cycle of learning activities: interpreting and analyzing the task demands; selecting strategies to apply to the task; and monitoring the outcomes of this strategy use. The initial set of procedures for analyzing the data was the same in both years, yet students interpreted the goal of the tasks very differently. For example, the goal given to students in the LR Circuits Lab was to verify the model and find the value of the inductor. Students in a “plug-and-chug” stance may have misinterpreted that goal as “fit the model to your data and find the value of the inductor.” If students choose strategies based on what has been successful in the past, the Year 2 students were undeniably at an advantage, since their past experiences involved a much more meaningful set of behaviours around strategy use. The main distinction between Year 1 and Year 2, however, lies in the monitoring phase. Students in Year 1 showed little to no evidence of monitoring the outcomes of their strategy use, as shown by the lack of reflection. With instruction that focused primarily on the use of analysis procedures, it is unsurprising that students in Year 1 did not engage in other elements of self-regulated learning.

The lab instructions and assessments seem to be a major factor in these outcomes. In Year 1, the marking schemes focused on whether students had carried out the technical analysis and procedures. They did not give credit for quality of data, evidence of reflection, or iteration. Students could succeed in the lab without thinking critically about their experiment or data.
The rote reasoning and plug-and-chug epistemic stances certainly aligned with the structure of the lab course. In Year 2, when critical thinking behaviours were scaffolded, students engaged in their experiments at a much more sophisticated level. There were, however, issues with putting too much assessment weight on the behaviour itself (iteration). Rather than rewarding students for iterating, one must reward them for the critical thinking behaviours associated with iterating, especially for justifying why they made the measurements they did.

In Year 1 the issue was whether students were engaging in these higher-level behaviours at all; in Year 2 the question became whether they would transfer them to unscaffolded experiments. The results throughout this thesis, especially in Chapter 6, make it clear that students were able to transfer their behaviours spontaneously and without support, though not immediately (see the Pendulum 3 experiment, for example). With sufficient time and practice, the behaviours became abstracted and transferable to different contexts (Salomon & Perkins, 1989). For example, the reflection and iteration framework set through t′-score comparisons became sufficiently abstracted that students could then apply the concept to the χ²_w value, with limited support from the instructors, to compare data with models. The associated behaviours, iterating to improve data or measurements and evaluating theoretical models, then get mapped onto the abstracted structure. This abstraction was only possible with deliberate practice opportunities (Ericsson, Krampe & Tesch-Romer, 1993). It also required scaffolded support in strategy use, including associated support for reflection, metacognition, and motivation (Butler, 2003).
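χ²_w is not defined in this section either. Here is a minimal sketch, assuming a weighted (inverse-variance) least-squares straight-line fit and χ²_w = (1/(N − p)) Σ [(yᵢ − f(xᵢ))/δyᵢ]². In this formula, N is the number of data points and p is an optional number of fitted parameters; set p = 0 to divide by N alone. Which normalization the course used is not stated here. Under this definition, values near 1 indicate agreement within uncertainties, values well below 1 suggest overestimated uncertainties, and values well above 1 suggest disagreement between data and model.

```python
import math
from typing import Callable, Sequence, Tuple


def weighted_line_fit(x: Sequence[float], y: Sequence[float],
                      dy: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit y = slope * x + intercept with weights 1/dy**2.

    Returns (slope, intercept, slope_uncertainty, intercept_uncertainty).
    """
    w = [1.0 / d ** 2 for d in dy]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx ** 2
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    return slope, intercept, math.sqrt(s / delta), math.sqrt(sxx / delta)


def chi2_w(x: Sequence[float], y: Sequence[float], dy: Sequence[float],
           model: Callable[[float], float], n_params: int = 0) -> float:
    """Weighted chi-squared per point or per degree of freedom (assumed form)."""
    total = sum(((yi - model(xi)) / di) ** 2 for xi, yi, di in zip(x, y, dy))
    return total / (len(x) - n_params)


# Hypothetical data lying exactly one uncertainty above the line y = 2x.
xs, ys, dys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 1.0, 1.0]
print(chi2_w(xs, ys, dys, lambda v: 2.0 * v))  # 1.0
m, b, dm, db = weighted_line_fit(xs, ys, dys)
print(m, b, dm, db)
```

Doubling every δy in this example divides χ²_w by four, the "overestimated uncertainties" signature. Shrinking the uncertainties instead drives χ²_w well above 1. The slope uncertainty returned by the fit is the kind of quantity students computed in the LR experiment.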
Recall that students in Year 1 did learn about weighted least-squares fitting and the interpretations of χ²_w: small values suggest overestimated uncertainties and large values suggest disagreement. They had equal opportunity to employ these strategies in labs such as the LR circuits experiment, but they did not.

The iterative comparison cycle also engaged students with meaningful and authentic contrasting cases, through which they identified the deep features of the problem (Bransford et al., 1989; Holmes et al., 2014). That is, a high-quality measurement of the period of a pendulum, even one using multiple trials of multiple swings, supports a conclusion only about that measurement. It does not provide any information about periods of pendulums in general. Measuring the period at a different angle and then comparing it to the original measurement, regardless of the result, begins to provide a picture of how pendulum periods work. Then doing a more precise measurement and comparing all four cases opens up a bigger picture about measurement and pendulums. The original measurement needed alternative comparisons for the learner to construct meaning from it (Bransford et al., 1989). Students need to be prompted to make these sorts of comparisons before they can do them on their own (Bransford et al., 1989; Holmes et al., 2014; Salomon & Perkins, 1989).

8.1 Advice for implementation

For readers interested in applying this pedagogy to their courses, there are a number of key features for successful implementation. I should remind the reader that the pedagogy in place here does not require the lab to be integrated with the lecture component of the course. The experiments did not expect or require students to bring any conceptual physics understanding into the lab. Nor did we expect or require students to leave with a stronger grasp of any typical introductory physics concepts.
That being said, there is, of course, a benefit to students having a balanced level of prior knowledge and knowledge gaps. For example, the pendulum experiment, where students discover the angle dependence in the period of the pendulum, worked well if students already knew that the period of the pendulum should not depend on angle. Suppose students had extensively explored the small-angle approximation, identifying what size of angle or precision of measurement would be necessary for the approximation to no longer be valid. They would then not have had the discovery learning opportunity. Conversely, if they had never seen the equation for the period of the pendulum and did not know whether the period should depend on amplitude, they might have accepted the angle dependence without much evaluation. Indeed, there is a fine balance here that ought to be explored further. One clear benefit of keeping the lab separate from the conceptual course, however, is that one can spend more time developing a quantitative statistical toolbox. One can also focus explicitly on experimentation without needing to uncover specific conceptual ideas.

First and foremost, it is critical that students have sufficient time to engage in reflection, iteration, and evaluation in every activity. In Year 1, students felt rushed for time in a number of activities. When they were given sufficient time to reflect and iterate in the IoR experiment, they were in the habit of rushing and so did not change their behaviours. Instead, many of them opted to leave the lab early.

One also needs to be careful with explicit instructions to repeat measurements. As previously discussed, students had been gaming the scaffolding in the experiments. They deliberately performed poor experiments on their first attempts to ensure they could improve in the subsequent iterations.
This should be addressed and discussed early, making it clear to students why iteration is important and what it can do for them.

With that in mind, the experiments and equipment themselves should be thoroughly vetted so that students can have the desired model-evaluating experiences in the available time. This raises a significant distinction between systematic effects and model changes. I define a systematic effect as an issue (error, mistake) with the measurement procedure, such as a calibration error or misreading a measurement device. A model change, in contrast, relates to limitations or unjustified approximations of a theoretical model being applied to the physical system, especially when interpreting the data. Sometimes students identify these issues with a theoretical model based on a high-quality data set that they have collected. They are then exposed to advanced, expert-like ideas about epistemologies and the nature of science and scientific measurement. When students confront systematic effects, novice ideas about ‘human error’ and bad equipment become reinforced, promoting novice-like epistemologies and views of the nature of science. Indeed, the data presented in this thesis suggest that model-changing experiences more often supported high-level critical thinking behaviours. Issues due to systematic effects, by contrast, sent students into frustrated ‘plug-and-chug’ stances. A careful treatment of systematic effects as part of modeling the measurement equipment may be required to move these unproductive behaviours towards more productive scientific reasoning behaviours (Zwickl et al., 2013b; Zwickl, Finkelstein & Lewandowski, 2014).

In developing the analytic toolbox, it is important to be clear about the procedural versus reflective goals of each tool.
When tools were presented exclusively as procedural objects, students entered plug-and-chug stances. Examples are the weighted average in the IoR experiment and the slope uncertainty in the LR experiment in Year 1. When the tools could be mapped onto the critical thinking and experimentation behaviours, students behaved accordingly. The higher-level behaviours associated with the mathematics are not obvious to novice learners. These behaviours ought to be explicitly addressed, targeted, and supported. Students should always have opportunities to use the analytic tools to compare their own measurements with each other or with a theoretical model (never with ‘true’ values).

8.2 Future directions

This research has raised a number of follow-up research questions. The first, which my collaborators and I aim to target in the near future, is whether this pedagogy is transferable to a different population of students (namely, non-honours students). A related question is whether it transfers to a larger course with more students, more sections, more instructors, and more TAs. It would then be interesting to examine how the pedagogy transfers to lab courses in other disciplines. All of the principles embedded in the framework are indeed science-general, though the iteration process may be more challenging in the life sciences, for example, given typical time constraints.

In addition to transfer across disciplines, transfer outside of experimentation is also of interest, for example, to problem solving. Suppose we begin from the hypothesis that it is important to reflect and iterate to improve problem-solving attempts. What, then, is the equivalent of the quantitative t′-score framework? The SQILab framework gave students authentic ways to reflect and loop back to make improvements, but the analogous framework for problem solving is unclear. One option would be to use carefully crafted problems that students could solve in multiple ways. The iteration would then be to compare their first solution to a solution obtained through a different method.
Approximations or estimations may also be useful here. Students could compare their rigorously solved problem to an order-of-magnitude solution involving approximations or estimations of physical quantities. There certainly exists literature that builds on these ideas, but the connection between the SQILab iteration cycles and the problem-solving literature would be worth further investigation.

The SQILab improved students’ iterative measurement behaviours, but two important pieces of data are missing. The first is whether the iteration process actually led to improved measurements. We aim to answer this question in the near future through detailed analysis of the data students collect in several experiments. The second is a more thorough evaluation of students’ conceptual understanding of measurement uncertainty. Students in the SQILab were using uncertainties in more expert-like ways, but their conceptual understanding was not directly assessed. There does not yet exist a simple, efficient, and validated instrument for assessing these concepts, though there is certainly a need for such an instrument.

With regard to the modelling aspects, the time progression of students’ evaluation of theoretical models ought to be clarified. From the data collected, it is unclear what better supports evaluation. One possibility is particular model issues (limitations of a model or unjustified approximations). Another is the process through which those issues are uncovered (comparing individual measurements or graphically comparing data sets to theoretical model equations). One may expect graphical comparisons between data and models to better promote evaluation. Problems are more easily visualized graphically (and with residuals), so students can identify patterns of behaviour rather than simply determining whether differences exist.
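To illustrate how residuals can reveal a pattern rather than only a difference, here is a sketch using the pendulum example from this thesis. It relies on two standard results that are not stated in this section. The first is the small-angle period T₀ = 2π√(L/g). The second is the exact period for amplitude θ₀, T₀ / AGM(1, cos(θ₀/2)), where AGM is the arithmetic-geometric mean; its leading correction is T₀(1 + θ₀²/16). The "measured" periods are hypothetical values generated from the exact formula with an assumed 0.002 s uncertainty. They are not data from this study.

```python
import math
from typing import Callable, List, Sequence

G = 9.81  # m/s^2, assumed value of g


def agm(a: float, b: float) -> float:
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > 1e-14 * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a


def period_small_angle(length: float, theta0: float) -> float:
    return 2 * math.pi * math.sqrt(length / G)  # independent of theta0


def period_second_order(length: float, theta0: float) -> float:
    return period_small_angle(length, theta0) * (1 + theta0 ** 2 / 16)


def period_exact(length: float, theta0: float) -> float:
    return period_small_angle(length, theta0) / agm(1.0, math.cos(theta0 / 2))


def normalized_residuals(xs: Sequence[float], ys: Sequence[float],
                         dys: Sequence[float],
                         model: Callable[[float], float]) -> List[float]:
    """(measurement - model) / uncertainty for each point."""
    return [(y - model(x)) / dy for x, y, dy in zip(xs, ys, dys)]


L = 1.0  # hypothetical pendulum length in metres
amplitudes = [math.radians(a) for a in (5, 10, 20, 30)]
periods = [period_exact(L, th) for th in amplitudes]  # hypothetical "data"
dT = [0.002] * len(periods)  # assumed uncertainty in seconds

for name, model in [("small-angle", period_small_angle),
                    ("second-order", period_second_order)]:
    res = normalized_residuals(amplitudes, periods, dT, lambda th: model(L, th))
    print(name, [round(r, 1) for r in res])
```

Against the small-angle model, the residuals grow steadily with amplitude, from about 0.5 to about 17. That trend points to the model rather than to random scatter. With the θ₀²/16 term included, all residuals stay below 1.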
Indeed, graphical comparisons can also help to inform an improved model.

Evaluating transfer of these behaviours from the SQILab to experimentation experiences in the upper-year curriculum is also important. This applies first to whether students continue to use the behaviours when they move on to upper-year lab courses or research experiences. It also applies to changes that should be made in upper-year labs to extend the framework presented here. One significant place for extension would be in adjusting the physical models once limitations have been identified. For example, once students identify that the current model for the period of a pendulum breaks down because of the small-angle approximation, follow-up measurements should aim to develop the second-order correction term to the model (experimentally and/or mathematically). This level of reasoning may be more appropriate for upper-year labs than for the first-year introductory lab.

Finally, it would also be worthwhile to examine the impact of the SQILab on the TAs facilitating the labs, since these skills are directly applicable to their research, much more so than the traditional concepts in introductory physics courses. It would be interesting to compare the TAs’ experiences in SQILab and in traditional courses. Also of note is whether teaching a SQILab course improves the TAs’ conceptual understanding of measurement and uncertainty as well as their experimentation skills.

Bibliography

AAPT (1998). Goals of the introductory physics laboratory. American Journal of Physics, 66 (6), 483–485.
Adams, W., Perkins, K., Dubson, M., Finkelstein, N., & Wieman, C. (2004). The design and validation of the Colorado Learning Attitudes about Science Survey. In Marx, J., Heron, P., & Franklin, S. (Eds.), 2004 PERC Proceedings [Sacramento, CA, August 4-5, 2004], volume 790, (pp. 45–48). AIP Conference Proceedings.
Allie, S., Buffler, A., Campbell, B., & Lubben, F. (1998). First-year physics students’ perceptions of the quality of experimental measurements. American Journal of Physics, 20 (4), 447–459.
Allie, S., Buffler, A., Campbell, B., Lubben, F., Evangelinos, D., Psillos, D., & Valassiades, O. (2003). Teaching measurement in the introductory physics laboratory. The Physics Teacher, 41 (7), 394.
Ames, C. & Archer, J. (1987). Mothers’ beliefs about the role of ability and effort in school learning. Journal of Educational Psychology, 79 (4), 409–414.
Ames, C. & Archer, J. (1988). Achievement goals in the classroom: Students’ learning strategies and motivation processes. Journal of Educational Psychology, 80 (3), 260–267.
Anderson, L. & Sosniak, L. (1994). Bloom’s taxonomy: A forty-year retrospective. Chicago, IL: NSSE : University of Chicago Press.
Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. (2000). Learning from examples: Instructional principles from the worked examples research. Review of Educational Research, 70 (2), 181–214.
Beichner, R. J., Saul, J. M., Abbott, D. S., Morse, J., Deardorff, D. L., Allain, R. J., Bonham, S. W., Dancy, M., & Risley, J. S. (2007). The student-centered activities for large enrollment undergraduate programs (SCALE-UP) project. In Research-based reform of university physics, volume 1 of 1.
Beichner, R. J., Saul, J. M., Allain, R. J., Deardorff, D. L., & Abbott, D. S. (2000). Introduction to SCALE-UP: Student-centered activities for large enrollment university physics. Technical report, US Department of Education, Office of Research and Improvement, Educational Resources Information Center.
Belenky, D. M. & Nokes-Malach, T. J. (2012). Motivation and transfer: The role of mastery-approach goals in preparation for future learning. Journal of the Learning Sciences, 21 (3), 399–432.
BIPM, IEC, IFCC, IUPAC, I., IUPAP, & OIML (2008). Guide to the expression of uncertainty in measurement. Organization for Standardization.
Bransford, J. D., Franks, J. J., Vye, N. J., & Sherwood, R. D. (1989). New approaches to instruction: Because wisdom can’t be told. In S. Vosniadou & A. Ortony (Eds.), Similarity and Analogical Reasoning chapter 17, (pp. 470–497).
Bransford, J. D. & Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 24, 61–100.
Brewe, E. (2008). Modeling theory applied: Modeling instruction in introductory physics. American Journal of Physics, 76 (12), 1155–1160.
Brewe, E., Kramer, L., & O’Brien, G. (2009). Modeling instruction: Positive attitudinal shifts in introductory physics measured with CLASS. Physical Review Special Topics - Physics Education Research, 5 (1), 013102.
Brewe, E. T. (2002). Inclusion of the energy thread in the introductory physics curriculum: An example of long-term conceptual and thematic coherence. PhD thesis, Arizona State University.
Buffler, A., Allie, S., & Lubben, F. (2001). The development of first year physics students’ ideas about measurement in terms of point and set paradigms. International Journal of Science Education, 23 (11), 1137–1156.
Buffler, A., Allie, S., & Lubben, F. (2008). Teaching measurement and uncertainty the GUM way. The Physics Teacher, 46 (9), 539.
Buffler, A., Allie, S., Lubben, F., & Campbell, B. (2003). Evaluation of a research-based curriculum for teaching measurement in the first year physics laboratory. In 4th International Conference of the European Science Education Research Association, Noordwijkerhout, The Netherlands.
Buffler, A., Allie, S., Lubben, F., & Campbell, B. (2007). Introduction to Measurement in the Physics Laboratory. A Probabilistic Approach (3.4 ed.). Department of Physics, University of Cape Town.
Buffler, A., Lubben, F., & Ibrahim, B. (2009). The relationship between students’ views of the nature of science and their views of the nature of scientific measurement. International Journal of Science Education, 31 (9), 1137–1156.
Butler, D. L. (2002). Individualizing instruction in self-regulated learning. Theory into Practice, 41 (2), 81–92.
Butler, D. L. (2003). Structuring instruction to promote self-regulated learning by adolescents and adults with learning disabilities. Exceptionality, 11 (1), 39–60.
Chi, M. T., Leeuw, N. D., Chiu, M.-H., & Lavancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18 (3), 439–477.
Crouch, C. H. & Mazur, E. (2001). Peer instruction: Ten years of experience and results. American Journal of Physics, 69 (9), 970.
Cummings, K., Marx, J., Thornton, R., & Kuhl, D. (1999). Evaluating innovation in studio physics. American Journal of Physics, 67 (S1), S38–S44.
Day, J., Adams, W. K., Wieman, C. E., Schwartz, D. L., & Bonn, D. (2014). Invention activities: A path to expertise. Physics in Canada, 70 (2).
Day, J. & Bonn, D. (2011). Development of the concise data processing assessment. Physical Review Special Topics - Physics Education Research, 7 (1), 010114.
Day, J., Holmes, N., Roll, I., & Bonn, D. (2013). Finding evidence of transfer with invention activities: Teaching the concept of weighted average. In Engelhardt, P. V., Churukian, A., & Jones, D. (Eds.), 2013 PERC Proceedings, Portland, OR.
Day, J., Holmes, N., Roll, I., & Bonn, D. (2014). Finding evidence of transfer with invention activities: Teaching the concept of weighted average. In Engelhardt, P., Churukian, A., & Jones, D. (Eds.), 2013 PERC Proceedings, Portland, OR.
Day, J., Nakahara, H., & Bonn, D. (2010). Teaching standard deviation by building from student invention. The Physics Teacher, 48 (8), 546.
Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41 (10), 1040–1048.
Elby, A. (1999). Another reason that physics students learn by rote. American Journal of Physics, 67 (S1), S52.
Elby, A. (2001). Helping physics students learn how to learn. American Journal of Physics, 69 (S1), S54–S64.
Elliot, A. J. & McGregor, H. A. (2001). A 2 × 2 achievement goal framework. Journal of Personality and Social Psychology, 80 (3), 501–519.
Ericsson, K., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100 (3), 363–406.
Etkina, E. & Heuvelen, A. V. (2007). Investigative science learning environment: A science process approach to learning physics. In Research-based reform of university physics, volume 1.
Etkina, E., Heuvelen, A. V., White-Brahmia, S., Brookes, D. T., Gentile, M., Murthy, S., Rosengrant, D., & Warren, A. (2006). Scientific abilities and their assessment. Physical Review Special Topics - Physics Education Research, 2 (2), 020103.
Etkina, E., Karelina, A., & Ruibal-Villasenor, M. (2008). How long does it take? A study of student acquisition of scientific abilities. Physical Review Special Topics - Physics Education Research, 4 (2), 020108.
Etkina, E., Karelina, A., Ruibal-Villasenor, M., Rosengrant, D., Jordan, R., & Hmelo-Silver, C. E. (2010). Design and reflection help students develop scientific abilities: Learning in introductory physics laboratories. Journal of the Learning Sciences, 19 (1), 54–98.
Evangelinos, D., Psillos, D., & Valassiades, O. (2002). An investigation of teaching and learning about measurement data and their treatment in the introductory physics laboratory. In D. Psillos & H. Niedderer (Eds.), Teaching and Learning in the Science Laboratory, volume 16 of Science and Technology Education Library chapter 4, (pp. 179–190). Springer Netherlands.
Galvez, E. & Singh, C. (2010). Introduction to the theme issue on experiments and laboratories in physics education. American Journal of Physics, 78 (5), 453.
Gick, M. L. & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12 (3), 306–355.
Goertzen, R. M., Scherr, R. E., & Elby, A. (2009). Accounting for tutorial teaching assistants’ buy-in to reform instruction. Physical Review Special Topics - Physics Education Research, 5 (2), 020109.
Gray, K. E., Adams, W. K., Wieman, C. E., & Perkins, K. (2008). Students know what physicists believe, but they don’t agree: A study using the CLASS survey. Physical Review Special Topics - Physics Education Research, 4 (2), 020106.
Hammer, D. & Elby, A. (2003). Tapping epistemological resources for learning physics. Journal of the Learning Sciences, 12 (1), 53–90.
Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30 (3), 141–158.
Hofstein, A. & Lunetta, V. N. (2004). The laboratory in science education: Foundations for the twenty-first century. Science Education, 88 (1), 28–54.
Holmes, N. (2011). The invention support environment: Using metacognitive scaffolding and interactive learning environments to improve learning from invention. Master’s thesis, University of British Columbia, Vancouver.
Holmes, N. & Bonn, D. (2013). Doing science or doing a lab? Engaging students with scientific reasoning during physics lab experiments. In Engelhardt, P. V., Churukian, A. D., & Jones, D. L. (Eds.), 2013 PERC Proceedings, (pp. 185–188), Portland, OR.
Holmes, N., Day, J., Park, A. H., Bonn, D., & Roll, I. (2014). Making the failure more productive: Scaffolding the invention process to improve inquiry behaviours and outcomes in productive failure activities. Instructional Science, 42 (4), 523–538.
Holmes, N., Ives, J., & Bonn, D. (2014). The impact of targeting scientific reasoning on student attitudes about experimental physics. In Engelhardt, P. V., Churukian, A. D., & Jones, D. L. (Eds.), 2014 PERC Proceedings.
Holmes, N., Martinuk, M. S., Ives, J., & Warren, M. (2013). Teaching assistant professional development by and for TAs. The Physics Teacher, 51 (4), 218.
Hoskinson, A.-M., Couch, B. A., Zwickl, B. M., Hinko, K. A., & Caballero, M. D. (2014). Bridging physics and biology teaching through modeling. American Journal of Physics, 82 (5), 434–441.
Kanari, Z. & Millar, R. (2004). Reasoning from data: How students collect and interpret data in science investigations. Journal of Research in Science Teaching, 41 (7), 748–769.
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26 (3), 379–424.
Kapur, M. (2012). Productive failure in learning the concept of variance. Instructional Science, 40 (4), 651–672.
Kapur, M. & Bielaczyc, K. (2011). Classroom-based experiments in productive failure. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, (pp. 2812–2817). Cognitive Science Society.
Krzywinski, M. & Altman, N. (2013). Points of significance: Error bars. Nature Methods, 10 (10), 921–922.
Kumassah, E. K., Ampiah, J. G., & Adjei, E. J. (2013). An investigation into senior high school (SHS3) physics students understanding of data processing of length and time of scientific measurement in the Volta Region of Ghana. International Journal of Research Studies in Educational Technology, 3 (1), 37–61.
Kung, R. L. (2002). Analyzing students’ use of metacognition during laboratory activities. Paper presented at American Educational Research Association Meeting, New Orleans, LA.
Kung, R. L. (2005). Teaching the concepts of measurement: An example of a concept-based laboratory course. American Journal of Physics, 73 (8), 771.
Kung, R. L. & Linder, C. (2006). University students’ ideas about data processing and data comparison in a physics laboratory course. Nordic Studies in Science Education, 2 (2), 40–53.
Kuo, E., Hull, M. M., Gupta, A., & Elby, A. (2013). How students blend conceptual and formal mathematical reasoning in solving physics problems. Science Education, 97 (1), 32–57.
Leach, J. (1999). Students’ understanding of the co-ordination of theory and evidence in science. International Journal of Science Education, 21 (8), 789–806.
Leach, J., Millar, R., Ryder, J., Séré, M.-G., Hammelev, D., Niedderer, H., & Tselfes, V. (1998). Survey 2: Students’ images of science as they relate to labwork learning. Working Paper 4, European Commission - Targeted Socio-Economic Research Programme.
Lepper, M. R. & Woolverton, M. (2002). The wisdom of practice: Lessons learned from the study of highly effective tutors. In J. Aronson (Ed.), Improving Academic Achievement, Educational Psychology chapter 7, (pp. 135–158). Academic Press.
Lindsey, B. A., Hsu, L., Sadaghiani, H., Taylor, J. W., & Cummings, K. (2012). Positive attitudinal shifts with the Physics by Inquiry curriculum across multiple implementations. Physical Review Special Topics - Physics Education Research, 8 (1), 010102.
Lippmann, R. F. (2003). Students’ understanding of measurement and uncertainty in the physics laboratory: Social construction, underlying concepts, and quantitative analysis. PhD thesis, University of Maryland, College Park, Maryland, United States.
Lising, L. & Elby, A. (2005). The impact of epistemology on learning: A case study from introductory physics. American Journal of Physics, 73 (4), 372.
Maries, A. & Singh, C. (2013). Exploring one aspect of pedagogical content knowledge of teaching assistants using the test of understanding graphs in kinematics. Physical Review Special Topics - Physics Education Research, 9 (2), 020120.
Otero, V. K. & Gray, K. E. (2008). Attitudinal gains across multiple universities using the Physics and Everyday Thinking curriculum. Physical Review Special Topics - Physics Education Research, 4 (2), 020104.
Pillay, S. (2006). The evaluation of a research-based curriculum for teaching measurement in the first year physics laboratory. Master’s thesis, University of Cape Town.
Pillay, S., Buffler, A., Lubben, F., & Allie, S. (2008). Effectiveness of a GUM-compliant course for teaching measurement in the introductory physics laboratory. European Journal of Physics, 29 (3), 647.
Redish, E. F. (2014). Oersted lecture 2013: How should we think about how our students think? American Journal of Physics, 82 (6), 537–551.
Redish, E. F., Saul, J. M., & Steinberg, R. N. (1998). Student expectations in introductory physics. American Journal of Physics, 66 (3), 212–224.
Roll, I., Holmes, N., Day, J., & Bonn, D. (2012). Evaluating metacognitive scaffolding in guided invention activities. Instructional Science, 40 (4), 691–710.
Ryder, J. (2002). Data interpretation activities and students’ views of the epistemology of science during a university earth sciences field study course. In D. Psillos & H. Niedderer (Eds.), Teaching and Learning in the Science Laboratory, volume 16 of Science and Technology Education Library chapter 3, (pp. 151–162). Springer Netherlands.
Ryder, J. & Leach, J. (2000). Interpreting experimental data: The views of upper secondary school and university science students. International Journal of Science Education, 22 (10), 1069–1084.
Salomon, G. & Perkins, D. N. (1989). Rocky roads to transfer: Rethinking mechanism of a neglected phenomenon. Educational Psychologist, 24 (2), 113–142.
Schoenfeld, A. (1987). What’s all the fuss about metacognition? In Cognitive science and mathematics education (pp. 189–215). Psychology Press.
Schwartz, D. L. & Bransford, J. D. (1998). A time for telling. Cognition and Instruction, 16 (4), 475–522.
Schwartz, D. L., Chase, C. C., Oppezzo, M. A., & Chin, D. B. (2011). Practicing versus inventing with contrasting cases: The effects of telling first on learning and transfer. Journal of Educational Psychology, 103 (4), 759–775.
Schwartz, D. L. & Martin, T. (2004). Inventing to prepare for future learning: The hidden efficiency of encouraging original student production in statistics instruction. Cognition and Instruction, 22 (2), 129–184.
Séré, M.-G., Fernandez-Gonzalez, M., Gallegos, J. A., Gonzalez-Garcia, F., Manuel, E. D., Perales, F. J., & Leach, J. (2001). Images of science linked to labwork: A survey of secondary school and university students. Research in Science Education, 31 (4), 499–523.
Séré, M.-G., Journeaux, R., & Larcher, C. (1993). Learning the statistical analysis of measurement errors. International Journal of Science Education, 15 (4), 427–438.
Seung, E. & Bryan, L. A. (2010). Graduate teaching assistants’ knowledge development for teaching a novel physics curriculum. Research in Science Education, 40 (5), 675–698.
Slaughter, K. A. (2012). Mapping the transition - content and pedagogy from school through university. PhD thesis, University of Edinburgh.
Spike, B. & Finkelstein, N. (2011). Toward an analytic framework of physics teaching assistants’ pedagogical knowledge. In Physics Education Research Conference 2011, volume 1413 of PER Conference, (pp. 363–366), Omaha, Nebraska.
Stage, E., Asturias, H., Cheuk, T., Daro, P., & Hampton, S. (2013). Opportunities and challenges in next generation standards. Science, 340 (6130), 276–277.
Tuminaro, J. & Redish, E. F. (2007). Elements of a cognitive model of physics problem solving: Epistemic games. Physical Review Special Topics - Physics Education Research, 3 (2), 020101.
Volkwyn, T. S., Allie, S., Buffler, A., & Lubben, F. (2008). Impact of a conventional introductory laboratory course on the understanding of measurement. Physical Review Special Topics - Physics Education Research, 4 (1), 010108.
Wells, M., Hestenes, D., & Swackhamer, G. (1995). A modeling method for high school physics instruction. American Journal of Physics, 63 (7), 606–619.
Wilson, J. M. (1994). The CUPLE physics studio. The Physics Teacher, 32 (9), 518–523.
Zwickl, B. M., Finkelstein, N., & Lewandowski, H. (2012). Development and validation of the Colorado Learning Attitudes about Science Survey for experimental physics. In American Institute of Physics Conference Series, volume 1513, (pp. 442–445).
Zwickl, B. M., Finkelstein, N., & Lewandowski, H. J. (2013a). A framework for incorporating model-based inquiry into physics laboratory courses. arXiv preprint, arXiv:1301.4414v1.
Zwickl, B. M., Finkelstein, N., & Lewandowski, H. J. (2013b). The process of transforming an advanced lab course: Goals, curriculum, and assessments. American Journal of Physics, 81 (1), 63–70.
Zwickl, B. M., Finkelstein, N., & Lewandowski, H. J. (2014). Incorporating learning goals about modeling into an upper-division physics laboratory experiment. American Journal of Physics, 82 (9), 876–882.
Item Metadata
Title | Structured quantitative inquiry labs : developing critical thinking in the introductory physics laboratory |
Creator | Holmes, Natasha Grace |
Publisher | University of British Columbia |
Date Issued | 2014 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2014-12-05 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0165566 |
URI | http://hdl.handle.net/2429/51363 |
Degree | Doctor of Philosophy - PhD |
Program | Physics |
Affiliation | Science, Faculty of; Physics and Astronomy, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2015-02 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |