LONGITUDINAL ANALYSIS FOR BINARY AND COUNT DATAByJinko GrahamB. Sc. (Mathematics) University of British ColumbiaA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF STATISTICSWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIA1991© Jinko Graham, 1991In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.(SignatureDepartment of 3 i A i 15 TICSThe University of British ColumbiaVancouver, CanadaDate N.C. 33 ) 199iDE-6 (2/88)AbstractLongitudinal data sets consist of repeated observations of an outcome over time, and a corre-sponding set of covariates for each of many subjects. In many fields, multivariate analysis-of-variance is commonly used to analyse longitudinal data. Such an analysis is appropriate whenresponses for each subject are multivariate Gaussian with a common covariance matrix for allsubjects. In many cases, however, the longitudinal response cannot be transformed to satisfythese assumptions. An alternate analysis might rely on specification of an appropriate likeli-hood to obtain estimates of regression parameters and their standard errors. Unfortunately,the correlation structure in the data for each subject may not be well understood, making suchparametric modelling difficult.This thesis discusses two methods for the analysis of longitudinal data which require onlyminimal assumptions about the true correlation structure in the data for each subject to yieldconsistent estimates of regression parameters and their standard errors. The first methodis based on the use of a "working" likelihood and extends the results of Zeger, Liang andSelf (Biometrika, 1985) to the case of time-dependent covariates. The second method, firstpresented by Liang and Zeger (Biometrika, 1986), is based on quasi-likelihood theory. Thismethod uses generalized estimating equations to arrive at consistent estimates of the regressioncoefficients and their standard errors, and can be applied to any longitudinal response withunivariate marginal distributions for which the quasi-likelihood formulation is sensible. Thisiiincludes Gaussian, Poisson, binomial (bernoulli), gamma, and inverse Gaussian distributions.Both methods are extensively illustrated using the results from an experiment on hummingbirdlearning. These methods enable much more information to be extracted from the hummingbirddata set than a more traditional analysis-of-variance, and therefore provide useful and powerfultools for researchers in this subject area.iiiTable of ContentsAbstract^ iiList of Tables^ viiList of Figures^ viiiAcknowledgement^ xi1 Introduction 11.1 Background to the Experiment ^ 11.2 Description of the Experiment 31.3 Description of the Data ^ 51.4 Outline of the Thesis 62 Classical Analysis 92.1 Univariate Analysis-of-Variance for Repeated Measures ^ 122.2 The Statistical Model ^ 132.3 Results of the Analysis 142.4 Conclusions and Limitations ^ 183 Analysis Based on a Working Likelihood Approach 203.1 Logistic Regression for Binary Repeated Measures ^ 20iv3.2 The Working Likelihood Approach ^ 203.3 Incorporating Time-dependent Covariates 283.3.1 The Independence Working Model ^ 283.3.2 The First-Order Working Model 303.4 Exploring the Binary Hummingbird Data ^ 343.5 Model Fitting With Sex Excluded 373.5.1 The Initial Model^ 373.5.2 An Informal Check of the Working Likelihoods ^ 383.6 Model Reductions With Sex Excluded ^ 403.6.1 Strategy for Model Reductions 403.6.2 Description of the Reductions ^ 413.7 Residual Analysis for the Final Model with Sex Excluded ^ 473.7.1 Analysis of the Independence Residuals ^ 473.7.2 Analysis of the First-Order Residuals 493.8 Summary of Analysis with Sex Excluded ^ 513.9 The Initial Model With Sex Included 533.10 Model Reductions with Sex Included ^ 543.10.1 Strategy for Model Reductions 543.10.2 Description of the Reductions for the Independence Working Likelihood ^ 553.10.3 Description of the Reductions for the First-Order Working Likelihood . ^ 573.11 Summary of Analysis with Sex Included ^ 604 Analysis Based on the GEE Approach^ 634.1 Description of the GEE approach 634.1.1 Brief Description of Quasi-Likelihood ^ 644.1.2 Generalized Estimating Equations (GEEs) for Longitudinal Data ^ 664.2 GEE Analysis of the Binary Hummingbird Data ^ 704.2.1 Description of the Model Reductions 714.2.2 Fits of the Final Model Obtained by Other Working Correlations ^ 744.3 GEE Analysis of the Binomial Hummingbird Data ^ 764.3.1 Exploring the Binomial Hummingbird Data 774.3.2 Model Fitting for the Binomial Hummingbird Data^ 794.3.3 Fits of the Final Model Obtained by Other Working Correlations ^ 845 Conclusion^ 875.1 Discussion of Results ^ 875.2 Comparison of Methods 89Figures^ 92Bibliography^ 117viList of TablesTable 2.1. Analysis of Variance for Blocks of 10 ^ 16Table 2.2. Analysis of Variance for Blocks of 15 16Table 3.1. Comparison of Z-scores from the Initial Model ^ 39Table 3.2. Final Model Fits from the First Reduction Path 43Table 3.3. Final Model Fits from the Second Reduction Path ^ 46Table 3.4. Estimated Autocorrelations for the Independence Residuals ^ 49Table 3.5. Estimates of Transformed Lag One Autocorrelations 50Table 3.6. Model Reductions for the Independence Working Likelihood with SexIncluded ^ 57Table 3.7. Model Reductions for the First-Order Working Likelihood with Sex Included 59Table 4.1. Reductions for the Stationary 1-Dependent Working Correlation with SexIncluded ^ 74Table 4.2. Robust z-Scores (Parameter Estimates) for the Binary Final Model^ 75Table 4.3. Model Reductions for the Binomial Data ^ 81^Table 4.4. Robust z-Score (Parameter Estimate) for the Binomial Final Model 84viiList of FiguresFigure 1.1. Feeding Arrays for the Hummingbird Experiment ^ 92Figure 2.1. Mean Treatment (Transformed) Response versus Measurement for Blocksof 15 ^ 93Figure 2.2. Mean Treatment (Transformed) Response versus Measurement for Blocksof 10 ^ 93Figure 2.3. Estimated Variances versus Measurement for Blocks of 15 ^ 94Figure 2.4. Estimated Variances versus Measurement for Blocks of 10 ^ 94Figure 3.1. Average Treatment Response in the Logit Scale for Blocks of 3 Trials . . . 95Figure 3.2 Average Treatment Response in the Logit Scale for Blocks of 5 Trials . . . 96Figure 3.3. Average Treatment Response in the Logit Scale for Blocks of 10 Trials . . 97Figure 3.4. Smoothed Logit of Average Treatment Response versus Trial(Window = 10) ^ 98Figure 3.5. Smoothed Logit of Average Treatment Response versus Trial(Window = 20) ^ 99Figure 3.6. Smoothed Logit of Average Treatment Response versus Trial(Window = 30) ^ 100viiiFigure 3.7. Smoothed Individual Logits for Treatment 1 ^ 101Figure 3.8. Smoothed Individual Logits for Treatment 2 102Figure 3.9. Smoothed Individual Logits for Treatment 3 ^ 103Figure 3.10. First Independence Fits without Sex 104Figure 3.11. Second Independence Fits without Sex ^ 104Figure 3.12. Fitted Success Probabilities for Treatment 1 from the IndependenceWorking Model ^ 105Figure 3.13. Independence Pearson Residuals by Bird ^ 106Figure 3.14. Boxplot of Independence Pearson Residuals by Sex ^ 107Figure 3.15. Smoothed Logit of Average Response versus Trial for Treatment 1(Window = 60) ^ 108Figure 3.16. Smoothed Logit of Average Response versus Trial for Treatment 2(Window = 60) ^ 108Figure 3.17. Smoothed Logit of Average Response versus Trial for Treatment 3(Window = 60) ^ 108Figure 3.18. Binary Independence Fits for Females ^ 109Figure 3.19. Binary First-Order Fits for Females 109Figure 3.20. Fitted Success Probabilities for Females in Treatment 1 ^ 110Figure 4.1. Binary Fits for Females Obtained from the Stationary 1-DependentWorking Correlation ^ 111Figure 4.2. Average Logit of Response for Treatment 1, Blocks of 10 Trials ^ 112ixFigure 4.3. Average Logit of Response for Treatment 2, Blocks of 10 Trials ^ 112Figure 4.4. Average Logit of Response for Treatment 3, Blocks of 10 Trials ^ 112Figure 4.5. Smoothed Logit of Success Proportion versus Trial (Window = 30)^113Figure 4.6. Smoothed Logit of Success Proportion versus Trial (Window = 60)^113Figure 4.7. Smoothed Individual Logits (Binomial Data) for Treatment 1(Window = 60) ^ 114Figure 4.8. Smoothed Individual Logits (Binomial Data) for Treatment 2(Window = 60) ^ 114Figure 4.9. Smoothed Individual Logits (Binomial Data) for Treatment 3(Window = 60) ^ 114Figure 4.10. Initial Model Fits for Treatment 1 (Binomial Data) ^ 115Figure 4.11. Initial Model Fits for Treatment 2 (Binomial Data) 115Figure 4.12. Initial Model Fits for Treatment 3 (Binomial Data) ^ 115Figure 4.13. Final Fits for Females from the Stationary 1-Dependent WorkingCorrelation ^ 116Figure 4.14. Final Fits for Males from the Stationary 1-Dependent Working Correlation116xAcknowledgementI would like to thank John Petkau for his guidance and supervision throughout the develop-ment of this thesis. I would also like to thank Nancy Heckman for her careful reading of themanuscript and subsequent helpful suggestions. Additionally, I am grateful to Gayle Brown formany instructive conversations about hummingbird learning, and, in particular, for Figure 1.1of the text. Finally, I would like to thank the Statistical Consulting and Research Laboratoryfor providing the data.xiChapter 1Introduction1.1 Background to the ExperimentA desire for a more complete understanding of how organisms relate objects separated in spacemotivates much research on "spatial learning" in both the biological sciences and psychology.Currently, in the Department of Zoology at U.B.C., two Ph.D. students are studying spatiallearning in hummingbirds. One of these students, Gayle Brown, came to the Statistical Con-sulting and Research Laboratory in the summer of 1990 with questions about how to analyzedata she had collected in one of her experiments. The experiment was designed to answer ques-tions about "spatial association learning" in hummingbirds. In "spatial association learning",organisms learn to associate certain cues with reinforcement (such as food, water, or mates)separated in space from the cues. For example, Cori (1989) has shown that bees use the coloursof spots on banner petals to select flowers with greater average nectar rewards.Results from previous experiments conducted as part of the ongoing research of the inves-tigator show that hummingbirds learn spatial associations very quickly relative to all otherorganisms that have been tested in the laboratory (monkeys, rats, pigeons, and children) andeven perform perfectly on problems that may be unlearnable by other animals, according to1Chapter 1. Introduction^ 2documentation in the literature (see reviews by MacKintosh, 1983; and Bowe, 1984). One ex-planation for the success of hummingbirds in spatial association experiments is that, in nature,hummingbirds must frequently learn floral cues in order to efficiently extract nectar hiddenwithin flowers. The fact that many of these floral cues are visibly connected to the nectar ofthe flower suggests visible guides may facilitate the learning of spatial associations in humming-birds.To test this possibility, an experiment on the feeding behaviour of eighteen experimentally-naive adult rufous hummingbirds (10 females and 8 males) was carried out. The hummingbirdswere assigned randomly by sex to one of three treatments: "No Tape" (NT or treatment 1),in which no visible guides connected cues with the feeders below them; "Partial Tape" (PT ortreatment 2), in which fluorescent orange Dymo type (9 mm wide) provided a discontinuous(i.e., broken in two places) connection between each cue and its feeder; and "Full Tape" (FTor treatment 3), in which the visible guide between each cue and its feeder (fluorescent orangeDymo tape) was continuous. The NT group (treatment 1) had 4 females and 2 males, whilethe PT (treatment 2) and FT (treatment 3) groups each had 3 males and 3 females.The investigator was interested in comparing learning patterns of the birds between thesetreatments. She expected different learning patterns on each treatment; in particular, she ex-pected hummingbirds in the two treatment groups with visible guides between cues and feeders(2=PT and 3=FT) to become aware of the cue-feeder relationship sooner than hummingbirdsin the treatment group with no visible guides (1=NT). From psychological principles of visualperception, she also expected birds in the treatment groups with discontinuous and continousvisible guides to become aware of the cue-feeder relationship at roughly the same time.Chapter I. Introduction^ 31.2 Description of the ExperimentOn a given day, three birds (one bird from each treatment group) were tested separately inthree randomly assigned rooms, 1.3 x 2.5 x 2.5 metres high. A portion of one end wall ineach room contained a horizontal array of six feeders spaced 3 cm apart on a thin metalpanel. Figure 1.1 displays the feeding arrays for the NT, PT, and FT groups. Feeders in eachfeeding array were marked by 19 mm round fluorescent orange Avery labels with central 3 mmholes. Hummingbirds hovered at the feeders and probed their bills through the holes to smallfood reservoirs. If the feeder was the correct or "profitable" one, a miniature solenoid valveimmediately released 2 pl of 20% sucrose solution into the reservoir. Hummingbirds were freeto fly and visit feeders at all times during the experiment. Between brief foraging bouts to thefeeding array and short non-foraging flights around the room, they spent most of their time ona 1.5 metre high central perch, located 1.8 metres from the feeding array and designed to placebirds at eye level with the feeders.A small red light, 4 mm in diameter, protruded slightly through the metal panel 12 cmabove each feeder. These small lights served as the spatial cues. During each feeding trial onewas lit to signal the feeder below it as profitable. A computer controlled the lights, dispensedthe food, and recorded the time and duration of all visits to feeders and perches.To allow the three hummingbirds to become accustomed to the feeding array used in theexperiment, on the morning of the day before testing, the birds were placed in training cagesidentical to their home cages, except that food was available from a wall feeder similar inappearance to the feeding array. The birds quickly learned to feed from these feeders and wereChapter 1. Introduction^ 4moved from their training cages to the experimental rooms that afternoon. Until experimentaltraining began the following morning, the hummingbirds fed from a standard commercial feedermarked with a fluorescent orange Avery label around the access hole. The feeder was hung infront of a central array feeder on which training would start. The other five feeders were covered.At the beginning of the next day, the standard feeder was removed from each room, exposinga central feeder in the array to the hummingbird in each room. After the birds had fed severaltimes from this feeder, the light above it was lit and training continued. After several morefeedings, all other feeders were uncovered and the cue and profitability then reassigned to adifferent cue-feeder pair. This transfer procedure was repeated until birds had fed several timesfrom each feeder when the light above it was lit. Training lasted approximately 2 hours.Testing started between 10:30-11:00 am, immediately after training was finished. Eachhummingbird was tested for 60 feeding trials on each of three successive days. Birds receivedinitial training (as described in the preceding paragraph) on each testing day. On the first andsecond days, at the end of the day's 60 feeding trials, a commercial feeder was hung in front of amiddle feeder in the array, the other feeders in the array were covered, and the birds remainedin the experimental rooms overnight. On the third and final day, birds were returned to theirhome cages immediately after testing was completed.The one profitable feeder and its cue were reassigned randomly among the six feeders eachfeeding trial, but the same random sequence was used for all birds. Feeding trials began2 minutes after preceding trials ended, or as soon as preceding trials ended and the birdsperched. At the beginning of each feeding trial the following events occurred simultaneously: asoft buzzer sounded for 0.5 seconds, the cue to the profitable feeder was lit, and the profitableChapter 1. Introduction^ 5feeder was set to provide 2 ,a1 of solution on each visit. Birds could visit any sequence of feedersand obtain food from the correct feeder up to 11 times, for a total of 22 id each feeding trial.This quantity provided food at a rate slightly more than average ad libitum feeding rates.No limit was imposed on the time for birds to respond in feeding trials. Feeding trials endedonly after birds had probed the correct feeder 11 times, or after they had visited at least onefeeder and returned to their perch. When a feeding trial ended, the light cue was turned off andthe previously profitable feeder delivered no more food. All feeders remained exposed betweentrials.1.3 Description of the DataLearning and performance of the hummingbirds was to be evaluated from first visits in feedingtrials. If the first visit was to a profitable feeder, the feeding trial was scored as a success (codedas 1); if the first visit was to any one of the five unprofitable feeders, the feeding trial was scoredas a failure (coded as 0). The binary data set is comprised of 18 binary series of length 180,corresponding to the successes and failures of each bird on the first feeding foray in a trial, for3 days of 60 consecutive feeding trials.In addition to the information in these binary series on the first feeding foray of each trial,information on both the number of successful feeding forays and the total number of feedingforays in each trial is available. What we will refer to as the binomial data set consists ofthe total number of feeding forays (denoted by n) and the number of successful feeding forays(denoted by y) in each trial. Note that the number of successful forays in each trial is not reallya binomial random variable because:Chapter 1. Introduction^ 6• the feeding forays within a trial are not independent, and• even if the forays within a trial were independent, for those trials in which the bird feedssuccessfully the maximum number of 11 times (roughly 60% of the 3240 trials), the totalnumber of feeding forays made during the trial (denoted by n), rather than the numberof successful feeding forays made during the trial (denoted by y), would be the randomquantity (a negative binomial random variable) because of the way the experiment hasbeen set up.1.4 Outline of the ThesisThis thesis discusses several approaches to the analysis of binary longitudinal data, the last ofwhich is also applicable to count (and continuous) responses.The first approach discussed is univariate analysis-of-variance for repeated measures usingthe arcsine square-root variance stabilizing transformation for proportions; this was the ap-proach taken by the investigator in her analysis of the binary hummingbird data. To use thisapproach, the data for each bird must be blocked over trials before the elements in the resultingresponse vector are transformed. For example, the response vector could consist of 3 elements,the arcsine square-roots of the proportions of successful trials on each of the 3 days in the ex-periment. The arcsine square-root transformation may not be entirely effective if the responseis a vector of proportions because, although the variances of the proportions are stabilized,covariances remain functions of the success probabilities. Because the investigator expects suc-cess probabilities to differ across treatment groups, the covariance matrices are also expectedto be different. In this context, univariate analysis-of-variance for repeated measures is of littleChapter I. Introduction^ 7utility because it requires a common covariance matrix for valid statistical tests. Even if weassume a common covariance matrix is shared by all birds, conservative statistical tests are re-quired (with this data) for checking the effects of measurement and measurement-by-treatmentinteraction in the analysis-of-variance. Moreover, the sex of each bird is not easily incorporatedinto the analysis because of the unequal number of males and females in the experiment. Analternate technique for analysis of binary longitudinal data which accounts for correlation overtime within subjects and permits flexible modelling of the response would be preferable to thismore classical approach.The "working likelihood" approach of Zeger, Liang, and Self (1985), described in Chapter 3,is just such a technique. Here, a working likelihood (not necessarily the true likelihood) is usedto generate estimating equations for regression parameters which model the marginal proba-bility of success. Sections 3.1 and 3.2 outline the approach of Zeger, Liang, and Self (1985),while Section 3.3 extends to this approach to time-dependent covariates. This extension per-mits success probabilities on the first feeding foray to be modelled as functions of time, whiletaking into account correlation between measurements on each bird. Hence, comparisons of thelearning trajectories of hummingbirds across treatment groups are more direct than compar-isons achieved by the classical analysis. The remainder of Chapter 3 presents an analysis of thebinary hummingbird data using this extension. Note that with only 18 birds in the experiment,efficient estimation of the regression parameters is an important issue. One weakness of thisapproach is that only one working likelihood is available for explicitly modelling correlation inthe data (although, the results could presumably be extended to include working likelihoodsmodelling correlation in a variety of ways). Therefore, specification of a working likelihoodChapter 1. Introduction^ 8"close" to the true likelihood (for reasons of efficiency) may be difficult. Further, when mod-elling correlation explicitly, asymptotic results only hold under direct assumptions on the truecorrelation structure.Chapter 4 presents the "generalized estimating equation" (GEE) approach of Zeger andLiang (1986) and Liang and Zeger (1986). The GEE approach extends the use of quasi-likelihoodto longitudinal data and may be used for any multivariate response with univariate marginaldistributions for which the quasi-likelihood formulation is sensible. The GEE approach is moreflexible than the working likelihood approach not only because the marginal likelihoods need notbe specified, but also because many possible "working correlations" are available for explicitlymodelling correlation in the responses. Moreover, no assumptions about the true correlationstructure are necessary for valid asymptotic results with any of these working correlations. Toillustrate the flexibility of the GEE approach, both the binary and binomial hummingbird dataare analyzed using different working correlation structures.In Chapter 5, the thesis concludes with a summary of the important conclusions maderesulting from the analyses and a comparison of the methods discussed.Chapter 2Classical AnalysisA standard approach to the analysis of binary longitudinal data such as the hummingbirddata (for the first feeding foray) might be to calculate the proportion of correct responses overblocks of an equal number of successive time points for each subject, and then "normalize" theseproportions via the arcsine square-root transformation. The resulting subject response vectorswithin the ith treatment group might then be assumed to be approximately normally distributedwith mean vector and covariance matrix Ei. Moreover, if there is a physical basis for doing so,or if there is insufficient evidence in the data to conclude otherwise, the covariance matrices maybe assumed to be equal, a sufficient assumption for the use of multivariate analysis-of-variance(MANOVA) of the response vectors.Before using this approach, the equality of the covariance matrices should be tested. Assum-ing the covariance matrices of birds within each treatment group are the same, the estimatedcovariance matrix common to each treatment group will be non-singular only if q, the numberof elements in each subject's response vector, is fewer than one less the number of subjects inthe treatment group. The hummingbird experiment has only 6 birds per treatment group, so qmust be 4 or less for these estimated covariance matrices to be non-singular. The assumptionof equal covariance matrices cannot be formally checked if q > 5 because the test requires theestimated covariance matrices to be invertible. To check the equality of covariance matrices,9Chapter 2. Classical Analysis^ 10subject data must therefore be blocked over at least 45 trials (leading to q = 4 measurementsper subject). Aggregating the data of each bird over 45 trials is awkward, though, becausethere are 60 trials per day and day effects could be important in describing the performance ofthe birds.Instead, individual data might be blocked over 60 trials (one day), allowing MANOVA withq = 3 "measurements" on each bird. But with only 3 measurements (corresponding to differentdays) on each bird, such an analysis is likely to be too crude to capture the true nature of thetemporal learning pattern on each treatment. The aggressive blocking necessary to carry outMANOVA with only 3 measurements per subject does not permit detailed inference on howsuccess probabilities within the treatment groups change as a function of time. An alternativemethod of analysis able to provide higher resolution of the temporal learning patterns wouldbe ideal.In the MANOVA context, a step towards this ideal would be to block over fewer trials.If data were blocked over 30 trials instead of 60, there would be 6 measurements on eachsubject, and the previous confounding of blocks and days would be eliminated, allowing for amore detailed representation of temporal learning patterns. But the resulting MANOVA testswould not be valid unless all treatment groups shared a common covariance matrix; that is,E i = E, i = 1,2,3. This assumption of common covariance matrices across treatments isinconsistent with the investigators beliefs and contradicted by the data.' Moreover, if the1 The data for q = 3 measurements per bird (where each measurement is the arcsine square-root of a rawsuccess proportion) indicate that covariance matrices for the transformed success proportions are not equalacross treatments. Note that if these covariance matrices are not equal when there are q = 3 measurements perbird, it is unlikely that they will be equal when less aggressive blocking results in q > 3 such measurements.Covariance matrices can be estimated within treatment groups when there are 3 or fewer measurements per bird,but the distribution of the test statistic for their equality is only chi-squared as the number of subjects in eachtreatment group (6 for this experiment) becomes large. Moreover, the test is very sensitive to non-normalityChapter 2. Classical Analysis^ 11success probabilities for birds differ across treatment groups, the covariance matrices for thetransformed response must differ across treatment groups as well because, even though thearcsine square-root transformation can be expected to stabilize the variances, the covarianceswill still be functions of the success probabilities. Although this analysis would provide a moredetailed picture of temporal learning patterns, its conclusions would be somewhat tenuous be-cause the assumption of a common covariance matrix is supported by neither the investigator'sbeliefs, the data, nor the nature of the variance-stabilizing transformation.Even though assuming a common covariance matrix may not be correct, it allows blockingover as few as 15 trials. The resulting 12 measures per bird permit representations of temporalresponse patterns across treatments that are much more detailed than the representationsobtained when the equality of covariance matrices can be tested (and we are limited to 3 orfewer measurements per bird). Another benefit of assuming a common covariance matrix isthat univariate analysis-of-variance for repeated measures, a simpler method more accessibleto researchers familiar with experimental design, may be used as an alternative to MANOVA.The details of such analyses are presented in the remainder of this chapter.in the data (see Morrison 1976, page 252). Hence, both the test results and the estimated covariance matricesshould be examined to judge whether underlying covariance matrices appear to be approximately equal.The estimated covariance matrices (common to birds within treatment groups) for the case of q = 3 measure-ments per bird (where the measurements are transformed success proportions based on blocks of 60 successivetrials) are,2.5 3.7 2.7^2.2^1.1^-0.1^5.5 3.7^1.1^.01 [ 2.7 4.2 8.13.7 5.9 4.2 i , .01 [ -0.1^0.3^3.01 ^2.1^0.3 I , .01 [ 3.7 9.8^7.4 I ,1.1^7.4^11.0for treatments 1, 2, and 3 respectively. These appear quite different, and the same conclusion is suggested bythe chi-squared test; the value of the test statistic is 44.1 on 12 degrees of freedom (p < .001, based on theasymptotic chi-squared distribution). It thus seems unlikely that the covariance matrices are equal when thereare q > 3 measurements per subject.Chapter 2. Classical Analysis^ 122.1 Univariate Analysis-of-Variance for Repeated MeasuresUnivariate analysis-of-variance for repeated measures is strictly valid only if observations aremultinormally distributed with a common covariance structure corresponding to the "type-H"pattern described by Huynh and Feldt (1970); i.e., E = (ak,/), has the pattern defined byork,1 = ak + ai + A k = 1.a k,1 = ak + cei k 1.If the covariance structure of the observations does not correspond to the type-H pattern,Greenhouse and Geisser (1959) have proposed conservative tests based on adjusting the degreesof freedom for the univariate tests. Even though the univariate approach may sometimes re-quire use of these conservative tests, it still has a strong appeal because it is easy to interpretand commonly used by those familiar with experimental design. Further, unlike multivariateanalysis-of-variance, test statistics are still defined if the number of repeated measurements persubject is greater than the within treatment degrees of freedom. The binary hummingbird datamay thus be blocked over even fewer than 15 (say 10, for example) trials to perform the uni-variate analysis, yielding representations of temporal response patterns that are more detailedthan those permitted by MANOVA. However, if the binary hummingbird data is blocked overfewer than 15 trials, it will not be possible to check for the type-H pattern in the underlyingcovariance matrix because the estimated covariance matrix will once again be singular. Evenso, an analysis using conservative univariate tests may still be carried out.Because the investigator wishes to understand how success probabilities for birds in differenttreatment groups compare over time, detailed representations of temporal response patterns areChapter 2. Classical Analysis^ 13desirable. Since these are permitted by univariate analysis-of-variance for repeated measuresand since this method of analysis is well understood and commonly used by researchers familiarwith experimental design, this technique will be used to analyze the binary hummingbird data(on the first feeding foray), even though a key assumption for this technique (i.e., that acommon covariance matrix for the transformed response vectors be shared by all subjects) isnot supported by either the beliefs of the investigator, the data, or the nature of the arcsinesquare-root variance stabilizing transformation for binary data.2.2 The Statistical ModelIn order to "normalize" the hummingbird data, the proportion of correct responses for blocks offifteen and then ten successive trials was calculated for each bird and transformed via the arcsinesquare-root transformation. The resulting 12- and 18-dimensional response vectors for each birdin the ith treatment group were then assumed to be approximately normally distributed withmean pi and common covariance matrix E.The treatment means of these transformed proportions are shown in Figures 2.1 and 2.2.These figures suggest that the (transformed) response grows roughly linearly with time, par-ticularly for birds in treatments 2 (PT) and 3 (FT). The lack of systematic discontinuities inthese response curves at measurements corresponding to new days (measurements 7 and 13in Figure 2.1, and measurements 5 and 9 in Figure 2.2) suggests that modelling a "day" ef-fect is not necessary. Thus, for both blocks of 10 and blocks of 15 observations, a univariateanalyis-of-variance for repeated measures was carried out under the following model:Chapter 2. Classical Analysis^ 14Yijk = arcsine(V _Ask) = It -I- treatment= + measurements + (treatment * measurement)issubject(ok + (measurement * subject)(osk Cijk,for i = 1, 2,3; j = 1, ..,12 or 18 (for blocks of 15 and 10 observations respectively); k = 1, .., 6.Here Pijk is the raw success proportion for the kth subject within the ith treatment at thei th measurement, and ciik is the noise term for the kth subject within the i th treatment atthe i th measurement. We assume that (Eijk )j=1,..,12 or 18 (i = 1,2,3; k = 1, ..,6), the subjectnoise vectors, are independent normally distributed random vectors with mean 0 and commoncovariance matrix E. "Treatment" and "measurement" are fixed effect factors, and "subject"is a random effect factor nested within "treatment".2.3 Results of the AnalysisBefore carrying out univariate analysis-of-variance on the binary hummingbird data (for thefirst feeding foray) blocked over 15 trials, the underlying covariance matrix was tested for thetype-H pattern. Because the value of the appropriate chi-squared statistic (see Morrison 1976,page 251) is 259.17 with 65 degrees of freedom, the covariance matrix does not seem to havethe type-H pattern. The estimated covariance matrix for the binary hummingbird data blockedover 10 trials is singular because there are 18 measurements per bird and only 18 birds in thestudy, so the underlying covariance matrix cannot be tested for the type-H pattern. However,it seems unlikely that the covariance matrix for blocks of 10 trials will have the type-H patternsince the covariance matrix for blocks of 15 trials does not. Univariate analysis-of-variance wasChapter 2. Classical Analysis^ 15therefore carried out using the conservative tests of Greenhouse and Geisser (1959) for bothblocks of 15 and blocks of 10 observations.For the univariate analysis-of-variance tests to be strictly valid, the transformed responsevectors for the individual subjects must be normally distributed with a common covariancematrix E; that is, (Yi303=1,..,120, 18 ", N(µi, E). The investigator does not believe that acommon covariance matrix is shared by all birds, but separate covariance matrices common tobirds within treatment groups cannot be modelled due to the small number of birds within eachtreatment group. Because these estimated covariance matrices, Ei (i = 1, 2,3), are singular,a formal test for the equality of the underlying covariance matrices is not available. However,elements of the estimated covariance matrices may still be compared visually. Figures 2.3and 2.4 plot the estimated variances of the transformed responses (the diagonal elements of Ei)against measurement by treatment group for the 12- and 18-measurement analyses, respectively.Neither figure suggests an obvious pattern in the treatment variances over time, but both figuressuggest that the variances for birds in treatment 3 are generally greater than those for birds inother treatments, while the variances for birds in treatment 2 are generally lower than thosefor birds in other treatments (the estimated variances of responses for birds in treatment 3are anywhere from 1-11 times larger than the estimated variances for birds in treatment 2 forthe 12-measurement analysis, and anywhere from 0.4-22 times larger for the 18-measurementanalysis). These results suggest that the assumption of a common covariance matrix is not validfor either the 12- or the 18-measurement analysis. Hence, the distributions of test statisticsformulated assuming equal covariance matrices can only be expected to roughly approximatethe true distributions of the test statistics. In spite of this, the tests should still provideChapter 2. Classical Analysis^ 16useful information about the relative magnitude of the components of variability, and strongconclusions from the analyses-of-variance should be consistent with conclusions from analyseswhich permit unequal covariance matrices.The analyses-of-variance are given in Tables 2.1 (18-measurement analysis) and 2.2 (12-measurement analysis). Tables 2.1 and 2.2 confirm one clear indication from Figures 2.1 and2.2: "measurement" is an important factor. But Tables 2.1 and 2.2 give no strong evidence of"treatment" effects, either through main effects or through interactions with "measurement".On the other hand, Figures 2.1 and 2.2 clearly indicate that the (transformed) responses fortreatment 3 are generally greater than those in the other treatments, suggesting that "treat-ment" main effects do, in fact, exist but are not being detected because there is too muchvariability between subjects within treatment groups.Table 2.1. Analysis of Variance for Blocks of 10Source SS d.f. MS F conservative d.f. pmeasurement 14.74 17 .87 14.20 1,15 .002treatment .57 2 .29 .75 .49subject (within treatment) 5.67 15 .38measurement *treatment 1.26 34 .037 .62 2,15 .55subject*measurement (within treatment) 15.33 255 .060total 37.57 323Table 2.2. Analysis of Variance for Blocks of 15Source SS d.f. MS F conservative d.f. pmeasurement 3.56 11 .32 8.68 1,15 .01treatment 1.70 2 .85 1.76 .21subject (within treatment) 7.22 15 .48measurement*treatment .958 22 .044 1.17 2,15 .34subject*measurement (within treatment) 6.15 165 .037total 19.59 215The conservative F-tests used in these univariate analyses-of-variance are based on approx-imate distributions of the sums of squares when the subject response vectors are multivariateChapter 2. Classical Analysis^ 17normal with a common covariance matrix E. Greenhouse and Geisser (1959) have shown thatwhen the hypothesis of no measurement effects is true, the F-statistic for measurement has anapproximate F-distribution with degrees of freedom (K — 1)c and (K —1)(N — K)c, where Kdenotes the number of measurements per subject, and N denotes the number of subjects. Thescale factorK2(0 _ F.12E — — 1)(E i Ej a — 2K a72 K2Tr72)is computed from the elements of the (common) covariance matrix E, where o71 is the meanof the p diagonal terms, cr: is the grand mean of all variances and covariances, and c)7 is themean of the elements in the i th row of E. They have also shown that when the hypothesisof no measurement-by-treatment interaction is true, the F-statistic for this interaction has anapproximate F-distribution with degrees freedom (K —1)(p-1)c and (K — 1)(N — p)€, wherep denotes the number of treatments.In practice, E is never known and estimating c from the sample covariance matrix is unde-sirable because more uncertainty is then introduced into the approximate analysis-of-variance.Greenhouse and Geisser have shown that c > 1/(K — 1), implying that the degrees freedom forthe approximate tests cannot be less than 1 and (N — K), and (K — 1) and (N — K), respec-tively. A conservative F-test results from assuming the maximum possible reduction in degreesfreedom, and therefore requires larger values of the F-statistic to reject the null hypothesis (i.e,the hypothesis of no measurement effects, or the hypothesis of no treatment-by-measurementinteractions) than tests which assume less than the maximum reduction in degrees freedom.These conservative F-tests are used in both the 12- and 18-measurement analyses describedChapter 2. Classical Analysis^ 18above, but the conclusions from these analyses are the same as the conclusions from the anal-yses which estimate the scale factor c from the data (e is .33 for the 12-measurement analysis,and .24 for the 18-measurement analysis, so the use of e results in, respectively, degrees offreedom 3.63 and 4.08 times the conservative degrees of freedom).2.4 Conclusions and LimitationsIn both univariate analyses, the conservative test for the measurement main effects finds "mea-surement" to be an important factor. Figures 2.1 and 2.2 also indicate that "measurement" isan important factor. The test (not conservative) for treatment main effects in both univari-ate analyses-of-variance does not find "treatment" to be important, nor does the conservativetest for the interactions of treatment with measurement find "treatment*measurement" to beimportant. Thus, both analyses-of-variance indicate that treatment group has no effect eitherdirectly, or through interaction with measurement, on the (transformed) response pattern. Onthe other hand, plots of the fitted values (Figures 2.1 and 2.2) clearly indicate that the (trans-formed) responses for treatment 3 are greater than those in the other treatments, suggestingthat a treatment main effect is present but cannot be detected by these methods because vari-ability between subjects within treatments is too large. The same result would therefore beexpected in the corresponding MANOVA. Some of the within-treatment variability betweensubjects may be explained by their sex, but modelling of a sex effect or any interactions ofsex with the other factors is difficult in both univariate and multivariate analysis-of-variancebecause of the unequal number of males and females in treatment 1 (NT). An approach allowingmodelling of sex, despite the imbalanced number of males and females across treatment groups,Chapter 2. Classical Analysis^ 19might reduce the unexplained within-treatment variability, and thereby provide more sensitivetests of all effects.Note that many other "classical" approaches to analyzing the binary hummingbird data(for the first feeding foray) have not been discussed in this section. These include parametricmodelling of the "measurement" effects (for example, analysis-of-covariance, which requires thatobservations within each subject be independent), and analysis-of-variance (either univariateor multivariate) in the logit scale (logit(p) = log( TF_T,), p between 0 and 1) rather than in thearcsine square-root scale to correspond to logistic regression.Although univariate and multivariate analysis-of-variance are standard techniques used toanalyze data in which repeated measurements are taken for each subject, both seem to modelhow the data change over time in indirect ways which lack intuitive appeal. To allow changesin response over time, univariate analysis-of-variance includes "measurement" main effects andinteractions in its model. Multivariate analysis-of-variance allows changes in response overtime through the different elements of its treatment mean response vectors. More intuitiveapproaches, presented in the next sections, model treatment response patterns directly as afunction of time, while taking into account the correlation of measurements within subjects.Treatment response patterns can then be compared visually by examining the fitted treatmentresponse models.Chapter 3Analysis Based on a Working Likelihood Approach3.1 Logistic Regression for Binary Repeated MeasuresA natural approach to the analysis of the binary hummingbird data is to extend logistic re-gression to the case where the binary outcome variable is observed repeatedly for each subject.Zeger, Liang, and Self (1985) present the details of one way of implementing such an approachin which, rather than the actual likelihood, a "working likelihood" based on simple assumptionsabout the time dependence within each subject's data is used to generate estimating equationswhich lead to consistent estimates of the regression parameters, as well as standard errors forthese estimates. Not only does this approach allow for correlation over time within subjects,but it allows flexible modelling of the response and is powerful enough (provided the number ofsubjects is large) to detect important factors and covariates, while remaining simple and easyto interpret.3.2 The Working Likelihood ApproachTo establish notation, let Yi, t (t = 1, .., n t ) be a stationary binary time series, and let Z, be ansxl vector of time-independent covariates for the ith subject (i = 1, K) . Fordefit = Pr(Yi, t = 1 I Zi) , it is assumed throughout this section that logit in = ZP. The20Chapter 3. Analysis Based on a Working Likelihood Approach^ 21primary objective is to make inference about the vector of parameters, p. There may beadditional nuisance parameters in the working likelihood, so let 9 denote all the parametersmodelled by the working likelihood. Let S(9) be the working likelihood score function, andlet H(0) be the corresponding Hessian matrix. Finally, let 0 be the estimate of 0 obtained bymaximizing the working likelihood; in general, we have S(9) = 0. Throughout, score functionsand Hessian matrices will be evaluated under the working likelihood, E(.) will denote theexpectation under the true (unknown) likelihood and Ew(.) will denote the expectation underthe working likelihood.Motivation for the results of Zeger, Liang, and Self (1985) is provided by the Taylor ex-pansion for S(9) about 0, and the Mean Value Theorem which, together with the fact thatS(0) = 0, allow us to writeS(9) = SA+ H(0.)(0 — 0) = —H(0,0(9 — 9),where 0,, is between 0 and O.If the data for the different subjects is independent, then the working log-likelihood is thesum of the independent log-likelihoods for the different subjects. Thus, the score function andthe Hessian are also sums of independent random quantities, and we can write,- 0 =^S(9) = -[- Hi(0.)]-1 - E Si(0),i=1^K i=1where Si and Hi are the i th subject's score function and Hessian, respectively.If E Si(0) 0, then provided regularity conditions for Lindeberg's Central Limit Theoremhold, under the true model,K ^1E Si(0) =^N(0, K2i=1Chapter 3. Analysis Based on a Working Likelihood Approach^ 22where /*(0) = E S(0)S'(0).Under regularity conditions ensuring B is consistent for 9, we haveK^ 1 K^ 1—K EHi (0.) urn --E E Hi(0) = lim —E H(0).K --+ oo KK ".4 00 K i=11=1Thus, by Slutsky's Theorem,— 9 N( 0, [E H(0)] -1 I*(0)[E H(0)] -1 ),where all expectations are taken under the true (unknown) likelihood, and convergence is underthe true (unknown) distribution.Note that to use this result for inference about 0, both /*(0), the variance-covariance matrixfor the score function under the true (unknown) model, and E H(0), the expectation of theHessian under the true (unknown) model, must be estimated from the data. Provided 0 isconsistent for 0, I*(0) = E S(0)S'(0) can be consistently estimated by EL Si(d)S(d). Theexpected Hessian under the working model is identical to the expected Hessian under thetrue (unknown) model provided simple assumptions about the time dependence within eachsubject's data are valid. Therefore, if d is consistent for 0, EwH(0) is a consistent estimateof E H(9). Because H(0) is a sum of independent random quantities, if 0 is consistent for 0then an alternative consistent estimate of E H(0) is H(0), the observed Hessian evaluated at O.The extension of the Zeger, Liang, and Self approach to the case of time-dependent covariates,presented in the next section, uses the observed Hessian evaluated at d, rather than EwH(0),to (consistently) estimate E H(9).Zeger, Liang, and Self (1985) present two estimators of 0. The first, ijo , is obtained from theworking assumption that repeated observations for a subject are independent of one another.Chapter 3. Analysis Based on a Working Likelihood Approach^ 23This estimator is consistent (as K^oo) for any set of stationary binary time series for whichlogit ii = 4/3. The second estimator, fji , is obtained from the weaker working assumption thateach time series has a stationary 1-dependent correlation structure. A . is consistent for any setof stationary binary time series for which both logit = 40 and the first lag autocorrelation,p = is common to all subjects.For the independence working model with logit 7ri = 40, the working log-likelihood can bewritten asK ni^10(0) = E E [yi,t^+ log(1 — ri )]1=1 t=1where logit irY = 4[3. Let S(/3) be the score function for the vector /3, and let Sti (13) be its Uthelement (u = 1, s). The maximum likelihood equations under the working model areK n,0 = su(i3) = E E(.Yi>i — zi)Ziu, u = 1,^s.1=1 t=1Define A) to be the solution to these estimating equations.Note that the expected values of the scores under the true (unknown) model are 0 providedthe mean structure has been correctly specified; that is, E(17i,t Zi) = [1 + exp(-40)] -1 , orlogit E(Yi, t 1Z1) = Z'i 13. This ensures that ijo is an asymptotically unbiased estimate of /3.The (u, v)th element of the Hessian matrix is given byK n,Iluv(0) = E E ziuZivri(1 —i=1 t=1which does not depend upon the Yi, t 's. Therefore, the expected Hessian is the same under boththe true likelihood and the working likelihood; in other words, the working likelihood Fisherinformation matrix is the same as — E H(/3) in this case. As K —> oo, we obtain the result thatChapter 3. Analysis Based on a Working Likelihood Approach^ 24[40 is consistent and asymptotically normal for any set of stationary binary time series such thatlogit 7i = 40, as stated by Zeger, Liang, and Self (1985) in their:PROPOSITION 2.1. Let Yi, t (t = 1, ..,ni < oo; i = 1,..,K) be stationary binary series suchthat logit E(Yi, t I Zi) = 43. Then /Jo — is asymptotically multivariate Gaussian with expecta-tion 0 and covariance matrix V0 (/3) = -1O(0) -1 -1;(0)Io(0) -1 , where MO) is the working Fisherinformation matrix and PAP) = E S(/3)S'(/3).A consistent estimate of /0 (0), the Fisher information matrix under the working model, isgiven byfo 10(Qo).A consistent estimate of ^the variance-covariance matrix of the score function, is providedbyE sioosmc,),where Si(.) is the score function for the i th subject. Combining these results leads to a consistentestimate of V0 (0) given byfo —1 f —1o The estimator^is obtained similarly, after replacing the working assumption of indepen-dence by the working assumption of a common lag one (and no other) autocorrelation. Letp = COTT(Yi,t,Yi,t-1) and 0 = (p, If pi t def E (Yi, t Zi), the true lag one conditionalmean, the assumption of a common first-lag autocorrelation implies thatChapter 3. Analysis Based on a Working Likelihood Approach^ 25Pit =^POri,t-1^= Ew (Yi,t I Yi,t—i, Zi). The working log-likelihood can thus be writtenni/AO) = E [yi , 1 4,3 + log(1 — 70+ E[yi, t log pit + (1 — yi, t )log(1 — NA],i=1^ t=2where logit ri = 40 as before, and pit =^p(yi,t-i - 70•For this working model, let S0(0) be the score function for p, and let Su (0) be the scorefunction for i3„ (u = 1, s). The maximum likelihood equations under the working model areK ni^0 = So(9) = ^= E E (yi,t -^- ri) °P^2=1 t=2^PitPit,Q7^Ziori7T(Yi,t — Pit)},0 = Su (0) =^= E{Ziu(Yi,1 —^+ ( 1 P)upu^i=1 t=2^PitPitwhere 73-87 = 1 - pit , and IT = 1 — ri. Define B = (ji„di) to be the solution to these estimatingequations.Note that the expected values of the scores under the true (unknown) model are 0 providedthe lag one conditional mean structure has been correctly specified; that is, E (Yi, t Yi, t_ i , Zi) =P(Yi,t-1 — ri). This can be seen by conditioning on Yi,t_i for each Yi, t in the estimatingequations. Thus, 0 1 is an asymptotically unbiased estimate of /3.The elements of the Hessian matrix are given by:K n,2/a 21 = E E^(yit - pit) ,^2pid ,OPit )2 ,p^ j^Ii=1 t=11-PiiPit^(Pit.7)2K= E —ri3f7ZiuZivi=1[pit1pit + ((Ypitit—ivP)i2t)(1 2piol ( aPoiu.t )( aPiLt )1+ n^(Yi,t — Pit) ( 02 Pit )t=2 l PitPiti0/3,,0/3vand021 1000.{ \ n21).tK ni^(K t — Pit)( " 1-2 )iE=1 tE=1^PitTi 0P0/3u [ 1 + (Yit Pit) (1^2Pit )1 ( 0/3u )( °Pit )}LPitPit^(PitPit 9 ^OPChapter 3. Analysis Based on a Working Likelihood Approach^ 26where-= Yi,t-1 —op°Pi0ut = (1 — p)riTr7Ziu,4982Pit = (1 — p)(1 — 271-i)ri3t7ZiuZiv,afivaot,and,02pit^ = ri7r7ziu.apafi.The assumption that, under the true likelihood, a common lag one autocorrelation is sharedby all subjects, that is, that the lag one conditional mean structure has been correctly specified,yields E(Yi,t Zi) = 7ri P(Yi,t-i - ri), without any further assumptions concerning thehigher lag autocorrelations. Under this assumption E(Yi,t^Zi) = Ew(Yi,tand it follows that E H(9) = Ew H(0); in other words, under the assumption that the lag oneconditional mean structure has been correctly specified, the working likelihood Fisher informa-tion matrix and —E H(0) are identical.The Fisher information matrix under the first-order working model, / 1 (B), can be partitionedasI1=[ Ipp Ina10P 100where the (u, v)th element of 100 isE ziuzivri7Tfi + (ni — 1)(1 Wilri(1 3P) + P },(1 — p)2 71-07-7 pthe uth element of /op is(2r; — 1) P E(ni — (1 — p)27rjr7 p,°PitChapter 3. Analysis Based on a Working Likelihood Approach^ 27and, \ K^ri 7riInv^( 1 P)(n^1)(2^(1 — P)2 roTT p )As K^oo, we obtain the result that B is consistent and asymptotically normal for any setof stationary binary time series such that logit E(Yi,t j Zi) = 4 13 and corr(li,t, 17i,t--1) = p, asstated by Zeger, Liang, and Self (1985) in their:PROPOSITION 2.2.^Let Yi,t (t = 1,^< oo; i = 1, K) be stationary binary seriessuch that logit E(Yi, t Zi) = Zp and corr(li,t,li,t-1) = p. Let 0 (p, 0). Then 0 — 0is asymptotically multivariate Gaussian with expectation 0 and covariance matrix V 1 (0) =Ii (0) -1 I1(0)I1 (0) -1 , where 11 (0) is the Fisher information matrix under the first-order workingmodel and IV O) = E S(0)S'(0).A consistent estimate of h (0), the Fisher information matrix under the first-order workingmodel, is given byA consistent estimate of /17(0), the variance-covariance matrix of the score function, is providedby_ E si(o)s:(o),i=iwhere Si(• is the score function for the ith subject. Combining these results leads to a consistentestimate of V1 (0) given byV1 =I Ii I11 -1^-1A technical point which might arise during the Newton-Raphson iteration to obtain 0 isthat, depending on the values of the ri's, it may be necessary to constrain p to a smaller regionChapter 3. Analysis Based on a Working Likelihood Approach^ 28than (-1, 1) in order to prevent the pit = Ew (Yi,t I^Zi) = ai +^— ir) = 1, K)from taking on values outside the range [0, 1].Since the covariates, Zi, are time independent, the success probability ri = Pr(Yi, t = 1 Zi),does not depend on time. Zeger, Liang, and Self (1985) were concerned with short binary timeseries, in which there is no need for time-dependent success probabilities. However, the binarytime series for the first feeding forays of the hummingbirds are long (180 time points), and thesuccess probabilities seem to increase with time. Figures 3.1, 3.2, and 3.3 plot the transformedaverage response for birds within a treatment group over consecutive blocks of 3, 5, and 10 timepoints respectively. Note that, except for the vertical scale, Figure 3.3, which plots the averageresponse in the logit scale, and Figure 2.2, which plots the average response in the arcsinesquare root scale used for the classical analysis, are identical. All the plots clearly indicatethat success probabilities generally increase with time. In the next section, the results of thissection are extended to the case of time-dependent covariates to allow the modelling of successprobabilities which change with time.3.3 Incorporating Time-dependent Covariates3.3.1 The Independence Working ModelSuppose we start with the working assumption that repeated observations for a subject areindependent, but now let Zi, t be an sxl vector of time-dependent covariates for the i th subject(i = 1, .., K). If ri,t Pr(Yi,t = 1 I Zi, t), it is assumed throughout this section thatChapter 3. Analysis Based on a Working Likelihood Approach^ 29logit 7ri t = 440. The estimator )40 is obtained from the working log-likelihoodK nilo(Q) = E E [yiA,03 + log(1 — ri,t)]i=1where logit 7ri , t = 4413.Let Su ( i3) be the score function for O u (u = 1, ..,^) under this working likelihood. Then themaximum likelihood equations under the working model are, m^K ni0 = Su (13) = o = E E(yi t —i=1 t.iDefine go to be the solution to these estimating equations.Note that the expected values of the scores under the true (unknown) model are 0 provided^the mean structure has been correctly specified; that is, 7r,, t^E(Yi,t Zi,t) = [1 + exp(—Z2/3)] -1 ,or logit E(Yi,t I Zi,t) = 4, t ,(3. Therefore, go is asymptotically unbiased for 0.The (u, v)th element of Ho (/3), the Hessian matrix for this independence working model, isgiven byK n,— E E^—i=1 t=1which does not depend upon the Yi, t 's. Therefore, the expected Hessian is the same underboth the true likelihood and the working likelihood; in other words, the working likelihoodFisher information matrix is the same as — E Ho (/3) in this case. Under regularity conditionssufficient to guarantee the consistency of go , the following generalization of Zeger, Liang, andSelf's (1985) Proposition 2.1 is immediate:PROPOSITION 1. Let Yi, t (t = 1 , ..,ni < oo; i = 1,..,K) be binary series such thatlogitE(Yo I Zi,t) = 4,0. Then &— is asymptotically multivariate Gaussian with expectationChapter 3. Analysis Based on a Working Likelihood Approach^ 300 and covariance matrixvo(0) = [E H0(0)] -1 1O(0) [E 110(0)] -1 ,where I8`(0)= E S(0)SV).A consistent estimate of E HO), the expectation of the Hessian under the true (unknown)model, is given by Ho (00 ). A consistent estimate of /8`(0), the variance-covariance matrix ofthe score function, is provided byKEsi(go)sago),i=iwhere Si(.) is the score function for the ith subject. Combining these results leads to a consistentestimate of Vo(0) given byVo = [Ho(go)] - 1 Io [Ho(g0)] -1 •3.3.2 The First-Order Working ModelThe first-order working model with time-dependent covariates assumes each binary series hasa stationary 1-dependent correlation structure with logit 7ri , t = Zii,t0 andcorr(170,Yi,t-i.) = P. If Pit 41 E (Yi,t I Yi,t-i, Zi), the assumption of a common lag oneautocorrelation implies thatwherePit^p71i,t7ri,t ^ (yi,t-i -70-1 71-0-1lri ,t = ( 1 — ri,t)•Chapter 3. Analysis Based on a Working Likelihood Approach^ 31Let 0 = (p,/3). Then the working log-likelihood isni/1( 0) = E [yoz:,io + log(1 —^E[yi,tiog Pit + (1 — yi,t)log(1— pit)]]t=2wherei ,t Pit = 711 ,t^P — 7ri,t-1)•7ri,t—i 7ri,t—iFor this working model, let So(0) be the score function for p, and let S.(0) be the scorefunction for ou . The maximum likelihood equations under the working model areallK= So(0) =^= EE (Yi ,t - Pit) °PitPitPit^OPL'P^i=1 t=2andall K^niv,na = —^E ( Yi'ti=1 t=2 PitF Pit ) 'Pito = su(e ) =uP.^ ift 0,314where pi t = 1 - pit . The partial derivatives appearing in these equations are given by:t^Ori,t7ri,t^—afitt7ri,tiri,t— 7ri,t-1] [(1 — 27ri,t)Zi,t,u — (1 — 2 7ri,t—i)Zi,t—i,u],°pitOp^7ri,t-1lri,t-17ri,t7ri,t [Yi,t—i — ri,t-1] •Define 0 =^A.) to be the solution to these estimating equations.Note that the expected values of the scores under the true (unknown) model are 0 providedthe lag one conditional mean structure has been correctly specified; that is, E(Yi,t I^Zi,t) =7r • tri t7r^p^t-1 — ir i ,t). This can be seen by conditioning on Yi,t_i for each 11 4 in2,tthe estimating equations. Therefore, B is asymptotically unbiased for 0.The Hessian matrix for the first-order working likelihood, H1 (0), can be partitioned as+22andChapter 3. Analysis Based on a Working Likelihood Approach^ 32O ],H1(9) = HppP[ HapII 00where the (u, v)th element of Ho is02 1100.00vKi=1ni1 ^-^\i0Pit 0Pit^(yi,t — Pit) 0 2 Pit I+ E [{PitPit--=+^(yi,t (PitPit)2Pit) f — LPit)t=2^aou oov^Pit ^a Oua Ovthe uth element of Hop is4921 1i1^(Yi,t P 2t) (13 uan —i=1 t=2 [ [pit.7 it-(prct)and^2p )] °pit °pit^(yi,t - pit) 02Pit 00u OP ( 0 Oua P )IIPP02 /1^K ni+ (Yi t Pit)^aPit ) 2^(Yi,t — Pit) 02 Pit10132 = EE^_^„ (1 2pit )] pOW 0,92i=1 t=2 [PitPit^kiittPitr OpThe partial derivatives appearing in these expressions are given by:= (1 -— 2^' tri" t-i ri,t -1 (1 — 27i,t)1r ,t 1ri,t ^ [Yi,t-1+ P2whereait^.5[(1 — 27ri,t)Zi,t, u — (1 —^— 27ri,t)Zi, t ,„ — (1 —— 2 [7ri,tri,tZi,t,uZi,t 7v —^Zi,t-LuZi,t-1,v] ,02,"^.5aduaPr—^27ri,t)Zi,t,u — (1 — 27i,t-i)Zi,t-1 04.] —Chapter 3. Analysis Based on a Working Likelihood Approach^ 33andO2Pitop2^•Under regularity conditions sufficient to guarantee the consistency of 0, the following gen-eralization of Zeger, Liang, and Self's (1985) Proposition 2.2 is immediate:PROPOSITION 2.^Let Yi,t (t = 1, ..,ni < oo; i = 1,..,K) be binary series such thatlogit E(Yi,t I Zi,t) = 440 and corr(Yi,t,Yi,t—i) = p. Let 0 = (1)0(3). Then 0 - 0 is asymp-totically multivariate Gaussian with expectation 0 and covariance matrixV1 (0) = [E H1(0)] -1^111(9)]where Ip(0) = E S(0),5"(0).A consistent estimate of E H1 (0), the expectation of the Hessian under the true (unknown)model, is given by H1 (0). A consistent estimate of IV O), the variance-covariance matrix of thescore function, is given byIi = E Si(0)s:(0),i=iwhere S1(.) is the score function for the ith subject. Combining these results leads to a consistentestimate of V1 (0) given byVi =^( 0 )]' Ii [H1A -1 .Because this extension to the Zeger, Liang, and Self approach incorporates time dependentcovariates, it provides the necessary tools for the analysis of the binary hummingbird data.The extension presented in this section permits the success probabilities to be modelled asa function of time, while taking into account the correlation between measurements on eachChapter 3. Analysis Based on a Working Likelihood Approach^ 34bird, and thereby allows comparisons of temporal response patterns across treatments which aremore direct than the comparisons achieved by either univariate analysis-of-variance for repeatedmeasures or MANOVA.3.4 Exploring the Binary Hummingbird DataPrior to applying the extension of the methods of Zeger, Liang, and Self (1985) to the analysisof the binary hummingbird data, a series of plots was examined to determine appropriateparametric forms to be incorporated into modelling of the temporal response patterns.To check if a day effect was apparent, the average treatment responses on the logit scale(log( 1 1—Y-4--cFc ), c = .5) for successive blocks of 3, 5, and 10 trials were plotted (see Figures 3.1-3.3). Because there were no systematic discontinuities at those trials corresponding to newdays (trials 61 and 121), there seemed to be no need to model a day effect. To get an overallpicture of how treatment responses changed over time, the averages over the 6 birds withineach treatment were calculated separately at each time point, and then the logits of the resultswere smoothed using robust locally weighted linear least squares ("lowess").Robust locally weighted linear regression is a method for smoothing a scatterplot, (x i , yi),i = 1, .., n,, in which the fitted value at xk is the value of a linear fit to the data using weightedleast squares, where the weight for (xi, yi) is large if xi is close to xk and small if it is not."Lowess" centres a window at xk which includes the r nearest neighbours of xk on either side.The 2r + 1 points in this window are weighted symmetrically about xk. If xk is near theendpoints, however, "lowess" cannot centre the window there. Points close to the endpoint,including xk, get larger weights than they would if the window was centred at xk. If the dataChapter 3. Analysis Based on a Working Likelihood Approach^ 35has extreme values near the endpoints, the resulting fit for xk may not be consistent with fitsfor points at which the window can be centred. Consequently, "lowess" fits near the endpointsmay have peculiar features.Figures 3.4-3.6 show the results of this "lowess" smoothing with windows of 10, 20, and30 points. In these figures, the fits near the endpoints do not seem peculiar because logits oftreatment average responses (over 6 birds) have been smoothed. This averaging reduces thenumber of extreme values in the data and protects against extreme values near the endpoints.Figure 3.6 seems to best display the basic features of the change in transformed responses overtime. The average responses for birds in treatments 2 (PT) and 3 (FT) appear to grow roughlylinearly with time in the logit scale, with the responses for birds in treatment 3 (FT) beinghigher, in general, than the responses for birds in treatments 1 (NT) and 2 (PT). This is contraryto the expectations of the investigator, who used psychological principles of visual perception topredict that birds in the treatment groups with discontinuous (PT) and continous (FT) visibleguides should have similar learning trajectories. For birds in treatment 1, improvement in theaverage response appears to be either linear or quadratic. The responses for birds in treatment1 seem to be higher, in general, than the responses for birds in treatment 2, but not as high asthose for birds in treatment 3. Of all the birds, those in treatment 3 seem to have the highestresponses throughout the entire experiment.To study individual-to-individual variation within each treatment, the logits of individualresults (log(-1 ), c = .5) at each time point were also smoothed by "lowess", using a windowof 30 consecutive time points. Figures 3.7-3.9 display the results of this smoothing. Step-likepatterns associated with sudden awareness of the relationship between light cue and feederChapter 3. Analysis Based on a Working Likelihood Approach^ 36were expected by the experimenter, but are not present in these plots. In all the figures,the peculiarities of "lowess" fits near the endpoints occur because individual logits, ratherthan logits of treatment averages are smoothed. The individual logits are more likely to haveextreme values at the endpoints, leading to fits which may not accurately describe the patternsin the data of each individual. In Figure 3.7, note that two females in treatment 1 share thesame "lowess" fits given by the horizontal solid line. This is because both birds have binaryseries which are virtually all zeroes; in other words, almost all of their first feeding forays wereunsuccessful. In addition, a third female has a binary series which is predominantly zero upto the 110th trial. In the interior of the data sets, the plots do not show any obvious changein response variability over time for any of the treatments. Moreover, at any given time, thevariability of responses for birds in treatment 1 (NT) appears to be larger than the variabilityof responses for birds in treatments 3 (FT). At any given time, the variability of responses forbirds in treatment 2 (PT) appears to be the smallest of all three treatments.These plots of the "lowess"-smoothed binary responses for each bird do not suggest anobvious difference between the responses of females and males in treatment 2, but in treatments1 and 3, the male responses seem to be higher, on average, than the female responses. In fact,when designing the experiment, the investigator did not anticipate a difference in the successprobabilities for the first feeding foray of male and female hummingbirds. However, sincesex may be incorporated into the model with little extra effort, two analyses of the binaryhummingbird data are presented in the following sections; the first analysis does not includesex as a covariate, while the second does. The first analysis corresponds to the classical analysis(univariate analysis-of-variance for repeated measures) discussed in the previous section, whereChapter 3. Analysis Based on a Working Likelihood Approach^ 37sex was ignored. Note that the unequal number of males and females on one of the treatmentsin the experiment would complicate including sex as a covariate in the classical analysis.For the binary hummingbird data on the first feeding foray, both the analysis excludingsex and the analysis including sex will be repeated under the independence and the first-orderworking model. The independence working model assumes that observations within individu-als, as well as between individuals, are independent. The first-order working model, assumesthat the observations for all individuals share a common first lag autocorrelation.3.5 Model Fitting With Sex Excluded3.5.1 The Initial ModelFor the analysis of the binary data which excludes sex as a covariate, the exploratory plotssuggest that the transformed probability of success on the first feeding foray may be modelled,initially, aslogit ri,t =^+ 01,it 02,it 2^i = 1,..,3, T = 1, ..,180, t = s.d.(T)where p i (i = 1, 2, 3) may be interpreted as the main effect of the i th treatment. Since birds indifferent treatment groups are not restricted to have the same linear and quadratic coefficientsfor the pattern over time, interactions between time and treatment group are permitted by thismodel. Note that the time covariate has been standardized to orthogonalize the highly collinearlinear and quadratic time covariates.T — TChapter 3. Analysis Based on a Working Likelihood Approach^ 383.5.2 An Informal Check of the Working LikelihoodsWhen calculating the standard errors of the parameter estimates, we must take into accountthat the working likelihood may not be the same as the true likelihood. Section 3.3 providesconsistent estimators of the covariance matrix for the parameter estimates from which "robust"standard error estimates can be obtained. Because these "robust" standard error estimatesaccount for the fact that the working likelihood may be incorrectly specified, we expect themto be different from the "naive" standard error estimates calculated assuming that the workinglikelihood correctly specifies the true likelihood. If the working likelihood is correctly specified,the robust standard error estimates should be the same as the naive standard error estimatesas the number of subjects becomes large. Thus, an informal method for checking how well theworking likelihood approximates the true likelihood (when the number of subjects is sufficientlylarge) might compare the naive and robust standard error estimates or the corresponding naiveand robust z-scores. Differences between robust and naive standard error estimates couldresult from two possibilities: (i) although the working likelihood adequately described the truelikelihood, the number of subjects was not large enough for accurate estimation of the truestandard errors; or, (ii) the working likelihood was not adequate for the data.Table 3.1 presents the naive and robust z-scores resulting from the initial model fits for boththe independence and first-order working likelihoods. Notice that the naive z-scores for theindependence fit are consistently larger, and typically substantially so, than the correspondingrobust z-scores. If we assume that the robust z-scores reflect the associations between theoutcome variable and the covariates accurately, we might conclude that the naive z-scores forthe independence working likelihood tend to overstate the association, a well-known resultChapter 3. Analysis Based on a Working Likelihood Approach^ 39when the observations on a subject are positively correlated. More efficient parameter andstandard error estimates might then be expected from the first-order working likelihood whichmodels some correlation. Although substantial differences still exist for many parameters, theagreement between the naive and robust z-scores appears slightly improved for the first-orderworking likelihood. This suggests the first-order working likelihood may better reflect the actuallikelihood than the independence working likelihood; a gain in efficiency of estimation wouldthen result from using the first-order working likelihood instead of the independence workinglikelihood. This conclusion is somewhat tenuous since the assumption that the robust z-scoresreflect the associations between the outcome variables and the covariates accurately may notbe reasonable with only 18 birds in the experiment. Qualitatively, the conclusions obtainedfrom the two working likelihoods are identical: modelling of the treatment main effects and thelinear terms seems worthwhile, while modelling of the quadratic terms for treatments 2 and 3may be unnecessary.Table 3.1. Comparison of Z-scores from the Initial ModelIndependence First-Orderparameter estimate robust z-score naive z-score estimate robust z-score naive z-scorePi -.31 -.97 -3.35 -.32 -1.00 -2.95P2 -.71 -3.41 -7.23 -.68 -3.68 -6.32/13 .16 .42 1.72 .17 .42 1.5401,1 .27 2.31 4.14 .26 2.37 3.51/31,2 .50 3.03 7.27 .48 3.31 6.4231,3 .60 2.70 9.13 .62 2.54 7.92/32,1 -.21 -2.37 -2.82 -.20 -2.23 -2.41/32,2 .04 .37 .47 .03 .35 .41/32,3 -.09 -.68 -1.29 -.09 -.66 -1.02Chapter 3. Analysis Based on a Working Likelihood Approach^ 403.6 Model Reductions With Sex Excluded3.6.1 Strategy for Model ReductionsThe primary aim of this analysis is to understand the differences in hummingbird learningacross treatment groups. Because birds in different treatments may learn in different ways, theinitial model for the success probabilities allows all possible interactions of time and treatment.The purpose of subsequent model reductions is to obtain a parsimonious final model whichreliably addresses the investigator's questions about the way hummingbird learning differs acrosstreatments. To focus the analysis on treatment differences, the reduction procedure shouldeliminate unnecessary terms for negligible interactions between treatment and time first, leavingthe examination of treatment main effects for the final steps. The elimination of interactionterms reduces the complexity of the model to be used for interpretation; in particular, a finalmodel with no interactions between treatment and time permits differences in learning patternsfor birds in different treatment groups to be easily described by treatment main effects only.One such reduction procedure for the binary hummingbird data (with sex excluded as acovariate) is outlined in the following steps:1. Fit the initial model.2. Check the quadradic terms.• Can they be set equal?• If so, can this common term be eliminated?3. Check the linear terms.Chapter 3. Analysis Based on a Working Likelihood Approach^ 41• Can they be set equal?• If so, can this common term be eliminated?4. Examine the treatment main effects.This procedure is not meant to be followed rigidly. Instead, it is meant as a rough guidelinefor model reductions, to be used in conjunction with existing scientific theory and exploratoryplots of the data.3.6.2 Description of the ReductionsUsing the strategy outlined above, two plausible model reduction paths arise for both theindependence and first-order working likelihoods. In this section, the two models are compared,and for each working likelihood, the model most consistent with patterns in the data suggestedby the exploratory plots is selected to represent the experimental results.To examine how the results of the analysis depend on the choice of the working likelihood,these final models are then compared across working likelihoods. Final models with similarparameter and standard error estimates for different working likelihoods suggest that the effi-ciency of the two working likelihoods is about the same. Note that with only 18 birds in theexperiment, the number of subjects may not be sufficiently large to ensure the adequacy of theasymptotic multivariate normal approximation to the distribution of the parameter estimates.The Wald statistics used to test the model reductions are based on this approximation.The first model reduction path is described by the following steps:1. Check the quadratic terms.Chapter 3. Analysis Based on a Working Likelihood Approach^ 42• Can they be set equal? Yes; the p-values for this reduction are .17 and .21 for theindependence and first-order working likelihoods, respectively.• Can the common quadratic term be eliminated? Yes; the p-values for this reductionare .17 and .19 for the independence and first-order working likelihoods, respectively.(P-values are calculated using the reduced model from the previous step.)2. Check the linear terms.• Can they be set equal? Yes; the p-values for this reduction are .29 and .26 for theindependence and first-order working likelihoods, respectively.• Can this common linear term be eliminated? No; the p-value for this reduction isless than .001 using both the independence and the first-order working likelihoods.3. Examine the treatment main effects.• Can they be set equal? Not entirely clear; the p-value for this reduction is .14 usingboth the independence and first-order working likelihoods.This first plausible reduction path leads to models with coefficients and standard errorestimates given in Table 3.2. Note that the predicted success probabilities resulting from thefinal models are virtually identical. Figure 3.10 displays the final model fit for the independenceworking likelihood and suggests the success probabilities for birds in treatments 1 and 2 are verysimilar. Since corr(iii,iii) :::-., 0, the standard errors in Table 3.2 indicate that the model couldbe further reduced by taking main effects for treatments 1 and 2 to be the same. In fact, thep-values for the reduction are .69 and .71, with the common fitted values for p i and 1/2 beingChapter 3. Analysis Based on a Working Likelihood Approach^ 43—.60 (s.e.=.18) and —.58 (s.e.=.17), for the independence and first-order working likelihoods,respectively.Table 3.2. Final Model Fits from the First Reduction PathIndependence^First-OrderParameters^estimate s.e.^estimate s.e.PPi.P2P3/31,1 = /31,2 ==/31,3/32,1 = /32,2 = /32,3^.17 ^.03-.53^.31^-.52^.30-.67^.17^-.65^.15.07^.33^.08^.34.45^.11^.45^.10The estimated lag one autocorrelation and standard error provided by the first-order work-ing likelihood suggest the presence of (positive) correlation in the data of each bird. On theother hand, the first-order and independence working likelihoods result in virtually the sameestimates of the regression parameters and their standard errors. Both final models indicatethe transformed response depends only linearly on time, with a common slope for all threetreatment groups; no terms for interaction of treatment and time are included. This lack of in-teraction in the fitted model allows for a simple interpretation of treatment differences throughexamination of treatment main effects only. For example, Figure 3.10 suggests the successprobabilities of birds in treatment 1 are about the same as the success probabilities of birds intreatment 2, while birds in treatment 3 have higher success probabilities than birds in eithertreatment 1 or treatment 2 (about .67 units greater in the logit scale at any given time).These final models describe the success probabilities of birds in treatment 1 by a lineartrajectory. This seems to disagree with the exploratory lowess plot of the transformed outcomesin Figure 3.6, which suggests the success probabilities of birds in treatment 1 could be betterChapter 3. Analysis Based on a Working Likelihood Approach^ 44described by a quadratic trajectory. Figure 3.6 also suggests birds in treatment 2 start out withlower success probabilities than birds in treatment 1, but develop higher success probabilitiesby the end of the experiment.A possible explanation for the discrepancy between the lowess plot and the first final modelemerges upon examining the path of model reductions leading to the first final model. Althoughthe p-values for setting the quadratic terms for all three treatments equal are .17 and .21 forthe independence and first-order working likelihoods respectively, and the p-values for settingthis common quadratic term to be equal to zero are .17 and .19 for the independence andfirst-order working likelihoods, respectively, the p-values for proceeding directly from the fullmodel to a model with no quadratic terms are a marginal .12 and .14 for the independence andfirst-order working likelihoods, respectively. This suggests the final model may be neglectingsome curvature present in the data. In fact, z-scores for the quadratic term in treatment 1 (-2.37 and -2.23 for the independence and first-order working likelihoods, respectively; see Table3.1) lend support to what is suggested by Figure 3.6: much of the curvature present in the datacomes from the learning trajectory of birds in treatment 1.Table 3.3 displays the coefficient and standard error estimates resulting from an alternatereduction path which allows curvature in the fitted learning trajectory of birds in treatment 1.This second reduction path is described by the following steps:1. Check the quadratic terms.• Can they be eliminated? No; the p-values for this reduction are a marginal .12 and.14 for the independence and first-order working likelihoods, respectively.Chapter 3. Analysis Based on a Working Likelihood Approach^ 452. Check the linear terms.• Can they be set equal? Yes; the p-values for this reduction are .32 and .27 for theindependence and first-order working likelihoods, respectively.• Can the common linear term be eliminated? No; the p-values for this reduction areless than .001 for both the independence and first-order working likelihoods.3. Examine the treatment main effects.• Different quadratic terms for different treatment groups are still present in the model,so checking whether the treatment main effects are equal corresponds to checkingwhether the success probabilities for all three treatment groups are equal at t = 0,or at trial 90.5. The p-values for this reduction are .12 and .13 for the independenceand first-order working likelihoods, respectively, indicating that this reduction maynot be justified. Note that even if this reduction was justified, it may be of littlepractical interest because it does not simplify interpretation of the final model.4. For interpretability of the final model...• Eliminate the quadratic terms for treatments 2 and 3. The p-values for this reductionare .62 and .74 for the independence and first-order working likelihoods, respectively.These p-values justify what is suggested by the lowess plot: the success probabilitiesof birds in treatments 2 and 3 can be adequately described by a linear trajectory.Chapter 3. Analysis Based on a Working Likelihood Approach^ 46Table 3.3. Final Model Fits from the Second Reduction PathParametersIndependence First-Orderestimate s.e. estimate s.e.p .17 .03-.31 .32 -.52 .30/2 2 -.67 .17 -.65 .15.07 .33 .08 .3401,1 = 01,1 = 03,1 .46 .11 .45 .1002,1 = 02,2 = 02,3 -.23 .10 -.22 .10As was the case for the first reduction path, the estimated first lag autocorrelation and stan-dard error from the first-order working likelihood suggest the presence of (positive) correlationin the data of each bird, but the predicted success probabilities resulting from the independenceand first-order fits are virtually identical. Figure 3.11 plots the final model for the second re-duction path using the independence working likelihood. Comparison of Figures 3.10 and 3.11suggests the only noticeable difference between the final models for the two reduction paths isthe fit for the learning trajectory of birds in treatment 1. Because the final model from thesecond reduction path allows curvature in the fitted learning trajectory of treatment 1 and thefinal model from the first reduction path does not, the former appears to be more consistentwith patterns displayed by the exploratory lowess plots, and may thus be considered the morereliable summary of the experimental results.Note that the final fits for birds in treatment 1 resulting from the two different reductionpaths do not differ greatly in the probability scale except at the very beginning and the very endof the experiment; see Figure 3.12. The largest discrepancies in the fitted success probabilitiesoccur at the end of the experiment, with the maximum difference in fitted probabilities on thelast trial. Here the linear model fits a success probability of .56 (with a standard error of aboutChapter 3. Analysis Based on a Working Likelihood Approach^ 47.09) and the quadratic model fits a probability of .45 (with a standard error of about .10).3.7 Residual Analysis for the Final Model with Sex ExcludedAnalysis is based on the Pearson residuals (given by (yi t — fit )/0"--;st (1 — ii-70) from both the in-dependence and first-order fits. To determine whether the independence working assumption isreasonable (recall that the independence working assumption isn't necessary for asymptoticallyvalid results), the independence residuals are examined to see if they are approximately uncor-related with mean 0 and variance 1. After analyzing the independence residuals, the first-orderresiduals are examined to determine whether the assumption of a common lag one autocorrela-tion for all birds (necessary for asymptotically valid results when using the first-order workinglikelihood) is reasonable.3.7.1 Analysis of the Independence ResidualsUnder the independence working assumption, the 180 Pearson residuals for each bird should beapproximately uncorrelated, with mean 0 and variance 1. Figure 3.13 displays scatterplots ofthe Pearson residuals for each bird and suggests the distributions of residuals for all birds havemean 0 and variance 1, with those for birds in treatments 1 and 2 being positively skewed, andthose for birds in treatment 3 being more symmetric. This is explained by the relative numberof O's and l's in each treatment. Only 38% of treatment 1 responses (407/1080) and 35% oftreatment 2 responses (374/1080) are l's, whereas 52% of the responses in treatment 3 are l's(558/1080). The final model therefore fits the "1" responses in treatment 3 as well as the "0"responses, but for treatments 1 and 2, fits "0" responses better than "1" responses since theChapter 3. Analysis Based on a Working Likelihood Approach^ 48majority of responses in these treatments is 0. Note that the gaps around zero in Figure 3.13arise because the responses are either 0 or 1, and the denominator of the Pearson residuals(Vflt (1 — CO) is essentially constant.To check if Pearson residuals from the independence final model were approximately uncorre-lated within birds, estimates of the first, second, and third lag Pearson residual autocorrelationswere calculated for each bird (see Table 3.4). Under the independence working assumption, thestandard error of each of these autocorrelation estimates is about 1/ 180 = .075. From Table3.4, we see that 4 out of 18, or 2/9 of the first lag autocorrelations appear significantly differentfrom zero at level a = .05 (birds 5, 6, 14, and 15 have rh's greater than .15). Because birdsare independent, we would expect this to happen for only 1 out of every 20 estimated first lagautocorrelations. Therefore, the assumption that observations within birds are independentseems questionable for these data. It is interesting to note that when the three autocorrelationsare compared across birds, birds 14 and 15 (both in treatment 3) appear to have higher first,second, and third lag autocorrelations than the others.Checking the adequacy of the final model is limited by the fact that the only explanatoryvariables modelled are treatment group and time. The patterns of Pearson residuals acrosstreatment groups can be seen in Figure 3.13 and have already been explained by the relativenumber of 0's and l's in each treatment group. Note that the Pearson residuals for the i thtreatment group can be characterized by two functions of time - one (essentially the negativefit, or —1-7-t ) for those responses which are 0; the other (essentially the negative fit plus one, or1 — fii) for those responses which are 1. Thus, plotting residuals against time is not a usefulcheck of model adequacy. Another variable, excluded as a covariate in the model because itsChapter 3. Analysis Based on a Working Likelihood Approach^ 49role in hummingbird learning was thought to be minimal, is the sex of each bird. Figure 3.14,a boxplot of the Pearson residuals by sex, suggests sex should be included in the model. Theresidual distribution for males appears to be positively skewed, whereas for females it appearsto be negatively skewed. The skewness of the distributions is a consequence of the femaleshaving a smaller proportion (607/1800 or 34%) of successful first feeding forays (response=1)than the males (732/1440 or 51%).Table 3.4. Estimated Autocorrelations for the Independence ResidualsBird (Treatment 1)1^2^3^4^5^6.07 -.02 .000 -.06^.16 .16.11 -.09^.09^-.05 .02 .02.08 -.09^.03^-.04 .10 .08Bird (Treatment 2)7^8^9^10^11^12P1 - .005 .13 -.09 -.07^.10^.02P2^-.05^.07^.13 -.09^.03^.04/53 -.009 .10 -.07^.10^-.06 .00213Bird (Treatment 3)14^15^16^17 18/51 .01 .24 .34 -.07 .02 .10P2 .14 .16 .28 .03 .15 .07143 .02 .28 .37 -.006 -.01 .123.7.2 Analysis of the First-Order ResidualsFor the first-order working model, let r2 denote the estimate of the l th subject's lag 1 autocorre-lation obtained by regressing the Pearson residuals for that subject on their predecessors. Thevariance of ri, approximated by 1/ni(1 - lq), where ni and pi are, respectively, the numberP1P2P3Chapter 3. Analysis Based on a Working Likelihood Approach^ 50of time points and the lag one autocorrelation for the i th subject, can be stabilized by tak-ing the inverse hyperbolic sine of ri. If the number of time-points for each subject is large,sinh -1 (ri) is approximately normally distributed with mean sinh- l(pi) and variance l/ni.The standard error of all the transformed individual first-lag autocorrelation estimates is thusapproximately 1/ 180 = .075, allowing a check on the assumption of a common first lag au-tocorrelation for all subjects. Testing the values in Table 3.5 for homogeneity (average= .16,sample variance= .021), leads to a chi-squared statistic of 65.2 on 17 degrees of freedom; thedata thus provide quite convincing evidence that a common lag one autocorrelation is notshared by all subjects.Table 3.5. Estimated Lag One Autocorrelations for the First-Order ResidualsBird (Treatment 1)1 2 3 4 5 6ri .16 .04 .27 .04 .33 .19sinh- l(ri) .16 .04 .27 .04 .33 .197Bird (Treatment 2)8^9^10^11 12Ti -.005 .13 .06 -.07 .17 .03sinh -1 (ri) -.005 .13 .06 -.07 .17 .03Bird (Treatment 3)13 14 15 16 17 18ri .04 .27 .47 .15 .45 .15sinh -1 (ri) .04 .27 .45 .15 .44 .15The assumption of a common lag one autocorrelation is critical to deriving the asymptoticresults for the first-order working model, so the fact that this assumption appears to be violatedcasts doubt on the validity of such an analysis. However, except for birds 3, 5, 14, 15 and 17,the estimated first-lag autocorrelations are not particularly large. The reason for compellingChapter 3. Analysis Based on a Working Likelihood Approach^ 51evidence against homogeneity is that the standard errors of the estimates are quite small due tothe large number of responses (ni = 180) for each bird. Although the assumption of a commonlag one autocorrelation does not appear to hold for the binary responses of the 18 birds in thisexperiment, the conclusions drawn from the analysis based on the first-order working likelihoodare virtually identical to those from the analysis based on the independence working likelihood.Therefore, the conclusions drawn from the first-order analysis appear to be reasonable.The Pearson residuals obtained from the first-order working likelihood are virtually thesame as those obtained by the independence working likelihood because the final models arealmost identical. The plot of the first-order residuals against sex has the same features as theplot of the independence residuals displayed in Figure 3.14. Hence, analysis of both types ofPearson residuals indicates sex is an important covariate.3.8 Summary of Analysis with Sex ExcludedThe models obtained by the independence and first-order working likelihoods are virtuallyidentical. For both working likelihoods, the final model corresponding to the second reductionpath has been selected to represent the experimental results. This final model indicates anincrease in success probabilities over successive trials for treatments 2 (PT) and 3 (FT), andan increase, peaking at about trial 110 (near the end of the second day), followed by a declinein success probability for treatment 1 (NT).The final model uses a quadratic trend in the logit scale to describe the pattern of improve-ment in treatment 1, and a linear trend to describe the patterns of improvement in treatments2 and 3. Its main features are summarized as follows:Chapter 3. Analysis Based on a Working Likelihood Approach^ 52(i) birds in treatment 3 (FT) have greater fitted success probabilities than birds in treatments1 (NT) and 2 (PT) throughout the entire experiment;(ii) the fitted rate of improvement for birds in treatment 2 (PT) is the same as the fitted rateof improvement for birds in treatment 3 (FT); and(iii) although birds in treatment 2 (PT) have lower fitted success probabilities than birds intreatment 1 (NT) for the intermediate trials of the experiment (trials 30-120), sometimeduring the course of the third day of the experiment (at about the 150t h trial), theydevelop greater fitted success probabilities than birds in treatment 1.Note that different working assumptions lead to virtually identical fits. This is becauseboth approaches give similar parameter and standard error estimates, suggesting the differencein efficiency between the two approaches is minimal. The estimated lag one auto-correlationand standard error provided by the first-order working likelihood seem to indicate the presenceof positive correlation in the data (see Tables 3.2 and 3.3), suggesting the first-order param-eter estimates and standard errors may be more efficient. On the other hand, the first-orderPearson residuals suggest a common lag one autocorrelation may not be shared by all subjects.Because this assumption is critical in deriving the asymptotic results for the first-order param-eter estimates and standard errors, the independence model fit (which does not require anyassumptions about the time-dependence within each subject's responses) may be consideredthe more reliable summary of the experimental results.Both the exploratory plots and the residual analysis suggest sex may be an importantcovariate worth including in the model. In the next section, the binary hummingbird data isChapter 3. Analysis Based on a Working Likelihood Approach^ 53analyzed with sex included as a covariate.3.9 The Initial Model With Sex IncludedThe importance of sex as a covariate is clearly indicated by Figures 3.15-3.17 which plot thelowess smoothed logits of the average response by sex for each treatment group. On average,males seem to have a higher success probability than females in all three treatments throughoutthe entire experiment. For treatments 2 (PT) and 3 (FT), both males and females appear tohave success probabilities which increase roughly linearly with time, whereas for treatment 1(NT), the male and female success probabilities appear to increase over the first two days, butto decrease over at least part of the third day. For all three treatment groups, it appears amodel incorporating an additive (on the logit scale) effect for sex, with males having highertransformed success probabilities than females, may be adequate to explain the patterns in thedata. Furthermore, sex-by-treatment interaction may also be present, since the figures suggestthat a different additive effect for sex in each treatment group may be necessary.These exploratory plots suggest the transformed probability of success on the first feedingforay may be modelled, initially, asT — Tlogit^=^sexi,s^01,1,st + 132,i,8t2^i = 1, .., 3, s = 1, 2, T = 1, .., 180, t = ^s.d.(T)where pi may be interpreted as the main effect of the ith treatment group, sex2, 1 is thefemale-to-male differential sex effect for the ith treatment group, and sex2,2 s-g 0 (for males inthe ith treatment group). The female-to-male differential sex effect is permitted to vary acrosstreatments so treatment-by-sex interaction is incorporated. Similarly, the three-way interactionof sex, treatment and time is incorporated because different linear and quadratic coefficientsChapter 3. Analysis Based on a Working Likelihood Approach^ 54for each sex and each treatment group are permitted. Again, the time covariate has beenstandardized to orthogonalize the highly collinear linear and quadratic time covariates. As inSections 3.5-3.8, the analysis is repeated under both the independence and the first-orderworking model.3.10 Model Reductions with Sex Included3.10.1 Strategy for Model ReductionsThe reduction strategy for the initial model which includes sex as a covariate is similar toits counterpart with sex ignored. Because the investigator does not anticipate complicatedsex-by-time interactions, the reduction strategy checks for the presence of such interactionsbefore it checks for the interaction of treatment group and time. In addition, quadratic termscommon to males and females within a treatment are treated differently from the correspondinglinear terms because Figure 3.15 suggests males and females in the treatment 1 have successprobabilities (for the first feeding foray) which may be described by a quadratic trend in thelogit scale. The general reduction strategy is outlined by the following steps:1. Fit the initial model.2. Check the quadradic terms.• Can the coefficients for males and females be set equal within all treatment groups?• If so, can they be elimated?3. Check the linear terms.• Can the coefficients for males and females be set equal within all treatment groups?Chapter 3. Analysis Based on a Working Likelihood Approach^ 55• If so, can these coefficients be set equal for all treatment groups?• If so, can this common term be eliminated?4. Check the differential sex effects.• Can the coefficients be set equal for all treatment groups?• If so, can this common term be eliminated?5. Examine the treatment main effects.As in Section 3.6, this procedure is meant as a rough guideline for model reductions, to be usedin conjunction with existing scientific theory and exploratory plots of the data.3.10.2 Description of the Reductions for the Independence Working LikelihoodThe model reduction path for the independence working likelihood including sex as a covariateis described by the following steps:1. Check the quadratic terms.• Can the coefficients for males and females be set equal within all treatment groups?Yes; the p-value for this reduction is .80.• Can the quadratic coefficients be eliminated? No; the p-value for the simultaneouselimination of all three quadratic coefficients is a marginal .12. Leave the quadraticterms for each treatment in the model for the time being.2. Check the linear terms.Chapter 3. Analysis Based on a Working Likelihood Approach^ 56• Can the linear coefficients for males and females be set equal within each treatmentgroup? Yes; the p-value for this reduction is .91.• Can the resulting linear coefficients be set equal for all treatment groups? Yes; thep-value for this reduction is .38.• Can this common linear term be eliminated? No; the p-value for this reduction isless than .001.3. Check the differential sex effects.• Can they be set equal for all treatment groups? Yes; the p-value for this reductionis .40.• Can this additive sex effect be eliminated? No; the p-value for this reduction is .007.4. Examine the treatment main effects.• Different quadratic terms for different treatment groups are still present in the model,so checking whether the treatment main effects are equal corresponds to checkingwhether the success probabilities for all three treatment groups are equal at t = 0,or at trial 90.5. The p-value for this reduction is .10, indicating this reduction maynot be justified. As explained in Section 3.6, a reduction like this is of little practicalinterest because it does not simplify interpretation of the final model.5. For interpretability of the final model...• Eliminate the quadratic terms for treatments 2 and 3. The p-value for this reductionis .74, justifying what is suggested by the lowess plots in Figures 3.16 and 3.17: theChapter 3. Analysis Based on a Working Likelihood Approach^ 57success probabilities of birds in treatments 2 and 3 can be adequately described bya linear trajectory.Note that quadratic terms for all three treatment groups are permitted to remain in themodel until the final step, leading to a final model which is consistent with the patterns observedin the lowess plots. The reduction path for the independence working likelihood is summarizedin Table 3.6.Table 3.6. Model Reductions for the Independence Working Likelihood with Sex IncludedParameters 1 2Model3^4 5 6Pi .36(.03) .36(.17) .36(.17) .39(.17) .18(.27) .18(.27)P2 -.59(.31) -.53(.26) -.53(.24) -.53(.24) -.36(.26) -.32(.20)P3 .53(.75) .60(.71) .60(.69) .58(.67) .54(.48) .45(.41)sex1,1 -1.03(.42) -1.02(A2) -1.03(.43) -1.06(.44) -.75(.28) -.75(.28)sex2,1 -.25(.40) -.37(.30) -.36(.30) -.36(.31) set=sexi,i set=sex 1 , 1sex3,1 -.72(.76) -.86(.66) -.86(.64) -.83(.60) set=sex1, 1 set=sex1,101,1,1 .21(.09) .21(.09) .29(.14) .48(.11) .47(.11) .48(.11))31,2,1 .55(.21) .54(.21) .50(.16) set=,31 , 1 , 1 set= i31 , 1 , 1 set=th, i , i01,3,1 .54(.38) .54(.38) .63(.24) set=/31,1,1 set= /3 1,1,1 set=/31,1,1/31,1,2/31,2,2.41(.31).46(.23).41(.31).46(.24)set=/31,1,1set=01,2,1set=/31,1,1set=/31,1,1set=fii,i,iset=81,1,1set=/31,1,1set=/31,1,1.74(.27) .73(.27) set=/31,3,1 set -4314,1 set=/31,1,1 set=/31,1,1/32,1,1 -.21(.08) -.21(.09) -.22(.08) -.24(.09) -.24(.10) -.24(.10)/32,2,1 -.04(.05) .03(.10) .04(.10) .04(.10) .04(.11) -/32,3,1 -.16(.15) -.09(.12) -.10(.15) -.10(.14) -.09(.14) -/32,1,2 -.22(.21) set=82,1,1 set=i32,1,1 t^3se =, 2,1, i set7-132,1,1 set=/32,1,1/32,2,2 .10(.18) set=82,2,1 set=82,2,1 set=82,2,1 set=82,2,1 -/32,3,2 -.007(.18) Set=/32,3,1 set=g2,3,1 set=02,3,1 set=f32,3,1 -Working Deviance 4029.67 4031.50 4035.64 4048.42 4062.55 4064.58Pearson X 2 3235.89 3233.99 3236.09 3252.95 3252.44 3252.62degrees freedom 3222 3225 3228 3230 3232 32343.10.3 Description of the Reductions for the First-Order Working LikelihoodThe model reduction path for the independence working likelihood including sex as a covariateis described by the following steps:Chapter 3. Analysis Based on a Working Likelihood Approach^ 581. Check the quadratic terms.• Can the coefficients for males and females be set equal within all treatment groups?Yes; the p-value for this reduction is .74.• Can the quadratic coefficients be eliminated? Yes; the p-value for this reduction is.20.2. Check the linear terms.• Can the linear coefficients for males and females be set equal within each treatmentgroup? Yes; the p-value for this reduction is .88.• Can the resulting linear coefficients be set equal for all treatment groups? Yes; thep-value for this reduction is .34.• Can this common linear term be elimated? No; the p-value for this reduction is lessthan .001.3. Check the differential sex effects.• Can they be set equal for all treatment groups? Yes; the p-value for this reductionis .33.• Can this additive sex effect be eliminated? No; the p-value for this reduction is .02.4. Examine the treatment main effects.• Can they be set equal? Not entirely clear; the p-value for this reduction is a marginal.10.Chapter 3. Analysis Based on a Working Likelihood Approach^ 59The reduction path for the first-order working likelihood is summarized in Table 3.7.Since corr(rii,iii) .c.--, 0, the standard errors for the final model in Table 3.7 indicate thattaking the main effects for treatments 1 and 2 to be equal would be a permissible further reduc-tion of the model. In fact, the p-value for this reduction is .35, with the fitted value commonto p i and p2 being —.19 (s.e.= .21). This reduction could be carried out for interpretability ofthe final model but is not included in this analysis.Table 3.7. Model Reductions for the First-Order Working Likelihood with Sex IncludedParameters 1 2Model3^4 5 6P .13(.03) .13(.03) .13(.03) .13(.03) .13(.03) .14(.03)iii .36(.04) .35(.20) .16(.29) .15(.25) .17(.26) -.04(.27)P2 -.58(.30) -.53(.24) -.49(.19) -.50(.18) -.50(.17) -.30(.20)Pa .59(.87) .65(.81) .56(.76) .54(.69) .51(.64) .46(.42)sexio. -1.03(.40) -1.02(.42) -1.02(.42) -1.02(.41) -1.06(.42) -.73(.28)sex2,1 -.23(.37) -.34(.27) -.34(.27) -.33(.27) -.33(.27) set=sesi,isex3,1 -.79(.88) -.89(.75) -.89(.76) -.87(.69) -.84(.64) set=ses1,101,1,1 .20(.08) .20(.08) .19(.07) .27(.13) .47(.11) .46(.11)01,2,1 .52(.20) .51(.20) .52(.20) .49(.15) set=i31,1,1 set=th,i,i01,3,1 .54(.38) .53(.38) .53(.38) .64(.26) set=g1,1,1 set=th,i,i01,1,2 .42(.35) .42(.34) .42(.36) set=th,i,i set=/31,1,1 set=fli,i,i/31,2,2 .46(.22) .46(.22) .47(.23) set=fi1,2,1 set=-#1,1,1 set=/31,1,1/31,3,2 .77(.33) .76(.33) .77(.37) set=i31 ,3, 1 set=/31,1,1 set=/31,1,1/32,1,1 -.19(.09) -.20(.10) - - -/32,2,1 -.03(.05) .03(.10) - - -/32,3,1 -.15(.15) -.10(.13) - -/32,1,2 -.21(.24) set=024 , 1 - -/32,2,2 .09(.18) set=/32,2,1 - - -,32,3,2 -.04(.19) set=02,3,1 - - - -Working Deviance 3980.30 3981.21 3988.20 3993.00 4005.18 4017.33Pearson X 2 3230.33 3230.80 3234.58 3229.04 3234.18 3228.03degrees freedom 3221 3224 3227 3230 3232 3234As in the analysis where sex was not included as a covariate, the estimated lag one auto-correlation and standard error provided by the first-order working likelihood suggest the datafor each bird are positively correlated. As before, the fits resulting from the independence andChapter 3. Analysis Based on a Working Likelihood Approach^ 60first-order working likelihoods are similar. When sex is included in the analysis, the only differ-ence in the fits is the learning trajectory for birds in treatment 1. The independence workinglikelihood models a quadratic learning trajectory for birds in treatment 1, whereas the first-order working likelihood models a linear learning trajectory. The fitted learning trajectoriesresulting from the independence and first-order working likelihoods are virtually identical forboth treatments 2 and 3. Figures 3.18 and 3.19 display the fitted female learning trajectoriesin the logit scale for the independence and first-order working likelihoods respectively; becausethe effect of sex has been modelled as additive, the male trajectories are parallel to those forfemales, with the fitted (transformed) response for males being greater than that for femalesby an additive constant (-75 and -.73 for the independence and first-order working likelihood,respectively).3.11 Summary of Analysis with Sex IncludedExcept for the fitted learning trajectory for birds in the first treatment group, the final modelsobtained from the independence and first-order working likelihoods are virtually identical. Bothfinal models describe the effect of sex as additive only; interactions of sex with time or withtreatment group are not present in either model. Both models describe the success probabilitiesof males as being about .74 (s.e.= .28) units higher in the logit scale than those for femalesthroughout the entire experiment; sex is clearly an important covariate for these binary responsedata. As in the previous models where sex was not included as a covariate, the transformedsuccess probability for birds in treatments 2 (PT) and 3 (FT) appears to improve linearlyat the same rate in both final models, with the transformed success probabilities for birds inChapter 3. Analysis Based on a Working Likelihood Approach^ 61treatment 3 being about .77 (s.e.= .35) units higher than those for birds in treatment 2, forboth the working likelihoods, throughout the entire experiment.For the learning trajectories of birds in the first treatment group, different working likeli-hoods lead to different final models. The final model resulting from the independence workinglikelihood describes the learning trajectory for birds in treatment 1 (NT) by a quadratic trend inthe logit scale for both males and females, whereas the final model resulting from the first-orderworking likelihood describes the same trajectory by a linear trend. For birds in treatment 1,the independence final model suggests an increase, peaking at about trial 110 (near the end ofthe second day), followed by a decline in success probability in the logit scale. The first-orderfinal model, on the other hand, suggests a steady increase throughout the entire experiment(see Figures 3.18 and 3.19).Figure 3.20 displays the final fits resulting from the two working likelihoods for female birdsin treatment 1. Although these fits do not differ greatly in the probability scale, they may beinterpreted quite differently. For example, the independence fit suggests success probabilities forbirds in treatment 2 are higher than those for birds in treatment 1 at the end of the experiment(see Figure 3.18). On the other hand, the first-order fit suggests the success probabilities forbirds in treatment 1 are higher than those for birds in treatment 2 (see Figure 3.19). Collectingmore data on the first treatment group in the form of additional subjects or further trials mightresolve this conflict in interpretation, since the presence or absence of a quadratic effect wouldpresumably become more clear. In this analysis, the final model resulting from the independenceworking likelihood is viewed as the more reliable summary of the experimental results becausethe independence fits are more consistent with patterns in the exploratory plots (see FiguresChapter 3. Analysis Based on a Working Likelihood Approach^ 623.15-3.18) than the first-order fits.The results of this analysis indicate that sex is important in describing the probabilitiesof success on the first feeding foray for the hummingbirds in this experiment. One possibleexplanation for the importance of sex is the higher body-weight to wing-span ratio of males.Because males must work harder to obtain their food, they be more likely to succeed on the firstfeeding foray than females. Since sex is important in this analysis, it seems that any completeanalysis of the hummingbird data (binary or binomial) should include sex as a covariate.In the next chapter both the binary and binomial hummingbird data are analyzed with sexincluded as a covariate, using the generalized estimating equation (GEE) approach of Liangand Zeger (1986) and Zeger and Liang (1986). The GEE approach extends quasi-likelihoodtheory (see McCullagh and Nelder, 1989) to arrive at estimating equations for the regressionparameters, and is applicable to any longitudinal response with univariate marginal distri-butions for which the quasi-likelihood formulation is sensible. Unlike the working likelihoodapproach described in this section, the GEE approach does not require assumptions about thetrue correlation structure when explicitly modelling correlation in the responses to improve theefficiency of estimation. In these cases, the GEE approach permits many alternatives to theindependence "working correlation" structure, unlike the working likelihood approach whereonly one alternative is available.Chapter 4Analysis Based on the GEE ApproachThis chapter presents the generalized estimating equation (GEE) approach described in Liangand Zeger (1986) and Zeger and Liang (1986). This unified approach to the analysis of longitu-dinal data can be applied to any repeated measures data with univariate marginal distributionsfor which the quasi-likelihood formulation is sensible. The GEE approach is first used to analyzethe binary hummingbird data. Then, to illustrate an application to non-binary longitudinaldata, it is used to analyze the binomial hummingbird data.'4.1 Description of the GEE approachLet /Ti t , i = 1, .., K and t = 1, .., n1, be the outcome variable for the ith subject at the tthmeasurement, and xi t be the corresponding pxl vector of covariates. Suppose that we areinterested in how the outcome variable Yi t depends on the covariates xi t . If there were only onetime of observation for each subject, a generalized linear model (see McCullagh and Nelder,1989) could be used to model a variety of continuous and discrete outcome variables. Whendealing with longitudinal data, however, the situation is complicated by the fact that there may1 A11 the GEE analyses reported here were carried out using a SAS macro received from K.Y. Liang, andwritten by M. Rezaul Karim. The SAS macro permits modelling of the following "working" correlation matrices:independence, stationary M-dependent, non-stationary M-dependent, AR-M, exchangeable, unspecified, anduser-specified. Only working correlations which are common to all subjects may be specified. In addition,if observation times are unequally spaced or there is missing data, only the independence, exchangeable, oruser-specified working correlations should be used.63Chapter 4. Analysis Based on the GEE Approach^ 64be correlation among the repeated observations for a given subject. This correlation must betaken into account to obtain a valid analysis.Modelling the correlation directly is often difficult for non-Gaussian longitudinal data be-cause there are few natural models for the joint distribution of the repeated observations. Liangand Zeger (1986) and Zeger and Liang (1986) propose a methodology based on an extension ofthe quasi-likelihood approach described by Wedderburn (1974) and McCullagh (1983). Theymodel the marginal distribution of the univariate outcome at each time point by specifyinga known function of the mean as a linear function of the covariates, and by assuming the(marginal) variance is a known function of the mean. Rather than specify a form for the jointdistribution of the outcome vector for each subject, they merely specify a "working" correlationmatrix for this outcome vector. This partial specification of the joint distribution gives rise togeneralized estimating equations (GEEs) which, under weak assumptions, provide consistentestimates of the regression coefficients and their asymptotic covariance matrix.4.1.1 Brief Description of Quasi-LikelihoodSince the GEE approach is an extension of the quasi-likelihood approach to modelling indepen-dent univariate outcomes, a brief description of the pertinent aspects of quasi-likelihood theoryis provided to motivate the GEE approach and establish notation.Quasi-likelihood is a unified methodology for the regression analysis of (independent) dis-crete or continuous outcomes first proposed by Wedderburn (1974) and examined further byMcCullagh (1983). A desirable property of quasi-likelihood analysis is its flexibility. Unlikelikelihood analysis, the actual form of the outcome distribution does not have to be specified;Chapter 4. Analysis Based on the GEE Approach^ 65only the relationships between the outcome mean and the covariates and between the meanand the variance are required.To fix notation, let Yi be the univariate response for the ith individual, and let xi be thecorresponding p-dimensional vector of covariates for that individual. Define pi and vi to be theexpectation and variance of Yi. The expectation is assumed to be a known function of a linearcombination of the covariates, i.e.,= h(43),where /3 is a px1 vector of regression parameters. In addition, except for an unknown scaleparameter 0, the variance is assumed to be a known function of the expectation, i.e.,vi = 9(p i )10.Inference for the regression parameters /3, rather than for the nuisance parameter 0, is theprimary objective.The quasi-likelihood estimator for /3 is obtained by solving a score-like system of equationswhere the equation for the Ph regression parameter, 0/, is given byMO) = r_, 814 ^- iti) = 0, 1 = 1, ..,p.i=1(4.1)These are in fact score equations for /3 when the distribution of the outcome is in the expo-nential family. The solution to these quasi-score equations is the quasi-likelihood estimator ofthe regression parameters, and can be obtained by the method of iteratively re-weighted leastsquares (for example, by the use of GLIM). McCullagh (1983) has shown that, under mild reg-ularity conditions, the quasi-likelihood estimator of the regression parameters is asymptoticallyGaussian.Chapter 4. Analysis Based on the GEE Approach^ 664.1.2 Generalized Estimating Equations (GEEs) for Longitudinal DataTo extend the quasi-likelihood approach to longitudinal data, Zeger and Liang (1986) adoptthe structure described above for each of the univariate outcomes Yi t . Their extension involvesthe specification of a "working" correlation matrix for Yi, the response vector of the ith subject,and the incorporation of this correlation into the estimating equations for 0. The observationtimes and the working correlation matrices are allowed to differ across subjects in the GEEapproach.Let Ri(a) be the nixni "working" correlation matrix for the ith subject. Zeger and Liang(1986) call Ri(ct) the working correlation matrix because it is not necessarily specified correctly.The collection of working correlation matrices is assumed to be fully specified by a, an sx1 vectorof unknown parameters. The working covariance matrix for Yi is then= V2Ri(ce)A 1 /2 /0,where Ai is an nixni diagonal matrix with g(pi t ) as the tth diagonal element.Liang and Zeger (1986) extend the earlier quasi-likelihood equations in (4.1) and proposegeneralized estimating equations,K^Ksi(0, a) = 1 (yi - pi),where Si( i3, a) denotes the p-dimensional quasi-score function for the i th subject, pi = E(Yi) =(z11, ••, = (1(x=10), -,h(x'in, O)Y, and Di = 0140 13. Note that these equations reduce to(4.1) in the case ni=1 for all i. More generally, the quasi-score function for the ith subject,S1(/3, a) = vi-1(y, - pi), is equivalent to the estimating function suggested by Wedderburn(1974), except that here the V's are functions of a as well as 0. Because Di Vi-1 does not--1(4.2)Chapter 4. Analysis Based on the GEE Approach^ 67depend on the y i 's, as long as E(Y —^= 0 (i.e., as long as the mean is correctly specified),the average of the quasi-score functions will converge to zero, ensuring that the equations haveconsistent roots.Since the nuisance parameters a (a parameter defined only for the working correlation) and0 are unknown and the primary goal is inference for the regression parameters /3, an obvious wayto proceed is to replace a in the expression for Vi by a consistent estimator, a__(yi , yK, /3, 0),replace 0 in a by a consistent estimator 0 yK , /3), and solve the altered estimating equationsfor /3. Liang and Zeger (1986) adopt this approach and define /3G, the GEE estimate of /3, asthe solution ofKYK^= O.Provided a is consistent given (/3, 0), is consistent given /3, and mild regularity conditionshold, Liang and Zeger (1986, Theorem 2, page 16) show that as the number of subjects, K,becomes large, ;JG consistently estimates /3. They also show that under these same conditions,K(/3G — /3) is asymptotically multivariate Gaussian with zero mean and a covariance matrixgiven by-1^VG = liM K (E .1Z Vi-1 Di) [E D: Vi-1 cov(YOK-1 Di]^Vi-1 Di1^i=1^ i=1bill K17 -1 V. V -1 ,K--.cowhere cov(Yi) is the actual (unknown) covariance matrix for the response vector of the i thindividual. Although in general one might expect VG, the asymptotic covariance matrix of ,jc,to depend on which estimators are used for a and 0, the above result indicates that it doesnot depend upon how a and 0 are estimated, as long as they are consistently estimated. TheChapter 4. Analysis Based on the GEE Approach^ 68covariance matrix VG can be consistently estimated by replacing cov(Yi) by (yi — rti)(yi —where^= (h(eii i3), h(eini MY , and a and d by consistent estimates in (4.2).The estimate of the regression coefficient, SG, is obtained by iterating between solving thegeneralized estimating equations for and consistently estimating a and cb. These two stepsare repeated until convergence of all the estimates. To consistently estimate a and 0, Liang andZeger (1986) recommend the use of the standardized residuals based on the updated estimateof /3, rit = (yit — Pit) ([17vi — 1 ]tt) - 112 . Provided the fourth moments of the Yit's are finite, at agiven iteration, 0 can be consistently estimated by the longitudinal analog of the Pearson X 2statisticK^20-1 = v, rit N — p'2=1where N = Efc=i ni. As in many quasi-likelihood problems (where the mean-variance relation-ship is specified by a scale parameter times a function of the mean), direct estimation of theparameter cb may not be necessary for estimation of Q and VG. In particular, if the workingcorrelation matrix is common to all subjects, i.e., Ri(a) = R(a), and the elements of this com-mon working correlation matrix are all multiples of the parameter a, an estimate of 0 need notbe calculated. (Because 0 appears as a multiplicative constant in the formula for ii, it cancelsin the calculation of Vi = A/ 2R(a),4 21 2 , so is not required for estimation of /3 and VG.) This isthe case for many choices of the working correlation matrix which are of practical interest. Toestimate a, we need to borrow strength across subjects. The specific estimator depends uponthe choice of the Ri(a), but when R 2 (a) = R(a) for i = 1, .., K, the general idea is to estimateChapter 4. Analysis Based on the GEE Approach^ 69a by a simple function ofIZuv^riurivN. p •:=1In practice, the true correlation structure of the response is rarely known. At best, we mayonly have vague ideas about the actual correlation structure present in the data. Nevertheless,we may wish to use this knowledge in order to obtain more efficient estimates of the regressionparameters. The GEE approach allows us to incorporate this knowledge into the estimationprocess by specifying working correlation matrices we expect might be close to the true correla-tion structure. The GEE approach has a strong appeal because it does not require the workingcorrelations to be correctly specified for asymptotically valid results.For binary longitudinal data, the GEE approach is more flexible than the working likelihoodapproach of Section 3 because working correlations other than independence and stationary1-dependent may be used, permitting increased efficiency of estimation in many different situ-ations. Further, when a working correlation other than the identity matrix is specified in theGEE approach, observations belonging to different subjects need not share the same correla-tion structure for asymptotically valid results. This is not the case in the working likelihoodapproach. (Recall that for valid asymptotic results using the first-order working likelihood, allsubjects are required to share a common first-order autocorrelation.)However, a trade-off between efficiency and validity exists in the GEE approach, just as inthe working likelihood approach. When specifying the working correlation R 2 (a) as somethingother than the independence working correlation in the GEE approach, the parameter a mustbe consistently estimated from the data in order to obtain asymptotically valid parameter andstandard error estimates. Because a (a parameter defined only for the working correlation) mustChapter 4. Analysis Based on the GEE Approach^ 70be consistently estimated, it should describe some aspect of the true correlation structure, atleast indirectly. An analogous situation arises when applying the working likelihood approachdeveloped for binary data. When using the first-order working likelihood, the stationary 1-dependent working correlation is consistently estimated from the data provided all subjectshave a common lag one autocorrelation.One final point regarding the GEE approach is that it may be used for any multivariateoutcomes with (univariate) marginal distributions for which the quasi-likelihood formulation issensible. A wide variety of univariate distributions such as Gaussian, Poisson, binomial (binary),gamma, and inverse Gaussian, are included in this family. Liang and Zeger (1986) suggest theapproach may even be extended to include multivariate multinomial and ordinal data by usingformulations described by McCullagh and Nelder (1989) for these types of variables and choosingan independence working correlation.4.2 GEE Analysis of the Binary Hummingbird DataFor the GEE analysis of the binary hummingbird data, the logit link with the binomial mean-variance relationship (up to an unknown scale factor 0) is used. If the independence workingcorrelation is specified using this choice of link and mean-variance relationship, the scale pa-rameter 0 does not enter into the estimation of 0 and VG. Therefore, the GEE parameter andstandard error estimates using the independence working correlation will be identical to thoseobtained in the earlier analysis based on the independence working likelihood. In contrast, fitsfor the first-order working likelihood may be different from the GEE fits based on a stationaryChapter 4. Analysis Based on the GEE Approach^ 711-dependent working correlation (the quasi-likelihood analog of the first-order working likeli-hood) because the working likelihood models the correlation directly, whereas quasi-likelihoodtreats the correlation as a nuisance parameter, obtaining a method-of-moments estimator afterfirst estimating the regression parameters at each iteration of a process which continues untilconvergence of all the parameter estimates.In order to compare the fits obtained by the working likelihood approach and the GEEapproach, in particular, the fits obtained by the first-order working likelihood and its quasi-likelihood analog, the binary hummingbird data was analyzed using the GEE approach witha stationary 1-dependent working correlation. Sex was included as a covariate in the GEEanalysis because of the clear indications of its importance in the working likelihood analyses ofChapter 3. To examine the robustness of parameter and standard error estimates across workingcorrelations, the final model obtained from the stationary 1-dependent working correlation wasre-fit using several other working correlations.4.2.1 Description of the Model ReductionsFor the GEE analysis including sex as a covariate, the same general reduction procedure outlinedin Section 3.10.1 was applied to the initial model for the success probabilities on the first feedingforay described in Section 3.9. The model reduction path for the stationary 1-dependent workingcorrelation is described by the following steps:1. Check the quadratic terms.• Can the coefficients for males and females be taken to be equal within all treatmentgroups? Yes; the p-value for this reduction is .81.Chapter 4. Analysis Based on the GEE Approach^ 72• Can the quadratic coefficients be eliminated? No; the p-value for the simultaneouselimination of all three quadratic coefficients is a marginal .12. Leave the quadraticterms for each treatment in the model for the time being.2. Check the linear terms.• Can the linear coefficients for males and females be set equal within each treatmentgroup? Yes; the p-value for this reduction is .91.• Can the linear coefficients for all treatment groups be set equal? Yes; the p-valuefor this reduction is .21. (Note that the fitted linear coefficients of Model 3 in Table4.1 suggest different slopes might be identified if more birds were included in theexperiment.)• Can this common linear term be eliminated? No; the p-value for this reduction isless than .001.3. Check the differential sex effects• Can they be set equal across treatment groups? Yes; the p-value for this reductionis .48.• Can this additive sex effect be eliminated? No; the p-value for this reduction is .006.4. Examine the treatment main effects.• Different quadratic terms for different treatment groups are still present in the model,so checking whether the treatment main effects can be set equal corresponds tochecking whether the success probabilities for all three treatment groups can be setChapter 4. Analysis Based on the GEE Approach^ 73to be equal at t = 0 or at trial 90.5. The p-value for this reduction is .02, so it is notjustified by the data.5. For interpretability of the final model...• Eliminate the quadratic terms for treatments 2 and 3. The p-value for this reductionis .79, justifying what is suggested by the lowess plots in Figures 3.16 and 3.17; thesuccess probabilities of birds in treatments 2 and 3 can be adequately described bya linear trajectory.Interestingly, the reduction path for this GEE analysis with a stationary 1-dependent workingcorrelation matches that for the independence working likelihood (described in Section 3.10.2),rather than that for the first-order working likelihood (described in Section 3.10.3); the pathleads to a final model with a quadratic term for the trajectory of birds in treatment 1.Figure 4.1 plots the fitted final model for females (for the males, the curves within eachtreatment are simply shifted .77 units higher on the logit scale), while Table 4.1 summarizesthe model reductions. Although the path of reductions is the same, the final model fits obtainedfrom the independence working likelihood and the stationary one-dependent working correlationare slightly different. In particular, the fitted linear coefficient obtained here (.39, with standarderror .10), is somewhat less than that obtained from the independence working likelihood (.48,with standard error .11; see Table 3.6), although these estimates are still reasonably closeconsidering the standard errors. The estimated female-to-male differential sex effect (-.77, withstandard error .28) is very similar to that obtained from the independence working likelihood(-.75, with standard error .28).Chapter 4. Analysis Based on the GEE Approach^ 74Table 4.1. Reductions for the Stationary 1 -Dependent Working Correlation with Sex IncludedParameters 1 2Model3^4 5 6iii .36(.03) .36(.17) .35(.16) .37(.16) .19(.26) .19(.26)P2 -.59(.31) -.52(.24) -.50(.24) -.53(.24) -.35(.26) -.34(.20)/13 .53(.75) .60(.71) .60(.70) .57(.66) .55(.48) .46(.41)sexi,i -1.03(.42) -1.02(.42) -1.02(.41) -1.05(.42) -.77(.28) -.77(.28)sex 2 , 1 -.25(.40) -.37(.30) -.39(.30) -.44(.30) set=sex1,1 set=sex1,1sex3 , 1 -.72(.76) -.86(.66) -.86(.64) -.81(.59) set=sexi,i set=sexi,i/31,1,1 .22(.09) .22(.09) .18(.11) .39(.10) .39(.10) .39(.10)/31,2,1 .55(.22) .54(.21) .42(.17) set=th,i,i set=th,i,i set=/31,1,1)31,3,1 .55(.38) .54(.38) .63(.24) set=/31,1,1 set=/31,1,1 set=/31,1,1/31,1,2 .41(.31) .41(.31) set=/31,1,1 set=/31,1,1 set=/31,1,1 set=/31,1,1/31,2,2 .46(.23) .46(.24) set=/31,2,1 set=/31,1,1 set=/31,1,1 set=/31,1,1/31,3,2 .74(.27) .73(.27) set=/31 , 3 , 1 set=/31,1,1 set=/31,1,1 set=/31,1,1/32,1,1 -.21(.08) -.21(.09) -.21(.09) -.23(.09) -.22(.10) -.22(.10)/32,2,1 - .04(.06) .03(.10) .03(.10) .01(.11) .01(.11) -/32,3,1 -.16(.15) -.09(.12) -.10(.15) -.10(.14) -.09(.14) -/32,1,2 -.22(.21) set=/32,1,1 set=/32,1,1 set=02,1,1 set=/32 ,1, 1 set=/32,1,1/32,2,2 .10(.18) set=/32,2,1 set=fl2,2,1 set=#2,2,1 set=/32,2,1 -'32,3,2 -.008(.17) set =i32,3, i set =fl2,3, i set =02,3, i set =/32,3, i -Estimates of the lag one (working) autocorrelation at each stage of the model reductionsare not included in Table 4.1. These are .12, .12, .13, .13, .13, and .13, for Models 1 through6, respectively, essentially the same as the fitted lag one autocorrelations for the first-orderworking likelihood which were roughly .13 (s.e. P..,- .03) for all model fits.4.2.2 Fits of the Final Model Obtained by Other Working CorrelationsThe final model obtained using the stationary 1-dependent working correlation was re-fit usinga variety of working correlation structures; the results are summarized in Table 4.2. Qualita-tively, the conclusions obtained with each working correlation are the same; all terms in thefinal model appear to be important. In general, the robust z-scores do not reflect as muchassociation between the covariates and the response as the corresponding naive z-scores (notpresented), which are up to 4 times larger. Results across working correlation structures areChapter 4. Analysis Based on the GEE Approach^ 75remarkably similar, although the exchangeable working correlation fits a larger male-to-femalesex differential and a larger (negative) main effect for treatment 2 (as well as larger standarderrors for both these estimates) than the other working correlations. This suggests parameterand standard error estimates may be reasonably accurate, even with only 18 subjects.Table 4.2. Robust z-Scores (Parameter Estimates) for the Binary Final Model*Working Corr. P1 P2 P3 sex(11 1312, 1,1 133i,1Independent .68(.18) -1.54(-.32) 1.09(.45) -2.64(-.75) 4.31(.48) -2.47(-.24)Stat. 1-dep. .73(.19) -1.71(-.34) 1.12(.46) -2.76(-.77) 3.89(.39) -2.27(-.22)Stat. 2-dep. .73(.19) -1.71(-.34) 1.12(.46) -2.75(-.77) 3.89(.39) -2.26(-.22)Stat. 5-dep. .76(.20) -1.74(-.34) 1.13(.46) -2.76(-.77) 3.98(.40) -2.34(-.23)Stat. 8-dep. .75(.20) -1.69(-.33) 1.14(.45) -2.75(-.76) 4.03(.40) -2.40(-.23)AR-1 .73(.19) -1.71(-.34) 1.12(.46) -2.76(-.77) 3.89(.39) -2.27(-.22)AR-2 .73(.19) -1.71(-.34) 1.12(.46) -2.75(-.77) 3.90(.40) -2.26(-.23)AR-5 .76(.20) -1.74(-.34) 1.13(.46) -2.76(-.77) 4.01(.40) -2.38(-.24)Exchangeable .55(.16) -1.83(-.54) 1.14(.50) -2.69(-.95) 3.51(.40) -2.28(-.23)* Entries in table are z-score (parameter estimate).1. sex 1 , 1 = sex i^common, 2 = sex 1 , 3 sex, the comn differential sex effect.2. )31,1,1 — 131,1,2 - 01,2,1 - 131,2,2 - 131,3,1 - /31,3,2, the common linear coefficient.= )32 ,1, 2 , the common quadratic coefficient for birds in treatment 1.3 . i(32,1,1Note that among the working correlations for which results are presented in Table 4.2,the exchangeable structure is scientifically the least plausible candidate for the true structurebecause it models the correlations for responses distant in time to be the same as those forresponses adjacent in time. Oddly enough, of all the working correlations presented, the naiveand robust z-scores for the exchangeable structure are the closest together. Although thisdoes not necessarily imply the exchangeable structure best approximates the true correlationstructure, it is interesting to note that when the working correlation structure is specified tobe stationary 8-dependent, the estimated lag one through eight working correlations are allChapter 4. Analysis Based on the GEE Approach^ 76about .12 (slightly larger than .08, the fitted exchangeable working correlation), suggestingcorrelations for responses relatively far apart in time are not too different from correlations forresponses adjacent in time.4.3 GEE Analysis of the Binomial Hummingbird DataTo illustrate the versatility of the GEE approach, we turn now to an analysis of the "binomial"data. Recall that the maximum number of successful feeding forays on any given trial in theexperiment was 11, while the number of unsuccessful feeding attempts was unrestricted. Be-cause the total number of feeding forays in a trial is random, the "binomial" response, i.e., thenumber of successful feeding forays on each trial, cannot have a binomial distribution. Never-theless, we assume the mean-variance relationship for the number of successful feeding foraysin a trial is that of a binomial random variable and interpret the fitted success probability for aparticular trial as a rough average of the success probabilities for all the (possibly dependent)forays within that trial, rather than as a success probability for a binomial experiment (i.e., thetrial) in which the total number of (independent) forays is non-random.The binomial data is not as meaningful to the investigator as the binary data becausehummingbird researchers believe the first feeding forays are the most accurate measure of abird's expectations about feeder profitability. Many of the birds which are unsuccessful on thefirst foray have not yet associated the light cue with the feeder. For these birds, any forayssubsequent to the initial unsuccessful attempt will reflect their strategy for finding food afterfailing the first time. This strategy varies with each individual; so, on a particular trial, abird's binomial response (which summarizes information on all of its forays during that trial)Chapter 4. Analysis Based on the GEE Approach^ 77is expected to convey less information about the bird's feeding expectations than its binaryresponse at the first feeding foray. Even though analysis of the binomial hummingbird datamay not be as scientifically relevant as the earlier analysis of the binary data, it is worthwhileas an example of the GEE approach applied to non-binary data.4.3.1 Exploring the Binomial Hummingbird DataA series of plots was used to determine appropriate parametric forms to be incorporated intomodelling of the temporal response patterns.To check if a day effect was apparent, the logits of the individual responses (logit(y) =log(-2i- -n e ), c = .5, where y is the number of successful forays out of n total forays in the trialfor a particular bird) were averaged over all birds of the same sex within each treatment groupfor successive blocks of 10 trials and then plotted (see Figures 4.2-4.4). Because there were nosystematic discontinuities at those trials corresponding to new days (trials 61 and 121), thereseemed to be no need to model a day effect. To get a better overall picture of the binomialdata, the logits of responses within each treatment-sex group were smoothed using robust locallyweighted linear least squares ("lowess") with windows of 30 and 60 points.Figures 4.5 and 4.6 show the results of this "lowess" smoothing. Comparing across treatmentgroups, the success probabilities of birds in treatment 3 (FT) are generally higher than thoseof birds in the other treatments, suggesting the presence of treatment main effects. Successprobabilities of males in treatment 1 (NT) appear to be almost as high as those of males intreatment 3, and roughly the same as those of females in treatment 3. In contrast, successprobabilities of females in treatment 1 appear to be lower than those of females in treatmentChapter 4. Analysis Based on the GEE Approach^ 783 and closer to those of females in treatment 2 (PT). In general, males appear to have higherprobabilities of success than females, especially for birds in treatment 1. This suggests thepresence of sex-by-treatment interaction in this data. The success probabilities of all birdsappear to increase roughly linearly over time, with the slope for females in treatment 1 beingsomewhat less than the slopes for the other birds. This suggests a three-way interaction of sex,treatment group and time may be present in this data.To study individual-to-individual variation within each treatment, the logits of individualresults at each time point were smoothed using lowess, with a window of 60 consecutive timepoints; see Figures 4.7-4.9. Peculiarities of these "lowess" fits near the endpoints occur becauseonly the individual logits, rather than logits for all the individuals of a certain sex within atreatment, have been smoothed. The individual logits are more likely to have clusters of extremevalues with few moderate values to counteract their effect at the endpoints, leading to fits foran individual which may not accurately describe the patterns in the individual's data near theendpoints. In the interior of the the data sets, the plots do not show any obvious change inresponse variability over time for any of the treatments. The plots also suggest that, at anygiven time, the variability of responses in treatments 1 (NT) and 2 (PT) is roughly the same,but that responses in treatment 3 (FT) are more variable at than responses in treatments 1and 2.Chapter 4. Analysis Based on the GEE Approach^ 794.3.2 Model Fitting for the Binomial Hummingbird DataThese exploratory plots suggest the transformed probability of success may be modelled, ini-tially, aslogit^=^sexi,s^ 2,i,st2^= 1, .., 3, s = 1,2, T = 1, ..,180, t = T — Ts.d.(T)where pi may be interpreted as the main effect of the ith treatment group, sexi, i is the female-to-male differential sex effect for the ith treatment group, and sexo s=-St 0 (for males in the ithtreatment group). As in the binary analysis, a quadratic learning trajectory (in the logit scale)has been modelled to allow for some departure from linearity in the responses. In addition, thetime covariate has been standardized to orthogonalize the highly collinear linear and quadratictime covariates. The female-to-male differential sex effect is permitted to vary across treatmentsto allow for treatment-by-sex interaction. Similarly, the three-way interaction of sex, treatmentand time is incorporated because different linear and quadratic coefficients for each sex andeach treatment group are permitted.Because the stationary 1-dependent working correlation models some correlation in theresponses, it is expected to provide more efficient estimates than the independence workingcorrelation. Model reductions have therefore been carried out using the stationary 1-dependentworking correlation and are summarized in Table 4.3. The path of these reductions is describedby the following steps:1. Check the quadratic terms.• Can the coefficients for males and females be taken to be equal within all treatmentgroups? Yes; the p-value for this reduction is .65. Set these coefficients to be equal.Chapter 4. Analysis Based on the GEE Approach^ 80• Can the quadratic coefficients in the resulting model be eliminated? Yes; the p-valuefor the simultaneous elimination of all three quadratic coefficients is .58. Removethe quadratic terms from the model.2. Check the linear terms in the resulting model.• Can the linear coefficients for males and females be set equal within each treatmentgroup? No; the p-value for this reduction is less than .001. From the coefficients andstandard errors of Model 3 in Table 4.3, it appears that essentially all the evidenceagainst this reduction comes from the responses for birds in treatment 1.• Can the linear coefficients for all treatment groups be set equal within sexes? Yes;the p-value for this reduction is .25. To simplify the model, set two common linearslopes: one for females and one for males.• Can the resulting linear terms for females or males be eliminated? For the females,yes; the p-value for the reduction is .67, so remove this term from the model. Forthe males, no; the p-value for this reduction is less than .001.3. Check the differential sex effects.• Can they be taken to be the same for all treatment groups? Yes; the p-value for thisreduction is .47.• Can this the common sex effect be eliminated? No; the p-value for this reduction is.05.4. Examine the treatment main effects.Chapter 4. Analysis Based on the GEE Approach^ 81• Can they be taken to be equal? No; the p-value for this reduction is .02.For interpretability of the final model, a further reduction might set the main effects fortreatments 1 and 2 to be equal, as suggested by standard errors of these estimates under Model6 (see Table 4.3). The p-value for this reduction is .63, indicating the reduction is permissible.The resulting model (not included in Table 4.3) leads to a simpler representation of the learningtrajectories because, within each sex, birds in treatments 1 and 2 are then modelled as havingthe same learning trajectory in the logit scale.Table 4.3. Model Reductions for the Binomial DataParameter 1 2Model3^4 5 6Pi 1.08(.15) 1.08(.08) 1.07(.01) 1.08(.02) 1.08(.02) 1.12(.18)P2 1.08(.14) 1.12(.12) 1.12(.12) 1.11(.13) 1.12(.13) 1.24(.17)P3 2.22(.71) 2.36(.66) 2.27(.63) 2.24(.57) 2.25(.57) 1.91(.30)sex 1 , 1 -.35(.49) -.35(.36) -.35(.36) -.38(.38) -.37(.37) -.43(.22)sex 2 ,1 -.12(.20) -.20(.23) -.20(.23) -.19(.23) -.18(.23) set=sex 1 ,1sex3,1 -.76(.71) -.95(.63) -.96(.64) -.93(.57) -.93(.57) set=sexi,ith,i,i -.06(.09) -.06(.09) -.06(.09) .06(.09) - -/31,2,1 .23(.09) .23(.09) .23(.10) set=/31,1,1 - -/31,3,1 .10(.24) .10(.25) .10(.26) set=/31,1,1 - -/31,1,2 .31(.02) .31(.02) .31(.01) .37(.09) .37(.09) .37(.09)/31,2,2 .39(.18) .39(.18) .39(.18) set=th,1,2 set=/31,1,2 set="31,1,2/31,3,2 .48(.30) .45(.30) .47(.33) set=131,1,2 set=/31,1,2 set=/31,1,2/32,1,1 -.01(.11) -.01(.09) - - - -/32,2,1 -.04(.08) -.01(.08) - -/32,3,1 -.14(.04) -.09(.07) - - -/32,1,2 -.01(.17) set=i324 , 1 - - - -/32,2,2 .04(.15) set--7432 , 2 , 1 - - - -/ .05(.16) set=/32,3,1 -The fits for the initial model with the stationary 1-dependent working correlation given inTable 4.3 are plotted in Figures 4.10-4.12; similar fits are obtained with both the independenceand the AR-1 working correlations. These fits do not reflect some features of the lowess plotsin Figures 4.5 and 4.6. For example, the lowess plots suggest the male-to-female differential sexeffect should be largest for birds in treatment 1 (NT), yet the GEE fits for the initial modelChapter 4. Analysis Based on the GEE Approach^ 82suggest this effect is largest for birds in treatment 3 (FT). In addition, the lowess plots suggestsuccess probabilities of treatment 1 males and treatment 3 females and males are higher thanthose of birds in treatment 2 (PT) on the logit scale, but in the fits for the initial model, onlythe success probabilities (on the logit scale) of treatment 3 males are clearly higher than thoseof birds in treatment 2. Finally, the lowess plots suggest success probabilities for females intreatment 1 increase slightly over time, whereas the fits for the initial model indicate they de-crease slightly over time. The initial fit for females in treatment 1 is probably overly influencedby the female with a large dip in success probabilities near trial 140; see Figure 4.7. Discrep-ancies such as these may occur because, unlike the lowess fits for intermediate trials, the GEEfits at intermediate trials may be influenced by extreme data at the beginning and end of theexperiment. Runs of extreme data are more likely to occur for a particular individual at thebeginning or end of the experiment and may have high leverage because the number of subjectsin the experiment (18) is very small relative to the number of trials (180). Such discrepanciesbetween lowess and initial model fits may disappear once data on more birds are collected underthese conditions.A noteable peculiarity of the GEE results in Table 4.3 is the large drop (from .08 to .01)in the standard error of the estimated effect for treatment 1 compared to those for the othertreatments (from .12 to .12, and from .66 to .63, for treatments 2 and 3, respectively) when thequadratic terms are removed from Model 2. The corresponding naive standard error estimatestays relatively constant for this model reduction, dropping from .11 to .09. Another jump(from .02 to .18) in the standard error of the estimated effect for treatment 1 occurs when thedifferential sex effects for the treatment groups are set equal (Model 5 to Model 6 in the table).Chapter 4. Analysis Based on the GEE Approach^ 83Again, the corresponding naive standard error estimate remains relatively constant (droppingfrom .09 to .07). These results suggest the robust standard error estimate for the main effect oftreatment 1 may not accurately reflect the true standard error; perhaps there are too few birds(18) in the experiment for the asymptotic approximation to be adequate. If the asymptoticapproximation to the covariance matrix of the parameter estimates is inaccurate, the validityof all the model reductions (and hence the final model) is questionable. Somewhat reassuringly,the values of standard error estimates for the other parameters are fairly constant.Figures 4.13 and 4.14 display the final model fits for the binomial hummingbird data ob-tained with the stationary 1-dependent working correlation. Model reductions are intended tolead to a parsimonious final model which describes essential features of learning trajectories forhummingbirds in general, rather than those for the eighteen hummingbirds in this experiment.Hence, some features evident in the lowess plots are absent in the final model fits, presumablybecause of the large variability of the responses. For example, the variability of female successprobabilities, primarily in treatments 1 and 3 (see Figures 4.7-4.9), permits the fit of a commonlinear slope for females in Model 4 of Table 4.3. This feature of the fits is not consistent withthe lowess plots, which suggest success probabilities for females in treatments 2 and 3 increasemore quickly over time on the logit scale than those for females in treatment 1. When averagingall the female curves, those for the four females in treatment 1 dominate, leading to a negligiblefitted slope which is eliminated in Model 5 of Table 4.3. Because both a common (negligible)female and a common male slope are modelled, a common differential sex effect is permissiblein Model 6. This feature of the fits is also inconsistent with the lowess plots, which suggest ahigher male-to-female differential sex effect in treatment 1. Specifying a common differentialChapter 4. Analysis Based on the GEE Approach^ 84sex effect affects the placement of the fitted curve for males in treatment 1 relative to the fittedcurves for males in other treatments. For this reason, the fitted curve for males in treatment 1is lower than the curve suggested by the lowess plots.In order to identify which of these patterns more accurately reflects the true response prob-abilities of hummingbirds under these experimental conditions, more birds need to be includedin the experiment. However, one feature of the binomial data suggested by the lowess plots isconfirmed by the GEE analysis: the (binomial) success probabilities for birds in treatment 3are higher, in general, than those for birds in treatments 1 and 2.4.3.3 Fits of the Final Model Obtained by Other Working CorrelationsTo examine the robustness of parameter and standard error estimates across working correla-tions, the final model was re-fit using a variety of working correlation structures; results aresummarized in Table 4.4.Table 4.4. Robust z-Score (Parameter Estimate) for the Binomial Final Model*Working Corr. /21 /12 1131)sex („ /3121).,2Independent 6.03(1.08) 6.48(1.13) 5.76(1.81) -1.81(-.40) 3.95(.36)Stat. 1-dep. 6.23(1.12) 7.33(1.24) 6.32(1.91) -1.94(-.43) 4.12(.37)Stat. 2-dep. 6.44(1.18) 8.56(1.43) 7.10(2.04) -2.13(-.47) 4.06(.37)Stat. 3-dep. 5.82(1.31) 9.86(1.75) 7.39(2.23) -2.16(-.54) 3.41(.38)AR-1 6.27(1.12) 7.49(1.26) 6.42(1.92) -1.97(-.43) 4.13(.37)AR-2 6.37(1.22) 9.23(1.57) 7.46(2.15) -2.19(-.50) 3.98(.38)* Entries in table are z-score (parameter estimate).1. sex1, 1 = sex 1 ,2 = sex1,3 = sex, the common differential sex effect.2. 01,1,2 = 01,2,2 = 01,3,2, the common linear coefficient for males.Chapter 4. Analysis Based on the GEE Approach^ 85The robust z-scores for different working correlations are reasonably close, leading to qual-itatively similar conclusions: all terms in the final model are important. For the workingcorrelations used in the analysis of this binomial hummingbird data, the robust z-scores do notreflect as much association between the covariates and the response as the naive z-scores (notpresented), which are up to 5 times larger. When naive and robust z-scores are compared acrossthe working correlations used in this analysis, the differences between naive and robust z-scoresfor the stationary 3-dependent working correlation are the smallest. This, together with thefact that the stationary 3-dependent correlation structure is scientifically plausible, might sug-gest it approximates the true correlation better than the other working correlations presentedin the table. For the stationary 3-dependent working correlation, estimates for the first, sec-ond and third lag autocorrelations are .24, .26, and .26, respectively, suggesting non-negligiblecorrelation may indeed be present in the data.Note that estimates of all the effects, and, in particular, the treatment main effects, increaseas more correlation structure is incorporated into the working assumptions. In fact, when solv-ing the generalized estimating equations iteratively using the stationary 5-dependent, stationary8-dependent, AR-3, and exchangeable working correlations (not included in the table), succes-sive estimates of the treatment main effects become very large and do not converge. This is incontrast to the binary data on the first feeding foray analyzed in the previous section, whereconvergence was achieved with all of these working correlations. For the binomial hummingbirddata, lack of convergence with more detailed working correlations may be related to the largenumber of trials in the experiment (180) relative to the small number of subjects (18).Chapter 4. Analysis Based on the GEE Approach^ 86This same feature of the data set (180 trials on only 18 subjects) may explain discrepanciesbetween the lowess fits and the GEE fits for the full model in the binomial analysis: becausethe number of trials (180) is large relative to the number of subjects (18), the GEE fits atintermediate trials may be overly influenced by a run of extreme data for a particular individualat the beginning or end of the experiment. This problem does not arise in the binary analysisbecause there are no extreme responses for binary data (binary data takes on only two values).Hence, the GEE and lowess fits for the binary data are reasonably consistent. The binomialanalysis seems to illustrate that when a long series of observations is made on relatively fewsubjects, the GEE approach should be used with caution since it relies on combining strengthacross subjects to(i) obtain accurate estimates of the regression parameters; and(ii) compensate for making no strong assumptions about the correlation structure of eachsubject's response.Chapter 5Conclusion5.1 Discussion of ResultsExploratory plots of the binary responses for each bird on the first feeding foray (Figures 3.7-3.9)do not reveal the step-like pattern associated with sudden awareness of the spatial associationbetween light cue and feeder expected by the experimenter. These exploratory plots suggesta more gradual increase in individual success probabilities over time, but with considerablefluctuation throughout the experiment.Evaluating the performance of birds across treatment groups by the three methods describedin this thesis, both the working likelihood and GEE analysis indicate that, as expected, birds intreatment 3 (FT) have the highest success probabilities throughout the entire experiment. Theunivariate ANOVA, however, does not detect an important difference in success probabilitiesacross treatment groups.Unlike the ANOVA, both the working likelihood and GEE approaches permit modelling ofthe response directly as a function of time. When the learning trajectories of birds are modelledby a quadratic trend in the logit scale using these approaches, birds in treatments 2 (PT) and3 (FT) appear to improve linearly at the same rate. However, success probabilities for birdsin treatment 2 appear to start out (and remain) lower than those for birds in treatment 387Chapter 5. Conclusion^ 88throughout the entire experiment. Unlike the linear trajectories (in the logit scale) for birds intreatments 2 and 3, the trajectory for birds in treatment 1 (NT) appears to be quadratic, withsuccess probabilities at the beginning of the experiment closer to those for birds in treatment 3than those for birds in treatment 1. However, at the end of the experiment, success probabilitiesof birds in treatment 1 appear to drop closer to and perhaps below those of birds in treatment2. Contrary to the expectations of the investigator, birds in treatment 2 (PT) appear to havelower success probabilities than the other birds except, perhaps, at the end of the experiment,where their success probabilities may be greater than or equal to those for birds in treatment1. One possible explanation proposed by the investigator is that the orange Dymo tape forthe partially taped feeding array creates three horizontal bands which may interfere with theintended vertical pattern to be perceived by the birds in treatment 2 (see Figure 1.1).Contrary to the accepted wisdom among researchers in hummingbird learning, both theworking likelihood and GEE analyses reveal that sex is important in describing the probabilitiesof success on the first feeding foray for the hummingbirds in this experiment. Since the effectof sex appears to be additive for the binary data, the fitted models indicate the odds of successfor males are more than twice the odds of success for females. (For the fit resulting from theindependence working likelihood, the odds for males are about 2.1 the odds for females, with astandard error of roughly 0.6.) One possible explanation for the importance of sex is the higherbody-weight to wing-span ratio of males. Because males must work harder to obtain their food,they may be more likely to succeed than females. Note that in any ANOVA of the results inthis experiment, sex would be difficult to incorporate because of the imbalance in the numberof males and females in treatment 1 (NT).Chapter 5. Conclusion^ 89Although not as scientifically relevant as the analysis of the binary data for the first feedingforay, the analysis of the binomial data for all the feeding forays in a trial confirms that birds intreatment 3 (FT) have higher success probabilities, in general, than birds in the other treatmentgroups. Again, sex appears to be an important covariate, although its effect on the binomialresponse (where sex appears to interact with time) seems to be more complicated than its effecton the binary response (where the effect of sex is simply additive).Because both the working likelihood and GEE approaches model success probabilities as afunction of time while taking into account the (unknown) correlation present in the data, theyenable much more information to be extracted from the hummingbird data set than a moretraditional analysis-of-variance. These methodologies therefore provide useful and powerfultools for researchers in this subject area.5.2 Comparison of MethodsLongitudinal data consists of repeated measurements over time for each of many subjects.Hence, the response of each subject may be considered a random vector with possibly correlatedelements corresponding to measurements at each time of observation. If the response vector isnormally distributed, classical methods permit modelling of the joint distribution as a functionof the covariates. For example, when sex is not included as a covariate in the analysis of thebinary hummingbird data, MANOVA permits testing of parallelism and equality of treatmentmeans, provided the number of subjects is greater than the number of time points. MANOVAmay be carried out when the number of subjects is less than the number of time points if data areblocked over time points, but this leads to a loss of information. Modelling of responses whichChapter 5. Conclusion^ 90are not normally distributed is more difficult. Transformations which stabilize variance forunivariate responses do not necessarily stabilize covariance matrices for multivariate responses.Hence, equality of covariance matrices, a necessary assumption for valid F-tests in analysis-of-variance, may not hold.Instead of attempting to transform non-normal data into forms more suitable for analysis-of-variance, a more natural approach might be to model the marginal expectations of the responsesas functions of the covariates while taking into account the (unknown) correlation present inthe data of each subject. For example, with binary longitudinal data, logistic regression mightbe extended in the manner described by Zeger, Liang and Self (1985). Rather than the "ac-tual" likelihood, this approach uses a "working" likelihood based on simple assumptions aboutthe correlation within each subject's data to generate estimating equations for the regressionparameters. Under mild assumptions about the true correlation structure, these estimatingequations lead to consistent estimates of the regression parameters and their standard errors.The approach described by Zeger, Liang, and Self (1985) was developed for short binaryseries. Because detailed models for the change in response over time are unnecessary in this con-text, the authors considered time-independent covariates only. In the case of the hummingbirdlearning experiments, where birds are expected to improve over a long series of measurements,the probability of success should be modelled as a function of time. In this thesis, the resultsof Zeger, Liang, and Self (1985) have been extended to allow this. The resulting "workinglikelihood" approach may be applied to any binary series with time-dependent covariates. Oneweakness of this approach is that when correlation is explicitly modelled (in order to increasethe efficiency of estimation), asymptotic results only hold under certain assumptions on theChapter 5. Conclusion^ 91true correlation structure. Other than the independence working likelihood, only the first-order Markov working likelihood, which models stationary one-dependent correlation in theresponses, has been considered in this thesis.Unlike the working likelihood approach, the "generalized estimating equation" or GEE ap-proach of Liang and Zeger (1986) and Zeger and Liang (1986) requires no strong assumptionsabout the true correlation structure when modelling correlation explicitly. The GEE approachextends quasi-likelihood theory to the case where the response is observed repeatedly over timefor each subject, and uses a "working" correlation for these responses to arrive at estimatingequations for the regression parameters. Hence, the GEE approach is applicable to any longitu-dinal response with univariate marginal distributions for which the quasi-likelihood formulationis sensible. This includes binary, binomial, count, and continuous longitudinal responses. TheGEE approach thus provides a unified methodology for the analysis of longitudinal data.Future work investigating the extension of the GEE approach to the case of ordinal longi-tudinal data, and to the case of data with censored or missing observations would be valuable.For instance, the approach could be used to analyze the results of an ongoing clinical trial, inwhich U.B.C. is one of the participating centres, concerning the effects of beta-interferon on theprogression of multiple sclerosis. In this clinical trial, several outcome variables are observed oneach patient repeatedly over time, including the so-called "Kurtzke score", an ordinal responsevariable measuring the severity of disease.Figure 1.1. Feeding Arrays for the Hummingbird Experimentlit cue• ooc oo 4-- cues12 cm000000^feeders3 cmKey• 0 0 0 0 00 0 0 0 0 0r---No Tape TreatmentPartial Tape TreatmentFull Tape Treatment92Figure 2.1. Mean Treatment (Transformed) Response versus Measurement for Blocks of 152^4^6^8^10^12MeasurementFigure 2.2. Mean Treatment (Transformed) Response versus Measurement for Blocks of 105^10^15Measurement5 10 15OOFigure 2.3. Estimated Variances versus Measurement for Blocks of 152^4^6^8^10^1 2MeasurementFigure 2.4. Estimated Variances versus Measurement for Blocks of 10MeasurementFigure 3.1. Average Treatment Response in the Logit Scale for Blocks of 3 Trials0 1 0 20 30 40 50 60Trial Block of 3Figure 3.2. Average Treatment Response in the Logit Scale for Blocks of 5 Trials0^10^20^30Trial Block of 5Figure 3.3. Average Treatment Response in the Logit Scale for Blocks of 10 Trials0^5^10^15Trial Block of 100 50 100 150^ treatment 1^ treatment 2— - — treatment 3Figure 3.4. Smoothed Logit of Average Treatment Response versus Trial (Window=10).88.82.73.62.50.38.27.18.12.08Trial0 50 100 150treatment 1^ treatment 2— - — treatment 3iI ^ Icy -0Figure 3.5. Smoothed Logit of Average Treatment Response versus Trial (Window = 20 Trials)TrialFigure 3.6. Smoothed Logit of Average Treatment Response versus Trial (Window = 30)0^50^100^150TrialFigure 3.7. Smoothed Individual Logits for Treatment 10^50^100^ 150TrialFigure 3.8. Smoothed Individual Logits for Treatment 2TrialFigure 3.9. Smoothed Individual Logits for Treatment 30^ 50^100^ 150Trial.1800 50 50150 150100100trial trial.73.62.50..4-o.38^•-•.271 ^ Treatment 12 ^ Treatment 23 — - — Treatment 31 ^ Treatment 12 ^ Treatment 23 — - — Treatment 3Figure 3.10. First Independence Fits without Sex^Figure 3.11. Second Independence Fits without SexFigure 3.12. Fitted Success Probabilities for Treatment 1 from the Independence Working Modeltrial-s-co ----CO --L.Pearson Residuals-1 0 1 2all••••••••• OM N •• MO •• • OW •^•all• OP OD • 40 • • •a • ail.^• •^• • • •^•^•M4041 ••• • • •^•• • •^• •01•••••••^ •••^ Gal•••■••• • •. •. • MO •• • • •afa ••■••••••^ Miala10• • •• • • • • •• 4, • •a.••••• ••N• WOO. • • •4••4■•^ •••••■••1••• • •• • • a • • ••••••• a • •• • • OD • • • • •••1•10111MMILONSM• 0111••••■••^•D •a•1•1•0•1044•••■•••a• 041•00 ••4•allaa•••••••• • • • • ••aNIONIaMeaLl• ••••••• •• 411.• •^••••••• •••••■• • •• • MO MO NO Ma••■••■•••••1414aMMID ^401•0000 ••• .. • .0• • • • • •aa••••••01M01106Figure 3.14. Boxplot of Independence Pearson Residuals by Sexmale^ femalesexFigure 3.15. Smoothed Logit of Average Responsevs. Trial for Treatment 1 (Window = 60)Figure 3.16. Smoothed Logit of Average Responsevs. Trial for Treatment 2 (Window = 60)Figure 3.17. Smoothed Logit of Average Responsevs. Trial for Treatment 3 (Window = 60)0 50 100 150trial trial trialFigure 3.18. Binary Independence Fitsfor FemalesFigure 3.19. Binary First-Order Fitsfor Femalestr).62 00.50.12 cJ-0 50 100 150 0 50 100 150trial trial.38.271^Treatment 12 ^ Treatment 23 --- Treatment 3.18to.62.50.18.12I - Independence FitF ^ First-Order Markov FitFigure 3.20. Fitted Success Probabilities for Females in Treatment 1trial1 - Treatment 12 ^ Treatment 23 --- Treatment 30 50 100 150O.0iY/.Figure 4.1. Binary Fits for Females Obtained fromthe Stationary 1-Dependent Working Correlationtrial5 10Trial Block of 1015Figure 4.2. Average Logit of Response by Sexfor Treatment 1, Blocks of 10 TrialsFigure 4.3. Average Logit of Response by Sex^Figure 4.4. Average Logit of Response by Sexfor Treatment 2, Blocks of 10 Trials for Treatment 3, Blocks of 10 Trials5 1515 10Trial Block of 1010Trial Block of 10— .92— .88.92.88.82.73.82so.38Proportion versus Trial (Window = 30)Figure 4.5. Smoothed Logit of Success^Figure 4.6. Smoothed Logit of SuccessProportion versus Trial (Window = 60)Trial Trial0 50 100 150Treatment 1, FemaleTreatment 1, MaleTreatment 2, FemaleTreatment 2, MaleTreatment 3, FemaleTreatment 3, Male0 50 100 150N -T.0 -F^Treatment 1, FemaleM Treatment 1, MaleF ^ Treatment 2, FemaleM ^ Treatment 2, MaleF Treatment 3, FemaleM^Treatment 3, MaleTrial.73501.73.50.27.12.95^r,.88.95ea.27^'7.12^°4Figure 4.7. Smoothed Individual Logits (Binomial Data)for Treatment 1 (Window = 60)Figure 4.8. Smoothed Individual Logits (Binomial Data)for Treatment 2 (Window = 60)Figure 4.9. Smoothed Individual Logits (Binomial Data)for Treatment 3 (Window = 60).7350^100^150Trial0^50^100^150TrialFigure 4.10. Initial Model Fits for Treatment 1(Binomial Data)20co(,)a)0rnFigure 4.11. Initial Model Fits for Treatment 2(Binomial Data)Figure 4.12. Initial Model Fits for Treatment 3(Binomial Data)he .F^FemalesM Males50^100^150^ 0^50^100^150^ 0^50^100^150Trial Trial Trial0 50 100 150Trial TrialFigure 4.13. Final Model Fits for Females fromthe Stationary 1-Dependent Working CorrelationFigure 4.14. Final Model Fits for Males fromthe Stationary 1-Dependent Working Correlation0.01q2-0 50 100 1501^Treatment 12 ^ Treatment 23 — - — Treatment 3.92.88.73.621 ^ Treatment 12 ^ Treatment 23 — - — Treatment 31^Bibliography[1] Bowe, C.A. (1984). Spatial relations in animal learning and behavior. Psychological Record,34 181-209.[2] Cori, D.F. (1989). Floral color change in Lupinus argenteus (Fabaceae): why should plantsadvertise the location of unrewarding flowers to pollinators? Evolution, 43 870-881.[3] Greenhouse, S.W., and Geisser, S. (1959). On methods in the analysis of profile data.Psychometrika, 24 95-112.[4] Huynh, H., and Feldt, L.S. (1970). Conditions under which mean square ratios in repeatedmeasurements designs have exact F-distributions, Journal of the American Statistical As-sociation, 65 1582-1589.[5] Liang, Kung-Yee, and Zeger Scott L. (1986). Longitudinal data analysis using generalizedlinear models. Biometrika, 73 13-22.[6] Mackintosh, N.J. (1983). Conditioning and Associative Learning. Oxford: Clarendon Press.[7] McCullagh, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11 59-67.[8] McCullagh, P., and Nelder, J.A., (1989). Generalized Linear Models (2nd edition). Chapmanand Hall, London.[9] Morrison, Donald F. (1976). Multivariate Statistical Methods. McGraw-Hill, New York.[10] Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models, andthe Gauss-Newton method. Biometrika, 61 439-447.[11] Zeger, Scott L., and Liang, Kung-Yee (1986). Longitudinal data analysis for discrete andcontinuous outcomes. Biometrics, 42 121-130.[12] Zeger, Scott L., Liang, Kung-Yee, and Self, Steven G. (1985). The analysis of binarylongitudinal data with time-independent covariates, Biometrika, 72 31-78.117