N O N L I N E A R P R I N C I P A L C O M P O N E N T A N A L Y S I S O F C L I M A T E D A T A by Adam Hugh Monahan B. Sc. (Honours Physics) University of Calgary, 1993 M . Sc. (Physics) University of British Columbia, 1995 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES EARTH AND OCEAN SCIENCES We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA February 2000 © Adam Hugh Monahan, 2000 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Earth and Ocean Sciences The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1Z1 Date: Abstract A nonlinear generalisation of Principal Component Analysis ( P C A ) , denoted Nonlinear Pr inc ipa l Component Analysis ( N L P C A ) , is introduced and applied to the analysis of climate data. This method is implemented using a 5-layer feed-forward neural network introduced originally in the chemical engineering literature. The method is described and details of its implementation are addressed. It is found empirically that N L P C A partitions variance in the same fashion as does P C A , that is, that the sum of the total variance of the N L P C A approximation with the total variance of the residual from the original data is equal to the total variance of the original data. A n important distinction is drawn between a modal P-dimensional N L P C A analysis, in which P successive I D approximations are determined iteratively so that the approximation is the sum of P nonlinear functions of one variable, and a nonmodal analysis, in which the P-dimensional N L P C A approximation is determined as a nonlinear non-additive function of P variables. Nonlinear Pr incipal Component Analysis is first applied to a data set sampled from the Lorenz attractor. It is found that the N L P C A approximations are much more repre-sentative of the data than are the corresponding P C A approximations. In particular, the I D and 2D N L P C A approximations explain 76% and 99.5% of the total variance, respec-tively, in contrast to 60% and 95% explained by the I D and 2D P C A approximations. When applied to a data set consisting of monthly-averaged tropical Pacific Ocean sea surface temperatures (SST), the modal I D N L P C A approximation describes average vari-ability associated with the E l Nino/Southern Oscillation (ENSO) phenomenon, as does the I D P C A approximation. The N L P C A approximation, however, characterises the asymmetry in spatial pattern of SST anomalies between average warm and cold events i i (manifes ted i n the skewness of the d i s t r ibu t ion) i n a manne r tha t the P C A a p p r o x i -m a t i o n cannot . T h e second N L P C A mode of S S T is found to character ise differences i n E N S O v a r i a b i l i t y between i n d i v i d u a l events, and i n pa r t i cu l a r is consistent w i t h the ce lebra ted 1977 "regime shift". A 2D n o n m o d a l N L P C A a p p r o x i m a t i o n is de t e rmined , the i n t e rp re t a t i on of w h i c h is compl i ca t ed by the fact that a secondary feature e x t r a c t i o n p r o b l e m has to be ca r r ied out to in terpret the results . It is found that this a p p r o x i m a t i o n contains m u c h the same in fo rma t ion as that p rov ided by the m o d a l analysis . A m o d a l N L P C analysis o f t r o p i c a l Indo-Pac i f ic sea level pressure ( S L P ) finds tha t the first m o d e describes average E N S O va r i ab i l i t y i n this f ield, and also characterises an a s y m m e t r y i n S L P fields be tween average w a r m and cold events. N o robust nonl inear m o d e b e y o n d the first c o u l d be found . N o n l i n e a r P r i n c i p a l C o m p o n e n t A n a l y s i s is used to find the o p t i m a l nonl inear approx-i m a t i o n to S L P d a t a p roduced by a 1001 year in tegra t ion of the C a n a d i a n C e n t r e for C l i m a t e M o d e l l i n g and A n a l y s i s ( C C C m a ) coupled general c i r cu l a t i on m o d e l ( C G C M 1 ) . T h i s a p p r o x i m a t i o n ' s associated t ime series is s t rongly b i m o d a l a n d pa r t i t i ons the d a t a in to two d i s t inc t regimes. T h e first and more persistent regime describes a s t and ing os-c i l l a t i o n whose s ignature i n the mid- t roposphere is a l t e rna t ing ampl i f i ca t i on a n d a t ten-u a t i o n of the c l ima to log i ca l r idge over N o r t h e r n E u r o p e . T h e second a n d more episodic reg ime describes mid- t ropospher i c spli t-f low south of G r e e n l a n d . Essen t i a l l y the same s t ruc tu re is found i n the I D N L P C A a p p r o x i m a t i o n of the 500mb height field itself. In a 500 year in t eg ra t ion w i t h a tmospher ic CO2 at four t imes p re - indus t r i a l concent ra t ions , the o c c u p a t i o n s tat is t ics of these preferred modes of va r i ab i l i t y change, such tha t the episodic spl i t - f low regime occurs less frequently whi le the s t and ing osc i l l a t ion reg ime occurs more frequently. F i n a l l y , a general isa t ion of K r a m e r ' s N L P C A using a 7-layer au toassocia t ive n e u r a l 111 ne twork is i n t r o d u c e d to address the i n a b i l i t y of K r a m e r ' s o r ig ina l ne twork to f ind P -d i m e n s i o n a l s t ruc ture topo log ica l ly different f rom the un i t cube i n Up. T h e e x a m p l e o f an ell ipse is considered, and i t is shown that the a p p r o x i m a t i o n p r o d u c e d by the 7-layer ne twork is a subs tan t i a l improvemen t over that p roduced by the 5-layer ne twork . iv Table of Contents Abstract ii List of Tables viii List of Figures ix List of Acronyms xvi Acknowledgements xix 1 Introduction 1 2 Nonlinear Principal Component Analysis: Theory and Implementation 5 2.1 Introduction 5 2.2 Feature Extraction Problems 5 2.2.1 Principal Component Analysis 6 2.2.2 Nonlinear Principal Component Analysis 9 2.3 Implementation of N L P C A 17 2.4 Dynamical Significance of Low-Dimensional Approximations 20 3 Nonlinear Principal Component Analysis of the Lorenz Attractor 23 3.1 Introduction 23 3.2 Model Building 25 3.3 Results 25 3.4 Conclusion 37 4 Nonl inear P r inc ipa l Component Analysis of Tropica l Indo-Pacific Sea Surface Temperature and Sea Level Pressure 39 4.1 Introduction 39 4.2 Data and Model Building 41 4.3 Tropical Pacific Sea Surface Temperature 42 4.4 Tropical Indo-Pacific Sea Level Pressure 65 4.5 Conclusions 75 5 Nonl inear P r inc ipa l Component Analysis of Nor the rn Hemisphere A t -mospheric Ci rcula t ion D a t a 77 5.1 Introduction 77 5.2 Data and Model Building 80 5.3 Analysis of GCM Sea Level Pressure 82 5.4 Analysis of GCM 500mb Heights 100 5.5 Analysis of GCM SLP in a 4xC0 2 Integration 115 5.6 Conclusions 123 6 Seven-Layer Networks for Discontinuous Project ion and Expans ion Func-tions 125 6.1 Introduction 125 6.2 Neural Network Approximations to Discontinuous Functions 126 6.3 7-Layer NLPCA Network 130 6.4 Conclusions 132 vi 7 Summary and Conclusions 136 7.1 Summary 136 7.2 Conclusions 142 Appendices 143 A Neural Networks 143 B Principal Curves and Surfaces 147 C Symmetric and Anti-symmetric Components of Composites 150 Bibliography 153 vn List of Tables Percentages of variance explained by the ID and 2D N L P C A approxima-tions to the Lorenz data for the three noise levels 77 considered vm List of Figures 2.1 T h e 5-layer feed-forward autoassociat ive neura l ne twork used to p e r f o r m N L P C A 10 3.1 T h e L o r e n z a t t rac tor , p ro jec ted on the (2:1,2:3), (2:3, £ 2 )> a n d (x2,X\) planes 24 3.2 A s i n F i g u r e 3.1, for a subsample of 584 points . 26 3.3 Noise-free L o r e n z da t a for a subsample of 584 points and the i r I D P C A a p p r o x i m a t i o n , p ro jec ted as i n F i g u r e 3.1 (note axes have been rescaled) . T h e dots represent the o r ig ina l da ta poin ts , the open circles represent poin ts of the a p p r o x i m a t i o n 28 3.4 A s i n F i g u r e 3.3, but for the I D N L P C A a p p r o x i m a t i o n 29 3.5 Noise-free L o r e n z da ta for a subsample of 584 points and the i r 2 D P C A a p p r o x i m a t i o n . T h e dots represent the o r ig ina l da t a poin ts , a n d the open circles the points of the a p p r o x i m a t i o n 31 3.6 A s i n F i g u r e 3.5, but for the n o n m o d a l 2 D N L P C A a p p r o x i m a t i o n . . . . 32 3.7 L o r e n z d a t a w i t h noise level rj = 2.0 for a subsample o f 584 poin ts and its I D N L P C A a p p r o x i m a t i o n . Do t s represent the da t a poin ts and the open circles represent points of the a p p r o x i m a t i o n 33 3.8 A s i n F i g u r e 3.7, bu t w i t h the 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n o f the L o r e n z da t a w i t h noise level n = 2.0 35 3.9 A s i n F i g u r e 3.7, for Lo renz da ta w i t h noise leve l 77 = 5.0 36 ix 4.1 S p a t i a l pa t terns of the first three S S T A E O F pat terns , no rma l i s ed to un i t magn i tude . T h e contour in t e rva l is 0.02, the zero contour is i n b o l d , a n d negat ive contours are dashed 43 4.2 Sca t te rp lo t of S S T A da ta pro jec ted onto the plane spanned by the first two E O F pat terns . 44 4.3 Sca t te rp lo t of S S T A da ta (points) and S S T A N L P C A mode 1 a p p r o x i m a -t i o n (open circles) p ro jec ted onto the planes spanned by E O F s (a) ( e i , e 2 ) , (b) ( e 2 , e 3 ) , (c) ( e ^63) . (d) shows a scat terplot of the I D N L P C A approx-i m a t i o n p ro jec ted in to the subspace (e 1 ; e2,e3) 46 4.4 (a) P l o t of ai(tn) = Sf(X.(tn)), the s tandardised t i m e series associated w i t h S S T A N L P C A mode 1. (b) P l o t o f the N i n o 3.4 i n d e x n o r m a l i s e d to un i t var iance 47 4.5 Sequence of spa t ia l maps character is ing S S T A N L P C A m o d e 1 for (a) a x = - 3 . 5 (b) a i = - 1 . 5 (c) a i = - . 7 5 (d) a, = - . 2 5 (e) a j = .25 (f) a i = 0.75 (g) a i = 1.5 (h) a i = 3.5. C o n t o u r in te rva l : 0 . 5 ° C 49 4.6 S S T A compos i te maps for "average" (a) E l N i n o and (b) L a N i n a events. C o n t o u r in te rva l : 0 . 5 ° C . (c) S y m m e t r i c component o f composi tes (a) and (b). C o n t o u r in te rva l : 0 . 1 ° C . See text for def in i t ion of composi tes a n d of the s y m m e t r i c component 51 4.7 M a p o f po in twise cor re la t ion coefficient between observed S S T A a n d (a) I D N L P C A a p p r o x i m a t i o n , (b) I D P C A a p p r o x i m a t i o n , a n d (c) difference between (a) and (b) 53 4.8 A s for F i g u r e 4.3, bu t for S S T A N L P C A mode 2 54 4.9 A s for F i g u r e 4.4(a), bu t for ct2(tn), the t ime series cor responding to S S T A N L P C A m o d e 2 55 x 4.10 Maps corresponding to SSTA N L P C A mode 2 for (a) a 2 = - 4 (b) a 2 = - 1 (c) a 2 = -0 .25 (d) a2 = 0 (e) a 2 = 0.15 (f) a 2 = 0.25 (g) a 2 = 0.3 (h) a 2 = 0.35 (i) a 2 = 0.4 (j) a 2 = 0.5 (k) a 2 = 0.75 (1) a 2 = 1.5. Contour interval: 0 . 5 °C 56 4.11 Composites of S S T A for L a Nina events (a) before 1977 and (b) after 1977. Contour interval 0.5°C7 58 4.12 As for Figure 4.3, but for S S T A 2D nonmodal N L P C A approximation. . 60 4.13 Time series (a) B^Q and (b) B2(tn), where (3x,B2)(tn) = s r(X(<„)) is the pair of time series associated with the S S T A 2D nonmodal N L P C A approximation. Both time series have been normalised to unit variance. . 61 4.14 Maps of pointwise correlation between observed SSTA and (a) 2D P C A ap-proximation (b) 2D nonmodal N L P C A approximation, and (c) 2D modal N L P C A approximation 62 4.15 As for Figure 4.3, but for SSTA 2D modal N L P C A approximation. . . . 64 4.16 Spatial patterns of S L P A (a) E O F mode 1, (b) E O F mode 2, (c) E O F mode 3. The black dots in (a) designate the positions of Tahit i and Darwin, Austral ia 67 4.17 As for Figure 4.3, but for S L P A N L P C A mode 1 68 4.18 (a) Plot of ai(tn) = 5 / 1 ( X ( i n ) ) , the standardised time series associated with S L P A N L P C A mode 1. (b) Plot of 5-month running mean of SOI, standardised to unit variance 69 4.19 Plot of a sequence of spatial maps characterising S L P A N L P C A mode 1 for (a) a i = - 3 (b) ax = - 2 {c)ax = - 1 (d) ax = - 0 . 5 (e) ax = 0 (f) ax = 0.5 (g) a : = 1 (h) a : = 2. Contour interval: 0.5 hPa 70 4.20 Composites of S L P A during (a) E l Nino and (b) L a Nina . Contour Interval: 0.5 hPa 71 x i 4.21 Spatial pattern of pointwise correlation coefficient between SLPA and (a) ID N L P C A approximation and (b) ID P C A approximation 73 4.22 SLPA N L P C A mode 2 (a) Spatial pattern (not normalised, units are hPa) and (b) time series (normalised to unit variance) 74 5.1 Spatial structure of the leading E O F pattern from observed SLP. Contour intervals are 1 hPa (...,-1.5,-0.5,0.5,1.5,...) 79 5.2 Spatial structure of the leading four E O F patterns from CCCma SLPA: (a) E O F 1, (b) E O F 2, (c) E O F 3, (d) E O F 4. These patterns explain 23.7%, 10.6%, 8.5%, and 6.5% of the variance in SLP, respectively. Negative contours are dashed. Contour intervals are 1 hPa (..., -1.5, -0.5, 0.5, ...). . 83 5.3 Spatial structure of the leading four E O F patterns from CCCma Z500A: (a) E O F 1, (b) E O F 2, (c) E O F 3, (d) E O F 4. These patterns explain 19.6%, 12.5%, 9.3%, and 8.2% of the variance in Z50o, respectively. Negative contours are dashed. Contour intervals are 10 m (..., -15, -5, 5, ...). . . . 84 5.4 Scatterplot of the leading two SLPA P C time series, overlaid with a his-togram estimate of the corresponding marginal probability density func-tion. Contour intervals are 5 x 10 - 4 ,1.5 x 10 _ 3 ,3 x 10~3,6 x 10~3,1 x 10 2 , 2 x 10 2 , 3 x 10 2 . The histogram bin size is 25 hPa in both directions. 85 5.5 ID N L P C A approximation X of SLPA, projected in the space of the first two SLPA EOFs (open circles), overlaying histogram estimate of SLPA P D F as in Figure 5.4 86 5.6 Plot of the ID SLPA N L P C A time series a(tn) (left) and the associated histogram estimate of the P D F (right) 87 xn 5.7 A s i n F i g u r e 5.5, but w i t h the P D F s of the popula t ions cor responding to B r a n c h 1 (sol id contours) and B r a n c h 2 (dashed contours) p l o t t e d sepa-ra te ly , and w i t h a b i n size of 20 h P a . 89 5.8 Compos i t e s of S L P A over character is t ic ranges of a. These ranges are i n d i c a t e d i n parentheses below the maps , a long w i t h the n u m b e r N o f maps used i n the composi te . C o n t o u r in t e rva l is 2 h P a (...,-3,-1,1,3,...). C o n t i n u e d on next page 90 5.8 C o n t i n u e d 91 5.9 A s w i t h F i g u r e 5.8, bu t for composi tes of Z500 anomal ies . T h e con tour i n t e rva l is 20 m (...,-30,-10,10,30,...) 93 5.9 C o n t i n u e d 94 5.10 A s w i t h F i g u r e 5.8, bu t for composi tes of £500- T h e 5300 and 5500 m contours are i n b o l d . C o n t o u r in te rva l is 50 m 95 5.10 C o n t i n u e d 96 5.11 M a p s of spa t ia l d i s t r i bu t i on of variance of S L P f rom (a) C C C m a G C M and (b) observations (contour in te rva l 10 ( h P a ) 2 ) , a n d skewness of S L P f r o m (c) C C C m a G C M and (d) observations (contour i n t e rva l 0.2). . . . 98 5.12 H i s t o g r a m est imates o f 2D m a r g i n a l P D F s o f Z 5 0 0 A P C s (a ) (PCl , PC72), (b) (PC71,Pc73), (c)(PC2,PC73), and ( d ) ( P C l , PC4) . T h e contour inter-vals are as i n F i g u r e 5.4. B i n sizes are 2500 m 101 5.13 P r o j e c t i o n of the Z500A da t a (dots) and i ts I D N L P C A a p p r o x i m a t i o n (open circles) onto the spaces spanned by E O F s (a) ( e i , 62), (b) ( e i , e 3 ) , (c) ( e 2 , e 3 ) , and (d) ( e i , e 2 , e 3 ) 102 5.14 P l o t o f the I D Z500A t i m e series a(tn) (left) a n d the associa ted h i s t o g r a m es t imate of the P D F (right) 103 X l l l 5.15 A s i n F i g u r e 5.9, bu t for Z500A compos i t ed over charac ter i s t ic ranges o f a associa ted w i t h the I D Z500A N L P C A a p p r o x i m a t i o n . C o n t o u r i n t e r v a l is 2 0 m (.. . ,-30,-10,10,30,.. .). C o n t i n u e d on next page 105 5.15 C o n t i n u e d 106 5.16 Sca t te rp lo t of the t ime series a(tn) cor responding to the I D S L P A a n d Z500A N L P C A approx imat ions 108 5.17 M a p s o f var iance (a) and skewness (b) of C C C m a mode l l ed Zsoo- C o n t o u r i n t e rva l i n (a) is 500 m 2 and i n (b) is 0.2 110 5.18 A s w i t h F i g u r e 5.15, but for composi tes of Z50Q. T h e 5300 and 5500 m contours are i n b o l d . C o n t o u r in t e rva l is 50 m I l l 5.18 C o n t i n u e d 112 5.19 A s i n F i g u r e 5.15, but for S L P A . C o n t o u r in t e rva l is 2 h P a (.. . ,-3,-1,1,3,. . .). C o n t i n u e d on next page 113 5.19 C o n t i n u e d . . 114 5.20 H i s t o g r a m es t imate o f the m a r g i n a l p r o b a b i l i t y dens i ty f u n c t i o n (contours) of S L P A f rom the G C M in tegra t ion w i t h C 0 2 concentra t ions at four t imes the p re - indus t r i a l value, i n the space of the l ead ing two con t ro l i n t eg ra t ion S L P A P C A modes , over la id w i t h the cor responding I D N L P C A approx-i m a t i o n (open circles) . C o n t o u r intervals are 5 x 1 0 — 4 , 1 . 5 x 1 0 ~ 3 , 3 x 1 0 ~ 3 , 6 x l O - 3 , 1 x l O - 2 , 2 x 1 0 - 2 , 3 x 1 0 ~ 2 . T h e h i s tog ram b i n size is 25 h P a i n b o t h direct ions 116 5.21 P l o t of the I D N L P C A S L P A t ime series a(tn) (left) and the associa ted h i s t o g r a m es t imate of the P D F (right) for the G C M in tegra t ion w i t h CO2 concen t ra t ion at four t imes the p re - indus t r i a l level 117 xiv 5.22 As in Figure 5.8, but for SLPA in the GCM integration at four times the pre-industrial CO2 concentration. Contour interval is 2 hPa (...,-3,-1,1,3,...). Continued on next page 118 5.22 Continued 119 5.23 (a) Variance (contour interval 10 (hPa)2) and (b) skewness (contour in-terval 0.2) of SLPA from GCM integration with quadrupled atmospheric C 0 2 122 6.1 Plot of Y(tn) (diamonds) and X(i n) (open circles) as defined by equation (6.3) with iV = 50 128 6.2 Neural network approximations of the functional relationship between data sets X(i„) and Y(tn) defined in equation (6.3): (a) network with one hidden layer, (b) network with two hidden layers 129 6.3 Results of ID NLPC analysis of an ellipse using a 5-layer autoassociative neural network: (a) NLPCA approximation X(£„), (b) associated time series a(tn) = s/(X(tn)) (note scale on y-axis is arbitrary) 131 6.4 As in Figure 6.3, but for NLPCA performed using a 7-layer autoassociative neural network 133 A. l Diagrammatic representation of neural network with input data X and output data Z 144 xv List of Acronyms A A O A n t a r c t i c O s c i l l a t i o n A O A r c t i c O s c i l l a t i o n C C C m a C a n a d i a n C e n t r e for C l i m a t e M o d e l l i n g a n d A n a l y s i s E O F E m p i r i c a l O r t h o g o n a l F u n c t i o n E N S O E l N i n o / S o u t h e r n O s c i l l a t i o n F E V F r a c t i o n of E x p l a i n e d Va r i ance F U V F r a c t i o n of U n e x p l a i n e d Va r i ance G C M G e n e r a l C i r c u l a t i o n M o d e l N A O N o r t h A t l a n t i c O s c i l l a t i o n N L P C A N o n l i n e a r P r i n c i p a l C o m p o n e n t A n a l y s i s N M S D N o r m a l i s e d M e a n Square Dis t ance P C A P r i n c i p a l C o m p o n e n t A n a l y s i s P C S P r i n c i p a l C u r v e s a n d Surfaces P D F P r o b a b i l i t y D e n s i t y F u n c t i o n P N A P a c i f i c - N o r t h A m e r i c a R P C A R o t a t e d P r i n c i p a l C o m p o n e n t A n a l y s i s x v i SLP (A) Sea L e v e l Pressure ( A n o m a l y ) SST(A) Sea Surface Tempera tu re ( A n o m a l y ) Z5oo(A) 500mb geopotent ia l height ( A n o m a l y ) x v n C A L V I N : I used to hate writing assignments, but now I enjoy them. I realized that the purpose of writing is to innate weak ideas, obscure poor reasoning and inhibit clarity. With a little practice writing can be an intimi-dating and impenetrable fog! Want to see my book report? H O B B E S (reading): "The Dynamics of Interbeing and Monological Im-peratives in 'Dick and Jane': A Study in Psychic Transnational Modes." C A L V I N : Academia, here I come! Bill Watterson X V l l l Acknowledgements I would like to thank my supervisors Dr. William Hsieh and Dr. Lionel Pandolfo for their help and guidance during the course of this work. Thanks are also due to Dr. John Fyfe and Dr. Greg Flato, who collaborated on much of the work presented in Chapter 5. As well, I would like to acknowledge Dr. Benyang Tang, Dr. Susan Allen, Dr. Phil Austin, Dr. Cindy Greenwood, and Dr. Roland Stull for suggestions and advice. None of this work could have been accomplished without the computer support pro-vided by Denis Laplante. Merci. My colleagues, past and present, improved this work with useful comments and sug-gestions. Thank you, Steven Bograd, Ana Carrasco, Pal Isachsen, Youmin Tang, Fredolin Tangang, and Yuval. I am grateful to the National Science and Engineering Research Council, the Crisis Points Group of the Peter Wall Institute for Advanced Study, and the University of British Columbia for financially supporting me over the course of this work. Tom Waits, John Zorn, and Frank Zappa have helped me maintain what has passed for my sanity over the last four years. Gentlemen, I am in your debt. Most of all, I would like to express my love and gratitude to my friends and family. Thank you Drew, Michelle, James, Anna, Jenn, Jono, and Alison; thank you Mom, Dad, Erin, Sarah, and Darren. Thanks for putting up with me, and thanks for dragging me away from my computer every now and again. x i x Chapter 1 Introduction Early work in the field of climate variability involved the consideration of climatic vari-ables averaged over very large spatial and temporal scales. These variables could in consequence be represented in phase spaces of a small number of dimensions. However, the range of spatial and temporal scales under consideration has broadened substantially over the last few decades, so that data sets considered in contemporary research may involve phase spaces with hundreds to thousands of dimensions. While this refinement of scales allows consideration of a richer class of physical phenomena, it also confounds attempts at reaching an holistic understanding of the data under consideration. A typ-ical modern climatic dataset is overwhelming in the amount of information it contains, so statistical techniques to distill massive multivariate datasets down to a phase space of smaller dimension, characterising the essential information contained in the data, are of great importance. In other words, it is necessary to develop methods to extract the signal from noise in climate data. These methods may be described as belonging to the gen-eral class of feature extraction problems, which attempt to characterise lower-dimensional structure in large multivariate datasets. One of these feature extraction methods, Principal Component Analysis (PCA), also known as Empirical Orthogonal Function (EOF) analysis, has been widely used in oceanography and meteorology since its introduction to these fields by Lorenz (1956). PCA is an objective technique used to detect and characterise optimal lower-dimensional linear structure in a multivariate data set, and it is one of the most important methods 1 Chapter 1. Introduction 2 in the geostatistician's multivariate statistics tool-box. Consequently, it has been well-studied, and standard references exist describing the method and its implementation (Preisendorfer, 1988; Wilks, 1995; von Storch and Zwiers, 1999). Its applications include reduction of data dimensionality for data interpretation (e.g. Barnston and Livezey, 1987; Miller et al., 1997) and for forecasting (e.g. Barnston and Ropelewski, 1992; Tangang et al., 1998). Furthermore, the connection between the results of PCA, which are statistical in nature, and the underlying dynamics of the system under consideration are understood in some detail (North, 1984; Mo and Ghil, 1987). By construction, PCA finds a lower-dimensional hyperplane which optimally charac-terises the data, such that the sum of squares of orthogonal deviations of the data points from the hyperplane is minimised. If the structure of the data is inherently linear (for example, if the underlying distribution is Gaussian), then PCA is an optimal feature ex-traction algorithm; however, if the data contain nonlinear lower-dimensional structure, it will not be detectable by PCA. An example of a data set with nonlinear low-dimensional structure is the noisy parabola X{tn) = {tn,t2nf + e(tn) (1.1) where e(tn) is a 2D iid AA(0,S2) noise process. Underlying this 2D data set is a ID parabolic curve. Because this curve cannot be described by a single straight line, a ID PCA approximation would be unable to characterise this underlying ID structure. In the early 1990s, a neural-network based generalisation of PCA to the nonlinear fea-ture extraction problem was introduced in the chemical engineering literature by Kramer (1991), who referred to the resulting technique as Nonlinear Principal Component Anal-ysis (NLPCA). Another solution to this problem, coming from the statistics community, was put forward independently by Hastie and Stuetzle (1989), who named their method Principal Curves and Surfaces (PCS). Recently, Malthouse (1998) demonstrated that Chapter 1. Introduction 3 NLPCA and PCS are closely related, and are, for a broad class of situations, essen-tially the same. Kramer's NLPCA has been applied to problems in chemical engineering (Kramer, 1991; Dong and McAvoy, 1996), psychology (Fotheringhame and Baddeley, 1997; Takane, 1998), dynamical systems theory (Kirby and Miranda, 1994; Kirby and Miranda, 1999), biomedical signal processing (Stamkopoulos et al., 1998), satellite remote sensing (Del Frate and Schiavon, 1999), and image compression (De Mers and Cottrell, 1993), but apart from a single unpublished report by Sengupta and Boyle (1995), the results of which were equivocal, it has not been applied to the large multivariate datasets common in oceanic and atmospheric sciences. The object of this thesis is to investigate the application of NLPCA to climatic data sets. A brief review of PCA and a description of NLPCA are given in Chapter 2. As well, Chapter 2 contains a discussion of subtleties in the implementation of NLPCA. In particular, it addresses the problem of overfitting and presents the approach adopted for its avoidance. In Chapter 3, NLPCA is first applied to a synthetic data set sampled from the Lorenz attractor (Lorenz, 1963). Synthetic data are used to develop intuition about the implementation of NLPCA and its results. As well, the addition of noise to this synthetic data allows an investigation of the ability of this statistical tool to extract structure from noisy data. The application of NLPCA to actual climate data is inves-tigated first in Chapter 4, in which the low-dimensional nonlinear structure of tropical Pacific Ocean sea surface temperatures (SST) and tropical Indo-Pacific sea level pressure (SLP) is investigated. Chapter 5 contains the results of an analysis of Northern Hemi-sphere extratropical SLP and 500mb geopotential height from the Canadian Centre for Climate Modelling and Analysis (CCCma) coupled general circulation model (GCM). Chapter 6 discusses a generalisation of Kramer's NLPCA to a 7-layer autoassociative neural network, which can solve a broader class of feature extraction problems than can Chapter 1. Introduction 4 the o r ig ina l 5-layer ne twork . Conc lus ions are presented i n C h a p t e r 7. A p p e n d i x A de-scribes feed-forward neura l networks and A p p e n d i x B brief ly in t roduces P r i n c i p a l C u r v e s . A p p e n d i x C discusses the s y m m e t r i c and a n t i s y m m e t r i c components of composi tes , an i dea i n t r o d u c e d i n C h a p t e r 4. Chapter 2 Nonlinear Principal Component Analysis: Theory and Implementation 2.1 Introduction Both traditional Principal Component Analysis (PCA) and Nonlinear Principal Compo-nent Analysis (NLPCA) are examples of feature extraction problems, the goal of which is to extract from multivariate data sets representative structures of lower dimension. However, unlike P C A , N L P C A does not admit an analytic solution. Its implementation involves the iterative solution of a variational problem, which must be approached with care. This chapter presents P C A and N L P C A as feature extraction problems, and then addresses the methodology adopted for the implementation of N L P C A . 2.2 Feature Extraction Problems Denote by X ( i n ) £ SRM a typical meteorological or oceanographic data set, where n G ( 1 , N ) labels observation times and the individual components of the vector X ( i n ) correspond to individual observing stations. It is usually the case that the field values at different stations do not evolve independently in time, that is, that the temporal variability of X ( t n ) includes contributions from large-scale, spatially-coherent features. In such a circumstance, the data will not be scattered evenly through the M-dimensional phase space of stations, but will tend to cluster around lower-dimensional surfaces; it is then appropriate to describe X ( i n ) by the model: 5 Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 6 X(* n ) = (f os f)(X(< n)) + en = ±{tn) + en (2.1) The function S f : D£M —> 9£ p, 1 < P < M parameterises a manifold of dimensionality lower than that of X ( i n ) , f : SRP -> 3ftM is a smooth map from this manifold to the original space, and the en are residuals. The estimation of S f and f from the data X ( i n ) , subject to an optimality criterion such as minimising the sum of squares of the residuals, is an example of a feature extraction problem: given a noisy data set X ( i n ) , it is desired to retrieve the signal X(£ n ) = (f o s f)(X(£„)). Doing so, the M-dimensionality of X(tn) is treated as being in a sense only superficial, as the signal of interest lives on a P-dimensional manifold embedded in SRM. Because once f and S f have been found, it is no longer necessary no longer need to work with the signal in 9ftM and can concentrate instead on the signal in 3? p, feature extraction can also be thought of as reduction of data dimensionality. The method of feature extraction most common in the atmospheric and oceanic sci-ences is Principal Component Analysis (PCA), which optimally extracts linear features from the data. However, if the underlying structure of the data is nonlinear, traditional P C A is suboptimal in characterising this lower-dimensional structure. This deficiency of P C A motivates the definition of Nonlinear Principal Component Analysis ( N L P C A ) . In this section, I providea brief review of P C A and demonstrate how it generalises naturally to a nonlinear method. I then discuss the N L P C A method in detail. 2.2.1 P r inc ipa l Component Analysis Traditional P C A can be formulated variationally as a special case of the feature-extraction problem, in which the data X(i„) (assumed, without loss of generality, to have zero mean Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 7 in time) is fit to the linear P-dimensional model: p X(i n) = £ ( X ( t „ ) - e f c ) e f c + e„ (2.2) fc=i for vectors e*. £ 9ftM, e^ • e.,- = 8kj such that the sum of squares of the residuals: J = < | | X - X | | 2 > (2.3) is a minimum, where angle brackets denote a sample time average. The vector e*. is the Aith Empirical Orthogonal Function (EOF) and the projection of X(i n ) on ek is the fcth Principal Component (PC). The product of the fcth EOF with the fcth PC defines a vector time series usually referred to as the fcth PC mode. The P-dimensional PCA approximation X(£ n ) to X(f„) lives on the P-dimensional hyperplane that passes optimally through the middle of the data (von Storch and Zwiers, 1999). Principal Component Analysis has the variance partitioning property: M M M £ var(Xi) = £ var(Xi) + £ var(X< - Xi), (2.4) i=l «=1 »=1 so it is sensible to say that X(i n) "explains" a certain fraction of the variance of X(£ n ) . In particular, X(i n) = (X(in) • ei)ei is the one-dimensional linear approximation to X(£ n ) which explains the highest percentage of the variance. The fraction of variance explained by X(£ n ) is a non-decreasing function of the approximation dimension P; increasing the dimensionality of the PCA approximation increases its fidelity to the original data. Principal Component Analysis is usually thought of in terms of the eigenstructure of the data covariance matrix C =< X X T >. In fact, the vectors e^ are eigenvectors of C corresponding to the P largest eigenvalues: Cek = \ k e k (2.5) where Ai > A 2 > ... > Xp. This fact follows from the minimisation of (2.3) subject to the constraint that the vectors e^ are normalised. While this eigenanalysis approach is the Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 8 s t anda rd approach to ca lcu la t ing the e*., i t has no analogue i n the nonl inear genera l i sa t ion to be considered. T h e va r i a t iona l fo rmula t ion of P C A (from w h i c h follows i ts r e l a t ion to the e igens t ruc ture of the covariance m a t r i x ) is emphas ised because i t generalises n a t u r a l l y to the nonl inear feature ex t r ac t i on p rob lem. N o t e tha t the P C A a p p r o x i m a t i o n X ( i „ ) to X ( i n ) is the compos i t i on o f two funct ions : 1. a p ro j ec t i on func t ion Sf : S R M —l 3 ? p : s f ( X ( t n ) ) = ( X ( i n ) . e 1 , . . . , X ( t n ) - e P ) T = I I X ( * n ) (2.6) where I I is the P x M m a t r i x whose fcth row is the vector e^, and I. an expans ion func t ion f : 3tp ->• * R M : f (sf) = n T s f (2.7) T h u s , the P C A a p p r o x i m a t i o n X ( £ „ ) to X ( i n ) is g iven by X{tn) = ( f 0 8 f ) ( X ( t B ) ) = n T ( n x ( i „ ) ) = ( n r n)x ( i n ) (2.8) In the language o f L e B l a n c and T i b s h i r a n i (1994), the p ro jec t ion func t ion characterises the dimension reduction aspect of P C A , and the expans ion func t ion characterises i ts function approximation aspect. In t r ad i t i ona l P C A , b o t h the p ro jec t ion a n d expans ion funct ions are l inear . T h i s m e t h o d is thus o p t i m a l i f the feature to be ex t r ac t ed is we l l -charac te r i sed by a set o f o r thogona l , s traight axes: tha t is , i f the d a t a c l o u d is c igar-shaped (e.g. Gauss i an ) . B u t wha t i f the da t a c loud is r ing- l ike , or bowed? I n such cases, there is a clear lower -d imens iona l s t ruc ture to the da ta , bu t not one w h i c h is l inear , Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 9 and thus can not be ex t rac ted by t r ad i t i ona l P C A . T h i s mot iva tes the def in i t ion o f a general ised, nonl inear P C A . T h e genera l i sa t ion of P C A to N L P C A can also be m o t i v a t e d b y the fo l lowing obser-v a t i o n . T h e I D P C A a p p r o x i m a t i o n X ( £ n ) to X ( i n ) is separable i n t e rms o f i ts spa t i a l a n d t e m p o r a l s t ruc ture . T h a t is, X ( £ n ) is the p roduc t of a func t ion of t i m e , X ( i „ ) • e i , w i t h a func t ion of space, e i : X(tn) = (X{tn)-e1)e1. (2.9) I n consequence, this a p p r o x i m a t i o n can on ly describe s tand ing va r i ab i l i t y i n the d a t a set, tha t is , a f ixed spa t ia l pa t t e rn w i t h an ampl i t ude that varies as a func t i on o f t i m e . T h e r e is no a priori reason to bel ieve that the o p t i m a l I D a p p r o x i m a t i o n to a c l i m a t i c da t a set is a s t and ing osc i l l a t ion , bu t that is a l l P C A can produce . A s sha l l be seen i n Chap t e r s 4 a n d 5, by m o v i n g to N L P C A , I D approx imat ions to c l i m a t i c da t a sets w h i c h are not s t and ing osci l la t ions m a y be obta ined . 2.2.2 Nonl inear P r inc ipa l Component Analysis T o c i r cumven t the l imi t a t i ons of l inear i ty inherent i n the P C A m o d e l (2.2), K r a m e r (1991) i n t r o d u c e d a non l inear general isa t ion that solved the general feature e x t r a c t i o n p r o b l e m descr ibed by the m o d e l (2.1), where f and Sf are a l lowed to be nonlinear funct ions . G i v e n da t a X{tn) e 5 R M , the p r o b l e m is to es t imate funct ions s/ : 3 ? M dtp a n d f : 9 £ p where P < M, such that the a p p r o x i m a t i o n X ( t n ) = ( f o 8 / ) ( X ( t n ) ) (2.10) to X(tn) passes t h r o u g h the m i d d l e of the data , ie, such that the s u m of squared res iduals , J = < | | X ( t n ) - X ( i n ) | | 2 > (2.11) Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 10 Input Bottleneck Output layer layer l a y e r Encoding Decoding layer layer M r M F i g u r e 2.1: T h e 5-layer feed-forward autoassociat ive neura l ne twork used to pe r fo rm N L P C A . is a m i n i m u m . C a l l e d Nonlinear Principal Component Analysis ( N L P C A ) , K r a m e r i m -p lemen ted his so lu t ion us ing a 5-layer feed-forward neura l ne twork . N e u r a l ne tworks are nonl inear , n o n p a r a m e t r i c s ta t i s t i ca l tools for func t ion e s t ima t ion . T h e y are descr ibed i n de t a i l i n A p p e n d i x A . F i g u r e 2.1 shows the archi tec ture of the 5-layer ne twork used to ex t rac t the I D N L P C A a p p r o x i m a t i o n to the da t a set X ( i „ ) £ SftM; this ne twork is u n u s u a l i n tha t the t h i r d layer contains on ly a single neuron . T h i s t h i r d layer as is referred to as the bot-tleneck layer. T h e first (input) and fifth (output) layers each con ta in M neurons. Laye r s Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 11 2 and 4 are called respectively the encoding and decoding layers; they contain L neurons, the transfer functions of which are hyperbolic tangents. The transfer functions of the bottleneck and output layers are linear. As input, the network is presented the vector X(i n ) for each time tn; the corresponding network output is denoted Af(X.(tn)). The weights and biases are adjusted ("trained"), using a conjugate gradient algorithm (Press et al., 1992) until the sum of squared differences between input and output: J =< \\X(tn) - N-(X(tn))\\2 > (2.12) is minimised (subject to certain robustness criteria discussed in the next section). Because the network is trained to approximate as closely as possible the input data itself, it is said to be autoassociative. It was proved by Sanger (1989) that if the transfer functions of the neurons in the second and fourth layers are linear, the resulting network performs classical PCA, such that the output of the bottleneck layer is the time series Sf(tn) of equation (2.6) (up to a normalisation factor). Now consider the manner by which this network solves the feature extraction problem for P = 1. The first three layers, considered alone, form a map from D?M to SR, and the last three layers alone form a map from 5R to $lM. All five layers together are simply the composition of these two maps. Because the bottleneck layer contains only a single neuron, the network must compress the input down to a single one-dimensional time series before it produces its final M-dimensional output. Once the network has been trained, the output M(X.(tn)) is the optimal one-dimensional approximation to X(f n), embedded in 5ftM. As is discussed in Appendix A, it is known from a result due to Cybenko (1989) that if L is sufficiently large, then the first three layers can approximate any continuous Sf, and the last 3 layers any continuous f, to arbitrary accuracy. Thus, the network illustrated in Figure 2.1 should be able to recover optimally, in a least-squares sense, any one-dimensional nonlinear structure present in X(i„) for which the projection and Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 12 expansion functions are continuous. It is not required that the encoding and decoding layers each have the same number of neurons, but the numbers are fixed to be the same so as to have only one free parameter in the model architecture. That the network must be composed of (at least) 5 layers follows from the fact that each of the functions Sf and f requires a network with (at least) 3 layers for its approximation. The composition f os f of the two must then have at least 5 layers, as one layer is shared. The network illustrated in Figure 2.1 will extract the optimal one-dimensional curve characterising X(£ n ) . To uncover higher-dimensional structure, the number of neurons in the bottleneck layer can be increased. For example, if two neurons are used, the network will determine the optimal two-dimensional characterisation (by continuous functions) of X(i n). In general, a P-dimensional NLPCA approximation to X(i n) can be obtained by setting to P the number of neurons in the bottleneck layer. Another solution to the general feature extraction problem was introduced indepen-dently by Hastie and Stuetzle (1989). Their method, termed Principal Curves and Sur-faces (PCS), is described in Appendix B. Principal Curves and Surfaces is based on a rather different set of ideas than NLPCA. In practice, however, because both minimise the sum of squared errors (2.11), the two methods both boil down to iterative solutions to the variational formulation of a feature extraction problem. In fact, Malthouse (1998) argued that NLPCA and PCS are quite similar for a broad class of feature extraction problems. A primary difference between NLPCA and PCS is that in the former the pro-jection function Sf is constrained to be continuous, while in the latter it may have a finite number of discontinuities. Although here I will investigate the use of Kramer's NLPCA, because its implementation is straightforward, PCS has a stronger theoretical underpin-ning. In Chapter 6 a generalisation of Kramer's NLPCA that can model discontinuous projection and expansion functions is introduced, and is thus closer to PCS. As noted by LeBlanc and Tibshirani (1994), Hastie and Stuetzle's PCS partitions Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 13 variance in the same fashion as does traditional PCA: if X(£ n ) is the PCS approximation to X(t„), then M U M E v a r p f O = 5 > a r ( X 0 + £ v a r p C ; - Xt). (2.13) i=l i=l i=l As with PCA, it is therefore sensible to describe a PCS approximation as explaining a certain fraction of variance in the original dataset. From the close relationship between NLPCA and PCS demonstrated by Malthouse (1998), it is tempting to hypothesise that NLPCA also partitions variance in such a fashion. While I am not aware of a rigorous proof of this result, this partitioning of variance in fact occurs in all of the examples I have considered, and in the following discussion it shall be assumed that equation (2.13) holds for X(f n) the NLPCA approximation to X(i n). Yet a third nonlinear generalisation of PCA was introduced by Oja and Karhunen (1993) and by Oja (1997), in which the map Sf is allowed to be nonlinear while f remains linear. Such a generalisation can be carried out using a two-layer recursive neural network. Because only the projection function is nonlinear, this approach is distinct from the class of feature extraction problems addressed by Kramer's NLPCA and Hastie and Stuezle's PCS. Because the traditional PCA model has the additive structure (2.2), the optimal linear P-dimensional substructure of X(i n) can be found all at once, or mode by mode; both methods yield the same result. In the iterative approach, the first mode X^^(tn) of X(i n ) is calculated from the entire data set, and then the second mode is calculated from the residual X(i n ) — X^^(in), taking advantage of the fact that the second PC mode of X(£ n ) is the first PC mode of this residual. The two approaches are equivalent for PCA because the most general linear function of P variables has the additive structure: p g [ u 1 , U 2 , . . . , U P ) = YjaiUi- (2-14) They are generally distinct, however, for NLPCA, as an arbitrary smooth function / of Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 14 P variables cannot be decomposed as a sum of smooth functions of one variable. That is, in general / ( « i , « 2 , ...,uP) + + / 2 M + ... + fp{uP) (2.15) for some functions / 1 , / 2 , f p : f cannot usually be written as a GeneraUsed Additive Model (Hastie and Tibshirani, 1990). The iterative approach will be referred to as a modal analysis, and to the all-at-once approach as nonmodal. Naturally enough, in a modal analysis, each ID approximation will be referred to as a mode, and ordered in terms of decreasing fraction of variance explained. I will compare both the modal and nonmodal approaches in this thesis. Theoretically, the nonmodal P-dimensional approximation should be superior to the modal approximation, because it is drawn from a broader class of functions, although the modal analysis is more amenable to interpretation. Of course, a general P-dimensional analysis could involve both modal and nonmodal decompositions at various stages; such mixed modal/nonmodal analyses will not be considered here. Malthouse (1998) pointed out two limitations of NLPCA as formulated by Kramer (1991). First, Kramer's NLPCA is unable to characterise low-dimensional structure which is self-intersecting. Because the projection Sf must be discontinuous for a self-intersecting surface, there will be open neighbourhoods in that are mapped by Sf into non-open neighbourhoods in SRP. Consider the example of a circle in SR2. It is a ID surface, a natural parameterisation of which is the interval 6 £ [0,27r], with the points 6 = 0 and 8 = 2n identified (5 1 topology). Clearly, for any small e, there will exist an open neighbourhood on the circle which maps onto the non-open set [0, e) U (27r — e,2ir]. This limitation of Kramer's NLPCA is not of great importance to the analysis of climate data, as precisely cyclic variability is not characteristic of climatic systems. An exception is perhaps the annual cycle, but it is typically removed Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 15 from climate data before analysis. The limitation can be removed by moving to a 7-layer neural network, which can approximate discontinuous projection and expansion functions. This issue is addressed in Chapter 6. The second limitation of Kramer's NLPCA highlighted by Malthouse is that the parameterisation Sf of the NLPCA approximation is only determined up to an arbi-trary homeomorphism (i.e., a continuous, one-to-one, and onto function with a contin-uous inverse). That is, for an arbitrary homeomorphism g : 5RP \-¥ 3?p, the time series g(sf ( X ( £ n ) ) ) is also an acceptable parameterisation of the surface, because f o Sf = (f o g _ 1 ) o (g o S f ) . This degeneracy is a potentially serious complication in the interpretation of the time series produced by the bottleneck layer of Kramer's net-work. Based on the results presented in this thesis, it is apparent that this degeneracy is not problematic in a modal NLPC analysis. Homeomorphisms from 3ft to 3ft are functions which can only stretch and compress (locally or globally), or translate globally, and thus do not radically change the information present in the time series Sf (X(£„) ) . The time series arising from the modal analyses in Chapters 4 and 5 are amenable to natural interpretation in terms of familiar phenomena in the systems under consideration. This degeneracy is substantially-more significant in the case of a nonmodal analysis, because homeomorphisms from $t.p to 3ftp for P > 1 can include rotations as well as dila-tions and compressions. Generally, the time series produced by the P different neurons in the bottleneck layer will thus not be independent, or even uncorrelated, because of mix-ing induced by this arbitrary rotation. Any P-dimensional surface can be parameterised by a set of independent variables 7;, i = 1,...,P. Determining such a set from the set of P time series Bi(tn) determined empirically by NLPCA is another problem of feature extraction in the space of the variables parameterising the surface. In principle, PCA or modal NLPCA can be used to calculate the ji(tn). An example of such an approach Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 16 is cons idered i n C h a p t e r 5. T h e fact tha t n o n m o d a l N L P C A leads to a second feature e x t r a c t i o n p r o b l e m reduces i ts u t i l i ty , despite i ts aesthetic appeal . A n o t h e r c o m m o n general isat ion of P C A is R o t a t e d P r i n c i p a l C o m p o n e n t A n a l y s i s ( R P C A ; R i c h m a n , 1986). A s was po in ted out by B u e l l (1975,1979), because P C A is designed to m a x i m i s e the g loba l var iance exp la ined by the l ead ing mode , a n d because successive modes are cons t ra ined to have or thogonal spa t ia l pa t te rns , i ts results can be s t rong ly affected by the shape of the da ta doma in . R o t a t e d P C A addresses this p r o b l e m by m o d i f y i n g the cost func t ion (2.3) to inc lude a "s imple s t ruc ture c r i t e r i on" so tha t the r e su l t ing a p p r o x i m a t i o n strikes a compromise between m a x i m i s i n g i ts exp l a ined var iance and m i n i m i s i n g i ts spa t ia l scale. In doing so, ei ther the o r thogona l i ty of the spa t i a l pa t te rns , the uncorrelatedness of the t ime series, or b o t h , must be sacrif iced. R o t a t e d P C A and N L P C A are b o t h generalisations of P C A , but they address sub-s t an t i a l ly different issues. R o t a t e d P C A allows the de tec t ion o f loca l i sed v a r i a b i l i t y i n the da ta , bu t s t i l l v a r i ab i l i t y that is l inear . Consequent ly , I D R P C A app rox ima t ions share w i t h I D P C A approx ima t ions the p r o b l e m that they can o n l y descr ibe s t and ing va r i ab i l -i ty . O n the other h a n d , N L P C A is concerned w i t h de tec t ing and charac te r i s ing nonl inear s t ruc ture i n da t a sets. T h e general ID N L P C A a p p r o x i m a t i o n cannot be expressed as a separable func t ion of space and t ime such as (2.9), so i t is able to descr ibe v a r i a b i l i t y more general t h a n s t and ing osci l la t ions . Because of the lack of a "s imple s t ruc ture c r i t e r i on" i n the cost func t ion (2.11), N L P C A also maximises g loba l var iance, and its results w i l l p r e s u m a b l y suffer to some degree f rom the same sens i t iv i ty to d o m a i n boundar ies tha t P C A does. Howeve r , this p r o b l e m is p re sumab ly less serious w i t h N L P C A because o f the absence of o r thogona l i ty constraints between different modes . I n fact, such const ra in ts cannot be n a t u r a l l y fo rmula ted for the nonl inear approx ima t ions . A fur ther general isa-t i o n o f N L P C A to encourage regional isa t ion of the a p p r o x i m a t i o n cou ld be i n t r o d u c e d b y m o d i f y i n g the cost func t ion (2.11) to inc lude a s imple s t ruc ture c r i t e r ion . S u c h a Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 17 generalisation, while an interesting direction for future research, is beyond the scope of the present study. 2.3 Implementat ion of N L P C A Neural networks are powerful tools for function approximation. Given input and target data sets u(tn) and v ( i n ) , n = 1 , N , a neural network, denoted by Af, can be trained until Af(u(tn)) is an arbitrarily good approximation to v ( i n ) , if the number of neurons in the hidden layer is sufficiently large. That is, a network can always be built so that the total sum of squared errors < | | v ( t n ) - A / ( u ( t n ) ) | | 2 > (2.16) is as small as desired. Another important property of the neural network is that it generalises, that is, that given new data u(£w + i ) , v(£./v+i)> the network error on these data is about the same size as the errors over the training set: l l v ^ + i ) - Af(u(tN+1))\\2 ~< ||v(*n) - AA(u(* n))|| 2 > . (2.17) The two goals of minimising network error and maximising its ability to generalise are often incompatible, and a subtle balance must be struck between the two. This situation arises, for example, in the case when u(tn) and v ( i n ) are of the form u(*„) = z{tn) + en (2.18) v(t n ) = f(z( i n ) )+»7„ (2.19) where en and nn are noise terms, and it is desired that Af learn the deterministic relation-ship f between u(tn) and v(tn). In such a case, care must be taken to avoid allowing the network to fit the noise as well. If Af is trained until it maps particular details of a given realisation of u(tn) into those of a given realisation of v ( i n ) , and thus will not generalise, Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 18 the network has overfit. An overfit network is not truly representative of the structure underlying a data set. The avoidance of overfitting by neural networks is a primary issue in their implementation (Finnoff et al., 1993; Yuval, 1999). To avoid overfitting, a simple technique called early stopping has been used. Because the neural network is nonlinear in the model parameters, they must be determined iter-atively in a process referred to as training. In early stopping, the training is terminated before the error function is minimised, according to a well-defined stopping criterion. In essence, the idea behind early stopping is that the training is allowed to continue sufficiently long to fit the structure underlying the data, but not long enough to fit the noise. The strategy employed was to hold aside a fraction of the data points, selected randomly, in a validation set not used to train the network. While network training proceeded, the network performance on the validation set was monitored, and training was stopped when this performance began to degrade, or after a fixed large number of iterations, whichever came first. The use of early stopping along with the deterministic conjugate gradient algorithm to minimise the error function confers on the training results a degree of sensitivity to the network parameters (the weights and biases) used to initialise the iterative training procedure. This sensitivity is exacerbated by the possible existence of multiple minima in the error function (2.11). To address this problem, an ensemble of training runs starting from different, randomly chosen, initial parameter values was carried out for each analysis performed. The training results from these runs were examined, and those members of the ensemble for which the final error over the validation set was greater than that over the training set were discarded. The remaining members of the ensemble are referred to as candidate models. The number of neurons L in the encoding and decoding layers determines the class of functions that Sf and f can take. As L increases, there is more flexibility in the Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 19 forms of Sf and f, but the model also has more parameters, implying both that the error surface becomes more complicated and that the parameters are less constrained by data. Consequently, for L large, the scatter among the candidate models can be large, as measured by the normalised mean square distance (NMSD). The NMSD between approximations X^^(tn) and X^2^(in) is defined as NMSD = " (2-20) < l l x l l > This statistic was introduced in Monahan (1999), in which it was found that NLPCA approximations for which the NMSD was less than about 2% were essentially indistin-guishable. In the end, the number L of neurons used in the encoding and decoding layers was the maximum such that the NMSD between NLPCA approximations to X(i n) in the candidate model set was less than 5%. This threshold value of NMSD was chosen on the basis of experience and intuition; there is no existing rigorous sampling theory for this test statistic. In other words, for any given analysis, the value of L used in the NLPCA network is the largest that produces a robust set of candidate models. The early stopping technique ensures that the NLPCA approximation is robust to the introduction of new data, and the existence of a set of similar candidate models (as measured by NMSD) ensures that the approximation is robust with respect to the initial parameter values used in the training. Finally, once a maximal L was determined and a set of candidate models obtained, the model selected as "the" NLPCA approximation was the one with the highest Fraction of Explained Variance (FEV): < l |X| | 2 > which is a meaningful statistic because NLPCA partitions variance as described in equa-tion (2.13). Alternately, the candidate model selected was that which minimised the Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 20 Fraction of Unexplained Variance (FUV): F U V = 1 - F E V (2.22) Typically, the F E V differed little between candidate models. 2.4 D y n a m i c a l Significance of Low-Dimensional Approximat ions The lower-dimensional approximations obtained by feature extraction methods are sta-tistical in nature. A natural question concerns the relation they bear to the dynamics of the system under investigation. North (1984) considered the dynamical system Z^(M) = C(x,0 (2.23) where L is a space-time linear differential operator and C(x, £) represents stochastic forc-ing, and concluded that the EOFs of ip(x,t) coincide with its dynamical modes if and only if the operator L is normal (i.e. it commutes with its adjoint) and the noise £ is white in space and stationary in time: £(C(x!,*i)C(x2, t2)) = gilh - t2\)5{^ - x 2 ) (2.24) where g(r) is a lag autocovariance function. These requirements greatly restrict the class of dynamical systems for which the connection between the statistics and dynamics is clear-cut, as it does not even include the geophysically-relevant class of linear models for which non-modal variance growth is important (Penland and Sardeshmukh, 1995; Farrell and Ioannou, 1996; Whitaker and Sardeshmukh, 1998). Mo and Ghil (1987) attempted to assess the connection between the results of E O F analysis and the dynamics of the system under consideration in the context of nonlinear dynamics. They concluded that, "the dynamical interpretation of EOFs is their pointing Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 21 from the time mean to the most populated regions of the system's phase space". If the distribution of the system in phase space is Gaussian, then this direction will he along the distribution's principal axis. On the other hand, if the distribution is not Gaussian, but is characterised by an inhomogeneous density with a small number of local extrema corresponding to preferred regimes of behaviour (associated, e.g., with slow manifolds of the dynamics (Ghil and Childress, 1987)), then the leading EOFs will characterise the distribution of these extrema. Unlike PCA, NLPCA approximations are not characterised by unique "directions" through phase space, but rather by curved surfaces. Interpretation of the results of an NLPC analysis must then be couched in rather different terms than that of PCA. A natural interpretation, using the language of dynamical systems theory is that NLPCA approximations characterise the attractor of the system under consideration, as was noted by Kirby and Miranda (1994). Many naturally occurring systems possess a stable attrac-tor, which is a manifold typically of smaller dimension than the Cartesian phase space in which it must be embedded to preclude spurious self-intersections (Ott, 1993). These at-tractors are generally complicated surfaces of non-integer dimension; only in very special cases are they planar. Because PCA produces an orthogonal coordinate system in the phase space, it can at best eliminate the degrees of freedom in the data associated with noise, thereby producing an embedding space for the attractor. Nonlinear PCA, however, can characterise the curved structures associated with these attractors (although it is re-stricted to approximations of integer dimension), and produce what Kirby and Miranda denote the "optimal coordinates" of the system. The results of NLPCA are thus best considered as characterising the underlying attractor of the system under consideration, as will be illustrated in the following Chapter when an NLPCA approximation of the Lorenz attractor is constructed. The estimation of the attractor underlying a data set Chapter 2. Nonlinear Principal Component Analysis: Theory and Implementation 22 provides insight into the governing physics. Knowledge of the dominant forms of vari-ability in the data can act as a guide to reductionism, helping to develop mechanistic models of the system under investigation, which then provide insight into the underlying dynamics. Chapter 3 Nonl inear P r inc ipa l Component Analysis of the Lorenz At t r ac to r 3.1 Introduct ion A s a p r e H m i n a r y inves t iga t ion in to the i m p l e m e n t a t i o n of N L P C A , consider a syn the t i c d a t a set cons is t ing of a set of points sampled f rom the L o r e n z a t t r ac to r (Lo renz , 1963). T h i s f ami l i a r object is the a t t rac tor on w h i c h (as t —> oo) l ive solut ions x ( i ) of the sys t em of coup led nonl inear O D E s x\ — —axx-\-ax2 (3.1) x2 = —XxXz + rxi — x2 (3-2) x3 = xix2 — bx3, (3.3) w i t h pa ramete r values r = 28,6 = 8 /3 , and a = 10. Syn the t i c d a t a is used to test the N L P C A m e t h o d because • by add ing r a n d o m noise to the signal x ( £ ) , the sens i t iv i ty of the m e t h o d to noise leve l c an be tested, and • the s t ruc ture of the L o r e n z a t t rac tor is we l l -known , and of sufficiently low d i m e n s i o n that v i sua l i sa t ion of results is s t ra ight forward. F i g u r e 3.1 displays the project ions of the Lorenz a t t rac tor (as de t e rmined by n u m e r i c a l i n t eg ra t ion of equat ions (3.1)-(3.3)) on the (£1,2:2) , (#2,2:3), and (2:3,2:1) planes. It tu rns out tha t the L o r e n z a t t rac to r is f racta l , w i t h a box-coun t ing d imens ion o f about 2.04 23 Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 24 Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 25 (Berliner, 1992). However, inspection of the butterfly-shaped attractor indicates that a one-dimensional U-shaped curve passing through the centres of the two lobes should explain a substantial fraction of the variance. To produce a dataset of size similar to that typically encountered in climate appli-cations (e.g. 600 points in length, corresponding to 50 years of monthly data), the data displayed in Figure 3.1 were subsampled at uniform intervals in time to produce a 3D time series 584 points in length, to be denoted z(tn). The subsampled data set is dis-played in Figure 3.2. Clearly, it retains the gross structure of the original attractor. To investigate the effects of noise on the NLPCA results, constructed the datasets x(tn) = z{tn) + Ve{tn) (3.4) were constructed, where e(tn) is a 584-point 3D series of Gaussian iid random deviates with zero mean and unit standard deviation, and 77 is a tunable parameter for the noise level. This noise is added in an effort to model measurement error; the stochasticity is not intrinsic to the dynamics. 3.2 Mode l Building The early stopping algorithm described in the previous chapter was used to carry out the NLPC analysis of the Lorenz data. A validation set containing 30% of the data points was set aside, and network performance over this set was monitored as training progressed. Training was stopped when this validation set error started to increase, or after 500 iterations, whichever was the first to occur. 3.3 Results The ID PCA approximation to x(tn) when rj = 0 is displayed in Figure 3.3; it is a straight line passing through the centres of the two lobes of the attractor, and explains Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 26 o cn co o co cn o cn 1 1 1 1 1 1 1 1 • • • • • •••••• • . . . ...... .v • • • • * . ; • • • *•• .'* -• . . . . • • » • • • •. •• • 1 1 1 1 • • • - • • • i i • i cn O I cn I o I cn I o I cn 2 o cn o cn I I cn I o I cn 1 o cn o ' • • • V." •# * .*! *• *•* ^ ..... • • Figure 3.2: As in Figure 3.1, for a subsample of 584 points. Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 27 60% of the var iance of x ( i n ) . F i g u r e 3.4 displays the I D N L P C A a p p r o x i m a t i o n to x ( i „ ) . A s an t i c ipa ted , it is a U - s h a p e d curve passing t h rough the m i d d l e o f the da ta . T h i s curve expla ins 76% of the variance, an improvemen t of 16% over the P C A resul ts . C l e a r l y , the I D N L P C A a p p r o x i m a t i o n is subs tan t ia l ly closer to x ( i „ ) t h a n is the I D P C A a p p r o x i m a t i o n . T h e ne twork used to per form the N L P C A h a d 3 i n p u t and o u t p u t neurons for xx, x2, and x3, 1 bot t leneck neuron , and L = 3 neurons i n the encod ing a n d decod ing layers. E x p e r i m e n t a t i o n ind i ca t ed that the N L P C A results i m p r o v e d us ing L = 3 over us ing L = 2 (ie, the former h a d a smaller F U V t h a n the l a t t e r ) , bu t tha t for L > 3, the results d i d not improve . T u r n i n g now to the issue of robustness o f resul ts , the N M S D be tween 6 different I D N L P C A curves (not shown) varies be tween 0.5% and 2%. These curves differ on ly i n s m a l l details, and agree i n the i r essential s t ruc ture w i t h the curve s h o w n i n F i g u r e 3.4. T h u s , the I D N L P C A a p p r o x i m a t i o n to x ( t n ) d i sp l ayed i n F i g u r e 3.4 is a robus t result tha t improves subs tan t ia l ly over the I D P C A a p p r o x i m a t i o n . F igu res 3.3 and 3.4 i l lus t ra te the s t rength of N L P C A re la t ive to P C A . It can be proven a n a l y t i c a l l y ( L i i c k e , 1976) that x3 is uncor re la ted w i t h X\ a n d x2. Consequen t ly , the covar iance m a t r i x takes on the fo rm r 1 2 0 \ r-21 r 2 2 0 I 0 0 T33 where Y\2 = T 2 i because the covariance m a t r i x is s y m m e t r i c . O n e eigenvector o f Y thus lies a long the x3 axis whi le the other two span the xxx2 p lane. O n e o f the la t t e r appears as the lead ing P C A a p p r o x i m a t i o n d isp layed i n F i g u r e 3.3. In the P C A desc r ip t ion o f the L o r e n z a t t rac tor , va r i ab i l i t y along x3 appears i n a separate mode f r o m v a r i a b i l i t y i n the xxx2 p lane . However , whi le va r i ab i l i t y along x3 is uncor re la ted w i t h v a r i a b i l i t y i n the xxx2 p lane, i t is clear u p o n inspec t ion of F i g u r e 3.2 that these are not independent modes of var iab i l i ty . Indeed, large values of \xi\ are associated w i t h large values o f x3. Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 28 i o o ho o o CS) o co IV) o . • • • • i o bo i i o i o i o So o ho o o b) o bo 0 Q • • ^ - • • • • . • •» i o I bo I o cn i o i o So o ho o 4*. o CS) o bo O • • Figure 3.3: Noise-free Lorenz data for a subsample of 584 points and their ID PCA approximation, projected as in Figure 3.1 (note axes have been rescaled). The dots represent the original data points, the open circles represent points of the approximation. Figure 3.4: As in Figure 3.3, but for the ID N L P C A approximation. Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 30 T h e I D N L P C A a p p r o x i m a t i o n characterises this dependence between x\ a n d x 3 , w h i l e also descr ib ing the covar iab i l i ty of X i and x2 descr ibed by the I D P C A a p p r o x i m a t i o n . T h e power of N L P C A is tha t i t can characterise covar iab i l i ty be tween variables tha t are uncor re l a t ed , bu t not independent , w h i c h P C A cannot . F i g u r e 3.5 displays the 2 D P C A a p p r o x i m a t i o n of the d a t a x ( £ n ) w h e n 77 = 0; th is surface expla ins 95% of the var iance. T h e 2 D P C A a p p r o x i m a t i o n is a flat sheet tha t character ises the s t ruc ture of the da t a as pro jec ted i n the (xi, X3) a n d (x2, x3) planes w e l l bu t fails to reproduce the s t ruc ture seen i n the p ro jec t ion on the (aji, x2) p lane . I n F i g u r e 3.6, t he resul t of a 2 D n o n m o d a l N L P C A of x ( i n ) is shown. T h i s surface exp la ins 99 .5% of the var iance , i m p l y i n g an order of magn i tude r educ t ion i n F U V as c o m p a r e d to the P C A resul t . T h e ne twork used to pe r fo rm the N L P C A h a d 2 neurons i n the bo t t l eneck layer and L = 6 neurons i n the encoding and decoding layers. It was found that decreasing L be low 6 also decreased the f rac t ion of var iance exp la ined , and increas ing i t above L = 6 h a d l i t t l e effect u p o n the results . T h e 2D n o n m o d a l N L P C A resul t is h i g h l y robus t : a sample of 4 N L P C A models (not shown) has N M S D between curves o f at most 0 .1%. A s w i t h the I D example considered above, the N L P C A a p p r o x i m a t i o n is a subs tan t i a l ly be t te r a p p r o x i m a t i o n to the o r ig ina l da t a set t h a n is the P C A a p p r o x i m a t i o n . C o n s i d e r now a dataset x ( i n ) ob ta ined f rom equa t ion (3.4) w i t h 77 = 2.0. T h e I D P C A a p p r o x i m a t i o n to x ( i n ) (not shown) explains 59% of the var iance . T h e I D N L P C A a p p r o x i m a t i o n (F igu re 3.7), expla ins 74% of the var iance. T h e curve i n F i g u r e 3.7 is ve ry s imi l a r to tha t shown i n F i g u r e 3.4 for the 77 = 0 case; the two- lobed s t ruc tu re o f the d a t a is s t i l l manifest at a noise level of 77 = 2.0, and the N L P C A is able to recover i t . A d d r e s s i n g again the issue of robustness of results , 6 different N L P C A app rox ima t ions to x(tn) were found to have N M S D va ry ing f rom 0.5% to 3%. These 6 curves agree i n the i r essential detai ls , a l though the set displays more va r i ab i l i t y be tween members t h a n d i d the cor responding set for 77 = 0. F i g u r e 3.8 shows the results of a 2 D n o n m o d a l Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 31 F i g u r e 3.5: Noise-free Lo renz da ta for a subsample of 584 points and the i r 2 D P C A a p p r o x i m a t i o n . T h e dots represent the o r ig ina l da t a poin ts , a n d the open circles the poin ts of the a p p r o x i m a t i o n . Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 32 Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 33 I o o o o 0) O 00 CD cn i o I 03 I o 05 I o I o ro 1 o o ro o o b o bo Figure 3.7: Lorenz data with noise level rj = 2.0 for a subsample of 584 points and its ID NLPCA approximation. Dots represent the data points and the open circles represent points of the approximation. Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 34 N L P C A pe r fo rmed on this da ta set. T h i s explains 97.4% of the var iance , i n contras t to the 2 D P C A a p p r o x i m a t i o n (not shown), w h i c h explains 94.2%. T h u s , the F U V of the 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n is about ha l f tha t of the 2 D P C A a p p r o x i m a t i o n . These results too are robust ; the N M S D between different 2 D n o n m o d a l N L P C A models was about 0.2%. T h e 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n is again an i m p r o v e m e n t over the 2 D P C A a p p r o x i m a t i o n , but not by such a subs tan t ia l m a r g i n as was the case w h e n x] = 0. T h e noise-free L o r e n z a t t rac tor is very near ly two-d imens iona l , so the 2 D n o n m o d a l N L P C A was able to account for a lmost a l l of the var iance . T h e a d d i t i o n of noise ac ted to smear out this fine f rac ta l s t ruc ture and made the d a t a c l o u d m o r e 3 d imens iona l . T h e 2 D N L P C A app l ied to this quas i -3D s t ruc ture c o u l d not p roduce as close an analogue as was the case when rj = 0. A t a noise s t rength of rj = 5.0, the da ta set x ( i n ) s t i l l has a d iscernib le two- lobed s t ruc ture , bu t i t is subs tan t ia l ly obscured. T h e I D P C A a p p r o x i m a t i o n (not shown) expla ins 54% of the var iance, whereas the I D N L P C A a p p r o x i m a t i o n (shown i n F i g u r e 3.9) expla ins 65%. A g a i n , the I D N L P C A a p p r o x i m a t i o n to x ( i n ) is q u a l i t a t i v e l y s imi l a r to tha t ob t a ined i n the noise-free case (F igure 3.4). T h e rj = 0 and rj = 5.0 I D N L P C A a p p r o x i m a t i o n s differ at the ends of the curves. P r e s u m a b l y , the s t ruc ture represented i n the former is somewhat washed out by noise i n the la t ter . F o u r different N L P C A curves for the d a t a ob ta ined w i t h n — 5.0 share the i r gross features, bu t differ fa i r ly subs tan t i a l ly i n de ta i l . I n this case, the N M S D between curves varies be tween 5% a n d 10%. T h e 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n to these da t a (not shown) expla ins 90% o f the var iance , o n l y s l igh t ly more t h a n the 2 D P C A a p p r o x i m a t i o n , w h i c h expla ins 88% of the var iance. F i n a l l y , at a noise level of rj = 10.0 the two- lobed s t ruc ture of x(tn) is no longer obvious , a n d the d a t a c loud appears as a fa i r ly homogeneous, vaguely e l l ipso ida l b lob . T h e results of N L P C A by this noise level are no longer robus t , t end ing to be a s y m m e t r i c , Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 35 i o cn o o cn O o ho o o O) o co O-' < o CX * co 0 F i g u r e 3.8: A s i n F i g u r e 3.7, bu t w i t h the 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n o f the L o r e n z d a t a w i t h noise level n = 2.0. Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 37 convo lu t ed curves . A t this noise level , then , N L P C A seems no longer to be a useful t echn ique for charac ter i s ing low-d imens iona l nonl inear s t ruc ture o f the d a t a set, prec ise ly because the a d d i t i o n o f noise has des t royed th is s t ruc ture . 3.4 Conclusion T h e resul ts con ta ined i n th is chapter demons t ra te tha t N L P C A is able to p r o d u c e l o w -d i m e n s i o n a l app rox ima t ions to mul t iva r i a t e da ta sets tha t are more representa t ive o f the d a t a t h a n the cor responding P C A approx ima t ions . T h e I D N L P C A a p p r o x i m a t i o n to a d a t a set s ampled f rom the Lo renz a t t rac tor explains 76% of the var iance , i n contras t to 60% e x p l a i n e d by the I D P C A a p p r o x i m a t i o n , and characterises the two- lobed s t ruc ture of the da ta . A 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n expla ins 99.5% of the var iance , where the 2 D P C A a p p r o x i m a t i o n explains 95%. A s the b o x - c o u n t i n g d i m e n s i o n o f the a t t r ac to r f r o m w h i c h the da ta is sampled is 2.04, the 2 D nonl inear a p p r o x i m a t i o n is able to cap ture a lmost the entire s t ruc ture of the data ; because the a t t r ac to r is e m b e d d e d i n 3? 3 , the 2 D P C A a p p r o x i m a t i o n cannot do this . W i t h the a d d i t i o n o f G a u s s i a n noise o f s m a l l to modera te s t rength to the data , N L P C A remains super ior to P C A i n i ts a b i l i t y to detect l ow-d imens iona l s t ruc ture . A s the noise level increases, the i m p r o v e m e n t of N L P C A over P C A decreases, u n t i l eventua l ly the noise dominates the s ignal a n d N L P C A cannot i m p r o v e u p o n P C A . Tab le 3.1 presents a s u m m a r y o f these resul ts . C o n s i d e r a t i o n of syn the t ic da ta was useful because i t is o f low d imens ion and easi ly v i sua l i sed , a n d because the sens i t iv i ty of the results of N L P C A to noise l eve l can be as-sessed t h r o u g h m a n i p u l a t i o n of the s t rength of the noise. R e a l c l ima te da t a sets however are o f a h i g h d imens iona l i t y and o f a fixed noise leve l , and i t is the p o t e n t i a l o f N L P C A to p roduce robus t , en l ightening approx imat ions to c l imate da t a tha t determines the u t i l i t y of the m e t h o d . T h e analysis of such da ta sets is the subject o f the next two C h a p t e r s . Chapter 3. Nonlinear Principal Component Analysis of the Lorenz Attractor 38 I D 2 D P C A N L P C A P C A N L P C A 77 = 0 60 76 95 99.5 r7 = 2 59 74 94.2 97.4 rj = 5 54 65 88 90 Tab le 3.1: Percentages of var iance exp la ined by the I D and 2 D N L P C A a p p r o x i m a t i o n s to the L o r e n z d a t a for the three noise levels 77 considered. Chapter 4 Nonlinear Principal Component Analysis of Tropical Indo-Pacific Sea Surface Temperature and Sea Level Pressure 4.1 Introduction In t e r annua l va r i ab i l i t y of the E a r t h ' s c l imate sys tem is d o m i n a t e d by the t r o p i c a l Pac i f i c bas in -wide phenomenon k n o w n as E l N i n o and the Sou the rn O s c i l l a t i o n ( E N S O ) ( P h i -lander , 1990). T h i s phenomenon is character ised by a l t e rna t ing per iods o f anoma lous ly w a r m or co ld water i n the eastern equa tor ia l Pac i f i c , a l te rna te ly weaken ing or s t rength-en ing the z o n a l sea surface t empera tu re ( S S T ) gradient across the Pac i f i c ocean . Thes e phases of the phenomenon are referred to respect ive ly as E l N i n o and L a N i n a . A s s o c i a t e d w i t h these changes i n S S T are p ronounced changes i n the zona l gradient o f t h e r m o c l i n e d e p t h a n d sea surface height . In the a tmosphere , E l N i n o ( L a N i n a ) events are associated w i t h a s lackening (s t rengthening) of the zona l ly -or ien ted W a l k e r c i r c u l a t i o n , i m p l y i n g a r e d u c t i o n (increase) i n w i n d stress app l ied to the ocean surface associated w i t h the east-e r ly T rade W i n d s , and an eastward (westward) shift i n the region of deep convec t ion . V a r i a b i l i t y of the W a l k e r c i r cu l a t i on manifests i tself as an east-west d ipo la r f l uc tua t i on i n sea leve l pressure ( S L P ) k n o w n as the Sou the rn O s c i l l a t i o n . A s was o r ig ina l l y desc r ibed by B je rknes (1969), these changes i n a tmospher ic c i r cu l a t i on feed back o n the ocean a n d reinforce the o r ig ina l S S T anomalies . In the late 1980's, a m e c h a n i s m was p roposed for the t r ans i t i on between E l N i n o and L a N i n a events that i nvoked the d y n a m i c s of oceanic 39 Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 40 ba roc l i n i c waves i n the so-called "equator ia l waveguide" (Suarez a n d Schopf, 1988; B a t -t i s t i a n d H i r s t , 1989). D e n o t e d the "delayed osc i l la tor" m e c h a n i s m , this has become the dominan t p a r a d i g m for the negative feedback that te rminates E N S O events ( B a t t i s t i a n d Sarach ik , 1995). E N S O va r i ab i l i t y is aper iodic , w i t h power p r i m a r i l y i n the 4-7 year b a n d (Tangang et a l . , 1998). M o d e l s i nvoked to exp l a in the ape r iod i c i t y o f the v a r i a b i l i t y range f r o m the s tochast ic forc ing of a l inear sys tem (Pen l and , 1996) to l o w - d i m e n s i o n a l d y n a m i c s of a chaot ic sys tem ( J i n et a l , 1996); the ac tua l character o f E N S O d y n a m i c s is s t i l l a subject of some debate, a l though recent evidence favours the former of the above mode ls ( P e n l a n d et a l , 1999). W h i l e the p h y s i c a l mechanisms p roduc ing E N S O are thought to be m a i n l y confined to the equa to r i a l Pac i f i c , i ts effects are g loba l i n scale (Ph i l ande r , 1990; T r e n b e r t h et a l . , 1998). In consequence, forecasts of E N S O va r i ab i l i t y have been a t t e m p t e d by a n u m b e r o f researchers, and th roughout the 1990s i t has been the p a r a d i g m a t i c p r o b l e m of c l ima t e p r e d i c t i o n ( B a r n s t o n et a l . , 1994; B a r n s t o n et a l . , 1999). E N S O d y n a m i c s has qui te ce r t a in ly been the most in tens ively s tud ied p r o b l e m i n c l ima te phys ics for the last decade. E N S O va r i ab i l i t y is often diagnosed f rom observations us ing l inear s t a t i s t i ca l tools , i n pa r t i cu l a r P C A ; the dominan t E N S O signal i n t r o p i c a l Pac i f i c S S T a n d S L P is u sua l ly iden t i f ied w i t h the l ead ing P C A a p p r o x i m a t i o n to these d a t a sets. I n th is chapter , N L P C A is appl ied to c l ima t i c da t a sets relevant to E N S O va r i ab i l -i t y : t r o p i c a l Pac i f i c O c e a n sea surface t empera tu re and t r o p i c a l Indo-Pac i f i c sea leve l pressure. In the case of S S T , N L P C A is able to p roduce one- and two-d imens iona l ap-p r o x i m a t i o n s tha t are o f greater f idel i ty to the o r ig ina l da t a t h a n the cor respond ing one-a n d two-d imens iona l P C A approx imat ions . In pa r t i cu la r , the I D S S T N L P C A describes E N S O v a r i a b i l i t y i n a manner that characterises the a s y m m e t r y i n spa t i a l d i s t r i b u t i o n of t empera tu re anomalies between E l N i n o and L a N i n a events , w h i c h are t r ea ted s y m m e t -r i c a l l y i n the I D P C A a p p r o x i m a t i o n . T h e improvemen t of the N L P C A a p p r o x i m a t i o n s Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 41 over P C A are more modes t , but s t i l l notable , i n the case of S L P . In C h a p t e r 3, I considered the app l i ca t ion of N L P C A to syn the t ic d a t a sets of suf-f ic ien t ly low d imens ion , and of sufficiently low noise leve l , tha t the i r u n d e r l y i n g low-d i m e n s i o n a l s t ruc ture was manifest . Non l inea r P C A was able to recover this s t ruc ture , even i n the presence of modera te noise levels. F u n d a m e n t a l l y , however , N L P C A is o n l y of p r a c t i c a l use i n c l ima te research i f i t is able to robus t ly character ise l o w - d i m e n s i o n a l s t ruc tu re i n rea l da t a sets ar is ing f rom the c l imate sys tem, and i m p r o v e u p o n the results o b t a i n e d by t r a d i t i o n a l l inear methods . I show here that this is indeed the case, a n d thereby demons t ra te the po ten t i a l u t i l i t y of N L P C A i n the analysis of c l i m a t i c d a t a sets. 4.2 D a t a and M o d e l Bu i ld ing T h e S S T da t a cons idered consist of month ly -averaged N O A A sea surface t empera tu res for the t r o p i c a l Pac i f i c Ocean . T h e da ta are on a 2 ° x 2 ° g r i d f r o m 19S to 1 9 N , a n d f rom 125E to 6 9 W , and span the pe r iod f rom J a n u a r y 1950 to D e c e m b e r 1998. T h i s d a t a set was p r o d u c e d us ing the P C A - b a s e d in t e rpo la t ion m e t h o d deve loped b y S m i t h et a l . (1996). A c l ima to log i ca l annua l cycle was ca lcu la ted by averaging the d a t a for each ca lendar m o n t h , and m o n t h l y S S T anomalies ( S S T A ) were defined re la t ive to this a n n u a l cyc le . T h e S L P d a t a were C O A D S month ly-averaged sea leve l pressure ( S L P ) over the t r o p i c a l Indo-Pac i f i c area (Woodru f f et a l , 1987 ) on a 2 ° x 2 ° g r i d f rom 27S to 1 9 N , and f rom 3 1 E to 6 7 W , cover ing the pe r iod f rom J a n u a r y 1950 to D e c e m b e r 1998. T h e a n n u a l cyc le was r emoved i n the same fashion as for the S S T da t a to p roduce sea l eve l pressure anomalies ( S L P A ) . T h e S L P A field was then smoo thed i n t i m e us ing a 3 -mon th r u n n i n g m e a n filter and a 1-2-1 filter i n each spa t ia l d i r ec t ion . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 42 In the analysis of both the SSTA and SLPA data, the early stopping algorithm de-scribed in the previous chapter was used. In all analyses, 20% of the data was held aside in a validation set, for which network performance was monitored as training proceeded. Training was stopped when this performance began to degrade, or after 5000 iterations, whichever came first. It was found that increasing the maximum number of iterations beyond 5000 did not affect the results of the analysis. 4.3 Tropical Pacific Sea Surface Temperature To render the NLPCA problem tractable, the SSTA data set was pre-processed by pro-jecting it on the space of its first 10 EOF modes {e^}^, in which 91.4% of the total variance is contained. Doing so takes advantage of the data compression aspect of PCA, which is a feature distinct from feature extraction, for which NLPCA shall be used. Such pre-processing of data to reduce the problem to a manageable size is common in rotated PCA (Barnston and Livezy, 1987) and in statistical forecasting (Barnston, 1994; Tangang et al., 1998). The first 3 EOF spatial patterns of SSTA are displayed in Figure 4.1; these explain, respectively, 57.6%, 10.9%, and 6.8% of the total SSTA variance. A scatterplot of the two leading principal component time series is shown in Figure 4.2. It can also be considered to be a plot of the projection of the data into the plane spanned by the first two SSTA EOF modes. The time series corresponding to these two PCA modes are uncorrelated, but they are clearly not independent] the distribution of the data appears to be markedly non-Gaussian. Indeed, Figure 4.2 indicates that there is an inverted U-shape underlying the data, such that both strongly positive and negative values of the first PCA time series are associated with negative values of the second. Physically, this coupling of the PCI and PC2 time series describes the fact that the most positive SST anomalies during an average El Nino event lie closer to the eastern boundary of the Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 43 SST EOF 1 150E 180E 150W 120W SST EOF 2 150E 180E 150W 120W 90W SST EOF 3 J 1 \—L C ±J 150E 180E 150W 120W 90W F i g u r e 4.1: S p a t i a l pa t terns of the first three S S T A E O F pat terns , no rma l i s ed to un i t m a g n i t u d e . T h e contour in t e rva l is 0.02, the zero contour is i n b o l d , and negat ive contours are dashed. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 44 .41 1 1 1 i i i I - 3 - 2 - 1 0 1 2 3 4 PC1 F i g u r e 4.2: Sca t te rp lo t o f S S T A da t a p ro jec ted onto the plane spanned by the first two E O F pa t te rns . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 45 Pac i f i c t h a n do the coldest anomalies du r ing an average L a N i n a event. I sha l l r e t u r n to th is po in t la ter . C o n s i d e r first a m o d a l N L P C A decompos i t ion of this S S T A da ta . M o d e 1, found us ing a ne twork w i t h L — 4 nodes i n the encoding and decoding layers and a single neu ron i n the bo t t l eneck layer , expla ins 69 .1% of the variance i n the 10-dimens ional E O F space, a n d thus 63 .3% of the var iance i n the t o t a l S S T A data , as c o m p a r e d to 57.6% e x p l a i n e d by the I D P C A a p p r o x i m a t i o n . F o u r candida te models were ob ta ined f r o m an ensemble of 20; these models differed w i t h N M S D of at most 1%. P ro jec t ions of the first N L P C A m o d e onto the subspaces spanned by the S S T A E O F s (ei,e2), (e 2,e 3), (ei,e3), and (ei,e 2,e 3) are shown i n F i g u r e 4.3 (a)-(d), respect ively. A l l four projec t ions are shown because i t is difficult to unde r s t and the s t ructure of the N L P C A a p p r o x i m a t i o n f r o m a single p ro j ec t i on . T h i s d i f f icul ty is p a r t i c u l a r l y evident i n F i g u r e 4.3(b): the curve , v iewed edge-on, appears to be self-intersecting, when i n fact the other pro jec t ions demons t ra te this is not the case. F i g u r e 4.3(a) indicates that N L P C A mode 1 characterises the U-shaped s t ruc ture discussed i n the previous paragraph; N L P C A m o d e 1 is p r i m a r i l y a m i x t u r e o f P C A modes 1 and 2. Assoc i a t ed w i t h this mode is the s t andard i sed t i m e series sr{X(tn))- < s f > < (sT- < sr >)2 >1/2' ; cor respond ing to the ou tpu t of the single neuron i n the bot t leneck layer . A plot o f cti(tn) appears i n F i g u r e 4.4(a). T h i s t i m e series bears a s t rong resemblance to the N i n o 3.4 t i m e series (defined as the average S S T A over a box f rom 7S to 7 N , a n d f r o m 1 1 9 W to 1 7 1 W ) d i sp layed i n F i g u r e 4.4(b); the corre la t ion coefficient between the two series is 0.88. In contras t to P C A , no single spa t ia l pa t t e rn is associated w i t h any g iven N L P C A Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 46 PC3 PC2 i i i i i i i i i i i p p o o o o p o o o o o o o o o o o F i g u r e 4.3: Sca t te rp lo t of S S T A da ta (points) and S S T A N L P C A mode 1 a p p r o x i m a t i o n (open circles) p ro jec ted onto the planes spanned by E O F s (a) ( e i , e 2 ) , (b) (e2,e3), (c) ( e i , e 3 ) . (d) shows a scat terplot of the I D N L P C A a p p r o x i m a t i o n p ro j ec t ed in to the subspace (e1} e 2 , e 3 ) . F i g u r e 4.4: (a) P l o t of ai(tn) = Sf(X.(tn)), the s tandardised t i m e series associa ted w i t h S S T A N L P C A mode 1. (b) P l o t of the N i n o 3.4 i n d e x normal i sed to un i t var iance . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 48 mode. The approximation X, however, provides a sequence of patterns that can be visu-alised cinematographically. This cinematographic interpretation is implicit in traditional PCA: the ID PCA approximation X(i n) = (ei -X(in))ei describes the evolution in time of a standing oscillation. This oscillation has a fixed spatial structure with an amplitude that varies in time. The more general approximation X(i„) = ( f os f ) (X(£ n ) ) , with Sf and f nonlinear, is not so constrained, and can characterise more complex lower-dimensional variability. There is no a priori reason to expect the optimal ID approximation to a spa-tial field to be a standing wave - but standing waves are the only such approximations that traditional PCA can produce. The power of NLPCA lies in its ability to characterise more general lower-dimensional structure. Figure 4.5 displays maps characterising the first NLPCA mode corresponding to values of the time series ax = -3.5,-1.5,-0.75,-0.25,0.25,0.75,1.5,3.5. These values were chosen to provide a representative sample of spatial structures associated with the NLPCA approximation. Clearly, NLPCA mode 1 describes the evolution of average ENSO events, in contrast with PCA mode 1, which describes only the standing oscillation associated with average ENSO variability. This difference between NLPCA and PCA modes 1 results from the spatial asymmetry between the average warm and cold phases of ENSO. In particular, warm events described by NLPCA mode 1 display the strongest anomalies near the Peruvian coast, whereas the cold events are strongest near 150W. This asymmetry in the evolution of NLPCA mode 1 arises because NLPCA mode 1 mixes PCA modes 1 and 2: for both El Nino and La Nina events, the PCA mode 2 spatial map (Figure 4.1(b)) enters into the NLPCA mode 1 approximation with the same (negative) sign. This spatial asymmetry between average El Nino and La Nina events is manifest in a composite study. Figure 4.6(a) is a composite of NDJ (November, December, January) Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 49 (a) 150E 180E 150W 120W 90W (c) 150E 180E 150W 120W 90W (9) 150E 180E 150W 120W 90W (b) 150E 180E 150W 120W 90W (f) 150E 180E 150W 120W 90W 150E 180E 150W 120W 90W Figure 4.5: Sequence of spatial maps characterising SSTA N L P C A mode 1 for (a) ax = -3.5 (b) ct1 = -1.5 (c) a : = -.75 (d) a : = -.25 (e) cxx = .25 (f) a i = 0.75 (g) a : = 1.5 (h) a i = 3.5. Contour interval: 0.5°C. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 50 averaged SSTA for those years in which the NDJ Nino 3.4 index is greater than one standard deviation above the long-term average; Figure 4.6(b) is the same for those NDJ for which Nino 3.4 is less than one standard deviation below the long-term average. This averaging period was used for the composites because NDJ displays the largest variance of all 3-month seasons. These two maps correspond to the SSTA patterns of an average El Nino and an average La Nina, respectively. Note that, consistent with the maps corresponding to the ID NLPCA approximation (Figure 4.5), the largest SST anomalies tend to be located in the central Pacific during average La Nina events and in the eastern Pacific during average El Nino events. This asymmetry in the composite fields was previously noted by Hoerling et al. (1997). The symmetric component of the composite El Nino and La Nina maps, as defined in Appendix B, is displayed in Figure 4.6(c). This map, which in a rough sense characterises the pattern in the composites that is related nonlinearly to the Nino 3.4 time series, bears a strong resemblance to SST EOF mode 2 (Figure 4.1(b)). In fact, the spatial correlation between the two maps is -0.90. The antisymmetric component of the composite (not shown) bears a strong resemblance to EOF 1; the pattern correlation between these two maps is 0.975. Thus, PCA mode 1 may be interpreted as characterising the component of average ENSO behaviour that is antisymmetric between average El Nino and La Nina events. By mixing EOF modes 1 and 2, NLPCA mode 1 is able to characterise the spatial asymmetry between average El Nino and La Nina events. The bias of SST toward warm anomalies in the eastern part of the basin and toward cold anomalies in the western and central parts is also evident in the study of Burgers and Stephenson (1999) , who calculate the skewness of the observed SSTA distribution. It is interesting to note the striking similarity between their map of the spatial distribution of skewness (their Figure 3(a)) and the symmetric component of the SSTA composite displayed in Figure 4.6(c). A final comparison of the ID NLPCA and ID PCA approximations is given in Figures F i g u r e 4.6: S S T A compos i te maps for "average" (a) E l N i n o and (b) L a N i n a events. C o n t o u r in t e rva l : 0 . 5 ° C . (c) S y m m e t r i c component of composi tes (a) and (b) . C o n t o u r in t e rva l : 0 . 1 ° C . See text for def in i t ion of composi tes and of the s y m m e t r i c componen t . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 52 4.7 (a)-(c), which show respectively maps of the pointwise correlation between the orig-inal SSTA data and the ID N L P C A approximation, the pointwise correlation between SSTA and the ID P C A approximation, and the difference between these two pointwise correlations. The two approximations are equally well correlated with the original data over the eastern-central half of the Pacific Ocean, except near the Ecuadorian coast, where the N L P C A correlations are somewhat higher than those of P C A . In the west-ern Pacific, and in particular in the neighbourhood of the E O F mode 1 zero line, the ID N L P C A approximation displays greater fidelity to the original data, as measured by pointwise correlation, than does the ID P C A approximation. Now consider SSTA N L P C A mode 2, which was calculated using a network containing L = 3 neurons in the encoding and decoding layers. Figure 4.8 displays mode 2 projected onto the subspaces spanned by SSTA EOFs (ei, 62), (e 2 ) 63), (ei, e3), and (ei, e2, e^). The 5 candidate models from an ensemble of 20 trials differed from each other with N M S D always less than 4%. N L P C A mode 2 explains 11.1% of the variance in the original SSTA data. The associated standardised time series, a2(tn) is shown in Figure 4.9. Interestingly, the correlation coefficient between cti(tn) and a2(tn) is —0.06; the two time series are uncorrelated. Figure 4.10 displays maps of SSTA N L P C A mode 2, X(2), corresponding to a 2 = -4,-1,-0.25,0,0.15,0.25,0.3,0.35,0.4,0.5,0.75,1.5. These values of Q 2 were selected to present a representative sample of maps describing N L P C A mode 2. When a2 is strongly negative, SSTA N L P C A mode 2 is characterised by negative anomalies in the central and western Pacific and positive anomalies in the eastern Pacific. As a2 increases through zero, the anomalies decrease in magnitude, while the positive anomalies in the eastern part of the basin become increasingly concentrated in the equatorial region. Eventually, the region of positive SSTA breaks off from the coast of South America and migrates into the central Pacific. As a2 increases further, the SSTA pattern becomes the opposite of that for a2 near zero, with positive anomalies in the central and western Chapter 4. NLPCA of Tropical Indo-Pacihc SST and SLP 53 Figure 4.7: Map of pointwise correlation coefficient between observed SSTA and (a) ID N L P C A approximation, (b) ID P C A approximation, and (c) difference between (a) and (b). Figure 4.8: As for Figure 4.3, but for SSTA N L P C A mode 2. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 55 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 Figure 4.9: As for Figure 4.4(a), but for a2(tn), the time series corresponding to SSTA N L P C A mode 2. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 56 F i g u r e 4.10: M a p s cor responding to S S T A N L P C A mode 2 for (a) a 2 = - 4 (b) ct2 = - 1 (c) a 2 = - 0 . 2 5 (d) a 2 = 0 (e) a 2 = 0.15 (f) a 2 = 0.25 (g) cx2 = 0.3 (h) a 2 = 0.35 (i) a 2 = 0.4 (j) a 2 = 0.5 (k) a 2 = 0.75 (1) a 2 = 1.5. C o n t o u r in t e rva l : 0 . 5 ° ( 7 . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 57 Pac i f i c and negat ive anomalies i n the east. F i n a l l y , for a2 near the ex t reme pos i t ive par t o f i ts range, S S T A N L P C A mode 2 is character ised by negat ive anomal ies a long the equator , ex t end ing to the datel ine, w i t h pos i t ive anomalies th roughou t the rest of the bas in . Because the anomalies are often concent ra ted a long the equator , i t is reasonable to associate this mode w i t h cer ta in aspects of E N S O va r i ab i l i t y not c a p t u r e d by N L P C A m o d e 1. Indeed, i t is in teres t ing to note f rom F i g u r e 4.4 that N L P C A m o d e 2 is more ac t ive i n the la ter par t of the r ecord t h a n i n the earlier. T h e two s t rong m i n i m a i n a2(tn) coinc ide w i t h the decay phases of the large E l N i n o events o f 1982/83 a n d 1997/98 , desc r ib ing the l inge r ing patches of w a r m water i n the eastern t r o p i c a l Pac i f i c observed d u r i n g these per iods . T w o of the three weaker m i n i m a i n the more ac t ive p e r i o d are associated w i t h the peaks of the L a N i n a events of 1985/85 and 1988/89. Indeed, the co ld anomal ies d u r i n g L a Ninas i n the later pe r iod are somewhat stronger, more concen t ra ted i n the cen t ra l Pac i f i c O c e a n , and weaker i n the eastern Pac i f ic O c e a n t h a n L a N i n a events i n the earl ier par t of the record , as ind i ca t ed by a composi te analysis (F igu re 4.11). A n u m b e r o f studies have no ted a shift i n E N S O va r i ab i l i t y i n 1977 (see, e.g., W a n g , 1995). T h e apparent nons ta t iona r i ty of a2(tn) is consistent w i t h a shift at th is t i m e , a l t h o u g h the precise t i m i n g of the shift i n a2(tn) is not obvious . T h e 1977 shift is i n fact manifest i n S S T A N L P C A mode 1 t ime series ai(tn) (F igu re 4.4); the t i m e series is b iased t o w a r d negat ive e x t r e m a before 1977 and t oward pos i t ive e x t r e m a after 1977. It shou ld however be no t ed that this shift m a y s i m p l y be an artifact o f the technique used to recons t ruc t the g r i dded S S T d a t a set. T h u s , the first mode of the m o d a l N L P C A decompos i t ion of t r o p i c a l Pac i f i c S S T A describes the average va r i ab i l i t y associated w i t h the E N S O p h e n o m e n o n , a n d n ice ly characterises the a s y m m e t r y i n spa t ia l s t ructure between average E l N i n o and L a N i n a events. T h e second mode characterises the difference i n evo lu t ion between different E N S O events, a n d i n pa r t i cu la r , displays a nons ta t ionar i ty consistent w i t h the observed "regime Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 58 Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 59 shift" i n E N S O var iab i l i ty . P l o t s o f a 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n of the S S T A da t a (ie, us ing 2 neu-rons i n the bo t t l eneck layer) , p ro jec ted i n the subspaces spanned by S S T A E O F s ( e i , 62), (^2,63), ( e i , e 3 ) , and (ei ,e2 ,e 3 ) , are shown i n F i g u r e 4.12 (a)-(d). T h e associa ted net-w o r k used L = 6 neurons i n the encoding and decoding layers, a n d the N M S D be tween cand ida te models (8 out of an ensemble of 20) var ied between 1% a n d 3%. T h e 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n explains 79.0% of the var iance i n the t r u n c a t e d d a t a set, a n d thus 72.2% of the var iance of the or ig ina l da ta . T h e t i m e series co r respond ing to the o u t p u t o f the bot t leneck layers, denoted (81,/^X^n) — S f ( X ( i „ ) ) , are shown i n F i g u r e 4.13. These two t i m e series are h igh ly corre la ted w i t h each other (r = -.835) a n d w i t h the N i n o 3.4 i n d e x ( r i = —.879 and r2 = .889, respec t ive ly ) . Because the 2 D non-m o d a l N L P C A depends on 2 parameters , 81 and 82, it is difficult to visual ise the results us ing a sequence of maps as was done w i t h the m o d a l N L P C A i n F igu res 4.5 and 4.10. F igu re s 4.14(a) and (b) d isplay maps of the poin twise cor re la t ion coefficient be tween the S S T A d a t a a n d the 2 D P C A a n d 2D n o n m o d a l N L P C A app rox ima t ions , respect ively . T h e 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n produces higher corre la t ions t h a n the 2 D P C A a p p r o x i m a t i o n i n the cen t ra l equator ia l , western, and southeastern Pac i f i c , a n d s l igh t ly lower corre la t ions i n the eastern equa tor ia l Pac i f i c . It is w o r t h cons ider ing the t ime series 8\(tn) and 82(tn) m more de ta i l . A s was discussed i n C h a p t e r 2, the parameter i sa t ion S f ( X ( i n ) ) of the P - d i m e n s i o n a l surface de t e rmined by N L P C A is on ly defined up to an a rb i t r a ry h o m e o m o r p h i s m (i.e., a con t i n -uous , one-to-one, a n d onto func t ion w i t h cont inuous inverse) . T h a t i s , for a n a r b i t r a r y h o m e o m o r p h i s m g : 3ftp 3ftp, the t ime series g ( s f ( X ( < n ) ) ) is also an acceptable pa -rame te r i s a t ion of the surface, because f o S f = (f 0 g _ 1 ) 0 ( g 0 S f ) . I n pa r t i cu l a r , for any h o m e o m o r p h i s m g : 3ft2 H-> 3ft2, [gi{tn), g2(tn)) — g(8i(tn), 82(tn)) parameter ises the surface found by 2 D n o n m o d a l N L P C A . W h i c h pa ramete r i sa t ion is de t e rmined b y the Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP Figure 4.12: As for Figure 4.3, but for SSTA 2D nonmodal NLPCA approximation. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 61 (a) .41 1 1 1 1 1 1 1 1 1 I 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 Figure 4.13: Time series (a) and (b) B2(tn)t where (/3i,/32)(i„) = s f (X(i„)) is the pair of time series associated with the SSTA 2D nonmodal N L P C A approximation. Both time series have been normalised to unit variance. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 62 (a) 150E 180E 150W 120W 90W (b) 150E 180E 150W 120W 90W (c) 150E 180E 150W 120W 90W F i g u r e 4.14: M a p s of poin twise corre la t ion between observed S S T A a n d (a) 2 D P C A a p p r o x i m a t i o n (b) 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n , and (c) 2 D m o d a l N L P C A a p p r o x i m a t i o n . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 63 N L P C A a l g o r i t h m is a ma t t e r of chance. T h i s degeneracy compl ica tes the in t e rp re t a t i on of the t i m e series de t e rmined by n o n m o d a l N L P C A . In pa r t i cu la r , the t i m e series f3i(tn) need not be independent , or even uncorre la ted . D e t e r m i n i n g a set of P independent variables 7, pa ramete r i s ing the surface f r o m the set of P t i m e series /%(£„) de t e rmined emp i r i ca l l y by N L P C A is another p r o b l e m of feature e x t r a c t i o n , i n the space of the variables parameter i s ing the surface. Therefore , P C A or m o d a l N L P C A can be used to calculate the ~yi(tn). I n the case at h a n d , i n s p e c t i o n o f the scat terplot of j3i(tn) w i t h p2(tn) i nd i ca t ed that P C A was appropr ia te for the separa t ion of the cor re la ted t i m e series f3\(tn) and / 3 2 ( £ n ) in to two uncor re la ted t i m e series 71 (tn) a n d 72(^71)- T h e h o m e o m o r p h i s m g is thus s imp ly a l inear func t ion . T h e first P C A m o d e exp l a ined 92.7% of the var iance i n /3-space, and the associated t i m e series (not shown) describes average E N S O var iab i l i ty . Its corre la t ion coefficient w i t h the N i n o 3.4 t i m e series is 0.92 and w i t h ai(tn) is 0.87. T h e second P C A mode expla ins the r e m a i n i n g 7.3% o f the var iance i n /3-space,-and the associated t i m e series 72(^71) (not shown) is ra the r s imi l a r to a 2 ( £ n ) . T h e two t ime series have a cor re la t ion o f 0.7, a n d i n pa r t i cu l a r 72 ( ^ n ) d isplays the same shift i n a c t i v i t y f rom the pre-1977 to the post-1977 p e r i o d as does a2(tn), w i t h the same prominen t peaks appear ing i n b o t h t i m e series. T h e pa rame te r i s a t i on (/3i(tn),^(^n)) thus contains essential ly the same i n f o r m a t i o n as the two t i m e series ai(tn) and a 2 ( £ n ) - T h e fact that an e x t r a step o f process ing is r equ i r ed to a l low in t e rp re t a t i on o f the t ime series p roduced by n o n m o d a l N L P C A indica tes a d i s t inc t advantage o f m o d a l over n o n m o d a l N L P C A . F i g u r e 4.15 displays the 2D m o d a l a p p r o x i m a t i o n to the S S T A da ta , ob t a ined by add ing the two I D m o d a l approx imat ions ob ta ined before, p ro jec ted i n S S T A E O F sub-spaces (e i ,e 2 ) , (e 2 ,e 3 ) , (ei ,e 3 ) , and ( e i , e 2 , e 3 ) . Because of the va r i ance -pa r t i t i on ing p r o p e r t y of N L P C A , the f rac t ion of var iance exp la ined by this a p p r o x i m a t i o n is the s u m of the fract ions exp la ined by the two I D m o d a l approx ima t ions , i .e. , 74.4%, w h i c h is F i g u r e 4.15: A s for F i g u r e 4.3, but for S S T A 2 D m o d a l N L P C A a p p r o x i m a t i o n . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 65 s l igh t ly greater t h a n that ob ta ined w i t h the 2 D n o n m o d a l a p p r o x i m a t i o n . T h i s approx-i m a t i o n differs i n de ta i l f rom the n o n m o d a l a p p r o x i m a t i o n d i sp layed i n F i g u r e 4.12, bu t the two agree b road ly i n thei r general features. F i g u r e 4.14(c) d isplays a m a p of the po in twise cor re la t ion coefficient between the 2 D m o d a l N L P C A a p p r o x i m a t i o n and the o r ig ina l S S T A data ; correlat ions are somewhat higher t h a n those of the 2 D n o n m o d a l a p p r o x i m a t i o n i n the western Pac i f ic ocean and somewhat lower i n the eastern equa to r i a l Pac i f i c , bu t by and large the differences between the two po in twise cor re la t ion maps are s m a l l . N o t e tha t wh i l e i n p r inc ip le one w o u l d expect a 2 D n o n m o d a l N L P C A a p p r o x i m a t i o n to be super ior to a 2 D m o d a l N L P C A a p p r o x i m a t i o n , because i t shou ld have access to a broader class o f funct ions , i n the case of S S T A the former expla ins 72.2% of the var iance wh i l e the l a t t e r expla ins 74.4%. In fact, the m o d e l cor responding to the m o d a l N L P C A h a d 13% more free parameters t h a n that cor responding to the n o n m o d a l N L P C A . It seems tha t this difference a l lowed the m o d a l m o d e l more f l ex ib i l i ty t h a n the n o n m o d a l , l ead ing to the slight improvemen t i n the f rac t ion of var iance exp la ined . T h u s , i n b o t h I D and 2 D , and for b o t h m o d a l and n o n m o d a l approaches , N L P C A produces app rox ima t ions of greater f idel i ty to the t r o p i c a l Pac i f i c S S T A t h a n does P C A . In pa r t i cu l a r , b o t h the I D P C A and the I D N L P C A approx ima t ions descr ibe "average" E N S O va r i ab i l i t y , bu t the I D m o d a l N L P C A a p p r o x i m a t i o n was able to character ise the spa t i a l a s y m m e t r y between average E l N i n o and L a N i n a events i n a fashion tha t I D P C A cannot . 4.4 Tropical Indo-Pacific Sea Level Pressure A s was done w i t h the S S T A da ta , the S L P A da t a was preprocessed b y p ro j ec t ing i t onto the space spanned by its 10 leading E O F modes, w h i c h together e x p l a i n 60% of the t o t a l Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 66 var iance i n the da ta . F i g u r e 4.16 displays maps of the three lead ing S L P A E O F modes , w h i c h e x p l a i n 24.2%, 10.7%, and 6.0% of the to t a l var iance, respect ively . F i g u r e 4.17 displays the I D N L P C A a p p r o x i m a t i o n of the S L P A da t a p ro j ec t ed on the subspaces spanned by S L P A E O F s ( e i , e 2 ) , (e2,e3), ( e ^ e s ) , a n d (e i ,e 2 ,e3) . T h i s curve was ob t a ined us ing a ne twork w i t h L = 2 neurons i n the encod ing a n d decod ing layers; the N M S D be tween the 8 candida te models f r o m an ensemble of 50 r anged be tween 0 .1% a n d 0.3%. T h e I D N L P C A a p p r o x i m a t i o n explains 27.0% of the t o t a l var iance i n the S L P A da ta , a slight i m p r o v e m e n t over the f ract ion of var iance exp l a ined by the I D P C A a p p r o x i m a t i o n . F igu re s 4.18(a) a n d (b) d isplay respec t ive ly a p lo t o f cti(tn), the t i m e series associa ted w i t h the I D N L P C A a p p r o x i m a t i o n , and the Sou the rn O s c i l l a t i o n Index ( S O I ) , ca l cu la t ed by sub t r ac t ing the S L P A at D a r w i n , A u s t r a l i a (131E,12S) f rom tha t at T a h i t i ( 149W,17S) (see F i g u r e 4.16(a)), and then a p p l y i n g a 5 -mon th r u n n i n g average smoother . T h e two t i m e series bear a s t rong resemblance to each other on i n t e r a n n u a l a n d longer t imescales; the i r cor re la t ion coefficient is 0.72. T h e I D m o d a l N L P C A approx-i m a t i o n thus seems to descr ibe E N S O va r i ab i l i t y i n the S L P A f ie ld . T h i s assoc ia t ion is re inforced by in spec t ion of the sequence of maps X ^ 1 ) for = —3, —2, —1, —0.5, 0, 0 .5 ,1 , 2 (F igu re 4.19). T h i s sequence of spa t ia l pat terns describes the east-west S L P A d ipole as-soc ia ted w i t h average Sou the rn O s c i l l a t i o n var iab i l i ty . F i g u r e 4.20 d isp lays composi tes o f S L P A averaged over those D J F (December , January , F e b r u a r y ) per iods i n w h i c h the S O I was less t h a n 1 s t anda rd dev ia t ion below the long- t e rm average (F igu re 4.20(a)) or was m o r e t h a n one s t a n d a r d dev ia t i on above the long- t e rm average ( F i g u r e 4 .20(b)) . T h i s 3 -month p e r i o d was selected because of a l l 3-month seasons i n the record , i t d i sp layed the greatest var iance . F igures 4.20(a) and 4.20(b) represent "average" E l N i n o and L a N i n a pa t te rns , respect ive ly . C lea r ly , the maps i n F i g u r e 4.19 for ai < 0 co r re spond to the E l N i n o compos i t e and those for a i > 0 correspond to the L a N i n a compos i t e . C o m p a r i s o n of F igu res (4.19) a n d (4.20) indicates that the I D N L P C A a p p r o x i m a t i o n characterises Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 67 (a) 60E 90E 120E 150E 180E 150W 120W 90W (b) 60E 90E 120E 150E 180E 150W 120W 90W F i g u r e 4.16: S p a t i a l pat terns of S L P A (a) E O F mode 1, (b) E O F m o d e 2, (c) E O F m o d e 3. T h e b lack dots i n (a) designate the posi t ions of T a h i t i and D a r w i n , A u s t r a l i a . Figure 4.17: As for Figure 4.3, but for SLPA N L P C A mode 1. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 69 r - Oh 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 F i g u r e 4.18: (a) P l o t of cti(tn) = 5 / l ( X ( i n ) ) , the s tandardised t i m e series associated w i t h S L P A N L P C A mode 1. (b) P l o t of 5-month r u n n i n g m e a n of S O I , s t andard i sed to un i t var iance . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 70 F i g u r e 4.19: P l o t o f a sequence of spa t ia l maps charac ter i s ing S L P A N L P C A m o d e 1 for (a) ax = - 3 (b) ax = - 2 ( c ) a i = - 1 (d) a i = - 0 . 5 (e) ax = 0 (f) ax = 0.5 (g) ax = 1 (h) a i = 2. C o n t o u r in te rva l : 0.5 h P a . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 71 (a) 6 0 E 90E 120E 150E 180E 150W 120W 90W 6 0 E 90E 120E 150E 180E 150W 120W 90W F i g u r e 4.20: Compos i t e s o f S L P A du r ing (a) E l N i n o and (b) L a N i n a . C o n t o u r In te rva l : 0.5 h P a . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 72 the a s y m m e t r y i n S L P A pa t t e rn between average E l N i n o and L a N i n a events , pa r t i c -u l a r l y i n the eastern h a l f o f the d o m a i n . F i g u r e 4.21 displays the spa t i a l s t ruc tu re o f the po in twise cor re la t ion coefficient between S L P A and the I D N L P C A a p p r o x i m a t i o n ( F i g u r e 4.21(a)) , and the I D P C A a p p r o x i m a t i o n (F igure 4.21(b)) . A s was the case i n the previous sect ion, t h e . I D S L P A N L P C A a p p r o x i m a t i o n produces h igher corre la t ions t h a n the I D P C A a p p r o x i m a t i o n pa r t i cu l a r l y a round n o d a l l ines o f the la t te r . I n c a l cu l a t i ng the second mode of the m o d a l N L P C A decompos i t i on o f the S L P A da ta , i t was de t e rmined that on ly for L = 1 neuron i n the encod ing and decod ing layers cou ld robus t results be ob ta ined . Neura l -ne twork based N L P C A can o n l y find non l inear s t ruc tu re i f there are two or more neurons i n the encod ing a n d decod ing layers . T h u s , the o p t i m a l I D charac ter i sa t ion of the res idua l da ta , ob ta ined by sub t r ac t i ng f rom the o r i g i n a l S L P A d a t a the I D N L P C A a p p r o x i m a t i o n , is a s traight l ine . T h e I D N L P C A a p p r o x i m a t i o n of the res idua l da t a coincides w i t h the I D P C A a p p r o x i m a t i o n o f these da ta , a n d hence no lower -d imens iona l nonl inear s t ructures can be found i n these res iduals . O u r ca l cu l a t i on (not shown) shows that N L P C A mode 2 expla ins 15.9% of the var iance of the o r ig ina l da ta . T h e spa t ia l pa t t e rn and associated t i m e series are shown i n F i g u r e 4.22. I n fact , S L P A N L P C A mode 2 bears a s t rong resemblance to S L P A P C A m o d e 2; the co r re la t ion coefficient between the two t ime series is 0.96 and the p a t t e r n cor re la t ion be tween the associated spa t ia l pa t terns is 0.93. T h e s i m i l a r i t y be tween the two is not surpr i s ing , as S L P A N L P C A mode 1 does not differ subs tan t ia l ly f rom S L P A P C A m o d e 1. T h e results of a 2 D n o n m o d a l N L P C A of the S L P A da t a (not shown) d i d not y i e l d p a r t i c u l a r l y in te res t ing results . T h u s , apar t f rom a weak ly nonl inear I D N L P C A a p p r o x i m a t i o n cor respond ing to average E N S O va r i ab i l i t y a n d charac ter i s ing a slight a s y m m e t r y be tween average E l N i n o and L a N i n a events, the robust low-d imens iona l s t ruc ture of the S L P A d a t a is l inear . Figure 4.21: Spatial pattern of pointwise correlation coefficient between SLPA and (a) ID N L P C A approximation and (b) ID P C A approximation. F i g u r e 4.22: S L P A N L P C A mode 2 (a) Spa t i a l pa t t e rn (not no rma l i sed , uni t s are h P a ) a n d (b) t i m e series (normal i sed to uni t var iance) . Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 75 4.5 Conclusions Application of N L P C A to two data sets of climatic significance, namely tropical Pacific sea surface temperatures and tropical Indo-Pacific sea level pressure, has demonstrated that N L P C A is able to robustly produce one- and two-dimensional approximations that are superior to the corresponding approximations produced by P C A . The improvement is particularly striking in the case of SST: variability in this field is dominated by ENSO, the average manifestation of which is asymmetric between average E l Nino and La Nina phases. As P C A is constrained to produce ID approximations which are standing oscil-lations with fixed spatial pattern, it is unable to characterise this asymmetry. On the other hand, the ID N L P C A approximation, by mixing P C A modes 1 and 2, is able to capture this difference in SST structure between average E l Nino and La Nina episodes. Figures 4.12 and 4.15 display, respectively, 2D nonmodal and modal N L P C A approx-imations to the SST data. These are seen to be rather similar in structure. Inspection of the time series (3i(tn) and f32(tn) parameterising the 2D nonmodal N L P C A approx-imation highlights the importance of the fact, pointed out by Malthouse (1998), that N L P C A produces time series that are unique only up to an arbitrary homeomorphism. The time series (3\(tn) and /^ (^ n) were strongly correlated, and therefore contain substan-tial overlap in the information they convey. A second feature analysis problem, solved using traditional P C A , was then used to untangle (3i(tn) and (^in)- The resulting time series bore strong similarities to the time series cti(tn) and ct2(tn) corresponding to the first two modes of the modal N L P C analysis. This result further strengthens the in-terpretation that the modal and nonmodal analyses are producing essentially the same approximation. The fact that the use of nonmodal N L P C A required the solution of a subsidiary feature extraction problem to interpret the time series produced illustrates a deficiency of nonmodal N L P C A , as compared to modal. Chapter 4. NLPCA of Tropical Indo-Pacific SST and SLP 76 The first modal NLPCA approximation to the SLP data describes average ENSO variability in this field, producing a somewhat better approximation to the original data than that produced by PCA and characterising the asymmetry in SLPA between av-erage El Nino and La Nina episodes. The differences between the NLPCA and PCA approximations for SLP are less striking than was the case with SST, indicating that the low-dimensional structure of SLP is more linear than SST. Indeed, no nonlinear mode be-yond the first could robustly be found in the data, indicating that either SLP variability in the tropical Indo-Pacific region is very nearly linear, or that any nonlinear structure is too subtle to detect within existing records. Chapter 5 Nonl inear P r inc ipa l Component Analysis of Nor the rn Hemisphere Atmospher ic Circula t ion D a t a 5.1 Introduct ion Low-frequency, large-scale coherent variability in the Northern Hemisphere midlatitude circulation has been a subject of considerable interest in climate research over the last few decades. It is typically characterised in terms of spatially-fixed, temporally fluctuating anomaly patterns modifying the climatological mean circulation. Some of these patterns are zonally localised, such as the Pacific-North America (PNA) pattern (Wallace and Gutzler, 1981), a chain of alternating positive and negative geopotential height anoma-lies in the mid-troposphere extending from the subtropical North Pacific Ocean over North America, following a great circle route; and the North Atlantic Oscillation (NAO; van Loon and Rogers, 1978; Hurrell, 1995), a dipolar pattern with geopotential height anomalies of opposite sign over Iceland and the Azores. Other patterns are more zonal in structure, such as the Arctic Oscillation (AO; Thompson and Wallace, 1998) and the Antarctic Oscillation (AAO; Gong and Wang, 1999). These are approximately zonally-symmetric patterns of variability with anomalies of opposite sign over the polar region and the midlatitudes, for the Northern and Southern Hemisphere, respectively. The connections between these different patterns of variability (e.g. Deser, 1999), and their dynamical origin - in particular their maintenance by lower boundary forcing or internal dynamics (see, e.g., Feldstein and Lee, 1996; Corti et al, 1997; Trenberth et al, 1998) 77 Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 78 - are areas o f ac t ive research. These character is t ic pat terns of a tmospher ic v a r i a b i l i t y have h i s to r i ca l ly been diagnosed us ing corre la t ion analysis (Wal lace and G u t z l e r , 1981; H s u a n d L i n , 1992), P C A ( K u s h n i r and Wal l ace , 1989; T h o m p s o n and W a l l a c e , 1998), c o m b i n e d P C A ( B a l d w i n and D u n k e r t o n , 1999; C P C A , also k n o w n as ex t ended E O F analysis , is defined b y B r e t h e r t o n et a l . , 1992), canon ica l cor re la t ion analysis ( P e r l w i t z a n d Gra f , 1995), and ro ta ted P C A ( B a r n s t o n and L ivezey , 1987). A l l of these me thods are l inear and p roduce spa t ia l and t e m p o r a l pat terns tha t descr ibe s t and ing osc i l la t ions . R e c e n t l y , the A r c t i c Osc i l l a t i on ( A O ) has been of pa r t i cu l a r interest . T h e A O was defined by T h o m p s o n and Wal l ace (1998) as the leading P C A mode o f m o n t h l y aver-aged N o v e m b e r t h r o u g h A p r i l N o r t h e r n Hemisphere S L P A n o r t h of 2 0 ° . T h e A O spa t i a l p a t t e r n , de r ived f rom the T r e n b e r t h and Pao l ino S L P da t a set (1980), is d i sp layed i n F i g u r e 5.1. T h e canon ica l A O s t ruc ture is rough ly z o n a l l y - s y m m e t r i c , w i t h anomal ies o f oppos i te s ign i n the po la r region and the mid la t i tudes . Dev i a t i ons f rom z o n a l s y m m e t r y are charac ter i sed by a wavenumber two pa t t e rn ref lect ing land-ocean contrasts . I n par-t i cu l a r , the d ipole pa t t e rn i n S L P over the N o r t h A t l a n t i c s t rongly resembles the surface s ignature o f the N A O . T h e A O diagnosed us ing P C A on the T r e n b e r t h a n d P a o l i n o d a t a set s t rongly resembles that ob ta ined by T h o m p s o n and W a l l a c e (1998), w h o found the A O a n d N A O t i m e series to be h igh ly corre la ted (r = 0.69). T h i s po in t was cons id -ered fur ther by Deser (1999), who considered the extent to w h i c h coherent v a r i a b i l i t y i n N o r t h e r n H e m i s p h e r e c i r cu l a t i on is rea l ly hemispher ic i n extent , and suggested tha t i n fact the A O shou ld be t e rmed the " A r c t i c - A t l a n t i c O s c i l l a t i o n " . T h e A O is s t rongly equivalent baro t rop ic i n s t ruc ture , w i t h coherent v a r i a b i l i t y th roughou t the t roposphere and in to the lower s tratosphere ( P e r l w i t z and Gra f , 1995; T h o m p s o n and W a l l a c e , 1998, 1999a; B a l d w i n and D u n k e r t o n , 1999). Interest i n the A O has been p a r t i c u l a r l y s t rong i n recent years because of i ts p o t e n t i a l as a sensi t ive F i g u r e 5.1: S p a t i a l s t ruc ture of the leading E O F pa t t e rn f rom observed S L P . C o n t o u r in tervals are 1 h P a (.. . ,-1.5,-0.5,0.5,1.5,.. .). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 80 ba romete r of c l ima te change. T h o m p s o n et a l . (1999b) no ted the s i m i l a r i t y i n s t ruc-ture be tween the spa t ia l pa t t e rn of the A O and of recent t rends i n S L P i n the N o r t h e r n H e m i s p h e r e . T h e y suggested that the c l imate change s ignal is mani fes t ing i t se l f as a sec-u la r shift t o w a r d the pos i t ive phase o f the A O (s t rong po la r vo r t ex ) supe r imposed u p o n m o n t h l y t imescale A O fluctuations. Indeed, i n mode l l i ng studies, S h i n d e l l et a l . (1999, us ing the G o d d a r d Ins t i tu te for Space Sciences ( G I S S ) a tmospher ic G C M ) and F y f e et al. (1999, us ing the C a n a d i a n Cen t re for C l i m a t e M o d e l l i n g and A n a l y s i s ( C C C m a ) coup led G C M ) found a p ronounced t r end i n the A O signal as greenhouse gases were increased . Howeve r , the p h y s i c a l m e c h a n i s m for this behaviour remains cont rovers ia l , as S h i n d e l l et a l . f ound that on ly w i t h a fu l l s tratosphere w o u l d their m o d e l p roduce an A O t r e n d unde r inc reas ing greenhouse forc ing, whi le F y f e et al. were able to p roduce th is resul t us ing an a tmospher ic m o d e l w i t h a poor ly- reso lved s tratosphere. In th is chapter , the results of an N L P C analysis of fields charac te r i s ing the N o r t h e r n H e m i s p h e r e c i r c u l a t i o n w i l l be considered, us ing da t a f rom the C C C m a coup led G C M . T h e resul ts w i l l demons t ra te that N L P C A is able to detect and character ise reg ime behav iou r i n m u l t i v a r i a t e da ta sets. Cons ide ra t i on o f these regimes w i l l t h r o w some l ight o n the controversies concern ing the re la t ionship be tween the A O a n d the N A O discussed above. 5.2 Data and Mode l Building T h e da t a ana lysed i n this chapter comes f rom two integrat ions of the C C C m a coup led c l i -ma t e m o d e l ( C G C M 1 ) : a 1001-year con t ro l in tegra t ion w i t h a tmospher i c c a r b o n d iox ide (CO2) at p r e - indus t r i a l levels and a 500-year s tab i l i sa t ion in t eg ra t ion w i t h a tmospher i c CO2 concent ra t ions at four t imes p re - indus t r i a l l eve l , co r responding to p r ed i c t ed C 0 2 levels i n 2100. U s e of m o d e l da t a ra ther t han observations allows the inves t iga t ion of Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 81 potential changes in the atmospheric variability associated with increased atmospheric CO2 concentrations. The fields considered are monthly-averaged Northern Hemisphere sea level pressure and 500mb geopotential height (Z50o) from 20°N to 90°N, over the extended winter period from November though April. Both fields are on a Gaussian grid with 3.75° resolution in the zonal and meridional directions. The CCCma CGCM1 model and its climate are described in Flato et al. (1999). The atmospheric component is a T32 spectral primitive equation model with 9 unequally spaced levels (McFarlane et al., 1992). The ocean component is a global primitive equation grid-point model with 1.875° resolution and 29 vertical levels. It is based on the Geophysical Fluid Dynam-ics Laboratory (GFDL) Modular Ocean Model (MOM) 1.1 (Pacanowski et al., 1993). The leading EOF of this coupled model displays spatially and temporally realistic AO behaviour (Fyfe et al, 1999). For both the SLP and Z500 fields, monthly anomalies (SLPA and Z500A, respectively) were computed by subtracting the climatological annual cycle. Fields were weighted by the square root of the cosine of the latitude before calculation of the EOFs to account for the poleward concentration of gridpoints on the Gaussian grid. NLPC analysis of SLPA and Z500A data was carried out using the early stopping algorithm described in Chapter 2. In all analyses, 20% of the data was held aside in a validation set, for which network performance was monitored as training proceeded. Training was stopped when this performance began to degrade, or after 20000 iterations, whichever came first. It was found that increasing the maximum number of iterations beyond 20000 did not affect the results of the analysis. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 82 5.3 Analys is of G C M Sea Level Pressure A s was done w i t h the t r o p i c a l S L P A and S S T A da t a i n the previous chapter , the n o r t h e r n hemisphere S L P A and Z500A were pro jec ted onto the spaces o f the i r first 10 E O F modes , i n w h i c h 85% and 76% of the var iance are respec t ive ly con ta ined . T h e first four E O F pat te rns o f S L P A and Z500A are d isp layed i n F igures 5.2 and 5.3, respect ive ly . T h e m a p d i sp layed i n F i g u r e 5.2(a) is the spa t ia l pa t t e rn of the canon ica l A r c t i c O s c i l l a t i o n i n the C C C m a m o d e l (Fyfe et a l . , . 1999). F i g u r e 5.4 displays a scat terplot of the S L P A da t a p ro jec ted i n the plane spanned by the l ead ing two E O F s , over la id w i t h a h is togram-based es t imate o f the m a r g i n a l p r o b a b i l i t y dens i ty func t ion ( P D F ) of these da t a i n this space. T h e P D F displays a m a r k e d dev i a t i on f r o m Gauss i an s t ructure i n the fo rm of a p ronounced lobe i n the lower-r ight quadran t . F i g u r e 5.5 displays the es t imate of the P D F a long w i t h the I D N L P C A a p p r o x i m a t i o n X ( i n ) to these da ta . T h e N L P C A a p p r o x i m a t i o n was found us ing a ne twork w i t h L = 2 neurons i n the encoding and decoding layers. T h i s a p p r o x i m a t i o n was ob t a ined f r o m an ensemble of 5 networks , of w h i c h 3 were cand ida te mode ls differing f r o m each o ther w i t h an N M S D of at most 1%. T h e a p p r o x i m a t i o n , w h i c h exp la ins 26 .5% of the t o t a l var iance ( in contrast to 24% exp la ined by the l ead ing P C A a p p r o x i m a t i o n ) , is thus robus t . N o t e tha t X ( i n ) has a piecewise-l inear s t ruc ture , composed o f three branches . T h e associated s tandardised t ime series n(t \ */(X(*"))~ < sf > a long w i t h a h i s t o g r a m es t imate o f the P D F of a(tn), are d i sp layed i n F i g u r e 5.6. T h e dis-t r i b u t i o n of a(tn) is s t rongly b i m o d a l , demons t ra t ing the existence of two wel l -separa ted regimes i n w h i c h the N L P C A a p p r o x i m a t i o n resides. E a c h of the three branches o f X ( i n ) corresponds t o a feature i n the P D F of a(tn): the upper b r a n c h (hereafter referred to as B r a n c h 1) is associated w i t h the larger of the two peaks, the lower b r a n c h (hereafter Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 83 Figure 5.2: Spatial structure of the leading four EOF patterns from CCCma SLPA: (a) EOF 1, (b) EOF 2, (c) EOF 3, (d) EOF 4. These patterns explain 23.7%, 10.6%, 8.5%, and 6.5% of the variance in SLP, respectively. Negative contours are dashed. Contour intervals are 1 hPa (..., -1.5, -0.5, 0.5, ...). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 84 F i gure 5.3: S p a t i a l s t ruc ture of the leading four E O F pat terns f rom C C C m a Z500A: (a) E O F 1, (b) E O F 2, (c) E O F 3, (d) E O F 4. These pat terns e x p l a i n 19.6%, 12.5%, 9.3%, a n d 8.2% of the var iance i n Z50Q, respect ively. Nega t ive contours are dashed. C o n t o u r in tervals are 10 m (..., -15, -5 , 5, . . . ) . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 85 200 150 100 Q_ oo O Q_ - 1 0 0 - 1 5 0 -200 - 3 0 0 -200 -100 100 200 300 PC1 (hPa) Figure 5.4: Scatterplot of the leading two SLPA P C time series, overlaid with a histogram estimate of the corresponding marginal probability density function. Contour intervals are 5 x 10"4,1.5 x 10" 3,3 x ICT 3,6 x 1CT3,1 x l f r 2 , 2 x ICT 2, 3 x 1(T 2. The histogram bin size is 25 hPa in both directions. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 86 F i g u r e 5.5: I D N L P C A a p p r o x i m a t i o n X of S L P A , p ro jec ted i n the space of the first two S L P A E O F s (open c i rc les) , over lay ing h i s tog ram es t imate o f S L P A P D F as i n F i g u r e 5.4. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 87 Figure 5.6: Plot of the ID SLPA NLPCA time series a(tn) (left) and the associated histogram estimate of the PDF (right). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 88 referred to as B r a n c h 2) is associated w i t h the smaller peak, a n d the in t e rmed ia t e b r a n c h is associa ted w i t h the m i n i m u m of the P D F of a between the two peaks , a n d is r a re ly v i s i t ed . T h e a p p r o x i m a t i o n X ( i „ ) is i n B r a n c h 1 for 84% of the months a n d i n B r a n c h 2 for 13%. A n in spec t ion o f a ( t n ) indica tes tha t va r i ab i l i t y o n X ( t n ) consists o f o sc i l l a to ry m o t i o n a long B r a n c h 1 w i t h occas ional episodic excurs ions to B r a n c h 2, o n w h i c h the a p p r o x i m a t i o n ra re ly resides longer t h a n a m o n t h or two. F i g u r e 5.7 is the same as F i g u r e 5.5 but w i t h the es t imated P D F s o f the popu la t ions cor respond ing to the upper and lower branches p lo t t ed separately. T h e P D F of the p o p u l a t i o n p ro j ec t ing onto B r a n c h 1 is seen to be near ly Gauss i an , w i t h a ma jo r axis nea r ly pa ra l l e l to B r a n c h 1. B r a n c h 2 also runs t h rough the m i d d l e o f i ts associated P D F . T h e over lap o f the two P D F s is an art ifact of the coarse b i n n i n g used i n the i r e s t ima t ion . A s was discussed i n the previous C h a p t e r , N L P C A differs f rom P C A i n tha t the I D a p p r o x i m a t i o n p r o d u c e d by N L P C A does not cor respond to a un ique spa t i a l p a t t e r n , bu t i n fact to a sequence of maps . In C h a p t e r 4, these maps were presented at representa t ive poin ts a long the curve X ( i n ) . In this chapter , a somewhat different m e t h o d o l o g y for p r o d u c i n g maps cor responding to the N L P C A a p p r o x i m a t i o n is adop ted . Because the features o f interest invo lve a tmospher ic c i r cu la t ion da t a at different a l t i tudes , based o n an a p p r o x i m a t i o n ca lcu la ted us ing da t a at a single a l t i tude , ins tead representa t ive poin ts a long the a p p r o x i m a t i o n X ( i n ) are selected and composi te the o r ig ina l d a t a over a l l t imes tha t the a p p r o x i m a t i o n resides i n the neighbourhoods of these po in ts . A c o m p a r i s o n of the two me thods for the field f rom w h i c h the N L P C A a p p r o x i m a t i o n was de r ived demons t ra tes t ha t the r e su l t ing maps are essential ly i den t i ca l , as e x p e c t e d since the N L P C A a p p r o x i m a t i o n is cons t ra ined to r u n t h rough the " m i d d l e " o f the da ta . F i g u r e 5.8 displays composi tes of S L P A over ne ighbourhoods i n a of w i d t h 0.4, cen-t r ed at in tervals of 0.4 f rom —3.1 to 1.3. V a r i a b i l i t y a long B r a n c h 1 is exempl i f i ed by Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 89 PC1 (hPa) Figure 5.7: As in Figure 5.5, but with the PDFs of the populations corresponding to Branch 1 (solid contours) and Branch 2 (dashed contours) plotted separately, and with a bin size of 20 hPa. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 90 Figure 5.8: Composites of SLPA over characteristic ranges of a. These ranges are in-dicated in parentheses below the maps, along with the number N of maps used in the composite. Contour interval is 2 hPa (...,-3,-1,1,3,...). Continued on next page. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 91 (g) (-0.9,-0.5); N=76 (i) (-0.1,0.3); N=1631 (k) (0.7,1.1); N=822 (h) (-0.5,-0.1); N=381 (j) (0.3,0.7); N=2127 (I) (1.1,1.5); N=107 Figure 5.8: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 92 composi tes (h) and (k) i n F i g u r e 5.8. These two maps d isp lay pat terns o f S L P anoma-lies tha t differ i n sign a n d magn i tude , but not i n spa t i a l pa t t e rn , consistent w i t h the i n t e rp re t a t i on of B r a n c h 1 va r i ab i l i t y as descr ib ing a s t and ing osc i l l a t ion . T h e anomal ies associated w i t h this osc i l l a t ion are of opposi te sign over the po la r cap and the m i d l a t -i tudes , w i t h a po la r l o c a l e x t r e m u m over no r the rn E u r a s i a a n d s m a l l l o c a l m i d l a t i t u d e e x t r e m a over the west M e d i t e r r a n e a n and the n o r t h Pac i f i c . T h i s p a t t e r n resembles tha t of the canon ica l A O as diagnosed by E O F analysis , bu t w i t h eas tward shif ted po la r a n d M e d i t e r r a n e a n centres of ac t ion . Indeed, the corre la t ion between the A O t i m e series (ie, the l ead ing S L P A P C ) and a ( i n ) , over those t imes when the a p p r o x i m a t i o n is o n B r a n c h 1, is -0.96. C h a r a c t e r i s t i c S L P A anomalies associated w i t h B r a n c h 2 are i l l u s t r a t ed i n F i g u r e 5.8(c). T h i s m a p shares cer ta in features w i t h the a n o m a l y pa t te rns d i sp layed i n F igu res 5.8(h) and 5.8(k) , i n pa r t i cu la r anomalies of opposi te sign over the po la r cap and the m i d l a t i t u d e s . Howeve r , the polar e x t r e m u m i n F i g u r e 5.8(c) is shif ted to a l o c a t i o n over I ce land , the A t l a n t i c m i d l a t i t u d e e x t r e m u m is centred over the A z o r e s , and anomal ies over the m i d l a t i t u d e Pac i f i c ocean are weak. T h i s m a p resembles s t rongly the S L P A s ignature of the negat ive phase of the N o r t h A t l a n t i c O s c i l l a t i o n . N o t e also tha t the a n o m a l y p a t t e r n of opposi te sign to that i n F i g u r e 5.8(c) does not appear on B r a n c h 2, i n contras t to wha t was observed along B r a n c h 1. V a r i a b i l i t y a long B r a n c h 2 does not descr ibe an osc i l l a t ion , but episodic excursions to a s ingle-phased, s t rongly anomalous c i r c u l a t i o n . F igu res 5.9 a n d 5.10 d isplay respect ive ly composi tes of the C C C m a m o d e l l e d Z500A and Z5oo fields over the same intervals i n oc(tn) used to ca lcula te the composi tes i n F i g u r e 5.8. T h e pa t te rns d isp layed i n F i g u r e 5.9 demonst ra te tha t the anomalous c i rcu la t ions are s t rongly equivalent ba ro t rop ic , w i t h a slight wes tward phase t i l t w i t h height . T h e m i d l a t i t u d e Z500 anomalies are s l ight ly stronger re la t ive to the po la r anomal ies t h a n is Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 93 (a) (-3.3,-2.9); N=48 (c) (-2.5,-2.1); N=321 (e) (-1.7,-1.3); N=62 (b) (-2.9,-2.5); N=234 (d) (-2.1,-1.7); N=135 (f) (-1.3,-0.9); N=58 F i g u r e 5.9: A s w i t h F i g u r e 5.8, but for composi tes of Z500 anomal ies . T h e contour i n t e rva l is 20 m (.. . ,-30,-10,10,30,.. .) . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 94 Figure 5.9: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data Figure 5.10: As with Figure 5.8, but for composites of Z500- The 5300 and 5500 contours are in bold. Contour interval is 50 m. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 96 Figure 5.10: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 97 the case with SLPA. Examination of the Z5oo held is illuminating. Figures 5.10 (h)-(k) indicate that the mid-tropospheric manifestation of variability along Branch 1 consists of an alternating amplification and attenuation of the climatological ridge over Europe. Figure 5.10(c) demonstrates that Branch 2 describes, on average, an amplified climato-logical ridge over western North America and flow split around a local anticyclone over southern Greenland. Again, the mid-tropospheric composites associated with X(£ n ) de-scribe two markedly different modes of atmospheric circulation: a standing oscillation in the strength of the climatological ridge over North Europe (Branch 1), and episodic split flow events over Greenland (Branch 2). Figure 5.11 displays maps of the geographical distribution of variance and skewness of SLP, for both the observations and the CCCma model output. The skewness, s, of a random variable x is the ratio < ( * - < * » 3 > (5.2) < (a;- < x >)2 >3/2 and is a measure of the asymmetry of the distribution of x about its mean. For both the modelled and observed SLP, the variance is greater in the polar latitudes than in the midlatitudes, and displays a broad extremum from southern Greenland to northeastern Eurasia, with a second extremum over the North Pacific. The Pacific centre is substan-tially stronger in observations than in the model results. The broad local extremum in the variance of GCM SLP corresponds precisely to the polar centre of action in SLP EOF 1 (Figure 5.2(a)), and its west and east flanks correspond to the polar centres of action of Branches 1 and 2 of the NLPCA approximation X(£ n ) . The leading EOF pat-tern of SLP is in a way a compromise between the circulation anomalies described by Branches 1 and 2. The broad extremum in variance results from the separate occurrence of two circulation regimes, and the PCA approximation, being linear, must attempt to characterise both modes of variability simultaneously. F i g u r e 5.11: M a p s of spa t ia l d i s t r i bu t i on of var iance of S L P f rom (a) C C C m a G C M a n d (b) observat ions (contour in t e rva l 10 ( h P a ) 2 ) , and skewness of S L P f rom (c) C C C m a G C M and (d) observat ions (contour in te rva l 0.2). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 99 There is broad agreement between observed and modelled skewness; it is seen that both are generally characterised by negative values in the midlatitudes and positive val-ues in polar latitudes, as was found by Nakamura and Wallace (1991) and by Holzer (1996). As well, both observed and modelled SLP skewness has a local maximum over Greenland and a local minimum over the midlatitude North Atlantic Ocean. In ob-servations, the Greenland maximum is weaker than that of the modelled SLP, and the North Atlantic minimum is shifted northwestward of the corresponding minimum in the modelled SLP. As well, the modelled skewness over northern Eurasia is weaker than in observations, and the North Pacific extremum evident in observations is shifted to over Alaska. Note however that some of these apparent differences may in fact be an artifact of poor observational data coverage north of 70N, as is discussed further in section 5.6. In their observational study of the geographical distribution of skewness in North-ern Hemisphere tropospheric circulation, Nakamura and Wallace (1991) suggested that, "Large skewness at a particular location might be indicative of a juxtaposition of two flow regimes: one which prevails most of the time and another which occurs relatively infrequently and is thus characterised by large anomalies." This is in fact the situation seen to prevail in the NLPCA approximation X(in) of SLPA. The dipole in skewness over the North Atlantic arises because of the regular occupation of Branch 1 and the episodic occupation of Branch 2. The negative extremum in skewness over the Atlantic Ocean west of Africa, which corresponds to the southern, negative centre of action of Branch 2 events, is displaced somewhat southward because skewness tends to be enhanced where the variance of the field is low, as is clear from equation (5.2). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 100 5.4 Analys is of G C M 500mb Heights In Figure 5.12 are displayed histogram estimates of the 2D marginal distributions of the Z500A held PCs (PC1.PC2), (PC1,PC3), (PC2.PC3), and (PC1,PC4). The marginal distribution of P C I and PC2 is closer to Gaussian than is the corresponding SLPA distribution, although P C I is clearly skewed. In the Z500A field, marked deviations from Gaussianity are most evident in the marginal distributions of P C I with PC3 and of P C I with PC4. Figures 5.12(b) and (d) demonstrate the existence of substantial non-Gaussian structure in the Z500A data. A l l other 2D marginal distributions do not appear to differ strongly from being Gaussian. The ID N L P C A approximation to the Z50o data is displayed in Figure 5.13, projected in the spaces spanned by Z500A EOFs (ei ,e 2 ) , (e^es), (e 2 ,e 3 ) , and (e!,e2,e3). This approximation explains 23.9% of the variance in the data, in contrast to 19.6% explained by the first P C A approximation. Note that unlike the corresponding SLPA approximation it projects strongly onto EOFs other than the leading two. The N L P C A approximation is constructed based on the joint distribution of the data in the 10-dimensional embedding space, and because it has non-negligible projections on EOFs beyond the first two, it is not particularly useful to compare the 2D marginal distributions presented in Figure 5.12 with the projections of the approximation in these planes. Thus, no equivalent of Figure 5.7 is presented. The ID N L P C A Z5ooA approximation is similar to that of the SLPA in that it is composed of three branches. As well, the corresponding standardised time series a(t n ) and its histogram (displayed in Figure 5.14) display a strikingly bimodal character. As was the case with the ID SLPA N L P C A approximation, one branch of X(£ n ) corresponds to the larger peak of the distribution of a(tn), a second to the smaller peak, and the third branch to the rarely-visited region in between. The system spends 78% of all months in the branch corresponding to the larger peak, 19% in the branch Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 101 F i g u r e 5.12: H i s t o g r a m est imates of 2D m a r g i n a l P D F s of Z500A P C s ( a ) ( P C l , PC2) , (b)(PCl,Pc73), (c)(PC72,PC3), and ( d ) ( P C l , PC4) . T h e contour in tervals are as i n F i g u r e 5.4. B i n sizes are 2500 m . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 102 F i g u r e 5.13: P r o j e c t i o n of the Z50oA da ta (dots) and its I D N L P C A a p p r o x i m a t i o n (open circles) onto the spaces spanned by E O F s (a) (ei,e2), (b) (ei,e3), (c) (e2,e3), and (d) (e 1,e 2,e 3). Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 103 F i g u r e 5.14: P l o t of the I D Z500A t ime series a(tn) (left) and the associated h i s t o g r a m es t imate of the P D F (r ight) . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 104 cor respond ing to the smaller peak, and 3% i n the in te rmedia te b r a n c h . A n i n s p e c t i o n of the t i m e series ct(tn) indicates that va r i ab i l i t y i n the a p p r o x i m a t i o n is charac ter i sed by osc i l l a to ry m o t i o n along the more popu la t ed b ranch , w i t h episodic excurs ions to the less p o p u l a t e d b ranch . F i g u r e 5.15 displays composi tes of Z50oA over t imes associated w i t h ne ighbourhoods i n a o f w i d t h 0.375, centred at intervals of 0.375 f rom -2.9 to 1.6. Inspec t ion o f these composi tes indica tes tha t the b ranch of va r i ab i l i t y associated w i t h the larger of the two peaks i n the P D F o f ot(tn) corresponds t o essential ly a s t and ing o sc i l l a t i on (F igures 5.15 (h)- ( l ) , especia l ly (h) and (k)) . T h i s osc i l la t ion is character ised by anomal ies w i t h a s t rong wavenumber four s ignal i n mid la t i tudes and a wave n u m b e r two s ignal i n higher l a t i tudes . T h e c i r c u l a t i o n anomalies over the eastern N o r t h Pac i f i c a n d N o r t h A m e r i c a d i sp lay a P N A - l i k e s t ruc ture , and those over E u r a s i a resemble the S c a n d i n a v i a p a t t e r n descr ibed i n B e l l and H a l p e r t (1995). T h e s t ruc ture of this osc i l l a t ion i n m i d l a t i t u d e s bears a s t r i k i n g s i m i l a r i t y to that observed i n the 500 m b field of the lead ing m o d e of the c o m b i n e d E O F analysis car r ied out by B a l d w i n and D u n k e r t o n (1999), a l t hough the s ignal i n the po la r regions is ra ther different. V a r i a b i l i t y a long the b ranch associated w i t h the smaller of the two peaks is i l l u s t r a t e d i n F igu res 5.15 (a)-(d). U n l i k e the b ranch associated w i t h the larger peak, anomal ies a long th is b r a n c h are o f a single phase, w i t h highs over the po la r r eg ion a n d lows over the mid l a t i t udes . A l o c a l m a x i m u m i n the anomaly field over the polar reg ion is l oca t ed over S o u t h e r n G r e e n l a n d , w i t h a loca l negative e x t r e m u m i n mid l a t i t udes over the A z o r e s . T h i s s t ruc ture is s t rongly reminiscent of the negat ive phase o f the N A O . A compar i son of the composi tes presented i n F i g u r e 5.15 w i t h those g iven i n F i g u r e 5.9, a n d of the t i m e series presented i n F igures 5.14 and 5.6, demonst ra tes tha t the I D N L P C A app rox ima t ions of S L P A and Z50QN describe essential ly the same m o d e o f va r i -ab i l i ty . T h e c i r c u l a t i o n anomalies p lo t t ed i n F i g u r e 5.9(b)-(d) bear a s t r i k i n g resemblance Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 105 F i g u r e 5.15: A s i n F i g u r e 5.9, but for Z500A compos i t ed over charac ter i s t ic ranges of a associa ted w i t h the I D Z5Q0A N L P C A a p p r o x i m a t i o n . C o n t o u r i n t e rva l is 2 0 m (.. . ,-30,-10,10,30,. . .) . C o n t i n u e d on next page. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 106 (0.1,0.475); N=1635 (k) (0.85,1.225); N=600 (h) (-0.275,0.1); N=565 (i) (0.475,0.85); N=1842 (I) (1.225,1.6); N=34 Figure 5.15: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 107 to those p l o t t e d i n F i g u r e 5.15(b)-(d). S imi l a r ly , the po la r a n d E u r a s i a n sectors o f the s t and ing osc i l l a t ion as diagnosed f rom the analysis of S L P A (F igu re 5.9 (h) a n d (k)) are essent ia l ly the same as those diagnosed f rom the analysis of Z500A (F igu re 5.15(h) and (k) ) . T h e more a n d less popu la t ed regimes of c i r cu la t ion i n the Z 5 0 0 A f ie ld are referred to as the osc i l l a to ry and spli t -f low regimes, respect ively. T h i s charac te r i sa t ion w i l l be borne out by the composi tes of Z500 t ° be presented later . T h e two approx ima t ions do not correspond exact ly . T h e Z500A a p p r o x i m a t i o n is i n the less -popula ted regime 19% of the t ime , i n contrast to 13% for the S L P A a p p r o x i -m a t i o n . T h e cor responding numbers for the b r a n c h descr ib ing the s t and ing osc i l l a t i on are 78% a n d 84%. T h e spli t -f low regime is more f requent ly o c c u p i e d i n the Z500A ap-p r o x i m a t i o n t h a n i n the S L P A . F i g u r e 5.16 displays a scat terplot of the t i m e series ct(tn) cor re spond ing to the S L P A and Z 5 0 o A approx imat ions . T h e co r re l a t ion coefficient be-tween these two t i m e series is 0.81. T y p i c a l l y the app rox ima t ions are s imul t aneous ly b o t h i n e i ther the osc i l la tory or spli t-f low regimes. F o r a s m a l l n u m b e r o f m o n t h s , the S L P A a p p r o x i m a t i o n is i n the spl i t - f low regime whi le the Z500A a p p r o x i m a t i o n is i n the osc i l l a to ry reg ime, but more c o m m o n l y when the two approx ima t ions differ i t is because the S L P A a p p r o x i m a t i o n is i n the osc i l la tory regime w h e n the Z500A a p p r o x i m a t i o n is i n the spl i t - f low regime. T h i s is a reflect ion of the fact tha t the spl i t - f low reg ime is more f requent ly occup ied i n the Z50oA a p p r o x i m a t i o n t h a n i n the S L P A a p p r o x i m a t i o n . T h i s difference i n o c c u p a t i o n stat is t ics has l i t t l e effect on the composi tes character-i s ing the spl i t - f low b ranch . However , the s tanding osc i l l a t ion diagnosed f r o m the Z500A d a t a is more hemispher ic i n extent t h a n is that f rom the S L P A da ta . N o t e i n p a r t i c u l a r the presence i n the Z50oA a p p r o x i m a t i o n of a l o c a l e x t r e m u m i n the height anomal ies over wes tern C a n a d a and A l a s k a of the same sign as the anomalies over the pole , and l o c a l e x t r e m a of the opposi te sign over the cent ra l N o r t h Pac i f ic and eastern N o r t h A m e r i c a w h i c h together compr ise a wave t ra in reminiscent of the P N A pa t t e rn . These e x t r e m a do Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 108 F i gure 5.16: Scatterplot of the time series o:(in) corresponding to the ID SLPA and Z^QQA N L P C A approximations. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 109 not occur i n the composi tes cor responding to the I D S L P A N L P C A a p p r o x i m a t i o n . F i g u r e 5.17 presents the geographical d i s t r i bu t i on of var iance and skewness of the C C C m a mode l l ed Z500A f ield. These maps show the same sort of s t ruc ture as those for the m o d e l l e d S L P : the var iance displays a general increase w i t h l a t i t ude , a n d the skewness is general ly negat ive i n mid la t i tudes and pos i t ive i n the po la r reg ion . A s we l l , the s t anda rd dev i a t i on and skewness of Z500A d isp lay l o c a l e x t r e m a i n m u c h the same loca t ions as S L P A , bu t w i t h a considerably stronger s ignal i n the N o r t h Pac i f i c . A s was the case w i t h S L P A , b o t h the skewness and var iance maps are consistent w i t h the two-regime s t ruc ture of the N L P C A a p p r o x i m a t i o n of Z500A. C o m p o s i t e s of the t o t a l Z5oo field are g iven i n F i g u r e 5.18. T h e composi tes o n the spl i t - f low b r a n c h of the Z500A a p p r o x i m a t i o n do not differ subs tan t ia l ly f r o m those based on the S L P A analysis . T h e spli t -f low b ranch of the I D Z500A N L P C A a p p r o x i m a t i o n is seen indeed to descr ibe split flow over G r e e n l a n d and an enhanced r idge over W e s t e r n C a n a d a . V a r i a b i l i t y along the osc i l la tory b r anch of the Z500A a p p r o x i m a t i o n describes a l t e rna t ing ampl i f i ca t i on and a t tenua t ion of the c l ima to log ica l ridges over b o t h E u r o p e and N o r t h A m e r i c a , i n contrast to va r i ab i l i t y along the S L P A a p p r o x i m a t i o n , w h i c h was concen t ra ted i n the E u r o p e a n sector. F i n a l l y , F i g u r e 5.19 displays composi tes of S L P A based on the I D Z50oA N L P C A a p p r o x i m a t i o n . A s was the case w i t h the Z500A and Z500 fields, the composi tes o f S L P A over charac ter i s t ic ranges of the t ime series a(tn) co r responding to the Z500A a p p r o x i m a t i o n are more hemispher ic i n extent t h a n was the case w i t h the cor respond ing composi tes based on the S L P A a p p r o x i m a t i o n . In pa r t i cu la r , there is enhanced v a r i a b i l i t y over wes tern C a n a d a , cor responding to an ampl i f ied wavenumber 2 c o n t r i b u t i o n to the a n o m a l y f ie ld. T h u s , a I D N L P C A analysis of a second field f rom the C C C M A coup led G C M , v i z N o r t h e r n H e m i s p h e r e w in t e r t ime 500mb geopotent ia l height anomal ies , produces results Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 110 (a) (b) Figure 5.17: Maps of variance (a) and skewness (b) of CCCma modelled Z50o- Contour interval in (a) is 500 m 2 and in (b) is 0.2. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 111 (a) (-2.9,-2.525); N=39 (C) (-2.15,-1.775); N=442 (e) (-1.4,-1.025); N=97 (b) (-2.525,-2.15); N=237 (d) (-1.775,-1.4); N=367 (f) (-1.025,-0.65); N=59 Figure 5.18: As with Figure 5.15, but for composites of Z500. The 5300 and 5500 m contours are in bold. Contour interval is 50 m. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 112 Figure 5.18: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 113 Figure 5.19: As in Figure 5.15, but for S L P A . Contour interval is 2 h P a (...,-3,-1,1,3,...). Continued on next page. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 114 (9) (h) (0.85,1.225); N=600 (1.225,1.6); N=34 Figure 5.19: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 115 that are very similar to those produced by the analysis of the SLPA field, providing strong evidence that this structure is a robust feature of the data. The structures diagnosed in the two fields are not exactly the same: the Z500A approximation anomaly fields are more hemispheric in extent than those of the SLPA approximation, and the split-flow branch is occupied more frequently. 5.5 Analysis of G C M SLP in a 4 x C 0 2 Integration A plot of the histogram estimate of the 2D marginal P D F of SLPA from the G C M integration with C 0 2 concentrations at four times the pre-industrial level, in the space of the leading two control integration EOFs, is presented in Figure 5.20, along with the corresponding ID N L P C A approximation. The N L P C A approximation explains 31.4% of the variance of the data, in contrast to 29.8% explained by the first P C A mode. The most striking aspect of this P D F is the fact that it appears to be much more Gaussian than the corresponding P D F for the control integration (Figure 5.4). In particular, the P D F of the 4 x C 0 2 integration lacks the bulge in the lower-right quadrant associated with the split-flow branch of the control integration ID N L P C A approximation. That the structure of the data is now much more nearly Gaussian is reflected in the structure of the ID N L P C A approximation displayed in Figure 5.20. This approximation is no longer branched, but is a slightly curved line lying nearly along the major axis of the marginal distribution. The corresponding time series (Figure 5.21) is unimodal. Note that the N L P C A approximation of the SLPA in the 4 x C 0 2 integration lies along what was the oscillatory branch of the SLPA N L P C A approximation from the control integration. Figure 5.22 displays composites of the ID 4 x C 0 2 SLPA N L P C A approximation over characteristic ranges of the (standardised) time series ct(tn). The variability illustrated in Figure 5.22 is not precisely that of a standing oscillation; the composites over the Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 116 F i g u r e 5.20: H i s t o g r a m es t imate of the m a r g i n a l p r o b a b i l i t y dens i ty func t i on (con-tours) o f S L P A f rom the G C M in tegra t ion w i t h C 0 2 concentra t ions at four t imes the p r e - indus t r i a l value, i n the space of the leading two con t ro l in teg ra t ion S L P A P C A modes , over la id w i t h the cor responding I D N L P C A a p p r o x i m a t i o n (open c i rc les) . C o n t o u r i n -tervals are 5 x 1 0 - 4 , 1 . 5 x l O " 3 , 3 x 1 0 " 3 , 6 x 1 0 - 3 , 1 x 1 0 " 2 , 2 x 1 0 - 2 , 3 x 1 0 ~ 2 . T h e h i s t o g r a m b i n size is 25 h P a i n b o t h direct ions . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 117 0 500 1000 1500 2000 2500 3000 0 0.05 0.1 t (months) F requency Figure 5.21: Plot of the ID N L P C A SLPA time series a(tn) (left) and the associated histogram estimate of the P D F (right) for the G C M integration with C 0 2 concentration at four times the pre-industrial level. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 118 (a) (-5.4,-4.5); N=3 ( C ) (-3.6,-2.7); N=20 (e) (-1.8,-0.9); N=344 (b) (-4.5,-3.6); N=5 (d) (-2.7,-1.8); N=71 (f) (-0.9,0); N=1075 Figure 5.22: As in Figure 5.8, but for SLPA in the GCM integration at four times the pre-industrial CO2 concentration. Contour interval is 2 hPa (...,-3,-1,1,3,...). Continued on next page. Figure 5.22: Continued. Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 120 tai ls o f a accentuate s t rongly pos i t ive anomalies over the northeast Pac i f i c a n d n o r t h e r n S ibe r i a , respect ively . N o t e , however, that the a p p r o x i m a t i o n is i n a state be tween F igu res 5.22(e) and (h) 93% of the t ime , over w h i c h range the s t ruc ture is essent ial ly tha t o f a s t and ing osc i l l a t ion . N o t e also that the s t ructure of this osc i l l a t ion is ve ry s imi l a r to tha t of the osc i l l a to ry b r anch of the ID N L P C A a p p r o x i m a t i o n of S L P A i n the con t ro l i n t eg ra t ion , w i t h a s l ight ly enhanced N o r t h Pac i f ic centre of ac t ion . T h u s , the dominan t m o d e o f nonl inear va r i ab i l i t y i n S L P A under quad rup led CO2 is o n l y w e a k l y non l inear and s t rongly resembles the osc i l la tory regime of S L P A va r i ab i l i t y under p r e - i n d u s t r i a l CO2 concent ra t ions . P a l m e r (1999) has suggested, based on exper iments w i t h s imple l o w - d i m e n s i o n a l non -l inear sys tems, tha t the response of the c l imate sys tem to ex te rna l pe r tu rba t ions (e.g. increased a tmospher i c CO2) w i l l not be a change i n the s t ruc ture of dominan t c i r c u l a t i o n regimes, bu t r a the r i n thei r occupa t ion frequencies. O u r results are b r o a d l y consistent w i t h this hypothes i s . U n d e r quad rup led CO2, the osc i l la tory regime of S L P A v a r i a b i l i t y becomes more f requent ly occup ied at the expense of the spl i t - f low regime. In fact, the l a t t e r a lmost disappears entirely. O u r results are consistent as we l l w i t h those of U l b r i c h and C h r i s t o p h (1999), w h o found tha t the ECHAM4+OPYC3 coupled G C M pred ic ted a sys temat ic no r theas tward shift of the centres of ac t ion of the N A O w i t h an increase i n a tmospher ic CO2. T h e N A O as d iagnosed us ing l inear s ta t i s t ica l tools is seen i n the l ight o f the ID con t ro l S L P A N L P C A a p p r o x i m a t i o n as a compromise between B r a n c h 1 a n d B r a n c h 2 va r i -ab i l i ty . A s the spl i t - f low b ranch becomes depopula ted w i t h increas ing a tmospher i c C 0 2 , the com pr omise w i l l increas ingly favour the anoma ly pat terns charac te r i s t i c of the os-c i l l a t o r y b r a n c h . T h i s change w o u l d manifest i t se l f i n a reg iona l analysis as a secular no r theas tward shift of the centres of ac t ion of the N A O . F i g u r e 5.23 displays maps of the var iance and skewness of the S L P A field i n the 4x CO2 Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 121 in t eg ra t ion . T h e var iance displays l o c a l e x t r e m a over N o r t h e r n R u s s i a a n d S o u t h e r n A l a s k a ; b o t h o f these loca t ions cor respond to centres o f ac t ion i n the range o f a over w h i c h the ID N L P C A a p p r o x i m a t i o n is effectively a s t and ing osc i l l a t ion . T h e skewness is s t rongly pos i t ive over the Nor theas t Pac i f i c and over E u r o p e a n R u s s i a . Inspec t ion of the composi tes d i sp layed i n F i g u r e 5.22 indicates tha t the N L P C A a p p r o x i m a t i o n is consistent w i t h this d i s t r i b u t i o n of skewness. F o r s t rongly negat ive values of a(tn), there are s t rong pos i t ive S L P anomalies centred south of A l a s k a tha t are not coun te rba lanced b y s imi l a r negat ive anomalies for large pos i t ive a(tn). S i m i l a r l y , for s t rongly pos i t ive a(tn), s t rong pos i t ive S L P anomalies occur over N o r t h e r n R u s s i a tha t do not have a negat ive counterpar t for large negative values of a. T h u s , the ID N L P C A a p p r o x i m a t i o n characterises the gross features of b o t h the skewness and the var iance of the 4xCC>2 S L P A da ta . N o t e tha t whi le the tails i n the d i s t r i bu t i on of a(tn) do not con t r ibu te subs tan t i a l ly to the var iance, the s t ructure of w h i c h is d o m i n a t e d b y v a r i a b i l i t y near the m e a n of a(tn), t hey are manifest i n the spa t ia l s t ruc ture of the skewness. N o t e as we l l tha t because skewness involves a compromise between the second and t h i r d order momen t s , i ts e x t r e m a are shifted somewhat sou thward of the centres of ac t ion i n the compos i te towards la t i tudes of lower variance. T h u s , i n the C C C M A coupled G C M , the quad rup l ing of the a tmospher ic concent ra -t i o n o f CO2 results not so m u c h i n a change i n the s t ructures o f a tmospher ic va r i ab i l i t y , bu t i n a change i n the i r occupa t ion frequencies, consistent w i t h the hypothes is o f P a l m e r (1999). T h e spl i t - f low b r a n c h o f the ID con t ro l S L P A a p p r o x i m a t i o n is d e p o p u l a t e d i n the 4 x C 0 2 r u n , wh i l e the osc i l la tory regime remains la rge ly unchanged . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 122 F i g u r e 5.23: (a) V a r i a n c e (contour in t e rva l 10 ( h P a ) 2 ) and (b) skewness (contour i n t e r v a l 0.2) of S L P A f rom G C M in tegra t ion w i t h quadrup led a tmospher ic C 0 2 . Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 123 5.6 Conclusions In this chapter the N L P C analysis of sea level pressure anomaly data and 500mb geopo-tential height anomaly data from the Canadian Centre for Climate Modelling and Anal-ysis coupled G C M have been considered. It was found that for data from a control integration at pre-industrial CO2 concentrations, in both the SLPA and the Z 5 0 0 A fields, the ID N L P C A approximation was composed of three branches, and the corresponding time series was bimodal. The most frequently occupied branch describes the oscillatory amplification or attenuation of the climatological ridges over Europe, for the SLPA ap-proximation, and over both Europe and North America for the Z50oA. approximation. The less populated branch is occupied episodically, and strongly resembles the nega-tive phase of the North Atlantic Oscillation. Mid-tropospheric circulation patterns in this regime are associated with split flow over southern Greenland. The fact that the independently-determined ID N L P C A approximations found for SLPA and Z 5 0 0 A are very similar provides strong evidence that these approximations represent actual struc-ture in the data. An analysis of SLPA data from a G C M integration with C 0 2 levels at four times the-pre-industrial concentrations indicated that the characteristic regimes of low-frequency variability did not change in structure, but in occupation frequency, as was suggested by Palmer (1999). The oscillatory regime became increasingly populated at the expense of the split-flow regime, which in fact became so depopulated that the branched structure of the N L P C A approximation disappeared. In all cases, it was seen that the ID N L P C A approximation describes coherent hemispheric-scale atmospheric variability that is consistent with both the geographical distribution of variance and of skewness. In the case of the control run integrations, the strong skewness in the region of the N A O is associated with the fact that variability is Chapter 5. NLPCA of Northern Hemisphere Atmospheric Circulation Data 124 dominated by Gaussian oscillations with episodic excursions to a single strong anomaly pattern. This mechanism for the origin of skewness has been suggested in the literature by Nakamura and Wallace (1991). N L P C analyses were also carried out on a Southern Hemisphere SLPA data set from the CCCma coupled G C M and on the Trenberth and Paolino Northern Hemisphere SLPA data set (1980). In neither case could robust ID N L P C A approximations be found that differed from the ID P C A approximation. In the case of the Southern Hemisphere G C M SLPA, this result is likely because of the strong zonal symmetry of the lower boundary, which is not as favourable to the formation of geographically-fixed circulation anomalies as is the Northern Hemisphere, with its strong topography and land-sea contrast. There are two possible reasons for the failure to detect strong nonlinear structure in the Tren-berth and Paolino SLPA data. The first is simply that such structure is not there, and that the structure of the data is predominantly linear. The second is that the structure is there, but cannot be found because the records are short and data are of very poor quality in the polar regions, precisely the latitudes in which the nonlinear structure found in the G C M data is strongest. In fact, no data are reported on the latitude circles 75°N and 85°N or at the pole. A future study will consider the analysis of N C E P reanalysis data, which does not suffer from this deficiency in geographical coverage. Chapter 6 Seven-Layer Networks for Discontinuous Project ion and Expans ion Functions 6.1 Introduct ion K r a m e r ' s N L P C A allows the es t ima t ion of continuous funct ions Sf a n d f such tha t X(tn) = (fosf)(X(tn)) (6.1) is the o p t i m a l ( in the least-squares sense) a p p r o x i m a t i o n to X(tn) by a continuous curve or surface. A s was po in t ed out by K i r b y and M i r a n d a (1996) and by M a l t h o u s e (1998), a n d discussed i n C h a p t e r 2, the res t r i c t ion to con t inu i ty of the p ro j ec t i on a n d expan-sion maps precludes the de tec t ion and charac ter i sa t ion by N L P C A o f P - d i m e n s i o n a l s t ruc tures not topo log ica l ly equivalent to the uni t cube i n 3ftp. T h e s implest example i l l u s t r a t i n g this p r o b l e m arises i n the N L P C analysis of the un i t c i rc le e m b e d d e d i n 9ft2, S = {(cos t, s'mt); t £ [0, 27r)}. (6.2) T h e topo logy of the man i fo ld parameter i s ing this s t ruc ture is 5 1 ; the cor respond ing m a p Sf : 3ft2 i—y 3ft is d iscont inuous , because for any 0 < e < < 1 there w i l l be a s m a l l open n e i g h b o u r h o o d D o n the circle about the point (1,0) such that Sf(D) = [0, e)U(27r —e, 27r). T h e r e is no cont inuous m a p between the circle and the uni t i n t e rva l i n 3ft, because they are topo log ica l l y inequiva len t . T h u s , N L P C A as fo rmula ted by K r a m e r cannot character ise the l o w - d i m e n s i o n a l s t ruc ture u n d e r l y i n g a self- intersecting curve or surface. O n the other h a n d , i f the funct ions i n (6.1) are a l lowed to be discontinuous, t h e n there exists 125 Chapter 6. Seven-Layer Networks for Discontinuous Functions 126 a class of parameter manifolds topologically inequivalent to the unit cube in $lp such that the projection and expansion functions of an N L P C A approximation X(£ n ) can be approximated. This class includes topologies such as the circle (S1) and torus (S1 ® S1) but excludes the sphere (S2). Because the parameterisation of the sphere is degenerate at the poles, it cannot be expressed as a simple discontinuous function from 5R2 (—>• SR3. Given enough neurons in its single hidden layer, a three-layer feed-forward neural network with M input neurons and P output neurons can approximate to arbitrary ac-curacy any continuous function from SRM to 3ffp. Such a network generally does a poor job approximating a discontinuous function between these spaces. If, however, the number of hidden layers is increased to two, each with nonlinear transfer functions, the network is much better able to approximate a discontinuous function. This fact suggests the generalisation of Kramer's 5-layer network to a 7-layer network, with two encoding and two decoding layers. In such a network, the first four layers approximate the (potentially discontinuous) function Sf : SRM H-4 9ftp and the last four layers the (potentially discon-tinuous) function f : 5RP i—> 5RM; the 7-layer network is just the composition of these two functions. This chapter demonstrates that in fact a 4-layer neural network is better able to approximate a discontinuous function than is a 3-layer network, and investigates the application of the generalised 7-layer network to the N L P C analysis of an ellipse embedded in 5ft2. 6.2 Neural Network Approximations to Discontinuous Functions To demonstrate the superior ability of a 4-layer neural network to approximate a discon-tinuous function relative to that of a 3-layer network, consider the AT-point data sets: X(£ n ) = (cos 2ntn, sin 27vtn) Chapter 6. Seven-Layer Networks for Discontinuous Functions 127 Y{tn) tn + 0.5 1/N < tn < 0.5 (6.3) * „ - 0 . 5 0.5 < tn < 1 where tn = 1/ iV, 2 / J V , 1 T h e m a p g re la t ing X ( £ n ) and ^ ( £ n ) is d i scont inuous at tn = 0.5; i t is d i sp layed i n F i g u r e 6.1. T h e func t iona l re la t ionsh ip be tween X ( i n ) and Y(tn) ( w i t h N = 1000) is mode l l ed us ing two different feed-forward neura l ne tworks . T h e first , deno ted N N 1 , has a single h idden layer w i t h 13 neurons, and the second, N N 2 , has two h i d d e n layers w i t h 5 neurons i n each. H y p e r b o l i c tangents were used as the transfer funct ions i n each o f the h i d d e n layers. T h e number of ne twork parameters i n N N 1 and N N 2 is 53 and 51 respect ively. E a c h ne twork was t r a ined for 10000 i te ra t ions , s t a r t ing f r o m r a n d o m l y - d e t e r m i n e d i n i t i a l weights and biases. Because the d a t a are noise-free, ea r ly - s topp ing was not employed i n the t r a in ing process. F igu re s 6.2(a) and (b) d isplay respect ive ly the app rox ima t ions to Y(tn) p r o d u c e d by neu ra l ne tworks N N 1 and N N 2 . N o t e that N N 2 is m u c h be t te r able to represent the d i scon t inu i ty i n the re la t ionsh ip between X ( f n ) and Y(tn) t h a n is N N 1 . T h e ne twork N N 1 spreads the d i scon t inu i ty out over a number of points , and in t roduces G i b b s osc i l la t ions i n i ts v i c i n i t y . T h e poin t of d i scon t inu i ty is m u c h bet ter represented by N N 2 : the w i d t h over w h i c h the d i scon t inu i ty is spread is subs tan t ia l ly reduced , and the G i b b s osc i l la t ions are suppressed. T h i s difference is not a result of differences i n i n i t i a l weights be tween the two cases, as an ensemble of independen t ly t r a ined ne tworks (not shown) demonst ra tes tha t for b o t h N N 1 and N N 2 the approx imat ions are independent of i n i t i a l pa ramete r values. N o r is the i m p r o v e m e n t a ref lect ion of the n u m b e r of parameters i n the m o d e l , as N N 1 and N N 2 have essential ly the same number of parameters . T h u s , N N 2 is be t te r able to a p p r o x i m a t e discont inuous funct ions t h a n is N N 1 because of differences i n the i r architecture. Chapter 6. Seven-Layer Networks for Discontinuous Functions 128 Figure 6.1: Plot of Y(tn) (diamonds) and X ( i n ) (open circles) as defined by equation (6.3) with TV = 50. Chapter 6. Seven-Layer Networks for Discontinuous Functions 129 F i g u r e 6.2: N e u r a l ne twork approx imat ions of the func t iona l r e la t ionsh ip be tween d a t a sets X ( i n ) a n d Y(tn) defined i n equa t ion (6.3): (a) ne twork w i t h one h i d d e n layer , (b) ne twork w i t h two h i d d e n layers. Chapter 6. Seven-Layer Networks for Discontinuous Functions 130 6.3 7-Layer N L P C A Network T h e i m p r o v e d ab i l i t y of a 4-layer neura l ne twork over a 3-layer ne twork to a p p r o x i m a t e d iscont inuous funct ions suggests that N L P C A car r ied out us ing a 7-layer ne twork shou ld be be t te r able t h a n that us ing a 5-layer ne twork to app rox ima te s t ructures whose projec-t i o n and expans ion funct ions are d iscont inuous . In this sect ion, this hypothes i s is tes ted t h r o u g h cons idera t ion of the ID N L P C A a p p r o x i m a t i o n o f a n el l ipse i n the p lane : n — 1 X(tn) = (0.5 cos 27rt n, 0.7 s in 27r* n) tn = — — (6.4) for n = l , N — 1, where N = 550. A s i n the previous sect ion, because these d a t a are noise-free, no ea r ly s topp ing was used i n cons t ruc t ing the m o d e l . F i g u r e 6.3 displays the N L P C A a p p r o x i m a t i o n X ( £ „ ) and the t i m e series a(tn) = 5 / ( X ( t n ) ) ob ta ined us ing a 5-layer autoassociat ive neu ra l ne twork w i t h L = 13 nodes i n the m a p p i n g and d e m a p p i n g layer. Convergence to this a p p r o x i m a t i o n was ex t r eme ly slow; the results shown are f rom a r u n i n w h i c h 10 6 i t e ra t ions were ca r r i ed out i n the t r a i n i n g . A s was discussed i n M a l t h o u s e (1998), the a p p r o x i m a t i o n X ( i n ) d isplays h i g h f ide l i ty to the o r ig ina l da ta th roughout the b u l k of the d o m a i n (the F U V is 1.2 x 1 0 - 3 ) , bu t fails en t i re ly over a range o f poin ts cen t red near (0.5,0.2). T h i s set o f poin ts corresponds to the ne ighbourhood of the poin t at w h i c h the i dea l p ro j ec t i on a n d expans ion funct ions are d iscont inuous because the topology of the m a n i f o l d pa ramete r -i s ing the curve is S1. N o t e tha t the pos i t ion of this poin t is a rb i t ra ry , a n d is d e t e r m i n e d r a n d o m l y by the i n i t i a l ne twork parameters . T h e t ime series oc(tn) is unab le to repre-sent this d i scont inu i ty , d i sp lay ing precisely the same overshoot as was observed i n F i g u r e 6.2(a). Because o f the res t r i c t ion of sj and f to the space of cont inuous funct ions , the 5-layer autoassocia t ive ne twork cannot accura te ly app rox ima te the el l ipse. O n the other h a n d , F i g u r e 6.4 displays the N L P C A a p p r o x i m a t i o n X ( £ n ) a n d the cor-r e spond ing t i m e series ot(tn) = Sf(X(tn)) ob ta ined us ing a 7-layer au toassoc ia t ive neu ra l Figure 6.3: Results of ID N L P C analysis of an ellipse using a 5-layer autoassocia-tive neural network: (a) N L P C A approximation X ( i n ) , (b) associated time series a ( t n ) = Sf(X.(tn)) (note scale on y-axis is arbitrary). Chapter 6. Seven-Layer Networks for Discontinuous Functions 132 ne twork w i t h 5 h i d d e n neurons i n each o f the two m a p p i n g and d e m a p p i n g layers . A s was the case w i t h the 5-layer network, this ne twork was t r a ined for 10 6 i t e ra t ions . T h i s a p p r o x i m a t i o n displays great f idel i ty to the o r ig ina l da t a th roughout the d o m a i n (the F U V is 2.4 x 1 0 - 5 ) , except for a ra ther sma l l in te rva l centred near (-0.5,-0.1). A g a i n , this reg ion corresponds to the ne ighbourhood of the point i n w h i c h the idea l m a p p i n g and d e m a p p i n g funct ions are discont inuous; note that i t is m u c h smal ler t h a n the cor respond-ing reg ion ob t a ined us ing a 5-layer network. A s we l l , the F U V of the a p p r o x i m a t i o n f rom the 7-layer ne twork is two orders of magn i tude smaller t h a n tha t of t he a p p r o x i m a t i o n f rom the 5-layer ne twork . T h e t i m e series a(tn) f rom the 7-layer ne twork is a m u c h bet-ter a p p r o x i m a t i o n of the discont inuous p ro jec t ion m a p t h a n was that ob t a ined f r o m the 5-layer ne twork . T h e 5-layer autoassocia t ive neura l ne twork considered above con ta ined 107 p a r a m -eters, wh i l e the 7-layer ne twork conta ined 103. T h u s , the super ior per formance of the 7-layer ne twork is not due to i ts hav ing a larger number of parameters . B o t h ne tworks were t r a i n e d for the same number of i tera t ions . O t h e r t r a i n i n g runs (not shown) d i sp lay the same super io r i ty of the 7-layer ne twork to the 5-layer ne twork; this resul t is insensi-t ive to the choice o f i n i t i a l m o d e l parameters . T h u s , the 7-layer ne twork is super ior to the 5-layer ne twork because of differences i n archi tec ture ; the presence o f two m a p p i n g a n d d e m a p p i n g layers allows the ne twork to app rox ima te a broader class o f p ro j ec t i on and expans ion funct ions t h a n that open to K r a m e r ' s o r ig ina l ne twork . 6.4 Conclusions It has been demons t r a t ed that a 7-layer general isat ion o f K r a m e r ' s 5-layer ne twork , i n w h i c h two m a p p i n g and d e m a p p i n g layers are used, displays a m a r k e d super io r i ty i n i ts a b i l i t y to m o d e l l ow-d imens iona l s t ruc ture whose cor responding p ro j ec t i on a n d expans ion Chapter 6. Seven-Layer Networks for Discontinuous Functions 133 _0 8' 1 1 1 1 1 1 1 1 1 i i— -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 X1 0.3 (b) 8' 0.2 O o 0.1 o CP 0 ^ ^ ^ ^ J-o,' -0.2 o -0.3 o -0.4 o o -0.5 o 1 . . . 1 1 1 1 0 100 200 300 400 500 600 t n Figure 6.4: As in Figure 6.3, but for N L P C A performed using a 7-layer autoassociative neural network. Chapter 6. Seven-Layer Networks for Discontinuous Functions 134 funct ions are d iscont inuous . T h i s super ior i ty was shown th rough the N L P C analysis of an el l ipse i n the p lane , whose p ro jec t ion a n d expans ion funct ions are d i scont inuous because of the S1 t opo logy of the man i fo ld parameter i s ing the ell ipse. A n o t h e r so lu t ion to this p r o b l e m was presented by K i r b y and M i r a n d a (1996), w h o i n t r o d u c e d the concept of "c i rcu la r nodes" i n neura l ne tworks . Because s t anda rd nodes i n neu ra l ne tworks m a p to the rea l l ine , K i r b y and M i r a n d a no t ed that such nodes are unab le to encode "angular" in fo rma t ion . In other words , they cannot m a p con t inuous ly to a space w i t h S1 topology. K i r b y and M i r a n d a i n t roduced the idea of c o u p l i n g pairs o f nodes such that thei r ou tpu t is cons t ra ined to fa l l on the un i t c i rc le . These coup led nodes can be t rea ted as a single abstract node that can encode angular i n f o r m a t i o n . T h i s coup l ing requires mod i f i ca t i on of the backpropaga t ion a l g o r i t h m ( A p p e n d i x A ) ; such a m o d i f i c a t i o n is presented b y K i r b y and M i r a n d a . T h e r e are two problems w i t h K i r b y and M i r a n d a ' s approach . F i r s t , the m e t h o d is somewhat difficult to i m p l e m e n t , as it requires a mod i f i ca t ion of the b a c k p r o p a g a t i o n a l g o r i t h m a n d thus cannot be car r ied out us ing a regular , c o m m e r c i a l neu ra l ne twork package. Second , the use of c i rcu la r nodes presumes the existence o f pe r iod i c s t ruc tu re i n the da ta . T h e 7-layer ne twork presented here has nei ther of these diff icult ies. H o w e v e r , K i r b y a n d M i r a n d a ' s m e t h o d allows an exact charac ter i sa t ion of maps to S1, whereas the 7-layer genera l i sa t ion of K r a m e r ' s ne twork produces on ly app rox ima t ions . It is not clear tha t this l ack of exactness is a p r o b l e m i n prac t ice , as neu ra l ne tworks can on ly ever p roduce a p p r o x i m a t e models to data . F e w large-scale c l i m a t i c phenomena display the str ict p e r i o d i c i t y of the d a t a set con-s idered i n this sect ion. C l i m a t e va r i ab i l i t y is m u c h more i r regular , a n d examples o f s t r i c t l y pe r iod ic l i m i t cycles are rare. E x a m p l e s of per iod ic va r i ab i l i t y i nc lude the sea-sona l cyc le ( i nc lud ing annua l and higher ha rmon ics ) , the t ides, a n d the d i u r n a l cyc le ; a l l of these are pe r iod ic because they arise f rom ex te rna l , a s t ronomica l forcings w h i c h Chapter 6. Seven-Layer Networks for Discontinuous Functions 135 are strongly periodic. These signals are of well-known frequency, as are their harmonics arising from through nonlinear rectification (Huang and Sardeshmukh, 1999). If the an-nual cycle and its harmonics are stationary in time, they can be removed from the data using harmonic analysis. If stationarity does not obtain, more complicated techniques such as wavelet analysis could be used instead. Thus, the inability of Kramer's 5-layer autoassociative neural network to model strictly periodic variability is not expected to be a significant liability in the analysis of climate data. The fact that this problem can be addressed by generalising Kramer's network is an interesting theoretical result. In practice, its application will probably be limited to data thought to have been generated by a highly periodic system. Chapter 7 Summary and Conclusions 7.1 Summary Principal component analysis is a tool of great utility in the analysis of climate data. The phase space of a typical climatic data set has a dimensionality in the range from hundreds to thousands, which makes straightforward visualisation impossible. P C A finds the or-dered set of axes in the phase space which provides the most efficient linear description of the data, in that the projection of the data into the space spanned by the first P axes is the optimal P-dimensional linear approximation to the data (in a least-squares sense). It is thus a tremendously useful algorithm for the reduction of data dimensionality. There is, however, no a priori reason to believe that any lower-dimensional structure underlying a multivariate dataset is linear, in that it is optimally described by a set of orthogonal axes. Indeed, it is a basic result of the theory of dynamical systems that if the dynamical system i(t) = P(x(t)) (7-1) x(0) = x0 (7.2) possesses a if-dimensional stable attractor F (where K is not necessarily an integer), then in general the description of the manifold V by a set of Cartesian coordinates (ie, orthogonal coordinate axes) requires the dimension of the embedding space to be at least 2K + 1. P C A can determine an appropriate embedding space for a general low-dimensional structure, but it cannot provide the most efficient description of this surface. 136 Chapter 7. Summary and Conclusions 137 In an effort to circumvent this limitation of P C A , Kramer (1991) introduced a nonlin-ear generalisation of P C A , which he denoted Nonlinear Principal Component Analysis. Implemented using a 5-layer feed-forward neural network, N L P C A attempts to find func-tions S f : 5ftM !->• 3?p and f : 3ftp H - » UM such that the approximation X ( i n ) = (f o Sf)(X(£„)) is the optimal P-dimensional nonlinear approximation to the M-dimensional data set X ( i n ) , in a least-squares sense. If the functions S f and f are constrained to be linear, this approach reduces to P C A . This thesis presents the first systematic application of N L P C A to the analysis of climate data. A summary of the results follows. 1. The similarities and differences between P C A and N L P C A were discussed. Both P C A and N L P C A can be characterised as variational problems for detecting lower-dimensional structure in multivariate data sets: P C A finds linear structure, N L P C A can find more general nonlinear structure. The P-dimensional P C A approximation to a data set is the sum of its first P one-dimensional approximations; in N L P C A , the situation is more complicated. A single P-dimensional surface determined us-ing an autoassociative neural network with P neurons in the bottleneck layer is said to be a nonmodal approximation, while the sum of the first P one-dimensional approximations is said to be a modal approximation; these two approaches are gen-erally distinct. A degeneracy in the parameterisation of the surface determined by a P-dimensional nonmodal N L P C A complicates interpretation of the parameteri-sation for P > 2, and generally requires the consideration of a secondary feature extraction problem; thus, modal N L P C A for the analysis of real climate data is preferred. Both ID P C A and N L P C A modes correspond to a single time series, however, unlike P C A , a ID N L P C A mode does not have a unique spatial pattern. In fact, the approximation is characterised by a sequence of spatial patterns, which Chapter 7. Summary and Conclusions 138 m a y be v i sua l i sed c inematographica l ly . B o t h P C A and N L P C A p a r t i t i o n var iance i n tha t the s u m of the t o t a l variance of a P - d i m e n s i o n a l a p p r o x i m a t i o n w i t h the t o t a l var iance of the res idua l equals the var iance o f the o r ig ina l da ta . T h i s resul t can be p roved for P C A , but so far remains an e m p i r i c a l resul t for N L P C A . 2. I m p l e m e n t a t i o n of N L P C A was considered i n some de ta i l . T h e N L P C A a l g o r i t h m is ca r r i ed out us ing a 5-layer autoassociat ive feed-forward neura l ne twork m o d e l . Because N L P C A involves the so lu t ion of a nonl inear va r i a t i ona l p r o b l e m , no ana-l y t i c f o r m u l a for the so lu t ion exists, and i te ra t ive func t ion m i n i m i s a t i o n procedures mus t be used. T h i s m i n i m i s a t i o n process is referred to as " t r a i n i n g " . A n issue of p r i m a r y i m p o r t a n c e is the avoidance of overf i t t ing; an N L P C A a p p r o x i m a t i o n shou ld r o b u s t l y characterise lower-d imens iona l s t ruc ture i n the da ta . O v e r f i t t i n g was avo ided t h r o u g h the use of an ear ly s topping technique i n w h i c h a r a n d o m l y -selected p o r t i o n of the o r ig ina l da t a is set aside and not used to fit the m o d e l parameters . T h e performance of the m o d e l over this w i t h h e l d da t a set is m o n i -tored as t r a i n i n g progresses; i f the m o d e l performance over this v a l i d a t i o n set is infer ior to the performance over the t r a in ing da ta , the m o d e l is d i scarded . A n ensemble of models is cons t ruc ted , each of w h i c h satisfy the robustness c r i t e r i a desc r ibed above. These are referred to as the "candidate mode l s " . T h e n u m b e r of neurons i n the encod ing and decoding layers of the autoassocia t ive ne twork a n d the n u m b e r of i te ra t ions used to t r a i n the m o d e l are adjusted u n t i l a sufficient n u m b e r of s imi l a r candida te models is ob ta ined , at w h i c h poin t a representa t ive m e m b e r is ex t r ac t ed and said to be the N L P C A a p p r o x i m a t i o n to the da ta . 3. A p r e l i m i n a r y inves t iga t ion of N L P C A was car r ied out us ing d a t a s ampled f rom the L o r e n z a t t rac tor , to w h i c h r a n d o m noise was added . It was found tha t at low to modera te noise levels N L P C A was able to p roduce robus t a p p r o x i m a t i o n s Chapter 7. Summary and Conclusions 139 that were more characteristic of the data, and explained higher percentages of the variance, than were those produced by PCA. In the limit of no noise, the ID NLPCA approximation explained 76% of the total variance while the corresponding PCA approximation explained 60%. The NLPCA mode was able to describe covariability between two uncorrelated but dependent variables, which PCA cannot do. A 2D nonmodal NLPCA approximation explained 99.5% of the variance, while the 2D PCA approximation explained 95%. As the noise level increased, the improvement of NLPCA over PCA decreased, in terms of the percentage of variance explained. At high noise levels, in which the structure of the Lorenz data was entirely obscured, NLPCA could not produce robust approximations that differed from those obtained by PCA. 4. A tropical Pacific Ocean sea surface temperature data set was analysed by NLPCA. The ID NLPCA approximation to these data describes average ENSO variability. Unlike the ID PCA approximation, which also describes average ENSO variabil-ity, the ID NLPCA approximation is able to characterise the asymmetry in SST spatial patterns between average El Nino and La Nina events. The distribution of SST is skewed toward positive anomalies in the eastern Pacific and toward nega-tive anomalies in the western Pacific, and the ID NLPCA approximation is able to characterise this distribution of skewness. The second NLPCA mode is also related to ENSO, and seems to characterise differences between individual events. It is particularly active in the period after 1977, a time which has been noted in a number of studies as corresponding to a shift in ENSO variability. A 2D non-modal NLPCA approximation to the SST was also determined. A secondary PCA analysis in the 2D space of variables parameterising this surface indicated that the Chapter 7. Summary and Conclusions 140 v a r i a b i l i t y descr ibed by this a p p r o x i m a t i o n contains essential ly the same i n f o r m a -t i o n as the first two N L P C A modes. A n N L P C analysis o f t r o p i c a l I ndo -Pac i f i c sea l eve l pressure was also car r ied out . T h e first mode was found to co r respond to the E N S O s ignal i n S L P , and character ised asymmet r ies i n the S L P be tween E l N i n o and L a N i n a events. N o robust nonl inear s t ruc ture c o u l d be de tec ted i n . the residuals f rom the first N L P C A mode; w i t h i n the const ra in ts i m p o s e d b y the q u a n t i t y of d a t a avai lable, S L P var iab i l i ty is l inear b e y o n d the first N L P C A m o d e . 5. T h e first N L P C A mode of month ly-averaged N o r t h e r n H e m i s p h e r e S L P A f r o m the C a n a d i a n C e n t r e for C l i m a t e M o d e l l i n g a n d A n a l y s i s coup led G C M was f o u n d to p a r t i t i o n the da t a in to two dis t inc t popula t ions . T h e I D N L P C A a p p r o x i m a t i o n h a d a th ree-branched s t ruc ture , and the d i s t r i b u t i o n associated t i m e series was s t rongly b i m o d a l . One b r anch ( B r a n c h 1), associated w i t h the larger peak of the d i s t r i b u t i o n of the t ime series, corresponded to a s t and ing osc i l l a t ion w i t h anomal ies of oppos i te s ign over the po la r region and the mid la t i t udes , s t rongly r e sembl ing the A r c t i c O s c i l l a t i o n . M o s t of the da t a p ro jec ted onto this b r a n c h . A second b r a n c h ( B r a n c h 2) , w h i c h corresponded to the smal ler peak of the t i m e series P D F , was on ly o c c u p i e d episodical ly , and s t rongly resembled the negat ive phase of the N o r t h A t l a n t i c O s c i l l a t i o n . T h e B r a n c h 1 s ignal i n 500 m b geopoten t ia l height composi tes (based on the S L P A analysis) descr ibed a l t e rna t ing ampl i f i ca t ion a n d a t t enua t ion of the c l ima to log i ca l r idge over E u r o p e , whi le B r a n c h 2 descr ibed s t rong ly spl i t flow over G r e e n l a n d . A n analysis of the S L P skewness indica tes tha t there are s t rong pos i t ive and negat ive l oca l e x t r e m a i n skewness i n the same loca t ions as the pos i t ive and negat ive e x t r e m a of the B r a n c h 2 a n o m a l y pa t te rns . These e x t r e m a i n skewness thus seem to arise because of the c o m b i n a t i o n o f a s t and ing osc i l l a t i on d i sp l ay ing G a u s s i a n va r i ab i l i t y w i t h episodic occurrences o f a s t rongly anomalous Chapter 7. Summary and Conclusions 141 circulation, a possibility that has been suggested by other authors in the past. An NLPC analysis of the 500mb height anomaly field itself resulted in a branched approximation that was very similar to that obtained from the SLPA field; the primary difference between the two is that the former corresponds to anomaly fields that are somewhat more hemispheric in extent. Thus, the two-regime structure identified using NLPCA appears to be equivalent barotropic in nature. Finally, the results of an analysis of SLPA from a GCM run with C 0 2 concentra-tions at four times the pre-industrial levels indicated that the oscillatory branch of the control NLPCA approximation was largely unchanged but that the split-flow branch was substantially depopulated. This behaviour is consistent with the suggestion by Palmer (1999) that the climate response to greenhouse forcing will not be changes in the structure of characteristic circulation regimes, but in their occupation frequencies. 6. Because Kramer's 5-layer autoassociative neural network can only find continu-ous projection and expansion functions, a 7-layer generalisation was suggested for the analysis of data sets where the expansion and projection functions are discon-tinuous. Such a data set is an ellipse, because the manifold parameterising the low-dimensional approximation is topologically different than the unit interval. It was found that the NLPCA approximation produced by a 7-layer autoassociative neural network was substantially better than that produced by a 5-layer network, because the former was much better able to approximate discontinuous projection and expansion functions. It was demonstrated that this improvement was due to the different architectures of the two networks, and not to a difference in the number of model parameters. Chapter 7. Summary and Conclusions 142 7.2 Conclusions N o n l i n e a r P r i n c i p a l C o m p o n e n t A n a l y s i s has been demons t ra ted to be a useful t o o l for the analysis o f c l ima te da ta . W h e r e P C A characterises the var iance i n a m u l t i v a r i a t e da t a set, N L P C A is able to also characterise higher order momen t s of va r i ab i l i ty . T h i s thesis has i n t r o d u c e d N L P C A to the s tudy of c l imate da ta , bu t there remains w o r k to be done. T h e m o d e l b u i l d i n g procedures descr ibed i n C h a p t e r 2 r e t a in an element of sub jec t iv i ty . It w o u l d be useful to develop an au tomated , objec t ive technique for b u i l d i n g N L P C A app rox ima t ions ; such a me thodo logy could perhaps use a sophis t ica ted regu la r i sa t ion t echn ique such as Genera l i sed Cross V a l i d a t i o n ( Y u v a l , 1999). Use fu l general isat ions o f N L P C A m i g h t also be developed t h rough modif ica t ions of the cost func t ion to i n c l u d e cons t ra in ts on the a p p r o x i m a t i o n . A n example w o u l d be a s imple s t ruc ture cons t ra in t o f the f o r m used i n ro t a t ed P C A analysis (R ichman ,1986) . A second m o d i f i c a t i o n o f the cost func t ion to ensure self-consistency of the N L P C A m o d e l is suggested i n R i c o - M a r t i n e z et a l . (1996). K i r b y and M i r a n d a (1999) suggest a number of other possible const ra in ts . A s we l l , N L P C A c o u l d be used i n a nonl inear general isat ion o f S ingula r Sys tems A n a l y s i s ( S S A ) , w h i c h is s i m p l y P C A appl ied to a t ime series expressed i n delay coord ina te space ( B r o o m h e a d and K i n g , 1986; v o n S to rch and Zwiers ,1999) . F i n a l l y , ana ly t i c demons t ra -t ions of features of N L P C A discovered empi r i ca l ly , such as the p a r t i t i o n i n g o f var iance , are l a c k i n g . T h e analysis of large mul t iva r i a t e datasets, ei ther observat ions or G C M o u t p u t , is an i m p o r t a n t a c t i v i t y for the unders tand ing of c l imate var iab i l i ty . N o n l i n e a r P r i n c i p a l C o m p o n e n t A n a l y s i s w i l l not replace t r ad i t i ona l P C A , because i t is more diff icult to i m p l e m e n t . Howeve r , I" believe that N L P C A m a y wel l become an i m p o r t a n t a d d i t i o n to the geophys ica l s ta t i s t ic ian 's too lbox , and w i l l p rov ide i m p o r t a n t ins ight i n to the v a r i a b i l i t y o f the c l ima te sys tem. Append ix A Neura l Networks A s is desc r ibed i n de ta i l by B i s h o p (1995) and by H s i e h and T a n g (1998), a feed-forward neu ra l ne twork is a non-paramet r i c s ta t i s t ica l m o d e l used to es t imate (general ly n o n l i n -ear) f unc t i ona l re la t ions be tween two da ta sets, X ( i n ) £ Dft5 and Z(tn) £ 9ftT. T h e neu ra l ne twork is composed of a series of para l le l layers, each o f w h i c h contains a n u m b e r of process ing elements , or neurons, such that the ou tpu t of the i t h layer is used as i n p u t to the (i + l ) t h . I f y^ is the ou tpu t of the jth. neuron of the z th layer , t h e n y r ' - ^ ^ E - r ^ + ^ j (A.i) is the ou tpu t of the kth. neuron of the (i + l ) t h layer. T h e elements of the arrays w^k+1^ are referred to as the weights, and those of the vectors bkl+1^ as the biases. T h e transfer function charac te r i s ing the ( i + l ) t h layer is denoted t r ( , + 1 ) ; i t m a y be l inear or nonl inear . T h e first , or i n p u t layer , receives the values of the da t a presented to the ne twork ; i ts transfer func t ion is s i m p l y the iden t i ty m a p 07 : x \-¥ x. T h e famous f l ex ib i l i t y o f neu ra l ne tworks comes f rom the use of nonl inear transfer funct ions ( t y p i c a l l y h y p e r b o l i c tangents) i n some or a l l of the r e m a i n i n g layers. A n i m p o r t a n t resul t due to C y b e n k o (1989) is tha t a 3-layer neu ra l ne twork w i t h S i npu t neurons, h y p e r b o l i c tangent transfer funct ions i n the second layer and l inear transfer funct ions i n the t h i r d layer o f T neurons can a p p r o x i m a t e to a rb i t r a ry accuracy any cont inuous func t ion f rom 3ft5 to 3ftT, i f the n u m b e r of neurons i n the second layer is sufficiently large. 143 Appendix A. Neural Networks 144 X z Input layer layer Output layer Figure A . l : Diagrammatic representation of neural network with input data X and output data Z. Neural networks are often represented diagrammatically, with open circles represent-ing the neurons and straight lines the weights, as is illustrated in Figure A . l . These diagrams are meant to be suggestive of biological neuronal systems, reflecting the origin of neural network theory in the context of artificial intelligence research. Feed-forward neural networks as described above are fit to data, or trained, as follows. Suppose it is desired to fit the data X(i„) £ Oft5 to the data Z(tn) £ Denoting the network as Af, the weights and biases (referred to collectively as u) are adjusted until the cost function J=< \\Z-Af{X;p)\\2 > (A.2) Appendix A. Neural Networks 145 is minimised. That is, parameters / i m m are determined such that dJ 0 = A dm (A.3) min for each parameter \±i. In the above, the angle brackets < . > denote averaging over time and ||.|| denotes the L2-norm. For a network in which all transfer functions <r^ are linear, equation (A.3) can be expressed as a simple matrix equation which admits an analytic solution, and the approach simply reduces to multivariate regression. When the transfer functions are nonlinear, equation (A.3) does not reduce to a simple matrix equation, and the cost function J must be minimised numerically. The minimisation of the cost function was carried out using a conjugate-gradient algorithm (Press et al., 1992). At each step of this algorithm, the gradient of J with respect to the parameters over which it is being minimised must be evaluated. It is easy to show that for an /-layer neural network, " = -2 < ey(')(^)yr> > (A.4) fa* where ,(0 _ ^ .(0..(»-i) _L A( T = E<yr} + ^ , (A.5) i e(tn) = Z{tn)-Af{X(tn);u.) (A.6) and the prime represents differentiation. Thus, the gradient of the cost function with respect to the weights of the output layer can be evaluated exactly at every step of the minimisation algorithm. The same is true for the gradient of J with respect to the biases of the output layer. Furthermore, it can be shown that 8 J -2 E^'("(^)K]-"'-1)(#"1))!/i'"2)) (A-T) Appendix A. Neural Networks 146 and dj - 2 E E ^%^)v>^-%^)) V ( ' - » ) ( ^ - a ) ) y i ' - » ) \ (A.8) N o t e that i n the express ion for the derivat ives of J w i t h respect to the weights at the (i — l ) t h layer , quant i t ies used i n the ca lcu la t ion of the der iva t ive w i t h respect to the weights o f the z th layer appear ( in the square brackets) . A n efficient a l g o r i t h m for the eva lua t ion o f the gradient of J then is to calcula te the quant i t ies : df = e / V ) (A-9) = E ^ M M - 1 ^ ^ ) ( A . 1 0 ) i 4 /_2) = £<f-1Hf_1)*'(/~a)(«?~a)) ( A . 1 1 ) i a n d so fo r th , up to cft^K T h e gradient of J w i t h respect to the weights is s i m p l y t h e n g iven by S i m i l a r equat ions h o l d for the gradient of J w i t h respect to the biases. T h i s a l g o r i t h m for eva lua t ing the gradient of the cost func t ion w i t h respect to the neu ra l ne twork parameters is referred to as back-propagation. N o t e that backpropaga t ion allows the exact eva lua t ion of the gradient at each step of the conjugate gradient a l g o r i t h m . Appendix B Principal Curves and Surfaces F i r s t consider P r i n c i p a l Curves , w h i c h are the I D vers ion of P C S . F o l l o w i n g H a s t i e a n d Stue tz le (1989), let X be a r a n d o m vector i n 3ftM, the d i s t r i b u t i o n o f w h i c h , deno ted b y h, has f ini te second moment s . A s usua l , assume E(X.) = 0, w i t h o u t loss o f general i ty . L e t the m a p f : 3ft —> 9ftM be C°°, uni t speed (that is , ||f'|| = 1), a n d non-self- intersect ing ( that is , Xi ^ A 2 => f(Ai) ^ f(A 2 )) . T h e pro jec t ion func t ion Sf : 3ftM —> 3ft is defined such that 5 /(x) = s u p { A : | | x - f ( A ) | | = i n f ||x - f(//)||}. ( B . l ) A * T h a t is , of those A such tha t f (A) are the poin ts closest t o x, £/(x) is t he largest . H a s t i e and S tue tz le defined f to be self-consistent i f the expec t a t i on value of a l l the poin ts p ro j ec t ing onto a ce r ta in point on the curve f is tha t poin t itself, ie, i f Eh{X\sf{X) = \)=f{\). (B .2 ) where Eh(.) denotes expec t a t i on over the d i s t r i b u t i o n h. Iff is self-consistent, t h e n i t is a principal curve. Has t i e and Stuetz le p roved that i f f is cons t ra ined to be l inear , and i f i t is self-consistent, t hen it is a p r i n c i p a l component . F u r t h e r m o r e , i n the space o f curves t h r o u g h the da ta , a p r i n c i p a l curve is a c r i t i c a l poin t of the dis tance func t ion , i n the fo l lowing sense. Le t f be a p r i n c i p a l curve and g be an a rb i t r a ry s m o o t h func t ion f r o m 3ft to 3ftM, a n d define ft = f + f g . D e n n i n g the dis tance func t ion d(x>ft) = | |x-f t( a / t(x))| | (B.3) 147 Appendix B. Principal Curves and Surfaces 148 and D2(h,ft) = Ehd2(X,ft), then d 0 (B.4) t=o Equation (B.4) is a formal expression of the idea that the principal curve passes through the "middle" of the data, and is the clearest point of connection between PCS and NLPCA. A very useful fact pointed out by LeBlanc and Tibshirani (1994) is that principal curves partition variance such that M M M £ varpO) = £ v a r ( / i ( - / ( X ) ) ) + ^ T v a r p ^ - / ^ ( X ) ) ) (B.5) i=i j=i i=i It is therefore sensible to describe the principal curve f as explaining a certain fraction of the variance of the random vector X. The construction of principal curves presented above presupposes knowledge of the distribution h of the random vector X; this is not usually known for real data sets. Hastie and Stuezle present an iterative algorithm, involving the use of locally-weighted running-lines smoothers, to determine the principal curve of a data set. Hastie and Stuetzle denoted the generalisation of principal curves to two dimensions as principal surfaces. As with principal curves, given a two-dimensional surface f £ 3ft2, a projection index s/ : 3ftM —¥ 3ft2 is defined such that sy(x) is the point on f closest to x; f is a principal surface if £ ( X | S / ( X ) = A) = f(A) (B.6) Hastie and Stuetzle did not discuss principal surfaces in much detail; they did mention that preliminary numerical investigations indicated that principal surfaces share many properties with principal curves. Principal surfaces can be further generalised to surfaces of dimensionality higher than two; LeBlanc and Tibshirani (1994) constructed a piecewise linear generalisation they denoted adaptive principal surfaces. Appendix B. Principal Curves and Surfaces 149 Hastie and Stuetzle provided rigorous proofs of PCS results for only the ID case, al-though LeBlanc and Tibshirani (1994) and Malthouse (1998) consider higher-dimensional generalisations. A hybrid approach to NLPCA using both Kramer's autoassociative neural network and PCS has been proposed by Dong and McAvoy (1995). Their method involved a preliminary pre-processing of the data by PCS, followed by an NLPC analysis of the pro-cessed data. The logic behind this approach was that PCS possesses a better theoretical grounding than does Kramer's NLPCA, but does not produce a simple model of the data in that when presented with a new data point, there is no simple algorithm to deter-mine its PCS approximation. Kramer's NLPCA, on the other hand, does produce such a model of the data. Dong and McAvoy recognised that by combining the two methods, the benefits of both can be realised. However, this approach is more cumbersome than Kramer's NLPCA alone, and was thus not implemented in this work. Append ix C Symmetr ic and Ant i - symmetr ic Components of Composites Consider a spatial field Y(tn),n = 1 , N , which is composited using a time series X(tn) as follows. Two subsets of time, and t^~\ are defined by = {tn:X(tn)>c} ( C l ) = {tn:X(tn)<-c} (C.2) where c is some threshold: in our case, it is one standard deviation of X(tn). The positive and negative composites of Y(tn), Y ^ + ' and Y ^ - ^ , are simply defined as the respective averages over { i ^ } and {^~^}: Y<+> = < Y > + (C.3) Y<-> = < Y > _ (C.4) Maps of Y W and Y^~\ where Y(tn) is SSTA and X(tn) is the NDJ-averaged Nino 3.4 index are shown in Figures 4.6(a) and (b), respectively. In general, the spatial patterns of Y ( + ) and Y ^ - ) differ by more than a sign. It is desired to determine the symmetric and anti-symmetric (under a change of sign in X(tn)) components of and Y^~\ To address this question, assume the minimal nonlinear model for the dependence of Y(tn) on X(tn): Y(tn) = a (°> + a^X(tn) + aWX(tn)2 + en (C.5) where en is a vector noise process, assumed to satisfy < e > = < € > + = < e > _ = 0 (C.6) 150 Appendix C. Symmetric and Antisymmetric Components of Composites 151 T h e v a l i d i t y o f this a p p r o x i m a t i o n depends b o t h on the va l id i t y o f the m o d e l (C .5 ) and o n the l e n g t h o f the records, {t n }> {^L+}^> a n ( i {^T^}- It can be as sumed w i t h o u t loss o f genera l i ty tha t b o t h Y ( i n ) and X(tn) are zero-centred i n t ime . T h i s impl i e s tha t 0 = A * 0 * + A<2> < X2 > (C .7 ) a n d so the m o d e l can be r ewr i t t en as Y(tn) = a^X(tn) + aW(X(tn)2- <X2>) + en (CS) T h e vec tor A ^ 1 ) is the field pa t t e rn an t i - symmet r i c under a change o f sign i n X, wh i l e A ^ 2 ) is the f ie ld pa t t e rn s y m m e t r i c under such a change of s ign. T h e y w i l l be referred to, respect ive ly , as the an t i - symmet r i c and s y m m e t r i c components o f the compos i te . C lea r ly , by the def in i t ion of the composi te maps , Y W = <X >++*&(< X2 >+ - <X2 >) (C .9 ) Y<-> = a™ <X>.+aS2){<X2>- -<X2>) ( C I O ) T h i s is a l inear equa t ion w h i c h can easily be solved to y i e l d : a W = D~l[(<X2 >_ - <X2 > ) Y ^ - ( < X 2 >+ - <X2 > ) Y ( - 1 ] ( C . l l ) a(2) = D - 1 [ - < I > . Y W + < I > + Y H ] (C .12) where D =< X >+ (< X2 >_ - < X2 >)- < X >_ (< X2 >+ - < X2 >) (C .13) F i g u r e 4.6(c) d isplays A ^ 2 ' for t rop i ca l Pac i f i c S S T A compos i t ed accord ing to the N i n o 3.4 i ndex . A m a p of A ^ (not shown) looks ve ry m u c h l ike S S T A E O F m o d e 1 ( F i g u r e 4.6(a)); the spa t i a l cor re la t ion between these is 0.975. Appendix C. Symmetric and Anti-symmetric Components of Composites 152 Hoerling et al. (1997) considered the linear combinations Y^+) — Y^ -) and Y^+) + Y^") and denoted them the linear and nonlinear responses of Y to X, respectively. The above analysis shows this identification is appropriate only in the special case that < X >_ = — < X >+ and < X2 >_ = < X2 > + . This is certainly not true in general, although for the case they considered, in which X(tn) was an SST index similar to Nino 3.4, it is a fairly good approximation. In principle, one could use the technique described above to fit the more general model Y('n) = E ^k)X(tn)k + en (C.14) k=0 by stratifying the data into K + 1 subsets. Presumably, however, as K increases, so does the sampling variability associated with decreasing validity of approximations (C.6). Bibl iography B a l d w i n , M . P . and D u n k e r t o n , T . J . (1999). P r o p a g a t i o n of the A r c t i c O s c i l l a t i o n f r o m the s t ra tosphere to the t roposphere. J. Geophys. Res., i n rev iew. B a r n s t o n , A . G . (1994). L i n e a r s ta t i s t ica l shor t - te rm c l ima te p red ic t ive s k i l l i n the n o r t h e r n hemisphere . J. Climate, 7:1513-1564. B a r n s t o n , A . G . , G l a n t z , M . H . , and H e , Y . X . (1999). P r e d i c t i v e s k i l l of s t a t i s t i ca l and d y n a m i c a l c l ima te models i n S S T forecasts du r ing the 1997-98 E l N i n o episode a n d the 1998 L a N i n a onset. Bull. Amer. Met. Soc, 80:217-243. B a r n s t o n , A . G . a n d L ivezey , R . E . (1987). Class i f ica t ion , seasonality, a n d persis tence o f low-frequency a tmospher ic c i r cu l a t i on pat terns . Mon. Wea. Rev., 115:1083-1126. B a r n s t o n , A . G . and R o p e l e w s k i , C . F . (1992). P r e d i c t i o n o f E N S O episodes us ing canon ica l cor re la t ion analysis . J. Climate, 5:1316-1345. B a r n s t o n , A . G . , van den D o o l , H . M . , Zebiak , S. E . , B a r n e t t , T . P . , J i , M . , R o d e n h u i s , D . R . , C a n e , M . A . , L e e t m a , A . , G r a h a m , N . E . , R e p e l e w s k i , C . R . , K o u s k y , V . E . , O ' L e n i c , E . A . , and L ivezey , R . E . (1994). Long- l ead seasonal forecasts - where do we s tand? Bull. Am. Met. Soc., 75:2097-2114. B a t t i s t i , D . S. a n d H i r s t , A . C . (1989). In te rannua l va r i ab i l i t y i n a t r o p i c a l a tmosphere-ocean m o d e l : Influence of the basic state, ocean geomet ry a n d nonl inear i ty . J. Atmos. Sci., 46:1687-1712. B a t t i s t i , D . S. and Sarachik , E . S. (1995). U n d e r s t a n d i n g and p r e d i c t i n g E N S O . Reviews of Geophysics, Supplement : 1367-1376. U . S . N a t i o n a l R e p o r t to I n t e rna t i ona l U n i o n of Geodesy and Geophys ics 1991-1994. B e l l , G . D . and H a l p e r t , M . S. (1995). Interseasonal and Interannual Variability: 1986 to 1993. N O A A A t l a s N o . 12. U . S . D e p a r t m e n t of C o m m e r c e . B e r l i n e r , L . M . (1992). S ta t i s t ics , p robab i l i ty , and chaos. Statistical Science, 7:69-122. B i s h o p , C . M . (1995). Neural Networks for Pattern Recognition. C l a r e n d o n Press , O x f o r d . B j e r k n e s , J . (1969). A t m o s p h e r i c teleconnections f rom the equa to r i a l Pac i f i c . Mon. Weath. Rev., 97:163-172. 153 BIBLIOGRAPHY 154 Bretherton, C. S., Smith, C , and Wallace, J. M. (1992). An intercomparison of methods for finding coupled patterns in climate data. J. Climate, 5:541-560. Broomhead, D. and King, G. P. (1986). Extracting qualitative dynamics from experi-mental data. Physica, 20D:217-236. Buell, C. E. (1975). The topography of the empirical orthogonal functions. In Fourth Conf. on Probability and Statistics in Atmospheric Sciences, Tallahassee, FL, pages 188-193. Amer. Meteor. Soc. Buell, C. E. (1979). On the physical interpretation of empirical orthogonal functions. In Sixth Conf. on Probability and Statistics in Atmospheric Sciences, Banff, AB, Canada, pages 112-117. Amer. Meteor. Soc. Burgers, G. and Stephenson, D. B. (1999). The "normality" of El Nino. Geophys. Res. Lett, 26:1027-1030. Corti, S., Giannini, A., Tibaldi, S., and Molteni, F. (1997). Patterns of low-frequency variability in a three-level quasi-geostrophic model. Climate Dynamics, 13:883-904. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Contr. Signals Syst., 2:303-314. Del Frate, F. and Schiavon, G. (1999). Nonlinear principal component analysis for the radiometric inversion of atmospheric profiles by using neural networks. IEEE Trans. Geosci. Rem. Sensing, 37:2335-2342. DeMers, D. and Cottrell, G. (1993). Nonlinear dimensionality reduction. Neural Inform. Processing Syst., 5:580-587. Deser, C. (1999). A note on the annularity of the "Arctic Oscillation". Geophys. Res. Lett., in review. Dong, D. and McAvoy, T. J. (1996). Nonlinear principal component analysis - based on principal curves and neural networks. Computers Chem. Engng., 20:65-78. Farrell, B. F. and Ioannou, P. I. (1996). Generalised stability theory. Part I: autonomous operators. J. Atmos. Sci., 53:2025-2040. Feldstein, S. and Lee, S. (1996). Mechanisms of zonal index variability in an aquaplanet GCM. J. Atmos. Sci., 53:3541-3555. Finnoff, W., Hergert, F., and Zimmermann, H. G. (1993). Improving model selection by nonconvergent methods. Neural Networks, 6:771-783. BIBLIOGRAPHY 155 Flato, G. and et al. (1999). The Canadian Centre for Climate Modelling and Analysis global coupled model and its climate. Climate Dynamics, in review. Fotheringhame, D. and Baddeley, R. (1997). Nonlinear principal components analysis of neuronal spike train data. Biological Cybernetics, 77:282-288. Fyfe, J. C , Boer, G. J., and Flato, G. M. (1999). The Arctic and Antarctic Oscillations and their projected changes under global, warming. Geophys. Res. Lett., 26:1601-1604. Ghil, M. and Childress, S. (1987). Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory, and Climate Dynamics. Springer-Verlag. Gong, D. and Wang, S. (1999). Definition of Antarctic oscillation index. Geophys. Res. Lett., 26:459-462. Hastie, T. and Stuetzle, W. (1989). Principal curves. J. Amer. Statist. Assoc., 84:502-516. Hastie, T. and Tibshirani, R. J. (1990). Generalised Additive Models. Chapman and Hall, London. Hoerling, M. P., Kumar, A., and Zhong, M. (1997). El Nino, La Nina, and the nonlinearity of their teleconnections. J. Climate, 10:1769-1786. Holzer, M. (1996). Asymmetric geopotential height fluctuations from symmetric winds. J. Atmos. Sci., 53:1361-1379. Hsieh, W. W. and Tang, B. (1998). Applying neural network models to prediction and data analysis in meteorology and oceanography. Bull. Amer. Met. Soc, 79:1855-1870. Hsu, H.-H. and Lin, S.-H. (1992). Global teleconnections in the 250-mb streamfunction field during the northern hemisphere winter. Mon. Wea. Rev., 120:1169-1190. Huang, H.-P. and Sardeshmukh, P. D. (1999). Another look at the annual and semiannual cycles of atmospheric angular momentum. J. Climate, in review. Hurrell, J. W. (1995). Decadal trends in the North Atlantic Oscillation: Regional tem-peratures and precipitation. Science, 269:676-679. Jin, F.-F., Neelin, J. D., and Ghil, M. (1996). El Nino/Southern Oscillation and the annual cycle: subharmonic frequency-locking and aperiodicity. Physica D, 98:442-465. BIBLIOGRAPHY 156 K i r b y , M . J . and M i r a n d a , R . (1994). Non l inea r r educ t ion of h igh -d imens iona l d y n a m i c a l systems v i a neu ra l networks . Phys. Rev. Lett., 72:1822-1825. K i r b y , M . J . and M i r a n d a , R . (1996). C i r c u l a r nodes i n neura l ne tworks . Neural Comp., 8:390-402. K i r b y , M . J . and M i r a n d a , R . (1999). E m p i r i c a l d y n a m i c a l sys tem r e d u c t i o n I: G l o b a l non l inear t ransformat ions . In Semi-Analytic Methods for the Navier-Stokes Equa-tions, v o l u m e 20 of CRM Proc. Lecture Notes, pages 41-63 . A m e r . M a t h . Soc. K r a m e r , M . A . (1991). N o n l i n e a r p r i n c i p a l component analysis us ing autoassocia t ive neu ra l ne tworks . AIChE J., 37:233-243. K u s h n i r , Y . a n d W a l l a c e , J . M . (1989). Low-f requency va r i ab i l i t y i n the n o r t h e r n h e m i -sphere win te r : geographica l d i s t r i bu t ion , s t ruc ture and t ime-scale dependence. J. Atmos. Sci., 46:3122-3142. L e B l a n c , M . and T i b s h i r a n i , R . (1994). A d a p t i v e p r i n c i p a l surfaces. J. Amer. Statist. Assoc., 89:53-64. L o r e n z , E . N . (1956). E m p i r i c a l o r thogona l functions and s ta t i s t i ca l weather p r e d i c t i o n . T e c h n i c a l r epor t , D e p a r t m e n t of Meteoro logy , M I T . Science R e p o r t 1. L o r e n z , E . N . (1963). D e t e r m i n i s t i c nonper iod ic flow. J. Atmos. Sci., 20:130-141. L u c k e , M . (1976). S t a t i s t i ca l dynamics of the lorenz m o d e l . J. Stat. Phys., 15:455-475. M a l t h o u s e , E . C . (1998). L i m i t a t i o n s of nonl inear P C A as pe r fo rmed w i t h generic neu ra l ne tworks . IEEE Trans. Neural Nets., 9:165-173. M c F a r l a n e , N . A . , B o e r , G . J . , B l a n c h e t , J . - P . , and Laza re , M . (1992). T h e C a n a d i a n C l i m a t e C e n t r e second-generat ion general c i r cu l a t i on m o d e l a n d its e q u i l i b r i u m c l i -ma te . J. Climate, 5:1013-1044. M i l l e r , A . J . , W h i t e , W . B . , and C a y a n , D . R . (1997). N o r t h Pac i f i c t he rmoc l ine var ia t ions on E N S O t imescales. J. Phys. Ocean., 27:2023-2039. M o , K . C . and G h i l , M . (1987). Sta t is t ics and dynamics of persis tent anomal ies . / . Atmos. Sci., pages 877-901. M o n a h a n , A . H . (1999). N o n l i n e a r p r i n c i p a l compnent analysis b y neura l ne tworks : T h e o r y and app l i ca t i on to the Lo renz sys tem. J. Climate, i n press. N a k a m u r a , H . a n d Wal l ace , J . M . (1991). Skewness o f low-frequency f luc tua t ions i n the t ropospher ic c i r cu l a t i on du r ing the nor the rn hemisphere win te r . J. Atmos. Sci., 48:1441-1448. BIBLIOGRAPHY 157 N o r t h , G . R . (1984). E m p i r i c a l o r thogonal functions and n o r m a l modes . J. Atmos. Sci., 41:879-887. O j a , E . (1997). T h e nonl inear P C A learn ing ru le i n independent componen t analysis . Neurocomputing, 17:25-45. O j a , E . and K a r h u n e n , J . (1993). Non l inea r P C A : A l g o r i t h m s and app l ica t ions . T e c h n i c a l R e p o r t A 1 8 , H e l s i n k i U n i v e r s i t y of Technology. O t t , E . (1993). Chaos in Dynamical Systems. C a m b r i d g e U n i v e r s i t y Press . P a c a n o w s k i , R . C , D i x o n , K . , and R o s a t i , A . (1993). T h e G F D L m o d u l a r ocean m o d e l users guide. T e c h n i c a l R e p o r t 2, G e o p h y s i c a l F l u i d D y n a m i c s L a b o r a t o r y , P r i n c e t o n , U S A . P a l m e r , T . N . (1999). A nonl inear d y n a m i c a l perspect ive on c l ima te p r e d i c t i o n . J. Climate, 12:575-591. P e n l a n d , C . (1996). A s tochast ic m o d e l o f IndoPac i f i c sea surface t e m p e r a t u r e anomal ies . Physica D, 98:534-558. P e n l a n d , O , F l i i g e l , M . , and C h a n g , P . (1999). O n the iden t i f i ca t ion of d y n a m i c a l regimes i n an in t e rmed ia te coupled ocean-atmospher ic m o d e l . J. Climate, page i n press. P e n l a n d , C . a n d Sa rdeshmukh , P . D . (1995). T h e o p t i m a l g r o w t h of sea surface t emper -ature anomal ies . J. Climate, 8:1999-2024. P e r l w i t z , J . a n d Graf , H . - F . (1995). T h e s ta t i s t ica l connec t ion between t ropospher ic a n d s t ra tospher ic c i r cu l a t i on of the nor the rn hemisphere i n win te r . J. Climate, 8 :2281-2295. P h i l a n d e r , S. G . (1990). El Nino, La Nina, and the Southern Oscillation. A c a d e m i c Press , S a n Diego . Press , W . H . , Teuko l sky , S. A . , V e t t e r l i n g , W . T . , a n d F l a n n e r y , B . P . (1992). Numerical Recipes in C. C a m b r i d g e U n i v e r s i t y Press , C a m b r i d g e . Pr iesendorfer , R . W . (1988). Principal Component Analysis in Meteorology and Oceanog-raphy. E l s ev i e r , A m s t e r d a m . R i c h m a n , M . B . (1986). R o t a t i o n of p r i n c i p a l components . Int. J. Climatology, 6:293-335. R i c o - M a r t i n e z , R . , A n d e r s o n , J . , a n d K e v r e k i d i s , I. (1996). Self-consis tency i n n e u r a l ne twork-based N L P C analysis w i t h appl ica t ions to t ime-series process ing. Computers chem. Engng, 2 0 , S u p p l . : S l 0 8 9 - S 1 0 9 4 . BIBLIOGRAPHY 158 Sanger, T. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459-473. Sengupta, S. K. and Boyle, J. S. (1995). Nonlinear principal component analysis of climate data. Technical Report 29, PCMDI. ShindeU, D. T., Miller, R. L., Schmidt, G. A., and Pandolfo, L. (1999). Simulation of recent northern winter climate trends by greenhouse-gas forcing. Nature, 399:452-455. Smith, T. M., Reynolds, R. W., Livezey, R. E., and Stokes, D. C. (1996). Reconstruc-tion of historical sea surface temperatures using empirical orthogonal functions. J. Climate, 9:1403-1420. Stamkopoulos, T., Diamantaras, K., Maglaveras, N., and Strintzis, M. (1998). ECG analysis using nonlinear PCA neural networks for ischemia detection. IEEE Trans. Sig. Proc, 46:3058-3067. Suarez, M. J. and Schopf, P. S. (1988). A delayed action oscillator for ENSO. </. Atmos. Sci., 45:3283-3287. Takane, Y. (1998). Nonlinear multivariate analysis by neural network models. In Stud-ies in Classification, Data Analysis, and Knowledge Organisation: Data Science, Classification, and Related Methods. Springer. Tangang, F. T., Tang, B., Monahan, A. H., and Hsieh, W. W. (1998). Forecasting ENSO events: A neural network-extended EOF approach. J. Climate, 11:29-41. Thompson, D. W. and Wallace, J. M. (1998). The Arctic Oscillation signature in the wintertime geopotential height and temperature fields. Geophys. Res. Lett., 25:1297-1300. Thompson, D. W. and Wallace, J. M. (1999). Annular modes in the extratropical circu-lation part I: Month-to-month variability. / . Climate, in review. Thompson, D. W., Wallace, J. M., and Hegerl, G. C. (1999). Annular modes in the extratropical circulation part II: Trends. J. Climate, in review. Trenberth, K. E., Branstator, G. W., Karoly, D., Kumar, A., Lau, N. -C, and Ropelewski, C. (1998). Progress during TOGA in understanding and modeling global teleconnec-tions associated with tropical sea surface temperatures. J. Geophys. Res., 103:14291-14324. Trenberth, K. E. and Paolino, D. A. (1980). The northern hemisphere sea level pressure data set: trends, errors, and discontinuities. Mon. Wea. Rev., 108:855-872. BIBLIOGRAPHY 159 Ulbrich, U. and Christoph, M. (1999). A shift of the NAO and increasing storm track ac-tivity over Europe due to anthropogenic greenhouse gas forcing. Climate Dynamics, 15:551-559. van Loon, H. and Rogers, J. C. (1978). The seesaw in winter temperatures between Greenland and northern Europe, part I: General description. Mon. Wea. Rev., 106:296-310. von Storch, H. and Zwiers, F. W. (1999). Statistical Analysis in Climate Research. Cambridge University Press, Cambridge. Wallace, J. M. and Gutzler, D. S. (1981). Teleconnections in the geopotential height field during the northern hemisphere winter. Mon. Weath. Rev., 109:784-812. Wang, B. (1995). Interdecadal changes in El Nino onset in the last four decades. J. Climate, 8:267-285. Whitaker, J. S. and Sardeshmukh, P. D. (1998). A linear theory of extratropical synoptic eddy statistics. / . Atmos. Sci., 55:237-258. Wilks, D. S. (1995). Statistical Methods in the Atmospheric Sciences. Academic Press, San Diego. Woodruff, S., Slutz, R., Jenne, R., and Steurer, P. (1987). A comprehensive ocean-atmosphere data set. Bull. Am. Met. Soc, 66:1239-1250. Yuval (1999). Neural network training for prediction of climatological time series; regu-larized by minimization of the generalized cross validation function. Mon. Weath. Rev., in press.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Nonlinea principal component analysis of climate data
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Nonlinea principal component analysis of climate data Monahan, Adam Hugh 2000
pdf
Page Metadata
Item Metadata
Title | Nonlinea principal component analysis of climate data |
Creator |
Monahan, Adam Hugh |
Date Issued | 2000 |
Description | A nonlinear generalisation of Principal Component Analysis (PCA), denoted Nonlinear Principal Component Analysis (NLPCA), is introduced and applied to the analysis of climate data. This method is implemented using a 5-layer feed-forward neural network introduced originally in the chemical engineering literature. The method is described and details of its implementation are addressed. It is found empirically that NLPCA partitions variance in the same fashion as does PCA, that is, that the sum of the total variance of the NLPCA approximation with the total variance of the residual from the original data is equal to the total variance of the original data. An important distinction is drawn between a modal P-dimensional NLPCA analysis, in which P successive 1D approximations are determined iteratively so that the approximation is the sum of P nonlinear functions of one variable, and a nonmodal analysis, in which the P-dimensional NLPCA approximation is determined as a nonlinear non-additive function of P variables. Nonlinear Principal Component Analysis is first applied to a data set sampled from the Lorenz attractor. It is found that the NLPCA approximations are much more representative of the data than are the corresponding PCA approximations. In particular, the 1D and 2D NLPCA approximations explain 76% and 99.5% of the total variance, respectively, in contrast to 60% and 95% explained by the 1D and 2D PCA approximations. When applied to a data set consisting of monthly-averaged tropical Pacific Ocean sea surface temperatures (SST), the modal 1D NLPCA approximation describes average variability associated with the El Nino/Southern Oscillation (ENSO) phenomenon, as does the 1D PCA approximation. The NLPCA approximation, however, characterises the asymmetry in spatial pattern of SST anomalies between average warm and cold events (manifested in the skewness of the distribution) in a manner that the PCA approximation cannot. The second NLPCA mode of SST is found to characterise differences in ENSO variability between individual events, and in particular is consistent with the celebrated 1977 "regime shift". A 2D nonmodal NLPCA approximation is determined, the interpretation of which is complicated by the fact that a secondary feature extraction problem has to be carried out to interpret the results. It is found that this approximation contains much the same information as that provided by the modal analysis. A modal NLPC analysis of tropical Indo-Pacific sea level pressure (SLP) finds that the first mode describes average ENSO variability in this field, and also characterises an asymmetry in SLP fields between average warm and cold events. No robust nonlinear mode beyond the first could be found. Nonlinear Principal Component Analysis is used to find the optimal nonlinear approximation to SLP data produced by a 1001 year integration of the Canadian Centre for Climate Modelling and Analysis (CCCma) coupled general circulation model (CGCM1). This approximation's associated time series is strongly bimodal and partitions the data into two distinct regimes. The first and more persistent regime describes a standing oscillation whose signature in the mid-troposphere is alternating amplification and attenuation of the climatological ridge over Northern Europe. The second and more episodic regime describes mid-tropospheric split-flow south of Greenland. Essentially the same structure is found in the 1D NLPCA approximation of the 500mb height field itself. In a 500 year integration with atmospheric CO2 at four times pre-industrial concentrations, the occupation statistics of these preferred modes of variability change, such that the episodic split-flow regime occurs less frequently while the standing oscillation regime occurs more frequently. Finally, a generalisation of Kramer’s NLPCA using a 7-layer autoassociative neural network is introduced to address the inability of Kramer’s original network to find P-dimensional structure topologically different from the unit cube in RP. The example of an ellipse is considered, and it is shown that the approximation produced by the 7-layer network is a substantial improvement over that produced by the 5-layer network. [Scientific formulae used in this abstract could not be reproduced.] |
Extent | 8824754 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-07-15 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0089632 |
URI | http://hdl.handle.net/2429/10829 |
Degree |
Doctor of Philosophy - PhD |
Program |
Oceanography |
Affiliation |
Science, Faculty of Earth, Ocean and Atmospheric Sciences, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2000-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2000-486788.pdf [ 8.42MB ]
- Metadata
- JSON: 831-1.0089632.json
- JSON-LD: 831-1.0089632-ld.json
- RDF/XML (Pretty): 831-1.0089632-rdf.xml
- RDF/JSON: 831-1.0089632-rdf.json
- Turtle: 831-1.0089632-turtle.txt
- N-Triples: 831-1.0089632-rdf-ntriples.txt
- Original Record: 831-1.0089632-source.json
- Full Text
- 831-1.0089632-fulltext.txt
- Citation
- 831-1.0089632.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0089632/manifest