Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Neural network satellite retrievals of nocturnal stratocumulus cloud properties Rautenhaus, Marc 2007

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_2007-0564.pdf [ 15.86MB ]
Metadata
JSON: 831-1.0052641.json
JSON-LD: 831-1.0052641-ld.json
RDF/XML (Pretty): 831-1.0052641-rdf.xml
RDF/JSON: 831-1.0052641-rdf.json
Turtle: 831-1.0052641-turtle.txt
N-Triples: 831-1.0052641-rdf-ntriples.txt
Original Record: 831-1.0052641-source.json
Full Text
831-1.0052641-fulltext.txt
Citation
831-1.0052641.ris

Full Text

Neural Network Satellite Retrievals of Nocturnal Stratocumulus Cloud Properties by Marc Rautenhaus A T H E S I S S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in The Facul ty of Graduate Studies (Atmospheric Science) The Universi ty Of Br i t i sh Co lumbia October 2007 © Marc Rautenhaus 2007 Abstract I investigate the feasibil ity of retr ieving cloud top droplet effective radius, opt ical thickness and cloud top temperature of nocturnal marine stratocumulus clouds by invert ing infrared satell ite measurements using an art i f ic ial neural network. For my study, I use the information contained in the three infrared chan-nels centred at 3.7, 11.0 and 12.0 /nm of the Moderate Resolut ion Imaging Spectroradiometer (MODIS) on-board N A S A ' s Terra satellite, as well as sea surface temperature. A database of simulated top-of-atmosphere brightness temperatures of a range of cloud parameters is computed using a correlated-k parameterisat ion which I have embedded in the radiative transfer package l i bRadt ran . The database is used to t ra in feed-forward neural networks of different architecture to perform the inversion of the satel-lite measurements for the cloud properties. I investigate the appl icat ion of Bayesian methods to estimate the retrieval uncertainties, and analyse the Jacobian of the networks in order to gain information about the functional dependence of the retrieved parameters on the inputs. A high var iabi l i ty in the Jacobian indicates that the nocturnal retrieval problem is i l l-posed. M y experiments show that because the prob-lem is i l l -condit ioned, it is very difficult to find a network that approximates the database of simulated brightness temperatures well. Sea surface temperature proves to be a necessary input. I compare the retrievals of a selected network architecture wi th in-situ cloud measurements taken dur ing the second Dynamics and Chemist ry of Mar ine Stratocumulus experiment. The results show general agreement be-tween retrievals and in-situ observations, although no collocated comparsions are possible because of a t ime lag of five hours between both measurements. I establish that the uncertainty estimates are prone to numerical problems and their results are questionable. I show that the Jacobian is a valuable tool in evaluating the retrieval networks. i i Table of Contents Abstract 1 1 Table of Contents i i i List of Tables v i List of Figures v i i List of Algorithms x i List of Abbreviations x i i Acknowledgements xv 1 Introduction 1 1.1 Clouds and Cl imate 2 1.2 Mar ine Stratocumulus Clouds and the Aerosol Indirect Effect 5 1.2.1 Bu lk Propert ies 5 1.2.2 Aerosols 9 1.3 Radiat ive Transfer Terminology : 12 1.4 Mot iva t ion 15 1.5 Overview and Theory of Ex is t ing Retr ieval A lgor i thms 18 1.5.1 Dayt ime Methods 18 1.5.2 Night t ime Methods 21 1.5.3 Ar t i f ic ia l Neura l Networks 25 1.6 Objectives : 29 ii i TABLE OF CONTENTS 2 Radiative Transfer and Development of the Forward Model 31 2.1 D Y C O M S - I I D a t a 32 2.2 Scatter ing and Droplet Spectra 32 2.2.1 Radiat ive Transfer Equat ion 32 2.2.2 Droplet Size Dis t r ibut ion and its Phase Funct ion 35 2.3 Ad iabat ic C loud Mode l 40 2.4 Choice of Channels and Correlated-k 46 2.5 The Over ly ing Atmosphere 51 2.6 Forward Mode l 54 2.6.1 Radiat ive Transfer Mode l : l i bRadt ran 54 2.6.2 Design of the Forward Mode l 55 3 Nonlinear Regression with Neural Networks and Uncertainty Estimation 60 3.1 The Nonl inear Regression Prob lem 61 3.1.1 Er ro r Funct ions and Overf i t t ing 61 3.1.2 Contro l l ing Mode l Complex i ty and Network Tra in ing 64 3.1.3 Describing Uncertainty 65 3.2 Dis t r ibut ion of the Neural Network Weights 68 3.2.1 Derivat ion 68 3.2.2 The Meaning of the Hyperparameters 71 3.2.3 Gaussian Approx imat ion to the Posterior Dis t r ibut ion 72 3.3 Output Uncertaint ies 73 3.4 Hyperparameter Re-Est imat ion 76 3.4.1 Evidence Procedure for Ar 79 3.5 The Jacobian Ma t r i x 82 3.6 Implementat ion Issues 84 3.6.1 Normal isat ion of Input and Output Variables 87 3.6.2 Regular isat ion of the Hessian 87 3.7 Usefulness of the Aires et a l . Method 88 4 Retrieval, Results and Evaluation 94 4.1 Retr ieval Setup 94 TABLE OF CONTENTS 4.1.1 The Test Scene 94 4.1.2 Lookup Table Setup 95 4.2 C loud Mask 97 4.2.1 M O D I S C loud Mask and Irretrievable Pixels 98 4.2.2 Possible Fai lure Mechanisms 100 4.3 Network Tra in ing 105 4.3.1 Network Archi tecture 105 4.3.2 Fai lure of the Aires et al . Hyperparameter Re-Es t imat ion 106 4.3.3 Input Preprocessing 107 4.4 Network Architecture: Inputs and Hidden Neurons 107 4.4.1 Brightness Temperature Differences 107 4.4.2 Three Input Networks . 110 4.4.3 Four Input Networks I l l 4.5 Retr ieval Eva luat ion, Jacobian and Uncertainty 120 4.5.1 Compar ison wi th In-Situ Da ta 120 4.5.2 Jacobian 121 4.5.3 Uncertainty 129 4.6 Further Developments '• • 130 5 Summary 135 Bibliography 139 Appendices A Implementation of the Aires et al. Method in Matlab 147 A . l Modi f ied Net lab Functions 147 A .2 M a i n Loop: mlat ra in 147 A . 3 Gradient Computa t ion wi th Backpropagat ion: mlagrad 149 A .4 The Hessian wi th the Pearlmutter TZ{-}-Algorithm: mlahess, mlahdotv 151 A . 5 The Jacobian and its Dist r ibut ion: mlajacob, mlajacobuncertainty 152 A .6 Normal isat ion of Input and Output Variables . . . .• 153 A . 7 Implementat ion Difficulties: Regular isat ion of the Hessian and Numer ica l Symmetry . . . 154 v List of Tables 2.1 Spectral intervals of the four M O D I S channels implemented in the forward model and the atmospheric constituents whose absorpt iv i ty is accounted for 50 3.1 Jacobian matrices and their uncertainty (one standard deviation) of the two networks trained to model the example function given by (3.9) and (3.10) 84 4.1 Po in t estimate of the Jacobian of a network containing 30 neurons in the hidden layer and using B T ( 3 . 7 ) , B T ( l l ) , B T ( 1 2 ) and T s / c as inputs. . . 109 4.2 The same as Table 4 .1 , but for a network containing 15 neurons in the hidden layer and using B T ( 3 . 7 ) , B T ( l l ) , B T D ( 1 1 - 1 2 ) and Tsfc as inputs 115 4.3 The same as Table 4 .1 , but for a network containing 30 neurons in the hidden layer and using B T ( 3 . 7 ) , B T ( l l ) , B T D ( 1 1 - 1 2 ) and Tsfc as inputs 115 4.4 Average Jacobian of the Ju ly 11, 2001 , scene and its var iabi l i ty ( 1 5 N network) 124 4.5 The same as in Table 4.4, but normalised by the standard deviat ion of the inputs 125 A . l L is t of N E T L A B functions that were adapted or created in order to accommodate the matr ix hyperparameters Ain and Ar 148 v i List of Figures 1.1 The role of clouds in the cl imate system 2 1.2 C loud feedback processes represent a large uncertainty in cl imate model l ing 4 1.3 The annual cycle of cloud amount, lower tropospheric stabi l i ty and shortwave, longwave and net cloud forcings from a two-year E a r t h Radia t ion Budget Exper iment climatology. . 5 1.4 Example of in-situ measurements of microphysical cloud properties taken dur ing the F i rs t International Satell ite C loud Cl imato logy Programme Regional Exper iment in precipitat-ing marine stratocumulus clouds off the coast of Cal i forn ia 7 1.5 Typ ica l ranges of mean and effective particle diameter, c loud l iquid water content and part icle number concentration in low-level strat i form clouds 8 1.6 Dependence of cloud radiative forcing on macroscopic and microscopic c loud properties. . 11 1.7 Different methods to observe clouds 16 1.8 Dependence of radiat ive effects on microphysical properties of a cloud 20 1.9 Phys ica l basis for the remote sensing of nocturnal marine stratocumulus clouds 23 1.10 Schematic d iagram of a feed-forward network wi th two layers of adaptive weights 26 2.1 Processes that influence radiat ion as it passes through the atmosphere 33 2.2 In-situ measured spectra of cloud droplet sizes from the J u l 11th and J u l 13th flights, together wi th best-fit gamma distr ibutions 38 2.3 Phase functions of cloud distr ibutions following a gamma dist r ibut ion w i th a shape pa-rameter of 26 39 2.4 Vert ica l profile of cloud properties from the idealised adiabatic cloud model , together wi th in-si tu measurements from the Ju ly 11 D Y C O M S - I I flight 41 2.5 Measurements of k value versus height in marine stratocumulus clouds 44 2.6 The same as Figure 2.4, but wi th a subadiabatic l iquid water content of 75% of the adiabatic value 45 v i i L I S T OF FIGURES 2.7 Dependence of atmospheric transmittance on wavelength for a typ ica l midlat i tude summer atmosphere. 47 2.8 I l lustrat ion of the fundamental idea of correlated-k 49 2.9 Atmospher ic soundings from San Diego from Ju ly 11, 2001, and Ju l y 13, 2001 52 2.10 Ver t ica l brightness temperature profiles through a th in cloud and a thick cloud 57 2.11 Dependence of brightness temperature (BT ) and B T difference on varying effective radius and opt ical thickness for a cloud top temperature of 285 K and a cloud top pressure of 900 h P a 58 2.12 The same in F igure 2.11, but for different cloud top pressures 59 3.1 I l lustrat ion of overfitt ing 1 62 3.2 I l lustrat ion of overfitt ing II 63 3.3 Dis t r ibut ion of the first weight of the neural network used for the example given by equa-tions (3.9) and (3.10) after a short t raining run and after a longer t ra in ing run 73 3.4 Section through the functional surface of the example given by (3.9) and (3.10) wi th network predict ion and error bars 77 3.5 Scatterplots of the estimated error (one standard deviation) vs. the actual error for both output variables of the example given in (3.9) and (3.10) for the long t ra in ing run 78 3.6 The same as Figure 3.5, but for the short t raining run 78 3.7 Development of the covariance matrices Co and Cin and the hyperparameters Ain and Ar dur ing the t ra in ing run shown in Figure 3.8 83 3.8 Development of the weight values and the mean-square-error of the t ra in ing dataset and of a test dataset without added noise of the example given by (3.9) and (3.10) dur ing a t ra in ing run wi th 10 hyperparameter re-estimation iterations 85 3.9 Scatterplot of the network predictions for both outputs of the example given in (3.9) and (3.10) vs. the target data 86 3.10 Demonstrat ion of the input dependence of the error est imation given in (3.43) 89 3.11 Same as Figure 3.10, but wi th a training dataset consisting of 10,000 datapoints 90 3.12 The same example as in Figure 3.4, but the network has only been trained wi th data in an interval 90 3.13 L imi ta t ions of the error estimation - a simple two-layer perceptron is not able to model a mapping that contains ambiguities 91 LIST OF FIGURES 3.14 The same as Figures 3.5 and 3.6, but using non-regularised Hessian 91 4.1 Brightness temperature difference ( B T D ) between channel 20 (3.7 /im) and channel 31 (11 fiva) and channel 31 brightness temperature at 6:25 U T C on Ju ly 11th, 2001 96 4.2 k-value vs. height as inferred from the D Y C O M S - I I in-si tu measurements of Ju ly 11 and Ju ly 13, 2001 97 4.3 Histograms of u g a m and fc-value as inferred from the D Y C O M S - I I in-s i tu measurements of Ju ly 11, 2001 97 4.4 M O D I S cloud mask product for the Ju ly 11, 2001, scene and cloud mask derived from the B T / B T D range in the lookup table 99 4.5 L U T and M O D I S cloud mask filtered scene brightness temperatures 101 4.6 The same as Figure 4.5, but for the brightness temperature difference between channels 29 and 31 (8.5 and 11 /an) 102 4.7 11 fim versus 12 fim radiance plots for the observations along the line shown in F igure 4.4. 103 4.8 BTD(3.7-11) and BTD(11-12) signals of the pixels between points A and B as shown in F igure 4.4 104 4.9 Subset of the Ju ly 11 L U T . In addi t ion to the cloud top pressure (939.5 h P a ) , the cloud top temperature is also held constant at 285 K 108 4.10 Tra in ing performance of a three input network employing the BT(3 .7 ) , B T ( l l ) and B T D ( 1 1 -12) signals, compared to a four input network addit ional ly making use of surface temperature. 112 4.11 Retrievals of cloud top temperature and visible cloud optical thickness of the three input network from which the scatter plots on the left side of F igure 4.10 were produced 113 4.12 Mean-square-error of the val idat ion dataset for the four input A N N (BT(3.7) , B T ( l l ) , BTD(11-12) , Tsfc) in dependence of the number of hidden neurons 114 4.13 Sensit iv i ty of 15N predictions if cloud top pressure (fixed in the L U T ) is varied by ± 5 hPa.117 4.14 The same as Figure 4.13, but for the 30N network. 118 4.15 Histograms of the 15N Jacobian of the L U T subset plotted in F igure 4.9 119 4.16 Retr ieved effective radi i for the test scene and comparison of retrieved (15N network) and aircraft measured values 122 4.17 The same as F igure 4.16, but for cloud top temperature . 123 4.18 The same as Figure 4.16, but for cloud optical thickness 124 4.19 Histograms of the 15N Jacobian of the Ju ly 11, 2001, scene . 126 ix L I S T OF FIGURES 4.20 Spat ia l distr ibut ion of the sensit ivity of the effective radius retrieval to changes in BT(3.7) and BTD(11-12) on Ju ly 11, 2001 (15N network) 127 4.21 The same as Figure 4.20, but for the sensitivity of cloud top temperature to changes in BT(3.7) and B T ( l l ) 128 4.22 Spat ia l d istr ibut ion of the uncertainty in the effective radius and opt ical thickness re-trievals, as computed from the neural uncertainty term GTH~1G 131 x List of Algorithms 3.1 Bayesian Neura l Network Tra in ing wi th Ma t r i x Hyperparameter Re-Es t imat ion 85 x i List of Abbreviations 15N network wi th four inputs and 15 hidden neurons used in Chapter 4 30N network wi th four inputs and 30 hidden neurons used in Chapter 4 A B L atmospheric boundary layer A C E - 2 Second Aerosol Character isat ion Exper iment A I E aerosol indirect effect A M S R - E Advanced Microwave Scanning Radiometer for the E a r t h Observ ing System A N N art i f icial neural network A R 4 I P C C 4th Assessment Report A S adiabatic stratif ied cloud model A V H R R . Advanced Very H igh Resolut ion Radiometer B T brightness temperature B T ( l l ) 11 /jm brightness temperature BT(12) 12 fxm brightness temperature BT(3.7) 3.7 / im brightness temperature BT(8.5) 8.5 urn brightness temperature B T D brightness temperature difference BTD(11-12) B T D between the 11 f im and 12 fim channels BTD(3.7-11) B T D between the 3.7 /an and 11 fim channels x i i LIST OF ABBREVIATIONS BTD(8.5-11) B T D between the 8.5 / im and 11 channels C C N cloud condensation nuclei C E R E S Clouds and the Ear th 's Radiant Energy System C F M I P C loud Feedback Mode l Intercomparison Project C R F cloud radiat ive forcing D I S O R T Discrete Ordinate Radiat ive Transfer D Y C O M S - I I Second Dynamics and Chemist ry of Mar ine Stratocumulus field experiment E P I C East Pacif ic Investigation of Cl imate E R B E E a r t h Radia t ion Budget Exper iment F I R E F i rs t I S C C P Regional Exper iment G C M general c irculat ion model I P A independent pixel approximat ion I P C C Intergovernmental Panel on Cl imate Change I S C C P International Satell ite C loud Cl imatology Programme L U T lookup table L W C l iquid water content L W P l iquid water path M L P mult i layer perceptron M O D I S Moderate Resolut ion Imaging Spectroradiometer M S E mean square error N A S A Nat iona l Aeronautics and Space Admin is t ra t ion O A overlying atmosphere P C A pr incipal component analysis LIST OF ABBREVIATIONS P D F probabi l i ty density function P D T Pacif ic Dayl ight T ime R F 0 2 D Y C O M S - I I research flight II R F 0 3 D Y C O M S - I I research flight III R T E radiat ive transfer equation Sc stratocumulus S H D O M Spherical Harmonic Discrete Ordinate Method S S T sea surface temperature T O A top of atmosphere T R M M Tropical Rainfa l l Measur ing Miss ion U T C Coordinated Universal T ime V U vert ical ly uni form cloud model Acknowledgements I gratefully acknowledge the support of al l of the people who made this thesis possible. F i rs t , I would like to thank P h i l for giving me the opportuni ty to spend two years at the Univers i ty of Br i t i sh Co lumbia , including the part ic ipat ion in the European Geosciences Un ion conference in V ienna. I am very grateful for his supervision, the numerous discussions, his patience, guidance and for his tremendous time commitment, especially while I was preparing my poster presentation for the conference and while I was wri t ing up my thesis. I am grateful to my committee members, W i l l i a m and especially Douw for his valuable advice. I thank Chr is t ian for support ing discussions when I needed them. I appreciate the help of U l i and Dr . Mayer wi th the radiat ive transfer model l ibRadt ran. I am enormously thankful to my parents for their continuing belief in me dur ing my entire t ime at university and for accepting al l of my decisions. Wi thou t their support my stay in Vancouver would not have been possible. I thank my family and my friends for support and for br inging joy into my life. Most of al l , however, I thank R i k i from whom I received al l the love and support an ind iv idual person could have given during the past two years. xv Chapter 1 Introduction Of paramount importance to a comprehensive understanding of the Ear th ' s cl imate and its response to anthropogenic and natural variabi l i ty is a knowledge, on a global sense, of cloud properties that may be achieved through remote sensing and retrieval algorithms. - K i n g et a l . (1997, Page 2) Shallow boundary layer clouds play a cr i t ical role in the exchange of energy and water in our cl imate system. They do not contribute as much to the greenhouse effect as do upper level clouds, but due to their high a lbedo 1 , they reflect a large port ion of the incoming shortwave radiat ion back to space. The radiat ive forcing that these clouds exert on the atmosphere is modulated by their macrophysical, microphysical , and optical properties, and the way they interact w i th the cl imate system has been the subject of intense research act iv i ty in the last decades (Stephens, 2005, and references therein). However, despite this extensive research work, processes that are involved in cloud-atmosphere interact ion and their representation in general c irculat ion models remain some of the pr imary uncertainties in global cl imate model l ing (Bony et a l . , 2006; Ringer et a l , 2006; Wi l l i ams and Tsel ioudis, 2007). Accurate observations of clouds are a key component to solving the problems in this field. In this thesis I address the retrieval of marine stratocumulus cloud properties from satell ite measurements, specifically dur ing night t ime. These marine boundary layer clouds are common and cover large parts of the oceans (K le in and Har tmann, 1993; Norr is and Leovy, 1994). Thei r hor izontal homogeneity and usually low l iquid water path put them amongst the simplest to treat. Nevertheless, in a recent retrieval comparison exercise, Turner et a l . (2007) pointed out that such th in l iquid water clouds are surprisingly difficult to observe. Th is is part icular ly true for nocturnal observations, where no operat ional retrieval data are available to date. Th is introductory chapter wi l l cover the fundamentals of the clouds and cl imate research field, present information on marine stratocumulus clouds, and review relevant previous research work in the field of satellite remote sensing. I conclude wi th an outline of the objectives for this work which wi l l be addressed in the following chapters. 1A f o r m a l d e f i n i t i o n o f t h e a l b e d o w i l l b e g i v e n i n s e c t i o n 1 . 3 . 1 Chapter 1. Introduction Outgoing longwave radiation Figure 1.1: Clouds play a complex role in the radiative energy balance of our planet. Reflecting incoming shortwave radiat ion from space and t rapping outgoing longwave radiat ion from the earth's surface and atmosphere, they impact the climate wi th both cooling and warming effects. The magnitude of these radiative effects is dependent on macroscopic and microscopic cloud properties, giving rise to complex interactions wi th in the cl imate system (from Phi l l ips and Bar ry , 2002, courtesy of N A S A Marsha l l Space F l ight Center and Sc ience@NASA) . 1.1 Clouds and Climate Cloud radiat ive forcing ( C R F ) is defined as the difference between the upwell ing radiat ive flux at the top of the atmosphere and what it would be if the clouds were absent. F igure 1.1 i l lustrates the most important interactions between clouds and radiat ion. The largest contr ibut ion to the globally averaged long wave C R F is made by high clouds in the upper levels of the troposphere. They are cold and thus emit less radiat ion to space than the earth's surface and atmosphere under clear skies. Clouds that are opt ical ly thick, on the other hand, have the largest albedo, which makes them the largest contr ibutor to the short wave C R F . In the global annual average, clouds exert a net cooling effect on the cl imate system in its present state. The average top-of-atmosphere ( T O A ) short wave C R F is about -50 W / m 2 (the negative sign indicates cooling), while the T O A long wave C R F accounts for a gain of approximately 30 W / m 2 - a net cloud forcing o f - 2 0 W / m 2 (Wiel ick i et a l . , 1995). These values are large compared to the heating that would result if the current C O 2 concentration was doubled, which is estimated to be on the order of 4 W / m 2 (Wiel icki et a l . , 1995). 2 Chapter 1. Introduction Because of their smal l scale variabi l i ty in t ime and space, clouds are difficult to simulate in numerical global cl imate and weather forecasting models. The processes governing cloud format ion and dissipation occur on spat ial scales smaller than a general c irculat ion model gr id cell , so that the clouds and their interaction wi th the atmosphere have to be parametrised. Since the phase change from the gaseous to the l iquid phase produces a nonlinear transit ion from transparent vapour to opaque water droplets, parameterisations require detailed knowledge of the processes involved so as to capture the major effects of the governing physics. A n addit ional complicat ion is that knowledge of gr id cell mean cloud properties does not uniquely determine knowledge of the C R F because of the nonlinear relat ionship between cloud properties and albedo (Harshvardhan, 1982). Especia l ly of interest to the cl imate science community is the radiat ive feedback between clouds and the evolving ocean and atmosphere. How do clouds react to a changing cl imate? Does a warmer atmosphere induce more shallow clouds, which in turn would cool the cl imate? G iven the magnitude of the net C R F , smal l changes in cloud cover could already have a significant effect on the earth's energy balance. For instance, Har tmann et a l . (1992), using one year of global satell ite data, found the sensitivity of the radiat ion balance to low clouds to be on the order of -0.6 W / m 2 per percent fract ional cloud cover in the annual average. C loud feedback processes depend on so many factors that they have long been identified as being the largest source of uncertainty in cl imate change predictions; as Bony et a l . (2006) note, w i th a larger uncertainty than any other feedback. Despite intense research work in this area (see, for instance, the review articles of Stephens (2005) and Bony et al . (2006)), recent comparisons among cl imate models st i l l show a large variabi l i ty of the predicted change in global mean C R F if forcing factors such as a doubl ing of the atmospheric C O 2 content or an increase in sea surface temperature (SST) are prescribed (Ringer et a l . , 2006; Wi l l i ams and Tsel ioudis, 2007). F igure 1.2 reproduces the results of a modell ing study conducted by Ringer et al . (2006). The change in C R F , as computed by the current generation of cl imate models, varies not only in magnitude but also in sign - it is currently unclear whether clouds are changing so as to ampli fy global warming or to counteract it. Wi l l i ams and Tsel ioudis (2007) investigated how different cloud regimes contr ibute to the cloud feed-back uncertainty. For the six general circulat ion models ( G C M s ) they compared, they found that dif-ferences in the radiat ive response of frontal clouds in the mid-lat i tudes and of stratocumulus clouds in low-lati tude regions cause the largest proport ion of the variance in the global cloud response. Bony and Dufresne (2005) also found low-lati tude marine boundary layer clouds to be a major factor in the 3 Chapter 1. Introduction Cloud feedback parameter: +/- 2K Cloud feedback parameter: 2 X C O . AR4 Models . n i l 0 8 I I I I I I I I I I I I I I L D F H B A C G J I Model 1 . 2 O 0 4 o < o Longwave Shortwave 1 . 2 0 . 8 h O °- 4 LL DC O < o H - 0 . 4 r -D E F G Model -0.8 Longwave Shortwave D F H B A C Model G J Figure 1.2: C loud feedback processes represent a large uncertainty in cl imate modell ing. Macroscopic and microscopic cloud properties cannot be resolved expl ic i t ly in the large grid cells of general c irculat ion models, and different parameterisations lead to a large range of different feedbacks - from global mean cooling to warming of atmospheric temperature. Shown on the left is the global mean change in cloud radiative forcing (ACRF) in ten cl imate models part ic ipat ing in the C loud Feedback Mode l Intercomparison Project ( C F M I P ) , normalised by the radiat ive imbalance G result ing from a perturbat ion of the sea surface temperature in the simulations by ±2K (top: tota l cloud feedback; bot tom: longwave and shortwave components separated). O n the right the forcing G is given by doubl ing the C O 2 content of the atmosphere, and the models used are the ones submitted to the I P C C 4th Assessment Report (AR4) . ACRF/G = 1 means that the radiat ive forcing associated wi th cloud feedback is the same as the original direct forcing G, i.e. the clouds double the forcing exerted by G. Note the large differences in cloud feedback amongst the models (reprinted from Ringer et a l . , 2006, © 2006, w i th permission from the Amer ican Geophysical Union) . 4 Chapter 1. Introduction J F M A M J J A S O N D J F M A M J J A S O N D Figure 1.3: The annual cycle of (left) cloud amount and lower tropospheric stabi l i ty (ex-pressed as the potential temperature difference G(700 hPa) - 0 (sea level pressure)) and (right) shortwave, longwave and net cloud forcings from a two-year E R B E (Ear th Radia t ion Budget Exper iment) cl imatology in the region 20°-30°N and 120°-130°W off the coast of Cal i forn ia (reprinted from K l e i n and Har tmann, 1993, © 1993, wi th permission f rom the Amer ican Me-teorological Society). disagreement of G C M s in cloud feedback predictions. Wi l l i ams and Tsel ioudis (2007) note that the un-certainty is already present in model simulations of current cl imate, before the introduct ion of addit ional uncertainty from future cl imate forcing. Th is indicates that there is current ly no consensus on how to represent shallow boundary layer cloud processes in cl imate models. 1.2 Marine Stratocumulus Clouds and the Aerosol Indirect Effect 1.2.1 Bu lk Properties The retrieval algor i thm developed in this thesis focuses on the observation of marine stratocumulus (Sc) clouds. These clouds are so important to the cl imate problem because of their persistence and their high albedo (typical ly ~0.6-0.8 compared to the underlying sea surface value of typical ly ~0.05 for water (Driedonks and Duynkerke, 1989)). Th is causes a strong shortwave C R F especially in the low-latitude regions, where the incident solar radiat ion is very high. For example, off the coast of Cal i forn ia , average cloud amount reaches more than 65% dur ing the summer months, accounting for a net C R F of up to -70 W / m 2 (K le in and Har tmann, 1993, their F igure 5b,d is reproduced in F igure 1.3). Mar ine Sc part icular ly favour the eastern subtropical oceans, where cold surface waters from the upwell ing ocean currents produce boundary layer air temperatures that are cool compared to the warm, 5 Chapter 1. Introduction subsiding air aloft. The result ing strong temperature inversion caps the boundary layer and inhibits deep convection. Strong radiat ive cooling at cloud top helps to mainta in the shallow convection and thus the cloud layer in the absence of strong sensible heat flux from the ocean surface (K le in and Har tmann, 1993). Recently, it has been recognised that drizzle also plays an important role in the dynamics of the cloud layer. However, how exactly precipi tat ion processes interact w i th the ci rculat ion is st i l l subject to ongoing research (e.g. Ackerman et a l . , 2004; Stevens et a l , 2005; Savic-Jovcic and Stevens, 2007). These complex physical processes, together w i th the clouds' smal l geometrical thickness (typical ly a few hundred metres) and the sharp capping inversion makes the representation of marine Sc in cl imate models part icular ly difficult (Bretherton et a l . , 2004). They are also st i l l a problem for weather forecasting models, as a recent comparison of in-situ data taken during the second Dynamics and Chemist ry of Mar ine Stratocumulus ( D Y C O M S - I I ) field experiment (Stevens et al . , 2003a) w i th numerical weather predict ion results shows (Stevens et al . , 2007). Several field experiments have been conducted during which in-si tu and remotely sensed data of marine Sc were taken in order to better comprehend the mechanisms involved in sustaining and dissipating this type of cloud. Figures 1.4 and 1.5 give examples of typical vert ical c loud profiles observed in marine Sc. The data in Figure 1.4 were taken during F I R E , the F i rs t I S C C P (International Satell i te C loud Cl imato logy Programme) Regional Exper iment (Albrecht et a l . , 1988), in precipi tat ing Sc off the coast of Cal i forn ia. C loud l iquid water content and average droplet radius typical ly follow their adiabatic va lues 2 , wi th l iquid water content often becoming subadiabatic at the top of the clouds by entrainment of dry air f rom aloft or by removal of droplets by precipi tat ion (drizzle). The mean droplet concentration typical ly stays approximately constant wi th height, and the geometrical thickness of the clouds observed in this example is less than 300 m. Figure 1.5 summarises the results compiled by Mi les et al . (2000) f rom in-si tu data reported in the literature. The magnitude of average cloud top droplet radi i ranges from about 4 to about 12 / jm . 3 . The effective radius, an area-weighted mean radius of the cloud droplets which is important in radiative transfer applications (see section 1.3 and Chapter 2), is 5 to 15 f im, a l i t t le larger than the mean droplet radius. M e a n droplet number concentrations range from less than 20 c m - 3 to about 200 c m - 3 in marine clouds. A s a comparison, number concentrations typical of continental clouds are also shown. Due to the higher aerosol content of continental air masses, which act as cloud condensation nuclei ( C C N ) , significantly higher concentrations can be observed over land. 2 Adiabatic refers to a well mixed cloud layer in which entropy and total water content (i.e. water vapour plus liquid water) are constant with height. 3 Note that Miles et al. ( 2 0 0 0 ) use diameter to describe droplet sizes in their work, whereas I will use radius in this thesis. 6 Chapter 1. Introduction jlwc (g m ) (points Nr (cm ) (points) 0.1 0 2 0.3 0.4 0.5 0 50 100 150 jlwc (g m ) (points) 0 0.1 0.2 0.3 0.4 0.5 N, (cm ) (points) 0 50 100 150 •o 3 o 15 •o q, fern3) (line) , (nm) (lines) 0 0.1 0.2 0.3 0.4 0.5 q^gm^Xline) 0 2 4 6 8 10 12 r*, (urn) (lines) Figure 1.4: Example of in-situ measurements of microphysical cloud properties taken during the F i rs t I S C C P (International Satell ite C loud Cl imato logy Programme) Regional Exper iment ( F I R E ; Albrecht et al . , 1988) in precipi tat ing marine stratocumulus clouds off the coast of Cal i forn ia. Shown are data from two aircraft flights (left and right): in-si tu measured l iquid water content (qr, left side of each panel, sol id l ine), adiabat ic l iquid water content (left side, straight dashed l ine), in-si tu measured and adiabatic volume-mean radius (rvoi, r ight side, solid and dashed, respectively), and in-situ measured number concentration (Nr, r ight side, points). 0 marks cloud-top and cloud-base (reprinted from Aus t in et a l . , 1995, © 1995, w i th permission from the Amer ican Meteorological Society). 7 Chapter 1. Introduction a 0 10 20 30 Dm,obs (r"n) Figure 1.5: Mi les et al . (2000) collected observations of low-level strat i form clouds from previously published studies in order to compare the results found by different authors. The figures shown here i l lustrate typical ranges of mean and effective part icle diameter (Dm and De), c loud l iquid water content (LWC) and particle number concentrat ion (Nt). O n the left profiles of both continental and marine clouds are plotted together versus the normalised cloud height (h/ht = 0 marks cloud base, h/ht = 1 cloud top), whereas in the plots of the number concentration on the right they are separated. Mar ine clouds typical ly have much smaller number concentrations than continental clouds. Th is is due to fewer aerosols act ing as cloud condensation nuclei ( C C N ) over the oceans. Furthermore, Nt can be regarded as approximately constant w i th height wi th in a cloud. Note that, the figures on the left use diameter to describe cloud droplet size, whereas in this thesis radius is used (reprinted from Mi les et a l . , 2000, © 2000, w i th permission from the Amer ican Meteorological Society). Chapter 1. Introduction Due to their occurrence in strong subsidence regions, marine Sc cloud tops are usual ly found at low alt i tudes of less than 1000 m (e.g. Driedonks and Duynkerke, 1989). L iqu id water paths ( L W P ) are found to be on the order of up to a few hundred g / m 2 , often well below 100 g / m 2 (e.g. Brether ton et a l . , 2004; X u et a l . , 2005). A dist inctive feature of marine Sc decks is the occurrence of a pronounced d iurnal cycle in cloud amount and L W P . For instance, dur ing the East Pacif ic Investigation of C l imate ( E P I C ) , Bretherton et al . (2004) observed a rise of the inversion height (and thus cloud top height) of roughly 200 m each night f rom an early afternoon min imum value. The cloud base showed l i t t le d iurnal cycle, so that the clouds were thinnest dur ing the afternoon. Furthermore, cloud cover was almost 100% dur ing nightt ime, but dur ing the afternoon clouds were often broken. Such diurnal cycles in c loud amount were also found by Rozendaal et al . (1995), who used low cloud fraction data inferred from infrared satell ite channels in the I S C C P dataset. Wood et al . (2002) analysed the L W P derived from microwave measurements from the Tropical Rainfa l l Measur ing Miss ion ( T R M M ) satellite. They found a strong d iurnal cycle in L W P of subtropical low cloud regions, which wi th an early morning peak was in phase wi th the cloud amount cycle of Rozendaal et a l . (1995). Simi lar results were found by Blaskovic et a l . (1991). The diurnal cycle of cloud thinning during the day and growth at night is main ly caused by solar forcing. Dur ing the day, the shortwave radiat ion that is absorbed in the cloud layer largely offsets the longwave radiat ive cooling from cloud top. Th is inhibits the turbulence in the layer, and the resulting less well-mixed structure is both less efficient in t ransport ing moisture upwards f rom the surface and maintaining the cloud top by entrainment. Consequently, subsidence can advect the cloud top down-wards, while it is also dried out. Typical ly , the cloud base is also l ifted dur ing the day. Th is leads to a commensurately thinner cloud (Bretherton et a l . , 2004; Stevens et a l . , 2007). Da i l y amplitudes of the variations reach 5-10% for cloud amount (Rozendaal et a l . , 1995) and 15-35% for L W P (Wood et a l . , 2002), so that a correct representation of the d iurnal cycle in G C M s is cr i t ical to the accurate representation of low clouds in large scale models. 1.2.2 Aerosols Int imately l inked to low boundary layer clouds are the aerosol indirect effects (A IEs) , important for the estimation of the cl imate impact of anthropogenic aerosol emissions. Twomey (1974, 1977) proposed that an increased aerosol concentration shifts the droplet size distr ibut ion towards smaller values by providing more cloud condensation nuclei. If cloud l iquid water content ( L W C ) is unchanged, more and 9 Chapter 1. Introduction commensurately smaller droplets are formed, which increase the albedo of the cloud (first A I E of Twomey effect). A second key hypothesis was introduced by Albrecht (1989). He suggested that smaller cloud droplets suppress the format ion of precipi tat ion in the c loud . 4 Th is reduced cloud water sink would moisten the atmospheric boundary layer ( A B L ) , leading to a lower cloud base and a thicker c loud that could live longer (second A I E or cloud lifetime effect). The effect on the radiat ion budget is simi lar to the Twomey effect - increasing cloud thickness increases the albedo, and clouds that persist over longer t ime spans also have an increased albedo if we consider the time average. B o t h effects are important in the context of anthropogenic emissions. For instance, an idealised cl i -mate sensit ivity study conducted by H u and Stamnes (2000) investigated the sensit ivity of the C R F to microphysical properties of clouds. As an i l lustrat ion, F igure 1.6 shows the dependence of shortwave C R F on average cloud droplet radius for three different cloud thicknesses. If the average droplet size was decreased by just one micrometer, the effect on the radiat ion budget could already be significant. The figure also shows that th in clouds (given wi th a L W P of 50 g / m 2 ) , such as marine Sc, exhibit the largest radiat ive sensit ivity to microphysical changes. However, the interactions between aerosols and clouds are complicated. Prec ip i ta t ion, including drizzle that does not reach the ground, occurs and can influence the aerosol concentrat ion v ia scavenging, which in turn influences the cloud microphysics and cloud fraction (Stevens et a l . , 2005; Sharon et al . , 2006). A lso, precipi tat ion is known to stabilise the cloud layer (by cooling the subcloud layer and increasing static stabi l i ty), so less precipi tat ion caused by the second A I E would increase turbulence and thus entrainment of warm and dry air f rom aloft - counteracting the cloud thickening process (Wood, 2007). In tota l , the aerosol indirect effects make the problem of accurately representing boundary layer cloud processes in G C M s even more complicated, adding significantly to the already exist ing cloud feedback uncertainty (Lohmann and Feichter, 2005). In fact, whether higher aerosol concentrations actual ly increase the area-averaged albedo of a cloud deck and thus its C R F is discussed controversial ly in the literature. Recently, studies suggested that processes might exist that could cancel the indirect effect completely, leaving no effect of aerosols on the net radiative fluxes (Twohy et a l . , 2005; W o o d , 2007; X u e et a l , 2007). A intr iguing manifestation of the A I E are ship tracks (Hobbs et a l . , 2000; Durkee et a l . , 2000b,a). They occur when elevated aerosol levels in the area of a ship plume lead to an enhanced reflectivity of the cloud layer above the ship. The result ing "ship tracks" can often be observed on satellite imagery. Indeed, 4 In ice-free clouds, large droplets are needed to form precipitation (via the collision/coalescence process). If the droplet size distribution is shifted towards smaller values, the propensity of the cloud to form precipitation decreases. 10 Chapter 1. Introduction -TOO 3 - 1 5 0 - 2 0 0 - 2 5 0 L W P : 5 g m -L W P : 5 0 g m -L W P : 5 0 0 g m - ' 1 0 1 5 2 0 A v e r a g e D r o p l e t R a d i u s ( / ^ m ) 2 5 3 0 - 1 5 - 2 0 L W P : 5 g m " ' L W P : 5 0 g m - ' L W P : 5 0 0 g m " 1 5 2 0 A v e r a g e D r o p l e t R a d i u s {fJrri) 2 5 3 0 Figure 1.6: Example of the dependence of cloud radiative forcing on macroscopic and micro-scopic cloud properties: the dependence of shortwave C R F at the tropopause (global annual average, top) on average cloud droplet radius for three different c loud thicknesses, and its change if the average droplet radius is decreased by 1 /um (bottom) in an idealised cl imate sensit ivity study. To put the numbers into context, Wie l i ck i et a l . (1995) estimate the globally averaged shortwave C R F to be on the order of -50 W m ~ 2 ; an instantaneous doubl ing of C 0 2 would result in a forcing of roughly 4 W m ~ 2 (reprinted from H u and Stamnes, 2000, © 2000, w i th permission from Blackwel l Publ ishing). 11 Chapter 1. Introduction Durkee et a l . (2000a) found that ships that emit more aerosols on average produce ship tracks that are brighter, wider, and longer-l ived than ships wi th lower emissions. Here, I wi l l use the phenomenon of ship tracks to check the physical consistency of the retrieval method (Chapter 4). 1.3 Radiative Transfer Terminology Before I discuss the mot ivat ion for this work and give an overview of the relevant exist ing l i terature, it is useful to review some basic radiat ive transfer terminology. General references in which more detailed information can be found are, for instance, Pet ty (2006) or Bohren and C lo th iaux (2006). The flux density, or irradiance (sometimes abbreviated as flux), is a measure of the total energy per unit t ime and unit area transported by electromagnetic radiat ion through a flat surface: F = energy (Joules) t ime (seconds) x area (m 2 ) It is measured in W m - 2 . The radiance, or intensity, measures the directional energy transport: j = energy (Joules) ^ t ime (seconds) x area (m 2 ) x field of view ( s r _ 1 ) It is measured in W m - 2 s r - 1 . The direction is given by solid angle wi th units steradian (sr). A steradian is a "square rad ian" ; solid angle is to "regular" angle as area is to length (Petty, 2006). A n integration of solid angle over one hemisphere yields 2tt, over an entire sphere 47r. B o t h flux and intensity can be expressed in monochromatic form, i.e. per unit wavelength, A (or, alternatively, wave number, 1/A): ^ A wavelength (fim)' ^ ^ wi th units W m ~ 2 s r ~ V m _ 1 -W h e n electromagnetic radiat ion is incident on a medium, part of it is absorbed, part reflected, and part t ransmit ted. The absorptivity (also absorptance) a, reflectivity (also reflectance) r and transmissivity (also transmittance) t of a medium describe the corresponding fractions of the incident radiat ion and in general depend on wavelength and direction of the incident radiat ion. The shortwave reflectivity of a surface is also referred to as its (shortwave) albedo. Obviously, al l three quantit ies range from zero to one, 12 Chapter 1. Introduction and aA(M)+a(M)+tA(M) = l - (1-4) The transmissivi ty is described by Beer's Law. iA = ^ = e x p ( - / 3 A , e s ) , (1.5) where s denotes distance along the direction of propagation, 7A,O the intensity at posit ion s = 0, I\(s) the intensity after distance s, and /3A,e the extinction coefficient ( m - 1 ) . It describes the rate of energy attenuation per unit distance (l//?A,e determines the distance for energy to be attenuated to e _ 1 of its original value). Since al l of the following equations correspond to the monochromat ic case, I wi l l drop the subscript A for convenience. In equations that describe wavelength-integrated cases this wi l l be expl ic i t ly stated. Fol lowing Beer's Law, in a direct beam wi th no sources of radiat ion, the intensity I falls off exponen-t ia l ly w i th distance: J ( s ) = / 0 e x p ( - / J e a ) . (1.6) In this equation, the ext inct ion coefficient describes the effects of two mechanisms for ext inct ion; radiat ion can be either absorbed by a medium, or be scattered out of its original direct ion of propagation. The ext inct ion coefficient is the sum of an absorption coefficient f3a and a scattering coefficient (5S: Pe=Pa + Ps- (1-7) The single scatter albedo Hi characterises the relative importance of scattering in the total ext inct ion - Ps Ps Pe Pa + Ps (1.8) In a purely absorbing medium, w would be zero, whereas in a purely scattering one it would be unity. The dimensionless optical thickness of a cloud between z\ and z2 (in an atmosphere in which z represents the vert ical coordinate) indicates the opacity of the cloud for a given wavelength. It is defined as the path-integrated ext inct ion coefficient and hence expresses how much ext inct ion occurs over the geometrical thickness of the cloud: T(ZUZ2) = I 2 Pe(z)dz. (1.9) 13 Chapter 1. Introduction As a rule of thumb, Bohren et al . (1995) found that at an approximate visible opt ical thickness of 10 one can no longer see the sun through a cloud. The water droplets found in a cloud are generally distr ibuted over a range of sizes. The droplet size distribution is described by n(r)dr = number of droplets per unit volume wi th radi i between r and r + dr, and it is important for determining the effective radius ref / , which is defined as J n(r)r (1.10) 6dr reff (1.11) r dr Two cloud parcels wi th identical l iquid water content and ref j have the same tota l droplet surface area, independent of the actual droplet size distr ibut ion. The absorpt ion and scattering coefficients fia and @s can be wri t ten in terms of the droplet size (and wavelength) dependent absorpt ion and scattering cross sections cra(r) and <xs(r) (m 2 ) : (3a = J aa(r)n(r)dr Ps = J crs(r)n(r)dr (1.12) (1.13) o-a(r) and crs(r) hence describe the absorpt ion or scattering per part icle. Water droplets larger than about 5 fim have a scattering cross section of as = 2nr2 and an absorpt ion cross section of o~a « 0 at visible wavelengths. It can be shown (Petty, 2006) that in this case the visible opt ical thickness is related to the effective radius by _ 3 L W P Tvis ~ r' , 2 Pireff where pi is the density of l iquid water and the l iquid water path (g m - 2 ) is given by (1.14) L W P qidz •• 4ir , dr dz, (1.15) w i th l iquid water content qt (g m~ 3 ) . A l though (1.14) does not apply at absorbing wavelengths where aa and < 7 S vary wi th r and A, it is st i l l the case that cloud reflectivity and emissivity (defined below) can be wri t ten as a unique function of reff, r v i s , temperature and A for al l wavelengths. 14 Chapter 1. Introduction Final ly , the intensity emitted by a medium is given by the Planck function and the emissivity s: Ix=eXBX{T), (1.16) where T is the temperature of the emitt ing material . The emissivity describes how efficient the medium is emitt ing, and after Kirchhoff's Law, the emissivity of a mater ial is equal to its absorpt iv i ty: ex(0,d>)=ax(e,4>). (1-17) The Planck function, dependent on the temperature of a medium, is given by Bx{T) = — v , (1.18) * (Mot ) - 1 ) where h = 6.626 x 1 0 - 3 4 J s _ 1 is Planck 's constant, c = 2.998 x 10 8 m s _ 1 is the speed of l ight, and ks = 1.381 x 10 - 2 3 J / K is Bol tzmann's constant. It gives the emission of a black body, an object that absorbs al l radiat ion that falls onto it (ax = 1) and consequently also emits the max imum possible radiat ion that a mater ial w i th a given temperature can emit (e\ = 1). The brightness temperature TB is often used as a substitute for describing the measured intensity in remote sensing. It is the temperature a black body must have in order to emit the observed radiance at a given wavelength: h = BX(TB). (1.19) 1.4 Motivation D a t a that can shed light on the regional variations in cloud microphysical and opt ical properties, their d iurnal cycle and the interaction of clouds, precipi tat ion and aerosols is part icular ly important for further progress in the accurate representation of stratocumulus clouds in cl imate and weather models (Stevens et a l . , 2003a). F igure 1.7 summarises the options the scientific communi ty has to observe clouds. In-situ instruments carried by aircraft can very accurately measure drop size dist r ibut ion and l iquid water content, as well as the standard meteorological variables. Wh i le field campaigns such as D Y C O M S - I I yield valuable data, the aircraft flights are expensive, and it is not pract ical to use in-s i tu sampl ing to conduct 15 Chapter 1. Introduction ground based remote sensing Figure 1.7: Observations of cloud microphysical and opt ical properties, their diurnal cycle and the interaction of clouds, precipi tat ion and aerosols are important for further progress in the accurate representation of stratocumulus clouds in cl imate and weather models. Different methods can be employed; active (radar, l idar) and passive (radiometers) remote sensing can be used from the ground. Aircrafts or helicopter-based instrument-sondes are able to perform in-situ measurements and directly measure particle size, number and the standard meteorological parameters. However, for marine clouds, both ground based remote sensing (from ships) and in-situ measurements are cost and labour expensive. Satellites provide an ideal means to remotely sense data over large areas and long time spans. Most satellite-based approaches use passive mult ispectral radiometer data, but data from active satellite instruments wi l l also be available. the long-term observations that are needed to understand how the cl imatology of marine Sc responds to environmental change. Remote sensing from surface and satellite based instruments can provide a long-term data record, wi th surface instruments typical ly being able to achieve a higher temporal resolution but wi th a l imited spatial field of view. A lso , it is difficult to deploy surface remote sensing instruments for long time spans over sea, leaving satellites to be the major data source for large-area and cl imat ic studies of marine Sc. Remote sensing instruments infer cloud properties from radiat ion that is reflected, transmitted or emitted by the cloud. Whi le active instruments such as radar, sodar or l idar emit electromagnetic radi-at ion and infer the property of interest from the backscatter signal, passive sensors record the radiat ion that originates from natural sources. Mathemat ica l models of the radiat ive processes in the cloud and 16 Chapter 1. Introduction atmosphere are then employed to compute the intensity arr iv ing at the sensor (i.e. for satellites, at the top of the atmosphere) as a funct ion (referred to below as the forward model) of the relevant cloud parameters: jTOA = _f(re//j r ? Tcioud, Tsurface, overlying atmosphere, ...), (1.20) where the intensity has been wri t ten as a vector in order to account for observations at several wave-lengths. Other parameters, indicated by in the above equation, could include subadiabatici ty, cloud inhomogeneity or part ia l ly fil led pixels. The retrieval is then defined as the inversion of the function / , ip = f ~ 1 { I T O A , overlying atmosphere, ...), (1-21) where the vector <p = (reff, r, Tcioud, Tsurface, •••) contains the parameters to be determined. Most satellites currently in orbit carry passive sensors, due to the high power consumption of active technology. For instance, the Moderate Resolut ion Imaging Spectroradiometer ( M O D I S ) instruments, on-board the Nat iona l Aeronaut ics and Space Administ rat ion 's ( N A S A ) near-polar orbi t ing Terra and A q u a satellites, measure in the visible and infrared wavelength range and provide four images dai ly for a given locat ion on earth. Wh i le an operational cloud product including droplet size and opt ical thickness (a measure of how much radiat ion can pass through the cloud) is rout inely retrieved from the daytime measurements (K ing et a l . , 1997; Platn ick et a l . , 2003) 5 , no such retrievals are available for nightt ime images . 6 7 However, such retrievals would be useful for studies of marine Sc on the d iurnal timescale. One reason that no operational retrievals of nocturnal cloud properties are available is that solar radiat ion carries several orders or magnitude more energy than radiat ion emit ted by terrestr ial sources. The sun hence represents a source of very easily detectable photons, and the absence of emission of terrestr ial sources in the visible wavelength range means that cloud temperature is not a confounding variable. The higher intensity allows for smaller pixel sizes, too. For instance, while near and thermal infrared signals are recorded at a 1-km resolution by the M O D I S sensors, the visible channels have a 250-m resolution. Furthermore, most information on cloud microphysical and opt ical properties are carried by wavelengths in the visible and near infrared. Wh i le the reflected sunlight at visible channels is highly 5Available from http://ladsweb.nascom.nasa.gov/data/. 6 This is also true for other instrument data, such as from the Advanced Very High Resolution Radiometer (AVHRR). 7In 2006, NASA launched the CloudSat and CALIPSO satellites within the so-called A-Train - a series of satellites equipped with special cloud remote sensing instrumentation, flying in formation with the Aqua satellite in order to provide near-simultaneous measurements. The satellites are equipped with an active radar and a lidar instrument, respectively, which operate independently of the available sunlight. However, the lidar instrument will mainly focus on upper level clouds (i.e. cirrus), and the radar on vertical profiles of LWC and precipitation — droplet size, for instance, will not be part of the retrieval product (Stephens et al., 2002; Vaughan et al., 2004) 17 Chapter 1. Introduction sensitive to cloud optical thickness, near infrared channels provide informat ion about cloud part icle size (e.g. K i n g et a l . , 1997). However, a similar, al though weaker, dependence can also be found at non-visible wavelengths. Rad ia -tive transfer calculations by B a u m et al . (1994) showed that a combinat ion of the three near infrared and thermal channels of the Advanced Very H igh Resolut ion Radiometer ( A V H R R ) could be used to infer cloud droplet size, opt ical thickness and cloud temperature. At tempts to construct a retrieval scheme from such measurements (Minnis et a l . , 1995; Heck et a l . , 1999; Perez et a l . , 2000; Gonzalez et a l . , 2002; Perez et a l . , 2002) were part ly successful, but suffered from high computat ional cost and included only simple sensit ivity analyses in order to estimate the uncertainty on the retrievals. In part icular, the com-putat ional t ime needed to retrieve an indiv idual pixel in an image was on the order of seconds 8 , which, again, is not suited to bui ld a cl imatology bui ld on a large number of satell ite images. A lso , ambiguities in the inverse mapping were a problem (i.e. different droplet size and opt ical thickness combinations can lead to the same set of emitted radiances), further compl icat ing the uncertainty est imation and restricting the quanti tat ive usefulness of the retrievals (Perez et a l . , 2000; Gonzalez et a l . , 2002). Towards these ends, a retrieval algori thm both fast enough to process large datasets and able to provide reliable error estimates on the retrieved values could prove quite useful, and its non-existence motivates this study. 1.5 Overview and Theory of Existing Retrieval Algorithms 1.5.1 Daytime Methods Retr ieval techniques to extract boundary layer l iquid water c loud properties f rom satellite measure-ments have been developed since the 1980s (Ark ing and Chi lds , 1985; Twomey and Cocks, 1989; Naka j ima and K i n g , 1990; Naka j ima et al . , 1991; P latn ick and Twomey, 1994; H a n et a l . , 1994; P la tn ick and Valero, 1995; Naka j ima and Naka jma, 1995; Kawamoto et a l . , 2001), al though interest in inferring cloud cover characteristics f rom satellite images can be traced back to A rk i ng (1964). Since reflected solar radiat ion is observed by the sensors, and the reflectivity main ly depends on cloud opt ical thickness and effective radius, these two parameters are commonly retrieved. The underly ing principle is i l lustrated in F igure 1.8. L iqu id water clouds are composed of spherical water droplets whose scattering properties can be described by M i e theory (Mie, 1908). F igure 1.8a shows 8 on a 933 MHz Pentium III machine; P.H. Austin, 2005, per. comm. 18 Chapter 1. Introduction the dependence of the single scatter albedo ui (1.8) on wavelength for three different droplet sizes. In the visible wavelength range, Hi is very close to unity for al l particle sizes, and no radiat ion is absorbed in the cloud. The reflectance of a cloud layer, i.e. its albedo, is determined by the amount of radiat ion that is scattered back into space, i.e. by the number of photons that leave the cloud at its top after various scattering events. Since water droplets tend to scatter visible radiat ion main ly in the forward d i rec t ion 9 , it needs strong and a large number of scattering events for a high albedo. If a l l ext inct ion is due to scattering, an optical ly thicker cloud wi l l have a larger albedo. Since Hi depends only weakly on particle size in the visible wavelength range, the opt ical thickness of a cloud can well be inferred from the measured reflectance (Figure 1.8b, the curve for Hi = 1). F igure 1.8a shows that for wavelengths in the near and thermal infrared, the single scatter albedo becomes dependent on part icle size. Furthermore, for sufficiently low Hi and sufficiently thick clouds, the cloud albedo becomes independent of r (i l lustrated in Figure 1.8b for Hi = 0.9), leaving the major dependence on the effective droplet radius re/f. For th in clouds, the retrieved opt ical thickness from the visible wavelengths can be used to eliminate the optical depth dependence. Typica l ly , near infrared channels centred at 2.1 /m i (Nakaj ima and K i n g , 1990) or 3.7 fim (Naka j ima and Naka jma , 1995) are used to retrieve particle size information. To infer the values of cloud optical depth and effective radius from the satellite measurements, a radiat ive transfer model is employed to compute a database of reflected intensit ies, the lookup table ( L U T ) . Radiances that, would be expected at the satellite sensor are computed using simplif ied models of the cloud layer and the atmosphere. For instance, the operational M O D I S retrieval (K ing et a l . , 1997), based on the work of Naka j ima and K i n g (1990), uses a simple model of hor izontal ly and vert ical ly uniform clouds whose only parameters are reff and r. For the atmosphere, absorpt ion by water vapour or trace gases (mainly infrared) or scattering at aerosols (visible) has to be considered. A lso , emittance of cloud droplets in the near infrared bands plays a role in the observed intensities. It is often removed by ut i l is ing further thermal bands for estimates of cloud temperature. The components of cloud, atmosphere, and radiat ive transfer model together are called the forward model. Since the forward model is too complex to be inverted analyt ical ly, it is run for a variety of cloud parameters to bui ld a L U T . The determination of the cloud properties is then done by " looking up" the measured intensities in the database - the difference between observed and computed radiances is 9More details will be discussed in Chapter 2. 19 Chapter 1. Introduction o 3 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 '"i""i""i'"i"i"r 0.8 o < 0.6 0.4 0.2 1 1 I I 1 1 1 1 1 -~~ to=1 .03=0.999 - /•' (D=0.99 03=0.9 1 10 Wavelength X [urn] 0 10 20 30 40 50 60 70 80 90 100 (a) (b) Figure 1.8: Example of how radiative effects are dependent on microphysical properties of a cloud, (a) Dependence of the single scatter albedo Us (the fract ion of radiat ion extinguished along a ray of radiat ion that is due to scattering effects) on wavelength for three different cloud droplet sizes (5, 10 and 20 fim). (b) Idealised albedo, (c) transmit tance and (d) absorptance of a plane paral lel layer c loud in dependence on cloud opt ical thickness r for different values of oj. Especia l ly note the different w for the satellite channels measuring at 3.7, 11 and 12 p,m in (a). The result ing differing albedo, transmittance and absorptance can be exploited for remote sensing of droplet size and optical depth (reprinted from Pet ty , 2006, © 2006, w i th permission from Sundog Publ ish ing) . 20 Chapter 1. Introduction represented by a cost function (also referred to as error surface) p f » n m / j-observed jcomputed\2 /-i n n \ — (lTOA ~ 1TOA ) ' U - Z Z J which is minimised numerically. The above mentioned studies al l util ise these fundamental techniques, differing in the wavelength of the employed channels or the minimisat ion algori thm used. A lso , forward models were refined in later publ icat ions, for instance the treatment of water vapour (Kawamoto et a l . , 2001), or the cloud model, which now is often assumed to be adiabatic rather than vert ical ly uni form (Brenguier et a l . , 2000). A n important assumption in most retrieval schemes is the plane-paral lel approximat ion, which treats the cloud layer wi th in a pixel as horizontal ly homogeneous. Wh i le this might seem to be an oversim-pl i f ication for many realistic clouds, it is often a good approximat ion for strat i form clouds at scales of about 1 k m (the resolution of the M O D I S channels) and considerably simplifies the radiat ive transfer calculations (Petty, 2006). The cloud parameters are then retrieved on a pixel-by-pixel basis, imply ing that the observed radiances are determined entirely by the properties of the clouds wi th in the pixel of interest (independent pixel approximat ion, I P A ) . Hence, care has to be exercised as to only apply the retrieval scheme to pixels that are ful ly covered by cloud - a requirement that can be problematic for regions of broken cloud. Unfortunately, precipitat ion produces inhomogeneities at scales of about 1 k m (Stevens et a l . , 2005; Sharon et a l . , 2006). Th is poses obvious problems for remote sensing studies of the aerosol indirect effect; research showed a considerable error is introduced into the retrieval if inhomogeneities exist in the cloud layer, especially in the presence of broken clouds on the subpixel scale (Faure et a l , 2001b,a,c, 2002; K a t o et a l . , 2006; Iwabuchi, 2007). 1.5.2 Nighttime Methods A t night, the dependence of the radiat ive fields on cloud opt ical thickness and effective radius is sl ightly more complicated. Since radiat ion in the visible wavelengths is not ava i lab le 1 0 , the retrieval has to rely entirely on near and thermal infrared signals, result ing in the lack of the relatively independent opt ical thickness information in the visible range. Research work in this area dates back to Hunt (1973), whose work showed a difference in the sensitivity of cloud emissivity to changes in part icle size between several wavelengths in the near and thermal infrared. Most follow-up studies in the 1970s and 1980, 1 0Although new highly sensitive radiometers are being developed that will be able to measure visible light reflected by the clouds from the moon (Lee et al., 2006). 21 Chapter 1. Introduction however, were focused on cirrus clouds rather than shallow l iquid water clouds (Minnis et a l . , 1995, and references therein). D 'Entremont (1986) used the 3.7, 11 and 12 fim channels of the A V H R R instrument to determine fract ional c loud amount and cloud top height for low and mid-level clouds, and L i n and Coakley (1993) and Luo et al . (1994) developed a method based on the 11 and 12 u,m emissivities in order to determine fract ional cloud cover and an effective particle radius for single layered clouds. B a u m et a l . (1994) eventually found that the combinat ion of the A V H R R channels at 3.7,11 and 12 / im may be used for a simultaneous retrieval of cloud temperature, opt ical thickness and effective radius. A t these wavelengths, the emitted radiat ion of a cloud depends on both effective radius and opt ical thickness, and addit ional ly also on temperature. F igure 1.8a shows the strong dependence of w on part icle size for wavelengths larger than about 2 f im. Since, after Kirchhof f 's Law, the emissivity of a droplet is equal to its absorptivity, the value of l-u> gives an idea of how cloud emission as well as absorpt ion depend on re/f.n The cloud top radiance at a certain wavelength then becomes a complex funct ion of cloud temperature (determining the black body emission), particle size, the thickness of the cloud (determining how many scattering or absorpt ion/emission events occur), and, for clouds th in enough to allow for transmission, surface temperature (assuming a surface emissivity = 1). Of course, cloud inhomogeneities add further complexity. Despite this complexity, the differences in the dependence of cloud droplet scattering, absorption and emission properties on wavelength can be exploited for the retrieval. Wh i l e the objective of B a u m et al . (1994) was to detect mult i level cloud situations, M inn is et al . (1995) and Heck et a l . (1999) report on a nightt ime retrieval algor i thm wi th in the Clouds and the Ear th 's Radiant Energy System ( C E R E S ) programme. They use a very simple radiative transfer approach, parametr is ing the forward model by an efficiently computable function that incorporates cloud opt ical depth and the emissivity dependence on effective radius, and minimise the sum-of-square error between the model and observations. Perez et al . (2000) also include surface temperature information, retrieved from clear sky pixels in the vic in i ty of the clouds, i n their retrieval method to determine effective radius and cloud top temperature of nocturnal marine stratocumulus clouds. Thei r forward model consists of a vert ical ly uniform, plane-paral lel cloud over a sea surface. Emiss ion and absorption effects of the atmosphere above the cloud are neglected. Fol lowing B a u m et a l . (1994), Perez et al . (2000) employ brightness temperature differences ( B T D ) between the satellite channels to express the varying behaviour of the cloud radiat ive properties wi th wavelength, and extensively study the behaviour of the forward model when r, reff, and temperature 1 1 Indeed, although making the retrieval of particle size possible at nighttime, this dependence becomes a problem in the removal of the thermal contribution to the near infrared channels during daytime. 22 Chapter 1. Introduction A 4 s -2 I-CQ Optical depth 15.0 I 2 8 4 K 2 8 5 K 2 8 6 K 2 8 7 K 2 8 4 2 8 6 2 8 8 2 9 0 2 8 2 2 8 4 T 4 ( K ) 2 8 6 T 4 ( K ) 2 8 8 2 9 0 F i g u r e 1.9: Phys ica l basis for the remote sensing methods used by Heck et a l . (1999), Perez et al . (2000), B a u m et al . (2003) and Cerdena et al . (2007): plotted is the brightness temperature difference ( B T D ) between A V H R R channels 3 and 4 (3.7 and 11.0 /mi ) , as measured at the top of the atmosphere, versus the brightness temperature for channel 4. The curves are based on radiat ive transfer computations for horizontal ly and vert ical ly homogeneous clouds wi th varying part icle radius and optical thickness (left) and temperature (right). B y creating such curves wi th a radiat ive transfer model and comparing them to the brightness temperatures measured by the satellite, it is possible to infer information about the cloud properties that caused the satellite measurements (reprinted from Perez et a l . , 2000, © 2000, w i th permission from Elsevier). are va r i ed . 1 2 Figure 1.9 shows the B T D between the 3.7 fim and 11 /j,m channels of the A V H R R instrument (channels 3 and 4, respectively), henceforth abbreviated wi th BTD(3 .7 -11) , versus the 11 /j,m brightness temperature (BT ) for vert ical ly uni form clouds wi th (a) a fixed cloud top temperature of 285 K and several effective radi i and optical depths, and (b) a fixed radius of 8 fim and varying cloud temperature. The curves span an area of possible solutions of the forward model , and if some parameters can be fixed the remaining ones can readily be inferred from these diagrams. Simi lar diagrams exist for the B T D between 11 and 12 fim (BTD(11-12) in the following). Yet two issues complicate the retrieval. One problem for nocturnal retrievals is the saturat ion of cloud emission wi th opt ical depth. For th in clouds a number of photons from the surface and al l levels wi th in the cloud layer are able to reach the satellite without absorption. For thicker clouds absorpt ion wi l l effectively remove al l photons from lower levels in the cloud layer before they reach cloud top - leaving 1 2 T h e intensities in the cost function (1.22) now become brightness temperatures and brightness temperature differences. 23 Chapter 1. Introduction the cloud top radiance to be governed only by the upper cloud (of course, the absorpt ion is wavelength dependent). F igure 1.8d il lustrates the phenomenon. For an u> of 0.9, the emissivi ty of the cloud, equal to its absorptance, quickly reaches an asymptot ic l imit . A convergence for larger opt ical depths is also visible in the B T D diagrams of F igure 1.9. However, since most marine Sc have opt ical thicknesses of less that 10 ( X u et a l . , 2005; Rossow and Schiffer, 1999), the saturat ion problem does not pose a significant constraint. A lso problematic are ambiguities in the forward model. A s noted by Perez et a l . (2000), some sets of measured brightness temperatures can be caused by several clouds having differing opt ical properties. Th is , of course, makes the unique retrieval of the cloud parameters impossible. Perez et al . (2000) find that the effect is larger on BTD(3.7-11) , but also occurs for BTD(11-12) , especially for effective radi i in the range of 4-7 fim. The minimisat ion technique used by Perez et a l . (2000) is similar to the one used by Naka j ima and K i n g (1990) and K i n g et al . (1997), who first retrieve optical depth from the visible wavelengths in order to transform the opt imisat ion wi th respect to r e / / into a one-dimensional problem. Not ing the complexity of the hypersurface of the cost function (with equivalently deep min ima that resemble the ambiguit ies), Perez et a l . (2000) also split the retrieval into two parts. F i rs t , cloud top temperature is recovered from the BTD(11-12) signal. F i x i ng the atmospheric profile using the inferred surface and cloud top temperatures, they then run the forward model again for a range of effective radi i . Th is produces a one-dimensional minimisat ion problem wi th respect to reff, which is easier to solve for mult iple solutions. The retrieval result then consists of a cloud top temperature, together w i th a list of possible effective radi i . Perez et al . (2000) compare the retrieval results to time-averaged in-si tu measurements performed on the Canary Islands, and find that satellite and in-situ observed r e / / agree wi th in 1.25 ^ m . Gonzalez et a l . (2002) report on a modif ied version of Perez et al . (2000). They eliminate the ambi-guities due to effective radius by bui lding a L U T in which, for situations where mult ip le solutions exist, they only store the median radius. The discarded solutions are used to compute a confidence interval around the central value, which they find to be as large as 7 um for large r e / / in th in clouds. The cost funct ion then possesses a unique global min imum, which they f ind wi th a genetic a lgor i thm in order to avoid mult iple local min ima. In addi t ion to cloud temperature and effective radius, they also retrieve opt ical depth values. Perez et a l . (2002) adapt the retrieval to channels from the M O D I S instrument. Instead of using B T s at 3.7, 11 and 12 fim, they employ the 3.7, 3.9, 8.5 and 11 p m channels (plus surface temperature). 24 Chapter 1. Introduction Furthermore, atmospheric contributions to the top-of-atmosphere ( T O A ) B T s are included this t ime, assuming a known atmospheric profile from a different source. The error surface is minimised wi th a scatter search algori thm. B a u m et al . (2003) also extend the B a u m et a l . (1994) work to the M O D I S instrument. Thei r objective is again to recognise situations in which high level cirrus overlays low boundary layer clouds. Including the 8.5 jj,m channel in their computations, they note that the BTD(8.5-11) ( B T D between 8.5 and 11 /um) shows a strong dependence on cloud top temperature but not on part icle radius. Apa r t f rom the expensive minimisat ion techniques employed in the out l ined nightt ime retrieval meth-ods, the procedures also suffer from the problem of comput ing a meaningful uncertainty estimate. Perez et al . (2000), as well as Gonzalez et al . (2002), perform a sensit ivity analysis of their retrievals wi th respect to errors in the observed brightness temperatures. They find that a var iat ion of ± 2 K in the observed B T s of cloudy pixels leads to variations in reff of more than 3 fj.m, variations of ± 0.5 K in the clear sky B T s lead to errors in r e / / of less than 0.5 u m (Perez et a l . , 2000). Gonzalez et a l . (2002) note that the sensitivities of retrieved cloud parameters to input B T s are largest in the case of th in clouds. They also discuss the effect of the ambiguities in effective radius and the error in the presence of broken clouds on the subpixel scale (neglecting nonlinear effects). However, these sensit ivity analyses cannot provide more than an idea of the general magnitude of the uncertainty. They also do not include the effects of assumptions made in the forward model or the accuracy of the numerical inversion. Indeed, Pincus et al . (1995), for the case of opt ical depth retrievals, report on the difficulty of obtaining a good uncertainty estimate. In order to obta in an accurate error estimate for an indiv idual retrieval or for an average over a populat ion of retrievals, addi t ional inversions wi th perturbed input variables would have to be performed, which increases the computat ional cost. 1.5.3 Art i f ic ia l Neural Networks A n interesting development to tackle the high computat ional cost of the inverse procedure is the use of art i f icial neural networks ( A N N s ) . The theory of A N N s as a tool for stat ist ical data model l ing has been developed since the 1950s (e.g. Rosenblatt, 1958), but they were not used much in the atmospheric sciences unt i l the 1990s (e.g. M c C a n n , 1992; Hsieh and Tang, 1998). General references to A N N s are the books by Bishop (1995, 2006). Inspired from the biological archetype of the human brain, an A N N imitates the concept of intercon-nected nodes, the neurons, that can "fire" in order to pass on informat ion if certain input conditions are 25 Chapter 1. Introduction bias bias F i g u r e 1.10: Schematic diagram of a feed-forward network wi th two layers of adaptive weights w^j and wfl- The input neurons on the left are labelled wi th a;;. The information propagates in a forward direct ion through the hidden neurons zj to the output neurons y^. In addi t ion to the input and hidden neurons, there are two bias parameters wi th a fixed input (activation) of XQ = 1 and zo = 1, respectively. met. In part icular, a neuron in an A N N computes the sum over al l of its input values and returns the value of an act ivat ion function of that sum. Th is activation funct ion can be an arbi t rary function, but usual ly a sigmoidal function is used that returns 1 if a certain threshold is reached and 0 otherwise. The connections between the indiv idual neurons are weighted, and by adjust ing these weights, the A N N can " learn" a certain behaviour. Ar t i f ic ia l neural networks have been widely used for both classif ication and regression problems (Bishop, 1995). There are many different types of A N N s , but the important one for this study and the works cited here is the mult i layer perceptron ( M L P ) . 1 3 Its architecture is schematical ly i l lustrated in F igure 1.10. Its neurons are grouped into several layers, one input layer, one output layer, and one to several hidden layers . 1 4 Each neuron in the input layer corresponds to an input variable, and each neuron in the output layer to an output. The number of hidden layers and neurons in them is variable and has to be determined indiv idual ly for each case. In order to learn a behaviour, the network has to be trained wi th a t ra in ing dataset that includes input values as well as their corresponding target values. The weights are then modif ied unt i l , for al l 1 3 I will not give a more detailed description of multilayer perceptrons in this thesis, a thorough introduction to many aspects concerning neural networks can be found in Bishop (1995), a shorter review in Bishop (1994). 1 4 T h e numbering of the layers can be confusing. A "two-layer-perceptron" refers to a network with two layers of adaptive weights, that is only one layer of hidden neurons. "Two hidden layers", however, refer to two layers of hidden neurons and hence three layers of adaptive weights. 26 Chapter 1. Introduction input values, the outputs of the A N N equal the target values to a sufficient accuracy. A n important property of such A N N s is that, given a large enough number of h idden neurons and weights, they are able to approximate any arbi t rary smooth function (e.g. B ishop, 1995). Th is makes them ideally suited for nonlinear regression problems in which l i t t le is known about the shape of the cost function. Furthermore, once the training process is completed, the computat ion of the output values includes only a few summations and evaluations of the activation functions, which makes a trained A N N very fast. O f course, there are also disadvantages, as it can be difficult to determine the number of hidden neurons or to find a good set of weights (Bishop, 1995, more specific problems arising in the atmospheric sciences are outl ined by Hsieh and Tang (1998)). In the context of atmospheric satellite remote sensing, some work has been done that employs A N N s . For instance, Krasnopolsky et al . (1995) and Krasnopolsky et al . (2000) used A N N s to retrieve surface wind speeds over the ocean from microwave measurements. Aires et a l . (2001) determined surface tem-perature, water vapour content and cloud l iquid water path, also from microwave observations, and Aires et al . (2002) retrieved atmospheric and surface temperature from infrared measurements. Faure et al . (2001c) applied A N N s to dayt ime retrievals of c loud parameters. In their work, they investigated the feasibil ity of simultaneously retr ieving cloud opt ical thickness, effective radius, a relative cloud inhomogeneity and fractional cloud cover of inhomogeneous clouds from observations at 0.6, 1.6, 2.1 and 3.7 u.m. They used a three dimensional Monte Car lo radiat ive transfer model to account for nonlinear effects due to the cloud inhomogeneities to bui ld a database of 3000 clouds. A n A N N wi th two hidden layers, 10 neurons in each layer, was trained from this database in order to model the inverse function. Faure et al . (2001c) concluded that a retrieval of inhomogeneous clouds using an A N N is feasible. The study was extended by Cornet et al . (2004) to combine dayt ime observations available at different resolutions. In order to account for the increased complexity of the problem, they employed a combination of several A N N s and applied the method to M O D I S data (Cornet et a l . , 2005). However, they d id not compare their results to in-situ observations. Schii l ler et al . (2003, 2005) also used A N N s for daytime retrievals of droplet number concentration, geometrical thickness and l iquid water path of shallow convective clouds, but they d id not give many details on the architecture they used. Mot ivated by the high computat ional efficiency of an ANN-approx ima ted inverse funct ion, Cerdena et al . (2004) continued the work of Perez et al . (2000) and Gonzalez et a l . (2002), becoming the first to apply A N N s to nocturnal cloud property retrievals. In their study, they empir ical ly explored several 27 Chapter 1. Introduction network architectures wi th differing numbers of hidden layers and neurons, and found that two layers w i th 100 neurons in the first and 20 neurons in the second layer yielded the best results. The A N N s were trained using a database of 20,000 data points, w i th an addi t ional 10,000 independent points for val idat ion. Cerdena et al . (2004) compare two forwards models, one employing the vert ical ly uniform cloud model , and the other an adiabatic profile wi th vert ical ly increasing l iquid water content (more details in Chapter 2). Analys ing the same data as Perez et al . (2000) and Gonzalez et a l . (2002), the A N N retrieved effective radi i agreed wi th in 2 /jm wi th the in-si tu observed values, w i th the adiabatic cloud model yielding sl ightly improved results. Cerdena et a l . (2007) extended the retrieval method to the dayt ime case and also further investigated the network architecture. Ut i l is ing genetic algorithms, they were able to improve the network design to a network containing 20 neurons in the first and 5 neurons in the second hidden layer, yielding similar results as their first, more expensive architecture. They also presented a sensit ivi ty analysis, similar to the one conducted by Perez et al . (2000), by perturbing the input brightness temperatures by 0.5 K and noting the effect on the retrieved parameters for th in (r < 2) as well as thick ( r > 8) clouds. For th in clouds, such perturbations can lead to errors of up to over 4 um in reff, 4 K in cloud temperature (T), and 0.6 in r. For thick clouds, errors are smaller in rejf and T , but larger for r. However, in the Cerdena et al . (2007) work, the problem of est imating an accurate error interval for given inputs remains. Furthermore, they do not discuss the effect of an uncertainty in the neural network fit on the outputs (i.e. the uncertainty of having found the best set of weights dur ing the t ra in ing process), which can also be significant (Aires et a l . , 2004a). Indeed, there exist both analyt ical methods to compute the Jacobian (i.e. the sensit ivity) of a network for a given input and to compute the output uncertainty due to the network fit. M a c K a y (1992a,b, 1995) developed a Bayesian framework for neural network training that besides of making the t ra in ing process more stable allows for an est imation of the uncertainties in the network predictions that are due to the network fit and to noise in the t ra in ing data. Aires (2004); Ai res et a l . (2004a,b) generalised this concept to the mult id imensional case and demonstrated its appl icat ion in the context of their previous microwave remote sensing problem (Aires et al . , 2001). The Jacobian is very important for the val idat ion of the network fit. Since for the inverse problem, we do not know the shape of the function to model, we can only verify the A N N by comparing its predictions to independent test data or by control l ing the physical consistency of the Jacobian. Th is means that the output variables should be dependent on the expected input variables. For instance, dur ing nightt ime, 28 Chapter 1. Introduction we expect the effective radius output to be sensitive mainly to the BTD(3.7-11) signal (cf. F igure 1.9), while cloud top temperature should depend mainly on the thermal signals at 11 or 12 um. Th is can even be done in a quanti tat ive way by numerical ly estimating sensitivities at certain points in the L U T and comparing them to the A N N predicted values. However, inverse problems are often i l l -condit ioned. Tha t means that there exist several differing functional mappings that approximate the t ra in ing data equally well, but not necessarily represent the true function that generated the data - the training data are s imply not precise enough to unambiguously specify the correct function. In the case a bad mapping is afterwards appl ied to input data that was not part of the training dataset, the predict ion may be far f rom the actual value. The network is then said to generalise badly (Bishop, 1995). Wh i le this problem is difficult to avoid, addi t ional care should be exercised when using A N N s for i l l -condit ioned problems. Aires et al . (2004b) describe an approach to estimate the var iabi l i ty of the Jacobian due to the uncertainty in the network fit. Th is var iabi l i ty is a good indicator of how unstable the training process was and thus how i l l -condit ioned the problem is. 1.6 Objectives A neural network can potential ly provide a fast retrieval a lgor i thm for determining nocturnal cloud properties that might be able to overcome the performance problem encountered in classical retrieval approaches. Furthermore, the appl icat ion of the methods developed by M a c K a y and Aires et al . could be very beneficial for understanding the "black-box nature" of A N N s and in order to improve the retrieval quality. I w i l l extend the promising Cerdena et al . (2007) results to a different network architecture, a different satellite sensor and different in-situ data. M y focus lies on the uncertainty i n the retrieval, and what we can learn from its sensitivities. In part icular, the objectives for this thesis can be outl ined as follows. The goal is to set up a retrieval method capable of determining cloud effective radius, cloud top temperature and cloud opt ical thickness from nocturnal measurements of the M O D I S instrument. In-situ data for evaluation purposes are available from the D Y C O M S - I I field campaign, and the method should be able to provide error bars on the results. The indiv idual steps include: • The Aires (2004); Aires et a l . (2004a,b) method wi l l be implemented and it wi l l be investigated if it is suitable for the given problem. Apar t f rom obtaining an uncertainty estimate of the network fit, the hope is that ambiguities in the training database wi l l cause a larger uncertainty in the retrieval. 29 Chapter 1. Introduction • A forward model wi l l be developed that is capable of comput ing top-of-atmosphere brightness tem-peratures for the relevant near and thermal infrared M O D I S channels for vary ing cloud parameters. A L U T ( L U T wi l l be used synonymously for database from here on) has to be computed using this model. • The implemented methods wi l l be applied to the retrieval problem. A N N s have to be trained from the computed L U T , different network architectures have to be explored and the results have to be evaluated. For this thesis, the goal is to retrieve one scene from the D Y C O M S - I I field campaign (so that several parameters including the atmospheric profile can be prescribed in order to simpli fy the problem) and to evaluate the results using the in-situ data. • F ina l ly , the uncertainties and sensitivities of the retrieval wi l l be analysed and their usefulness investigated. Th is thesis is meant to bui ld a basis for further research. If the retrieval of the test case proves successful, future work can be performed on evaluating further scenes, w i th an eventual goal of creating a general method that could be used "operat ional ly" for any arbi t rary scene. The remainder of this document is structured as follows. In Chapter 2, the radiat ive transfer model wi l l be described. The employed cloud model wi l l be discussed, as well as radiat ive transfer techniques for comput ing the T O A brightness temperatures. A short section wi l l be devoted to the effects of the overlying atmosphere. In Chapter 3 I discuss neural network techniques. The Ai res method wi l l be derived, and its implementat ion wi l l be applied to simple test cases. The behaviour of the method for these cases wi l l be analysed and its appl icabi l i ty to the remote sensing problem discussed. The t ra in ing of A N N s from the L U T and the retrieval of the test scene is the topic of Chapter 4. Issues concerning the retrieval setup wi l l be discussed, and the results of sensit ivity and uncertainty estimates are presented. A discussion of the usefulness of the estimated variables is given, and the thesis concludes w i th a summary of the work in Chapter 5. 30 Chapter 2 Radiative Transfer and Development of the Forward Model In this chapter I wi l l describe the design of the forward model that is used for the case study. The scene of Ju ly 11, 2001, was chosen for the retrieval, and the in-situ aircraft measurements used in this chapter are taken from D Y C O M S - I I research flight II (RF02) , which took place on that day. The D Y C O M S - I I campaign wi l l briefly be introduced in section 2.1. Next , I wi l l introduce the radiat ive transfer equation ( R T E ) that mathematical ly describes the propagation of radiat ion through the atmosphere and discuss how it can be solved. For the radiat ive transfer calculations, cloud model and droplet size distr ibut ion are needed. Us ing the R F 0 2 data, I wi l l demonstrate the adequacy of the adiabat ic approximat ion for the cloud model and show that the size distr ibut ion can be described by a modif ied gamma distr ibut ion (sections 2.2 and 2.3). The question of which M O D I S channels are best suited for the retrieval is discussed in section 2.4. The radiances measured by the sensor at these channels are always average radiances over a wavelength interval, since the instrument cannot measure at indiv idual wavelengths. Hence, the forward model must be able to compute such interval-averaged intensities. I wi l l introduce the correlated-k approximat ion as an efficient way to calculate gaseous absorption over these intervals. The forward model also requires specification of absorpt ion and emission by gaseous atmospheric constituents including water vapour and carbon dioxide above the cloud. Th is is discussed in section 2.5. The radiat ive transfer package l ibRadt ran (Mayer and K y l l i n g , 2005) is employed to compute cloud top radiances. In section 2.6, the setup of the forward model including l i bRad t ran wi l l be described, and I conclude the chapter by demonstrat ing that the forward model is able to reproduce the B T D relationships of B a u m et a l . (1994) and Perez et al . (2000). 31 Chapter 2. Radiative Transfer and Development of the Forward Model 2.1 D Y C O M S - I I Data The D Y C O M S - I I field campaign took place from Ju ly 7, 2001, to Ju ly 28, 2001. Dur ing nine nights, research flights collected extensive in-situ datasets in the nocturnal Sc cloud layer over the east Pacif ic ocean off the coast of Cal i forn ia, approximately 350-400 k m west southwest of San Diego. The campaign and the available data are described by Stevens et al . (2003a,b). Wh i le the major objective of the cam-paign was to perform measurements to advance understanding of entrainment and drizzle processes, the collected data are also very useful for this remote sensing study. Dur ing seven nights, circles wi th an approximate diameter of 60 k m (30 min) were flown at sev-eral heights in the boundary layer (subcloud and cloud layer). Addi t ional ly , frequent vert ical profiles were taken. D a t a useful for this study includes the measurements taken dur ing vert ical profi l ing and in horizontal ly advected circles flown just below cloud top and above cloud base. Besides the standard meteorological measurements of temperature, humidity, pressure and wind speed, data of l iquid water content, droplet concentration and droplet sizes were taken (a complete list can be found in Stevens et a l . (2003b)). The ground speed of the airplane was about 100 m s _ 1 and measurements relevant for this work were taken every second. To ensure comparabi l i ty w i th the M O D I S data, I have averaged the D Y C O M S - I I data over 10 s intervals in order to yield measurements on the 1 k m scale. The measured droplet size distr ibutions allowed for the computat ion of effective radius and mean radius. For this thesis, the flights of Ju ly 11 and Ju ly 13 (RF02 and R F 0 3 in the D Y C O M S - I I l iterature) were chosen. Satell i te images of both scenes contain a number of ship tracks that provide contrast ing droplet sizes which can be used to check the physical consistency of the retrieval. Wh i l e the Ju l y 11 case wi l l serve as the retrieval example in Chapter 4, data from both nights are used in this chapter for evaluating the cloud model. The actual satellite images from the M O D I S sensor wi l l be introduced in Chapter 4. 2.2 Scattering and Droplet Spectra 2.2.1 R a d i a t i v e Trans fer E q u a t i o n Figure 2.1 i l lustrates the processes that influence radiat ion propagating along a line-of-sight as it passes through the atmosphere. As discussed in section 1.3, energy can be lost by absorpt ion and scattering out of the direct ion of propagat ion, while emission and scattering into the beam represent sources of energy. The change in intensity across an inf initesimal volume hence is the sum of a sink term describing 32 Chapter 2. Radiative Transfer and Development of the Forward Model scattering out absorption F i g u r e 2 . 1 : Processes that influence radiat ion as it passes through the atmosphere, ext inct ion by absorption and scattering, and source terms representing emission and scatter ing 1 5 : : dl = d l e x t + dlemit + d l s c a t . (2.1) The ext inct ion term is described by Beer's Law (1.6): dhxt = -pjds, (2.2) where ds represents an infinitesimal path length along the direct ion of propagat ion. Emiss ion is given by the emissivity of the medium times the P lanck funct ion (1.16): dlemit = PaB{T)ds. (2.3) Here, Kirchhof f 's Law (1.17) has been used to express the emissivity in term of the absorpt ion coefficient. The scattering source term, however, is more complicated. The scattering phase function p{Cl'', Cl) ex-presses the idea that radiat ion from any direction Cl' passing through an inf ini tesimal volume can con-tr ibute scattered radiat ion to our direction of interest Cl. Furthermore, d l s c a t must be proport ional to the scattering coefficient @ s , which describes how much scattering occurs: dlscat = T- \ P ( « ' , Cl)I{Cl')dCl'ds (2.4) The scattering phase function can be interpreted as a probabi l i ty density. p(Cl', Cl) gives the probabi l i ty that a photon from direction Cl' is scattered into direction Cl. Hence, p(Cl', Cl)I(Cl') described the gain in 1 5 Note that I am still omitting the subscript A. All equations given here correspond to the monochromatic case. 33 Chapter 2. Radiative Transfer and Development of the Forward Model intensity in direct ion Cl due to scattered radiat ion from direct ion Cl'. In order to compute the tota l energy gain due to scattered radiat ion, contributions from al l directions are summed in the integral, where the factor of 4w arises from the spherical integration to ensure that the integral over the phase function is one. Pu t t ing the ext inct ion and source terms into (2.1), the radiat ive transfer equation becomes dl = -peIds + paBds + — f p(Cl', Cl)I{Cl')dCl'ds. (2.5) In this form, it has a general three-dimensional character, i.e. the direct ion vectors Cl can be expressed in any coordinate system, and the infinitesimal path length ds can be in any direct ion. The most general approach to solve the radiat ive transfer problem numerical ly is the Monte Carlo method (e.g. Bohren and Clo th iaux, 2006). It allows for arbi t rary three-dimensional scenes by simulat ing the propagation of a large number of photons through the medium. The path of each photons is traced from its original source (e.g. the sun) unt i l it leaves the defined scene (e.g. at the top of the atmosphere). Wh i le this method is very flexible and allows for inhomogeneities in the cloud layer, i t is computat ional ly very expensive and currently not suited to compute the T O A radiances for a large number of clouds, as needed for a L U T . For the reasons discussed in Chapter 1, the plane paral lel approximat ion is a good assumption for large parts of marine Sc on the scale of the M O D I S pixels (1 km). In order to keep the forward problem as simple as possible and at the same t ime computat ional ly tractable, I decided to construct the L U T for plane paral lel clouds. In fact, most analyt ic solutions and approximations to the radiat ive transfer equation have been developed for the plane parallel case (Petty, 2006). In a plane paral lel atmosphere, al l relevant parameters (i.e. pa, (3S, T, reff, cf. section 1.3) are only dependent on height. Since the important aspect for radiat ion is how much absorbing, emitt ing and scattering atmosphere it must traverse, it is convenient to express the vert ical coordinate in terms of the optical properties of the atmosphere rather than in geometrical units. The ext inct ion optical depth is defined as the opt ical thickness of the atmosphere between the top of the atmosphere and a level at height z : 1 6 (3e(z')dZ'. (2.6) A t the top of the atmosphere, r = 0. Furthermore, directions are usually expressed in terms of zenith 1 6Some authors use optical depth synonymous for optical thickness when referring to the cloud optical thickness; in order to avoid confusion I will use optical depth for the vertical coordinate in the R T E and optical thickness for the cloud property. 34 Chapter 2. Radiative Transfer and Development of the Forward Model angle 9 (measured from directly overhead; 6 = 0 is overhead, and 6 = TT/2 is the horizon), and azimuth angle 0 (measured counterclockwise from a reference point on the horizon). The inf initesimal path length ds in (2.5) can now be expressed as where n = cos 9. F rom (2.6) it follows that dr = —0edz = —{3e/j.ds. D iv id ing (2.5) by this new height increment, the radiat ive transfer equation becomes The problem of comput ing the T O A radiances "seen" by the satell ite sensor now becomes the problem of solving this R T E . A common simpli f icat ion in order to solve the R T E is to divide the zenith angle into discrete intervals, so-called streams. The simplest of these methods is the two-stream method, which divides the intensity field into only two directions; upwell ing and downwelling. Hence, it assumes that the intensity is ap-proximately constant in each hemisphere. The mult i -stream code D I S O R T (Stamnes et a l . , 1988, 2000) generalises this concept to an arbi t rar i ly large number of discrete angles, so that i t becomes possible to accurately compute radiances for pixels that are not directly underneath the satellite. The code is well documented and already part of the radiat ive transfer package l ibRadt ran . It w i l l be used for the forward model computations in this thesis. 2.2.2 Droplet Size Dist r ibut ion and its Phase Function A big simpl i f icat ion for l iquid water clouds is that the water droplets can be treated as spherical droplets, which makes the computat ion of the phase function wi th M i e theory (Mie, 1908) possible. In contrast, the non-spherical character of ice particles makes the computat ion of the phase function for ice clouds much more difficult (Mayer and Ky l l i ng , 2005). M ie theory employs Maxwel l ' s equations to derive a three-dimensional wave equation for electromagnetic radiat ion, which is solved for boundary conditions at the surface of a sphere (Petty, 2006). For spherical particles, the phase function only depends on the angle 9 between the original direction $V and the scattered direction fl of a photon. Since cos 0 = fl' • fl, it is common to write the phase funct ion as p(cos 0 ) . (2.7) dl{y., <t>) dr I(fi, 4>) - (1 - w)B - — / / p(JM', </>'• M , <TW, #Wd4i 4TTJ0 J _ X (2.8) 35 Chapter 2. Radiative Transfer and Development of the Forward Model For remote sensing applications it is of interest to model the mean scattering effects of the entire cloud droplet size distribution. A common assumption is to model this distribution as a modified gamma distribution. It is given by (Miles et al., 2000) where T(z) = J0°° tz~1e~tdt is the gamma function (e.g. Weisstein, 2007), vgam the shape parameter, and i"N = reff I\vgam + 2) a scaling parameter. Location and width of the distribution are given by r&jf and vgam. The larger vgam becomes, the narrower the distribution will be. In many studies that employ Mie theory for computing the scattering properties of the cloud layer, the shape parameter is not discussed at all (for instance, in Perez et al. (2000) or Cerderia et al. (2007)). It is common, however, to assume vgam to be constant (Miles et a l , 2000). For instance, the precalculated Mie tables for the A V H R R sensor, as available on the libRadtran homepage17, use a value of 7. Arduini et al. (2005) investigated the sensitivity of daytime retrieved continental cloud properties to droplet size distribution width. They found that the shape of the size distribution can have a significant effect on retrieved properties, therefore, the shape parameter must be representative of the cloud regime to be observed. Miles et al. (2000) compared the droplet size distributions of both marine and continental stratocu-mulus clouds that were reported from different measurement campaigns in the literature. From more than 40 observations, they found a mean shape parameter of 8.6 for marine clouds, however, the observed values varied from less than 3 to over 40. In order to avoid errors arising from an unrepresentative size distribution, I computed the best fit gamma distribution to the DYCOMS-II measured droplet size spectra. The shape parameter was deter-mined from equations (3a) and (3b) in Miles et al. (2000): Here, reff is the effective radius as computed from (1.11), and rvoi is the volume mean radius. It is 17http://www.libradtran.org/ (2.9) (2.10) Vi gam — 36 Chapter 2. Radiative Transfer and Development of the Forward Model computed from the measured spectra v ia (cf. section 1.3) 1/3 (2.11) Figure 2.2a shows a size distr ibut ion measured at cloud top dur ing the Ju l y 11 flight. The effective radius is 11.6 / jm, and the best fit gamma distr ibut ion has a shape parameter of 33.5, significantly narrower than the average value of 8.6 found by Mi les et al . (2000). Indeed, the average shape parameter encountered at this day was 26, the distr ibut ion of which is also plotted in the figure (see section 4.1 for more details). Not al l measured distr ibutions were well represented by a modif ied gamma distr ibut ion. In fact, Mi les et al . (2000) note that Sc droplet size distr ibutions are frequently b imodal . A n example of such a distr ibut ion, measured close to cloud base on Ju ly 13, is shown in F igure 2.2b. Here, the best fit d istr ibut ion computed from (2.10) has a shape parameter of 10.4. Since ugarn is not a variable parameter in my retrieval algor i thm, it has to be set to a fixed value as representative of the scene as possible. For the specific test case discussed in Chapter 4 (and also for the remaining D Y C O M S - I I cases), it is possible to analyse the in-si tu data prior to bui ld ing the L U T . Hence, a representative mean value can be found. For future applications of the method that require independence of the L U T of the scene to be retrieved, sensit ivity tests should be conducted in order to investigate the impact of a fixed gamma distr ibut ion shape parameter on the retrieved values. Figure 2.3a shows the phase functions of droplet size distr ibutions w i th effective radi i of 2, 11.5 and 25 /um and a shape parameter of 26 for radiat ion at A = 3.7 fim. In F igure 2.3b, the phase function for the 11.5 / jm distr ibut ion (as in Figure 2.2a) is reproduced in a polar plot (note the logari thmic scale in both figures). The functions were computed using M ie code included in K . F . Evans S H D O M (Spherical Harmonic Discrete Ordinate Method, another radiat ive transfer model) d istr ibut ion (Evans, 1998). A dist inct feature worth mentioning is the strong forward scattering peak. It becomes more pronounced wi th increasing re/f, which means that for this cloud a large amount of the scattered photons wi l l be scattered into the forward direction (this is also true at visible wavelengths, the reason why sunlight penetrates even thick clouds). A lso, more photons are scattered into the backwards direct ion than to the sides. The smal l peak at about c o s 0 = —0.8 becomes more pronounced in the visible wavelength range, where it is seen as the rainbow. 37 Chapter 2. Radiative Transfer and Development of the Forward Model in-situ spectrum of 2001-07-11 09:52:49+00:00 UTC i 1 1 1 1 ' 1 1 droplet radius [/xm] (a) in-situ spectrum of 2001-07-13 08:21:16+00:00 UTC droplet radius [/im] (b) F i g u r e 2.2: In-situ measured spectra of cloud droplet sizes from the J u l 11th and J u l 13th flights, together w i th best-fit gamma distr ibutions for (a) a unimodal d ist r ibut ion and (b) a b imodal distr ibut ion. Sol id black lines mark the in-situ data, dashed lines are the best fit dis-tr ibut ions. The average shape parameter on both days was approximately 26, this distr ibut ion is shown by the dotted line. 38 Chapter 2. Radiative Transfer and Development of the Forward Model 270 (b) F i g u r e 2 .3 : (a) Phase functions of cloud distr ibutions following a gamma distr ibut ion wi th a shape parameter of 26. Shown are functions for effective radi i of 2, 11.5 and 25 yitm. 0 represents the scattering angle, cos(0) = 1 is forward scattering, (b) Polar view of the phase funct ion for 11.5 /m i , also wi th a logari thmic scale. 39 Chapter 2. Radiative Transfer and Development of the Forward Model 2.3 Adiabatic Cloud Model In order to estimate the effect of the vert ical strat i f ication in the cloud layer on forward radiative transfer calculations, Brenguier et al . (2000) compared the vert ical ly uni form (VU) cloud model (employed by Minn is et al . (1995); Heck et al . (1999); Perez et al . (2000); Gonzalez et al . (2002) or Perez et al . (2002)) to an adiabatic stratif ied (AS) one. They found that the adiabat ic strati f ied model in general is the more accurate choice. F i rs t , in-si tu data analysed by Brenguier et a l . (2000) of marine and continental Sc taken during the Second Aerosol Character isat ion Exper iment ( A C E - 2 ) showed a close-to-adiabatic strati f ication. Second, by comparing the radiative properties of clouds represented by a vert ical ly uniform model and by the adiabatic stratif ied one, they found that in order to achieve radiat ive equivalence, the effective radius of the V U model had to be 80% - 100% of the cloud top reff of the A S model, the actual value depending strongly on cloud geometrical thickness and droplet concentrat ion. These results mean that the effective radius retrieved by a scheme employing a V U model is not necessarily representative of the actual cloud top reff, since the V U model provides no way to retrieve cloud geometrical thickness or droplet concentration. B y employing the A S model, on the other hand, it is possible to retrieve droplet concentration v ia the adiabatic relationships between the thermodynamic variables. Th is could be especially useful for monitor ing of the aerosol indirect effect. Indeed, the vert ical profiles measured during the D Y C O M S - I I campaign closely resemble adiabatic profiles. F igure 2.4 shows in-si tu data taken during R F 0 2 (July 11), together w i th the vert ical profile of an idealised adiabatic cloud. W h e n cloud droplet number concentration, humidity, temperature and pressure are prescribed at a reference level, i t is straightforward to compute an adiabatic cloud profile using the thermodynamic equations. As thermodynamic variables I use the moist static energy hm (e.g. Wal lace and Hobbs, 2006) and the tota l specific humidi ty qx (i.e. water vapour plus l iquid water). In this thesis, "adiabat ic profile" refers to a profile in which hm and qx are constant. If cp denotes the specific heat of dry air at constant pressure (J K _ 1 ) , g the earth's gravitat ional acceleration (m s ~ 2 ) , Lv the latent heat of evaporation (J k g - 1 ) and qv the specific humidi ty (g k g - 1 ) (Curry and Webster, 1999; Wal lace and Hobbs, 2006), hm is given by hm = cpT + gz + Lvqv. (2.12) 40 Chapter 2. Radiative Transfer and Development of the Forward Model droplet concentration [cm 3 70 90 110 L W C [g/m3] 0.2 0.4 0.6 0.8 1.0 285 287 289 temperature [K] 2 4 6 8 10 particle radius [/xm] F i g u r e 2.4: Vert ical profile of cloud properties from the idealised adiabat ic cloud model , together w i th in-si tu measurements from the Ju ly 11 D Y C O M S - I I flight. The adiabat ic cloud has a cloud top effective radius of 10.7 /j,m, a cloud top temperature of 284.8 K and a droplet concentration of N = 110 c m - 3 . Volume mean radius is converted to effective radius wi th k = 0.8. Note that in order to fit the observed l iquid water content the droplet concentration is overestimated in the adiabatic model (compare to Figure 2.6). 41 Chapter 2. Radiative Transfer and Development of the Forward Model Fol lowing the findings of Mi les et al . (2000), the total droplet concentration (cm 3 ) (2.13) is also assumed to be constant in my model. The above equations result in linear temperature profiles fol lowing T(z)=Tsfc-Tz, (2.14) where T represents the dry adiabatic lapse rate T = Td « 10 K k m - 1 below cloud base and the saturated adiabatic lapse rate r = Ts wi th in the cloud. T s varies wi th temperature and pressure, for marine Sc a value of r s w 6 K k m - 1 is representative (Curry and Webster, 1999). Below cloud base, no condensation occurs, so that the specific humidi ty remains constant wi th height: qv(z) = const. (2-15) W i t h i n the cloud, qv follows the saturat ion specific humidi ty qs: qv(z) = q.(T,p), (2.16) where qs is dependent on both temperature and pressure and can be obtained from the Claus ius-Clapeyron equation (see C u r r y and Webster, 1999, for further details). The l iquid water content can be determined by subtract ing the saturat ion water vapour content (the saturat ion specific humidi ty expressed in g m~ 3 ) from the total water content (given by the specific humidi ty below cloud base), qi also follows an almost l inear profile w i th in the cloud: qi{h) = Cv>h, (2.17) where h denotes the height above cloud base, and the moist adiabatic condensate coefficient Cw varies from about 1 to 2.5 x 1 0 - 3 g m ~ 3 m - 1 , depending sl ightly on temperature (Brenguier et a l . , 2000). In my model , cloud top and surface pressure, cloud top temperature, c loud top l iqu id water content and droplet concentration are prescribed. F rom the latter two, the volume mean radius rvoi is computed, 42 Chapter 2. Radiative Transfer arid Development of the Forward Model given the relationship 9 = ^ / . (2-18) In order to compute the effective radius from the volume mean radius or vice versa, an assumption about the droplet size distr ibut ion has to be made. If the distr ibut ion is model led as a modif ied gamma distr ibut ion as discussed in the previous section, (2.10) can be used for the conversion. More common in the l i terature, however, is to employ the fc-value parameterisat ion suggested by M a r t i n et al . (1994). M a r t i n et a l . (1994) analysed in-situ measurements from marine and continental Sc of several field campaigns. In their data, the clouds were relatively homogeneous and entrainment played a minor role. For these cases, they proposed to parameterise the effective radius in terms of the volume mean radius and a dimensionless constant fc following reff = k - ^ 3 r v o l . (2.19) In mari t ime airmasses, M a r t i n et al . (1994) found an average value of k = 0.80 ± 0.07 (one standard deviat ion). Measured values of Pawlowska and Brenguier (2000) dur ing A C E - 2 conf irm the estimates made by M a r t i n et al . (1994), however, they also show a strong variabi l i ty of k w i th height, w i th values at cloud base ranging from as low as 0.4 to up to 1, while values at cloud top where more constant around 0.8 to 1.(Figure 2.5). Combin ing (2.10) and (2.19), the relationship between fc-value and vgam becomes * V m = 2 ( V 1 / 3 - l ) _ 1 . (2.20) A dist inct problem in Figure 2.4 is that the droplet concentration is overestimated by about 30 c m - 3 in order to fit the observed l iquid water path and droplet sizes. A lso , the slope of the l iqu id water content line seems sl ightly overestimated. Th is problem is due to subadiabatic regions in the cloud. For instance, as discussed in section 1.2, entrainment of dry air f rom above the inversion often causes subadiabaticity. The cloud column can hence be adiabatic in the lower part and subadiabat ic in its upper part. Brenguier et al . (2000) emphasise that the A S model st i l l is an idealised representation of the actual cloud. They note that the adiabatici ty should be taken as a max imum reference for the actual cloud microphysics at al l levels. However, they also point out that the variety of possible profiles is too large for a simple parameterisat ion, and hence suggest the A S model as a simple and relatively accurate 43 Chapter 2. Radiative Transfer and Development of the Forward Model 25 June 1997 26 June 1997 0.4 0.6 0.8 d 3 / d 3 0.6 0.8 d 3 / d 3 F i g u r e 2 .5 : Measurements of k value versus height in marine stratocumulus clouds. The k value varies widely from values as low as 0.4 to almost 1, w i th a larger var iabi l i ty at cloud base (reprinted from Pawlowska and Brenguier, 2000, © 2000, w i th permission from Blackwel l Publ ish ing) . compromise solut ion. For comparison, Figure 2.6 shows the same data as Figure 2.4, but this t ime wi th a subadiabatic profile using 75% of the adiabatic values. This t ime, both droplet concentrat ion and l iquid water content are well-fit. In their investigation of radiat ive equivalence between the V U and A S models, Brenguier et al . (2000) found that a major difference between the two models is the dependence of opt ical thickness on cloud geo-metr ical thickness. Equat ion (1.9) introduced the opt ical thickness as the integrated ext inct ion coefficient between cloud base Zbot and cloud top ztop: •> Zbot Pedz. (2.21) If the ext inct ion cross section cre = 1 7 0 + o~s (cf. 1.12 and 1.13) is wr i t ten in terms of the extinction efficiency Q e x t = i r e / (27rr 2 ) , the ext inct ion coefficient of a droplet size d ist r ibut ion is given by Pe = J n(r) {Qext{rW} dr. (2.22) The ext inct ion efficiency can be interpreted as how effective a part icle can attenuate radiat ion wi th respect to its size, and is a complex function of the size parameter x, and hence part icle size and wavelength. In order to compute the optical thickness in the A S model, it is necessary to express (2.21) in terms 44 Chapter 2. Radiative Transfer and Development of the Forward Model 0.7 0.6 0.5 0.4 43 •55 0.3 0.2 0.1 0.0 droplet concentration [cm 3] 70 90 110 L W C [g/m3] 0.2 0.4 0.6 0.8 1.0 1 1 1 1 1 1 — i — i — i — i — i — • D • O V • 0 \° V \-o D O i p 0 1 P / • / • p y <>•/ y reff \LWC \Z / Vvol | %/: /* \ \ \ \ \ t f l \ ' \ \ \ \ X. 1 1 1 1 m \ % spec, humidity [g/kg] • 285 287 289 temperature [K] 2 4 6 8 10 particle radius [urn] F i g u r e 2 .6 : The same as Figure 2.4, but wi th a subadiabatic l iqu id water content of 75% of the adiabatic value. of the available thermodynamic variables. B y combining (2.11), (2.13) and (2.18), we can write the l iquid water content as a function of the particle size distr ibut ion: 4n o dr. (2.23) Th is enables us to express r in terms of the droplet radius and qi ( Jzbot qiw . Jz, Ztop Zbot Jn(r) [Qext(r)irr(z)2] dr J n(r) Pi-^rizf dr qi(z)dz. (2.24) J For cloud droplet sizes in the visible wavelength range, the ext inct ion efficiency Qext is approximately constant at a value of two (e.g. Pet ty, 2006, Figures 12.4 - 12.6), so that it can be pul led out of (2.24). The integrals can then be wri t ten as the effective radius: 4pi rztop J Zl,„t reff{z) dz (2.25) 45 Chapter 2. Radiative Transfer and Development of the Forward Model (e.g. Naka j ima et a l , 1991), which reduces to (1.14) for Q e x t = 2 and constant reff. For a V U model , the integral over height can be approximated by a mult ip l icat ion of the integrand by cloud geometrical thickness AH, whereas in the A S case it has to be expl ic i t ly computed. In the infrared wavelength range, however, Q e x t and thus the opt ical thickness of the cloud become a funct ion of both cloud geometrical thickness and particle size dist r ibut ion (Petty, 2006). Thus, there no longer is a simple relationship between r, L W P and reff. For simplici ty, I wi l l use the visible opti-cal thickness in my forward model. Th is has the advantage that the opt ical thickness retrieved by my algor i thm can be compared w i th the preceding and subsequent operat ional dayt ime M O D I S retrievals. 2.4 Choice of Channels and Correlated-k The M O D I S instrument offers a total of 36 channels in the visible and infrared spectrum (Barnes et a l . , 1998), 16 of which cover the infrared spectrum from 3.7 to 14.1 ^ m . Not al l of these channels are suited for remote sensing of cloud properties. A s discussed in Chapter 1, the radiat ive properties of the cloud droplets in the infrared strongly depend on particle size. Hence, the employed channels should maximise these differences. However, it is also desirable for the satellite sensor to measure photons arr iv ing directly f rom the cloud. Th is means that the impact of the atmosphere above the cloud on the radiat ion arr iv ing at the satellite should be as smal l as possible. F igure 2.7 shows how the transmittance of a typical midlat i tude summer atmosphere varies wi th wavelength. The bot tom panel displays the total transmittance, while in the panels above, the effects of ind iv idual atmospheric components are shown. The four channels at approximately 3.7 ( M O D I S number 20), 8.5 (29), 11 (31) and 12 (32), as used in previous studies (cf. Chapter 1), are highlighted. They are located at wavelengths at which the total transmittance is large. Other M O D I S channels are located, for instance, at approximately 4.5 or 9.7 f im. Wh i le these channels provide data for other applications, the atmosphere at these wavelengths is too opaque for remote sensing of c loud properties. Hence, channels 20, 29, 31 and 32 are indeed best suited for the remote sensing of cloud properties at night. The second panel from the bot tom shows that most of the absorpt ion that occurs above cloud top is due to water vapour. It should thus be considered in the retrieval method. I wi l l discuss in section 2.5 how the effect of water vapour can be accounted for. The forward model must account for the band wid th of the chosen channels in the radiat ive transfer model. Th is requirement arises from the fact that the M O D I S channels are not monochromatic, but cover finite wavelength intervals. Hence, the "monochromat ic" radiances measured by the instrument are 46 Chapter 2. Radiative Transfer and Development of the Forward Model Z E N I T H A T M O S P H E R I C T R A N S M I T T A N C E U V I VIS Near IR Thermal IR I CO r N 2 0 o . 0.3 0.4 0.5 0.6 0.8 1 1.2 1.5 2 2.5 3 4 5 6 7 8 910 12 15 20 25 30 40 50 Wavelength [u.m] F i g u r e 2 .7 : Dependence of atmospheric transmittance on wavelength for a typical midlat i tude summer atmosphere. Highl ighted in grey are the four M O D I S channels relevant in this study (reprinted from Petty, 2006, © 2006, wi th permission from Sundog Publ ish ing) . 47 Chapter 2. Radiative Transfer and Development of the Forward Model spectral-mean radiances. In order to compute spectral-mean radiances wi th the radiative transfer model , the wavelength depen-dent absorpt ion of the different molecular species l isted in Figure 2.7 has to be taken into consideration. For an extact solut ion, an integration over many indiv idual ly computed monochromat ic radiances has to be performed, which can be computat ional ly expensive. The correlated-k procedure (e.g. F u and L iou , 1992; K r a t z , 1995) provides a way to reduce the computat ional cost by several orders of magnitude while maintaining a high accuracy and is used in this thesis. The concept of correlated-k can be explained by considering the simplif ied problem of computing the spectral-mean transmission of an atmospheric layer, due to only molecular abso rp t i on . 1 8 W h e n the ab-sorption coefficient of an atmospheric constituent is plotted against wavelength, the absorpt ion spectrum in general is very complex. For example, F igure 2.8a shows a spectrum of C O 2 at a pressure of 507 h P a . The distinct peaks in the spectrum are called absorpt ion lines, they can be explained by quantum theory (e.g. Pet ty , 2006) and their height and wid th are dependent on both temperature and pressure. In the spectral interval of a satellite channel, there can be thousands of such lines, especially if the effects of different molecular species are overlapping. F i rs t consider a single absorbing species. Let the spectral interval w id th of a channel be A A = A 2 — A i , and the mass path of the absorbing constituent be denoted by u. The mass path is the total mass of the species in a column, measured in kg m - 2 . It is then convenient to introduce the mass absorption coeff icient 1 9 kx = (2.26) P where p is the density of the constituent. The spectral-mean transmittance follows from the integration of Beer's Law over wavelength: tAX(u,p, T) = jf 2 exp [-kx(p, T) u) dX. (2.27) Here, p denotes pressure and T temperature. The fundamental idea of correlated-k stems from the fact that in order to approximate the integral in (2.27) numerically, it is not important in which order the indiv idual lines are arranged in the spectrum, as long as the area underneath the curve stays the same. Figure 2.8 i l lustrates this idea. If the complex 1 8 T h e layer concept is relevant in as far as that DISORT and other R T E solvers divide the atmosphere into discrete layers. 1 9 D o not confuse the mass absorption coefficient "k" used in correlated-k with the k-value used in the conversion of rvoi to reff. 48 Chapter 2. Radiative Transfer and Development of the Forward Model I F i g u r e 2 .8 : I l lustrat ion of the fundamental idea of correlated-k. (a) Absorp t ion spectrum (k-coefficients) of C O 2 at a pressure of 507 h P a . (b) Inverse cumulat ive probabi l i ty funct ion k(g), that represents the reordered fc-coefficients from (a). For a numerical integrat ion, the function in (b) requires much less quadrature points than the function in (a) (reprinted from Mlawer et a l . , 1997, © 1997, wi th permission from the Amer ican Geophysical Un ion) . spectrum of the absorpt ion coefficient in Figure 2.8a is integrated numerical ly, a large number of quadra-ture points is needed in order to obtain a reasonable accuracy. 2 0 However, by reordering the absorption lines in ascending order (Figure 2.8b), we get a function that is much easier to integrate, yet covers the same area. The reordering of the lines can be done by comput ing the inverse cumulat ive probabi l i ty distr ibut ion k(g) of the absorpt ion coefficients. The cumulative probabi l i ty funct ion g(k) = f£ p(k')dk' is the integral of the probabil i t ies p(k') for al l mass absorpt ion coefficients k' between 0 and k, so that its inverse k(g) gives the upper bound of the (3 x 100)% smallest ks. For instance, 60% of al l fc-values in the spectral interval of the channel are smaller than k(g = 0.6). Equat ion (2.27) can then be wri t ten as tAx(u,p,T)= [ exv[-kg(p,T)u}dg. (2.28) Jo For the numerical integration, the integral is approximated by a finite sum: N t A A ( u , P l r ) » X l w * E X P [ ~ K * ( P . T ) u l . ( 2 - 2 9 ) i=l wi th quadrature weights u>; and quadrature points gj. For a funct ion shaped as in Figure 2.8b, much 2 0Calculations of all individual lines are called line-by-line calculations. 49 Chapter 2. Radiative Transfer and Development of the Forward Model less quadrature points are needed than for one shaped as in 2.8a, and it is hence computat ional ly more efficient. The method of reordering the absorpt ion coefficients is cal led the k-distribution method. Since the ks are s imply reordered and no approximat ion is made, it is exact. However, in a typ ica l atmospheric layer pressure and temperature wi l l vary wi th height, and since the ks are dependent on p and T (2.28) is only exact for vert ical ly homogeneous layers (homogeneous mass paths). Nevertheless, the same equation is used to approximate the transmissivity of inhomogeneous mass paths. The assumption that is made in the correlated-k method is that for any p, T found in the atmosphere, any part icular absorpt ion coefficient k(p, T) wi l l always have the same cumulative probabil i ty. Thus , the absorpt ion coefficients at different pressure and temperature levels are correlated wi th each other. In other words, the shape of k{g) is assumed to be constant, but the magnitudes of the ks are scaled w i th p and T (e.g. Pet ty , 2006). Frequently the absorption wi th in a spectral interval arises from overlapping lines of two or more molecular species. The standard procedure in this situation (Kra tz , 1995) is to assume the spectral features of the species to be uncorrelated. The mean transmittance for a combinat ion of two constituents wi th mass paths ui and u2 can then be expressed as a double summat ion: N M t&x{ui,u2,p,T) « i W i e x P [-ki(p>T)ui]} {WJ exp [-kj(p,T)u2]} . (2.30) »=i 3 = 1 K r a t z (1995) computed optimised ^-coefficients for the five A V H R R channels. He later appl ied the same technique to several of the M O D I S channels (Kra tz , 2001). Table 2.1 lists the M O D I S channels centred at 3.7, 8.5, 11 and 12 /mi , together wi th the spectral intervals taken into consideration in the correlated-k routines, and the atmospheric constituents that contribute to the absorpt ion in each channel. The number of fc-coefficients required for the integration is as low as one to five, depending on the gas. T a b l e 2 . 1 : Spectral intervals (as considered in the K r a t z (2001) correlated-k routines) of the four M O D I S channels implemented in the forward model, the atmospheric constituents whose absorpt iv i ty is accounted for, and the number of fc-coefficients required for the spectral integration of each species. channel centre wavelength (/jm) wavelength interval (/mi) considered gases (no. of fcs) 20 3.748 3.656 - 3.839 H 2 0 (4), C 0 2 (1), C H 4 (1) 29 8.553 8.333 - 8.772 H 2 0 (5), 0 3 (1), C H 4 (1), N 2 0 (2) 31 11.010 10.526 - 11.494 H 2 0 (5), C 0 2 (1) 32 11.920 11.494 - 12.346 H 2 0 (5), C 0 2 (1) The K r a t z (2001) M O D I S routines were not part of l ibRadt ran at the t ime the work on this thesis 50 Chapter 2. Radiative Transfer and Development of the Forward Model was conducted. I thus implemented the corresponding functions into the model in order to compute the spectral-mean intensities for the desired channels. 2.5 The Overlying Atmosphere Figure 2.7 showed that most of the absorption above cloud top is due to water vapour. A s discussed in Chapter 1, marine Sc often appear under conditions of strong subsidence in subtropical regions. Th is generally implies a low water vapour content of the free atmosphere, nevertheless, it should be investigated how large the impact of the exist ing water vapour is and how it can be accounted for in the forward model. The way the overlying atmosphere (referred to below as O A ) is treated in the existing l i terature varies, but most studies (Perez et a l , 2000; Gonzalez et al . , 2002; Cerdena et a l . , 2007) ignore absorpt ion above cloud top, arguing that the subsiding air is mostly dry and that, most of the atmospheric water vapour is contained in the boundary layer air - so that its effects are already contained in the observed clear sky brightness temperatures (which are used to fix parameters of the cloud model , cf. section 1.5.2). 2 1 The other possibi l i ty (Perez et a l . , 2002) is to compute the bulk absorpt ion effects f rom a given water vapour path observed from an independent source (e.g. microwave soundings or reanalyses of weather forecast models). In order to estimate the impact of the O A for my retrieval scene, I computed absorpt ion and emission effects from radiosonde water vapour and temperature profiles measured by the closest operational mete-orological sounding stat ion in San D iego . 2 2 Two soundings dai ly were available, the nightt ime soundings of Ju ly 11 (RF02) and Ju ly 13 (RF03) are plotted in Figure 2.9. The moisture profiles (middle panels) in fact show a very dry atmosphere above the subsidence inversion, especially for Ju ly 11. Us ing adapted versions of the K r a t z (2001) correlated-k routines, I computed emissivity and transmissivi ty for each layer defined in the sounding data (left and middle panels), as well the profile of the upwell ing radiances (right panels). Since the scattering effects of gaseous particles are negligible for infrared wavelengths (Petty, 2006), scattering was not considered. Trace gas concentrations of 350 ppm for C O 2 , 1.75 ppm for C H 4 and 0.31 ppm for N 2 0 were assumed for the computations, and the O3 profile was taken from the U S standard atmosphere compi led by Anderson et al . (1986). For al l channels, the transmissivi ty of the dry atmosphere encountered on Ju l y 11 is close to 1 k m - 1 2 1 This , of course, assumes that the boundary layer water vapour path of the utilised cloud free pixels equals that of the observed cloudy pixels. 2 2 T h e data are freely available at http://weather.uwyo.edu/upperair/sounding.html. 51 emissivity [(* 10 2)W/m2/sr/cm l] 0.00 0.02 0.04 0.06 0.08 0.10 0.120.0 transmissivity per km 0.2 0.4 0.6 0.8 1.0 radiance [W/m2/sr/cm l] 0.095 0.100 0.105 0.110 0.115 03 PL, <P m xn 1000 o3 OH 1000 I \ / 1 ( 1 1 ; i ) i t \ r-< 1 : J 1 : 7 - -\ i 7-~^-- .' I 220 240 260 280 temperature [K] 0 2 4 6 8 10 water vapour mixing ratio [g/kg] • • ••.} -. i : 1 : 1 i : :i : : l : : l ; i :: 1 :; 1 •: 1 -.' 1 1 •: 1 :: 1 • i i 1 •: 1 •1 1 \ \ r I — 1 , i - — i ^ 1 : \ : !l : l : 1 . . . ; : : 1 : i 1 : 1 ; i : ; 1 j • 1 : ; 1 : : 1 • | ; i 1 ; \ 1 i / .* : / 0.040 0.045 0.050 0.055 [(*10~2) W/ m21 sr I cm-1} o o CM CS3 O o C N C O C N to F i g u r e 2 .9 : Atmospher ic soundings from San Diego f rom (top) Ju ly 11, 2001, and (bottom) Ju l y 13, 2001. Shown are [left] temperature profile (thick) and corresponding emissivity of the atmosphere in the wavenumber ranges of the four M O D I S channels 20, 29, 31 and 32; [middle] humidi ty (thick) and transmissivi ty for the channels; and [right] radiance profiles. No scattering has been considered for the computat ion of the radiances. Channels are marked as sol id-channel 20, dashed-channel 29, dash-dotted-channel 31, dot ted-channel 32. Channel 20 radiances and emissivit ies have been mult ip l ied by 10 2 for better display (bottom scale; top scale for the remaining channels). See text for more details. Chapter 2. Radiative Transfer and Development of the Forward Model at al l layers. Influenced most by the O A is channel 29. O n Ju l y 13, there is a layer of increased humidi ty between 800 h P a and 600 h P a , where the layer transmissivi ty varies between 0.9 and 1 k m - 1 for channels 20, 31 and 32, and between 0.8 and 1 k m - 1 for channel 29. However, the attenuat ion of radiat ion by absorption is part ly countered by emission, so that the total differences between the radiances at the inversion and the top of the atmosphere are small . O n both days, the intensities changed by less that 2% for channel 20, less than 3.5% for channel 29 and less than 0.5% for channels 31 and 32. Wh i le these effects do not contribute much to the observed T O A radiances, they represent a potent ial and - if known - unnecessary source of uncertainty in the retrieval, hence, I decided to account for atmospheric transmission and emission above cloud top in the forward model. The simplest, but also most computer t ime consuming approach to account for the O A in the forward model is to feed the entire sounding profile into the radiative transfer model. However, it is undesirable to make the L U T dependent on a specific atmosphere. A lso, at most locations over the ocean, no radiosonde soundings are available in the vic in i ty of the retrieval scene. Rather, it is often possible to infer estimates of tota l L W P from an independent source as mentioned above. Al ternat ively, it might be feasible to add a tota l t ransmissivi ty t* and average emitted intensity B* of the O A as retrievable parameters to the algori thm. In order to keep my method flexible, I decided to account for O A effects by adding these two parameters to the forward model. Using this approach, the cloud top radiances can be computed independently from the O A , the impact of which can be added afterwards. In my case, t* and B* can easily be computed from the available atmospheric soundings. In the absence of scattering, the R T E (2.8) can be wri t ten as'Schwarzschild's equation: (2.31) where 7 T(oo) denotes the upwell ing intensity at the top of the atmosphere, 7 T (0) the upwell ing atmosphere at cloud top (the bot tom of the O A ) , and W^(z) the weighting function, defined by W\z) = dt(z, OO) _ pg(z) dz fx t[z, oo). (2.32) B y introducing the weighted average Planck function (Petty, 2006) (2.33) 53 Chapter 2. Radiative Transfer and Development of the Forward Model and comput ing the tota l transmittance as the product of al l ind iv idual layer transmissivi t ies, the T O A intensity can be conveniently wr i t ten as a function of cloud top radiance, t* and B*: 7 T(oo) = / T ( 0 )£ * +B*(l-t*). (2.34) In the above equations, the O A is effectively treated as a single homogeneous layer w i th transmittance t* and emitted intensity B*. Unfortunately, both parameters are wavelength dependent, so that they have to be computed for each channel. Wh i le this poses no constraint if the atmospheric profiles are known in advance, it adds addit ional parameters to the retrieval if they were to be inferred from the satellite observations. However, even in this case, it seems feasible to map t* and B* of one channel to a l l other channels employed in the algori thm. Th is idea is left for future work. 2.6 Forward Model 2.6.1 Radiative Transfer Mode l : l ibRadtran The radiat ive transfer package l ibRadt ran , described by Mayer and K y l l i n g (2005), contains a number of tools for radiat ive transfer calculations in the earth's atmosphere. Par t i cu la r ly important for this thesis is its abi l i ty to read in arbi t rary atmospheric and cloud profiles, as well as precomputed phase functions, and to solve the R T E using D I S O R T . 2 3 The package comes wi th the K r a t z (1995) A V H R R correlated-k routines implemented, but, as mentioned above, I had to add the corresponding code for the M O D I S channels (Kra tz , 2001). Clouds are included in the radiat ive transfer calculations by specifying vert ical profiles of height, l iquid water content and effective radius. Phase functions are read in from tables produced by the M ie code included in S H D O M (Evans, 1998), the same that was used to produce the examples in Figure 2.3. In M ie theory, the solution to the electromagnetic wave equation (cf. section 2.2) is expressed as a finite series of Legendre polynomials (e.g. Pet ty , 2006), so that the phase functions are l isted in the M ie tables in terms of Legendre coefficients. These coefficients are directly used by D I S O R T for the numerical solution of (2.8). The atmosphere is div ided into discrete layers, for each of which the radiat ive transfer problem is solved. Absorpt ion coefficients and single scatter albedos are computed from the K r a t z (2001) routines, while the phase funct ion is assumed to be constant over the spectral interval of the channel. D I S O R T is 2 3 T h e complete software package, however, is capable of much more; see Mayer and Kylling (2005) for details. 54 Chapter 2. Radiative Transfer and Development of the Forward Model called once for each fc-coefncient, and the result ing intensities are summed using the weights W{ in (2.29) to compute the channel integrated intensities. 2.6.2 Design of the Forward M o d e l I implemented a system based on P y t h o n 2 4 scripts that automatical ly computes a number of adiabatic cloud profiles and calls l ibRadt ran in order to generate a L U T . The cloud profiles can be computed from either randomly chosen cloud properties drawn from a given interval or from discrete values. The forward model requires the four variable input parameters effective radius reff, total droplet number concentration N, cloud top pressure pct and cloud top temperature Tct (either intervals or discrete values), as well as the fixed inputs surface pressure psfc, gamma dist r ibut ion shape parameter vgam for comput ing the M ie properties of the cloud droplets, k-value for the conversion of rvoi to r" e// 2 5 and the number of datapoints that should be computed (if discrete values of the first four parameters are given, this input is not applicable). The system is able to run in paral lel mode, so that the generation of large L U T s can be performed in reasonable t ime. For instance, the generation of 96,000 datapoints requires about 8 hours on a cluster of 32 2 -GHz processors. After the input parameters are read in, l iquid water content qi is computed from reff and N using (2.18) and (2.19). The specific humidi ty qv is obtained from (2.16) by using Tct and pct- Since l ibRadt ran uses height as a vert ical coordinate, the hydrostat ic equation (e.g. C u r r y and Webster, 1999) is used to relate pressure and height in the cloud model , so that Tsfc can be computed from (2.12). The atmospheric and cloud variables are then integrated using a denned step-size (e.g. 0.2 hPa) from the surface to cloud top. B o t h cloud (z, rejf, qi) and atmospheric (z, p, T, specific humidi ty qv and trace gases) profiles are passed to l ibRadt ran , where for this study, the same trace gas concentrations were used as for the overlying atmosphere in section 2.5. l ibRadt ran is then run to compute the cloud top radiances. The cloud visible opt ical thickness is computed using (2.25), w i th Qext = 2. If desired, the effects of the O A can be accounted for using the procedure described in section 2.5. Precomputed t* and B* as well as radiosonde soundings can be used. As a last step, the T O A intensities are converted to brightness temperatures using the inverse of (1.18). Since the P lanck funct ion also varies over the spectral intervals of the channels, the channel-mean B T can either be approximated at the channel-centre wavelength, or a correction formula suggested by van Deist (2005) can be used in order to account for the polychromat ic i ty of the channels. 2 4 T h e free programming language Python can be obtained at http://www.python.org/. 2 5^pam and k are held separate in order to facilitate sensitivity studies and to avoid time consuming re-computations of the Mie properties for small changes of k. 55 Chapter 2. Radiative Transfer and Development of the Forward Model For al l computat ions, the satellite is assumed to be directly above the cloud (i.e. T O A B T s are computed for nadir v iew), no satell ite zenith angle is considered. F igure 2.10 shows two vert ical profiles of channel 20 (3.7 p.m), 31 (11 ftm) and 32 (12 pm) brightness temperatures for an opt ical ly th in ( r « 3.8) and an opt ical ly thick ( r « 43) c loud. The plots i l lustrate how the attenuation of radiat ion wi th in the cloud layer is simulated by the forward model . A s discussed in Chapter 1, the surface signal st i l l influences the cloud top radiances for the th in cloud case, while for the other case, the cloud emits as a black body in channels 31 and 32. The point at which the B T s of al l three channels coincide wi th the actual temperature is clearly visible. A t this point, al l photons propagating upwards through the cloud that entered the cloud at its bot tom are extinguished, and the cloud emits as a black body wi th its actual temperature T . The intensities observed at cloud top are only determined by emission in this upper part of the cloud (cf. section 1.5.2). F igure 2.11 shows the BTD(3.7-11) and BTD(11-12) signals, each wi th the 11 p.m B T as reference. As in Perez et al . (2000), cloud top temperature is fixed at 285 K. Since in my forward model , cloud top pressure is a fixed parameter as well (in the given case at 900 hPa) , the surface temperatures as computed from the adiabatic lapse rates depend on the cloud geometrical thickness and hence differ for the indiv idual clouds. Thus , the curves do not converge in a single point on the right side, as they do in Figure 1.9. M y model is able to reproduce the relationships found by B a u m et al . (1994) and Perez et a l . (2000). A s expected and discussed in Chapter 1, the changes in B T wi th changing opt ical thickness become smaller for larger T. In contrast to the V U cloud model, the max imum opt ical thickness in the A S model depends on rejf. Consequently, clouds wi th smal l effective radi i cannot become as opt ical ly thick as in the B a u m et al . (1994) and Perez et al . (2000) studies, where both parameters were independent (cf. F igure 1.9). A s noted in Chapter 1, if Tct and pct are known, rejj and r could in principle already be retrieved manual ly from the plots in Figure 2.11. The characteristic of my A S model that surface temperature is obtained f rom the adiabat ic lapse rates is part icular ly pronounced is the case of constant reff, N and T but vary ing pct- As shown in Figure 2.12, changes in cloud top pressure can cause large differences in the B T D signals. In Chapter 4, I wi l l fix cloud top pressure in the L U T in order to simpli fy the retrieval problem. It wi l l hence be important to evaluate the sensit ivity of the retrieval to changes in cloud top pressure (or more accurately, to the pressure difference between cloud top and sea surface) i n order to estimate the impact of sl ightly varying cloud top heights in the scene to the retrieval accuracy. I wi l l come back to this .topic in section 4.4. 56 Chapter 2. Radiative Transfer and Development of the Forward Model e 80 285 290 295 temperature [K] 300 280 285 290 295 temperature [K] F i g u r e 2 .10 : Vert ica l brightness temperature profiles through a th in cloud (left) and a thick cloud (right). Shown are the brightness temperatures of M O D I S channel 20 (3.7 f im; th in sol id l ine), channel 31 (11 u.m; th in dashed line) and channel 32 (12 fim; th in dash-dotted l ine), as well as temperature (T) and potential temperature (0 ) . For the thick cloud, the brightness temperatures of channels 31 and 32 are similar to the temperature of the cloud (i.e. the cloud emits as a black body in this wavelength range), whereas for the th in cloud, the surface temperature st i l l influences the signal. 57 Chapter 2. Radiative Transfer and Development of the Forward Model brightness temperature ch. 31 (llu,m) [K] brightness temperature ch. 31 (llu,m) [K] F i g u r e 2 . 1 1 : Droplet size and cloud opt ical thickness influence radiat ion at the different M O D I S channels to a different extend. Shown is the dependence of brightness temperature (BT ) and B T difference ( B T D ) on varying effective radius and opt ical thickness for a cloud top temperature of 285 K and a cloud top pressure of 900 h P a . Symbols from left to r ight: opt ical thickness of 0.5, 1, 1.5, 2, 3, 5, 8. Note that clouds wi th a smal l effective radius cannot become as thick as clouds wi th a larger r e / / in the adiabatic c loud model (compare to the vert ical ly uni form model of Perez et al . (2000)). 58 Chapter 2. Radiative Transfer and Development of the Forward Model ^ 1.4r | 1.2 l r m "5 0.8 | 0.6 0.4 [ o 0.2 r Q E-1 PQ 0 4um 284 286 288 290 292 brightness temperature ch. 31 (ll|0.m) [K] 294 F i g u r e 2 .12 : The same in Figure 2.11, but for different cloud top pressures (solid-940 h P a , dashed-920 h P a , dotted-900 hPa) . In the adiabatic model , a different cloud top pressure wi th a fixed cloud top temperature results in a different surface temperature. Hence, the left con-vergence points in the figures (surface temperature) are warmer for higher c loud tops. 59 Chapter 3 Nonlinear Regression with Neural Networks and Uncertainty Estimation The forward model I developed in the previous chapter maps given cloud properties to satellite observations computed by the radiative transfer model. F rom a set of such forward computat ions, the unknown inverse function has to be inferred in order to determine the cloud parameters from the observed satellite data. Th is poses a nonlinear mult iple regression problem, which, as proposed in Chapter 1, I wi l l tackle w i th art i f icial neural networks and the Bayesian framework suggested by M a c K a y (1992a,b) and Aires (2004); Aires et al . (2004a,b). M y objective for this chapter is to describe this approach and to discuss its appl icabi l i ty to my remote sensing problem. For simplicity, the original M a c K a y (1992a,b) publ icat ions, as well as the book by Bishop (1995) and the review paper by M a c K a y (1995), only discuss networks wi th one output variable and hence only one output uncertainty. For mult idimensional mappings, M a c K a y (1995) suggests the use of ful l covariance matrices to describe the output uncertainties, but does not elaborate on this idea. Aires (2004); Aires et a l . (2004a,b) point out the importance of uncertainty estimates for mult idimensional mappings that arise in atmospheric inverse problems, and expand the or iginal M a c K a y (1992a,b) method to networks having mult iple input and output variables. In this chapter, I wi l l first formulate the regression problem in a general context (section 3.1). Th is introduct ion (mainly based on the book by Bishop (1995)) is useful i n order to understand the problems that arise in neural network training for which the Bayesian framework provides some remedy. I wi l l then derive the theoretical foundation of how the uncertainty in the network predict ion and the Jacobian can be estimated (sections 3.2 - 3.5). The derivation wi l l be i l lustrated by an appl icat ion of the method to a simple problem, which wi l l demonstrate important benefits but also highlight problems. 60 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation I implemented the Aires method by extending the N E T L A B toolbox by Nabney (2002), a set of routines implemented in M a t l a b 2 6 . Some information about the implementat ion is given in section 3.6; in section 3.7 I discuss the usefulness of the method for my remote sensing problem. 3.1 The Nonlinear Regression Problem Consider the problem of nonlinear regression on noisy data, for which M a c K a y and Aires et a l . derive their methods. The data consist of a dataset S = {X, D} w i th input data X = {x1, x 2 , x N } and target (or observed) data D = {t1, t2, ...,tN}. xn and tn denote vectors that contain al l input and target variables, respectively. We assume the inputs X to be exact, while the target data D are noisy. Str ic t ly speaking, for the inverse problem, we deal wi th the opposite case; the input data - the satellite measurements - are noisy, whereas the target data - the inputs to the forward model - are known exactly at the t ime the dataset S is generated. I wi l l come back to this issue shortly. 3.1.1 Er ror Functions and Overfitting Once the dataset S has been observed, we wish to gain information about the funct ion or relat ion which generated the targets from the inputs. Since the target data are noisy, they are composed of two parts - the underly ing generator, representing the physical relationship we wish to capture, and the noise. Let M denote a stat ist ical model for S. M. wi l l be a neural network shortly, but first consider a polynomial of order K: K y(x) = w0 + w\x + ... + WKXK = '^2wjx'>• (3-1) 3 = 1 This model maps an input variable x to an output variable y, the prediction. The exact form of the model function depends on the order of the polynomial and the values of the parameters Wj. Hence, following Bishop (1995), I wri te y = y(x;w) (where al l parameters wj are grouped into one parameter vector w). A standard way to find the best possible model architecture for a given S is to consider the errors between the model predictions yn and the desired values, the targets, tn for the N data pairs in S. In order to obtain a good fit, the differences yn — tn should be as smal l as possible, which can be achieved by minimis ing the sum-of-squares error (cf. the cost function (1.22)) E=\j2{y(xn-w)-n2. (3.2) Z n = l 26http://www.mathworks.com/ 61 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation ' 0 0.5 1 0 0.5 1 F i g u r e 3.1: Left: a dataset S, from which we would like to infer informat ion about the underlying generator, i.e. the function from which the values were generated. Right : the sine function from which the data were produced. However, i f the shape of the generating function is not known, it is difficult to judge different models based only on their sum-of-squares error (compare Figure 3.2). However, wi l l the model wi th the smallest error E best represent the under ly ing generator of the data? In fact, this often is not the case. The phenomenon is known as overfitting in the l i terature, and describes the problem of f i t t ing the noise rather than the generating function. Its major effect is that models that fit the training data S too well do not generalise well. In this case the model , being presented wi th an input value which is not part of the training dataset, predicts a value far f rom the desired one. Figures 3.1 and 3.2 i l lustrate the problem. In the left panel of F igure 3.1, eleven datapoints are plotted, which are generated by adding random noise to the sine funct ion depicted in the right panel. F igure 3.2 shows two polynomials fitted to the data, in the left panel a cubic polynomial , in the right panel a polynomial of order ten. Since the underlying generator of the data is known, we easily judge that the cubic polynomial is a better representation of the original sine function. The sum-of-squares error, however, is much smaller for the higher order polynomial , since it perfectly fits the t ra in ing data. If we knew nothing about the original function, how could we judge which model is b e s t ? 2 7 The same problem arises if neural networks are used instead of polynomials. For simpl ic i ty of the derivation and implementat ion, I restrict myself to a two-layer perceptron (i.e. one layer of hidden units) as used by Aires et al . A s discussed by, for instance, Bishop (1995), this network is already capable of approximat ing arbi t rary functions. Mathemat ical ly , an A N N such as the one depicted in F igure 1.10 computes the output values from the following equations. The inputs X{ are weighted by the parameters of the first layer, then summed 2 7Concluding that less complex models (i.e. lower degree) are better is also misleading — consider, for example, a linear polynomial for the given case. 62 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation F i g u r e 3 .2 : Left: a cubic polynomial (solid line) fitted to the data f rom Figure 3.1, together w i th the generating sine function (dashed). Right : a polynomial of order ten, also plotted w i th the generating sine function. It is easily seen by eye that the cubic po lynomia l represents the sine funct ion better, however, the sum-of-squares error is smaller for the right polynomial . Th is phenomenon is known as overfitting. for each hidden neuron and transformed by an activation function <?(•) to give the hidden values \i=0 The same equation is applied to the hidden values, which yields the output values (3.3) M Vk = 9 Y,wflzi (3.4) i j=0 where <?(•) denotes the act ivat ion of the output units. In the architecture considered here, g(-) is a sigmoidal funct ion, whereas g(-) is a simple linear function (necessary to obtain output values other than zero and one). Combin ing (3.3) and (3.4) gives the complete network funct ion M Vk \j=0 : = 0 (3.5) A s for the polynomial in (3.1), the goal is to find a set of weights w so that (3.5) becomes the best possible representation of the generator of the dataset S. A lso, as in the polynomia l example, a trade-off between model complexity and a smal l error between predicted outputs and target da ta has to be found in order to find a model that generalises the data well. The problem of f inding a model that is neither too complex nor too simple is known as Occam's 63 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation razor (after W i l l i a m of Occam, 1285-1349; see Bishop (1995)). Assessing the representativeness of the model and control l ing its complexity is an important part of assessing the uncertainty inherent in the predictions. 3.1.2 Control l ing M o d e l Complexity and Network Training F ind ing the opt imal set of weights in (3.5) involves f inding the m in imum of an error surface in weight space, such as defined by (3.2). Obviously, the number of weights in an A N N architecture rapidly increases wi th the number of hidden units. Hence, the error surface is of a high dimensional i ty and typical ly has local min ima. The neural network error function is s imi lar ly to the cost funct ion (1-22) between computed and observed radiances in the retrieval methods described in section 1.5 too complex to be minimised analyt-ically. Instead, several numerical algorithms exist to perform the minimisat ion. Overviews of commonly used algorithms and their functional i ty are given by, for instance, B ishop (1995) and Press et al . (2007). In this work, I used the scaled conjugate gradients algori thm, as implemented by Nabney (2002) in the N E T L A B toolbox. Obviously, both the number of adaptive parameters (i.e, weights) in a network architecture and the type of the error function determine the complexity of the error surface. C o m m o n problems arise in specifying the error surface in a way so that the global m in imum does not correspond to a state of overfitt ing and in actual ly f inding this global m in imum - numerical min imisat ion algorithms often get stuck in local min ima, depending on the ini t ia l point where the search for the global m in imum started. Bishop (1995) points out two pr incipal approaches to control l ing the complexi ty of the model. The first, called structural stabilisation, involves control l ing the number of adaptive parameters in the network. The second, regularisation, adds a penalty term to the error function (3.2) to counteract overfitt ing by avoiding strongly f luctuating mappings such as the example in the right panel of F igure 3.2. The simplest approach to network structure opt imisat ion (and, as Bishop (1995) points out, st i l l the most widely adopted approach in practise) is to perform an exhaustive search through a restricted class of architectures (i.e. varying number of hidden units). Obviously, such a search requires a large computat ional effort. Nevertheless, due to the lack of easy-to-use alternatives (cf. B ishop, 1995, Chapter 9), I wi l l follow this approach as well. The opt imisat ion of model complexity is performed wi th respect to a given t ra in ing dataset. The sim-plest approach to comparing the generalisation performance of different network architectures is evaluate 64 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation the error funct ion for an independent val idat ion dataset. A common approach is to split the original dataset S into two parts, one used for training and one for val idat ion (typical ly two-thirds are used for training and one-third for val idation). After the models have been compared, the one wi th the smallest val idat ion error can be selected. As for control l ing the complexity of the model by regularisation, I wi l l show in sections 3.2 and 3.3 that the Bayesian approach provides a natural framework for both regularising the tra in ing process and estimating the uncertainty of the network predictions. 3.1.3 Describing Uncertainty The idea of the Bayesian approach is that instead of f inding the single weights vector w* that minimises an error function E and represents the most probable solut ion to the regression problem, we compute an entire distribution of weights. B y examining this distr ibut ion, we can see how "cer ta in" it is that the training process has found the generator of the data; if the distr ibut ion is wide, it is uncertain, if it is narrow, it is certain. Since the distr ibut ion of the weights is inferred from the t ra in ing data, it is usually wri t ten as p(w\D). (3.6) In order to simplify the notat ion and to follow the notat ion of B ishop, the dataset is only denoted by D. Implicit ly, however, the distr ibut ion depends on the entire dataset, including the input values (written as p(w\D,X)).28 The certainty in the network outputs is also restricted by the noise on the data. If we were, for example, to model the long-term generator of hourly-averaged windspeed given a dataset S composed of minute-by-minute measurements there would be intrinsic noise caused by turbulence on the measurements that would not be part of the generator. Obviously, apart from describing this noise wi th a probabi l i ty d ist r ibut ion, we cannot make statements about exactly where the value of the generator lies wi th in a noise interval. The noise distr ibut ion i p(t\x,w) (3.7) 2 8 I n section 3.2 I will show how so-called prior information about the weights distribution can be inserted into the training process that leads to the weights distribution (3.6). Research showed that weight values close to zero lead to smoother network mappings than large weights (Bishop, 1995, Section 10.1.2). This can be achieved by favouring small weights by using a special prior, a process is known as regularisation in the literature. Regularisation is an important tool to control the complexity of the model. However, it does not solve the problem of how many weights and hidden nodes the architecture should contain. In the evidence framework (MacKay, 1992a; Bishop, 1995, Section 10.6), the Bayesian approach provides the tools for comparing different architectures (e.g. different numbers of weights) of networks based on the training data. This topic, however, is outside the scope of my thesis and will not be discussed further. 65 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation gives the probabi l i ty of observing a specific target t, given an input x and the most l ikely generator function represented by a network wi th weights w. To come back to the issue that the target data in the forward model are not noisy, this problem in fact turns out to be a flaw in the appl icat ion of the method to the retrieval problem. However, f rom the studies of Perez et al . (2000) and Gonzalez et al . (2002), we expect the forward model to contain ambiguities. One can reason that these ambiguities could be interpreted as noise on the target data. Furthermore, the noise in the input data (the satellite observations) can be accounted for wi th the network Jacobian. Th is issue wi l l be further discussed in the remainder of this thesis. G iven the two sources of uncertainty described by the weights distr ibut ion and the noise distr ibut ion, the total uncertainty of the network predictions can be calculated from where the noise distr ibut ion has been integrated over al l l ikely generators w. Th is d ist r ibut ion describes the probabi l i ty that a target t wi l l be observed if an input vector x is given and the general behaviour of the target data is specified by the training dataset D. W h e n I first studied the derivation of the network outputs d ist r ibut ion (3.8) , it appeared confusing to me' that the uncertainty in the network outputs, denoted by y, is expressed as a distr ibut ion of the targets t. However, since we have no information about the generator of the data except the dataset S, i t is best to make predictions about new data that would l ikely be observed for a new input. The network predict ion y describes the mean of the posterior target distr ibut ion, and hence the locat ion of the most probable data value to be observed. The shape and width of p(t\x,w), however, describe the certainty that a data value we would observe in reality would actual ly be the most probable value - hence, the distr ibut ion wid th can be interpreted directly as error bars on the network predict ion y. The weights distr ibut ion p(w\D) addi t ional ly describes our degree of belief in the abi l i ty of the network w to model the generator, thereby further widening the distr ibut ion of possibly observed data values. The network Jacobian represents another important tool to analyse the model . It is defined as the derivative of the network outputs w i th respect to its inputs, dy^jdxi. For instance, the Jacobian provides a mechanism to estimate the impact of errors associated wi th the inputs on outputs errors. Furthermore, the Jacobian provides a good tool to answer questions concerning the complexi ty and adequateness of the mapping - to what extend does the output change when the input is changed? Is this the behaviour we expect the solution to have? Is the model sensitive to the inputs we expect it to be sensitive to given (3.8) 66 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation our knowledge of the problem? Th is information can make the A N N much more transparent than a "black-box network" would be. Us ing the weights distr ibut ion, it is also possible to estimate a Jacobian distr ibut ion. A s mentioned in Chapter 1, such an estimate of the variabi l i ty in the Jacobian is a useful indicator of how i l l -condit ioned the regression problem in question is. Aires et al . (2004b) point out that inverse problems are often i l l -posed; similar output statistics of the network can be obtained by a variety of different network parameters (i.e. weights). A possible reason is redundant information in the input variables, caused, for instance, by correlated inputs. Imagine, for example, a network wi th one output and two highly correlated inputs. The output might physical ly be dependent on the magnitude of one of the two inputs as well as their difference. In this case, the training algori thm cannot decide which input carries the information, and separate t ra in ing runs could yield different results in terms of the A N N weights. The Jacobian would be fundamental ly different between the networks. In one case the output variable could be main ly sensitive to the first input and to a lesser extend to the second, while in the other case it could get most of its information from the second input and represent the dependence on the input difference wi th an opposite sign dependence on the other input. The output statistics for the t ra in ing data might be simi lar ly good for al l networks, but the abi l i ty to generalise to new inputs could vary considerably. In part icular, if we know l i t t le about the problem and are interested in interpreting the network Jacobians as actual physical Jacobians (so that we can learn on which inputs the output depends), such a high variabi l i ty in the Jacobian is problematic and we should be careful about assuming that the network represented by the most probable weights vector models the real generator. If, however, the var iabi l i ty in the Jacobian is smal l , we might expect that a "good" mapping has been found. In the following sections, I wi l l show how the weights distr ibut ion, output uncertainties and variabi l i ty in the Jacobian can be obtained. To i l lustrate some important aspects, I wi l l use a simple example function - a mapping from two input to two output variables, given by A two-layer network wi th six hidden units has been trained from a dataset generated from these equations by adding normal ly distr ibuted noise wi th a standard deviat ion of 0.1 to output 1 and 0.05 to output 2. I wi l l show examples that originate from two training runs - a long run and a short run - , resulting in different posterior weights distr ibutions and consequently different output uncertainties and Jacobian yi(xi,x2) = 4x\ + sin (2TTX2) (3.9) V2(X1,X2) = COS (27TXl) + X2- (3.10) 67 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation variabil i t ies. 3.2 Distribution of the Neural Network Weights In this section I wi l l present and in places elaborate on the Aires (2004) derivat ion of the posterior network weights distr ibut ion, p(w\D), and demonstrate how neural network t ra in ing can be formulated in the Bayesian framework. 3.2.1 Derivation Using Bayes' theorem, p(w\D) can be wri t ten as p H ^ ) = ^ M P W ( 3 1 1 ) The density p(D\w) is called the likelihood of the data and indicates how l ikely the observation of the dataset D is in the light of a given model (here represented by a set of weights). p(w) is called the weights prior. Th is density represents al l information about the weights that is available before the data is observed. F ina l ly , the denominator p(D) is a normal isat ion factor to ensure that the integral over p(w\D) is o n e : 2 9 p(D) = J p{D\w)p{w)dw. (3.12) In order to find the mapping that best represents the generator of the data, the max imum of the posterior d istr ibut ion p(w\D) has to be found (following Aires, I wi l l write to* for this max imum). Therefore, expressions for p(D\w) and p(w) are needed. The first assumption that is made is that the target data are composed of a smooth function h and an addit ive noise component e (Bishop, 1995, Chapter 6): t = h(x) + e. (3.13) The noise e is assumed to follow a Gaussian distr ibut ion, which I wri te as a mult ivar iate Gaussian wi th zero mean (zero mean because it is the noise around the generator value): 2 9 Note that again, the explicit dependence on the input data X has been omitted in the notation. 68 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation Cin denotes the covariance matr ix of this intrinsic noise on the data; c denotes the number of network outputs (cmp. Figure 1.10). The generator function h(x) is the funct ion that we would like to model wi th the neural network, therefore I replace h(x) = y(x;w). Combin ing (3.13) and (3.14) yields the distr ibut ion of the target variables: p(t\x, w) = y / 2 l \ c |i/2 e x p ( " ^ ( y ( x ; ™) _ * ^ ' ° i n ~ l ' ( y ( x ; w ^ ~ ' ^ • 1 5 ^ Assuming that al l datapoints of the dataset are drawn independently f rom this d ist r ibut ion, the proba-bi l i ty density of the entire dataset becomes N p(D\w) = Y[p(tn\xn,w) (3.16) n = l 1 (2n)^ \C, 1/2 zri | exp (4|>n)r'Ain"£n)' (3-17) where I have used en — tn — yn and replaced Ain = Cin~~X• The matr ix Ain is called a hyperparameter in the context of Bayesian learning, since it is a parameter that controls the dist r ibut ion of other parameters (the network weights). I wi l l explain its function in the next paragraph. In analogy to the conventional max imum l ikel ihood approach (Bishop, 1995, Chapter 6), the sum in the exponential is called the data error function 1 N ED(w) = ^-T(en)T - A i n e n . (3.18) A n = l In order to find an expression for the weights prior p(w), Aires (2004) assumes that the weights follow a Gaussian distr ibut ion as well: p(w) = exp f - \ w T • Ar • w ) = exp (-Ew(w)). (3.19) Lw \ I J ZJW For convenience and following Aires (2004), I have grouped al l normal isat ion factors into a single constant Zw- The parameter Ar = Cr~l is the inverse of the covariance matr ix Cr of the weights and the second hyperparameter that occurs in the context of Bayesian neural network learning. Analogous to the data error function, the weights error function is defined as Ew(w) = ^wT • AT • w. (3.20) 69 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation In the form given by (3.19), the prior distr ibut ion has zero mean, thereby expressing that the weights are expected to be centred around zero. The use of such a prior weights d ist r ibut ion regularises the training process (cf. subsection 3.1.2); smooth network mappings, which usual ly generalise better than strongly fluctuating functions, can be achieved by favouring smal l weights (Bishop, 1995, Sect ion 10.1.2). If the weights are large, then Ew wi l l be large and p(w) wi l l be smal l ; for smal l weights p(w) wi l l be large. Hence, by setting Ar correspondingly, we can prefer smal l weights over large ones. B ishop points out that priors other than Gaussian can be considered as well. In this work, however, I w i l l only consider the Aires (2004) approach. G iven the expressions for p(D\w) and p(w), the posterior d istr ibut ion of the weights (3.11) becomes p{w\D) = | exp I> U ) T • A™ • 6") e x P (~\wT • A r • w) = \ e x P ( ~ E d ~ E w ^ > ( 3 2 1 ) where I have again used the shorthand notat ion Z for the normal isat ion factors. Th is expression could be maximised, however since many standard algorithms exist to minimise a funct ion rather than to maximise it (Bishop, 1995, Chapter 7) it is more convenient to minimise the negative logar i thm of (3.21). The logar i thm removes the exponential, and because of its monotonici ty the locat ion of the min imum remains unchanged. Since the normal isat ion constant Z also does not affect the posi t ion of the min imum, we are left w i th minimis ing the total error f unc t i on 3 0 1 N 1 E(w) = ED(w) + Ew{w) = - X>")T ' Ai-n. • e " + A r w . (3.22) n = l Convent ional (i.e. non-Bayesian) network learning can be regarded as a special case of this Bayesian framework; if we have no information about the weights pr ior and assume it to be a uniform distr ibut ion (p(w) = const.), then Ew = 0. If we furthermore assume independent output variables (all off-diagonal elements of Ain are zero) which have the same variance (all diagonal elements of Ain set to a2), then a2 becomes a constant factor in (3.22) and can be omit ted, leaving E to be a sum-of-squares error function as in the example of polynomial curve fitt ing: N c . fiW^EEW1"^)-^. (3-23) n=l fc= l A problem w i th maximis ing the posterior weights distr ibut ion is that the hyperparameters Ain and Ar 3 0 This is analogous to the principle of maximum likelihood; Bishop (1995, Chapter 2). 70 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation are usual ly not known beforehand. I wi l l discuss how they can be estimated in section 3.4. 3.2.2 The Meaning of the Hyperparameters M a c K a y (1992a, 1995) and Bishop (1995) introduce the Bayesian framework wi th only scalars (3 and a as hyperparameters instead of the matrices Ain and AT. For simplici ty, they only use one output, thereby reducing Ain to a single element (5. The weights are assumed to be independent, making Ar a diagonal matr ix . In the simplest form, the variance of al l weights is the same ( 1 / a ) 3 1 . The error function (3.22) then becomes If the intr insic noise on the target data is smal l , then ft wi l l be large, and smal l deviations of the network predictions from the target wi l l result in a large "penal ty" . If the noise is large, (3 wi l l be smal l and larger differences wi l l be tolerated. Sett ing the ratio a/f3 becomes important under the aspect of the size of the training dataset. Wh i le the first term in (3.24) grows wi th increasing numbers N of datapoints in the dataset, the second term does not. Hence, the ratio of both hyperparameters controls the importance of the weights prior and the size of the dataset form which it wi l l become insignif icant. The matr ix hyperparameters Ain and Ar used by Aires (2004) generalise this concept to interdepen-dent weights and outputs, respectively. Us ing Ain instead of (3 allows for more outputs and also incor-porates the correlat ion structure of the errors in the indiv idual variables. (Remember that C in — Ain does not represent the covariance matr ix of the outputs, but of the errors in the outputs (the noise).) Th is way, mapping errors for variables wi th smal l noise are penalised stronger than those for targets wi th larger intr insic noise. In order to understand Ar, it is important to keep in mind that the prior p(w) is a Gaussian distr ibu-t ion wi th zero mean. Th is means that, similar to the scalar hyperparameter case, we expect the weights to be small . The major difference to the scalar case is that we assign different inverse variances (diagonal elements of Ar if the weights are independent) to different weights, hence control l ing indiv idual ly how much the weights are penalised for being large. Th is may be useful since the magnitudes of weights in different layers of a network can have fundamental ly different ranges ( M a c K a y , 1995; Ai res, 2004). 3 1 MacKay (1995, Section 3.2) notes that the weights of a two-layer perceptron will usually fall into three or more distinct classes, depending on the structure of the inputs. For a good regularisation performance, he suggests the use of different hyperparameters a for these different classes. (3.24) 71 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation 3.2.3 Gaussian Approximat ion to the Posterior Dis t r ibut ion Al though (3.21) is an exact equation for a given noise model and prior, it is useful to approximate the posterior wi th a Gaussian distr ibut ion in order to make it analyt ical ly tracktable when used in integrals such as (3.8) (Bishop, 1995, Section 10.1.7). Th is can be obtained by performing a second-order Taylor expansion of the total error funct ion (3.22) around its min imum w* (i.e. the max imum of p(w\D)): E(w) = E(w*) + bT- Aw + ^AwT-H-Aw, (3.25) where Aw = w — w*. b denotes the gradient of E at w*, b = VE(w)\v,=w.=0, (3.26) which vanishes because w* marks the min imum of E. H is the Hessian matr ix of E (second derivative) wi th respect to the weights, and it wi l l play an important role in the remaining parts of this thesis. Look ing at (3.22), we can see that H is composed of two parts, the data Hessian Hp and the weights hyperparameter A r 3 2 : H = VVE(w)\w=w* = VVED(w)\w=w* +Ar (3.27) = H D + A r . (3.28) Using the approximated error function (3.25), (3.21) becomes p(w\D) = ^ exp (-E{w*) - ^AwT • H • Aw^j , (3.29) and, including the constant exp (—E(w*)) in the normalisat ion factor Z, p{w\D) = exp ^ - i AwT • H • Aw^j . (3.30) The posterior weights distr ibut ion thus becomes a Gaussian wi th mean w* and covariance matr ix H The information contained in this covariance matr ix can be immediately used to give error bars on the most probable weights vector w*, an example of which is shown in Figure 3.3. . 3 2Note that if we use no prior information about the weights (i.e. p(w) = const.), E = ED and H = HQ. This case corresponds to A N N training without regularisation. 72 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation -0.5 0 weight value F i g u r e 3.3: Dis t r ibut ion of the first weight wn of the neural network used for the example given by equations (3.9) and (3.10) after a short t raining run (grey histogram, the network has not gained much certainty about the weight value yet) and after a longer t ra in ing run (black histogram, the distr ibut ion has narrowed considerably). 3.3 Output Uncertainties In the previous section, both terms under the integral in (3.8) - the noise model of the target variables (3.15) and the Gaussian approximat ion to the posterior weights d ist r ibut ion (3.30) - have been derived. Us ing these results, the derivation of the distr ibut ion of the network outputs is straightforward (Aires et a l , 2004a): p(t\x,D) = J p(t\x,w)p(w\D)di J exp ^ - i (t - y)T • Ain -(t-y) • exp (3.31) --AwT H-Aw ) dw. (3.32) A l l normal isat ion factors have been omit ted in the notat ion in (3.32), instead, the • oc • sign has been used to indicate the missing normal isat ion. Th is integral can be evaluated by assuming that the posterior d istr ibut ion of the weights, (3.30), is narrow enough to approximate the network funct ion y(x; w) by its l inear expansion around the opt imal weights value w*: y(x; w) = y(x; w*) + G1Aw, (3.33) 73 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation where the W x c matr ix G represents the gradient of the network funct ion (c is the number of network outputs): G = Vy(x;w)\w=w.. (3.34) Hence, (3.32) becomes p(t\x,D) oc J exp ^ - i (t - y(x;w*) - GT Aw^ • A i n • (t-y(x;w*) - GT Aw^ • exp ^ A w T • H • Awj dw. (3.35) Wr i t ing e* = t — y{x\ w*) for simplicity, expanding the product and rearranging yields p(t\x, D) oc exp ^ - ^ e * T ' A i n • e* j • J exp (e*T • A i n • (GTAw) - ^(GTAw)T • A i n • (GTAw) - \AWt • H • Aw^j dw, (3.36) where the first factor is independent of w and has hence been pul led out of the integral. Us ing the matr ix identity (AB)T = BTAT and further rearranging leads to p(t\x, D) oc exp (^-^e*T ' A i n • €* j • J exp ^e* T • A i n • G T ) Aw - ^AwT (G • A i n • GT + H^j Awj dw. (3.37) Bishop (1995, Append ix B ) shows that Gaussian integrals w i th a linear term evaluate to J exp (LTW - ^wT •Aw^dw = {2-n)w'2 | A | " 1 / 2 e x p QiT • A • LJ . (3.38) Sett ing L = e*T • Ain • GT and A = G Ain • GT + H, the integral in (3.37) becomes J exp(.. .)d«> = (2n)w'2 G Ain G + H - 1 / 2 exp Q (e*T • A i n -GT)T • (G • A i n • GT + H) • (e*T • A i n • G T ^ . (3.39) 74 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation Omi t t ing the constant factor and rearranging (3.37) again gives 1 r \ / I T p(t\x,D) oc exp ( --e* • Ain • e* J exp I--e* -AinGT -(G-Airl-GT + hJ GAi (3.40) and further p(t\x, D) oc exp (-±(t-y(x;w*)f AIN - AINGT • (G • AIN • GT + • GAIN (t - y(x; ™* ) ) ) • (3-41) Thus, the target variables follow a mult ivariate Gaussian distr ibut ion w i th mean y(x; w*) and covariance matr ix C 0 = Ain - AinGT • ( G • A i n G T + i ? ) 1 • GA - l (3.42) The expression for the covariance matr ix can be simplif ied by mul t ip ly ing by J G + H 1GAinGT^j GJ X . . . x [ G ( l + H^GA^G^ G ] _ 1 to give C0 = Cin + GTH ' G , (3.43) where Cin = A i „ _ 1 has been used. Equat ion (3.43) is the ma in result of this derivation (Aires et a l , 2004a). It shows that the uncertainty in the neural network predictions is composed of two parts; the intr insic noise contained in the training data, represented by its covariance matr ix Cin, and a term G H G that represents the impact of the uncertainty of the posterior weights distr ibut ion on the predictions - a result that is expected from (3.8). Unless we have si tuat ion dependent (= input dependent) information about intr insic noise on the data, Cin wi l l be constant. The neural predict ion term, however, is si tuat ion dependent through the gradient G (the Hessian is not dependent on the input data). We can determine the error bars on a network predict ion by tak ing the standard deviat ion from the covariance matr ix Co- Figures 3.4, 3.5 and 3.6 i l lustrate the results that can be obtained for the example T — i defined at the end of section 3.1. Wi l l i ams et a l . (1995) show that the weights uncertainty term G H G is approximately proport ional to the inverse training data dens i t y 3 3 , and note that consequently in high-3 3 They can prove the result for generalised linear regression models (models of the form y(x) — wi't>j(x)> where <fij are basis functions), and note that empirical studies provide evidence that the result also holds for multi-layer networks - especially for networks with linear output activations trained with a least-squares error function (as in our case), which "is effectively a generalised linear regression model with adaptive basis functions" (Williams et al., 1995, Section 5). 75 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation data-density regions, the contr ibut ion of this term wi l l become insignificant compared to the noise term C o - a result that I wi l l discuss in section 3.7. 3.4 Hyperparameter Re-Estimation W i t h the results f rom the previous sections, we have to know the values of the hyperparameters Ain and Ar in advance. However, using the complete structure of Ar in the tra in ing process requires a good knowledge of the network mapping, which usually is not available a pr ior i . Simi lar ly, information about the noise distr ibut ion in the target data wi l l also not be available in most cases. Therefore, how can the hyperparameters be estimated? Aires et al . (2004a) suggest the fol lowing - omit both hyperparameters in an in i t ia l learning stage, then estimate them from the trained network and re-train the network wi th the new values. In the case of Ain, Aires et al . suggest use of (3.43). The approximat ion they make is to assume the intr insic noise to be constant throughout the tra in ing data. Then an average covariance matr ix Cin can be computed from the training dataset D by determining the covariance mat r ix of the output errors C o from the ( e * ) n of the training datapoints, and by using an average of the neural predict ion term G7H * G over the tra in ing dataset. F rom these two matrices, an average Cin can be computed, and Ain is obtained by inversion: Cin = (Co) - (GTH_1 G), (3.44) where the averages have been denoted by (•). Unfortunately, Ai res et al . do not give details on how to re-estimate A r . The i r papers suggest use of the posterior weights distr ibut ion of one t ra in ing run as the prior d ist r ibut ion in the consecutive run. Th is , however, would destroy the intended regularisation mechanism - the prior (3.19) is intentionally chosen to be a Gaussian wi th zero mean in order to keep the weight values smal l . The posterior (3.30), in contrast, is a Gaussian wi th mean w* ^ 0. Thus , using this d ist r ibut ion as a prior for the consecutive training run would pul l the weights towards the already found min imum, not encourage them to be s m a l l 3 4 . Th is also becomes clear when considering the following: In order to accommodate the mean w*, (3.20) would have to be changed to Ew(w)M = l- (w - (™*)(i_1))T ' • (w - (w*)^-1^ , (3.45) 3 4Although MacKay (1992b) points out that using weight decays with non-zero means would just correspond to using a different model - whose performance could be compared to a zero-mean regulariser within the evidence framework (MacKay, 1992a). 76 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation 3r 2 Input X (a) Long training run. 3r I I I 1 I I 0 0.2 0.4 0.6 0.8 1 Input X (b) Short training run. F i g u r e 3.4: Section through the functional surface of the example given by (3.9) and (3.10) at X2 = 0.3. P lot ted are datapoints from the training dataset (grey), the network predict ion (solid line) and error bars (dotted lines). Panel (a) shows the error predict ion after the long training run (the narrow weight distr ibut ion in Figure 3.3), while panel (b) displays the same predict ion for the network trained only a short t ime (the broad distr ibut ion in Figure 3.3). The broader weights distr ibut ion is noticeably reflected in larger error bars. Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation 0.4r & 0.3 3 o o 0.2 fc u u e 0.1 L-oj: 0 0.1 0.2 0.3 Estimated error output 1 0.2r CM 3 g, 0.15 o la O 0.1 E U E 0.05 H o-0 0.05 0.1 0.15 Estimated error output 2 0.2 (a) (b) F i g u r e 3 .5 : Scatterplots of the estimated error (one standard deviation) vs. the actual error for both output variables of the example given in (3.9) and (3.10), for the long training run shown in Figure 3.8 (also see Figure 3.4a). The narrow weights distr ibut ion (cmp. Figure 3.3) causes the network term in (3.43) to be much smaller than the intr insic noise Cin. Consequently, the estimated error is almost identical to the noise on the data, which is correctly estimated to have a standard deviat ion of 0.1 and 0.05, respectively. Hence, the predicted error does not show significant input dependence, it can only be stated that the true error wi l l be smaller than the estimated error in at least 68.2% of al l cases. In fact, for the given example, 68.5% of the true errors are smaller than the predicted ones for output variable 1 (68.0% for variable 2). 0.5 1 Estimated error output 1 1.5 0.2 0.4 0.6 0. Estimated error output 2 (a) 0) F i g u r e 3 .6 : The same as Figure 3.5, but for the short t ra in ing run leading to the broad weights distr ibut ion in Figure 3.3 and the larger error bars in F igure 3.4b. Note that this t ime, the network dependent term in (3.43) is much larger, due to the broader weights distr ibut ion, result ing in a more input dependent error (in panel (a) 67.5% of the predicted errors are smaller than the actual error; in panel (b) 75.4%). 78 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation where refers to the i-th re-estimation i terat ion. Since VV•Ew(to)lu>=(u; ,' ,)W when (3.45) is used instead of (3.20), Ar is re-estimated following = Ar(l 1 ' does not change A r « = H{i) = JT D W + Ard-V. (3.46) Since H, i / j j and Ar are the inverses of covariance matrices, which are posit ive definite (Aires, 2004), they are also posit ive definite. Th is means that at least the diagonal elements of these matrices are posit ive (Weisstein, 2007). Hence, the diagonal elements of Ar would become larger and larger wi th each t ra in ing i terat ion. Th is makes sense; larger diagonal elements of Ar approx imate ly 3 5 mean a smaller variance of the weights, hence the weights distr ibut ion is more sharply peaked. If the opt imisat ion algor i thm has found a min imum w* in a first t ra in ing run, and in a second tra in ing run is " to ld" that this m in imum is very l ikely (even if it was not a very good min imum), then it wi l l further increase the certainty in that min imum by decreasing the variance. Furthermore, by increasing the diagonal elements in A r , the importance of the data is decreased, unt i l they eventually are insignificant compared to the weights prior. Th is , however, wi l l only increase the chance of remaining stuck in a local m in imum found in the first t ra in ing run. However, since we are using a zero-mean regulariser, we wish to get a new est imation of how strongly the weights should be pul led towards zero, which eventually should lead to a compromise between a mapping that fits the data well and a guard against overfitt ing. A method to re-estimate an AT for a zero-mean Gaussian is needed, something that Aires et a l . do not derive. The approach I use is to adapt the re-estimation technique suggested by Aires et al . (2004a) for A i n , but to use the so-called evidence procedure (MacKay , 1992a) in order to re-estimate a modif ied version of A r . However, in order to use the evidence procedure in the form derived by M a c K a y , Ar can only contain diagonal elements. Hence, correlations between the weights wi l l be ignored below. 3.4.1 Evidence Procedure for Ar. The evidence procedure has been derived by M a c K a y (1992a) in order to re-estimate the scalar hyperparameters a and (3 from the data during the network learning stage. Nabney (2002) shows how the approach can be generalised to mult iple a for different groups of weights, up to an indiv idual a for every weight - which corresponds to having a diagonal Ar in (3.20). W h e n in addi t ion to the network weights the values of the hyperparameters are inferred from the 3 5 if no covariances exist exactly 79 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation data, the result can be expressed by the joint probabi l i ty d istr ibut ion p{w, a, f3\D). The correct Bayesian treatment (Bishop, 1995, Section 10.4) to get the posterior d istr ibut ion of the weights p(w\D) from this joint d istr ibut ion would be to integrate over al l possible values of a and (3: p{w\D) = J J p(w,a,P\D) dad/3 (3.47) = J J p(w\a,p,D)p(a,p\D) dad/3. (3.48) The evidence procedure, however, makes the approximat ion that the density p(a, /3\D) is sharply peaked around the most probable values a M P and /3MP, reducing (3.48) to p(w\D) = p{w\aMP,/3MP,D) J J p{a,(3\D) dad/3 . (3.49) v v ' = 1 In order to find a M P and /3MP, the posterior distr ibut ion p i a , m = m ± B ^ ' ,3.50) has to be maximised. The prior p(a, f3) is assumed to be uni form in the evidence procedure, so that it does not affect the max imum of p(a, P\D)36. The normal isat ion factor p(D) (the integral of the numerator over a and (3) also does not affect the max imum, hence only p(D\a,(3) - known as the evidence of the hyperparameters - has to be maximised. Th is term can be wr i t ten as an integral of the data l ikel ihood over a l l possible weights w: p(D\a,P) = jp(D\w,a,P)p{w\a,P)dw. (3.51) Since the hyperparameters are given, the factors p(D\w, a, P) and p(w\a, P) of the integrand are given by (3.17) and (3.19) (exchange the matr ix hyperparameters wi th the scalar ones), leading to p(D\a,P) = ± | exp ( - £ a V ) ) dw, (3.52) where (3.24) has been used because of the scalar hyperparameters. The Gaussian integral in (3.52) is 3 6 Such a prior is said to be an improper prior, since it does not have a finite integral and cannot be normalised. 80 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation given by /, 1 — 1 / 2 exp (-Eap(w)) dw = exp (-Ea(3(w*)) (2TT)W/2 Lff (3.53) (see Bishop (1995, Append ix B ) for the evaluation of Gaussian integrals). Together w i th terms from the normal isat ion factor Z, the logar i thm of the evidence (3.52) is then given by w ~ N I i i W N N logp(D\a,(3) = - | ^ K ) 2 { y ( x n ; w) - tn}2 - - l o g \ H \ + — l o g a + - log/5 - - log(27r). (3.54) In order to optimise this log evidence wi th respect to a (the corresponding approach for j3 wi l l not be considered here), the part ia l derivative -j^ (log p(D\a, ft)) has to be computed. B ishop (1995, p. 410) shows that ^ l o g | f f | = t r ( H _ 1 ) , (3.55) where the approximat ion has been made that the eigenvalues of H do not depend on a. (I have skipped the eigenvalue step here, see Bishop (1995, Section 10.4) for further details.) Hence, d_ da (\ogp(D\a,P)) = ~ f > : ) 2 - | t r (iT1) + (3.56) 2 = 1 Equat ing (3.56) to zero yields the expression W-atr (H~1^) eL«) 2 (3.57) If groups of weights are assigned different hyperparameters, this equation can be adjusted correspondingly, down to an indiv idual a for each weight: l - o ^ H - 1 ) , In order to optimise a, the A N N training is first started wi th an in i t ia l (random) value of a. Once the tra in ing algor i thm has found a min imum, a is re-estimated from (3.58): 1 (3.59) K ) 2 Of course, the opt imisat ion of the hyperparameters by iterative re-estimation - val id for both re-81 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation estimating Ain w i th the Aires et a l . (2004a) approach and Ar w i th the evidence procedure - is computa-t ional ly expensive, since the network has to be re-trained several t imes, ideal ly unt i l the hyperparameters stabilise. F igure 3.7 i l lustrates how they develop dur ing the long t ra in ing run of the example used in this chapter. 3.5 The Jacobian Matr ix The Jacobian matr ix is not a part of the Bayesian framework described in the previous sections. It represents the first derivative of the network outputs y w i th respect to the inputs x and is defined as Jlk = (3.60) Aires et a l . (2004b) suggest that the variabi l i ty in the Jacobian can be determined by comput ing a distr ibut ion of Jacobians from the posterior weights distr ibut ion given by (3.30). The i r approach, and the one adopted in this study, is to use Monte Car lo techniques to sample from the weights distr ibut ion, then to construct a Jacobian distr ibut ion from these samples. Aires et a l . use R = 1000 samples from (3.30), compute the Jacobian for each weights sample, and compute mean and variance from al l these Jacobians. Th is , of course, assumes a Gaussian distr ibut ion for the Jacobian as well, al though the histogram of the Jacobians itself could be used as a P D F (probabil i ty density function) representation. The Jacobian, as defined in (3.60), is input dependent, hence its d ist r ibut ion can be computed for ind iv idual i npu ts . 3 7 For the purpose of identifying i l l-posed problems Aires et a l . (2004b) propose to compute an average Jacobian over the training dataset and to interpret its variabi l i ty. Th is approach applied to the example given by (3.9) and (3.10) yields the results l isted in Table 3.1. Indeed, the wider weights distr ibut ion results in a larger variabi l i ty in the Jacobian. If a problem has been identified as i l l-posed, they suggest a pr inc ipal component decomposit ion of the input and output data. They apply this approach to a remote sensing problem w i th many correlated inputs and use it to reduce the number of inputs and to exploit the decorrelated structure of the pr incipal components. However, if the number of inputs is smal l from the beginning, it is usual ly not desired to further decrease their number. B ishop (1995, Section 8.2) discusses a simi lar approach, which decorrelates the inputs and is known as whitening in the l iterature. It wi l l become important in chapter 4. 3 7 A property that will prove useful in chapter 4, where the Jacobian provides information about whether a network is modelling the "right" function. 82 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation •,11 U c U ..S o . o i 0 -0.01 -,22 - o — 0 0.02r -,12 '0 -• -,22 ,12 0 Iteration (a) 10 400 300 < r 200 100 0, 0 Development of A A A 2 2 • - i n • 12 before iteration (b) 10 Development of A 10000 5000 before iteration (c) F i g u r e 3.7: Development of the covariance matrices Co and C\n (panel (a)) and the hyperpa-rameters A i n (panel (b)) and Ar (panel (c)) dur ing the t ra in ing run shown in Figure 3.8. After a few re-estimation iterations, the estimated intr insic noise stabilises at the correct variance values of 0.01 and 0.0025, respectively, leading to a stabi l isat ion in the data hyperparameter Ain. Notable is that the uncertainty in the neural predict ion, given by the difference between C 0 and Cin, quickly becomes smaller. Most elements of Ar stay in the range between 0 and 100, however, some weights are more strongly penalised for getting large. Af ter approximately eight iterations, Ar has stabil ised as well. 83 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation T a b l e 3 .1 : Jacobian matrices and their uncertainty (one standard deviation) of the two net-works trained to model the example function given by (3.9) and (3.10). Left: the Jacobian of the network w i th the narrow weights distr ibut ion (long t ra in ing run) shown in F igure 3.3; right: the Jacobian of the network wi th the broad weights distr ibut ion (short t ra in ing run) in Figure 3.3. Note how the width of the weights distr ibut ion is' reflected in the uncertainty of the Jacobian, and how the only short ly trained network models a signif icantly different dependence of output 1 to input 2 - a dependence that is also most uncertain in the long trained network and indicates a problematic input. dyi/ dyi/ dy2/ 4.00 ± 0 . 0 2 - 0 . 0 4 ± 0 . 0 1 dxi 3.83 ± 0 . 5 0 0.01 ± 0 . 8 7 dx2 - 0 . 0 2 ± 0 . 4 3 1.00 ± 0 . 0 5 dx2 - 1 . 5 2 ± 0 . 3 7 0.77 ± 0 . 3 4 3.6 Implementation Issues The N E T L A B toolbox by Nabney (2002) is implemented in M A T L A B and provides many functions for neural networks. The toolbox comes wi th the abi l i ty to handle two-layer M L P wi th a Bayesian module that implements the scalar hyperparameter approach of M a c K a y (1992b). Since N E T L A B is very well documented and already provides many functions needed for the Aires et al . a lgor i thm, I chose it as a basis for my implementat ion. A lgor i thm 3.1 below summarises the algor i thm discussed in the previous sections in a schematic outl ine. The ini t ia l hyperparameters A i r l ^ and A r ^ wi l l usual ly be set to the identi ty matr ix I and a01, respectively, where a0 is some constant that controls the degree of regularisation in the first t raining run. If no regularisation is desired, Ar can be set to zero. The loop in lines 2 to 9 trains the network and re-estimates the hyperparameters. It is repeated unt i l some terminat ion cr i ter ion has been reached, which can be either a stabi l isat ion of the hyperparameters or s imply a max imum number of iterations. In line 3, the error surface of equation (3.22) is minimised wi th one of the opt imisat ion algorithms provided by N E T L A B , for instance, conjugate gradients. Afterwards al l input data is propagated through the network, and the covariance matr ix C o is computed from the errors of the predictions compared to the targets. In the following step, the gradient of the outputs wi th respect to the weights G and the Hessian H is computed, which together w i th C 0 are used to estimate Cin and A i n . Eventual ly, Ar is re-estimated wi th the evidence procedure described in section 3.4. Wh i le details about the implementat ion are given in Append ix A , a few important issues are mentioned in subsections 3.6.1 and 3.6.2. 84 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation Algorithm 3.1: Bayesian Neura l Network Tra in ing wi th Ma t r i x Hyperparameter Re-Est imat ion . Input: Tra in ing dataset S = {X, D}, in i t ia l hyperparameters Ain^°\ Ar^°\ Result: M a x i m u m of posterior weights distr ibut ion w*, hyperparamaters Ain and Ar, covariance matr ix of intr insic noise Cin and covariance matr ix of posterior weights d ist r ibut ion H . T A <- A A <- A (0)-2 repeat 3 T ra in network (minimise error function (3.22)) w i th Ain and Ar in order to f ind w*; 4 Est imate covariance matr ix Co — covariance(t(x) — y(x\ w*)) f rom target data and network predictions; Compute gradient G = Vy(x;w)\w=w- (3.34) and Hessian H = V\7E(w)\w=w' (3.27); f — l Compute the approximate covariance matr ix of the intr insic noise Cin = (Co) — (G H G) (3.44); Set A.-in — Cin , Re-estimate Ar wi th the evidence procedure, (3.59); 9 until hyperparameters have stabilised or maximum number of iterations has been reached ; Development of training and test MSE Development of A N N weights 0 5 10 0 5 10 Iteration Iteration (a) (b) Figure 3.8: (a) Development of the mean-square-error ( M S E ) of the t ra in ing dataset and of a test dataset without added noise of the example given by (3.9) and (3.10) dur ing a tra in ing run wi th 10 hyperparameter re-estimation iterations (corresponding to the narrow weight distr i -but ion in F igure 3.3). The stabi l isat ion of the test M S E close to zero after i terat ion 4 is a sign that no overfitt ing is taking place; in the case of overfitt ing, the tra in ing M S E would further decrease while the test M S E would increase (the network predict ion would not model the true function anymore), (b) Development of the weight values dur ing the train ing. Af ter i teration 6 the values hardly change anymore. The regularisation effect of Ar causes al l weights to stay of order unity. 85 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation Test dataset after training. Test dataset after training. (c) (d) F i g u r e 3.9: Scatterplot of the network predictions for both outputs of the example given in (3.9) and (3.10) vs. the target data. Panels (a) and (b) show plots of the t ra in ing data (with noise), panels (c) and (d) show plots of the test data (no noise). Whereas panels (a) and (b) reflect the intr insic noise in the training data, panels (c) and (d) i l lustrate the good gener-al isation performance of the network; if overfitt ing had occurred dur ing the tra in ing process, predictions of the test data would unlikely match the targets. 86 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation 3.6.1 Normalisation of Input and Output Variables A s Bishop (1995, Section 8.2) points out, rescaling input and output variables is useful if different variables have typical values that differ significantly. In atmospheric remote sensing, for instance, particle size and temperature would have very different value ranges. In such cases, pre-processing can have a significant effect on the generalisation performance of the network; after normal is ing input and output variables to be of order unity, it is expected that the weights wi l l also be of order uni ty and hence be smal l (Bishop, 1995, Chapter 8). A simple method to achieve similar values of order unity for al l variables is to apply a l inear rescaling by subtract ing the mean of the variable and normalising by its standard deviat ion: where v can be either input or output variable. Whitening, the more sophisticated linear rescaling method mentioned in section 3.5, decorrelates the variables in addi t ion to normalising them. W i t h whitening, the rescaled variables v are computed using (Bishop, 1995, Section 8.2). Here, A denotes a diagonal matr ix containing the eigenvalues of the covariance matr ix £ of the variables Vi, and U contains the eigenvectors of S . 3.6.2 Regularisation of the Hessian Whi le working wi th the implementat ion of the Aires et al . a lgor i thm, I encountered a severe difficulty, which has already been discussed by Aires (2004) - the posit ive definite character of the Hessian. A s the covariance matr ix of a Gaussian distr ibut ion, the Hessian has to be posit ive definite. Th is means that al l its eigenvalues have to be str ict ly positive, and that vTHv > 0 for any non-zero vector v (Bishop, 2006). Aires (2004) points out that for the local quadratic approximat ion (3.30) to be val id, the opt imal weights vector w* must be at a real m in imum of the error surface, otherwise the posit ive definite character is not guaranteed. Th is is obviously a problem, since every training algor i thm can only approximate this min imum. Furthermore, as Aires states, the possibly large size of the Hessian (W x W, wi th W being the tota l number of weights in the network) has the consequence that its est imation needs to be done wi th a large enough dataset, otherwise the eigenvalues could become very smal l or even negative, also violat ing (3.61) 5 n = A - l / 2 ? 7 T ( ; (3.62) 87 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation the posit ive definiteness of H. A solution to this problem suggested by Aires (2004) is to add a diagonal regularisation m a t r i x 3 8 XI to the Hessian, where A is a smal l scalar and / is the identi ty matr ix . If A is large enough, this approach wi l l result in a posit ive definite matr ix. However, the addi t ion wi l l also result in a bias in quantit ies estimated from the regularised matr ix (Rigdon, 1997). Using a combinat ion of cr i ter ia to measure condit ion number and posit ive definiteness, Aires (2004) obtains A = 12 for his example. M y own experiments, however, showed that A needs to assume values larger than 10, 000 in certain cases, so that the original character of the Hessian is considerably altered. Another possibil ity, mentioned in passing by Nabney (2002), is to use an eigenvalue decomposit ion, which, for the given case, is closely related to the truncated singular value decomposition (e.g. Hansen, 1994). If the matr ix H is not positive definite, then one or more of its eigenvalues wi l l be negative. Nabney (2002, Section 9.4.2) decomposes the data Hessian H n into its eigenvalues and eigenvectors, sets al l negative eigenvalues to zero and reconstructs the matr ix: HD = VAVT, (3.63) where V contains the eigenvectors of Hp and A the modified eigenvalues. If the matr ix HD is at least posit ive semi-definite (i.e. eigenvalues can also be zero) and Hermi t ian (which al l real symmetr ic matrices are), the eigenvalue decomposit ion is equivalent to the singular value decomposit ion and the eigenvalues coincide wi th the singular values (Abd i , 2007, Section 2). Since the reconstructed HQ st i l l has zero eigenvalues, Nabney further adds the weights hyperparameter AT in order to reconstruct the ful l Hessian H. Since in Nabney's diagonal matr ix, his procedure is a combinat ion of an eigenvalue decomposit ion and the method proposed by Aires. 3.7 Usefulness of the Aires et al. Method The figures given in this chapter so far show that the Aires et al . method yields the expected results for the example given by equations (3.9) and (3.10); the hyperparameters stabilise as expected after several re-estimation iterations (Figure 3.7), the network converges (Figures 3.8 and 3.9), and the network wi th 3 8 Do not confuse the meaning of regularisation used here with the meaning of the word used for the weights term in the error function. Regularisation for neural network training denotes adding a weight penalty term to the error function in order to encourage small network weights. Regularisation of the Hessian here describes methods to make the Hessian (and its inverse) positive definite. Note that the term regularisation of matrices is also often used in the context of matrix inversion or the solution of linear systems, in this case a matrix is singular or close to singular and has to be regularised in order to make it invertible with the available numerical precision. 88 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation 2r Input F i g u r e 3 .10 : Demonstrat ion of the input dependence of the error est imation given in (3.43). Shown is the one-dimensional generator funct ion y(x) = sin(37r:r) + x2 (dashed l ine), f rom which 100 datapoints were created by adding Gaussian noise wi th a standard deviat ion of 0.1 in the intervals [0.15..0.4] and [0.9..1.0]. The network predict ion (solid line) is shown together wi th error bars at one standard deviation. Note the larger error bars in the regions where no t ra in ing data existed, reflecting the inverse dependence of the estimated error on the t ra in-ing data density. (Example generated wi th the original N E T L A B implementat ion wi th scalar hyperparameters.) the broader weights distr ibut ion (Figure 3.3) produces larger error bars (Figure 3.4). However, in many examples, I found the performance of the error est imation to be unsatisfactory, and several problems and open questions require further investigation. Figures (3.10) to (3.14) i l lustrate some of the problems and failures I encountered while testing my implementation. A serious problem is the actual size of the error bars. As mentioned in sections 3.3 and 3.4, Aires et a l . assume the intr insic noise on the data to be constant throughout the dataset, and Wi l l i ams et al . (1995) show that the uncertainty due to the neural network weights is approximately proport ional to the inverse data density. Especia l ly in the examples given by Bishop (1995, F igure 10.9) and Nabney (2002, Figure 9.4), the error bars increase significantly in regions where no training data is given. However, I found that for many other functions such good results seem to be difficult to obtain. Furthermore, Wi l l i ams et al . (1995) found that the neural uncertainty contr ibut ion of a network trained from a very dense dataset wi l l become insignificant compared to the noise term. For instance, F igure 3.10 shows a simple example of a network trained from a dataset including 100 datapoints in two dist inct intervals. A s expected, the error bars in the middle section, where no training data was given, are larger than in the sections where data was available. Consequently, the-true generator function is st i l l w i th in the error interval of 89 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation L 5 0 0.2 0.4 0.6 0.8 1 Input Figure 3.11: Same as Figure 3.10, but wi th a training dataset consisting of 10,000 datapoints. Th is example demonstrates how the estimated error bars depend on the factors such as the size of the tra in ing dataset, an effect that is not desired in the given case. 2r. _ 4 ! 1 1 1 1 1 0 0.2 0.4 0.6 0.8 1 Input X . Figure 3.12: The same example as in Figure 3.4, but the network has only been trained wi th data in the interval X\ = [0.15..0.4] (plotted in grey, however, are the entire t raining data). Aga in , the error bars diverge where no training data was present, however, the divergence is much too weak to indicate the actual error - a problem that I encountered in many cases and which emphasises how difficult it can be to interpret the error bars. 90 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation -0.21 1 1 ' 1 ' 1 1 1 - 1 0 1 2 3 4 5 6 7 Input Figure 3.13: L imi tat ions of the error estimation - a simple two-layer perceptron is not able to model a mapping that contains ambiguit ies, such as the mapping shown here (dashed l ine). We would hope that at least the error bars could reflect the problematic areas by becoming larger in the ambiguous regions, however, this is not the case. Furthermore, the lower data density in the non-ambiguous regions acts as a counter-productive effect. (One-dimensional example generated wi th the original N E T L A B implementation wi th scalar hyperparameters.) Figure 3.14: The errors in Figures 3.5 and 3.6 were predicted using a Hessian regularised wi th the eigenvalue decomposit ion described in section 3.6. Shown here is the error predicted by the same network as in Figure 3.6, but using non-regularised Hessian. Note the str ik ing difference between computat ions performed wi th a regularised and a non-regularised Hessian. Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation the network predict ion. However, if the same problem is repeated w i th 10,000 datapoints (Figure 3.11), then the network uncertainty term suddenly becomes very smal l - including the interval where no data were present. In this case, the true function is outside of the error bars of the predict ion, and we are confronted wi th the counter-intuit ive result that more observations lead to a worse result. F igure 3.12 shows a variat ion of the two-dimensional example used in the previous sections. It displays the same section through the functional surface as Figure 3.4, but this t ime the network has been trained only w i th data in a narrow interval. As expected, the error bars diverge in the part where no training data was present, however, they are much too smal l to indicate the true error to the actual function. Another problem is the presence of ambiguities in the training data. A simple two-layer network architecture is not able to model such ambiguities (Bishop, 1995), but as I pointed out in Chapter 1, the hope is that the error bars reflect these regions through a larger uncertainty. However, I was not able to achieve this result. O n the contrary, regions where ambiguities exist actual ly have a higher data density than regions where only one functional value is present, leading to the possibi l i ty that the error bars are even smaller in the ambiguous parts (Figure 3.13). Th is investigation also raises several questions concerning the regularisation of the Hessian - how large is the influence of the regularisation on the training algor i thm and on the error bars? W h i c h is the better regularisation method, and how much information is destroyed by performing the regularisation? Figure 3.14 shows the errors from Figure 3.6, but computed using the non-regularised Hessian - the neural uncertainty term almost vanishes. In other examples, I also encountered negative errors due to a non posit ive definite inverse Hessian. The problem is made worse by the fact that the Hessian is estimated from a large dataset. Since it represents the second derivative of the error w i th respect to the weights, est imating it from different datasets should not significantly change the Hessian. Usual ly it wi l l be estimated from the t ra in ing dataset. However, if the size of the dataset is changed, or the Hessian is estimated from only a part of it, the negative eigenvalues can sl ightly change, result ing in a different regularisation and hence different error bars. Also unclear is the effect of the regularised Hessian on the Jacobian variabi l i ty estimates. The numerical accuracy of the implementat ion also becomes a problem in the l ight of the issue that the method has been devised for noisy target data. Hence, if the intr insic noise variance of the training data becomes smal l and the network predict ion is good enough so that the covariance matr ix of the errors Co T — i has very smal l elements, the average neural uncertainty term G H G has to be even smaller in order to re-estimate Cin w i th the help of (3.44). G iven the regularisation issue, however, I often encountered 92 Chapter 3. Nonlinear Regression with Neural Networks and Uncertainty Estimation the case that neural uncertainty term was larger than Co, leading to negative variances in Cin, which T — t obviously does not make sense. As a simple workaround, the G H G term can be omit ted in such cases, re-estimating Cin by setting it to the CQ values, however, such an approach is l ikely to generate further uncertainties and problems. Unfortunately, the noise on the target data in my inverse problem is smal l , and consequently, I was confronted wi th this problem during the appl icat ion of the method to the actual retrieval problem. I wi l l come back to this topic in chapter 4. For future investigations, it would be interesting to follow the suggestions of M a c K a y (1995) and compare the performance of the training algori thm using a ful l diagonal Ar w i th an ind iv idual hyperpa-rameter for each weight to its performance when only a few scalar hyperparameters for dist inct groups of weights are used. For some test runs I observed large fluctuations of some elements of A r , and it might be worth testing whether suppressing such fluctuations can influence the t ra in ing performance. In conclusion, it seems to require a lot of ski l l and experience w i th neural networks in order to t ra in a network well and to interpret the results of the error estimation correctly. G iven the t ra in ing data density dependent magnitude of the error bars and the inabi l i ty of the method to recognise ambiguit ies, the usefulness of the uncertainty estimates for the retrieval problem is l imi ted. Th is is especially true when considering the numerical problems in the implementation. Nevertheless, the obtained uncertainties wi l l give a general idea of the certainty in the neural network fit. The Jacobian and its variabil i ty, on the other hand, provide powerful tools to analyse a network. Hence, at least this part of the uncertainty estimation framework should provide useful results for the retrieval A N N s . 93 Chapter 4 Retrieval, Results and Evaluation In the previous chapters, I presented the theory and design of the forward model and the neural network techniques to be used in the retrieval algor i thm. In this chapter, I w i l l report on the appl icat ion of these methods to the actual retrieval. The satellite images of the Ju l y 11, 2001, scene as well as the corresponding L U T setup wi l l be described in section 4.1. Not al l pixels observed by the satellite were overcast. In section 4.2, I discuss the operat ional M O D I S cloud mask product and report on problems I encountered wi th a number of pixels classified as overcast, but exhibi t ing B T D values outside the expected range. The design of the neural network is the topic of sections 4.3 and 4.4. I also discuss how the Jacobian can be used to analyse the retrieval performance of a given network architecture, present the sensitivity of the retrievals to cloud top pressure (cf. section 2.6) and check on their physical plausibi l i ty. G o o d results were obtained wi th a network architecture containing 15 neurons in the hidden layer that used the brightness temperatures of channels 20 (3.7 fim) and 31 (11 /um), the B T D of channel 31 to 32 (12 fim) and surface temperature as inputs. Th is network's results wi l l be evaluated in section 4.5 by comparison wi th the D Y C O M S - I I in-si tu data and an analysis of its Jacobian. I conclude this chapter wi th a discussion of my findings in section 4.6. 4.1 Retrieval Setup 4.1.1 The Test Scene The satellite image of the test scene was taken by the M O D I S instrument aboard N A S A ' s Terra satellite at 6:25 U T C on Ju ly 11th, 2001 , 3 9 corresponding to a local t ime of 11:25pm P D T (Pacif ic Dayl ight T ime) . F igure 4.1 shows the BTD(3.7-11) , as well as the 11 /an brightness temperature (hereafter B T ( l l ) ) images. The black circles in the BTD(3.7-11) image mark the flight track dur ing this night. The satellite 3 9 In 2001, only the Terra satellite was operational. Hence, no image from Aqua, which would have been approximately six hours later, was available. 94 Chapter 4. Retrieval, Results and Evaluation overpass did not coincide wi th the in-situ measurements, which were taken approximately five hours after the M O D I S images were recorded. A mean wind speed of about 8 m s _ 1 f rom the north-west (310°N) was observed during the flight (Stevens et al . , 2003b), so that the clouds wi th in the dashed rectangle shown in Figure 4.1 .(top) have l ikely been advected into the flight area. They wi l l serve for comparison wi th the in-situ data. The scene contains two ship tracks, discernible in the BTD(3 .7-11) , but not in the B T ( l l ) image. Th is can be explained by not ing the much smaller single scatter albedo in the thermal infrared in F igure 1.8. Th is means that the thermal cloud top radiances are main ly a function of c loud emission, and the impact of the scattering term in (2.8) becomes smal l . Since the L W P of the cloud stays constant across the ship tracks, the emission does not change. As noted in Chapter 2, the ship tracks wi l l be used to check on the physical consistency of the retrieval. 4.1.2 Lookup Table Setup A s discussed in Chapter 2, the forward model requires the input of four variable ( r e / / , N, pct, Tct) and three fixed (psfc, vgam, /c-value) parameters. These parameters were obtained from the in-situ mea-surements. Dur ing R F 0 2 , c loud top effective radi i ranged from approximately 8 to 16 pm and droplet number concentrations varied between 25 and 115 c m - 3 . C loud top temperature in the flight area varied only sl ightly between 284 and 285 K , and a fair ly constant cloud top pressure of 939.5 h P a was observed (not shown). In order to account for a potential ly larger var iabi l i ty in the entire satell ite scene, the L U T was generated wi th effective radi i ranging from 3 to 23 /zm, cloud top temperatures varying between 280 to 288.5 K , and droplet concentrations ranging from 20 to 200 c m - 3 . C loud top pressure was fixed at the observed value of 939.5 h P a , as was the surface pressure at 1016.8 h P a . In order to find representative values for k-value and vgam (cf. section 2.3), I created statistics of the droplet size distr ibutions that were encountered during the flight. F igure 4.2 shows scatterplots of fc-value (obtained from equation (2.19)) versus height. A s noted in section 2.1, the data obtained dur ing R F 0 2 cover only cloud top and bot tom. The fc-values that are found cover ranges simi lar to those found by Pawlowska and Brenguier (2000) dur ing A C E - 2 (Figure 2.5). A similar picture was obtained from the R F 0 3 data (right panel). F igure 4.3 displays histograms of both fc-value and vgam inferred from the R F 0 2 measurements. Since the radiat ive transfer through the entire cloud is simulated, I decided to take k as the average of cloud top and bot tom values. F rom the distr ibutions shown, I chose a k of 0.8. Th is corresponds roughly to 95 Chapter 4. Retrieval, Results and Evaluation channel 20 - channel 31 brightness temperature difference [K] longitude (degrees west) channel 31 brightness temperature [K] longitude (degrees west) F i g u r e 4 . 1 : (Top) Brightness temperature difference ( B T D ) between channel 20 (3.7 pm) and channel 31 (11 pm) at 6:25 U T C on Ju ly 11th, 2001. The ship tracks have a smaller B T D than their environment, as expected from Figure 2.11. The black circles mark the flight track during this night. The in-situ measurements were taken approximately five hours after the satellite overpass, so that the clouds wi th in the dashed rectangle have l ikely been advected into the flight area and wi l l serve for comparison. (Bottom) The same for channel 31 brightness temperature. Note that the shiptracks are not discernible in this channel. 96 Chapter 4. Retrieval, Results and Evaluation rf02 07/11/2001 06:24:40-15:52:35 rf03 07/13/2001 06:18:23-15:46:03 0.4 0.6 k value 0.4 0.6 k value F i g u r e 4 .2 : fc-value vs. height as inferred from the D Y C O M S - I I in-si tu measurements of (left) Ju ly 11 and (right) Ju ly 13, 2001. 0.06 0.05 §0.04 g 0 . 0 2 o G 0 . 0 1 0.00, rf02 07/11/2001 06:24:40-15:52:35 rf02 07/11/2001 06:24:40-15:52:35 I I l 10 20 30 40 gamma distr. shape parameter 50 0.4 0.6 k value F i g u r e 4 .3 : Histograms of (left) vgam and (right) fc-value as inferred from the D Y C O M S - I I in-si tu measurements of Ju ly 11, 2001. The black histograms represent cloud top values, the white ones cloud base values. ugam 26, which I used to compute the M ie tables. Note that these values are considerably larger than the average values found by Mi les et al . (2000). The optical properties of the overlying atmosphere were computed f rom the San Diego sounding displayed in Figure 2.9. The sounding was recorded at 12:00 U T C , also about five hours after the satellite overpass. 4.2 Cloud Mask In order to confirm the accurateness of the forward computat ions, it is important to test how the com-puted brightness temperatures compare to the observations. If the forward model represents a reasonable 97 Chapter 4. Retrieval, Results and Evaluation approximat ion to the actual clouds, the observed B T / B T D values should lie well inside the ranges defined by the L U T . However, if B T / B T D values outside the defined ranges are encountered, the corresponding pixels have to be nagged as "irretr ievable", since the inverse funct ion is not defined for such cases . 4 0 It is l ikely that some pixels in the scene wi l l contain clear sky or broken clouds. Mu l t ip le scattering from cloud sides and inhomogeneous mixtures of cloud and clear sky mean that I expect some pixels in these broken regimes to exhibit radiances not reproducible w i th the plane paral lel forward model. A scatterplot of the forward model computations, overlain wi th the M O D I S observations indeed showed a large number of outliers (not shown). Hence, the operational M O D I S cloud mask product (Ackerman et a l . , 1998) was employed to filter the satellite image for fully cloud covered pixels. 4.2.1 M O D I S Cloud Mask and Irretrievable Pixels A simple way to discriminate cloudy from clear sky pixels and the approach used in the studies of Perez et a l . (2000), Gonzalez et al. (2002) and Cerdena et al . (2007) is the spat ial coherence method proposed by Coakley and Bretherton (1982). The algori thm uses informat ion from neighbouring pixels to assess the spatial homogeneity in the observations. B y comput ing mean B T and standard deviation from smal l clusters of pixels (e.g. 2-by-2), homogeneous (i.e. low standard deviat ion) areas wi th cold temperatures are classified as cloudy and homogeneous areas wi th warm temperatures as clear. Since Coakley and Bretherton (1982), several other methods have been proposed to recognise cloudy pixels in satellite images (Ackerman et a l . , 1998, and references therein). The operat ional M O D I S cloud mask product combines several of these methods into one product, selecting amongst a variety of tests optimised for different underlying surfaces (i.e. water, different land surfaces, ice) and employing several of the available visible (daytime) and infrared (at day and night) channels. A descript ion can be found in Ackerman et al . (1998). The top panel of Figure 4.4 shows a map of the cloud mask for the Ju l y 11 scene. The image pixels are classified into four categories: cloudy, uncertain clear, probably clear, and confident clear. App l ica t ion of the cloud mask to the scene significantly reduced the number of points outside the LUT-de f ined B T / B T D range. However, after I el iminated al l pixels that were not classified as cloudy, the brightness temperature difference diagrams st i l l contained many outliers (Figure 4.5). A s noted in section 1.5, B a u m et al . (2003) also employed the 8.5 fim signal ( M O D I S channel 29) in their work. As an unfortunate result, the observed brightness temperatures for channel 29 were entirely outside of the range 4 0 Note that the neural network will still retrieve some values for such undefined inputs, so that it is important to remove the corresponding pixels from the scene in order to avoid unphysical retrievals. 98 Chapter 4. Retrieval, Results and Evaluation MODIS cloud mask product longitude (degrees west) F i g u r e 4 .4 : (Top) M O D I S cloud mask product for the scene. A l l black pixels are classified as cloudy, the orange (grey in black and white) pixels are uncertain clear, and the white pixels are probably clear. (Bottom) C loud mask derived from the B T / B T D range in the L U T (all observations that are outside the LUT-def ined B T / B T D region in F igure 4.5 are marked as dark red). Note that the M O D I S cloud mask has "grown" ; are the addit ional pixels broken clouds? 99 Chapter 4.. Retrieval, Results and Evaluation computed by my forward model (Figure 4.6). Current ly, I can only conclude that there is an error in the forward model, and consequently I wi l l drop channel 29 from the retrieval scheme. It would, however, be desirable to investigate the cause of the failure and to include the 8.5 /m i informat ion in future retrieval designs. Yet what causes the large number of outliers in the remaining channels 20 (3.7 /urn), 31 (11 /mi) and 32 (12 /mi) brightness temperatures? The bot tom panel of F igure 4.4 shows the scene wi th al l pixels outside the LUT-de f ined B T / B T D range removed. The cloud mask basical ly grows - this could be an indicat ion of more broken or otherwise inhomogeneous clouds at the edges of the clear areas. 4.2.2 Possible Failure Mechanisms There are several possibil it ies for the failure of the forward model to match the observed brightness temperatures. Besides inhomogeneous clouds, it is possible that other assumptions in the forward model do not match the real clouds accurately enough. For instance, the anomalous pixels could contain suba-diabat ic clouds or clouds wi th a significantly different cloud top height. Effective radi i , droplet number concentrations and temperatures outside the in the L U T ranges seem unlikely. The major i ty of the "ir-retrievable" pixels in Figure 4.5 exhibit both a larger BTD(3.7-11) and BTD(11-12) signal than defined by the L U T , while the B T ( l l ) signal is well inside the computed range. Warmer or colder cloud tem-peratures would cause the locat ion of the fai l ing points on the B T ( l l ) axis to fal l outside the computed range. Larger rejj than defined would cause a larger BTD(3.7-11) signal, but at the same t ime a smaller BTD(11-12) observation (cf. F igure 2.11). Since droplet concentration is not a direct input into l i bRad-t ran (cf. section 2.6), changing droplet concentrations would influence reff and r, the latter of which would merely displace the B T / B T D points wi th in the defined ranges (cf. F igure 2.11). Another possibi l i ty is that the structure of the overlying atmosphere changed wi th t ime and locat ion, so that the sounding recorded at San Diego is not representative for the entire scene. Th is could also include the presence of th in high level clouds (cirrus). In order to investigate the problem of the irretrievable pixels, I analysed the pixels along a test line in an area that included clouds wi th in the LUT-de f ined B T / B T D range, pixels classified as cloudy by the M O D I S cloud mask, but outside the defined B T / B T D range, and pixels classified as clear sky by the M O D I S cloud mask. The line is shown in the bot tom panel of F igure 4.4. The capi ta l letters A and C mark pixels wi th in the LUT-def ined B T / B T D range, and B marks a clear sky pixel . I first tested for broken clouds. Luo et al . (1994) showed that broken clouds exhibit a characteristic 100 Chapter 4. Retrieval, Results and Evaluation F i g u r e 4 . 5 : L U T (grey) and M O D I S cloud mask filtered scene brightness temperatures. Many pixels are st i l l outside of the B T range contained in the L U T . These pixels are not defined if a retrieval network is trained wi th the L U T data, hence they have to be nagged as " irretr ievable". Chapter 4. Retrieval, Results and Evaluation 280 285 290 brightness temperature ch . 31 (1 l u m ) [K] F i g u r e 4 .6 : The same as Figure 4.5, but for the brightness temperature difference between channels 29 and 31 (8.5 and 11 urn). Channel 29 B T s are 1-5 K cooler than the M O D I S measurements. Channel 29 hence could not be used for the retrievals. pattern when the observed 11 yum radiances (not brightness temperatures) are plotted against the 12 fim radiances. In part icular, if a given area contains clear sky pixels, broken clouds and fully overcast pixels, the entire set of observations resembles a continuous curve in the d iagram between cloudy and clear pixels. F igure 4.7 shows an 11 versus 12 /jm radiance plot for the sections from A to B (left panel) and from B to C (right panel). The section between B and C clearly contains broken clouds. This area already contains many pixels classified as "uncertain clear" by the M O D I S cloud mask product, so that I conclude that inhomogeneities on the sub-pixel scale are l ikely to be responsible for the failures. The other section between A and B, however, does not show the characteristic broken cloud signature in the radiance plot. Yet a larger number of the pixels along this line are irretrievable. In order to extend the analysis, I examined the observed BTD(3.7-11) and BTD(11-12) signals (Figure 4.8). Simi lar to the radiance plot in Figure 4.7, the observed values cluster around the cloudy and clear foot. The elevated BTD(3.7-11) signal places almost al l points outside of the LUT-de f ined B T / B T D range (with point A just at the edge), however, the pixels clustering around the cloudy BTD(11-12) signal are al l well inside the computed range. A higher cloud top seems to be an unlikely cause of such a behaviour. A s noted in Chapter 2, the differences in the B T D signals due to varying cloud top pressure in the forward model are mainly due to the change in sea surface temperature produced by the new adiabatic profile (cf. F igure 2.12). However, the existence of a much warmer Tsfc between A and B compared to B and C seems unlikely. A lso, both 102 Chapter 4. Retrieval, Results and Evaluation radiance ch. 31 (1 l(im) [W/m /sr/nm] (a) 7.8 7.7 7.6 7.5 7.4 7.3 7.2 7.1 7 BC clear cloudy 7.4 7.6 7.S S.2 radiance ch. 31 (ll".m) [W/m /sr/|Xm] (b) F i g u r e 4 .7 : 11 / /m versus 12 /jm radiance plots after Luo et a l . (1994) for the observations along the line shown in Figure 4.4. Left panel: the section from A to B, right panel: from B to C. The solid lines connect the observed radiances at A and B (left) and B and C (right), respectively, and are only shown as references to the point clouds. The observations between points B and C follow a typical signature of broken clouds. 103 Chapter 4. Retrieval, Results and Evaluation U 4r |-4 " 280 285 290 295 brightness temperature ch. 31 (Hum) [K] 5 " 2r brightness temperature ch. 31 (llp.m) [K] F i g u r e 4 .8 : BTD(3.7-11) and BTD(11-12) signals of the pixels between points A and B as shown in Figure 4.4. A s in the radiance plot in Figure 4.7, the observed values cluster around the cloudy and clear foot. It is possible that this behaviour is caused by th in overlying cirrus clouds. Chapter 4. Retrieval, Results and Evaluation BTD(3.7-11) and BTD(11-12) would be impacted by such a change. 4 1 Other causes for an increased pct could be equally thick, but elevated clouds, or geometrically thicker clouds. The first possibi l i ty would not change the optical properties of the cloud, and the second mechanism would merely lead to increased T and reff. B a u m et a l . (1994) and B a u m et al . (2003) showed that overlying cirrus clouds lead to an increase in both the BTD(3.7-11) and BTD(11-12) signals. They also showed that th in cirrus has relatively smal l impact on the BTD(11-12) signal, making th in cirrus a l ikely cause of the lookup table failure along A B . However, in order to give this presumption more confidence, further investigations would have to be conducted. A lso , possible effects of subadiabatic clouds should be examined in the future. For this thesis, I wi l l continue to work wi th those pixels that fall w i th in the LUT-de f ined B T / B T D range. 4.3 Network Training 4.3.1 Network Architecture A s described in Chapter 3, I restricted myself to the two-layer perceptron design. Cerdena et al . (2007) found that a three-layer architecture exhibited a better generalisation performance in their study, however, since a two-layer network should already be able to model arbi t rary functions (cf. Chapter 3), the generalisation of the software to three-layer architectures was not a pr ior i ty of my work. W i t h respect to the inputs, decisions to be made included whether to provide the satellite observations as radiances or as brightness temperatures to the network, the inclusion of sea surface temperature as input, and how the inputs should be preprocessed. Another difficult problem was selecting a good number of hidden neurons. Aires (2004) used a two-layer perceptron wi th 30 neurons in the hidden layer in their example of microwave remote sensing, and as noted in Chapter 1, Cerdena et a l . (2007) employed 20 neurons in the first and five neurons in the second layer. Mot ivated by these values, the number of hidden neurons in my architectures wi l l be of the same order of magnitude. It d id not seem feasible wi th in the timeframe of this Master 's thesis to systematical ly t ra in, analyse and compare a large number of different network architectures in order to find the best possible one. Instead, after some prel iminary try-outs, I selected some architectures that I wi l l present in more detai l in this chapter. These networks yielded the most interesting results. The methods applied to analyse these 4 1 The data in Figure 2.12, computed with a similar cloud top temperature as encountered in the July 11 scene, also showed that a relatively large increase in pct is necessary in order to increase the BTD(3.7-11) signal (40 hPa between the solid and dotted curve in Figure 2.12). Also, the resulting relative change in the BTD(11-12) signal is stronger than in the BTD(3.7-11) signal. 105 Chapter 4. Retrieval, Results and Evaluation networks, however, can readily be used for any other network architectures to be employed in future work. The authors of al l works presented in Chapter 1 that investigated the nocturnal retrieval case employed satellite observations in the form of brightness temperatures, while the dayt ime retrieval techniques in general use radiances. In fact, my tests showed that networks using radiances as inputs were unable to approximate the inverse function (possibly because channel differences provide the most information about cloud opt ical properties - I d id not investigate the use of radiance differences as inputs). Therefore, and in order to stay consistent wi th the published l i terature, I chose to use brightness temperatures instead of radiances as inputs (this also facilitates the analysis of the networks in sections 4.4 and 4.5). Unfortunately, the t ra in ing process generally was unstable, as I wi l l discuss in section 4.4. Tra in ing of networks of a given architecture led to very different results when the weights were ini t ial ised differently at the beginning of the training process - a property that I attr ibute to the inverse problem being il l-posed (cf. chapter 3). Th is is also reflected in a large variabi l i ty in the Jacobian (see the following section). Consequently, my goal for this thesis is not to find the best possible architecture, but to show that the method is working in principle and to demonstrate the use of the Jacobian and other tools to compare architectures and to evaluate the performance of a given network. 4.3.2 Failure of the Aires et al. Hyperparameter Re-Estimation The est imation of the matr ix hyperparameter Ain also often failed dur ing network training. As discussed in section 3.7, the neural uncertainty term GTH~lG in general was larger than the error covariance matr ix C o , leading to negative variances in Cin. The problem was encountered with both regularisation methods described in section 3.6. Aires (2004) pointed out that the estimation of the Hessian H f rom the t ra in ing dataset has to be done wi th a large enough number of datapoints in order to avoid numerical problems. However, increasing the number of samples in the L U T from in i t ia l ly 30,000 (20,000 for t ra in ing and 10,000 for validation) to 96,000 (64,000 and 32,000, respectively) d id not improve the si tuat ion (for comparison, Aires (2004) used 15,000 samples for t ra in ing and 5,000 for testing, and Cerdena et a l . (2007) 20,000 and 10,000, respectively). The true cause of the failures currently remains unclear. Further research is needed to clarify whether the modif icat ion of the Hessian due to the regularisation is the important factor, or whether the noise level on the target data in the L U T is too small to be treated w i th the Aires et a l . method. As noted in section 3.1, there is no noise on the target data except for the expected ambiguit ies. However, as I d id 106 Chapter 4. Retrieval, Results and Evaluation not investigate the actual magnitude of such ambiguities, I cannot rule out the possibi l i ty that they are smal l . Due to these problems I eventually decided to t ra in the networks wi th the or iginal scalar hyperpa-rameter approach implemented in N E T L A B (cf. section 3.2). Using the L U T containing 96,000 datapoints and the Nabney (2002) eigenvalue regularisation (cf. section 3.6), I was able to estimate a Cin from the trained network in some cases, although often this strategy failed as well. However, the est imation of the Hessian and the network gradient were always possible, so that the var iabi l i ty of the Jacobian as well as the neural uncertainty term GTH~1G could be computed. 4.3.3 Input Preprocessing As discussed in Chapter 3 and expected from the findings of Aires et al . (2004b), correlations among the input variables were a problem. The observed brightness temperatures of the three employed channels were al l correlated amongst each other w i th (linear) correlation coefficients larger than 0.8. Th is lead to widely varying Jacobians, and it was pract ical ly impossible to obtain a network fit that approximated the lookup table well. The input data were thus decorrelated and normalised wi th the whitening procedure described in Chapter 3, leading to better results. 4 . 4 Network Architecture: Inputs and Hidden Neurons 4.4.1 Brightness Temperature Differences Despite the input preprocessing, it appeared to be difficult for the A N N s to infer the brightness temperature differences from the B T inputs. Especia l ly the close correspondence between the 11 and 12 /jm channels seemed to have a negative effect on the stabi l i ty of the t ra in ing process. In fact, I was not able to produce a reasonable fit to the L U T wi th networks that used B T ( l l ) and BT(12 ) (12 /jm B T ) as inputs. To analyse the problem, I computed the Jacobians of ind iv idua l points in the L U T in order to compare the sensitivities to expected values. Figure 4.9 shows the B T / B T D diagrams of a subset of the Ju ly 11 L U T . In addi t ion to the cloud top pressure, cloud top temperature is also held constant at 285 K . The effective radius varies in steps of two microns from 4 fim to 12 /j.m, as in F igure 2.11. Since al l variables are connected wi th each other in a nonlinear way, it is difficult to assess Jacobian values visual ly f rom the plots in Figure 4.9. O f course, it is possible to compute finite difference deriva-tives from the L U T . However, f inding the required datapoints for a sensit ivi ty est imation constitutes a 107 Chapter 4. Retrieval, Results and Evaluation ^ I f QI . 1 1 — • • • 284 285 286 287 288 289 290 291 brightness temperature ch. 31 (Hum) [K] F i g u r e 4 .9 : Subset of the Ju ly 11 L U T . In addi t ion to the cloud top pressure (939.5 hPa) , the cloud top temperature is also held constant at 285 K . Compare v isual ly obtained estimates of the Jacobian for r e / / « 8pm and B T ( l l ) « 288 K to the computed values in Tables 4.1, 4.2 and 4.3. The symbols correspond to the optical thicknesses defined in F igure 2.11. 108 Chapter 4. Retrieval, Results and Evaluation mult idimensional opt imisat ion problem itself. Comput ing the derivative of an output to a given input, at least two points have to be found for which the other three inputs are constant. E v e n wi th 96,000 datapoints, this was not possible to a good enough accuracy (solving this problem, for instance, wi th larger L U T s and interpolat ion could constitute an important part of future research in this area). I thus restricted myself to order-of-magnitude estimates from Figure 4.9 where possible. For instance, for an effective radius of 8 f im, if B T ( l l ) is held constant at 288 K , r3r e / / /<9BT(3.7) « 2.5 / jm/K. Similar ly, <9r e///<9BT(12) s=y 18/um/K. C loud optical thickness decreases sl ightly w i th increasing BT(3 .7) , and increases wi th increasing BT(12) (<9T/<9BT(3.7) « - 0 . 5 K " 1 and r>r/dBT(12) is posit ive). If al l cloud parameters are held constant, an increase in surface temperature wi l l lead to a similar increase in cloud top temperature (cf. Figure 2.9). Since the cloud in question is th in ( r sa 1.3), the surface temperature signal is expected to be "vis ible" through the cloud, so that dTct/dTSfc « 1. Likewise, Tct should depend on B T ( l l ) and BT(12 ) ; for thicker clouds, these two inputs almost entirely determine cloud top temperature (cf. section 4.1 and Figure 2.9). Table 4.1 shows the Jacobian estimated wi th the entire L U T trained network at the discussed coor-dinates of rejf « 8/^m and B T ( l l ) sa 288 K. The A N N contained 30 neurons in the hidden layer, and in addi t ion to BT(3 .7 ) , B T ( l l ) and BT(12) also used surface temperature as an input (the Tsfc input wi l l be discussed short ly). The variabi l i ty was obtained by computing the Jacobians of 10,000 samples from the weights distr ibut ion, as described in section 3.5. T a b l e 4 . 1 : Point estimate of the Jacobian of a network containing 30 neurons in the hidden layer and using BT(3 .7 ) , B T ( l l ) , BT(12) and Tsfc as inputs. Note the large variabi l i ty of the Jacobian, indicat ing an il l-posed problem. Furthermore, the dependences on B T ( l l ) and BT(12) are large and of opposite sign. dreff 1 — [MW-K"] dT / — [K/K] dr/ — [K-1] <9BT(3.7) 0.22 ± 4 . 7 2 - 0 . 0 7 ± 1.58 - 1 . 1 0 ± 9 . 8 3 d B T ( l l ) -20 .16 ±20 .92 - 0 . 7 8 ± 7 . 3 8 - 4 . 8 3 ± 3 4 . 5 3 5BT(12) 19.11 ± 2 1 . 6 7 0.75 ± 8 . 3 0 5.56 ± 3 7 . 1 2 dTsfc 1.18 ± 3 . 3 7 1.09 ± 1.27 0.47 ± 6 . 0 5 Two features are part icular ly noticeable. F i rs t , there is a large var iabi l i ty in the computed values (i.e. uncertain network fit), especially in the dependences on B T ( l l ) and BT(12 ) . Second, the dependences on these two channels are large and of opposite sign. Since B T ( l l ) and BT(12 ) are so highly correlated, this behaviour models the dependence on the differences between the two channels (since B T ( l l ) and BT(12) wi l l always change by approximately the same value). However, the large values make the retrieval very 109 Chapter 4. Retrieval, Results and Evaluation sensitive to noise in the inputs, and there is no reason why they could not be much smaller or of the same sign but slightly different magnitude. Large and opposite sign sensitivities for at least one output could be observed for al l A N N s trained wi th B T ( l l ) and BT(12) inputs (although the magnitude of the dependences and variabi l i ty varied, depending on the min imum that was found in the weights error surface), and the performance of the networks was very sensitive to the ini t ia l isat ion of the weights and the number of h idden neurons. In an attempt to make the training process more stable, I replaced the BT(12 ) input wi th the BTD(11-12) signal, which indeed improved the training performance. It is worth not ing that a simi lar replacement of the BT(3.7) input by the BTD(3.7-11) signal d id not lead to comparable improvements. The empir ical Jacobian reported by Cerdena et al . (2007) (cf. Chapter 1) shows the same behaviour of opposite signs, but w i th smaller magnitudes (although they compute a scene-averaged Jacobian, whose values should not be 'compared with the point estimate given in Table 4.1). The i r network seems to perform well (although no results are given in the paper); hence I conclude that in general it is possible for the A N N to model the correct dependences from the "raw" B T inputs, but that using the BTD(11-12) input improves the stabi l i ty in the training process. 4.4.2 Three Input Networks A different question was whether surface temperature, as used in al l previous studies, is a necessary input. Perez et a l . (2000) argued that Ts/C is necessary in order to compute the upwell ing cloud base radiat ion, and Cerdena et al . (2007) added that the effects of water vapour in the atmosphere can be accounted for by using clear sky B T s of al l input channels. I prescribe the opt ical properties of the overlying atmosphere and the effects of subcloud water vapour are connected wi th the cloud parameters through the adiabatic model. However, it is true that three variables are not enough to uniquely specify an adiabatic cloud in my forward model. If cloud top pressure is held constant, four more parameters are needed to compute a profile (for instance, cloud.top T , r e / / , l iquid water content and either surface pressure or temperature). Nevertheless, since surface temperature does not belong to the direct satellite measurements, it was worth testing the behaviour of the networks if no Tsfc input is used. The retrieval attempts using three inputs were not successful. F igure 4.10 shows scatter plots of the target variables in the L U T (cloud top re;j, T and r ) versus A N N predict ions, for both a three input network and a four input input network that made use of Tsfc. Such scatter plots provide a good way for visual ising whether the network is able to approximate the target data in the t ra in ing or val idat ion 110 Chapter 4. Retrieval, Results and Evaluation database. If the network predictions are perfect, the plots wi l l take the shape of a straight line. The wider the scatter around this line, the worse is the fit. Obviously, intr insic noise on the target data also creates scatter. The plots in Figure 4.10 lead to the conclusion that the three input network is not able to produce good predictions of r e / / and T, only r is predicted reasonably well. The A N N that produced the displayed results had 30 neurons in the hidden layer, but varying this number did not improve the performance. Nevertheless, in order to ensure that the wide scatter is not caused by ambiguit ies in the L U T and that the four input network possibly overfits the data, I appl ied the three input A N N to the actual scene. The retrieval results of cloud top temperature and cloud optical thickness are shown in F igure 4.11. Wh i le the opt ical thickness retrieval shows the expected signature of opt ical ly thicker clouds along the ship tracks, the tracks are also discernible in the temperature retr ieval, exhibi t ing a higher temperature than their environment. Th is is not the expected physical behaviour - the aerosol particles contained in the ship exhaust should not influence the cloud temperature. Furthermore, many of the retrieved droplet sizes were negative (not shown). A curious feature of al l t rained networks is that they are able to retrieve opt ical ly very thick clouds from the L U T . Due to the saturat ion effect discussed in section 1.5 I had expected that target opt ical thicknesses above a certain threshold would be retrieved as the threshold value. It is currently unclear if there actual ly is enough information in the input data to infer the large opt ical thicknesses or if the networks are overfitt ing in this case. Since the scene of Ju ly 11, 2001, does not contain clouds wi th (retrieved) r larger than about 6, no anomalies can be found in the retrieval (cf. section 4.5). However, this behaviour should be further investigated in the future. 4.4.3 Four Input Networks The scatter plots of the four input network in Figure 4.10 showed that including Tsfc as an input significantly improves the abi l i ty of the network to fit the lookup table well. In contrast to Cerdena et al . (2007), I used the actual surface temperature as a single input, not the clear sky brightness temperatures of al l three employed channels. Using T s / C as an input at first seems to be a significant restr ict ion to the usefulness of the retrieval, since clear sky pixels are not always available in the near v ic in i ty of the clouds whose properties are to be retrieved. However, sea surface temperature retrievals are also possible from microwave imagers such as the Advanced Microwave Scanning Radiometer for the E a r t h Observing System ( A M S R - E ; Kawanish i et a l . , 2003) on-board the A q u a satellite (which also carries the 111 Chapter 4. Retrieval, Results and Evaluation H 10 5 10 15 20 25 Prediction r E- 10 10 15 20 25 ci'f Prediction r eff 290 290 280 285 Prediction T 290 280 285 Prediction T 0 20 40 60 Prediction x 20 40 60 Prediction T F i g u r e 4 .10 : Tra in ing performance of a three input network (left column) employing the BT(3 .7 ) , B T ( l l ) and BTD(11-12) signals, compared to a four input network (right column) addit ional ly making use of surface temperature. The scatter plots show the target data, as contained in the val idat ion database, plotted against the corresponding network predictions. It is clearly visible that three inputs are not enough to approximate the inverse function. 112 Chapter 4. Retrieval, Results and Evaluation cloud top temperature [K] longitude (degrees west) cloud visible optical thickness longitude (degrees west) F i g u r e 4 . 1 1 : Retrievals of cloud top temperature (top) and visible cloud optical thickness (bottom) of the three input network from which the scatter plots on the left side of F igure 4.10 were produced. The ship tracks are discernible in the Tct retrieval, which is not the expected physical behaviour. 113 Chapter 4. Retrieval, Results and Evaluation 0.3 0.25 8 0.2 | 0.15 1 0.1 0.05 °0 20 40 60 80 100 number of hidden neurons F i g u r e 4 . 1 2 : Mean-square-error of the val idat ion dataset for the four input A N N (BT(3.7) , B T ( l l ) , BTD(11-12) , T3fc) in dependence of the number of hidden neurons. The networks were trained wi th four hyperparameter re-estimation cycles, each wi th 1500 opt imisat ion steps. second M O D I S instrument). Such retrievals are largely independent of c loud cover, since clouds are semi-transparent in the microwave. For this study, I used the Tsfc measurements taken dur ing the D Y C O M S - I I research flight. O n Ju ly 11, 2001, the surface temperature was approximately 19°C. The next open question was how many hidden neurons are needed to approximate the inverse function well without adding too many degrees of freedom to the training process. As mentioned above, the work by Aires (2004) and Cerdena et al . (2007) suggested that a number on the order of 30 neurons should be sufficient. I thus trained a number of A N N s wi th different numbers of hidden neurons ranging from five to 100, and compared the M S E of the val idat ion data. F igure 4.12 shows the decrease of the error wi th an increasing number of hidden neurons. However, the curve is deceptive - the conclusion that a 100 hidden neurons network performs better in the retrieval than a 15 hidden neurons network proved wrong. In fact, f rom al l networks that were trained the most physical plausible results were obtained from the one containing 15 neurons in the hidden layer, al though its M S E was higher than that of other networks. A s noted in section 3.1, models of a higher complexity (i.e. a larger number of hidden neurons) are more susceptible to overfitt ing. However, the Bayesian regularisation in the training process should effectively prevent overfitt ing to noise in the target data (cf. section 3.2), and as noted above, there seems to be l i t t le noise in the L U T . Instead, I assume that the described behaviour, too, has to be attr ibuted to the i l l -condit ioning of the inverse problem - if a network contains more hidden neurons, there exist more possibil it ies to map the inputs to the target 114 Chapter 4. Retrieval, Results and Evaluation data. More possible mappings also mean a larger number of possible dependences, hence it becomes more l ikely that a physical ly incorrect dependence is modelled. It is thus especially in this case important to find a network wi th a good degree of complexity (cf. section 3.1). In order to investigate the problem, I selected two networks - 15 and 30 neurons in the hidden layer, referred to as 15N and 30N hereafter - and analysed some of their p roper t ies . 4 2 A l though 30N produced a better fit to the L U T (lower M S E ) , the retrieval of the Ju ly 11 scene yielded many unphysical results (for instance, negative optical thicknesses). Tables 4.2 and 4.3 show point estimates of the Jacobian at r e / / « 8/jm and B T ( l l ) « 288 K , as in Table 4.1, for 15N and 30N. The most str ik ing difference is the much larger var iabi l i ty in the 30N Jacobian, indicat ing that a less well-defined min imum in the weights error surface has been found. Th is increases the probabi l i ty of an incorrect mapping. Furthermore, while the sensitivit ies of rejf to the four inputs are similar for both networks (and at least <9re///<9BT(3.7) is in the expected range), those of Tct and r are different. For instance, as expected, 15N retrieves a good part of the Tct output from the B T ( l l ) signal. 30N, on the other hand, infers cloud top temperature almost exclusively from the surface temperature input (since the variat ion in BTD(11-12) is so small). T a b l e 4 . 2 : The same as Table 4.1, but for a network containing 15 neurons in the hidden layer and using BT(3 .7 ) , B T ( l l ) , BTD(11-12) and Tsfc as inputs (referred to as 15N in the text). dreff 1 — [fJ-m/K] dT/ — [K'/K] dr/-[K-i] SBT(3 .7 ) 2.04 ± 1 . 0 3 - 0 . 4 2 ± 0 . 7 5 - 0 . 7 8 ± 1.26 d B T ( l l ) - 3 . 3 6 ± 1 . 2 2 0.45 ± 1 . 3 4 0.37 ± 1.50 SBTD(11-12) - 6 . 4 7 ± 3 . 3 0 - 3 . 3 2 ± 2 . 3 4 -5 .31 ± 4 . 1 7 dTsfc 1.35 ± 0 . 3 6 1.04 ± 0 . 6 1 0.49 ± 0.41 T a b l e 4 . 3 : The same as Table 4.1, but for a network containing 30 neurons in the hidden layer and using BT(3 .7 ) , B T ( l l ) , BTD(11-12) and Tsfc as inputs (referred to as 30N in the text). dreff 1 — [fJ-m/K] dT 1 — [K/K] dr/-[K-i] <9BT(3.7) 1.93 ± 4 . 3 2 - 0 . 1 5 ± 1 . 4 6 - 2 . 0 5 ± 10.33 <9BT(11) - 3 . 1 9 ± 5 . 9 0 0.07 ± 2 . 5 3 1.80 ±16 .08 <9BTD(11-12) -7 .11 ± 1 5 . 0 1 - 1 . 2 0 ± 7 . 2 1 -11 .96 ± 3 8 . 3 2 dTsfc 1.34 ± 2 . 6 0 1.08 ± 1.20 0.50 ± 7 . 7 1 Since, as noted above, I do not have further independent estimates available for comparison, I cannot 4 2 T h e networks discussed here were trained with three hyperparameter re-estimation cycles with 1500 optimisation steps each, unlike the networks that were used to produce Figure 4.12. 115 Chapter 4. Retrieval, Results and Evaluation judge how reasonable the remaining sensitivities are. In order to gain more insight into the response of the two networks, I analysed the sensitivity of the retrieval to changes in cloud top pressure (or to the pressure difference between cloud top and sea surface, cf. section 2.6). Th is is important because of the sensitivity of the computed brightness temperature differences to changes in cloud top pressure in the forward model , as discussed in section 2.6. Since pct is fixed in the L U T , it is necessary to know about the retrieval response to changes in this parameter. Of course, in the forward model , changes in pct wi l l cause changes in the other cloud properties through the adiabatic assumption. It is hence quite possible that for a cloud with a different pct encountered in the scene a cloud of similar thickness, particle size and temperature, but different cloud top pressure is contained in the L U T and the desired parameters can be retrieved correctly. Nevertheless, a sensitivity analysis can provide further insight. Th is sensit ivity of the retrieval to changes in pct cannot be determined wi th the Jacobian, since cloud top pressure is not an input variable. I thus repeated the forward model runs that produced the subset of the t ra in ing L U T plotted in Figure 4.9, but this t ime the cloud top pressure was changed by ± 5 hPa . Figures 4.13 and 4.14 show scatter plots of the forward propagation of these datasets through 15N and 30N, in both cases compared to the results obtained wi th the original L U T , fixed at a cloud top pressure of 939.5 h P a . Wh i le the sensit ivity of effective radius and cloud top temperature is relatively smal l and of similar magnitude for both networks ( ± 2 fim for reff and ± 0.5 K for Tct), the retrieved optical thickness of the 30N network changes drastical ly wi th changes in cloud top pressure (up to ± 4) - in contrast to 15N, where the r retrieval is much less sensitive. Indeed, for higher cloud tops (decreased pct), 30N infers negative optical thicknesses for th in clouds. Th is behaviour might be a possible mechanism for the unphysical 30N retrievals. O f course, the point estimates of the Jacobian presented in Tables 4.1, 4.2 and 4.3 can only be used to evaluate the sensit ivity of the retrieval in the selected area of the L U T . Histograms of the Jacobian, on the other hand, can provide information on the range in which the sensit ivity varies over the entire L U T or a given scene. Figure 4.15 shows such histograms of the 15N Jacobian over the database from which F igure 4.9 was created. As expected from Figure 4.9, the sensitivities vary significantly, wi th the larger magnitudes l ikely corresponding to the thick and th in clouds. A possible appl icat ion of the Jacobian histograms could be to identify pixels for which the retr ieving network exhibits an unreasonably large sensitivity. The retrieval could thus be further constrained to pixels for which the sensit ivity is in the expected range. 116 Chapter 4. Retrieval, Results and Evaluation 4 6 8 10 12 4 6 8 10 12 Prediction r _ Prediction r _ Prediction x Prediction x F i g u r e 4 . 1 3 : Sensit ivi ty of 15N predictions if cloud top pressure (fixed in the L U T ) is varied by ± 5 h P a . In the right column, predictions of the unperturbed subset of the t ra in ing L U T plotted in Figure 4.9 are shown. The left column shows the same network-predicted cloud parameters from L U T subsets in which pct was perturbed by +5 h P a (grey) and -5 h P a (black). See text for more details. 117 Chapter 4. Retrieval, Results and Evaluation 4 6 8 10 12 Prediction r eft' Prediction r 8 10 12 cit 285.5 284.5 285 285.5 Prediction T 285.5 % 285 so S3 f— 284.5 285 285.5 Prediction T - 2 0 Prediction x 0 2 Prediction x F i g u r e 4 .14 : The same as Figure 4.13, but for the 30N network. 118 Chapter 4. Retrieval, Results and Evaluation 0 2 4 6 -1.5 -1 -0.5 0 -30 -20 -10 d ( reif ^ m ] ) 1 d ( B T 2 0 d(T [K]) / d(BT20 [K]) d(T) / d(BT20 [K]) I jlllli.M.l.lllllh.Jl! -60 -40 -20 ( d(reff[um])/d(BT31 [K]) .nit iiiiiil 0.5 1 1.5 2 d(T [K])/d(BT31 [K]) i l i i i H H i i i i „ • _ 0 20 40 60 d(T)/d(BT31 [K]) allium .iiiiiil -150 -100 -50 d(r d f [um])/d(BTD31-III 0 32 [K]) Ell l i i i . l l_ - 4 - 2 0 2 d(T [K])/d(BTD31-32 [K]) -600 -400 -200 0 d(T)/d(BTD31-32 [K]) 20 -10 60 d(r e f f[nm])/d(T f c[K]) . l l l l l l 1 • i j j 1 2 d(T [K])/d(T f c[K]) Ill -40 -30 -20 -10 d(T) /dO\ 0 [K]) F i g u r e 4 . 1 5 : Histograms of the 15N Jacobian of the L U T subset plotted in Figure 4.9. Average values are indicated by the black lines. 119 Chapter 4. Retrieval, Results and Evaluation 4.5 Retrieval Evaluation, Jacobian and Uncertainty 4.5.1 Comparison wi th In-Situ Da ta A s discussed in the previous section, the 15N network yielded the best retrieval performance. In this section, I wi l l analyse the retrieval by comparing the results to the R F 0 2 in-situ data (cf. section 2.1) and by checking the physical plausibil i ty. F igure 4.16 shows a map of the retrieved rejj values, along wi th a histogram of the cloud top measurements from R F 0 2 and a histogram of the retrieved values in the area. Since the in-situ measurements were taken about five hours after the satellite overpass, the measured values have been advected and are plotted over the clouds that l ikely have been observed from the aircraft. B y comparing against advected data I assumed that the clouds in question d id not change over the five hours. A lso, I assumed that the wind speed and direction at cloud level were constant at the values measured from the aircraft at the t ime the in-situ measurements were taken. Wh i le the cloud layer is both extensive and horizontal ly homogeneous (so that changes in wind speed and direction st i l l advect clouds wi th similar characteristics), changes due to the diurnal cycle and precipi tat ion are possible. The cloud top was l ikely lower earlier in the diurnal cycle when the M O D I S image was taken (cf. section 1.2), however, the in the L U T employed cloud top pressure is probably underestimated (I used the aircraft alt i tude just below cloud top), thereby offsetting this change to some extend. Prec ip i ta t ion, on the other hand, was significant dur ing R F 0 2 (Stevens et a l . , 2003a, Figure 6), hence, it represents a possible source of uncertainty. As noted in section 2.1, the D Y C O M S - I I data were averaged over 10 s intervals in order to yield measurements on the 1 k m scale of the M O D I S pixels. Due to the t ime lag between satellite and in-situ observations and the uncertainty in the advection, no collocated observations were possible. I hence chose distr ibutions of the retrieved parameters wi th in the box shown in F igure 4.16 and in-si tu measured values from the cloud top flight tracks of R F 0 2 as the means for comparison. Unfortunately, the selected area contains a large number of "irretrievable" pixels. The range of the retrieved effective radi i corresponds well to the range of the in-si tu measured values, both ranging from about 8 ^im to about 16 pm (Figure 4.16b,c). The shape of the distr ibut ion, however, does not agree as well. Whereas the retrieved effective radius peaks at both about 9.5 and 13 pm, the aircraft measured values are more uniformly distr ibuted, wi th several smal l peaks and one larger peak located at 11 pm. A possible cause for these discrepancies are changes in the structure of the cloud 120 Chapter 4. Retrieval, Results and Evaluation layer between the satellite and in-situ observations as described above. A lso , pixels in the vicini ty of "irretrievable" pixels might st i l l be influenced by the mechanisms that cause the anomalous pixels, e.g. overlying cirrus. Furthermore, sea surface temperature was assumed to be constant in the scene. A s I wi l l show at the end of this section, the uncertainty in the neural inversion (GTH G, cf. section 3.3) is on the order of 2 pm in the selected area; however, as discussed below, the usefulness of this value is questionable. Nevertheless, the retrieved values look physically plausible. The ship tracks are clearly discernible wi th a decreased particle radius of about 2 /zm smaller than particle sizes in the environment of the tracks, which is in the expected range (Schreier et a l , 2006). The radi i retrieved for the remaining scene also look reasonable for marine stratocumulus (values ranging from 5 to 15 pm, some structure present, but no abrupt changes). Simi lar results are obtained for cloud top temperature, shown in F igure 4.17. As expected, the ship tracks are not discernible in the temperature retrieval. Instead, the cloud deck has temperatures varying only sl ightly between values of about 284 K and 286 K , wi th smooth transit ions and no abrupt changes. The range of the retrieved temperature agrees well wi th the in-situ measurements, and for this variable the shape of the distr ibut ion also agrees well. The optical thickness of the clouds can only be judged by physical plausibi l i ty, since values of r cannot be inferred from the in-situ measurements along the horizontal flight legs. As Figure 4.18 shows, values of T vary from less than 1 to about 5, al l reasonable values for marine Sc. The ship tracks are thicker by about one to two compared to their environment, and the major i ty of the clouds are th in ( T < 2). 4.5.2 Jacobian In section 4.3, I introduced the appl icat ion of point estimates of the Jacobian to compare the A N N sensitivities wi th independent estimates. However, the information in the Jacobian can also be used in different ways to analyse the retrieval network. A s noted in section 1.5, Aires et a l . (2004b) investigated whether a P C A (principal component analysis) preprocessing of the input data can reduce the variabi l i ty in the Jacobian. To obtain a variabi l i ty representative of the entire retrieval scene, they computed an average Jacobian and its variabi l i ty from al l pixels contained in the scene. Furthermore, they normalised this mean Jacobian by the standard deviations of the input da ta (over the retrieval scene) to gain information about the relative importance of the indiv idual inputs. They argue that the normalised Jacobian can be used to refine the inversion procedure by identifying inputs that do not contribute 121 Chapter 4. Retrieval, Results and Evaluation cloud top effective radius Qxm] -123 -121 -119 -117 longitude (degrees west) (a) retrieved effective radius [urn] in-situ effective radius [pm (b) (c) F i g u r e 4 .16 : Retr ieved effective radi i for the test scene and comparison of retrieved (15N network) and aircraft measured values. The ship tracks are discernible wi th smaller droplet sizes than their environment. In-situ measurements of two cloud top flights have been advected and overlain on the retrieved data. Due to the uncertainty in the advection and the large number of irretrievable pixels in the flight area the histogram of in-situ data is compared to a histogram of the surrounding retrieved values. 122 Chapter 4. Retrieval, Results and Evaluation retrieved cloud top temperature [K ] in-situ temperature [K] (b) (c) F i g u r e 4 .17 : The same as Figure 4.16, but for cloud top temperature. 123 Chapter 4. Retrieval, Results and Evaluation cloud visible optical thickness longitude (degrees west) F i g u r e 4 .18 : The same as Figure 4.16, but for cloud opt ical thickness. Note that for the optical thickness no in-situ measurements were available. significantly to the outputs (which could consequently be eliminated). Table 4.4 shows the average Jacobian of the Ju ly 11, 2001, scene. The values are similar to those found for the point estimate in Table 4.2. O n average, the dependence of reff to B T ( l l ) is smaller than in Table 4.2, dr/dBT(ll) is larger and dreff/dTsfc and dr/dTsfc are very small . C loud top temperature is s imi lar ly dependent on both T s / C and B T ( l l ) . T a b l e 4 .4 : Average Jacobian of the Ju ly 11, 2001, scene and its variabi l i ty (15N network). dreff 1 — [f^m/K] dT/ — [K/K] <9BT(3.7) 1.47 ± 1.32 - 0 . 2 2 ± 0 . 5 6 - 0 . 9 4 ± 1.53 <9BT(11) - 1 . 2 3 ± 1.37 0.56 ± 0.80 1.07 ± 1.69 <9BTD(11-12) - 7 . 7 5 ± 6 . 1 1 - 2 . 3 9 ± 2 . 3 2 - 5 . 2 4 ± 7.86 dTsfc - 0 . 1 6 ± 0 . 5 4 0.62 ± 0 . 4 3 - 0 .04 ± 0.82 The normalised mean Jacobian is l isted in Table 4.5. The standard deviations of BT(3 .7 ) , B T ( l l ) and BTD(11-12) were obtained from the satellite data, while that of T s / C was estimated from the R F 0 2 measurements. The normalisat ion leads to some interesting results. Effective radius is indeed mainly determined by the BT(3.7) signal, while its dependence on BTD(11-12) becomes more relative. C loud top temperature sti l l is equally dependent on both B T ( l l ) and Tsfc, which also contribute most to this 124 Chapter 4. Retrieval, Results and Evaluation output (BTD(11-12) is less significant). Tsjc contributes very l i t t le to both reff and r, so that this input l ikely is main ly needed for a correct Tct retrieval (cf. F igure 4.11). T a b l e 4 .5 : The same as in Table 4.4, but normalised by the standard deviat ion of the inputs. Th is allows for a better judgement of the importance of the ind iv idual inputs. The standard deviat ion of the surface temperature input has been estimated from the D Y C O M S - I I measure-ments. dreff 1 — H dT 1 — [K] dr/-[l] <9BT(3.7) 2.36 - 0 . 3 5 -1 .51 S B T ( l l ) - 1 . 8 2 0.91 1.74 <9BTD(11-12) - 1 . 34 - 0 . 4 2 - 0 . 9 2 dTsfc - 0 . 2 0 0.78 - 0 . 0 4 When inferring information about the importance of ind iv idual inputs from an averaged Jacobian, it is important that the average indeed is representative of the scene. To verify the representativeness of the Jacobian given in Tables 4.4 and 4.5, I computed histograms of the indiv idual sensitivities of al l pixels in the scene, as was done for the L U T subset in Figure 4.15. The histograms for the Ju ly 11 scene are displayed in Figure 4.19. The average Jacobian values correspond well w i th the most often occurring sensitivities, so that the relative Jacobian in Table 4.5 indeed gives a good idea of the information content of the indiv idual inputs. The last useful representation of the Jacobian that shall be discussed in this thesis is its spatial distr i-but ion, as shown in Figures 4.20 and 4.21 for effective radius and cloud top temperature, respectively. The spatial distr ibutions of <9r e / / / t5BT(3.7) and 0 r e / / / d B T D ( l l - 1 2 ) in Figure 4.20 show a coherent picture. The absolute magnitudes of both sensitivities decrease across the ship tracks, which is expected from Figure 4.9 due to the increased space between the curves for clouds w i th effective radi i of approximately 10 p and opt ical thicknesses of about 3. Similar ly, the sensitivities for the th in clouds in the area that served for comparison wi th the in-situ data exhibit a much larger absolute magnitude - consistent wi th the converging lines in Figure 4.9 for th in clouds. A curious feature is that the negative dependence of Tct on BT(3.7) decreases in magnitude wi th increasing T (Figure 4.21, cf. F igure 4.18). A t r 3.5, the dependence becomes positive. A t the same time, the posit ive c3T c t /<9BT(l l) decreases wi th increasing r. Th is sensit ivity is largest for th in clouds. Th is behaviour can l ikely be attr ibuted to the transit ion between th in clouds that influence the surface-emitted radiat ion very l i tt le and thick clouds that emit approximately as black bodies in the thermal infrared. 125 Chapter 4. Retrieval, Results and Evaluation lllllllllll d(reff [urn]) / d(BT20 [K]) Jill. -0.5 0 0.5 d(T [K]) / d(BT20 [K]) ll. - 2 - 1 0 1 d(x) / d(BT20 [K]) - 4 - 2 0 d(reff[(im])/d(BT31 [K]) . llllllll . 0 0.5 1 d(T [K])/d(BT31 [K]) 0.5 1 1.5 2 d(T)/d(BT31 [K]) llllll dlDL -20 -10 d(reff [urn]) / d(BTD31-32 [K]) - 4 - 2 0 d(T [K])/d(BTD31-32[K]) -10 -5 0 d(T)/d(BTD31-32 [K]) H i . . d(r e f f[um])°/d(T ( c[K]) I 0.4 0.6 0.8 1 1.2 d(T[K])/d(T f c[K]) III.... -0.2 ) 0.2 d(x)/d(T f c[K]) F i g u r e 4 . 19 : Histograms of the 15N Jacobian of the Ju ly 11, 2001, scene. The average values as l isted in Table 4.4 are highlighted by the black lines. 126 Chapter 4. Retrieval, Results and Evaluation sensitivity of effective radius to BT 20 [um/K] longitude (degrees west) F i g u r e 4 .20 : Spat ia l distr ibut ion of the sensitivity of the effective radius retrieval to changes in BT(3.7) (top panel) and BTD(11-12) (bottom panel) on Ju ly 11, 2001 (15N network). 127 Chapter 4. Retrieval, Results and Evaluation sensitivity of cloud top temperature to BT 20 [K/K] longitude (degrees west) F i g u r e 4 . 2 1 : The same as Figure 4.20, but for the sensitivity of cloud top temperature to changes in BT(3.7) (top panel) and B T ( l l ) (bottom panel). 128 Chapter 4. Retrieval, Results and Evaluation Since pct was held constant in the L U T , the dependence of Tct on Tsjc should be approximately 1 for very th in clouds ( r < 1) - the cloud layer has l itt le impact on the radiances, and due to the linear temperature decrease wi th height changes in Tsfc lead to immediate changes in Tct- The sea surface also approximately emits as a black body in the infrared (cf. the subcloud layer in F igure 2.10), thus the dependence of Tct on B T ( l l ) is also expected to be close to 1. Indeed, the sensit ivi ty of Tct on Tsfc is also largest for th in clouds (not shown). If the clouds have reached an optical thickness large enough to emit approximately as black bodies at 11 pm (e.g. r »s 43 in the right panel of Figure 2.10), <9T ct/<9BT(ll) is again expected to be about 1. In between, however, B T ( l l ) is influenced through absorption in the cloud. Independently of Tct, the optical ly thicker the cloud, the smaller B T ( l l ) as the cold cloud top becomes more opaque (cf. the left panel of F igure 2.10, wi th r « 4). Simi lar effects are expected to impact the BT(3 .7) signal. Wh i l e I cannot verify this argumentation from the B T / B T D diagrams computed from my L U T (cloud top temperature is fixed in al l plots), the corresponding plot by Perez et a l . (2000) in F igure 1.9b confirms the dependence of Tct on B T ( l l ) for thick clouds (note that the surface temperature is constant in this plot). Th is example shows how valuable the Jacobian is in interpreting the behaviour of the retrieval A N N . Since the creation of diagrams simi lar to Figure 4.9 but for vary ing Tct is straightforward, I recommend a more detailed analysis of the spatial d istr ibut ion of the Jacobian in the future. 4.5.3 Uncertainty Al though the est imation of the intrinsic noise covariance matr ix Cin failed dur ing the tra in ing process, the neural uncertainty term GTH~1G could st i l l be computed and provides an estimate of the uncertainty in the retrieval due to the weights distr ibut ion. F igure 4.22 shows maps of the uncertainty (standard deviat ion computed from the variance in the diagonal elements of the matr ix) for effective radius and cloud opt ical thickness. The uncertainty in reff (r) ranges from less than 1 pm (1) in the area of the ship tracks to more than 2 pm (4) in the upper left corner of the scene. However, based on my findings in Chapter 3, I question the usefulness of these uncertainty estimates. The larger uncertainty in the upper left corner of the scene is correlated wi th neither of the retrievals -there are no signif icantly larger or smaller effective radi i , warmer or colder or opt ical ly thicker or thinner clouds in this area than in the remainder of the scene. As discussed in section 3.7, ambiguit ies in the L U T cannot be recognised by the neural network. Hence, the only reason for the increased uncertainty would be a lower data density of the type of clouds occurr ing in the area in question in the L U T - possibly 129 Chapter 4. Retrieval, Results and Evaluation the setup of my forward model caused the cloud type in the upper left corner of the scene to be more sparsely represented in the L U T compared to the other clouds in the scene. Further research is needed to clarify the meaning of the uncertainties. 4.6 Further Developments In this chapter, I analysed the retrieval results of the 15N network because it yielded the most plausible retrieval performance. A s noted, it did not produce the smallest M S E . G iven the instabil i t ies in the training process, it is l ikely that the same network architecture, t ra ined from a different weights ini t ial isat ion would not perform as well. I hence attr ibute the relatively good results to " luck" rather than a well funct ioning method. It is l ikely that a network containing a larger number of hidden units could also approximate the inverse function correctly. Possibly such a network would produce a smaller M S E and exhibit an improved retrieval performance. However, given that the inverse problem is very i l l-posed, more hidden neurons imply that it becomes more difficult to find the correct mapping. Due to the various problems I encountered, I was not able to give ul t imate answers to the questions that arose in this thesis. Instead, the work that I presented should be taken as a first step to find a good and reliable network architecture suited for the given inverse problem. I demonstrated the high potential of neural networks for appl icat ion in remote sensing, and explored several techniques that can be used to construct a stable retrieval scheme. However, my work also showed the need for further investigations. Given the results presented in this chapter, I propose, in roughly descending order of importance, to continue work in the following areas: Stabilisation of the Jacobian In order to make the training process less dependent on the network architecture and the ini t ia l isat ion of the weight values, it is necessary to decrease the variabi l i ty in the Jacobian and make the problem less i l l -condit ioned. The P C A approach suggested by Aires et al . (2004b) is not useful for this problem, since it involves reducing the number of inputs - feasible only if a large number of inputs are involved. Krasnopolsky (2007) notes alternative methods that a im at obtain ing a Jacobian wi th a low uncer-tainty for data assimilat ion purposes, such as training a separate A N N to represent the Jacobian. However, while such methods might be able to provide a good estimate of the physical Jacobian, they say nothing about the Jacobian of the retrieval network. In his work, Krasnopolsky (2007) suggests an approach based on ensembles of A N N s . Several networks of the same architecture are 130 Chapter 4. Retrieval, Results and Evaluation standard deviation r [urn] longitude (degrees west) standard deviation x longitude (degrees west) F i g u r e 4 . 2 2 : Spat ia l distr ibut ion of the uncertainty in the effective radius (top panel) and opt ical thickness (bottom panel) retrievals, as computed from the neural uncertainty term GTH1G. 131 Chapter 4. Retrieval, Results and Evaluation t rained using differently perturbed ini t ia l conditions for the A N N weights, so that each network wi l l l ikely exhibit different weight values after the training. The average of a l l networks is then used to predict the outputs and estimate the Jacobians. Of course, such an approach implies increased computat ional effort for the network t ra in ing. How-ever, since the number of inputs cannot be reduced in my retrieval problem, the method could lead to improvements. Optimisation of the Network Architecture Once the Jacobian is more stable, the network architec-ture can be optimised. Th is can be continued to be done by t ra in ing a restricted class of architectures and comparing both their val idat ion M S E and retrieval performance. For comparison, it might be worthwhile to test whether three-layer networks, such as employed by Cerdefia et a l . (2007), perform better than the two-layer architectures used in this thesis. A n alternative for f inding the opt imal number of hidden units would be to adopt the genetic a lgor i thm approach proposed by Cerdefia et al . (2007). A lso , B ishop (1995, Chapters 9 and 10) discusses further model selection tools. In my work, I restricted the number of training cycles during network t ra in ing due to the high computat ional cost. However, if a stabil ised Jacobian allows for a more structured explorat ion of different network architectures it should be ensured that the t ra in ing process is long enough to find the global m in imum in the weights error surface. Retrieval Evaluation The retrieval evaluation can be extended and improved in several ways. F i rs t , it would be desirable to compute accurate point estimates of the physical Jacob ian from the L U T , as discussed in section 4.4. Th is would significantly increase the usefulness of ANN-es t ima ted point Jacobians. Next , the comparison of retrieved and in-situ observed values by means of histograms is not very satisfactory. More work should be invested into applying the retrieval to other scenes - possibly in which satellite overpass and in-situ measurements are closer together. The D Y C O M S - I I campaign provides further scenes, as does the E P I C campaign (Bretherton et a l . , 2004). In addi t ion to the in-situ data, the A N N retrievals could also be compared to precedent and subsequent daytime retrievals from independent sources (e.g. the operational M O D I S product) . A lso, it would be useful to know precisely what caused the irretrievable pixels. If computed correctly i n the forward model , the 8.5 /um channel could help to identi fy high clouds (Baum et a l . , 2003), and the operational M O D I S algorithms also provide information about cloud height (Ackerman 132 Chapter 4. Retrieval, Results and Evaluation et a l . , 1998). For comparing the obtained retrieval results wi th those of other nightt ime retrieval schemes (e.g. Cerdeha et a l , 2007) and for estimating the true impact of the cloud layer on the upwell ing thermal flux, it would be desirable to implement the computat ion of the 11 /urn opt ical thickness into the forward model. Sensitivities to Assumptions in the Forward Model It is also important to explore the sensit ivi-ties of the retrieval to the assumptions made in the forward model. A sensit ivi ty study to fc-value and Vgam (cf. sections 2.2 and 2.3) is difficult, since in order to determine the precise effects, for each new value a new L U T would have to be computed in the current setup, and a new network would have to be trained. However, it would be possible to compute the changes in the computed T O A B T s due to changing fc and ugam, and to estimate the impact on the retrieval v ia the Jacobian. The same is val id for the sensit ivity of the retrieval to subadiabatic clouds, and the impact on T O A B T s caused by broken cloud regimes should also be investigated. In my forward model, I assumed that the satellite is directly above the cloud (cf. section 2.6). In many cases, this might not be a good assumption (in fact, I d id not check whether i t was fulfil led in the Ju ly 11 case). Hence, if a sensit ivity analysis shows a large impact of a slanted radiat ion path on the B T s observed by the satellite, the satellite zenith angle should be considered in the forward model. Overlying Atmosphere In order to make the retrieval independent of a prescribed overlying atmo-sphere, the opt ical properties of the O A could be introduced as variable parameters t* and B*, such as suggested in section 2.5. G iven the ease of including addit ional inputs into the A N N architecture, an alternative could be to obtain independent estimates of the water vapour path in the atmosphere (cf. section 2.5) and use it as an addit ional input. The network could then automat ical ly infer the bulk absorpt ion effects. A lso , it could be investigated how much informat ion is gained if the bright-ness temperatures of the clear sky pixels are obtained for al l channels, such as done by Perez et al . (2000) and Cerdefia et al . (2007). Uncertainties and Ambiguities It would be interesting to check whether the type of cloud in the upper left region of the scene is actual ly underrepresented in the L U T , as discussed in section 4.5. Th is information could help to refine the forward model so that the data density is similar for al l types of clouds. 133 Chapter 4. Retrieval, Results and Evaluation It is unsatisfying that the Aires et al . (2004a) method is not able to recognise ambiguous situa-tions. Another analysis of the L U T should be performed to estimate the actual magnitude of the occurr ing ambiguities and to determine the regimes in which they occur (for instance, th in clouds vs. thick clouds). In order to incorporate the uncertainty due to ambiguit ies into the retrieval, I propose the following. F i rs t , if we consider the ambiguities as being a part of the intr insic noise, we could discretise the input space and compute localised noise matrices C'in (where ' indicates the localisation). B y assuming that the network mapping is correct, the covariance matr ix of the errors C'Q in the selected input area could be used to approximate C'in (thereby either ignoring the neural rp 1 uncertainty term, or, if stable enough, considering localised versions oiG'H G a s well). A n alternative could be to employ a different network architecture known as mixture models (Bishop, 1995, Chapter 6). M ix tu re models are able to compute mul t imodel output distr ibut ions; they are therefore not restricted by the Gaussian assumption as the Aires et al . (2004a) method (cf. section 3.3). Wh i le this could represent an approach wi th a high potent ial to improve some of the difficulties encountered in my work, the Bayesian methods applied in this thesis could not be direct ly applied to mixture models. Hence, this approach would require an increased effort. A comprehensive uncertainty estimate requires the combinat ion of the errors of al l contr ibut ing sources. If the above problems are solved, such an estimate can be obtained by combining network uncertainty, ambiguit ies, uncertainties due to assumptions in the forward model and instrument noise of the M O D I S instrument, which can be converted to output error w i th the Jacobian. A i r e s et a l . I m p l e m e n t a t i o n Final ly , Chapter 3 raised many questions concerning the implementat ion and appl icat ion of the Aires et a l . (2004a) method. Whether we can improve the unsatisfying need to regularise the Hessian and whether the problems wi th the hyperparameter re-estimation can be solved are issues that should also be addressed in the future. 134 Chapter 5 Summary In this thesis I have investigated the feasibil ity of retr ieving cloud top effective radius, cloud optical thickness and cloud top temperature of nocturnal marine stratocumulus clouds by invert ing infrared satellite measurements using an art i f icial neural network. In Chapter 1, I described the scientific context of my work. Mar ine Sc play a cr i t ical role in the exchange of energy and water in our cl imate system. However, processes that are involved in cloud-atmosphere interact ion and their representation in general circulat ion models remain some of the pr imary uncertainties in global cl imate modell ing. Observational data that can shed light on regional variations in cloud microphysical and optical properties, their d iurnal cycle and the interact ion of clouds, precipitat ion and aerosols is important for further progress in the accurate representation of marine Sc in cl imate and weather models. A few studies have investigated the nocturnal retrieval problem and shown that it is possible to infer cloud properties from the information contained in the infrared channels centred at 3.7, 8.5, 11.0 and 12.0 / jm. However, problems wi th the standard opt imisat ion approach to these retrievals include high computat ional cost and a lack of consistent error estimates. The a im of this study was to use neural networks to design a retrieval method that is both fast and able to give such uncertainty estimates. The fundamental idea behind this approach is to approximate the inverse of the forward function that describes the dependence of top-of-atmosphere radiances on cloud parameters wi th a neural network architecture that is capable of solving nonlinear regression problems. Cerdefia et al . (2007) were the first to use neural networks to invert nocturnal measurements of the A V H R R instrument. The objective of this study was to extend their approach by investigating the appl icabi l i ty of methods proposed by Aires (2004) and Aires et al . (2004a,b) - al lowing for the estimation of uncertainties and sensitivities - to the problem, to apply the retrieval scheme to nocturnal measurements of the higher resolution M O D I S instrument, and to compare the retrieved values to in-situ measurements obtained during the D Y C O M S - I I field campaign off the coast of Cal i forn ia. M y focus lay on the uncertainty in the retrieval, and on what we can learn from retrieval sensitivit ies. M y ini t ia l 135 Chapter 5. Summary work included the construction of a forward model capable of comput ing top-of-atmosphere brightness temperatures for varying cloud parameters and the implementation of the Aires et a l . method. The topic of Chapter 2 was the theory and implementat ion of the forward model. The scene of Ju ly 11, 2001, was chosen for the retrieval, and the in-si tu aircraft measurements obtained on that day were used for the development of the forward model and the evaluation of the retrieval performance. The scene contains a number of ship tracks that provide contrasting droplet sizes which were used to check the physical consistency of the retrieval. The clouds were modelled as adiabatic plane paral lel cloud layers in the forward model and the droplet size distr ibut ion was described by a modif ied gamma distr ibut ion. Compar isons of idealised adiabatic profiles and size distr ibutions wi th in-situ measured data showed good agreement. The radiat ive transfer model l ibRadt ran (Mayer and Ky l l i ng , 2005), which incorporates the mult ip le scattering code D I S O R T (Stamnes et a l . , 1988) to solve the radiative transfer equation, was used to compute cloud top radiances. To account for the spectral intervals of the M O D I S instrument channels, I implemented correlated-k code developed by K r a t z (2001) into l ibRadt ran. Gaseous absorpt ion and emission above cloud top were accounted for by incorporat ing average transmission and emission properties of the atmosphere obtained from radiosonde soundings. The forward model proved capable of reproducing relationships between cloud top brightness temperature differences and cloud parameters that were previously described by B a u m et a l . (1994) and Perez et al . (2000). In Chapter 3, I further explored theory and implementat ion of the method proposed by Aires (2004) and Aires et a l . (2004a,b). The method ideally provides estimates of the uncertainty arising from both the neural network fit to the inverse function and the intr insic noise inherent in the data, as well as the variabi l i ty in the Jacobian that is due to the network fit. The Jacobian, describing the sensitivities of the outputs wi th respect to the inputs, is part icular ly important for analysing the dependences of the network fit in order to ensure that the network models the "correct" function. Its variabi l i ty indicates how i l l -condit ioned the regression problem is, a common problem w i th inverse problems that makes it difficult to find a good approximat ion to the inverse function. Unfortunately, I encountered several difficulties wi th the Aires et al . method that l imi ted its usefulness for the retrieval problem. Init ial tests wi th simple examples showed questionable results, w i th uncertainty intervals often not including the true value. For instance, the method was not able to recognise ambigu-ities that were expected to occur in the lookup table. Furthermore, numerical problems involved in the est imation of the Hessian matr ix of the network in some cases led to a failure in the est imation of the 136 Chapter 5. Summary uncertainty due to the intr insic noise in the data and questionable results for the uncertainty due to the network fit. The estimation of the Jacobian and its variabil i ty, however, proved promising. In Chapter 4, I applied the methods described in Chapters 2 and 3 to the retrieval scene. A lookup table consisting of 96,000 cloud profiles wi th varying effective radius, droplet number concentration and temperature was computed. Parameter ranges were obtained from the in-si tu measurements, and, for simplicity, cloud top and surface pressure were held constant at the observed values. A n average droplet size distr ibut ion width representative of the night was determined from the measurements. The brightness temperatures and cloud parameters in the lookup table were used to t ra in different architectures of neural networks. I explored several configurations wi th different inputs and numbers of hidden units in the network and compared their results based on training error, sensitivit ies and physical plausibil i ty. I restricted myself to network architectures containing one layer of h idden neurons. The major results of the retrieval investigations can be summarised as follows: • Compar ison of the computed brightness temperatures to those observed by M O D I S showed that a large number of observations were outside the computed range. M a n y of these "irretrievable pixels" could be attr ibuted to broken clouds and clear sky pixels; for the remaining pixels I found indications of overlying cirrus clouds. • Unfortunately, the computations of the 8.5 /um brightness temperatures d id not agree wi th the observations. A n error in the forward model seemed to be the l ikely cause, hence, the corresponding channel could not be used in the retrieval. • The uncertainty est imation of Aires et al . (2004a) failed for al l networks trained from the lookup table. It is possible that there was only l i tt le intrinsic noise in the lookup table, which could have amplif ied the numerical problems noted above. Wh i le I was able to obtain uncertainty estimates due to the network fit wi th a regularised Hessian, I question the usefulness of the obtained values. • The brightness temperatures at 3.7, 11 and 12 /jm alone were not sufficient for retr ieving effective radius, cloud opt ical thickness and cloud temperature. A l l networks employing only these inputs showed unphysical results in at least one output variable. • The Jacobian and its variabi l i ty proved to be a valuable tool for analysing the networks. Point estimates of the network Jacobian were compared wi th estimations of the physical Jacobian obtained from the lookup table, the average Jacobian of the satellite scene gave informat ion about the average information content of the network inputs and the i l l -condit ioning of the problem, and maps of 137 Chapter 5. Summary the Jacobian showed the spatial d istr ibut ion of the sensitivities and were interpreted for physical plausibi l i ty. • Obta in ing a good network fit to the inverse function proved to be a highly i l l -condit ioned problem. Th is led to a very unstable learning process and the high var iabi l i ty in the Jacobian made it difficult to find a network model l ing the expected dependences. • A lmost al l network architectures exhibited unphysical behaviour in at least one output variable. Th is included networks employing sea surface temperature as an addi t ional input to the satel-lite observations and networks employing brightness temperature differences instead of brightness temperatures as inputs. • A network that predicted reasonable results employed the brightness temperatures at 3.7 and 11 pm, the brightness temperature difference between 11 and 12 pm and sea surface temperature as inputs and included 15 units in the hidden layer. Satell i te image and in-si tu measurements were recorded wi th a five hour difference; histograms of advected cloud properties wi th histograms of the in-si tu measurements showed good agreement of cloud top temperature, ranges of observed and retrieved effective radi i also agreed well. Retrievals of al l three retrieved parameters were physical ly plausible, as was the Jacobian. I have demonstrated that it is feasible to use art i f icial neural networks for the retrieval of nocturnal marine stratocumulus properties. M y results are promising in as far as that despite the difficulties I encountered a good agreement between retrieved and observed data was obtained. In part icular the Jacobian proved to be a very valuable tool that has not been employed in the investigations of other authors. However, further refinements and analysis of the method are required. I discussed open questions and suggested areas for continuing work in the conclusions of Chapter 4. In conclusion, I believe that it is essential to improve the i l l -condit ioning of the inverse problem in order to achieve a more stable training process and improved retrieval results. M y work provides a foundation for future work, but further research is required to provide a reliable, stable method that can provide accurate uncertainty estimates. 138 Bibliography A b d i , H. , 2007: Encyclopedia of Measurement and Statistics, chapter Singular Value Decomposit ion (SVD) and General ized Singular Value Decomposit ion ( G S V D ) . Sage, Thousand Oaks ( C A ) . Ackerman, A . S., M . P. K i rkpat r ick , D. E . Stevens and O. B. Toon, 2004: The impact of humidi ty above strat i form clouds on indirect aerosol cl imate forcing. Nature, 432(7020), 1014-1017. Ackerman, S. A . , K . I. Strabala, P. W . Menzel , R. A . Frey, C . C . Moel ler and L. E . Gumley, 1998: Discr iminat ing clear sky from clouds wi th M O D I S . Journal of Geophysical Research, 103(D24), 32141-32158. Aires, F . , 2004: Neura l network uncertainty assessments using Bayesian statistics wi th appl icat ion to remote sensing: 1. Network weights. Journal of Geophysical Research, 109, D10303. Aires, F. , A . Ched in , N . A . Scott and W . B. Rossow, 2002: A regularized neural net approach for retrieval of atmospheric and surface temperatures wi th the iasi instrument. Journal of Applied Meteorology, 41, 144-159. A i res, F. , C . Prigent and W . Rossow, 2004a: Neural network uncertainty assessments using Bayesian statistics wi th appl icat ion to remote sensing: 2. Output errors. Journal of Geophysical Research, 109, D10304. Ai res, F . , C . Pr igent and W . Rossow, 2004b: Neural network uncertainty assessments using Bayesian statistics w i th appl icat ion to remote sensing: 3. Network Jacobians. Journal of Geophysical Research, 109, D10305. Ai res, F. , C . Prigent, W . B. Rossow and M . Rothstein, 2001: A new neural network approach including first guess for retrieval of atmospheric water vapor, cloud l iquid water path, surface temperature, and emissivities over land from satellite microwave observations. Journal of Geophysical Research, 106, 14887-14908. Albrecht , B. , 1989: Aerosols, cloud microphysics, and fractional cloudiness. Science, 245, 1227-1230. Albrecht , B. A . , D. A . Randa l l and S. Nichol ls, 1988: Observations of marine stratocumulus clouds during F I R E . Bulletin of the American Meteorological Society, 69(6), 618-626. Anderson, G . P., S. A . C lough, F . X . Kne izys , J . H . Chetwynd and E . P. Shettle, 1986: A F G L atmospheric constituent profiles (0-120km). A F G L - T R - 8 6 - 0 1 1 0 , A i r Force Geophys. L a b , Hanscom A i r FOrce Base, Bedford, Mass. A rdu in i , R. F. , P. M inn is , Smi th , J . K . Ayers, M . M . Kha iyer and P. Heck, 2005: Sensit iv i ty of satellite-retrieved cloud properties to the effective variance of cloud droplet size distr ibut ion, in Fifteenth Atmo-spheric Radiation Measurement (ARM) Science Team Meeting, Daytona Beach, FL (US), 03/14/2005-03/18/2005. Ark ing , A . and J . D. Chi lds , 1985: Retr ieval of cloud cover parameters from mult ispectral satellite images. Journal of Applied Meteorology, 24, 322-334. Aus t i n , P. H. , Y . Wang, R. Pincus and V . Ku ja la , 1995: Prec ip i ta t ion in stratocumulus clouds: Obser-vat ional and model ing results. Journal of Atmospheric Sciences, 52, 2329-2352. 139 Bibliography Barnes, W . L., T . S. Pagano and V . V . Salomonson, 1998: Pre launch characteristics of the moderate resolution imaging spectroradiometer ( M O D I S ) on E O S - A M I . Geoscience and Remote Sensing, IEEE Transactions on, 36(4), 1088-1100. B a u m , B. A . , R. F. Ardu in i , B . A . Wie l ick i , P. Minn is and S. C . Tsay, 1994: Mul t i leve l cloud retrieval using mult ispectral hirs and avhrr data: Night t ime oceanic analysis. Journal of Geophysical Research, 99, 5499-5514. B a u m , B. A . , R. A . Frey, G . G . Mace, M . K . Harkey and P. Yang , 2003: Night t ime mult i layered cloud detection using M O D I S and A R M data. Journal of Applied Meteorology, 42, 905-919. Bishop, C . M . , 1994: Neural networks and their applications. Review of Scientific Instruments, 65, 1803-1832. B ishop, C . M . , 1995: Neural Networks for Pattern Recognition, Oxford Un iv . Press. B ishop, C . M . , 2006: Pattern Recognition and Machine Learning. Springer. Blaskovic, M . , R. Davies and J . B . Snider, 1991: D iurna l var iat ion of marine stratocumulus over San Nicolas island dur ing Ju ly 1987. Monthly Weather Review, 119(6), 1469-1478. Bohren, C . F . and E . Clo th iaux, 2006: Fundamentals of Atmospheric Radiation: An Introduction with 400 Problems. W i l e y - V C H . Bohren, C . F. , J . R. Linskens and M . E . Chu rma , 1995: A t what opt ical thickness does a cloud completely obscure the sun? Journal of Atmospheric Sciences, 52, 1257-1259. Bony, S., R. Co lman , V . M . Kat tsov, R. P. A l l an , C . S. Bretherton, J . L. Dufresne, A . Ha l l , S. Hallegatte, M . M . Ho l land, V . Ingram, D. A . Randa l l , B . J . Soden, G . Tsel ioudis and M . J . Webb, 2006: How well do we understand and evaluate cl imate change feedback processes? Journal of Climate, 19(15), 3445-3482. Bony, S. and J . L. Dufresne, 2005: Mar ine boundary layer clouds at the heart of t ropical cloud feedback uncertainties in cl imate models. Geophysical Research Letters, 32, 20806+. Brenguier, J . L., H . Pawlowska, L. Schuller, R. Preusker, J . Fischer and Y . Fouquart , 2000: Radiat ive properties of boundary layer clouds: Droplet effective radius versus number concentrat ion. Journal of Atmospheric Sciences, 57, 803-821. Bretherton, C . S., T . U t ta l , C . W . Fai ra l l , S. E . Yuter , R. A . Weller, D. Baumgardner , K . Comstock, R. Wood and G . B . Raga, 2004: The E P I C 2001 stratocumulus study. Bulletin of the American Meteo-rological Society, 85, 967-977. Cerdena, A . , A . Gonzalez and J . C . Perez, 2007: Remote sensing of water cloud parameters using neural networks. Journal of Atmospheric and Oceanic Technology, 24(1), 52-63. Cerdena, A . , J . C . Perez and A . Gonzalez, 2004: C loud properties retrieval using neural networks, in K . P. Schafer, A . Comeron, M . R. Carleer, R. H . P i ca rd and N . I. Sifakis, editors, Remote Sensing of Clouds and the Atmosphere IX., volume 5571 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, pp. 11-19. Coakley, J . A . and F . P. Bretherton, 1982: C loud cover f rom high-resolution scanner data: Detect ing and al lowing for par t ia l ly filled fields of view. Journal of Geophysical Research, 87, 4917-4932. Cornet, C , J . C . Bur iez, J . R ied i , H . Isaka and B . Gui l lemet, 2005: Case study of inhomogeneous cloud parameter retrieval from modis data. Geophysical Research Letters, 32, 13807+. Cornet , C , H . Isaka, B . Gui l lemet and F . Szczap, 2004: Neura l network retrieval of c loud parameters of inhomogeneous clouds f rom mult ispectral and mult iscale radiance data : Feasibi l i ty study. Journal of Geophysical Research, 109, 12203+. 140 Bibliography Curry, J . A . and P. J . Webster, 1999: Thermodynamics of Atmospheres and Oceans (International Geophysics). Academic Press. D 'Entremont , R. P., 1986: Low- and midlevel cloud analysis using nightt ime mult ispectral imagery. Journal of Applied Meteorology, 25, 1853-1869. Driedonks, A . G . M . and P. G . Duynkerke, 1989: Current problems in the stratocumulus-topped atmo-spheric boundary layer. Boundary-Layer Meteorology, 46, 275-303. Durkee, P. A . , R. E . Chart ier , A . Brown, E . J . Trehubenko, S. D. Rogerson, C . Skupniewicz, K . E . Nielsen, S. P la tn ick and M . D. K i n g , 2000a: Composi te ship track characteristics. Journal of Atmospheric Sciences, 57, 2542-2553. Durkee, P. A . , K . J . Noone and R. T . B lu th , 2000b: The Monterey area ship track experiment. Journal of Atmospheric Sciences, 57(16), 2523-2541. Evans, K . F., 1998: The spherical harmonics discrete ordinate method for three-dimensional atmospheric radiat ive transfer. Journal of Atmospheric Sciences, 55, 429-446. Faure, T . , H . Isaka and B. Gui l lemet, 2001a: Mapp ing neural network computat ion of high-resolution radiant fluxes of inhomogeneous clouds. Journal of Geophysical Research, 106, 14961-14974. Faure, T . , H . Isaka and B. Gui l lemet, 2001b: Neural network analysis of the radiat ive interact ion between neighboring pixels in inhomogeneous clouds. Journal of Geophysical Research, 106, 14465-14484. Faure, T . , H . Isaka and B. Gui l lemet, 2001c: Neural network retrieval of cloud parameters of inhomoge-neous and fract ional clouds - feasibil ity study. Remote Sensing of Environment, 77(2), 123-138. Faure, T . , H . Isaka and B. Gui l lemet, 2002: Neural network retrieval of c loud parameters from high-resolution mult ispectral radiometr ic data - a feasibil ity study. Remote Sensing of Environment, 80(2), 285-296. F u , Q. and K . N . L i ou , 1992: O n the correlated k-distr ibut ion method for radiat ive transfer in nonho-mogeneous atmospheres. Journal of Atmospheric Sciences, 49, 2139-2156. Gonzalez, A . , J . C . Perez, F. Herrera, F. Rosa, M . A . Wetzel , R. D. Borys and D. H . Lowenthal, 2002: Stratocumulus properties retrieval method from noaa-avhrr data based on the discret izat ion of cloud parameters. International Journal of Remote Sensing, 23(4), 627-645. Han , Q. , W . B . Rossow and A . A . Lacis , 1994: Near-global survey of effective droplet radi i in l iquid water clouds using isccp data. Journal of Climate, 7, 465-497. Hansen, P., 1994: Regular izat ion tools: A Mat lab package for analysis and solut ion of discrete i l l-posed problems. Numerical Algorithms, 6(1), 1-35. Harshvardhan, 1982: The effect of brokenness on cloud-cl imate sensitivity. Journal of Atmospheric Sci-ences, 39(8), 1853-1861. Har tmann, D. L., M . E . Ocker t -Bel l and M . L. Michelsen, 1992: The effect of cloud type on earth's energy balance: G loba l analysis. Journal of Climate, 5, 1281-1304. Heck, P. W . , W . L. Smi th , P. Minn is and D. F . Young, 1999: Mul t ispect ra l retrieval of nightt ime cloud properties for C E R E S , A R M , and F I R E , in Proceedings of ALPS 99 Symposium, Mer ibe l , France. Hobbs, P. V . , T . J . Garret t , R. J . Ferek, S. R. Strader, D. A . Hegg, G . M . Prick, W . A . Hoppel , R. F . Gasparovic, L. M . Russel l , D. W . Johnson, C. O 'Dowd, P. A . Durkee, K . E . Nielsen and G . Innis, 2000: Emissions from ships wi th respect to their effects on clouds. Journal of Atmospheric Sciences, 57, 2570-2590. 141 Bibliography Hsieh, W . W . and B . Tang, 1998: App ly ing neural network models to predict ion and data analysis in meteorology and oceanography. Bulletin of the American Meteorological Society, 79, 1855-1870. H u , Y . and K . Stamnes, 2000: Cl imate sensitivity to cloud optical properties. Tellus B, 52(1), 81-93. Hunt , G . E . , 1973: Radiat ive properties of terrestial clouds at visible and infra-red thermal window wavelengths. Quarterly Journal of the Royal Meteorological Society, 99, 346-369. Iwabuchi, H. , 2007: Retr ieval of cloud optical thickness and effective radius using mult ispectral remote sensing and accounting for 3D effects, in A . A . Kokhanovsky, editor, Light Scattering Reviews 2, pp. 97-124. Springer. K a t o , S., L. M . Hinkelman and A . Cheng, 2006: Est imate of satell i te-derived cloud opt ical thickness and effective radius errors and their effect on computed domain-averaged irradiances. Journal of Geophysical Research, 111, 17201+. Kawamoto, K. , T . Naka j ima and T . Y . Naka j ima, 2001: A global determinat ion of cloud microphysics w i th avhrr remote sensing. Journal of Climate, 14, 2054-2068. Kawanish i , T . , T . Sezai, Y . Ito, K . Imaoka, T . Takeshima, Y . Ishido, A . Shibata, M . M i u r a , H . Inahata and R. W . Spencer, 2003: The Advanced Microwave Scanning Radiometer for the E a r t h Observing Sys-tem ( A M S R - E ) , N A S D A ' s contr ibut ion to the E O S for global energy and water cycle studies. Geoscience and Remote Sensing, IEEE Transactions on, 41(2), 184-194. K i n g , M . , S. Tsay, S. Platn ick, M . Wang and K . L iou , 1997: C loud retrieval algorithms for M O D I S : optical thickness, effective particle radius, and thermodynamic phase. MODIS Algorithm Theoretical Basis Document No. ATBD-MOD-05, N A S A . K le i n , S. A . and D. L. Har tmann, 1993: The seasonal cycle of low strat i form clouds. Journal of Climate, 6, 1587-1606. Krasnopolsky, V . M . , 2007: Reducing uncertainties in neural network jacobians and improving accuracy of neural network emulations wi th nn ensemble approaches. Neural Networks, 20(4), 454-461. Krasnopolsky, V . M . , L. C . Breaker and W . H . Gemmi l l , 1995: A neural network as a nonlinear transfer function model for retr ieving surface wind speeds from the special sensor microwave imager. Journal of Geophysical Research, 100, 11033-11046. Krasnopolsky, V . M . , W . H. Gemmi l l and L. C . Breaker, 2000: A neural network mult iparameter algo-r i thm for S S M / I ocean retrievals - comparisons and validations. Remote Sensing of Environment, 73(2), 133-142. K r a t z , D. P., 1995: The correlated k-distr ibution technique as applied to the A V H R R channels. Journal of Quantitative Spectroscopy and Radiative Transfer, 53, 501-517. K r a t z , D. P., 2001: M O D I S correlated k-distr ibutions. http://asd-www.larc.nasa.gov/~kratz/modis.html, accessed A p r i l 9th, 2007. Lee, T . E . , S. D. Mi l le r , F. .1. Turk, C . Schueler, R. Ju l ian , S. Deyo, P. Di l ls and S. Wang , 2006: The N P O E S S V I I R S day/n ight visible sensor. Bulletin of the American Meteorological Society, 87, 191-199. L i n , X . and J . A . Coakley, 1993: Retr ieval of properties for semitransparent clouds from mult ispectral infrared imagery data. Journal of Geophysical Research, 98, 18501-18514. Lohmann , U . and J . Feichter, 2005: G loba l indirect aerosol effects: a review. Atmospheric Chemistry & Physics, 5, 715-737. Luo , G . , X . L i n and J . A . Coakley, 1994: 11-pm emissivities and droplet radi i for marine stratocumulus. Journal of Geophysical Research, 99, 3685-3698. 142 Bibliography M a c K a y , D. J . C , 1992a: Bayesian interpolat ion. Neural Computation, 4(3), 415-447. M a c K a y , D. J . C , 1992b: A pract ical Bayesian framework for backpropagat ion networks. Neural Com-putation, 4(3), 448-472. M a c K a y , D. J . C , 1995: Probable networks and plausible predictions - a review of pract ical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 6, 469-505. Ma r t i n , G . M . , D. W . Johnson and A . Spice, 1994: The measurement and parameter izat ion of effective radius of droplets in warm stratocumulus clouds. Journal of Atmospheric Sciences, 51, 1823-1842. Mayer , B . and A . K y l l i n g , 2005: Technical note: The l ibRadt ran software package for radiat ive transfer calculations - description and examples of use. Atmospheric Chemistry & Physics, 5, 1855-1877. M c C a n n , D. W . , 1992: A neural network short-term forecast of significant thunderstorms. Weather and Forecasting, 7(3), 525-534. M ie , G . , 1908: Beitrage zur Opt ik tr i iber Medien, speziell kolloidaler Metal losungen. Annalen der Physik, 330, 377-445. Mi les , N . L., J . Verl inde and E . E . Cloth iaux, 2000: C loud droplet size distr ibut ions in low-level strat i form clouds. Journal of Atmospheric Sciences, 57, 295-311. M inn is , P., D. P. K r a t z , J . A . Coakley, M . D. K i n g , D. Garber, P. Heck, S. Mayor , D. F. Young and R. A rdu in i , 1995: Cloud Optical Property Retrieval (Subsystem 4-3). volume 3, pp. 135-176. N A S A R P 1376. Mlawer, E . J . , S. J . Taubman, P. D. Brown, M . J . Iacono and S. A . C lough, 1997: Radiat ive transfer for inhomogeneous atmospheres: R R T M , a validated correlated-k model for the longwave. Journal of Geophysical Research, 102, 16663-16682. Nabney, I. T . , 2002: NETLAB - Algorithms for Pattern Recognition. Springer. Naka j ima, T . and M . D. K i n g , 1990: Determinat ion of the optical thickness and effective particle radius of clouds from reflected solar radiat ion measurements. Par t I: Theory. Journal of Atmospheric Sciences, 47(15), 1878-1893. Naka j ima, T . , M . D. K i n g , J . D. Spinhirne and L. F . Radke, 1991: Determinat ion of the opt ical thickness and effective particle radius of clouds from reflected solar radiat ion measurements. P a r t II: Mar ine stratocumulus observations. Journal of Atmospheric Sciences, 48, 728-751. Naka j ima, T . Y . and T . Naka jma, 1995: Wide-area determinat ion of cloud microphysical properties from noaa avhrr measurements for fire and astex regions. Journal of Atmospheric Sciences, 52(23), 4043-4059. Norr is , J . R. and C . B . Leovy, 1994: Interannual variabi l i ty in strat i form cloudiness and sea surface temperature. Journal of Climate, 7, 1915-1925. Pawlowska, H . and J . Brenguier, 2000: Microphysica l properties of stratocumulus clouds during A C E - 2 . Tellus B, 52, 868+. Pear lmutter , B . A . , 1994: Fast exact mul t ip l icat ion by the Hessian. Neural Computation, 6(1), 147-160. Perez, J . C , P. H. Aus t in and A . Gonzalez, 2002: Retr ieval of boundary layer cloud properties using infrared satellite data during the dycoms-i i field experiment, in Proceedings 15th Symposium Boundary Layer and Turbulence, Wageningen, Netherlands. 143 Bibliography-Perez, J . C , F . Herrera, F. Rosa, A . Gonzalez, M . A . Wetzel , R. D. Borys and D. H . Lowenthal , 2000: Retr ieval of marine stratus cloud droplet size from N O A A - A V H R R nightt ime imagery. Remote Sensing of Environment, 73(1), 31-45. Petty, G . W . , 2006: A First Course in Atmospheric Radiation (2nd Ed.). Sundog Publ ish ing. Phi l l ips , T . and P. L. Barry, 2002: Clouds in the greenhouse, http://science.nasa.gov/ headlines/y2002/22apr-ceres.htm, accessed June 30th, 2007. P incus, R., M . Szczodrak, J . G u and P. H . Aus t in , 1995: Uncerta inty in cloud opt ical depth estimates made from satellite radiance measurements. Journal of Climate, 8, 1453-1462. Platn ick, S., M . D. K i n g , S. A . Ackerman, W . P. Menzel , B. A . B a u m , J . C . R ied i and R. A . Frey, 2003: The M O D I S cloud products: algorithms and examples from Ter ra . Geoscience and Remote Sensing, IEEE Transactions on, 41(2), 459-473. P la tn ick, S. and S. Twomey, 1994: Determining the susceptibi l i ty of c loud albedo to changes in droplet concentration wi th the Advanced Very H igh Resolut ion Radiometer. Journal of Applied Meteorology, 33, 334-347. P la tn ick, S. and F . P. J . Valero, 1995: A val idation of a satellite cloud retrieval dur ing A S T E X . Journal of Atmospheric Sciences, 52, 2985-3001. Press, W . H. , S. A . Teukolsky, W . T . Vetter l ing and B . P. Flannery, 2007: Numerical Recipes: The Art of Scientific Computing. Cambridge Universi ty Press. R igdon, E . E . , 1997: Not positive definite matrices - causes and cures, http://www2.gsu.edu/ ~mkteer/npdmatri.html, accessed June 18th, 2007. Ringer, M . A . , B . J . Mcavaney, N . Andronova, L. E . Bu ja , M . Esch , W . J . Ingram, B . L i , J . Quaas, E . Roeckner, C . A . Senior, B. J . Soden, E . M . Vo lod in , M . J . Webb and K . D. Wi l l i ams , 2006: G loba l mean cloud feedbacks in idealized cl imate change experiments. Geophysical Research Letters, 33, 7718+. Rosenblatt , F. , 1958: The perception: a probabil ist ic model for informat ion storage and organizat ion in the brain. Psychological Review, 65(6), 386-408. Rossow, W . B. and R. A . Schiffer, 1999: Advances in understanding clouds from I S C C P . Bulletin of the American Meteorological Society, 80 , 2261-2288. Rozendaal , M . A . , C . B. Leovy and S. A . K le i n , 1995: A n observational study of diurnal variations of marine strat i form cloud. Journal of Climate, 8, 1795-1809. Savic-Jovcic, V . and B . Stevens, 2007: The structure and mesoscale organizat ion of precipi tat ing stra-tocumulus. Journal of Atmospheric Sciences, i n p ress . Schreier, M . , A . A . Kokhanovsky, V . Eyr ing , L. Bugl iaro, H . Mannste in , B . Mayer , H . Bovensmann and J . P. Burrows, 2006: Impact of ship emissions on the microphysical , opt ical and radiat ive properties of marine stratus: a case study. Atmospheric Chemistry & Physics, 6, 4925-4942. Schuller, L., R. Bennartz, J . Fischer and J . L. Brenguier, 2005: A n algor i thm for the retrieval of droplet number concentration and geometrical thickness of strat i form marine boundary layer clouds applied to M O D I S radiometric observations. Journal of Applied Meteorology, 44, 28-38. Schuller, L., J . L . Brenguier and H . Pawlowska, 2003: Retr ieval of microphysical , geometrical, and radiat ive properties of marine stratocumulus from remote sensing. Journal of Geophysical Research, 108, 5-1 . 144 Bibliography Sharon, T . M . , B. A . Albrecht, H . H. Jonsson, P. Minn is , M . M . Kha iyer , T . M . van Reken, J . Seinfeld and R. F lagan, 2006: Aerosol and cloud microphysical characteristics of rifts and gradients in mari t ime stratocumulus clouds. Journal of Atmospheric Sciences, 63 , 983-997. Stamnes, K. , S. C . Tsay, K . Jayaweera and W . Wiscombe, 1988: Numer ica l ly stable algor i thm for discrete-ordinate-method radiative transfer in mult iple scattering and emit t ing layered media. Applied Optics, 27, 2502-2509. Stamnes, K. , S. C . Tsay, W . Wiscombe and I. Laszlo, 2000: D I S O R T , a general-purpose For t ran program for discrete-ordinate-method radiative transfer in scattering and emit t ing layered media: Documenta-t ion of methodology. Technical report, Dept. of Physics and Engineer ing Physics, Stevens Institute of Technology, Hoboken, N J 07030. Stephens, G . L., 2005: C loud feedbacks in the cl imate system: A cr i t ical review. Journal of Climate, 18, 237-273. Stephens, G . L., D. G . Vane, R. J . Boa in , G . G . Mace, K . Sassen, Z . Wang, A . J . I l l ingworth, E . J . O 'Connor , W . B. Rossow, S. L. Durden, S. D. Mi l le r , R. T . Aus t in , A . Benedett i , C . Mi t rescu and The , 2002: The Cloudsat mission and the A -T ra in . Bulletin of the American Meteorological Society, 83, 1771-1790. Stevens, B. , A . Bel jaars, S. Bordon i , C . Holloway, M . Koh ler , S. Krueger, V . Savic-Jovcic and Y . Zhang, 2007: O n the structure of the lower troposphere in the summert ime stratocumulus regime of the northeast pacific. Monthly Weather Review, 135(3), 985-1005. Stevens, B. , D. H . Lenschow, G . Va l i , H . Gerber, A . Bandy, B . B lomquis t , J . L. Brenguier, C . S. Bretherton, F . Burnet , T . Campos, S. Cha i , I. Faloona, D. Friesen, S. Ha imov, K . Laursen, D. K. Li l ly , S. M . Loehrer, S. P. Mal inowsk i , B. Morley, M . D. Petters, D. C . Rogers, L. Russel l , V . Savic-Jovcic, J . R. Snider, D. Straub, M . J . Szumowski , H . Takagi, D. C . Thornton, M . Tschudi , C . Twohy, M . Wetzel and M . C . van Zanten, 2003a: Dynamics and Chemist ry of Mar ine Stratocumulus - D Y C O M S - I I . Bulletin of the American Meteorological Society, 84(5), 579-593. Stevens, B. , D. H . Lenschow, G . Va l i , H . Gerber, A . Bandy, B. B lomquis t , J . L. Brenguier, C . S. Bretherton, F . Burnet , T . Campos, S. Cha i , I. Faloona, D. Friesen, S. Haimov, K . Laursen, D. K. Li l ly , S. M . Loehrer, S. P. Mal inowsk i , B . Morley, M . D. Petters, D. C . Rogers, L. Russel l , V . Savic-Jovcic, J . R. Snider, D. Straub, M . J . Szumowski, H . Takagi, D. C . Thornton, M . Tschud i , C . Twohy, M . Wetzel and M . C . van Zanten, 2003b: Supplement to Dynamics and Chemis t ry of Mar ine Stratocumulus -D Y C O M S - I I (flight summaries). Bulletin of the American Meteorological Society, 84(5), S12-S25. Stevens, B. , G . Va l i , K . Comstock, R. Wood , M . C . van Zanten, P. H . Aus t i n , C . S. Brether ton and D. H. Lenschow, 2005: Pockets of open cells and drizzle in marine stratocumulus. Bulletin of the American Meteorological Society, 86, 51-57. Turner, D. D., A . M . Vogelmann, R. T . Aus t in , J . C . Barnard , K . Cady-Pere i ra , J . C . C h i u , S. A . Clough, C . F l y n n , M . M . Kha iyer , J . Li l jegren, K . Johnson, B. L i n , C . Long , A . Marshak, S. Y . Matrosov, S. A . McFar lane, M . Mi l le r , Q. M i n , P. Minn is , W . O'Hi rok , Z . Wang and W . Wiscombe, 2007: T h i n l iquid water clouds: Thei r importance and our challenge. Bulletin of the American Meteorological Society, 88(2), 177-190. Twohy, C . H. , M . D. Petters, J . R. Snider, B . Stevens, W . Tahnk, M . Wetzel , L. Russel l and F. Burnet, 2005: Eva luat ion of the aerosol indirect effect in marine stratocumulus clouds: Droplet number, size, l iquid water path, and radiat ive impact. Journal of Geophysical Research, 110, 8203+. Twomey, S., 1974: Po l lu t ion and the planetary albedo. Atmospheric Environment, 8(12), 1251-1256. Twomey, S., 1977: The influence of pol lut ion on the shortwave albedo of clouds. Journal of Atmospheric Sciences, 34, 1149-1154. 145 Bibliography Twomey, S. and T . Cocks, 1989: Remote sensing of cloud parameters from spectral reflectance in the near-infrared. Beitrdge zur Physik der Atmosphdre, 62 , 172-179. van Deist, P., 2005: Sensor SpcCoeff data, http://cimss.ssec.wisc.edu/~paulv/, accessed September 6th, 2007. Vaughan, M . A . , S. A . Young, D. M . Winker , K . A . Powel l , A . H. Omar , Z. L i u , Y . H u and C . A . Hostetler, 2004: Fu l ly automated analysis of space-based l idar data: an overview of the C A L I P S O re-trieval algorithms and data products, in U . N . Singh, editor, Laser Radar Techniques for Atmospheric Sensing., volume 5575 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, pp. 16-30. Wal lace, J . M . and P. V . Hobbs, 2006: Atmospheric Science, Volume 92, Second Edition: An Introductory Survey (International Geophysics). Academic Press. Weisstein, E . W . , 2007: M a t h W o r l d - a Wol f ram web resource, http://mathworld.wolfram.com, accessed June 5th, 2007. Wie l ick i , B . A . , E . F . Harr ison, R. D. Cess, M . D. K i n g and D. A . Randa l l , 1995: Miss ion to planet earth: Role of clouds and radiat ion in climate. Bulletin of the American Meteorological Society, 76(11), 2125-2154. Wi l l i ams, C . K . I., C . Qazaz, C . M . Bishop and H. Zhu, 1995: O n the relat ionship between Bayesian error bars and the input data density, in Artificial Neural Networks, 1995., Fourth International Conference on, pp. 160-165. Wi l l i ams, K . and G . Tsel ioudis, 2007: G C M intercomparison of global cloud regimes: present-day eval-uat ion and cl imate change response. Climate Dynamics, 29(2), 231-250. Wood , R., 2007: Cancel lat ion of aerosol indirect effects in marine stratocumulus through cloud thinning. Journal of Atmospheric Sciences, 64(7), 2657-2669. Wood , R., C . S. Bretherton and D. L. Har tmann, 2002: D iurna l cycle of l iqu id water path over the subtropical and tropical oceans. Geophysical Research Letters, 29 , 7-1. X u , K . M . , T . Wong, B. A . Wie l ick i , L. Parker and Z. A . E i tzen , 2005: Stat is t ical analyses of satellite cloud object data from C E R E S , part I: Methodology and prel iminary results of the 1998 E l Ni f io/2000 L a N ina . Journal of Climate, 18, 2497-2514. Xue , H. , G . Feingold and B. Stevens, 2007: The role of precipi tat ing cells in organizing shallow cumulus convection. Journal of Atmospheric Sciences, i n p ress . 146 Appendix A Implementation of the Aires et al. Method in Matlab A . l Modified Netlab Functions Table A . l lists al l N E T L A B functions that were modif ied for this study. Abou t half of the functions implementing M L P s in N E T L A B had to be modified in order to accommodate the matr ix hyperparameters Ain and Ar, and some addit ional functions were implemented as well. However, only the most important changes and implementat ion details wi l l be discussed in this appendix. A.2 Main Loop: mlatrain The implementat ion of A lgor i thm 3.1 in m l a t r a i n is straightforward. Fol lowing the N E T L A B design, the network is stored in a structure n e t (cf. Nabney, 2002) . It is t ra ined using the generic N E T L A B function n e t o p t : n e t = n e t o p t ( n e t , o p t i o n s , x , t , ' s c g ' ) ; The covariance matr ix C o is computed from the network predictions (evaluated wi th mla f wd) and the target variables by using the M A T L A B function cov: y = m l a f w d ( n e t , x ) ; e p s i l o n = y — t ; C O = c o v ( e p s i l o n ) Next , G and H are computed using m l a d e r i v and mlahess , and (G H G) is evaluated: G = m l a d e r i v ( n e t , x ) ; [ h e s s , h d a t a ] = m l a h e s s ( n e t , x , t ) ; i n v h e s s = i n v ( h e s s ) ; G H G a v g = z e r o s ( n e t . n o u t ) ; f o r n = 1 : n d a t a 147 Appendix A. Implementation of the Aires et al. Method in Matlab T a b l e A . l : L is t of N E T L A B functions that were adapted or created in order to accommodate the matr ix hyperparameters Ain and Ar (cf. Nabney, 2002, Table 5.1). or iginal funct ion new function function changes mlp m la create two-layer M L P scalar hyperparameters have been replaced by the mat r ix ones mlpbkp mlabkp backpropagate error gra-dient through network none m l p d e r i v m l a d e r i v evaluate derivatives of none network outputs w i th respect to weights m l p e r r m l a e r r evaluate error function implementat ion of (3.22) instead of (3.24) mlpfwd mlafwd forward propagation none m lpgrad m lagrad evaluate error gradient ful l backpropagat ion wi th mat r ix hyperparameters mlphdotv mlahdotv evaluate the product of the Hessian wi th a vector 7£{-}-a lgor i thm has been adapted to the new error funct ion mlphess mlahess evaluate the Hessian ma- use ful l covariance matr ix t r ix Ar instead of a mlppak mlapak combine weights and bi -ases into one parameter vector none mlpunpak mlaunpak separate parameter vector into weight and bias ma-trices none — m l a t r a i n implementat ion of A lgo-r i thm 3.1 — — m l a r e s c a l e normalise input or target variables — — m l a r e v e r s e r e s c a l e undo the rescaling of m l a r e s c a l e — — mlaJacob evaluate the Jacobian ma-t r ix — — mlaJacobcheck verify the Jacobian matr ix wi th finite differences — — m l a j a c o b u n c e r t a i n t y compute the uncertainty in the Jacobian matr ix — 148 Appendix A. Implementation of the Aires et al. Method in Matlab G n = s q u e e z e ( G ( n , : , : ) ) ; G H G = G n ' * i n v h e s s * G n ; G H G a v g = G H G a v g + G H G ; e n d G H G a v g = G H G a v g / n d a t a ; Final ly , the hyperparameters are re-estimated following equations (3.44) and (3.59): C i n = C O - G H G a v g ; n e t . A i n = i n v ( C i n ) ; w = n e t p a k ( n e t ) ; n e w a l p h a = ( o n e s ( n e t . n w t s , 1) — d i a g ( n e t . A r ) . * d i a g ( i n v h e s s ) ) . / w ' . * 2 ; n e t . A r = d i a g ( n e w a l p h a ) ; A.3 Gradient Computation with Backpropagation: mlagrad The derivative of the error function wi th respect to the weights is needed by the opt imisat ion algor i thm and is computed in mlagrad . The function is based on the method of error back-propagation (Bishop, 1995, Section 4.8). In short, the total error function is decomposed into error terms of the indiv idual patterns of the t ra in ing dataset, so that d E - = Y ™ l . ( A . I ) dwn diva J n The error En depends on the weight Wji through the summed input aj = Y^iwjizi t o u r u t i> hence we can write ^ . ^ • S L - V , , (A . 2 , ovjji oaj ovjji where the notat ion daj &i = ^ (A-3) has been introduced and aj = ^tWjiZi has been differentiated to give daj/dvjji = Zi. Since the inputs Zi to a part icular unit are known, the 6s have to be computed. For the output units, FtEn h = ~g'{ak)^f- (A.4) oyk (since yk = p(afc)), whereas Bishop (1995) shows that for the hidden units Sj = g'(aj)Y2wkj6k. (A.5) 149 Appendix A. Implementation of the Aires et al. Method in Matlab Since the error derivatives of the hidden units depend on the Ss of the output units, the algor i thm is called back-propagation. The derivative of the output act ivat ion function in our case is g'(a,k) = 1 for the output units, since g is a linear function. For the in N E T L A B implemented sum-of-squares error funct ion (3.23), En = \Y,{ynk-tnk)\ (A.6) z k=i the output 5s evaluate to 6k = -^- = --2-(ykl-tnk)=ynk-tnk. (A.7) However, for the Aires error function (3.22), En = \ (Vn - tnf • Ain • (yn - tn) + ~ w T A r w (A.8) (where the weights term has been divided by N to express its contr ibut ion to the error of a single pattern n) , the derivative wi th respect to the weights becomes dEn _ dE% daj | 1 8EW ^ dwji daj duiji N dvjji The weights term can be evaluated direct ly to give dEw d dwr dwr {wrAr^ws} = Ar^ws, (A.10) where al l weights are considered to be a vector wi th a single index. The data term is computed wi th back-propagation, leading to the output 6s ^ = f f = ^ { | ( ^ - * E ) ^ ( y r - * D } (A. i i) = ^ ( y P - * D - (A- 1 2) Here, the property that Ain is symmetr ic has been used. The above equations can be implemented in a few lines in M A T L A B : d e l o u t = ( y — t ) * n e t . A i n ; g d a t a = m l a b k p ( n e t , x , z , d e l o u t ) ; w = m l a p a k ( n e t ) ; 150 Appendix A. Implementation of the Aires et al. Method in Matlab g p r i o r = w * n e t . A r ; g = g d a t a + g p r i o r ; Fi rs t , a l l 6k are computed with (A. 12). Then the data term of the error gradient is computed wi th back-propagation (mlabkp) and the weights term (A. 10) is added. A . 4 The Hessian with the Pearlmutter 7Z{-}-Algorithm: mlahess, mlahdotv Nabney (2002) uses an algor i thm derived by Pearlmutter (1994) to compute the Hessian matr ix of the network. The idea is to use an efficient algori thm to compute the product vTH of a vector v wi th the Hessian, and to evaluate the ful l Hessian by using a sequence of uni t vectors that each pick out one column of H. Pearlmutter uses the notat ion TZ{} for the operator vT\7, so that His derivation, a summary of which can be found in Bishop (1995, Section 4.10.7), leads to two expressions for the derivative of the error function wi th respect to the first and second layer weights, respectively: where the 6s are the standard back-propagation expressions given by (A.4) and (A.5). A s was the case for m lagrad , the N E T L A B implementat ion had to be adapted to the new 6k. Us ing (A.12), v <TH = vTV{VE)=TZ{VE}. (A.13) (A.14) (A.15) n{sk} = n{At{yi-ti)} = n{Afnyl}-H{Afntl} ^ J (A.16) (A.17) =o = n{At}yl + Atn{yl} (A.18) =o = A?nK{yi}. (A.19) 151 Appendix A. Implementation of the Aires et al. Method in Matlab Together wi th the results Tl{aj} (A.20) TZ{Zj} (A.21) n{yk} j j (A.22) 7c {5,} = g" (aj)TZ{aj}^2wkjSk + g' (aj)^TvkjSk + g' (a,-) 'Y^WkjIl {8k} k fc fc (A.23) (see B ishop, 1995, Section 4.10.7), the algori thm is implemented in mlahdotv . Aga in , al l 5kS in (A.19) can be evaluated in one vector: | r d e l = r y * n e t . A i n ; In m lahess , the data Hessian part of the tota l Hessian given by (3.27) is evaluated using mlahdotv , then the weights part AR is added: f o r v = e y e ( n e t . n w t s ) ; h d a t a ( f i n d ( v ) , : ) = m l a h d o t v ( n e t , x , t , v ) ; e n d h = h d a t a + n e t . A r ; A.5 The Jacobian and its Dis t r ibut ion: mlajacob, mlajacobuncertainty A s Nabney (2002, Chapter 5) notes, the Jacobian matr ix for a two layer M L P can be computed with 8 M which is based on a tanh hidden unit act ivation function. The implementat ion of (A.24) in m la jacob is straightforward. Fol lowing Nabney's radial basis functions implementat ion (Nabney, 2002, Section 6.4.2), I used the shortcut fyji = ( l — z?) Wji, leading to [ y , z , a ] = m l a f w d ( n e t , x ) ; f o r n = l : n d a t a P s i = ( o n e s ( n e t . n i n , 1 ) * ( 1 — z ( n , : ) . " 2 ) ) . * n e t . w l ; j a c ( n , : , : ) = P s i * n e t . w 2 ; e n d 152 Appendix A. Implementation of the Aires et al. Method in Matlab Note that the function returns a three dimensional array which contains the Jacobian for each indiv idual input pattern. The Aires et a l . (2004b) technique to estimate the variabi l i ty in the Jacobian (see section 3.5) is implemented in m l a j a c o b u n c e r t a i n t y . Samples from the weights distr ibut ion are generated wi th the M A T L A B STATISTICS T O O L B O X function mvnrnd, which returns random samples from a mult ivariate Gaussian distr ibut ion (here, according to (3.30), the most probable weights vector w* is used as the mean of the distr ibut ion and the inverse Hessian H 1 is the covariance matr ix) : i n v H = i n v ( H ) w m p = m l a p a k ( n e t ) ; w s a m p l e s = m v n r n d ( w m p , i n v H , n s a m p l e s ) ; For each of these weights samples, the Jacobian is computed and averaged over al l input patterns xn: m j a c = z e r o s ( n s a m p l e s , n e t . n i n , n e t . n o u t ) ; f o r i = 1: n s a m p l e s n e t i = m l a u n p a k ( n e t , w s a m p l e s ( i , : ) ) ; j a c = m l a j a c o b ( n e t i , x ) ; m j a c ( i , : , : ) = s q u e e z e ( m e a n ( j a c ) ) ; e n d Eventual ly, the mean and standard deviat ion of al l mean Jacobians are computed and returned: m e a n j a c = s q u e e z e ( m e a n ( m j a c ) ) ; s t d j a c = s q u e e z e ( s t d ( m j a c . ) ) ; A.6 Normalisation of Input and Output Variables The functions m l a r e s c a l e and m l a r e v e r s e r e s c a l e implement the normal isat ion methods mentioned in section 3.6.1. m l a r e s c a l e implements the simple linear rescaling (3.61): n r m . m e a n = m e a n ( D ) ; n r m . n o r m = s t d ( D ) ; R D = D — ( o n e s ( n d a t a , 1 ) * n r m . m e a n ) ; R D = R D . / ( o n e s ( n d a t a , 1 ) * n r m . n o r m ) ; A l l variables are passed as a matr ix D and processed al l at once. Whitening, the more sophisticated linear rescaling method (3.62), is also implemented in m l a r e s c a l e : n r m . m e a n = m e a n ( D ) ; n r m . s i g = c o v ( D ) ; [ e v U , e v L ] = e i g ( n r m . s i g ) ; n r m . e v U = e v U ; n r m . e v L = e v L ; 153 Appendix A. Implementation of the Aires et al. Method in Matlab R D = D — ( o n e s ( n d a t a , 1 ) * n r m . m e a n ) ; R D = ( e v L " ( - . 5 ) * e v U ' * R D ' ) ' ; The corresponding code to reverse both these rescaling methods is implemented in m l a r e v e r s e r e s c a l e . A.7 Implementation Difficulties: Regularisation of the Hessian and Numerical Symmetry Whi le working wi th the implementat ion of the Aires et a l . a lgor i thm, I encountered two difficulties. One of them has been discussed in section 3.6.2, the posit ive definite character of the Hessian. The Nabney (2002) method is implemented at the corresponding places in the m l a * routines: [ h e s s , h d a t a ] = m l a h e s s ( n e t , x , t ) ; [ e v e c , e v l ] = e i g ( h d a t a ) ; e v l = e v l . * ( e v l > 0) ; h d a t a = e v e c * e v l * e v e c ' ; h e s s = h d a t a + n e t . A r ; Another problem I encountered while working wi th my implementat ion is that the Hessian matrices that are computed (and consequently their inverses) are not completely numerical ly symmetr ic, as would be expected. Th is means that due to numerical inaccuracies in M A T L A B , the elements hij of the Hes-sian follow hij = hji + e, w i th e being a smal l perturbat ion. Such smal l errors can ampli fy consider-ably when the matrices are mult ip l ied wi th other matrices. Furthermore, the funct ion mvnrnd used in m l a j a c o b u n c e r t a i n t y expects a numerical ly symmetric covariance matr ix . However, the issue is easily resolved by copying one half of the matr ix into the other half where needed. 154 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0052641/manifest

Comment

Related Items