898 VOLUME 17W E A T H E R A N D F O R E C A S T I N G q 2002 American Meteorological Society An Ensemble Analysis of Forecast Errors Related to Floating Point Performance STEPHEN J. THOMAS Scientific Computing Division, National Center for Atmospheric Research,* Boulder, Colorado JOSHUA P. HACKER University of British Columbia, Vancouver, British Columbia, Canada MICHEL DESGAGNE´ Recherche en Prévision Numérique, Environment Canada, Dorval, Quebec, Canada ROLAND B. STULL University of British Columbia, Vancouver, British Columbia, Canada 23 March 2001 and 25 October 2001 ABSTRACT The dynamical core of the Mesoscale Compressible Community (MC2) model is described. Ensemble forecast techniques for high-resolution mesoscale simulations are applied to assess the impact of floating point optimi- zation, mathematics libraries, and processor configuration on forecast accuracy. It is shown that the iterative solver in the dynamical core is most sensitive to processor configuration, but it also shows weak sensitivity to the usage of fast mathematics libraries and floating point instruction reordering. Semi-implicit pressure solver errors are amplified in the physical parameterization package, which is sensitive to small pressure differences and feeds back to the dynamical solution. In this case, local rms spreads are around 18C in temperature by the end of a 42-h forecast. It is concluded that careful validation is required when changing computing platforms or introducing fast mathematics libraries. 1. Introduction In recent years there has been a rapid shift in the weather forecasting and research communities away from the use of centralized forecast products toward running mesoscale models on a regional or even local basis. For example, both the University of Washington (UW) and the University of British Columbia (UBC) have set up real-time forecasting systems. The Penn- sylvania State University–National Center for Atmo- spheric Research (Penn State–NCAR) fifth-generation Mesoscale Model (MM5) has been run in production mode at many sites (Mass and Kuo 1998), including UW, whereas the Mesoscale Compressible Community model (MC2) and Wisconsin Nonhydrostatic Modeling * The National Center for Atmospheric Research is sponsored by the National Science Foundation. Corresponding author address: Dr. Stephen J. Thomas, Scientific Computing Division, National Center for Atmospheric Research, 1850 Table Mesa Dr., Boulder, CO 80303. E-mail: thomas@ucar.edu System are run at UBC. In both cases the model pro- duction runs are carried out on microprocessor-based shared or distributed-memory clusters (Baillie et al. 1997). To sustain a significant percentage of peak per- formance, code restructuring in combination with com- piler optimizations and fast mathematics (‘‘math’’) li- braries are required. The most aggressive optimizations and libraries must be applied with care because they may seriously degrade the accuracy of a fluid flow sim- ulation if the model code or the compiler is not robust. It is for this reason that we believe ensemble analysis techniques can be useful in assessing the impact of op- timization strategies on atmospheric models with com- plex physical parameterization packages. In this note we apply ensemble forecast techniques to assess the effect of floating point optimizations, fast math libraries, and processor configuration on model error, which have been shown to dominate in the first few hours of forecast time (Stensrud et al. 2000; Orrell et al. 2002). Our study is based on the nonhydrostatic fully compressible MC2 limited-area atmospheric mod- el. The MC2 is widely used in Canadian universities and at Environment Canada for mesoscale and micro- AUGUST 2002 899N O T E S A N D C O R R E S P O N D E N C E scale atmospheric research. A detailed description of the adiabatic kernel and numerical formulation of the MC2 model with open boundaries is given in Thomas et al. (1998). In particular, the model employs a semi- implicit semi-Lagrangian time discretization scheme. The use of a terrain-following height coordinate (Gal- Chen and Sommerville 1975) results in a nonsymmetric linear system to solve every time step for the pressure. An iterative minimal residual Krylov solver has thus been implemented, and the computational efficiency of the model using line relaxation preconditioners designed for both hydrostatic and nonhydrostatic flow regimes is presented. The use of an iterative solver on a parallel computer leads to spread in the ensemble errors, and this spread is amplified by the physical parameterization package in combination with fast math libraries and changing processor configurations. Atmospheric models based on explicit time stepping schemes such as the Penn State–NCAR MM5 would not be subject to the same forecast error spreads because the pressure is bit-reproducible and does not depend on the processor configuration. Indeed, physical parame- terizations are strictly column-based and do not require interprocessor communication for distributed-memory computation. However, several operational centers in- cluding Canada (Côté et al. 1998) use semi-implicit models based on iterative solvers, and our results in- dicate that model validation should also account for the variation in the pressure with processor configuration. To establish that the error spread in the physics is in fact due to the iterative pressure solver, a sensitivity study was performed using one processor by varying the iterative solver convergence tolerance. 2. MC2 model formulation The MC2 is a nonhydrostatic fully compressible lim- ited-area model formulated with respect to a horizon- tally homogeneous, time-invariant, and hydrostatically balanced base state reference atmosphere (subscript 0). Thermodynamic variables, temperature T, and normal- ized log pressure q 5 ln(p/p0) are decomposed into a sum of the base state and perturbations: T 5 T 1 T9, q 5 q (z) 1 q9,0 0 where ]q g0 5 2 ]z RT0 and the equation of state is p 5 rRT. The gas constant for dry air is R, g 5 cp/cy and cp 2 cy 5 R, where cp and cy are specific heats at constant pressure and vol- ume, r is density, z is geometric height, and g is grav- itational acceleration. After introduction of the base state, the governing equations are given by Dv 5 2RT=q9 1 Bk 2 f kö 3 v Dt 2DT9 RT Dq9 N RT90 02 5 2T w 2 = · v0Dt c Dt g cp y Dq9 g 5 2g= · v 1 g w, 2Dt c0 where v is the velocity, f is the Coriolis force, B 5 gT9/ T0 is the buoyancy, and c0 is approximately the speed of sound. The semi-implicit scheme results in an elliptic prob- lem to solve every time step for a log pressure pertur- bation q9 about a stationary isothermal hydrostatic basic state: 2 2 2 2 2 (2) 21 (1)(1 2 c Dt ¹÷ )q9 2 c Dt D N D q9 5 Q* .0 h 0 z z q9 This nonsymmetric system of equations resulting from a terrain-following height coordinate system is solved using the generalized minimal residual (GMRES) al- gorithm of Saad and Schultz (1986). Skamarock et al. (1997) use the mathematically equivalent generalized conjugate residuals (GCR) algorithm of Eisenstat et al. (1983) in a semi-implicit formulation of a compressible model. The solver convergence criteria are based on the rms divergence. Computational efficiency (overall wall- clock time) is improved by finding a suitable precon- ditioner, which is often problem dependent. The discre- tized elliptic operator in a nonhydrostatic pressure solver will be dominated by the vertical terms when the aspect ratio DX/DZ is large. Therefore, an effective precon- ditioning strategy is to invert the vertical components of the elliptic operator, and Skamarock et al. (1997) apply a vertical alternating direction implicit (ADI) line relaxation preconditioner. A vertical ADI line relaxation preconditioner for the n 3 n linear system Ax 5 b is based on the splitting A 5 H 1 V, where the H and V represent the horizontal and vertical components of the discrete elliptic operator based on centered second-order finite differences. The ADI iteration is derived from the pseudo–time integra- tion of the heat equation ut 5 Du 1 r to steady state, where the matrix A represents the discrete Laplacian and k11 k(I 2 bV)x 5 (I 1 bH)x 2 bb. The largest possible pseudo–time step b is chosen so that the above integration scheme remains stable. A slightly more implicit scheme can be constructed using a line Jacobi relaxation scheme. For nonhydrostatic problems on isotropic grids, a fully 3D ADI precon- ditioner is implemented as in Skamarock et al. (1997). The solution of tridiagonal linear systems of equations implies global data dependencies; thus, a parallel data transposition strategy has been adopted in a distributed- memory implementation of the 3D ADI preconditioner. For a distributed-memory model of parallel compu- 900 VOLUME 17W E A T H E R A N D F O R E C A S T I N G TABLE 1. MC2 execution time on an SGI Origin 2000 (h:min). PX 3 PY 1 3 1 2 3 2 2 3 4 3 3 4 4 3 4 1D Jacobi 3D ADI 29:09 32:51 7:18 8:13 3:34 3:38 2:37 2:42 2:00 2:19 tation, the Ni 3 Nj 3 Nk computational grid is parti- tioned across a PX 3 PY logical processor mesh. A do- main decomposition in the horizontal direction is em- ployed because of the strong vertical coupling in phys- ical parameterization packages and since the number of grid points in the vertical direction is typically one order of magnitude less than in the horizontal. Each processor therefore contains Ni/PX 3 Nj/PY 3 Nk points. For both algorithms the communication overhead associated with boundary data exchanges between subdomains is min- imal when compared with computations. The 1D ver- tical ADI and Jacobi line relaxation preconditioners are also well suited to a horizontal decomposition, since the only global data dependency is in the vertical direction within tridiagonal solvers. However, the 3D ADI pre- conditioner requires global data in each of the three coordinate directions in order to solve tridiagonal linear systems of equations during each ADI sweep. Thus, the right-hand side b and solution xk must be remapped to perform line relaxations in each coordinate direction in turn. Such a remapping takes the form of a data trans- position algorithm requiring collective all-to-all com- munication of O(N 3) grid points. Vertical sweeps in the Z direction are performed using the domain decompo- sition described above. Each processor contains Ni 3 Nj/PY 3 Nk/PX grid points for sweeps in the X direction and Ni/PY 3 Nj 3 Nk/PX points in the Y direction. ADI sweeps progress from left to right and then right to left as indicated below with arrows representing commu- nication steps: N NN N N Nj ji k i k3 3 N ⇔ N 3 3 ⇔ 3 N 3 .k i jP P P P P PX Y Y X Y X To compare the computational efficiency of the model using the parallel 1D Jacobi and 3D ADI line relaxation schemes, a quasi-hydrostatic test case was run on a Silicon Graphics, Inc., (SGI) Origin computer with 16 processors. A 120 3 120 3 35 grid at 2.5-km horizontal resolution with model lid set at 23 km was employed in a 30-h mesoscale forecast over the British Columbia lower mainland. The integration consisted of 1800 time steps of length Dt 5 60 s using both 1D and 3D pre- conditioners. The results are summarized in Table 1 us- ing optimization level O3 (described in section 3). De- spite the fact that the 3D ADI preconditioner results in a faster solver convergence rate, model execution times are close. 3. Ensemble analysis The effect of compiler optimizations along with the use of fast math libraries on the accuracy of high-res- olution mesoscale forecasts is an unexplored subject. One possible way of studying these effects is to generate an ensemble of forecast runs, regarding optimization and processor (PE) configuration as sources of model error. Given three optimization levels and four proces- sors, two simple ensembles are generated. The same meteorological test case is run 12 times, without in- cluding physical parameterization schemes, but varying optimization levels and processor configurations. The 12 runs compose an ensemble, and standard analysis techniques can reveal error growth and forecast spread as a function of optimization, library usage, and floating point instruction reordering. A second ensemble is gen- erated by running the MC2 with the external physics package to determine the relative contribution of model error from the dynamical core as compared with the physics. The use of different PE configurations and op- timization levels are considered perturbations to the model, although they may be systematic rather than sto- chastic. Three optimization levels were identified for the 195- MHz R10000 processor on an SGI Origin 2000 (in this paragraph, compiler option codes are given in paren- theses). The first level of optimization (O1) allows some minimal code changes by the compiler. Interprocedural analysis (-IPA) is applied at the second level of opti- mization (O2) to improve both instruction and data cache usage through code movement. The most ag- gressive level of optimization (O3) permits floating point instruction reorderings that could adversely affect the precision of certain computations such as division. Fast math libraries (-lfastm) are also employed and pre- cision may be reduced in exchange for speed. To eval- uate these optimizations in the context of a real-time mesoscale forecasting system, the computational do- main and related run-time parameters were taken from daily runs of the UBC ensemble forecasting system (http://spirit.geog.ubc.ca/wxfcst). Horizontal resolution is 10 km at 608N with the model lid at 19 km. The grid contains 120 3 70 points on 35 terrain-following levels, implying the run is quasi-hydrostatic with DX/DZ 5 20. Vertical line Jacobi preconditioning is therefore applied in all our tests. Initial and boundary conditions are obtained from a coarse grid (DX 5 30 km) MC2 run starting at 0000 UTC 27 November 1997. A 42-h integration requires 1260 time steps of length Dt 5 120 s. For the ensemble that includes physics, the full Recherche en Prévision Numérique (RPN) physics package described in Mailhot et al. (1997) is used. A forecast initialized at 0600 UTC 27 November 1997 is chosen as a benchmark since it is a representative me- soscale case for the British Columbia lower mainland. In particular, it exhibits typical autumn and winter charac- teristics, including moist, southwesterly flow at low levels; AUGUST 2002 901N O T E S A N D C O R R E S P O N D E N C E TABLE 2. Optimization level O1, on a 195-MHz R10000. PX 3 PY 1 3 1 1 3 2 1 3 3 2 3 2 Wall clock User Mflops Mflops/CPU 31 395 30 975 53.6 53.6 16 666 16 339 101.5 50.8 11 169 10 828 153.2 51.1 8034 7879 210.6 52.6 TABLE 3. Optimization level O2, on a 195-MHz R10000. PX 3 PX 1 3 1 1 3 2 1 3 3 2 3 2 Wall clock User Mflops Mflops/CPU 28 597 28 223 58.8 58.8 14 593 14 341 115.7 57.8 9707 9449 175.6 58.5 7420 7252 228.8 57.2 TABLE 4. Optimization level O3, on a 195-MHz R10000. PX 3 PY 1 3 1 1 3 2 1 3 3 2 3 2 Wall clock User Mflops Mflops/CPU 26 862 26 828 61.8 61.8 13 919 13 679 121.3 60.6 8769 8578 193.4 64.5 7099 6935 239.2 59.8 a persistent upper-level trough offshore; and dynamically forced precipitation enhanced by steep topography. The dynamic forcing came from a weak surface cold frontal passage supported by a short-wave trough aloft. Upstream boundary wind speeds were on the order of 10 m s21 at the surface. Wall-clock execution times (including input/ output) are summarized in Tables 2 and 3. Note in par- ticular that single processor performance does not exceed 65 million floating point operations per second (Mflops) or 15% of the rated peak performance of the micropro- cessor. Forecast spread suggests model error resulting from choices in processor configuration and optimization lev- el. Unlike many ensemble studies, a control run to char- acterize deterministic model error is not available. In ensemble forecasting, a control run is often the cate- gorical forecast, initialized with a ‘‘best guess’’ analysis. A single processor run without any optimization could be taken as the control run since it may be what the model developers intended. Given codes of such com- plexity this is difficult to quantify. Alternately, the con- trol can be any member of the ensemble, chosen ran- domly. However, one case study cannot robustly deter- mine the statistical properties of the ensemble and a randomly chosen member has no significance. Ensemble forecasters often look for forecast spread and a reduced ensemble-averaged error. The performance of the en- semble mean per se is not of interest here and sensitivity to initial conditions is not an issue. Because this is a single case without a true control run, it is unwise to select a top performer in this experiment or to average the results looking for an improvement. However, sig- nificant deviation of one group of forecasts from the others would indicate the presence of model error. Each ensemble member is initialized identically and common boundary conditions are applied throughout the 42-h integration period. The 0-h forecasts may de- viate nominally from each other, because grid staggering and destaggering require calculations that may be han- dled differently. Boundary value determination is sub- ject to the same errors and by examining the spatial distribution of forecast spread such effects can be quan- tified. Forecast spread is defined quadratically as 1/2N1 2S(x, y) 5 [c (x, y) 2 c (x, y)] , (1)O j5 6N j51 where c is any forecast variable, is the ensemblec mean, and N is the number of ensemble members. This study evaluates temperature T and vertical ve- locity W, because those two parameters largely deter- mine the state of the model. The mass distribution is mostly described by T, while W is unique for a given wind solution. The vertical structure of the spread and the impact of optimization and processor configuration on the RPN physics package are shown in Fig. 1. It is clear for both vertical velocity and temperature that the spread in the ensemble is greater when the model is run with the full-physics package (dashed line). The dy- namics do not use the functions available in the fast math library, whereas the physics do in many places. When physics are included, the spread increases with height. Without physics the spread is greater in the mid- dle of the troposphere. In all cases after 36 h of forecast time, the spread of the physics ensemble is at least 2 times as large as the spread of the no-physics ensemble, showing that the library and processor configuration dif- ferences affect the physics more than the dynamics. Spreads in Fig. 1 increase rapidly at the beginning of the forecasts and then grow slowly or even decrease in some cases. This behavior is consistent with observed model error that follows a square-root curve (Orrell et al. 2002). Later in the forecast, we would expect sen- sitivity to initial conditions to take over, giving expo- nential-like error growth (e.g., Errico and Baumhefner 1987). The small domain here will restrict such error growth, explaining the weak growth in Fig. 1. Keeping track of how each ensemble member devi- ates from the ensemble mean gives us an idea of spe- cifically how the fast math library and processor con- figuration affect the forecast. Figure 2 is a plot of do- main-averaged absolute deviations for each member in the ensembles. Lower values indicate less contribution to the spread. Most of the error growth is contained within the first few hours of the forecast. Because of the large spread near the top of the troposphere, that level was chosen for demonstration, and only temper- ature deviations are shown. The ensemble with physics is on the left (Fig. 2a) and the dynamics-only ensemble is on the right (Fig. 2b). Note the different vertical axis scales and that the deviations are an order of magnitude greater when physics are included. Each configuration 902 VOLUME 17W E A T H E R A N D F O R E C A S T I N G FIG. 1. Ensemble spread of (left) temperature (T ) and (right) vertical velocity (W ) averaged over the model grid points for computational levels at approximately (a), (b) 5, (c), (d) 4200, and (e), (f ) 19 000 m above the surface. The dashed line is calculated from the ensemble with physics, and the solid line is from the ensemble without physics. is plotted with a different line style, and all 12 members are plotted. Different optimization levels and libraries have nearly identical deviations from the mean, since the curves appear superimposed in Fig. 2. Thus, the processor configuration is almost completely responsi- ble for the ensemble spread. In a global model, error growth is bounded by the climatological variance (Er- rico and Baumhefner 1987). Common boundary con- ditions in this ensemble run on a small, limited-area domain further restrict error growth. This results in an error saturation that is well below where it might occur on a global domain. Because the error growth is faster for the ensemble with physics included, saturation ap- pears to be reached near where the curves are flat and the spread may have reached its limit as early as 24 h into the forecast (Fig. 2a). Conversely, many of the curves in Fig. 2b tend upward at the end of the forecast period, suggesting that the slower-growing spread in the dynamics-only ensemble has not yet reached its limit. Plotting the spread for one forecast time without av- eraging shows that a variety of calculations performed in the MC2 are susceptible to processor configuration, but the physics are more sensitive. Figure 3 shows the temperature spread approximately 5 m above the surface (first model level) and 19 000 m above the surface for both ensembles, valid near the peak spreads in the phys- AUGUST 2002 903N O T E S A N D C O R R E S P O N D E N C E FIG. 2. Deviation from the ensemble mean for (a) the ensemble with physics included and (b) the ensemble with only the MC2 dynamical core. All 12 members of the ensemble are plotted, and each processor configuration uses a different line style (for three optimization levels each) as given in the legend. The differences between the three optimization levels are nearly undetectable. These results are for temperature at 19 000 m. ics ensemble (Fig. 2) at forecast hour 30. It is again clear that the spread of the ensemble with physics (Figs. 3a and 3c) is an order of magnitude larger than the ensemble without physics (Figs. 3b and 3d). The 5-m spreads (Figs. 3a and 3b) are smaller, corresponding with Fig. 1, and do not appear to be correlated with topography as might be expected. The mountain ranges in this domain run parallel to the coastline, while the spread pattern for the no-physics ensemble is oriented perpendicular to the coastline, and the physics-included ensemble spread does not have an organized pattern at all. Figure 3c shows the importance of boundary con- dition contamination, but the large upper-tropospheric spread patterns are also oriented with the topography. Most of these results are for temperature because it is a sensible weather parameter that is easy to understand. The vertical velocity behaves similarly (Fig. 1). Because precipitation processes are closely (and nonlinearly) tied to vertical velocity, these results imply that a large spread in forecast precipitation events could result in some cases, especially when it is weakly forced. To estimate the relationship between topography and the spread of the ensembles, correlation coefficients be- tween the spread of W and topography are calculated (Fig. 4). Topographically forced vertical velocity is han- dled entirely in the dynamical core of the MC2, re- gardless of the external physical parameterizations. Ver- tical velocities themselves are well correlated with to- pography, especially with cross-barrier flow as in this case. A strong correlation between the spread of vertical velocity and the topography would indicate that the nu- merics associated with vertical advection may introduce errors with different processor configurations. Figure 4 shows a positive correlation through most of the fore- cast, although the correlation is never particularly strong. As might be expected from Fig. 3, the correlation in the upper troposphere for the no-physics ensemble is the greatest and the no-physics correlations are in gen- eral higher than the correlations for the physics ensem- ble. The overall weak correlations suggest that the to- pographically forced vertical velocities in the model dy- namics are not prone to large errors from processor configuration. The correlation reduction in the physics ensemble near the end of the forecast time also indicates that the spread in W is more closely connected with errors resulting from the physical parameterizations, which are decoupled from the topography. The results presented here suggest that PE configu- ration is the primary source of error in the iterative pressure solution. The errors by themselves may be small. When a full physical parameterization package is included, the pressure errors propagate through the physics and positively feed back to degrade the full model solution. The RPN physics package treats each atmospheric column separately, so processor configu- ration should not directly affect the physics output. To further test this hypothesis, runs were made with dif- ferent elliptic solver convergence tolerances. The runs were made on one PE to eliminate processor configu- ration as a source of error, varying the iterative solver convergence tolerances from e 5 1024 to e 5 1025. Both of these result in converged solutions, but some small differences will result. Again, runs were made both with and without physics, giving a total of four 904 VOLUME 17W E A T H E R A N D F O R E C A S T I N G FIG. 3. Ensemble spread for temperature as defined in Eq. (1). (a), (b) Spreads at 5 m and (c), (d) spreads at 19 000 m. (a), (c) Results from the ensemble with full physics, and (b), (d) results from the ensemble with no physics. Note the difference scales for each map. Peak values are in the largest category defined by the grayscale bars. FIG. 4. Correlation coefficients between the spread of vertical velocity and the topography of the MC2 domain. The three levels are the same as are shown in Fig. 1. Results from the no-physics ensemble are shown with a solid line and results from the physics ensemble are shown dashed. runs. The results are shown in Fig. 5, which can be compared with Fig. 1. Although the overall magnitude of the spreads is smaller, the fact that the physics results in an order-of-magnitude increase in spread over the dynamics-only case is similar. 4. Conclusions Perhaps the best way to evaluate the impact of code optimization strategies for microprocessors on atmo- spheric flow simulations is by using theoretical test cas- AUGUST 2002 905N O T E S A N D C O R R E S P O N D E N C E FIG. 5. Same as in Fig. 1. Runs were made with different convergence criteria for the iterative solver. es with analytic or converged solutions. In such cases, a loss of accuracy can be isolated and easily corrected. With a fully configured atmospheric model interfaced to complex physical parameterization packages, the task is more difficult. One possible approach is to view ag- gressive compiler optimizations, fast math library usage, and parallel-processor topology as possible sources of floating point errors or perturbations of the model and boundary conditions. Ensemble forecast techniques can then be applied to determine the nature of the error growth that may result from these configuration options. If we interpret ensemble spread as the potential for mod- el error from compiler optimization, fast math libraries, and processor configuration, the feedback between the iterative pressure solution and the physics can generate significant errors in the forecasts. Spread from an en- semble with only dynamics is an order of magnitude smaller than a run with the full RPN physics package. The deviations between ensemble members that use the fast math libraries and those that do not are nominal, showing that the spread is primarily a result of processor configuration. This is to be expected for the dynamics- only case. The iterative solver is the main source of differences, because the pressure is not bit-reproducible because of a global summation when computing inner products in the iterative solver. However, changing processor con- figurations while using the physics package has the po- tential to degrade the forecast further. For a realistic forecast, the physics must be included, so it is up to the forecaster to determine whether the errors are within acceptable bounds. In this case, the implied model errors 906 VOLUME 17W E A T H E R A N D F O R E C A S T I N G averaged over the domain were relatively small and on the order of 0.58C in temperature. Additional study is required to quantify the effects that the implied vertical velocity errors can have on sensible weather parameters such as precipitation. Errors of this magnitude may only be important in select cases. Other published model verification studies can help to put these errors into context. Stensrud et al. (1999) evaluated Eta Model forecasts produced at the National Centers for Environmental Prediction (NCEP), run on a regional domain with DX 5 29 km. For 12-h forecasts, they found mean absolute errors (MAEs) in temperature that ranged from 0.938 to 1.668C between 85.0 and 10.0 kPa. At 36 h, the range of errors was from 1.168 to 1.738C. Hou et al. (2001) verified an ensemble designed to approximate forecast error resulting from initial-con- dition error and found mean 85.0-kPa temperature errors between 1.08 and 2.08C at 36 h. Both of those studies used larger domains than this study. With more rec- ognition that model error is large even at short ranges (Stensrud et al. 2000, Orrell et al. 2002), it is important to understand its sources, including those related to pro- cessor topology and/or floating point performance. The results here suggest that processor topology can produce errors that are 10%–25% of the magnitude of total ini- tial-condition and model error. These results are con- servative because of the small domain. For a larger do- main, it is expected that the errors could grow to larger values and they are not limited by the boundary con- ditions as they are here. Also, the suggested feedback between the pressure solution and the physics solution means that other sources of model error, such as incor- rect physical parameters, may result in greater model error than different PE configurations. It must be noted that experiments performed with an earlier math library and operating system resulted in errors on the order of 58C, which are large enough to be important most of the time. Those differences were caused primarily by the use of the fast math library. In other words, these results do not extend to all computing platforms. Our results also show that the influence of topography on spread is weak. This was examined by looking at the spread in vertical velocity, because ver- tical velocity itself is expected to be highly correlated with topography. The no-physics ensemble is generally better correlated with topography. The physics and no- physics ensembles behave differently, suggesting that other factors are at least as important. These results have implications for investigators try- ing to verify a numerical weather prediction model over one or more seasons. It is recognized that a proper ver- ification will cover a period during which the model is static. That is to say the physics and the dynamical core do not change over the verification period. If this is important, the number of processors must remain static as well. Within this context, there is no way to tell which forecast is the most accurate because we have not in- cluded any information about model biases, observa- tions, or observation errors. Regardless, the results show that processor configuration, and possibly the use of fast math libraries, can lead to forecast error that should be considered when a new compiler, operating system, or fast math libraries are installed. Acknowledgments. We thank SGI, NCAR, and the Canadian Meteorological Centre for providing dedicat- ed computer time. The comments of three anonymous reviewers have greatly improved the manuscript. REFERENCES Baillie, C., J. Michalakes, and R. Skalin, 1997: Regional weather modeling on parallel computers. Parallel Comput. 23, 2135– 2142. Côté, J., S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998: The operational CMC–MRB Global Environmental Mul- tiscale (GEM) model. Part I: Design considerations and for- mulation. Mon. Wea. Rev., 126, 1373–1395. Eisenstat, S. C., H. C. Elman, and M. H. Schultz, 1983: Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal., 2, 345–357. Errico, R., and D. Baumhefner, 1987: Predictability experiments using a high-resolution limited-area model. Mon. Wea. Rev., 115, 488– 504. Gal-Chen, T., and R. C. Sommerville, 1975: On the use of a coordinate transformation for the solution of the Navier–Stokes equations. J. Comput. Phys., 17, 209–228. Hou, D., E. Kalnay, and K. Droegemeier, 2001: Objective verification of the SAMEX 98 ensemble study. Mon. Wea. Rev., 129, 73– 91. Mailhot, J., R. Sarrazin, B. Bilodeau, N. Brunet, and G. Pellerin, 1997: Development of the 35-km version of the operational re- gional forecast system. Atmos.–Ocean, 35, 1–28. Mass, C. F., and Y.-H. Kuo, 1998: Regional real-time numerical weather prediction: Current status and future potential. Bull. Amer. Meteor. Soc., 79, 185–200. Orrell, D., L. Smith, J. Barkmeijer, and T. Palmer, 2002: Model error in weather forecasting. Nonlinear Processes Geophys., in press. Saad, Y., and M. Schultz, 1986: GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7, 856–869. Skamarock, W. C., P. K. Smolarkiewicz, and J. B. Klemp, 1997: Preconditioned conjugate-residual solvers for Helmholtz equa- tions in nonhydrostatic models. Mon. Wea. Rev., 125, 587–599. Stensrud, D., H. E. Brooks, J. Du, S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. Mon. Wea. Rev., 127, 433–446. ——, J.-W. Bao, and T. T. Warner, 2000: Using initial condition and model physics perturbations in short-range ensemble simulations of mesoscale convective systems. Mon. Wea. Rev., 128, 2077– 2107. Thomas, S. J., C. Girard, R. Benoit, M. Desgagné, and P. Pellerin, 1998: A new adiabatic kernel for the MC2 model. Atmos.–Ocean, 36, 241–270.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Faculty Research and Publications /
- Ensemble Analysis of Forecast Errors Related to Floating...
Open Collections
UBC Faculty Research and Publications
Ensemble Analysis of Forecast Errors Related to Floating Point Performance. Thomas, Stephen J.; Hacker, Joshua P.; Desgagne, Michel; Stull, Roland B. Aug 31, 2002
pdf
Page Metadata
Item Metadata
Title | Ensemble Analysis of Forecast Errors Related to Floating Point Performance. |
Creator |
Thomas, Stephen J. Hacker, Joshua P. Desgagne, Michel Stull, Roland B. |
Publisher | American Meteorological Society |
Date Issued | 2002-08 |
Description | The dynamical core of the Mesoscale Compressible Community (MC2) model is described. Ensemble forecast techniques for high-resolution mesoscale simulations are applied to assess the impact of floating point optimization, mathematics libraries, and processor configuration on forecast accuracy. It is shown that the iterative solver in the dynamical core is most sensitive to processor configuration, but it also shows weak sensitivity to the usage of fast mathematics libraries and floating point instruction reordering. Semi-implicit pressure solver errors are amplified in the physical parameterization package, which is sensitive to small pressure differences and feeds back to the dynamical solution. In this case, local rms spreads are around 1°C in temperature by the end of a 42-h forecast. It is concluded that careful validation is required when changing computing platforms or introducing fast mathematics libraries. Copyright 2002 American Meteorological Society (AMS). Permission to use figures, tables, and brief excerpts from this work in scientific and educational works is hereby granted provided that the source is acknowledged. Any use of material in this work that is determined to be “fair use” under Section 107 of the U.S. Copyright Act or that satisfies the conditions specified in Section 108 of the U.S. Copyright Act (17 USC §108, as revised by P.L. 94-553) does not require the AMS’s permission. Republication, systematic reproduction, posting in electronic form, such as on a web site or in a searchable database, or other uses of this material, except as exempted by the above statement, requires written permission or a license from the AMS. Additional details are provided in the AMS Copyright Policy, available on the AMS Web site located at (http://www.ametsoc.org/) or from the AMS at 617-227-2425 or copyright@ametsoc.org. |
Genre |
Article |
Type |
Text |
Language | eng |
Date Available | 2011-04-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0041843 |
URI | http://hdl.handle.net/2429/33496 |
Affiliation |
Science, Faculty of Earth and Ocean Sciences, Department of |
Citation | Thomas, Stephen J., Hacker, Joshua P., Desgagné, Michel, Stull, Roland B. 2002. Ensemble Analysis of Forecast Errors Related to Floating Point Performance. Weather and Forecasting 17(4) 898-906. |
Publisher DOI | 10.14288/1.0041843 |
Peer Review Status | Reviewed |
Scholarly Level | Faculty |
Copyright Holder | Stull, Roland B. |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 52383-Stull_AMS_2002_WF898.pdf [ 625.95kB ]
- Metadata
- JSON: 52383-1.0041843.json
- JSON-LD: 52383-1.0041843-ld.json
- RDF/XML (Pretty): 52383-1.0041843-rdf.xml
- RDF/JSON: 52383-1.0041843-rdf.json
- Turtle: 52383-1.0041843-turtle.txt
- N-Triples: 52383-1.0041843-rdf-ntriples.txt
- Original Record: 52383-1.0041843-source.json
- Full Text
- 52383-1.0041843-fulltext.txt
- Citation
- 52383-1.0041843.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.52383.1-0041843/manifest