End-to-End Network Tuning Sends Data Screaming from NERSC to NOAA
When it comes to moving large datasets between DOE’s National Energy Research Scientific Computing Center (NERSC) and his home institution in Boulder, Colo., Gary Bates is no slouch. As an associate scientist in the Earth System Research Lab of the National Oceanic and Atmospheric Administration (NOAA), Bates has transferred hundreds of thousands of files to and from NERSC as part of a weather “reforecasting” project.
The reforecasting project, led by NOAA’s Tom Hamill, involves rerunning several decades of historical weather forecasts with the same (2012) version of NOAA’s Global Ensemble Forecast System (GEFS). A key advantage of a long reforecast dataset is that systematic model errors can be diagnosed from the past forecasts and corrected, dramatically increasing forecast skill, especially for relatively rare events and longer-lead forecasts.
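The article doesn’t spell out how that correction is done; the sketch below illustrates the basic idea with a simple mean-bias correction, in which the average error of past reforecasts against verifying observations is subtracted from a new forecast. The function and array names (`reforecasts`, `observations`, `new_forecast`) are illustrative assumptions, not part of the GEFS codebase.

```python
import numpy as np

def mean_bias_correction(reforecasts, observations, new_forecast):
    """Correct a new forecast using the mean error of past reforecasts.

    reforecasts:  array of past forecast values at one location and
                  lead time (hypothetical layout)
    observations: array of the corresponding verifying observations
    new_forecast: scalar forecast value to be corrected
    """
    # Systematic (mean) error diagnosed from the reforecast archive
    bias = np.mean(reforecasts - observations)
    # Remove the diagnosed bias from the new forecast
    return new_forecast - bias

# Toy example: forecasts that run ~1.5 degrees too warm on average
past_fcst = np.array([21.3, 18.9, 25.1, 17.4])
past_obs  = np.array([19.8, 17.5, 23.4, 16.0])
print(mean_bias_correction(past_fcst, past_obs, 22.0))  # -> 20.5
```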
The GEFS weather forecast model used in this project is the same version currently run in real time by the National Weather Service. In the reforecast project, GEFS forecasts were made on a daily basis from 1984 through early 2012, out to a forecast lead of 16 days. To further improve forecast skill, an ensemble of 11 realizations was run each day, the members differing only slightly in their initial conditions.
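A rough back-of-the-envelope check (the article does not state an exact run count) shows the scale this implies: daily initialization over roughly 28 years with 11 ensemble members works out to more than 100,000 individual 16-day forecasts.

```python
# Approximate scale of the reforecast campaign, per the article:
# daily runs from 1984 through early 2012, 11 members per day.
years = 28                 # 1984 through early 2012, approximately
days = years * 365         # ignoring leap days for a rough estimate
members = 11               # ensemble realizations per day
print(days)                # ~10,220 forecast days
print(days * members)      # ~112,420 individual 16-day forecasts
```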
In 2010, the NOAA team received an ASCR Leadership Computing Challenge (ALCC) allocation of 14.5 million processor hours on NERSC supercomputers to perform this work. In all, the 1984-2012 historical GEFS dataset now totals over 800 terabytes, stored on the NERSC HPSS archival system. Of that, the NOAA team sought to bring about 170 terabytes back to NOAA Boulder for further processing and to make the data more readily available to other researchers. With this much data in play, it is important that transfers move as quickly and easily as possible, both across the network and at the end points in Oakland and Boulder.
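To see why end-to-end tuning matters at this scale, a quick calculation (ours, not from the article; the throughput figures are assumed, not measured) shows how long 170 terabytes takes to move at different sustained rates:

```python
# How long does 170 TB take at a given sustained throughput?
# Illustrative arithmetic only; the rates below are hypothetical.
TB = 170
bits = TB * 1e12 * 8             # terabytes -> bits (decimal TB)

for gbps in (1, 5, 10):          # assumed sustained rates
    seconds = bits / (gbps * 1e9)
    print(f"{gbps:>2} Gbps: {seconds / 86400:.1f} days")
# 1 Gbps  -> ~15.7 days
# 5 Gbps  -> ~3.1 days
# 10 Gbps -> ~1.6 days
```

The difference between weeks and days of transfer time is the practical payoff of tuning both the network path and the systems at each end.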