Reach Scale Hydrology
Surface meteorological forcing downscaled from NLDAS-2/HRRR, Stage IV, and PRISM over the continental United States
1-km 1-hourly CONUS Meteorological Forcing on NWM Grid
As part of the data infrastructure built by the Hydrology Team at the Center for Western Weather and Water Extremes (CW3E), a meteorological forcing engine has been developed to produce long-term (1979 to near real time), high-resolution (1-km, 1-hourly), national-scale (CONUS) forcing data to support research and applications in hydrologic modeling and forecasting. The configuration of this forcing data product is designed to serve the demanding needs at the center, for example, building near-real-time (NRT) forecasting systems for various regions and performing large-scale modeling research for the National Water Model (NWM) (both the current-generation WRF-Hydro and the NextGen framework) in collaboration with the CIROH consortium.
Methods of Production
An elevation (topography) based downscaling and merging procedure is established for all input forcing variables (precipitation, temperature, humidity, short-/long-wave radiation, pressure, and wind) following the existing literature (Cosgrove et al., 2003) as well as AORC practices. The forcing engine ingests a series of inputs from different sources with different temporal/spatial resolutions, domains, reliability, periods of coverage, and lag times, and generates multiple streams of forcing data, two of which are made available to the public: retrospective data (1979 to ~7 months behind real time) and near-real-time (NRT) data (up to the present day).
Input data
A number of near-real-time and historical data products from different agencies (including CW3E) are collected and updated on a daily basis:
NLDAS-2 meteorological forcing
HRRR surface analysis fields
Stage-IV precipitation
Real-time version
Archive version (after 10-day lookback)
PRISM precipitation and temperature
Provisional version
Recent History version
Historical Past version
MRMS precipitation (to be implemented)
CW3E West-WRF Forecast (forecast time horizon only)
Eight variables (precipitation, 2-m air temperature, surface pressure, specific humidity, downward shortwave/longwave radiation, and wind U/V) are produced using the procedures shown in the table below. NLDAS-2, Stage IV, and PRISM form the backbone, while HRRR is used only in the most recent 3.5 days. As more reliable inputs become available, they replace less reliable ones (see the data streams section below).
Downscaling/merging procedures
The downscaling procedure follows a series of commonly adopted physical principles, for example, a fixed air temperature lapse rate with elevation, a hydrostatic pressure profile, and an emission-temperature adjustment for longwave radiation. All downscaling is performed on a 0.01° lat/lon grid, and the final results are then reprojected onto NWM's 1-km LCC grid. See the following table for details.
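For illustration, the temperature and pressure steps can be sketched as follows (a minimal sketch only; the 6.5 K/km lapse rate and the function names are assumptions for illustration, not necessarily the exact values or implementation used in production):

# Minimal sketch of the elevation-based temperature/pressure adjustment.
# Assumes a constant 6.5 K/km lapse rate and a hypsometric pressure
# adjustment; the production code may differ in details.
import numpy as np

LAPSE_RATE = 6.5e-3   # K per meter (assumed value)
G = 9.81              # gravitational acceleration, m/s^2
RD = 287.0            # gas constant for dry air, J/(kg K)

def downscale_t_p(t_coarse, p_coarse, z_coarse, z_fine):
    """Adjust coarse-grid temperature (K) and pressure (Pa) from the
    coarse-grid elevation z_coarse (m) to the fine-grid elevation z_fine (m)."""
    dz = z_fine - z_coarse
    t_fine = t_coarse - LAPSE_RATE * dz                  # lapse-rate correction
    t_mean = 0.5 * (t_coarse + t_fine)                   # layer-mean temperature
    p_fine = p_coarse * np.exp(-G * dz / (RD * t_mean))  # hypsometric equation
    return t_fine, p_fine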
Data streams across time horizons and update schedule
Because more input data become available with time, more reliable data replace less reliable data once they arrive. For example, NLDAS-2 is a reanalysis and thus more reliable than the HRRR analysis, so it replaces the latter once available. The same applies between the Stage IV Real-time and Archive versions, and between the PRISM Provisional and Recent History versions. As older/less reliable versions and sources are overwritten by newer ones, multiple streams of data are created that roll forward in time.
For the purposes of hydrologic modeling and forecasting, we set up several time horizons: retrospective, near-real-time (NRT), short-range forecast, and seasonal forecast (Figure 2). The forcing data in the retrospective period are fairly stable (revised only for bug fixes and other quality improvements), while the data in the NRT period are subject to frequent updates. Given space limitations, we make two streams of data products available to the public:
Retrospective product
1979-01-01 to ~7 months behind real time
updated monthly
precipitation based on Stage IV (2002-) and NLDAS-2 (1979-2001), matched against PRISM at the daily (1981-) and monthly (1979-1980) levels
temperature based on NLDAS-2, matched against PRISM at the daily (1981-) and monthly (1979-1980) levels
data within the most recent 7 months may be provided but are matched against PRISM Provisional (less stable; will eventually be replaced by PRISM Recent History)
Near-Real-Time (NRT) product
most recent few years, up to the current day
updated daily
precipitation based on Stage IV (Real-time and Archive) and NLDAS-2 (gap-filling)
other fields based on NLDAS-2 and HRRR
Data format
NetCDF format and grid projection
All the forcing data are in NetCDF format and follow the CF conventions, so most NetCDF-capable software can read the data and interpret the metadata (e.g., time stamps, grid/projection settings). The naming of forcing variables and the unit/sign definitions follow the WRF-Hydro convention, so the files can be read directly by the WRF-Hydro model. See the table for the list of the 8 variables.
The data are 1-hourly and time-labeled in UTC. Following NCEP conventions, precipitation is the mean flux over the previous hour; for example, the precipitation value labeled 12 UTC is the mean flux between 11 UTC and 12 UTC.
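As a quick illustration of the convention and units (example numbers only, not from the dataset):

# RAINRATE is a mean flux in kg/m^2/s; 1 kg/m^2 of liquid water is 1 mm depth.
rainrate = 2.78e-4             # example flux labeled 12 UTC, kg/m^2/s
accum_mm = rainrate * 3600.0   # accumulation between 11 UTC and 12 UTC, in mm
print(accum_mm)                # ~1.0 mm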
The data is in Lambert Conformal Conic (LCC) projection at 1-km resolution. The WKT projection parameters are as follows:
PROJCS["Lambert_Conformal_Conic",
GEOGCS["GCS_Sphere",
DATUM["D_Sphere",
SPHEROID["Sphere",6370000.0,0.0]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["false_easting",0.0],
PARAMETER["false_northing",0.0],
PARAMETER["central_meridian",-97.0],
PARAMETER["standard_parallel_1",30.0],
PARAMETER["standard_parallel_2",60.0],
PARAMETER["latitude_of_origin",40.0],
UNIT["Meter",1.0]]
The data grid has 3840 rows and 4608 columns (1000 m grid spacing in both x and y directions), though the actual data are bounded between 25°N and 53°N in latitude and between 125°W and 67°W in longitude.
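As an example of working with this projection, the parameters above can be turned into a PROJ definition and used to convert between geographic and projected coordinates (a sketch; the PROJ string below is derived from the WKT above, and pyproj is only one of several tools that could be used):

# Build the LCC projection from the parameters above and convert a
# lat/lon point to projected x/y coordinates (meters). Sketch using pyproj.
from pyproj import CRS, Transformer

lcc = CRS.from_proj4(
    "+proj=lcc +lat_1=30 +lat_2=60 +lat_0=40 +lon_0=-97 "
    "+x_0=0 +y_0=0 +R=6370000 +units=m +no_defs"
)
to_xy = Transformer.from_crs("EPSG:4326", lcc, always_xy=True)
x, y = to_xy.transform(-97.0, 40.0)   # projection origin maps to (0, 0)
print(x, y)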
Here is one example dump of the NetCDF header (repeated metadata like missing values and coordinates for data variables are abridged):
dimensions:
time = UNLIMITED ; // (24 currently)
x = 4608 ;
y = 3840 ;
nv4 = 4 ;
variables:
double time(time) ;
time:standard_name = "time" ;
time:long_name = "Time" ;
time:units = "minutes since 2023-12-31 00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double lon(y, x) ;
lon:standard_name = "longitude" ;
lon:long_name = "longitude" ;
lon:units = "degrees" ;
lon:_CoordinateAxisType = "Lon" ;
lon:bounds = "lon_bnds" ;
double lon_bnds(y, x, nv4) ;
double lat(y, x) ;
lat:standard_name = "latitude" ;
lat:long_name = "latitude" ;
lat:units = "degrees" ;
lat:_CoordinateAxisType = "Lat" ;
lat:bounds = "lat_bnds" ;
double lat_bnds(y, x, nv4) ;
float T2D(time, y, x) ;
T2D:standard_name = "air_temperature" ;
T2D:long_name = "Air Temperature" ;
T2D:units = "K" ;
T2D:coordinates = "lat lon" ;
T2D:_FillValue = -9.99e+08f ;
T2D:missing_value = -9.99e+08f ;
float Q2D(time, y, x) ;
Q2D:standard_name = "specific_humidity" ;
Q2D:long_name = "Specific Humidity" ;
Q2D:units = "1" ;
float PSFC(time, y, x) ;
PSFC:standard_name = "air_pressure" ;
PSFC:long_name = "Pressure" ;
PSFC:units = "Pa" ;
float U2D(time, y, x) ;
U2D:standard_name = "eastward_wind" ;
U2D:long_name = "U Wind" ;
U2D:units = "m/s" ;
float V2D(time, y, x) ;
V2D:standard_name = "northward_wind" ;
V2D:long_name = "V Wind" ;
V2D:units = "m/s" ;
float SWDOWN(time, y, x) ;
SWDOWN:standard_name = "surface_downwelling_shortwave_flux_in_air" ;
SWDOWN:long_name = "Downward Shortwave Radiation" ;
SWDOWN:units = "W/m^2" ;
float LWDOWN(time, y, x) ;
LWDOWN:standard_name = "surface_downwelling_longwave_flux_in_air" ;
LWDOWN:long_name = "Downward Longwave Radiation" ;
LWDOWN:units = "W/m^2" ;
float RAINRATE(time, y, x) ;
RAINRATE:standard_name = "precipitation_flux" ;
RAINRATE:long_name = "Precipitation" ;
RAINRATE:units = "kg/m^2/s" ;
File and folder organization
The data files are organized by years and each data file contains data for one day. The naming follows WRF-Hydro’s convention:
[YYYY]/[YYYYMMDD].LDASIN_DOMAIN1
In Python, you can use the following lines to create file names:
from datetime import datetime
t = datetime(2024, 9, 20)
datafile = f'{t:%Y}/{t:%Y%m%d}.LDASIN_DOMAIN1'
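To build the list of daily files over a longer period, a simple loop over dates works (a sketch; adjust the date range as needed):

# Generate daily file names for a range of dates (sketch).
from datetime import datetime, timedelta

start, end = datetime(2024, 1, 1), datetime(2024, 1, 10)
datafiles = []
t = start
while t <= end:
    datafiles.append(f'{t:%Y}/{t:%Y%m%d}.LDASIN_DOMAIN1')
    t += timedelta(days=1)
print(datafiles[:3])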
An example folder structure will look like:
├── 2023
│ ├── 20230101.LDASIN_DOMAIN1
│ ├── 20230102.LDASIN_DOMAIN1
⠇ ⠇
│ └── 20231231.LDASIN_DOMAIN1
└── 2024
├── 20240101.LDASIN_DOMAIN1
├── 20240102.LDASIN_DOMAIN1
⠇
Common tools to manipulate data
For visualization and simple plotting purposes, we recommend NASA's Panoply Viewer, which handles the map projections very nicely. It is built on Java and thus requires a Java Runtime, which is available on both Windows and Linux.
For a quick peek into the data, we highly recommend the Ncview tool by David W. Pierce at Scripps Institution of Oceanography. You can install it with the Python package management tool conda. Ncview doesn’t project the data in LCC but it’s much faster and simpler than Panoply.
For data operations such as reprojection, regridding, variable extraction, time/space subsetting, and time/space averaging, we recommend the cdo (Climate Data Operators) tool. For simpler operations such as compression, decompression, and metadata management, the nco (NetCDF Operators) tool also works.
For example, to regrid the data to a 0.1°x0.1° lat-lon grid, we can do:
# Create lat/lon grid definition
cat << EOF > latlon_conus_0.1deg.txt
gridtype = lonlat
xsize = 580
ysize = 280
xfirst = -124.95
xinc = 0.1
yfirst = 25.05
yinc = 0.1
EOF
# Use cdo remapbil operator to regrid the data
cdo -f nc4 -z zip remapbil,latlon_conus_0.1deg.txt 2024/20240101.LDASIN_DOMAIN1 20240101.LDASIN_DOMAIN1_0.1deg
For low-level manipulations that are not available in those tools, the Python netCDF4 library could be an option.
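For instance, a minimal read of one variable with netCDF4 might look like this (the file name is illustrative):

# Read the time axis and one variable with the netCDF4 library (sketch).
from netCDF4 import Dataset, num2date

with Dataset("2024/20240101.LDASIN_DOMAIN1") as nc:
    times = num2date(nc["time"][:], nc["time"].units, nc["time"].calendar)
    t2d = nc["T2D"][0, :, :]   # air temperature (K) at the first hour
    print(times[0], t2d.shape, float(t2d.mean()))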
How to download
Both the retrospective and NRT forcing products are hosted on Globus:
To access them, you need an account on the Globus website. You may already have one through your institution; otherwise, it is free to register. Once you log in to the Globus web app, clicking on the links provided above will bring you directly to the data (see the screenshot).
If you prefer a command-line way to download the data, or need to automate it, you can install the Globus CLI with pip or conda. Once you have the Globus CLI installed, you can list and copy data:
GLOBUS_RETRO=0351632c-c1f7-4885-8125-0a19290791ff
GLOBUS_NRT=1620b36c-6d83-45d1-8673-5143f09ac5d8
globus ls -l $GLOBUS_RETRO:1979
globus transfer $GLOBUS_RETRO:1979 [my globus endpoint]:[my data path]/1979 --recursive
Caution: very large data files and folders.
Known issues
Striping noise in Stage IV hourly data over CNRFC
Among the input datasets, the NCEP Stage IV data started to show striping noise in its hourly data over the CNRFC region as of July 2020. Cumulative values at 6-hourly or longer intervals were not affected by this problem. The cause is believed to be a buggy 6-hourly to 1-hourly temporal disaggregation procedure performed at NCEP. No fixes have been applied at NCEP so far. A temporary workaround is being developed to redo the temporal disaggregation using NSSL MRMS products (e.g., multi-sensor pass 1).
Lack of observation data outside the US border
Few Canadian or Mexican observations are included in the input data products, causing abrupt changes across the borders. An ongoing effort aims to blend in the Canadian Precipitation Analysis System (CaPA) data.
Bias correction by Maxwell Lab at Princeton University & HydroFrame team (2003-2005 only)
The Maxwell Lab at Princeton University and the HydroFrame team investigated temperature biases in the data against observation networks such as SNOTEL. Based on their findings, they applied corrections to the raw temperature and specific humidity data for Water Years 2003-2005 at the HUC02 level. Temperature was adjusted as described for each region below, and specific humidity was adjusted consistently with temperature using the Clausius-Clapeyron equation (see the sketch after the list below).
Water Year 2003:
Great Basin (HUC 16): 0.5 degree uniform temperature decrease
Pacific Northwest (HUC 17): 0.5 degree uniform temperature decrease
Upper Colorado River Basin (HUC 14): temperature lapse rate correction at 4 K/km
Water Year 2004:
Great Basin (HUC 16): 0.5 degree uniform temperature decrease
Upper Colorado River Basin (HUC 14): temperature lapse rate correction at 4 K/km
Water Year 2005:
Upper Colorado River Basin (HUC 14): temperature lapse rate correction at 2.5 K/km
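As an illustration of how specific humidity can be adjusted consistently with a temperature change via Clausius-Clapeyron, here is a minimal sketch assuming relative humidity is preserved and using a Magnus-type saturation vapor pressure formula (the exact formulation used for the published correction may differ):

# Sketch: scale specific humidity to be consistent with an adjusted temperature,
# assuming relative humidity is preserved (Clausius-Clapeyron scaling).
import numpy as np

def esat(t_k):
    """Saturation vapor pressure (Pa), Magnus-type approximation over water."""
    t_c = t_k - 273.15
    return 610.94 * np.exp(17.625 * t_c / (t_c + 243.04))

def adjust_q(q, t_old, t_new):
    """Scale specific humidity by the change in saturation vapor pressure."""
    return q * esat(t_new) / esat(t_old)

# Example: a 0.5 K uniform temperature decrease
q_new = adjust_q(0.004, 280.0, 279.5)
print(q_new)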
The corrected data are made available by the HydroFrame team on their HydroData platform. HydroData, developed by HydroFrame, provides a data catalog, an associated Python API library hf_hydrodata to query and retrieve data, and other tools to manipulate the data. See the HydroData documentation here. To retrieve the bias-corrected data, install the Python API library hf_hydrodata and follow the instructions here and in the Accessing Gridded Data section:
import hf_hydrodata as hf
hf.register_api_pin("<your_email>", "<your_pin>")
# Define filters and return the data as a NumPy array
filters = {"dataset": "CW3E", "variable": "air_temp",
           "temporal_resolution": "daily",
           "start_time": "2005-01-01", "end_time": "2005-01-02"}
data = hf.get_gridded_data(filters)
print(data.shape)
# Get the metadata about the returned data
metadata = hf.get_catalog_entry(filters)
print(metadata)
The dataset name is “CW3E” and the variable names can be found here.
Data citation and contact information
The data products are experimental and provided to the community without warranty of any kind. There is no peer-reviewed journal publication about this dataset yet, though multiple studies have used the data for their modeling purposes (see the next section). The following Google Document contains the technical notes for the data product and will be updated more frequently than this webpage:
If you want to use these products or report bugs/issues, please contact the data producer:
Ming Pan, Senior Hydrologist
Center for Western Weather and Water Extremes (CW3E)
Scripps Institution of Oceanography
University of California San Diego
Email: m3pan@ucsd.edu
Journal publications using this data
Martens, H. R., Lau, N., Swarr, M. J., Argus, D. F., Cao, Q., Young, Z. M., et al. (2024). GNSS geodesy quantifies water-storage gains and drought improvements in California spurred by atmospheric rivers. Geophysical Research Letters, 51, e2023GL107721. https://doi.org/10.1029/2023GL107721