Earth2Studio Deterministic Forecast Skill
Guide users through building deterministic (single-member) weather forecast
inference scripts using earth2studio.run.deterministic.
Prerequisites
- Earth2Studio installed with CUDA-capable GPU
- Python 3.10+, network access for model weights and data
Live Doc References
Fetch relevant docs to verify current APIs before recommending components:
| Component | URL |
|---|---|
| Prognostic models | https://nvidia.github.io/earth2studio/modules/models_px.html |
| Data sources (analysis) | https://nvidia.github.io/earth2studio/modules/datasources_analysis.html |
| Data sources (forecast) | https://nvidia.github.io/earth2studio/modules/datasources_forecast.html |
| IO backends | https://nvidia.github.io/earth2studio/modules/io.html |
| run.deterministic | https://github.com/NVIDIA/earth2studio/blob/main/earth2studio/run.py |
Workflow
1. Gather Requirements (skip what's already provided)
- Time horizon (hours/days/weeks)
- Variables of interest (t2m, wind, geopotential, etc.)
- Region (global or specific like CONUS)
- GPU/VRAM available
2. Select Model
Fetch prognostic models page. Filter by time horizon, region, VRAM. Note model's:
- Input variables (input_coords["variable"])
- Time step size (output_coords["lead_time"])
3. Select Data Source
Data source must provide all model input variables. Verify via lexicon at
earth2studio/lexicon/<source>.py. Common pairings: Global models → GFS/ARCO/IFS;
Regional → HRRR.
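
The compatibility check above amounts to a set difference between the model's input variables and the data source's vocabulary. A minimal sketch follows; the helper name `missing_variables` and the sample values are hypothetical, and in a real script the vocabulary would come from the source's lexicon (its VOCAB mapping), which should be checked against the installed release.

```python
from collections.abc import Iterable


def missing_variables(required: Iterable[str], vocab: Iterable[str]) -> list[str]:
    """Return the required variables that the data source cannot provide.

    `required` stands in for the model's input variables and `vocab` for the
    keys of the data source lexicon. Both are plain iterables here so the
    sketch runs without Earth2Studio installed.
    """
    available = set(vocab)
    # Keep the model's variable order so the report is easy to read.
    return [name for name in required if name not in available]


# Illustrative values only; real lists come from the model and the lexicon.
model_inputs = ["t2m", "u10m", "v10m", "z500"]
source_vocab = {"t2m", "u10m", "v10m"}
print(missing_variables(model_inputs, source_vocab))  # ['z500']
```

An empty result means the pairing is usable; anything else names the variables that force a different data source.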
4. Select IO Backend
Default: ZarrBackend. Use NetCDF4Backend for legacy tools, XarrayBackend
for in-memory/small runs.
5. Calculate nsteps
nsteps = forecast_hours // model_step_hours
Example: 5-day forecast with 6h step → nsteps = 120 / 6 = 20
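
The calculation can be wrapped in a small helper. This is a sketch: the name `compute_nsteps` is hypothetical, and raising on a horizon that is not a multiple of the step is a choice made here (plain integer division would silently round down instead).

```python
def compute_nsteps(forecast_hours: int, model_step_hours: int) -> int:
    """Number of model steps needed to cover the forecast horizon."""
    if model_step_hours <= 0:
        raise ValueError("model_step_hours must be positive")
    nsteps, remainder = divmod(forecast_hours, model_step_hours)
    if remainder:
        # The horizon cannot be hit exactly; report it rather than truncate.
        raise ValueError(
            f"{forecast_hours}h is not a multiple of the {model_step_hours}h step"
        )
    return nsteps


print(compute_nsteps(120, 6))  # 20
```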
6. Decide: output_coords Filtering
- Filter variables (output_coords) when user requests specific variables (e.g., "t2m and wind") - reduces output size
- Save all variables (omit output_coords) when user says "all variables" or doesn't specify - preserves full model output
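
One way to turn this decision into code is to expand the user's request into a variable list and return an empty mapping when nothing was requested. The sketch below is illustrative: `ALIASES`, `requested_variables`, and `build_output_coords` are hypothetical names, and the wind alias simply reflects that wind needs both the u10m and v10m components.

```python
from collections import OrderedDict

# Hypothetical mapping from phrases a user might say to variable names.
ALIASES = {
    "wind": ["u10m", "v10m"],
    "wind speed": ["u10m", "v10m"],
}


def requested_variables(requests: list[str]) -> list[str]:
    """Expand user phrases into a de-duplicated, ordered variable list."""
    variables: list[str] = []
    for item in requests:
        # Unknown phrases are assumed to already be variable names.
        for name in ALIASES.get(item.lower(), [item]):
            if name not in variables:
                variables.append(name)
    return variables


def build_output_coords(requests: list[str]) -> OrderedDict:
    """Return an output_coords mapping, empty when nothing was requested."""
    variables = requested_variables(requests)
    if not variables:
        return OrderedDict()  # nothing requested: keep every model variable
    # Wrap the list with np.array(...) before passing it to deterministic.
    return OrderedDict({"variable": variables})


print(requested_variables(["t2m", "wind"]))  # ['t2m', 'u10m', 'v10m']
```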
7. Generate Script
from collections import OrderedDict
import numpy as np
import torch
from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic
model = <ModelClass>.load_model(<ModelClass>.load_default_package())
data = <DataSourceClass>()
io = <IOBackendClass>("<output_path>")
# Include output_coords ONLY if user requested specific variables
output_coords = OrderedDict({"variable": np.array(["t2m", "u10m"])})
io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # omit if saving all variables
    device=torch.device("cuda"),
)
8. Manual Loop Alternative
When user explicitly requests manual implementation (NOT using earth2studio.run.deterministic), follow this checklist in order:
- fetch_data - Get initial conditions: x, coords = fetch_data(source=data, time=time, variable=ic["variable"], lead_time=ic["lead_time"], device=device), where ic = model.input_coords()
- Setup total_coords - Build coordinate arrays for time and lead_time dimensions
- io.add_array - Initialize IO backend with total_coords before loop
- create_iterator - Create prognostic iterator: model_iter = model.create_iterator(x, coords)
- Loop through nsteps - for step, (x, coords) in enumerate(model_iter): the iterator yields the initial state as step 0, so break only after writing step nsteps (if step == nsteps: break)
- map_coords - Filter output variables if needed: x_out, coords_out = map_coords(x, coords, output_coords)
- split_coords - Prepare for IO write: x_list, coords_out, var_names = split_coords(x_out, coords_out)
- io.write - Write each step to backend: io.write(x_list, coords_out, var_names)
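
The checklist can be assembled into one function, sketched below. It is modeled on the structure of earth2studio/run.py rather than copied from it. The import paths, keyword names, and the three-value return of split_coords are assumptions to verify against the installed release, and `run_manual_forecast` is a hypothetical name.

```python
def run_manual_forecast(time, nsteps, model, data, io, output_coords=None, device=None):
    """Manual single-member forecast loop (sketch, not the library code)."""
    # Imports are deferred so this module loads without the GPU stack present.
    import numpy as np
    import torch
    from earth2studio.data import fetch_data
    from earth2studio.utils.coords import map_coords, split_coords
    from earth2studio.utils.time import to_time_array

    output_coords = output_coords or {}
    device = device or torch.device("cuda")
    model = model.to(device)

    # fetch_data: initial conditions for the model's input variables.
    input_coords = model.input_coords()
    time = to_time_array(time)
    x, coords = fetch_data(
        source=data,
        time=time,
        variable=input_coords["variable"],
        lead_time=input_coords["lead_time"],
        device=device,
    )

    # total_coords: model output grid plus time and lead_time axes.
    step_coords = model.output_coords(input_coords)
    total_coords = step_coords.copy()
    for key, value in step_coords.items():
        if value.shape == (0,):  # drop placeholder batch dimensions
            del total_coords[key]
    total_coords["time"] = time
    total_coords["lead_time"] = np.asarray(
        [step_coords["lead_time"] * i for i in range(nsteps + 1)]
    ).flatten()
    total_coords.move_to_end("lead_time", last=False)
    total_coords.move_to_end("time", last=False)
    for key, value in total_coords.items():
        total_coords[key] = output_coords.get(key, value)

    # io.add_array: allocate the output arrays before the loop.
    var_names = total_coords.pop("variable")
    io.add_array(total_coords, var_names)

    # create_iterator, then map_coords -> split_coords -> io.write per step.
    x, coords = map_coords(x, coords, input_coords)
    model_iter = model.create_iterator(x, coords)
    for step, (x, coords) in enumerate(model_iter):
        x_out, coords_out = map_coords(x, coords, output_coords)
        io.write(*split_coords(x_out, coords_out))
        if step == nsteps:  # step 0 is the initial state
            break
    return io
```

The arguments mirror those of the generated script, so the same model, data source, and IO backend objects can be passed in unchanged.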
9. Explain Next Steps
- How to change forecast time or run multiple initializations
- How to read output (xr.open_zarr(...))
- Point to diagnostic workflow for post-processing
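
For the first point, the time argument in the generated script is a list, so several initializations can be supplied as several ISO 8601 strings. The sketch below builds such a list with the standard library only; the helper name `init_times` is hypothetical, and it assumes the list may hold more than one entry.

```python
from datetime import datetime, timedelta


def init_times(start: str, count: int, step_hours: int) -> list[str]:
    """ISO 8601 initialization times spaced step_hours apart."""
    first = datetime.fromisoformat(start)
    return [
        (first + timedelta(hours=step_hours * i)).isoformat()
        for i in range(count)
    ]


print(init_times("2024-01-01T00:00:00", 3, 12))
# ['2024-01-01T00:00:00', '2024-01-01T12:00:00', '2024-01-02T00:00:00']
```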
Ownership
Owns: Model selection, data source compatibility, IO backend selection,
nsteps calculation, generating earth2studio.run.deterministic scripts.
Does not own: Ensemble workflows, diagnostics, data-only fetch, installation, model training.
Troubleshooting
See references/troubleshooting.md for common errors and solutions.
Reminders
- Always fetch live docs before recommending models or data sources - APIs change between releases
- Verify lexicon compatibility - Model input variables must exist in data source's VOCAB
- Use load_default_package() - This is the standard pattern for loading model weights
- Time format is ISO 8601 - Use "YYYY-MM-DDTHH:MM:SS" format for the time argument
- Wind speed needs both components - If user asks for "wind speed", include both u10m and v10m
- nsteps is integer division - nsteps = total_hours // model_step_hours
- ZarrBackend is the default - Only suggest alternatives if user has specific requirements
- GPU is required - All prognostic models require CUDA; CPU inference is not supported