Earth2Studio Deterministic Forecast Skill

Guide users through building deterministic (single-member) weather forecast inference scripts using earth2studio.run.deterministic.

Prerequisites

Earth2Studio installed on a machine with a CUDA-capable GPU
Python 3.10+, network access for model weights and data

Live Doc References

Fetch relevant docs to verify current APIs before recommending components:

Component	URL
Prognostic models	https://nvidia.github.io/earth2studio/modules/models_px.html
Data sources (analysis)	https://nvidia.github.io/earth2studio/modules/datasources_analysis.html
Data sources (forecast)	https://nvidia.github.io/earth2studio/modules/datasources_forecast.html
IO backends	https://nvidia.github.io/earth2studio/modules/io.html
`run.deterministic`	https://github.com/NVIDIA/earth2studio/blob/main/earth2studio/run.py

Workflow

1. Gather Requirements (skip what's already provided)

Time horizon (hours/days/weeks)
Variables of interest (t2m, wind, geopotential, etc.)
Region (global or specific like CONUS)
GPU/VRAM available

2. Select Model

Fetch the prognostic models page. Filter candidates by time horizon, region, and VRAM. Note the model's:

Input variables (model.input_coords()["variable"])
Time step size (model.output_coords(model.input_coords())["lead_time"])

3. Select Data Source

Data source must provide all model input variables. Verify via lexicon at earth2studio/lexicon/<source>.py. Common pairings: Global models → GFS/ARCO/IFS; Regional → HRRR.
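
The compatibility check above amounts to a set comparison. The sketch below shows one way to do it, assuming the model's input variable names and the data source's vocabulary keys have already been collected. The helper name missing_variables and the sample values are hypothetical and not part of Earth2Studio.

```python
def missing_variables(model_variables, source_vocab):
    """Return the model input variables the data source cannot provide, in model order."""
    available = set(source_vocab)
    return [name for name in model_variables if name not in available]


# Hypothetical stand-ins: in a real session the first list comes from the
# model's input coordinates and the second from the keys of the source lexicon.
model_variables = ["u10m", "v10m", "t2m", "z500"]
source_vocab = {"u10m": "id-a", "v10m": "id-b", "t2m": "id-c"}

missing = missing_variables(model_variables, source_vocab)
if missing:
    print(f"Data source is missing: {missing}")
else:
    print("Data source covers all model inputs")
```

If the returned list is non-empty, pick a different data source rather than dropping model inputs.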

4. Select IO Backend

Default: ZarrBackend. Use NetCDF4Backend for legacy tools, XarrayBackend for in-memory/small runs.

5. Calculate nsteps

nsteps = forecast_hours // model_step_hours (integer division; choose a horizon that is a whole multiple of the model step)

Example: 5-day forecast with 6h step → nsteps = 120 / 6 = 20
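
The calculation can be wrapped in a small helper that rejects horizons the model step does not divide evenly. This is a minimal sketch: compute_nsteps is a hypothetical name, and raising an error on a remainder is one possible policy, not something Earth2Studio enforces.

```python
def compute_nsteps(forecast_hours: int, model_step_hours: int) -> int:
    """Number of model steps needed to cover forecast_hours exactly."""
    if model_step_hours <= 0:
        raise ValueError("model_step_hours must be positive")
    if forecast_hours < 0:
        raise ValueError("forecast_hours must not be negative")
    nsteps, remainder = divmod(forecast_hours, model_step_hours)
    if remainder:
        raise ValueError(
            f"{forecast_hours}h is not a whole multiple of the {model_step_hours}h model step"
        )
    return nsteps


# 5-day forecast with a 6-hour model step
print(compute_nsteps(5 * 24, 6))  # 20
```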

6. Decide: output_coords Filtering

Filter variables (output_coords) when user requests specific variables (e.g., "t2m and wind") - reduces output size
Save all variables (omit output_coords) when user says "all variables" or doesn't specify - preserves full model output
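
The decision above can be sketched as a function that returns either a list of variable names or None (meaning: omit output_coords). The alias table and the name resolve_output_variables are hypothetical. Only the u10m/v10m pairing for wind is taken from this guide.

```python
# Hypothetical alias table mapping user phrasing to variable names.
ALIASES = {
    "wind": ["u10m", "v10m"],
    "wind speed": ["u10m", "v10m"],
}


def resolve_output_variables(requested):
    """Return a de-duplicated list of variable names, or None to save all variables."""
    if not requested:
        return None  # nothing specific requested: omit output_coords entirely
    resolved = []
    for item in requested:
        # Unknown names pass through unchanged so explicit ids like "t2m" still work.
        for name in ALIASES.get(item.lower(), [item]):
            if name not in resolved:
                resolved.append(name)
    return resolved


print(resolve_output_variables(["t2m", "wind speed"]))  # ['t2m', 'u10m', 'v10m']
print(resolve_output_variables([]))  # None
```

When the result is a list, wrap it in np.array under the "variable" key of output_coords. When it is None, leave output_coords out of the call.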

7. Generate Script

from collections import OrderedDict
import numpy as np
import torch
from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

model = <ModelClass>.load_model(<ModelClass>.load_default_package())
data = <DataSourceClass>()
io = <IOBackendClass>("<output_path>")

# Include output_coords ONLY if user requested specific variables
output_coords = OrderedDict({"variable": np.array(["t2m", "u10m"])})

io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # omit if saving all variables
    device=torch.device("cuda"),
)

8. Manual Loop Alternative

When user explicitly requests manual implementation (NOT using earth2studio.run.deterministic), follow this checklist in order:

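fetch_data - Get initial conditions: ic = model.input_coords(); x, coords = fetch_data(data, time, ic["variable"], ic["lead_time"], device=device)
Setup total_coords - Build coordinate arrays for the time and lead_time dimensions; lead_time needs nsteps + 1 entries (initial state plus nsteps forecast steps)
io.add_array - Initialize the IO backend before the loop: pop "variable" from total_coords and pass it as the array names, io.add_array(total_coords, var_names)
create_iterator - Create the prognostic iterator: model_iter = model.create_iterator(x, coords)
Loop through nsteps - for step, (x, coords) in enumerate(model_iter): the iterator yields the initial state first (step 0), so write each step and break once step == nsteps
map_coords - Filter output variables if needed: x_out, coords_out = map_coords(x, coords, output_coords)
split_coords - Prepare for IO write: x_list, coords_out, var_names = split_coords(x_out, coords_out)
io.write - Write each step to the backend: io.write(x_list, coords_out, var_names)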
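
The loop-termination logic in the checklist is easy to get off by one. The toy sketch below uses no Earth2Studio code: toy_iterator is a hypothetical stand-in for the prognostic iterator, and it assumes the real iterator yields the initial state before any forecast step. Under that assumption, writing first and breaking once step equals nsteps yields nsteps + 1 outputs.

```python
from itertools import count


def toy_iterator(initial_state, step_hours):
    """Stand-in for a prognostic iterator: yields the initial state, then steps forward."""
    state = initial_state
    for i in count():
        yield state, i * step_hours
        state = state + 1  # placeholder for one model step


def run_manual_loop(nsteps, step_hours=6):
    written = []  # stand-in for the lead times passed to io.write
    for step, (state, lead_hours) in enumerate(toy_iterator(0, step_hours)):
        written.append(lead_hours)  # write first ...
        if step == nsteps:  # ... then stop after the final forecast step
            break
    return written


lead_times = run_manual_loop(nsteps=4)
print(lead_times)  # [0, 6, 12, 18, 24]
```

Breaking before the write would drop the last lead time, so the lead_time coordinate set up before the loop should also hold nsteps + 1 entries.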

9. Explain Next Steps

How to change forecast time or run multiple initializations
How to read output (xr.open_zarr(...))
Point to diagnostic workflow for post-processing
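
For the multiple-initializations case, the list of start times can be generated rather than typed by hand. This is a minimal sketch using only the standard library. The function name initialization_times and the sample date are illustrative, and it assumes the time argument accepts several entries, as the list form in the script template suggests.

```python
from datetime import datetime, timedelta


def initialization_times(start, n_inits, interval_hours):
    """Return n_inits ISO 8601 timestamps spaced interval_hours apart, beginning at start."""
    first = datetime.fromisoformat(start)
    return [
        (first + timedelta(hours=i * interval_hours)).strftime("%Y-%m-%dT%H:%M:%S")
        for i in range(n_inits)
    ]


# Three initializations, 12 hours apart (illustrative date)
times = initialization_times("2024-01-01T00:00:00", n_inits=3, interval_hours=12)
print(times)
# ['2024-01-01T00:00:00', '2024-01-01T12:00:00', '2024-01-02T00:00:00']
```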

Ownership

Owns: Model selection, data source compatibility, IO backend selection, nsteps calculation, generating earth2studio.run.deterministic scripts.

Does not own: Ensemble workflows, diagnostics, data-only fetch, installation, model training.

Troubleshooting

See references/troubleshooting.md for common errors and solutions.

Reminders

Always fetch live docs before recommending models or data sources - APIs change between releases
Verify lexicon compatibility - Model input variables must exist in data source's VOCAB
Use load_default_package() - This is the standard pattern for loading model weights
Time format is ISO 8601 - Use "YYYY-MM-DDTHH:MM:SS" format for the time argument
Wind speed needs both components - If user asks for "wind speed", include both u10m and v10m
nsteps is integer division - nsteps = total_hours // model_step_hours
ZarrBackend is the default - Only suggest alternatives if user has specific requirements
GPU is required - All prognostic models require CUDA; CPU inference is not supported
