Earth2Studio Deterministic Forecast

Build deterministic forecast scripts with Earth2Studio (model, data source, IO, inference). Do NOT use for ensemble, diagnostics, data-only fetch, or install.


Earth2Studio Deterministic Forecast Skill

Guide users through building deterministic (single-member) weather forecast inference scripts using earth2studio.run.deterministic.

Prerequisites

  • Earth2Studio installed on a machine with a CUDA-capable GPU
  • Python 3.10+, network access for model weights and data

Live Doc References

Fetch relevant docs to verify current APIs before recommending components:

  • Prognostic models: https://nvidia.github.io/earth2studio/modules/models_px.html
  • Data sources (analysis): https://nvidia.github.io/earth2studio/modules/datasources_analysis.html
  • Data sources (forecast): https://nvidia.github.io/earth2studio/modules/datasources_forecast.html
  • IO backends: https://nvidia.github.io/earth2studio/modules/io.html
  • run.deterministic: https://github.com/NVIDIA/earth2studio/blob/main/earth2studio/run.py

Workflow

1. Gather Requirements (skip what's already provided)

  • Time horizon (hours/days/weeks)
  • Variables of interest (t2m, wind, geopotential, etc.)
  • Region (global or specific like CONUS)
  • GPU/VRAM available

2. Select Model

Fetch the prognostic models page and filter by time horizon, region, and VRAM. Note the model's:

  • Input variables (model.input_coords()["variable"])
  • Time step size (model.output_coords(model.input_coords())["lead_time"])
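
A minimal sketch of how those two properties might be read off a loaded model. It assumes the coordinate accessors are methods (`input_coords()` and `output_coords(...)`), as in recent Earth2Studio releases; verify this against the live docs for the installed version. The helper name `describe_model` is hypothetical, and `_StubModel` exists only so the snippet runs without downloading weights.

```python
from collections import OrderedDict
from datetime import timedelta


def describe_model(model):
    """Return (input variable names, step size) for a prognostic model.

    Assumption: input_coords() and output_coords(...) are methods that
    return ordered mappings of coordinate name -> array-like values.
    """
    in_coords = model.input_coords()
    out_coords = model.output_coords(in_coords)
    variables = [str(v) for v in in_coords["variable"]]
    step = out_coords["lead_time"][0]
    return variables, step


class _StubModel:
    """Stand-in exposing the same two methods, for illustration only."""

    def input_coords(self):
        return OrderedDict(
            variable=["t2m", "u10m", "v10m"],
            lead_time=[timedelta(0)],
        )

    def output_coords(self, input_coords):
        out = OrderedDict(input_coords)
        out["lead_time"] = [timedelta(hours=6)]
        return out


variables, step = describe_model(_StubModel())
print(variables, step)
```

With a real model the values would be NumPy arrays rather than lists, but the indexing shown here should work the same way.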

3. Select Data Source

Data source must provide all model input variables. Verify via lexicon at earth2studio/lexicon/<source>.py. Common pairings: Global models → GFS/ARCO/IFS; Regional → HRRR.
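
The compatibility check can be sketched as a simple membership test. This is an illustration under assumptions: `missing_variables` is a hypothetical helper, and the variable list and vocabulary below are made-up stand-ins for the model's input variables and the `VOCAB` mapping of the chosen lexicon.

```python
def missing_variables(model_variables, vocab):
    """Return the model input variables that the data source cannot supply."""
    return [v for v in model_variables if v not in vocab]


# Hypothetical values. In practice the first list would come from the model's
# input coordinates and the mapping from earth2studio/lexicon/<source>.py.
model_variables = ["t2m", "u10m", "v10m", "z500"]
vocab = {
    "t2m": "source-specific key",
    "u10m": "source-specific key",
    "v10m": "source-specific key",
}

missing = missing_variables(model_variables, vocab)
if missing:
    print(f"Data source is missing: {missing}")
```

An empty result means the pairing is usable; anything else is a reason to pick a different data source or model.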

4. Select IO Backend

Default: ZarrBackend. Use NetCDF4Backend for legacy tools, XarrayBackend for in-memory/small runs.

5. Calculate nsteps

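nsteps = forecast_hours // model_step_hours

The forecast horizon should be a whole multiple of the model step; otherwise integer division silently shortens the run.

Example: 5-day forecast with 6h step → nsteps = 120 // 6 = 20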
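
A small sketch of this calculation with input validation. The function name `compute_nsteps` is hypothetical; raising on a non-multiple horizon is one possible design choice, made here so that a mismatch is surfaced rather than silently rounded down.

```python
def compute_nsteps(forecast_hours: int, model_step_hours: int) -> int:
    """Number of model steps needed to cover the forecast horizon."""
    if forecast_hours <= 0 or model_step_hours <= 0:
        raise ValueError("forecast_hours and model_step_hours must be positive")
    nsteps, remainder = divmod(forecast_hours, model_step_hours)
    if remainder:
        raise ValueError(
            f"{forecast_hours}h is not a multiple of the {model_step_hours}h model step"
        )
    return nsteps


nsteps = compute_nsteps(forecast_hours=5 * 24, model_step_hours=6)
print(nsteps)
```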

6. Decide: output_coords Filtering

  • Filter variables (output_coords) when user requests specific variables (e.g., "t2m and wind") - reduces output size
  • Save all variables (omit output_coords) when user says "all variables" or doesn't specify - preserves full model output
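
One way to turn a user's request into an output_coords mapping is sketched below. The alias table and the helper name `build_output_coords` are hypothetical; the only alias included is the wind case, which needs both 10 m components. An empty mapping is returned when nothing specific was requested, meaning no filtering.

```python
from collections import OrderedDict

# Hypothetical alias table mapping loose user phrases to variable names.
ALIASES = {
    "wind": ["u10m", "v10m"],
    "wind speed": ["u10m", "v10m"],
}


def build_output_coords(requested):
    """Return an output_coords mapping, or an empty one to keep all variables."""
    if not requested:
        return OrderedDict()
    names = []
    for item in requested:
        for name in ALIASES.get(item.lower(), [item]):
            if name not in names:  # keep order, drop duplicates
                names.append(name)
    return OrderedDict(variable=names)


coords = build_output_coords(["t2m", "wind speed"])
print(coords)
```

Before passing the result to the workflow, the name list would typically be wrapped in np.array, as in the script in step 7.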

7. Generate Script

from collections import OrderedDict
import numpy as np
import torch
from earth2studio.models.px import <ModelClass>
from earth2studio.data import <DataSourceClass>
from earth2studio.io import <IOBackendClass>
from earth2studio.run import deterministic

model = <ModelClass>.load_model(<ModelClass>.load_default_package())
data = <DataSourceClass>()
io = <IOBackendClass>("<output_path>")

# Include output_coords ONLY if user requested specific variables
output_coords = OrderedDict({"variable": np.array(["t2m", "u10m"])})

io = deterministic(
    time=["YYYY-MM-DDTHH:MM:SS"],
    nsteps=<N>,
    prognostic=model,
    data=data,
    io=io,
    output_coords=output_coords,  # omit if saving all variables
    device=torch.device("cuda"),
)

8. Manual Loop Alternative

When user explicitly requests manual implementation (NOT using earth2studio.run.deterministic), follow this checklist in order:

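  1. fetch_data - Get initial conditions: ic = model.input_coords(); x, coords = fetch_data(source=data, time=time, variable=ic["variable"], lead_time=ic["lead_time"], device=device)
  2. Setup total_coords - Build coordinate arrays for the time and lead_time dimensions; lead_time needs nsteps + 1 entries because the initial state (lead time 0) is written too
  3. io.add_array - Initialize the IO backend before the loop: pop "variable" out of total_coords, then io.add_array(total_coords, var_names)
  4. create_iterator - Create the prognostic iterator: model_iter = model.create_iterator(x, coords)
  5. Loop through the iterator - for step, (x, coords) in enumerate(model_iter): the first yielded state is the initial condition, so nsteps + 1 states are written in total
  6. map_coords - Filter output variables if needed: x_out, coords_out = map_coords(x, coords, output_coords)
  7. split_coords - Prepare for IO write: x_list, coords_out, names = split_coords(x_out, coords_out)
  8. io.write - Write each step to the backend with io.write(x_list, coords_out, names), then break once step == nsteps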
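
The checklist above can be sketched as a single function. This is a hedged outline modelled on earth2studio/run.py, not a drop-in replacement: the name `manual_deterministic` is hypothetical, the helper signatures (`fetch_data`, `map_coords`, `split_coords`, `to_time_array`) are assumed to match the current release, and the whole thing should be checked against the live source before use.

```python
from collections import OrderedDict


def manual_deterministic(time, nsteps, model, data, io, output_coords=None, device=None):
    """Hand-written counterpart of earth2studio.run.deterministic (sketch)."""
    # Imported lazily so the function can be defined without the packages installed.
    import numpy as np
    import torch
    from earth2studio.data import fetch_data
    from earth2studio.utils.coords import map_coords, split_coords
    from earth2studio.utils.time import to_time_array

    output_coords = output_coords if output_coords is not None else OrderedDict()
    device = device if device is not None else torch.device("cuda")
    model = model.to(device)

    # 1. Initial conditions for the variables and lead times the model expects.
    in_coords = model.input_coords()
    time = to_time_array(time)
    x, coords = fetch_data(
        source=data,
        time=time,
        variable=in_coords["variable"],
        lead_time=in_coords["lead_time"],
        device=device,
    )

    # 2. Full output coordinate system: nsteps + 1 lead times (step 0 included).
    step_coords = model.output_coords(in_coords)
    total_coords = step_coords.copy()
    for key, value in step_coords.items():
        if value.shape == (0,):  # drop empty batch placeholder dimensions
            del total_coords[key]
    total_coords["time"] = time
    total_coords["lead_time"] = np.asarray(
        [step_coords["lead_time"] * i for i in range(nsteps + 1)]
    ).flatten()
    total_coords.move_to_end("lead_time", last=False)
    total_coords.move_to_end("time", last=False)
    for key in list(total_coords):
        total_coords[key] = output_coords.get(key, total_coords[key])

    # 3. Allocate one array per output variable in the IO backend.
    var_names = total_coords.pop("variable")
    io.add_array(total_coords, var_names)

    # 4. Align the fetched data with the model's coordinates, then build the iterator.
    x, coords = map_coords(x, coords, in_coords)
    model_iter = model.create_iterator(x, coords)

    # 5-8. The first yielded state is the initial condition, so nsteps + 1
    # states are written in total; stop after writing step == nsteps.
    for step, (x, coords) in enumerate(model_iter):
        x_out, coords_out = map_coords(x, coords, output_coords)
        io.write(*split_coords(x_out, coords_out))
        if step == nsteps:
            break
    return io
```

The arguments mirror those of the generated script in step 7, so the same model, data, and io objects can be passed in unchanged.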

9. Explain Next Steps

  • How to change forecast time or run multiple initializations
  • How to read output (xr.open_zarr(...))
  • Point to diagnostic workflow for post-processing
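
For the multiple-initializations case, the time argument is a list, so several start times can be generated and passed in one call. The sketch below builds such a list in the expected "YYYY-MM-DDTHH:MM:SS" format; `init_times` is a hypothetical helper and the date is an arbitrary example.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def init_times(first, count, every_hours):
    """Return `count` ISO 8601 initialization times spaced `every_hours` apart."""
    start = datetime.strptime(first, TIME_FORMAT)  # raises ValueError on bad input
    return [
        (start + timedelta(hours=every_hours * i)).strftime(TIME_FORMAT)
        for i in range(count)
    ]


times = init_times("2024-01-01T00:00:00", count=3, every_hours=12)
print(times)
```

Parsing the first timestamp with strptime also acts as a format check before any data is fetched.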

Ownership

Owns: Model selection, data source compatibility, IO backend selection, nsteps calculation, generating earth2studio.run.deterministic scripts.

Does not own: Ensemble workflows, diagnostics, data-only fetch, installation, model training.

Troubleshooting

See references/troubleshooting.md for common errors and solutions.

Reminders

  • Always fetch live docs before recommending models or data sources - APIs change between releases
  • Verify lexicon compatibility - Model input variables must exist in data source's VOCAB
  • Use load_default_package() - This is the standard pattern for loading model weights
  • Time format is ISO 8601 - Use "YYYY-MM-DDTHH:MM:SS" format for the time argument
  • Wind speed needs both components - If user asks for "wind speed", include both u10m and v10m
  • nsteps is integer division - nsteps = total_hours // model_step_hours
  • ZarrBackend is the default - Only suggest alternatives if user has specific requirements
  • GPU is required - All prognostic models require CUDA; CPU inference is not supported

