
Zyra

Modular, Reproducible Data Workflows for Science

An Open-Source Python Framework by NOAA Global Systems Laboratory

Eric Hackathorn, NOAA Global Systems Laboratory (ORCID iD: 0000-0002-9693-2093)

The Challenge

Environmental and scientific workflows span heterogeneous data sources — HTTP, FTP, S3, and APIs — and diverse formats such as GRIB2, NetCDF, and GeoTIFF. They require repeatable transformation chains and produce outputs ranging from static maps and animations to interactive pages and publishable datasets. Existing approaches often rely on ad-hoc scripts that break when data changes and lack reproducibility across teams and environments.

Zyra (pronounced Zy-rah) provides a lightweight, CLI-first framework that standardizes these common steps while remaining fully extensible for domain-specific logic. Think of it as a garden for your data: you plant seeds (from the web, satellites, or experiments), Zyra helps you nurture them (through filtering, analysis, and processing), and you harvest insights as visualizations, reports, and interactive media. It's designed to make science not just rigorous, but also accessible, transparent, and beautiful.

[Diagram: sources (HTTP, FTP, S3, API) and formats (GRIB2, NetCDF, GeoTIFF) flow toward outputs (maps, animations, datasets); ad-hoc scripts break in the middle.]

The Pipeline: 8 Composable Stages

Use only what you need. Each stage streams via stdin/stdout for Unix-style chaining.

[Pipeline diagram: Data Ingestion & Transformation → Visualization & AI Narration → Verification & Export. Sources (HTTP, S3, FTP) feed Import (acquire) and Process (transform); Simulate and Decide are planned; Visualize (render), Narrate (AI-driven), Verify, and Export (disseminate) deliver to cloud, S3, or local storage.]
  • Import (zyra acquire): Discover datasets via zyra search (SOS catalog, OGC, remote APIs), then fetch from S3, FTP, REST, or HTTP with automatic retry and checksum validation.
  • Process (zyra transform): Regrid, subset, and convert GRIB2, NetCDF, and GeoTIFF through configurable processing chains.
  • Simulate (planned): Future stage for ensemble simulations and model experiments within the pipeline.
  • Decide (planned): Future stage for automated decision-support logic and threshold-based alerting.
  • Visualize (zyra render): Generate publication-quality maps, plots, and multi-frame GIF animations from processed datasets.
  • Narrate (zyra narrate): AI-driven scientific captions via multi-agent critique with OpenAI, Ollama, or Gemini providers.
  • Verify (zyra verify): Check metadata, provenance records, and output integrity.
  • Export (disseminate): Push final products to S3, cloud, or local storage with a full audit trail.
Discover datasets via search, then acquire via HTTP, S3, or FTP and transform. Generate maps, plots, and animations while AI agents produce automated scientific summaries and reports. Validate metadata quality before disseminating to cloud or local storage.
| # | Stage     | Purpose                                          | CLI            | Status      |
|---|-----------|--------------------------------------------------|----------------|-------------|
| 1 | Import    | Search & fetch from HTTP/S, S3, FTP, REST API    | zyra acquire   | Implemented |
| 2 | Process   | Decode, subset, convert (GRIB2, NetCDF, GeoTIFF) | zyra process   | Implemented |
| 3 | Simulate  | Generate synthetic/test data                     |                | Planned     |
| 4 | Decide    | Parameter optimization and selection             |                | Planned     |
| 5 | Visualize | Static maps, plots, animations, interactive      | zyra visualize | Implemented |
| 6 | Narrate   | AI-driven captions, summaries, reports           | zyra narrate   | Implemented |
| 7 | Verify    | Quality checks and metadata validation           | zyra verify    | Partial     |
| 8 | Export    | Push to S3, FTP, Vimeo, local, HTTP POST         | zyra export    | Implemented |
Unix-Style Streaming
zyra acquire http $URL -o - | \
  zyra process convert-format - netcdf --stdout | \
  zyra visualize heatmap --input - --var TMP -o plot.png

Stages are composable — pipe any stage's output directly into the next. Every stage supports stdin/stdout for seamless chaining.
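The same chaining idea can be sketched in plain Python with generator stages. This is an illustrative analogy only — Zyra's real stages are separate CLI processes connected by pipes — but it shows why composition works: each stage consumes a stream and yields one downstream.

```python
# Illustrative sketch: composable stages as Python generators.
# Each "stage" consumes an iterator and yields records downstream,
# mirroring how Zyra stages pipe bytes via stdin/stdout.

def acquire(records):
    # stand-in for `zyra acquire`: emit raw records
    yield from records

def process(stream):
    # stand-in for `zyra process`: transform each record
    for rec in stream:
        yield rec.upper()

def visualize(stream):
    # stand-in for `zyra visualize`: consume the stream, produce a final artifact
    return " | ".join(stream)

result = visualize(process(acquire(["tmp", "ugrd", "vgrd"])))
print(result)  # TMP | UGRD | VGRD
```

Because every stage reads from the previous one lazily, nothing is materialized in between — the same property the stdin/stdout pipeline gives you at the process level.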

HRRR Wind Analysis: GRIB2 to Interactive Map

Fetch only the variables you need from a 300 MB GRIB2 file using the .idx byte-range trick, convert to NetCDF, and render an interactive wind-streamline map — four commands, one pipeline.
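The .idx trick works because each inventory line records a GRIB2 message's starting byte offset, so a variable's byte range runs from its own offset to the next line's offset minus one (the last message runs to end of file). A minimal sketch with a made-up inventory — the field layout follows the standard NCEP .idx format, but this is not Zyra's internal code:

```python
# Sketch of the GRIB2 .idx byte-range trick (not Zyra's implementation):
# each .idx line reads "msg:offset:date:VAR:LEVEL:...", so the byte range
# for a message spans its offset up to the next message's offset - 1.
import re

idx_text = """\
1:0:d=2025010100:TMP:2 m above ground:anl:
2:120000:d=2025010100:UGRD:10 m above ground:anl:
3:250000:d=2025010100:VGRD:10 m above ground:anl:
4:400000:d=2025010100:GUST:surface:anl:
"""

def byte_ranges(idx_text, pattern):
    """Return (start, end) byte ranges for inventory lines matching pattern."""
    lines = idx_text.strip().splitlines()
    offsets = [int(line.split(":")[1]) for line in lines]
    ranges = []
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            end = offsets[i + 1] - 1 if i + 1 < len(offsets) else None  # None = to EOF
            ranges.append((offsets[i], end))
    return ranges

# Same selection as the --select expression below
print(byte_ranges(idx_text, r"TMP:2 m|UGRD:10 m|VGRD:10 m"))
# [(0, 119999), (120000, 249999), (250000, 399999)]
```

Each (start, end) pair would then become an HTTP `Range: bytes=start-end` request, which is how a few megabytes are pulled from a 300 MB file.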

① Import
zyra acquire http --idx noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20250101/conus/hrrr.t00z.wrfsfcf00.grib2 \
  --select "TMP:2 m|UGRD:10 m|VGRD:10 m" -o hrrr.grib2   # 5.5 MB, not 300 MB
② Process
zyra process convert-format hrrr.grib2 netcdf -o hrrr_t2m.nc
zyra process convert-format hrrr.grib2 netcdf --var UGRD -o hrrr_wind.nc
③ Process (coming soon)
zyra process merge-nc hrrr_t2m.nc hrrr_wind.nc -o hrrr_merged.nc   # coming soon: merge multiple NetCDF files into one dataset
④ Visualize
zyra visualize interactive \
  --input hrrr_merged.nc --var t2m --uvar u10 --vvar v10 \
  --mode vector --streamlines --engine folium --colorbar \
  --output hrrr_wind.html   # 151 KB standalone HTML

10m Wind Streamlines — Velocity vectors from the HRRR GRIB2 file, rendered as streamlines colored by wind speed. Pan, zoom, and explore. Generated by zyra visualize interactive --mode vector --streamlines

2m Temperature — Surface temperature heatmap from the same GRIB2 file, extracted alongside the wind variables in a single zyra acquire command. Generated by zyra visualize interactive --mode heatmap

HRRR 2m Temperature heatmap over CONUS, 2025-01-01 00Z

2m Temperature — zyra visualize heatmap

HRRR 10m wind streamlines over CONUS, 2025-01-01 00Z

10m Wind Streamlines — zyra visualize vector

Drought Animation Pipeline

Discover datasets via SOS catalog, sync weekly drought risk frames from NOAA FTP, fill gaps, and compose an MP4 animation — six steps, one pipeline.

D Discover
zyra search "drought risk" --profile sos --select 1
① Import
zyra acquire ftp ftp://ftp.nnvl.noaa.gov/SOS/DroughtRisk_Weekly \
  --sync-dir ./frames --since P1Y
② Process
zyra process scan-frames ./frames --output manifest.json
③ Process
zyra process pad-missing ./frames --fill basemap
④ Visualize
zyra visualize compose-video ./frames --fps 4 --output video.mp4
⑤ Export
zyra export vimeo video.mp4 --title "Drought Risk Weekly"
Watch on Vimeo — Drought Risk Weekly animation output
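The gap-filling step can be pictured with a small sketch: given the set of weekly frame dates already on disk, list the weeks a basemap placeholder would need to cover. The dates are hypothetical, and this is not Zyra's scan-frames/pad-missing implementation:

```python
# Illustrative sketch of the scan-frames / pad-missing idea: walk a
# weekly cadence and report the dates with no frame on disk.
from datetime import date, timedelta

have = {date(2025, 1, 7), date(2025, 1, 14), date(2025, 1, 28)}  # hypothetical frames

def missing_weeks(frames, start, end):
    gaps, d = [], start
    while d <= end:
        if d not in frames:
            gaps.append(d)  # a basemap frame would be padded in here
        d += timedelta(weeks=1)
    return gaps

print(missing_weeks(have, date(2025, 1, 7), date(2025, 1, 28)))
# [datetime.date(2025, 1, 21)]
```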

Each stage logs provenance — start time, duration, command, and exit code — to a SQLite store for full reproducibility.
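A minimal sketch of that provenance pattern, with a table schema assumed purely for illustration (Zyra's actual store may differ):

```python
# Sketch: log each stage's start time, duration, command, and exit code
# to SQLite so a pipeline run can be audited and reproduced.
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE provenance (
    stage TEXT, command TEXT, started REAL, duration REAL, exit_code INTEGER)""")

def log_stage(stage, command, fn):
    started = time.time()
    exit_code = fn()  # run the stage; 0 on success
    conn.execute("INSERT INTO provenance VALUES (?, ?, ?, ?, ?)",
                 (stage, command, started, time.time() - started, exit_code))
    conn.commit()
    return exit_code

log_stage("visualize", "zyra visualize compose-video ./frames", lambda: 0)
row = conn.execute("SELECT stage, exit_code FROM provenance").fetchone()
print(row)  # ('visualize', 0)
```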

Agentic Pipeline Orchestration

Describe your goal in plain language — Zyra's planning engine decomposes intent into a concrete execution DAG and dispatches specialized stage agents to run it.

Scientist
"Summarize this week's HRRR temperature anomalies for the Colorado Front Range and generate a narrated briefing with publication-quality maps."
[Diagram: User Intent (natural language) → Planner (zyra plan) → Value Engine (suggests augmentations) → Execution DAG (parallel/sequential) → Stage Agents (acquire, process, visualize) and an LLM Agent (narrate), all logged to Provenance (SQLite).]

LLM Agnostic — swap providers via --provider: OpenAI, Ollama, Gemini, or any compatible backend. Mock mode for offline testing.

Intent → Executable Plan
zyra plan --intent "Sync drought frames, fill gaps, render animation"
{
  "agents": [
    {"id": "fetch_frames", "stage": "acquire"},
    {"id": "scan_frames", "stage": "process"},
    {"id": "pad_missing", "stage": "process"},
    {"id": "compose_animation", "stage": "visualize"},
    {"id": "save_local", "stage": "decimate"}
  ],
  "suggestions": [
    {"stage": "narrate", "confidence": 0.88}
  ]
}

Outputs validated against Pydantic schemas with optional guardrails via RAIL files for structured, reproducible results.
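The validation idea can be sketched with the standard library alone — a stand-in for the actual Pydantic schemas, checking that every agent in a plan references a known stage before anything executes:

```python
# Stdlib stand-in for the Pydantic validation step (illustrative only;
# the real schemas live in Zyra): reject plans whose agents reference
# a stage the pipeline does not define.
import json

KNOWN_STAGES = {"acquire", "process", "simulate", "decide",
                "visualize", "narrate", "verify", "decimate"}

plan = json.loads("""{
  "agents": [
    {"id": "fetch_frames", "stage": "acquire"},
    {"id": "compose_animation", "stage": "visualize"}
  ]
}""")

def validate_plan(plan):
    bad = [a["id"] for a in plan["agents"]
           if a.get("stage") not in KNOWN_STAGES]
    if bad:
        raise ValueError(f"unknown stage for agents: {bad}")
    return True

print(validate_plan(plan))  # True
```

Failing fast here is what makes a generated plan safe to hand to the execution DAG.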

Swarm Orchestration
zyra swarm drought_animation.yaml \
  --parallel --memory provenance.sqlite

Reproducible Pipeline Configs

Define multi-stage pipelines as YAML — no scripting required. Override parameters at runtime, dry-run to preview commands, and share configs across teams.

name: FTP to Local Video
stages:
  - stage: acquire
    command: ftp
    args:
      path: ftp://ftp.nnvl.noaa.gov/SOS/DroughtRisk_Weekly
      sync_dir: ./frames
      since_period: "P1Y"
  - stage: visualize
    command: compose-video
    args:
      frames: ./frames
      output: video.mp4
      fps: 4
  - stage: export
    command: local
    args:
      input: video.mp4
      path: /output/video.mp4
zyra run pipeline.yaml                        # execute
zyra run pipeline.yaml --dry-run              # preview commands
zyra run pipeline.yaml --set visualize.fps=8  # override parameters
Declarative YAML · Dry-Run Preview · Runtime Overrides · Team Sharing
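The --set syntax suggests a simple dotted-path update applied to the parsed config. A sketch under that assumption — apply_override and the in-memory config here are illustrative, not Zyra's parser:

```python
# Sketch of a --set visualize.fps=8 style runtime override applied to a
# parsed pipeline config (the dict stands in for the loaded YAML).
config = {"stages": [
    {"stage": "visualize", "command": "compose-video", "args": {"fps": 4}},
]}

def apply_override(config, expr):
    """Apply 'stage.key=value' to the matching stage's args."""
    path, value = expr.split("=", 1)
    stage_name, key = path.split(".", 1)
    for stage in config["stages"]:
        if stage["stage"] == stage_name:
            stage["args"][key] = int(value) if value.isdigit() else value
    return config

apply_override(config, "visualize.fps=8")
print(config["stages"][0]["args"])  # {'fps': 8}
```

The shared YAML file stays untouched, so teammates running the same config still get the defaults.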

Building Off the Foundation

Three layers of access — from terminal commands to autonomous AI agents — all sharing the same pipeline architecture.

3
MCP + AI Agents
Model Context Protocol

Zyra exposes every pipeline stage as an MCP tool, letting LLM agents like Claude autonomously discover, compose, and execute scientific workflows. The AI layer transforms conversational intent into reproducible pipeline runs.

2
Python API
Application Programming Interface

Zyra's modular Python API extends the CLI with programmatic access — enabling custom processing modules, integration into existing data workflows, and automated dissemination pipelines via import zyra.

1
Command Line Interface
CLI — The Foundation

The CLI empowers researchers and developers to quickly build, test, and reproduce visualization pipelines with simple, scriptable commands. Every stage streams via stdin/stdout for Unix-style composition.

[Access layers: CLI (zyra [command]) · Python API (import zyra) · MCP (tools/discover).]
Streaming CLI Pipes Chain stages via stdin/stdout — acquire, process, and visualize in a single Unix pipeline with zero intermediate files.
FastAPI Service Mode Run uvicorn zyra.api.server:app to expose all pipeline stages as a REST API, enabling web dashboards and automated integrations.
MCP Tool Discovery LLM agents discover available pipeline tools at runtime — no hardcoded prompts. Zyra's MCP server advertises capabilities dynamically.
Same Pipeline, Every Layer Whether invoked from bash, Python, REST, or an AI agent — every execution follows the same 8-stage architecture with full provenance.
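The single-architecture claim can be pictured as one registry of stage handlers that every front end dispatches into. This is a hypothetical sketch with invented names, not Zyra's source:

```python
# Sketch: one stage registry shared by all access layers. The CLI,
# Python API, REST service, and MCP server would each reduce a request
# to the same dispatch() call, so behavior and provenance stay identical.
REGISTRY = {}

def stage(name):
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@stage("process")
def process(args):
    return f"process ran with {args}"

def dispatch(name, args):
    # every front end funnels through this single entry point
    return REGISTRY[name](args)

print(dispatch("process", {"fmt": "netcdf"}))
```

Registering stages once and dispatching by name is also what lets an MCP server advertise the same capability list the CLI exposes.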

Key Features

  • Scientific formats: GRIB2, NetCDF, GeoTIFF with xarray, cfgrib, rasterio
  • Connectors: HTTP/S, S3, FTP, REST API, Vimeo
  • Visualization: Heatmaps, contours, vectors, particles, animations, interactive maps (Folium, Plotly)
  • Agentic orchestration: Natural language intent → zyra plan generates an execution DAG; zyra swarm dispatches stage agents in parallel with provenance tracking
  • Narration swarm: Multi-agent LLM chain (context → summary → critic → editor) generates validated scientific narrative; LLM-agnostic via --provider
  • Provenance: SQLite-based event logging for full reproducibility
  • MCP server: Exposes Zyra's full pipeline as Model Context Protocol tools — letting AI assistants like Claude or ChatGPT acquire data, run transforms, and generate visualizations on demand via natural language
  • REST API: FastAPI service mode exposes the same stage toolset over HTTP for integration with custom apps and workflows
  • Modular extras: pip install "zyra[visualization]", "zyra[processing]", "zyra[llm]", or "zyra[all]"
Python 3.10+ · Apache 2.0 · CLI-first · Streaming-friendly · MCP-ready

Help Shape the Future of Agentic Science

We're exploring how intelligent agents can automate and coordinate complex scientific workflows — and we're asking for your help. Share how you actually work with data through the Zyra Workflow Insights Survey.

By sharing your workflow practices and challenges, you'll help us identify:

  • Which agentic tools offer the greatest real-world value
  • Where current automation still falls short
  • How to build an intent dataset to train Zyra's task decomposition system

Your insights directly guide how we design and prioritize future tools built to amplify human creativity, efficiency, and discovery. Responses are used anonymously for research and system improvement; do not include sensitive or confidential data.