Introduction to Zyra


Jump to:
Kid Version | High School Version | College Version | White Paper Version


Kid Version

Imagine you have a big box of LEGO bricks mixed together — some from space sets, some from castles, some from race cars.
Zyra is like a magical robot helper that:

  1. Finds the bricks you want (getting data from the internet or your computer).

  2. Puts them in order (sorting and cleaning the pieces so they fit).

  3. Builds and shares something amazing (pictures, videos, or maps you can show to friends) — and sometimes it helps you check your build or tell its story.

It makes science data less messy and more fun to look at.


High School Version

Zyra is a Python tool that:

  • Collects data from many sources like websites, cloud storage, and scientific file formats.

  • Processes it so it’s easier to work with (cutting, reshaping, converting formats).

  • Visualizes it in charts, maps, and animations, and can publish results.

Think of it like a factory with up to eight stations (you can skip ones you don’t need):

  • Import (get data) → Process (clean/convert) → Simulate (make examples) → Decide (pick best settings) → Visualize (make graphics) → Narrate (add captions/reports) → Verify (check quality) → Export (share).

It’s modular — you can swap out any station for your own tool.


College Version

Zyra is an open-source, modular Python framework for reproducible scientific data workflows organized as up to eight stages:

  1. Import – HTTP/FTP/S3/local fetch and listing; supports manifests and streaming I/O.

  2. Process – Subset, transform, and convert (e.g., GRIB2⇄NetCDF, GeoTIFF).

  3. Simulate – Generate synthetic or toy datasets for demos/tests.

  4. Decide – Explore parameter spaces and select best variants.

  5. Visualize – Static maps/plots, animations, and interactive outputs.

  6. Narrate – Produce captions, summaries, or pages that contextualize outputs.

  7. Verify – Integrity/quality checks, metadata validation, provenance.

  8. Export – Write to local paths, S3/FTP, HTTP POST, or video destinations.

Not all stages are required in every workflow; the pipeline is composable and streaming-friendly (stdin/stdout). Under the hood, the implemented stages map to modules such as zyra.connectors (import/export), zyra.processing, zyra.visualization, and zyra.transform, with shared helpers in zyra.utils.
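
For example, because each stage can read from stdin and write to stdout, two stages can be chained without intermediate files. The sketch below drives two hypothetical CLI invocations from Python with subprocess; the subcommand names and flags are illustrative assumptions, while the "-" placeholder for stdin/stdout is the convention described under Execution Model below.

```python
"""Minimal sketch of chaining two stages over stdin/stdout.

The subcommand names and flags are placeholders, not the documented Zyra CLI;
only the "-" convention for stdin/stdout is taken from this page.
"""
import subprocess

# Hypothetical acquire step: fetch a remote file and stream it to stdout ("-").
acquire = subprocess.Popen(
    ["zyra", "acquire", "http", "https://example.com/sample.grib2", "-o", "-"],
    stdout=subprocess.PIPE,
)

# Hypothetical process step: read the stream from stdin ("-") and write NetCDF.
subprocess.run(
    ["zyra", "process", "convert", "--input", "-", "--output", "sample.nc"],
    stdin=acquire.stdout,
    check=True,
)

acquire.stdout.close()
acquire.wait()
```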


White Paper Version

Abstract:
Zyra is a composable Python framework for end‑to‑end scientific data workflows. It organizes work into eight conceptual stages — import, process, simulate, decide, visualize, narrate, verify, and export — providing reproducibility, modularity, and interoperability across environmental and geospatial datasets.

Motivation & Scope

Modern environmental workflows span heterogeneous data sources and formats, require repeatable transformations, and produce diverse outputs (plots, animations, interactive pages, datasets). Zyra provides a lightweight, CLI-first framework that standardizes common steps while remaining extensible for domain-specific logic.

Design Principles

  • Modularity: small, composable commands and helpers; opt-in extras for heavy dependencies.

  • Streaming by default: stdin/stdout support to avoid temporary files and enable Unix‑style chaining.

  • Reproducibility: explicit configs, deterministic transforms, comprehensive logging and metadata.

  • Interoperability: rely on well‑adopted libraries (xarray, netCDF4, rasterio, matplotlib/cartopy, ffmpeg).

  • Extensibility: pluggable connectors and processors; minimal glue code to register new commands.

Architecture (stages → modules)

  • Import/Export → zyra.connectors (HTTP/FTP/S3/Vimeo, local paths, HTTP POST) with list/filter, sync, and streaming I/O.

  • Process → zyra.processing (GRIB2 decoding, NetCDF/GeoTIFF conversion, extraction, subsetting); zyra.transform for lightweight metadata updates.

  • Visualize → zyra.visualization (static plots/maps, animations, interactive HTML).

  • Simulate / Decide / Narrate / Verify → conceptual today; tracked on the roadmap and expressed via configs/orchestrators and external tools until dedicated CLI groups mature.

  • Utilities → zyra.utils (credentials, date/time ranges, files/images, JSON/YAML I/O).

See also: Workflow-Stages.md for an overview and Stage-Examples.md for concise commands.

Execution Model

  • CLI groups mirror stages (acquire, process, visualize, export) and accept - for stdin/stdout where applicable.

  • Commands are side‑effect free where possible and return non‑zero exit codes on failure.

  • A config‑driven runner can chain stages; external orchestrators (n8n, cron, shell) are supported by design.
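
As a sketch of this model (not Zyra's built-in runner or config schema), the snippet below reads a JSON list of stage commands, runs them in order, and stops at the first non-zero exit code.

```python
"""Illustrative stage runner; Zyra's actual runner and config schema differ.

Expects a JSON config shaped like:
    {"stages": [["zyra", "acquire", "..."], ["zyra", "process", "..."]]}
"""
import json
import subprocess
import sys


def run_pipeline(config_path: str) -> int:
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    for argv in config["stages"]:
        result = subprocess.run(argv)
        if result.returncode != 0:
            # Non-zero exit codes signal failure, matching the CLI convention above.
            print(f"stage failed ({result.returncode}): {' '.join(argv)}", file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_pipeline(sys.argv[1]))
```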

Data & Formats

  • Gridded data: GRIB2 (via cfgrib/pygrib), NetCDF (via netCDF4/xarray), GeoTIFF (via rioxarray/rasterio); see the example after this list.

  • Imagery/video: PNG/JPEG/MP4 (via ffmpeg-python).

  • Protocols: HTTP/S, FTP, S3, filesystem; Vimeo for video publishing.

  • CRS/geo: handled by libraries (cartopy, rasterio); follow CF conventions where possible.
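
For instance, the gridded formats above are read and written through these libraries. A minimal example with xarray and the cfgrib engine (file names are placeholders; the optional cfgrib and netCDF4 extras must be installed):

```python
# Convert GRIB2 to NetCDF via xarray's cfgrib engine.
# File names are placeholders; requires cfgrib and netCDF4 to be installed.
import xarray as xr

ds = xr.open_dataset("forecast.grib2", engine="cfgrib")
print(ds)  # inspect variables, coordinates, and attributes
ds.to_netcdf("forecast.nc")
```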

Configuration & Metadata

  • JSON/YAML configs for pipelines and per‑stage arguments.

  • Helpers for frame and dataset metadata live under zyra.transform (e.g., directory scans, enrich/merge).

  • Provenance captured via logs, timestamps, argument echoes, and optional JSON sidecars.
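
As an illustration of the kind of sidecar this implies (field names are assumptions, not Zyra's documented sidecar schema), a stage could echo its arguments and a timestamp next to each output:

```python
# Illustrative provenance sidecar writer; the field names are assumptions,
# not Zyra's documented sidecar format.
import json
import sys
from datetime import datetime, timezone


def write_sidecar(output_path: str) -> None:
    record = {
        "output": output_path,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "argv": sys.argv,  # argument echo for reproducibility
    }
    with open(output_path + ".provenance.json", "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)


write_sidecar("forecast.nc")
```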

Extensibility

  • Connectors: add a backend (e.g., a new cloud/object store) by implementing list/fetch/upload and registering a subcommand; a minimal sketch follows this list.

  • Processors: add decode/convert/extract operations by exposing CLI wrappers around library calls.

  • Visualizers: add new plot types by adhering to common I/O options (--input, --output, --var, etc.).
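
A minimal sketch of the list/fetch/upload surface a new connector might expose, using a local directory as the backend; this is illustrative only, not Zyra's plugin interface, and subcommand registration is omitted:

```python
# Illustrative connector backend exposing list/fetch/upload over a local
# directory. Shows only the shape of the three operations; it is not Zyra's
# plugin interface, and CLI subcommand registration is not shown.
from pathlib import Path
from typing import Iterable


class LocalDirConnector:
    def __init__(self, root: str):
        self.root = Path(root)

    def list(self, pattern: str = "*") -> Iterable[str]:
        """Return relative paths of files under the root matching a glob."""
        return [str(p.relative_to(self.root)) for p in self.root.rglob(pattern) if p.is_file()]

    def fetch(self, key: str) -> bytes:
        """Read one object's bytes (a real backend might stream instead)."""
        return (self.root / key).read_bytes()

    def upload(self, key: str, data: bytes) -> None:
        """Write bytes to the backend, creating parent directories as needed."""
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
```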

Security & Compliance

  • Credentials are read from the environment and standard config locations; do not hard-code secrets (see the sketch after this list).

  • Optional API service (FastAPI) supports API keys and CORS options (see Zyra-API-Security-Quickstart.md).

  • Artifact handling supports deterministic outputs and optional checksums via the verify stage (planned).
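
For example, a credential should come from the environment rather than being embedded in code; the variable name below is a placeholder, not a documented setting.

```python
# Read a credential from the environment instead of hard-coding it.
# ZYRA_API_KEY is a placeholder name, not a documented setting.
import os

api_key = os.environ.get("ZYRA_API_KEY")
if not api_key:
    raise SystemExit("ZYRA_API_KEY is not set; export it in the environment instead of embedding it in code or configs.")
```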

Performance Considerations

  • Stream and chunk large files; avoid loading entire datasets into memory when it is not necessary (see the sketch after this list).

  • Prefer xarray/dask patterns where feasible (future work) to enable out‑of‑core transforms.

  • Use format‑appropriate compression (e.g., NetCDF deflate) when exporting.
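
For example, with xarray a dataset can be opened lazily in chunks (dask-backed) and written back with per-variable deflate compression; file names, chunk sizes, and compression levels below are placeholders.

```python
# Open a dataset lazily in chunks (requires dask) and export NetCDF with
# deflate compression. File names, chunk sizes, and levels are placeholders.
import xarray as xr

ds = xr.open_dataset("large_input.nc", chunks={"time": 24})
encoding = {name: {"zlib": True, "complevel": 4} for name in ds.data_vars}
ds.to_netcdf("compressed_output.nc", encoding=encoding)
```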

Deployment Modes

  • Local CLI via pip extras or Poetry.

  • Containerized workloads for reproducible environments (see Zyra-Containers-Overview-and-Usage.md).

  • Optional API service for remote execution and WebSocket streaming; job results persisted with TTL (see Zyra-API-Routers-and-Endpoints.md).

Limitations & Roadmap

  • Simulate, Decide, Narrate, Verify: conceptual in current releases; tracked in Roadmap-and-Tracking.md.

  • Cartopy tile caching and large model assets may require writable caches and careful environment setup.

  • Parallel/cluster execution is orchestrator‑dependent; native dask integration is planned.

References

  • Workflow overview: Workflow-Stages.md

  • Examples: Stage-Examples.md

  • API & CLI docs: https://noaa-gsl.github.io/zyra/

  • Security: Zyra-API-Security-Quickstart.md