Introduction to Zyra
Jump to:
Kid Version | High School Version | College Version | White Paper Version
Kid Version
Imagine you have a big box of LEGO bricks mixed together: some from space sets, some from castles, some from race cars.
Zyra is like a magical robot helper that:
Finds the bricks you want (getting data from the internet or your computer).
Puts them in order (sorting and cleaning the pieces so they fit).
Builds and shares something amazing (pictures, videos, or maps you can show to friends), and sometimes it helps you check your build or tell its story.
It makes science data less messy and more fun to look at.
High School Version
Zyra is a Python tool that:
Collects data from many sources like websites, cloud storage, and scientific file formats.
Processes it so it's easier to work with (cutting, reshaping, converting formats).
Visualizes it in charts, maps, and animations, and can publish results.
Think of it like a factory with up to eight stations (you can skip ones you don't need):
Import (get data) → Process (clean/convert) → Simulate (make examples) → Decide (pick best settings) → Visualize (make graphics) → Narrate (add captions/reports) → Verify (check quality) → Export (share).
It's modular: you can swap out any station for your own tool.
College Version
Zyra is an open-source, modular Python framework for reproducible scientific data workflows organized as up to eight stages:
Import: HTTP/FTP/S3/local fetch and listing; supports manifests and streaming I/O.
Process: Subset, transform, and convert (e.g., GRIB2→NetCDF, GeoTIFF).
Simulate: Generate synthetic or toy datasets for demos/tests.
Decide: Explore parameter spaces and select best variants.
Visualize: Static maps/plots, animations, and interactive outputs.
Narrate: Produce captions, summaries, or pages that contextualize outputs.
Verify: Integrity/quality checks, metadata validation, provenance.
Export: Write to local paths, S3/FTP, HTTP POST, or video destinations.
Not all stages are required in every workflow; the pipeline is composable and streaming-friendly (stdin/stdout). Under the hood, implemented pieces map to modules like zyra.connectors (import/export), zyra.processing, zyra.visualization, and zyra.transform, with shared helpers in zyra.utils.
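The idea of composable stages, any of which can be skipped, can be sketched in plain Python. This is only an illustration and does not use Zyra's API: run_pipeline and the three stage functions are hypothetical names invented here, and the data is made up.

```python
from typing import Callable, Iterable, List

# A stage is any function that takes data and returns data.
Stage = Callable[[List[float]], List[float]]


def import_stage(_: List[float]) -> List[float]:
    """Hypothetical import step: pretend we fetched some readings."""
    return [3.0, -1.0, 2.0, 8.0]


def process_stage(values: List[float]) -> List[float]:
    """Hypothetical process step: drop negative (invalid) readings."""
    return [v for v in values if v >= 0]


def export_stage(values: List[float]) -> List[float]:
    """Hypothetical export step: sort so the output order is deterministic."""
    return sorted(values)


def run_pipeline(stages: Iterable[Stage]) -> List[float]:
    """Run the given stages in order, feeding each output to the next stage."""
    data: List[float] = []
    for stage in stages:
        data = stage(data)
    return data


# A full chain, and a shorter chain that skips the process stage.
full = run_pipeline([import_stage, process_stage, export_stage])
skipped = run_pipeline([import_stage, export_stage])
```

Because every stage shares one signature, leaving a stage out or swapping in your own function needs no change to the runner.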
White Paper Version
Abstract:
Zyra is a composable Python framework for end-to-end scientific data workflows. It organizes work into eight conceptual stages (import, process, simulate, decide, visualize, narrate, verify, and export), providing reproducibility, modularity, and interoperability across environmental and geospatial datasets.
Motivation & Scope
Modern environmental workflows span heterogeneous data sources and formats, require repeatable transformations, and produce diverse outputs (plots, animations, interactive pages, datasets). Zyra provides a lightweight, CLI-first framework that standardizes common steps while remaining extensible for domain-specific logic.
Design Principles
Modularity: small, composable commands and helpers; opt-in extras for heavy dependencies.
Streaming by default: stdin/stdout support to avoid temporary files and enable Unix-style chaining.
Reproducibility: explicit configs, deterministic transforms, comprehensive logging and metadata.
Interoperability: rely on well-adopted libraries (xarray, netCDF4, rasterio, matplotlib/cartopy, ffmpeg).
Extensibility: pluggable connectors and processors; minimal glue code to register new commands.
Architecture (stages → modules)
Import/Export: zyra.connectors (HTTP/FTP/S3/Vimeo, local paths, HTTP POST) with list/filter, sync, and streaming I/O.
Process: zyra.processing (GRIB2 decoding, NetCDF/GeoTIFF conversion, extraction, subsetting); zyra.transform for lightweight metadata updates.
Visualize: zyra.visualization (static plots/maps, animations, interactive HTML).
Simulate / Decide / Narrate / Verify: conceptual today; tracked on the roadmap and expressed via configs/orchestrators and external tools until dedicated CLI groups mature.
Utilities: zyra.utils (credentials, date/time ranges, files/images, JSON/YAML I/O).
See also: Workflow-Stages.md for an overview and Stage-Examples.md for concise commands.
Execution Model
CLI groups mirror stages (acquire, process, visualize, export) and accept - (a single hyphen) for stdin/stdout where applicable.
Commands are side-effect free where possible and return non-zero exit codes on failure.
A config-driven runner can chain stages; external orchestrators (n8n, cron, shell) are supported by design.
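The two conventions above (a single hyphen for stdin/stdout, and a non-zero exit code on failure) can be sketched with the standard library alone. This is not Zyra's implementation: main, open_stream, and the pass-through behavior are hypothetical and exist only to show the pattern.

```python
import argparse
import shutil
import sys
from contextlib import ExitStack
from typing import BinaryIO, List, Optional

CHUNK_SIZE = 64 * 1024  # bytes copied per read; an arbitrary choice


def open_stream(path: str, mode: str, stack: ExitStack) -> BinaryIO:
    """Map '-' to stdin/stdout; otherwise open a file that the stack will close."""
    if path == "-":
        return sys.stdin.buffer if mode == "rb" else sys.stdout.buffer
    return stack.enter_context(open(path, mode))


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical pass-through command: copy --input to --output."""
    parser = argparse.ArgumentParser(description="Pass-through stage (sketch)")
    parser.add_argument("--input", default="-", help="file path, or - for stdin")
    parser.add_argument("--output", default="-", help="file path, or - for stdout")
    args = parser.parse_args(argv)
    try:
        with ExitStack() as stack:
            src = open_stream(args.input, "rb", stack)
            dst = open_stream(args.output, "wb", stack)
            shutil.copyfileobj(src, dst, CHUNK_SIZE)  # chunked copy, no full load
            dst.flush()
    except OSError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1  # non-zero exit code signals failure to the caller
    return 0
```

Wired to an entry point with sys.exit(main()), a command shaped like this can sit in the middle of a shell pipe, and a failing step stops a chain that checks exit codes.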
Data & Formats
Gridded data: GRIB2 (via cfgrib/pygrib), NetCDF (via netCDF4/xarray), GeoTIFF (via rioxarray/rasterio).
Imagery/video: PNG/JPEG/MP4 (via ffmpeg-python).
Protocols: HTTP/S, FTP, S3, filesystem; Vimeo for video publishing.
CRS/geo: handled by libraries (cartopy, rasterio); follow CF conventions where possible.
Configuration & Metadata
JSON/YAML configs for pipelines and per-stage arguments.
Frames and dataset metadata helpers under zyra.transform (e.g., directory scans, enrich/merge).
Provenance captured via logs, timestamps, argument echoes, and optional JSON sidecars.
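As an illustration of a JSON sidecar carrying provenance, here is a minimal sketch. The field names, the ".json" suffix convention, and the write_sidecar helper are assumptions made for this example; Zyra's actual sidecar layout may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def write_sidecar(artifact: Path, argv: List[str]) -> Path:
    """Write '<artifact name>.json' next to the artifact with basic provenance."""
    record: Dict[str, object] = {
        "artifact": artifact.name,
        "sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
        "size_bytes": artifact.stat().st_size,
        "argv": argv,  # argument echo: what was run to produce the artifact
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = artifact.with_name(artifact.name + ".json")
    # sort_keys keeps the file layout stable between runs
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Storing the checksum next to the argument echo lets a later check confirm both what was produced and how it was produced.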
Extensibility
Connectors: add a backend (e.g., new cloud/object store) by implementing list/fetch/upload and registering a subcommand.
Processors: add decode/convert/extract operations by exposing CLI wrappers around library calls.
Visualizers: add new plot types by adhering to common I/O options (--input, --output, --var, etc.).
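The connector pattern described above (implement list/fetch/upload, then register the backend under a name) might look roughly like the following. This is a sketch of the general plug-in technique, not Zyra's registration mechanism: Connector, register, get_connector, and MemoryConnector are hypothetical names, and the in-memory backend stands in for a real object store.

```python
from typing import Callable, Dict, List, Protocol


class Connector(Protocol):
    """The three operations a backend is assumed to provide."""

    def list(self, prefix: str) -> List[str]: ...
    def fetch(self, key: str) -> bytes: ...
    def upload(self, key: str, data: bytes) -> None: ...


# Maps a backend name to a zero-argument factory that builds the connector.
_REGISTRY: Dict[str, Callable[[], Connector]] = {}


def register(name: str) -> Callable:
    """Class decorator that records a connector factory under `name`."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("memory")
class MemoryConnector:
    """Toy backend that keeps objects in a dict instead of a remote store."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def list(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._store if k.startswith(prefix))

    def fetch(self, key: str) -> bytes:
        return self._store[key]

    def upload(self, key: str, data: bytes) -> None:
        self._store[key] = data


def get_connector(name: str) -> Connector:
    """Look up a registered backend by name and build an instance."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown connector: {name}")
    return _REGISTRY[name]()
```

With a registry like this, a CLI layer only needs the backend name to reach a new store, which keeps the glue code small.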
Security & Compliance
Credentials are read from environment and standard config locations; do not hard-code secrets.
Optional API service (FastAPI) supports API keys and CORS options (see Zyra-API-Security-Quickstart.md).
Artifact handling supports deterministic outputs and optional checksums via the verify stage (planned).
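Reading a secret from the environment instead of hard-coding it can be sketched as follows. The helper names and the EXAMPLE_API_KEY variable in the usage note are hypothetical; this does not reflect the names Zyra itself looks up.

```python
import os
from typing import Mapping, Optional


class MissingCredentialError(RuntimeError):
    """Raised when a required secret is not configured."""


def get_credential(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret stored under `name`, failing loudly if it is absent."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise MissingCredentialError(f"set the {name} environment variable")
    return value


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logs, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

A caller would use get_credential("EXAMPLE_API_KEY") at startup and pass any value that must appear in logs through redact first.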
Performance Considerations
Stream and chunk large files; avoid load-all where unnecessary.
Prefer xarray/dask patterns where feasible (future work) to enable out-of-core transforms.
Use format-appropriate compression (e.g., NetCDF deflate) when exporting.
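The "stream and chunk" advice can be illustrated with a small helper that copies data in fixed-size pieces and computes a checksum along the way, so memory use stays bounded regardless of file size. The function name and the 1 MiB chunk size are choices made for this sketch, not Zyra defaults.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; tune for the workload


def copy_with_checksum(src: BinaryIO, dst: BinaryIO,
                       chunk_size: int = CHUNK_SIZE) -> str:
    """Copy src to dst chunk by chunk and return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()
```

Since only one chunk is held in memory at a time, the same function works for a small PNG and for a multi-gigabyte GRIB2 file.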
Deployment Modes
Local CLI via pip extras or Poetry.
Containerized workloads for reproducible environments (see Zyra-Containers-Overview-and-Usage.md).
Optional API service for remote execution and WebSocket streaming; job results persisted with TTL (see Zyra-API-Routers-and-Endpoints.md).
Limitations & Roadmap
Simulate, Decide, Narrate, Verify: conceptual in current releases; tracked in Roadmap-and-Tracking.md.
Cartopy tile caching and large model assets may require writable caches and careful environment setup.
Parallel/cluster execution is orchestrator-dependent; native dask integration is planned.
References
Workflow overview: Workflow-Stages.md
Examples: Stage-Examples.md
API & CLI docs: https://noaa-gsl.github.io/zyra/
Security: Zyra-API-Security-Quickstart.md