Phase 1: Module Scaffolding

[ ] Create `src/zyra/connectors/discovery/` for discovery logic.
[ ] Define a `DiscoveryBackend` interface (similar to connector backends) with methods like:

    class DiscoveryBackend:
        def search(self, query: str, **kwargs) -> List[DatasetMetadata]:
            ...

[ ] Add a `DatasetMetadata` model (name, description, source, format, URI).
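The `DatasetMetadata` model could be a simple dataclass; a minimal sketch, assuming the field names follow the list above (the lowercase `uri` attribute name is an assumption):

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Metadata describing one discoverable dataset (fields from the checklist above)."""

    name: str         # human-readable dataset name
    description: str  # short free-text description
    source: str       # which backend/catalog produced this record
    format: str       # e.g. "NetCDF", "GRIB2"
    uri: str          # where the data can be fetched from
```

A frozen dataclass (or a pydantic model, if the project already depends on it) would also work; the important part is a single shared type that CLI, API, and connectors can all consume.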
Phase 2: Backends for Discovery

[ ] Implement a Local Catalog Backend: reads from a JSON/YAML index of datasets stored in `assets/catalog.json`.
[ ] Implement a Remote API Backend: queries NOAA, Pangeo, or CKAN catalogs.
[ ] Register backends in `connectors.discovery.backends`.
Phase 3: Integration with CLI

[ ] Extend the CLI (`cli.py`) with a new command:

    zyra search "NOAA GFS forecast"

[ ] The CLI should call the discovery backends and print the results as a table:

    ID  Name                Format  URI
    1   GFS 2025-08-17 00z  NetCDF  s3://noaa/gfs/...
    2   HRRR Surface Temp   GRIB2   s3://noaa/hrrr/...

[ ] Add options to export search results (`--json`, `--yaml`).
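The table rendering can be a small standalone helper; a minimal sketch (the `format_table` name and its signature are hypothetical, not existing zyra API):

```python
from typing import Sequence


def format_table(rows: Sequence[Sequence[str]], headers: Sequence[str]) -> str:
    """Render rows as an aligned plain-text table, column widths sized to content."""
    table = [list(headers)] + [list(r) for r in rows]
    # Width of each column = widest cell in that column.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines)
```

For example, `format_table([["1", "GFS 2025-08-17 00z", "NetCDF", "s3://noaa/gfs/..."]], ["ID", "Name", "Format", "URI"])` produces a two-line table like the one shown above; the `--json`/`--yaml` options would bypass this and serialize the `DatasetMetadata` records directly.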
Phase 4: API Integration

[ ] Add a `/search` endpoint in `api/` that accepts `query` and returns `DatasetMetadata[]`.
[ ] Ensure the API reuses the discovery module (don't re-implement the logic).
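One way to keep the API thin is a framework-agnostic handler that only serializes what the discovery module returns; a minimal sketch, assuming the `DatasetMetadata` dataclass above (the web framework is unspecified here, so the route wrapper is left out and the handler takes the discovery `search` callable as a parameter):

```python
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class DatasetMetadata:
    name: str
    description: str
    source: str
    format: str
    uri: str


def search_handler(
    query: str, search: Callable[[str], List[DatasetMetadata]]
) -> List[dict]:
    """Body of the /search endpoint: delegate to the discovery module,
    then serialize each DatasetMetadata to a plain dict for the JSON response.
    A FastAPI/Flask route would just wrap this call."""
    return [asdict(m) for m in search(query)]
```

Because all matching logic lives in `discovery.search`, the endpoint cannot drift out of sync with the CLI.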
Phase 5: Connectors Integration

[ ] Allow direct ingestion from search results:

    zyra search "GFS" --select 1 | zyra ingest -

[ ] Support programmatic chaining:

    results = discovery.search("GFS")
    connectors.ingest(results[0].uri)
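For the piped form to work, `zyra ingest -` needs the usual Unix convention of treating `-` as "read from stdin"; a minimal sketch of that small piece (the `resolve_source` helper is hypothetical):

```python
import io


def resolve_source(arg: str, stdin: io.TextIOBase) -> str:
    """Interpret '-' as "read the dataset URI from stdin", so that
    `zyra search "GFS" --select 1 | zyra ingest -` works; any other
    argument is passed through unchanged."""
    if arg == "-":
        return stdin.readline().strip()
    return arg
```

This assumes `zyra search --select N` prints just the selected dataset's URI to stdout, one per line, so the two commands compose cleanly.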
Phase 6: Documentation & Examples

[ ] Add documentation in `docs/source/discovery.rst`.
[ ] Provide a sample catalog (`assets/catalog.json`) with 3–5 NOAA datasets.
[ ] Create a CLI walkthrough: search → select → ingest → visualize.
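A sample `assets/catalog.json` entry might look like this; the values are illustrative placeholders built from the table in Phase 3, not real dataset records, and the key names assume the `DatasetMetadata` fields listed in Phase 1:

```json
[
  {
    "name": "GFS 2025-08-17 00z",
    "description": "NOAA Global Forecast System model run",
    "source": "local-catalog",
    "format": "NetCDF",
    "uri": "s3://noaa/gfs/..."
  }
]
```

Keeping the JSON keys identical to the model's field names lets the local backend construct records directly from each object.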
Stretch Goals

- Semantic search (LLM-assisted descriptions).
- Multi-backend search aggregation.
- Metadata enrichment (units, time ranges, available variables).
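The multi-backend aggregation stretch goal can be sketched as a fan-out over the registered backends with de-duplication by URI; a minimal sketch (the `aggregate_search` name, and URI as the de-duplication key, are assumptions):

```python
from typing import List


def aggregate_search(backends, query: str) -> List:
    """Query every registered discovery backend and merge the results,
    keeping the first record seen for each URI and preserving order."""
    seen = set()
    merged = []
    for backend in backends:
        for item in backend.search(query):
            if item.uri not in seen:
                seen.add(item.uri)
                merged.append(item)
    return merged
```

Ranking merged results (rather than simple first-wins ordering) would be a natural follow-on once the semantic-search goal lands.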