Phase 1: Module Scaffolding

  • [ ] Create src/zyra/connectors/discovery/ for discovery logic.

  • [ ] Define a DiscoveryBackend interface (similar to connector backends) with methods like:

    from typing import List

    class DiscoveryBackend:
        def search(self, query: str, **kwargs) -> List["DatasetMetadata"]:
            """Return the datasets that match the query."""
            raise NotImplementedError
    
  • [ ] Add DatasetMetadata model (name, description, source, format, URI).
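One way the two Phase 1 pieces could fit together is sketched here, assuming a frozen dataclass for DatasetMetadata and an abstract base class for DiscoveryBackend. The field types and the StaticBackend helper are illustrative assumptions, not part of the plan.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass(frozen=True)
class DatasetMetadata:
    """One search hit; the fields follow the checklist (types are assumed)."""
    name: str
    description: Optional[str]
    source: str
    format: str
    uri: str


class DiscoveryBackend(ABC):
    """Interface that every discovery backend implements."""

    @abstractmethod
    def search(self, query: str, **kwargs: Any) -> List[DatasetMetadata]:
        """Return datasets matching ``query``; kwargs carry backend-specific filters."""


class StaticBackend(DiscoveryBackend):
    """Hypothetical in-memory backend, used only to exercise the interface."""

    def __init__(self, items: Iterable[DatasetMetadata]) -> None:
        self._items = list(items)

    def search(self, query: str, **kwargs: Any) -> List[DatasetMetadata]:
        # Case-insensitive substring match on the dataset name.
        needle = query.lower()
        return [d for d in self._items if needle in d.name.lower()]
```

Using an abstract base class means a backend that forgets to implement search fails at instantiation time rather than at the first query.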


Phase 2: Backends for Discovery

  • [ ] Implement a Local Catalog Backend that reads a JSON/YAML index of datasets stored in assets/catalog.json.

  • [ ] Implement a Remote API Backend: queries NOAA, Pangeo, or CKAN catalogs.

  • [ ] Register backends in connectors.discovery.backends.
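The local catalog backend and its registration might look roughly like the following. This sketch assumes the catalog is a JSON list of objects whose keys match the DatasetMetadata fields, that matching is a simple all-terms substring test, and that the registry is a plain dict named BACKENDS; all three are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class DatasetMetadata:
    name: str
    description: str
    source: str
    format: str
    uri: str


class LocalCatalogBackend:
    """Searches a JSON index on disk (file layout is an assumption)."""

    def __init__(self, catalog_path: str) -> None:
        self.catalog_path = Path(catalog_path)

    def _load(self) -> List[DatasetMetadata]:
        # Each catalog entry is expected to carry exactly the model's fields.
        with self.catalog_path.open(encoding="utf-8") as fh:
            return [DatasetMetadata(**entry) for entry in json.load(fh)]

    def search(self, query: str, **kwargs: Any) -> List[DatasetMetadata]:
        # A dataset matches when every query term appears in its name or description.
        terms = query.lower().split()
        hits = []
        for ds in self._load():
            haystack = f"{ds.name} {ds.description}".lower()
            if all(term in haystack for term in terms):
                hits.append(ds)
        return hits


# Hypothetical registry: backend name -> backend class.
BACKENDS: Dict[str, type] = {"local": LocalCatalogBackend}
```

A remote backend would expose the same search method and be added to the registry under its own key, so callers never need to know which kind of backend they are querying.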


Phase 3: Integration with CLI

  • [ ] Extend CLI (cli.py) with a new command:

    zyra search "NOAA GFS forecast"
    
  • [ ] The CLI should call the discovery backends and print the results as a table:

    ID   Name                  Format   URI
    1    GFS 2025-08-17 00z    NetCDF   s3://noaa/gfs/...
    2    HRRR Surface Temp     GRIB2    s3://noaa/hrrr/...
    
  • [ ] Add option to export search results (--json, --yaml).
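The table and JSON output could be produced by two small helpers such as these. The function names format_table and to_json are hypothetical, and YAML export is left out because it would need a third-party library.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatasetMetadata:
    name: str
    description: str
    source: str
    format: str
    uri: str


def format_table(results: List[DatasetMetadata]) -> str:
    """Render results as a fixed-width table with 1-based IDs."""
    rows = [("ID", "Name", "Format", "URI")]
    rows += [(str(i), r.name, r.format, r.uri) for i, r in enumerate(results, start=1)]
    # Each column is as wide as its widest cell.
    widths = [max(len(row[col]) for row in rows) for col in range(4)]
    lines = [
        "   ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


def to_json(results: List[DatasetMetadata]) -> str:
    """Serialize results for the --json option."""
    return json.dumps([asdict(r) for r in results], indent=2)
```

Keeping the formatting in pure functions lets the CLI command stay a thin wrapper: parse arguments, call the discovery backends, then print either format_table(results) or to_json(results).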


Phase 4: API Integration

  • [ ] Add /search endpoint in api/ that accepts query and returns DatasetMetadata[].

  • [ ] Ensure API reuses the discovery module (don’t re-implement logic).
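To keep the endpoint from duplicating discovery logic, the handler body can be a thin adapter around the discovery module's search function. The sketch below is framework-neutral on purpose, since the plan does not name a web framework; handle_search and its search parameter are hypothetical names.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DatasetMetadata:
    name: str
    description: str
    source: str
    format: str
    uri: str


def handle_search(
    query: str,
    search: Callable[[str], List[DatasetMetadata]],
) -> List[Dict[str, Any]]:
    """Body of a /search handler: validate, delegate, return JSON-ready dicts."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # All matching logic lives in the injected discovery function.
    return [asdict(item) for item in search(query)]
```

Whatever framework hosts the route would map the ValueError to a client-error response and pass the same search callable that the CLI uses.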


Phase 5: Connectors Integration

  • [ ] Allow direct ingestion from search results:

    zyra search "GFS" --select 1 | zyra ingest -
    
  • [ ] Support programmatic chaining:

    results = discovery.search("GFS")
    connectors.ingest(results[0].uri)
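
For the piped form, one possible contract is that search --select writes the chosen URI to stdout, one per line, and that ingest treats "-" as "read URIs from stdin". That contract and the helper names emit_selection and read_uris are assumptions; the plan does not specify what flows through the pipe.

```python
import io
import sys
from typing import List, TextIO


def emit_selection(uris: List[str], select: int, out: TextIO = sys.stdout) -> None:
    """Write the URI of the 1-based ``select``-th result, followed by a newline."""
    if not 1 <= select <= len(uris):
        raise IndexError(f"--select {select} is out of range (1..{len(uris)})")
    out.write(uris[select - 1] + "\n")


def read_uris(source: str, stdin: TextIO = sys.stdin) -> List[str]:
    """Resolve the ingest argument: '-' means read URIs from stdin."""
    if source == "-":
        return [line.strip() for line in stdin if line.strip()]
    return [source]


# Simulate the shell pipe in-process with an in-memory stream.
pipe = io.StringIO()
emit_selection(["file:///data/a.nc", "file:///data/b.nc"], 2, out=pipe)
pipe.seek(0)
print(read_uris("-", stdin=pipe))
```

Passing the streams as parameters keeps both halves testable without spawning a subprocess.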
    

Phase 6: Documentation & Examples

  • [ ] Add documentation in docs/source/discovery.rst.

  • [ ] Provide sample catalog (assets/catalog.json) with 3–5 NOAA datasets.

  • [ ] Create CLI walkthrough: search → select → ingest → visualize.


Stretch Goals

  • Semantic search (LLM-assisted descriptions).

  • Multi-backend search aggregation.

  • Metadata enrichment (units, time ranges, variables available).
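
For the multi-backend aggregation item, a minimal approach is to query each backend in order and keep the first hit seen for each URI. The sketch assumes each backend is a callable returning dicts with a "uri" key; the name aggregate_search and that shape are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumed shape: a backend is a callable from query to a list of hit dicts.
SearchFn = Callable[[str], List[Dict[str, Any]]]


def aggregate_search(query: str, backends: Iterable[SearchFn]) -> List[Dict[str, Any]]:
    """Merge hits from several backends, de-duplicating by URI."""
    seen = set()
    merged = []
    for search in backends:
        for hit in search(query):
            # Earlier backends win when two backends return the same URI.
            if hit["uri"] not in seen:
                seen.add(hit["uri"])
                merged.append(hit)
    return merged
```

Ordering the backends from most to least trusted gives a simple precedence rule without any scoring.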