zyra.utils package

Utilities used across Zyra (dates, files, images, credentials).

The package avoids importing optional heavy dependencies at import time, which keeps the CLI lightweight when only a subset of functionality is needed (e.g., the pipeline runner). Submodules can still be imported directly when required.

class zyra.utils.CredentialManager(filename: str | None = None, namespace: str | None = None)[source]

Bases: object

Manage app credentials from a dotenv file.

Parameters:

filename (str, optional) – Path to a dotenv file containing key=value pairs.
namespace (str, optional) – Optional prefix to apply to all keys when stored/retrieved.

Examples

Namespaced keys:

cm = CredentialManager(".env", namespace="MYAPP_")
cm.read_credentials(expected_keys=["API_KEY"])  # expects MYAPP_API_KEY

add_credential(key: str, value: str) → None[source]

Add or update a credential value in memory.

Parameters:

key (str) – Base key name (namespace is applied automatically).
value (str) – Credential value to store.

clear_credentials() → None[source]: Remove all tracked credentials from memory.

delete_credential(key: str) → None[source]

Delete a credential by key if present.

Parameters: key (str) – Base key name (namespace is applied automatically).

get_credential(key: str) → str[source]

Retrieve a credential value by key (with namespace applied).

Parameters: key (str) – Base key name (namespace is applied automatically).
Returns: Stored value for the namespaced key.
Return type: str
Raises: KeyError – If the key is not present in memory.

list_credentials(expected_keys: Iterable[str] | None = None) → list[str][source]

List tracked credential keys, checking for expected ones when provided.

Parameters: expected_keys (Iterable[str], optional) – Keys to verify are present.
Returns: Keys currently stored in memory.
Return type: list of str
Raises: KeyError – If any expected keys are missing.

read_credentials(expected_keys: Iterable[str] | None = None) → None[source]

Read credentials from the dotenv file into memory.

Parameters:

expected_keys (Iterable[str], optional) – Keys that must be present; raises if any are missing.

Raises:

FileNotFoundError – If the dotenv path cannot be resolved.
KeyError – If any expected keys are missing after reading.

property tracked_keys: set[str]: Return the set of keys currently tracked in memory.
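The namespacing behaviour described for this class can be illustrated with a small stand-in. This is only a sketch under stated assumptions: NamespacedStore, load, and get are hypothetical names, and the simple KEY=value parsing is an assumption about the dotenv format rather than Zyra's actual parser. The point it shows is that values live in a private dict, never in os.environ, and that callers pass the base key while the prefix is applied on lookup.

```python
class NamespacedStore:
    """Hypothetical stand-in illustrating namespaced lookups; not Zyra's class."""

    def __init__(self, namespace=None):
        self._prefix = namespace or ""
        self._values = {}

    def load(self, dotenv_text):
        # Parse simple KEY=value lines into a private dict; os.environ is untouched.
        for raw in dotenv_text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            self._values[key.strip()] = value.strip().strip("'\"")

    def get(self, key):
        # Raises KeyError when the namespaced key is absent.
        return self._values[self._prefix + key]


store = NamespacedStore("MYAPP_")
store.load("# demo values\nMYAPP_API_KEY=abc123\n")
api_key = store.get("API_KEY")  # looks up MYAPP_API_KEY
```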

class zyra.utils.DateManager(date_formats: list[str] | None = None)[source]

Bases: object

High-level utilities for working with dates and filenames.

Parameters: date_formats (list of str, optional) – Preferred strftime-style formats to use when parsing dates from filenames (e.g., ["%Y%m%d"]).

Examples

Use a custom filename format first, then fall back to ISO-like detection:

dm = DateManager(["%Y%m%d%H%M%S"])
when = dm.extract_date_time("frame_20240101093000.png")

__init__(date_formats: list[str] | None = None) → None[source]: Optionally store preferred date formats for filename parsing.

calculate_expected_frames(start_datetime: datetime, end_datetime: datetime, period_seconds: int) → int[source]

Calculate expected frame count between two datetimes at a cadence.

Returns: Number of expected frames (inclusive of endpoints).
Return type: int
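An inclusive frame count at a fixed cadence can be sketched as below. The function name expected_frames and the handling of reversed or non-positive inputs are assumptions made for illustration; the real method may treat those cases differently.

```python
from datetime import datetime


def expected_frames(start: datetime, end: datetime, period_seconds: int) -> int:
    """Count frames at a fixed cadence, including both endpoints."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if end < start:
        return 0  # assumed behaviour for a reversed range
    elapsed = (end - start).total_seconds()
    # Whole periods that fit in the span, plus one for the starting frame.
    return int(elapsed // period_seconds) + 1


start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 1, 0)
count = expected_frames(start, end, 600)  # frames at 00:00, 00:10, ..., 01:00
```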

datetime_format_to_regex(datetime_format: str) → str[source]: Convert a datetime format string to a regex pattern.
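One plausible way to turn a strftime-style format into a regex is to escape the literal characters and substitute a digit pattern for each directive. The directive table and the name format_to_regex below are assumptions for illustration; the set of directives the real method supports is not stated here.

```python
import re

# Assumed directive table; the real method may support more or fewer.
_DIRECTIVE_PATTERNS = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def format_to_regex(datetime_format: str) -> str:
    # Escape literal characters first, then swap each directive for its pattern.
    pattern = re.escape(datetime_format)
    for directive, digits in _DIRECTIVE_PATTERNS.items():
        pattern = pattern.replace(re.escape(directive), digits)
    return pattern


pattern = format_to_regex("%Y%m%d_%H%M%S")
match = re.search(pattern, "frame_20240101_093000.png")
```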

extract_date_time(string: str) → str | None[source]

Extract a date string from a filename/text using known formats.

Tries known formats first; falls back to a simple ISO-like pattern.

extract_dates_from_filenames(directory_path: str, image_extensions: Iterable[str] = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.dds')) → tuple[str | None, str | None][source]

Extract dates from the first and last image file names in a directory.

Parameters:

directory_path (str) – Directory to scan for images.
image_extensions (Iterable[str]) – File extensions to include when scanning.

Returns:

(first_date, last_date) as strings, or (None, None).

Return type:

tuple

find_missing_frames(directory, period_seconds, datetime_format, filename_format, filename_mask, start_datetime, end_datetime)[source]: Find missing frames among the image files in a local directory whose frame period may be inconsistent.

find_missing_frames_and_predict_names(timestamps, period_seconds, filename_pattern)[source]: Find gaps and overfrequent frames in timestamps and predict names.
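Gap detection of this kind can be sketched by walking consecutive timestamps and emitting every expected step that has no frame. The sketch assumes filename_pattern is a strftime pattern, which is not confirmed here, and it covers only the gap half of the method; detection of overfrequent frames is left out. The name missing_frame_names is hypothetical.

```python
from datetime import datetime, timedelta


def missing_frame_names(timestamps, period_seconds, filename_pattern):
    """Return predicted filenames for cadence steps that have no timestamp."""
    step = timedelta(seconds=period_seconds)
    ordered = sorted(timestamps)
    missing = []
    for previous, current in zip(ordered, ordered[1:]):
        expected = previous + step
        # Every step strictly before the next real frame is a gap.
        while expected < current:
            missing.append(expected.strftime(filename_pattern))
            expected += step
    return missing


frames = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 0, 40),
]
gaps = missing_frame_names(frames, 600, "frame_%Y%m%d%H%M%S.png")
```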

find_start_end_datetimes(directory: str)[source]: Find earliest and latest datetimes from filenames in a directory.

get_date_range(period: str) → tuple[datetime, datetime][source]

Compute a date range ending at the current minute from a period spec.

Parameters: period (str) – Period string such as "1Y", "6M", "7D", or "24H".
Returns: Start and end datetimes for the period ending at “now” (rounded to minute).
Return type: (datetime, datetime)
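A period spec of this shape can be parsed with a single regex. The sketch below rests on assumptions that are not confirmed here: a year is approximated as 365 days and a month as 30 days, "rounded to minute" is implemented as truncation, and the now parameter exists only to make the example deterministic. The name date_range_from_period is hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Assumed approximations: a year as 365 days and a month as 30 days.
_UNIT_TO_DELTA = {
    "Y": timedelta(days=365),
    "M": timedelta(days=30),
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
}


def date_range_from_period(
    period: str, now: datetime | None = None
) -> tuple[datetime, datetime]:
    match = re.fullmatch(r"(\d+)([YMDH])", period.strip().upper())
    if match is None:
        raise ValueError(f"unsupported period spec: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    # Truncate to the minute so repeated calls within a minute agree.
    end = (now or datetime.now()).replace(second=0, microsecond=0)
    start = end - amount * _UNIT_TO_DELTA[unit]
    return start, end


start, end = date_range_from_period("7D", now=datetime(2024, 1, 8, 12, 30, 45))
```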

get_date_range_iso(iso_duration: str) → tuple[datetime, datetime][source]

Compute a date range ending now from an ISO-8601 duration (e.g., P1Y, P6M, P7D, PT24H).

Supports a subset of ISO-8601: years (Y), months (M), days (D), hours (H) with the “P…T…” structure. Examples: “P1Y”, “P6M”, “P7D”, “PT24H”.
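The supported ISO-8601 subset can be recognised with one regular expression. This sketch only splits a duration into its components; how the real method converts years and months into a concrete start datetime is not described here, so that step is left out. The name parse_iso_duration is hypothetical.

```python
import re

# Years, months, days before the optional "T"; hours after it.
_ISO_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?)?"
)


def parse_iso_duration(text: str) -> dict:
    match = _ISO_DURATION.fullmatch(text)
    # Reject strings such as "P" or "PT" that carry no component at all.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ISO-8601 duration: {text!r}")
    return {name: int(value or 0) for name, value in match.groupdict().items()}


day_window = parse_iso_duration("PT24H")
mixed = parse_iso_duration("P1Y6M")
```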

is_date_in_range(filepath: str, start_date: datetime, end_date: datetime) → bool[source]

Check if a filename contains a date within a range.

Parameters:

filepath (str) – Path or filename containing a date stamp.
start_date (datetime) – Inclusive start of the permitted range.
end_date (datetime) – Inclusive end of the permitted range.

Returns:

True if a parsed date falls within the range, else False.

Return type:

bool

parse_timestamps_from_filenames(filenames, datetime_format)[source]: Parse timestamps from filenames based on the given format.

class zyra.utils.FileUtils[source]

Bases: object

Namespace for file-related helper routines.

Examples

While most functions are provided at module-level, a class instance can be created if you prefer an object to group related operations.

class zyra.utils.JSONFileManager(file_path: str | None = None)[source]

Bases: object

Convenience wrapper for manipulating JSON files.

Parameters: file_path (str | None) – Optional path to a JSON file on disk. When provided, the file is read immediately and made available via self.data. When omitted, the instance can be used with read_json/write_json utility methods for ad-hoc file operations.

Examples

Read, update, and save:

jm = JSONFileManager("./data.json")
jm.data["foo"] = "bar"
jm.save_file()

Ad-hoc helpers without binding to a path:

jm = JSONFileManager()
data = jm.read_json("./input.json")
jm.write_json("./out.json", data)

read_file() → None[source]

Read the JSON file from disk into memory.

Returns: Populates self.data or sets it to None on error.
Return type: None

read_json(path: str)[source]

Read JSON from path and return the parsed object.

Returns None and logs on error.

save_file(new_file_path: str | None = None) → None[source]

Write the in-memory data back to disk.

Parameters: new_file_path (str, optional) – If provided, save to this path instead of overwriting the original.
Returns: This method returns nothing.
Return type: None

update_dataset_times(target_id: str, directory: str) → str[source]

Update start/end times for a dataset using directory image dates.

Parameters:

target_id (str) – Dataset identifier to match in the JSON payload.
directory (str) – Directory to scan for frame timestamps.

Returns:

Status message describing the outcome of the update.

Return type:

str

write_json(path: str, data) → None[source]

Write data as pretty JSON to path.

Logs on error and does not raise.

zyra.utils.remove_all_files_in_directory(directory: str) → None[source]

Remove all files and subdirectories under a directory.

Parameters: directory (str) – Directory to clean.
Returns: This function returns nothing.
Return type: None

Notes

Errors are reported via logging.error for consistency with the rest of the codebase.

Modules

Credential storage and retrieval utilities.

This module provides CredentialManager, a small helper class to securely load, access, and manage credentials from a .env-style file without leaking them into the global process environment.

Examples

Load and access values:

from zyra.utils.credential_manager import CredentialManager

cm = CredentialManager("./.env")
cm.read_credentials(expected_keys=["API_KEY"])
token = cm.get_credential("API_KEY")

Use as a context manager:

with CredentialManager("./.env") as cm:
    cm.read_credentials()
    do_work(cm.get_credential("ACCESS_TOKEN"))

class zyra.utils.credential_manager.CredentialManager(filename: str | None = None, namespace: str | None = None)[source]

Bases: object

Manage app credentials from a dotenv file.

Parameters:

filename (str, optional) – Path to a dotenv file containing key=value pairs.
namespace (str, optional) – Optional prefix to apply to all keys when stored/retrieved.

Examples

Namespaced keys:

cm = CredentialManager(".env", namespace="MYAPP_")
cm.read_credentials(expected_keys=["API_KEY"])  # expects MYAPP_API_KEY

add_credential(key: str, value: str) → None[source]

Add or update a credential value in memory.

Parameters:

key (str) – Base key name (namespace is applied automatically).
value (str) – Credential value to store.

clear_credentials() → None[source]: Remove all tracked credentials from memory.

delete_credential(key: str) → None[source]

Delete a credential by key if present.

Parameters: key (str) – Base key name (namespace is applied automatically).

get_credential(key: str) → str[source]

Retrieve a credential value by key (with namespace applied).

Parameters: key (str) – Base key name (namespace is applied automatically).
Returns: Stored value for the namespaced key.
Return type: str
Raises: KeyError – If the key is not present in memory.

list_credentials(expected_keys: Iterable[str] | None = None) → list[str][source]

List tracked credential keys, checking for expected ones when provided.

Parameters: expected_keys (Iterable[str], optional) – Keys to verify are present.
Returns: Keys currently stored in memory.
Return type: list of str
Raises: KeyError – If any expected keys are missing.

read_credentials(expected_keys: Iterable[str] | None = None) → None[source]

Read credentials from the dotenv file into memory.

Parameters:

expected_keys (Iterable[str], optional) – Keys that must be present; raises if any are missing.

Raises:

FileNotFoundError – If the dotenv path cannot be resolved.
KeyError – If any expected keys are missing after reading.

property tracked_keys: set[str]: Return the set of keys currently tracked in memory.

Date/time utilities for parsing, ranges, and frame calculations.

Provides DateManager for extracting timestamps from filenames, building date ranges from period specs (e.g., 1Y, 6M, 7D, 24H), and validating or interpolating time-based frame sequences.

Examples

Parse dates and compute a range:

from zyra.utils.date_manager import DateManager

dm = DateManager(["%Y%m%d"])
start, end = dm.get_date_range("7D")
ok = dm.is_date_in_range("frame_20240102.png", start, end)

class zyra.utils.date_manager.DateManager(date_formats: list[str] | None = None)[source]

Bases: object

High-level utilities for working with dates and filenames.

Parameters: date_formats (list of str, optional) – Preferred strftime-style formats to use when parsing dates from filenames (e.g., ["%Y%m%d"]).

Examples

Use a custom filename format first, then fall back to ISO-like detection:

dm = DateManager(["%Y%m%d%H%M%S"])
when = dm.extract_date_time("frame_20240101093000.png")

__init__(date_formats: list[str] | None = None) → None[source]: Optionally store preferred date formats for filename parsing.

calculate_expected_frames(start_datetime: datetime, end_datetime: datetime, period_seconds: int) → int[source]

Calculate expected frame count between two datetimes at a cadence.

Returns: Number of expected frames (inclusive of endpoints).
Return type: int

datetime_format_to_regex(datetime_format: str) → str[source]: Convert a datetime format string to a regex pattern.

extract_date_time(string: str) → str | None[source]

Extract a date string from a filename/text using known formats.

Tries known formats first; falls back to a simple ISO-like pattern.

extract_dates_from_filenames(directory_path: str, image_extensions: Iterable[str] = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.dds')) → tuple[str | None, str | None][source]

Extract dates from the first and last image file names in a directory.

Parameters:

directory_path (str) – Directory to scan for images.
image_extensions (Iterable[str]) – File extensions to include when scanning.

Returns:

(first_date, last_date) as strings, or (None, None).

Return type:

tuple

find_missing_frames(directory, period_seconds, datetime_format, filename_format, filename_mask, start_datetime, end_datetime)[source]: Find missing frames among the image files in a local directory whose frame period may be inconsistent.

find_missing_frames_and_predict_names(timestamps, period_seconds, filename_pattern)[source]: Find gaps and overfrequent frames in timestamps and predict names.

find_start_end_datetimes(directory: str)[source]: Find earliest and latest datetimes from filenames in a directory.

get_date_range(period: str) → tuple[datetime, datetime][source]

Compute a date range ending at the current minute from a period spec.

Parameters: period (str) – Period string such as "1Y", "6M", "7D", or "24H".
Returns: Start and end datetimes for the period ending at “now” (rounded to minute).
Return type: (datetime, datetime)

get_date_range_iso(iso_duration: str) → tuple[datetime, datetime][source]

Compute a date range ending now from an ISO-8601 duration (e.g., P1Y, P6M, P7D, PT24H).

Supports a subset of ISO-8601: years (Y), months (M), days (D), hours (H) with the “P…T…” structure. Examples: “P1Y”, “P6M”, “P7D”, “PT24H”.

is_date_in_range(filepath: str, start_date: datetime, end_date: datetime) → bool[source]

Check if a filename contains a date within a range.

Parameters:

filepath (str) – Path or filename containing a date stamp.
start_date (datetime) – Inclusive start of the permitted range.
end_date (datetime) – Inclusive end of the permitted range.

Returns:

True if a parsed date falls within the range, else False.

Return type:

bool

parse_timestamps_from_filenames(filenames, datetime_format)[source]: Parse timestamps from filenames based on the given format.

Filesystem helper utilities.

Lightweight helpers for common file and directory operations used across the project.

Examples

Clean a scratch directory:

from zyra.utils.file_utils import remove_all_files_in_directory

remove_all_files_in_directory("./scratch")

class zyra.utils.file_utils.FileUtils[source]

Bases: object

Namespace for file-related helper routines.

Examples

While most functions are provided at module-level, a class instance can be created if you prefer an object to group related operations.

zyra.utils.file_utils.remove_all_files_in_directory(directory: str) → None[source]

Remove all files and subdirectories under a directory.

Parameters: directory (str) – Directory to clean.
Returns: This function returns nothing.
Return type: None

Notes

Errors are reported via logging.error for consistency with the rest of the codebase.

Read, update, and write JSON files used as simple configs/datasets.

Provides JSONFileManager to persist simple JSON structures and to update dataset start/end times using dates inferred from a directory of frames.

Examples

Update a dataset time window:

from zyra.utils.json_file_manager import JSONFileManager

jm = JSONFileManager("./config.json")
jm.update_dataset_times("my-dataset", "./frames")
jm.save_file()

class zyra.utils.json_file_manager.JSONFileManager(file_path: str | None = None)[source]

Bases: object

Convenience wrapper for manipulating JSON files.

Parameters: file_path (str | None) – Optional path to a JSON file on disk. When provided, the file is read immediately and made available via self.data. When omitted, the instance can be used with read_json/write_json utility methods for ad-hoc file operations.

Examples

Read, update, and save:

jm = JSONFileManager("./data.json")
jm.data["foo"] = "bar"
jm.save_file()

Ad-hoc helpers without binding to a path:

jm = JSONFileManager()
data = jm.read_json("./input.json")
jm.write_json("./out.json", data)

read_file() → None[source]

Read the JSON file from disk into memory.

Returns: Populates self.data or sets it to None on error.
Return type: None

read_json(path: str)[source]

Read JSON from path and return the parsed object.

Returns None and logs on error.

save_file(new_file_path: str | None = None) → None[source]

Write the in-memory data back to disk.

Parameters: new_file_path (str, optional) – If provided, save to this path instead of overwriting the original.
Returns: This method returns nothing.
Return type: None

update_dataset_times(target_id: str, directory: str) → str[source]

Update start/end times for a dataset using directory image dates.

Parameters:

target_id (str) – Dataset identifier to match in the JSON payload.
directory (str) – Directory to scan for frame timestamps.

Returns:

Status message describing the outcome of the update.

Return type:

str

write_json(path: str, data) → None[source]

Write data as pretty JSON to path.

Logs on error and does not raise.

zyra.utils.cli_helpers.configure_logging_from_env(default: str = 'info') → None[source]

Set logging levels based on VERBOSITY env (supports ZYRA_*/DATAVIZHUB_*).

Values: debug|info|quiet. Defaults to ‘info’.

debug: root=DEBUG
info: root=INFO
quiet: root=ERROR (suppress most logs)

Also dials down noisy third-party loggers (matplotlib, cartopy, botocore, requests).
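The verbosity mapping can be sketched as follows. The variable names ZYRA_VERBOSITY and DATAVIZHUB_VERBOSITY are assumptions inferred from the prefixes mentioned above, as are the lookup order, the WARNING level applied to third-party loggers, and the helper names level_from_env and configure_logging_sketch.

```python
import logging
import os

_LEVELS = {"debug": logging.DEBUG, "info": logging.INFO, "quiet": logging.ERROR}


def level_from_env(default: str = "info", env=None) -> int:
    env = os.environ if env is None else env
    # Assumed variable names, based on the ZYRA_*/DATAVIZHUB_* prefixes.
    raw = env.get("ZYRA_VERBOSITY") or env.get("DATAVIZHUB_VERBOSITY") or default
    # Unknown values fall back to INFO in this sketch.
    return _LEVELS.get(raw.strip().lower(), logging.INFO)


def configure_logging_sketch(default: str = "info", env=None) -> None:
    logging.getLogger().setLevel(level_from_env(default, env))
    # Quieten chatty third-party loggers; WARNING is an assumed choice.
    for name in ("matplotlib", "cartopy", "botocore", "requests"):
        logging.getLogger(name).setLevel(logging.WARNING)


quiet_level = level_from_env(env={"ZYRA_VERBOSITY": "quiet"})
```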

zyra.utils.cli_helpers.detect_format_bytes(b: bytes) → str[source]

Detect basic format from magic bytes.

Returns one of: "netcdf", "grib2", or "unknown".

zyra.utils.cli_helpers.is_grib2_bytes(b: bytes) → bool[source]: Return True if bytes look like GRIB (b"GRIB").

zyra.utils.cli_helpers.is_netcdf_bytes(b: bytes) → bool[source]

Return True if bytes look like NetCDF (classic CDF or HDF5-based).

Recognizes magic headers:

Classic NetCDF: b"CDF"
NetCDF4/HDF5: b"HDF"
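Magic-byte sniffing of this kind can be sketched as below. The function names are hypothetical, and the exact offsets the real helpers inspect are not stated here. The sketch relies on the standard signatures: classic NetCDF files begin with b"CDF", HDF5 files begin with the byte 0x89 followed by b"HDF", and GRIB messages begin with b"GRIB".

```python
def looks_like_netcdf(b: bytes) -> bool:
    # Classic NetCDF starts with b"CDF"; HDF5-based NetCDF4 starts with the
    # HDF5 signature, whose first four bytes are 0x89 followed by b"HDF".
    return b[:3] == b"CDF" or b[:4] == b"\x89HDF"


def looks_like_grib(b: bytes) -> bool:
    return b[:4] == b"GRIB"


def sniff_format(b: bytes) -> str:
    if looks_like_netcdf(b):
        return "netcdf"
    if looks_like_grib(b):
        return "grib2"
    return "unknown"


detected = sniff_format(b"GRIB" + b"\x00" * 12)
```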

zyra.utils.cli_helpers.parse_levels_arg(val) → int | list[float][source]: Parse levels from int or comma-separated floats.

zyra.utils.cli_helpers.read_all_bytes(path_or_dash: str) → bytes[source]: Read all bytes from a path or ‘-’ (stdin).

zyra.utils.cli_helpers.sanitize_args(args: Iterable[str]) → list[str][source]: Return a sanitized copy of a command arg vector for logging.

zyra.utils.cli_helpers.sanitize_for_log(text: str) → str[source]

Redact secrets in URLs/headers for safe logging.

Redacts user:pass in URLs (scheme://user:pass@host)
Redacts common token/secret query params (token, signature, X-Amz-*, apikey, key, secret, password)
Redacts Authorization: Bearer tokens
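The three redaction rules listed above can be approximated with regular expressions. The patterns, the "***" placeholder, and the name redact are assumptions for illustration; the real function may match more cases or use a different placeholder.

```python
import re

_REDACTED = "***"
# scheme://user:pass@host
_URL_CREDENTIALS = re.compile(
    r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s:@]+:[^/\s@]+@"
)
# ?token=..., &X-Amz-Signature=..., and similar query parameters
_SECRET_PARAMS = re.compile(
    r"(?i)(?P<name>[?&](?:token|signature|x-amz-[a-z0-9-]+|apikey|key|secret|password)=)"
    r"[^&\s#]+"
)
_BEARER = re.compile(r"(?i)(?P<prefix>Authorization:\s*Bearer\s+)\S+")


def redact(text: str) -> str:
    text = _URL_CREDENTIALS.sub(
        lambda m: f"{m.group('scheme')}{_REDACTED}:{_REDACTED}@", text
    )
    text = _SECRET_PARAMS.sub(lambda m: m.group("name") + _REDACTED, text)
    text = _BEARER.sub(lambda m: m.group("prefix") + _REDACTED, text)
    return text


safe = redact("https://user:pw@host/path?token=abc&x=1")
```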

zyra.utils.cli_helpers.temp_file_from_bytes(data: bytes, *, suffix: str = '') → Iterator[str][source]: Write bytes to a NamedTemporaryFile and yield its path; delete on exit.
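A context manager with this contract can be sketched with the standard library. The name temp_path_from_bytes is hypothetical, and the choice of delete=False with explicit cleanup is an assumption made so that the path can be reopened on platforms that lock open temporary files.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temp_path_from_bytes(data: bytes, *, suffix: str = "") -> Iterator[str]:
    # delete=False lets callers reopen the file by path; cleanup happens below.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()
        yield tmp.name
    finally:
        tmp.close()
        try:
            os.remove(tmp.name)
        except FileNotFoundError:
            pass


with temp_path_from_bytes(b"abc", suffix=".bin") as path:
    saved_path = path
    with open(path, "rb") as handle:
        round_trip = handle.read()
```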

zyra.utils.io_utils.open_input(path_or_dash: str) → Iterator[BinaryIO][source]

Yield a readable binary file-like for path or ‘-’ (stdin) without closing stdin.

When path_or_dash is ‘-’, yields sys.stdin.buffer and does not close it on exit. Otherwise opens the given path and closes it when the context exits.
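The "do not close stdin" behaviour can be sketched with contextlib. The name open_input_sketch is hypothetical and the body is an illustration of the contract described above, not Zyra's source. The usage writes a small sample file so the example is self-contained.

```python
import os
import sys
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Iterator


@contextmanager
def open_input_sketch(path_or_dash: str) -> Iterator[BinaryIO]:
    if path_or_dash == "-":
        # Hand out stdin without wrapping it in "with", so it stays open.
        yield sys.stdin.buffer
        return
    with open(path_or_dash, "rb") as handle:
        yield handle


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "sample.bin")
    with open(sample, "wb") as out:
        out.write(b"payload")
    with open_input_sketch(sample) as stream:
        payload = stream.read()
```

An output counterpart would follow the same pattern with sys.stdout.buffer and 'wb' mode.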

zyra.utils.io_utils.open_input_file(path_or_dash: str) → BinaryIO[source]

Backward-compatible factory returning a readable binary stream.

When path_or_dash is ‘-’, returns sys.stdin.buffer; caller must NOT close it.
Otherwise returns an open file object in 'rb' mode; caller is responsible for closing it.

Prefer open_input (context manager) in new code to avoid leaking file descriptors and to ensure stdout/stdin are not accidentally closed.

zyra.utils.io_utils.open_output(path_or_dash: str) → Iterator[BinaryIO][source]

Yield a writable binary file-like for path or ‘-’ (stdout) without closing stdout.

When path_or_dash is ‘-’, yields sys.stdout.buffer and does not close it on exit. Otherwise opens the given path and closes it when the context exits.

zyra.utils.io_utils.open_output_file(path_or_dash: str) → BinaryIO[source]

Backward-compatible factory returning a writable binary stream.

When path_or_dash is ‘-’, returns sys.stdout.buffer; caller must NOT close it.
Otherwise returns an open file object in 'wb' mode; caller is responsible for closing it.

Prefer open_output (context manager) in new code to avoid leaking file descriptors and to ensure stdout/stdin are not accidentally closed.

zyra.utils.geo_utils.detect_crs_from_csv(df) → str | None[source]

zyra.utils.geo_utils.detect_crs_from_path(path: str, *, var: str | None = None) → str | None[source]

zyra.utils.geo_utils.detect_crs_from_xarray(ds) → str | None[source]

zyra.utils.geo_utils.to_cartopy_crs(crs: str | None)[source]

Return a Cartopy CRS object for an EPSG string, defaulting to PlateCarree.

Returns None if Cartopy is unavailable.

zyra.utils.geo_utils.warn_if_mismatch(input_crs: str | None, *, target_crs: str = 'EPSG:4326', reproject: bool = False, context: str = '') → None[source]

GRIB utilities used by connectors and managers.

This module centralizes protocol-agnostic helpers for working with GRIB2 index files (.idx), calculating byte ranges, and performing parallel multi-range downloads.

Notes

The .idx file path is assumed to be the GRIB file path with a .idx suffix appended, unless a path already ending in .idx is provided.
Pattern filtering uses regular expressions via re.search().

zyra.utils.grib.compute_chunks(total_size: int, chunk_size: int = 524288000) → list[str][source]

Compute contiguous byte ranges that partition a file.

The final range uses the file size as the inclusive end byte (matching the behavior used by nodd_fetch.py).

Parameters:

total_size (int) – Size of the file in bytes.
chunk_size (int, default 524288000, i.e. 500 MiB) – Upper bound for each chunk.

Returns:

Range header strings, e.g., ["bytes=0-1048575", ...].

Return type:

list of str
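The partitioning can be sketched as below. The name chunk_ranges and the input validation are assumptions; the final range deliberately ends at total_size rather than total_size - 1, following the note above about the file size being used as the inclusive end byte.

```python
def chunk_ranges(total_size: int, chunk_size: int = 500 * 1024 * 1024) -> list:
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("total_size and chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        end = start + chunk_size - 1
        if end >= total_size - 1:
            # Final chunk: use the file size itself as the inclusive end byte.
            end = total_size
        ranges.append(f"bytes={start}-{end}")
        start += chunk_size
    return ranges


parts = chunk_ranges(2500, 1000)
```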

zyra.utils.grib.ensure_idx_path(path: str) → str[source]

Return the .idx path for a GRIB file or pass through an explicit idx path.

Parameters: path (str) – The GRIB file path or .idx path.
Returns: path + '.idx' if path does not already end with .idx, otherwise returns path unchanged.
Return type: str

zyra.utils.grib.idx_to_byteranges(lines: list[str], search_regex: str) → dict[str, str][source]

Convert .idx lines plus a variable regex into HTTP Range headers.

Parameters:

lines (list of str) – Lines from a GRIB .idx file.
search_regex (str) – Regular expression to select desired GRIB lines (e.g., “PRES:surface”).

Returns:

Mapping of {"bytes=start-end": matching_idx_line} suitable for use as Range headers.

Return type:

dict
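The conversion can be sketched as follows, assuming the common wgrib2-style index layout in which each line reads "message:start_byte:date:variable:level:forecast:". The name idx_lines_to_ranges is hypothetical, and leaving the last message's range open-ended ("bytes=start-") is an assumption, since the index alone does not give the file size.

```python
import re


def idx_lines_to_ranges(lines: list, search_regex: str) -> dict:
    pattern = re.compile(search_regex)
    # The second colon-separated field of each line is the message start byte.
    starts = [int(line.split(":")[1]) for line in lines]
    ranges = {}
    for i, line in enumerate(lines):
        if not pattern.search(line):
            continue
        # A message ends one byte before the next one starts.
        end = str(starts[i + 1] - 1) if i + 1 < len(lines) else ""
        ranges[f"bytes={starts[i]}-{end}"] = line
    return ranges


idx_lines = [
    "1:0:d=2024010100:PRES:surface:anl:",
    "2:1000:d=2024010100:TMP:2 m above ground:anl:",
    "3:2500:d=2024010100:PRES:mean sea level:anl:",
]
surface_only = idx_lines_to_ranges(idx_lines, "PRES:surface")
all_pres = idx_lines_to_ranges(idx_lines, ":PRES:")
```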

zyra.utils.grib.parallel_download_byteranges(download_func: Callable[[str, str], bytes], key_or_url: str, byte_ranges: Iterable[str], *, max_workers: int = 10) → bytes[source]

Download multiple byte ranges in parallel and concatenate in input order.

Parameters:

download_func (Callable) – Function accepting (key_or_url, range_header) and returning bytes.
key_or_url (str) – The resource identifier for the remote object.
byte_ranges (Iterable[str]) – Iterable of Range header strings (e.g., “bytes=0-99”). Order matters and is preserved in the output concatenation.
max_workers (int, default=10) – Maximum number of worker threads.

Returns:

The concatenated payload of all requested ranges in the input order.

Return type:

bytes
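Order-preserving parallel downloads can be sketched with concurrent.futures, since Executor.map returns results in input order regardless of which worker finishes first. The names download_ranges and fake_download are hypothetical, and the in-memory BLOB stands in for a remote object.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def download_ranges(
    download_func: Callable[[str, str], bytes],
    key_or_url: str,
    byte_ranges: Iterable[str],
    *,
    max_workers: int = 10,
) -> bytes:
    ranges = list(byte_ranges)
    if not ranges:
        return b""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of the inputs, not completion order.
        parts = pool.map(lambda r: download_func(key_or_url, r), ranges)
        return b"".join(parts)


BLOB = bytes(range(256))  # stand-in for a remote object


def fake_download(key: str, range_header: str) -> bytes:
    start, end = range_header.split("=")[1].split("-")
    return BLOB[int(start): int(end) + 1]


result = download_ranges(fake_download, "demo", ["bytes=10-19", "bytes=0-4"])
```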

zyra.utils.grib.parse_idx_lines(idx_bytes_or_text: bytes | str) → list[str][source]

Parse a GRIB index payload into non-empty lines.

Parameters: idx_bytes_or_text (bytes or str) – Raw .idx file content.
Returns: The non-empty, newline-split lines of the index.
Return type: list of str
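The two small helpers in this module, deriving the .idx path and splitting an index payload, can be sketched together. The names idx_path_for and split_idx_payload are hypothetical, and decoding bytes as UTF-8 with replacement is an assumption about the encoding.

```python
def idx_path_for(path: str) -> str:
    # Pass an explicit .idx path through; otherwise append the suffix.
    return path if path.endswith(".idx") else path + ".idx"


def split_idx_payload(payload) -> list:
    if isinstance(payload, bytes):
        # Assumed encoding; index files are plain ASCII in practice.
        payload = payload.decode("utf-8", errors="replace")
    return [line for line in payload.splitlines() if line.strip()]


index_path = idx_path_for("gfs.t00z.pgrb2.0p25.f000")
index_lines = split_idx_payload(b"1:0:d=2024010100:PRES:surface:anl:\n\n2:1000:x\n")
```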