datavizhub.utils package

Utilities used across DataVizHub (dates, files, images, credentials).

The package avoids importing optional heavy dependencies at import time, which keeps the CLI lightweight when only a subset of functionality is needed (e.g., the pipeline runner). Submodules can still be imported directly when required.

class datavizhub.utils.CredentialManager(filename: str | None = None, namespace: str | None = None)

Bases: object

Manage app credentials from a dotenv file.

Parameters:
  • filename (str, optional) – Path to a dotenv file containing key=value pairs.

  • namespace (str, optional) – Optional prefix to apply to all keys when stored/retrieved.

Examples

Namespaced keys:

cm = CredentialManager(".env", namespace="MYAPP_")
cm.read_credentials(expected_keys=["API_KEY"])  # expects MYAPP_API_KEY

add_credential(key: str, value: str) → None

Add or update a credential value in memory.

Parameters:
  • key (str) – Base key name (namespace is applied automatically).

  • value (str) – Credential value to store.

clear_credentials() → None

Remove all tracked credentials from memory.

delete_credential(key: str) → None

Delete a credential by key if present.

Parameters:

key (str) – Base key name (namespace is applied automatically).

get_credential(key: str) → str

Retrieve a credential value by key (with namespace applied).

Parameters:

key (str) – Base key name (namespace is applied automatically).

Returns:

Stored value for the namespaced key.

Return type:

str

Raises:

KeyError – If the key is not present in memory.
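The namespacing behaviour described for add_credential, delete_credential, and get_credential can be illustrated with a small stand-alone sketch. NamespacedStore and its method names are hypothetical and are not part of datavizhub; the sketch only assumes that the namespace is a plain string prefix and that values live in a dict.

```python
class NamespacedStore:
    """Hypothetical stand-in for an in-memory credential store with a key prefix."""

    def __init__(self, namespace=None):
        self._namespace = namespace or ""
        self._values = {}

    def _full_key(self, key):
        # Apply the namespace prefix to a base key name.
        return f"{self._namespace}{key}"

    def add(self, key, value):
        # Add or overwrite the value stored under the namespaced key.
        self._values[self._full_key(key)] = value

    def get(self, key):
        # Raises KeyError when the namespaced key is absent.
        return self._values[self._full_key(key)]

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._values.pop(self._full_key(key), None)


store = NamespacedStore(namespace="MYAPP_")
store.add("API_KEY", "secret")  # stored internally as MYAPP_API_KEY
```

Callers only ever pass the base key name ("API_KEY"); the prefix is applied in one place, so a second application can share the same dotenv file under a different namespace.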

list_credentials(expected_keys: Iterable[str] | None = None) → list[str]

List tracked credential keys, checking for expected ones when provided.

Parameters:

expected_keys (Iterable[str], optional) – Keys to verify are present.

Returns:

Keys currently stored in memory.

Return type:

list of str

Raises:

KeyError – If any expected keys are missing.

read_credentials(expected_keys: Iterable[str] | None = None) → None

Read credentials from the dotenv file into memory.

Parameters:

expected_keys (Iterable[str], optional) – Keys that must be present; raises if any are missing.

Raises:
  • FileNotFoundError – If the dotenv path cannot be resolved.

  • KeyError – If any expected keys are missing after reading.
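As a rough illustration of reading a dotenv file into memory rather than into the process environment, the sketch below parses key=value lines into a dict and checks for expected keys. load_dotenv_values is a hypothetical helper, and the parsing rules (comment handling, quote stripping) are assumptions; the actual CredentialManager may parse the file differently.

```python
import tempfile
from pathlib import Path


def load_dotenv_values(path, expected_keys=None):
    """Parse KEY=VALUE lines into a dict without modifying os.environ."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"dotenv file not found: {path}")
    values = {}
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without a separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumed convenience: drop surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    missing = sorted(set(expected_keys or ()) - set(values))
    if missing:
        raise KeyError(f"missing expected keys: {missing}")
    return values


# Demonstrate on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# demo\nAPI_KEY=abc123\nREGION='us-east'\n", encoding="utf-8")
    loaded = load_dotenv_values(env_path, expected_keys=["API_KEY"])
```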

property tracked_keys: set[str]

Return the set of keys currently tracked in memory.

class datavizhub.utils.DateManager(date_formats: list[str] | None = None)

Bases: object

High-level utilities for working with dates and filenames.

Parameters:

date_formats (list of str, optional) – Preferred strftime-style formats to use when parsing dates from filenames (e.g., ["%Y%m%d"]).

Examples

Use a custom filename format first, then fall back to ISO-like detection:

dm = DateManager(["%Y%m%d%H%M%S"])
when = dm.extract_date_time("frame_20240101093000.png")

__init__(date_formats: list[str] | None = None) → None

Optionally store preferred date formats for filename parsing.

calculate_expected_frames(start_datetime: datetime, end_datetime: datetime, period_seconds: int) → int

Calculate expected frame count between two datetimes at a cadence.

Returns:

Number of expected frames (inclusive of endpoints).

Return type:

int
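One plausible way to count frames inclusively is to divide the elapsed seconds by the period and add one for the starting frame. The function below is a sketch under that assumption; expected_frames is a hypothetical name, and the library's own rounding behaviour is not documented here.

```python
from datetime import datetime


def expected_frames(start, end, period_seconds):
    """Count frames at a fixed cadence, including both endpoints."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    span = (end - start).total_seconds()
    # Whole periods that fit in the span, plus one for the frame at `start`.
    return int(span // period_seconds) + 1


# One hour at a 10-minute cadence: 00:00, 00:10, ..., 01:00.
count = expected_frames(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0), 600)
```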

datetime_format_to_regex(datetime_format: str) → str

Convert a datetime format string to a regex pattern.
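A minimal sketch of such a conversion is shown below. format_to_regex and TOKEN_PATTERNS are hypothetical names, and the table covers only a handful of common strftime directives with assumed fixed widths; the real method may support more directives or emit different patterns.

```python
import re

# Assumed token widths; only a few common directives are covered.
TOKEN_PATTERNS = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def format_to_regex(datetime_format):
    """Translate a strftime-style format into a regex pattern string."""
    parts = []
    i = 0
    while i < len(datetime_format):
        token = datetime_format[i:i + 2]
        if token in TOKEN_PATTERNS:
            parts.append(TOKEN_PATTERNS[token])
            i += 2
        else:
            # Escape literal characters such as "-" or "_".
            parts.append(re.escape(datetime_format[i]))
            i += 1
    return "".join(parts)


pattern = format_to_regex("%Y%m%d")
match = re.search(pattern, "frame_20240102.png")
```

The resulting pattern can be passed to re.search to locate a date stamp inside a longer filename before handing the matched text to datetime.strptime.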

extract_date_time(string: str) → str | None

Extract a date string from a filename/text using known formats.

Tries known formats first; falls back to a simple ISO-like pattern.

extract_dates_from_filenames(directory_path: str, image_extensions: Iterable[str] = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.dds')) → Tuple[str | None, str | None]

Extract dates from the first and last image file names in a directory.

Parameters:
  • directory_path (str) – Directory to scan for images.

  • image_extensions (Iterable[str]) – File extensions to include when scanning.

Returns:

(first_date, last_date) as strings, or (None, None).

Return type:

tuple

find_missing_frames(directory, period_seconds, datetime_format, filename_format, filename_mask, start_datetime, end_datetime)

Find missing frames among the image files in a local directory whose frame period may be inconsistent.

find_missing_frames_and_predict_names(timestamps, period_seconds, filename_pattern)

Find gaps and over-frequent frames in a sequence of timestamps and predict the filenames of the missing frames.
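The gap-finding half of this idea can be sketched by walking a regular grid between the earliest and latest timestamps and recording the grid points that were not observed. missing_timestamps and the "frame_..." filename pattern are hypothetical, and detection of over-frequent frames is deliberately left out of this sketch.

```python
from datetime import datetime, timedelta


def missing_timestamps(timestamps, period_seconds):
    """Return timestamps absent from a regular grid spanning the observed range."""
    if not timestamps:
        return []
    step = timedelta(seconds=period_seconds)
    present = set(timestamps)
    current, last = min(timestamps), max(timestamps)
    missing = []
    while current <= last:
        if current not in present:
            missing.append(current)
        current += step
    return missing


observed = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 0, 30),
]
gaps = missing_timestamps(observed, 600)
# Assumed naming scheme, used only to show how names could be predicted.
predicted_names = [ts.strftime("frame_%Y%m%d%H%M%S.png") for ts in gaps]
```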

find_start_end_datetimes(directory: str)

Find earliest and latest datetimes from filenames in a directory.

get_date_range(period: str) → Tuple[datetime, datetime]

Compute a date range ending at the current minute from a period spec.

Parameters:

period (str) – Period string such as "1Y", "6M", "7D", or "24H".

Returns:

Start and end datetimes for the period ending at “now” (rounded to minute).

Return type:

(datetime, datetime)
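To make the period specification concrete, here is a sketch that parses strings such as "7D" or "24H" into a (start, end) pair. date_range and UNIT_TO_DELTA are hypothetical; the sketch assumes months and years are approximated as 30 and 365 days and that "rounded to minute" means truncating seconds, neither of which is confirmed by this reference.

```python
from datetime import datetime, timedelta

# Assumed unit lengths: months and years are approximated as 30 and 365 days.
UNIT_TO_DELTA = {
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
    "M": timedelta(days=30),
    "Y": timedelta(days=365),
}


def date_range(period, now=None):
    """Return (start, end) for a period such as "7D", ending at the current minute."""
    amount, unit = int(period[:-1]), period[-1].upper()
    if unit not in UNIT_TO_DELTA:
        raise ValueError(f"unsupported period unit: {unit!r}")
    # Truncate to the minute so repeated calls within a minute agree.
    end = (now or datetime.now()).replace(second=0, microsecond=0)
    return end - amount * UNIT_TO_DELTA[unit], end


# A fixed "now" keeps the example deterministic.
start, end = date_range("7D", now=datetime(2024, 1, 8, 12, 30, 45))
```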

is_date_in_range(filepath: str, start_date: datetime, end_date: datetime) → bool

Check if a filename contains a date within a range.

Parameters:
  • filepath (str) – Path or filename containing a date stamp.

  • start_date (datetime) – Inclusive start of the permitted range.

  • end_date (datetime) – Inclusive end of the permitted range.

Returns:

True if a parsed date falls within the range, else False.

Return type:

bool

parse_timestamps_from_filenames(filenames, datetime_format)

Parse timestamps from filenames based on the given format.

class datavizhub.utils.FileUtils

Bases: object

Namespace for file-related helper routines.

Examples

While most functions are provided at module-level, a class instance can be created if you prefer an object to group related operations.

class datavizhub.utils.JSONFileManager(file_path: str)

Bases: object

Convenience wrapper for manipulating JSON files.

Parameters:

file_path (str) – Path to a JSON file on disk.

Examples

Read, update, and save:

jm = JSONFileManager("./data.json")
jm.data["foo"] = "bar"
jm.save_file()

read_file() → None

Read the JSON file from disk into memory.

Returns:

This method returns nothing; it populates self.data, or sets it to None on error.

Return type:

None

save_file(new_file_path: str | None = None) → None

Write the in-memory data back to disk.

Parameters:

new_file_path (str, optional) – If provided, save to this path instead of overwriting the original.

Returns:

This method returns nothing.

Return type:

None
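The read and save semantics described above (None on a read error, optional alternate save path) can be sketched with the standard json module. read_json and save_json are hypothetical stand-alone functions, not the JSONFileManager API, and the choice of which exceptions count as a read error is an assumption.

```python
import json
import tempfile
from pathlib import Path


def read_json(path):
    """Return parsed JSON, or None if the file is unreadable or malformed."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None


def save_json(data, path, new_path=None):
    """Write data to new_path when given, otherwise overwrite path."""
    target = Path(new_path or path)
    target.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.json"
    original.write_text('{"foo": 1}', encoding="utf-8")
    data = read_json(original)
    data["foo"] = "bar"
    # Saving to a new path leaves the original file unchanged.
    copy_path = save_json(data, original, new_path=Path(tmp) / "copy.json")
    untouched = read_json(original)
    saved = read_json(copy_path)
    missing = read_json(Path(tmp) / "absent.json")
```

Because a failed read yields None rather than raising, callers should check the loaded data before indexing into it.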

update_dataset_times(target_id: str, directory: str) → str

Update start/end times for a dataset using directory image dates.

Parameters:
  • target_id (str) – Dataset identifier to match in the JSON payload.

  • directory (str) – Directory to scan for frame timestamps.

Returns:

Status message describing the outcome of the update.

Return type:

str

datavizhub.utils.remove_all_files_in_directory(directory: str) → None

Remove all files and subdirectories under a directory.

Parameters:

directory (str) – Directory to clean.

Returns:

This function returns nothing.

Return type:

None

Notes

Errors are reported via logging.error for consistency with the rest of the codebase.
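A directory-cleaning routine with this error-reporting style might look like the sketch below. clear_directory is a hypothetical name; the sketch assumes that the directory itself is kept, that symlinks are unlinked rather than followed, and that a failure on one entry should not stop the rest from being removed.

```python
import logging
import os
import shutil
import tempfile


def clear_directory(directory):
    """Delete every file and subdirectory inside directory, keeping directory itself."""
    # Snapshot the entries first so deletion does not disturb iteration.
    with os.scandir(directory) as it:
        entries = list(it)
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
        except OSError as exc:
            # Report the failure and continue with the remaining entries.
            logging.error("Failed to delete %s: %s", entry.path, exc)


# Demonstrate on a throwaway directory containing a file and a nested folder.
scratch = tempfile.mkdtemp()
os.makedirs(os.path.join(scratch, "nested"))
with open(os.path.join(scratch, "nested", "a.txt"), "w") as handle:
    handle.write("x")
with open(os.path.join(scratch, "b.txt"), "w") as handle:
    handle.write("y")
clear_directory(scratch)
remaining = os.listdir(scratch)
os.rmdir(scratch)
```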

Modules

Credential storage and retrieval utilities.

This module provides CredentialManager, a small helper class to securely load, access, and manage credentials from a .env-style file without leaking them into the global process environment.

Examples

Load and access values:

from datavizhub.utils.credential_manager import CredentialManager

cm = CredentialManager("./.env")
cm.read_credentials(expected_keys=["API_KEY"])
token = cm.get_credential("API_KEY")

Use as a context manager:

with CredentialManager("./.env") as cm:
    cm.read_credentials()
    do_work(cm.get_credential("ACCESS_TOKEN"))

class datavizhub.utils.credential_manager.CredentialManager(filename: str | None = None, namespace: str | None = None)

Bases: object

Manage app credentials from a dotenv file.

Parameters:
  • filename (str, optional) – Path to a dotenv file containing key=value pairs.

  • namespace (str, optional) – Optional prefix to apply to all keys when stored/retrieved.

Examples

Namespaced keys:

cm = CredentialManager(".env", namespace="MYAPP_")
cm.read_credentials(expected_keys=["API_KEY"])  # expects MYAPP_API_KEY

add_credential(key: str, value: str) → None

Add or update a credential value in memory.

Parameters:
  • key (str) – Base key name (namespace is applied automatically).

  • value (str) – Credential value to store.

clear_credentials() → None

Remove all tracked credentials from memory.

delete_credential(key: str) → None

Delete a credential by key if present.

Parameters:

key (str) – Base key name (namespace is applied automatically).

get_credential(key: str) → str

Retrieve a credential value by key (with namespace applied).

Parameters:

key (str) – Base key name (namespace is applied automatically).

Returns:

Stored value for the namespaced key.

Return type:

str

Raises:

KeyError – If the key is not present in memory.

list_credentials(expected_keys: Iterable[str] | None = None) → list[str]

List tracked credential keys, checking for expected ones when provided.

Parameters:

expected_keys (Iterable[str], optional) – Keys to verify are present.

Returns:

Keys currently stored in memory.

Return type:

list of str

Raises:

KeyError – If any expected keys are missing.

read_credentials(expected_keys: Iterable[str] | None = None) → None

Read credentials from the dotenv file into memory.

Parameters:

expected_keys (Iterable[str], optional) – Keys that must be present; raises if any are missing.

Raises:
  • FileNotFoundError – If the dotenv path cannot be resolved.

  • KeyError – If any expected keys are missing after reading.

property tracked_keys: set[str]

Return the set of keys currently tracked in memory.

Date/time utilities for parsing, ranges, and frame calculations.

Provides DateManager for extracting timestamps from filenames, building date ranges from period specs (e.g., 1Y, 6M, 7D, 24H), and validating or interpolating time-based frame sequences.

Examples

Parse dates and compute a range:

from datavizhub.utils.date_manager import DateManager

dm = DateManager(["%Y%m%d"])
start, end = dm.get_date_range("7D")
ok = dm.is_date_in_range("frame_20240102.png", start, end)

class datavizhub.utils.date_manager.DateManager(date_formats: list[str] | None = None)

Bases: object

High-level utilities for working with dates and filenames.

Parameters:

date_formats (list of str, optional) – Preferred strftime-style formats to use when parsing dates from filenames (e.g., ["%Y%m%d"]).

Examples

Use a custom filename format first, then fall back to ISO-like detection:

dm = DateManager(["%Y%m%d%H%M%S"])
when = dm.extract_date_time("frame_20240101093000.png")

__init__(date_formats: list[str] | None = None) → None

Optionally store preferred date formats for filename parsing.

calculate_expected_frames(start_datetime: datetime, end_datetime: datetime, period_seconds: int) → int

Calculate expected frame count between two datetimes at a cadence.

Returns:

Number of expected frames (inclusive of endpoints).

Return type:

int

datetime_format_to_regex(datetime_format: str) → str

Convert a datetime format string to a regex pattern.

extract_date_time(string: str) → str | None

Extract a date string from a filename/text using known formats.

Tries known formats first; falls back to a simple ISO-like pattern.

extract_dates_from_filenames(directory_path: str, image_extensions: Iterable[str] = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.dds')) → Tuple[str | None, str | None]

Extract dates from the first and last image file names in a directory.

Parameters:
  • directory_path (str) – Directory to scan for images.

  • image_extensions (Iterable[str]) – File extensions to include when scanning.

Returns:

(first_date, last_date) as strings, or (None, None).

Return type:

tuple

find_missing_frames(directory, period_seconds, datetime_format, filename_format, filename_mask, start_datetime, end_datetime)

Find missing frames among the image files in a local directory whose frame period may be inconsistent.

find_missing_frames_and_predict_names(timestamps, period_seconds, filename_pattern)

Find gaps and over-frequent frames in a sequence of timestamps and predict the filenames of the missing frames.

find_start_end_datetimes(directory: str)

Find earliest and latest datetimes from filenames in a directory.

get_date_range(period: str) → Tuple[datetime, datetime]

Compute a date range ending at the current minute from a period spec.

Parameters:

period (str) – Period string such as "1Y", "6M", "7D", or "24H".

Returns:

Start and end datetimes for the period ending at “now” (rounded to minute).

Return type:

(datetime, datetime)

is_date_in_range(filepath: str, start_date: datetime, end_date: datetime) → bool

Check if a filename contains a date within a range.

Parameters:
  • filepath (str) – Path or filename containing a date stamp.

  • start_date (datetime) – Inclusive start of the permitted range.

  • end_date (datetime) – Inclusive end of the permitted range.

Returns:

True if a parsed date falls within the range, else False.

Return type:

bool

parse_timestamps_from_filenames(filenames, datetime_format)

Parse timestamps from filenames based on the given format.

Filesystem helper utilities.

Lightweight helpers for common file and directory operations used across the project.

Examples

Clean a scratch directory:

from datavizhub.utils.file_utils import remove_all_files_in_directory

remove_all_files_in_directory("./scratch")

class datavizhub.utils.file_utils.FileUtils

Bases: object

Namespace for file-related helper routines.

Examples

While most functions are provided at module-level, a class instance can be created if you prefer an object to group related operations.

datavizhub.utils.file_utils.remove_all_files_in_directory(directory: str) → None

Remove all files and subdirectories under a directory.

Parameters:

directory (str) – Directory to clean.

Returns:

This function returns nothing.

Return type:

None

Notes

Errors are reported via logging.error for consistency with the rest of the codebase.

Read, update, and write JSON files used as simple configs/datasets.

Provides JSONFileManager to persist simple JSON structures and to update dataset start/end times using dates inferred from a directory of frames.

Examples

Update a dataset time window:

from datavizhub.utils.json_file_manager import JSONFileManager

jm = JSONFileManager("./config.json")
jm.update_dataset_times("my-dataset", "./frames")
jm.save_file()

class datavizhub.utils.json_file_manager.JSONFileManager(file_path: str)

Bases: object

Convenience wrapper for manipulating JSON files.

Parameters:

file_path (str) – Path to a JSON file on disk.

Examples

Read, update, and save:

jm = JSONFileManager("./data.json")
jm.data["foo"] = "bar"
jm.save_file()

read_file() → None

Read the JSON file from disk into memory.

Returns:

This method returns nothing; it populates self.data, or sets it to None on error.

Return type:

None

save_file(new_file_path: str | None = None) → None

Write the in-memory data back to disk.

Parameters:

new_file_path (str, optional) – If provided, save to this path instead of overwriting the original.

Returns:

This method returns nothing.

Return type:

None

update_dataset_times(target_id: str, directory: str) → str

Update start/end times for a dataset using directory image dates.

Parameters:
  • target_id (str) – Dataset identifier to match in the JSON payload.

  • directory (str) – Directory to scan for frame timestamps.

Returns:

Status message describing the outcome of the update.

Return type:

str