📚

High Level Terminology

The medium through which developers communicate capabilities to the principal investigator (PI)

VoxKit was created with a system in mind: one for setting up groundbreaking research. One component of that system is an abstracted terminology that forms the basis for an ecosystem (a shared ground) bridging the disconnect between the developer and the researcher. Here are those terms.

Engine

A set of audio processing tools adapted to a specific task (e.g., training, alignment, GOP extraction). Engines can group tools flexibly, but the precedent has been to group tools that belong to the same low-level library.

Analyzer

A function that extracts information about a dataset and provides a mechanism for visualizing it. Analyzers give the user feedback about their dataset, such as the distribution of audio durations or the number of speakers.
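To make the idea concrete, here is a minimal sketch of what an analyzer could look like. The names `analyze_durations` and `DurationReport` are hypothetical and not part of VoxKit; the text-based histogram stands in for whatever visualization mechanism the real analyzers use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DurationReport:
    """Result of the hypothetical duration analyzer."""

    histogram: dict  # bucket start (seconds) -> number of clips

    def render(self) -> str:
        """Text-based visualization: one row of '#' marks per bucket."""
        lines = []
        for start in sorted(self.histogram):
            count = self.histogram[start]
            lines.append(f"{start:>3}s+ | {'#' * count} ({count})")
        return "\n".join(lines)


def analyze_durations(durations, bucket_seconds=5):
    """Group clip durations (in seconds) into fixed-width buckets."""
    buckets = Counter(
        int(d // bucket_seconds) * bucket_seconds for d in durations
    )
    return DurationReport(histogram=dict(buckets))


# Example input: durations of five clips, in seconds (made-up values).
report = analyze_durations([1.2, 3.8, 4.9, 6.0, 12.5])
print(report.render())
```

The analyzer returns a small report object rather than printing directly, so the same extracted information could feed a plot, a table, or a log line.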

Stacker

A smaller workflow that can be used as part of a larger one. For example, a GOP exploration/analysis pipeline might require a forced alignment stacker that automates the annotation of audio files. Stacker classes are defined in code but can be reordered and enabled or disabled in the configuration file, which allows flexible workflow design without code changes once the coding step is complete.
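The split between code-defined stackers and config-driven ordering can be sketched as follows. Everything here is an assumption made for illustration: the `Stacker` base class, the `REGISTRY` lookup, and the `stacker`/`enabled` config keys are hypothetical and may not match VoxKit's actual classes or configuration schema.

```python
class Stacker:
    """Hypothetical base class for a small, reusable workflow."""

    name = "base"

    def run(self, data: dict) -> dict:
        raise NotImplementedError


class ForcedAlignmentStacker(Stacker):
    name = "forced_alignment"

    def run(self, data: dict) -> dict:
        # Placeholder: a real stacker would call an alignment engine here.
        return {**data, "aligned": True}


class GopExtractionStacker(Stacker):
    name = "gop_extraction"

    def run(self, data: dict) -> dict:
        # Placeholder: a real stacker would compute GOP scores here.
        return {**data, "gop_scores": []}


# Stackers are defined in code and looked up by name.
REGISTRY = {cls.name: cls for cls in (ForcedAlignmentStacker, GopExtractionStacker)}

# Stand-in for an entry in a configuration file (hypothetical schema).
# Reordering the list or flipping "enabled" changes the workflow
# without touching the classes above.
config = [
    {"stacker": "forced_alignment", "enabled": True},
    {"stacker": "gop_extraction", "enabled": False},
]


def run_stackers(config, data):
    """Run enabled stackers in the order the config lists them."""
    for entry in config:
        if not entry.get("enabled", True):
            continue
        data = REGISTRY[entry["stacker"]]().run(data)
    return data


result = run_stackers(config, {"audio": "clip.wav"})
print(result)  # {'audio': 'clip.wav', 'aligned': True}
```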

Pipeline

A named sequence of steps that implements a study-specific processing flow. Pipelines are defined in `config/pipeline_definitions.json` and specify the guidance and order of operations.
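As a rough illustration, a pipeline definition might be read like this. The JSON below is a hypothetical example, not the actual schema of `config/pipeline_definitions.json`; the keys `description`, `steps`, `name`, and `guidance`, as well as the helper `load_pipeline`, are assumptions.

```python
import json

# Hypothetical file contents; the real schema may differ.
PIPELINE_DEFINITIONS = """
{
  "gop_exploration": {
    "description": "Align audio, then extract GOP scores.",
    "steps": [
      {"name": "forced_alignment",
       "guidance": "Check that every clip has a transcript."},
      {"name": "gop_extraction",
       "guidance": "Review score distributions before export."}
    ]
  }
}
"""


def load_pipeline(definitions_text, pipeline_name):
    """Return the ordered list of steps for one named pipeline."""
    definitions = json.loads(definitions_text)
    if pipeline_name not in definitions:
        raise KeyError(f"Unknown pipeline: {pipeline_name}")
    return definitions[pipeline_name]["steps"]


steps = load_pipeline(PIPELINE_DEFINITIONS, "gop_exploration")
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step['name']}: {step['guidance']}")
```

Because a JSON array preserves order, the position of each step in the list doubles as the order of operations.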

Startup Routine

A coded routine that runs once on first launch to prepare the application's assets. In the future, startup behavior will be configurable in `config/startup.json`.
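One common way to get "runs once on first launch" behavior is a marker file, sketched below. This is an assumption about how such a routine could work, not a description of VoxKit's implementation; `run_startup_once` and the `.startup_complete` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def run_startup_once(app_dir: Path, prepare) -> bool:
    """Run `prepare` only if no marker file exists yet.

    Returns True if the routine ran, False if it was skipped.
    """
    marker = app_dir / ".startup_complete"  # hypothetical marker name
    if marker.exists():
        return False
    prepare(app_dir)
    marker.write_text("done\n", encoding="utf-8")
    return True


# Demonstration in a temporary directory standing in for the app folder.
with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp)
    calls = []
    first = run_startup_once(app_dir, lambda d: calls.append(d))
    second = run_startup_once(app_dir, lambda d: calls.append(d))
    print(first, second, len(calls))  # True False 1
```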

Configuration (config files)

Human-editable JSON/YAML files in the `config` folder that drive behavior without changing code. Research teams can customize definitions such as pipeline selection, step parameters, and study metadata.

Artifact & Provenance

Artifacts are outputs produced by steps (alignment files, feature arrays, plots). Provenance is metadata that records which pipeline, config version, component versions, and timestamps produced those artifacts; recording provenance is essential for reproducibility and auditability.
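A provenance record along these lines could be built as shown below. This is a sketch under stated assumptions: the function `build_provenance` and all field names are hypothetical, and a SHA-256 hash of the config is used here as a stand-in for the "config version", which VoxKit may track differently.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_provenance(pipeline, config, component_versions, artifact_bytes):
    """Assemble a metadata record describing how an artifact was produced."""
    # Sorting keys makes the hash independent of key order in the config.
    config_text = json.dumps(config, sort_keys=True)
    return {
        "pipeline": pipeline,
        "config_hash": hashlib.sha256(config_text.encode("utf-8")).hexdigest(),
        "component_versions": component_versions,
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


# Made-up example values for illustration only.
record = build_provenance(
    pipeline="gop_exploration",
    config={"bucket_seconds": 5, "language": "en"},
    component_versions={"alignment_engine": "0.0.0"},
    artifact_bytes=b"example alignment output",
)
print(json.dumps(record, indent=2))
```

Storing a hash of the artifact alongside the record lets a later audit confirm that a given file is the one the record describes.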