VoxKit Features

Enabling cross-functional teams to conduct phonetic research with ease.

Mission

To accelerate discovery and improve care in speech-language pathology by providing an intuitive, transparent platform for advanced forced alignment, pronunciation assessment, and phonetic research, accessible to every researcher, not just programmers.

Bridging Complexity

Low-Level Abstraction

Abstract base class patterns allow for the rapid addition of new alignment toolkits and graphical components.
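As a sketch of what such a base class pattern might look like (the `AlignmentEngine` and `align` names are illustrative assumptions, not VoxKit's actual API):

```python
from abc import ABC, abstractmethod

class AlignmentEngine(ABC):
    """Hypothetical engine base class; VoxKit's real interface may differ."""

    name: str = "base"

    @abstractmethod
    def align(self, audio_path: str, transcript: str) -> list[tuple[str, float, float]]:
        """Return (label, start, end) tuples for one utterance."""

class DummyEngine(AlignmentEngine):
    """Toy subclass showing how a new toolkit would slot in."""

    name = "dummy"

    def align(self, audio_path: str, transcript: str):
        # Evenly spaces one interval per word -- placeholder logic only.
        words = transcript.split()
        return [(w, i * 0.5, (i + 1) * 0.5) for i, w in enumerate(words)]

engine = DummyEngine()
print(engine.align("clip.wav", "hello world"))
```

Because every engine satisfies the same abstract interface, the GUI and pipeline code can call `align` without knowing which toolkit is behind it.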

High-Level Configuration

Quickly adjust visual flow and guidance to fit the needs of specific research studies and user groups.

Layered Architecture

Clean separation between GUI, storage, engine, and analyzer layers enables independent development and testing of each component.

Metadata Tracking

Every dataset, model, and alignment stores provenance information for reproducibility and data sharing.
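A provenance record along these lines could be modeled as a simple dataclass (the field names here are hypothetical, chosen only to illustrate the idea):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class Provenance:
    """Hypothetical provenance record attached to a dataset, model, or alignment."""

    created_by: str       # who registered the artifact
    source_dataset: str   # which dataset it was derived from
    engine: str           # which speech engine produced it
    created_at: str = field(
        # Timestamp recorded automatically at creation, in UTC.
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = Provenance(created_by="researcher_1",
                    source_dataset="clinic_2024",
                    engine="mfa")
print(asdict(record))
```

Serializing the record with `asdict` makes it easy to store alongside the artifact for reproducibility and sharing.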

Speech Engines

VoxKit's modular API enables seamless integration of speech processing engines (toolkits), from established libraries to cutting-edge research tools.

Montreal Forced Aligner (MFA)

Production Ready

Industry-standard forced alignment with speaker-adaptive training. Achieves human-level reliability on diverse speech samples.

Tools:

  • Forced Alignment
  • Model Training

Wav2TextGrid (W2TG)

Production Ready

Alternative alignment engine implementing state-of-the-art Wav2Vec2-based phonetic alignment.

Tools:

  • Forced Alignment
  • Model Training

Faster Whisper

In Development

Lightweight automatic speech recognition (ASR) engine for transcription tasks, optimized for speed and efficiency.

Tools:

  • Transcription

UI Stackers

Multi-step workflows for common research tasks, from model training to alignment generation and pronunciation assessment. Stackers can be ordered, added, and removed to build custom pipelines.
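The stacker idea can be sketched as an ordered pipeline of steps, each passing its state to the next (the `Stacker` and `Pipeline` names are illustrative, not VoxKit's real interface):

```python
class Stacker:
    """Hypothetical pipeline step: a name plus a function over shared state."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, state):
        return self.fn(state)

class Pipeline:
    """Ordered collection of stackers that can be added to and removed from."""

    def __init__(self):
        self.stackers = []

    def add(self, stacker):
        self.stackers.append(stacker)
        return self  # allow chaining

    def remove(self, name):
        self.stackers = [s for s in self.stackers if s.name != name]

    def run(self, state=None):
        state = state or {}
        for s in self.stackers:  # executed strictly in order
            state = s.run(state)
        return state

pipe = Pipeline()
pipe.add(Stacker("select", lambda s: {**s, "dataset": "demo"}))
pipe.add(Stacker("align", lambda s: {**s, "aligned": True}))
print(pipe.run())
```

Removing or reordering stackers rebuilds the pipeline without touching the individual steps, which is what makes custom workflows cheap to assemble.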

Transcription

Generate text transcriptions from audio datasets using integrated ASR engines.

Workflow Steps:

  • Dataset selection
  • ASR engine configuration
  • Transcription execution

Training

Train custom acoustic models on your datasets with configurable hyperparameters.

Workflow Steps:

  • Dataset selection
  • Model configuration
  • Training execution

Predicting

Generate phoneme-level alignments using trained or pretrained models.

Workflow Steps:

  • Model selection
  • Dataset alignment
  • TextGrid generation
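The final step writes Praat TextGrid files; a minimal writer for a single interval tier might look like this (a simplified sketch of the standard long TextGrid format, not VoxKit's own code):

```python
def textgrid_text(intervals, xmax):
    """Build the text of a one-tier Praat TextGrid (long format, simplified).

    intervals: list of (label, start, end) tuples, assumed contiguous.
    """
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {xmax}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        '        name = "phones"',
        '        xmin = 0',
        f'        xmax = {xmax}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (label, start, end) in enumerate(intervals, 1):
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{label}"',
        ]
    return "\n".join(lines) + "\n"

tg = textgrid_text([("HH", 0.0, 0.1), ("AH", 0.1, 0.3)], 0.3)
print(tg.splitlines()[0])
```

The resulting text can be saved with a `.TextGrid` extension and opened directly in Praat for inspection.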

GOP Extraction

Extract Goodness of Pronunciation scores for pronunciation assessment and speech disorder analysis.

Workflow Steps:

  • Dataset selection
  • Alignment selection
  • GOP computation
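One common formulation of a GOP score is the duration-normalized average log posterior of the canonical phone over its aligned frames; a sketch of that computation (not necessarily the exact variant VoxKit implements):

```python
import math

def gop_score(frame_posteriors, canonical_phone):
    """Duration-normalized average log posterior of the canonical phone.

    frame_posteriors: one dict per frame mapping phone label -> posterior.
    Scores near 0 indicate confident, well-pronounced phones; large negative
    scores flag likely mispronunciations.
    """
    logs = [math.log(frame[canonical_phone]) for frame in frame_posteriors]
    return sum(logs) / len(logs)

# Two frames where the canonical phone "AE" dominates -> score near zero.
frames = [{"AE": 0.9, "EH": 0.1}, {"AE": 0.8, "EH": 0.2}]
print(gop_score(frames, "AE"))
```

In practice the per-frame posteriors come from the acoustic model's output over the frames that the alignment assigns to each phone, which is why GOP extraction depends on an alignment selection step.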

Dataset Analyzers

Extract structured metadata from datasets at registration time, enabling quality assurance and tailored visualization.

Default Analyzer

Extracts core dataset metadata, including file count and speaker count.

Analyzer Outputs:

  • CSV summary
  • Bar chart
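The default analyzer's counting logic could be sketched as follows (the `summarize` function and its `(filename, speaker_id)` record format are assumptions for illustration):

```python
import csv
import io
from collections import Counter

def summarize(records):
    """Hypothetical default analyzer over (filename, speaker_id) pairs."""
    speakers = Counter(spk for _, spk in records)
    summary = {"file_count": len(records), "speaker_count": len(speakers)}

    # Emit the CSV summary listed among the analyzer outputs.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(summary.keys())
    writer.writerow(summary.values())
    return summary, buf.getvalue()

summary, csv_text = summarize([("a.wav", "s1"), ("b.wav", "s1"), ("c.wav", "s2")])
print(summary)
```

The same summary dict could feed a bar chart of per-speaker file counts, matching the analyzer's second listed output.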

Custom Analyzers

An extensible analyzer system lets researchers add their own metadata extraction and visualizations.

Analyzer Outputs:

  • CSV Summary
  • Bar chart

Key Differentiators

For Non-Technical Researchers

Graphical interface eliminates command-line barriers while maintaining full control over analysis parameters and model configurations.

For Technical Teams

Extensible architecture with well-documented APIs enables integration of proprietary tools and custom analysis pipelines.

Ready to explore VoxKit's capabilities in depth?