Automatic Speech Recognition (ASR) project for the Inclusive Speech Technology course.
This project implements an end-to-end Automatic Speech Recognition system, featuring model training, fine-tuning, and evaluation components. The system is designed to transcribe speech audio into text with a focus on minimizing Word Error Rate (WER).
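WER counts the word-level edits (substitutions, deletions, insertions) needed to turn the reference transcript into the hypothesis, divided by the number of reference words. A minimal pure-Python sketch of the metric (the `wer` helper here is illustrative and not part of this repository):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` is 1/3: one substitution over three reference words.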
- experiments/: Scripts for running and analyzing ASR experiments; used to train the baseline model
  - run_experiment.sh: Main script for launching experiments
  - test_experiment.sh: Script for testing experiment results
  - train_model.py: Python script for model training
  - batch_scripts/: Contains multiple experiment batch scripts
  - hparams/: Hyperparameter configuration files
  - logs/: Experiment log files
- fine-tuning/: Components for fine-tuning pre-trained models
  - fine-tune.py: Fine-tuning implementation
  - fine-tune.sh: Shell script wrapper for fine-tuning
  - fine-tune.yaml: Configuration for fine-tuning parameters
- random_split/: Dataset splits for training and evaluation
  - split_stats.txt: Statistics about the dataset splits
  - train.csv: Training dataset
  - val.csv: Validation dataset
  - test.csv: Test dataset
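A random split like the one in random_split/ can be produced along these lines (a sketch, not the script that generated these CSVs; the fractions and seed are illustrative assumptions):

```python
import random

def random_split(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows with a fixed seed and partition into train/val/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed makes the split reproducible
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test
```

Each partition would then be written out (e.g. with the `csv` module) as train.csv, val.csv, and test.csv.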
- results/: Evaluation results and metrics
  - aggregated-wer-results.txt: Compiled Word Error Rate results
  - Results organized by model variant (baseline/, fine-tuning/, model_testing/)
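Aggregation across model variants amounts to averaging per-utterance (or per-run) WER scores grouped by variant. A minimal sketch of that step (the `aggregate_wer` helper and the `(variant, wer)` pair format are assumptions for illustration, not the repository's actual file format):

```python
def aggregate_wer(results):
    """Average WER per model variant from (variant, wer) pairs."""
    grouped = {}
    for variant, score in results:
        grouped.setdefault(variant, []).append(score)
    # Mean WER per variant, e.g. baseline vs. fine-tuning
    return {variant: sum(scores) / len(scores) for variant, scores in grouped.items()}
```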
- testing/: Model evaluation scripts
  - test_models.py: Python script for model evaluation
  - test_models.sh: Shell script wrapper for testing
- tokenizer/: Tools for tokenization of text data
  - tokenizer.yaml: Configuration for the tokenizer
  - train_tokenizer.py: Script to train custom tokenizers
  - train_tokenizer.sh: Shell wrapper for tokenizer training
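Training a tokenizer means learning a fixed vocabulary from the transcript corpus and mapping text to integer IDs. The repository's tokenizer is likely subword-based, but the idea can be shown with a minimal character-level sketch (this class is illustrative only):

```python
class CharTokenizer:
    """Minimal character-level tokenizer; id 0 is reserved for unknown characters."""

    def __init__(self, corpus):
        # Vocabulary: every character seen in the training corpus.
        chars = sorted(set("".join(corpus)))
        self.stoi = {c: i + 1 for i, c in enumerate(chars)}
        self.itos = {i: c for c, i in self.stoi.items()}

    def encode(self, text):
        """Map text to integer IDs; unseen characters become 0."""
        return [self.stoi.get(c, 0) for c in text]

    def decode(self, ids):
        """Map IDs back to text; 0 renders as <unk>."""
        return "".join(self.itos.get(i, "<unk>") for i in ids)
```

A real ASR tokenizer would instead learn subword units (e.g. BPE or unigram pieces) from the training transcripts, which keeps the vocabulary compact while covering rare words.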