This concept was created to determine whether an in-house voice verification process would be feasible. The result is a FastAPI endpoint that takes two audio files, a recording of a short verification script and a recording of a longer cloning script, and outputs a score from 0 to 1 indicating how alike the two recordings are.
The process uses Resemble AI's Resemblyzer package to calculate the score.
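As a rough illustration of how such a score can be computed, the sketch below embeds each recording with Resemblyzer's `VoiceEncoder` and compares the two embeddings by cosine similarity. This is an assumption about the approach, not a copy of this project's `main.py`; the helper names `cosine_similarity` and `score_files` are hypothetical, and clamping the result to the 0 to 1 range is a choice made here for illustration.

```python
import math
from pathlib import Path


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero embedding carries no speaker information.
        return 0.0
    return dot / (norm_a * norm_b)


def score_files(verification_path, cloning_path):
    """Hypothetical helper: score two recordings against each other.

    Resemblyzer is imported here, not at module level, so the pure-Python
    similarity helper above can be used without the model installed.
    """
    from resemblyzer import VoiceEncoder, preprocess_wav

    encoder = VoiceEncoder()  # loads the pretrained speaker encoder
    embeddings = [
        encoder.embed_utterance(preprocess_wav(Path(path)))
        for path in (verification_path, cloning_path)
    ]
    score = float(cosine_similarity(embeddings[0], embeddings[1]))
    # Clamping is an assumption of this sketch, not documented behavior.
    return max(0.0, min(1.0, score))
```

Calling `score_files("verification.wav", "cloning.wav")` (placeholder file names) would return a single float for the pair of recordings.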
This project requires Python 3.11, the latest version that still supports some of the packages Resemblyzer depends on.
- Clone this repository and `cd` into it
- Install Python 3.11 (link to download)
- Install virtualenv
  - `python -m pip install --user virtualenv`
- Create a virtual Python environment
  - `virtualenv environment-name --python=python3.11`
- Activate the environment
  - `./environment-name/Scripts/activate` (Windows; on macOS or Linux, use `source environment-name/bin/activate`)
- Install the requirements
  - `pip install -r requirements.txt`
- Start the server
  - `fastapi run main.py`
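With the server running, a request from Python might look like the sketch below, which uses only the standard library. The route (`/verify`), the form field names (`verification`, `cloning`), and the JSON response are all hypothetical and should be checked against `main.py`; port 8000 is the FastAPI CLI default.

```python
import json
import uuid
from urllib import request


def build_multipart(files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode {field: (filename, data)} as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def request_score(url: str, verification_path: str, cloning_path: str):
    """POST both recordings and return the parsed JSON response.

    The field names below are assumptions, not taken from this project.
    """
    with open(verification_path, "rb") as f:
        verification = f.read()
    with open(cloning_path, "rb") as f:
        cloning = f.read()
    body, content_type = build_multipart({
        "verification": ("verification.wav", verification),  # hypothetical field
        "cloning": ("cloning.wav", cloning),                  # hypothetical field
    })
    req = request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `request_score("http://localhost:8000/verify", "verification.wav", "cloning.wav")` would send both files in one request, assuming that route exists.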
I tested this endpoint from a branch in Audiate, `voiceVerificationConcept`.
- Scores are deterministic
- My own voice scored accurately with a short verification script (5 s) and a longer cloning script (~1 min), both recorded on a laptop microphone. Scores ranged from 0.90 to 0.97, including recordings with different voice moods and tones.
- When tested against different voices, the highest score I was able to obtain was 0.70.
- Much more testing
- Test different languages
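The scores reported above (0.90 to 0.97 for the same voice, at most 0.70 for different voices) leave a gap in which an accept/reject threshold could sit. The sketch below uses the midpoint of that gap purely as an illustration. The threshold value and the `is_same_speaker` helper are hypothetical, and the small sample behind these numbers means any real threshold would need the additional testing listed above.

```python
SAME_SPEAKER_MIN = 0.90   # lowest same-voice score reported in this README
OTHER_SPEAKER_MAX = 0.70  # highest different-voice score reported in this README

# Hypothetical threshold: the midpoint of the observed gap, not a tuned value.
THRESHOLD = (SAME_SPEAKER_MIN + OTHER_SPEAKER_MAX) / 2


def is_same_speaker(score: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the similarity score meets the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return score >= threshold
```

With this midpoint, every same-voice score observed so far would be accepted and every different-voice score rejected.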