This is Vesper, a voice personal assistant that is supposed to get to know you and help you on a daily basis, based on the information he learns about you.
For now, this is only his shell: the AI is missing, but almost all the automation and principles are in place. He can hear you out, answer based on the topic or your emotions, and store the conversations. I plan to grow this project step by step as my understanding of AI and ML deepens.
Vesper has "ears" and a "voice" of his own. Your computer's mic has to be turned on for him to listen to you, but because he was created to hold conversations, short or long, you need to say one of the special wake-up words to make him receptive; otherwise, he won't listen. His voice shifts with the emotion he senses in your voice, so he will sound neutral, happy or sad along with you.
To keep his memory tidy, I split it in two: an audio memory saved in folders on your computer, which must follow the format specified in the How to use it (so far) section, and a written memory that uses a JSON file for a faster way of "remembering" things about the user. The folder-based memory is a little tricky because it is segmented by topics that are saved as "tags" inside the program (the user can change the tags as they want, the code is not set in stone), and Vesper responds according to the specific tag he recognizes.
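To give an idea of what the written memory could look like, here is a minimal sketch that uses only the standard library. The file name memory.json, the function names load_memory and remember, and the layout (one list of entries per tag) are assumptions made for illustration; they are not necessarily what Vesper's code does.

```python
import json
import os
import tempfile
from datetime import datetime


def load_memory(path):
    """Return the stored memory, or an empty dict if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def remember(path, tag, text):
    """Append one remembered line under a tag and write the file back."""
    memory = load_memory(path)
    entry = {"text": text, "time": datetime.now().isoformat(timespec="seconds")}
    memory.setdefault(tag, []).append(entry)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(memory, f, indent=2, ensure_ascii=False)
    return memory


# Demo in a throwaway folder so nothing is left behind on disk.
# The file name and the "music" tag are hypothetical.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "memory.json")
    remember(demo_path, "music", "I like jazz")
    remember(demo_path, "music", "I play the guitar")
    demo_memory = load_memory(demo_path)

print([entry["text"] for entry in demo_memory["music"]])
```

Grouping entries by tag keeps the JSON file aligned with the folder-based memory, so a recognized tag can be looked up in both places.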
- Have you ever looked at something and asked yourself: "Can I build that?" Well, Vesper is just a low-profile version of JARVIS, Siri, Bixby and all the chatbots that have a voice. I was curious about how they work and how I could actually build my own version.
- Along the way, I learned how to apply some logic and how to combine different human-like functions into one big project. This is my starting point for learning about the controversy around AI.
- Python
- Libraries/ Packages:
- textblob (for sentiment polarity in a text reply);
- datetime (for timestamps and for calculating the length in seconds of a reply or a pause in speech);
- pydub -> AudioSegment (for the audio path);
- os (for audio and storing);
- json (for the written memory);
- sounddevice (for the mic to record the voice of the user);
- vosk + wave (for transcribing the audio);
- soundfile (for writing Vesper's answer to an audio file);
- pyttsx3 (for Vesper's voice, of course);
- subprocess (for playing Vesper's recorded audio reply).
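As a rough illustration of how a sentiment polarity can pick one of Vesper's three tones, here is a small sketch. TextBlob reports polarity as a float between -1.0 and 1.0 through TextBlob(text).sentiment.polarity; the function name emotion_from_polarity and the 0.2 threshold are assumptions chosen for this example, not values taken from the project.

```python
def emotion_from_polarity(polarity, threshold=0.2):
    """Map a polarity in [-1.0, 1.0] to 'happy', 'sad' or 'neutral'.

    The threshold is a hypothetical value; tune it to taste.
    """
    if polarity > threshold:
        return "happy"
    if polarity < -threshold:
        return "sad"
    return "neutral"


# With textblob installed, the polarity would come from something like:
#     from textblob import TextBlob
#     polarity = TextBlob("I had a great day").sentiment.polarity
for sample in (0.8, 0.0, -0.5):
    print(sample, emotion_from_polarity(sample))
```

Keeping a neutral band around zero avoids switching the voice on replies that carry almost no sentiment.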
- First of all, make sure you have downloaded and installed all the packages mentioned above for smooth, undisturbed usage (datetime, os, json, wave and subprocess already ship with Python).
- ATTENTION!: the memory is structured using tags and interests, so make sure you follow these memory management rules:
- in the folder with the source code, add a folder named voice_cache
- your path for a known topic should be: voice_cache/memory/{tag}
- if you are talking about something that is not so important, your path should look like: voice_cache/temp
- in each folder, temp or tagged, there will be the recordings of Vesper talking to you
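The folder rules above can be sketched in a few lines. The helper names memory_folder and ensure_folder are hypothetical and only illustrate the layout; Vesper's own code may build the paths differently.

```python
import os

BASE_DIR = "voice_cache"  # folder that sits next to the source code


def memory_folder(tag=None, base_dir=BASE_DIR):
    """Return voice_cache/memory/{tag} for a known topic, voice_cache/temp otherwise."""
    if tag:
        return os.path.join(base_dir, "memory", tag)
    return os.path.join(base_dir, "temp")


def ensure_folder(tag=None, base_dir=BASE_DIR):
    """Create the folder if it is missing and return its path."""
    folder = memory_folder(tag, base_dir)
    os.makedirs(folder, exist_ok=True)
    return folder


print(memory_folder("music"))  # "music" is just an example tag
print(memory_folder())
```

Calling ensure_folder before saving a recording means the folders do not have to be created by hand for every new tag.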
- After you clone or download the repo, run the core.py file (as the name suggests, this is Vesper's heart).
- As soon as you run the program, small hints about its progress will be displayed, such as "I'm listening..." when Vesper is waiting for your reply.
- To start a conversation, please use one of the wake-up words:
- vesper
- hey assistant
- help
- what should I do
- I'm stuck
- After you are done talking, stop Vesper with one of the stopping words:
- that's all
- stop listening
- bye
- we're done
- we are done
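To show how the wake-up and stopping words could drive the conversation, here is a minimal sketch that works on an already transcribed sentence. The phrase lists come from this README; the function names contains_phrase and next_state, and the whole-word matching, are assumptions for illustration.

```python
import re

WAKE_WORDS = ("vesper", "hey assistant", "help", "what should i do", "i'm stuck")
STOP_WORDS = ("that's all", "stop listening", "bye", "we're done", "we are done")


def contains_phrase(transcript, phrases):
    """True if any phrase appears in the transcript as whole words."""
    text = " ".join(transcript.lower().split())
    for phrase in phrases:
        # The lookarounds keep "bye" from matching inside "goodbye".
        if re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text):
            return True
    return False


def next_state(listening, transcript):
    """Return whether Vesper should be receptive after hearing the transcript."""
    if not listening and contains_phrase(transcript, WAKE_WORDS):
        return True
    if listening and contains_phrase(transcript, STOP_WORDS):
        return False
    return listening


listening = False
for heard in ("nice weather today", "hey Vesper", "tell me a joke", "ok, that's all"):
    listening = next_state(listening, heard)
    print(repr(heard), "->", "listening" if listening else "idle")
```

Stopping words are only checked while Vesper is receptive, so saying "bye" to someone else in the room does nothing when he is idle.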
- Working with libraries and packages; linking the files so they respond as one
- Breaking a big, dreamy project idea into something doable, and structuring plans and files into modules
- Creating a voice is harder than almost anything else; the pitch that conveys the emotion behind a plain text is powerful
- Paying attention when working with physical memory, even in folder format; things get messy fast
- At some point, because I didn't link the files properly, I created a digital parrot. I know it must sound funny, but a single line of code gave me hours of struggle.
- The sensitivity of the mic and the permissions are important. The audio transcription accuracy was already a challenge because of my accent, but when the sensitivity of my own mic created new obstacles, I realised how important some minor aspects could actually be.
- Balancing emotional tone with technical limitations (I am still not 100% content with this). Working with free libraries and packages is sometimes not the most professional way to do things. The assistant's voice sounds pretty good in neutral or sad tones, but... well, let's just say that when he is happy, he is another person.
- Some audio packages clashed with my computer's settings and permissions: the program crashed or even froze, and I wasn't able to save Vesper's audio responses in folders. That's how the audio_config.py file was born.