voiceflow

⭐️ Real-time Voice Interaction Framework based on Go ⭐️

English • 中文 (Chinese)

Introduction

voiceflow is an open-source project built with Go, designed to enable real-time voice interaction with Large Language Models (LLMs). By integrating various third-party voice platforms and local models, voiceflow supports real-time Speech-to-Text (STT), Text-to-Speech (TTS), and intelligent interaction with LLMs.

Core Features 🌟

- Real-time Speech-to-Text (STT): Integrates with multiple cloud STT services (e.g., Azure, Google) and local models to convert user speech into text in real time.
- LLM Interaction: Sends the recognized text directly to audio-capable LLMs to obtain intelligent responses.
- Text-to-Speech (TTS): Converts the LLM's text responses back into speech, supporting various TTS services (e.g., Azure, Google) and local models.
- Audio Storage & Access: Uses storage services such as MinIO to store generated audio files and provide access URLs for real-time playback on the frontend.
- Pluggable Service Integration: A modular design lets different STT services, TTS services, and LLMs be plugged in, which makes the framework easy to extend and customize. 🎉

Quick Start

Installation

Clone the Repository

git clone https://github.com/telepace/voiceflow.git
cd voiceflow

Install Dependencies

Ensure you have Go 1.16 or higher installed, then resolve the module dependencies:
```
go mod tidy
```

Configuration

Copy the Example Environment File

cp configs/.env.example configs/.env

Edit the .env file and fill in the appropriate configuration values:

# Example Environment Variables
MINIO_ENDPOINT=play.min.io        # Your MinIO server endpoint
MINIO_ACCESS_KEY=youraccesskey    # Your MinIO access key
MINIO_SECRET_KEY=yoursecretkey    # Your MinIO secret key
AZURE_STT_KEY=yourazuresttkey     # Your Azure Speech-to-Text service key
AZURE_TTS_KEY=yourazurettskey     # Your Azure Text-to-Speech service key
# Add other necessary keys (e.g., Google Cloud, OpenAI API keys) as needed
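
As a rough illustration of how these variables could be checked at startup, the sketch below reads a set of required names and reports every missing one in a single error. It assumes the values have already been exported into the process environment; reading the `.env` file itself is not shown. The `loadRequired` helper is hypothetical and is not part of voiceflow.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// loadRequired looks up each named variable with the supplied lookup function
// and returns the values, or an error listing every missing or empty name.
// The helper and its name are illustrative assumptions, not voiceflow code.
func loadRequired(lookup func(string) (string, bool), names ...string) (map[string]string, error) {
	values := make(map[string]string, len(names))
	var missing []string
	for _, name := range names {
		v, ok := lookup(name)
		if !ok || v == "" {
			missing = append(missing, name)
			continue
		}
		values[name] = v
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return values, nil
}

func main() {
	// Variable names are taken from the example .env file above.
	cfg, err := loadRequired(os.LookupEnv, "MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("MinIO endpoint:", cfg["MINIO_ENDPOINT"])
}
```

Passing the lookup function as a parameter keeps the check easy to exercise with a plain map instead of the real environment.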

Configure config.yaml

Edit configs/config.yaml according to your project requirements:

server:
  port: 8080          # Port the server will listen on
  enable_tls: false   # Set to true to enable TLS/SSL

minio:
  enabled: true       # Set to true to enable MinIO storage
  bucket_name: voiceflow-audio # Name of the MinIO bucket for audio files

stt: # Speech-to-Text Configuration
  provider: azure     # Options: azure, google, local (choose your STT provider)
  # Add provider-specific settings here if needed

tts: # Text-to-Speech Configuration
  provider: google    # Options: azure, google, local (choose your TTS provider)
  # Add provider-specific settings here if needed

llm: # Large Language Model Configuration
  provider: openai    # Options: openai, local (choose your LLM provider)
  # Add provider-specific settings here (e.g., API key, model name)

logging:
  level: info         # Logging level (e.g., debug, info, warn, error)
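
The provider options listed in the comments above lend themselves to a startup check. The following is a minimal sketch, assuming a flat `Config` struct whose field names are invented for illustration; it skips YAML parsing (which would need a third-party package) and only shows validation against the documented options.

```go
package main

import "fmt"

// Config mirrors some keys of the example config.yaml. The struct and its
// field names are assumptions; the project's real types may differ.
type Config struct {
	ServerPort  int
	STTProvider string
	TTSProvider string
	LLMProvider string
}

// allowed lists the option values named in the config.yaml comments.
var allowed = map[string][]string{
	"stt": {"azure", "google", "local"},
	"tts": {"azure", "google", "local"},
	"llm": {"openai", "local"},
}

func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// Validate reports the first setting that falls outside the documented options.
func (c Config) Validate() error {
	if c.ServerPort < 1 || c.ServerPort > 65535 {
		return fmt.Errorf("server.port %d is out of range", c.ServerPort)
	}
	checks := []struct{ key, value string }{
		{"stt", c.STTProvider},
		{"tts", c.TTSProvider},
		{"llm", c.LLMProvider},
	}
	for _, ch := range checks {
		if !contains(allowed[ch.key], ch.value) {
			return fmt.Errorf("%s.provider %q is not one of %v", ch.key, ch.value, allowed[ch.key])
		}
	}
	return nil
}

func main() {
	// Values copied from the example config.yaml above.
	cfg := Config{ServerPort: 8080, STTProvider: "azure", TTSProvider: "google", LLMProvider: "openai"}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```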

Start the Application

Run the following command in the project root directory:

go run cmd/main.go

Check if the service has started correctly by accessing http://localhost:8080 (or your configured port).
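
The same check can be scripted. The sketch below polls the address until it answers or a timeout passes. It assumes that any HTTP response below 500 means the server is up, since this README does not say what the root path returns; `waitUntil`, `httpProbe`, and `errNotReady` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNotReady marks a response that arrived but signalled a server-side failure.
var errNotReady = errors.New("service not ready")

// waitUntil calls probe every interval until it succeeds or timeout elapses.
func waitUntil(probe func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := probe()
		if err == nil {
			return nil
		}
		if !time.Now().Before(deadline) {
			return fmt.Errorf("gave up after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// httpProbe returns a probe that succeeds once the URL answers with a
// status below 500 (an assumption about what "started" means here).
func httpProbe(url string) func() error {
	client := &http.Client{Timeout: 2 * time.Second}
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return errNotReady
		}
		return nil
	}
}

func main() {
	// Port 8080 matches the default in the example config.yaml.
	err := waitUntil(httpProbe("http://localhost:8080"), 5*time.Second, 500*time.Millisecond)
	if err != nil {
		fmt.Println("service did not come up:", err)
		return
	}
	fmt.Println("service is reachable")
}
```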

Architecture Diagram

graph TD
    A["Frontend (Browser)"] --> B["WebSocket Server (Go Backend)"]
    B --> C["Speech-to-Text (STT) Module"]
    C --> D["Large Language Model (LLM) Module"]
    D --> E["Text-to-Speech (TTS) Module"]
    E --> F["Storage Service (e.g., MinIO)"]
    F -->|"Provides Audio URL"| B
    B -->|"Sends Audio URL/Data"| A

- Frontend (Browser): The user records voice input via the browser, sending audio data through a WebSocket connection to the server.
- WebSocket Server: Receives audio data from the frontend and orchestrates the workflow between different service modules.
- Speech-to-Text (STT) Module: Converts the incoming audio data into text.
- Large Language Model (LLM) Module: Processes the text from STT and generates an intelligent response.
- Text-to-Speech (TTS) Module: Converts the LLM's text response back into audio data.
- Storage Service (MinIO): Stores the generated audio files and provides accessible URLs for playback.
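
The flow described above can be sketched as a chain of four stages. This is an illustration only: the stage types, the `Pipeline` struct, `ErrEmptyAudio`, and the `.wav` suffix are assumptions, and the stubs in `main` stand in for real providers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrEmptyAudio is returned when there is nothing to transcribe.
var ErrEmptyAudio = errors.New("empty audio")

// Hypothetical stand-ins for the STT, LLM, TTS, and storage modules.
type (
	Transcriber func(audio []byte) (string, error)
	Responder   func(prompt string) (string, error)
	Synthesizer func(text string) ([]byte, error)
	Uploader    func(name string, audio []byte) (url string, err error)
)

// Pipeline wires the stages together in the order shown in the diagram.
type Pipeline struct {
	STT   Transcriber
	LLM   Responder
	TTS   Synthesizer
	Store Uploader
}

// Handle processes one utterance and returns the URL of the reply audio.
func (p Pipeline) Handle(id string, audio []byte) (string, error) {
	if len(audio) == 0 {
		return "", ErrEmptyAudio
	}
	text, err := p.STT(audio)
	if err != nil {
		return "", fmt.Errorf("stt: %w", err)
	}
	reply, err := p.LLM(text)
	if err != nil {
		return "", fmt.Errorf("llm: %w", err)
	}
	speech, err := p.TTS(reply)
	if err != nil {
		return "", fmt.Errorf("tts: %w", err)
	}
	url, err := p.Store(id+".wav", speech)
	if err != nil {
		return "", fmt.Errorf("storage: %w", err)
	}
	return url, nil
}

func main() {
	// Stub stages so the flow runs without any external service.
	p := Pipeline{
		STT: func(a []byte) (string, error) { return string(a), nil },
		LLM: func(s string) (string, error) { return strings.ToUpper(s), nil },
		TTS: func(s string) ([]byte, error) { return []byte(s), nil },
		Store: func(n string, a []byte) (string, error) {
			// Made-up URL; a real uploader would return the storage service's URL.
			return "http://localhost:9000/voiceflow-audio/" + n, nil
		},
	}
	url, err := p.Handle("reply-1", []byte("hello"))
	fmt.Println(url, err)
}
```

Wrapping each error with its stage name makes it clear in logs which module failed.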

Directory Structure

voiceflow/
├── cmd/
│   └── main.go              # Application entry point
├── configs/
│   ├── config.yaml          # Business logic configuration file
│   └── .env                 # Environment variables file (sensitive keys, etc.)
├── internal/
│   ├── config/              # Configuration loading module
│   ├── server/              # WebSocket server implementation
│   ├── stt/                 # Speech-to-Text module (interfaces, implementations)
│   ├── tts/                 # Text-to-Speech module (interfaces, implementations)
│   ├── llm/                 # LLM interaction module (interfaces, implementations)
│   ├── storage/             # Storage module (interfaces, implementations like MinIO)
│   ├── models/              # Data models/structs used across the application
│   └── utils/               # Utility functions
├── pkg/
│   └── logger/              # Logging module setup
├── scripts/                 # Build and deployment scripts (if any)
├── go.mod                   # Go modules file (dependencies)
├── go.sum                   # Go modules checksum file
└── README.md                # Project description (this file)

Core Modules

WebSocket Server
- Implemented using gorilla/websocket.
- Handles real-time communication with the frontend, receiving audio data and sending back processing results (like audio URLs).
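
To make "sending back processing results" concrete, here is a sketch of a JSON message the server could write to the socket. The `ResultMessage` shape and its field names are hypothetical; the actual wire format used by voiceflow is not described in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResultMessage is a hypothetical JSON envelope for results sent to the
// browser. It is not taken from the voiceflow source.
type ResultMessage struct {
	Type     string `json:"type"`                // "audio_url" or "error" in this sketch
	Text     string `json:"text,omitempty"`      // reply text from the LLM
	AudioURL string `json:"audio_url,omitempty"` // where the frontend can fetch the audio
	Error    string `json:"error,omitempty"`     // set only when Type is "error"
}

// encodeResult builds the payload for a successful reply.
func encodeResult(text, audioURL string) ([]byte, error) {
	return json.Marshal(ResultMessage{Type: "audio_url", Text: text, AudioURL: audioURL})
}

func main() {
	// The URL is a made-up example.
	payload, err := encodeResult("Hello!", "http://localhost:9000/voiceflow-audio/reply-1.wav")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(payload))
}
```

With gorilla/websocket, a payload like this would typically be sent with `conn.WriteMessage(websocket.TextMessage, payload)`.
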
Speech-to-Text (STT)
- Interface Definition: internal/stt/stt.go defines the standard interface for STT services.
- Pluggable Implementations: Supports various providers like Azure, Google Cloud Speech, and potentially local models. New providers can be added by implementing the interface.
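
One common way to make providers pluggable is a small registry keyed by the `provider` value from config.yaml. The sketch below assumes a single-method `STT` interface; the real interface in internal/stt/stt.go may look different, and `Register`, `New`, and `echoSTT` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// STT is a hypothetical single-method interface used only for this sketch.
type STT interface {
	Recognize(audio []byte) (string, error)
}

// Factory builds a provider instance.
type Factory func() STT

var registry = map[string]Factory{}

// Register makes a provider available under the given name.
func Register(name string, f Factory) { registry[name] = f }

// New returns the provider registered under name, or an error that lists
// the names that are available.
func New(name string) (STT, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown stt provider %q (registered: %v)", name, names())
	}
	return f(), nil
}

// names returns the registered provider names in sorted order.
func names() []string {
	out := make([]string, 0, len(registry))
	for n := range registry {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// echoSTT is a stand-in provider that treats the audio bytes as text.
type echoSTT struct{}

func (echoSTT) Recognize(audio []byte) (string, error) { return string(audio), nil }

func main() {
	Register("local", func() STT { return echoSTT{} })
	svc, err := New("local")
	if err != nil {
		fmt.Println(err)
		return
	}
	text, _ := svc.Recognize([]byte("hello"))
	fmt.Println(text)
}
```

The same pattern would apply to the TTS and LLM modules described below.
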
Text-to-Speech (TTS)
- Interface Definition: internal/tts/tts.go defines the standard interface for TTS services.
- Pluggable Implementations: Supports various providers like Azure, Google Cloud Text-to-Speech, and potentially local models.
Large Language Model (LLM)
- Interface Definition: internal/llm/llm.go defines the interface for interacting with LLMs.
- Pluggable Implementations: Supports providers like OpenAI (GPT models) and potentially local LLMs.
Storage Module
- Interface Definition: internal/storage/storage.go defines the interface for storage services.
- Implementation: Defaults to using MinIO for object storage (ideal for audio files) but can be adapted to use local file systems or other cloud storage providers.
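
As an illustration of adapting the storage module to a local file system, the sketch below implements a hypothetical `Storage` interface on top of a directory. The interface shape, the `LocalStorage` type, and the base URL are assumptions; serving the saved files over HTTP is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Storage is a hypothetical shape for the interface in internal/storage.
type Storage interface {
	Save(name string, data []byte) (url string, err error)
}

// LocalStorage writes files under Dir and builds URLs from BaseURL.
type LocalStorage struct {
	Dir     string
	BaseURL string
}

// resolve strips directory components from name so callers cannot write
// outside Dir, and returns the cleaned name and the full path.
func (s LocalStorage) resolve(name string) (string, string, error) {
	clean := filepath.Base(name)
	if clean == "." || clean == ".." || clean == string(filepath.Separator) {
		return "", "", fmt.Errorf("invalid file name %q", name)
	}
	return clean, filepath.Join(s.Dir, clean), nil
}

// Save stores data and returns the URL under which it would be served.
func (s LocalStorage) Save(name string, data []byte) (string, error) {
	clean, full, err := s.resolve(name)
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(s.Dir, 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(full, data, 0o644); err != nil {
		return "", err
	}
	return strings.TrimRight(s.BaseURL, "/") + "/" + clean, nil
}

// Read loads a previously saved file.
func (s LocalStorage) Read(name string) ([]byte, error) {
	_, full, err := s.resolve(name)
	if err != nil {
		return nil, err
	}
	return os.ReadFile(full)
}

func main() {
	var store Storage = LocalStorage{
		Dir:     filepath.Join(os.TempDir(), "voiceflow-audio"),
		BaseURL: "http://localhost:8080/audio/", // made-up route
	}
	url, err := store.Save("reply-1.wav", []byte("fake audio bytes"))
	fmt.Println(url, err)
}
```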

TODO

- Implement a Message Bus (e.g., Kafka, NATS) for better decoupling between services.
- Integrate a Configuration Center (e.g., Consul, etcd) for dynamic configuration management.
- Provide Containerized Deployment options (Dockerfile, docker-compose.yaml).
- Implement Hooks/Callbacks for extending functionality at various stages of the pipeline.

Contributing

We welcome contributions of any kind! Please read CONTRIBUTING.md for more information.

- Reporting Issues: If you find a bug or have a feature suggestion, please submit an issue on GitHub.
- Contributing Code: Fork the repository, make your changes on a separate branch, and submit a Pull Request.

License

voiceflow is licensed under the Apache License 2.0.

Acknowledgements

Thank you to all the developers who have contributed to this project!
