Description
As a developer integrating with the AI system
I want a generic LLM completion endpoint that accepts user input and streams responses
So that I can integrate AI capabilities into chat interfaces, tools, and other applications
Acceptance Criteria
Given the endpoint receives a user message
When I send a POST request to /api/completion
Then I should receive a streaming response from the LLM
Given the endpoint supports streaming
When the LLM generates a response
Then I should receive chunks of the response in real time via Server-Sent Events or a similar streaming mechanism (see the request/stream sketch after these criteria)
Given the endpoint is generic and reusable
When different applications call the endpoint
Then it should work consistently for chat interfaces, tools, and other integrations without application-specific logic
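For illustration, here is a minimal sketch of the request shape and of how a client might consume the stream, assuming the endpoint lives at /api/completion and returns an SSE-style body. The field names (message, config) and the TypeScript client below are assumptions, not a confirmed contract.

```typescript
// Illustrative request shape; field names are assumptions, not a confirmed contract.
interface CompletionRequest {
  message: string;
  config?: { model?: string; temperature?: number };
}

// Example client: POST the message, then read the streamed body chunk by chunk.
async function streamCompletion(
  req: CompletionRequest,
  onChunk: (text: string) => void,
): Promise<void> {
  const res = await fetch("/api/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok || !res.body) throw new Error(`Completion request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each decoded piece is a partial LLM response, e.g. one or more SSE "data:" lines.
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Usage: append chunks to a chat UI (or log them) as they arrive.
// streamCompletion({ message: "Hello" }, (text) => console.log(text));
```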
Technical Requirements
Accept JSON payload with user message and optional configuration
Integrate with the OpenAI API using the existing setup
Return streaming response to minimize perceived latency
Handle errors gracefully (API limits, network issues, etc.)
Include proper CORS headers for frontend integration
Log requests for monitoring and debugging (see the handler sketch after this list)
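As a rough sketch of how these requirements could fit together in a single handler, assuming an Express server and the official openai Node SDK; the model name, payload fields, and permissive CORS header are illustrative choices, not decisions made in this story.

```typescript
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post("/api/completion", async (req, res) => {
  const { message, config } = req.body ?? {};
  console.log(`[completion] request received, length=${message?.length ?? 0}`); // basic request logging

  if (typeof message !== "string" || message.length === 0) {
    res.status(400).json({ error: "message is required" });
    return;
  }

  // CORS + SSE headers so a browser frontend can consume the stream directly.
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  try {
    const stream = await openai.chat.completions.create({
      model: config?.model ?? "gpt-4o-mini", // illustrative default model
      messages: [{ role: "user", content: message }],
      stream: true,
    });

    // Forward each token to the client as an SSE data event to minimize perceived latency.
    for await (const chunk of stream) {
      const token = chunk.choices[0]?.delta?.content ?? "";
      if (token) res.write(`data: ${JSON.stringify({ token })}\n\n`);
    }
    res.write("data: [DONE]\n\n");
  } catch (err) {
    // Handle API limits / network issues gracefully instead of dropping the connection silently.
    console.error("[completion] upstream error:", err);
    res.write(`data: ${JSON.stringify({ error: "upstream LLM error" })}\n\n`);
  } finally {
    res.end();
  }
});

app.listen(3000);
```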
Definition of Done
Can receive a completion request
Successfully returns the streaming response
A pipeline service exists to set up the step-by-step process (1. kick off the memory evaluator, 2. create the HyDE document, 3. run the vector search, 4. send the original message to the LLM with the retrieved context, 5. stream back the results); a rough orchestration sketch follows
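A minimal sketch of what that pipeline orchestration could look like. All helper names below (evaluateMemory, createHydeDocument, vectorSearch, streamLlmWithContext) are hypothetical placeholders used only to show the step order, not existing APIs in this repo.

```typescript
// Hypothetical pipeline service wiring the five steps together.

type Chunk = { token: string };

async function* runCompletionPipeline(userMessage: string): AsyncGenerator<Chunk> {
  // 1. Kick off the memory evaluator.
  const memory = await evaluateMemory(userMessage);

  // 2. Create a HyDE (Hypothetical Document Embeddings) passage to improve retrieval.
  const hydeDoc = await createHydeDocument(userMessage);

  // 3. Run the vector search using the HyDE passage as the query.
  const contextDocs = await vectorSearch(hydeDoc);

  // 4. Send the original message to the LLM along with the retrieved context.
  // 5. Stream the results back to the caller as they arrive.
  for await (const token of streamLlmWithContext(userMessage, contextDocs, memory)) {
    yield { token };
  }
}

// Placeholder declarations so the sketch type-checks; real implementations would live elsewhere.
declare function evaluateMemory(msg: string): Promise<unknown>;
declare function createHydeDocument(msg: string): Promise<string>;
declare function vectorSearch(query: string): Promise<string[]>;
declare function streamLlmWithContext(
  msg: string,
  context: string[],
  memory: unknown,
): AsyncIterable<string>;
```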