Description
As a developer integrating with the AI system
I want a generic LLM completion endpoint that accepts user input and streams responses
So that I can integrate AI capabilities into chat interfaces, tools, and other applications
Acceptance Criteria
Given the endpoint receives a user message
When I send a POST request to /api/completion
Then I should receive a streaming response from the LLM
Given the endpoint supports streaming
When the LLM generates a response
Then I should receive chunks of the response in real time via Server-Sent Events or a similar streaming mechanism (see the request/stream sketch after these criteria)
Given the endpoint is generic and reusable
When different applications call the endpoint
Then it should work consistently for chat interfaces, tools, and other integrations without application-specific logic
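For illustration, here is a minimal sketch of the request shape and of how a client might consume the stream, assuming the endpoint lives at /api/completion and returns an SSE-style body. The field names (message, config) and the TypeScript client below are assumptions, not a confirmed contract.

```typescript
// Illustrative request shape; field names are assumptions, not a confirmed contract.
interface CompletionRequest {
  message: string;
  config?: { model?: string; temperature?: number };
}

// Example client: POST the message, then read the streamed body chunk by chunk.
async function streamCompletion(
  req: CompletionRequest,
  onChunk: (text: string) => void,
): Promise<void> {
  const res = await fetch("/api/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok || !res.body) throw new Error(`Completion request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each decoded piece is a partial LLM response, e.g. one or more SSE "data:" lines.
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Usage: append chunks to a chat UI (or log them) as they arrive.
// streamCompletion({ message: "Hello" }, (text) => console.log(text));
```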
Technical Requirements
Accept JSON payload with user message and optional configuration
Integrate with the OpenAI API using the existing setup
Return streaming response to minimize perceived latency
Handle errors gracefully (API limits, network issues, etc.)
Include proper CORS headers for frontend integration
Log requests for monitoring and debugging (see the handler sketch after this list)
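As a rough sketch of how these requirements could fit together in a single handler, assuming an Express server and the official openai Node SDK; the model name, payload fields, and permissive CORS header are illustrative choices, not decisions made in this story.

```typescript
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post("/api/completion", async (req, res) => {
  const { message, config } = req.body ?? {};
  console.log(`[completion] request received, length=${message?.length ?? 0}`); // basic request logging

  if (typeof message !== "string" || message.length === 0) {
    res.status(400).json({ error: "message is required" });
    return;
  }

  // CORS + SSE headers so a browser frontend can consume the stream directly.
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  try {
    const stream = await openai.chat.completions.create({
      model: config?.model ?? "gpt-4o-mini", // illustrative default model
      messages: [{ role: "user", content: message }],
      stream: true,
    });

    // Forward each token to the client as an SSE data event to minimize perceived latency.
    for await (const chunk of stream) {
      const token = chunk.choices[0]?.delta?.content ?? "";
      if (token) res.write(`data: ${JSON.stringify({ token })}\n\n`);
    }
    res.write("data: [DONE]\n\n");
  } catch (err) {
    // Handle API limits / network issues gracefully instead of dropping the connection silently.
    console.error("[completion] upstream error:", err);
    res.write(`data: ${JSON.stringify({ error: "upstream LLM error" })}\n\n`);
  } finally {
    res.end();
  }
});

app.listen(3000);
```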
Definition of Done
Can receive a completion request
Successfully returns the streaming response
A pipeline service exists to set up the step-by-step process (1. kick off the memory evaluator, 2. create the HyDE document, 3. run the vector search, 4. send the original message to the LLM with the retrieved context, 5. stream back the results); a rough orchestration sketch follows
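A minimal sketch of what that pipeline orchestration could look like. All helper names below (evaluateMemory, createHydeDocument, vectorSearch, streamLlmWithContext) are hypothetical placeholders used only to show the step order, not existing APIs in this repo.

```typescript
// Hypothetical pipeline service wiring the five steps together.

type Chunk = { token: string };

async function* runCompletionPipeline(userMessage: string): AsyncGenerator<Chunk> {
  // 1. Kick off the memory evaluator.
  const memory = await evaluateMemory(userMessage);

  // 2. Create a HyDE (Hypothetical Document Embeddings) passage to improve retrieval.
  const hydeDoc = await createHydeDocument(userMessage);

  // 3. Run the vector search using the HyDE passage as the query.
  const contextDocs = await vectorSearch(hydeDoc);

  // 4. Send the original message to the LLM along with the retrieved context.
  // 5. Stream the results back to the caller as they arrive.
  for await (const token of streamLlmWithContext(userMessage, contextDocs, memory)) {
    yield { token };
  }
}

// Placeholder declarations so the sketch type-checks; real implementations would live elsewhere.
declare function evaluateMemory(msg: string): Promise<unknown>;
declare function createHydeDocument(msg: string): Promise<string>;
declare function vectorSearch(query: string): Promise<string[]>;
declare function streamLlmWithContext(
  msg: string,
  context: string[],
  memory: unknown,
): AsyncIterable<string>;
```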