Conversation

@anuj-tambwekar commented Dec 29, 2025

Issue overview

The current implementation of _stream uses the sync Gradient client, and no async streaming implementation is provided, so async streams still block under the hood.

This also causes a bug when creating a streaming agent with the DigitalOcean Gradient ADK: agents built with langchain_gradient buffer their entire response instead of streaming it.

For a minimal reproducible example, follow the Gradient ADK example with the following code:

Agent:

import os
import json
from langchain_gradient import ChatGradient
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
from typing import Dict
from gradient_adk import entrypoint

load_dotenv()

@entrypoint
async def main(input: Dict, context: Dict):
    """Entrypoint"""

    input_request = input.get("prompt")

    # model = ChatOpenAI(
    #     model="openai-gpt-4.1",
    #     base_url="https://inference.do-ai.run/v1",
    #     api_key=os.getenv("DIGITALOCEAN_INFERENCE_KEY"),
    #     streaming=True,
    # ) # This streams fine

    model = ChatGradient(
        model="openai-gpt-4.1",
        api_key=os.getenv("DIGITALOCEAN_INFERENCE_KEY"),
        streaming=True,
    ) # Outputs are not streamed out

    async for chunk in model.astream(input_request["messages"]):
        response_text = chunk.content
        yield json.dumps({"response": response_text}) + "\n"

Client Side:

import requests
import os
import json
from dotenv import load_dotenv

load_dotenv()


def stream_endpoint(url: str, body: dict, headers: dict | None = None, chunk_size: int = 1024):
    payload = json.dumps(body)
    with requests.post(url, data=payload, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # filter keep-alive chunks
                yield chunk

url = "http://localhost:8080/run" 

headers = {"Authorization": f"Bearer {os.getenv('DIGITALOCEAN_API_TOKEN')}"}

body = {"prompt": {"messages": "Tell me a joke involving computers, taxidermied mice, and cheese."}}

buffer = ""
for chunk in stream_endpoint(url, body=body, headers=headers):
    buffer += chunk.decode("utf-8")
    while "\n" in buffer:
        line, buffer = buffer.split("\n", 1)
        if not line.strip():
            continue
        response = json.loads(line)
        print(response["response"], end="", flush=True)
print()

When using the ChatGradient client, the entire response is generated before being streamed out; with the ChatOpenAI client, streaming works as expected.

Fix Details

This PR fixes the incompatibility by using the async Gradient client within _astream, making async streaming non-blocking.
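
For reference, here is a minimal sketch of the kind of _astream override involved. It assumes the Gradient SDK exposes an async client whose chat.completions.create(..., stream=True) call returns an async iterator of OpenAI-style chunks; self.async_client and self._convert_messages are hypothetical names used for illustration, not the actual attributes in langchain_gradient:

from typing import Any, AsyncIterator, List, Optional

from langchain_core.callbacks import AsyncCallbackManagerForLLMRun
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


async def _astream(
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
    # Open a streaming completion on the async client (hypothetical attribute name).
    stream = await self.async_client.chat.completions.create(
        model=self.model,
        messages=self._convert_messages(messages),  # hypothetical message converter
        stream=True,
        **kwargs,
    )
    # Yield each delta as it arrives instead of blocking the event loop.
    async for part in stream:
        delta = part.choices[0].delta if part.choices else None
        text = getattr(delta, "content", None) or ""
        chunk = ChatGenerationChunk(message=AIMessageChunk(content=text))
        if run_manager:
            await run_manager.on_llm_new_token(text, chunk=chunk)
        yield chunk

With a change along these lines, re-running the ADK example above should show chunks arriving incrementally with ChatGradient, matching the ChatOpenAI behavior.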

@dillonledoux left a comment

This looks reasonable to me; however, we'll probably need to wait until the code freeze lifts before merging it.
