@anxkhn commented Oct 2, 2025

Currently, Ollama inference defaults to num_ctx=2048, which is too small for longer prompts and outputs. This results in truncated responses, as confirmed by the Ollama logs.

This PR updates our configuration and utility code to set num_ctx explicitly, preventing prompt and response cutoffs for detailed posts. Following Ollama's guidance, we set num_ctx to a higher default of 8192, which seems like a sweet spot that keeps compatibility with smaller machines; Ollama will automatically cap it at the model's maximum supported context size.

Changes:

  • Added ollama_num_ctx setting to config.toml (default: 8192).
  • Updated ollama_predict in leetcomp/utils.py to pass num_ctx from config (see the sketch below).
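For reference, a minimal sketch of the change. This is illustrative rather than the exact diff: it assumes the official ollama Python client and a simple ollama_predict signature, and the actual code in leetcomp/utils.py may load config differently or call the HTTP API directly.

```python
# Sketch only — config key and function signature are assumptions, not the exact repo code.
import tomllib  # stdlib TOML parser, Python 3.11+

import ollama  # official Ollama Python client

with open("config.toml", "rb") as f:
    config = tomllib.load(f)

# config.toml is assumed to contain: ollama_num_ctx = 8192
NUM_CTX = config.get("ollama_num_ctx", 8192)


def ollama_predict(model: str, prompt: str) -> str:
    # options.num_ctx overrides Ollama's 2048-token default for this request;
    # Ollama caps it at the model's maximum supported context size.
    response = ollama.generate(
        model=model,
        prompt=prompt,
        options={"num_ctx": NUM_CTX},
    )
    return response["response"]
```

The same options field ({"num_ctx": ...}) is accepted by the /api/generate and /api/chat HTTP endpoints, so the approach holds whether the code uses the client library or raw requests.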

Result:
Ensures longer prompts and responses are not truncated, without requiring per-model manual tuning.

