- [2025.10.26] π Our project introduction has been featured on DeepWiki!
 - [2025.10.16] π Our paper has been accepted by NeurIPS 2025 Efficient Reasoning Workshop!
 - [2025.10.13] πΈ Excited to have a tutorial video for AgentFlow covered by Discover AI on YouTube!
 - [2025.10.10] π Our X post received 1K+ likes! Feel free to check out the post and join the discussion! π¬
 - [2025.10.08] π₯ We are honored to be featured as π€ HuggingFace Daily Paper #2.
 
AgentFlow is a trainable, tool-integrated agentic framework designed to overcome the scalability and generalization limits of todayβs tool-augmented reasoning approaches.
Unlike prevailing approaches such as Search-R1 which train a single LLM to interleave reasoning steps with tool calls, AgentFlow introduces a modular agentic system with four specialized modules: π§ Planner, π Executor, β Verifier, and βοΈ Generator.
For effective planning and tool use, the framework directly optimizes planner agent within the system in an online fashion using Flow-based Group Refined Policy Optimization (Flow-GRPO), achieving superior performance across diverse domains with improved tool-calling reliability and long-horizon reasoning capabilities.
Excited to have a tutorial video for AgentFlow covered by Discover AI on YouTube!
- π§© Modular Agentic System β Four specialized agent modules (Planner, Executor, Verifier, Generator) that coordinate via evolving memory and integrated tools across multiple turns.
 - π Multi-Tool Integration β Seamlessly connect with diverse tool ecosystems, including 
base_generator,python_coder,google_search,wikipedia_search,web_search, and more. - π― Flow-GRPO Algorithm β Enables in-the-flow agent optimization for long-horizon reasoning tasks with sparse rewards.
 - π Proven Results β AgentFlow (7B Backbone) beats top baselines on 10 benchmarks, with +14.9% search, +14.0% agentic, +14.5% math, +4.1% science, even outperforming ~200B-parameter GPT-4o.
 
AgentFlow (Qwen-2.5-7B-Instruct Backbone) outperforms top baselines on 10 benchmarks:
- +14.9% on search
 - +14.0% on agentic reasoning
 - +14.5% on math
 - +4.1% on science
 
π‘ Even surpasses larger proprietary models like GPT-4o (~200B).
- Improved planning and decision-making
 - Enhanced tool-calling reliability
 - Positive scaling trends with model size & reasoning turns
 
Explore more in our paper or project page.
- βοΈ Setup
 - β‘ Quick Start on AgentFlow Inference
 - π₯ Quick Start on AgentFlow Flow-GRPO Training
 - π― AgentFlow Benchmark
 - π§© Use Your Own Model in AgentFlow
 - π€ Core Contributors
 - π Advisors
 - π Acknowledgements
 - π Contributing
 
- Python 3.11 (recommended)
 
bash setup.sh
source .venv/bin/activate
# (Optional) Install `parallel` for running benchmark experiments in parallel:
sudo apt-get update
sudo apt-get install parallelCopy the .env.template file from agentflow/.env.template and rename it to .env, then place it in the agentflow/ folder. Update the following variables with your own API keys:
OPENAI_API_KEY(for judging reasponse)GOOGLE_API_KEY(for Google Search tool)DASHSCOPE_API_KEY(for calling Qwen-2.5-7B-Instruct as engine for agents and tools)TOGETHER_API_KEY(alternative for calling Qwen-2.5-7B-Instruct as engine for agents and tools - recommended for international users)- More ways: serve Qwen2.5-7B-instruct model with vLLM (details refer to 
serve_vllm_local.md). 
Please check API Key Setup Guide for detailed instructions on how to obtain these keys.
cp agentflow/.env.template agentflow/.env
# Then edit agentflow/.env with your API keysAgentFlow provides a modular agentic system with four specialized modules (planner, executor, verifier, generator) that coordinate through evolving memory and a toolkit over multiple turns to solve complex reasoning tasks.
To quickly experience the system in action, run the command below (donβt forget to set up your API key):
python quick_start.pyHere is the content of quick_start.py:
# Import the solver
from agentflow.agentflow.solver import construct_solver
# Set the LLM engine name
llm_engine_name = "dashscope"
# Construct the solver
solver = construct_solver(llm_engine_name=llm_engine_name)
# Solve the user query
output = solver.solve("What is the capital of France?")
print(output["direct_output"])For effective planning and tool use, the framework directly optimizes the planner agent within the system in an online fashion using Flow-GRPO. Below is a quick start for training.
Before diving in, we recommend verifying that AgentFlow's tools, LLM engines, and network configuration are properly set up. See test_env.md for detailed testing instructions.
We mix two datasets for training: NQ (Natural Questions) for agentic search and DeepMath-103K for mathematical reasoning.
# train data
python data/get_train_data.py
# validation data
python data/aime24_data.pyAfter that, data dir should be:
data/
βββ train/
β   βββ combined_train.parquet (182,190 samples)
βββ val/
β   βββ aime24.parquet (30 samples)
βββ aime24_data.py
βββ get_train_data.py
Start agentflow training using Flow-GRPO with tmux:
# Create tmux session and start agentflow service (Window 0)
tmux new-session -s agentflow
bash train/serve_with_logs.sh
# Create new window (Ctrl+B then C) and start training (Window 1)
bash train/train_with_logs.shConfiguration:
All training hyperparameters are in train/config.yaml (model settings, tools, RL parameters, resources, etc.)
Logging: We provide a comprehensive logging to monitor training. See logs.md for more details.
Serve the trained planner model with VLLM (here we deploy our 7B Flow-GRPO planner model):
bash scripts/serve_vllm.shRun inference on benchmark tasks:
cd test
bash exp/run_all_models_all_datasets.shYou can find more benchmarking details in benchmark.md.
AgentFlow supports different LLM engines for each agent module. See llm_engine.md for supported models and factory.py for the corresponding model_string configuration:
Planner Agent:
- Modify the 
llm_engine_nameparameter intest/exp/run_all_models_all_datasets.sh 
Other Agents (Executor, Verifier, Generator):
- By default, these agents use a fixed LLM engine (Qwen-2.5-7B-Instruct via DashScope)
 - To use your own model, modify 
self.llm_engine_fixedinagentflow/agentflow/models/planner.py:19: 
self.llm_engine_fixed = create_llm_engine(model_string="your-engine", is_multimodal=False, temperature=temperature)and
- Modify the 
llm_engine_nameparameter in the Executor instantiation fromagentflow/agentflow/solver.py:232: 
# Instantiate Executor
executor = Executor(
    # llm_engine_name=llm_engine_name,
    llm_engine_name="dashscope",
    root_cache_dir=root_cache_dir,
    verbose=verbose,
    # base_url=base_url,
    temperature=temperature
)- For detailed information on supported engines and 
model_stringformats, seellm_engine.md 
        
             
            Zhuofeng Li  | 
    
        
             
            Haoxiang Zhang  | 
    
        
             
            Pan Lu  | 
        
             
            James Zou  | 
    
        
             
            Yejin Choi  | 
    
        
             
            Yu Zhang  | 
We thank the following open-source projects:
- verl for the excellent RL framework design.
 - vLLM for fast LLM inference support.
 - Verl-Tool and agent-lightning for their early-stage exploration in agentic RL Training.
 
We thank Lambda for GPU support!
We are truly looking forward to open-source contributions to AgentFlow! If youβre interested in contributing, collaborating, or reporting issues, please feel free to open an issue or submit a pull request (PR). You can also reach us at zhuofengli12345@gmail.com, isaacpfino@gmail.com, lupantech@gmail.com or join our Slack community: AgentFlow.
We are also looking forward to your feedback and suggestions!
@article{li2025flow,
  title={In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
  author={Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
  journal={arXiv preprint arXiv:2510.05592},
  year={2025}
}

  

