ReCall stands for Reasoning with Tool Call
Contributors: Mingyang Chen, Linzhuang Sun, Tianpeng Li, Yijie Zhou, Chenzheng Zhu, Fan Yang
GitHub Repo: https://github.com/Agent-RL/ReCall
Using LLMs as agents is a promising way to unleash their power, enabling them to operate independently over extended periods, as systems like OpenAI o3 and Deep Research do. Though the definition of an agent varies across contexts, the core idea of using LLMs as agents is to leverage various tools to accomplish complex tasks.
Recently, Reinforcement Learning (RL) has demonstrated significant effectiveness in enhancing LLMs' reasoning capabilities. However, most existing research focuses on strengthening models' inherent reasoning abilities while overlooking the crucial aspect of tool utilization. Yet reasoning is essential for effective tool use, especially when navigating the vast space of possible tool-use trajectories in complex task-solving scenarios.
In this work, we propose ReCall, a new framework for training LLMs to Reason with Tool Call via reinforcement learning. We also provide a novel perspective on generating synthetic data with diverse environments and complex multi-step tasks, which we use to teach LLMs to reason about tool use.
For synthetic environments, we leverage an advanced LLM (i.e., DeepSeek-R1) to generate executable Python code that the trained LLM can interact with. Instead of using a limited set of environments, we create a unique synthetic environment for each data point. This ensures maximum diversity in training scenarios, which we believe is essential for ReCall's ability to generalize to arbitrary user-provided tools in real-world usage.
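As a sketch of what this can look like in practice (the function names and the lack of sandboxing here are our illustrative assumptions, not ReCall's actual API), the generated Python source can be executed in an isolated namespace and its functions collected as the tool set for that single data point:

```python
import types

def build_environment(env_source: str) -> dict:
    """Execute LLM-generated Python code and collect its functions as callable tools.

    In practice this should run inside a sandbox; exec on untrusted code is unsafe.
    """
    namespace: dict = {}
    exec(env_source, namespace)
    return {
        name: obj
        for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType)
    }

# A tiny environment the generator model might emit for one data point.
env_source = '''
def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount of money using a fixed exchange rate."""
    return amount * rate
'''

tools = build_environment(env_source)
print(tools["convert_currency"](100.0, 7.2))  # 720.0
```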
In our pilot experiments, we trained ReCall on both Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct models, showing promising results in multi-turn tool usage scenarios. These encouraging findings warrant further investigation and development.
Given the lack of verified data and diverse executable environments for multi-step tool use on complex tasks, we deliberately design a prompt to synthesize training data for reinforcement learning. A synthetic data point includes the following components, shown in the example below.
An example of a synthetic data point.
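For illustration only, a data point of this kind might be represented as follows; the field names are our assumptions based on the description above, not ReCall's actual schema:

```python
# Illustrative shape of one synthetic data point (field names are assumptions).
synthetic_example = {
    # LLM-generated Python code defining the tools available for this example
    "environment": (
        "def get_population(city: str) -> int:\n"
        "    return {'Paris': 2_102_650, 'Berlin': 3_645_000}[city]\n"
    ),
    # a multi-step question that requires reasoned tool calls to solve
    "question": "Which city has the larger population, Paris or Berlin?",
    # a verifiable final answer used to compute the RL reward
    "answer": "Berlin",
}
```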
In this work, we employ Group Relative Policy Optimization (GRPO) as our learning algorithm for reinforcement learning. Unlike traditional rollouts that only involve text generation by LLMs, ReCall incorporates interactions between the LLM and the tool executor during the rollout process (i.e., rollout with tool call).
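To make the control flow concrete, here is a minimal sketch of a rollout with tool call under our own assumptions (the tag format, JSON call encoding, and function names are illustrative, not ReCall's exact implementation): generation alternates between the policy model and the tool executor until the model emits a final answer instead of a tool call.

```python
import json
import re

# Assumed tag format for tool invocations inside the model's output.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def rollout_with_tool_call(llm_generate, tools: dict, prompt: str, max_turns: int = 8) -> str:
    """Interleave LLM generation with tool execution inside one trajectory."""
    trajectory = prompt
    for _ in range(max_turns):
        # llm_generate is assumed to stop after a tool call or a final answer.
        completion = llm_generate(trajectory)
        trajectory += completion
        match = TOOL_CALL.search(completion)
        if match is None:
            break  # no tool call: the model produced its final answer
        call = json.loads(match.group(1))  # e.g. {"name": ..., "arguments": {...}}
        result = tools[call["name"]](**call["arguments"])
        # Feed the tool output back so the model can continue reasoning.
        trajectory += f"<tool_response>{result}</tool_response>"
    return trajectory
```

The completed trajectories are then scored, and GRPO normalizes each trajectory's reward against the other rollouts sampled for the same prompt to form its advantage. One natural design choice, though not specified here, is to mask the tool-response tokens out of the policy loss, since they are produced by the executor rather than the model.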