ReCall stands for Reasoning with Tool Call
Contributors: Mingyang Chen, Linzhuang Sun, Tianpeng Li, Yijie Zhou, Chenzheng Zhu, Fan Yang
GitHub Repo: https://github.com/Agent-RL/ReCall
Using LLMs as agents is a promising way to unleash their power, enabling them to operate independently over extended periods, as systems like OpenAI o3 and Deep Research do. Though the definition of an agent varies across contexts, the core idea of using LLMs as agents is to leverage various tools to accomplish complex tasks.
Recently, Reinforcement Learning (RL) has demonstrated significant effectiveness in enhancing LLMs' reasoning capabilities. However, most existing research focuses on strengthening models' inherent reasoning abilities while overlooking the crucial aspect of tool utilization. Yet reasoning is essential for effective tool use, especially when navigating the vast space of possible tool-use trajectories in complex task-solving scenarios.
In this work, we propose ReCall, a new framework for training LLMs to Reason with Tool Call via reinforcement learning. We also provide a novel perspective on generating synthetic data with diverse environments and complex multi-step tasks, which we use to teach LLMs to reason about tool use.
For synthetic environments, we leverage an advanced LLM (i.e., DeepSeek-R1) to generate executable Python code that the trained LLM can interact with. Instead of using a limited set of environments, we create a unique synthetic environment for each data point. This ensures maximum diversity in training scenarios, which we believe is essential for ReCall's ability to generalize to arbitrary user-provided tools in real-world usage.
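As a sketch of what this can look like in practice (the function names and the lack of sandboxing here are our illustrative assumptions, not ReCall's actual API), the generated Python source can be executed in an isolated namespace and its functions collected as the tool set for that single data point:

```python
import types

def build_environment(env_source: str) -> dict:
    """Execute LLM-generated Python code and collect its functions as callable tools.

    In practice this should run inside a sandbox; exec on untrusted code is unsafe.
    """
    namespace: dict = {}
    exec(env_source, namespace)
    return {
        name: obj
        for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType)
    }

# A tiny environment the generator model might emit for one data point.
env_source = '''
def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount of money using a fixed exchange rate."""
    return amount * rate
'''

tools = build_environment(env_source)
print(tools["convert_currency"](100.0, 7.2))  # 720.0
```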
In our pilot experiments, we trained ReCall on both Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct models, showing promising results in multi-turn tool usage scenarios. These encouraging findings warrant further investigation and development.
Given the lack of verified data and diverse executable environments for multi-step tool use on complex tasks, we deliberately design a prompt to synthesize training data for reinforcement learning. A synthetic data point includes the following components, shown in the example below.
An example of a synthetic data point.
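For illustration only, a data point of this kind might be represented as follows; the field names are our assumptions based on the description above, not ReCall's actual schema:

```python
# Illustrative shape of one synthetic data point (field names are assumptions).
synthetic_example = {
    # LLM-generated Python code defining the tools available for this example
    "environment": (
        "def get_population(city: str) -> int:\n"
        "    return {'Paris': 2_102_650, 'Berlin': 3_645_000}[city]\n"
    ),
    # a multi-step question that requires reasoned tool calls to solve
    "question": "Which city has the larger population, Paris or Berlin?",
    # a verifiable final answer used to compute the RL reward
    "answer": "Berlin",
}
```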
In this work, we employ Group Relative Policy Optimization (GRPO) as our learning algorithm for reinforcement learning. Unlike traditional rollouts that only involve text generation by LLMs, ReCall incorporates interactions between the LLM and the tool executor during the rollout process (i.e., rollout with tool call).
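To make the control flow concrete, here is a minimal sketch of a rollout with tool call under our own assumptions (the tag format, JSON call encoding, and function names are illustrative, not ReCall's exact implementation): generation alternates between the policy model and the tool executor until the model emits a final answer instead of a tool call.

```python
import json
import re

# Assumed tag format for tool invocations inside the model's output.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def rollout_with_tool_call(llm_generate, tools: dict, prompt: str, max_turns: int = 8) -> str:
    """Interleave LLM generation with tool execution inside one trajectory."""
    trajectory = prompt
    for _ in range(max_turns):
        # llm_generate is assumed to stop after a tool call or a final answer.
        completion = llm_generate(trajectory)
        trajectory += completion
        match = TOOL_CALL.search(completion)
        if match is None:
            break  # no tool call: the model produced its final answer
        call = json.loads(match.group(1))  # e.g. {"name": ..., "arguments": {...}}
        result = tools[call["name"]](**call["arguments"])
        # Feed the tool output back so the model can continue reasoning.
        trajectory += f"<tool_response>{result}</tool_response>"
    return trajectory
```

The completed trajectories are then scored, and GRPO normalizes each trajectory's reward against the other rollouts sampled for the same prompt to form its advantage. One natural design choice, though not specified here, is to mask the tool-response tokens out of the policy loss, since they are produced by the executor rather than the model.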