Agentic AI for Multi-Step Data Analysis [Whitepaper]

1. Introduction

In the rapidly evolving landscape of data analytics, the demand for sophisticated AI agents that can autonomously interpret complex queries, interact with diverse data sources, and generate accurate insights is paramount. Traditional single-agent systems often struggle with multi-step reasoning and dynamic tool utilization.

This whitepaper introduces a multi-agent architecture designed to overcome these challenges. Our system is composed of three distinct yet collaborative components: a Planner, an Executor, and a Feedbacker. This modular design enables a robust and transparent workflow, from initial query decomposition to final solution validation. We report state-of-the-art (SOTA) results for this architecture on the DABstep benchmark, including the highest hard-task accuracy among the agents compared.

The Challenge: Moving Beyond Single-Agent Systems

Modern data environments are complex, heterogeneous, and ever-changing. Answering a seemingly simple business question like "What is our top market for fraudulent transactions?" requires a multi-faceted approach. An agent must first understand the user's intent, identify the relevant data tables and fields, formulate a precise query, execute it, and then interpret the results correctly.

Most current agentic systems operate as a single, monolithic block, combining planning, execution, and reasoning. This can lead to a lack of transparency and difficulty in debugging when errors occur. Inspired by the need for specialization and clarity, we have developed a new paradigm based on a division of labor.

2. Our Multi-Agent Architecture

Our architecture deconstructs the analytical process into three specialized agent roles. This separation of concerns allows each agent to excel at its specific task, leading to a more efficient, accurate, and auditable system.

The overall workflow is designed as a pipeline, ensuring a structured and logical progression from query to validated answer.

Figure: High-level overview of the Planner, Executor, and Feedbacker architecture. The flowchart shows the AI agents and their tools: SQL NLQ Engine, Search Engine, Code Interpreter, and MCP Server.
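To make the pipeline shape concrete, the following is a minimal sketch of how the three roles could be chained. It is not the implementation described in this whitepaper: the function names, the `PipelineResult` type, and the toy stand-ins are all hypothetical, and a real system would back each role with an LLM and tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    query: str
    plan: List[str]
    answer: str
    reward: float


def run_pipeline(
    query: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[List[str]], str],
    feedbacker: Callable[[str, str], float],
) -> PipelineResult:
    """Chain the three roles: plan, then execute, then score."""
    plan = planner(query)               # Planner: query -> ordered steps
    answer = executor(plan)             # Executor: steps -> final answer
    reward = feedbacker(query, answer)  # Feedbacker: (query, answer) -> score
    return PipelineResult(query, plan, answer, reward)


# Toy stand-ins (assumptions) so the sketch runs without any model or database.
def toy_planner(query: str) -> List[str]:
    return [f"understand: {query}", "query data", "summarize result"]


def toy_executor(plan: List[str]) -> str:
    return f"completed {len(plan)} steps"


def toy_feedbacker(query: str, answer: str) -> float:
    return 1.0 if answer else 0.0


result = run_pipeline(
    "What is our top market for fraudulent transactions?",
    toy_planner,
    toy_executor,
    toy_feedbacker,
)
print(result.answer, result.reward)
```

Passing the roles in as plain callables keeps each one replaceable and testable on its own, which mirrors the separation of concerns described above.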

2.1 The Planner: The Master Strategist

The first point of contact for any user query is the Planner. Its primary role is not to answer the question directly, but to create a comprehensive, step-by-step blueprint for how to solve it. This involves a four-stage process:

Deconstruct the Question: The Planner breaks down the user's request into smaller, manageable sub-problems.
Extract Entities: It identifies key entities, metrics, and filters within the query (e.g., 'ip_country', 'fraud').
Explore and Constrain: The agent performs a preliminary exploration of the available data schemas to understand their structure, limitations, and constraints. This prevents the formulation of invalid or inefficient queries later on.
Outline the Solution Approach: The Planner creates an actionable plan that explicitly outlines which tools to use and in what sequence. It selects from a suite of available tools, including:
- Text-to-SQL Engine: For querying structured databases.
- Code Interpreter: For running Python code for complex transformations or calculations.
- Search Engine: For retrieving external information or context.
- MCP Server (Model Context Protocol Server): For interacting with custom data sources.
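One way to picture the Planner's output is as a small structured record that captures the four stages and the chosen tool sequence. The sketch below is illustrative only: the class names, field names, and the example plan are assumptions, not the schema used by the system described here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Tool(Enum):
    # The four tools named in the text; the string values are hypothetical.
    TEXT_TO_SQL = "text_to_sql"
    CODE_INTERPRETER = "code_interpreter"
    SEARCH_ENGINE = "search_engine"
    MCP_SERVER = "mcp_server"


@dataclass
class PlanStep:
    description: str
    tool: Tool


@dataclass
class Plan:
    question: str
    sub_problems: List[str]   # stage 1: deconstruct the question
    entities: Dict[str, str]  # stage 2: extract entities
    constraints: List[str]    # stage 3: explore and constrain
    steps: List[PlanStep] = field(default_factory=list)  # stage 4: solution approach

    def tool_sequence(self) -> List[str]:
        """Return the tools in the order the plan intends to use them."""
        return [step.tool.value for step in self.steps]


# Hypothetical plan for the example question used earlier in this paper.
plan = Plan(
    question="What is our top market for fraudulent transactions?",
    sub_problems=[
        "count fraudulent transactions per country",
        "pick the country with the highest count",
    ],
    entities={"dimension": "ip_country", "filter": "fraud"},
    constraints=["schema details are assumed here, not taken from a real table"],
    steps=[
        PlanStep("Aggregate fraud counts by ip_country", Tool.TEXT_TO_SQL),
        PlanStep("Rank the counts and select the top market", Tool.CODE_INTERPRETER),
    ],
)
print(plan.tool_sequence())  # ['text_to_sql', 'code_interpreter']
```

Keeping the plan as explicit data, rather than free text, is one way to make it auditable and easy to hand to the next agent.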

2.2 The Executor: The Diligent Worker

The Executor's task is to meticulously carry out the blueprint created by the Planner. It follows the ReAct (Reasoning and Acting) pattern, a continuous loop of thought and action. This is not a single action but a sequence of them, forming the core of its multi-step reasoning capability. For example, the result of a first action (such as fetching a list of top-selling products) becomes the context for a second thought and a subsequent action (such as analyzing the customer demographics for those specific products).

This iterative chain allows the Executor to break down complex analytical problems, handle dependencies between steps, and dynamically adapt its approach based on the output of its actions. If a SQL query returns an error or an unexpected result, the Executor can reason about the cause and attempt a different action, all while staying within the strategic confines of the original plan.
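The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Executor's actual code: the scripted `policy` stands in for the LLM's reasoning step, the two tools are toy functions with made-up data, and the table name `payments` is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]     # (tool name, tool input)
Step = Tuple[str, str, str]  # (tool name, tool input, observation)


def run_sql(query: str) -> str:
    # Toy SQL tool: rejects queries that do not use the ip_country column.
    if "ip_country" not in query:
        raise ValueError("no such column: country")
    return "NL: 120, BE: 80"  # made-up counts for illustration


def top_market(rows: str) -> str:
    # Toy code-interpreter tool: parse "name: count" pairs and return the largest.
    counts = {}
    for part in rows.split(","):
        name, value = part.split(":")
        counts[name.strip()] = int(value)
    return max(counts, key=counts.get)


def policy(history: List[Step]) -> Optional[Action]:
    # Scripted stand-in for the reasoning step: choose the next action from context.
    if not history:
        return ("sql", "SELECT country, COUNT(*) FROM payments WHERE fraud GROUP BY country")
    tool, _, observation = history[-1]
    if observation.startswith("ERROR"):
        # Reason about the failure and retry with the corrected column name.
        return ("sql", "SELECT ip_country, COUNT(*) FROM payments WHERE fraud GROUP BY ip_country")
    if tool == "sql":
        return ("python", observation)  # feed the SQL result into the next action
    return None  # nothing left to do: finish


def react_loop(policy_fn: Callable[[List[Step]], Optional[Action]],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> List[Step]:
    history: List[Step] = []
    for _ in range(max_steps):
        action = policy_fn(history)  # thought
        if action is None:
            break
        name, tool_input = action
        try:
            observation = tools[name](tool_input)  # action
        except Exception as exc:
            observation = f"ERROR: {exc}"          # errors become observations
        history.append((name, tool_input, observation))
    return history


trace = react_loop(policy, {"sql": run_sql, "python": top_market})
print(trace[-1][2])
```

In this toy run the first SQL action fails, the error is fed back as an observation, the second attempt succeeds, and its result becomes the input to the final ranking step. The `max_steps` cap is an added safeguard against unbounded loops.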

2.3 The Feedbacker: The Quality Inspector

Once the Executor produces a final answer, it is passed to the Feedbacker. This final agent acts as an automated quality assurance layer. The Feedbacker reads the final solution and the original query, then assigns a reward score based on its correctness, format, and relevance. This scoring mechanism is crucial for performance tracking, reinforcement learning, and error analysis.
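The text names three scoring criteria but does not say how they are combined. As one possible reading, the sketch below assumes each criterion is judged on a 0 to 1 scale and merged with a weighted average; the `Judgement` type, the `reward` function, and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Judgement:
    correctness: float  # does the answer solve the original query?
    formatting: float   # does it follow the expected answer format?
    relevance: float    # does it address what was actually asked?


def reward(j: Judgement, weights: Tuple[float, float, float] = (0.6, 0.2, 0.2)) -> float:
    """Combine the three criteria into one score in [0, 1] (assumed weighting)."""
    parts = (j.correctness, j.formatting, j.relevance)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("each criterion must be in [0, 1]")
    # Normalize by the weight total so the result stays in [0, 1].
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


# A correct and relevant answer with a partially wrong format.
score = reward(Judgement(correctness=1.0, formatting=0.5, relevance=1.0))
print(round(score, 2))
```

A single scalar like this is convenient as a reward signal, while keeping the three component scores around supports the error analysis mentioned above.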

3. Performance & Competitive Analysis

We evaluated our architecture on the DABstep benchmark, which tests an agent's ability to perform multi-step reasoning over diverse datasets. The results demonstrate the effectiveness of our multi-agent approach, not only in absolute terms but also in comparison to other leading industry agents.

Our agent achieves the highest score on both easy and hard tasks. Notably, its 41.01% accuracy on hard tasks outperforms the next-best competitor, showcasing the robustness of the Planner-Executor-Feedbacker design for complex, multi-step analytical challenges. While the Mphasis agent matches our performance on easy tasks, our architecture provides a distinct advantage as task difficulty increases.

The reference for our results can be accessed here.

Table: Performance comparison on the DABstep benchmark, listing easy-level and hard-level accuracy (%) for each agent and model.

Figure: Performance comparison on the DABstep benchmark. The bar chart shows easy-level (green) and hard-level (red) accuracy (%) for each agent and model.

Figure: The reference for our DABstep result. The screenshot shows the Amity DA Agent v0.1 submissions with task IDs, agent answers, and metadata columns.

4. Conclusion and Future Work

Our Planner-Executor-Feedbacker architecture represents a significant step forward in building intelligent, reliable, and transparent data analysis agents. By separating concerns, we empower each component to perform its function optimally, leading to state-of-the-art results that are highly competitive with major industry players.

Future work will focus on:

Enhancing the Planner's strategic capabilities for more complex problems through data synthesis and fine-tuning with the GRPO technique and Test-Time Reinforcement Learning, using a Grader built by human expert data analysts. (Ref1, Ref2)
Expanding the Executor's toolset to include more advanced statistical analysis and visualization libraries.


Collaborate and partner with our AI Lab at Amity Solutions here