Quick Recap
First, a quick recap on the steps included in a typical RAG Query:
🔢Pre-analyze question
🔍 Retrieval
🤖 Generation
In the Pre-analyze and Retrieval steps, we prepare the LLM prompt that is then used in the Generation step to answer the user's question.
With tool calling now available on many LLMs, we can hand the Pre-analyze and Retrieval steps directly to the LLM.
Setting Up a Retrieval Agent
Let's re-build our prompt from the previous posts:
SYSTEM PROMPT:
You are a helpful agent that helps users find information in a document database. Use the provided query tool to search for information that answers the user's question.

Question:
Hey RAG-Agent, what were the earnings in 2024 and how do they compare to the earnings of 2023?

Available categories are:
- Finance
- HR
- IT Support
Available Languages are:
- EN
- FR
- DE

Available Tools:
queryForInformation
- string: query
- string: language
- string: category
(tool config details are spared here for readability)
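For readers who want to see what such a config can look like, here is a hedged sketch in the JSON-schema style used by OpenAI-compatible chat APIs. The exact field names depend on your LLM provider; this object is an illustration, not the configuration elided above.

```javascript
// Hypothetical tool definition in the JSON-schema style used by
// OpenAI-compatible chat APIs; field names may differ per provider.
const queryToolConfig = {
  type: 'function',
  function: {
    name: 'queryForInformation',
    description: 'Search the document database for text chunks matching a query.',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query text' },
        language: { type: 'string', enum: ['EN', 'FR', 'DE'] },
        category: { type: 'string', enum: ['Finance', 'HR', 'IT Support'] }
      },
      required: ['query', 'language', 'category']
    }
  }
};
```

Using `enum` for language and category lets the model see the valid values directly in the tool schema, which already reduces hallucinated parameters.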
We now need to implement our queryForInformation function.
const API_BASE = 'https://portal.keysemantics.ai/query';
const API_KEY = '<your-api-key-here>'; // Replace with your actual API key

async function queryForInformation(query, language, category) {
  const url = `${API_BASE}/query/agents/seek?query=${encodeURIComponent(query)}&language=${encodeURIComponent(language)}&category=${encodeURIComponent(category)}`;

  try {
    const response = await fetch(url, {
      method: 'GET',
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Accept': 'text/plain'
      }
    });

    if (!response.ok) {
      throw new Error(`HTTP error ${response.status}: ${response.statusText}`);
    }

    const matchingChunks = await response.text();
    return matchingChunks;
  } catch (error) {
    console.error('Error querying information:', error.message);
    throw error;
  }
}

This code snippet calls the KeySemantics Seek API, which returns matching text chunks. The backend could just as well be any vector DB or search engine.
Inside a Retrieval Agent Query
Hey RAG-Agent, what were the earnings in 2024 and how do they compare to the earnings of 2023?
🤖 RAG-Agent:
1. 🔍 queryForInformation("Annual Report 2024", "EN", "Finance")
-- ...This is the Annual Report for the financial year of 2023/2024 ....
2. 🔍 queryForInformation("Annual Report 2024 earnings", "EN", "Finance")
-- ... the earnings of 2024 were 42 Mio ....
3. 🔍 queryForInformation("Annual Report 2023 earnings", "EN", "Finance")
-- ... the earnings of the 2022/2023 fiscal year were 40 Mio which was an increase of...
4. 🔍 queryForInformation("earnings 2023 2024 comparison", "EN", "Finance")
-- ...the earnings of 2024 were 5% higher compared to 2023...
Answer: The earnings of 2024 were 42 Mio which is an increase of 5% compared to the previous year (40 Mio).
The RAG-Agent will continue to call our queryForInformation function until it decides that the information it has gathered is enough to confidently answer the question.
A very interesting effect here is that the agent learns more about the available data while scanning the results: subsequent search queries are refined based on words found in previously retrieved chunks, often leading to better-formulated queries.
Handing the query capability to the LLM eliminates the need to pre-analyze the user's question because the agent will interpret the question and form queries autonomously.
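Conceptually, what drives all of this is a surrounding agent loop: the LLM either returns tool calls or a final answer, and our code executes the calls and feeds the results back. The following is a minimal sketch under stated assumptions; `callLLM` and the message shapes are hypothetical placeholders for your provider's SDK, not a specific API.

```javascript
// Minimal agent loop sketch. callLLM is a hypothetical placeholder for a
// provider's chat-completion call; the message and tool-call shapes follow
// the common tool-calling convention but will differ per SDK.
async function runRetrievalAgent(question, callLLM, queryForInformation) {
  const messages = [{ role: 'user', content: question }];

  for (let step = 0; step < 10; step++) { // hard cap on iterations
    const reply = await callLLM(messages);

    // No tool calls means the model has produced its final answer.
    if (!reply.toolCalls || reply.toolCalls.length === 0) {
      return reply.content;
    }

    // Execute each requested query and feed the chunks back to the model.
    messages.push(reply);
    for (const call of reply.toolCalls) {
      const { query, language, category } = call.arguments;
      const result = await queryForInformation(query, language, category);
      messages.push({ role: 'tool', toolCallId: call.id, content: result });
    }
  }
  throw new Error('Agent did not converge within the step limit');
}
```

The loop is where the refinement effect described above happens: every retrieved chunk becomes part of the conversation, so the next round of queries can build on it.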
Important Things To Note
Retrieval speed matters more than ever! While a "traditional" RAG system runs a single query against the data storage, an agent will run several. Very quick query times are essential for interaction-based systems like chatbots.
Limit and sanitize tool calls: If an agent cannot find what it is looking for, it will continue to query the available tool until its context window (128K tokens for GPT-4) is exhausted, which leads to slow answers and high costs. It is important to limit the number of tool calls to prevent this. Also, query parameters such as language or category need to be sanitized because LLMs will sometimes hallucinate invalid values.
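Both safeguards can live in a thin wrapper around the query function. The sketch below is one possible approach (the limit of 5 calls and the error-message wording are illustrative choices, not fixed values):

```javascript
// Sketch of guarding tool calls: cap the number of calls per conversation
// and reject hallucinated parameter values before hitting the backend.
const ALLOWED_LANGUAGES = ['EN', 'FR', 'DE'];
const ALLOWED_CATEGORIES = ['Finance', 'HR', 'IT Support'];
const MAX_TOOL_CALLS = 5; // illustrative limit

function makeGuardedQuery(queryForInformation) {
  let calls = 0;
  return async function guardedQuery(query, language, category) {
    if (++calls > MAX_TOOL_CALLS) {
      // Returning an instruction instead of throwing lets the agent wrap up
      // with whatever it has found so far.
      return 'Tool call limit reached. Answer with the information gathered so far.';
    }
    if (!ALLOWED_LANGUAGES.includes(language)) {
      return `Invalid language "${language}". Valid values: ${ALLOWED_LANGUAGES.join(', ')}`;
    }
    if (!ALLOWED_CATEGORIES.includes(category)) {
      return `Invalid category "${category}". Valid values: ${ALLOWED_CATEGORIES.join(', ')}`;
    }
    return queryForInformation(query, language, category);
  };
}
```

Note that invalid parameters are reported back to the agent as tool output rather than thrown as errors: this gives the model a chance to correct itself on the next call.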
Data Governance: While data governance is important for any type of interaction with an LLM, it becomes particularly important when we start handing over control to an external LLM. Ensure that tools can only access data that is allowed to be processed by the LLM.
Recap
Retrieval Agents offer a means of greatly improving RAG pipelines. Done right, the pipeline can answer even vague questions or complex comparisons. But working with agents is challenging due to their nondeterministic nature: safeguards need to be in place, and a rigorous quality-testing framework is important.
More information on Retrieval Agents:
https://blog.langchain.com/conversational-retrieval-agents
https://huggingface.co/learn/agents-course/en/unit2/smolagents/retrieval_agents


