RAG gets much better if you use agents – and this is how we do it

By Samuel Heinrichs

September 4, 2024

Generative AI applications are being developed at an unprecedented pace, with Retrieval Augmented Generation (RAG) being perhaps the most prominent example. Picture it as a ChatGPT-like interface, but customized to your own knowledge base. The trick is that, instead of relying on training or fine-tuning a Large Language Model (LLM), which is time-consuming, expensive, prone to hallucinations, and cannot be repeated at the pace at which data is updated, the model is connected directly to your sources of information. Hence, every time you ask a question, a semantic search engine retrieves contexts that are relevant to craft an answer, and the LLM builds a human-like reply for you.
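To make that flow concrete, here is a minimal retrieve-then-generate sketch in Python with LangChain. It assumes an OpenAI API key, the faiss-cpu package, and a toy in-memory knowledge base; the answer_question helper and the example documents are ours for illustration, not part of AK Bot.

```python
# Minimal retrieve-then-generate sketch (illustrative, not AK Bot's code).
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Toy knowledge base standing in for your real documents.
documents = [
    "The vacation policy grants 20 days per year.",
    "Weekly project reports are stored in the shared Drive folder.",
]
index = FAISS.from_texts(documents, OpenAIEmbeddings())
llm = ChatOpenAI(model="gpt-3.5-turbo")

def answer_question(question: str) -> str:
    # 1. Semantic search: fetch the chunks most similar to the question.
    hits = index.similarity_search(question, k=2)
    context = "\n".join(doc.page_content for doc in hits)
    # 2. Generation: ask the LLM to answer using only the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content

print(answer_question("How many vacation days do we get?"))
```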

While most existing RAG tools rely on this process of embedding-based retrieval plus a prompt-engineered LLM to craft answers, we have observed that it does not scale very well to handle multiple sources of information. This is due to the complexity of handling different data structures and schemas through a single, prompt-engineered solution: it’s like asking an LLM to memorize every possible way it should process a question, dictating how to interact with this or that database, where to find the information, etc. As a result, these tools get stuck in modest applications that are not transformative enough for highly demanding users.

Digital agents, autonomous and powered by LLMs, offer a promising solution to this problem. Instead of instructing a model on how to do everything, agents are given access to different functional tools and a brief description of what each of them does. They can then use language as a proxy for reasoning and, based on the user prompt, plan the sequence of actions that is most appropriate to build an answer.

In this article we’ll dive into how we migrated AK Bot, our internal RAG system, from a basic GenAI implementation to an agent-based one. Stay with us if you want to learn how we transitioned from a standard, less flexible flow of actions to an open-ended, dynamic solution that has not only enhanced adaptability but also significantly improved our users’ experience.

The challenges of traditional LLM-based RAG tools

As we outlined before, standard prompt-engineered solutions for RAG work well for small, rigid applications, but face several challenges in other scenarios:

  • They don’t scale well when complexity increases: each new data source or interaction feature requires updating the same prompt that already covers the existing tooling. Hence, these approaches need increasingly extensive testing to ensure compatibility with existing features, and often lead to unexpected behavior in production environments.
  • They get stuck in poor results when asked multi-hop questions: simple RAG systems struggle with questions requiring multiple steps. For example, “When was the last time I met Bobby for planning a sales call?” requires first retrieving the current date, then identifying all minutes of meetings with Bobby, and finally narrowing those results down to the sales-related meetings closest to the current date.
  • They can’t handle multi-source question answering: some queries require combining information from multiple sources. For instance, “What’s the status of the project supervised by Mary?” demands accessing the table that assigns supervisors to each project and also checking the latest weekly report of that project.
  • Their user experience gets progressively worse: while these tools certainly capture users’ enthusiasm at the very beginning, they gradually become less attractive as they fail when challenged with more complicated questions.

Trust us: not using agents leads to several dead ends

If you’re an AI engineer, you might say: “Well, there’s no need to implement agents because there are other, easier solutions to the problems you mentioned”. While it’s true that there are alternatives, unfortunately none of them will work as accurately as agents:

What about question classifiers?

Question classification through prompt engineering identifies subtypes of queries and triggers specific pipelines based on the output. Nevertheless, while effective for simple categorization, classifiers struggle to scale: as the number of flows increases, classification errors compound, necessitating complex fallback mechanisms. Combining multiple prompts often dilutes their effectiveness, leading to decreased overall performance.
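For reference, this is roughly what the classifier approach looks like. The labels and the pipeline stubs below are purely illustrative, and every new flow means retouching the same classification prompt.

```python
# Hypothetical classifier-based routing: one prompt picks a category,
# then a fixed pipeline runs. Labels and handlers are illustrative stubs.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

CLASSIFIER_PROMPT = (
    "Classify the question into exactly one label: "
    "DATE_LOOKUP, HR_RECORD, DOCUMENT_SEARCH or OTHER.\n"
    "Question: {q}\nLabel:"
)

PIPELINES = {
    "DATE_LOOKUP": lambda q: "...run the date pipeline...",
    "HR_RECORD": lambda q: "...query the HR tables...",
    "DOCUMENT_SEARCH": lambda q: "...run semantic search...",
}

def route(question: str) -> str:
    label = llm.invoke(CLASSIFIER_PROMPT.format(q=question)).content.strip()
    # One misclassification sends the query down the wrong pipeline,
    # so in practice you end up stacking fallback logic here.
    handler = PIPELINES.get(label, lambda q: "Sorry, I can't handle that yet.")
    return handler(question)
```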

And what about using super context?

This approach involves prefetching all possible sources for each query and feeding them all to the LLM when crafting the answer. While seemingly straightforward, it presents several challenges: the computational overhead is huge (prefetching and processing all potential data sources is expensive and increases latency), argument handling becomes complex, it is extremely cost-inefficient (it consumes unnecessary tokens, inflating API costs), and it is of course limited by the LLM’s actual context window.
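A sketch of what this anti-pattern looks like, with placeholder fetchers standing in for the real connectors; note that every source is hit on every query, regardless of relevance.

```python
# "Super context" sketch: prefetch every source for every query and hope
# the result fits the context window. The fetchers are placeholder stubs.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

def fetch_confluence(q: str) -> str: return "...all Confluence matches..."
def fetch_hr_tables(q: str) -> str:  return "...all HR rows..."
def fetch_drive(q: str) -> str:      return "...all Drive matches..."

def answer(question: str) -> str:
    # Every source is queried up front, even when only one is relevant,
    # which inflates latency, token usage, and cost.
    context = "\n\n".join(
        fetch(question) for fetch in (fetch_confluence, fetch_hr_tables, fetch_drive)
    )
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content
```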

Agents are the solution!

GenAI agents are systems empowered by LLMs that control, on their own, the flow of actions needed to craft an answer. Given a set of tools, the agent can dynamically decide which ones to use and when, based on the user’s input and the context of the conversation. They do this by using language as a proxy for reasoning: a meta-prompt teaches them that they have access to a specific set of actions and that they have to plan the right sequence to get an answer. This sequence is then executed step by step, gradually crafting the answer with the available tools.
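Here is a minimal sketch of that idea using LangChain’s tool-calling agent. The two toy tools, the model choice, and the stubbed search result are our own assumptions, and exact imports may vary with your LangChain version.

```python
# Minimal tool-calling agent sketch: the LLM sees the tool descriptions
# and decides which to call and in what order. Tools are toy stand-ins.
from datetime import date
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent

@tool
def current_date() -> str:
    """Return today's date in ISO format."""
    return date.today().isoformat()

@tool
def search_meeting_notes(query: str) -> str:
    """Search meeting notes for a free-text query."""
    return "2024-08-28: sales planning call with Bobby"  # stubbed result

tools = [current_date, search_meeting_notes]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the tools when needed."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-3.5-turbo"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# The agent plans on its own: get the date, search the notes, then answer.
executor.invoke({"input": "When was the last time I met Bobby for a sales call?"})
```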

This approach brings multiple benefits. First of all, it is flexible: agents can handle an arbitrary range of queries without predefined flows, adapting themselves to new scenarios on the fly and replying negatively when they cannot handle a particular request. This flexibility enables improved multi-hop reasoning, as agents can break down complex queries into multiple steps, accessing different tools as needed. As a result, agents also use resources more efficiently, since they only invoke the tools relevant to each query, optimizing computation and enabling faster answers. Finally, all of this contributes to an enhanced user experience, thanks to the ability to understand and respond to nuanced queries in a more natural, context-aware interaction.

A hitchhiker’s guide to the implementation of agents in RAG 

Step 1. Review the tech stack

If you have already started to develop a RAG system, you probably have an existing stack of libraries and frameworks to build on. In our case, the first version of AK Bot was implemented purely with Langchain and the OpenAI GPT-3.5 APIs, so we used their capabilities for agent implementation. If you’re starting from scratch, you might also consider using LlamaIndex. Each tool has its own advantages and disadvantages, so take a look at this post if you’re wondering which to use.

In particular, Langchain offers a suite of features that facilitate the development of agents for RAG, such as pre-configured memory management (you know, to handle follow-up conversations), database connections, and integration with embedding models and LLMs. Additionally, Langgraph (a component of the Langchain ecosystem) allows agents to be defined as graphs, enabling the creation of custom solutions.
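As a taste of that graph-based style, here is a tiny LangGraph sketch. The two nodes are placeholders for the real planning and answering steps, and details may differ across LangGraph versions.

```python
# Tiny LangGraph sketch: the agent is a graph whose nodes are steps and
# whose edges encode the control flow. Nodes here are placeholder stubs.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def plan(state: State) -> State:
    # In a real agent this node would ask the LLM which tool to use next.
    return {"question": state["question"], "answer": ""}

def respond(state: State) -> State:
    return {"question": state["question"], "answer": f"Answer to: {state['question']}"}

graph = StateGraph(State)
graph.add_node("plan", plan)
graph.add_node("respond", respond)
graph.set_entry_point("plan")
graph.add_edge("plan", "respond")
graph.add_edge("respond", END)

app = graph.compile()
print(app.invoke({"question": "What is our vacation policy?"}))
```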

Step 2. Define the tools for the agent

You can picture the agent as a fisherman with a Swiss Army knife, who will reach for this or that tool depending on what he has to do.

In our case, we wanted AK Bot to be connected to the three most important sources of internal knowledge that we have: Confluence (where we register all the organizational memory of Arionkoder), HiBob (our human resources platform), and Google Drive (our cloud drive provider, where we store all our internal documents). Hence, we defined a diverse set of tools covering various functionalities (sketched in code right after this list):

  1. Date conversions: this tool converts expressions like “today” and “next week” into actual date values.
  2. Database queries: based on the values passed by the LLM, this tool queries a database to extract HiBob information from tables (e.g. registered leaves) and provides it back to the LLM to craft the answer.
  3. Semantic search: this tool retrieves relevant documents from Confluence and/or Google Drive based on a textual input, using embedding models for retrieval.
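Below is a hedged sketch of what those three tool families can look like with LangChain’s @tool decorator. The date logic, the HR lookup, and the stubbed returns are illustrative, not our production code.

```python
# Sketch of the three tool families with LangChain's @tool decorator.
# Date parsing, HR lookups, and retrieval are stubbed for illustration.
from datetime import date, timedelta
from langchain_core.tools import tool

@tool
def resolve_date(expression: str) -> str:
    """Convert expressions like 'today' or 'next week' into ISO dates."""
    today = date.today()
    if expression == "today":
        return today.isoformat()
    if expression == "next week":
        return (today + timedelta(weeks=1)).isoformat()
    return f"Could not parse: {expression}"

@tool
def query_leaves(employee: str) -> str:
    """Look up registered leaves for an employee in the HR database."""
    # Placeholder: in practice this runs a parameterized query against
    # the tables synced from HiBob.
    return f"Leaves registered for {employee}: ..."

@tool
def semantic_search(query: str) -> str:
    """Retrieve relevant Confluence / Google Drive chunks for a query."""
    # Placeholder: in practice this calls the embedding-based retriever.
    return f"Top documents for '{query}': ..."

tools = [resolve_date, query_leaves, semantic_search]
```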

Step 3. Configure the agent

Once the tools are defined, we need to set up the LLM-powered agent, instructing it on how to use them and how to manage the conversation. This includes developing mechanisms for those edge cases where the agent struggles to find an appropriate tool (e.g. because the user requests something from a source that is not yet connected, or something that is not straightforward to solve). Furthermore, we experimentally found it very valuable to use memory buffers that preserve important information across multiple runs, which reduces latency and helps ensure accurate answers.
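A configuration sketch along those lines: the system prompt covers the “no suitable tool” edge case and a buffer memory keeps follow-up questions in context. The single inline tool stands in for the full list from Step 2, and the exact wiring depends on your LangChain version.

```python
# Agent configuration sketch: an explicit edge-case instruction in the
# system prompt, plus a conversation buffer for follow-up questions.
from datetime import date
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def resolve_date(expression: str) -> str:
    """Convert expressions like 'today' into ISO dates (stub)."""
    return date.today().isoformat()

tools = [resolve_date]  # in practice, the full tool list from Step 2

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are AK Bot. Use the available tools to answer. If no tool can "
     "satisfy the request, say so explicitly instead of guessing."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-3.5-turbo"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, memory=memory)

executor.invoke({"input": "How many leaves did I register last quarter?"})
executor.invoke({"input": "And the quarter before that?"})  # relies on the memory buffer
```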

Step 4. Refine the agent iteratively

The first version of the agent will most likely not be the one you were looking for. And that’s ok: if it’s your first agent, it might take a while to figure out how to make it work properly.

In our setting, we were able to get past this quickly thanks to our automated QA pipeline, which allowed us to measure the effectiveness of the agent before deployment. Furthermore, we implemented feedback mechanisms in our bot that feed users’ opinions back into our database, which enabled us to fine-tune prompts and tools.
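For illustration, here is a regression-style check of the kind such a pipeline can run before each deployment, reusing the executor from Step 3. The test cases, the must_contain heuristic, and the 0.9 threshold are invented for the example and are not our actual QA pipeline.

```python
# Toy regression check: run the agent over known questions and flag
# answers that miss expected content before deploying a change.
TEST_CASES = [
    {"question": "How many vacation days do we get per year?", "must_contain": "20"},
    {"question": "Who supervises the AK Bot project?", "must_contain": "Mary"},
]

def run_regression(executor) -> float:
    passed = 0
    for case in TEST_CASES:
        answer = executor.invoke({"input": case["question"]})["output"]
        if case["must_contain"].lower() in answer.lower():
            passed += 1
        else:
            print(f"FAILED: {case['question']!r} -> {answer!r}")
    return passed / len(TEST_CASES)

# Block deployment if accuracy drops below the agreed threshold.
assert run_regression(executor) >= 0.9
```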

Conclusion

RAG is perhaps the most ubiquitous application of large language models in the real world. And that’s entirely reasonable, because it gives users seamless access to answers to their questions with a much lower risk of hallucinations, as LLMs are grounded in actual facts through their retrieval components.

While standard implementations based on traditional prompt engineering and GenAI are fine for learning how to perform RAG, or for very small applications, they prove less accurate and successful as problems scale in the number of input databases, the complexity of the queries, or the variety of data sources. Agents, in contrast, emerge as a modular, reusable, and highly accurate solution for those challenging scenarios.

In our experience with AK Bot, we’ve managed to develop an ambitious, fully functional tool that is already transforming the way in which we perform our daily tasks in the company, while creating the building blocks to accelerate RAG development for our customers.

Do you have a specific RAG application in mind for which you need help on the implementation? Reach out to us at hello@arionkoder.com so we can help you accomplish your goals!