A typical RAG pipeline
📥 Data Ingestion
Before text data can be efficiently retrieved (searched), it has to be collected: read from websites (crawling) or extracted from documents and images (parsing, OCR).
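As a minimal sketch of the extraction step, here is how visible text could be pulled out of crawled HTML using only Python's standard library (real pipelines use dedicated crawlers and document parsers; the sample HTML is illustrative):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

html = ("<html><body><h1>Earnings Report</h1>"
        "<p>Earnings rose in 2024.</p>"
        "<script>x=1</script></body></html>")
extractor = TextExtractor()
extractor.feed(html)
text = " ".join(extractor.parts)
print(text)  # Earnings Report Earnings rose in 2024.
```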
🧹 Preprocessing & Chunking
The extracted text is split into smaller chunks (typically paragraphs), and irrelevant content is removed.
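A simple chunking strategy splits on paragraph breaks and greedily merges small paragraphs up to a size limit. A minimal sketch (the character limit and merging rule are illustrative choices):

```python
def chunk_text(text, max_chars=500):
    """Split on blank lines (paragraphs) and greedily merge
    consecutive paragraphs until a chunk reaches max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = chunk_text(text, max_chars=40)
```

Production pipelines often add chunk overlap or sentence-aware splitting, but the idea is the same.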
🧠 Embedding & Indexing
In this stage a vector embedding is assigned to each chunk. A vector embedding is, in a way, an "address" given to the chunk by an embedding model; it represents the meaning/topic of the encoded text. I like to think of a bookstore: when you ask for a book about economics, they might tell you it can be found on floor 2, aisle 4, at the back. "Floor 2, aisle 4, at the back" would be the vector, and books on similar topics will have similar vectors.
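To make the "address" idea concrete: similarity between two embeddings is typically measured with cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real embedding models produce vectors with hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "addresses" (real embeddings come from a trained model)
economics_book = [0.9, 0.1, 0.0]   # "floor 2, aisle 4, at the back"
finance_book   = [0.8, 0.2, 0.1]   # nearby address: similar topic
cookbook       = [0.0, 0.1, 0.9]   # far-away address: different topic

print(cosine_similarity(economics_book, finance_book))  # high
print(cosine_similarity(economics_book, cookbook))      # low
```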
Side-note on embedding models:
Many commercial and open-source embedding models exist for numerous use cases. The MTEB Leaderboard compares existing models, but picking the best embedding model isn't just a matter of picking the leader on the MTEB list. Proper data ingestion and chunking matter much more than the embedding model used, and speed and ease of use, which the MTEB benchmarks do not measure, are also important factors in my opinion.
💾 Storage
Here we store the chunks and embeddings in a vector storage. This will allow us to query for matching chunks later on.
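A minimal in-memory sketch of such a store: it keeps (embedding, chunk) pairs and returns the chunks closest to a query embedding by cosine similarity. The vectors and chunk texts are illustrative; a real system uses a search engine or vector database, and the embeddings come from a model:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

class VectorStore:
    """Toy in-memory store: add (embedding, chunk) pairs, then
    query for the top_k chunks most similar to a query embedding."""
    def __init__(self):
        self.entries = []

    def add(self, embedding, chunk):
        self.entries.append((embedding, chunk))

    def query(self, embedding, top_k=3):
        ranked = sorted(self.entries,
                        key=lambda e: cosine_similarity(e[0], embedding),
                        reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]

store = VectorStore()
store.add([0.9, 0.1, 0.0], "...the earnings of 2024 were 42 Mio...")
store.add([0.1, 0.9, 0.0], "...vacation policy was updated in March...")
results = store.query([0.8, 0.2, 0.1], top_k=1)
```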
Side-note on vector storage:
There has been a lot of hype around new vector database providers, but proven search engines like Apache Solr and Elasticsearch offer state-of-the-art full-text search on top of vector search, which helps improve retrieval quality and allows for filters, facets and more. Hybrid search (full-text and vector search combined) has been shown to be more precise, faster and more explainable than plain vector-based retrieval [1][2].
[1] https://arxiv.org/abs/2412.03736
[2] https://www.elastic.co/search-labs/blog/improving-information-retrieval-elastic-stack-hybrid
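One common way to combine the full-text and vector result lists in hybrid search is reciprocal rank fusion (RRF), which Elasticsearch, for example, supports for this purpose. A minimal sketch (document IDs and the choice of k=60 are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fulltext_results = ["doc_a", "doc_c", "doc_b"]  # ranked by BM25
vector_results   = ["doc_b", "doc_a", "doc_d"]  # ranked by similarity
merged = reciprocal_rank_fusion([fulltext_results, vector_results])
print(merged)  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents ranked well by both retrievers float to the top, which is where the precision gain over either method alone comes from.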
Inside a RAG Query
When we ask questions to a RAG component, typically the following happens:
Hey RAG-Tool, what were the earnings in 2024 and how do they compare to the earnings of 2023?
🔢 Question is pre-analyzed
- Filter for malicious queries => Query is OK
- Extract a short, precise search query => "earnings 2024 2023"
- Which vector store should I search (Finances, HR, IT-Support,...) => "Finances"
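In practice these pre-analysis steps are usually LLM calls; as a toy rule-based sketch of the same idea (the blocklist, stopwords and store keywords are all illustrative):

```python
BLOCKLIST = {"ignore previous instructions"}  # toy malicious-query filter
STOPWORDS = {"hey", "rag-tool", "what", "were", "the", "in", "and",
             "how", "do", "they", "compare", "to", "of"}
STORE_KEYWORDS = {
    "Finances":   {"earnings", "revenue", "budget", "invoice"},
    "HR":         {"vacation", "salary", "hiring"},
    "IT-Support": {"password", "laptop", "vpn"},
}

def preanalyze(question):
    """Filter, extract a short search query, and route to a store."""
    q = question.lower()
    if any(phrase in q for phrase in BLOCKLIST):
        raise ValueError("query rejected")
    tokens = [t.strip("?,.").lower() for t in question.split()]
    seen, keywords = set(), []
    for t in tokens:                      # drop stopwords and duplicates
        if t and t not in STOPWORDS and t not in seen:
            seen.add(t)
            keywords.append(t)
    # route to the store whose keyword set overlaps the query most
    store = max(STORE_KEYWORDS,
                key=lambda s: len(STORE_KEYWORDS[s] & set(keywords)))
    return " ".join(keywords), store

query, store = preanalyze("Hey RAG-Tool, what were the earnings in 2024 "
                          "and how do they compare to the earnings of 2023?")
print(query, "->", store)  # earnings 2024 2023 -> Finances
```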
🔍 Vector Storage is queried for matching text chunks
🤖 An LLM prompt is executed using the matching text chunks
Please answer the user's question based on the provided data:
QUESTION:
Hey RAG-Tool, what were the earnings in 2024 and how do they compare to the earnings of 2023?
DATA:
Earnings Document 2024
...the earnings of 2024 were 42 Mio...
Earnings Document 2023
...in the fiscal year of 2023/2024 the company had earnings of 40 Mio, which was a decrease of 20% from the previous year...
Answer:
The earnings of 2024 were 42 Mio, which is an increase compared to the previous year (40 Mio).
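The prompt shown above can be assembled mechanically from the question and the retrieved chunks. A minimal sketch (the chunk fields `title` and `text` are an assumed structure):

```python
PROMPT_TEMPLATE = """Please answer the user's question based on the provided data:

QUESTION:
{question}

DATA:
{data}

Answer:"""

def build_prompt(question, chunks):
    """Join the retrieved chunks into the DATA section of the prompt."""
    data = "\n\n".join(f"{c['title']}\n{c['text']}" for c in chunks)
    return PROMPT_TEMPLATE.format(question=question, data=data)

chunks = [
    {"title": "Earnings Document 2024",
     "text": "...the earnings of 2024 were 42 Mio..."},
    {"title": "Earnings Document 2023",
     "text": "...the company had earnings of 40 Mio..."},
]
prompt = build_prompt("What were the earnings in 2024 and how do they "
                      "compare to the earnings of 2023?", chunks)
```

The resulting string is what actually gets sent to the LLM in the final step.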
Why not just ask ChatGPT?
The outputs of this RAG pipeline offer the following advantages over just asking LLMs like Claude or ChatGPT:
- The information you're searching for is not necessarily part of the LLM's training data
- The information source becomes clear: I control what data is searched
- The data can be kept up to date (no training-data cutoff)
What can RAG be used for?
Apart from the many Chatbots that are popping up everywhere, RAG also has other use cases:
🔍 Semantic Search in natural language
📧 Automated e-mail answering
☎️ Customer support automation
⚖️ Legal and compliance assistants
📄 Coding and writing assistants