Build a RAG AI Agent in 30 Minutes
Retrieval-Augmented Generation (RAG) is one of the most practical ways to build reliable AI systems for real-world use.
Instead of letting an AI model guess answers, a RAG system first searches your own documents, retrieves the most relevant information, and only then generates a response based on that data.
In this tutorial, you will build a complete RAG-based AI agent that:
- Reads knowledge from PDF files
- Converts them into embeddings
- Stores them in a vector database (FAISS)
- Retrieves relevant sections for every question
- Generates safe, context-based answers
- Exposes everything through a FastAPI backend
- Displays a chat widget using React
This setup is suitable for insurance agencies, SaaS documentation, internal company tools, HR policies, legal documents, and customer support systems.
The final app has two parts:
Backend (Python)
The backend is responsible for:
- Reading PDF files
- Cleaning and normalizing text
- Splitting text into small overlapping chunks
- Creating embeddings for each chunk
- Storing embeddings in a vector database
- Searching for relevant chunks for a user query
- Generating the final answer using only retrieved context
Frontend (React)
The frontend is a simple chat widget that:
- Opens as a floating modal on the website
- Displays conversation history
- Sends user questions to the backend
- Displays AI-generated answers
When a user asks:
“How do I file a claim?”
The system does not guess. It searches inside the PDF knowledge base, retrieves the most relevant lines, and generates the answer using only those lines.
RAG in One Sentence
RAG = Search your documents first, then generate the answer.
This approach greatly reduces hallucinations and keeps answers grounded in your own verified documents.
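As an orientation map, here is a minimal sketch of that flow using the functions built in the steps below (the FastAPI endpoints in Step 6 wire these together for real; the wrapper names `ingest` and `ask` are only for illustration):

```python
# Minimal sketch of the RAG flow, using the helper functions built later in this tutorial.
from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index, load_index
from rag.rag_answer import retrieve, generate_answer

def ingest(pdf_path, index_path, meta_path):
    text = pdf_to_text(pdf_path)                          # 1. read the PDF
    chunks = chunk_text(text)                             # 2. split into overlapping chunks
    build_and_save_index(chunks, index_path, meta_path)   # 3. embed and store in FAISS

def ask(question, index_path, meta_path):
    index, chunks = load_index(index_path, meta_path)
    hits = retrieve(question, index, chunks)              # 4. search your documents first
    return generate_answer(question, hits)                # 5. then generate the answer
```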
Technology Stack
Backend
- FastAPI – API server
- PyPDF – extract text from PDFs
- Token-based chunking with overlap
- OpenAI text-embedding-3-small – embeddings model
- FAISS – vector database
- OpenAI Responses API – answer generation
Frontend
- React
- Vite
- Simple chat widget UI
Folder Structure
A clean and industry-ready project structure:
```
insurance-rag-bot/
  backend/
    main.py
    rag/
      pdf_to_text.py
      chunking.py
      embed_store.py
      rag_answer.py
      make_sample_pdf.py
    data/
      knowledge.pdf
  frontend/
    index.html
    package.json
    src/
      App.jsx
      ChatWidget.jsx
      ChatWidget.css
```

This same structure can be reused for other domains such as healthcare, finance, education, or SaaS support.
Step 1: Create a Sample PDF Knowledge Base
If you do not have a real dataset yet, create a fake FAQ-style PDF to test the system end-to-end.
Create backend/rag/make_sample_pdf.py:
```python
import os

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Resolve backend/data/knowledge.pdf relative to this file so the script
# works no matter which directory it is run from.
DEFAULT_PATH = os.path.join(os.path.dirname(__file__), "..", "data", "knowledge.pdf")


def make_pdf(path: str = DEFAULT_PATH):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    c = canvas.Canvas(path, pagesize=letter)
    text = c.beginText(40, 750)
    lines = [
        "Insurance Agency Customer Care Knowledge Base",
        "",
        "Q: How do I file a claim?",
        "A: Call our claims line or submit through the portal. Keep photos and receipts.",
        "",
        "Q: What is a deductible?",
        "A: The amount you pay out of pocket before insurance starts paying.",
        "",
        "Q: How do I add a driver to my auto policy?",
        "A: Provide name, DOB, license number, and effective date. We’ll send a quote.",
    ]
    for line in lines:
        text.textLine(line)
    c.drawText(text)
    c.showPage()
    c.save()


if __name__ == "__main__":
    make_pdf()
```

Step 2: Read and Clean PDF Text
Create backend/rag/pdf_to_text.py:
```python
from pypdf import PdfReader


def pdf_to_text(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    pages = []
    for page in reader.pages:
        # extract_text() can return None for pages with no extractable text
        pages.append(page.extract_text() or "")
    text = "\n".join(pages)
    text = text.replace("\r", "\n")
    # Trim whitespace and drop empty lines
    text = "\n".join([line.strip() for line in text.split("\n") if line.strip()])
    return text
```

If your PDF is scanned (image-based), you will need OCR. Text-based PDFs work directly.
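If you do need OCR, one option is to rasterize each page and run it through Tesseract. Here is a rough sketch, assuming the pdf2image and pytesseract packages (neither is in this tutorial's requirements.txt, and both need system binaries installed: poppler and tesseract); the function name `ocr_pdf_to_text` is just illustrative:

```python
# Rough OCR sketch for scanned PDFs (assumes pdf2image and pytesseract are installed).
from pdf2image import convert_from_path
import pytesseract


def ocr_pdf_to_text(pdf_path: str) -> str:
    pages = convert_from_path(pdf_path)  # render each PDF page to a PIL image
    return "\n".join(pytesseract.image_to_string(page) for page in pages)
```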
Step 3: Chunking (The Most Important Step)
Chunking breaks large documents into small overlapping pieces so that semantic search works reliably.
- Chunks too large → retrieval gets imprecise and answers drag in irrelevant text
- Chunks too small → individual chunks lose the surrounding context
Token-based chunking with overlap is the safest default.
Create backend/rag/chunking.py:
```python
import tiktoken


def chunk_text(text: str, chunk_tokens: int = 450, overlap_tokens: int = 80):
    # Tokenize with the same encoding family used by the OpenAI embedding models.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + chunk_tokens
        chunks.append(enc.decode(tokens[start:end]))
        # Step forward while keeping overlap_tokens of context from the previous chunk.
        # overlap_tokens must stay smaller than chunk_tokens or the loop never advances.
        start = end - overlap_tokens
    return chunks
```

Common Chunking Strategies
- Token chunking – best general default
- Section-based chunking – for structured documents
- Sentence chunking – natural but often too small
- Paragraph chunking – depends on formatting quality (see the sketch after this list)
- Sliding window chunking – good context coverage
- Semantic chunking – best quality, more complex
- Table-aware chunking – preserves tables
- Hybrid chunking – combines multiple strategies
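For comparison, here is a minimal sketch of paragraph chunking. It is not used anywhere else in this project, the name `chunk_by_paragraph` and the `max_chars` budget are illustrative, and it assumes the input text still contains blank lines between paragraphs (the pdf_to_text helper above strips those out):

```python
# Minimal paragraph-chunking sketch (illustrative only; the token-based chunker stays the default).
def chunk_by_paragraph(text: str, max_chars: int = 1500):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk once adding this paragraph would exceed the character budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```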
Step 4: Create Embeddings and Store Them in FAISS
Embeddings convert text into numeric vectors that represent meaning.
Create backend/rag/embed_store.py:
```python
import json

import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"


def embed_texts(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vectors = [d.embedding for d in resp.data]
    arr = np.array(vectors, dtype="float32")
    # L2-normalize so that inner-product search (IndexFlatIP) equals cosine similarity.
    faiss.normalize_L2(arr)
    return arr


def build_and_save_index(chunks, index_path, meta_path):
    vectors = embed_texts(chunks)
    dim = vectors.shape[1]
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)
    faiss.write_index(index, index_path)
    # Keep the chunk texts next to the index so search results can be mapped back to text.
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump({"chunks": chunks}, f, ensure_ascii=False, indent=2)


def load_index(index_path, meta_path):
    index = faiss.read_index(index_path)
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    return index, meta["chunks"]
```

Step 5: Retrieval and Answer Generation
This step ensures safety. The model is only allowed to answer using retrieved document context.
Create backend/rag/rag_answer.py:
```python
import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
CHAT_MODEL = "gpt-5.2"  # swap in whichever chat model your account has access to
EMBED_MODEL = "text-embedding-3-small"


def embed_query(query: str):
    # Embed the query with the same model and normalization used for the chunks.
    resp = client.embeddings.create(model=EMBED_MODEL, input=[query])
    vec = np.array([resp.data[0].embedding], dtype="float32")
    faiss.normalize_L2(vec)
    return vec


def retrieve(query, index, chunks, k=4):
    qvec = embed_query(query)
    _, ids = index.search(qvec, k)
    results = []
    for i in ids[0]:
        if i != -1:  # FAISS pads with -1 when fewer than k results exist
            results.append(chunks[i])
    return results


def generate_answer(user_question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)
    response = client.responses.create(
        model=CHAT_MODEL,
        instructions=(
            "You are an Insurance Agency Customer Care assistant. "
            "Use only the provided context to answer. "
            "If the answer is not present, say you do not have it and offer human support."
        ),
        input=f"Context:\n{context}\n\nQuestion:\n{user_question}",
    )
    return response.output_text
```

Step 6: FastAPI Backend
Create backend/main.py:
```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index, load_index
from rag.rag_answer import retrieve, generate_answer

app = FastAPI()

# Allow the Vite dev server (default http://localhost:5173) to call this API.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
PDF_PATH = os.path.join(DATA_DIR, "knowledge.pdf")
INDEX_PATH = os.path.join(DATA_DIR, "index.faiss")
META_PATH = os.path.join(DATA_DIR, "chunks.json")

index = None
chunks = None


class ChatIn(BaseModel):
    message: str


@app.post("/ingest")
def ingest():
    global index, chunks
    text = pdf_to_text(PDF_PATH)
    chunks = chunk_text(text)
    build_and_save_index(chunks, INDEX_PATH, META_PATH)
    index, chunks = load_index(INDEX_PATH, META_PATH)
    return {"status": "ok", "chunks": len(chunks)}


@app.post("/chat")
def chat(payload: ChatIn):
    global index, chunks
    if index is None or chunks is None:
        # Reload a previously built index from disk if it exists.
        if os.path.exists(INDEX_PATH) and os.path.exists(META_PATH):
            index, chunks = load_index(INDEX_PATH, META_PATH)
        else:
            return {"answer": "Knowledge base not ingested yet. Call /ingest first."}
    hits = retrieve(payload.message, index, chunks)
    answer = generate_answer(payload.message, hits)
    return {"answer": answer}
```

Step 7: React Frontend Chat Widget
Create frontend/src/ChatWidget.jsx:
```jsx
import { useState } from "react";
import "./ChatWidget.css"; // styling for the widget (ChatWidget.css from the folder structure)

export default function ChatWidget() {
  const [msgs, setMsgs] = useState([
    { role: "bot", text: "Hi. Ask me anything about your policy or claims." },
  ]);
  const [text, setText] = useState("");

  async function send() {
    const msg = text.trim();
    if (!msg) return;
    setMsgs((m) => [...m, { role: "user", text: msg }]);
    setText("");
    try {
      const res = await fetch("http://localhost:8000/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: msg }),
      });
      const data = await res.json();
      setMsgs((m) => [...m, { role: "bot", text: data.answer }]);
    } catch {
      setMsgs((m) => [...m, { role: "bot", text: "Sorry, I could not reach the server." }]);
    }
  }

  return (
    <div>
      {msgs.map((m, i) => (
        <div key={i}>
          <b>{m.role}:</b> {m.text}
        </div>
      ))}
      <input
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Type your question"
      />
      <button onClick={send}>Send</button>
    </div>
  );
}
```
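The App.jsx file from the folder structure is not shown here; it only needs to import ChatWidget and render <ChatWidget /> for the widget to appear on the page, with ChatWidget.css holding the floating-modal styling.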
Step 8: Requirements
Create backend/requirements.txt:
```
fastapi
uvicorn
pydantic
pypdf
tiktoken
faiss-cpu
numpy
openai
reportlab
```

Step 9: Environment Variables
Create a .env file in backend/:
```
OPENAI_API_KEY=your_api_key_here
```

Do not commit this file to GitHub. Note that the OpenAI client reads OPENAI_API_KEY from the environment but does not load .env files on its own, so either export the variable in the shell where you start the server or load the file at startup with a package such as python-dotenv.
Step 10: Run the Project
Backend
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python rag/make_sample_pdf.py
uvicorn main:app --reload --port 8000
```

Build the index once:
```bash
curl -X POST http://localhost:8000/ingest
```
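Before touching the frontend, you can sanity-check the whole pipeline from the command line against the /chat endpoint (the question below is just an example):

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I file a claim?"}'
```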
Frontend
```bash
cd frontend
npm install
npm run dev
```

Conclusion
You now have a complete RAG AI agent that reads PDFs, creates embeddings, retrieves relevant sections, and generates context-based answers through a FastAPI backend and React frontend. You can extend this system by adding user authentication and logging, or by integrating it with other services. If you want the source code, check out the GitHub repository. Thank you for reading!