Build a RAG AI Agent in 30 Minutes

Retrieval-Augmented Generation (RAG) is one of the most practical ways to build reliable AI systems for real-world use.

Instead of letting an AI model guess answers, a RAG system first searches your own documents, retrieves the most relevant information, and only then generates a response based on that data.

In this tutorial, you will build a complete RAG-based AI agent that:

  • Reads knowledge from PDF files
  • Converts the extracted text into embeddings
  • Stores them in a vector database (FAISS)
  • Retrieves relevant sections for every question
  • Generates safe, context-based answers
  • Exposes everything through a FastAPI backend
  • Displays a chat widget using React

This setup is suitable for insurance agencies, SaaS documentation, internal company tools, HR policies, legal documents, and customer support systems.

The final app has two parts:

Backend (Python)

The backend is responsible for:

  • Reading PDF files
  • Cleaning and normalizing text
  • Splitting text into small overlapping chunks
  • Creating embeddings for each chunk
  • Storing embeddings in a vector database
  • Searching for relevant chunks for a user query
  • Generating the final answer using only retrieved context

Frontend (React)

The frontend is a simple chat widget that:

  • Opens as a floating modal on the website
  • Displays conversation history
  • Sends user questions to the backend
  • Displays AI-generated answers

When a user asks:

“How do I file a claim?”

The system does not guess. It searches inside the PDF knowledge base, retrieves the most relevant lines, and generates the answer using only those lines.

RAG in One Sentence

RAG = Search your documents first, then generate the answer.

This approach greatly reduces hallucinations and keeps answers grounded in your own, verified data.

Technology Stack

Backend

  • FastAPI – API server
  • PyPDF – extract text from PDFs
  • Token-based chunking with overlap
  • OpenAI text-embedding-3-small – embeddings model
  • FAISS – vector database
  • OpenAI Responses API – answer generation

Frontend

  • React
  • Vite
  • Simple chat widget UI

Folder Structure

A clean and industry-ready project structure:

insurance-rag-bot/
  backend/
    main.py
    rag/
      pdf_to_text.py
      chunking.py
      embed_store.py
      rag_answer.py
      make_sample_pdf.py
    data/
      knowledge.pdf

  frontend/
    index.html
    package.json
    src/
      App.jsx
      ChatWidget.jsx
      ChatWidget.css

This same structure can be reused for other domains such as healthcare, finance, education, or SaaS support.

Step 1: Create a Sample PDF Knowledge Base

If you do not have a real dataset yet, create a fake FAQ-style PDF to test the system end-to-end.

Create backend/rag/make_sample_pdf.py:

import os

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Resolve backend/data/knowledge.pdf relative to this file so the script
# works no matter which directory you run it from
DEFAULT_PDF_PATH = os.path.join(os.path.dirname(__file__), "..", "data", "knowledge.pdf")


def make_pdf(path=DEFAULT_PDF_PATH):
    os.makedirs(os.path.dirname(path), exist_ok=True)  # make sure backend/data/ exists
    c = canvas.Canvas(path, pagesize=letter)
    text = c.beginText(40, 750)

    lines = [
        "Insurance Agency Customer Care Knowledge Base",
        "",
        "Q: How do I file a claim?",
        "A: Call our claims line or submit through the portal. Keep photos and receipts.",
        "",
        "Q: What is a deductible?",
        "A: The amount you pay out of pocket before insurance starts paying.",
        "",
        "Q: How do I add a driver to my auto policy?",
        "A: Provide name, DOB, license number, and effective date. We’ll send a quote.",
    ]

    for line in lines:
        text.textLine(line)

    c.drawText(text)
    c.showPage()
    c.save()


if __name__ == "__main__":
    make_pdf()

Step 2: Read and Clean PDF Text

Create backend/rag/pdf_to_text.py:

from pypdf import PdfReader


def pdf_to_text(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    pages = []
    for page in reader.pages:
        pages.append(page.extract_text() or "")

    text = "\n".join(pages)
    text = text.replace("\r", "\n")
    text = "\n".join([line.strip() for line in text.split("\n") if line.strip()])
    return text

If your PDF is scanned (image-based), you will need OCR. Text-based PDFs work directly.
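
A quick way to sanity-check the extraction is a throwaway script run from the backend/ directory (it assumes the sample PDF from Step 1 exists):

from rag.pdf_to_text import pdf_to_text

text = pdf_to_text("data/knowledge.pdf")
print(f"{len(text)} characters extracted")
print(text[:200])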

Step 3: Chunking (The Most Important Step)

Chunking breaks large documents into small overlapping pieces so that semantic search works reliably.

  • Chunks too large → retrieval pulls in lots of irrelevant text and answers get noisy
  • Chunks too small → individual chunks lose their surrounding context

Token-based chunking with overlap is the safest default.

Create backend/rag/chunking.py:

import tiktoken


def chunk_text(text: str, chunk_tokens: int = 450, overlap_tokens: int = 80):
    # cl100k_base is the tokenizer used by OpenAI's text-embedding-3 models
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)

    chunks = []
    start = 0
    while start < len(tokens):
        end = start + chunk_tokens
        chunks.append(enc.decode(tokens[start:end]))
        if end >= len(tokens):
            break  # last chunk reached; avoids emitting a redundant tail chunk
        start = end - overlap_tokens  # step back so consecutive chunks overlap

    return chunks
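
To see how the splitter behaves on the sample knowledge base, you can run a quick check from the backend/ directory (assuming Steps 1–2 are done):

from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text

text = pdf_to_text("data/knowledge.pdf")
chunks = chunk_text(text)
print(f"{len(chunks)} chunk(s)")
print(chunks[0][:200])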

Common Chunking Strategies

  • Token chunking – best general default
  • Section-based chunking – for structured documents
  • Sentence chunking – natural but often too small
  • Paragraph chunking – depends on formatting quality (see the sketch after this list)
  • Sliding window chunking – good context coverage
  • Semantic chunking – best quality, more complex
  • Table-aware chunking – preserves tables
  • Hybrid chunking – combines multiple strategies
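
For comparison, here is a minimal sketch of paragraph-based chunking. It assumes paragraphs are separated by blank lines, which is exactly the formatting dependency noted above; the token-based splitter remains the default in this tutorial:

def chunk_by_paragraph(text: str, max_chars: int = 2000):
    # Split on blank lines, then pack consecutive paragraphs into size-bounded chunks
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += p + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks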

Step 4: Create Embeddings and Store Them in FAISS

Embeddings convert text into numeric vectors that represent meaning.
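
To get a feel for what these vectors capture, you can compare a few sentences directly (a small experiment, assuming OPENAI_API_KEY is set in your environment):

import numpy as np
from openai import OpenAI

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "How do I file a claim?",
        "Submitting an insurance claim",
        "Chocolate cake recipe",
    ],
)
vecs = np.array([d.embedding for d in resp.data], dtype="float32")
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize for cosine similarity

# Related sentences score close to 1.0, unrelated ones noticeably lower
print("claim vs claim:", float(vecs[0] @ vecs[1]))
print("claim vs cake: ", float(vecs[0] @ vecs[2]))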

Create backend/rag/embed_store.py:

import json
import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"


def embed_texts(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vectors = [d.embedding for d in resp.data]
    arr = np.array(vectors, dtype="float32")
    # L2-normalize so the inner-product search below is equivalent to cosine similarity
    faiss.normalize_L2(arr)
    return arr


def build_and_save_index(chunks, index_path, meta_path):
    vectors = embed_texts(chunks)
    dim = vectors.shape[1]

    # Exact (brute-force) inner-product index; plenty fast for small to mid-sized corpora
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)

    faiss.write_index(index, index_path)

    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump({"chunks": chunks}, f, ensure_ascii=False, indent=2)


def load_index(index_path, meta_path):
    index = faiss.read_index(index_path)

    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)

    return index, meta["chunks"]
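
If you prefer to build the index from a one-off script rather than through the API endpoint added in Step 6, a minimal sketch (run from backend/ with OPENAI_API_KEY set):

from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index

text = pdf_to_text("data/knowledge.pdf")
chunks = chunk_text(text)
build_and_save_index(chunks, "data/index.faiss", "data/chunks.json")
print(f"Indexed {len(chunks)} chunk(s)")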

Step 5: Retrieval and Answer Generation

This step keeps answers grounded: the model is instructed to answer using only the retrieved document context, and to defer to human support when the context does not contain the answer.

Create backend/rag/rag_answer.py:

import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
CHAT_MODEL = "gpt-5.2"
EMBED_MODEL = "text-embedding-3-small"


def embed_query(query: str):
    resp = client.embeddings.create(model=EMBED_MODEL, input=[query])
    vec = np.array([resp.data[0].embedding], dtype="float32")
    faiss.normalize_L2(vec)
    return vec


def retrieve(query, index, chunks, k=4):
    qvec = embed_query(query)
    _, ids = index.search(qvec, k)  # indices of the k most similar chunks

    results = []
    for i in ids[0]:
        if i != -1:  # FAISS returns -1 when fewer than k vectors exist in the index
            results.append(chunks[i])

    return results


def generate_answer(user_question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)

    response = client.responses.create(
        model=CHAT_MODEL,
        instructions=(
            "You are an Insurance Agency Customer Care assistant. "
            "Use only the provided context to answer. "
            "If the answer is not present, say you do not have it and offer human support."
        ),
        input=f"Context:\n{context}\n\nQuestion:\n{user_question}",
    )

    return response.output_text
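
To try retrieval and generation together outside the API, a short sketch (run from backend/; it assumes the index from Step 4 exists and OPENAI_API_KEY is set):

from rag.embed_store import load_index
from rag.rag_answer import retrieve, generate_answer

index, chunks = load_index("data/index.faiss", "data/chunks.json")
question = "How do I file a claim?"
hits = retrieve(question, index, chunks, k=4)
print(generate_answer(question, hits))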

Step 6: FastAPI Backend

Create backend/main.py:

import os

from dotenv import load_dotenv

# Load OPENAI_API_KEY from backend/.env before the rag modules create their OpenAI clients
load_dotenv()

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index, load_index
from rag.rag_answer import retrieve, generate_answer

app = FastAPI()

# Allow the Vite dev server (default: http://localhost:5173) to call this API from the browser
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
PDF_PATH = os.path.join(DATA_DIR, "knowledge.pdf")
INDEX_PATH = os.path.join(DATA_DIR, "index.faiss")
META_PATH = os.path.join(DATA_DIR, "chunks.json")

index = None
chunks = None

class ChatIn(BaseModel):
    message: str


@app.post("/ingest")
def ingest():
    global index, chunks

    text = pdf_to_text(PDF_PATH)
    chunks = chunk_text(text)
    build_and_save_index(chunks, INDEX_PATH, META_PATH)
    index, chunks = load_index(INDEX_PATH, META_PATH)

    return {"status": "ok", "chunks": len(chunks)}


@app.post("/chat")
def chat(payload: ChatIn):
    global index, chunks

    if index is None or chunks is None:
        if os.path.exists(INDEX_PATH) and os.path.exists(META_PATH):
            index, chunks = load_index(INDEX_PATH, META_PATH)
        else:
            return {"answer": "Knowledge base not ingested yet. Call /ingest first."}

    hits = retrieve(payload.message, index, chunks)
    answer = generate_answer(payload.message, hits)

    return {"answer": answer}

Step 7: React Frontend Chat Widget

Create frontend/src/ChatWidget.jsx and render it from your App.jsx component:

import { useState } from "react";

export default function ChatWidget() {
  const [msgs, setMsgs] = useState([
    { role: "bot", text: "Hi. Ask me anything about your policy or claims." },
  ]);

  const [text, setText] = useState("");

  async function send() {
    const msg = text.trim();
    if (!msg) return;

    setMsgs((m) => [...m, { role: "user", text: msg }]);
    setText("");

    const res = await fetch("http://localhost:8000/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: msg }),
    });

    const data = await res.json();
    setMsgs((m) => [...m, { role: "bot", text: data.answer }]);
  }

  return (
    <div>
      {msgs.map((m, i) => (
        <div key={i}>
          <b>{m.role}:</b> {m.text}
        </div>
      ))}

      <input
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Type your question"
      />

      <button onClick={send}>Send</button>
    </div>
  );
}

Step 8: Requirements

Create backend/requirements.txt:

fastapi
uvicorn
pydantic
pypdf
tiktoken
faiss-cpu
numpy
openai
reportlab
python-dotenv

Step 9: Environment Variables

Create a .env file in backend/:

OPENAI_API_KEY=your_api_key_here

main.py loads this file with python-dotenv at startup, so the OpenAI client picks up the key automatically. Do not commit this file to GitHub.

Step 10: Run the Project

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python rag/make_sample_pdf.py
uvicorn main:app --reload --port 8000

Build the index once:

curl -X POST http://localhost:8000/ingest
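
Then ask a test question:

curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"message": "How do I file a claim?"}'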

Frontend

cd frontend
npm install
npm run dev

Conclusion

You now have a complete RAG AI agent that reads PDFs, creates embeddings, retrieves relevant sections, and generates context-based answers through a FastAPI backend and React frontend. You can extend this system by adding user authentication, logging, or integrating with other services. If you want source code, check out the GitHub repository. Thank you for reading!