Build a RAG AI Agent in 30 Minutes
Retrieval-Augmented Generation (RAG) is one of the most practical ways to build reliable AI systems for real-world use.
Instead of letting an AI model guess answers, a RAG system first searches your own documents, retrieves the most relevant information, and only then generates a response based on that data.
In this tutorial, you will build a complete RAG-based AI agent that:
- Reads knowledge from PDF files
- Converts them into embeddings
- Stores them in a vector database (FAISS)
- Retrieves relevant sections for every question
- Generates safe, context-based answers
- Exposes everything through a FastAPI backend
- Displays a chat widget using React
This setup is suitable for insurance agencies, SaaS documentation, internal company tools, HR policies, legal documents, and customer support systems.
The final app has two parts:
Backend (Python)
The backend is responsible for:
- Reading PDF files
- Cleaning and normalizing text
- Splitting text into small overlapping chunks
- Creating embeddings for each chunk
- Storing embeddings in a vector database
- Searching for relevant chunks for a user query
- Generating the final answer using only retrieved context
Frontend (React)
The frontend is a simple chat widget that:
- Opens as a floating modal on the website
- Displays conversation history
- Sends user questions to the backend
- Displays AI-generated answers
When a user asks:
“How do I file a claim?”
The system does not guess. It searches inside the PDF knowledge base, retrieves the most relevant lines, and generates the answer using only those lines.
RAG in One Sentence
RAG = Search your documents first, then generate the answer.
This approach greatly reduces hallucinations and keeps answers grounded in your own verified documents.
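As an orientation map, here is a minimal sketch of that flow using the functions built in the steps below (the FastAPI endpoints in Step 6 wire these together for real; the wrapper names `ingest` and `ask` are only for illustration):

```python
# Minimal sketch of the RAG flow, using the helper functions built later in this tutorial.
from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index, load_index
from rag.rag_answer import retrieve, generate_answer

def ingest(pdf_path, index_path, meta_path):
    text = pdf_to_text(pdf_path)                          # 1. read the PDF
    chunks = chunk_text(text)                             # 2. split into overlapping chunks
    build_and_save_index(chunks, index_path, meta_path)   # 3. embed and store in FAISS

def ask(question, index_path, meta_path):
    index, chunks = load_index(index_path, meta_path)
    hits = retrieve(question, index, chunks)              # 4. search your documents first
    return generate_answer(question, hits)                # 5. then generate the answer
```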
Technology Stack
Backend
- FastAPI – API server
- PyPDF – extract text from PDFs
- Token-based chunking with overlap
- OpenAI text-embedding-3-small – embeddings model
- FAISS – vector database
- OpenAI Responses API – answer generation
Frontend
- React
- Vite
- Simple chat widget UI
Folder Structure
A clean and industry-ready project structure:
```
insurance-rag-bot/
  backend/
    main.py
    rag/
      pdf_to_text.py
      chunking.py
      embed_store.py
      rag_answer.py
      make_sample_pdf.py
    data/
      knowledge.pdf
  frontend/
    index.html
    package.json
    src/
      App.jsx
      ChatWidget.jsx
      ChatWidget.css
```

This same structure can be reused for other domains such as healthcare, finance, education, or SaaS support.
Step 1: Create a Sample PDF Knowledge Base
If you do not have a real dataset yet, create a fake FAQ-style PDF to test the system end-to-end.
Create backend/rag/make_sample_pdf.py:
```python
import os

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Resolve backend/data/knowledge.pdf relative to this file so the script
# works no matter which directory it is run from.
DEFAULT_PATH = os.path.join(os.path.dirname(__file__), "..", "data", "knowledge.pdf")


def make_pdf(path: str = DEFAULT_PATH):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    c = canvas.Canvas(path, pagesize=letter)
    text = c.beginText(40, 750)
    lines = [
        "Insurance Agency Customer Care Knowledge Base",
        "",
        "Q: How do I file a claim?",
        "A: Call our claims line or submit through the portal. Keep photos and receipts.",
        "",
        "Q: What is a deductible?",
        "A: The amount you pay out of pocket before insurance starts paying.",
        "",
        "Q: How do I add a driver to my auto policy?",
        "A: Provide name, DOB, license number, and effective date. We’ll send a quote.",
    ]
    for line in lines:
        text.textLine(line)
    c.drawText(text)
    c.showPage()
    c.save()


if __name__ == "__main__":
    make_pdf()
```

Step 2: Read and Clean PDF Text
Create backend/rag/pdf_to_text.py:
```python
from pypdf import PdfReader


def pdf_to_text(pdf_path: str) -> str:
    reader = PdfReader(pdf_path)
    pages = []
    for page in reader.pages:
        # extract_text() can return None for pages with no extractable text
        pages.append(page.extract_text() or "")
    text = "\n".join(pages)
    text = text.replace("\r", "\n")
    # Trim whitespace and drop empty lines
    text = "\n".join([line.strip() for line in text.split("\n") if line.strip()])
    return text
```

If your PDF is scanned (image-based), you will need OCR. Text-based PDFs work directly.
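If you do need OCR, one option is to rasterize each page and run it through Tesseract. Here is a rough sketch, assuming the pdf2image and pytesseract packages (neither is in this tutorial's requirements.txt, and both need system binaries installed: poppler and tesseract); the function name `ocr_pdf_to_text` is just illustrative:

```python
# Rough OCR sketch for scanned PDFs (assumes pdf2image and pytesseract are installed).
from pdf2image import convert_from_path
import pytesseract


def ocr_pdf_to_text(pdf_path: str) -> str:
    pages = convert_from_path(pdf_path)  # render each PDF page to a PIL image
    return "\n".join(pytesseract.image_to_string(page) for page in pages)
```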
Step 3: Chunking (The Most Important Step)
Chunking breaks large documents into small overlapping pieces so that semantic search works reliably.
- Chunks too large → retrieval gets imprecise and answers drag in irrelevant text
- Chunks too small → individual chunks lose the surrounding context
Token-based chunking with overlap is the safest default.
Create backend/rag/chunking.py:
```python
import tiktoken


def chunk_text(text: str, chunk_tokens: int = 450, overlap_tokens: int = 80):
    # Tokenize with the same encoding family used by the OpenAI embedding models.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + chunk_tokens
        chunks.append(enc.decode(tokens[start:end]))
        # Step forward while keeping overlap_tokens of context from the previous chunk.
        # overlap_tokens must stay smaller than chunk_tokens or the loop never advances.
        start = end - overlap_tokens
    return chunks
```

Common Chunking Strategies
- Token chunking – best general default
- Section-based chunking – for structured documents
- Sentence chunking – natural but often too small
- Paragraph chunking – depends on formatting quality (see the sketch after this list)
- Sliding window chunking – good context coverage
- Semantic chunking – best quality, more complex
- Table-aware chunking – preserves tables
- Hybrid chunking – combines multiple strategies
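For comparison, here is a minimal sketch of paragraph chunking. It is not used anywhere else in this project, the name `chunk_by_paragraph` and the `max_chars` budget are illustrative, and it assumes the input text still contains blank lines between paragraphs (the pdf_to_text helper above strips those out):

```python
# Minimal paragraph-chunking sketch (illustrative only; the token-based chunker stays the default).
def chunk_by_paragraph(text: str, max_chars: int = 1500):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk once adding this paragraph would exceed the character budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```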
Step 4: Create Embeddings and Store Them in FAISS
Embeddings convert text into numeric vectors that represent meaning.
Create backend/rag/embed_store.py:
```python
import json

import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"


def embed_texts(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vectors = [d.embedding for d in resp.data]
    arr = np.array(vectors, dtype="float32")
    # L2-normalize so that inner-product search (IndexFlatIP) equals cosine similarity.
    faiss.normalize_L2(arr)
    return arr


def build_and_save_index(chunks, index_path, meta_path):
    vectors = embed_texts(chunks)
    dim = vectors.shape[1]
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)
    faiss.write_index(index, index_path)
    # Keep the chunk texts next to the index so search results can be mapped back to text.
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump({"chunks": chunks}, f, ensure_ascii=False, indent=2)


def load_index(index_path, meta_path):
    index = faiss.read_index(index_path)
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    return index, meta["chunks"]
```

Step 5: Retrieval and Answer Generation
This step ensures safety. The model is only allowed to answer using retrieved document context.
Create backend/rag/rag_answer.py:
```python
import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()
CHAT_MODEL = "gpt-5.2"  # swap in whichever chat model your account has access to
EMBED_MODEL = "text-embedding-3-small"


def embed_query(query: str):
    # Embed the query with the same model and normalization used for the chunks.
    resp = client.embeddings.create(model=EMBED_MODEL, input=[query])
    vec = np.array([resp.data[0].embedding], dtype="float32")
    faiss.normalize_L2(vec)
    return vec


def retrieve(query, index, chunks, k=4):
    qvec = embed_query(query)
    _, ids = index.search(qvec, k)
    results = []
    for i in ids[0]:
        if i != -1:  # FAISS pads with -1 when fewer than k results exist
            results.append(chunks[i])
    return results


def generate_answer(user_question, retrieved_chunks):
    context = "\n\n".join(retrieved_chunks)
    response = client.responses.create(
        model=CHAT_MODEL,
        instructions=(
            "You are an Insurance Agency Customer Care assistant. "
            "Use only the provided context to answer. "
            "If the answer is not present, say you do not have it and offer human support."
        ),
        input=f"Context:\n{context}\n\nQuestion:\n{user_question}",
    )
    return response.output_text
```

Step 6: FastAPI Backend
Create backend/main.py:
```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from rag.pdf_to_text import pdf_to_text
from rag.chunking import chunk_text
from rag.embed_store import build_and_save_index, load_index
from rag.rag_answer import retrieve, generate_answer

app = FastAPI()

# Allow the Vite dev server (default http://localhost:5173) to call this API.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
PDF_PATH = os.path.join(DATA_DIR, "knowledge.pdf")
INDEX_PATH = os.path.join(DATA_DIR, "index.faiss")
META_PATH = os.path.join(DATA_DIR, "chunks.json")

index = None
chunks = None


class ChatIn(BaseModel):
    message: str


@app.post("/ingest")
def ingest():
    global index, chunks
    text = pdf_to_text(PDF_PATH)
    chunks = chunk_text(text)
    build_and_save_index(chunks, INDEX_PATH, META_PATH)
    index, chunks = load_index(INDEX_PATH, META_PATH)
    return {"status": "ok", "chunks": len(chunks)}


@app.post("/chat")
def chat(payload: ChatIn):
    global index, chunks
    if index is None or chunks is None:
        # Reload a previously built index from disk if it exists.
        if os.path.exists(INDEX_PATH) and os.path.exists(META_PATH):
            index, chunks = load_index(INDEX_PATH, META_PATH)
        else:
            return {"answer": "Knowledge base not ingested yet. Call /ingest first."}
    hits = retrieve(payload.message, index, chunks)
    answer = generate_answer(payload.message, hits)
    return {"answer": answer}
```

Step 7: React Frontend Chat Widget
Create frontend/src/ChatWidget.jsx:
```jsx
import { useState } from "react";
import "./ChatWidget.css"; // styling for the widget (ChatWidget.css from the folder structure)

export default function ChatWidget() {
  const [msgs, setMsgs] = useState([
    { role: "bot", text: "Hi. Ask me anything about your policy or claims." },
  ]);
  const [text, setText] = useState("");

  async function send() {
    const msg = text.trim();
    if (!msg) return;
    setMsgs((m) => [...m, { role: "user", text: msg }]);
    setText("");
    try {
      const res = await fetch("http://localhost:8000/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: msg }),
      });
      const data = await res.json();
      setMsgs((m) => [...m, { role: "bot", text: data.answer }]);
    } catch {
      setMsgs((m) => [...m, { role: "bot", text: "Sorry, I could not reach the server." }]);
    }
  }

  return (
    <div>
      {msgs.map((m, i) => (
        <div key={i}>
          <b>{m.role}:</b> {m.text}
        </div>
      ))}
      <input
        value={text}
        onChange={(e) => setText(e.target.value)}
        placeholder="Type your question"
      />
      <button onClick={send}>Send</button>
    </div>
  );
}
```
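The App.jsx file from the folder structure is not shown here; it only needs to import ChatWidget and render <ChatWidget /> for the widget to appear on the page, with ChatWidget.css holding the floating-modal styling.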
Step 8: Requirements
Create backend/requirements.txt:
```
fastapi
uvicorn
pydantic
pypdf
tiktoken
faiss-cpu
numpy
openai
reportlab
```

Step 9: Environment Variables
Create a .env file in backend/:
```
OPENAI_API_KEY=your_api_key_here
```

Do not commit this file to GitHub. Note that the OpenAI client reads OPENAI_API_KEY from the environment but does not load .env files on its own, so either export the variable in the shell where you start the server or load the file at startup with a package such as python-dotenv.
Step 10: Run the Project
Backend
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python rag/make_sample_pdf.py
uvicorn main:app --reload --port 8000
```

Build the index once:
```bash
curl -X POST http://localhost:8000/ingest
```
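Before touching the frontend, you can sanity-check the whole pipeline from the command line against the /chat endpoint (the question below is just an example):

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I file a claim?"}'
```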
Frontend
```bash
cd frontend
npm install
npm run dev
```

Conclusion
You now have a complete RAG AI agent that reads PDFs, creates embeddings, retrieves relevant sections, and generates context-based answers through a FastAPI backend and React frontend. You can extend this system by adding user authentication and logging, or by integrating it with other services. If you want the source code, check out the GitHub repository. Thank you for reading!