AI City Popy πŸ™οΈ
District 8 Β· Open-Book School

Welcome to The RAG School

Where AI looks up real facts before writing the answer.

❓ Ask question β†’ πŸ” Search docs β†’ πŸ“„ Retrieve chunks β†’ πŸ€– Generate answer

Why AI Makes Things Up

Without access to real documents, AI fills gaps with plausible-sounding guesses. This is called hallucination.

❓ Question: "What is our company's refund policy?"

RAG gives AI an open-book exam: it can look up the real answer before writing anything!

The RAG Pipeline, Step by Step

Step through every stage of how RAG works.

1. ❓ User Question: the user asks, "What's the refund policy?"
2. πŸ”’ Embed Question: turn the question into a vector of numbers.
3. πŸ” Search Vector DB: find the most similar document chunks.
4. πŸ“„ Retrieve Chunks: grab the top-3 matching policy paragraphs.
5. πŸ€– Generate Answer: the LLM reads the chunks and writes a grounded reply.

Splitting Knowledge Into Chunks

Before storing documents, we split them into chunks. Chunk size affects search quality.

Chunk 1: AI City has many districts. Each district teaches one topic. The Reception Center teaches FastAPI.
Chunk 2: The Async Roads teach async Python. The AI Worker Office teaches agent architecture.
Chunk 3: Visitors can explore each district and earn badges.

Chunks of this size strike a good balance of precision and context. βœ… Recommended
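
Want to see how a chunker works under the hood? Here's a minimal sketch that packs whole sentences into chunks of roughly max_chars characters. The function name and the 70-character limit are illustrative choices, not from any library.

def chunk_text(text: str, max_chars: int = 120) -> list[str]:
    """Naive chunker: pack whole sentences into ~max_chars chunks."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # start a new chunk when adding this sentence would overflow
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("AI City has many districts. Each district teaches one topic. "
       "The Reception Center teaches FastAPI.")
print(chunk_text(doc, max_chars=70))

Real projects usually add overlap between chunks or use a library splitter, but the idea is the same: small enough to be precise, big enough to keep context.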

RAG In Code

Here is the whole pipeline in three lines:

# 1. Embed the user's question as a vector
q_vec = embed_query(question)
# 2. Retrieve the 3 most similar chunks from the vector DB
results = vector_db.search(q_vec, top_k=3)
# 3. Generate an answer grounded in the retrieved chunks
answer = llm.generate(question, context=results)

Mission: Build the RAG Pipeline

Tap the steps in the correct order to assemble a working RAG pipeline.


Build your first RAG system πŸ“š

Four steps from zero to a document-grounded chatbot.

  1. Install dependencies

    Grab the OpenAI library and ChromaDB for local vector storage.

    pip install openai chromadb
  2. Chunk & embed your docs

    Split your documents into chunks, then turn each chunk into a vector.

    import openai, chromadb  # assumes OPENAI_API_KEY is set in your environment
    
    client = chromadb.Client()
    col = client.create_collection("docs")
    
    chunks = ["Policy: 14-day returns...", "Billing via Stripe..."]
    for i, chunk in enumerate(chunks):
        emb = openai.embeddings.create(
            model="text-embedding-3-small",
            input=chunk
        ).data[0].embedding
        col.add(ids=[str(i)], embeddings=[emb],
                documents=[chunk])
  3. Search by meaning

    Embed the user's question and find the closest chunks.

    question = "Can I get a refund?"
    q_emb = openai.embeddings.create(
        model="text-embedding-3-small",
        input=question
    ).data[0].embedding
    
    results = col.query(
        query_embeddings=[q_emb], n_results=3
    )
    chunks = results["documents"][0]
  4. Generate a grounded answer

    Pass the retrieved chunks as context so the LLM answers from facts.

    context = "\n".join(chunks)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
          {"role":"system","content":f"Answer using: {context}"},
          {"role":"user","content": question}
        ]
    )
    print(response.choices[0].message.content)

Chat about RAG ✨

Questions about chunking, embeddings, vector databases, or when to use RAG vs fine-tuning?

Hi! I'm Popy 🏫 Ask me anything about RAG, embeddings, vector search, or document retrieval!

You're a RAG Researcher now!

You know how to ground AI in real documents and cut hallucinations way down. Next: learn how to structure AI data with Pydantic.

Mini Project
Build Quest

Grounded Answerer

Deliverable: Retrieve relevant chunks and answer with citations to source snippets.

Stretch: Refuse politely when retrieval confidence is low.

Complete the deliverable first, then unlock the stretch goal.
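
For the stretch goal, here's one minimal sketch, reusing the col collection and q_emb embedding from the build steps above. ChromaDB returns a distance for every match (smaller means more similar), so the bot can refuse when even the best match is far away. The 0.35 cutoff is a made-up starting point to tune on your own queries.

results = col.query(query_embeddings=[q_emb], n_results=3)
best_distance = results["distances"][0][0]  # smaller = more similar

if best_distance > 0.35:  # illustrative cutoff, tune on real queries
    print("Sorry, I couldn't find anything about that in the docs.")
else:
    # cite each source snippet alongside its id (covers the deliverable too)
    for chunk_id, chunk in zip(results["ids"][0], results["documents"][0]):
        print(f"[source {chunk_id}] {chunk}")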

Previous: πŸ“š Memory Library Β· Next: πŸ‘©β€πŸ« Teacher Academy