Retrieval and RAG with Vedaya

This guide explains how to use Vedaya’s Retrieval and RAG (Retrieval Augmented Generation) APIs to query your knowledge base, retrieve relevant document chunks, and generate answers based on your data.

Overview

Vedaya’s Retrieval and RAG system allows you to:

  • Query your knowledge base for relevant information
  • Retrieve document chunks based on semantic search
  • Generate answers using language models enhanced with retrieved context
  • Build chatbots that answer questions from your documents

Simple Retrieval

To retrieve relevant chunks from your knowledge base without generating an answer:

import requests

url = "https://vedaya-backend.fly.dev/api/retrieval/query"

params = {
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",  # Vector database to use
  "top_k": 3                # Number of results to return
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

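response = requests.get(url, headers=headers, params=params, timeout=30)  # timeout in seconds; adjust as needed
response.raise_for_status()  # Raise an exception for 4xx/5xx responses
print(response.json())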

The response lists the top_k most relevant chunks, each with its text, a similarity score, and the source document:

{
  "query": "What are the key benefits of quantum computing?",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250
}
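In practice you will often want to drop weak matches and show each source file only once. The sketch below works on the response shape shown above. The helper names and the 0.85 threshold are illustrative assumptions, not part of the Vedaya API; useful thresholds depend on your data.

```python
from typing import Any


def relevant_chunks(response: dict[str, Any], min_score: float = 0.85) -> list[dict[str, Any]]:
    """Return chunks whose score is at least min_score, highest score first.

    Assumes a "chunks" list whose items carry a numeric "score", as in the
    example response; chunks without a score are treated as 0.0.
    """
    chunks = response.get("chunks", [])
    kept = [c for c in chunks if c.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c.get("score", 0.0), reverse=True)


def distinct_sources(chunks: list[dict[str, Any]]) -> list[str]:
    """List each source file once, in order of first appearance."""
    return list(dict.fromkeys(c["source"] for c in chunks if "source" in c))


example = {
    "query": "What are the key benefits of quantum computing?",
    "chunks": [
        {"id": "c12345-1", "score": 0.92, "source": "quantum-computing-overview.pdf"},
        {"id": "c67890-3", "score": 0.87, "source": "computing-advances-2023.pdf"},
        {"id": "c24680-5", "score": 0.81, "source": "future-technology-trends.pdf"},
    ],
}

top = relevant_chunks(example)
print([c["id"] for c in top])   # ['c12345-1', 'c67890-3']
print(distinct_sources(top))    # ['quantum-computing-overview.pdf', 'computing-advances-2023.pdf']
```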

Retrieval with Answer Generation

To retrieve chunks and also generate an answer, send a POST request with a JSON body that names the model to use:

import requests

url = "https://vedaya-backend.fly.dev/api/retrieval/query"

payload = {
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",  # Vector database to use
  "top_k": 3,               # Number of results to return
  "model": "gpt-4"          # Model to use for generating answers
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses
print(response.json())

The response includes both retrieved chunks and a generated answer:

{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
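To show users where an answer came from, you can render the answer followed by a numbered source list built from the chunks. This is a minimal sketch based on the response fields above; the layout and the helper name `format_with_sources` are illustrative choices, not something the API returns.

```python
from typing import Any


def format_with_sources(response: dict[str, Any]) -> str:
    """Render the generated answer followed by a numbered list of source files.

    Each source appears once, in the order its first chunk was returned,
    together with the best score seen for that source.
    """
    best: dict[str, float] = {}
    for chunk in response.get("chunks", []):
        source = chunk.get("source", "unknown")
        best[source] = max(best.get(source, 0.0), chunk.get("score", 0.0))

    lines = [response.get("answer", "").strip(), "", "Sources:"]
    for i, (source, score) in enumerate(best.items(), start=1):
        lines.append(f"[{i}] {source} (score {score:.2f})")
    return "\n".join(lines)


example = {
    "answer": "Based on the documents, the key benefits of quantum computing include:\n\n"
              "1. Exponentially faster solving of complex optimization problems compared to classical computers",
    "chunks": [
        {"id": "c12345-1", "score": 0.92, "source": "quantum-computing-overview.pdf"},
        {"id": "c67890-3", "score": 0.87, "source": "computing-advances-2023.pdf"},
    ],
}
print(format_with_sources(example))
```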

RAG Queries for Frontend Integration

For frontend applications, use the dedicated chatbot RAG endpoint. It accepts the same request body as the retrieval query endpoint:

import requests

url = "https://vedaya-backend.fly.dev/api/chatbot/retrieval/query"

payload = {
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",
  "top_k": 3,
  "model": "gpt-4"
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # Raise an exception for 4xx/5xx responses
print(response.json())

In this response, each chunk also carries an entities list alongside its text, score, and source, which a frontend can use for tagging or filtering:

{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": 12345,
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf",
      "entities": ["Quantum Computing", "Optimization"]
    },
    {
      "id": 67890,
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf",
      "entities": ["Quantum Computing", "Superposition"]
    },
    {
      "id": 24680,
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf",
      "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
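The per-chunk entities lists can drive UI features such as tag filters. The sketch below counts how many retrieved chunks mention each entity. It assumes entities is a list of strings, as in the example above; the helper name `entity_facets` is hypothetical.

```python
from collections import Counter
from typing import Any


def entity_facets(response: dict[str, Any]) -> list[tuple[str, int]]:
    """Count how many retrieved chunks mention each entity, most common first.

    Ties keep the order in which entities first appear in the response.
    """
    counts: Counter = Counter()
    for chunk in response.get("chunks", []):
        # dict.fromkeys removes duplicates within one chunk while keeping order,
        # so each chunk contributes at most 1 to an entity's count
        for entity in dict.fromkeys(chunk.get("entities", [])):
            counts[entity] += 1
    return counts.most_common()


example = {
    "chunks": [
        {"id": 12345, "entities": ["Quantum Computing", "Optimization"]},
        {"id": 67890, "entities": ["Quantum Computing", "Superposition"]},
        {"id": 24680, "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]},
    ]
}
print(entity_facets(example))
# [('Quantum Computing', 3), ('Optimization', 1), ('Superposition', 1), ('Encryption', 1), ('Drug Discovery', 1)]
```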

Simple Chatbot Queries

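For simpler use cases, use the basic chatbot endpoint. It takes GET query parameters whose names differ from the retrieval endpoints (query_str instead of query, topk instead of top_k), and the rag flag controls whether retrieval is used: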

import requests

url = "https://vedaya-backend.fly.dev/api/chatbot/query/"

params = {
  "query_str": "What are the key benefits of quantum computing?",
  "rag": True,       # Whether to use RAG
  "topk": 3          # Number of results to return
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

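response = requests.get(url, headers=headers, params=params, timeout=60)  # timeout in seconds; adjust as needed
response.raise_for_status()  # Raise an exception for 4xx/5xx responses
print(response.json())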
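Because this endpoint names its parameters differently from the retrieval endpoints, a small builder can keep call sites consistent. The sketch below is hypothetical (the helper name and the top_k validation are not part of the API). It also prints the query string produced by `urllib.parse.urlencode`, the standard-library encoder that requests also uses for `params`, which shows that the Python boolean is sent as the string `True`.

```python
from typing import Any
from urllib.parse import urlencode

BASE_URL = "https://vedaya-backend.fly.dev"


def chatbot_query_params(query: str, top_k: int = 3, rag: bool = True) -> dict[str, Any]:
    """Build query parameters for /api/chatbot/query/.

    Maps the retrieval-style names (query, top_k) onto this endpoint's
    names (query_str, topk).
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {"query_str": query, "rag": rag, "topk": top_k}


params = chatbot_query_params("What are the key benefits of quantum computing?")
print(f"{BASE_URL}/api/chatbot/query/?{urlencode(params)}")
# https://vedaya-backend.fly.dev/api/chatbot/query/?query_str=What+are+the+key+benefits+of+quantum+computing%3F&rag=True&topk=3
```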

Understanding RAG

Retrieval Augmented Generation (RAG) combines:

  1. Retrieval: Finding relevant information from your document collection
  2. Generation: Using a language model to create coherent, contextual responses
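To make the retrieval step concrete, here is a toy version of semantic search: score stored chunk embeddings against a query embedding with cosine similarity and keep the top_k best. This only illustrates the idea. In Vedaya the search runs in the vector database selected by vector_db, and the three-dimensional vectors below are made-up examples; real embeddings typically have hundreds of dimensions or more.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and keep the top_k best."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings for three chunks
index = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.1, 0.9, 0.0],
    "chunk-c": [0.7, 0.7, 0.1],
}
for chunk_id, score in retrieve([1.0, 0.0, 0.0], index, top_k=2):
    print(chunk_id, round(score, 3))
```

Only the chunks returned here would be passed on to the generation step.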

The key benefits of RAG include:

  • Up-to-date information: Responses based on your latest documents
  • Reduced hallucinations: Grounding responses in your retrieved documents
  • Domain-specific knowledge: Tailored answers based on your specific data
  • Transparency: Citations to source documents for verification
  • Cost efficiency: Adding knowledge by indexing documents instead of retraining or fine-tuning a model

Implementing RAG in Your Application

Follow these steps to implement RAG in your application:

  1. Ingest documents using the Data Ingestion API
  2. Wait for processing to complete (indexing and embedding generation)
  3. Set up retrieval endpoints in your application
  4. Design prompts that effectively use the retrieved context
  5. Display answers and their source citations in your frontend
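If you use simple retrieval and pass the chunks to a language model yourself, step 4 is where you decide how the context is presented. The following is a minimal prompt-template sketch, assuming chunks shaped like the retrieval response. The instruction wording, the character budget, and the helper name `build_prompt` are illustrative assumptions.

```python
from typing import Any


def build_prompt(question: str, chunks: list[dict[str, Any]], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt: numbered context passages, then the question.

    Chunks are added in the order given until max_chars of context is used,
    so pass them best-first.
    """
    passages: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        passage = f"[{i}] ({chunk.get('source', 'unknown')}) {chunk['text']}"
        if used + len(passage) > max_chars:
            break  # stop before exceeding the context budget
        passages.append(passage)
        used += len(passage)

    context = "\n".join(passages) if passages else "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "Cite passages by their [number]. If the context does not contain "
        "the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    {"text": "Quantum computing offers several key benefits...", "source": "quantum-computing-overview.pdf"},
    {"text": "The primary advantage of quantum computing lies in...", "source": "computing-advances-2023.pdf"},
]
print(build_prompt("What are the key benefits of quantum computing?", chunks))
```

Numbering the passages lets the model cite them, and the citations can then be mapped back to each chunk's source.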

Best Practices

  1. Query Formulation: Ask clear, specific questions for better retrieval
  2. Context Length: Increase top_k for broad or multi-part questions and keep it small for narrow lookups
  3. Model Selection: Use more powerful models for complex reasoning tasks
  4. Citation Generation: Extract and display source information to users
  5. Fallback Mechanisms: Handle cases where relevant information isn’t found
  6. Progressive Enhancement: Start with simple retrieval and add RAG capabilities as needed
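For the fallback mechanism, one simple approach is to check the best retrieval score before showing a generated answer. The 0.75 cutoff, the message text, and the helper name are placeholder assumptions; tune the threshold against your own data, since score ranges depend on the embedding model and the vector database.

```python
from typing import Any

FALLBACK = "I couldn't find this in the indexed documents."


def answer_or_fallback(response: dict[str, Any], min_top_score: float = 0.75) -> str:
    """Return the generated answer only if retrieval found a strong enough match."""
    scores = [c.get("score", 0.0) for c in response.get("chunks", [])]
    if not scores or max(scores) < min_top_score or not response.get("answer"):
        return FALLBACK
    return response["answer"]


print(answer_or_fallback({"answer": "Quantum computing offers...", "chunks": [{"score": 0.92}]}))
print(answer_or_fallback({"answer": "Quantum computing offers...", "chunks": [{"score": 0.40}]}))
```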

For more details on available endpoints and parameters, see the Retrieval & RAG API Reference.