Retrieval and RAG with Vedaya

This guide explains how to use Vedaya’s Retrieval and RAG (Retrieval-Augmented Generation) APIs to query your knowledge base, retrieve relevant document chunks, and generate answers grounded in your data.

Overview

Vedaya’s Retrieval and RAG system allows you to:

Query your knowledge base for relevant information
Retrieve document chunks based on semantic search
Generate answers using language models enhanced with retrieved context
Build chatbots with knowledge from your documents

Simple Retrieval

To retrieve relevant chunks from your knowledge base without generating an answer:

import requests

url = "https://vedaya-backend.fly.dev/api/retrieval/query"

params = {
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",  # Vector database to use
  "top_k": 3                # Number of results to return
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

response = requests.get(url, headers=headers, params=params)
print(response.json())

The response includes the most relevant chunks from your documents:

{
  "query": "What are the key benefits of quantum computing?",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250
}
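Once the response is decoded, a common next step is to keep only the chunks that clear a relevance threshold. The following is a minimal sketch that works on a dictionary shaped like the response above; the helper name `top_chunks` and the `0.85` threshold are illustrative assumptions, not part of the Vedaya API.

```python
from typing import Any


def top_chunks(result: dict[str, Any], min_score: float = 0.85) -> list[dict[str, Any]]:
    """Return chunks scoring at least min_score, highest score first.

    `result` is assumed to have the shape of the retrieval response shown
    above; a missing "chunks" key is treated as an empty result.
    """
    chunks = result.get("chunks", [])
    kept = [c for c in chunks if c.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# Abbreviated copy of the sample response from this guide.
sample = {
    "query": "What are the key benefits of quantum computing?",
    "chunks": [
        {"id": "c12345-1", "text": "...", "score": 0.92, "source": "quantum-computing-overview.pdf"},
        {"id": "c67890-3", "text": "...", "score": 0.87, "source": "computing-advances-2023.pdf"},
        {"id": "c24680-5", "text": "...", "score": 0.81, "source": "future-technology-trends.pdf"},
    ],
    "total_chunks_searched": 1250,
}

for chunk in top_chunks(sample):
    print(f'{chunk["score"]:.2f}  {chunk["source"]}')
```

A suitable threshold depends on your data and embedding model, so treat the default here as a starting point to tune.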

Retrieval with Answer Generation

To retrieve chunks and generate an answer:

import requests
import json

url = "https://vedaya-backend.fly.dev/api/retrieval/query"

payload = json.dumps({
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",  # Vector database to use
  "top_k": 3,               # Number of results to return
  "model": "gpt-4"          # Model to use for generating answers
})

headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)
print(response.json())

The response includes both retrieved chunks and a generated answer:

{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
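Because the response carries both the answer and the chunks it was drawn from, you can show the answer together with a deduplicated source list. This is a rough sketch, assuming the response shape shown above; `format_answer` is a hypothetical helper and the output layout is only one possible choice.

```python
def format_answer(result: dict) -> str:
    """Render the generated answer followed by a numbered list of sources.

    Sources are taken from each chunk's "source" field, deduplicated while
    preserving the order in which they first appear.
    """
    sources: list[str] = []
    for chunk in result.get("chunks", []):
        source = chunk.get("source")
        if source and source not in sources:
            sources.append(source)

    lines = [(result.get("answer") or "").strip()]
    if sources:
        lines.append("")
        lines.append("Sources:")
        lines.extend(f"[{i}] {name}" for i, name in enumerate(sources, start=1))
    return "\n".join(lines)


example = {
    "answer": "Quantum computing speeds up certain optimization problems.",
    "chunks": [
        {"source": "quantum-computing-overview.pdf"},
        {"source": "computing-advances-2023.pdf"},
        {"source": "quantum-computing-overview.pdf"},  # duplicate, listed once
    ],
}
print(format_answer(example))
```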

RAG Queries for Frontend Integration

For applications that need a more structured response format, you can use the dedicated RAG endpoint:

import requests
import json

url = "https://vedaya-backend.fly.dev/api/chatbot/retrieval/query"

payload = json.dumps({
  "query": "What are the key benefits of quantum computing?",
  "vector_db": "pinecone",
  "top_k": 3,
  "model": "gpt-4"
})

headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)
print(response.json())

The response format is specifically designed for frontend integration:

{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": 12345,
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf",
      "entities": ["Quantum Computing", "Optimization"]
    },
    {
      "id": 67890,
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf",
      "entities": ["Quantum Computing", "Superposition"]
    },
    {
      "id": 24680,
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf",
      "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
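The `entities` field on each chunk lends itself to frontend features such as filter tags or entity highlights. As a small sketch, assuming the response shape shown above, the hypothetical helper `index_by_entity` below maps each entity to the IDs of the chunks that mention it.

```python
from collections import defaultdict


def index_by_entity(result: dict) -> dict[str, list]:
    """Map each entity name to the IDs of the chunks that list it."""
    index: dict[str, list] = defaultdict(list)
    for chunk in result.get("chunks", []):
        for entity in chunk.get("entities", []):
            index[entity].append(chunk["id"])
    return dict(index)


# Abbreviated copy of the sample response from this guide.
rag_sample = {
    "chunks": [
        {"id": 12345, "entities": ["Quantum Computing", "Optimization"]},
        {"id": 67890, "entities": ["Quantum Computing", "Superposition"]},
        {"id": 24680, "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]},
    ]
}

for entity, chunk_ids in index_by_entity(rag_sample).items():
    print(entity, chunk_ids)
```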

Simple Chatbot Queries

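For simpler use cases, you can use the basic chatbot endpoint. Note that its parameter names differ from those of the retrieval endpoints: it takes query_str and topk rather than query and top_k.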

import requests

url = "https://vedaya-backend.fly.dev/api/chatbot/query/"

params = {
  "query_str": "What are the key benefits of quantum computing?",
  "rag": True,       # Whether to use RAG
  "topk": 3          # Number of results to return
}

headers = {
  'Authorization': 'Bearer YOUR_API_KEY'
}

response = requests.get(url, headers=headers, params=params)
print(response.json())
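In application code you will probably want a timeout and an explicit check for HTTP errors around this call. The sketch below wraps the same request in a function; `query_chatbot` is a hypothetical helper, the 30-second timeout is an arbitrary assumption, and the HTTP client is passed in (for example the `requests` module or a `requests.Session()`) so the function is easy to test.

```python
from typing import Any

CHATBOT_URL = "https://vedaya-backend.fly.dev/api/chatbot/query/"


def query_chatbot(http: Any, query: str, api_key: str, rag: bool = True,
                  topk: int = 3, timeout: float = 30.0) -> dict:
    """Call the chatbot endpoint and return the decoded JSON body.

    `http` is any object with a requests-style `get` method, such as the
    `requests` module itself or a `requests.Session()`.
    """
    response = http.get(
        CHATBOT_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        params={"query_str": query, "rag": rag, "topk": topk},
        timeout=timeout,  # fail instead of hanging on a stalled connection
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()
```

With `requests` installed, a call would look like `query_chatbot(requests, "What are the key benefits of quantum computing?", "YOUR_API_KEY")`.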

Understanding RAG

Retrieval-Augmented Generation (RAG) combines two steps:

Retrieval: Finding relevant information from your document collection
Generation: Using a language model to create coherent, contextual responses
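To make the retrieval step concrete, here is a toy illustration of similarity-based ranking. It is not how Vedaya implements retrieval: word-count vectors stand in for learned embeddings, and the functions `embed`, `cosine`, and `retrieve` are invented for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (real systems use learned vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


DOCS = [
    "quantum computing uses qubits",
    "classical computers use binary bits",
    "bananas are yellow",
]

print(retrieve("what are qubits in quantum computing", DOCS, top_k=1))
```

In a full RAG pipeline, the retrieved text would then be passed to a language model as context for the generation step.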

The key benefits of RAG include:

Up-to-date information: Responses based on your latest documents
Reduced hallucinations: Grounding responses in factual content
Domain-specific knowledge: Tailored answers based on your specific data
Transparency: Citations to source documents for verification
Cost efficiency: Sending only the most relevant chunks to the language model, rather than entire documents, keeps prompts short and usage costs down

Implementing RAG in Your Application

Follow these steps to implement RAG in your application:

1. Ingest documents using the Data Ingestion API
2. Wait for processing to complete (indexing and embedding generation)
3. Set up retrieval endpoints in your application
4. Design prompts that effectively use the retrieved context
5. Handle responses appropriately in your frontend
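If you call a language model yourself instead of relying on the `model` parameter of the endpoints above, the prompt-design step might look like the sketch below. The function `build_prompt`, its wording, and the `max_chars` budget are all assumptions for illustration; the chunk fields match the retrieval responses shown in this guide.

```python
def build_prompt(query: str, chunks: list[dict], max_chars: int = 2000) -> str:
    """Assemble a grounded prompt from retrieved chunks.

    Chunks are numbered so the model can cite them, and are added in order
    until the character budget for context would be exceeded.
    """
    context_parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] ({chunk['source']}) {chunk['text']}"
        if used + len(entry) > max_chars:
            break  # stop before overflowing the context budget
        context_parts.append(entry)
        used += len(entry)

    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


demo_chunks = [
    {"source": "quantum-computing-overview.pdf", "text": "Quantum computing offers several key benefits..."},
    {"source": "computing-advances-2023.pdf", "text": "The primary advantage of quantum computing..."},
]
print(build_prompt("What are the key benefits of quantum computing?", demo_chunks))
```

Numbering the chunks in the prompt makes it straightforward to map citations in the answer back to source documents in your frontend.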

Best Practices

Query Formulation: Ask clear, specific questions for better retrieval
Context Length: Adjust top_k to the complexity of the question; broad questions usually benefit from more chunks, narrow ones from fewer
Model Selection: Use more powerful models for complex reasoning tasks
Citation Generation: Extract and display source information to users
Fallback Mechanisms: Handle cases where relevant information isn’t found
Progressive Enhancement: Start with simple retrieval and add RAG capabilities as needed
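The fallback practice above can be sketched as a small guard around the response. This is a minimal example, assuming the response shape shown earlier in this guide; the helper name `answer_or_fallback`, the fallback wording, and the `0.5` score threshold are illustrative choices rather than Vedaya defaults.

```python
FALLBACK_MESSAGE = "I couldn't find relevant information in the knowledge base."


def answer_or_fallback(result: dict, min_score: float = 0.5) -> str:
    """Return the generated answer, or a fallback message when retrieval looks weak.

    Retrieval is treated as weak when no chunk reaches min_score or when the
    response carries no answer text.
    """
    chunks = result.get("chunks", [])
    best_score = max((c.get("score", 0.0) for c in chunks), default=0.0)
    answer = (result.get("answer") or "").strip()
    if best_score < min_score or not answer:
        return FALLBACK_MESSAGE
    return answer


print(answer_or_fallback({"answer": "Qubits enable superposition.", "chunks": [{"score": 0.92}]}))
print(answer_or_fallback({"answer": "Unrelated guess.", "chunks": [{"score": 0.12}]}))
```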

For more details on available endpoints and parameters, see the Retrieval & RAG API Reference.
