Retrieval and RAG with Vedaya
This guide explains how to use Vedaya’s Retrieval and RAG (Retrieval Augmented Generation) APIs to query your knowledge base, retrieve relevant document chunks, and generate answers based on your data.
Overview
Vedaya’s Retrieval and RAG system allows you to:
- Query your knowledge base for relevant information
- Retrieve document chunks based on semantic search
- Generate answers using language models enhanced with retrieved context
- Build chatbots with knowledge from your documents
Simple Retrieval
To retrieve relevant chunks from your knowledge base without generating an answer:
```python
import requests

url = "https://vedaya-backend.fly.dev/api/retrieval/query"
params = {
    "query": "What are the key benefits of quantum computing?",
    "vector_db": "pinecone",  # Vector database to use
    "top_k": 3,  # Number of results to return
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
}

response = requests.get(url, headers=headers, params=params)
print(response.json())
```
The response includes the most relevant chunks from your documents:
```json
{
  "query": "What are the key benefits of quantum computing?",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250
}
```
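Here is a minimal sketch for working with this response. The helper `top_chunks` is a hypothetical name and is not part of any Vedaya SDK. It keeps the chunks whose score meets a threshold and returns them with the highest score first. It assumes only the `chunks`, `score`, `source`, and `text` fields shown above.

```python
from typing import Any


# Hypothetical helper, not part of any Vedaya SDK.
def top_chunks(response: dict[str, Any], min_score: float = 0.0) -> list[dict[str, Any]]:
    """Return chunks whose score is at least min_score, highest score first.

    Assumes the response shape shown above: a "chunks" list whose items
    carry "id", "text", "score", and "source" keys.
    """
    kept = [c for c in response.get("chunks", []) if c.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c.get("score", 0.0), reverse=True)


# Trimmed copy of the example response above; in practice pass response.json().
sample = {
    "query": "What are the key benefits of quantum computing?",
    "chunks": [
        {"id": "c12345-1", "text": "Quantum computing offers...", "score": 0.92,
         "source": "quantum-computing-overview.pdf"},
        {"id": "c67890-3", "text": "The primary advantage...", "score": 0.87,
         "source": "computing-advances-2023.pdf"},
        {"id": "c24680-5", "text": "Benefits of quantum computing include...", "score": 0.81,
         "source": "future-technology-trends.pdf"},
    ],
    "total_chunks_searched": 1250,
}

for chunk in top_chunks(sample, min_score=0.85):
    print(f'{chunk["score"]:.2f}  {chunk["source"]}')
```

The 0.85 cutoff is only a placeholder. A sensible threshold depends on your documents and retrieval setup.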
Retrieval with Answer Generation
To retrieve chunks and generate an answer, send a POST request to the same endpoint and include a `model` field:
```python
import requests

url = "https://vedaya-backend.fly.dev/api/retrieval/query"
payload = {
    "query": "What are the key benefits of quantum computing?",
    "vector_db": "pinecone",  # Vector database to use
    "top_k": 3,  # Number of results to return
    "model": "gpt-4",  # Model to use for generating answers
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
The response includes both retrieved chunks and a generated answer:
```json
{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": "c12345-1",
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf"
    },
    {
      "id": "c67890-3",
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf"
    },
    {
      "id": "c24680-5",
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf"
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
```
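The chunks come back alongside the answer, so you can list sources under the generated text. Below is a sketch of one way to do that. It assumes only the `answer`, `chunks`, and `source` fields shown above. `format_with_sources` is a hypothetical helper. It lists each source file once, in the order in which the chunks were returned.

```python
from typing import Any


# Hypothetical helper, not part of any Vedaya SDK.
def format_with_sources(response: dict[str, Any]) -> str:
    """Append a numbered, de-duplicated source list to the generated answer."""
    sources: list[str] = []
    for chunk in response.get("chunks", []):
        source = chunk.get("source")
        if source and source not in sources:  # keep the first occurrence only
            sources.append(source)

    answer = response.get("answer", "").strip()
    if not sources:
        return answer
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {name}" for i, name in enumerate(sources, start=1)]
    return "\n".join(lines)


# Trimmed copy of the example response above; in practice pass response.json().
sample = {
    "answer": "Based on the documents, the key benefits of quantum computing include...",
    "chunks": [
        {"id": "c12345-1", "score": 0.92, "source": "quantum-computing-overview.pdf"},
        {"id": "c67890-3", "score": 0.87, "source": "computing-advances-2023.pdf"},
        {"id": "c24680-5", "score": 0.81, "source": "future-technology-trends.pdf"},
    ],
}
print(format_with_sources(sample))
```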
RAG Queries for Frontend Integration
For frontend applications, the chatbot retrieval endpoint accepts the same request body. Its response also tags each chunk with the entities the chunk mentions:
```python
import requests

url = "https://vedaya-backend.fly.dev/api/chatbot/retrieval/query"
payload = {
    "query": "What are the key benefits of quantum computing?",
    "vector_db": "pinecone",  # Vector database to use
    "top_k": 3,  # Number of results to return
    "model": "gpt-4",  # Model to use for generating answers
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
The response has the same top-level fields as the previous endpoint. However, each chunk uses a numeric `id` and includes an `entities` list:
```json
{
  "query": "What are the key benefits of quantum computing?",
  "answer": "Based on the documents, the key benefits of quantum computing include:\n\n1. Exponentially faster solving of complex optimization problems compared to classical computers\n2. Ability to perform simultaneous calculations through quantum superposition\n3. Breaking current encryption methods\n4. Accelerating drug discovery through molecular simulation\n5. Optimizing complex logistics networks\n\nThese advantages stem from quantum computing's fundamentally different approach to processing information using qubits rather than traditional binary bits.",
  "chunks": [
    {
      "id": 12345,
      "text": "Quantum computing offers several key benefits including the ability to solve complex optimization problems exponentially faster than classical computers...",
      "score": 0.92,
      "source": "quantum-computing-overview.pdf",
      "entities": ["Quantum Computing", "Optimization"]
    },
    {
      "id": 67890,
      "text": "The primary advantage of quantum computing lies in its capacity to perform simultaneous calculations through quantum superposition...",
      "score": 0.87,
      "source": "computing-advances-2023.pdf",
      "entities": ["Quantum Computing", "Superposition"]
    },
    {
      "id": 24680,
      "text": "Benefits of quantum computing include breaking current encryption methods, accelerating drug discovery through molecular simulation, and optimizing complex logistics networks...",
      "score": 0.81,
      "source": "future-technology-trends.pdf",
      "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]
    }
  ],
  "total_chunks_searched": 1250,
  "retrieval_time_ms": 156
}
```
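The per-chunk `entities` lists are a natural fit for tags or filters in a UI. Below is a sketch that assumes the field names in the example above. `collect_entities` is a hypothetical helper. It counts how many chunks mention each entity, so the most common entities can be shown first.

```python
from collections import Counter
from typing import Any


# Hypothetical helper, not part of any Vedaya SDK.
def collect_entities(response: dict[str, Any]) -> list[tuple[str, int]]:
    """Count how many chunks mention each entity, most frequent first.

    Ties keep first-seen order, because Counter.most_common orders equal
    counts by when each element was first encountered.
    """
    counts: Counter = Counter()
    for chunk in response.get("chunks", []):
        unique = list(dict.fromkeys(chunk.get("entities", [])))  # de-duplicate within a chunk
        counts.update(unique)
    return counts.most_common()


# Trimmed copy of the example response above; in practice pass response.json().
sample = {
    "chunks": [
        {"id": 12345, "entities": ["Quantum Computing", "Optimization"]},
        {"id": 67890, "entities": ["Quantum Computing", "Superposition"]},
        {"id": 24680, "entities": ["Quantum Computing", "Encryption", "Drug Discovery"]},
    ],
}
for name, count in collect_entities(sample):
    print(f"{name}: {count}")
```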
Simple Chatbot Queries
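For simpler use cases, use the basic chatbot endpoint. It takes query parameters on a GET request. Note that its parameter names differ from those of the retrieval endpoints: it uses `query_str` instead of `query` and `topk` instead of `top_k`.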
```python
import requests

url = "https://vedaya-backend.fly.dev/api/chatbot/query/"
params = {
    "query_str": "What are the key benefits of quantum computing?",
    "rag": True,  # Whether to use RAG
    "topk": 3,  # Number of results to return
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
}

response = requests.get(url, headers=headers, params=params)
print(response.json())
```
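To see exactly what this GET request sends, you can build the same query string with the standard library. This sketch assumes the parameter names shown above (`query_str`, `rag`, `topk`). `chatbot_query_url` is a hypothetical helper. A Python boolean such as `rag=True` is serialized as the string `True`. `requests` does the same when you pass a boolean in `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://vedaya-backend.fly.dev/api/chatbot/query/"


# Hypothetical helper: builds the URL only, it does not send a request.
def chatbot_query_url(query: str, rag: bool = True, topk: int = 3) -> str:
    """Build the GET URL for the simple chatbot endpoint."""
    params = {"query_str": query, "rag": rag, "topk": topk}
    return f"{BASE_URL}?{urlencode(params)}"  # spaces become "+", "?" becomes "%3F"


print(chatbot_query_url("What are the key benefits of quantum computing?"))
```

To confirm that the parameters went out as intended, compare this URL with `response.url` from the `requests` call.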
Understanding RAG
Retrieval Augmented Generation (RAG) combines:
- Retrieval: Finding relevant information from your document collection
- Generation: Using a language model to create coherent, contextual responses
The key benefits of RAG include:
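- Up-to-date information: Answers reflect the documents you have ingested, including recent ones, without retraining the model
- Reduced hallucinations: Responses are grounded in retrieved text rather than in the model's training data alone
- Domain-specific knowledge: Answers are tailored to your own data
- Transparency: Each answer comes with the source chunks it was based on, so users can verify it
- Cost efficiency: Only the top-ranked chunks are sent to the model, which keeps prompts much shorter than passing in whole documents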
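To make the retrieval half concrete, here is a toy version that ranks chunks by the cosine similarity between word-count vectors. It only illustrates the idea. Vedaya's retrieval is a semantic search over embeddings stored in a vector database such as Pinecone. The functions `vectorize`, `cosine`, and `retrieve` and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top_k best matches."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


sample_chunks = [
    "Quantum computing can speed up some optimization problems.",
    "Superposition lets qubits represent many states at once.",
    "The cafeteria menu changes every Monday.",
]
for score, text in retrieve("How does quantum computing help optimization?", sample_chunks, top_k=2):
    print(f"{score:.3f}  {text}")
```

In the generation step, the retrieved chunks are then passed to a language model as context for the answer.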
Implementing RAG in Your Application
Follow these steps to implement RAG in your application:
- Ingest documents using the Data Ingestion API
- Wait for processing (indexing and embedding generation) to complete
- Call the retrieval or RAG endpoints from your application
- Design prompts that make effective use of the retrieved context
- Handle responses in your frontend, including the answer and its source chunks
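For the prompt-design step, one common pattern works like this: number each retrieved chunk, tell the model to answer only from that context, and ask it to cite chunk numbers. The sketch below follows that pattern. `build_prompt` and its instruction wording are illustrative assumptions, not a prompt that Vedaya uses internally.

```python
from typing import Any

# Illustrative instructions; adjust the wording for your own use case.
INSTRUCTIONS = (
    "Answer the question using only the context below. "
    "Cite sources as [n]. If the context does not contain the answer, say so."
)


# Hypothetical helper, not part of any Vedaya SDK.
def build_prompt(question: str, chunks: list[dict[str, Any]]) -> str:
    """Assemble a grounded prompt from retrieved chunks (each with text and source)."""
    context = "\n\n".join(
        f"[{i}] ({chunk.get('source', 'unknown')}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return f"{INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


sample_chunks = [
    {"text": "Quantum computing offers...", "source": "quantum-computing-overview.pdf"},
    {"text": "The primary advantage...", "source": "computing-advances-2023.pdf"},
]
print(build_prompt("What are the key benefits?", sample_chunks))
```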
Best Practices
- Query Formulation: Ask clear, specific questions for better retrieval
- Context Length: Adjust `top_k` to the complexity of the question. Raise it for broad questions that draw on several documents, and lower it for narrow factual lookups.
- Model Selection: Use more powerful models for complex reasoning tasks
- Citation Generation: Extract and display source information to users
- Fallback Mechanisms: Handle cases where relevant information isn’t found
- Progressive Enhancement: Start with simple retrieval and add RAG capabilities as needed
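Below is a sketch of the fallback practice. If no chunk reaches a minimum score, it returns a canned message instead of the answer. Otherwise it returns the answer with the sources of the relevant chunks. The 0.75 threshold, the fallback wording, and the `answer_or_fallback` helper are assumptions for illustration. Tune any threshold against your own data, since score ranges depend on the retrieval setup.

```python
from typing import Any

# Illustrative fallback message; adjust for your application.
FALLBACK = "I couldn't find this in your documents. Try rephrasing or adding more detail."


# Hypothetical helper, not part of any Vedaya SDK.
def answer_or_fallback(response: dict[str, Any], min_score: float = 0.75) -> dict[str, Any]:
    """Return the answer plus sources, or a fallback when retrieval is weak."""
    relevant = [c for c in response.get("chunks", []) if c.get("score", 0.0) >= min_score]
    if not relevant or not response.get("answer"):
        return {"answer": FALLBACK, "sources": [], "grounded": False}
    # dict.fromkeys removes duplicate sources while preserving their order
    sources = list(dict.fromkeys(c["source"] for c in relevant if "source" in c))
    return {"answer": response["answer"], "sources": sources, "grounded": True}


print(answer_or_fallback({"answer": "A", "chunks": [{"score": 0.4, "source": "b.pdf"}]}))
```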
For more details on available endpoints and parameters, see the Retrieval & RAG API Reference.