Query and RAG API
The Query and RAG API provides endpoints for performing retrieval-augmented generation (RAG) queries on your knowledge graph, supporting multiple query modes and LLM providers.

Query Knowledge Graph
Perform a retrieval-augmented generation (RAG) query on the knowledge graph.

Request Body
- The query text to search or ask about
- Query mode, which determines the retrieval strategy; defaults to hybrid (see Query Modes below)
- Whether to return only the retrieved context, without generating a response
- Whether to return only the generated prompt, without producing a response
- Response format (e.g., “Multiple Paragraphs”, “Bullet Points”)
- Number of top items to retrieve
- Max tokens for each retrieved text chunk
- Max tokens for relationship descriptions
- Max tokens for entity descriptions
- Past conversation history for context
- LLM provider (openai, anthropic, ollama, etc.)
- Specific LLM model to use (e.g., gpt-4, claude-3)
- Temperature for LLM generation (0.0-2.0)
Query Modes
- hybrid (default): Combines entity and relationship retrieval for comprehensive results
- local: Entity-centric retrieval (focuses on specific entities and their properties)
- global: Relationship-centric retrieval (focuses on connections between entities)
- naive: Basic keyword search without advanced graph features
- mix: Integrates knowledge graph with vector search
- bypass: Direct LLM query without retrieval
Response
Returns a QueryResponse with the generated response based on the retrieved context.
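The following is a minimal sketch of calling this endpoint from Python with the requests library. The base URL and the request-body field names (query, mode, top_k, response_type) are assumptions inferred from the parameter descriptions above; confirm them against your deployment's API schema.

```python
# Sketch of a POST to /query. Field names and the base URL are assumptions
# based on the parameter descriptions in this section.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

payload = {
    "query": "What companies does Acme Corp partner with?",  # illustrative query
    "mode": "hybrid",            # hybrid / local / global / naive / mix / bypass
    "top_k": 20,                 # number of top items to retrieve
    "response_type": "Multiple Paragraphs",
}

resp = requests.post(f"{BASE_URL}/query", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])   # QueryResponse body; field name assumed
```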
Stream Query Results

Stream the response for a knowledge graph query in real-time.

Request Body
Same parameters as the /query endpoint.
Response
Streams the response as newline-delimited JSON (NDJSON), with each chunk containing a portion of the generated text. This provides a more responsive user experience for long outputs.
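Below is a sketch of consuming the stream from Python. The /query/stream path is an assumption (this section does not name the path explicitly), as is the "response" field inside each chunk; each NDJSON line is parsed as it arrives so partial output can be shown immediately.

```python
# Sketch of consuming the streaming query endpoint as NDJSON.
# The /query/stream path and the chunk field name are assumptions.
import json
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

payload = {"query": "Summarize the knowledge graph", "mode": "hybrid"}

with requests.post(f"{BASE_URL}/query/stream", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)  # one JSON object per NDJSON line
        print(chunk.get("response", ""), end="", flush=True)
```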
OpenAI-Compatible Chat Completions

Create a chat completion with RAG enhancement, compatible with OpenAI’s API.

Request Body
- Model name; use a vedaya-* model name to select a RAG mode (see Special Model Names for RAG below)
- Array of message objects with role and content
- Temperature for generation (0.0-2.0)
- Maximum tokens to generate
- Whether to stream the response (may not be available; see Important Notes)
- RAG mode override (naive, local, global, hybrid)
- Number of results for RAG retrieval
- Whether to return only the RAG context, without generation
Special Model Names for RAG
- vedaya-naive: Naive RAG mode
- vedaya-local: Local search mode
- vedaya-global: Global search mode
- vedaya-hybrid: Hybrid search mode
- vedaya-bypass: Direct LLM without RAG
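Since the endpoint is OpenAI-compatible, a plain HTTP request with the standard fields works. The sketch below uses only fields confirmed above (model, messages, temperature, max_tokens); the base URL is a placeholder, and the dummy API key reflects the no-authentication note in the next section.

```python
# Sketch of an OpenAI-compatible chat completion using a vedaya-* model
# name to select the RAG mode. Base URL and key are placeholders.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer dummy-key"},  # dummy keys are accepted
    json={
        "model": "vedaya-hybrid",  # vedaya-* name controls the RAG mode
        "messages": [
            {"role": "user", "content": "How are Acme Corp and Globex related?"}
        ],
        "temperature": 0.2,
        "max_tokens": 512,
    },
    timeout=120,
)
resp.raise_for_status()
# Standard OpenAI response shape, assumed to apply here as well
print(resp.json()["choices"][0]["message"]["content"])
```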
Important Notes
Key Differences from Old Documentation
- No authentication required - API works with dummy keys
- OpenAI interface is primary - use /v1/chat/completions for best results
- Special model names - use vedaya-* models to control RAG modes
- Streaming may not work - currently returns 404, so use regular requests
- Processing is fast - documents process in seconds, not minutes
Tips for Best Results
- Use the vedaya-hybrid model for general queries
- Use vedaya-local when looking for specific entities
- Use vedaya-global when understanding relationships
- Keep top_k between 10 and 30 for optimal results
- Multi-turn conversations maintain context automatically (see the sketch below)
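For multi-turn use, the standard OpenAI pattern of resending prior turns in the messages array applies; the sketch below illustrates it. The base URL and the chat helper are illustrative, not part of the documented API.

```python
# Sketch of a multi-turn exchange: prior turns are carried in the
# messages array, the standard OpenAI-compatible pattern.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

def chat(messages, model="vedaya-hybrid"):
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": model, "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Who founded Acme Corp?"}]
answer = chat(history)
history += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "What other ventures are they involved in?"},
]
print(chat(history))
```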
Legacy Endpoints
Completions Endpoint
The /v1/completions endpoint exists for compatibility but converts requests to chat format internally. Use /v1/chat/completions instead.