Query and RAG API
The Query and RAG API provides endpoints for performing retrieval-augmented generation (RAG) queries on your knowledge graph, supporting multiple query modes and LLM providers.

Query Knowledge Graph
Perform a retrieval-augmented generation (RAG) query on the knowledge graph.

Request Body
- The query text to search or ask about
- Query mode, which determines the retrieval strategy; defaults to hybrid (see Query Modes below)
- Whether to return only the retrieved context, without generating a response
- Whether to return only the generated prompt, without producing a response
- Response format (e.g., “Multiple Paragraphs”, “Bullet Points”)
- Number of top items to retrieve
- Max tokens for each retrieved text chunk
- Max tokens for relationship descriptions
- Max tokens for entity descriptions
- Past conversation history for context
- LLM provider (openai, anthropic, ollama, etc.)
- Specific LLM model to use (e.g., gpt-4, claude-3)
- Temperature for LLM generation (0.0-2.0)
Query Modes
- hybrid (default): Combines entity and relationship retrieval for comprehensive results
- local: Entity-centric retrieval (focuses on specific entities and their properties)
- global: Relationship-centric retrieval (focuses on connections between entities)
- naive: Basic keyword search without advanced graph features
- mix: Integrates knowledge graph with vector search
- bypass: Direct LLM query without retrieval
Response
Returns a QueryResponse with the generated response based on the retrieved context.
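The following is a minimal sketch of calling this endpoint from Python with the requests library. The base URL and the request-body field names (query, mode, top_k, response_type) are assumptions inferred from the parameter descriptions above; confirm them against your deployment's API schema.

```python
# Sketch of a POST to /query. Field names and the base URL are assumptions
# based on the parameter descriptions in this section.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

payload = {
    "query": "What companies does Acme Corp partner with?",  # illustrative query
    "mode": "hybrid",            # hybrid / local / global / naive / mix / bypass
    "top_k": 20,                 # number of top items to retrieve
    "response_type": "Multiple Paragraphs",
}

resp = requests.post(f"{BASE_URL}/query", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])   # QueryResponse body; field name assumed
```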
Stream Query Results

Stream the response for a knowledge graph query in real-time.

Request Body
Same parameters as the /query endpoint.
Response
Streams the response as newline-delimited JSON (NDJSON), with each chunk containing a portion of the generated text. This provides a more responsive user experience for long outputs.
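Below is a sketch of consuming the stream from Python. The /query/stream path is an assumption (this section does not name the path explicitly), as is the "response" field inside each chunk; each NDJSON line is parsed as it arrives so partial output can be shown immediately.

```python
# Sketch of consuming the streaming query endpoint as NDJSON.
# The /query/stream path and the chunk field name are assumptions.
import json
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

payload = {"query": "Summarize the knowledge graph", "mode": "hybrid"}

with requests.post(f"{BASE_URL}/query/stream", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)  # one JSON object per NDJSON line
        print(chunk.get("response", ""), end="", flush=True)
```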
OpenAI-Compatible Chat Completions

Create a chat completion with RAG enhancement, compatible with OpenAI’s API.

Request Body
- Model name; use a vedaya-* model name to select a RAG mode (see Special Model Names for RAG below)
- Array of message objects with role and content
- Temperature for generation (0.0-2.0)
- Maximum tokens to generate
- Whether to stream the response (may not be available; see Important Notes)
- RAG mode override (naive, local, global, hybrid)
- Number of results for RAG retrieval
- Whether to return only the RAG context, without generation
Special Model Names for RAG
- vedaya-naive: Naive RAG mode
- vedaya-local: Local search mode
- vedaya-global: Global search mode
- vedaya-hybrid: Hybrid search mode
- vedaya-bypass: Direct LLM without RAG
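Since the endpoint is OpenAI-compatible, a plain HTTP request with the standard fields works. The sketch below uses only fields confirmed above (model, messages, temperature, max_tokens); the base URL is a placeholder, and the dummy API key reflects the no-authentication note in the next section.

```python
# Sketch of an OpenAI-compatible chat completion using a vedaya-* model
# name to select the RAG mode. Base URL and key are placeholders.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": "Bearer dummy-key"},  # dummy keys are accepted
    json={
        "model": "vedaya-hybrid",  # vedaya-* name controls the RAG mode
        "messages": [
            {"role": "user", "content": "How are Acme Corp and Globex related?"}
        ],
        "temperature": 0.2,
        "max_tokens": 512,
    },
    timeout=120,
)
resp.raise_for_status()
# Standard OpenAI response shape, assumed to apply here as well
print(resp.json()["choices"][0]["message"]["content"])
```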
Important Notes
Key Differences from Old Documentation
- No authentication required - API works with dummy keys
- OpenAI interface is primary - use /v1/chat/completions for best results
- Special model names - use vedaya-* models to control RAG modes
- Streaming may not work - currently returns 404, so use regular requests
- Processing is fast - documents process in seconds, not minutes
Tips for Best Results
- Use the vedaya-hybrid model for general queries
- Use vedaya-local when looking for specific entities
- Use vedaya-global when understanding relationships
- Keep top_k between 10 and 30 for optimal results
- Multi-turn conversations maintain context automatically (see the sketch below)
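For multi-turn use, the standard OpenAI pattern of resending prior turns in the messages array applies; the sketch below illustrates it. The base URL and the chat helper are illustrative, not part of the documented API.

```python
# Sketch of a multi-turn exchange: prior turns are carried in the
# messages array, the standard OpenAI-compatible pattern.
import requests

BASE_URL = "http://localhost:9621"  # assumed local deployment URL

def chat(messages, model="vedaya-hybrid"):
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": model, "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Who founded Acme Corp?"}]
answer = chat(history)
history += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "What other ventures are they involved in?"},
]
print(chat(history))
```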
Legacy Endpoints
Completions Endpoint
The /v1/completions endpoint exists for compatibility but converts requests to chat format internally. Use /v1/chat/completions instead.