
Query and RAG API

The Query and RAG API provides powerful endpoints for performing retrieval-augmented generation (RAG) queries on your knowledge graph, supporting multiple query modes and LLM providers.

Query Knowledge Graph

Perform a retrieval-augmented generation (RAG) query on the knowledge graph.

Request Body

query
string
required
The query text to search or ask about
mode
enum
default:"hybrid"
Query mode determines the retrieval strategy. One of hybrid (default), local, global, naive, mix, or bypass; see Query Modes below for details.
only_need_context
boolean
default:"false"
Return only the retrieved context, without generating a response
only_need_prompt
boolean
default:"false"
Return only the generated prompt, without producing a response
response_type
string
Response format (e.g., “Multiple Paragraphs”, “Bullet Points”)
top_k
integer
default:"20"
Number of top items to retrieve
max_token_for_text_unit
integer
default:"4000"
Max tokens for each retrieved text chunk
max_token_for_global_context
integer
default:"4000"
Max tokens for relationship descriptions
max_token_for_local_context
integer
default:"4000"
Max tokens for entity descriptions
conversation_history
array
Past conversation history for context
llm_provider
string
LLM provider (openai, anthropic, ollama, etc.)
llm_model
string
Specific LLM model to use (e.g., gpt-4, claude-3)
llm_temperature
number
default:"0.7"
Temperature for LLM generation (0.0-2.0)
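
The parameters above can be exercised from Python as well as curl. A minimal sketch using only the standard library: the field names and defaults come from this reference, but the two helpers themselves are illustrative, not part of any official client.

```python
import json
import urllib.request

API_URL = "https://vedaya-kge.fly.dev/query"

def build_query_payload(query: str, **overrides) -> dict:
    """Start from the documented defaults, then apply caller overrides."""
    payload = {
        "query": query,
        "mode": "hybrid",
        "top_k": 20,
        "llm_temperature": 0.7,
    }
    payload.update(overrides)
    return payload

def run_query(query: str, **overrides) -> str:
    """POST the payload and return the generated answer text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_query_payload(query, **overrides)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

For example, `run_query("What is RAG?", mode="local", top_k=10)` would issue the same request as the curl example below with those two fields overridden.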

Query Modes

  • hybrid (default): Combines entity and relationship retrieval for comprehensive results
  • local: Entity-centric retrieval (focuses on specific entities and their properties)
  • global: Relationship-centric retrieval (focuses on connections between entities)
  • naive: Basic keyword search without advanced graph features
  • mix: Integrates knowledge graph with vector search
  • bypass: Direct LLM query without retrieval
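
As a rough heuristic, mode selection can be table-driven. The mapping below just restates the list above in code; the intent names are my own labels, not API values.

```python
# Illustrative mapping from question intent to query mode,
# summarizing the Query Modes list (the intent keys are invented labels).
MODE_FOR_INTENT = {
    "entity_lookup": "local",    # facts about a specific entity
    "relationship": "global",    # how entities connect
    "general": "hybrid",         # default, combines both
    "keyword": "naive",          # plain keyword search
    "semantic": "mix",           # KG plus vector search
    "no_retrieval": "bypass",    # direct LLM answer
}

def mode_for(intent: str) -> str:
    """Return the query mode for an intent, falling back to hybrid."""
    return MODE_FOR_INTENT.get(intent, "hybrid")
```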

Response

Returns a QueryResponse with the generated response based on the retrieved context.
curl -X POST "https://vedaya-kge.fly.dev/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the relationship between AI and machine learning?",
    "mode": "hybrid",
    "top_k": 20,
    "llm_provider": "openai",
    "llm_model": "gpt-4",
    "response_type": "Multiple Paragraphs"
  }'
{
  "response": "Artificial Intelligence (AI) and Machine Learning (ML) are closely related concepts, with ML being a subset of AI...\n\nMachine Learning is a specific approach to achieving artificial intelligence through algorithms that can learn from and make predictions based on data...\n\nThe relationship between AI and ML can be understood as hierarchical, where AI represents the broader goal of creating intelligent machines, while ML provides specific methods and techniques to achieve that goal."
}

Stream Query Results

Stream the response for a knowledge graph query in real-time.

Request Body

Same parameters as the /query endpoint.

Response

Streams the response as newline-delimited JSON (NDJSON) with each chunk containing a portion of the generated text. This provides a more responsive user experience for long outputs.
curl -X POST "https://vedaya-kge.fly.dev/query/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain the evolution of deep learning",
    "mode": "global",
    "top_k": 30,
    "llm_provider": "anthropic",
    "llm_model": "claude-3-opus-20240229"
  }'

OpenAI-Compatible Chat Completions

Create a chat completion with RAG enhancement, compatible with OpenAI’s API.

Request Body

model
string
required
Model name (use “vedaya-*” for RAG modes):
  • vedaya-naive: Basic keyword search
  • vedaya-local: Entity-focused retrieval
  • vedaya-global: Relationship-focused
  • vedaya-hybrid: Combined approach (recommended)
  • vedaya-bypass: Direct LLM without RAG
messages
array
required
Array of message objects with role and content
temperature
number
default:"0.7"
Temperature for generation (0.0-2.0)
max_tokens
integer
Maximum tokens to generate
stream
boolean
default:"false"
Whether to stream the response (may not be available)
rag_mode
string
RAG mode override (naive, local, global, hybrid)
rag_top_k
integer
default:"10"
Number of results for RAG retrieval
rag_only_context
boolean
default:"false"
Only return RAG context without generation

Special Model Names for RAG

  • vedaya-naive: Naive RAG mode
  • vedaya-local: Local search mode
  • vedaya-global: Global search mode
  • vedaya-hybrid: Hybrid search mode
  • vedaya-bypass: Direct LLM without RAG
curl -X POST "https://vedaya-kge.fly.dev/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vedaya-hybrid",
    "messages": [
      {"role": "user", "content": "What are neural networks?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735368000,
  "model": "vedaya-hybrid",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Neural networks are computational models inspired by the structure and function of biological neural networks in the human brain..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 250,
    "total_tokens": 400,
    "retrieval_tokens": 1200,
    "graph_tokens": 300
  }
}
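
Note that the usage object extends OpenAI's schema with retrieval_tokens and graph_tokens. A small sketch of splitting LLM generation tokens from retrieval overhead; the field names come from the sample above, but the accounting split itself is my own framing.

```python
def summarize_usage(usage: dict) -> dict:
    """Split the extended usage object into LLM vs retrieval accounting."""
    llm = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    overhead = usage.get("retrieval_tokens", 0) + usage.get("graph_tokens", 0)
    return {
        "llm_tokens": llm,          # what an OpenAI-style meter would count
        "rag_overhead": overhead,   # context fetched from the knowledge graph
        "matches_total": llm == usage.get("total_tokens", llm),
    }
```

Applied to the sample response above, this yields 400 LLM tokens against 1500 tokens of retrieved graph context.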

Important Notes

Key Differences from Old Documentation

  1. No authentication required - API works with dummy keys
  2. OpenAI interface is primary - Use /v1/chat/completions for best results
  3. Special model names - Use vedaya-* models to control RAG modes
  4. Streaming may not work - The /query/stream endpoint may return 404; fall back to regular (non-streaming) requests
  5. Processing is fast - Documents process in seconds, not minutes

Tips for Best Results

  • Use vedaya-hybrid model for general queries
  • Use vedaya-local when looking for specific entities
  • Use vedaya-global when understanding relationships
  • Keep top_k between 10 and 30 for optimal results
  • Multi-turn conversations maintain context automatically
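
For the OpenAI-compatible endpoint, carrying context across turns follows standard chat-completions practice: resend the prior turns in the messages array. A minimal sketch (the helper is illustrative, not part of the API):

```python
def extend_conversation(messages: list, assistant_reply: str, user_followup: str) -> list:
    """Append the assistant's last reply and the next user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_followup},
    ]

history = [{"role": "user", "content": "What are neural networks?"}]
history = extend_conversation(history, "Neural networks are...", "How are they trained?")
```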

Legacy Endpoints

Completions Endpoint

The /v1/completions endpoint exists for compatibility but converts to chat format internally. Use /v1/chat/completions instead.
# Not recommended - use chat completions instead
import requests

response = requests.post(
    "https://vedaya-kge.fly.dev/v1/completions",
    json={
        "model": "vedaya-hybrid",
        "prompt": "What is machine learning?",
        "max_tokens": 200
    }
)