The Vedaya API provides an OpenAI-compatible interface (chat completions, embeddings, and model listing) with special model names for controlling RAG behavior.

OpenAI-Compatible Interface (Primary Method)

Basic Setup

from openai import OpenAI

# Authentication - API works with dummy keys
client = OpenAI(
    api_key="sk-dummy",  # Can use any dummy key
    base_url="https://vedaya-kge.fly.dev/v1"  # Note: /v1 not /openai/v1
)

# Query with special Vedaya models
response = client.chat.completions.create(
    model="vedaya-hybrid",  # Special RAG model
    messages=[
        {"role": "user", "content": "Your question here"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Available RAG Models

Use these special model names to control RAG behavior:
Model Name      Description                   Use Case
vedaya-naive    Basic keyword search          Simple fact retrieval
vedaya-local    Entity-focused retrieval      Finding specific entities
vedaya-global   Relationship-focused          Understanding connections
vedaya-hybrid   Combined approach (default)   General queries
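
As a quick sketch, the loop below runs the same question through each mode so the retrieval behaviors can be compared side by side (it reuses the client from Basic Setup):

from openai import OpenAI

client = OpenAI(api_key="sk-dummy", base_url="https://vedaya-kge.fly.dev/v1")

question = "What are the key findings?"
for mode in ["vedaya-naive", "vedaya-local", "vedaya-global", "vedaya-hybrid"]:
    response = client.chat.completions.create(
        model=mode,
        messages=[{"role": "user", "content": question}],
        max_tokens=200
    )
    print(f"--- {mode} ---")
    print(response.choices[0].message.content)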

Working Example with Fallback

def query_vedaya(question, mode="vedaya-hybrid"):
    """Query with automatic HTTP fallback"""
    
    # Try OpenAI SDK first
    try:
        from openai import OpenAI
        client = OpenAI(
            api_key="sk-dummy",  
            base_url="https://vedaya-kge.fly.dev/v1"
        )
        
        response = client.chat.completions.create(
            model=mode,
            messages=[{"role": "user", "content": question}],
            temperature=0.7,
            max_tokens=500
        )
        return response.choices[0].message.content
        
    except Exception:
        # SDK unavailable or request failed - fall back to direct HTTP
        import requests
        response = requests.post(
            "https://vedaya-kge.fly.dev/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json={
                "model": mode,
                "messages": [{"role": "user", "content": question}],
                "temperature": 0.7,
                "max_tokens": 500
            }
        )
        if response.status_code == 200:
            return response.json()['choices'][0]['message']['content']
        return f"Error: {response.status_code}"

Multi-Turn Conversations

Maintain context across multiple queries:
from openai import OpenAI

client = OpenAI(api_key="sk-dummy", base_url="https://vedaya-kge.fly.dev/v1")

# Build conversation history
messages = []

# First question
messages.append({"role": "user", "content": "What are the main topics?"})
response = client.chat.completions.create(
    model="vedaya-hybrid",
    messages=messages,
    max_tokens=300
)
answer = response.choices[0].message.content
messages.append({"role": "assistant", "content": answer})
print(f"Answer 1: {answer}")

# Follow-up question (maintains context)
messages.append({"role": "user", "content": "Tell me more about the first topic"})
response = client.chat.completions.create(
    model="vedaya-hybrid",
    messages=messages,
    max_tokens=300
)
answer = response.choices[0].message.content
print(f"Answer 2: {answer}")

Direct HTTP Request (No SDK)

import requests

response = requests.post(
    "https://vedaya-kge.fly.dev/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "vedaya-hybrid",
        "messages": [{"role": "user", "content": "Your question"}],
        "temperature": 0.7,
        "max_tokens": 500
    }
)

if response.status_code == 200:
    answer = response.json()['choices'][0]['message']['content']
    print(answer)

Embeddings Endpoint

Generate embeddings for similarity search:
import requests

# No authentication required
response = requests.post(
    "https://vedaya-kge.fly.dev/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "model": "text-embedding-ada-002",
        "input": "Text to embed"
    }
)

if response.status_code == 200:
    embedding = response.json()["data"][0]["embedding"]
    print(f"Embedding dimension: {len(embedding)}")

List Available Models

import requests

# Get list of available models
response = requests.get("https://vedaya-kge.fly.dev/v1/models")

if response.status_code == 200:
    models = response.json()["data"]
    for model in models:
        print(f"Model: {model['id']}")

Important Compatibility Notes

What Works

✅ OpenAI chat completions endpoint at /v1/chat/completions
✅ Special vedaya-* models for RAG control
✅ Authentication optional (works with dummy keys)
✅ Multi-turn conversations with context
✅ Embeddings generation
✅ Model listing

What Doesn’t Work

❌ Streaming responses: stream=True returns 404, so there is no real-time token streaming (see the fallback sketch after this list)
❌ Ollama endpoints (may not be implemented)
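
Because streaming fails, code that requests stream=True needs a guard. A defensive sketch, assuming the 404 surfaces as an SDK exception:

from openai import OpenAI

client = OpenAI(api_key="sk-dummy", base_url="https://vedaya-kge.fly.dev/v1")

try:
    # stream=True is expected to fail against this backend (404)
    stream = client.chat.completions.create(
        model="vedaya-hybrid",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")
except Exception:
    # Fall back to a regular, non-streaming completion
    response = client.chat.completions.create(
        model="vedaya-hybrid",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)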

Authentication

  • Optional: API works without authentication
  • If you have a key, add it to headers: "Authorization": f"Bearer {API_KEY}" (see the sketch after this list)
  • Dummy keys like “sk-dummy” work for testing
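
If you do have a key, both paths accept it; the sketch below assumes a hypothetical VEDAYA_API_KEY environment variable:

import os
import requests
from openai import OpenAI

API_KEY = os.environ.get("VEDAYA_API_KEY", "sk-dummy")  # hypothetical variable name

# The SDK sends the key as a Bearer token automatically
client = OpenAI(api_key=API_KEY, base_url="https://vedaya-kge.fly.dev/v1")

# With raw HTTP, add the Authorization header yourself
response = requests.post(
    "https://vedaya-kge.fly.dev/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    },
    json={
        "model": "vedaya-hybrid",
        "messages": [{"role": "user", "content": "Ping"}],
        "max_tokens": 50
    }
)
print(response.status_code)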

MCP Integration

Check MCP (Model Context Protocol) status and available tools:
import requests

# No authentication required for MCP endpoints

# Get MCP status
response = requests.get("https://vedaya-kge.fly.dev/mcp/status")
if response.status_code == 200:
    print("MCP Status:", response.json())

# Get available tools
response = requests.get("https://vedaya-kge.fly.dev/mcp/tools")
if response.status_code == 200:
    tools = response.json()
    for tool in tools:
        print(f"Tool: {tool}")

# Get MCP info
response = requests.get("https://vedaya-kge.fly.dev/mcp/info")
if response.status_code == 200:
    print("MCP Info:", response.json())

Quick Test Example

# Minimal test to verify API is working
from openai import OpenAI

# 1. Setup client (no real auth needed)
client = OpenAI(api_key="sk-dummy", base_url="https://vedaya-kge.fly.dev/v1")

# 2. Query with RAG
response = client.chat.completions.create(
    model="vedaya-hybrid",
    messages=[{"role": "user", "content": "What documents are available?"}],
    max_tokens=200
)

print(response.choices[0].message.content)