Data Ingestion API

The Data Ingestion API allows you to upload files, connect to cloud storage providers, and monitor the processing status of your documents.

Upload Files

Upload multiple files for processing and embedding.

POST /api/data-ingestion/upload-files

Request Body

Parameter	Type	Description	Default
files	array	Array of files to upload (multipart/form-data)	Required
chunk_size	integer	Size of text chunks for processing	500
overlap_pct	integer	Percentage of overlap between chunks	10
embedding_model	string	Model to use for generating embeddings	"default-embedding"
pdf_extractor	string	PDF extraction method (pypdf2, pdfplumber, pymupdf, pymupdf4llm)	"pypdf2"

Response

Returns a status object with information about the uploaded files and processing status.
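
As an illustration, the request above could be assembled with only the Python standard library. This is a minimal sketch under stated assumptions: the host in `BASE_URL` is a placeholder, the helper name `build_upload_request` is hypothetical, and the non-file parameters are assumed to travel as ordinary form fields in the same multipart body. Authentication is not described in this section and is left out.

```python
import urllib.request
import uuid

BASE_URL = "https://api.example.com"  # placeholder host, not from the API docs


def build_upload_request(files, chunk_size=500, overlap_pct=10,
                         embedding_model="default-embedding",
                         pdf_extractor="pypdf2"):
    """Build (but do not send) a multipart/form-data upload request.

    `files` is a list of (filename, content_bytes) tuples.
    """
    boundary = uuid.uuid4().hex
    fields = {
        "chunk_size": chunk_size,
        "overlap_pct": overlap_pct,
        "embedding_model": embedding_model,
        "pdf_extractor": pdf_extractor,
    }
    parts = []
    # Plain form fields (assumed to share the multipart body with the files).
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # One part per file, all under the repeated field name "files".
    for filename, content in files:
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/api/data-ingestion/upload-files",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the payload easy to inspect before anything goes over the network.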

Connect to Cloud Storage

Connect to a cloud storage provider and ingest files.

POST /api/data-ingestion/connect-cloud

Request Body

Parameter	Type	Description	Default
provider	string	Cloud provider (gdrive, dropbox, onedrive, s3, azure, gdocs, web, excel)	Required
access_token	string	Access token for the cloud provider	Required
refresh_token	string	Refresh token (if applicable)	null
token_expiry	string	Token expiry timestamp	null
folder_path	string	Path to folder in cloud storage	null
chunk_size	integer	Size of text chunks for processing	500
overlap_pct	integer	Percentage of overlap between chunks	10
embedding_model	string	Model to use for generating embeddings	"default-embedding"
pdf_extractor	string	PDF extraction method	"pypdf2"

Response

Returns a status object with information about the connected cloud storage and processing status.
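
A sketch of how a client might build this request follows. It assumes the body is sent as JSON (the section does not state the content type), uses a placeholder host, and the helper name `build_connect_request` is hypothetical. Optional parameters left as `None` are omitted so the server-side defaults apply.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not from the API docs

# Provider identifiers as listed in the parameter table above.
PROVIDERS = {"gdrive", "dropbox", "onedrive", "s3", "azure", "gdocs", "web", "excel"}


def build_connect_request(provider, access_token, **optional):
    """Build (but do not send) a connect-cloud request with a JSON body (assumed)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    payload = {"provider": provider, "access_token": access_token}
    # Drop unset optional parameters rather than sending explicit nulls.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/api/data-ingestion/connect-cloud",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```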

Get Ingestion Status

Get the status of a specific ingestion process.

GET /api/data-ingestion/status/{file_id}

Path Parameters

Parameter	Type	Description
file_id	string	ID of the file to check status for

Response

Returns the current status of the ingestion process for the specified file.
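
Because ingestion is reported through a status endpoint, a client will typically poll it until processing finishes. The sketch below shows one way to do that. The response schema is not documented in this section, so the `"status"` field and the values `"completed"` and `"failed"` are purely hypothetical, as is the helper name `wait_for_ingestion`. The function that actually performs the GET is injected, so the loop itself carries no networking assumptions.

```python
import time
from typing import Callable

# Hypothetical terminal values; check the real response schema before relying on them.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_ingestion(file_id: str,
                       fetch_status: Callable[[str], dict],
                       interval: float = 2.0,
                       max_attempts: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `fetch_status(file_id)` until a terminal state is reported."""
    for attempt in range(max_attempts):
        status = fetch_status(file_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"ingestion of {file_id!r} did not finish in {max_attempts} polls")
```

Here `fetch_status` would wrap a GET to `/api/data-ingestion/status/{file_id}` and return the decoded response.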

List Ingested Files

Get all ingested files and their status.

GET /api/data-ingestion/files

Response

Returns a list of all ingested files with their processing status.

Delete All Ingested Files

Delete all ingested files and their associated data.

DELETE /api/data-ingestion/files

Response

Returns a confirmation of the deletion operation.

Delete Specific Ingested File

Delete an ingested file and all its associated data.

DELETE /api/data-ingestion/files/{file_id}

Path Parameters

Parameter	Type	Description
file_id	string	ID of the file to delete

Response

Returns a confirmation of the deletion operation.
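
Since this endpoint differs from the delete-all endpoint only by the trailing `{file_id}`, a client may want to guard against an empty ID. The following sketch, with a placeholder host and a hypothetical helper name, rejects empty IDs and percent-encodes the path parameter.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not from the API docs


def build_delete_request(file_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single ingested file."""
    if not file_id:
        # An empty ID would yield a path ending in /files/, which sits one
        # slash away from the delete-all endpoint, so refuse it outright.
        raise ValueError("file_id must be a non-empty string")
    # safe="" also escapes "/" so the ID cannot alter the path structure.
    url = f"{BASE_URL}/api/data-ingestion/files/{quote(file_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")
```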

Ingest URL

Ingest a document from a URL.

POST /api/data-ingestion/ingest-url

Request Body

Parameter	Type	Description	Default
url	string	URL of the document to ingest	Required
filename	string	Custom filename for the document	null
chunk_size	integer	Size of text chunks for processing (100-2000)	500
overlap_pct	integer	Percentage of overlap between chunks (0-50)	10
embedding_model	string	Model to use for generating embeddings	"default-embedding"
pdf_extractor	string	PDF extraction method	"pypdf2"

Response

Returns a status object with information about the ingested URL and processing status.
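
The documented ranges for `chunk_size` (100-2000) and `overlap_pct` (0-50) can be checked on the client before a request is sent. The sketch below builds the request payload as a dictionary. The helper name is hypothetical, and restricting `url` to http/https is an assumption rather than a documented rule.

```python
from urllib.parse import urlparse


def build_ingest_url_payload(url, filename=None, chunk_size=500, overlap_pct=10,
                             embedding_model="default-embedding",
                             pdf_extractor="pypdf2") -> dict:
    """Validate parameters against the documented ranges and return the payload."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not 100 <= chunk_size <= 2000:
        raise ValueError("chunk_size must be between 100 and 2000")
    if not 0 <= overlap_pct <= 50:
        raise ValueError("overlap_pct must be between 0 and 50")
    payload = {
        "url": url,
        "chunk_size": chunk_size,
        "overlap_pct": overlap_pct,
        "embedding_model": embedding_model,
        "pdf_extractor": pdf_extractor,
    }
    if filename is not None:
        payload["filename"] = filename
    return payload
```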

Reprocess File

Retry processing a file that already exists in the system.

POST /api/data-ingestion/files/{file_id}/reprocess

Path Parameters

Parameter	Type	Description
file_id	string	ID of the file to reprocess

Query Parameters

Parameter	Type	Description	Default
chunk_size	integer	Size of text chunks for processing	500
overlap_pct	integer	Percentage of overlap between chunks	10
embedding_model	string	Model to use for generating embeddings	"default-embedding"
pdf_extractor	string	PDF extraction method	"pypdf2"

Response

Returns a status object with information about the reprocessing operation.
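
Unlike the other POST endpoints, this one takes its options as query parameters rather than in the body. A minimal sketch of building such a request, with a placeholder host and a hypothetical helper name, might look like this:

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not from the API docs

ALLOWED_PARAMS = {"chunk_size", "overlap_pct", "embedding_model", "pdf_extractor"}


def build_reprocess_request(file_id: str, **overrides) -> urllib.request.Request:
    """Build (but do not send) a reprocess request with options in the query string."""
    unknown = set(overrides) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown query parameters: {sorted(unknown)}")
    url = f"{BASE_URL}/api/data-ingestion/files/{quote(file_id, safe='')}/reprocess"
    if overrides:
        # urlencode handles escaping and joins the pairs with "&".
        url += "?" + urlencode(overrides)
    return urllib.request.Request(url, method="POST")
```

For example, `build_reprocess_request("f1", chunk_size=800, pdf_extractor="pymupdf")` targets `/api/data-ingestion/files/f1/reprocess?chunk_size=800&pdf_extractor=pymupdf`. Parameters that are not passed are left out so the defaults in the table apply.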

Update Ingestion Settings

Update default ingestion settings.

POST /api/data-ingestion/settings

Request Body

Parameter	Type	Description	Default
chunk_size	integer	Size of text chunks for processing (100-2000)	500
overlap_pct	integer	Percentage of overlap between chunks (0-50)	10
embedding_model	string	Model to use for generating embeddings	"default-embedding"
pdf_extractor	string	PDF extraction method	"pypdf2"

Response

Returns the updated settings configuration.
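
To give a feel for how `chunk_size` and `overlap_pct` interact, here is an illustrative sketch of overlapping chunking. It assumes sizes are counted in characters and that the overlap equals `chunk_size * overlap_pct / 100`, rounded down. The section does not state the unit or the algorithm the service actually uses, so `chunk_text` is a hypothetical stand-in and not the service's implementation.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap_pct: int = 10) -> list[str]:
    """Split text into fixed-size chunks where neighbours share an overlap.

    Assumption: sizes are measured in characters; the real service may use tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")
    overlap = chunk_size * overlap_pct // 100
    step = chunk_size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

With the defaults, each chunk under this model would start 450 characters after the previous one and repeat its last 50 characters. Raising `overlap_pct` produces more chunks for the same input.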
