API endpoints for uploading and processing documents
Parameter | Type | Description | Default |
---|---|---|---|
files | array | Array of files to upload (multipart/form-data) | Required |
chunk_size | integer | Size of text chunks for processing | 500 |
overlap_pct | integer | Percentage of overlap between chunks | 10 |
embedding_model | string | Model to use for generating embeddings | "default-embedding" |
pdf_extractor | string | PDF extraction method (pypdf2, pdfplumber, pymupdf, pymupdf4llm) | "pypdf2" |
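
As a minimal sketch of how the non-file fields in this table might be assembled before an upload, the snippet below collects the documented defaults and checks `pdf_extractor` against the listed methods. The `UploadOptions` class and its `to_form_fields` method are hypothetical names introduced for illustration; they are not part of the API.

```python
from dataclasses import dataclass, asdict

# Extractor names as listed in the parameter table.
PDF_EXTRACTORS = {"pypdf2", "pdfplumber", "pymupdf", "pymupdf4llm"}


@dataclass
class UploadOptions:
    """Hypothetical helper holding the non-file form fields and their defaults."""
    chunk_size: int = 500
    overlap_pct: int = 10
    embedding_model: str = "default-embedding"
    pdf_extractor: str = "pypdf2"

    def to_form_fields(self) -> dict:
        if self.pdf_extractor not in PDF_EXTRACTORS:
            raise ValueError(f"unknown pdf_extractor: {self.pdf_extractor!r}")
        # multipart/form-data carries every field value as text.
        return {key: str(value) for key, value in asdict(self).items()}


fields = UploadOptions(pdf_extractor="pdfplumber").to_form_fields()
print(fields)
```

With an HTTP client such as `requests`, these fields would typically be passed as `data=` alongside the files in `files=`. The endpoint URL is not given in this section, so it is left out here.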
Parameter | Type | Description | Default |
---|---|---|---|
provider | string | Cloud provider (gdrive, dropbox, onedrive, s3, azure, gdocs, web, excel) | Required |
access_token | string | Access token for the cloud provider | Required |
refresh_token | string | Refresh token (if applicable) | null |
token_expiry | string | Token expiry timestamp | null |
folder_path | string | Path to folder in cloud storage | null |
chunk_size | integer | Size of text chunks for processing | 500 |
overlap_pct | integer | Percentage of overlap between chunks | 10 |
embedding_model | string | Model to use for generating embeddings | "default-embedding" |
pdf_extractor | string | PDF extraction method | "pypdf2" |
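
The following sketch shows one way a client could build a request body from the cloud-provider parameters above, filling in the documented defaults and rejecting providers that are not in the list. It assumes the body is sent as JSON, which this section does not state; `build_cloud_ingest_body` is a hypothetical helper name and the token value is a placeholder.

```python
import json

# Provider names as listed in the parameter table.
PROVIDERS = {"gdrive", "dropbox", "onedrive", "s3", "azure", "gdocs", "web", "excel"}


def build_cloud_ingest_body(provider, access_token, **optional):
    """Return a request body with the table's defaults applied (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if not access_token:
        raise ValueError("access_token is required")
    body = {
        "provider": provider,
        "access_token": access_token,
        "refresh_token": None,
        "token_expiry": None,
        "folder_path": None,
        "chunk_size": 500,
        "overlap_pct": 10,
        "embedding_model": "default-embedding",
        "pdf_extractor": "pypdf2",
    }
    unknown = set(optional) - set(body)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body.update(optional)
    return body


body = build_cloud_ingest_body("gdrive", "ACCESS_TOKEN_PLACEHOLDER", folder_path="/reports")
print(json.dumps(body, indent=2))
```

Keeping the `null` defaults explicit in the body makes it easy to see which optional fields were left unset.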
Parameter | Type | Description |
---|---|---|
file_id | string | ID of the file to check status for |
Parameter | Type | Description |
---|---|---|
file_id | string | ID of the file to delete |
Parameter | Type | Description | Default |
---|---|---|---|
url | string | URL of the document to ingest | Required |
filename | string | Custom filename for the document | null |
chunk_size | integer | Size of text chunks for processing (100-2000) | 500 |
overlap_pct | integer | Percentage of overlap between chunks (0-50) | 10 |
embedding_model | string | Model to use for generating embeddings | "default-embedding" |
pdf_extractor | string | PDF extraction method | "pypdf2" |
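
Because this table states explicit ranges for `chunk_size` (100-2000) and `overlap_pct` (0-50), a client may want to check them before sending a request. The sketch below is one way to do that; `validate_url_ingest` is a hypothetical helper, and the requirement that `url` be an absolute http(s) URL is an assumption not stated in the table.

```python
from urllib.parse import urlparse


def validate_url_ingest(url, chunk_size=500, overlap_pct=10):
    """Return a list of problems found; an empty list means the values look valid."""
    errors = []
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("url must be an absolute http(s) URL")
    if not 100 <= chunk_size <= 2000:
        errors.append("chunk_size must be between 100 and 2000")
    if not 0 <= overlap_pct <= 50:
        errors.append("overlap_pct must be between 0 and 50")
    return errors


print(validate_url_ingest("https://example.com/report.pdf"))
print(validate_url_ingest("https://example.com/report.pdf", chunk_size=50, overlap_pct=60))
```

Returning all problems at once, rather than raising on the first, lets a caller report every invalid field in a single pass.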
Parameter | Type | Description |
---|---|---|
file_id | string | ID of the file to reprocess |
Parameter | Type | Description | Default |
---|---|---|---|
chunk_size | integer | Size of text chunks for processing | 500 |
overlap_pct | integer | Percentage of overlap between chunks | 10 |
embedding_model | string | Model to use for generating embeddings | "default-embedding" |
pdf_extractor | string | PDF extraction method | "pypdf2" |
Parameter | Type | Description | Default |
---|---|---|---|
chunk_size | integer | Size of text chunks for processing (100-2000) | 500 |
overlap_pct | integer | Percentage of overlap between chunks (0-50) | 10 |
embedding_model | string | Model to use for generating embeddings | "default-embedding" |
pdf_extractor | string | PDF extraction method | "pypdf2" |
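
To illustrate how `chunk_size` and `overlap_pct` interact, here is a rough sketch of a sliding-window chunker. It assumes sizes are measured in characters and that the overlap is computed as a percentage of `chunk_size`; the service's actual chunking unit and algorithm are not described in this section, and `chunk_text` is a hypothetical function.

```python
def chunk_text(text, chunk_size=500, overlap_pct=10):
    """Split text into windows of chunk_size that share overlap_pct of their length."""
    overlap = chunk_size * overlap_pct // 100  # 10% of 500 -> 50 characters
    step = chunk_size - overlap                # each window starts 450 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "x" * 1200
print([len(chunk) for chunk in chunk_text(sample)])  # [500, 500, 300]
```

Under these assumptions, the defaults produce windows starting at offsets 0, 450, and 900 for a 1200-character input, so consecutive chunks share 50 characters.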