Create Pipeline
This creates a pipeline of the Ingestion, Search, or Deletion type.
The base URL for this API is in the following format:
https://controller.api.${CLOUD_REGION}.zillizcloud.com
- Replace ${CLOUD_REGION} with the appropriate region for your deployment.
- To get the cloud region ID, refer to On Zilliz Cloud Console or List Cloud Regions.
export CLOUD_REGION="gcp-us-west1"
export BASE_URL="https://controller.api.${CLOUD_REGION}.zillizcloud.com"
The authentication token should be an API key with appropriate privileges.
Name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Description of the pipeline to create.
Type of the pipeline to create. For an ingestion pipeline, the value should be INGESTION. For more information, refer to Understanding pipelines.
Functions to add in the ingestion pipeline to create. An ingestion pipeline must have one and only one INDEX function and can have 0-50 PRESERVE functions.
A function to add. You have the following options.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a text ingestion pipeline, the value should be INDEX_TEXT or PRESERVE.
Name of the embedding model used to convert the source data into vector embeddings. For possible values, refer to Which embedding model does Zilliz Cloud Pipelines use?.
Language of your documents.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a doc ingestion pipeline, the value should be INDEX_DOC or PRESERVE.
Name of the embedding model used to convert the source data into vector embeddings. For possible values, refer to Which embedding model does Zilliz Cloud Pipelines use?.
Language of your documents.
The maximum size of a split doc segment. For more information about the supported chunk size range of each embedding model, refer to Zilliz Cloud Limits.
The splitters for Zilliz Cloud to split the specified document into smaller chunks. The value defaults to ["\n\n", "\n", " ", ""].
A splitter.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For an image ingestion pipeline, the value should be INDEX_IMAGE or PRESERVE.
Name of the embedding model used to convert the source data into vector embeddings. For possible values, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For an ingestion pipeline that preserves metadata, the value should be PRESERVE.
Name the input field according to your needs. In a preserve function of an ingestion pipeline, Zilliz Cloud uses the value as the name of a field in the collection to create.
Name of the output field. The value should be the same as that of input_field.
Data type of the field to create in the target collection.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies. Zilliz Cloud checks if the collection with the specified name exists. If the collection exists, Zilliz Cloud creates a pipeline for that collection. Otherwise, Zilliz Cloud creates a new collection with the specified name.
ID of the project to which the target cluster belongs.
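The naming rule for pipelines and functions (3-64 characters, only alphanumeric characters and underscores) can be checked locally before a request is sent. The following is a minimal sketch, not part of the Zilliz Cloud API: the helper name is_valid_name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```shell
# Hypothetical helper, not provided by Zilliz Cloud.
# Returns 0 when the name is 3-64 characters long and contains only
# ASCII letters, digits, and underscores (assumed meaning of the rule).
is_valid_name() {
  _name=$1
  case $_name in
    # Reject any character outside the allowed set.
    *[!A-Za-z0-9_]*) return 1 ;;
  esac
  # Enforce the 3-64 character length range.
  [ "${#_name}" -ge 3 ] && [ "${#_name}" -le 64 ]
}
```

For example, is_valid_name "my_text_ingestion_pipeline" succeeds, while is_valid_name "bad-name" fails because of the hyphen.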
Name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Description of the pipeline to create.
Type of the pipeline to create. For a search pipeline, the value should be SEARCH. For more information, refer to Understanding pipelines.
Functions to add in the search pipeline to create. A search pipeline can only have one function.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a text search pipeline, the value should be SEARCH_TEXT.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A reranking model used to rank a set of candidate outputs to improve the quality of the search results. Currently, the only possible value is zilliz/bge-reranker-base. For more information, refer to Reranker.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a doc search pipeline, the value should be SEARCH_DOC_CHUNK.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A reranking model used to rank a set of candidate outputs to improve the quality of the search results. Currently, the only possible value is zilliz/bge-reranker-base. For more information, refer to Reranker.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a reverse image search pipeline, the value should be SEARCH_IMAGE_BY_IMAGE.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A reranking model used to rank a set of candidate outputs to improve the quality of the search results. Currently, the only possible value is zilliz/bge-reranker-base. For more information, refer to Reranker.
A function to add.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a text-image search pipeline, the value should be SEARCH_IMAGE_BY_TEXT.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For a text-image search pipeline, the only possible value is zilliz/clip-vit-base-patch32-multilingual-v1. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A reranking model used to rank a set of candidate outputs to improve the quality of the search results. Currently, the only possible value is zilliz/bge-reranker-base. For more information, refer to Reranker.
ID of the project to which the target cluster belongs.
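The search pipeline parameters described above could be assembled into a request roughly as follows. This is a sketch under assumptions rather than a confirmed request format: the "reranker" key name, the placement of "clusterId" and "collectionName" inside the function object, and the helper names build_search_pipeline_body and create_search_pipeline are all assumed for illustration. The IDs and names are placeholders, and the call expects BASE_URL and TOKEN to be exported already.

```shell
# Sketch only: prints a request body for a text search pipeline.
# The "reranker" key and the position of clusterId/collectionName are
# assumptions; every ID and name is a placeholder.
build_search_pipeline_body() {
  cat <<'EOF'
{
  "projectId": "proj-xxxx",
  "name": "my_text_search_pipeline",
  "description": "A pipeline that searches for semantically similar text.",
  "type": "SEARCH",
  "functions": [
    {
      "name": "search_my_text",
      "action": "SEARCH_TEXT",
      "clusterId": "inxx-xxxxxxxxxxxxxxx",
      "collectionName": "my_collection",
      "embedding": "zilliz/bge-base-en-v1.5",
      "reranker": "zilliz/bge-reranker-base"
    }
  ]
}
EOF
}

# Posts the body to the pipelines endpoint of this API.
# Defined here but not invoked, so sourcing this file sends nothing.
create_search_pipeline() {
  curl --request POST \
    --url "${BASE_URL}/v1/pipelines" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(build_search_pipeline_body)"
}
```

The embedding value should match the model used by the collection being searched, so adjust it before calling create_search_pipeline.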
Name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Description of the pipeline to create.
Type of the pipeline to create. For a deletion pipeline, the value should be DELETION. For more information, refer to Understanding pipelines.
Functions to add in the deletion pipeline to create. A deletion pipeline can only have one function.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a text deletion pipeline, the value should be PURGE_TEXT_INDEX or PURGE_BY_EXPRESSION.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a doc deletion pipeline, the value should be PURGE_DOC_INDEX or PURGE_BY_EXPRESSION.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For an image deletion pipeline, the value should be PURGE_IMAGE_INDEX or PURGE_BY_EXPRESSION.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
ID of the project to which the target cluster belongs.
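A deletion pipeline request can be sketched from the parameters above. The example below is illustrative only: the helper names build_deletion_pipeline_body and create_deletion_pipeline are hypothetical, the top-level placement of "clusterId" and "collectionName" is assumed to mirror the ingestion requests, and the IDs and names are placeholders. The call expects BASE_URL and TOKEN to be exported already.

```shell
# Sketch only: prints a request body for a deletion pipeline.
# $1 is the function type; all IDs and names are placeholders.
build_deletion_pipeline_body() {
  _action=$1
  case $_action in
    PURGE_TEXT_INDEX|PURGE_DOC_INDEX|PURGE_IMAGE_INDEX|PURGE_BY_EXPRESSION) ;;
    *) echo "unsupported deletion function type: $_action" >&2; return 1 ;;
  esac
  # A deletion pipeline can only have one function, so exactly one is emitted.
  cat <<EOF
{
  "projectId": "proj-xxxx",
  "name": "my_deletion_pipeline",
  "description": "A pipeline that deletes entities from the target collection.",
  "type": "DELETION",
  "functions": [
    {
      "name": "purge_my_data",
      "action": "${_action}"
    }
  ],
  "clusterId": "inxx-xxxxxxxxxxxxxxx",
  "collectionName": "my_collection"
}
EOF
}

# Posts the body to the pipelines endpoint of this API.
# Defined here but not invoked, so sourcing this file sends nothing.
create_deletion_pipeline() {
  _body=$(build_deletion_pipeline_body "$1") || return 1
  curl --request POST \
    --url "${BASE_URL}/v1/pipelines" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$_body"
}
```

For example, create_deletion_pipeline PURGE_DOC_INDEX would send a doc deletion request, while an unknown function type is rejected locally before any request is made.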
export TOKEN="YOUR_API_KEY"
curl --request POST \
    --url "${BASE_URL}/v1/pipelines" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    -d '{
    "name": "my_text_ingestion_pipeline",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "projectId": "proj-xxxx",
    "collectionName": "my_collection",
    "description": "A pipeline that generates text embeddings and stores additional fields.",
    "type": "INGESTION",
    "functions": [
        {
            "name": "index_my_text",
            "action": "INDEX_TEXT",
            "language": "ENGLISH",
            "embedding": "zilliz/bge-base-en-v1.5"
        },
        {
            "name": "keep_text_info",
            "action": "PRESERVE",
            "inputField": "source",
            "outputField": "source",
            "fieldType": "VarChar"
        }
    ]
}'
export TOKEN="YOUR_API_KEY"
curl --request POST \
    --url "${BASE_URL}/v1/pipelines" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    -d '{
    "projectId": "proj-xxxx",
    "name": "my_doc_ingestion_pipeline",
    "description": "A pipeline that splits a doc file into chunks and generates embeddings. It also stores the publish_year with each chunk.",
    "type": "INGESTION",
    "functions": [
        {
            "name": "index_my_doc",
            "action": "INDEX_DOC",
            "language": "ENGLISH",
            "chunkSize": 500,
            "embedding": "zilliz/bge-base-en-v1.5",
            "splitBy": [
                "\n\n",
                "\n",
                " ",
                ""
            ]
        },
        {
            "name": "keep_doc_info",
            "action": "PRESERVE",
            "inputField": "publish_year",
            "outputField": "publish_year",
            "fieldType": "Int16"
        }
    ],
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "newCollectionName": "my_collection"
}'
export TOKEN="YOUR_API_KEY"
curl --request POST \
    --url "${BASE_URL}/v1/pipelines" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    -d '{
    "name": "my_image_ingestion_pipeline",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "projectId": "proj-xxxx",
    "collectionName": "my_collection",
    "description": "A pipeline that converts an image into vector embeddings and store in efficient index for search.",
    "type": "INGESTION",
    "functions": [
        {
            "name": "index_my_image",
            "action": "INDEX_IMAGE",
            "embedding": "zilliz/vit-base-patch16-224"
        },
        {
            "name": "keep_image_tag",
            "action": "PRESERVE",
            "inputField": "image_title",
            "outputField": "image_title",
            "fieldType": "VarChar"
        }
    ]
}'
Indicates whether the request succeeds.
The information of the ingestion pipeline just created.
ID of the pipeline.
Name of the pipeline.
Type of the pipeline. For an ingestion pipeline, the value should be INGESTION.
Timestamp indicating when the pipeline is created.
Description of the pipeline.
Current status of the pipeline. If the value is other than SERVING, the pipeline is not working.
The total token usage of the pipeline.
Statistics of the total token usage of the pipeline.
The total token usage of the embedding model.
Functions in the pipeline. An ingestion pipeline must have one and only one INDEX function and can have 0-50 PRESERVE functions.
A function in the pipeline.
Functions of a text ingestion pipeline.
Name of the function.
Type of the function. For a text ingestion pipeline, the value should be INDEX_TEXT or PRESERVE.
Name the field according to your needs. For a text ingestion pipeline, use it for a list of texts to be ingested.
An input field.
Language that your source data is in.
Name of the embedding model in use.
Functions of a doc ingestion pipeline.
Name of the function.
Type of the function. For a doc ingestion pipeline, the value should be INDEX_DOC or PRESERVE.
Name the field according to your needs. For a doc ingestion pipeline, use it for the pre-signed url of the doc to be ingested.
Language that your source data is in.
The maximum size of a split document segment. The allowed chunk size range depends on the embedding model in use. For more information, refer to Zilliz Cloud Limits.
Name of the embedding model in use.
The splitters for Zilliz Cloud to split the specified document into smaller chunks. The value defaults to ["\n\n", "\n", " ", ""].
A splitter.
Functions of an image ingestion pipeline.
Name of the function.
Type of the function. For an image ingestion pipeline, the value should be INDEX_IMAGE or PRESERVE.
Name the fields according to your needs. In an image ingestion pipeline, image_url stands for pre-signed image URLs in object storage buckets, and image_id stands for the image ID.
An input field.
Name of the embedding model in use.
Functions for an ingestion pipeline to preserve metadata.
Name of the function.
Type of the function. For an ingestion pipeline that preserves metadata, the value should be PRESERVE.
Name the field according to your needs. In a PRESERVE function of an ingestion pipeline, Zilliz Cloud uses the value as the name of a field in the collection to create.
Name of the output field. The value should be the same as that of input_field.
Data type of the field to create in the target collection.
The target cluster to which the pipeline applies.
The target collection to which the pipeline applies.
Indicates whether the request succeeds.
The information about the search pipeline just created.
A pipeline ID.
Name of the pipeline.
Type of the pipeline. For a search pipeline, the value should be SEARCH.
Description of the pipeline.
Current status of the pipeline. If the value is not SERVING, the pipeline is not working.
Total token usage of the pipeline.
Statistics on the total token usage of the pipeline.
Token usage of the embedding model.
Token usage of the reranker model.
Functions in the pipeline. A search pipeline can only have one function.
A function in the created pipeline.
Functions of a text search pipeline.
Name of the function.
Type of the function. For a text search pipeline, the value should be SEARCH_TEXT.
Name the field according to your needs. For a text search pipeline, use it for the query text (query_text).
An input field.
The embedding model in use.
The reranking model in use.
Name of the target collection to which the pipeline applies.
ID of the target cluster to which the pipeline applies.
Functions of a doc search pipeline.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a doc search pipeline, the value should be SEARCH_DOC_CHUNK.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
A reranking model used to rank a set of candidate outputs to improve the quality of the search results. Currently, the only possible value is zilliz/bge-reranker-base. For more information, refer to Reranker.
Name the field according to your needs. For a doc search pipeline, use it for the query text (query_text).
An input field.
Functions of a reverse image search pipeline.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For a reverse image search pipeline, the value should be SEARCH_IMAGE_BY_IMAGE.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
The reranker model in use.
Name the field according to your needs. For a reverse image search pipeline, use it for the query image URL (query_image_url).
An input field.
Functions of an image search pipeline.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For an image search pipeline, the value should be SEARCH_IMAGE_BY_TEXT.
ID of the target cluster to which the pipeline applies.
Name of the target collection to which the pipeline applies.
An embedding model in use for vector search. It should be consistent with the embedding model chosen in its compatible collection. For more information, refer to Which embedding model does Zilliz Cloud Pipelines use?.
The reranker model in use.
Name the field according to your needs. For a text-image search pipeline, use it for the query text (query_text).
An input field.
Indicates whether the request succeeds.
The information about the deletion pipeline just created.
ID of the pipeline.
Name of the pipeline.
Type of the pipeline. For a deletion pipeline, the value should be DELETION.
Timestamp indicating when the pipeline is created.
Description of the pipeline.
Current status of the pipeline. If the value is other than SERVING, the pipeline is not working.
Functions in the pipeline. A deletion pipeline can only have one function.
A function in the pipeline.
Functions of a text deletion pipeline.
Name of the function.
Type of the function. For a text deletion pipeline, the value should be PURGE_TEXT_INDEX or PURGE_BY_EXPRESSION.
Name the field according to your needs. For a text deletion pipeline, use it for the ID of the text (id) to be deleted or an expression (expression).
An input field.
Functions of a doc deletion pipeline.
Name of the function.
Type of the function. For a doc deletion pipeline, the value should be PURGE_DOC_INDEX or PURGE_BY_EXPRESSION.
Name the field according to your needs. For a doc deletion pipeline, use it for the name of the doc (doc_name) to be deleted or an expression (expression).
Functions of an image deletion pipeline.
Name of the function to add. The function name should be a string of 3-64 characters and can contain only alphanumeric letters and underscores.
Type of the function to add. For an image deletion pipeline, the value should be PURGE_IMAGE_INDEX or PURGE_BY_EXPRESSION.
Name the fields according to your needs. In an image deletion pipeline, use it for the ID of the image (image_id) to delete or an expression (expression).
An input field.
The target cluster to which the pipeline applies.
The target collection to which the pipeline applies.
Returns an error message.
Response code.
Error message.
{
    "code": 200,
    "data": {
        "pipelineId": "pipe-xxx",
        "name": "my_text_ingestion_pipeline",
        "type": "INGESTION",
        "createTimestamp": 1721187300000,
        "description": "A pipeline that generates text embeddings and stores additional fields.",
        "status": "SERVING",
        "totalUsage": {
            "embedding": 0
        },
        "functions": [
            {
                "name": "index_my_text",
                "action": "INDEX_TEXT",
                "inputFields": [
                    "text_list"
                ],
                "language": "ENGLISH",
                "embedding": "zilliz/bge-base-en-v1.5"
            },
            {
                "name": "keep_text_info",
                "action": "PRESERVE",
                "inputField": "source",
                "outputField": "source",
                "fieldType": "VarChar"
            }
        ],
        "clusterId": "inxx-xxxx",
        "collectionName": "my_collection"
    }
}
{
    "code": 200,
    "data": {
        "pipelineId": "pipe-xxxx",
        "name": "my_doc_ingestion_pipeline",
        "type": "INGESTION",
        "createTimestamp": 1721187300000,
        "description": "A pipeline that splits a doc file into chunks and generates embeddings. It also stores the publish_year with each chunk.",
        "status": "SERVING",
        "totalUsage": {
            "embedding": 0
        },
        "functions": [
            {
                "action": "INDEX_DOC",
                "name": "index_my_doc",
                "inputField": "doc_url",
                "language": "ENGLISH",
                "chunkSize": 500,
                "embedding": "zilliz/bge-base-en-v1.5",
                "splitBy": [
                    "\n\n",
                    "\n",
                    " ",
                    ""
                ]
            },
            {
                "action": "PRESERVE",
                "name": "keep_doc_info",
                "inputField": "publish_year",
                "outputField": "publish_year",
                "fieldType": "Int16"
            }
        ],
        "clusterId": "in03-***************",
        "collectionName": "my_collection"
    }
}
{
    "code": 200,
    "data": {
        "pipelineId": "pipe-xxxx",
        "name": "my_image_ingestion_pipeline",
        "type": "INGESTION",
        "createTimestamp": 1721187300000,
        "clusterId": "in03-***************",
        "collectionName": "my_collection",
        "description": "A pipeline that converts an image into vector embeddings and store in efficient index for search.",
        "status": "SERVING",
        "totalUsage": {
            "embedding": 0
        },
        "functions": [
            {
                "action": "INDEX_IMAGE",
                "name": "index_my_image",
                "inputFields": [
                    "image_url",
                    "image_id"
                ],
                "embedding": "zilliz/vit-base-patch16-224"
            },
            {
                "action": "PRESERVE",
                "name": "keep_image_tag",
                "inputField": "image_title",
                "outputField": "image_title",
                "fieldType": "VarChar"
            }
        ]
    }
}
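A caller typically wants to confirm that the response reports code 200 and a status of SERVING, and then keep the pipeline ID for later requests. The sketch below does this with plain text matching. It is not a JSON parser, so a tool such as jq would be more robust where available. The helper names pipeline_is_serving and extract_pipeline_id are hypothetical.

```shell
# Hypothetical helper: succeeds only when the response text reports
# code 200 and a pipeline status of SERVING. Whitespace is stripped first
# so that both pretty-printed and compact JSON are matched.
pipeline_is_serving() {
  _compact=$(printf '%s' "$1" | tr -d '[:space:]')
  printf '%s' "$_compact" | grep -Eq '"code":200[,}]' || return 1
  printf '%s' "$_compact" | grep -q '"status":"SERVING"'
}

# Hypothetical helper: prints the value of "pipelineId" from the response,
# or nothing when the key is absent.
extract_pipeline_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"pipelineId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, after storing the output of a create request in a variable named response, running pipeline_is_serving "$response" && extract_pipeline_id "$response" prints the new pipeline ID only when the pipeline is ready to use.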