
Create Pipeline

Creates a pipeline.


POST
https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines

Example

📘 Notes

This API requires an API key as the authentication token.

Currently, you can create pipelines to ingest data into, search data in, or purge data from your collections. The request parameters vary with the type of pipeline you want to create and the data you want to process.

export CLOUD_REGION="gcp-us-west1"
export API_KEY=""

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${API_KEY}" \
  --url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines" \
  -d '{
    "name": "my_text_ingestion_pipeline",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "projectId": "proj-xxxx",
    "collectionName": "my_collection",
    "description": "A pipeline that generates text embeddings and stores additional fields.",
    "type": "INGESTION",
    "functions": [
      {
        "name": "index_my_text",
        "action": "INDEX_TEXT",
        "language": "ENGLISH",
        "embedding": "zilliz/bge-base-en-v1.5"
      },
      {
        "name": "keep_text_info",
        "action": "PRESERVE",
        "inputField": "source",
        "outputField": "source",
        "fieldType": "VarChar"
      }
    ]
  }'

A possible response is similar to the following:

{
  "code": 200,
  "data": {
    "pipelineId": "pipe-xxxx",
    "name": "my_text_ingestion_pipeline",
    "type": "INGESTION",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "collectionName": "my_collection",
    "description": "A pipeline that generates text embeddings and stores additional fields.",
    "status": "SERVING",
    "functions": [
      {
        "action": "INDEX_TEXT",
        "name": "index_my_text",
        "inputFields": ["text_list"],
        "language": "ENGLISH",
        "embedding": "zilliz/bge-base-en-v1.5"
      },
      {
        "action": "PRESERVE",
        "name": "keep_text_info",
        "inputField": "source",
        "outputField": "source",
        "fieldType": "VarChar"
      }
    ]
  }
}

Request

Parameters

  • No query parameters required

  • No path parameters required

  • No header parameters required

Request Body

Option 1:

{
  "name": "string",
  "type": "string",
  "description": "string",
  "functions": [
    {
      "name": "string",
      "action": "string",
      "inputField": "string",
      "outputField": "string",
      "fieldType": "string"
    }
  ],
  "clusterId": "string",
  "collectionName": "string",
  "projectId": "string"
}
Parameter | Description
name (string)
  Name of the pipeline to create.
type (string)
  Type of the pipeline to create. For an ingestion pipeline, the value should be INGESTION.
description (string)
  Description of the pipeline to create.
functions (array)
  Actions to take in the pipeline to create. For an ingestion pipeline, you can add only one doc-indexing function and multiple preserve functions.
functions[] (object | object | object | object)
functions[][opt_1] (object)
functions[][opt_1].name (string)
  Name of the function to create.
functions[][opt_1].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC, INDEX_TEXT, INDEX_IMAGE, and PRESERVE.
functions[][opt_1].embedding (string)
  Name of the embedding model used to convert the text into vector embeddings. For possible values, refer to Ingest, Search, and Delete Data.
functions[][opt_1].language (string)
  Language of your documents. Possible values are ENGLISH and CHINESE.
functions[][opt_2] (object)
functions[][opt_2].name (string)
  Name of the function to create.
functions[][opt_2].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC, INDEX_TEXT, INDEX_IMAGE, and PRESERVE.
functions[][opt_2].embedding (string)
  Name of the embedding model used to convert the text into vector embeddings. For possible values, refer to Ingest, Search, and Delete Data.
functions[][opt_2].language (string)
  Language of your documents. Possible values are ENGLISH and CHINESE.
functions[][opt_2].chunkSize (string)
  The maximum size of a split document segment. The value defaults to 500.
functions[][opt_2].splitBy (array)
  The splitters for Zilliz Cloud to split the specified document into smaller chunks. The value defaults to ["\n\n", "\n", " ", ""].
functions[][opt_2].splitBy[] (string)
  A splitter.
functions[][opt_3] (object)
functions[][opt_3].name (string)
  Name of the function to create.
functions[][opt_3].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC, INDEX_TEXT, INDEX_IMAGE, and PRESERVE.
functions[][opt_3].embedding (string)
  Name of the embedding model used to convert the text into vector embeddings. For possible values, refer to Ingest, Search, and Delete Data.
functions[][opt_4] (object)
functions[][opt_4].name (string)
  Name of the function to create.
functions[][opt_4].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC and PRESERVE.
functions[][opt_4].inputField (string)
  Name the field according to your needs. In a preserve function of an ingestion pipeline, Zilliz Cloud uses the value as the name of a field in the collection to create.
functions[][opt_4].outputField (string)
  Name of the output field. The value should be the same as that of inputField.
functions[][opt_4].fieldType (string)
  Data type of the field to create in the target collection. Possible values are BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, and VARCHAR.
clusterId (string)
  ID of a target cluster. You can find it in the cluster details on the Zilliz Cloud console.
collectionName (string)
  Name of the collection to create in the specified cluster. Zilliz Cloud creates a new collection and names it using this value.
projectId (string)
  ID of the project to which the target cluster belongs.
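
For example, a request body for a doc-ingestion pipeline that uses the chunking-related parameters above might look like the following sketch. The pipeline name, function name, and the cluster, project, and collection identifiers are placeholders; chunkSize and splitBy are optional and are shown here with their documented default values.

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${API_KEY}" \
  --url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines" \
  -d '{
    "name": "my_doc_ingestion_pipeline",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "projectId": "proj-xxxx",
    "collectionName": "my_doc_collection",
    "description": "A pipeline that splits documents into chunks and generates embeddings.",
    "type": "INGESTION",
    "functions": [
      {
        "name": "index_my_doc",
        "action": "INDEX_DOC",
        "language": "ENGLISH",
        "embedding": "zilliz/bge-base-en-v1.5",
        "chunkSize": 500,
        "splitBy": ["\n\n", "\n", " ", ""]
      }
    ]
  }'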

Option 2:

{
  "name": "string",
  "description": "string",
  "type": "string",
  "functions": [
    {
      "name": "string",
      "action": "string",
      "clusterId": "string",
      "collectionName": "string",
      "embedding": "string",
      "reranker": "string"
    }
  ],
  "projectId": "string"
}
Parameter | Description
name (string)
  Name of the pipeline to create.
description (string)
  Description of the pipeline to create.
type (string)
  Type of the pipeline to create. For a search pipeline, the value should be SEARCH.
functions (array)
  Actions to take in the search pipeline to create. You can define multiple functions to retrieve results from different collections.
functions[] (object)
functions[].name (string)
  Name of the function to create.
functions[].action (string)
  Type of the function to create. For a search pipeline, possible values are SEARCH_TEXT, SEARCH_DOC_CHUNK, SEARCH_IMAGE_BY_IMAGE, and SEARCH_IMAGE_BY_TEXT.
functions[].clusterId (string)
  ID of the target cluster in which Zilliz Cloud conducts the search.
functions[].collectionName (string)
  Name of the collection in which Zilliz Cloud conducts the search.
functions[].embedding (string)
  The embedding model used during vector search. The model should be consistent with the one chosen in the compatible collection.
functions[].reranker (string)
  If you need to reorder or rank a set of candidate outputs to improve the quality of the search results, set this parameter to a reranker model. This parameter applies only to pipelines for Text and Doc Data. Currently, only zilliz/bge-reranker-base is available as the parameter value.
projectId (string)
  ID of the project to which the target cluster belongs.
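
A request built from the Option 2 parameters might look like the following sketch for a search pipeline. The pipeline and function names, the cluster and project IDs, and the collection name are placeholders; the embedding and reranker values reuse the models mentioned above, and the reranker can be omitted if no reranking is needed.

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${API_KEY}" \
  --url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines" \
  -d '{
    "name": "my_text_search_pipeline",
    "projectId": "proj-xxxx",
    "description": "A pipeline that searches text in my_collection.",
    "type": "SEARCH",
    "functions": [
      {
        "name": "search_my_text",
        "action": "SEARCH_TEXT",
        "clusterId": "inxx-xxxxxxxxxxxxxxx",
        "collectionName": "my_collection",
        "embedding": "zilliz/bge-base-en-v1.5",
        "reranker": "zilliz/bge-reranker-base"
      }
    ]
  }'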

Option 3:

{
  "name": "string",
  "description": "string",
  "type": "string",
  "functions": [
    {
      "name": "string",
      "action": "string"
    }
  ],
  "clusterId": "string",
  "collectionName": "string",
  "projectId": "string"
}
Parameter | Description
name (string)
  Name of the pipeline to create.
description (string)
  Description of the pipeline to create.
type (string)
  Type of the pipeline to create. For a deletion pipeline, the value should be DELETION.
functions (array)
  Actions to take in the pipeline to create.
functions[] (object)
functions[].name (string)
  Name of the function to create.
functions[].action (string)
  Type of the function to create. For a deletion pipeline, possible values are PURGE_BY_EXPRESSION, PURGE_DOC_INDEX, and PURGE_IMAGE_INDEX.
clusterId (string)
  ID of a target cluster. You can find it in the cluster details on the Zilliz Cloud console.
collectionName (string)
  Name of the target collection in the specified cluster.
projectId (string)
  ID of the project to which the target cluster belongs.
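
A request built from the Option 3 parameters might look like the following sketch for a deletion pipeline. The pipeline and function names, the cluster and project IDs, and the collection name are placeholders.

curl --request POST \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${API_KEY}" \
  --url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines" \
  -d '{
    "name": "my_deletion_pipeline",
    "clusterId": "inxx-xxxxxxxxxxxxxxx",
    "projectId": "proj-xxxx",
    "collectionName": "my_collection",
    "description": "A pipeline that purges data matching an expression.",
    "type": "DELETION",
    "functions": [
      {
        "name": "purge_by_expression",
        "action": "PURGE_BY_EXPRESSION"
      }
    ]
  }'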

Response

Returns information about the pipeline just created.

Response Body

Option 1:

{
  "code": "integer",
  "data": {
    "pipelineId": "integer",
    "name": "string",
    "type": "string",
    "description": "string",
    "status": "string",
    "functions": {
      "oneOf": [
        {
          "name": "string",
          "action": "string",
          "inputFields": [
            {}
          ],
          "language": "string",
          "embedding": "string"
        },
        {
          "name": "string",
          "action": "string",
          "inputField": "string",
          "language": "string",
          "chunkSize": "integer",
          "embedding": "string",
          "splitBy": "string"
        },
        {
          "name": "string",
          "action": "string",
          "inputFields": [
            {}
          ],
          "embedding": "string"
        },
        {
          "name": "string",
          "action": "string",
          "inputField": "string",
          "outputField": "string",
          "fieldType": "string"
        }
      ]
    },
    "clusterID": "string",
    "collectionName": "string"
  }
}
Property | Description
code (integer)
  Indicates whether the request succeeds.
  • 0: The request succeeds.
  • Others: Some error occurs.
data (object)
data.pipelineId (integer)
  A pipeline ID.
data.name (string)
  Name of the pipeline.
data.type (string)
  Type of the pipeline. For an ingestion pipeline, the value should be INGESTION.
data.description (string)
  Description of the pipeline.
data.status (string)
  Current status of the pipeline. If the value is other than SERVING, the pipeline is not working.
functions (object | object | object | object)
  Functions in the pipeline. For an ingestion pipeline, there should be only one INDEX_DOC function.
functions[opt_1] (object)
functions[opt_1].name (string)
  Name of the function to create.
functions[opt_1].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC and PRESERVE.
functions[opt_1].inputFields (array)
  Name the fields according to your needs. In an INDEX_TEXT function of an ingestion pipeline, use them for the user-provided texts.
functions[opt_1].inputFields[] (string)
  An input field.
functions[opt_1].language (string)
  Language that your document is in. Possible values are ENGLISH and CHINESE. The parameter applies only to ingestion pipelines.
functions[opt_1].embedding (string)
  Name of the embedding model in use.
functions[opt_2] (object)
functions[opt_2].name (string)
  Name of the function to create.
functions[opt_2].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC and PRESERVE.
functions[opt_2].inputField (string)
  Name the field according to your needs. In an INDEX_DOC function of an ingestion pipeline, use it for pre-signed document URLs in GCS or AWS S3 buckets.
functions[opt_2].language (string)
  Language that your document is in. Possible values are ENGLISH and CHINESE. The parameter applies only to ingestion pipelines.
functions[opt_2].chunkSize (integer)
  The maximum size of a split document segment.
functions[opt_2].embedding (string)
  Name of the embedding model in use.
functions[opt_2].splitBy (string)
  The splitters that Zilliz Cloud uses to split the specified documents.
functions[opt_3] (object)
functions[opt_3].name (string)
  Name of the function to create.
functions[opt_3].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC and PRESERVE.
functions[opt_3].inputFields (array)
  Name the fields according to your needs. In an INDEX_IMAGE function of an ingestion pipeline, image_url stands for pre-signed image URLs in GCS or AWS S3 buckets, and image_id stands for the image ID.
functions[opt_3].inputFields[] (string)
  An input field.
functions[opt_3].embedding (string)
  Name of the embedding model in use.
functions[opt_4] (object)
functions[opt_4].name (string)
  Name of the function to create.
functions[opt_4].action (string)
  Type of the function to create. For an ingestion pipeline, possible values are INDEX_DOC and PRESERVE.
functions[opt_4].inputField (string)
  Name the field according to your needs. In a preserve function of an ingestion pipeline, Zilliz Cloud uses the value as the name of a field in the collection to create.
functions[opt_4].outputField (string)
  Name of the output field. The value should be the same as that of inputField.
functions[opt_4].fieldType (string)
  Data type of the field to create in the target collection. Possible values are BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, and VARCHAR.
data.clusterID (string)
  The target cluster to which the pipeline applies.
data.collectionName (string)
  The target collection to which the pipeline applies.

Option 2:

{
  "code": "integer",
  "data": {
    "pipelineId": "integer",
    "name": "string",
    "type": "string",
    "description": "string",
    "status": "string",
    "functions": [
      {
        "name": "string",
        "action": "string",
        "inputFields": [
          {}
        ],
        "clusterID": "string",
        "collectionName": "string",
        "reranker": "string"
      }
    ]
  }
}
Property | Description
code (integer)
  Indicates whether the request succeeds.
  • 0: The request succeeds.
  • Others: Some error occurs.
data (object)
data.pipelineId (integer)
  A pipeline ID.
data.name (string)
  Name of the pipeline.
data.type (string)
  Type of the pipeline. For a search pipeline, the value should be SEARCH.
data.description (string)
  Description of the pipeline.
data.status (string)
  Current status of the pipeline. If the value is not SERVING, the pipeline is not working.
data.functions (array)
  Functions in the pipeline. For a search pipeline, each of its member functions targets a different collection.
data.functions[] (object)
data.functions[].name (string)
  Name of the function.
data.functions[].action (string)
  Type of the function. For a search function, possible values are SEARCH_DOC_CHUNKS, SEARCH_TEXT, SEARCH_IMAGE_BY_IMAGE, and SEARCH_IMAGE_BY_TEXT.
data.functions[].inputFields (array)
  Names of the input fields.
data.functions[].inputFields[] (string)
  For a SEARCH_DOC_CHUNKS or a SEARCH_IMAGE_BY_TEXT function, you should include query_text as the value.
data.functions[].clusterID (string)
  Target cluster of this function.
data.functions[].collectionName (string)
  Target collection of this function.
data.functions[].reranker (string)
  If you need to reorder or rank a set of candidate outputs to improve the quality of the search results, set this parameter to a reranker model. This parameter applies only to pipelines for Text and Doc Data. Currently, only zilliz/bge-reranker-base is available as the parameter value.

Option 3:

{
  "code": "integer",
  "data": {
    "pipelineId": "integer",
    "name": "string",
    "type": "string",
    "description": "string",
    "status": "string",
    "functions": [
      {
        "name": "string",
        "action": "string",
        "inputField": "string"
      }
    ],
    "clusterID": "string",
    "collectionName": "string"
  }
}
Property | Description
code (integer)
  Indicates whether the request succeeds.
  • 0: The request succeeds.
  • Others: Some error occurs.
data (object)
data.pipelineId (integer)
  A pipeline ID.
data.name (string)
  Name of the pipeline.
data.type (string)
  Type of the pipeline. For a deletion pipeline, the value should be DELETION.
data.description (string)
  Description of the pipeline.
data.status (string)
  Current status of the pipeline. If the value is not SERVING, the pipeline is not working.
data.functions (array)
  Functions in the pipeline. For a deletion pipeline, there can be multiple member functions, each representing a deletion request.
data.functions[] (object)
data.functions[].name (string)
  Name of the function.
data.functions[].action (string)
  Type of the function. For a deletion pipeline, each member function should be one of PURGE_BY_EXPRESSION, PURGE_DOC_INDEX, and PURGE_IMAGE_BY_ID.
data.functions[].inputField (string)
  Name of the input field. For a PURGE_DOC_INDEX function, the value should be the name of the doc to delete.
data.clusterID (string)
  Target cluster of the pipeline.
data.collectionName (string)
  Target collection of the pipeline.

Error Response

{
  "code": "integer",
  "message": "string"
}
Property | Description
code (integer)
  Indicates whether the request succeeds.
  • 0: The request succeeds.
  • Others: Some error occurs.
message (string)
  Indicates the possible reason for the reported error.
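
For reference, a failed request returns a payload in the shape above. The values in the following sketch are purely illustrative placeholders and do not correspond to actual Zilliz Cloud error codes or messages:

{
  "code": 12345,
  "message": "Illustrative example only: the request could not be processed."
}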