Run Pipeline
Runs the pipeline identified by PIPELINE_ID.
POST https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/{PIPELINE_ID}/run
Example
This API requires an API key as the authentication token.
Currently, you can run Zilliz Cloud pipelines to ingest, search, and purge multiple types of data. The request parameters vary with the data type.
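Every curl example in this section uses the same URL shape and the same two headers. As a minimal sketch, assuming Python's standard library and a hypothetical helper named `build_run_request`, the request could be assembled like this. Nothing is sent until the result is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_run_request(cloud_region, pipeline_id, api_key, body):
    """Assemble (but do not send) a POST request for the run-pipeline endpoint.

    build_run_request is a hypothetical helper name; the URL shape and the
    headers mirror the curl examples in this section.
    """
    url = (
        f"https://controller.api.{cloud_region}.zillizcloud.com"
        f"/v1/pipelines/{pipeline_id}/run"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values, as in the curl examples.
request = build_run_request(
    "gcp-us-west1",
    "pipe-xxxxxxxxxxxxxxxxxxxxxx",
    "YOUR_API_KEY",
    {"data": {"query_text": "What is Zilliz Cloud?"}},
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any tokens are consumed.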
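The examples below are grouped by operation (ingestion, search, and deletion) and, within each operation, by data type (text, document, and image).
Ingestion: text data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"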
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"text_list": ["Zilliz Cloud is a fully managed vector database and data services, empowering you to unlock the full potential of unstructured data for your AI applications.", "It can store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models."],
"source": "Zilliz official website"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"num_entities": 2,
"ids": [
449281041373015598,
449281041373015599
],
"usage": {
"embedding": 62
}
}
}
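As a sketch of how a client might consume the text-ingestion response above, the following parses the sample body and pulls out the entity count, the generated IDs, and the embedding token usage. The helper name `summarize_ingestion` is hypothetical. The `ids` field is read with a default because the document and image ingestion responses in this section do not include it.

```python
import json

# Sample body copied from the text-ingestion response above.
response_text = """
{
  "code": 200,
  "data": {
    "num_entities": 2,
    "ids": [449281041373015598, 449281041373015599],
    "usage": {"embedding": 62}
  }
}
"""


def summarize_ingestion(body):
    """Return (entity count, IDs, embedding tokens) from an ingestion response."""
    data = body["data"]
    return data["num_entities"], data.get("ids", []), data["usage"]["embedding"]


count, ids, tokens = summarize_ingestion(json.loads(response_text))
print(f"Ingested {count} entities with IDs {ids}, using {tokens} embedding tokens")
```

The IDs are larger than 2^53, so they lose precision in clients that parse JSON numbers as double-precision floats. Python's `json` module parses them as exact integers.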
Ingestion: document data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"doc_url": "https://storage.googleapis.com/example-bucket/zilliz_concept_doc.md?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=example%40example-project.iam.gserviceaccount.com%2F20181026%2Fus-central1%2Fstorage%2Fgoog4_request&X-Goog-Date=20181026T181309Z&X-Goog-Expires=900&X-Goog-SignedHeaders=host&X-Goog-Signature=247a2aa45f169edf4d187d54e7cc46e4731b1e6273242c4f4c39a1d2507a0e58706e25e3a85a7dbb891d62afa8496def8e260c1db863d9ace85ff0a184b894b117fe46d1225c82f2aa19efd52cf21d3e2022b3b868dcc1aca2741951ed5bf3bb25a34f5e9316a2841e8ff4c530b22ceaa1c5ce09c7cbb5732631510c20580e61723f5594de3aea497f195456a2ff2bdd0d13bad47289d8611b6f9cfeef0c46c91a455b94e90a66924f722292d21e24d31dcfb38ce0c0f353ffa5a9756fc2a9f2b40bc2113206a81e324fc4fd6823a29163fa845c8ae7eca1fcf6e5bb48b3200983c56c5ca81fffb151cca7402beddfc4a76b133447032ea7abedc098d2eb14a7",
"publish_year": 2023
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"doc_name": "zilliz_concept_doc.md",
"num_chunks": 123,
"usage": {
"embedding": 1247
}
}
}
Ingestion: image data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"image_url": "xxx",
"image_id": "my-img-123456",
"image_title": "A cute yellow cat"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"num_entities": 1,
"usage": {
"embedding": 1
}
}
}
Search: text data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"query_text": "What is Zilliz Cloud?"
},
"params": {
"limit": 1,
"offset": 0,
"outputFields": [],
"filter": "id >= 0"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"result": [
{
"id": 450524927755095739,
"distance": 0.8015198707580566,
"text": "Zilliz Cloud is a fully managed vector database and data services, empowering you to unlock the full potential of unstructured data for your AI applications."
}
],
"usage": {
"embedding": 17,
"rerank": 0
}
}
}
Search: document data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"query_text": "How many collections can a cluster with more than 8 CUs hold?"
},
"params": {
"limit": 1,
"offset": 0,
"outputFields": ["chunk_id", "doc_name"],
"filter": "id >= 0"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"result": [
{
"id": "445951244000281783",
"distance": 0.7270776033401489,
"chunk_id": 123,
"doc_name": "zilliz_concept_doc.md",
"chunk_text": "After determining the CU type, you must also specify its size. Note that the\nnumber of collections a cluster can hold varies based on its CU size. A\ncluster with less than 8 CUs can hold no more than 32 collections, while a\ncluster with more than 8 CUs can hold as many as 256 collections.\n\nAll collections in a cluster share the CUs associated with the cluster. To\nsave CUs, you can unload some collections. When a collection is unloaded, its\ndata is moved to disk storage and its CUs are freed up for use by other\ncollections. You can load the collection back into memory when you need to\nquery it. Keep in mind that loading a collection requires some time, so you\nshould only do so when necessary.\n\n## Collection\n\nA collection collects data in a two-dimensional table with a fixed number of\ncolumns and a variable number of rows. In the table, each column corresponds\nto a field, and each row represents an entity.\n\nThe following figure shows a sample collection that comprises six entities and\neight fields.\n\n### Fields\n\nIn most cases, people describe an object in terms of its attributes, including\nsize, weight, position, etc. These attributes of the object are similar to the\nfields in a collection.\n\nAmong all the fields in a collection, the primary key is one of the most\nspecial, because the values stored in this field are unique throughout the\nentire collection. Each primary key maps to a different record in the\ncollection."
},
{
"id": "450524927755095513",
"distance": 0.4568396508693695,
"chunk_id": 125,
"doc_name": "zilliz_concept_doc.md",
"chunk_text": "# Cluster, Collection & Entities\n## Collection\n### Fields\nIn most cases, people describe an object in terms of its attributes, including size, weight, position, etc. These attributes of the object are similar to the fields in a collection. \nAmong all the fields in a collection, the primary key is one of the most special, because the values stored in this field are unique throughout the entire collection. Each primary key maps to a different record in the collection. \nIn the collection shown in Figure 1, the **id** field is the primary key. The first ID **0** maps to the article titled *The Mortality Rate of Coronavirus is Not Important*, and will not be used in any other records in this collection.\\n\\n# Cluster, Collection & Entities\n## Collection\n### Schema\nFields have their own properties, such as data types and related constraints for storing data in the field, like vector dimensions and distance metrics. By defining fields and their order, you will get a skeletal data structure termed schema, which shapes a collection in a way that resembles constructing the structure of a data table. \nFor your reference, Zilliz Cloud supports the following field data types: \n- Boolean value (BOOLEAN)\n- 8-byte floating-point (DOUBLE)\n- 4-byte floating-point (FLOAT)\n- Float vector (FLOAT_VECTOR)\n- 8-bit integer (INT8)\n- 32-bit integer (INT32)\n- 64-bit integer (INT64)\n- Variable character (VARCHAR)\n- [JSON](https://zilliverse.feishu.cn/wiki/H04VwNGoaimjcLkxoH4cs5TQnNd) \nZilliz Cloud provides three types of CUs, each of which have its own application scenarios, and they are also the factor that impacts search performance. \n> 📘 Notes\n>\n> **FLOAT_VECTOR** is the only data type that supports vector embeddings in Zilliz Cloud clusters."
}
],
"usage": {
"embedding": 21,
"rerank": 5110
}
}
}
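Document-search hits carry long `chunk_text` values, so a client will usually want a compact view. The sketch below, with a hypothetical helper named `format_hits`, turns each hit into one display line. It reads `data.result`, the key used in the sample responses in this section, and uses an abbreviated copy of the response above as input.

```python
def format_hits(body, preview_chars=60):
    """Return one display line per hit in a document-search response."""
    lines = []
    for hit in body["data"]["result"]:
        # Collapse newlines so each hit fits on a single line.
        preview = hit["chunk_text"].replace("\n", " ")[:preview_chars]
        lines.append(
            f'{hit["doc_name"]}#{hit["chunk_id"]} ({hit["distance"]:.3f}): {preview}'
        )
    return lines


# Abbreviated copy of the sample response above (chunk_text shortened).
sample = {
    "code": 200,
    "data": {
        "result": [
            {
                "id": "445951244000281783",
                "distance": 0.7270776033401489,
                "chunk_id": 123,
                "doc_name": "zilliz_concept_doc.md",
                "chunk_text": "After determining the CU type, you must also specify its size.",
            },
            {
                "id": "450524927755095513",
                "distance": 0.4568396508693695,
                "chunk_id": 125,
                "doc_name": "zilliz_concept_doc.md",
                "chunk_text": "# Cluster, Collection & Entities\n## Collection\n### Fields",
            },
        ],
        "usage": {"embedding": 21, "rerank": 5110},
    },
}

for line in format_hits(sample):
    print(line)
```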
Search: image data
You can search images by either an image or a query text.
- Search images by a specific image.
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"query_image_url": "https://unsplash.com/photos/an-orange-and-white-cat-laying-on-top-of-a-table-hAbwQ1elxvI"
},
"params": {
"limit": 1,
"offset": 0,
"outputFields": ["image_id", "image_title"],
"filter": "id >= 0"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"result": [
{
"id": "image-101",
"distance": 0.4,
"image_id": "image-101",
"image_title": "test title"
}
],
"usage": {
"embedding": 1
}
}
}
- Search images by query text.
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"query_text": "my_text_query"
},
"params":{
"limit": 100,
"outputFields": [],
"filter": "id >= 0",
"offset": 0
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": [{
"id": "101",
"distance": 0.4
}]
}
Deletion: text data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"expression": "id in [1, 2, 3]"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"num_deleted_entities": 3
}
}
Deletion: document data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"doc_name": "zilliz_concept_doc.md"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"num_deleted_chunks": 567
}
}
Deletion: image data
export CLOUD_REGION="gcp-us-west1"
export API_KEY="YOUR_API_KEY"
export PIPELINE_ID="pipe-xxxxxxxxxxxxxxxxxxxxxx"
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
--url "https://controller.api.${CLOUD_REGION}.zillizcloud.com/v1/pipelines/${PIPELINE_ID}/run" \
-d '{
"data": {
"image_id": "my-img-123456"
}
}'
Possible response is similar to the following:
{
"code": 200,
"data": {
"num_deleted_entities": 1
}
}
Request
Parameters
- No query parameters required.
- Path parameters
Parameter | Description |
---|---|
PIPELINE_ID | string (required) A valid pipeline ID obtained from either the list-pipelines API endpoint or the Zilliz Cloud console. |
- Header parameters
Parameter | Description |
---|---|
Authorization | string |
Content-Type | string |
Request Body
Option 1: Data ingestion parameters.
{}
Parameter | Description |
---|---|
data | object One of the following three options, depending on the type of data to ingest. |
data[opt_1] | object |
data[opt_1].doc_url | string The URL of the document in object storage. Use a URL that is either not encoded or encoded in UTF-8, and ensure that it remains valid for at least one hour. |
data[opt_1].{YOUR_PRESERVED_FIELD} | string The metadata field to preserve. The input field name should be consistent with what you defined when creating the Ingestion pipeline and adding the PRESERVE function. The value of this field should also follow the predefined field type. |
data[opt_2] | object |
data[opt_2].text_list | string The text or text list to ingest. |
data[opt_2].source | string The metadata field to preserve. The input field name should be consistent with what you defined when creating the Ingestion pipeline and adding the PRESERVE function. The value of this field should also follow the predefined field type. |
data[opt_3] | object |
data[opt_3].image_url | string The URL of the image in object storage. Use a URL that is either not encoded or encoded in UTF-8, and ensure that it remains valid for at least one hour. |
data[opt_3].image_id | string The ID of the image stored in object storage. |
data[opt_3].image_title | string The title of the image. |
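The three ingestion options differ only in their primary field (`doc_url`, `text_list`, or `image_url`), with the remaining fields carried alongside it. A minimal sketch of a payload builder follows. The helper name `build_ingestion_body` is hypothetical, and the "exactly one primary field" check is an assumption inferred from the option grouping in the table, not a documented server rule.

```python
def build_ingestion_body(*, doc_url=None, text_list=None, image_url=None, **extra_fields):
    """Build a request body for one of the three ingestion options.

    Exactly one of doc_url, text_list, or image_url must be given (an
    assumption based on the option grouping above). Extra keyword
    arguments, such as source, publish_year, image_id, or image_title,
    are passed through as additional fields in "data".
    """
    primary = {"doc_url": doc_url, "text_list": text_list, "image_url": image_url}
    chosen = {key: value for key, value in primary.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("Provide exactly one of doc_url, text_list, or image_url")
    return {"data": {**chosen, **extra_fields}}


text_body = build_ingestion_body(
    text_list=["Zilliz Cloud is a fully managed vector database."],
    source="Zilliz official website",
)
image_body = build_ingestion_body(
    image_url="https://example.com/cat.png",  # hypothetical URL
    image_id="my-img-123456",
    image_title="A cute yellow cat",
)
print(text_body)
print(image_body)
```

The preserved fields must still match what was defined when the ingestion pipeline was created. The builder does not check that.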
Option 2: Search parameters.
{
"data": {
"query_text": "string",
"query_image_url": "string"
},
"params": {
"limit": "integer",
"offset": "integer",
"outputFields": [],
"filter": "string"
}
}
Parameter | Description |
---|---|
data | object Search data. |
data.query_text | string A query text. Zilliz Cloud embeds it and uses the generated vector embeddings to conduct a search in the target collection. This applies to pipelines of the SEARCH_TEXT, SEARCH_DOC_CHUNK, or SEARCH_IMAGE_BY_TEXT type. |
data.query_image_url | string The URL of a query image. This applies to pipelines of the SEARCH_IMAGE_BY_IMAGE type. |
params | object Search parameters. |
params.limit | integer Total number of records to return. |
params.offset | integer Total number of records to skip in the search results. |
params.outputFields | array A list of fields to output for each match in the search result. |
params.outputFields[] | string A valid output field, which should be one defined in the PRESERVE functions. |
params.filter | string A boolean expression for Zilliz Cloud to filter records before actual searches. |
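Because `limit` is the number of records to return and `offset` is the number to skip, page `k` of a result set corresponds to `offset = k * limit`. The sketch below builds search bodies on that basis. The helper names, the default `limit` of 10, and the rule that exactly one of `query_text` and `query_image_url` is supplied are assumptions made for illustration.

```python
def build_search_body(query_text=None, query_image_url=None, *,
                      limit=10, offset=0, output_fields=(), filter_expr=None):
    """Build a search request body from either a query text or a query image URL."""
    if (query_text is None) == (query_image_url is None):
        raise ValueError("Provide exactly one of query_text or query_image_url")
    if query_text is not None:
        data = {"query_text": query_text}
    else:
        data = {"query_image_url": query_image_url}
    params = {"limit": limit, "offset": offset, "outputFields": list(output_fields)}
    if filter_expr is not None:
        params["filter"] = filter_expr
    return {"data": data, "params": params}


def paged_search_bodies(query_text, page_size, num_pages, **kwargs):
    """Return one request body per page, advancing offset by page_size each time."""
    return [
        build_search_body(query_text, limit=page_size, offset=page * page_size, **kwargs)
        for page in range(num_pages)
    ]


pages = paged_search_bodies(
    "What is Zilliz Cloud?", page_size=5, num_pages=3, filter_expr="id >= 0"
)
for body in pages:
    print(body["params"])
```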
Option 3: Deletion parameters.
{
"data": {
"expression": "string",
"doc_name": "string",
"image_id": "string"
}
}
Parameter | Description |
---|---|
data | object Payload of the doc deletion request. |
data.expression | string A filter expression. This applies to pipelines of the INDEX_TEXT type. |
data.doc_name | string Name of the document to delete. Deleting a document by its name removes all chunks of that document. This applies to pipelines of the INDEX_DOC_CHUNK type. |
data.image_id | string ID of an image. This applies to pipelines of the INDEX_IMAGE type. |
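The table above ties each deletion field to one pipeline type. A small lookup makes that mapping explicit. The names `DELETION_KEY_BY_PIPELINE_TYPE` and `build_deletion_body` are hypothetical; the pipeline types and field names come from the table.

```python
# Deletion field accepted by each pipeline type, per the table above.
DELETION_KEY_BY_PIPELINE_TYPE = {
    "INDEX_TEXT": "expression",
    "INDEX_DOC_CHUNK": "doc_name",
    "INDEX_IMAGE": "image_id",
}


def build_deletion_body(pipeline_type, value):
    """Build a deletion request body using the field that matches the pipeline type."""
    try:
        key = DELETION_KEY_BY_PIPELINE_TYPE[pipeline_type]
    except KeyError:
        raise ValueError(f"Unsupported pipeline type: {pipeline_type}") from None
    return {"data": {key: value}}


print(build_deletion_body("INDEX_TEXT", "id in [1, 2, 3]"))
print(build_deletion_body("INDEX_DOC_CHUNK", "zilliz_concept_doc.md"))
print(build_deletion_body("INDEX_IMAGE", "my-img-123456"))
```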
Response
Returns the result of running a specific pipeline.
Response Body
Option 1:
{
"code": "integer",
"data": {
"oneOf": [
{
"num_entities": "integer",
"ids": [
{}
],
"usage": {
"embedding": "integer"
}
},
{
"num_chunks": "integer",
"doc_name": "string",
"usage": {
"embedding": "integer"
}
},
{
"num_entities": "integer",
"usage": {
"embedding": "string"
}
}
]
}
}
Property | Description |
---|---|
code | integer Indicates whether the request succeeds. |
data | object Payload of the response. One of the following three options, depending on the type of data ingested. |
data[opt_1] | object |
data[opt_1].num_entities | integer Number of text strings added to the collection. |
data[opt_1].ids | array IDs of the returned text strings in the collection. |
data[opt_1].ids[] | integer |
data[opt_1].usage | object Token usage statistics |
data[opt_1].usage.embedding | integer Number of tokens used in text embedding |
data[opt_2] | object Payload of the response. |
data[opt_2].num_chunks | integer Number of chunks generated. |
data[opt_2].doc_name | string Name of the chunked document with the file extension. |
data[opt_2].usage | object Token usage statistics |
data[opt_2].usage.embedding | integer Number of tokens used in text embedding |
data[opt_3] | object |
data[opt_3].num_entities | integer Number of images added. |
data[opt_3].usage | object Token usage statistics |
data[opt_3].usage.embedding | string Number of tokens used in image embedding |
Option 2:
{
"code": "integer",
"data": {
"results": {
"oneOf": [
[
{
"id": "string",
"distance": "string",
"chunk_text": "string",
"chunk_id": "string",
"doc_name": "string"
}
],
[
{
"id": "string",
"distance": "string",
"text": "string"
}
],
[
{
"id": "string",
"distance": "string",
"image_id": "string",
"image_title": "string"
}
]
]
},
"usage": {
"embedding": "integer",
"rerank": "integer"
}
}
}
Property | Description |
---|---|
code | integer Indicates whether the request succeeds. |
data | object Payload of the response. |
results | array Returned search results. One of the following three options, depending on the type of the search pipeline. |
results[][opt_1] | array Returned search result. It is an array of objects. |
results[][opt_1][] | object |
results[][opt_1][].id | string ID of a hit entity, representing a chunk of a document. |
results[][opt_1][].distance | string Distance to the vector embeddings of the specified query string. |
results[][opt_1][].chunk_text | string A searched document chunk. |
results[][opt_1][].chunk_id | string A searched chunk ID. |
results[][opt_1][].doc_name | string Name of the document to which the searched chunk belongs |
results[][opt_2] | array |
results[][opt_2][] | object |
results[][opt_2][].id | string ID of a hit entity, representing an ingested text string. |
results[][opt_2][].distance | string Distance to the vector embeddings of the specified query string. |
results[][opt_2][].text | string A searched text. |
results[][opt_3] | array |
results[][opt_3][] | object |
results[][opt_3][].id | string ID of a hit entity, representing an image. |
results[][opt_3][].distance | string Distance to the vector embeddings of the specified query text or query image. |
results[][opt_3][].image_id | string ID of the searched image in the object storage. |
results[][opt_3][].image_title | string Title of the searched image. |
data.usage | object Token usage statistics |
data.usage.embedding | integer Number of tokens used in embedding |
data.usage.rerank | integer Number of tokens used for reranking. |
Option 3:
{
"code": "integer",
"data": {
"num_deleted_chunks": "integer"
}
}
Property | Description |
---|---|
code | integer Indicates whether the request succeeds. |
data | object |
data.num_deleted_chunks | integer Number of deleted chunks. Note that Zilliz Cloud deletes all chunks of a document if a deletion pipeline carries its name. |
Error Response
{
"code": "integer",
"message": "string"
}
Property | Description |
---|---|
code | integer Indicates whether the request succeeds. |
message | string Indicates the possible reason for the reported error. |
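Success and error bodies share the `code` field, so a client can branch on it before touching `data`. The sketch below assumes that `200` signals success, as in the sample responses in this section. The names `PipelineRunError` and `unwrap_response`, and the error code and message in the example, are made up for illustration.

```python
class PipelineRunError(Exception):
    """Raised when a run-pipeline response reports a failure."""

    def __init__(self, code, message):
        super().__init__(f"pipeline run failed with code {code}: {message}")
        self.code = code
        self.message = message


def unwrap_response(body):
    """Return body["data"] on success; raise PipelineRunError otherwise.

    Treating 200 as the only success code is an assumption based on the
    sample responses in this section.
    """
    if body.get("code") != 200:
        raise PipelineRunError(body.get("code"), body.get("message", ""))
    return body["data"]


data = unwrap_response({"code": 200, "data": {"num_deleted_entities": 3}})
print(data)

try:
    # Hypothetical error body; the code and message are not real API values.
    unwrap_response({"code": 400, "message": "hypothetical error message"})
except PipelineRunError as error:
    print(error)
```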