Text Data
The Zilliz Cloud web UI provides a simple, intuitive way to create, run, and manage Pipelines, while the RESTful API offers more flexibility and customization.
This guide walks you through the necessary steps to create text pipelines, conduct a semantic search on your embedded text data, and delete the pipeline if it is no longer needed.
Prerequisites and limitations
- Ensure you have created a cluster deployed in us-west1 on Google Cloud Platform (GCP).
- In one project, you can only create up to 100 pipelines of the same type. For more information, refer to Zilliz Cloud Limits.
Ingest text data
To ingest any data, you need to first create an ingestion pipeline and then run it.
Create text ingestion pipeline
- Cloud Console
- Bash
- Navigate to your project.
- Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.
- Choose the type of pipeline to create. Click the + Pipeline button in the Ingestion Pipeline column.
- Configure the Ingestion pipeline you wish to create.
  Parameters | Description |
  ---|---|
  Target Cluster | The cluster where a new collection will be automatically created with this Ingestion pipeline. Currently, this can only be a cluster deployed on GCP us-west1. |
  Collection Name | The name of the auto-created collection. |
  Pipeline Name | The name of the new Ingestion pipeline. It should only contain lowercase letters, numbers, and underscores. |
  Description (Optional) | The description of the new Ingestion pipeline. |
- Add an INDEX function to the Ingestion pipeline by clicking + Function. For each Ingestion pipeline, you can add exactly one INDEX function.
  - Enter function name.
  - Select INDEX_TEXT as the function type. An INDEX_TEXT function can generate vector embeddings for all provided text inputs.
  - Choose the embedding model used to generate vector embeddings. Different text languages have distinct embedding models. Currently, the available models for the English language include zilliz/bge-base-en-v1.5, voyageai/voyage-2, voyageai/voyage-code-2, openai/text-embedding-3-small, and openai/text-embedding-3-large. For the Chinese language, only zilliz/bge-base-zh-v1.5 is available. The following table briefly introduces each embedding model.
    Embedding Model | Description |
    ---|---|
    zilliz/bge-base-en-v1.5 | Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with vector databases, providing good quality and best network latency. |
    voyageai/voyage-2 | Hosted by Voyage AI. This general purpose model excels in retrieving technical documentation containing descriptive text and code. Its lighter version voyage-lite-02-instruct ranks top on MTEB leaderboard. This model is only available when language is ENGLISH. |
    voyageai/voyage-code-2 | Hosted by Voyage AI. This model is optimized for software code, providing outstanding quality for retrieving software documents and source code. This model is only available when language is ENGLISH. |
    voyageai/voyage-large-2 | Hosted by Voyage AI. This is the most powerful generalist embedding model from Voyage AI. It supports 16k context length (4x that of voyage-2) and excels on various types of text including technical and long-context documents. This model is only available when language is ENGLISH. |
    openai/text-embedding-3-small | Hosted by OpenAI. This highly efficient embedding model has stronger performance over its predecessor text-embedding-ada-002 and balances inference cost and quality. This model is only available when language is ENGLISH. |
    openai/text-embedding-3-large | Hosted by OpenAI. This is OpenAI's best performing model. Compared to text-embedding-ada-002, the MTEB score has increased from 61.0% to 64.6%. This model is only available when language is ENGLISH. |
    zilliz/bge-base-zh-v1.5 | Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with vector databases, providing good quality and best network latency. This is the default embedding model when language is CHINESE. |
  - Click Add to save your function.
- (Optional) Continue to add another PRESERVE function if you need to preserve the metadata for your texts. A PRESERVE function adds additional scalar fields to the collection along with data ingestion.
  📘 Notes: For each Ingestion pipeline, you can add up to 50 PRESERVE functions.
  - Click + Function.
  - Enter function name.
  - Configure the input field name and type. Supported input field types include Bool, Int8, Int16, Int32, Int64, Float, Double, and VarChar.
    📘 Notes: Currently, the output field name must be identical to the input field name. The input field name defines the field name used when running the Ingestion pipeline. The output field name defines the field name in the vector collection schema where the preserved value is kept.
    For VarChar fields, the value should be a string with a maximum length of 4,000 alphanumeric characters.
    When storing date-time in scalar fields, it is recommended to use the Int16 data type for year data, and Int32 for timestamps.
  - Click Add to save your function.
- Click Create Ingestion Pipeline.
- Continue creating a Search pipeline and a Deletion pipeline that are auto-configured to be compatible with the just-created Ingestion pipeline.
  📘 Notes: By default, the reranker feature is disabled in the auto-configured Search pipeline. If you need to enable the reranker, manually create a new Search pipeline.
The following example creates an Ingestion pipeline named my_text_ingestion_pipeline with an INDEX_TEXT function and a PRESERVE function added.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines" \
-d '{
"name": "my_text_ingestion_pipeline",
"clusterId": "inxx-xxxxxxxxxxxxxxx",
"projectId": "proj-xxxx",
"collectionName": "my_collection",
"description": "A pipeline that generates text embeddings and stores additional fields.",
"type": "INGESTION",
"functions": [
{
"name": "index_my_text",
"action": "INDEX_TEXT",
"language": "ENGLISH",
"embedding": "zilliz/bge-base-en-v1.5"
},
{
"name": "keep_text_info",
"action": "PRESERVE",
"inputField": "source",
"outputField": "source",
"fieldType": "VarChar"
}
]
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- clusterId: The ID of the cluster in which you want to create a pipeline. Currently, you can only choose a cluster deployed in us-west1 on GCP. Learn more about How can I find my CLUSTER_ID?
- projectId: The ID of the project in which you want to create a pipeline. Learn more about How Can I Obtain the Project ID?
- collectionName: The name of the collection to be created automatically along with the Ingestion pipeline. Alternatively, you can specify an existing collection.
- name: The name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
- description (optional): The description of the pipeline to create.
- type: The type of the pipeline to create. Currently, available pipeline types include INGESTION, SEARCH, and DELETION.
- functions: The function(s) to add to the pipeline. An Ingestion pipeline can have only one INDEX function and up to 50 PRESERVE functions.
  - name: The name of the function. The function name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
  - action: The type of the function to add. Currently, available options include INDEX_DOC, INDEX_TEXT, INDEX_IMAGE, and PRESERVE.
  - language: The language of your text to ingest. Possible values include ENGLISH and CHINESE. (This parameter is only used in the INDEX_TEXT and INDEX_DOC_CHUNK functions.)
  - embedding: The embedding model used to generate vector embeddings for your text. Available options are as follows. (This parameter is only used in the INDEX function.)
    Embedding Model | Description |
    ---|---|
    zilliz/bge-base-en-v1.5 | Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with vector databases, providing good quality and best network latency. |
    voyageai/voyage-2 | Hosted by Voyage AI. This general purpose model excels in retrieving technical documentation containing descriptive text and code. Its lighter version voyage-lite-02-instruct ranks top on MTEB leaderboard. This model is only available when language is ENGLISH. |
    voyageai/voyage-code-2 | Hosted by Voyage AI. This model is optimized for programming code, providing outstanding quality for retrieving code blocks. This model is only available when language is ENGLISH. |
    voyageai/voyage-large-2 | Hosted by Voyage AI. This is the most powerful generalist embedding model from Voyage AI. It supports 16k context length (4x that of voyage-2) and excels on various types of text including technical and long-context documents. This model is only available when language is ENGLISH. |
    openai/text-embedding-3-small | Hosted by OpenAI. This highly efficient embedding model has stronger performance over its predecessor text-embedding-ada-002 and balances inference cost and quality. This model is only available when language is ENGLISH. |
    openai/text-embedding-3-large | Hosted by OpenAI. This is OpenAI's best performing model. Compared to text-embedding-ada-002, the MTEB score has increased from 61.0% to 64.6%. This model is only available when language is ENGLISH. |
    zilliz/bge-base-zh-v1.5 | Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with vector databases, providing good quality and best network latency. This is the default embedding model when language is CHINESE. |
  - inputField: The name of the input field. You can customize the value, but it must be identical to outputField. (This parameter is only used in the PRESERVE function.)
  - outputField: The name of the output field, which will be used in the collection schema. Currently, the output field name must be identical to the input field name. (This parameter is only used in the PRESERVE function.)
  - fieldType: The data type of the input and output fields. Possible values include Bool, Int8, Int16, Int32, Int64, Float, Double, and VarChar. (This parameter is only used in the PRESERVE function.)
    📘 Notes: When storing date-time in scalar fields, it is recommended to use the Int16 data type for year data, and Int32 for timestamps.
    For the VarChar field type, the max_length of the data in this field cannot exceed 4,000.
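The constraints listed above can be checked on the client side before the request is sent. The following is a minimal Python sketch, not part of the Zilliz Cloud API: the helper build_ingestion_payload and the constants NAME_RULE and MAX_PRESERVE_FUNCTIONS are hypothetical names, and the checks only mirror the rules stated in this guide (3-64 character names, exactly one INDEX function, at most 50 PRESERVE functions, matching input and output field names).

```python
import json
import re

# Naming rule stated in this guide: 3-64 characters, letters, digits, underscores.
NAME_RULE = re.compile(r"[A-Za-z0-9_]{3,64}")
MAX_PRESERVE_FUNCTIONS = 50


def build_ingestion_payload(name, project_id, cluster_id, collection_name,
                            functions, description=""):
    """Check the documented constraints and return the request body (hypothetical helper)."""
    for candidate in [name] + [fn["name"] for fn in functions]:
        if not NAME_RULE.fullmatch(candidate):
            raise ValueError(f"invalid name: {candidate!r}")
    index_count = sum(fn["action"].startswith("INDEX_") for fn in functions)
    preserve_fns = [fn for fn in functions if fn["action"] == "PRESERVE"]
    if index_count != 1:
        raise ValueError("exactly one INDEX function is required")
    if len(preserve_fns) > MAX_PRESERVE_FUNCTIONS:
        raise ValueError("at most 50 PRESERVE functions are allowed")
    for fn in preserve_fns:
        # The guide currently requires outputField to equal inputField.
        if fn["inputField"] != fn["outputField"]:
            raise ValueError("outputField must match inputField")
    return {
        "name": name,
        "clusterId": cluster_id,
        "projectId": project_id,
        "collectionName": collection_name,
        "description": description,
        "type": "INGESTION",
        "functions": functions,
    }


payload = build_ingestion_payload(
    name="my_text_ingestion_pipeline",
    project_id="proj-xxxx",
    cluster_id="inxx-xxxxxxxxxxxxxxx",
    collection_name="my_collection",
    functions=[
        {"name": "index_my_text", "action": "INDEX_TEXT",
         "language": "ENGLISH", "embedding": "zilliz/bge-base-en-v1.5"},
        {"name": "keep_text_info", "action": "PRESERVE",
         "inputField": "source", "outputField": "source", "fieldType": "VarChar"},
    ],
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the body of the create request; the server remains the authority on what is accepted.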
Below is an example output.
{
"code": 200,
"data": {
"pipelineId": "pipe-xxx",
"name": "my_text_ingestion_pipeline",
"type": "INGESTION",
"createTimestamp": 1721187300000,
"description": "A pipeline that generates text embeddings and stores additional fields.",
"status": "SERVING",
"totalUsage": {
"embedding": 0
},
"functions": [
{
"name": "index_my_text",
"action": "INDEX_TEXT",
"inputFields": ["text_list"],
"language": "ENGLISH",
"embedding": "zilliz/bge-base-en-v1.5"
},
{
"name": "keep_text_info",
"action": "PRESERVE",
"inputField": "source",
"outputField": "source",
"fieldType": "VarChar"
}
],
"clusterId": "inxx-xxxx",
"collectionName": "my_collection"
}
}
Total usage data may be delayed by a few hours due to technical limitations.
A collection named my_collection will be automatically created if it does not exist in the cluster. If it already exists, Zilliz Cloud Pipelines checks whether the collection schema is consistent with the schema defined in the pipeline.
This collection contains four fields: three output fields of the INDEX_TEXT function, and one output field for each PRESERVE function. The collection schema is as follows.
id (Data type: Int64) | text (Data type: VarChar) | embedding (Data type: FLOAT_VECTOR) | source (Data type: VarChar) |
---|---|---|---|
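To make the field count concrete, the short Python sketch below derives the expected field names from a pipeline's function list. expected_fields is a hypothetical helper; the fixed names id, text, and embedding are taken from the schema shown above, and the rest is an illustration rather than the service's actual logic.

```python
def expected_fields(functions):
    """Return the collection field names implied by an Ingestion pipeline's functions."""
    fields = []
    for fn in functions:
        if fn["action"] == "INDEX_TEXT":
            # An INDEX_TEXT function contributes three output fields.
            fields += ["id", "text", "embedding"]
        elif fn["action"] == "PRESERVE":
            # Each PRESERVE function contributes one scalar field.
            fields.append(fn["outputField"])
    return fields


functions = [
    {"name": "index_my_text", "action": "INDEX_TEXT"},
    {"name": "keep_text_info", "action": "PRESERVE", "outputField": "source"},
]
print(expected_fields(functions))
```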
Run text ingestion pipeline
- Cloud Console
- Bash
- Click the "▶︎" button next to your Ingestion pipeline.
- Input the text or text lists that need to be ingested in the text_list field. If you have added a PRESERVE function, enter the value in the defined preserved field as well. Click Run.
- Check the results.
- Input other texts to run again.
The following example runs the Ingestion pipeline my_text_ingestion_pipeline. source is the metadata field that needs to be preserved.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines/${YOUR_PIPELINE_ID}/run" \
-d '{
"data": {
"text_list": [
"Zilliz Cloud is a fully managed vector database and data services, empowering you to unlock the full potential of unstructured data for your AI applications.",
"It can store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models."
],
"source": "Zilliz official website"
}
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- text_list: The text or text list to ingest.
- source (optional): The metadata field to preserve. The input field name should be consistent with what you defined when creating the Ingestion pipeline and adding the PRESERVE function. The value of this field should also follow the predefined field type.
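The same run request can be assembled in Python with the standard library. This is a sketch under stated assumptions: build_run_request and MAX_VARCHAR_LENGTH are hypothetical names, the length check simply mirrors the 4,000-character VarChar limit mentioned in this guide, and the request is only constructed here, not sent.

```python
import json
import urllib.request

MAX_VARCHAR_LENGTH = 4000  # VarChar limit stated in this guide


def build_run_request(cloud_region, pipeline_id, api_key, text_list, preserved=None):
    """Build (but do not send) the POST request that runs an Ingestion pipeline."""
    preserved = preserved or {}
    for field, value in preserved.items():
        if isinstance(value, str) and len(value) > MAX_VARCHAR_LENGTH:
            raise ValueError(f"value of {field!r} exceeds {MAX_VARCHAR_LENGTH} characters")
    # Preserved fields sit next to text_list inside "data".
    body = {"data": {"text_list": list(text_list), **preserved}}
    url = (f"https://controller.api.{cloud_region}.zillizcloud.com"
           f"/v1/pipelines/{pipeline_id}/run")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


request = build_run_request(
    "gcp-us-west1", "pipe-xxxx", "YOUR_API_KEY",
    ["Zilliz Cloud is a fully managed vector database."],
    {"source": "Zilliz official website"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the sketch runs without credentials.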
Below is an example response.
{
"code": 200,
"data": {
"num_entities": 2,
"usage": {
"embedding": 63
},
"ids": [
450524927755105948,
450524927755105949
]
}
}
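The returned IDs are worth keeping, since they identify the ingested entities in later searches and deletions. Below is a small Python sketch that reads a response of the shape shown above. summarize_ingestion is a hypothetical helper, and treating any code other than 200 as a failure is an assumption, not documented behavior.

```python
import json


def summarize_ingestion(response_text):
    """Return (entity IDs, embedding usage) from an ingestion-run response."""
    response = json.loads(response_text)
    if response.get("code") != 200:  # assumption: non-200 codes signal failure
        raise RuntimeError(f"pipeline run failed: {response}")
    data = response["data"]
    if len(data["ids"]) != data["num_entities"]:
        raise RuntimeError("ID count does not match num_entities")
    return data["ids"], data["usage"]["embedding"]


sample = (
    '{"code": 200, "data": {"num_entities": 2, "usage": {"embedding": 63}, '
    '"ids": [450524927755105948, 450524927755105949]}}'
)
ids, usage = summarize_ingestion(sample)
print(ids, usage)
```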
Search text data
To search any data, you need to first create a search pipeline and then run it. Unlike Ingestion and Deletion pipelines, when creating a Search pipeline, the cluster and collection are defined at the function level instead of the pipeline level. This is because Zilliz Cloud allows you to search from multiple collections at a time.
Create text search pipeline
- Cloud Console
- Bash
- Navigate to your project.
- Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.
- Choose the type of pipeline to create. Click the + Pipeline button in the Search Pipeline column.
- Configure the Search pipeline you wish to create.
  Parameters | Description |
  ---|---|
  Pipeline Name | The name of the new Search pipeline. It should contain only lowercase letters, numbers, and underscores. |
  Description (Optional) | The description of the new Search pipeline. |
- Add a function to the Search pipeline by clicking + Function. You can add exactly one function.
  - Enter function name.
  - Choose Target Cluster and Target Collection. The Target Cluster must be a cluster deployed in us-west1 on Google Cloud Platform (GCP), and the Target Collection must be created by an Ingestion pipeline; otherwise, the Search pipeline will not be compatible.
  - Select SEARCH_TEXT as the Function Type. A SEARCH_TEXT function can convert the query text to a vector embedding and retrieve the topK most relevant text entities.
  - (Optional) Enable reranker if you want to rank the search results based on their relevance to the query to improve search quality. However, note that enabling reranker will lead to higher cost and search latency. By default, this feature is disabled. Once enabled, you can choose the model service used for reranking. Currently, only zilliz/bge-reranker-base is available.
    Reranker Model Service | Description |
    ---|---|
    zilliz/bge-reranker-base | Open-source cross-encoder architecture reranker model published by BAAI. This model is hosted on Zilliz Cloud. |
  - Click Add to save your function.
- Click Create Search Pipeline.
The following example creates a Search pipeline named my_text_search_pipeline with a SEARCH_TEXT function added.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines" \
-d '{
"projectId": "proj-xxxx",
"name": "my_text_search_pipeline",
"description": "A pipeline that receives text and search for semantically similar texts",
"type": "SEARCH",
"functions": [
{
"name": "search_text",
"action": "SEARCH_TEXT",
"clusterId": "inxx-xxxxxxxxxxxxxxx",
"collectionName": "my_collection",
"embedding": "zilliz/bge-base-en-v1.5",
"reranker": "zilliz/bge-reranker-base"
}
]
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- projectId: The ID of the project in which you want to create a pipeline. Learn more about How Can I Obtain the Project ID?
- name: The name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
- description (optional): The description of the pipeline to create.
- type: The type of the pipeline to create. Currently, available pipeline types include INGESTION, SEARCH, and DELETION.
- functions: The function(s) to add to the pipeline. A Search pipeline can only have one function.
  - name: The name of the function. The function name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
  - action: The type of the function to add. Currently, available options include SEARCH_DOC_CHUNK, SEARCH_TEXT, SEARCH_IMAGE_BY_IMAGE, and SEARCH_IMAGE_BY_TEXT.
  - clusterId: The ID of the cluster that holds the collection to search. Currently, you can only choose a cluster deployed in us-west1 on GCP. Learn more about How can I find my CLUSTER_ID?
  - collectionName: The name of the collection to search.
  - embedding: The embedding model used during vector search. The model should be consistent with the one chosen in the compatible collection.
  - reranker (optional): The reranker model used to reorder a set of candidate results and improve the quality of the search results. If you do not need the reranker, omit this parameter. Currently, only zilliz/bge-reranker-base is available as the parameter value.
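Because reranker is optional and is omitted rather than set to an empty value, it can help to build the function entry conditionally. The Python sketch below illustrates this; build_search_payload is a hypothetical helper and only reproduces the request shape used in the example above.

```python
def build_search_payload(name, project_id, cluster_id, collection_name,
                         embedding, reranker=None, description=""):
    """Return the request body for creating a Search pipeline with one SEARCH_TEXT function."""
    function = {
        "name": "search_text",
        "action": "SEARCH_TEXT",
        # For Search pipelines, cluster and collection live at the function level.
        "clusterId": cluster_id,
        "collectionName": collection_name,
        "embedding": embedding,
    }
    if reranker is not None:
        # Only include the key when a reranker is wanted.
        function["reranker"] = reranker
    return {
        "projectId": project_id,
        "name": name,
        "description": description,
        "type": "SEARCH",
        "functions": [function],
    }


plain = build_search_payload("my_text_search_pipeline", "proj-xxxx",
                             "inxx-xxxxxxxxxxxxxxx", "my_collection",
                             "zilliz/bge-base-en-v1.5")
reranked = build_search_payload("my_text_search_pipeline", "proj-xxxx",
                                "inxx-xxxxxxxxxxxxxxx", "my_collection",
                                "zilliz/bge-base-en-v1.5",
                                reranker="zilliz/bge-reranker-base")
print(sorted(plain["functions"][0]))
print(sorted(reranked["functions"][0]))
```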
Below is an example output.
{
"code": 200,
"data": {
"pipelineId": "pipe-xxxx",
"name": "my_text_search_pipeline",
"type": "SEARCH",
"createTimestamp": 1721187655000,
"description": "A pipeline that receives text and search for semantically similar texts",
"status": "SERVING",
"totalUsage": {
"embedding": 0,
"rerank": 0
},
"functions": [
{
"name": "search_text",
"action": "SEARCH_TEXT",
"inputFields": [
"query_text"
],
"clusterId": "inxx-xxxx",
"collectionName": "my_collection",
"reranker": "zilliz/bge-reranker-base",
"embedding": "zilliz/bge-base-en-v1.5"
}
]
}
}
Total usage data may be delayed by a few hours due to technical limitations.
Run text search pipeline
- Cloud Console
- Bash
- Click the "▶︎" button next to your Search pipeline. Alternatively, you can also click on the Playground tab.
- Input the query text. Click Run.
- Check the results.
- Enter new query text to rerun the pipeline.
The following example runs the Search pipeline named my_text_search_pipeline. The query text is "What is Zilliz Cloud?".
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines/${YOUR_PIPELINE_ID}/run" \
-d '{
"data": {
"query_text": "What is Zilliz Cloud?"
},
"params":{
"limit": 1,
"offset": 0,
"outputFields": [],
"filter": "id >= 0"
}
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- query_text: The query text used to conduct a semantic search.
- params: The search parameters to configure.
  - limit: The maximum number of entities to return. The value should be an integer ranging from 1 to 500. The sum of this value and that of offset should be less than 1024.
  - offset: The number of entities to skip in the search results. The maximum value is 1024. The sum of this value and that of limit should not be greater than 1024.
  - outputFields: An array of fields to return along with the search results. Note that id (entity ID), distance, and text will be returned in the search result by default. If you need other output fields in the returned result, you can configure this parameter.
  - filter: The filter, written as a boolean expression, used to find matches for the search.
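The limit and offset rules lend themselves to a quick client-side check. The Python sketch below is illustrative only: is_valid_page and build_search_params are hypothetical helpers, and because the guide words the combined bound in two ways, the sketch assumes the stricter reading (limit + offset below 1024).

```python
def is_valid_page(limit, offset):
    """Check limit and offset against the bounds described in this guide."""
    if not 1 <= limit <= 500:
        return False
    if offset < 0:
        return False
    # Assumption: use the stricter of the two documented bounds.
    return limit + offset < 1024


def build_search_params(limit=1, offset=0, output_fields=None, filter_expr=None):
    """Return the "params" object for a Search pipeline run."""
    if not is_valid_page(limit, offset):
        raise ValueError(f"limit={limit}, offset={offset} is out of range")
    params = {"limit": limit, "offset": offset,
              "outputFields": list(output_fields or [])}
    if filter_expr:
        params["filter"] = filter_expr
    return params


print(build_search_params(limit=1, offset=0, filter_expr="id >= 0"))
```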
Below is an example response.
{
"code": 200,
"data": {
"result": [
{
"id": 450524927755105948,
"distance": 0.9997715353965759,
"text": "Zilliz Cloud is a fully managed vector database and data services, empowering you to unlock the full potential of unstructured data for your AI applications."
}
],
"usage": {
"embedding": 17,
"rerank": 43
}
}
}
Delete text data
To delete any data, you need to first create a deletion pipeline and then run it.
Create text deletion pipeline
- Cloud Console
- Bash
- Navigate to your project.
- Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.
- Choose the type of pipeline to create. Click the + Pipeline button in the Deletion Pipeline column.
- Configure the Deletion pipeline you wish to create.
  Parameters | Description |
  ---|---|
  Pipeline Name | The name of the new Deletion pipeline. It should only contain lowercase letters, numbers, and underscores. |
  Description (Optional) | The description of the new Deletion pipeline. |
- Add a function to the Deletion pipeline by clicking + Function. You can add exactly one function.
  - Enter function name.
  - Select either PURGE_TEXT_INDEX or PURGE_BY_EXPRESSION as the Function Type. A PURGE_TEXT_INDEX function can delete all text entities with the specified ID, while a PURGE_BY_EXPRESSION function can delete all text entities matching the specified filter expression.
  - Click Add to save your function.
- Click Create Deletion Pipeline.
The example below creates a Deletion pipeline named my_text_deletion_pipeline with a PURGE_BY_EXPRESSION function added.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines" \
-d '{
"projectId": "proj-xxxx",
"name": "my_text_deletion_pipeline",
"description": "A pipeline that deletes entities by expression",
"type": "DELETION",
"functions": [
{
"name": "purge_data_by_expression",
"action": "PURGE_BY_EXPRESSION"
}
],
"clusterId": "inxx-xxxxxxxxxxxxxxx",
"collectionName": "my_collection"
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- projectId: The ID of the project in which you want to create a pipeline. Learn more about How Can I Obtain the Project ID?
- name: The name of the pipeline to create. The pipeline name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
- description (optional): The description of the pipeline to create.
- type: The type of the pipeline to create. Currently, available pipeline types include INGESTION, SEARCH, and DELETION.
- functions: The function(s) to add to the pipeline. A Deletion pipeline can only have one function.
  - name: The name of the function. The function name should be a string of 3-64 characters and can contain only alphanumeric characters and underscores.
  - action: The type of the function to add. Available options include PURGE_DOC_INDEX, PURGE_TEXT_INDEX, PURGE_BY_EXPRESSION, and PURGE_IMAGE_INDEX.
- clusterId: The ID of the cluster in which you want to create a pipeline. Currently, you can only choose a cluster deployed on GCP us-west1. Learn more about How can I find my CLUSTER_ID?
- collectionName: The name of the collection in which you want to create a pipeline.
Below is an example output.
{
"code": 200,
"data": {
"pipelineId": "pipe-xxxx",
"name": "my_text_deletion_pipeline",
"type": "DELETION",
"createTimestamp": 1721187655000,
"description": "A pipeline that deletes entities by expression",
"status": "SERVING",
"functions": [
{
"action": "PURGE_BY_EXPRESSION",
"name": "purge_data_by_expression",
"inputFields": ["expression"]
}
],
"clusterId": "in03-***************",
"collectionName": "my_collection"
}
}
Run text deletion pipeline
- Cloud Console
- Bash
- Click the "▶︎" button next to your Deletion pipeline. Alternatively, you can also click on the Playground tab.
- Input the filter expression. Click Run.
- Check the results.
The following example runs the Deletion pipeline named my_text_deletion_pipeline.
curl --request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines/${YOUR_PIPELINE_ID}/run" \
-d '{
"data": {
"expression": "id in [1, 2, 3]"
}
}'
The parameters in the above code are described as follows:
- YOUR_API_KEY: The credential used to authenticate API requests. Learn more about how to View API Keys.
- cloud-region: The ID of the cloud region where your cluster exists. Currently, only gcp-us-west1 is supported.
- expression: The boolean expression that selects the entities to delete. For more information about how to write boolean expressions, refer to Filtering.
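When the entities to delete are known by ID, for instance the ids returned by an ingestion run, the expression can be generated rather than typed by hand. The Python sketch below shows one way to do this. ids_to_expression is a hypothetical helper that only produces the "id in [...]" form used in the example above.

```python
import json


def ids_to_expression(ids):
    """Build an "id in [...]" filter expression from a list of integer entity IDs."""
    if not ids:
        raise ValueError("at least one ID is required")
    for entity_id in ids:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(entity_id, bool) or not isinstance(entity_id, int):
            raise TypeError(f"entity IDs must be integers, got {entity_id!r}")
    return "id in [" + ", ".join(str(entity_id) for entity_id in ids) + "]"


body = {"data": {"expression": ids_to_expression([1, 2, 3])}}
print(json.dumps(body))
```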
Below is an example response.
{
"code": 200,
"data": {
"num_deleted_entities": 3
}
}
Manage pipeline
The following operations manage the pipelines created in the preceding steps.
View pipeline
- Cloud Console
- Bash
Click Pipelines on the left navigation. Choose the Pipelines tab. You will see all the available pipelines.
Click on a specific pipeline to view its detailed information including its basic information, total usage, functions, and related connectors.
Total usage data may be delayed by a few hours due to technical limitations.
You can also check the pipeline activities on the web UI.
You can call the API to list all existing pipelines or view the details of a particular pipeline.
- View all existing pipelines
Follow the example below and specify your projectId. Learn more about how to obtain the project ID.
curl --request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines?projectId=proj-xxxx"
Below is an example output.
{
"code": 200,
"data": [
{
"pipelineId": "pipe-xxxx",
"name": "my_text_ingestion_pipeline",
"type": "INGESTION",
"createTimestamp": 1721187655000,
"clusterId": "in03-***************",
"collectionName": "my_collection",
"description": "A pipeline that generates text embeddings and stores additional fields.",
"status": "SERVING",
"totalUsage": {
"embedding": 0
},
"functions": [
{
"action": "INDEX_TEXT",
"name": "index_my_text",
"inputFields": ["text_list"],
"language": "ENGLISH",
"embedding": "zilliz/bge-base-en-v1.5"
},
{
"action": "PRESERVE",
"name": "keep_text_info",
"inputField": "source",
"outputField": "source",
"fieldType": "VarChar"
}
]
},
{
"pipelineId": "pipe-xxxx",
"name": "my_text_search_pipeline",
"type": "SEARCH",
"createTimestamp": 1721187655000,
"description": "A pipeline that receives text and search for semantically similar texts",
"status": "SERVING",
"totalUsage": {
"embedding": 0,
"rerank": 0
},
"functions": [
{
"action": "SEARCH_TEXT",
"name": "search_text",
"inputFields": ["query_text"],
"clusterId": "in03-***************",
"collectionName": "my_collection",
"embedding": "zilliz/bge-base-en-v1.5",
"reranker": "zilliz/bge-reranker-base"
}
]
},
{
"pipelineId": "pipe-xxxx",
"name": "my_text_deletion_pipeline",
"type": "DELETION",
"createTimestamp": 1721187655000,
"description": "A pipeline that deletes entities by expression",
"status": "SERVING",
"functions": [
{
"action": "PURGE_BY_EXPRESSION",
"name": "purge_data_by_expression",
"inputFields": ["expression"]
}
],
"clusterId": "in03-***************",
"collectionName": "my_collection"
}
]
}
📘 Notes: Total usage data may be delayed by a few hours due to technical limitations.
- View the details of a specific pipeline
Follow the example below to view the details of a pipeline.
curl --request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines/${YOUR_PIPELINE_ID}"
Below is an example output.
{
"code": 200,
"data": {
"pipelineId": "pipe-xxx",
"name": "my_text_ingestion_pipeline",
"type": "INGESTION",
"createTimestamp": 1721187300000,
"description": "A pipeline that generates text embeddings and stores additional fields.",
"status": "SERVING",
"totalUsage": {
"embedding": 0
},
"functions": [
{
"name": "index_my_text",
"action": "INDEX_TEXT",
"inputFields": ["text_list"],
"language": "ENGLISH",
"embedding": "zilliz/bge-base-en-v1.5"
},
{
"name": "keep_text_info",
"action": "PRESERVE",
"inputField": "source",
"outputField": "source",
"fieldType": "VarChar"
}
],
"clusterId": "inxx-xxxx",
"collectionName": "my_collection"
}
}
📘 Notes: Total usage data may be delayed by a few hours due to technical limitations.
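Responses like the ones above are easy to post-process. The Python sketch below groups pipeline names by type and converts createTimestamp to a readable time. Both helpers are hypothetical, and reading createTimestamp as Unix time in milliseconds is an assumption based on the 13-digit values in the examples.

```python
from datetime import datetime, timezone


def created_at(pipeline):
    """Convert createTimestamp to an aware UTC datetime."""
    # Assumption: the value is Unix time in milliseconds.
    return datetime.fromtimestamp(pipeline["createTimestamp"] / 1000, tz=timezone.utc)


def names_by_type(pipelines):
    """Group pipeline names by pipeline type, keeping the response order."""
    grouped = {}
    for pipeline in pipelines:
        grouped.setdefault(pipeline["type"], []).append(pipeline["name"])
    return grouped


pipelines = [
    {"name": "my_text_ingestion_pipeline", "type": "INGESTION",
     "createTimestamp": 1721187300000},
    {"name": "my_text_search_pipeline", "type": "SEARCH",
     "createTimestamp": 1721187655000},
    {"name": "my_text_deletion_pipeline", "type": "DELETION",
     "createTimestamp": 1721187655000},
]
print(names_by_type(pipelines))
print(created_at(pipelines[0]).isoformat())
```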
Delete pipeline
If you no longer need a pipeline, you can drop it. Note that dropping a pipeline will not remove the auto-created collection where it ingested data.
Dropped pipelines cannot be recovered. Please be cautious with the action.
Dropping a data-ingestion pipeline does not affect the collection created along with the pipeline. Your data is safe.
- Cloud Console
- Bash
To drop a pipeline on the web UI, click the ... button under the Actions column. Then click Drop.
Follow the example below to drop a pipeline.
curl --request DELETE \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${YOUR_API_KEY}" \
--url "https://controller.api.{cloud-region}.zillizcloud.com/v1/pipelines/${YOUR_PIPELINE_ID}"
The following is an example output.
{
"code": 200,
"data": {
"pipelineId": "pipe-xxx",
"name": "my_text_ingestion_pipeline",
"type": "INGESTION",
"createTimestamp": 1721187300000,
"description": "A pipeline that generates text embeddings and stores additional fields.",
"status": "SERVING",
"totalUsage": {
"embedding": 0
},
"functions": [
{
"name": "index_my_text",
"action": "INDEX_TEXT",
"inputFields": ["text_list"],
"language": "ENGLISH",
"embedding": "zilliz/bge-base-en-v1.5"
},
{
"name": "keep_text_info",
"action": "PRESERVE",
"inputField": "source",
"outputField": "source",
"fieldType": "VarChar"
}
],
"clusterId": "inxx-xxxx",
"collectionName": "my_collection"
}
}
Total usage data may be delayed by a few hours due to technical limitations.
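For scripted cleanup, the drop call can also be prepared in Python. This sketch assumes the drop operation is an HTTP DELETE on the same pipeline URL used for viewing a pipeline. build_drop_request is a hypothetical helper, and the request is only built, not sent, because dropped pipelines cannot be recovered.

```python
import urllib.request


def build_drop_request(cloud_region, pipeline_id, api_key):
    """Build (but do not send) a request that drops a pipeline."""
    url = (f"https://controller.api.{cloud_region}.zillizcloud.com"
           f"/v1/pipelines/{pipeline_id}")
    return urllib.request.Request(
        url,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="DELETE",  # assumption: dropping uses the DELETE verb
    )


request = build_drop_request("gcp-us-west1", "pipe-xxxx", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```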