
Text Data

The Zilliz Cloud web UI provides a simplified and intuitive way to create, run, and manage pipelines, while the RESTful API offers more flexibility and customization than the web UI.

This guide walks you through the necessary steps to create text pipelines, conduct a semantic search on your embedded text data, and delete the pipeline if it is no longer needed.
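All of the web UI procedures below can also be performed with the RESTful API. As a starting point, the following Python snippet is a minimal sketch of calling the Pipelines API: the controller endpoint for the GCP us-west1 region, the /v1/pipelines path, and the bearer-token authentication follow the Pipelines RESTful API reference as I understand it, and the API key and project ID are placeholders; verify the exact endpoint and parameters against that reference.

import requests

# Placeholders: copy your API key and project ID from the Zilliz Cloud console.
API_KEY = "YOUR_API_KEY"
PROJECT_ID = "YOUR_PROJECT_ID"

# Pipelines currently require a cluster in GCP us-west1, so the controller
# endpoint below targets that region.
BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# List the pipelines in the project to confirm that authentication works.
resp = requests.get(BASE_URL, headers=HEADERS, params={"projectId": PROJECT_ID})
resp.raise_for_status()
print(resp.json())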

Prerequisites and limitations

  • Ensure you have created a cluster deployed in us-west1 on Google Cloud Platform (GCP).

  • In one project, you can only create up to 100 pipelines of the same type. For more information, refer to Zilliz Cloud Limits.

Ingest text data

To ingest any data, you need to first create an ingestion pipeline and then run it.

Create text ingestion pipeline

  1. Navigate to your project.

  2. Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.

    create-pipeline

  3. Choose the type of pipeline to create. Click the + Pipeline button in the Ingestion Pipeline column.

    choose-pipeline

  4. Configure the Ingestion pipeline you wish to create.

    • Target Cluster: The cluster where a new collection will be automatically created with this Ingestion pipeline. Currently, this can only be a cluster deployed on GCP us-west1.

    • Collection Name: The name of the auto-created collection.

    • Pipeline Name: The name of the new Ingestion pipeline. It should only contain lowercase letters, numbers, and underscores.

    • Description (Optional): The description of the new Ingestion pipeline.

    configure-ingestion-pipeline

  5. Add an INDEX function to the Ingestion pipeline by clicking + Function. For each Ingestion pipeline, you can add exactly one INDEX function.

    1. Enter function name.

    2. Select INDEX_TEXT as the function type. An INDEX_TEXT function can generate vector embeddings for all provided text inputs.

    3. Choose the embedding model used to generate vector embeddings. Different text languages have distinct embedding models. Currently, there are six available models for the English language: zilliz/bge-base-en-v1.5, voyageai/voyage-2, voyageai/voyage-code-2, voyageai/voyage-large-2, openai/text-embedding-3-small, and openai/text-embedding-3-large. For the Chinese language, only zilliz/bge-base-zh-v1.5 is available. The following list briefly introduces each embedding model.

      • zilliz/bge-base-en-v1.5: Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with the vector databases, providing good quality and the best network latency.

      • voyageai/voyage-2: Hosted by Voyage AI. This general-purpose model excels at retrieving technical documentation containing descriptive text and code. Its lighter version, voyage-lite-02-instruct, ranks at the top of the MTEB leaderboard. This model is only available when the language is ENGLISH.

      • voyageai/voyage-code-2: Hosted by Voyage AI. This model is optimized for software code, providing outstanding quality for retrieving software documents and source code. This model is only available when the language is ENGLISH.

      • voyageai/voyage-large-2: Hosted by Voyage AI. This is the most powerful generalist embedding model from Voyage AI. It supports a 16k context length (4x that of voyage-2) and excels on various types of text, including technical and long-context documents. This model is only available when the language is ENGLISH.

      • openai/text-embedding-3-small: Hosted by OpenAI. This highly efficient embedding model performs better than its predecessor, text-embedding-ada-002, and balances inference cost and quality. This model is only available when the language is ENGLISH.

      • openai/text-embedding-3-large: Hosted by OpenAI. This is OpenAI's best-performing model. Compared to text-embedding-ada-002, its MTEB score has increased from 61.0% to 64.6%. This model is only available when the language is ENGLISH.

      • zilliz/bge-base-zh-v1.5: Released by BAAI, this state-of-the-art open-source model is hosted on Zilliz Cloud and co-located with the vector databases, providing good quality and the best network latency. This is the default embedding model when the language is CHINESE.

      add-index-text-function

    4. Click Add to save your function.

  6. (Optional) Continue to add a PRESERVE function if you need to preserve the metadata for your texts. A PRESERVE function adds additional scalar fields to the collection along with data ingestion.

    📘Notes

    For each Ingestion pipeline, you can add up to 50 PRESERVE functions.

    1. Click + Function.

    2. Enter function name.

    3. Configure the input field name and type. Supported input field types include Bool, Int8, Int16, Int32, Int64, Float, Double, and VarChar.

      📘Notes
      • Currently, the output field name must be identical to the input field name. The input field name defines the field name used when running the Ingestion pipeline. The output field name defines the field name in the vector collection schema where the preserved value is kept.

      • For VarChar fields, the value should be a string with a maximum length of 4,000 alphanumeric characters.

      • When storing date-time in scalar fields, it is recommended to use the Int16 data type for year data, and Int32 for timestamps.

      add-preserve-function

    4. Click Add to save your function.

  7. Click Create Ingestion Pipeline.

  8. Continue to create a Search pipeline and a Deletion pipeline that are auto-configured to be compatible with the Ingestion pipeline you just created.

    ingestion-pipeline-created-successfully

    📘Notes

    By default, the reranker feature is disabled in the auto-configured Search pipeline. If you need to enable the reranker, manually create a new Search pipeline.
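If you prefer to script this step instead of using the web UI, the same Ingestion pipeline can be created over the RESTful API. The following Python snippet is a minimal sketch: the endpoint, the request body shape (type, clusterId, collectionName, functions, action, and the field-related keys), and all placeholder values are assumptions based on the Pipelines RESTful API reference and should be verified against it.

import requests

API_KEY = "YOUR_API_KEY"  # assumption: an API key with access to your project
BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# One INDEX_TEXT function (required) plus one optional PRESERVE function,
# mirroring steps 5 and 6 above. Key names are assumptions taken from the API reference.
body = {
    "projectId": "YOUR_PROJECT_ID",
    "name": "my_text_ingestion_pipeline",
    "description": "Ingest text with one preserved metadata field",
    "type": "INGESTION",
    "clusterId": "YOUR_CLUSTER_ID",          # must be a cluster deployed on GCP us-west1
    "collectionName": "my_text_collection",  # the collection the pipeline auto-creates
    "functions": [
        {
            "name": "index_my_text",
            "action": "INDEX_TEXT",
            "language": "ENGLISH",
            "embedding": "zilliz/bge-base-en-v1.5",
        },
        {
            "name": "keep_source",
            "action": "PRESERVE",
            "inputField": "source",
            "outputField": "source",  # currently must be identical to the input field name
            "fieldType": "VarChar",
        },
    ],
}

resp = requests.post(BASE_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())  # the response includes the ID of the new pipeline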

Run text ingestion pipeline

  1. Click the "▶︎" button next to your Ingestion pipeline.

    run-pipeline

  2. Input the text or list of texts to be ingested in the text_list field. If you have added a PRESERVE function, enter a value for the preserved field you defined as well. Click Run. (A RESTful equivalent of this step is sketched after this list.)

  3. Check the results.

  4. Input other texts to run again.
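Running an Ingestion pipeline can also be scripted. The sketch below assumes the run endpoint POST /v1/pipelines/{pipeline_id}/run from the Pipelines RESTful API reference; text_list is the input field of an INDEX_TEXT function, and source is the hypothetical preserved field from the creation example above.

import requests

API_KEY = "YOUR_API_KEY"
PIPELINE_ID = "YOUR_INGESTION_PIPELINE_ID"  # returned when the pipeline was created
RUN_URL = f"https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines/{PIPELINE_ID}/run"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# text_list carries the texts to embed; "source" is the hypothetical field
# defined by the PRESERVE function in the creation example.
body = {
    "data": {
        "text_list": [
            "Zilliz Cloud Pipelines convert unstructured text into vector embeddings.",
            "Each text in this list becomes one entity in the collection.",
        ],
        "source": "getting-started-guide",
    }
}

resp = requests.post(RUN_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())  # typically reports the number of ingested entities and token usage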

Search text data

To search any data, you need to first create a search pipeline and then run it. Unlike Ingestion and Deletion pipelines, when creating a Search pipeline, the cluster and collection are defined at the function level instead of the pipeline level. This is because Zilliz Cloud allows you to search from multiple collections at a time.

Create text search pipeline

  1. Navigate to your project.

  2. Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.

  3. Choose the type of pipeline to create. Click the + Pipeline button in the Search Pipeline column.

    create-search-pipeline

  4. Configure the Search pipeline you wish to create.

    • Pipeline Name: The name of the new Search pipeline. It should only contain lowercase letters, numbers, and underscores.

    • Description (Optional): The description of the new Search pipeline.

    configure-search-pipeline

  5. Add a function to the Search pipeline by clicking + Function. You can add exactly one function.

    1. Enter function name.

    2. Choose the Target Cluster and Target Collection. The Target Cluster must be a cluster deployed in us-west1 on Google Cloud Platform (GCP), and the Target Collection must have been created by an Ingestion pipeline; otherwise, the Search pipeline will not be compatible.

    3. Select SEARCH_TEXT as the Function Type. A SEARCH_TEXT function converts the query text to a vector embedding and retrieves the top-K most relevant text entities.

    4. (Optional) Enable the reranker if you want to rank the search results by their relevance to the query to improve search quality. Note that enabling the reranker leads to higher cost and search latency. By default, this feature is disabled. Once enabled, you can choose the model service used for reranking. Currently, only zilliz/bge-reranker-base is available.

      • zilliz/bge-reranker-base: Open-source cross-encoder architecture reranker model published by BAAI. This model is hosted on Zilliz Cloud.

      add-search-text-function

    5. Click Add to save your function.

  6. Click Create Search Pipeline.
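As with ingestion, a Search pipeline can be created over the RESTful API. The sketch below reflects that the cluster and collection are specified inside the SEARCH_TEXT function rather than at the pipeline level; the key names (action, clusterId, collectionName, reranker) and placeholder values are assumptions based on the Pipelines RESTful API reference, so confirm them there before use.

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

body = {
    "projectId": "YOUR_PROJECT_ID",
    "name": "my_text_search_pipeline",
    "description": "Semantic search over the auto-created text collection",
    "type": "SEARCH",
    "functions": [
        {
            "name": "search_my_text",
            "action": "SEARCH_TEXT",
            "clusterId": "YOUR_CLUSTER_ID",          # defined per function, not per pipeline
            "collectionName": "my_text_collection",  # must have been created by an Ingestion pipeline
            "reranker": "zilliz/bge-reranker-base",  # optional; omit this key to keep the reranker disabled
        }
    ],
}

resp = requests.post(BASE_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())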

Run text search pipeline

  1. Click the "▶︎" button next to your Search pipeline. Alternatively, you can click the Playground tab. (A RESTful equivalent of running a Search pipeline is sketched after this list.)

    run-pipeline

  2. Input the query text. Click Run.

  3. Check the results.

  4. Enter new query text to rerun the pipeline.
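The same search can be run from a script. The sketch below assumes the run endpoint plus a query_text input and a params object (limit, offset, outputFields, filter) as described in the Pipelines RESTful API reference; the filter expression refers to the hypothetical preserved field source from the earlier ingestion example.

import requests

API_KEY = "YOUR_API_KEY"
PIPELINE_ID = "YOUR_SEARCH_PIPELINE_ID"
RUN_URL = f"https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines/{PIPELINE_ID}/run"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

body = {
    "data": {"query_text": "How do Pipelines turn text into embeddings?"},
    "params": {
        "limit": 5,                  # top-K results to return
        "offset": 0,
        "outputFields": ["source"],  # preserved fields to include in the results
        "filter": 'source == "getting-started-guide"',  # optional scalar filter
    },
}

resp = requests.post(RUN_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())  # the hits, including distance scores and any requested output fields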

Delete text data

To delete any data, you need to first create a deletion pipeline and then run it.

Create text deletion pipeline

  1. Navigate to your project.

  2. Click Pipelines in the navigation panel. Then switch to the Overview tab and click Pipelines. To create a pipeline, click + Pipeline.

  3. Choose the type of pipeline to create. Click the + Pipeline button in the Deletion Pipeline column.

    create-deletion-pipeline

  4. Configure the Deletion pipeline you wish to create.

    • Pipeline Name: The name of the new Deletion pipeline. It should only contain lowercase letters, numbers, and underscores.

    • Description (Optional): The description of the new Deletion pipeline.

    configure-deletion-pipeline

  5. Add a function to the Deletion pipeline by clicking + Function. You can add exactly one function.

    1. Enter function name.

    2. Select either PURGE_TEXT_INDEX or PURGE_BY_EXPRESSION as the Function Type. A PURGE_TEXT_INDEX function deletes all text entities with the specified IDs, while a PURGE_BY_EXPRESSION function deletes all text entities matching the specified filter expression.

    3. Click Add to save your function.

  6. Click Create Deletion Pipeline.
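A Deletion pipeline can likewise be created over the RESTful API. The sketch below uses a PURGE_BY_EXPRESSION function; swap the action for PURGE_TEXT_INDEX to delete by ID instead. As before, the endpoint, key names, and placeholder values are assumptions based on the Pipelines RESTful API reference.

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

body = {
    "projectId": "YOUR_PROJECT_ID",
    "name": "my_text_deletion_pipeline",
    "description": "Delete text entities matching a filter expression",
    "type": "DELETION",
    "clusterId": "YOUR_CLUSTER_ID",          # cluster and collection are defined at the pipeline level
    "collectionName": "my_text_collection",
    "functions": [
        {
            "name": "purge_by_expression",
            "action": "PURGE_BY_EXPRESSION",
        }
    ],
}

resp = requests.post(BASE_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())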

Run text deletion pipeline

  1. Click the "▶︎" button next to your Deletion pipeline. Alternatively, you can click the Playground tab. (A RESTful equivalent of running a Deletion pipeline is sketched after this list.)

    run-pipeline

  2. Input the filter expression. Click Run.

  3. Check the results.
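Running a Deletion pipeline from a script looks like the sketch below. It assumes the run endpoint and an expression input field for PURGE_BY_EXPRESSION functions, holding a Milvus-style boolean filter; the field name and the expression are illustrative, so check them against the Pipelines RESTful API reference.

import requests

API_KEY = "YOUR_API_KEY"
PIPELINE_ID = "YOUR_DELETION_PIPELINE_ID"
RUN_URL = f"https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines/{PIPELINE_ID}/run"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Delete every entity whose hypothetical preserved "source" field matches the expression.
# The input field name "expression" is an assumption for PURGE_BY_EXPRESSION functions.
body = {"data": {"expression": 'source == "getting-started-guide"'}}

resp = requests.post(RUN_URL, headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())  # typically reports how many entities were deleted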

Manage pipeline

The following operations help you manage the pipelines created in the preceding steps.

View pipeline

Click Pipelines in the left navigation pane and choose the Pipelines tab. You will see all the available pipelines.

view-pipelines-on-web-ui

Click on a specific pipeline to view its detailed information including its basic information, total usage, functions, and related connectors.

view-pipeline-details

📘Notes

The total usage data may be delayed by a few hours due to technical limitations.

You can also check the pipeline activities on the web UI.

view-pipelines-activities-on-web-ui

Delete pipeline

If you no longer need a pipeline, you can drop it. Note that dropping a pipeline does not remove the auto-created collection into which it ingested data.

🚧Warning
  • Dropped pipelines cannot be recovered. Please be cautious with the action.

  • Dropping a data-ingestion pipeline does not affect the collection created along with the pipeline. Your data is safe.

To drop a pipeline on the web UI, click the ... button under the Actions column. Then click Drop.

delete-pipeline
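Dropping a pipeline can also be done over the RESTful API. The sketch below assumes that the pipeline resource supports the DELETE method, as described in the Pipelines RESTful API reference; the same warning applies, since dropped pipelines cannot be recovered, although the auto-created collection and its data remain.

import requests

API_KEY = "YOUR_API_KEY"
PIPELINE_ID = "YOUR_PIPELINE_ID"  # the pipeline you no longer need
URL = f"https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines/{PIPELINE_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Dropping the pipeline does not remove the collection it created or the data it ingested.
resp = requests.delete(URL, headers=HEADERS)
resp.raise_for_status()
print(resp.json())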