AI Accelerator Pipelines overview
As part of EDB Postgres AI - AI Accelerator, Pipelines abstracts away the complexity of working with AI data. It transforms Postgres into a powerful platform for AI data management, combining vector search from pgvector with automation for complex AI workflows.
Pipelines, auto-processing, and intelligent knowledge base
Pipelines handles the entire lifecycle, from data preparation through embedding generation, storage, and indexing. You can select your desired model from a list of supported models or connect to any OpenAI API-compatible external model. You can then let Pipelines manage everything else, from data ingestion to efficient similarity search.
With data preparation, Pipelines performs common preprocessing steps within the AI Accelerator, directly from Postgres. It can even run this preprocessing automatically, keeping processed data up to date.
With auto-embedding, Pipelines automates creating and updating embeddings, ensuring that vector stores remain up to date without manual embedding management. This approach reduces the risk of stale data that causes GenAI errors and hallucinations.
The intelligent knowledge base feature lets you perform similarity and semantic searches across text and image data with a single knowledge base function call. These searches are possible whether the data is stored in Postgres or object storage.
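For example, once a knowledge base exists, a search is a single call. The knowledge base name and query term here are illustrative, and the retrieval functions are covered in detail below:

```sql
-- Return the keys of the 5 items most similar to the query text.
SELECT * FROM aidb.retrieve_key('product_kb', 'waterproof hiking boots', 5);
```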
How Pipelines works
Pipelines delivers its functionality through the Pipelines aidb extension that's embedded into the Postgres server. The extension provides a set of functions and views that allow you to interact with the AI data in your Postgres database.
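For example, on a server where the AI Accelerator packages are installed, enabling the extension is a single statement (CASCADE also installs the extensions that aidb depends on, such as vector):

```sql
-- Enable the aidb extension and its dependencies in the current database.
CREATE EXTENSION IF NOT EXISTS aidb CASCADE;
```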
Preparers
The Pipelines aidb extension introduces the concept of a preparer that you can create for a given type and location of source AI data. This data can reside either in regular columns of a Postgres table or in an S3-compatible object storage bucket.
A preparer encapsulates the functionality to perform common preprocessing steps on source data for use in embedding generation. The application just needs to create a preparer via the aidb.create_table_preparer() function for Postgres table sources, or aidb.create_volume_preparer() for source data stored externally on S3 or on local file systems.
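The following is a minimal sketch of a table preparer. The table, column, and operation names are illustrative, and the exact parameter list may differ from this sketch; consult the preparer function reference for the precise signature:

```sql
-- Illustrative preparer that chunks long text from documents.content
-- into a destination table, ready for embedding generation.
SELECT aidb.create_table_preparer(
    name => 'doc_chunker',                   -- name of this preparer
    operation => 'ChunkText',                -- preprocessing step to apply (assumed name)
    source_table => 'documents',             -- where the raw source data lives
    source_data_column => 'content',
    source_key_column => 'id',
    destination_table => 'document_chunks',  -- where prepared output is written
    destination_data_column => 'chunk',
    destination_key_column => 'id'
);
```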
By default, there's no automatic processing, so the application can trigger bulk data preparation for all existing data in the source location using aidb.bulk_data_preparation(). Automatic processing can also be configured using aidb.set_auto_preparer() for source data in Postgres tables. The output from preparers can then be fed as input into knowledge bases.
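Continuing the sketch above, the application can process the existing rows in bulk and then switch on automatic processing. The 'Live' mode name is an assumption; check the function reference for the supported modes:

```sql
-- Prepare everything already present in the source table.
SELECT aidb.bulk_data_preparation('doc_chunker');

-- Keep prepared data up to date as new rows arrive ('Live' is illustrative).
SELECT aidb.set_auto_preparer('doc_chunker', 'Live');
```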
Knowledge Bases
The Pipelines aidb extension introduces the concept of a knowledge base that you can create for a given type and location of AI data. This data can reside either in regular columns of a Postgres table or in an S3-compatible object storage bucket. Currently, Pipelines supports unstructured plain text documents as well as a set of image formats.
A knowledge base encapsulates all processing that's needed to make the AI data in the provided source location searchable and retrievable through similarity. To create a knowledge base, the application uses the aidb.create_table_knowledge_base() function for Postgres tables or aidb.create_volume_knowledge_base() for data stored externally on S3 or on local file systems.
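For a Postgres table source, creating a knowledge base might look like the following sketch. The parameter names are illustrative, and the encoder model (my_embedding_model here) must be registered beforehand, as shown in the model registration sketch below:

```sql
-- Illustrative knowledge base over the prepared chunks table.
SELECT aidb.create_table_knowledge_base(
    name => 'doc_kb',
    model_name => 'my_embedding_model',  -- an encoder model registered earlier
    source_table => 'document_chunks',
    source_data_column => 'chunk',
    source_data_format => 'Text'
);
```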
Auto-embedding
Auto-embedding is currently supported for AI data stored in Postgres tables. It automates embedding updates using Postgres triggers, and aidb.set_auto_knowledge_base() enables or disables auto-embedding for a given knowledge base. By default, auto-embedding is disabled, so the application has to manually trigger embedding generation for all existing data in the source location using aidb.bulk_embedding().
You can make best use of the two auto-embedding settings by bulk loading your existing data into the table while auto-embedding is disabled. Then run aidb.bulk_embedding() to create embeddings for that data. After that, enable auto-embedding to create embeddings for any data added to the table later.
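The recommended sequence therefore looks roughly like this, assuming the illustrative doc_kb knowledge base from above and a 'Live' auto-processing mode:

```sql
-- 1. Bulk load data while auto-embedding is disabled (the default).
COPY document_chunks (id, chunk) FROM '/tmp/chunks.csv' WITH (FORMAT csv);

-- 2. Generate embeddings for everything loaded so far.
SELECT aidb.bulk_embedding('doc_kb');

-- 3. Enable auto-embedding so future inserts and updates are embedded automatically.
SELECT aidb.set_auto_knowledge_base('doc_kb', 'Live');
```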
All embedding generation, storage, indexing, and management is handled internally by the Pipelines aidb extension. The application only has to specify the encoder LLM for the knowledge base to use for its specific data and use case.
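Registering an encoder model before creating the knowledge base might look like this sketch. The aidb.create_model call, provider name, and configuration keys shown are assumptions for illustration; the model reference lists the supported providers and options:

```sql
-- Illustrative model registration using an OpenAI-compatible embeddings provider.
-- Credentials, where required, are typically supplied separately.
SELECT aidb.create_model(
    'my_embedding_model',
    'openai_embeddings',
    '{"model": "text-embedding-3-small"}'::jsonb
);
```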
Intelligent knowledge base
Once a knowledge base is created and all embeddings are up to date, the application can use aidb.retrieve_key() to run a similarity search and retrieval by providing a query input; it returns the keys of the matched items. When the knowledge base is created for text data, the query input is also a text term. For image knowledge bases, the query input is an image. If you're using a multi-modal model with your knowledge base, such as CLIP, you can use text or images as source and query inputs, which allows you to search for images using text and vice versa.
The aidb knowledge base makes sure to use the same encoder LLM for the query input as was used for the embeddings, conducts a similarity search, and returns a ranked list of similar data from the source location. For Postgres tables, you can also use aidb.retrieve_text() to retrieve the actual data from the source table.
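For example, with the illustrative doc_kb knowledge base (the query text and result count are also illustrative):

```sql
-- Return the keys of the 5 most similar items.
SELECT * FROM aidb.retrieve_key('doc_kb', 'how do I reset my password?', 5);

-- For Postgres table sources, also return the matching source text.
SELECT * FROM aidb.retrieve_text('doc_kb', 'how do I reset my password?', 5);
```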
Pipelines currently supports a broad list of open encoder LLMs from Hugging Face as well as a set of OpenAI encoders. Hugging Face LLMs run locally on the Postgres node, while OpenAI encoders involve a call out to the OpenAI cloud service. The aidb.model_providers table shows the list of currently supported model providers.
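To see which providers are available in your installation:

```sql
-- List the model providers known to this aidb installation.
SELECT * FROM aidb.model_providers;
```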