Pipeline

Learn about pipelines in GlassFlow.

What is a Pipeline in GlassFlow?

A pipeline in GlassFlow orchestrates the flow of data from various sources, through transformations, and ultimately sends it to destinations where the data is stored or further utilized. Configuring a pipeline involves specifying these elements and defining how data is processed at each stage.

GlassFlow runs the custom Python function you define as part of the pipeline, applying your transformation in real time as data passes through.
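As a minimal sketch, a transformation function might look like the one below. The `handler(data, log)` signature, the dict-shaped event, and the field names (`email`, `email_domain`) are assumptions made for illustration; check the GlassFlow reference for the exact contract your pipeline expects.

```python
import logging


def handler(data: dict, log: logging.Logger) -> dict:
    """Enrich one event as it passes through the pipeline.

    Assumption: each event arrives as a dict, and the returned dict
    is what gets forwarded to the next stage.
    """
    email = str(data.get("email", "")).strip().lower()
    enriched = dict(data)  # copy so the incoming event is not mutated
    enriched["email"] = email
    enriched["email_domain"] = email.split("@")[-1] if "@" in email else None
    log.info("processed event with domain %s", enriched["email_domain"])
    return enriched
```

Keeping the function free of side effects, apart from logging, makes it easy to test locally before attaching it to a pipeline.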

Pipeline components

Each pipeline consists of:

  1. Data Sources: Points where data is ingested into the pipeline. These can be databases such as PostgreSQL or MongoDB, message queues or brokers such as Amazon SQS or Google Pub/Sub, data streaming services such as Amazon Kinesis, file systems, event-driven applications, or any other data source.

  2. Transformation: A custom function that processes and transforms the incoming data. These functions can clean, enrich, or analyze the data to extract meaningful insights.

  3. Sinks: Destinations where the processed data is sent. These can include analytical databases such as ClickHouse or ChromaDB, storage systems such as Amazon S3 or Azure Blob Storage, data warehouses such as Snowflake or Google BigQuery, or other services for further use.
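The relationship between the three components can be sketched in plain Python. This is a toy in-memory model, not GlassFlow's implementation: `run_pipeline`, `add_total`, and the order fields are hypothetical names, with one list standing in for the data source and another for the sink.

```python
from typing import Callable, Iterable


def run_pipeline(
    source: Iterable[dict],
    transform: Callable[[dict], dict],
    sink: Callable[[dict], None],
) -> int:
    """Pull each event from the source, transform it, and hand it to the sink."""
    count = 0
    for event in source:
        sink(transform(event))
        count += 1
    return count


def add_total(event: dict) -> dict:
    # Hypothetical transformation: compute an order total.
    return {**event, "total": event["price"] * event["quantity"]}


# Stand-ins: one list plays the data source, another plays the sink.
orders = [{"price": 5, "quantity": 2}, {"price": 3, "quantity": 4}]
stored = []
processed = run_pipeline(orders, add_total, stored.append)
```

In a real pipeline the source and sink would be external systems such as those listed above, and only the transformation would be code you write.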

Pipeline configuration

There are two ways to configure a new pipeline:

  1. Using the GlassFlow WebApp at app.glassflow.dev.

See the Create a Pipeline page to learn how to create a new pipeline.

Guidelines to name a space and pipeline

A GlassFlow resource name is the name you provide for a space, pipeline, or organization when you create it.

When you create a resource with a name, GlassFlow generates a unique ID for it. You can then access the resource by this ID.

A resource name must follow these rules:

  • Letters and numbers: Can contain any combination of uppercase letters, lowercase letters, and numbers.

  • Allowed special characters: Dashes (-), underscores (_), and other special characters (e.g., !, @, #, $, %, ^, &, *, (, ), +, =, {, }, [, ], |, \, :, ;, ', ", <, >, ,, ., ?, /).

  • Length limit: To ensure compatibility and readability, keep the name within the length limit, typically no more than 64 characters.
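These rules can be checked before a create request is sent. The sketch below is a hypothetical client-side helper, not part of GlassFlow: `validate_resource_name` is an invented name, and it assumes that only the characters listed above are accepted (so spaces are rejected) and that 64 characters is a hard limit.

```python
import string
from typing import List

MAX_NAME_LENGTH = 64  # limit quoted in the guidelines above
# Assumption: exactly the characters listed above are accepted; spaces are not.
ALLOWED_CHARS = set(
    string.ascii_letters + string.digits + "-_" + "!@#$%^&*()+={}[]|\\:;'\"<>,.?/"
)


def validate_resource_name(name: str) -> List[str]:
    """Return a list of problems; an empty list means the name passed."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    bad = sorted(set(name) - ALLOWED_CHARS)
    if bad:
        problems.append(f"unsupported characters: {bad}")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a form or script show the user everything that needs fixing in a single pass.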

Best Practices:

  • Descriptive Names: Choose names that clearly describe the purpose or function of the pipeline, making it easier to identify and manage multiple pipelines.

  • Consistent Naming Scheme: Adopt a consistent naming scheme across your pipelines, especially if you're managing many of them. This could involve prefixes or suffixes that indicate the pipeline's stage in the data processing workflow (e.g., ingest, transform, export) or its data source.
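One way to keep a naming scheme consistent is to build names from parts rather than typing them by hand. The helper below is a hypothetical sketch; the `stage-source-purpose` layout and the `pipeline_name` function are assumptions, not a GlassFlow convention.

```python
STAGES = ("ingest", "transform", "export")  # stages named in the example above


def pipeline_name(stage: str, source: str, purpose: str) -> str:
    """Build a name such as 'ingest-postgres-orders'."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    parts = [stage, source, purpose]
    # Lowercase each part and swap spaces for underscores so names stay uniform.
    return "-".join(p.strip().lower().replace(" ", "_") for p in parts)
```

For example, `pipeline_name("ingest", "Postgres", "Orders")` yields `ingest-postgres-orders`, so pipelines sort and group by stage in any listing.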

© 2023 GlassFlow