Architecture

Discover the GlassFlow architecture and its key components.

Key components

The GlassFlow platform is composed of four main components:

  • Developer Interface: The Web Application, Command Line Interface (CLI), and Python SDK let users interact with GlassFlow’s core functionality (see the SDK sketch after this list).

  • GlassFlow API: Acts as the primary entry point for all requests from the Web App, CLI, or Python SDK, handling authentication and authorization. Through the API, users can create, modify, and manage data pipelines, initiate data transformations, and monitor the status of their streaming processes.

  • Message Broker: Provides reliable event consumption and streaming, ensuring data flows smoothly between GlassFlow’s backend API service and the serverless execution engine. GlassFlow uses JetStream, the distributed publish/subscribe messaging system built into NATS.

  • Serverless Execution Engine: Executes Python code quickly in response to events from the message broker and makes it easy to deploy custom functions to the cloud. It leverages KEDA (Kubernetes Event-Driven Autoscaling) with its integrated NATS trigger and the Fission serverless framework, both built on top of Kubernetes.
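
As a concrete example of the developer interface, the sketch below publishes an event to a pipeline and consumes the transformed result via the Python SDK. It is a minimal sketch only: the class and method names (PipelineDataSource, PipelineDataSink, publish, consume) and the response object's json() accessor are assumptions made for illustration, so consult the SDK reference for the exact API.

    # Minimal sketch of interacting with a GlassFlow pipeline via the Python SDK.
    # NOTE: the class and method names below are illustrative assumptions, not
    # the verified SDK surface; check the SDK reference for the current API.
    import glassflow

    # Publish a raw event into the pipeline (acting as the data producer).
    source = glassflow.PipelineDataSource(
        pipeline_id="<your-pipeline-id>",        # placeholder
        pipeline_access_token="<your-token>",    # placeholder
    )
    source.publish(request_body={"user_id": 42, "action": "signup"})

    # Consume the transformed event on the other side of the pipeline.
    sink = glassflow.PipelineDataSink(
        pipeline_id="<your-pipeline-id>",
        pipeline_access_token="<your-token>",
    )
    event = sink.consume()
    print(event.json())  # transformed payload, assuming a .json() accessor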

Data Flow

The following steps describe the overall data flow in GlassFlow, from pipeline creation to event processing.

  1. The user defines a new stream processing pipeline using the Web App, CLI, or a YAML file and implements the data transformation logic in a Python function (see the transformation sketch after this list).

  2. The backend API service manages the overall data streaming lifecycle. It processes the pipeline configuration, deploys the user’s custom transformation function (Python code) to Fission, and stores user and configuration data in PostgreSQL.

  3. Once a data producer publishes events, the backend API service pushes them to NATS JetStream, a high-throughput, scalable message broker. NATS JetStream queues the events and manages their distribution and delivery to the serverless execution engine.

  4. The KEDA NATS trigger monitors the flow of events through NATS JetStream. It dynamically adjusts the number of active processing units based on the current workload and allocates resources to process the events.

  5. Fission executes the transformation function, runs analytics, and performs any necessary computations on the event data in real time. Within the serverless execution engine, data is transformed, aggregated, or enriched according to the business logic and processing requirements defined in the transformation function. Note that Fission itself has two components: the Fission function and the user’s custom functions.

  6. The transformed stream of data is pushed to a subscriber, ready to be sent to its final destination: a database, a data warehouse, a vector database for ML applications, or another external system.

  7. Throughout the data flow, GlassFlow monitors pipeline status metrics with Prometheus, which provides real-time insight into the performance and health of the data pipelines. This lets developers manage pipelines proactively and resolve issues quickly.
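
Steps 1 and 5 revolve around the user-supplied transformation function. The sketch below shows the shape of such a function: the handler(data, log) signature follows the pattern used in GlassFlow’s examples, while the enrichment logic itself is purely illustrative.

    # Sketch of a transformation function as deployed to the serverless engine.
    # The handler(data, log) signature follows GlassFlow's published examples;
    # the enrichment logic below is illustrative only.
    def handler(data: dict, log) -> dict:
        # 'data' is the incoming event payload; 'log' is a logger supplied
        # by the runtime for pipeline-level logging.
        log.info("processing event %s", data.get("id"))

        # Enrich the event according to the pipeline's business logic.
        data["processed"] = True
        data["category"] = "premium" if data.get("amount", 0) > 100 else "standard"

        # The returned dict is pushed downstream as the transformed event.
        return data

    # Local smoke test with a stand-in logger:
    import logging
    print(handler({"id": 1, "amount": 250}, logging.getLogger("test")))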

Authentication

The authentication process in GlassFlow, powered by Auth0, provides a secure, efficient, and user-friendly way for users to access the platform through the Web Application or CLI.

Deployment and Operation

Users do not need to deploy their data pipelines manually. The whole process is managed by the GlassFlow platform using containerization (Docker) and orchestration tools (Kubernetes).

When you create a pipeline from the CLI or with a pipeline.yaml file in GlassFlow, your function code and configuration are validated and dockerized: a new Docker image is generated and then submitted to the GlassFlow serverless engine as part of the deployment process.
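
Conceptually, that deployment step resembles the sketch below. This is not GlassFlow’s internal code; it is an assumption-laden illustration, built on the docker Python SDK, of the validate-build-submit sequence described above. The validation rule, file names, and registry URL are hypothetical.

    # Conceptual sketch of the deploy step: validate the function, build a
    # Docker image around it, and hand the image to the execution engine.
    # NOT GlassFlow's internal code; names and the registry URL are hypothetical.
    import ast
    import docker

    def validate_function(path: str) -> None:
        # Minimal validation: the file must parse and define a handler() function.
        with open(path) as f:
            tree = ast.parse(f.read())
        names = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
        if "handler" not in names:
            raise ValueError("transformation must define a handler() function")

    validate_function("transform.py")

    client = docker.from_env()
    image, _ = client.images.build(path=".", tag="registry.example.com/pipeline-fn:latest")
    client.images.push("registry.example.com/pipeline-fn", tag="latest")
    # The serverless engine (Fission) is then pointed at the pushed image.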

Infrastructure as code

When developing Python stream processing applications with GlassFlow, you construct a sequence of stages that form a pipeline, which can be managed through the CLI and visualized in the Web App. Each stage within this pipeline is crafted to suit specific needs. GlassFlow takes a streamlined infrastructure-as-code approach, enabling the entire pipeline to be defined in a single YAML file, pipeline.yaml (coming soon). This file simplifies the reconstruction of pipelines, making deployment quick and straightforward. Any modification to the pipeline configuration in one environment can be seamlessly carried over to another, which supports thorough testing of changes in a separate environment before a production deployment.
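
Because pipeline.yaml has not shipped yet, its schema is not final. The sketch below only illustrates the general infrastructure-as-code idea of rebuilding a pipeline from a single config file: the YAML keys and the create_pipeline() call are hypothetical placeholders, not the released API.

    # Hypothetical sketch of rebuilding a pipeline from a config file.
    # pipeline.yaml is not yet released, so the keys read here and the
    # create_pipeline() call are assumptions for illustration only.
    import yaml
    import glassflow

    with open("pipeline.yaml") as f:
        config = yaml.safe_load(f)

    client = glassflow.GlassFlowClient()
    pipeline = client.create_pipeline(                 # hypothetical method
        name=config["name"],
        transformation_code=open(config["transformation_file"]).read(),
    )
    print(f"recreated pipeline {pipeline.id} in this environment")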
