Consume data

This page explains how to consume data from GlassFlow pipelines.

Consuming data means pulling transformed data out of a GlassFlow data pipeline. You use the GlassFlow Python SDK to retrieve the data from the pipeline.

Create a data consumer for the mobility project

In this section, you'll learn how to consume transformed data from the mobility project data pipeline in GlassFlow and store it in a local file. You'll use the GlassFlow Python SDK to interact with the pipeline and retrieve the transformed data in real time.

Prerequisites

This tutorial assumes that you have already done the following:

You completed the Create a Pipeline tutorial.
You completed the Produce data tutorial.

Consume transformed data

Create a Python script consumer_file.py inside the mobility folder and add the following code:

https://github.com/glassflow/glassflow-examples/blob/main/tutorials/mobility/consumer_file.py

"""Get transformed data and store it locally on disk
"""
import glassflow
import sys
from dotenv import dotenv_values
import json


def main():
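    # Load PIPELINE_ID, SPACE_ID, and PIPELINE_ACCESS_TOKEN from .env.
    # The values are not printed because they include the access token.
    config = dotenv_values(".env")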
    pipeline_id = config.get("PIPELINE_ID")
    space_id = config.get("SPACE_ID")
    token = config.get("PIPELINE_ACCESS_TOKEN")

    client = glassflow.GlassFlowClient()
    pipeline_client = client.pipeline_client(space_id=space_id,
                                             pipeline_id=pipeline_id,
                                             pipeline_access_token=token)

    with open("mobility_data_transformed.txt", "a+") as f:
        while True:
            try:
                # consume transformed data from the pipeline
                res = pipeline_client.consume()
                if res.status_code == 200:
                    # get the transformed data as json
                    data = res.body.event
                    print("Data consumed successfully")
                    print(data)
                    f.write(json.dumps(data) + "\n")
                    f.flush()
            except KeyboardInterrupt:
                print("exiting")
                sys.exit(0)


if __name__ == "__main__":
    main()

This script polls the pipeline in a loop and appends each newly transformed event to a local file. It relies on two GlassFlow SDK objects: a GlassFlow client, which connects to the GlassFlow platform, and a pipeline client, which consumes data from one specific data pipeline.

Initialize a GlassFlow client to establish a connection with the GlassFlow platform:

client = glassflow.GlassFlowClient()

Create a pipeline client for the data pipeline identified by pipeline_id within the space identified by space_id:

pipeline_client = client.pipeline_client(space_id=space_id, 
                                         pipeline_id=pipeline_id, 
                                         pipeline_access_token=token)
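
The script reads PIPELINE_ID, SPACE_ID, and PIPELINE_ACCESS_TOKEN with config.get(), which returns None for a key that is missing from .env. A minimal sketch of a check you could run before creating the pipeline client is shown below. The helper names missing_keys and require_config are hypothetical and are not part of the GlassFlow SDK; a plain dictionary stands in for the result of dotenv_values(".env").

```python
REQUIRED_KEYS = ("PIPELINE_ID", "SPACE_ID", "PIPELINE_ACCESS_TOKEN")


def missing_keys(config):
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def require_config(config):
    """Raise a readable error instead of passing None to the client."""
    missing = missing_keys(config)
    if missing:
        raise ValueError("Missing values in .env: " + ", ".join(missing))
    return config


# A plain dict stands in for dotenv_values(".env") in this example.
example = {"PIPELINE_ID": "abc", "SPACE_ID": "", "PIPELINE_ACCESS_TOKEN": None}
print(missing_keys(example))  # ['SPACE_ID', 'PIPELINE_ACCESS_TOKEN']
```

In consumer_file.py you would call require_config(config) right after loading the .env file, so a missing value fails early with a clear message.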

Consume the transformed data from the pipeline. The call returns a response object; when res.status_code is 200, the consumed event is available as res.body.event:

res = pipeline_client.consume()
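
The loop in the script calls consume() again immediately when the status code is not 200. If you want to pause between empty polls, one possible shape is sketched below. It is written against any callable that returns an object with status_code and body.event, so it runs here with stand-in responses instead of a real pipeline. The function consume_events, the idle_sleep parameter, and the 204 status used by the stand-in are assumptions made for illustration, not documented GlassFlow behavior.

```python
import time
from types import SimpleNamespace


def consume_events(consume, max_events, idle_sleep=1.0, sleep=time.sleep):
    """Collect up to max_events events, pausing after each poll without an event."""
    events = []
    while len(events) < max_events:
        res = consume()
        if res.status_code == 200:
            events.append(res.body.event)
        else:
            # Assumption: a non-200 status means no event is ready yet.
            sleep(idle_sleep)
    return events


# Stand-in for pipeline_client.consume: one empty poll, then two events.
_responses = iter([
    SimpleNamespace(status_code=204, body=None),
    SimpleNamespace(status_code=200, body=SimpleNamespace(event={"id": 1})),
    SimpleNamespace(status_code=200, body=SimpleNamespace(event={"id": 2})),
])


def fake_consume():
    return next(_responses)


print(consume_events(fake_consume, max_events=2, sleep=lambda seconds: None))
# [{'id': 1}, {'id': 2}]
```

With a real pipeline you would pass pipeline_client.consume as the consume argument and keep the default sleep.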

Run the script

python consumer_file.py

The script starts consuming data continuously from the pipeline and stores it in a local file. You can see an example of consumed data here. To watch new data as it is written to the file, run this command in another terminal window:

tail -f mobility_data_transformed.txt
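
Because the script writes each event with json.dumps(data) followed by a newline, the output file holds one JSON document per line. A short sketch for loading it back into Python follows; read_events and count_events are hypothetical helper names introduced for this example.

```python
import json


def read_events(path):
    """Yield one decoded event per non-empty line of the output file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_events(path):
    """Count the events stored in the file without keeping them in memory."""
    return sum(1 for _ in read_events(path))
```

For example, count_events("mobility_data_transformed.txt") returns the number of events consumed so far, and iterating over read_events with the same path gives you each event as a Python object.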

You can extend this script to push the consumed data to cloud storage buckets or real-time databases, depending on your project requirements. See the other tutorials for more complex scenarios.
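
As a rough illustration of such an extension, the sketch below buffers events and hands them to an upload function in newline-delimited JSON batches. BatchSink and its upload callable are hypothetical; in a real project upload would wrap the client of your storage service or database, which is not shown here.

```python
import json


class BatchSink:
    """Buffer events and pass them to `upload` as newline-delimited JSON."""

    def __init__(self, upload, batch_size=100):
        self.upload = upload  # hypothetical callable: upload(payload: str)
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        """Store one event and upload once the batch is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Upload whatever is buffered, if anything."""
        if self.buffer:
            payload = "\n".join(json.dumps(e) for e in self.buffer) + "\n"
            self.upload(payload)
            self.buffer = []


# A list's append method stands in for a real upload function.
sent = []
sink = BatchSink(upload=sent.append, batch_size=2)
for event in [{"id": 1}, {"id": 2}, {"id": 3}]:
    sink.add(event)
sink.flush()
print(len(sent))  # 2
```

In the consumer script, sink.add(data) would take the place of the f.write call, with a final sink.flush() before the script exits.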

