Define a transformation function

This page explains how to create a custom transformation function in Python with GlassFlow.

Data transformation converts or maps data from one format or structure into another. GlassFlow does this through a custom Python transformation function, which supports a wide range of scenarios including data cleaning, validation, normalization, and enrichment.

Implementing Transformations

To perform data transformations in GlassFlow, you write a Python script containing a mandatory handler function. This function is where you define your transformation logic:

def handler(data, log):
    # Your transformation logic goes here.
    return data

GlassFlow automatically invokes this function when a data pipeline runs and passes it two arguments:

  • data - the event dispatched to the pipeline, accessible within the function as JSON or a Python dictionary.

  • log - a Python logging object for generating logs. Any logs you create are included in the pipeline logs, which can be viewed through the CLI.

The handler function processes this data and returns the transformed data as a JSON or Python dictionary.
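As a minimal sketch of how the two arguments might be used together, the handler below logs the incoming event and returns a modified copy. It assumes that log behaves like a standard logging.Logger and that data arrives as a dictionary; the "processed" field is a hypothetical name used only for illustration.

```python
import logging


def handler(data, log):
    # Assumption: `log` exposes the standard logging.Logger methods.
    log.info("Received event with keys: %s", sorted(data))

    # Work on a copy so the incoming event is left untouched.
    transformed = dict(data)
    # Hypothetical field, added only to show a visible change.
    transformed["processed"] = True
    return transformed


if __name__ == "__main__":
    # Local check: call the handler directly with a standard logger.
    logging.basicConfig(level=logging.INFO)
    print(handler({"id": 1, "name": "sample"}, logging.getLogger("local-test")))
```

Calling the handler directly with a standard logger, as in the last lines, is one way to try out the logic locally before attaching it to a pipeline.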

Default Transformation Function

When you create a pipeline in GlassFlow without a custom transformation function, a default "echo" function is automatically created. Here's the basic structure of the default transformation function script in GlassFlow:
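A minimal sketch of what an echo function might look like, assuming it follows the handler signature described above and returns each event unchanged (the exact contents of the generated file may differ):

```python
def handler(data, log):
    # Echo: return the incoming event exactly as it was received.
    return data
```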

To customize the transformation function, you can modify the handler.py file to include your transformation logic.

You can also include other Python dependencies (Python packages that you import into your script) in the transformation function. See the libraries supported by GlassFlow.

Custom transformation code samples

Explore the gallery of sample transformations on the Transformation page.

Data Cleaning

Data cleaning involves removing or correcting inaccurate, incomplete, or irrelevant parts of the data, for example by trimming whitespace, correcting typos, or filtering out unwanted records.

Example: Removing Null Values
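One way this could be written, as a sketch that assumes the event is a flat dictionary and that "null" means a value of None. Only top-level fields are checked; nested structures are left as they are.

```python
import logging


def handler(data, log):
    # Keep only the top-level fields whose value is not None.
    cleaned = {key: value for key, value in data.items() if value is not None}

    removed = len(data) - len(cleaned)
    if removed:
        # Assumption: `log` exposes the standard logging.Logger methods.
        log.info("Removed %d null field(s)", removed)
    return cleaned


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = {"name": "Ada", "email": None, "age": 36}
    print(handler(sample, logging.getLogger("local-test")))
```

Note that falsy values such as 0, an empty string, or False are kept, because the check is specifically for None.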

IP Address Masking

IP address masking is useful for anonymizing user data. This transformation can replace the last octet of an IP address with 0 to mask the user's specific location.

Example: Masking IP Addresses
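The sketch below shows one possible approach for IPv4 addresses. The field name "ip_address" is an assumption made for illustration; events in your pipeline may use a different key. Values that are not valid IPv4 addresses are logged and left unchanged.

```python
import ipaddress
import logging


def mask_ipv4(ip):
    # Validate and normalize; raises ValueError for anything that is
    # not a valid IPv4 address.
    normalized = str(ipaddress.IPv4Address(ip))
    octets = normalized.split(".")
    octets[-1] = "0"  # replace the last octet with 0
    return ".".join(octets)


def handler(data, log):
    # Assumption: the address is stored under the hypothetical key "ip_address".
    ip = data.get("ip_address")
    if ip is None:
        return data
    try:
        masked = mask_ipv4(ip)
    except ValueError:
        log.warning("Field ip_address is not a valid IPv4 address; left unchanged")
        return data
    return {**data, "ip_address": masked}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = {"user": "u1", "ip_address": "192.168.1.42"}
    print(handler(sample, logging.getLogger("local-test")))
```

If leaving an unparseable value in place is not acceptable for your privacy requirements, the except branch could drop the field instead.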

Data Enrichment

Data enrichment involves enhancing existing data with additional information, for instance adding user demographic information based on an email address or user ID.

Example: Adding User Type
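As an illustrative sketch, the handler below derives a user type from the domain of an email address. The "email" and "user_type" field names, the INTERNAL_DOMAINS set, and the "internal"/"external" labels are all hypothetical; a real pipeline might look this information up in an external source instead.

```python
import logging

# Hypothetical set of domains treated as internal users.
INTERNAL_DOMAINS = {"example.com"}


def handler(data, log):
    # Assumption: the event carries the address under the key "email".
    email = str(data.get("email", ""))
    # Everything after the last "@" is taken as the domain.
    _, separator, domain = email.rpartition("@")

    if separator and domain.lower() in INTERNAL_DOMAINS:
        user_type = "internal"
    else:
        user_type = "external"

    log.info("Assigned user_type=%s", user_type)
    # Return a copy of the event with the new field added.
    return {**data, "user_type": user_type}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = {"user_id": 7, "email": "ada@example.com"}
    print(handler(sample, logging.getLogger("local-test")))
```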

Next

In the Create a Pipeline section, you will learn how to configure a new pipeline.

© 2023 GlassFlow