Real-time clickstream analytics
A practical example of creating a pipeline for analyzing clickstream data and continuously updating real-time dashboards.
Clickstream data contains the information gathered as a user navigates through a web application. Clickstream analytics involves tracking, analyzing, and reporting the web pages visited and user behavior on those pages. This data provides valuable insights into user behavior, such as how they discovered the product or service and their interactions on the website.
In this tutorial, we will build a clickstream analytics dashboard using GlassFlow. We will use the Google Analytics Data API in Python to collect clickstream data from a website and send it to a GlassFlow pipeline. Our transformation function will analyze the data to calculate additional metrics, and we will use Streamlit and Plotly to visualize the results.
Pipeline components
Producer
There are two options for data producers:
Use the Python script fake_producer.py with the Faker library to generate mock clickstream data and push it to GlassFlow. You do not need to set up the Google Analytics 4 API in this case.
Use the Google Analytics 4 Data API integration example code in the ga_producer.py Python script to push real-time report events to GlassFlow.
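To make the mock-data option more concrete, here is a minimal sketch of what generating one fake clickstream event could look like. It uses only the standard library's random module as a stand-in for Faker so that it runs without extra dependencies. The field names (deviceCategory, country, unifiedScreenName, activeUsers, screenPageViews, eventCount) are assumptions loosely modelled on GA4 dimensions and metrics, not necessarily what fake_producer.py emits.

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical sample values; the real script may use different fields.
DEVICES = ["desktop", "mobile", "tablet"]
COUNTRIES = ["Germany", "Estonia", "United States"]
PAGES = ["/home", "/pricing", "/docs", "/blog"]


def make_mock_event(rng: random.Random) -> dict:
    """Build one mock clickstream event as a plain dictionary."""
    active_users = rng.randint(1, 20)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "deviceCategory": rng.choice(DEVICES),
        "country": rng.choice(COUNTRIES),
        "unifiedScreenName": rng.choice(PAGES),
        "activeUsers": active_users,
        # Keep views and events at least as large as the user count.
        "screenPageViews": rng.randint(active_users, active_users * 5),
        "eventCount": rng.randint(active_users, active_users * 10),
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for _ in range(3):
        # A real producer would publish the event to the pipeline here.
        print(json.dumps(make_mock_event(rng)))
```

In the actual producer, each generated event would be pushed to GlassFlow rather than printed.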
GlassFlow
GlassFlow receives real-time analytics data from the producer through the Python SDK, applies the transformation function, and makes the transformed data available to the consumer.
Consumer
The dashboard component is built using Streamlit, a powerful tool for creating interactive web applications. This component visualizes the clickstream data by creating various charts and graphs in Plotly.
We'll use the GlassFlow CLI to create a new space and configure the data pipeline.
Prerequisites
Make sure that you have the following before proceeding with the installation:
You created a GlassFlow account.
You installed GlassFlow CLI and logged into your account via the CLI.
Basic knowledge of Google Analytics, Streamlit, and Plotly.
You have a Google Analytics (GA) account if you use GA as the data producer.
Installation
Clone the glassflow-examples repository to your local machine:
Navigate to the project directory:
Create a new virtual environment:
Install the required dependencies:
Steps to set up Google Analytics 4 API
Google Analytics 4 (GA4) has an API that provides access to page views, traffic sources, and other data points. With this API, you can build custom dashboards, automate reporting, and integrate with other applications. Here we focus only on accessing the data and exporting it to GlassFlow using Python. You can find more comprehensive information about how to set up the Google Cloud project, enable the API, and configure authentication in the API quickstart, or follow this step-by-step guide.
Enable the Google Analytics Data API for a new project or select an existing project.
Go to https://console.cloud.google.com/apis/credentials. Click "Create credentials" and choose a "Service Account" option. Name the service user and click through the next steps.
Go to https://console.cloud.google.com/apis/credentials once more and click on your newly created user (under Service Accounts). Go to "Keys", click "Add key" -> "Create new key" -> "JSON". A JSON file will be saved to your computer.
Rename this JSON file to credentials.json and put it under use-cases/clickstream-analytics-dashboard. Then set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of this file:
Add a service account to the Google Analytics property. Using a text editor or VS Code, open the credentials.json file downloaded in the previous step and search for the client_email field to obtain the service account email address, which looks similar to:
Use this email address to add a user to the Google Analytics property you want to access via the Google Analytics Data API v1. For this tutorial, only Viewer permissions are needed.
Copy the ID of the Google Analytics property you want to query and save it as the value of GA_PROPERTY_ID in a .env file in the project directory.
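Instead of searching credentials.json by hand, the service account email can also be read programmatically. The sketch below is a small helper under stated assumptions: it expects the key file to be a JSON object with a client_email field, as described in the steps above, and it falls back to credentials.json in the current directory when GOOGLE_APPLICATION_CREDENTIALS is not set. The function name read_client_email is hypothetical.

```python
import json
import os
from pathlib import Path


def read_client_email(credentials_path: str) -> str:
    """Return the service account email stored in a JSON key file."""
    with open(credentials_path, encoding="utf-8") as fh:
        key_data = json.load(fh)
    return key_data["client_email"]


if __name__ == "__main__":
    # Fall back to a local file name if the variable is not set (assumption).
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "credentials.json")
    if Path(path).is_file():
        print("Add this user to the GA property:", read_client_email(path))
    else:
        print(f"No key file found at {path}")
```

Running it before starting the producer is a quick way to confirm that the environment variable points at a readable key file.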
Define the transformation function
To provide meaningful insights based on the dimensions and metrics received from Google Analytics, the sample transformation function enriches each input event with the following:
Engagement Score: Calculates an engagement score based on event count, screen page views, and active users.
Device Usage Insights: Analyzes the proportion of different device categories.
Content Popularity: Tracks the popularity of different screens/pages.
Geographic Distribution: Provides insights on user distribution based on geographic location.
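As an illustration of the first enrichment, the following is a minimal sketch of a transformation function that adds an engagement score. The formula (events plus page views, divided by active users) is an assumption chosen for this example, since the text only names the inputs. The handler(data, log) entry point and the field names eventCount, screenPageViews, and activeUsers are likewise assumptions about how the event arrives.

```python
def _to_number(value) -> float:
    """Report values may arrive as strings, so convert defensively."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def engagement_score(event: dict) -> float:
    """Interactions per active user; an assumed formula for illustration."""
    active_users = _to_number(event.get("activeUsers"))
    if active_users <= 0:
        return 0.0  # avoid division by zero when nobody is active
    interactions = _to_number(event.get("eventCount")) + _to_number(
        event.get("screenPageViews")
    )
    return round(interactions / active_users, 2)


def handler(data: dict, log=None) -> dict:
    """Return a copy of the event enriched with an engagement score."""
    enriched = dict(data)
    enriched["engagementScore"] = engagement_score(data)
    return enriched
```

Returning a copy instead of mutating the input keeps the function easy to test locally before it is deployed to a pipeline.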
Steps to run the GlassFlow pipeline
Create a Space via CLI
Open a terminal and create a new space called examples to organize multiple pipelines:
After creating the space successfully, you will get a SpaceID in the terminal.
Create a Pipeline via CLI
Create a new data pipeline inside the space.
This command initializes the pipeline with the name clickstream-analytics-dashboard in the examples space and specifies the transformation function transform.py. Running the command returns a new Pipeline ID and its Access Token.
Add pipeline credentials to the environment configuration file
Add the following configuration variables to the .env file in the project directory:
Replace your_pipeline_id, your_space_id, and your_pipeline_access_token with appropriate values obtained from your GlassFlow account.
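A common mistake at this step is leaving one of the placeholder values in place. The sketch below parses simple KEY=VALUE lines and reports any required key that is missing, empty, or still set to a placeholder. The key names PIPELINE_ID, SPACE_ID, and PIPELINE_ACCESS_TOKEN are assumptions made for this example; use whatever names your .env file actually defines. The parser is deliberately minimal and is not a replacement for a full dotenv library.

```python
# Placeholder values named in the tutorial text.
PLACEHOLDERS = {"your_pipeline_id", "your_space_id", "your_pipeline_access_token"}
# Assumed variable names; adjust them to match your .env file.
REQUIRED_KEYS = ("PIPELINE_ID", "SPACE_ID", "PIPELINE_ACCESS_TOKEN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(values: dict) -> list:
    """List required keys that are missing, empty, or still placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or value in PLACEHOLDERS:
            problems.append(key)
    return problems
```

For example, calling find_problems(parse_env(open(".env").read())) before starting the producer returns an empty list when every required value has been filled in.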
Design Streamlit dashboard
The Streamlit dashboard code in the consumer.py script visualizes the output of the GlassFlow transformation, which includes additional insights such as engagement score, device usage, content popularity, and geographic distribution.
The dashboard updates in real time as data is continuously consumed from the GlassFlow pipeline.
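Before charts can be drawn, the consumed events usually need to be aggregated. The sketch below shows one possible way to compute device usage shares and content popularity from a list of events in plain Python, leaving the Streamlit and Plotly rendering out. The field names deviceCategory, activeUsers, unifiedScreenName, and screenPageViews are assumptions about the event shape, and this is not necessarily how consumer.py does it.

```python
from collections import Counter


def device_shares(events: list) -> dict:
    """Share of active users per device category, as fractions of the total."""
    totals = Counter()
    for event in events:
        device = event.get("deviceCategory", "unknown")
        totals[device] += int(event.get("activeUsers", 0))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to plot yet
    return {device: count / grand_total for device, count in totals.items()}


def top_pages(events: list, limit: int = 5) -> list:
    """Most viewed pages as (page, views) pairs, highest first."""
    views = Counter()
    for event in events:
        page = event.get("unifiedScreenName", "unknown")
        views[page] += int(event.get("screenPageViews", 0))
    return views.most_common(limit)
```

The dictionary returned by device_shares maps naturally onto a pie chart, and the pairs from top_pages onto a bar chart, each recomputed whenever new events arrive.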
Run the pipeline
Run data producer
Run the ga_producer.py or fake_producer.py script to start publishing data:
Run the dashboard
Use the Streamlit command to run the dashboard:
You will see several charts updating in real time:
You learned how to integrate real-time analytics data from Google Analytics into GlassFlow for further processing and visualization. Analytics data can also be stored in a database such as ClickHouse for future use.