How to sessionize clickstream events

Dec. 23, 2020

What Is Event Sessionization?

Events can be defined as any action taken by a user on a website or in an app (landing on a page, clicking a link, submitting a form, etc.). This is typically referred to as the "clickstream" because it is a raw stream of events that are logged as a user interacts with and navigates through web pages. Landing on this blog post has allowed me to track what you clicked on to get here!

Google Analytics defines a session as a collection of events that lasts until there is 30 minutes of inactivity. As long as events keep firing and are being logged without 30 minutes going by between each one, these events will be grouped as a single session. This methodology assumes if a user has been inactive on a site for a long period of time, they either physically left their device or navigated away to a different site and aren’t coming back soon. This time window can be adjusted to whatever you want, but 30 minutes is the industry standard.

Why Do We Need Event Sessionization?

Sessions make it easier to group and analyze events and actions users take, which helps us understand the customer journey and their interactions with your product. Some analyses include web traffic patterns, cohort analysis, and assigning attribution. These types of analyses enable you to optimize marketing campaigns and user flows to increase conversion rates and close more deals.

For the most part, Google Analytics can provide a great out of the box solution to analyze your web traffic, but if you are part of a larger organization with more complex data pipelines and third-party tracking vendors, you will eventually need to have an analyst pull, standardize, and drill down into these disparate data sets. Most likely this will be conducted in SQL, Python, or R.

How Do I Sessionize in SQL?

If you use Snowflake as your database, you can use the CONDITIONAL_TRUE_EVENT function to easily sessionize events in one line of SQL code. Historically, I've had to create many sub-queries or CTEs (Common Table Expressions) to get the same end result, but window functions make queries much cleaner and more efficient.

  1. First, you'll need to structure your data so there is one row per user per event.
  2. Next, you'll need to get the timestamp of the previous event for each event, partitioned by whichever id you assign to a user.
  3. Next, compare the previous timestamp to the current timestamp in order to see how much time elapsed between the two events.
  4. If the time between the two events is less than 30 minutes, these two events are in the same session. If the time between the two events was greater than 30 minutes, consider these two events to be in separate sessions.

These steps can be consolidated in one line of code in Snowflake using the CONDITIONAL_TRUE_EVENT and the LAG window functions:

CONDITIONAL_TRUE_EVENT(DATEDIFF(MIN, LAG(event_timestamp) OVER (PARTITION BY tracking_id ORDER BY event_timestamp ASC), event_timestamp) >= 30) OVER (PARTITION BY tracking_id ORDER BY event_timestamp ASC) AS session_number

The breakdown should end up looking something like this (click image to open in a new tab):



Notice how once the previous timestamp for this user is greater than 30, the session number increments by 1. This means that the user could have navigated away from our domain and came back later or they left their computer open on the page in the background/on a different tab before triggering another event to fire 225 minutes later.

Now imagine trying to analyze this with thousands of other users and hydrating website activity with marketing, sales, and purchase data. We can answer questions such as “How many sessions/conversions were generated from a Google Ad Campaign?” or “Where in the checkout process are users most likely to abandon their cart?” This is why a good data analyst/scientist is key to manipulate data and answer these questions, which can help optimize spend and drive outcomes that will help scale the business!

Have any questions about how to aggregate touchpoints for marketing attribution? Contact me!
If you have more questions about analyzing website activity, I suggest reading Web Analytics 2.0 as a starting point:

About Me
My image

James Roselle is a data engineer based in Boston.

Learn more!