How Sampling Works in Google Analytics 4 (GA4)

Sampling in Google Analytics can be a real challenge. Read this post to enhance your knowledge on sampling, hit limits, thresholding and cardinality in Google Analytics 4.

I have been using Google Analytics for 10+ years now and one of the biggest pains in Universal Analytics is sampling. There are certainly ways to deal with it, but for the regular (non-paying) user there isn’t a very good solution.

Today, I will dive into the concept of data sampling in Google Analytics 4 and related topics you need to understand to remain confident in your data.

Let’s start with an introduction to sampling in Google Analytics 4.

Sampling in Google Analytics 4

One of the questions I often get from users starting with Google Analytics 4 is related to data sampling.

“Is the data in Google Analytics 4 still sampled? Sampling is very challenging for me in Universal Analytics.”

Sometimes it is, and we will explore this in more detail later with some examples from Google Analytics 4.

Sampling in GA4 might occur if you use the advanced features in Google Analytics and you exceed a certain event threshold.

Standard vs Advanced Reports in GA4

In Google Analytics 4, the standard reports are always unsampled. This is true even if you apply secondary dimensions, filters or other report modifications.

You can find the standard reports under “Reports” in the main navigation:

GA4 - standard reports

The green icon above indicates that the reports are unsampled.

Google Analytics shows an orange icon when the data is sampled, or as a warning that a data threshold might have been applied.

GA4 - events overview

In the example above the report is based on 100% of available data, but there is a warning about a threshold.

The standard, default report section is a good start, but very limiting if you want to get the most out of GA4. To do that, you need to master a wide range of advanced features in this new version of Google Analytics.

Sampling might occur when you create an advanced analysis in GA4.

GA4 - explore feature

This is where you can really dig through your data and derive greater insights. This section currently contains the following report templates:

  • (Blank)
  • Free-Form
  • Funnel exploration
  • Path exploration
  • Segment overlap
  • Cohort exploration
  • User lifetime

I will share more details on these reporting options in a future blog post. For now, you need to understand that sampling might occur if you use these advanced analysis features in Google Analytics 4.

GA4 - free-form sampled

In this example the event count is above 11 million, and 90% of the available data is used to generate the report. In this case I would still trust the data, but as a rule of thumb I would be careful if this percentage drops below 70 to 80%.
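To make that rule of thumb concrete, here is a minimal sketch in Python. The function name and the labels are hypothetical, and the 80% and 70% cut-offs are just my personal guideline from above, not anything Google defines.

```python
def sampling_confidence(percent_of_data_used: float,
                        caution_below: float = 80.0,
                        distrust_below: float = 70.0) -> str:
    """Classify a report by the share of available data it is based on.

    The cut-offs are a personal rule of thumb (assumption), not GA4 settings.
    """
    if not 0 < percent_of_data_used <= 100:
        raise ValueError("percent_of_data_used must be in (0, 100]")
    if percent_of_data_used >= caution_below:
        return "ok"
    if percent_of_data_used >= distrust_below:
        return "caution"
    return "low"


# The free-form report above uses 90% of the available data.
print(sampling_confidence(90))
```

You read the percentage from the sampling icon in the exploration and pass it in by hand; nothing here talks to Google Analytics.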

Google Analytics 4 vs Universal Analytics

You might wonder: how exactly does sampling in Google Analytics 4 differ from Universal Analytics?

In Universal Analytics, the default or standard reports are always unsampled. However, sampling occurs if you apply secondary dimensions, segments or other ad-hoc queries to your dataset. Data sampling occurs at a certain threshold and it depends on whether you are a paying customer or not:

  • Analytics Standard: 500k sessions at the property level for the date range you are using
  • Analytics 360: 100M sessions at the view level for the date range you are using

Read this post about Universal Analytics and sampling if you want to learn how you can best deal with sampling.

Back to Google Analytics 4. The default or standard reports are always unsampled (you can’t apply segments here). This is true even if you apply ad-hoc queries to your dataset. You might have noticed that the number and variety of default reports is greatly reduced in Google Analytics 4 compared to Universal Analytics.

The advanced reports in the Explore/Analysis section are usually sampled if you exceed 10 million events and the report you create is not a pre-existing standard report.
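A rough way to reason about this threshold is sketched below. It assumes, as a simplification, that once an exploration exceeds the 10 million event threshold, GA4 bases the report on roughly 10 million events. Google does not document the calculation this way, so treat the output as an estimate only. The function name is hypothetical.

```python
EXPLORATION_EVENT_THRESHOLD = 10_000_000  # threshold mentioned in this post


def estimate_sampling(event_count: int,
                      threshold: int = EXPLORATION_EVENT_THRESHOLD):
    """Return (is_sampled, estimated_percent_of_data_used).

    Assumption: above the threshold, the report is built from about
    `threshold` events. This is a simplification, not GA4's documented logic.
    """
    if event_count < 0:
        raise ValueError("event_count must not be negative")
    if event_count <= threshold:
        return (False, 100.0)
    return (True, round(100.0 * threshold / event_count, 1))


print(estimate_sampling(4_000_000))   # (False, 100.0)
print(estimate_sampling(11_000_000))  # (True, 90.9)
```

The second call is in the same range as the free-form example earlier in this post, where just over 11 million events resulted in 90% of the available data being used.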

Hit Limits in GA4

Here comes the great thing…

In Universal Analytics (free) there is a hit limit of 10 million hits per account on a monthly basis.

Google Analytics 4 is also free (there will be a paid version as well) and has no hit/event limits. This is really great if your company has a high number of daily users on the site and/or app and triggers loads of events.

Thresholds in GA4

Data thresholds in GA4 are system-defined and cannot be adjusted. Google Analytics 4 applies them to certain dimensions to protect users’ privacy.

Demographic and affinity dimensions are mostly affected. Here is what Google says:

“If a report or exploration includes demographic information, such as Age, Gender, or Interest Category, and the reporting identity relies on the device ID, the row containing that data may be withheld if there aren’t enough total users to prevent individual users from being identified.”

Let me show you an example of the standard “Demographic details” report:

GA4 - Demographic details - country

The dimension value “unknown” is applied to the Country in the majority of cases and has a strong impact on the gender data visible in GA4.

Here is a more detailed look at United Kingdom users:

GA4 - UK and gender

Over 95% of the Gender dimension values are not visible for users from the UK. Quite a challenge to work with this data!
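The idea behind thresholding can be sketched in a few lines of Python. Google does not publish the actual minimum, so the value of 50 below is purely a placeholder, and the function name and sample numbers are made up for illustration.

```python
from typing import Dict

HYPOTHETICAL_MIN_USERS = 50  # assumption: the real threshold is not public


def apply_threshold(users_per_value: Dict[str, int],
                    min_users: int = HYPOTHETICAL_MIN_USERS) -> Dict[str, int]:
    """Withhold rows whose user count is too low to be shown safely."""
    return {value: users
            for value, users in users_per_value.items()
            if users >= min_users}


# Made-up user counts per age bracket.
age_rows = {"18-24": 12, "25-34": 340, "35-44": 75}
print(apply_threshold(age_rows))  # {'25-34': 340, '35-44': 75}
```

The withheld row simply disappears from the report, which is why totals and visible rows don't always add up when a threshold has been applied.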

Cardinality in GA4

Each report dimension (e.g., User source, User medium, User campaign, Gender etc.) has a number of values that can be assigned to it. The total number of unique values for a dimension is known as its cardinality.

Gender is an example of a low-cardinality dimension. On the other hand, Page Path is a high-cardinality dimension as it usually contains many different unique values.

Analytics queries different tables before showing a table in a report. Be aware of potential discrepancies when a query of the aggregated-data or event-level tables returns more rows than Analytics can render.

The result is that part of the dataset is being aggregated as (other).
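Here is a minimal sketch of what that roll-up looks like, assuming the rows with the highest counts are kept and everything else is summed into (other). How GA4 actually selects the rows to keep is not covered here, so the function and the sample data are illustrative only.

```python
from collections import Counter
from typing import Dict


def roll_up_other(counts: Dict[str, int], max_rows: int) -> Dict[str, int]:
    """Keep the `max_rows` largest values and sum the rest into '(other)'."""
    ranked = Counter(counts).most_common()  # sorted by count, descending
    kept = dict(ranked[:max_rows])
    overflow = sum(n for _, n in ranked[max_rows:])
    if overflow:
        kept["(other)"] = overflow
    return kept


# Made-up page views per Page Path.
page_views = {"/": 500, "/pricing": 120, "/blog/a": 8, "/blog/b": 5}
print(len(page_views))               # cardinality of the dimension: 4
print(roll_up_other(page_views, 2))  # {'/': 500, '/pricing': 120, '(other)': 13}
```

The total stays the same, but the detail of the long tail is lost, which is exactly the problem with high-cardinality dimensions.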

In most cases this only occurs if a dimension has around 20,000 unique values per day or more. However, I have seen exceptions to the rule:

GA4 - cardinality

There are only 317 unique values here, but an (other) row still shows up in the report.

BigQuery and GA4

Integrating BigQuery with Google Analytics 4 gives you access to the raw data (almost) for free.

The BigQuery export gives you the raw, unsampled data, so you can conduct much more granular analysis with confidence in your data. Further benefits of the integration:

  • Pay only for the data that is collected and processed (minimal costs)
  • A scalable solution
  • Export custom event parameters and dimensions
  • Connect GA4 data with third-party APIs
  • Connect (GA4) data from BigQuery with popular data visualisation tools such as Data Studio and Tableau

If you are seeing excess data aggregated as (other) on a regular basis, you can use BigQuery Export to export your Analytics data to BigQuery and query the entire dataset.
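As a starting point, the sketch below builds a query against the exported event tables. It assumes the standard export layout with daily `events_YYYYMMDD` tables and the `event_name` and `user_pseudo_id` columns. The project and dataset names are placeholders you need to replace with your own, and the function name is hypothetical.

```python
def build_event_count_query(project_id: str, dataset_id: str,
                            start_date: str, end_date: str) -> str:
    """Build a SQL query that counts events and users per event name.

    Dates use the YYYYMMDD format of the daily export table suffix.
    """
    for date in (start_date, end_date):
        if not (len(date) == 8 and date.isdigit()):
            raise ValueError(f"expected YYYYMMDD, got {date!r}")
    return f"""
SELECT
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{project_id}.{dataset_id}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start_date}' AND '{end_date}'
GROUP BY event_name
ORDER BY event_count DESC
""".strip()


# Placeholder project and dataset names (assumptions).
sql = build_event_count_query("my-project", "analytics_123456789",
                              "20210101", "20210131")
print(sql)
```

You can paste the printed SQL into the BigQuery console, or run it from Python with the google-cloud-bigquery client library. Because the query runs on the raw export, no rows are folded into (other) and no sampling is applied.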

Concluding Thoughts

Sampling in Google Analytics 4 is still present and can be challenging, but you have a great opportunity to mitigate any impact it has on your data.

Think about integrating GA4 with BigQuery to stay or become fully confident in your data. In the free version of Universal Analytics this isn’t an option!

Invest in SQL and BigQuery and add both skills to your profile if you haven’t yet! And if you work at a smaller company and don’t collect much data, you should be all good (starting out) without this BigQuery/GA4 integration. See it as a future opportunity.

This is it from my side! Happy to hear your comments on sampling and Google Analytics 4.

One last thing… Make sure to get my automated Google Analytics Audit Tool. It contains 25 key health checks on the Google Analytics Setup.

Source link

https://www.cupbord.com/how-sampling-works-in-google-analytics-4-ga4/