Chevron Left

See all Community Stories

How minimal modeling can improve self-service analytics

August 11, 2022

Contributed by

Olga Tatarinova

Epoch8.co

photo of Olga Tatarinova

Olga is a co-founder of Epoch8.co, an agency for analytical and machine learning projects. Passionate about self-service analytics implementation. You can reach her on LinkedIn.

As an analytics agency, we love providing self-service analytics tools for our customers. The idea that business users can explore not only the dashboards we’ve prepared, but the entire dataset as well, is very powerful. But business users often struggle with getting started with self-service analytics.

In this post, we’ll illustrate the typical problem with the common Entity-Relationship data model in self-service, and discuss how to improve the self-service experience by using different data modeling techniques.

Imagine we have a typical dataset fetched from Google Analytics. We have information about Users and their Events. Usually we’ll have two tables for this kind of data: Users and Events.

Sample user dataset: a screenshot from Metabase

Sample events dataset: a screenshot from Metabase

For self-service needs, we usually link the EVENTS table and USER table by primary-foreign keys, and get some USER attributes available from the EVENTS table.

a screenshot from Metabase

This approach seemed reasonable, at least until the day when one of our customers asked to plot DAU (daily active users) for some management meeting.

You can do it in self-service”, – we said. “Just go to the EVENTS table and count the number of distinct users by days”.

a screenshot from Metabase

But”, - the business user said, “If I want to plot Daily Active USERS, why should I go to the EVENTS table? I really don’t get it!

And here’s the real blocker for self-service analytics development in the company – what might be obvious to engineers, isn’t necessarily obvious for core business users.

So we decided to find a data model which suits self-service needs better than the classic Entity-Relationship model.

After some exploration of data modeling techniques, we ran into the Minimal Modeling approach, which dissects the database into Anchors (the main nouns of the domain), Links between the Anchors, and Attributes.

One of the requirements of Minimal Modeling is to have a Link table separated from the Anchors tables.

So after remodeling our dataset, we get:

  • Anchor tables (USER and EVENT) a screenshot from Metabase a screenshot from Metabase

  • And a separate link table a screenshot from Metabase

This link table contains two id’s: user_id and event_id and a timestamp, when the event was made. a screenshot from Metabase

And here comes the trick: if we use this link table as the main table for self-service analytics, we can link both USER and EVENTS table to it.

So USER and EVENTS table will share the same rank in the Summarize sidebar.

a screenshot from Metabase

So if a business user needs to plot Daily Active Users metric, he doesn’t have to guess where he should go: to the USERS table or to the EVENTS table. He goes to the link table, as an entry point.

Daily Active Users, built using a link table: a screenshot from Metabase

And the metrics for both USERS and EVENTS can be defined using only one link table. This idea of having one wide link table can be extended: if we have ORDERS, TRANSACTIONS and other domain anchors, they also can be linked together in the single link table, so it becomes a single point-of-entry for the self-service needs.

Contributed by

Olga Tatarinova

Epoch8.co

photo of Olga Tatarinova

Olga is a co-founder of Epoch8.co, an agency for analytical and machine learning projects. Passionate about self-service analytics implementation. You can reach her on LinkedIn.

You might also like

Anticipating the next click

Ukrit Wattanavaekin

Metabase

Data driven product management

Victor Bolu

WebAutomation

You might also like

Anticipating the next click

Ukrit Wattanavaekin

Metabase

Data driven product management

Victor Bolu

WebAutomation