“Entropy is the price of structure.” - Ilya Prigogine
If you want to stay competitive, you need to give people in your organization access to the data they need to make better decisions. The cost of this democratization of data, however, is the inevitable flood of analysis - which can make it difficult to know which analyses you can trust.
It’s important to understand that there is no cure for this problem. There will always be some level of analytical entropy to tame, but with the right tools and processes in place, you’ll be able to keep the inevitable chaos in check.
The problems with democratizing analytics
The core of these problems centers on definitions: how exactly do we define business logic like revenue, lifetime value, churn, and so on? And by “definitions,” we generally mean any quantifiable concept that’s important to your organization. Not just what is X, but how do we calculate X? These are the terms by which you’ll measure your organization, and the more specifically (and consistently) you define them, the better.
Here are some of the problems with definitions that we’ll need to guard against:
Where do people find specific definitions?
Once you start slicing your data to look at your organization from different angles, definitions will proliferate: revenue, churn rate, expected lifetime value, and so on. If we want to understand why our customers are churning, which definitions should we consult? Which new ones do we need to create? And where, literally, can we find these official definitions in Metabase?
Conflicting definitions, or are we talking about the same thing?
By conflicting we mean: are we even talking about the same thing? Consider revenue, for example. For the Sales team, revenue could mean bookings, but the Accounting folks mean revenue recognized on an accrual basis, while the Marketing team is talking about lifetime revenue.
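To make the conflict concrete, here’s a sketch of how three teams might each compute “revenue” differently. The table and column names are illustrative only - your schema will differ:

```sql
-- Sales: bookings (total value of contracts signed this month)
SELECT SUM(contract_value) AS revenue
FROM deals
WHERE closed_at >= date_trunc('month', now());

-- Accounting: revenue recognized on an accrual basis
SELECT SUM(recognized_amount) AS revenue
FROM revenue_schedule
WHERE recognized_at >= date_trunc('month', now());

-- Marketing: lifetime revenue by customer cohort
SELECT cohort_month, SUM(order_total) AS revenue
FROM orders
GROUP BY cohort_month;
```

All three queries report “revenue,” and all three can be correct - for different questions.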
Re-definitions, or which is the canonical definition?
What if we find multiple definitions for the same concept? How do we know which to trust? Are they all off the mark? Even if multiple groups agree that we should track week-over-week bookings, how those bookings are tallied could differ from query to query. One query could be accurate; another could be inaccurate and unvetted, created by an analyst who didn’t know an official bookings query already existed, forgot to omit test data, failed to account for discounts, or simply wanted to slice bookings in a different way.
Changing definitions, or how do we keep definitions up to date?
Calculations of monthly revenue will likely change as some revenue streams are sunsetted and other streams are added. If different departments use that same definition in multiple questions and dashboards, how should we manage changes to it?
Strategies to tame the chaos
With our problems identified, let’s talk about how we can mitigate them. We’ll divide this discussion into two categories: the features Metabase provides, and the organizational processes we recommend you adopt.
Tools Metabase provides
Here are some tools that ship with Metabase that will help keep you organized. You’re likely already aware of questions, dashboards, and collections, but they’re worth itemizing here to get a full picture of the toolkit.
Data reference and descriptions
Metabase provides places for you to include helpful text that contextualizes a particular item, whether that item is a database, table, question, dashboard, metric, or whatever. You don’t have to describe everything, but the more description you include, the less time people will spend figuring out “is this the data I think it is?” and the better their analysis will be. Documenting exceptions with data is especially important (for example, whether a table includes test data or employee accounts or other exceptions that analysts should be aware of).
For “official” databases, dashboards, and questions, you should require owners to maintain their documentation. And don’t get lazy with your titles; you can do a lot with a few extra words. Compare “Customer orders” with something like: “OFFICIAL: Rolling 7-day average daily orders - North America”.
For more about the reference tooling in Metabase, check out Exploring data with Metabase’s data browser.
Segments and metrics
Administrators can define official filters (or a set of filters) known as Segments that can be used in Metabase’s GUI query builders. For example, you could officially define what an “Active User” is via a Segment. “Active Users” would then appear in the filter dropdown or sidebar, so anyone can filter their queries by Active Users to see what kind of products those specific users buy, how long items sit in their carts, and so on.
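Under the hood, a Segment is essentially a saved filter condition. Defining “Active User” as a Segment is roughly equivalent to agreeing on a WHERE clause like the following - the column names and the 30-day cutoff here are illustrative assumptions, not Metabase defaults:

```sql
-- Hypothetical definition of "Active User":
-- signed in within the last 30 days, and not a test or employee account
WHERE last_login_at >= now() - INTERVAL '30 days'
  AND is_test_account = false
  AND is_employee = false
```

Once saved as a Segment, that condition appears by name in the query builder, so nobody has to re-derive (or mis-remember) it.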
Similarly, Metrics codify calculations. For example, Admins can set up an official metric for “Average Order Total” so that everyone knows (and can use) the official calculation of that metric, which includes tax but omits applied discounts.
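As a sketch, the “Average Order Total” calculation described above (tax included, discounts excluded) might boil down to an aggregation like this, assuming hypothetical subtotal and tax columns on an orders table:

```sql
-- Illustrative only: includes tax, does not subtract applied discounts
SELECT AVG(subtotal + tax) AS average_order_total
FROM orders;
```

Codifying this as a Metric means everyone aggregates the same way, rather than each analyst deciding whether tax or discounts belong in the total.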
Both Segments and Metrics are versioned. To learn more, check out Segments and Metrics.
SQL snippets
Snippets are the SQL counterpart to GUI-based Segments and Metrics. You can use them to capture and reuse bite-sized SQL code. These snippets could capture segments, metrics, complicated joins, or any other bit of SQL that you might want to reuse in many queries.
The idea with segments, metrics and their SQL counterpart, snippets, is to 1) codify the definitions, and 2) make them easy to change as you refine your definitions over time: update one snippet, and each question that employs that snippet benefits from the updated definition.
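For example, you might save a filter as a snippet and reference it by name in any native query. The snippet name, table, and columns below are illustrative assumptions:

```sql
-- Contents of a snippet named "Active Users filter" might be:
--   last_login_at >= now() - INTERVAL '30 days' AND is_test_account = false

SELECT COUNT(*) AS active_user_count
FROM accounts
WHERE {{snippet: Active Users filter}};
```

Metabase expands the {{snippet: ...}} tag into the snippet’s SQL when the query runs.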
To learn more, check out SQL snippets: reuse and share SQL code.
Reusing saved questions as sub-queries
You can use the results of a saved question as the starting point for new questions. For example, if a table in your database contains all orders - including test and employee orders - you could save a question that filters out non-customer orders as “Customer orders” for everyone to use as the starting data for new questions. To learn more, check out SQL snippets vs Saved Questions vs Views.
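In the query builder, you’d simply pick “Customer orders” as your starting data. In native queries, Metabase also lets you reference a saved question by its ID as a sub-query - the ID and columns below are made up for illustration:

```sql
-- "123" is a hypothetical saved-question ID for "Customer orders"
SELECT product_category, COUNT(*) AS order_count
FROM {{#123-customer-orders}}
GROUP BY product_category;
```

Because the filtering lives in the saved question, every downstream question inherits it automatically.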
Collections
Collections group questions and dashboards. Additionally, you can pin the most important items to the top of your collection. To learn more, check out Working with collection permissions.
Processes your organization should adopt
Here we’re not talking about what tools can do; we’re talking about how people should use them.
Create official collections for each department
For each department, create a collection and make it editable by only a small group of people. This group should curate the collection: pin only the questions and dashboards they have vetted, decorate them with useful descriptions, and actively maintain them.
If you’re running the Enterprise Edition, you should do the same with SQL snippet folders: organize folders by department, assign owners to those folders, and take advantage of folder permissions.
Adopt a naming convention
Set a standard naming convention across your dashboards, collections, and questions so that it’s obvious which items are official. How you define that convention is less important than having a convention at all. When in doubt: even a simple prefix like “Certified” or “Official” (e.g., “OFFICIAL: Email opens per 1000 users”) can help people sift through search results and know which items have been vetted.
Designate collections for experimentation and works in progress
Create designated places for people to store works in progress (sometimes called scratch or playground collections). People can - and should - use personal collections for experimentation, but it’s also important to have public places where people can share their work with others to get feedback on their analysis in progress.
Anyone can duplicate the official questions and dashboards, but you should encourage people to save those copies to their personal collections, or to a collection designated for experimentation. If one of the dashboards in these areas takes off, you can relocate it to the relevant “official” collection. You can set permissions on these official collections so that everyone can view them, but only a select few can edit them - ensuring that everything in that collection is correct and actively maintained.
Have a policy on when to archive items
For these ephemeral items, set clear expectations for when people should archive them so that these playgrounds don’t fill with clutter. If you’re managing your department’s collections, and only pinning vetted items, clutter becomes less of an issue, but keeping the scratch collections relatively fresh will improve search results.
And don’t stress about archiving, since you can resurrect items at any time.
Other ideas for taming chaos?
If you have any tips to share, or ideas for changes or improvements to Metabase, let us know on our forum.