SQL joins are a fact of life when it comes to analyzing data, and with Metabase we try to make getting the answers to your questions as easy possible – even if that requires some more complex tactics. In this article, we’ll cover how to join tables in Metabase using simple and custom questions in the notebook editor, and give some context on relational databases, tables, and keys, to give you a better understanding of how joins work.
- Automatic joins in simple questions
- Example explicit join in a custom question
- Relational databases
- The keys to joins
- Types of joins
- Multiple joins
- Column selection
Automatic joins in simple questions
Joins, in SQL, allow you to combine data from multiple tables, and where possible, Metabase will make those connections for you, making it easy for people to work with data across multiple tables. You’ll have to have those foreign key relationships defined in your data model, which allows Metabase to detect those relationships and connect those tables in the data model. If those foreign keys are not configured, administrators can always use the Data Model section to specify those relationships.
We’ll cover keys in depth below, but for now, let’s see an implicit join in action in a simple question. We’ll use the Sample Dataset included with Metabase as our data so you can try it out for yourself.
- From the top navigation bar, select Ask a question.
- Choose Simple Question.
- Select the Sample Dataset as your data source.
- Then choose the
ORDERStable. Metabase will display a list of orders from the
- Click on the Settings button at the bottom left of the screen. As shown in figure 1, Metabase will slide out a sidebar of Table options, and present you with a list of columns to choose from.
The Visible columns are the columns currently in the query of the question, which in this case includes all the columns from the
ORDERS table. You can add and remove columns, and change column settings (which differ depending on the column’s field type).
In addition to the Visible columns, you’ll see a section called More columns. This section will include columns from two other tables:
PEOPLE. (We’ll get to why the
REVIEWS table is conspicuously absent later in the article).
Metabase has already “joined” the
ORDERS table to the
PEOPLE table, which allows you to add columns from these tables.
For example, from this Settings sidebar, scroll down to find the
PRODUCTS table, and click on the + next to the
CATEGORY column to add it to Visible columns. You’ll see a new column,
Products -> Category, in the visible columns, with the
PRODUCT -> prefix indicating that the column is not native to the
By clicking on an ID column and selecting view data, you can check to see which tables are connected to that data. For example, from the
ORDERS table, you can view the details on an entry in the Product Id column. In the case in Figure 2, Metabase tells us that the product, Lightweight Wool Computer (ID: 146), is connected to the
ORDERS table and the
Note that the actual
PRODUCT_ID value is 146, not “Lightweight Wool Computer”. Metabase allows administrators to change how column values are displayed.
As shown in figure 3, when viewing the
ORDERS table, the foreign key,
PRODUCT_ID, displays the value from the
TITLE column of the
We’ll dig into keys below, but first let’s take a look at an explicit join.
Example explicit join in a custom question
Let’s try a simple join in the notebook editor. Start a new custom question with the Sample Dataset. In the Data section, select the
ORDERS table. Next, click on the Join icon to add another table.
If we add the
PRODUCTS table, Metabase will automatically fill in the foreign key relationship, since it knows that the field,
PRODUCT_ID, contains foreign keys that reference values in the
ID column (the entity key) of the
Here’s what the query in figure 5 is saying: for each row in the
ORDERS table, use the value in the
PRODUCT_ID column to find the corresponding row(s) with the same value in the
ID column of the
PRODUCTS table, and return a new row with the values from the columns in both the
If you’re interested, here’s the basic SQL at work under the hood:
select * from ORDERS left join PRODUCTS on ORDERS.PRODUCT_ID = PRODUCTS.ID
(Note that if you convert the question to SQL, you’ll get a much more involved SQL query, but both queries return the same data.)
You may have noticed in the simple question mode above that you couldn’t add columns from one of the tables in the Sample Dataset, the
REVIEWS table. To understand why, we’ll need some background on relational databases.
Relational databases (like PostgreSQL and MySQL), store data in tables (relations) that generally represent an entity of some kind, like orders or products. These tables comprise columns (attributes of the entity) and rows (sometimes called records). If you’ve ever worked with spreadsheet software before, a table is akin to a sheet with columns and rows.
|Table||Entity||Person, Order, Product|
|Column||Attribute of that entity||Address, Description, ID|
|Row||Instance of that attribute||CA, 7, “Lightweight Wool Computer”|
Some of these columns contain special attributes, called keys.
The keys to joins
Each table has a special column that contains distinct keys, known as entity keys, or primary keys, that uniquely identifies each row in the table. These columns typically contain ID numbers, and they can be automatically generated by the database, or by an application (e.g., an employee ID).
A second type of key, known as a foreign key, stores a reference to the entity key of a row in another table.
To join tables, you will use foreign and entity keys to tell the database how to combine data from multiple tables.
Let’s use Metabase’s Data browser to look at some keys with an example table in the Sample Dataset. From the top navigation bar in Metabase, go to Browse Data, and click on the Sample Dataset. As shown in figure 6, you’ll see cards for each table. Hover over a card to see additional options, and click on the book icon that appears to Learn about this table.
This table reference page contains several tabs:
- Fields in this table
- Questions about this table
- X-ray this table
Select the Fields in this table to view the Field name, Field type, and Data type for each field in the table, as shown in Figure 7.
ORDERS table contains one entity key (a.k.a. primary key),
ID, and two foreign keys:
- The foreign key,
USER_ID, is associated with the entity key,
ID, of the
- The foreign key,
PRODUCT_ID, is associated with the entity key,
ID, of the
We can combine data from these tables —
PRODUCTS — by joining on these keys. Joining simply directs Metabase to line up the records from one table, and use the foreign key values in each row to combine data from the other table by finding the corresponding row (or rows) with the matching entity key value.
Metabase defaults to left joins, which brings us to types of joins.
Types of joins
There are four types of joins, which we’ll describe using an example of joining the
ORDERS table (the left circle) to the
PRODUCTS table (the right circle).
- Left outer join. Select all records from the
ORDERStable, along with records from the
PRODUCTStable that meet the join condition, if any.
- Right outer join. Select all records from the
PRODUCTStable, along with records from the
ORDERStable that meet the join condition, if any.
- Inner join. Select only the records from
PRODUCTSwhere the join condition is met.
- Full outer join. Select all records from both tables, whether or not the join condition is met.
And by “join condition” we just mean: for each foreign key in the left table, is there a matching entity key in the right table?
Note that not all databases support all types of joins. As you can see in figure 7, the H2 database (the database that houses the Sample Dataset) does not support full outer joins.
With that context on tables, keys, and joins, let’s see if we can link the
ORDERS table to the
REVIEWS table. Stated generally: if our starting table (the left table) doesn’t contain a foreign key that references the table we want to join to, how would we go about joining them?
If we try to join the
ORDERS table to the
REVIEWS tables, Metabase won’t know what to do (see Figure 10).
ORDERS table does not have a foreign key for the
REVIEWS table, which is why Metabase didn’t automatically connect the two tables.
Let’s turn to the data browser to find out which foreign keys the
REVIEWS table includes (figure 11).
So, here’s our situation:
ORDERStable has foreign keys to the
REVIEWStable has a foreign key to the
To connect the
ORDERS table to the
REVIEWS table, we’ll need to join them via the
PRODUCTS table. We’ll use a custom question to specify the joins. Here’s our notebook:
If you visualize the results, you’ll see a table of all of the columns from all three tables:
REVIEWS. Note that because a single product can have multiple reviews, we’ll see multiple rows for the same product and order, one for each review.
In addition to joining our tables, we can be selective about which columns are visible in our question’s results. In the notebook editor, you can select the columns Metabase displays:
As above, you can also select the visible columns from the Settings sidebar.
You’ll notice that Metabase helpfully makes columns from the
PEOPLE table available to add as well, so you now have the full Sample Dataset at your disposal.
Have fun joining tables in your own datasets, and remember: if you get stuck, be sure to consult the data browser to learn which tables have the foreign keys you need to join them.