Menu Close Get Metabase

Joins in Metabase

Aug 25, 2020 by The Metabase Team

SQL joins are a fact of life when it comes to analyzing data, and with Metabase we try to make getting the answers to your questions as easy possible – even if that requires some more complex tactics. In this article, we’ll cover how to join tables in Metabase using simple and custom questions in the notebook editor, and give some context on relational databases, tables, and keys, to give you a better understanding of how joins work.

We’ll cover:

Automatic joins in simple questions

Joins, in SQL, allow you to combine data from multiple tables, and where possible, Metabase will make those connections for you, making it easy for people to work with data across multiple tables. You’ll have to have those foreign key relationships defined in your data model, which allows Metabase to detect those relationships and connect those tables in the data model. If those foreign keys are not configured, administrators can always use the Data Model section to specify those relationships.

We’ll cover keys in depth below, but for now, let’s see an implicit join in action in a simple question. We’ll use the Sample Dataset included with Metabase as our data so you can try it out for yourself.

  • From the top navigation bar, select Ask a question.
  • Choose Simple Question.
  • Select the Sample Dataset as your data source.
  • Then choose the ORDERS table. Metabase will display a list of orders from the ORDERS table.
  • Click on the Settings button at the bottom left of the screen. As shown in figure 1, Metabase will slide out a sidebar of Table options, and present you with a list of columns to choose from.
Figure 1. <strong>ORDERS</strong> table with the <strong>Settings</strong> sidebar open to the <strong>Table options</strong> section. You can add, remove, and edit columns, including <strong>More columns</strong> from tables that Metabase can automatically join to the <strong>ORDERS</strong> table.
Figure 1. ORDERS table with the Settings sidebar open to the Table options section. You can add, remove, and edit columns, including More columns from tables that Metabase can automatically join to the ORDERS table.

The Visible columns are the columns currently in the query of the question, which in this case includes all the columns from the ORDERS table. You can add and remove columns, and change column settings (which differ depending on the column’s field type).

In addition to the Visible columns, you’ll see a section called More columns. This section will include columns from two other tables: PRODUCTS and PEOPLE. (We’ll get to why the REVIEWS table is conspicuously absent later in the article).

Metabase has already “joined” the ORDERS table to the PRODUCTS and PEOPLE table, which allows you to add columns from these tables.

For example, from this Settings sidebar, scroll down to find the PRODUCTS table, and click on the + next to the CATEGORY column to add it to Visible columns. You’ll see a new column, Products -> Category, in the visible columns, with the PRODUCT -> prefix indicating that the column is not native to the ORDERS table.

By clicking on an ID column and selecting view data, you can check to see which tables are connected to that data. For example, from the ORDERS table, you can view the details on an entry in the Product Id column. In the case in Figure 2, Metabase tells us that the product, Lightweight Wool Computer (ID: 146), is connected to the ORDERS table and the REVIEWS table.

Figure 2. Clicking on an ID column and selecting <strong>View details</strong> will show you, among other data, which tables the record is connected to.
Figure 2. Clicking on an ID column and selecting View details will show you, among other data, which tables the record is connected to.

Note that the actual PRODUCT_ID value is 146, not “Lightweight Wool Computer”. Metabase allows administrators to change how column values are displayed.

Figure 3. The field settings for the PRODUCT ID field in the <strong>ORDERS Table</strong> Admins can use the <strong>Display Values</strong> setting to display a human-readable title instead of an ID.
Figure 3. The field settings for the PRODUCT ID field in the ORDERS Table Admins can use the Display Values setting to display a human-readable title instead of an ID.

As shown in figure 3, when viewing the ORDERS table, the foreign key, PRODUCT_ID, displays the value from the TITLE column of the PRODUCTS table.

We’ll dig into keys below, but first let’s take a look at an explicit join.

Example explicit join in a custom question

Let’s try a simple join in the notebook editor. Start a new custom question with the Sample Dataset. In the Data section, select the ORDERS table. Next, click on the Join icon to add another table.

Figure 4. To add tables to your question, click on the <strong>Join</strong> icon (the Venn Diagram).
Figure 4. To add tables to your question, click on the Join icon (the Venn Diagram).

If we add the PRODUCTS table, Metabase will automatically fill in the foreign key relationship, since it knows that the field, PRODUCT_ID, contains foreign keys that reference values in the ID column (the entity key) of the PRODUCTS table.

Figure 5. Joining the <strong>ORDERS</strong> table to the <strong>PRODUCTS</strong> table by linking the <strong>foreign key (PRODUCT_ID)</strong> in the <strong>ORDERS</strong> table to the <strong>entity key (ID)</strong> in the <strong>PRODUCTS</strong> table.
Figure 5. Joining the ORDERS table to the PRODUCTS table by linking the foreign key (PRODUCT_ID) in the ORDERS table to the entity key (ID) in the PRODUCTS table.

Here’s what the query in figure 5 is saying: for each row in the ORDERS table, use the value in the PRODUCT_ID column to find the corresponding row(s) with the same value in the ID column of the PRODUCTS table, and return a new row with the values from the columns in both the ORDERS and PRODUCTS tables.

If you’re interested, here’s the basic SQL at work under the hood:

select * from ORDERS
left join PRODUCTS
on ORDERS.PRODUCT_ID = PRODUCTS.ID

(Note that if you convert the question to SQL, you’ll get a much more involved SQL query, but both queries return the same data.)

You may have noticed in the simple question mode above that you couldn’t add columns from one of the tables in the Sample Dataset, the REVIEWS table. To understand why, we’ll need some background on relational databases.

Relational databases

Relational databases (like PostgreSQL and MySQL), store data in tables (relations) that generally represent an entity of some kind, like orders or products. These tables comprise columns (attributes of the entity) and rows (sometimes called records). If you’ve ever worked with spreadsheet software before, a table is akin to a sheet with columns and rows.

Term Description Examples
Table Entity Person, Order, Product
Column Attribute of that entity Address, Description, ID
Row Instance of that attribute CA, 7, “Lightweight Wool Computer”

Some of these columns contain special attributes, called keys.

The keys to joins

Each table has a special column that contains distinct keys, known as entity keys, or primary keys, that uniquely identifies each row in the table. These columns typically contain ID numbers, and they can be automatically generated by the database, or by an application (e.g., an employee ID).

A second type of key, known as a foreign key, stores a reference to the entity key of a row in another table.

To join tables, you will use foreign and entity keys to tell the database how to combine data from multiple tables.

Let’s use Metabase’s Data browser to look at some keys with an example table in the Sample Dataset. From the top navigation bar in Metabase, go to Browse Data, and click on the Sample Dataset. As shown in figure 6, you’ll see cards for each table. Hover over a card to see additional options, and click on the book icon that appears to Learn about this table.

Figure 6. Go to <strong>Browse Data</strong> from the top navigation bar, select <strong>Sample Dataset</strong>, and click on the book icon to <strong>Learn about this table</strong>.
Figure 6. Go to Browse Data from the top navigation bar, select Sample Dataset, and click on the book icon to Learn about this table.

This table reference page contains several tabs:

  • Details
  • Fields in this table
  • Questions about this table
  • X-ray this table

Select the Fields in this table to view the Field name, Field type, and Data type for each field in the table, as shown in Figure 7.

Figure 7. Select <strong>Fields in this table</strong> to view the <strong>Field name</strong>, <strong>Field Type</strong>, and <strong>Data Type</strong>. The <strong>Orders</strong> table contains an <strong>entity key</strong> (<strong>ID</strong>) and two <strong>foreign keys</strong>, <strong>USER_ID</strong> and <strong>PRODUCT_ID</strong>.
Figure 7. Select Fields in this table to view the Field name, Field Type, and Data Type. The Orders table contains an entity key (ID) and two foreign keys, USER_ID and PRODUCT_ID.

The ORDERS table contains one entity key (a.k.a. primary key), ID, and two foreign keys: USER_ID and PRODUCT_ID:

  • The foreign key, USER_ID, is associated with the entity key, ID, of the PEOPLE table.
  • The foreign key, PRODUCT_ID, is associated with the entity key, ID, of the PRODUCTS table.

We can combine data from these tables — ORDERS, PEOPLE, PRODUCTS — by joining on these keys. Joining simply directs Metabase to line up the records from one table, and use the foreign key values in each row to combine data from the other table by finding the corresponding row (or rows) with the matching entity key value.

Metabase defaults to left joins, which brings us to types of joins.

Types of joins

Figure 8. The four types of joins: <strong> Left outer join, right outer join, inner join, full outer join</strong>.
Figure 8. The four types of joins: Left outer join, right outer join, inner join, full outer join.

There are four types of joins, which we’ll describe using an example of joining the ORDERS table (the left circle) to the PRODUCTS table (the right circle).

  • Left outer join. Select all records from the ORDERS table, along with records from the PRODUCTS table that meet the join condition, if any.
  • Right outer join. Select all records from the PRODUCTS table, along with records from the ORDERS table that meet the join condition, if any.
  • Inner join. Select only the records from ORDERS and PRODUCTS where the join condition is met.
  • Full outer join. Select all records from both tables, whether or not the join condition is met.

And by “join condition” we just mean: for each foreign key in the left table, is there a matching entity key in the right table?

Note that not all databases support all types of joins. As you can see in figure 7, the H2 database (the database that houses the Sample Dataset) does not support full outer joins.

Figure 9. Select a join: <strong>Left outer join</strong>, <strong>Right outer join</strong>, or <strong>Inner join</strong>.
Figure 9. Select a join: Left outer join, Right outer join, or Inner join.

Multiple joins

With that context on tables, keys, and joins, let’s see if we can link the ORDERS table to the REVIEWS table. Stated generally: if our starting table (the left table) doesn’t contain a foreign key that references the table we want to join to, how would we go about joining them?

If we try to join the ORDERS table to the REVIEWS tables, Metabase won’t know what to do (see Figure 10).

Figure 10. If there is no foreign key relationship, Metabase won't know what to do, and it will prompt you to select a foreign key relationship. Which, in the case of the <strong>ORDERS</strong> and <strong>REVIEWS</strong> tables, does not exist.
Figure 10. If there is no foreign key relationship, Metabase won't know what to do, and it will prompt you to select a foreign key relationship. Which, in the case of the ORDERS and REVIEWS tables, does not exist.

The ORDERS table does not have a foreign key for the REVIEWS table, which is why Metabase didn’t automatically connect the two tables.

Let’s turn to the data browser to find out which foreign keys the REVIEWS table includes (figure 11).

Figure 11. The only foreign key in the <strong>REVIEWS</strong> table is <strong>PRODUCT_ID</strong>.
Figure 11. The only foreign key in the REVIEWS table is PRODUCT_ID.

So, here’s our situation:

  • The ORDERS table has foreign keys to the PRODUCTS and PEOPLE tables.
  • The REVIEWS table has a foreign key to the PRODUCTS table.

To connect the ORDERS table to the REVIEWS table, we’ll need to join them via the PRODUCTS table. We’ll use a custom question to specify the joins. Here’s our notebook:

Figure 12. Joining the <strong>ORDERS</strong> table to the <strong>REVIEWS</strong> table via the <strong>PRODUCTS</strong> table.
Figure 12. Joining the ORDERS table to the REVIEWS table via the PRODUCTS table.

If you visualize the results, you’ll see a table of all of the columns from all three tables: ORDERS, PRODUCTS, and REVIEWS. Note that because a single product can have multiple reviews, we’ll see multiple rows for the same product and order, one for each review.

Column selection

In addition to joining our tables, we can be selective about which columns are visible in our question’s results. In the notebook editor, you can select the columns Metabase displays:

Figure 13. Selecting columns in the notebook editor.
Figure 13. Selecting columns in the notebook editor.

As above, you can also select the visible columns from the Settings sidebar.

Figure 14. Click on <strong>Settings</strong> to add or remove columns from your question.
Figure 14. Click on Settings to add or remove columns from your question.

You’ll notice that Metabase helpfully makes columns from the PEOPLE table available to add as well, so you now have the full Sample Dataset at your disposal.

Have fun joining tables in your own datasets, and remember: if you get stuck, be sure to consult the data browser to learn which tables have the foreign keys you need to join them.