From VLOOKUP/XLOOKUP to Joins
An introduction to database joins based on Excel's VLOOKUP and XLOOKUP functions.
In spreadsheets software like Excel and Google Sheets, the XLOOKUP function (and its predecessor, VLOOKUP) connects data that has been broken up into different sheets or parts of a spreadsheet. If you’ve used XLOOKUP before, you might not have realized that you basically performed the same operation as a database join that combines two database tables. Here is a look at how XLOOKUP works, and how it’s related to simple database joins.
A look at XLOOKUP
These two lookup functions serve the same purpose: to find a record in a spreadsheet based on a lookup value. They differ in some relatively minor ways that we’ll ignore for the purposes of this discussion. XLOOKUP is the more flexible and powerful version of VLOOKUP, which is why we’ll only refer to XLOOKUP for the rest of this article.
For example, in the image above, the left sheet contains a list of orders, while the spreadsheet on the right contains information about products. Each order only has one kind of product, but the quantity ordered can vary. We want to compute the total for each product. The result will be another spreadsheet, which looks like this.
The Orders and Products sheets have some data in common, a list of Product IDs that allows us to link products to orders.
In the Products sheet, the values in the ProductID
column identify rows in the Products sheet itself. In the Orders sheet, the values in the ProductID
column refer to rows in a different sheet, in this case the Products sheet.
To compute the total for each order, we have to look up the product in the Products table, get its price from the Price column, and then multiply it by the entry in the order’s Quantity column.
Looking up using XLOOKUP
The XLOOKUP function takes several arguments:
lookup_value
: the value to look up (which column)lookup_array
: where to look it up (which table)return_array
: what to return- (Plus a few optional arguments, which we’ll ignore here)
In our case, the value we want to look up is a value in the ProductID
column of our Orders table. Where to look it up is the first column of the Products sheet which is also called ProductID
.
XLOOKUP scans the column until it finds a matching ProductID
, then returns a value from the “what to return” argument from the matching row. We’ll specify the Price
column as “what to return”, so XLOOKUP will return the value in the Price column of the row with the matching ProductID
. XLOOKUP then inserts this price into our Totals sheet.
At this point, we can get our order total with standard Excel or Google Sheets functions. We simply multiply the order’s Quantity
by its Price
.
To perform this operation on the entire table, we can simply copy the formula across all rows, again just like you would in any spreadsheet. One thing to be careful with is to make sure the lookup and return ranges don’t slip as you copy the formula down the column. For details on how to use XLOOKUP in Excel, see the XLOOKUP documentation.
Using a JOIN instead
In a database, the same operation we just performed is done with a join. A join works basically the same way as XLOOKUP, but operates on the entire table at once. Under the hood, a join “looks up” each row in the orders table, just like XLOOKUP.
If we have a database with the same tables as in our spreadsheet example above, we can create the same totals table. We’ll first use Metabase’s query builder, and then look at how to do it using a SQL query.
Joining tables using the query builder
Here, we’re picking the two tables to be joined on the left, Orders and Products. Then we define how to join them on the right, which is done by comparing the respective ProductID
columns in each.
We can then create a custom column that multiples the values in the Quantity
column of the Orders table by the Price
in Products.
We end up with the same result as using XLOOKUP above, except now coming from our database.
Specifying the join in SQL
SQL, the Structured Query Language, is the native way of talking to a database. For many questions, a visual editor like the one above is the way to go, but for some advanced queries, SQL is necessary. The example here just serves to show how a simple SQL query works.
The query consists of three parts:
- select the relevant fields from the Orders table (
OrderID
,Quantity
, etc.) - multiply the
Price
value by theQuantity
(in the line ending inAS Total
) - create the join using the
LEFT JOIN
statement and specify which fields have to match
SELECT
Orders.OrderID,
Orders.ProductID,
Orders.Quantity,
Products.Price * Orders.Quantity AS Total,
Products.Name
FROM
Orders
LEFT JOIN Products ON Orders.ProductID = Products.ProductID
The details of the join aside, the main takeaway is that we’ve specified the same elements as in XLOOKUP: the join is defined using one column each from two different tables, and we’re specifying which value to take from the matched row for further operations.
Joins vs. XLOOKUP
There are certainly differences between database joins and XLOOKUP in spreadsheets. There are different types of joins, for example, and we’re only covering left outer joins here. Joins can also match on multiple fields, use functions as part of the matching, etc.
However, for the purpose of combining data from two tables that are matched on a single column, XLOOKUP and (left outer) joins perform essentially the same operation.
Where databases really shine is handling more complex operations. Perhaps you want to compute the sum of all totals per year or month, or compute the difference between values and a rolling average. In a database, aggregation and other operations can be done in the same query as a join.