Lesson

Debugging SQL query logic

What to do when your query returns data that looks wrong.

Combining data can get complicated very quickly (that’s probably why you’re patiently writing SQL instead of using the friendly notebook editor). This debugging guide explains what you can do when your query returns unexpected results.

Debugging steps

  1. Get the schemas for the data sources used in your query.
  2. Review the foreign keys of your tables or nested queries.
    • Is there more than one possible foreign key?
    • Have the foreign keys been renamed or moved to another schema?
    • If you’re not sure, ask the person who maintains the schema.
  3. Check for common SQL logic problems.

Common reasons for unexpected query results

SQL logic describes the way that your query combines data from different tables or data sources (including temporary tables, such as the results of other queries). The most common ways of combining data are joins and nested queries.

Even if your SQL logic used to work, it can break when:

  • The tables or data sources have changed.
  • The nested queries have changed (if you’re building on top of a saved question or model).
  • Your nested queries aren’t being computed as you expect (if you’ve written them from scratch).
  • Your data includes edge cases, such as empty or NULL values.

Most of the time, these changes are introduced upstream by the systems that collect your data, or the lovely people who manage your databases and BI tools.

It’s extremely tricky for teams to anticipate ripple effects from such changes. Fixing SQL logic is not only about responding to change, but updating your approach to better guard against future updates.

If you’re getting a red error message that mentions SQL clauses or table and column names, you most likely have a SQL syntax problem. Go to Debugging SQL syntax instead.

Common SQL logic problems

Aggregated results (counts, sums, etc.) are wrong

  1. If your aggregations are:
    • too high, check if your source tables or queries have duplicated rows.
    • too low, check if your source tables or queries have missing rows.
  2. Check your source tables or queries for filters.
    • How are you handling empty or NULL rows in your aggregations?
    • How are you handling invalid, cancelled, or expired records? Ask your Metabase admin or data team about business logic that you might not know about.
  3. If you’re working with COUNT_DISTINCT, check if it’s interacting with other aggregate functions.
    • For example, applying SUM on top of COUNT_DISTINCT may double-count unique values.
  4. If you’re working with time series data, check your time zones.
  5. If your data gets updated on a schedule, ask your Metabase admin if your tables are up to date.

Explanation

Aggregations are often the first place where you’ll detect a problem caused by one of the common reasons for unexpected query results. The steps above will help you catch any data edge cases that may be skewing your results. If you find lots of edge cases, and you anticipate handling the same cases over and over again, you may want to bundle all of that logic into a model so it can be easily re-used.

Sometimes, you might just need a pair of fresh eyes. If you can’t locate the root cause using the steps above, ask a teammate to help you check your math!

Further reading

How to identify a nested query

If your SQL contains:

How to get the schema for a nested query

  1. Get a sample of data from your nested query.
    • For subqueries or CTEs, run each SELECT block separately and use theLIMIT clause.
    • For saved questions or models, go to the underlying Metabase question from the variables panel or by pasting the ID number into the search bar. Add a row limit using the notebook editor, or add a LIMIT clause in the SQL editor.
  2. Compare the column names and values between your samples to check for foreign keys. For example:
    • In the Metabase Sample Database, the Products table has an ID column, and the Orders table has a Product ID column.
    • ID and Product ID both contain integer values, and many of those values show up in both columns.
  3. Compare the rows between your samples to check for table relationships. For example:
    • The Products table has unique values in the ID column.
    • The Orders table has multiple rows with the same Product ID.
    • The table relationship from Products to Orders is one-to-many (assuming that the foreign key relationship is valid).
  4. If you’re using a model, you can look for explicitly defined metadata by hovering over the column name.
  5. If you’re building off of someone else’s work, ask the original creator of the query, saved question, or model.

Explanation

A schema describes the columns in a table, the data types of those columns, and the relationships between columns across different tables. This metadata is usually explicitly defined for tables stored in your database by the people who manage your data.

Since the results of nested queries are only stored temporarily, the metadata about the results isn’t defined or stored anywhere. The steps above will help you manually inspect the query results instead.

Once you have the schemas for your nested queries, you can follow the debugging steps.

Further reading

Do you have a different problem?

Are you still stuck?

Search or ask the Metabase community.

Thanks for your feedback!

Get articles like this one in your inbox every month