Data and Business Intelligence Glossary Terms

A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
V
W
X

A

Aggregation

The act of summarizing data with a mathematical function, such as a averaging the values in a column, or counting the number of rows in a table.

What is an aggregation? An aggregation is the act of summarizing data with a mathematical function, such as averaging the values in a column, or counting the number of rows in a table. The resulting number is often called a metric, which is distinct from metrics in Metabase. Individual values in a field may not hold much meaning on their own, but when we combine these values in some specific way, we can paint a more comprehensive picture of our data. Aggregation is the process of collapsing those values into a single result, and is usually performed in conjunction with grouping — that is, combining multiple rows based on a certain value, like grouping by a dimension (e.g., a product category or country). Aggregations can be calculated on the fly, but you may also want to create summary tables with your results and save the results of those aggregate functions for future use. Summary tables can be especially useful when working with large datasets; since summary tables are precomputed, queries that rely on them can run much faster. Common aggregate functions in SQL Different databases have different sets of functions, but here are some of the most common aggregate functions that you’ll encounter: COUNT() - Counts the number of rows in a table. AVG() – Computes the average of values in a field. MIN() – Identifies the minimum value in a field. MAX() – Identifies the maximum value in a field. SUM() – Returns the sum of values in a field. STDEV() – Calculates the standard deviation of values in a field. Example aggregation Using Metabase’s Sample Database, let’s say we wanted to know the average price of our products, grouped by product category. In this case, we’ll use the Products table. Our SQL query would look like this: SELECT category, avg(price) FROM products GROUP BY category Just like we wanted, we’ve calculated the average of values in the Price column of our Products table, and grouped those averages according to their value in the Category field. If we wanted to execute that same aggregation in Metabase’s query builder, we’d Summarize by Average of Price, and then group by Category, like in the image below: Fig. 1. Performing an aggregation in the query builder: average price of products grouped by product category.

Read More
Area chart

A type of line chart where the space between plotted values and the x-axis is filled in with a solid color.

What is an area chart? An area chart is a type of line chart where the space between plotted values and the x-axis is filled in with a solid color. You can think of an area chart as a cross between a standard line chart and a bar chart. In an area chart, you still have a line that connects data points (usually plotting change over time), but by filling in the area between that line and the x-axis, you’re highlighting both the data’s relationship to zero and the relationship of any subcategories to a total. When to use an area chart Area charts are a great choice when: You’re plotting multiple subsets of data that, together, make up some total value. You need to represent volume and changes in volume over time. Your data contains too many x-axis data points for a stacked bar chart. Consider a different visualization, like a bar or line chart, if: You’re only measuring a single series over time. Your data is highly volatile, subject to frequent and/or significant swings. You’re measuring a large number of subcategories within one chart. Example area chart Similar to a stacked bar chart, stacking data on an area chart works best if you have a small number of categories to track — too many and your area chart will start to get chaotic or unreadable. Figure 1 shows Orders broken by out by Product → Category, visualized as a stacked area chart. Using this visualization, we can understand both the rise and fall of individual product categories and how orders have performed as a whole: Fig. 1. An example area chart.

Read More
Attribute

An attribute is a property that describes or identifies some entity. In some Metabase plans, user attributes are used to restrict which data people can access.

What is an attribute? An attribute is a property that describes or identifies some entity. People in the data world use “attribute” in a few different contexts, so we’ll do our best to disambiguate here. Basically, an attribute is a characteristic of something. That something might be a table, but attribute could also refer to a characteristic of a specific record, like user attributes in Metabase. Attributes in relational databases In relational databases, people often use attribute synonymously with column or field, like how a product’s Category is an attribute of (or describes) that product. This usage of attribute comes up a lot within the context of data modeling and when designing entity relationship diagrams. Example attribute Here’s a look at the People table in Metabase’s Sample Database, which includes fields like ID, Name, Address, City, State, and so on: Fig. 1. A look at the People table. Each of these fields is an attribute — the values in those fields describe something about the record they’re associated with, in this case a “person” in the People table. User attributes in Metabase Sync user attributes is only available on Pro and Enterprise plans (both self-hosted and on Metabase Cloud). Attribute can also refer to a distinct variable value that gets associated with a certain user, like a User_ID. That structure is known as a key-value pair, sometimes referred to as an attribute-value pair. In Metabase, some plans allow you to set user attributes yourself (or pass them to Metabase via SSO). You can use these user attributes to set up custom destinations on a dashboard, for example by using a user ID to parameterize a URL when that user clicks on a chart. User attributes are also an important part of data sandboxes, which give you granular control over the data that people using your Metabase instance can access. Since data sandboxes are associated with individual users, setting distinct user attributes lets Metabase know exactly how to filter a table depending on who views it.

Read More

B

BI tool

An app designed for people to look at data without relying on code.

What is a BI tool? A BI tool is an app designed for people to look at data without relying on code. These apps allow people to visualize and share data as tables, charts, and dashboards. BI tools plug into existing data sources at your organization, such as your data warehouse, CRM, or event analytics service. Common BI tools Spreadsheet applications A classic example of a BI tool is a spreadsheet app like Microsoft Excel or Google Sheets, where you can visualize data in tables, pivot tables, or charts, and share the results as individual files (or links to those files in the cloud). BI platforms A BI tool is often thought of as an app that is only used to visualize data and make reports. A BI platform like Metabase is type of BI tool that can handle additional tasks adjacent to reporting, such as data modeling, data cataloging, version control, and permissions management. How do BI tools fit into a data stack? Fig. 1. BI tools fit under the analytics component of a modern data stack. BI tools are one of many analytics tools that can be set up at the user-facing end of a data stack. BI platforms in particular can handle some of the same tasks as other parts of your stack. Here’s how you can expect the pieces to interact: BI tools vs. databases BI tools aren’t data sources — they don’t replace production databases or data warehouses for storing data owned by your organization. BI tools pull information from databases by running queries and displaying the results. BI tools vs. ETLs BI tools don’t replace ETLs (or ELTs) for ingesting or transforming large amounts of data on a schedule. However, like ETLs, some BI platforms can handle data modeling and data stitching (joining data from different databases) by running queries on the fly. BI tools vs. event analytics services Event or web analytics services like Google Analytics, Segment, or Amplitude collect usage data from your product. Although these services come with their own interface to visualize and share that data, they aren’t considered BI tools. You can think of them as mini data stacks that can be used standalone. Event analytics services can be integrated into your core data stack by combining them with a central BI tool. You can download event data from the service and move it into the data warehouse connected to your BI tool, or if supported, wire up the service to connect directly to your BI tool. In Metabase, you can add Google Analytics as a data source. BI tools vs. open source coding tools Open source tools like Jupyter Notebook and RShiny use programming languages like Python and R to work with data. They can be used to build reports and dashboards for analytics, but they aren’t considered BI tools because they rely on code rather than a visual interface.

Read More
Bar chart

A data visualization that uses rectangles that are proportional in size to the values they measure.

What is a bar chart? A bar chart is data visualization that uses rectangles (or bars) along axes that are proportional in size to the values being measured. These bars are scaled to the values associated with discrete, categorical data. Bar charts are useful when you want to compare fixed values across categories. One axis will contain the categories being measured, while the other will show their corresponding numerical values. Fig. 1. A bar chart showing the average rating of products by category. Figure 1 shows a standard column chart where the bars are displayed vertically, but you can also visualize your data with horizontal bars, known as a row chart. Figure 2 displays that same data — average product rating by category — in a row chart. Fig. 2. The same data visualized with horizontal bars, called a row chart. You aren’t limited to a single dimension with bar charts; they can be great for multi-series charting. For example, maybe you want to compare how orders performed by quarter based on user source. In this scenario, you could use a standard multi-series bar chart or a stacked bar chart, where user sources are stacked proportional to their values within a single bar, each bar displaying data from one quarter. And if you’d rather see those as relative percentages of within their stacks, consider a 100% stacked bar chart. Bar charts are widely adopted and easy to interpret, making them a good option for quick-reference comparisons.

Read More
Bin

A single range of continuous values used to group values in a chart.

What is a bin? A bin is a single range of continuous values used to group values in a chart. Binning data helps simplify data visualizations, so people can get a sense of their data’s distribution and easily spot outliers. You most often see bins used with histograms, but they aren’t exclusive to histograms and can be useful with other visualizations like line charts or pie charts. If a measure in your dataset contains a lot of unique values, plotting each individual data point on a chart can look cluttered and may not be the best representation of your data. When you bin that data, those values get grouped into equal-sized intervals (like 1–10, 11–20, 21–30, and so on), and your resulting chart will show a count of values within each bin. Data binning example Figure 1 shows the price of products in Metabase’s Sample Database, displayed as a histogram. Fig. 1. The prices of products in our Sample Database, shown as a histogram. Metabase automatically generates bins based on how the data is distributed. The bins here are the price ranges; we can see that we have more products in the $37.50–50.00 price range than any other. Metabase auto-binned these values, but we can also select the number of bins we wanted (either 10, 50, or 100) to tweak this chart further. If your bin size is too small, you’ll have too many and likely end up with a visualization that’s hard to interpret. However, too few bins will give you an incomplete or overly-compressed picture of your data’s distribution, so play around and figure out what works best for your data.

Read More

C

Card

A component of a dashboard that displays data or text.

What is a card? A card is a component of a dashboard that displays data or text. Metabase dashboards are made up of cards, with each card displaying some data (visualized as a table, chart, map, or number) or text (like headings, descriptive information, or relevant links). Cards and questions Cards on a dashboard are more than just mini versions of the questions you’ve asked. If you’re adding saved questions to a dashboard and calling it a day, you may be missing out on a lot that cards can do. You can combine multiple saved questions in a single card, as long as they share a dimension. Your cards don’t have to contain different questions either. It may be useful to put include same card on a dashboard multiple times, like if you want to visualize one question as both a line chart and a bar chart. Editing cards on a dashboard When editing a dashboard, you can: Arrange and resize cards on the dashboard’s grid. Change a card’s visualization options without affecting the underlying question. Connect cards to dashboard filters to filter the question’s results. Set a card’s click behavior to change what happens when someone clicks on a card. And while it’s great for those cards on your dashboards to look slick and visually appealing, it’s more important that you’re conveying the information that people need to see, without too much added fluff. Example of cards on a dashboard The dashboard in figure 1 includes text cards that act as headings and descriptions, as well as cards with numbers, trends, a line chart, and a region map: Fig. 1. Cards with questions and text on a dashboard. Cards and the Metabase API In the Metabase API, the api/card route refers to questions, rather than cards on a dashboard. You can still use the API to edit and get information about the cards on your dashboards (like with api/dashboard), but keep that distinction in mind, and check out the API documentation for a full list of routes and endpoints.

Read More
Collection

A set of items in Metabase, including questions, models, dashboards, and other collections.

What is a collection? In Metabase, a collection is a set of items — questions, models, dashboards, and subcollections — that are stored together for some organizational purpose. You can think of collections like folders within a file system. The root collection in Metabase is called Our Analytics, and it holds every other collection that you and others at your organization create. You may keep a collection titled “Operations” that holds all of the questions, dashboards, and models that your organization’s ops team uses, so people in that department know where to find the items they need to do their jobs. And if there are specific items within a collection that your team uses most frequently, you can pin those to the top of the collection page for easy reference. Pinned questions in a collection will also render a preview of their visualization. Fig. 1. A look at a collection in Metabase. Administrators can set collection permissions to determine which collections (and by extension, their contents) are viewable by which groups in Metabase. In addition to the top-level Our Analytics collection and whatever collections your organization creates, each Metabase user has their own Personal collection. These are semi-private, in that admin users can view the contents of everyone’s personal collections. Personal collections are a good place to store questions and dashboards that you haven’t quite perfected yet, and you can move them to a public collection whenever you’re ready.

Read More
Column

A list of values, usually belonging to a particular field, displayed vertically in a table.

What is a column? A column is a list of values, usually belonging to a particular field, displayed vertically in a table. In a relational database table, values within a column each correspond to a different record. Values in a column share a data type. That is, if a column’s data type is Integer, that means every value within that column must be an integer. There may be other constraints too, related to formatting, character length, or whether or not that value is mandatory. Example column Here’s an image of the Orders table in Metabase’s Sample Database, with the Created At column highlighted. The Created At column’s data type is DateTime, and the values in this column each correspond to the timestamp of a single order. Fig. 1. A look at the Orders table, with the Created At column highlighted. Columns vs fields While columns and fields aren’t technically the same thing, it’s usually okay to use these terms interchangeably. See Columns vs. Fields. However, keep in mind that columns don’t always directly correspond to fields in a database. For example, you may want to create a custom column in Metabase that contains calculated values, like one that displays the percentage discount for each order. You’d create this custom column by telling Metabase to calculate the Discount divided by the Subtotal and display the resulting value in a new column. Columnar storage While many traditional relational databases store data as rows and are typically best suited for holding transactional data, some databases (like data warehouses optimized for analytics) utilize columnar storage. Columnar databases (also known as column-oriented databases) physically store a column’s values together, rather than indexing based on entire rows. This can drastically speed up analytical queries and aggregate functions, since those queries will be able to retrieve similar data from the same location on a disk, rather than executing large reads across the database to pull a column’s values from individual records.

Read More

D

Dashboard

A data visualization tool that holds important charts and text, collected and arranged on a single screen.

What is a dashboard? A dashboard is a data visualization tool that holds important charts and text, collected and arranged on a single screen. Dashboards provide a high-level, centralized look at KPIs and other business metrics, and can cover everything from overall business health to the success of a specific project or campaign. The term comes from the automotive dashboard, which — like its business intelligence counterpart — provides status updates and warnings about important functions (just for things like low brake fluid instead of how your recent marketing campaign performed). Dashboard vs. report Dashboards aren’t exactly the same as reports, though you’ll sometimes hear people refer to dashboards as reports. The difference is that dashboards tend to be easier to read and understand at a glance, while traditional reports provide a more detailed look at a subject. Unlike traditional reports, dashboards are viewable on a single screen and often incorporate some interactive elements. You’re probably not going to print out a dashboard to read, which would make a lot more sense for a traditional report that draws on static, historical data. However, just like with traditional reports, you can send out updated dashboards according to a set schedule, like with dashboard subscriptions in Metabase. Dashboards in Metabase In Metabase, dashboards are made up of cards that contain either questions or text. You have a lot of options when creating and editing dashboards in Metabase, like: Arranging and resizing cards to fit your desired dashboard design. Making your dashboards interactive by setting custom click behavior and linking one dashboard to another. Adding filter widgets and wiring them up to specific fields on individual cards. Using Markdown to annotate your dashboard with text or GIFs. Sharing your dashboard with a link or by embedding it in your website or app. Example dashboard Figure 1 shows an example of a dashboard in Metabase with three question cards and three filter widgets. If someone inputs a customer ID, customer name, or date into one of the filter widgets, the charts will adjust accordingly to fit reflect that added filter. Fig. 1. A dashboard with three question cards, and three filter widgets.

Read More
Data dictionary

A document that describes the tables, fields, and other elements in a database and explains their meaning and origin.

What is a data dictionary? A data dictionary is a document that describes the tables, fields, and other elements in a database and explains their meaning and origin. Data dictionaries are repositories for a database’s metadata, storing the administrative information that people need to understand and make use of that data. Think of them like a typical dictionary, but instead of every word in a language, data dictionaries contains definitions and about the objects that make up your database. An up-to-date and comprehensive data dictionary helps make sure that everyone stays on the same page about what certain fields or tables mean in practice. Data dictionaries can also help make sure that different departments are all using those terms consistently. Data dictionaries are usually a separate file or set of files stored alongside the database they describe. While some aspects of your database’s data dictionary may be accessible to all database users (like important descriptions that everyone needs to know), other parts may only be viewable by database administrators (like technical details about the physical implementation of your database). Data dictionary in Metabase In Metabase, the data reference section acts as a data dictionary. What goes in a data dictionary? Data dictionaries collect and store metadata associated with a database, usually information like: Table and field descriptions Data types Integrity constraints Naming conventions Locations of files While the exact formatting of your data dictionary will depend on your organization and the complexity of your dataset, it’s common for data dictionaries to be formatted as a table or series of tables, with fields for metadata like field name, description, data type, character length, and whether null values are permitted. You can make a data catalog with a simple spreadsheet, within your relational database software, or even as a text document. Data dictionary vs. schema vs. data catalog There’s some overlap here with a database’s schema, but generally speaking a schema defines the structure of the database and how tables and their fields fit together, while a data dictionary provides contextual information about that data. Maybe you’ve heard about data catalogs too, another similar concept. Some organizations utilize data catalogs to better facilitate discovery and analysis of their data; they’re like data dictionaries with some added features and functionality, taking things a step further than the traditional document-based data dictionary.

Read More
Data lake

A data lake is a place to store both structured and unstructured information, typically as files or blobs.

What is a data lake? A data lake is a place to store both structured and unstructured information, typically as files or blobs. You can think of a data lake as a dumping ground for all of your data, regardless of structure, format, or intended use. The idea of a “lake” is largely marketing jargon, but the aquatic comparison comes from the idea that information in a data lake flows in a more “natural” state than that of the more rigid and hierarchical data warehouse. And because they can hold raw data that doesn’t need to adhere to a specific schema, data lakes tend to be cost-effective when scaling to store substantial amounts of information (into the petabytes). Since there’s no need to define a schema at the start, data lakes can be straightforward to set up; you can load data in for a specific use or just to keep it on hand for the future, even if you aren’t yet sure what kinds of queries you’ll need to run on it. However, once you do get things set up, configuring the tools you’ll need to actually make your data lake useful can get complex and expensive — typically requiring the expertise of data engineers. Those engineers will set up ETLs as needed, or even train machine learning models on parts of your data lake. Data lakes rely on a schema-on-read system, meaning data only gets verified against a schema once it’s pulled from that data lake for querying, rather than when it’s first written. This does mean, however, that pulling from and making use of a data lake takes more work. And just because a data lake allows for greater flexibility doesn’t mean you should thrown all data governance out the window; the information that goes into your lake should still be of good quality, cleaned and annotated so that your ETLs or query engine (and by extension, the people who need the data) can make good use of it. When to use a data lake If you need to analyze huge volumes of semi-structured and/or unstructured information (like if you’re an IoT company) then a data lake may be a good fit. Since there’s no need to enforce an overarching schema when data is written, data lakes can also be an effective solution if you’re dealing with many different types of data sources at once — like streaming data, structured application databases, data from IoT devices, social media, or web traffic. Ultimately, organizations with complex data needs may not rely exclusively on a data lake or a data warehouse (or even a data lakehouse), and instead construct data architectures that can incorporate both, taking into account the organization’s overall strategy, the needs of the people who’ll use it, and the types of queries those people will need to execute. Setting up a data lake Let’s say you want to set up a data lake. In broad strokes, the process will look something like this: Choose a cloud storage provider. There are data lake services out there that can help you set up the various layers and tools you’ll need, but at its core your “lake” is your storage layer — wherever you’re keeping that structured and unstructured data together (like in AWS S3 or Microsoft Azure). Identify your data sources. These may be structured (like an application database), semi-structured (like XML or JSON files), or unstructured (like social media posts, images, or text documents). Clean up and ingest data from those sources. At this stage, you’ll annotate those data sources (especially the semi-structured and unstructured ones), adding metadata and tagging and classifying them based on the types of questions you’re likely to ask of that data. Once that data has been cleaned up, those annotated copies get loaded into your data lake, probably in a columnar format like parquet that’s better for analytical queries. Create ETLs as needed and query your data lake. Because of their mix of formats and often-unstructured nature, engineers and data scientists are usually the ones directly accessing a data lake. People like your data analysts will query the data lake through the use of query engines like Presto or SparkSQL, which run ETLs over the data lake, structuring it on a regular schedule so the data can be queried via SQL. Those queries get executed on the cleaned-up, annotated, columnar copies of your data, rather than on the raw data sources themselves (both the raw data and the cleaned-up date are stored in your data lake).

Read More
Data model

Any pattern that organizes and labels information.

What is a data model? The term data model is used to describe any pattern that organizes and labels information. People will use “data model” as a generic way to refer to concepts like schemas, derived tables (views), or ERDs. A good data model helps people find things faster. For example, a mall directory is a data model that organizes information about the stores in the mall. It groups and labels the stores by category or location, and explains how the stores are related to one another by displaying them on a map. This model makes it easier for people to find out where to go, compared to wandering through the mall on their own, or reading through a random list of store names. Data modelling example To make decisions during data modelling, it’s best to start by figuring out what people want to look for, and why. Let’s say we want to create a data model for storing information about movies, to help people look for new things to watch. You can think of this data model as a template that can be filled out for any movie. The template should do two things: Represent the parts of a movie that are useful for finding a specific one. For example, people might search for a movie they want to watch by title, director, genre, or actor. Describe the relationships between the parts, so that it is easy to look up one group of information based on another. For example, the template should make sure that any movie title is associated with at least one director. The simplest type of data model groups related parts together into one template, and includes some information about how to fill it out. For example, the template below can be used as a data model for any movie. Movies Title: Any text (mandatory). Director: List of names (mandatory). Genre: Any text (optional). Cast: List of names (optional). The model can be expanded by adding more parts related to movies such as Release year or Run time. We can also expand on existing parts if they are useful for looking things up. For example, people might want to search for a movie by specific information about the actors in them, such as any acting awards they’ve won. Since Cast only keeps track of actor names, we can split out award information into a new data model. Acting Awards Award: Name of acting award (mandatory). Award year: Year (mandatory). Actor: First name and last name (optional). Since actor names appear in both models (either under Cast or under Actor), there is a relationship to connect the Movies model and the Acting Awards model. When both templates are filled out with real movie and awards information, people will be able to look up a movie by a particular award. The written templates above are a basic way to think about breaking down information for data models, but there are many best practices that you can follow depending on the use case. You can find examples of common data model formats in the next section. Common data models Schemas Schemas are a conceptual data model. They are used by people who work with databases. Information is represented by named columns and data types. Relationships are described by structures such as tables or JSON objects. ERDs ERDs are a visual data model. ERDs are used by people who need to talk about information management and architecture. Information is represented by different shapes, such as rectangles or diamonds. Relationships are described by different lines, such as arrows or dashed lines. Metabase models A Metabase model is a data model that you can create and save from a question or SQL query. Information is represented by named columns and any associated metadata. Relationships are described by the logic used in the question or SQL query. How people actually use the term data model You might find that different teams use the term “data model” informally to mean different things: People who write SQL may use it to refer to derived tables or views. Programmers may use it to refer to a schema or ERD. The Data Model in Metabase If you’re a Metabase admin, you’ll have access to the Data Model page in Metabase. The changes you make here will affect the way that data appears across all of Metabase. What’s the difference between the Data Model page and a Metabase model? The Data Model sits on top of the raw data warehouse tables connected to Metabase. It is a layer of modelling you can use to clean up the tables that your organization can see. You can think of it as a way to “translate” information between the data world and the business world by assigning human-readable names and saving common definitions of segments or metrics. Metabase models sit on top of the Data Model. They can be created by anyone with permissions to use the underlying database tables.

Read More

E

ERD

An ERD, or entity relationship diagram, is a graphical representation of how tables in a database connect to each other.

What is an ERD? An ERD, or entity relationship diagram, is a graphical representation of how tables in a database connect to each other. ERDs show a database’s structure (or schema) at a high-level. ERDs are a useful tool when designing a new data model or identifying issues within an existing schema. Entity relationship diagrams are basically just boxes (your entities, or tables) connected with lines (the relationships between them). Your database software may have some built-in functionality to create ERDs, but you can also use whatever design software you like best or go the analog route and draw out your ERD on a piece of paper. The how is less important; what really matters is making sure that your diagram is accurate and logical, so you can design the most effective database for your specific use case. Example ERD Here’s an example of an ERD showing Metabase’s Sample Database: Fig. 1. A simple ERD of Metabase's Sample Database. The four tables, Orders, Products, People, and Reviews, are our entities, and the connecting lines show the three one-to-many relationships between them. ERD design and notation When sketching out an entity relationship diagram, each box should contain information like that table’s name, field, and key information (primary and foreign keys). You’ll notice in the example above that each table’s key information is indicated with (PK) and (FK) next to field names. The type of line between each entity illustrates the kind of relationship each table has to another. Different organizations and industries use different conventions for ERD notation, but one of the most common is crow’s foot notation, named because the three-pronged symbol (the one that gets used for “to many”) looks a bit like the foot of a bird. Figure 2 shows the common symbols used in crow’s foot notation and their corresponding relationship types: Fig. 2. The lines used in crow's foot notation to represent different types of table relationships.

Read More
Embedding

Placing some functionality of one app inside another. Metabase uses iframes to embed questions, dashboards, or (in some plans) the full Metabase application.

What is embedding? Embedding is the process of placing some functionality of one application inside another. In analytics, this usually means integrating data visualizations into a parent application, allowing people to view charts within the context of the their own application. Embedding can save time and resources for the parent app too, allowing teams to draw on existing analytics tools rather than building everything from scratch themselves. While not the only way to embed something, embedding in Metabase involves using an iframe (an inline frame) to place a question, dashboard, or (in some plans) the full Metabase app within another application. Embedding Metabase charts and dashboards Embedding is more than just placing a static image of a chart into your site or app. Instead, that iframe creates a nested browser within your main browser or app that points to its own, separate URL. This way, the embedded Metabase chart or dashboard stays up to date. When you view an embedded chart, you’re still seeing the Metabase chart itself — just nested in the parent application. Depending on security configurations, your individual embedded charts and dashboards are either public or secure embeds. You can also configure or lock parameters to affect what people see on those charts, like in figure 1: Fig. 1. Making parameters editable and enabling dark mode before publishing a dashboard for embedding. Embedding the full Metabase application Full-app embedding is only available on Pro and Enterprise plans (both self-hosted and on Metabase Cloud). With some plans, you’re able to embed the full Metabase experience within your application. Full-app embedding is particularly useful for multi-tenant analytics, like providing your customers specific reports that they can view and interact with all while remaining in your app.

Read More

F

Filter

A filter is a predicate expression that limits the results of a query based on some stated criteria.

What is a filter? A filter is a predicate expression that limits the results of a query based on some stated criteria. For example, you may want to limit the records in your Orders table so that you only see orders where the value of the Total field is over 100. We can use a predicate expression, Total > 100, to filter the orders. For each record, the query evaluates whether that expression resolves true or false, and narrows the results accordingly. So in this case if the record has a total greater the 100, that record is included in the results. In SQL, queries are filtered using the WHERE clause, like WHERE Total > 100. You can also filter aggregations in SQL using the HAVING clause, like HAVING AVG(rating) > 3.5. Filters in Metabase Filter the results of your questions. Add filters to your dashboards. Set up cross-filtering so dashboard filters update when someone clicks on a card. Link filters on a dashboard to limit results based on the value of another filter. Configure smart field filters in native SQL queries that know which filtering options to present based on field type and column data. Create filter widgets that act as a search function on a dashboard, like for a lookup tool. Example filter in Metabase Figure 1 shows the Products table in Metabase’s Sample Database, with a filter added that narrows the results to only include products where the Title field contains the word “Hat”: Fig. 1. A question in Metabase with one filter added.

Read More

G

H

I

J

Join

The combination of results from two tables in a relational database.

What is a join? A join is the combination of results from two {% include gref key=”table” text=”tables” %} in a {% include gref key=”relational_database” text=”relational database” %}. While the word “join” makes it sound like you’re merging the tables themselves, a join actually takes the {% include gref key=”row” text=”rows” %} from two (or more) different tables and returns a new set of rows that combines {% include gref key=”column” text=”columns” %} from those tables, using {% include gref key=”entity_key” text=”entity keys” %} and {% include gref key=”foreign_key” text=”foreign keys” %} to determine which rows are related. Types of joins There are four types of SQL joins: {% include gref key=”left_outer_join” text=”Left outer join” %}: select all records from Table A, along with records from Table B that meet the join condition, if any. {% include gref key=”right_outer_join” text=”Right outer join” %}: select all records from Table B, along with records from Table A that meet the join condition, if any. {% include gref key=”inner_join” text=”Inner join” %}: only select the records from Table A and B where the join condition is met. {% include gref key=”full_outer_join” text=”Full outer join” %}: select all records from both tables, whether or not the join condition is met. Example joins in Metabase Metabase defaults to left outer joins for questions asked in the {% include gref key=”query_builder” text=”query builder” %}, but inner joins are the default for {% include gref key=”native_query” text=”native SQL queries” %} (that is, if you just use JOIN in your query rather than specifying which type of join). Let’s say we wanted to return results from both the People and Orders tables in Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}, like a table of includes an order ID, the name of the person who placed that order, and their user ID. Query builder join Figure 1 shows what this join would look like in Metabase’s {% include gref key=”notebook_editor” text=”notebook editor” %}. We’d also want to pick which columns are visible, so we aren’t shown every column from both tables. {% include image_and_caption.html url=”/glossary/images/join/join-notebook-editor.png” description=”Fig. 1. A join in the query builder.” %} Native SQL query join If we were to write this same query in SQL, it may look something like this: SELECT orders.id AS "Order ID", people.name AS "Name", people.id AS "User ID" FROM people JOIN orders ON people.id = orders.user_ID Here we’ve identified where the join happens (in this case, joining at People → ID and Orders → User_ID, an entity key and foreign key).

Read More

K

L

Line chart

A type of visualization that connects discrete values connected by lines to show changes and trends.

What is a line chart? A line chart is a type of visualization that connects discrete values connected by lines to show changes and trends. Line charts are useful for tracking quantitative data over some sequence, often (though not always) time, known as a {% include gref key=”time_series” text=”time series” %}. You’ll typically plot the {% include gref key=”measure” text=”measure” %} or {% include gref key=”metric” text=”metric” %} along the y-axis, with your sequence along the x-axis, like the count of orders (y-axis) over time (x-axis). Line charts, like {% include gref key=”bar_chart” text=”bar charts” %}, are a go-to chart for visualizing changes in data and for forecasting future data based on trends. Example line chart in Metabase Figure 1 shows a line chart that plots the number of reviews from the {% include gref key=”sample_database” text=”Sample Database’s” %} Reviews table over time, broken out by month: {% include image_and_caption.html url=”/glossary/images/line-chart/line-chart-example.png” description=”Fig. 1. A line chart showing the number of reviews over time.” %} This example also includes a trend line showing the overall direction of the points on the chart. You have a lot of options when it comes to customizing a line chart, whether you want to display your data as an area chart, adjust the scale of axes in your line chart, or even track multiple series within one graph.

Read More

M

Metadata

Information that describes data to make it easier to find, manipulate, and make use of that data.

What is metadata? Metadata is information that describes data to make it easier to find, manipulate, and make use of that data. Metadata examples Think about a file on your computer, like a digital image or text document. Among many other attributes, that file has a name, file type, extension, size, and timestamps noting when it was created, last opened, and last modified. This is all metadata — none of those attributes are really the file itself, but they do tell you important things about the file. Understanding and keeping track of this metadata tells both you and your computer how that file should be sorted and handled, like indicating to your computer what software to use when opening that file. Metadata exists beyond the analytics world, and is found pretty much everywhere. It’s important in a wide variety of industries, from photography to libraries to broadcast television, since any organization that handles or generates data needs to be able to find and organize it. Metadata is sometimes human-readable (like the title of a book or field names in a database), but can also to be machine-readable, like an XML or JSON file. Metadata in relational databases and data warehouses In a relational database, metadata includes all the information that make up that database’s {% include gref key=”schema” text=”schema” %}, like the following: Table names Field names Entity keys Foreign keys Data types Views Integrity constraints However, there’s more to database metadata than just its schema. User information, business definitions, table and field descriptions, database size, and storage information are all important pieces of metadata too. Depending on how your database is configured, you may store some metadata within the database itself (like table and field names), or in a separate file or set of files that contain all of a database’s metadata. This is known as a {% include gref key=”data_dictionary” text=”data dictionary” %}. In a {% include gref key=”data_warehouse” text=”data warehouse” %}, metadata acts like an index or table of contents, defining all of the objects stored within that data warehouse, as well as information about the various {% include gref key=”etl” text=”ETL” %} jobs that manipulate data so it can be useful for those who need it. Metadata about an ETL would likely include the name of the job, its purpose, when and how often it runs, which data the job uses, and where that data ends up. And if that job is properly annotated with plenty of useful metadata, it’s then easier for you or a coworker to understand what exactly the job does and why. Using metadata in Metabase Metadata plays a big part in Metabase! For example, designating a column’s {% include gref key=”field_type” text=”field type” %} (itself a form of metadata) gives Metabase an idea of what that field actually means, so Metabase can know how to format that field or what kind of visualization to show you. Models make use of metadata too. Annotating columns with a description when creating a model can go a long way in helping people better understand your data. Figure 1 shows how those descriptions show up when hovering over a column in that model: {% include image_and_caption.html url=”/glossary/images/metadata/hover-description.png” description=”Fig. 1. Viewing the Products table’s metadata in the data reference section.” %} Finally, you can always view table metadata in the data reference section of Metabase’s data browser. Figure 2 shows the how that looks for the {% include gref key=”sample_database” text=”Sample Database’s” %} Products table. As you can see, this view provides useful information like column names, descriptions, field types, and data types: {% include image_and_caption.html url=”/glossary/images/metadata/metadata-data-reference.png” description=”Fig. 2. Viewing the Products table’s metadata in the data reference section.” %}

Read More
Metric

A metric is a calculation performed on a measure. In Metabase, a capital-M Metric is a saved aggregation with or without filters based on one table.

What is a metric? A metric is a calculation performed on a measure. Metrics are quantitative attributes of data, with some summarization applied. Metric vs. measure You’ll see the terms metric and {% include gref key=”measure” text=”measure” %} used interchangeably, and they’re pretty similar concepts, both referring to some numerical value that’s part of (or drawn from) your data. However, there’s an important distinction: measures are raw, unaggregated data, while metrics are aggregated (or summarized) data. For example, while a field like Discount is a measure, the standard deviation of that Discount field would be a metric. Some people will also use “metric” to mean a computation of measures that’s specifically related to performance goals, like CRR (customer churn rate) or NRR (net revenue retention). By this definition, a metric is basically a {% include gref key=”kpi” text=”KPI” %} (key performance indicator), depending on whether or not someone has designated that metric as “key.” Example metric If we wanted to determine the average of order subtotals in Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}, we’d do so by summarizing, like in figure 1: {% include image_and_caption.html url=”/glossary/images/metric/example-metric-summarization.png” description=”Fig. 1. Summarizing the Orders table by subtotal average, a metric.” %} In this case, Subtotal is a measure, but average subtotal is our metric. Metrics in Metabase In Metabase, a capital-M Metric is a saved aggregation based on one table, with or without filters applied. If there are certain {% include gref key=”aggregation” text=”aggregations” %} that you and your team need to reference and use on a regular basis (like revenue), you may want to create a metric in Metabase so you can access it when asking questions, without rebuilding that aggregation yourself every time.

Read More

N

Normalization

The process of structuring information in a relational database to reduce redundancy.

What is normalization? Data normalization is the process of structuring information within a {% include gref key=”relational_database” text=”relational database” %} to reduce redundancy. Normalizing data ensures that the tables in a database function as efficiently as possible, eliminating ambiguity so that each table serve a singular purpose. When moving from a denormalized database to a normalized one, you’ll probably need to break out your existing tables to create additional, smaller tables. Those new tables will have a narrower focus, and will link to other tables through {% include gref key=”entity_key” text=”entity” %} and {% include gref key=”foreign_key” text=”foreign keys” %}. With normalization comes the added benefit of reducing your overall database size and easing database maintenance, since you’re no longer storing the same information in several places. Example of data normalization The normalization process is carried out according to rules that build upon each other, known as normal forms. First normal form (1NF) states that fields should not store multiple values within a single cell, and that each field within a table should be unique. Here’s an example: Denormalized table Product_ID Product_name Product_color1 Product_color2 P001 Knit cardigan Pink Maroon P002 Bootcut jeans Navy   P003 Linen vest Camel Off-white P004 Running sneakers Orange   You’ll notice that we have two fields containing similar information, about product color. To bring this table in accordance with 1NF, we need to break this table out into two separate tables that can be joined together. Normalized product name table Product_ID Product_name P001 Knit blazer P002 Bootcut jeans P003 Linen vest P004 Running sneakers Normalized product color table Product_ID Product_color P001 Pink P001 Maroon P002 Navy P003 Camel P003 Off-white P004 Orange Check out our Learn lesson for examples of 2NF and 3NF. While normal forms beyond these three exist, their use is largely theoretical and the first three should be sufficient for most practical database needs.

Read More

O

P

Parameter

A special type of variable that specifies an input to a query.

What is a parameter? A parameter is a special type of variable that specifies an input to a query. Setting a parameter lets end users input a value (like in a dashboard or report) to change what data that query returns, typically filtering by a {% include gref key=”measure” text=”measure” %} or {% include gref key=”dimension” text=”dimension” %}. The parameter passes that value through to the query being run, and the results of that query will depend on whatever value that person entered. Parameter vs. variable vs. argument You may see “parameter” used interchangeably with “variable” or “argument,” so it’s worth pointing out some distinctions here. A parameter is a type of variable; it’s just one where some specific input value gets passed along to the program or query being run. Not all variables are parameters, though — you may also have variables that are set within your program or query and can’t be modified by anyone on the other side. An argument refers to the value itself that gets passed along when your program or query runs. For example, if you set a parameter as {% raw %}{{productID}}{% endraw %}, and enter a value of 34, your argument is 34. A parameter defines that there will be an input value, but the input value itself is the argument. So yes, technically these terms all differ, but it’s okay to use them interchangeably, as long as you’re generally referring to a place or container to pass values into so you can filter results. Parameters in Metabase In Metabase, you can set a parameter using a filter widget or via a URL. Parameters come into play in Metabase in a few different ways: SQL templates By adding parameters to SQL queries in Metabase, you can create SQL templates that add {% include gref key=”filter_widget” text=”filter widgets” %} to those queries, allowing people to easily change that parameter’s value when they run that query. If we wanted to create a SQL template on a query that counted the number of customers in each state using the {% include gref key=”sample_database” text=”Sample Database’s” %} People table, we’d use: SELECT count(*) FROM people WHERE state = {%raw%}{{State}}{%endraw%} By wrapping {%raw%}{{State}}{%endraw%} in double curly brackets, we’ve created a parameter that adds a filter widget to this question, letting people input the state they want without needing to change the text of the query itself, like in figure 1: {% include image_and_caption.html url=”/glossary/images/parameter/parameter-sql-template.png” description=”Fig. 1. Creating a SQL template that adds a filter widget to a query.” %} Dashboard filters Dashboard filters let you set parameters that get applied to a dashboard. For example, you can create a dashboard filter that lets people input a State value and link that that filter to the State column in the questions or cards on your dashboard. Then when people enter the value they want (like North Carolina, shown in figure 2), they’ll see those cards change accordingly. {% include image_and_caption.html url=”/glossary/images/parameter/parameter-dashboard-filter.png” description=”Fig. 2. A dashboard with a filter applied on the State column.” %} When you enter a value into a dashboard filter, you’ll notice that the URL changes to include that value. Custom destinations You can also insert parameters into a URL to dictate what happens when people click on a chart in a dashboard. For example, you can set a custom destination by using values from the card’s results to construct a URL that directs people to another dashboard or external site with that ID as part of the URL. Maybe you have a dashboard containing questions that track how different products in your inventory have sold, and a dashboard filter that lets people input the product they want to check in on. You could take things a step further by passing that product’s ID onto a custom destination, parameterizing a URL with that ID value and sending people to that product’s page on your website. When you visit that site, its URL may look something like this: https://www.your-website.com/products/id?productID=34 In this case, that productID=34 in the URL is your parameter. Embedding When embedding Metabase questions and dashboards in your app, you can set parameters to customize what different users see when viewing those embeds.

Read More
Pivot table

A data visualization that summarizes rows and columns of a table and lets you rotate (pivot) the columns.

What is a pivot table? A pivot table is a data visualization tool that summarizes rows and columns of a table and lets you rotate (“pivot”) the columns to view those summaries in different ways. The summary rows are usually subtotals or grand totals, though they can also be other {% include gref key=”metric” text=”metrics” %} like averages. This ability to rotate columns by 90 degrees, so that the values in that column become the columns themselves for the pivoted tables, can be really helpful when trying to analyze data across multiple {% include gref key=”dimension” text=”dimensions” %}, like time, location, and category. Example pivot table If we want to see how orders perform over the days of the week, broken out by different product categories, a pivot table is a great choice, as it’ll give us an easy-to-digest glimpse at a lot of numerical data. That pivot table may look something like this: {% include image_and_caption.html url=”/glossary/images/pivot-table/pivot-table.png” description=”Fig. 2. A pivot table containing information about how four product categories performed on different days of the week.” %} It’s nice to have those summary rows that include totals for each day of the week, but it’s still not easy to compare category sales on different days. However, if we pivot the Created At column so that its values become the column headings, our result looks like this: {% include image_and_caption.html url=”/glossary/images/pivot-table/pivot-table-pivoted.png” description=”Fig. 2. Our same pivot table, with the Created At column pivoted, giving us a better look at all of the data in our table.” %} Now we can quickly compare orders across day and product category, while still seeing the totals for both, without having to scroll through a long list of rows.

Read More
Predicate

An expression that evaluates to either true or false, like quantity > 0. True and false values are known as Boolean values.

What is a predicate? In SQL, a predicate is a type of conditional expression that evaluates to either true or false, like quantity > 0. Including a predicate in your query narrows down your results by filtering out unwanted rows based on whether that expression returns true or false. Predicate expressions all contain some sort of comparison element, like =, >, or <. When evaluated, the resulting true and false values are known as boolean values, though not all databases support boolean values as a {% include gref key=”data_type” text=”data type” %}. Not all databases support the same list of predicates either, especially predicates beyond mathematical comparisons (like BETWEEN or ISNULL), so check out your database’s documentation to know for sure which predicates will work for your use case. Null values: not zero, just not there While predicates typically evaluate to one of two boolean values (like true or false), if the field being evaluated lacks a value entirely, it’s known as null. That doesn’t mean its value is zero, but that rather that is no value present in that field. If your predicate expression requires that quantity > 0, then a row without values will not return true or false, but rather will return null. Example predicate An example of a predicate is condition that follows WHERE in a simple SQL SELECT query, like so: SELECT * from orders WHERE subtotal > 35 In this case, our predicate expression is subtotal > 35. Each row in the Orders table has a value in the Subtotal field, and for each row, this predicate evaluates whether it’s true or false that the subtotal is greater than $35. From there, our query returns only those rows with a subtotal greater than $35. In Metabase’s {% include gref key=”query_builder” text=”query builder” %}, you use predicates when {% include gref key=”filter” text=”filtering” %} your data. You can also write your own predicates in the {% include gref key=”notebook_editor” text=”notebook editor” %} using custom expressions. In the question below, we’re filtering the People table in the {% include gref key=”sample_database” text=”Sample Database” %} to only show us records where the State field equals Montana, or state = MT: {% include image_and_caption.html url=”/glossary/images/predicate/predicate-filter.png” description=”Fig. 1. A predicate expression (or filter) in Metabase’s query builder that will return only records where the State field equals Montana (MT).” %}

Read More

Q

Query builder

The graphical interface for asking questions in Metabase.

What is the query builder? In Metabase, the query builder is the graphical interface for asking questions. If you aren’t a {% include gref key=”sql” text=”SQL” %} person or just prefer to analyze your data using buttons and dropdowns instead of code, the query builder’s got you covered. And if you aren’t exactly sure what you’re trying to figure out about that data, those buttons and dropdowns can give you some ideas, like listing options for the filters and groupings you can add to your starting {% include gref key=”table” text=”table” %}, {% include gref key=”model” text=”model” %}, or {% include gref key=”saved_question” text=”saved question” %}. Asking questions with Metabase’s query builder There are a couple ways you can use the query builder to ask questions about your data: Start from the data browser. Add {% include gref key=”filter” text=”filters” %} and summarizations using the sidebars to the right of your data visualization. Create your question from scratch using the query builder interface. The query builder offers more flexibility for constructing a question: in addition to the regular filtering and summarization options, you can use {% include gref key=”custom_expression” text=”custom expressions” %} to create more sophisticated filters and aggregations. You can also {% include gref key=”join” text=”join tables” %}, create {% include gref key=”custom_column” text=”custom columns” %}, and preview your results at each step before visualizing the final product. These paths aren’t mutually exclusive — you can start in the data browser, visualize your data, use the sidebars to tweak your question, open the query builder to make additional changes, and so on. Example: using the query builder We’ll use the query builder to construct a question using Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}. Let’s say we want to know how our large orders (that is, orders with a Subtotal greater than $100) are broken out by Product → Category. Figure 1 shows how we’d construct this question in the query builder: {% include image_and_caption.html url=”/glossary/images/query-builder/notebook-editor.png” description=”Fig. 1. Asking a question using the query builder.” %} Once we visualize our question, let’s add another filter so we’re only viewing full-price orders (orders where there was no discount applied). Figure 2 shows what our query builder looks like just before adding that second filter: {% include image_and_caption.html url=”/glossary/images/query-builder/second-filter.png” description=”Fig. 2. Adding a second filter while visualizing our data in the query builder.” %}

Read More
Question

In Metabase, a question is a query, its results, and its visualization.

What is a question? In Metabase, a question is a query, its results, and its visualization. If you’re trying to figure something out about your data in Metabase, you’re probably either asking a question or viewing a question that someone else on your team created. In everyday usage, question is pretty much synonymous with query. What you can do with questions in Metabase You can ask questions in Metabase using the graphical {% include gref key=”query_builder” text=”query builder” %} or the {% include gref key=”native_query_editor” text=”native query editor” %}, and then do things like: Save your question to a {% include gref key=”collection” text=”collection” %} so that you can come back to or build on it later. Add that question to relevant {% include gref key=”dashboard” text=”dashboards” %}. Questions on a dashboard are known as {% include gref key=”card” text=”cards” %}. Set up email or Slack {% include gref key=”alert” text=”alerts” %} on your question. Share the results of your question by sending links to people on your team — even to questions that you haven’t saved. Download the results of your question as CSV, XLSX, or JSON. Convert your {% include gref key=”saved_question” text=”saved question” %} to a {% include gref key=”model” text=”model” %}. Example question Figure 1 shows a question based on Metabase’s {% include gref key=”sample_database” text=”Sample Database” %} — the average rating of our company’s Products, broken out by Category. Here we’ve visualized this question as a {% include gref key=”bar_chart” text=”bar chart” %}: {% include image_and_caption.html url=”/glossary/images/question/example-question.png” description=”Fig. 1. An example question with one summarization, visualized as a bar chart.” %} And figure 2 shows what this same question looks like as a {% include gref key=”table” text=”table” %}: {% include image_and_caption.html url=”/glossary/images/question/example-question-table.png” description=”Fig. 2. The same question, visualized as a table.” %} Questions and the Metabase API In the Metabase API, you can edit and get information about questions in your Metabase instance using the api/card route.

Read More

R

Record

A group of related data with the same structure. A relational database stores each record as a row in a table.

What is a record? A record is a group of related data with the same structure. Just like in a traditional spreadsheet, records in a {% include gref key=”relational_database” text=”relational database” %} are stored as horizontal {% include gref key=”row” text=”rows” %} within a table, and contain values that correspond with that table’s fields, or columns. Records typically reference a single unit, whether that’s a customer, an order, a session, or some other object that your database captures. A record in a database is usually identified by its value in that table’s {% include gref key=”entity_key” text=”entity key” %} field. Example record Let’s take a look at the Orders table in Metabase’s {% include gref key=”sample_database” text=”Sample Database” %} (figure 1). {% include image_and_caption.html url=”/glossary/images/record/orders-table.png” description=”Fig. 1. The Orders table, where each horizontal row is one record, or group of related data.” %} We see the {% include gref key=”field” text=”fields” %} in this table (the columns), like ID, User ID, Product ID, Subtotal, and so on. Each record has values that correspond with those fields, and together, those related properties make up one record. For example, we can see that the record (or row) with the ID of 8 was an order with a subtotal of $68.23, a discount of $8.65, and was created on June 17, 2019. The record right below it, with an ID of 9, follows the same structure, even though its values differ. We can click on the ID field to get a better view of a record itself, like in figure 2: {% include image_and_caption.html url=”/glossary/images/record/record-8.png” description=”Fig. 2. Viewing the individual record for the order with an ID of 8.” %}

Read More
Relational database

A collection of tabular data, or the application that manages the storage and retrieval of tabular data.

What is a relational database? A relational database is a collection of tabular data, or the application that manages the storage and retrieval of tabular data. Relational databases contain {% include gref key=”table” text=”tables” %}, made up of {% include gref key=”column” text=”columns” %} (also known as {% include gref key=”field” text=”fields” %}) and {% include gref key=”row” text=”rows” %} (also known as {% include gref key=”record” text=”records” %}). You’ll establish relationships between tables in a database by assigning a single field to two or more tables. For one of those tables, that field will be designated as an {% include gref key=”entity_key” text=”entity key” %}, while for the other(s) it’ll be a {% include gref key=”foreign_key” text=”foreign key” %}. With these relationships in place, you can query data (probably using {% include gref key=”sql” text=”SQL” %}) across tables without having to reorganize or duplicate that data. Introduced in the early 1970s, relational databases remain a (if not the) dominant model for structuring data today. While technically a relational database refers to your data itself and a relational database management system (RDBMS) refers to the software application you use to manage that data, in reality people use the terms interchangeably. The relational model is so prevalent that in many contexts, the word “database” itself implies a relational one, unless otherwise specified. Example relational database Metabase’s {% include gref key=”sample_database” text=”Sample Database” %} (the one you see used in examples throughout our docs and tutorials) is an H2 relational database. Figure 1 shows a look at the four tables in the Sample Database: {% include image_and_caption.html url=”/glossary/images/relational-database/tables-in-sample-db.png” description=”Fig. 1. Metabase’s Sample Database (a relational database) contains four tables: Products, Orders, People, and Reviews.” %}

Read More
Row

A single group of related data within a table.

What is a row? A row is a single group of related data within a {% include gref key=”table” text=”table” %}. {% include gref key=”relational_database” text=”Relational databases” %} contain tables with rows and {% include gref key=”column” text=”columns” %} (also known as {% include gref key=”record” text=”records” %} and {% include gref key=”field” text=”fields” %}, respectively). Columns are vertical, and hold a list of values all from the same field. Rows are your horizontal elements in a table. The values in a row belong to different fields, but all refer to a single unit (like a row that contains information about one customer or one order). Row vs. record vs. tuple Technically speaking, a row is the underlying logical grouping of related data in a table, while a record refers to that same grouping within the context of an application. When it comes to inserting, updating, or deleting in {% include gref key=”sql” text=”SQL” %}, you’re dealing with rows, not records. This is a pretty subtle distinction though, and most people use these terms interchangeably. You’ll sometimes see tuple used as well, which is a mathematical term for the same concept. Example row in Metabase Figure 1 shows the Products table in Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}, with one row highlighted. This row contains information from several different fields, each value relating to one of the products in our inventory (in this case, the Enormous Aluminum Shirt). {% include image_and_caption.html url=”/glossary/images/row/example-row.png” description=”Fig. 1. A row in the Products table.” %}

Read More

S

SQL

A standardized and widely-used language for accessing and manipulating data in a relational database.

What is SQL? Structured query language (known as SQL) is a standardized and widely-used language for accessing and manipulating data in a {% include gref key=”relational_database” text=”relational databases” %}. Using SQL involves writing and executing structured commands, known as statements, that communicate to a database what information you need or what you want to change. SQL is a published ANSI and ISO standard, meaning there are established rules about what exactly the language includes and how it works. However, SQL-based database systems (like PostgreSQL, MySQL, SQL Server, etc.) each have slightly different functions and their own syntactical quirks — no major databases conform 100% to the official written standard. Using SQL, you can: Create and configure databases, tables, and indexes Insert, update, and delete information in databases Retrieve information from databases (commonly known as querying) Set and adjust database permissions Is it pronounced “S.Q.L.” or “sequel”? Opinions are split on the matter of pronunciation, with some of those opinions very strongly held. When computer scientists Donald Chamberlin and Raymond Boyce first developed the language specification in the early 1970s, they called it “SEQUEL” (pronounced “sequel”), but changed the language’s name to SQL when faced with a trademark dispute. The ANSI and ISO standards dictate that the official pronunciation is the initialism (“S.Q.L.”), but both pronunciations are common today. So go with whatever sounds best to you — just don’t be surprised when someone disagrees with you. Querying databases with SQL No matter how advanced or complex, all SQL queries involve telling a database to return certain columns from a table (or tables), and then optionally specifying conditions about which rows should be included in those results and how they should be presented. SQL is case-insensitive, but you’ll often see people capitalizing reserved words (e.g., functions and clauses like SELECT, WHERE, HAVING, or ORDER BY). You can format your SQL statements as a single line if you like, though people will usually break their queries out onto separate lines for readability. Example SQL query Here’s a SQL query that asks Metabase’s {% include gref key=”sample_database” text=”Sample Database” %} to return a table of orders where the order subtotal was greater than $100: SELECT * FROM orders WHERE subtotal > 100 We can break this query down into three statements: SELECT * tells the database to return every {% include gref key=”column” text=”column” %} in the table. FROM orders tells the database which table that is. WHERE subtotal > 100 tells the database to {% include gref key=”filter” text=”filter” %} results and only return {% include gref key=”row” text=”rows” %} where the value in the Subtotal field is greater than 100. The example query above is a pretty simple one; more advanced queries can include {% include gref key=”join” text=”joins” %}, {% include gref key=”aggregation” text=”aggregations” %}, {% include gref key=”cte” text=”CTEs” %}, and other tools for pulling and organizing data. SQL in Metabase You don’t have to write SQL when asking {% include gref key=”question” text=”questions” %} in Metabase (that’s what the {% include gref key=”query_builder” text=”query builder” %} is for), but if you prefer SQL queries, the {% include gref key=”native_query_editor” text=”native query editor” %} is there for you, along with features like: SQL variables (including field filters) SQL snippets SQL snippet controls (available in some plans) And if you do opt for using the query builder to ask questions in Metabase, you can always view the underlying SQL that powers your question or convert it to a native SQL query.

Read More
SSO

An authentication (auth) setup that lets people use one login to access independent apps.

What is SSO? SSO is an authentication (auth) setup that lets people use one login to access independent apps. It’s a bit like using your passport to get into different countries. With SSO, you don’t need to use one login for each account, just as you don’t need to travel with different pieces of ID specific to each country. For example, you might have two different logins that are used with two different sign-in pages for your email and your online banking. If the IT teams at your email provider and bank each set up SSO, you would be able to use one sign-in page and login to access both websites. {% include image_and_caption.html url=’/glossary/images/sso/google-sign-in.png’ description=”Fig. 1. You’ll probably recognize this Sign in with Google prompt from other apps. This is an example of SSO set up with Google sign-in.” %} How does SSO work? Since digital authentication can’t be done in person, your identity on the internet is checked by a service that asks you for proof of something you know (like a password), or proof of something you physically have (like sending a code to your phone). These pieces of proof are called identity factors. The most common way to authenticate is through a single identity factor, such as a password. If you add another factor, like a phone number, you get two-factor authentication (2FA). Naturally, you can continue adding factors, (an email, a verification app, etc.), to get multi-factor authentication (MFA). Instead of using information that you know or have, SSO uses an identity factor called an authentication token that belongs to the SSO provider. An authentication token is a unique, anonymous piece of information that’s generated when you sign in to the SSO provider. The token is temporarily stored in your browser (like a browser cookie) or on the provider’s servers, and is only valid for a period of time — usually until you close the browser, or within an expiry window set by your security team. When you go to an app that is set up with SSO, it’ll automatically ask for the authentication token from the SSO provider, instead of asking you to log in. If the token is still valid (you’ve signed in to the SSO provider in the same session, and it hasn’t expired), your identity is considered authenticated, and you’ll be allowed into the app. If the token has become invalid, you’ll be prompted to sign in with the SSO provider to create a fresh one. Where does SSO fit into the bigger picture? Authentication only deals with who you are (identity). From there, other services keep track of where you are allowed to go, and what you can do once you get there (access management). Identity and access management (IAM) is the umbrella term for these tools and processes. SSO and other parts of an IAM toolkit are often packaged up by identity providers (IdPs), such as Okta, Auth0, or OneLogin, and implemented as part of cloud security. SSO in Metabase Setting up SSO in Metabase means that people won’t need to create a separate Metabase username and password to access your organization’s data. They can simply log in through the same account as your chosen identity provider. The open-source edition of Metabase can be set up with Google SSO or {% include gref key=”ldap” text=”LDAP” %}. Paid editions of Metabase work with {% include gref key=”saml” text=”SAML” %} and {% include gref key=”jwt” text=”JWT” %} standards (in addition to Google SSO and LDAP). SSO can also be combined with {% include gref key=”data_sandbox” text=”data sandboxing” %} in Metabase paid plans to define the data that people can see and interact with, based on user {% include gref key=”attribute” text=”attributes” %} such as their department or role.

Read More
Schema

The design or structure that defines the organization of a dataset, including its tables, columns, relationships, data types, and integrity constraints.

What is a schema? A schema is the design or structure that defines the organization of a dataset: which columns are grouped into tables, how those tables relate to each other, and the rules and data types that define those columns. Schema is an overloaded term; it’s an abstract word that has accumulated a lot of different definitions, and as a result can be confusing to sort out. Depending on the context, schema can mean: The overall structure, specification, or “blueprints” of your database A diagram that demonstrates how tables in your database relate to each other A single collection of tables (among many) within your database Finally, schema sometimes means something specific to whichever database platform you’re using, like in Oracle, where schema refers to all objects in a database created by the same user. Schema as overall structure: design and implementation Once you’ve figured how your data fits together from a high-level standpoint (that is, your conceptual {% include gref key=”data_model” text=”data model” %}), the next step is creating a schema that reflects that data model, bringing it from the abstract to a database that your organization can use and populate with information. Broadly speaking, this process is made up of two major steps: Design: map out the structure of your database, creating an entity relationship diagram (ERD) in the process. Implementation: Use that ERD to generate the SQL commands that, when run in your database, will create your desired schema. What your schema design process looks like depends on whether you’re dealing with transactional or analytical databases, and whether you’re starting from scratch or have already begun collecting data. Regardless of at which point you’re designing schemas, you’ll have to think deeply about the needs of your organization and what questions you anticipate asking of your data. Schema-on-write vs. schema-on-read Most traditional {% include gref key=”relational_database” text=”relational databases” %} use a schema-on-write system, where data gets verified and formatted into a schema before it’s written to that database. Since the data being written must conform to whatever specific data integrity rules you’ve established (like requiring that all values in a field be unique, not accepting null values in a field, or formatting dates a certain way), adding this new data to your database can be slow. However, the read times are fast, since that data has already been verified. In a schema-on-read system, data (like in a {% include gref key=”data_lake” text=”data lake” %}) is only verified once it’s been read, or pulled from that database. Schema-on-read systems tend to be more flexible, as you can store unstructured data without worrying about it conforming to a rigid data model. In this case, writing data is faster (since that data doesn’t need to be verified as it gets loaded in), but queries take more time to execute. Whether you opt for a schema-on-write or schema-on-read strategy will depend on your organization’s needs and specific use cases. If having meticulously structured and consistent datasets is important to your organization, a schema-on-write system may be your best bet. By contrast, if you regularly need to pull in a wide variety of data without always knowing exactly what it that data looks like, you may want to use a schema-on-write system. Logical and physical schemas Regardless of whether you’re working with a schema-on-write or schema-on-read system, you’ll also need to think about database structure and its implementation — that is, your logical and physical schemas. Logical schemas define the structure of your data, while the actual implementation of that structure (like how and where you store the files and code that make up your database) belongs to a physical schema. Logical schemas Logical schemas are created by diagramming how tables and their fields relate to each other. In creating a logical schema, you’ll establish tables, relationships, fields, and views, answering questions like: What data are we collecting, or do we want to collect? What tables does your database (or individual schema within it) need? How do those tables relate to each other? What fields does does each table need? What are the data types for each of those fields? Which fields are required? Schema as diagram: mapping out entities and relationships In answering these and other questions, you’ll likely sketch out an {% include gref key=”erd” text=”entity relationship diagram (ERD)” %} that defines each table, its fields, their integrity constraints, and the relationships between those tables, including the {% include gref key=”entity_key” text=”primary” %} and {% include gref key=”foreign_key” text=”foreign keys” %} that establish those connections and whether those relationships between tables are one-to-one, one-to-many, or many-to-one. Visualizing your tables and how they relate to each other can also bring to light any major omissions or conflicts. And yes, sometimes you’ll see these diagrams themselves referred to as schemas. The image below shows an entity relationship diagram of a schema with two tables, PRODUCTS and MANUFACTURER. The “(PK)” and “(FK)” notations tell us which fields are primary and foreign keys, and the line linking these tables indicates a one-to-many relationship, in that one manufacturer can be linked to many products. {% include image_and_caption.html url=”/glossary/images/schema/simple-erd.png” description=”Fig. 2. An entity relationship diagram of a schema with two tables.” %} You can map out your schema on paper or using design software that can directly translate your diagram to the SQL commands that you’ll need to implement your database. At this point your schema is platform-agnostic; mapping out those rules and relationships doesn’t tie you to any single database software. Physical schemas Once you’ve identified the logical configuration of your database, you’ll create a physical schema to implement it into a specific RDBMS, defining where your database files will live as well as their storage allocation on a disk. Schema as one collection of tables among many While a single collection of tables may be sufficient if your database only sees a few users and contains data that everyone needs to access, you may find that a relying on a single schema in your database doesn’t cut it for your organization. If you’re handling data across a lot of tables (think in the dozens, hundreds, or thousands), grouping those tables into separate schemas will help from an organizational standpoint, making it so you can store similar information together while retaining the ability to query across schemas when necessary. Keeping multiple schemas within a database can be helpful from a security standpoint as well, like separating tables that hold sensitive information into a schema that only those who need to can access, usually in combination with {% include gref key=”view” text=”views” %}. Schema design for transactional vs. analytical databases When thinking about schemas for {% include gref key=”transactional_database” text=”transactional databases” %} (also known as operational databases), your data will need to be normalized to some extent and adhere to data integrity standards, since efficiency and performance for those small transactions and {% include gref key=”oltp” text=”OLTP” %} are crucial. Designing a schema for an {% include gref key=”analytical_database” text=”analytical databases” %} will look different. For starters, you’ve probably already collected raw data, possibly from multiple sources, and now need to impose some structure in order to analyze it. In this case, redundancy is okay, as analytical databases place greater emphasis on explorability and less on performance. Here your schema can also be more loosely defined, as no fixed patterns (like {% include gref key=”normalization” text=”normalization” %}) are needed. Schema design for analytical databases is more about understanding where data from your various sources lives, and knowing what tables you’ll need to join to answers questions you have. Star schema One common structure you’ll see applied to analytical databases is a star schema, which separates data into fact tables (that is, quantitative data) that relate to multiple {% include gref key=”dimension” text=”dimension” %} tables describing those facts. In a simple implementation of a star schema, several dimension tables all surround and relate to a single fact table, looking like a star in diagram form with the fact table at its center, like so: {% include image_and_caption.html url=”/glossary/images/schema/star-schema.png” description=”Fig. 2. An entity relationship diagram of a simple star schema.” %} Tables within a star schema are typically denormalized, which leads to better performance for analytical queries. Creating a database schema Most database platforms (such as Redshift and PostgreSQL) use “schema” to mean the configuration of a dataset and non-nested groupings of tables and other named objects within that dataset, though Oracle defines schema as all of the objects created by and belonging to a single database user. To create a schema within your RDBMS, use the query CREATE SCHEMA, like in this example, where we create a schema with two tables that are linked by the customer_id field: CREATE SCHEMA new_schema; CREATE TABLE new_schema.orders ( order_id product_id customer_id subtotal order_date ) CREATE TABLE new_schema.customers ( customer_id customer_name customer_address customer_email ); This is a very simple schema; we didn’t specify data types or any other constraints on the fields in our tables. If we wanted to require the customer_id field in the customers table and indicate that its data type is an integer, we’d format that field like this: customer_id INT NOT NULL Note that in MySQL, CREATE SCHEMA is synonymous with CREATE DATABASE.

Read More
Segment

A specific subset of a larger group of items, like a certain grouping of customers.

What is a segment? A segment is a specific subset of a larger group of items, like a certain grouping of customers. The process of defining and creating these subsets is known as segmentation. For example, you may want to segment your customers based on demographic, recent activity, or some other {% include gref key=”attribute” text=”attribute” %}. In Metabase, Segments are named {% include gref key=”filter” text=”filters” %} or sets of filters that people can apply to {% include gref key=”question” text=”questions” %} asked using the {% include gref key=”query_builder” text=”query builder” %}. Admins can define segments to get everyone on the same page about which customers count as “New user” or which are a “Churn risk”, or however else you want to slice up your data. If you’re partial to {% include gref key=”native_query” text=”native SQL queries” %}, {% include gref key=”sql_snippet” text=”SQL snippets” %} serve a similar purpose as segments do for {% include gref key=”gui_question” text=”GUI questions” %}, ensuring consistent definitions that people can plug into their queries. Example segment in Metabase Figure 1 shows a look at the Products table in Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}. You’ll notice that the filter sidebar shows three segments at the top of the list, identifiable by a star icon. With these segments in place, we won’t need to recalculate what a high margin, new, or top rated product is each time we want to draw on those definitions. {% include image_and_caption.html url=”/glossary/images/segment/segment-example.png” description=”Fig. 1. Three segments that we’ve added to the Products table: high margin, new products, and top rated products.” %}

Read More
Serialization

The process of storing the state of an application or other object as a file so that it can be reloaded (or deserialized) later.

What is serialization? Serialization is the process of storing the state of an application or other object as a file, so that it can be reloaded (or deserialized) later. In the serialization process, that complex object gets converted to a stream of data, one that’s much easier to transfer over a network or copy to a new instance. Think of serialization like taking a snapshot of some application, but with the power to turn that flattened snapshot back into a live object whenever you’re ready. Serialization comes in handy if you need to manage multiple instances of an app, like development or staging instances in addition to the ones you use for production. Serialization can also be used to customize an initial state of an application, like preloading a Metabase instance with connection details to a database, predefined groups, {% include gref key=”dashboard” text=”dashboards” %}, and so on. Serialization in Metabase {% include plans-blockquote.html feature=”Serialization” %} Metabase’s serialization feature (available in some plans) lets you capture and export the contents of your Metabase so that you can reload them into another instance — or several. Those Metabase artifacts you capture when serializing are collectively known as a data dump, containing YAML files that hold your {% include gref key=”collection” text=”collections” %}, dashboards, {% include gref key=”saved_question” text=”saved questions” %}, and some system settings — check the documentation for the full list.

Read More
Summary table

The result of an aggregation that gets saved in a database or data warehouse so that people can work with those precomputed metrics.

What is a summary table? A summary table is the result of an aggregation that gets saved in a database or data warehouse so that people can work with those precomputed metrics. The term “summary table” can get confusing, since some people use “summary table” to describe any result of an {% include gref key=”aggregation” text=”aggregate function” %}, like the table that you get after filtering and grouping by some {% include gref key=”measure” text=”measures” %} and {% include gref key=”dimension” text=”dimensions” %}. By this definition, a summary table is basically the same thing as a {% include gref key=”pivot_table” text=”pivot table” %}, minus the pivoting. The difference here comes down to whether or not those tables get saved within your {% include gref key=”data_warehouse” text=”data warehouse” %}. Creating summary tables in your data warehouse can make it easier for people to generate reports without having to query raw data. In this sense, summary tables function a lot like materialized views (which don’t necessarily aggregate data). Example: summary tables in a data warehouse For example, maybe you’re working with an {% include gref key=”analytical_database” text=”analytical database” %} that uses a star schema setup, with a fact table containing tens of thousands of individual orders records, surrounded by dimension tables that describe those orders. If someone at your organization wants to generate a weekly report containing sales data by product category from the last seven days, calculating that from your raw fact and dimension tables every time will be inefficient and costly. Instead, creating a summary table lets you join those tables and aggregate that data far less often. Then in the future when someone creates that report, they can do so using the summary table as a base, rather than needing to calculate those numbers every time from scratch. While there’s some maintenance associated with summary tables (like making sure your data refreshes on a schedule or adjusting the filters and groupings if they aren’t exactly what people need), they still tend to be a highly efficient way of working with large datasets.

Read More

T

Table

Data’s natural habitat. In a database, a table is a series of fields, with the values of those fields arranged in rows.

What is a table? A table is data’s natural habitat. In a {% include gref key=”relational_database” text=”database” %}, a table is a series of {% include gref key=”field” text=”fields” %}, with the values of those fields arranged in {% include gref key=”row” text=”rows” %}, each row with a value corresponding to a field. Tables in databases are connected via {% include gref key=”entity_key” text=”entity” %} and {% include gref key=”foreign_key” text=”foreign keys” %}. Databases are made up of tables, but table can also refer to a type of data visualization, or chart. The table visualization resembles a spreadsheet, with {% include gref key=”column” text=”columns” %} corresponding to fields (or {% include gref key=”aggregation” text=”aggregations” %}, in the case of {% include gref key=”custom_column” text=”custom columns” %}). Example table in Metabase Metabase’s {% include gref key=”sample_database” text=”Sample Database” %} contains four tables: Products, Orders, People, and Reviews. Figure 1 shows a {% include gref key=”question” text=”question” %} in Metabase visualized as a table. This question adds one {% include gref key=”filter” text=”filter” %} to the People table, so that our result is a table of our customers whose State is Georgia: {% include image_and_caption.html url=”/glossary/images/table/example-table.png” description=”Fig. 1. A table showing People in Georgia (GA).” %} With the toggle below the table, we can alternate between viewing this question as a table and as a {% include gref key=”pin_map” text=”pin map” %} (or whatever other visualization you’ve chosen).

Read More

V

Variable

Any value in a program or query that can change. In Metabase, variables in SQL queries get enclosed in double braces.

What is a variable? A variable is any value in a program or query that can change. In {% include gref key=”sql” text=”SQL” %}, declaring a variable allows you to temporarily store a single value while running a query. {% include gref key=”parameter” text=”Parameters” %} are a type of variable, but not all variables are parameters. When people talk about parameters, they’re usually referring specifically to variables that get modified by the end user of a dashboard or report, rather within the text of a query itself. Example variable in Metabase In Metabase, variables are placeholders in {% include gref key=”native_query” text=”SQL queries” %} for values that people can change without needing to rewrite the query itself. Using variables allows you to {% include gref key=”filter” text=”filter” %} your data, often by adding a {% include gref key=”filter_widget” text=”filter widget” %} above that question in the {% include gref key=”native_query_editor” text=”SQL editor” %}. Variables get enclosed in double braces, like this: {% raw %}{{variable_name}}{% endraw %}. In the example below, we create a variable to filter based on the Source field in the {% include gref key=”sample_database” text=”Sample Database’s” %} People table: SELECT * FROM people WHERE source = {% raw %}{{source}}{% endraw %} When you include a variable in your query (in this case {% raw %}{{source}}{% endraw %}), Metabase adds a filter widget above the SQL editor, like in figure 1. Since the filter widget maps to the variable we created, we can plug different values into it to filter for different sources. {% include image_and_caption.html url=”/glossary/images/variable/variable-example-source.png” description=”Fig. 1. The filter widget above the query editor maps to the variable wrapped between double braces.” %}

Read More
View

A query and its results that function like a virtual table in your database.

What is a view? A view is a query and its results that function like a virtual table in your database. Databases compute views on demand, meaning they aren’t precomputed or materialized, and for that reason don’t occupy any storage space in the database. You can think of views as virtual or logical tables. Database views let you combine information from multiple tables and format that information however best makes sense for the people who need to query it. You (or a database administrator) can create a view that hides unnecessary fields in a cluttered table or joins tables to bring together relevant data. By using the view as the starting point, people won’t need to run the same complex queries every time just to get to their actual question about the data. The downside to querying views is that those queries can be time-consuming to run, especially if that view is the result of several tables or multiple {% include gref key=”join” text=”joins” %}. Database administrators also use views for security purposes, like creating views that hide certain fields that exist in the base table. That way, other users can still access and query the data they need without gaining access to sensitive fields or rows. View vs. materialized view If views are virtual tables (computed as needed), then materialized views are like regular tables in a database. While views require that a query be rerun every time that view is referenced, a materialized view is a precomputed view that’s saved in the database. So, materialized views take up space in your database, but since the database doesn’t have to compute materialized views each time, they perform much faster when querying than standard database views (it’s like querying a normal table). When you should (and shouldn’t) use database views Creating views in your database is a good idea if: You need to access the results of a complex query on a regular basis and don’t want to type that query out every time. You’re looking to strengthen database security by restricting access to sensitive information. You want to create custom columns without altering the underlying structure of your database. You’d like to simplify the appearance of your tables by hiding fields that are unlikely to be queried. However, if the underlying structure of your database is subject to change, you probably don’t want to rely on views; the moment a field name changes, the query you’ve established as a view may break. Your BI tool probably has features that function sort of like views too, whether it’s a {% include gref key=”model” text=”model” %}, {% include gref key=”saved_question” text=”saved question” %}, or {% include gref key=”sql_snippet” text=”SQL snippet” %}. The important distinction here is that those are all features that exist with the world of the BI tool, whereas views (materialized or not) are built into your database itself. Example view Using Metabase’s {% include gref key=”sample_database” text=”Sample Database” %}, let’s say we wanted to create a view based on the People table that our team in Pennsylvania could use to access information like the names, addresses, birthdays, and emails of our PA-based customers, but not user passwords. We’d create that view in our database by running query shown below, which creates the view, names it pennsylvania_customers, includes only the columns we want from the People table, and only displays records where the value in the State field is the abbreviation for Pennsylvania (PA). CREATE VIEW pennsylvania_customers AS SELECT id address email name city state birth_date zip created_at FROM people WHERE state = 'PA' Then for future queries, our team in Pennsylvania could access the information they need about their customer base by querying pennsylvania_customers as their starting point. While views are a pretty fundamental feature of any SQL-based database or data warehouse, the specifics of creating, materializing, and maintaining them may differ depending on which database software or data warehouse you use.

Read More

W

X