The 2023 Metabase Community Data Stack Report

A look at the current state of data tooling and self-service analytics

Earlier this year, we shared an anonymous data stack survey through social channels and email to learn more about data tooling and its impact across different company sizes and roles.

The survey was open to anyone, but of the 189 responses we received, 89% came from Metabase customers.

Because of the small sample size, we can’t claim our insights are a statistically significant representation of data folks everywhere. Still, the results turned up a few things you may want to know, like how one specific database could be worse for your team’s morale... Keep reading to find out more.

Larger companies are more likely to choose a data tool if it's open source

75% of survey respondents said they’re using an open-source production database, so it’s no surprise that Postgres and MySQL were the databases mentioned most often throughout the survey. The surprise: larger companies choose an open-source production database more often than not.

Respondents at large companies named open source, ahead of performance, scalability, and security, as the deciding factor in choosing their production database.

Following that trend, 75% of respondents at larger companies said dbt is currently part of their data stack. Open source was also among their top three reasons for choosing a data modeling tool.

Open source showed up in every corner of the survey results, which isn’t surprising given that our community rallies around open-source tools of all kinds, not just BI.

Customer data is more valuable than ever. Social media data? Not so much.

Salesforce was the number one upstream data source. Stripe and Slack were also in the top five. We’re curious to know what Slack data you’re ingesting...

Speaking of the top five: these tools all hold quite a bit of PII, so it surprised us that 92% of respondents didn’t pick security and compliance as their top reason for choosing a data storage option. (This warrants a whole other survey...)

What wasn’t shocking to see: many folks aren’t ingesting data from one social media platform anymore.

X, formerly known as Twitter (RIP), barely made our top ten upstream data sources. That $42,000-per-month enterprise API price tag may have been the final nail in the coffin.

Even with all of the options on the market, most companies still keep data ingestion in-house

Airbyte and Fivetran rounded out the top three, but in-house data ingestion was still more popular than the two combined.

Maybe legacy architecture forces people to build in-house ingestion tools. Or the cost of third-party tooling outweighs the benefits.

It could also just be that third-party ingestion tools are still growing, so maybe we’ll see a shift to them in the coming year.

Still, a good number of companies are choosing to build their own ingestion pipelines. We’ve seen a similar trend in data cataloging (more on that below).

You can keep those Python scripts handy for now. In-house data ingestion looks poised to stick around as a complement to commercial offerings rather than being replaced entirely by third-party ingestion tooling.

Is the future of data cataloging... not data cataloging?

40% of those using a data catalog said they host it with an in-house tool, and no single commercial tool came close to that share.

Less surprisingly, an overwhelming majority of respondents said they’re not using a data catalog at all.

80% of respondents from small companies said they either don’t use a data catalog or don’t know if they use one. 75% of respondents from midsize companies reported the same.

Large companies were the exception: 77% of respondents there said they do use a data catalog. Even so, data cataloging has a bit of a bad reputation.

Switching between multiple tools in the modern data stack, and then adding a data catalog on top of all that, is a well-known pain point. Data cataloging will need some ingenuity if it’s going to stay relevant.

Now feels like the time for existing data tools to offer customers new ways to organize their data assets, and that may mean finding alternatives to data cataloging.

Postgres is the most satisfying database... even more so if you're on a distributed team

Although it’s one of the most widely used databases in the industry, MySQL had the lowest role satisfaction score of the three most commonly used analytics databases.

You may want to rethink your database... and your return-to-office policy, too. Those happiest in their role said they use PostgreSQL in a distributed team setting.

If you’re using MySQL and have opposing opinions to share, we’re all ears. As for our theory on MySQL's lower score: it's a battle-hardened database, but maybe MySQL is keeping older (less fun) codebases afloat.

Postgres users also rated their companies as more self-serve than users of other analytics databases did, so Postgres may be a wise choice if you’re a global, fully remote team.

Self-service scores were higher for distributed teams, but one role scored differently than the rest

People working on distributed teams rated their companies as more self-serve than people on localized teams did. That one is pretty straightforward: distributed companies need self-service tools and processes so people can work asynchronously and query data on their own time.

But the results around employee satisfaction come with one large caveat: perceptions of self-serve differ by role.

People in data roles perceived their companies as less self-serve than their C-level and Engineering counterparts did.

It’s not surprising that C-levels and engineers see their companies as more self-serve. They’re the ones using the self-serve tooling.

These results could mean that self-serve is doing what it was intended to do. They could also mean that data analytics folks think their companies aren’t as self-serve as they’d hoped. The variation isn’t huge, but it’s worth keeping an eye on.

The good news is we can let you know if that changes! Fill out the survey below to help us figure it out.

The future of the data stack survey

The data stack survey is still open. You can submit your answers now via the form. We’ll create follow-up posts on new, interesting findings as they roll in.

The dashboard and this report are static data for you to use. If you do use the data for something cool, make sure to share it with us!
