Apache Spark

Connect Metabase to Apache Spark, an open-source unified analytics engine


Type: Official Connector (built and managed by Metabase, available in all editions)

Website: spark.apache.org

Support: Unlimited technical help available on paid plans

From multi-format data in Apache Spark, to insights in Metabase

If you’re using Apache Spark, you’re probably handling large-scale, computationally intensive queries and need a business intelligence tool that can keep up. Maybe you’ve got a lot of data that you need to query and make sense of quickly, without lag. Metabase lets your whole team visualize and explore your data in Apache Spark, with or without SQL. Run native queries on nested data and scale as your data grows.

Easy-to-use data exploration tools for people of all levels

Get a business intelligence tool with a friendly UX that lets everyone make sense of your data in Apache Spark.

  • Interactive dashboards with click-to-explore functionality that load as fast as Apache Spark does.
  • Click to drill through on interactive charts and dashboards, zoom in on timelines or areas of interest, and break out the data.
  • Ask questions with nothing more than clicks in the Query Builder - no SQL required (or use the SQL editor if that’s more your style).
  • Set up models and metrics to give less technical teammates metadata-rich starting points, with trickier stuff like joins taken care of.

Give data access with granularity

Take control over data access and permissions to keep everyone in their own lane for maximum governance.

  • Granular permissions for viewing and querying data, down to the row and column level, so people can see (and do) what they need to and nothing else.
  • Manage people and permissions with SSO to map permissions to user groups and attributes.
  • Detailed usage analytics let you see who did what and when, for compliance and performance.

Share data with your team or your customers, easily

Put dashboards and charts in front of people with as much interactivity and room to pull threads (or as little) as you want.

  • Customer-facing analytics is just a snippet away. Embed all of Metabase in your app, or just a dashboard.
  • Export charts and dashboards to PDF or CSV, or share them via a public link.
  • Set up subscriptions for regularly scheduled updates, even for people without a Metabase login.
  • Get alerts when things change unexpectedly.
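
To give a feel for what "just a snippet away" can look like, here is a minimal sketch of signing an embed URL for a single dashboard. It assumes Metabase's signed (static) embedding, where the server signs a short-lived JWT with the embedding secret. The payload layout, the `/embed/dashboard/` path, and the `#bordered=true&titled=true` options reflect that convention as we understand it; the site URL, secret, dashboard ID, and the `region` parameter are hypothetical placeholders. Check the embedding documentation before relying on any of it.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_embed_token(secret: str, dashboard_id: int, params: dict,
                     ttl_seconds: int = 600, now: Optional[int] = None) -> str:
    """Build an HS256-signed JWT naming one dashboard (assumed payload layout)."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "resource": {"dashboard": dashboard_id},  # which dashboard to show
        "params": params,                         # locked filter values
        "exp": issued + ttl_seconds,              # expiry, seconds since epoch
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def embed_url(site_url: str, token: str) -> str:
    """Assemble the iframe URL (path and options are assumptions)."""
    return f"{site_url.rstrip('/')}/embed/dashboard/{token}#bordered=true&titled=true"


# Hypothetical values, for illustration only.
example_token = sign_embed_token("replace-with-embedding-secret", 7, {"region": "EMEA"})
print(embed_url("https://metabase.example.com", example_token))
```

The resulting URL would go in the `src` of an iframe in your app. Signing on the server keeps the secret out of the browser, and the short expiry limits how long a copied link stays valid.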

In-warehouse Apache Spark analytics without extracting data

Metabase runs direct queries in Apache Spark without extracts, so your reports are always up to date with your Apache Spark data and you don’t need to move large data sets.

Metabase features with Apache Spark


Available with all data sources

  • Unlimited queries, charts, and dashboards
  • Send dashboards and reports via email and Slack
  • Connect to multiple data sources and integrations
  • Single sign-on via SAML, LDAP, or JWT
  • Interactive embedding with white label customization
  • Granular row- and column-level permissions

Keep everything in your own cluster

Self-host Metabase and Apache Spark to keep everything on your terms. Get your token and go. Both are open source, with optional cloud hosting.

Frequently asked questions

What’s the best business intelligence tool to connect to Apache Spark?

Apache Spark pairs with a number of BI tools, each with its own pros and cons. Metabase is the most effective way to let everyone on the team start working with data. It has a low learning curve and sophisticated but easy-to-use data tools, such as the Query Builder, which lets people ask questions without SQL, and simple drill-through, zoom-in, and breakout functionality that lets people learn more from data with just a few clicks.

You can set up and connect Metabase to Apache Spark in about 5 minutes and start querying immediately, with drill-through functionality automatically generated and ready for people to start uncovering insights. Metabase is also open source and affordable, with plans and pricing that scale with you.

How does Metabase connect to Apache Spark?

You can connect to Apache Spark when you’re setting up a new Metabase instance, or add a database connection at any time in your admin settings:

To add a database connection, click on the gear icon in the top right, and navigate to Admin settings > Databases > Add a database.

For the full details on connecting Metabase to Apache Spark, check out our documentation.
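
If you would rather script the connection than click through the admin settings, the sketch below shows one way it might look using Metabase's REST API. Treat the specifics as assumptions to check against the documentation: the `POST /api/database` endpoint, the `x-api-key` header, the `sparksql` engine key, and the field names under `details` are as we recall them. The host, the default port of 10000 (commonly used by the Spark Thrift server), and the credentials are hypothetical placeholders. The code only builds the request; nothing is sent.

```python
import json
import urllib.request


def spark_connection_payload(name: str, host: str, port: int = 10000,
                             dbname: str = "default", user: str = "",
                             password: str = "") -> dict:
    """Describe a Spark SQL connection (engine key and field names are assumptions)."""
    return {
        "name": name,            # display name shown in Metabase
        "engine": "sparksql",    # assumed driver key for Apache Spark
        "details": {
            "host": host,
            "port": port,
            "dbname": dbname,
            "user": user,
            "password": password,
        },
    }


def build_add_database_request(metabase_url: str, api_key: str,
                               payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request that would add the database."""
    return urllib.request.Request(
        url=f"{metabase_url.rstrip('/')}/api/database",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Hypothetical values, for illustration only.
payload = spark_connection_payload("Spark warehouse", "spark.example.internal",
                                   user="metabase", password="change-me")
request = build_add_database_request("http://localhost:3000", "replace-with-api-key", payload)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit it. Building the payload separately from sending it makes the connection details easy to review or keep in version control, minus the password.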

Can I use permissions from my Apache Spark database in Metabase?

Apache Spark permissions cannot be impersonated in Metabase for now. Impersonation is currently possible for PostgreSQL, MySQL, and Snowflake databases.

With granular row-level permissions and user group mapping, you can effectively set up permissions to match those applied in Apache Spark.

Learn more about Metabase permissions in our documentation.

How can you visualize tables in Apache Spark?

Metabase fits with Apache Spark as a querying and visualization layer on top of your data. With Metabase you can query data in Apache Spark - with or without SQL - to create a broad range of visualizations and tell a story with interactive dashboards. Viewers can filter and drill through to get what’s most relevant, and dig deeper into what’s important to them. Visualizations and dashboards can even be shared or embedded in your app.

How can you query data in Apache Spark?

Apache Spark helps you analyze unstructured, semi-structured, and structured data stored in sources such as Amazon S3 - but you’ll likely need pretty advanced data and technical skills to be able to do it.

Metabase makes it possible for everyone in the team to run their own reports, without data skills or relying on someone else to write SQL for them. People used to working in Excel can leverage skills usually reserved for spreadsheets to get the answers they need from data in Apache Spark.

How can you create dashboards using Apache Spark?

Metabase lets you bring together charts, visualizations, and questions into interactive dashboards that can be shared with your team and customers.

The automatically generated drill-through menu lets people click on charts to zero in on a particular category or parameter for further analysis, view individual records, or zoom in on a targeted date range. You can also add filters to let people slice the data on what’s most important to them, and add custom click behaviors to guide data discovery (e.g. send people to a related dashboard).