These are the docs for the Metabase master branch. Some features documented here may not yet be available in the current release. Check out the docs for the current stable version, Metabase v0.58.
Python transforms
Python transforms require the Transforms add-on.
Use Python to write transforms.
How Python transforms work
Python-based transforms require a dedicated Python execution environment.
- In your Metabase, you write a Python script that returns a pandas DataFrame and uses one or more tables from your database.
- Metabase will spin up a new Python execution environment and run the transform there (not on your Metabase instance).
- Metabase securely copies your source data to your Python environment and makes it available as pandas DataFrames.
- The Python environment executes your Python script in memory and saves the resulting DataFrame as a file.
- Your Metabase reads the DataFrame file, writes the results to a new table in your database, and syncs the table to your Metabase.
- On subsequent transform runs, your database will overwrite that table with the updated results unless you mark the transform as incremental.
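To make the flow above concrete, here is a rough local imitation of the steps. It is only a sketch: the table, the column names, and the use of CSV as the intermediate file format are assumptions made for illustration, and Metabase's actual runner may work differently.

```python
import io

import pandas as pd


def transform(orders):
    # The script you write: takes input tables as DataFrames, returns one DataFrame.
    return orders.assign(total=orders["quantity"] * orders["unit_price"])


# Source data copied into the Python environment as a DataFrame (made-up sample).
orders = pd.DataFrame(
    {"id": [1, 2, 3], "quantity": [1, 2, 3], "unit_price": [10, 10, 10]}
)

# The script runs in memory.
result = transform(orders)

# The resulting DataFrame is saved as a file (CSV is an assumption here).
buffer = io.StringIO()
result.to_csv(buffer, index=False)

# The file is read back; in Metabase this is what gets written to the target table.
buffer.seek(0)
target_table = pd.read_csv(buffer)
print(target_table)
```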
Set up a Python runner
To execute Python transforms, you’ll need a Python runner - a dedicated environment for running Python code.
If you’re on Metabase Cloud, get the Transforms add-on, and you’re good to go.
If you’re self-hosting Metabase, you’ll need to set up a self-hosted Python runner. See Python runner.
Create a Python transform
Once you’ve set up the Python runner:
- Go to Data studio > Transforms.
- Click on + New and select Python script.
- Select a database that has the data you want to transform. See Databases that support transforms.
- Select one or more tables with the data that you’d like to transform. Optionally, assign aliases to the tables.
The tables you select will be available as DataFrames in your Python code with your chosen aliases, and will be passed to the transform() function as parameters. For now, all the tables you pick must be in the same database.
- Create a function transform() that does the data wrangling and returns a pandas DataFrame. See Tips for writing Python transforms. The DataFrame returned by the function will be written back to your database when the transform is run.
- To test your transform, press the Run Python script button at the bottom right of the editor.
Metabase will pull 100 rows from each input table and run your transform on those rows. You’ll be able to see the result in the Results preview tab below the editor. You can see any other output (e.g. outputs of any print statements in your code) in the Output tab.
The transform preview will only use the first 100 rows from each input table. This means that you might not see the real results of your transforms in preview: for example, if your transform renames the values of Doohickey to Widget, and the first 100 records of the input table happen to not contain any doohickeys, you won’t be able to preview the result.
- Once you’re done with your code, click Save in the top right corner.
- Select a target schema for your transform and enter a name for the target table. Metabase will write the results of the transform into this table.
You can only write back to the same database as you chose for the transform source.
- Optionally, make your transform incremental. See Incremental Python transforms.
- Optionally, assign tags to your transforms. Tags are used by jobs to run transforms on schedule.
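As an illustration of the steps above, here is a minimal sketch of a transform that takes two tables with the aliases orders and products. The aliases, column names, and sample data are made up, and the sketch assumes the parameter names of transform() match the aliases you chose. The last lines roughly imitate the 100-row preview with head(100) to show why a preview can miss changes that the full run would make.

```python
import pandas as pd


def transform(orders, products):
    # Rename the Doohickey category to Widget, then join orders to products.
    products = products.assign(
        category=products["category"].replace({"Doohickey": "Widget"})
    )
    result = orders.merge(products, on="product_id", how="left")
    # print() output shows up in the Output tab when you test the script.
    print(f"Rows in result: {len(result)}")
    return result


# Made-up sample data: Doohickeys only appear after the first 100 rows.
products = pd.DataFrame(
    {
        "product_id": range(1, 151),
        "category": ["Gadget"] * 100 + ["Doohickey"] * 50,
    }
)
orders = pd.DataFrame({"order_id": range(1, 151), "product_id": range(1, 151)})

# Rough imitation of the preview (first 100 rows of each input) vs. a full run.
preview = transform(orders.head(100), products.head(100))
full = transform(orders, products)
```

In this sample, the preview contains no Widget rows even though the full result does, because none of the first 100 product rows are Doohickeys.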
Tips for writing Python transforms
- Metabase will automatically add import common to the code of your Python transform. This imports the common Python library, which you can use for reusable functions and classes.
- A Python transform must have a function transform() that returns a single pandas DataFrame.
- You can use aliases to include tables from your database as DataFrames inside the transform() function. The tables will only be available in the transform function. Other functions won’t have access to the tables.
- Only pandas will be imported by default, but you can import certain other packages. You can also use functions from the common library.
- Metabase won’t write DataFrame indexes to the database, including indexes created by groupby(). If you’re using a custom index that you’d like to include in the target table, you’ll need to reset the index on your DataFrame inside the transform() function.
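The point about indexes is easy to miss, so here is a small sketch. The orders alias and its category and total columns are hypothetical. The groupby() call moves category into the index, and reset_index() turns it back into a regular column so it ends up in the returned DataFrame.

```python
import pandas as pd


def transform(orders):
    # groupby() puts "category" into the index of the aggregated result.
    summary = orders.groupby("category").agg(
        total=("total", "sum"),
        order_count=("total", "count"),
    )
    # Turn the index back into a real column so it is not dropped on write.
    return summary.reset_index()
```

With a sample input that has the categories Widget, Gadget, Widget, the returned DataFrame has one row per category and the columns category, total, and order_count.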
Run a Python transform
See Run a transform. You’ll see logs for a transform run (including the output of any print() statements you included in the transform() code) on the transform’s page.
Available Python packages
You can import any of the following Python packages in Python transforms:
- pandas.
- Any dependencies of pandas, e.g. numpy.
- Any packages from the Python standard library, e.g. json or datetime.
Due to security considerations for the Python execution environment, you won’t be able to install or import any other packages.
You can also write your own functions and add them to the common Python library in Metabase, which will be available across all your Python transforms.
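For example, a transform could combine pandas with numpy and the standard library. The sketch below assumes a hypothetical events table with an event_id column and a payload column holding JSON strings. The field names inside the payload are also made up.

```python
import json
from datetime import datetime

import numpy as np
import pandas as pd


def transform(events):
    # Parse each JSON payload with the standard library.
    payloads = events["payload"].apply(json.loads)
    result = pd.DataFrame(
        {
            "event_id": events["event_id"],
            "amount": payloads.apply(lambda p: p["amount"]),
            "created_at": payloads.apply(
                lambda p: datetime.fromisoformat(p["created_at"])
            ),
        }
    )
    # numpy is available because it is a dependency of pandas.
    result["log_amount"] = np.log1p(result["amount"])
    return result
```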
Common Python library
If you have functions or classes you’d like to reuse across multiple transforms, you can add them to the common library that will be available across all Python transforms.
To add things to the common Python library:
- Go to Data studio > Transforms.
- Scroll to the very bottom of the transforms list and click on Python library.
- Add a Python function or class.
Functions in this library can’t access any data in your database.
To use functions or classes from your Python library:
- Metabase will automatically add import common to your transform’s code. The common here refers to the Python library. You can click on common in the editor to navigate to the library.
- You can reference functions or classes from the common library in your code like common.manifest_kittens().
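Here is a sketch of how a library function and a transform might fit together. The clean_email() helper and the people table are hypothetical. So that the snippet runs outside Metabase, it builds a stand-in common object with types.SimpleNamespace. In Metabase you would put the helper in the Python library and rely on the automatically added import common instead.

```python
import types

import pandas as pd


# This part would live in the Python library (hypothetical helper).
def clean_email(value):
    # Works on plain values only: library functions can't read database tables.
    return value.strip().lower()


# Stand-in for Metabase's common library so the sketch runs on its own.
common = types.SimpleNamespace(clean_email=clean_email)


# This part would be the transform's code.
def transform(people):
    return people.assign(email=people["email"].map(common.clean_email))
```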
Incremental Python transforms
By default, Metabase will process all the data in all input tables, drop the existing target table (if one exists), and create a new table with the processed data. You can tell Metabase to only write new data to your target table by marking your transform as incremental.
Prerequisites for incremental transforms
To update a table incrementally, your data has to have a certain structure. See Prerequisites for incremental transforms.
Make a Python transform incremental
To make a Python transform incremental:
- Go to the transform’s page in Data studio > Transforms.
- Switch to the Settings tab.
- In Column to check for new values, select the column that Metabase should check to determine which values are new. See Prerequisites for incremental transforms for more information on the requirements for that column.
Unlike Query transforms, where you select an output column as the column to check for new values, with Python transforms, you have to select a column from the input tables as the column to check for new values.
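The general idea of checking an input column for new values can be sketched as follows. This is a conceptual illustration only, not Metabase's implementation: the select_new_rows() helper, the greater-than comparison against the last value seen, and the append step are all assumptions, and the orders table is made up.

```python
import pandas as pd


def select_new_rows(source, check_column, last_seen):
    # Hypothetical helper: keep only rows whose check column is past the last value seen.
    if last_seen is None:
        return source
    return source[source[check_column] > last_seen]


def transform(orders):
    return orders.assign(total_cents=orders["total"] * 100)


orders = pd.DataFrame({"id": [1, 2, 3, 4], "total": [5, 7, 9, 11]})

# First run: only rows 1 and 2 exist yet, so everything is processed.
first_batch = orders[orders["id"] <= 2]
target = transform(select_new_rows(first_batch, "id", None))
last_seen = first_batch["id"].max()

# Later run: only rows with an id above the last one seen are processed and appended.
new_rows = select_new_rows(orders, "id", last_seen)
target = pd.concat([target, transform(new_rows)], ignore_index=True)
```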
Current limitations of Python transforms
- The transform function must return a single pandas DataFrame. Other data manipulation and DataFrame libraries like polars or pyspark are not supported.
- Transform preview only uses 100 input rows from each input table.
- DataFrame indexes, including indexes created by groupby(), are not written back to the database. If you’re using a custom index that you’d like to include in the target table, you’ll need to reset the index on your DataFrame inside the transform() function to make the index into a real column.
- Only a limited set of packages is available for import. You can’t install additional packages.
- Because Python transforms use pandas, all data manipulation is done in memory. The available memory is determined by the Python execution add-on. For large datasets, consider using query-based transforms that run in your database.
- Only one Python transform can be run at any given time.
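Since everything happens in memory, it can help to check how much memory an input table takes up. The sketch below uses pandas' memory_usage() to print a rough estimate and keeps only the columns it needs. The memory_mb() helper, the orders alias, and the column names are hypothetical.

```python
import pandas as pd


def memory_mb(df):
    # deep=True also counts the contents of object (e.g. string) columns.
    return df.memory_usage(deep=True).sum() / 1024**2


def transform(orders):
    # The printed estimate appears with the other print() output of the run.
    print(f"orders uses about {memory_mb(orders):.2f} MB in memory")
    # Keeping only the columns you need reduces the size of later steps.
    return orders[["id", "total"]]
```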