Aug 10, 2022 in Data explorations

5 min read

#jobs #jobs #jobs | Exploring trends in data jobs postings

The Metabase Team Portrait
The Metabase Team
‧ Aug 10, 2022 in Data explorations

‧ 5 min read

#jobs #jobs #jobs | Exploring trends in data jobs postings Image
Share this article

We were interested how trends in data jobs have been evolving over time, so we took a look at job postings on one of our favorite communities, dbt’s Slack.

Their #jobs channel has had over 100+ job postings per month since 2021, so we pulled the data into Metabase to see if we could find any patterns.

Data jobs dashboard

You can explore the full dashboard, or keep reading for our observations and a walk-through of how we built it.

Observations

The number of job postings appears grew hand-in-hand with dbt’s userbase and their Slack community.

Over 40% of job postings mention the possibility of a remote working arrangement.

From 2021 onwards, more than half of the job postings are for remote work, up from 20% for periods before 2021.

Analytics engineering roles are the fastest growing job in the data space, from 6% in 2019 to 32% in 2022. (link to chart)

How we got the data

Getting the data into Metabase

  1. We used Phantombuster to pull raw messages from dbt’s #jobs channel into a CSV file.
  2. We uploaded the CSV file into a Google Sheet and used fivetran to load the .csv data into a table in a Postgres database.
  3. Then we connected our Metabase to the postgres database. (You can skip this step if your database is already connected).
  4. Then we transformed the data and converted it into a model in Metabase. If you want, you can download the model’s data as a CSV, JSON, or XLSX file.

Working with the data in Metabase

Because of the unstructured nature of text, we opted for a SQL question (instead of using the graphical query builder)

Some jobs were posted in the channels, others in threads in that channel. We combined those and filtered out replies and comments to the job posting. We used simple CASE and LIKE statements to extract information like:

  • The role,
  • Whether the role is remote,
  • Any visualization tools mentioned.

Here’s the SQL we used to create the model:

WITH raw_messages
     AS (SELECT CASE
                  WHEN message_url LIKE '%thread_ts=%' THEN
                  Substring(message_url, '.*thread_ts=(.*)')
                  ELSE Substring(message_url,
                       'https://getdbt.slack.com/archives/C7A7BARGT/(.*)'
                       )
                END AS thread_id,
                message_url,
                created_at,
                text,
                username
         FROM   random_datasets.dbt_jobs_scrape_20220802_messages),
     raw_messages_with_order
     AS (SELECT m.*,
                Row_number()
                  OVER(
                    partition BY thread_id
                    ORDER BY created_at ASC) AS post_order
         FROM   raw_messages m),
     combined_messages
     AS (SELECT m1.thread_id,
                m1.message_url,
                m1.created_at,
                m1.username,
                String_agg (m2.text, ' ') AS combined_text
         FROM   raw_messages_with_order m1
                LEFT JOIN raw_messages_with_order m2
                       ON m1.thread_id = m2.thread_id
                          AND m1.username = m2.username
         -- take message from original poster only
         WHERE  m1.post_order = 1
         GROUP  BY m1.thread_id,
                   m1.message_url,
                   m1.created_at,
                   m1.username)
SELECT *,
       -- location
       CASE
         WHEN Lower(Replace(combined_text, 'Is this role remote?', '')) LIKE
              '%remote%'
       THEN true
         WHEN Lower(combined_text) LIKE '%is this role remote? yes%' THEN true
         ELSE false
       END AS is_remote,
       -- BI stack
       CASE
         WHEN Lower(combined_text) LIKE '%metabase%' THEN true
         ELSE false
       END AS stack_includes_metabase,
       CASE
         WHEN Lower(combined_text) LIKE '%looker%' THEN true
         ELSE false
       END AS stack_includes_looker,
       CASE
         WHEN Lower(combined_text) LIKE '%tableau%' THEN true
         ELSE false
       END AS stack_includes_tableau,
       CASE
         WHEN Lower(combined_text) LIKE '%power bi%' THEN true
         ELSE false
       END AS stack_includes_powerbi,
       CASE
         WHEN Lower(combined_text) LIKE '%hex%' THEN true
         ELSE false
       END AS stack_includes_hex,
       CASE
         WHEN Lower(combined_text) LIKE '%qlik%' THEN true
         ELSE false
       END AS stack_includes_qlik,
       -- role
       CASE
         WHEN Lower(combined_text) LIKE '%analyst%' THEN true
         ELSE false
       END AS role_analyst,
       CASE
         WHEN Lower(combined_text) LIKE '%analytics engineer%' THEN true
         ELSE false
       END AS role_analytics_engineer,
       CASE
         WHEN Lower(combined_text) LIKE '%data scien%' THEN true
         ELSE false
       END AS role_data_scientist,
       CASE
         WHEN Lower(combined_text) LIKE '%data engineer%' THEN true
         ELSE false
       END AS role_data_engineer
FROM   combined_messages
WHERE  combined_text IS NOT NULL

You might also enjoy

All posts
The hidden costs of the data stack Image May 12, 2023 in Data explorations

The hidden costs of the data stack

An incomplete list of the less obvious costs associated with maintaining a data stack, and some things you can do to keep those costs under control.

The Metabase Team Portrait
The Metabase Team

9 min read

Bus factor of top GitHub projects Image Nov 14, 2022 in Data explorations

Bus factor of top GitHub projects

What is the bus factor for the top one thousand GitHub repositories with the most stars?

The Metabase Team Portrait
The Metabase Team

4 min read

All posts
Close Form Button

Subscribe to our newsletter

Stay in touch with updates and news from Metabase. No spam, ever.