‧
5 min read
#jobs #jobs #jobs | Exploring trends in data jobs postings
The Metabase Team
‧ 5 min read
Share this article
We were interested how trends in data jobs have been evolving over time, so we took a look at job postings on one of our favorite communities, dbt’s Slack.
Their #jobs channel has had over 100+ job postings per month since 2021, so we pulled the data into Metabase to see if we could find any patterns.
You can explore the full dashboard, or keep reading for our observations and a walk-through of how we built it.
Observations
The number of job postings appears grew hand-in-hand with dbt’s userbase and their Slack community.
Over 40% of job postings mention the possibility of a remote working arrangement.
From 2021 onwards, more than half of the job postings are for remote work, up from 20% for periods before 2021.
Analytics engineering roles are the fastest growing job in the data space, from 6% in 2019 to 32% in 2022. (link to chart)
How we got the data
Getting the data into Metabase
- We used Phantombuster to pull raw messages from dbt’s #jobs channel into a CSV file.
- We uploaded the CSV file into a Google Sheet and used fivetran to load the .csv data into a table in a Postgres database.
- Then we connected our Metabase to the postgres database. (You can skip this step if your database is already connected).
- Then we transformed the data and converted it into a model in Metabase. If you want, you can download the model’s data as a CSV, JSON, or XLSX file.
Working with the data in Metabase
Because of the unstructured nature of text, we opted for a SQL question (instead of using the graphical query builder)
Some jobs were posted in the channels, others in threads in that channel. We combined those and filtered out replies and comments to the job posting. We used simple CASE
and LIKE
statements to extract information like:
- The role,
- Whether the role is remote,
- Any visualization tools mentioned.
Here’s the SQL we used to create the model:
WITH raw_messages
AS (SELECT CASE
WHEN message_url LIKE '%thread_ts=%' THEN
Substring(message_url, '.*thread_ts=(.*)')
ELSE Substring(message_url,
'https://getdbt.slack.com/archives/C7A7BARGT/(.*)'
)
END AS thread_id,
message_url,
created_at,
text,
username
FROM random_datasets.dbt_jobs_scrape_20220802_messages),
raw_messages_with_order
AS (SELECT m.*,
Row_number()
OVER(
partition BY thread_id
ORDER BY created_at ASC) AS post_order
FROM raw_messages m),
combined_messages
AS (SELECT m1.thread_id,
m1.message_url,
m1.created_at,
m1.username,
String_agg (m2.text, ' ') AS combined_text
FROM raw_messages_with_order m1
LEFT JOIN raw_messages_with_order m2
ON m1.thread_id = m2.thread_id
AND m1.username = m2.username
-- take message from original poster only
WHERE m1.post_order = 1
GROUP BY m1.thread_id,
m1.message_url,
m1.created_at,
m1.username)
SELECT *,
-- location
CASE
WHEN Lower(Replace(combined_text, 'Is this role remote?', '')) LIKE
'%remote%'
THEN true
WHEN Lower(combined_text) LIKE '%is this role remote? yes%' THEN true
ELSE false
END AS is_remote,
-- BI stack
CASE
WHEN Lower(combined_text) LIKE '%metabase%' THEN true
ELSE false
END AS stack_includes_metabase,
CASE
WHEN Lower(combined_text) LIKE '%looker%' THEN true
ELSE false
END AS stack_includes_looker,
CASE
WHEN Lower(combined_text) LIKE '%tableau%' THEN true
ELSE false
END AS stack_includes_tableau,
CASE
WHEN Lower(combined_text) LIKE '%power bi%' THEN true
ELSE false
END AS stack_includes_powerbi,
CASE
WHEN Lower(combined_text) LIKE '%hex%' THEN true
ELSE false
END AS stack_includes_hex,
CASE
WHEN Lower(combined_text) LIKE '%qlik%' THEN true
ELSE false
END AS stack_includes_qlik,
-- role
CASE
WHEN Lower(combined_text) LIKE '%analyst%' THEN true
ELSE false
END AS role_analyst,
CASE
WHEN Lower(combined_text) LIKE '%analytics engineer%' THEN true
ELSE false
END AS role_analytics_engineer,
CASE
WHEN Lower(combined_text) LIKE '%data scien%' THEN true
ELSE false
END AS role_data_scientist,
CASE
WHEN Lower(combined_text) LIKE '%data engineer%' THEN true
ELSE false
END AS role_data_engineer
FROM combined_messages
WHERE combined_text IS NOT NULL