Nov 14, 2022 in Data explorations

4 min read

Bus factor of top GitHub projects

The Metabase Team

The bus factor is the number of people on a project who would have to be hit by a bus (or quit) before the project is in serious trouble. We wanted to know the bus factors of the top 1,000 projects on GitHub (by stars).

Observations

Check out our dashboard, or read on to learn what we’ve found.

Dataset

  • We used the GitHub API and truckfactor to get and compute the bus factors of the top 1,000 GitHub repositories by star count.
  • Due to memory restrictions, we were only able to compute the bus factors for around 95% of those repos.
  • To exclude codeless repos (such as learning resources or curated lists on a topic), we removed projects where the primary programming language couldn’t be determined, or where the repo was primarily composed of one of the following file types: Makefile, TeX, Dockerfile, and Markdown.
  • If you want to play around with the data yourself, go ahead and download and explore the dataset.
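The exclusion rule above can be sketched as a small filter. This is an illustration, not the code we ran: it assumes each repo is a dict with a "language" key (the field the GitHub REST API uses for a repository's primary language, which is null when it can't be determined), and the function names are hypothetical.

```python
# Languages that mark a repo as "codeless" for this analysis,
# taken from the list in the Dataset section.
EXCLUDED_LANGUAGES = {"Makefile", "TeX", "Dockerfile", "Markdown"}


def is_code_repo(repo):
    """Return True if the repo has a primary language that is not excluded.

    `repo` is assumed to be a dict shaped like a GitHub API repository
    object, where "language" is a string or None.
    """
    language = repo.get("language")
    return language is not None and language not in EXCLUDED_LANGUAGES


def filter_code_repos(repos):
    """Keep only repos that pass the codeless-repo exclusion rule."""
    return [repo for repo in repos if is_code_repo(repo)]
```

Calling filter_code_repos on a list of repository objects returns the subset that would be kept for the bus factor computation.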

How we computed the bus factor

We used a library called truckfactor to compute the bus/truck factor. Here’s how truckfactor does its calculations. For each repo, truckfactor (and here we’re quoting directly from the repo):

  • Reads a git log from the repository
  • Computes for each file who has the knowledge ownership of it.
    • A contributor has knowledge ownership of a file when she edited the most lines in it.
    • That computation is inspired by A. Tornhill Your Code as a Crime Scene.
    • Note, only for text files knowledge ownership is computed. The tool may not return a good answer for repositories containing only binary files.
  • Then similar to G. Avelino et al. A novel approach for estimating Truck Factors low-contributing authors are removed from the analysis as long as still more than half of all files have a knowledge owner. The amount of remaining knowledge owners is the truck factor of the given repository.
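To make the steps above concrete, here is a minimal sketch of our reading of that description. It is not truckfactor's actual code: the input format (a list of (file, author, lines changed) tuples, as you might extract from a git log) and the function names are assumptions, and ties for ownership go to whichever author appears first.

```python
from collections import Counter, defaultdict


def knowledge_owners(edits):
    """Map each file to the author who edited the most lines in it.

    `edits` is an iterable of (path, author, lines_changed) tuples.
    """
    lines = defaultdict(Counter)
    for path, author, lines_changed in edits:
        lines[path][author] += lines_changed
    # max() returns the first author seen among those with the top count.
    return {path: max(counts, key=counts.get) for path, counts in lines.items()}


def truck_factor(edits):
    """Count the owners left after dropping low-contributing authors.

    Authors are dropped from fewest owned files to most, as long as
    more than half of all files still have a knowledge owner.
    """
    owners = knowledge_owners(edits)
    total_files = len(owners)
    files_per_author = Counter(owners.values())
    remaining = dict(files_per_author)
    covered = total_files
    ranked = sorted(files_per_author.items(), key=lambda kv: (kv[1], kv[0]))
    for author, n_files in ranked:
        if (covered - n_files) * 2 > total_files:
            covered -= n_files
            del remaining[author]
        else:
            break
    return len(remaining)


edits = [
    ("a.py", "alice", 10), ("a.py", "bob", 2),
    ("b.py", "alice", 5), ("c.py", "alice", 7),
    ("d.py", "bob", 4), ("e.py", "carol", 9),
]
print(truck_factor(edits))  # 1
```

In this toy example alice owns three of the five files, so bob and carol can both be dropped while more than half of the files still have an owner, which leaves a truck factor of 1.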

For some context, studies conducted in 2015 and 2016 calculated the bus/truck factor of 133 popular GitHub projects. The results showed that most of the projects had a small bus factor (65% had a bus factor ≤ 2) and that fewer than 10% of those projects had a bus factor greater than 10.

Distribution of bus factors

Almost half of the projects have a bus factor of two or less.

Only 10% of projects have a bus factor of 6 or higher.

There is no correlation between repo stars and bus factor

We initially thought that more popular projects should have more contributors, and therefore a higher bus factor, but that doesn’t seem to be the case.
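One way to check a relationship like this yourself is to compute a correlation coefficient between the two columns of the dataset. The sketch below uses Pearson's coefficient as an assumed choice (this post doesn't depend on a specific measure), and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Passing the star counts as xs and the bus factors as ys gives a value between -1 and 1; a value near 0 is consistent with no linear relationship. Star counts are heavily skewed, so a rank-based measure may be a better fit for a closer look.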

Average bus factor of top languages used

We’re talking about languages in general here, so languages like HTML and CSS are in play.

  • More than half of all projects use the Shell scripting language (Bash scripts).
  • The most common languages were web languages: JavaScript, HTML, CSS, and TypeScript. The top general-purpose languages included Python, C, and Java.
  • Projects written in web development languages (JavaScript, HTML, CSS, TypeScript, and SCSS) tend to have a lower bus factor than projects written in general-purpose programming languages (Python, C, Java, and C++).

Among the most-starred repositories, JavaScript is the most popular label, led by popular web frameworks and libraries like React, Vue, Bootstrap, and Angular. If we combine the Go and Golang labels, Go would be the second most common label (though it’s possible that some repos include both the Go and Golang labels, which would inflate the label count).

Hacktoberfest is the second most common label, which makes sense. Hacktoberfest is a month-long celebration of open source that encourages contributions to open-source projects, so repo maintainers have an incentive to add the label to attract contributors.

Bus factors by software types

We also broke out bus factor by software type, and machine learning had the most projects with bus factors in the double digits.

Backend projects

Frontend projects

Machine learning projects

Business intelligence projects

Conclusions

  • Metabase supports public transportation.
  • Software is built on a house of cards.
  • Document your code.
  • Metabase’s bus factor is decent (4). Plus, we’re a fully distributed team, so the bus accidents would have to be globally coordinated to put the project in any kind of jeopardy.
  • But our bus factor could be better, so, you know, we’re hiring.
