4 min read
Bus factor of top GitHub projects
The Metabase Team
The bus factor is the number of people on a project who would have to be hit by a bus (or quit) before the project is in serious trouble. We were interested in the bus factors for the top 1,000 projects on GitHub (by stars).
Observations
Check out our dashboard, or read on to learn what we’ve found.
Dataset
- We used the GitHub API and truckfactor to get and compute the bus factors of the top 1,000 GitHub repositories by star count.
- Due to memory restrictions, we were only able to compute the bus factors for around 95% of those repos.
- To exclude codeless repos (such as learning resources or curated lists on a topic), we removed projects where the primary programming language couldn’t be determined, or where the repo was primarily composed of one of the following file types: Makefile, TeX, Dockerfile, and Markdown.
- If you want to play around with the data yourself, go ahead and download and explore the dataset.
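As a rough illustration of the collection and filtering steps above, here is a minimal sketch. It assumes the repo list comes from GitHub's repository search endpoint sorted by stars, and that the `language` field on each result stands in for the "primary programming language"; the helper names `search_url` and `has_code` are hypothetical and not taken from our pipeline.

```python
from urllib.parse import urlencode

# File types treated as "codeless" when they dominate a repo (from the list above).
CODELESS_LANGUAGES = {"Makefile", "TeX", "Dockerfile", "Markdown"}


def search_url(page: int, per_page: int = 100) -> str:
    """Build a GitHub search URL for one page of repos sorted by star count."""
    query = urlencode({
        "q": "stars:>1",      # assumption: any starred repo qualifies
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    })
    return f"https://api.github.com/search/repositories?{query}"


def has_code(repo: dict) -> bool:
    """Keep a repo only if its primary language is known and not codeless."""
    language = repo.get("language")  # None when no language could be determined
    return language is not None and language not in CODELESS_LANGUAGES
```

Ten pages of 100 results would cover the top 1,000 repos; each page's `items` could then be passed through `has_code` before computing bus factors.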
How we computed the bus factor
We used a library called truckfactor to compute the bus/truck factor. Here’s how truckfactor does its calculations. For each repo, truckfactor (and here we’re quoting directly from the repo):
- Reads a git log from the repository
- Computes for each file who has the knowledge ownership of it.
- A contributor has knowledge ownership of a file when she edited the most lines in it.
- That computation is inspired by A. Tornhill Your Code as a Crime Scene.
- Note, only for text files knowledge ownership is computed. The tool may not return a good answer for repositories containing only binary files.
- Then similar to G. Avelino et al. A novel approach for estimating Truck Factors low-contributing authors are removed from the analysis as long as still more than half of all files have a knowledge owner. The amount of remaining knowledge owners is the truck factor of the given repository.
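The steps quoted above can be sketched in a few lines. This is an illustration of the described idea, not truckfactor's actual code: it assumes the git log has already been reduced to a hypothetical mapping of file path to lines edited per author, and that every file in that mapping is a text file.

```python
from collections import Counter


def file_owners(lines_edited: dict[str, dict[str, int]]) -> dict[str, str]:
    """Map each file to the contributor who edited the most lines in it."""
    return {
        path: max(by_author, key=by_author.get)
        for path, by_author in lines_edited.items()
        if by_author  # skip files with no recorded edits
    }


def truck_factor(owners: dict[str, str]) -> int:
    """Drop the smallest owners while more than half of all files stay owned."""
    total_files = len(owners)
    files_per_owner = Counter(owners.values())
    # Rank owners from fewest owned files to most (name breaks ties).
    ranked = sorted(files_per_owner, key=lambda a: (files_per_owner[a], a))
    covered = total_files
    remaining = len(ranked)
    for author in ranked:
        # Stop once removing this author would leave half or fewer files owned.
        if covered - files_per_owner[author] <= total_files / 2:
            break
        covered -= files_per_owner[author]
        remaining -= 1
    return remaining
```

For example, if one contributor owns 5 of 10 files, a second owns 3, and two others own 1 each, the two single-file owners are removed and the sketch returns 2.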
For some context, studies conducted in 2015 and 2016 calculated the bus/truck factor of 133 popular GitHub projects. The results showed that most of the projects had a small bus factor (65% had a bus factor ≤ 2) and that fewer than 10% of those projects had a bus factor greater than 10.
Distribution of bus factors
Almost half of the projects have a bus factor of 2 or less.
Only 10% of projects have a bus factor of 6 or higher.
There is no correlation between repo stars and bus factor
We initially thought that more popular projects should have more contributors, and therefore a higher bus factor, but that doesn’t seem to be the case.
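One way to check a claim like this against the downloadable dataset is a correlation coefficient between star count and bus factor. The sketch below computes a plain Pearson coefficient; the article does not say which measure was used, so treat the choice (and the function name `pearson`) as an assumption. Because star counts are heavily skewed, a rank-based measure may be a better fit.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)
```

A value near 0 for `pearson(stars, bus_factors)` would be consistent with no linear relationship, while values near 1 or -1 would indicate a strong one.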
Average bus factor of top languages used
We’re talking about languages in general here, so languages like HTML and CSS are in play.
- More than half of all projects use the Shell scripting language (Bash scripts).
- The most common languages were web-based tools: JavaScript, HTML, CSS, and TypeScript. The top general-purpose languages included Python, C, and Java.
- Projects that were written in web-based development languages (JavaScript, HTML, CSS, TypeScript, and SCSS) tend to have a lower bus factor compared to projects written in general-purpose programming languages (Python, C, Java, and C++).
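Since a single project can use several languages, a per-language average has to count each repo once for every language it uses. Here is a minimal sketch of that aggregation; the record shape (`languages`, `bus_factor`) is a hypothetical layout, not the schema of our dataset.

```python
from collections import defaultdict


def average_bus_factor_by_language(repos: list[dict]) -> dict[str, float]:
    """Average bus factor per language; a repo counts toward each language it uses."""
    totals = defaultdict(lambda: [0, 0])  # language -> [sum of bus factors, repo count]
    for repo in repos:
        for language in set(repo["languages"]):  # set() guards against duplicates
            totals[language][0] += repo["bus_factor"]
            totals[language][1] += 1
    return {language: total / count for language, (total, count) in totals.items()}
```

With this approach, a repo that uses both JavaScript and Shell contributes its bus factor to both averages, which is why a language like Shell can show up in more than half of all projects.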
Most popular labels
Among the most-starred repositories, JavaScript is the most popular label, led by popular web frameworks and libraries like React, Vue, Bootstrap, and Angular. If we combine Go and Golang, Go would be the second most common label (though it’s possible that some repos include both the Go and Golang labels, which would inflate the label count).
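The double-counting caveat is easy to avoid by counting repos rather than labels. The sketch below assumes a hypothetical mapping of repo name to its list of labels and counts each repo once if it carries any of the given labels.

```python
def count_repos_with_any_label(repo_labels: dict[str, list[str]], labels: list[str]) -> int:
    """Count repos carrying at least one of the given labels (case-insensitive)."""
    wanted = {label.lower() for label in labels}
    return sum(
        1
        for repo_label_list in repo_labels.values()
        if wanted & {label.lower() for label in repo_label_list}
    )
```

A repo tagged with both `go` and `golang` is counted once here, whereas adding the two separate label counts together would count it twice.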
Hacktoberfest is the second most common label, which makes sense. Hacktoberfest is a month-long celebration of open-source projects that encourages contributions to them, so repo maintainers are incentivized to add the label to attract contributors.
Bus factors by software types
We also broke out bus factor by software type, and machine learning had the most projects with bus factors in the double digits.
Backend projects
Frontend projects
Machine learning projects
Business intelligence projects
Conclusions
- Metabase supports public transportation.
- Software is built on a house of cards.
- Document your code.
- Metabase’s bus factor is decent (4). Plus, we’re a fully distributed team, so the bus accidents would have to be globally coordinated to put the project in any kind of jeopardy.
- But our bus factor could be better, so, you know, we’re hiring.