Mar 27, 2026 in Engineering

5 min read

Meet Repro-Bot, our GitHub issue triage agent

Nathan Voxland Portrait
Nathan Voxland
‧ Mar 27, 2026 in Engineering

‧ 5 min read

Meet Repro-Bot, our GitHub issue triage agent Image
Share this article

Reproducing bug reports is one of the most time-consuming parts of maintaining an open source project. We used Claude Code to build an AI agent that helps us with this task, and we’re sharing the code so you can build your own.

How our AI agent reproduces a bug report

Repro-Bot started as an offsite-hackathon project, and grew from there. It’s largely built around prompts to Claude Code to use existing tools we had in our system (like starting up the database and other environments), external tools (like playwright for browser automation) and some custom scripts that allow us to let Claude run without permissions more safely (more on that below).

We set up Repro-Bot to work through the typical steps a human would do to reproduce an issue:

  1. Set up an environment similar to the person reporting (same Metabase version, same appdb type, same data warehouse types).
  2. Follow the provided steps. Issue reproduction can include browser automation, direct API calls, initial data setup, as well as other development tools.
  3. Are you able to reproduce the problem? If not…,
    • Why not? What are you missing?
    • Think about how the code works and where people could be hitting a connected edge case.
    • What has changed lately in that area?
    • What questions should we ask the person who reported the issue to understand what they’re doing differently?
  4. If you were able to reproduce the issue:
    1. Tell us what you did and what you saw.
    2. Write an automated test that exposes the problem.
    3. If possible, give a first-pass guess at the cause of the problem and complexity of fixing.

Repro-Bot is already saving us time

For new issues, Repro-Bot has helped us respond to the person who reported the issue, which gives us a better understanding of their problem. We’ve also cleared out a bunch of issues from our backlog that Repro-Bot confirmed we had fixed.

Of course, Repro-Bot isn’t infallible. Sometimes it can’t repro an issue. Sometimes it thinks it has reproduced a bug when it hasn’t. But even in those cases, Repro-Bot’s reports are still valuable. It gives us hints and chronicles dead-ends, both of which save devs time getting to the root cause.

In practice, we run it by adding a .Run Repro Bot label to one of our issues. You can actually see the issues it was run on with this query, and here’s an example of what its output looks like for a real issue it reproduced:

Repro-Bot reproducing an issue with a summary and reproduction steps

It then continues with findings, a root cause analysis, evidence supporting that analysis, new tests it proposes, and a list of next steps to fix the bug.

Learnings for your own bug reproduction AI agent

There are basically three steps to build your own version of Repro-Bot:

  • Create a way to spin up your environment in an isolated manner
  • Prompt an LLM on how to interact with your system to reproduce the bugs
  • Prompt an LLM on how to report the findings

The details will depend on your development environment, language, etc., but here are a few lessons we learned along the way:

Keep the agent focused on issue reproduction, not resolution. Originally, we had wanted to make a more end-to-end bot that could do it all: “Reproduce the bug, figure out the cause, AND fix it”. But that wider scope opened up a number of wrong paths the bot could go down. Keeping the agent’s purview small kept its output manageable, and we can always introduce more automation downstream, after we have a human-in-the-loop to validate its report.

Give the agent the tools you would use to reproduce the issue. In our case:

  • Playwright for driving a browser
  • Source code for the version of Metabase where the issue was reported
  • nREPL access to the server instance to inspect the current state
  • Ability to run queries against the database

Trim token usage where you can. Agents try to to be helpful and do what you ask them to do, but with quickly diminishing returns. So don’t let them work too long. We set limits on the time and effort Repro-Bot could spend trying to reproduce an issue, both in the prompt and in the process itself. We also fed transcripts to another agent to write better prompts that would gobble up fewer tokens.

Read the transcripts to see what problems the bot encounters. The agent will do everything it can to reproduce the bug, regardless of what it has to do or how well it can do it. If the bot hits an unrelated issue, or is missing a tool, it’ll automatically switch approaches to work around them. This unfortunately often leads to failure. So use the transcripts to find and fix these blockers so they don’t derail the next repro.

Protect your agent from poisoned prompts. We wanted the bot to be able to run automatically without people having to baby-sit it. Most issues come from Metabase’s public GitHub issues, which could make it trivial for some black hat to poison context. To guard against this, we sandbox the agent and limit its permissions. We also require a human in the loop to make sure there isn’t anything suspicious in the issue. In practice, that means someone has to look at the issue to decide whether it’s a good fit for Repro-Bot, and whether the issue might include a potentially dangerous prompt.

Check out our source code and build your own Repro-Bot

We built Repro-Bot for internal Metabase use, but here’s the code if you’re interested. It’s obviously specific to the Metabase repo’s setup, but the code should give you ideas for how to build your own Repro-Bot.

Subscribe to newsletter
Updates and news from Metabase