
How to ruin your life with Data

Dec 22, 2015 by Sameer Al-Sakran

After discussing all the ways that data can help your organization make better decisions in our last blog post, The Push and Pull of Analytics, let’s look at the common mistakes in using data that can make you look silly, lose you the respect of your coworkers, and generally lead you to ruin.

We’ll go over a number of very common missteps in applying data to decision making. These are situations most people will find themselves in, despite their best intentions. The goal in going over these is not to beat yourself up (or, worse yet, others) over the times you have made these mistakes. Rather, it is to reinforce the need to be self-aware in your (and your organization’s) decision-making process and to constantly work on improving. At the very end, we’ll provide a checklist we use to keep ourselves honest in day-to-day work.

So, without further ado, let’s dig into …

10 Common pathologies in using data

1. Mixing up correlation and causation

Yes, it’s the first thing everyone brings up. Yes, you already know about it. But odds are, you’ll still fall for it no matter how trite it sounds. This is especially dangerous when “exploring” historical data without a clear hypothesis. It’s best to treat any pattern you spot in past data as, at best, suggestive of causation rather than proof of it, until a real experiment says otherwise.
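
One quick way to internalize this is to simulate it. The sketch below (Python with numpy; all the data is randomly generated) counts how often two completely independent trending series, modeled as random walks, end up looking strongly “correlated” purely by accident:

```python
import numpy as np

rng = np.random.default_rng(0)

trials = 1000
strong = 0
for _ in range(trials):
    # Two independent random walks -- neither one causes the other.
    a = rng.normal(size=365).cumsum()
    b = rng.normal(size=365).cumsum()
    # Trending series frequently look "correlated" by sheer accident.
    if abs(np.corrcoef(a, b)[0, 1]) > 0.5:
        strong += 1

print(f"{strong / trials:.0%} of unrelated pairs show |r| > 0.5")
```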

2. Expecting data to give you answers to questions you can’t formulate

There’s a bit of a cargo cult in recent years regarding “Data Science,” which largely revolves around the belief in this process:

  1. Collect data
  2. Use hyped technology X
  3. Hire expensive Data Scientists/Analysts/MBAs
  4. Nirvana

The reality is that what you get from the process is directly proportional to the degree to which your organization is able to articulate which questions it is trying to answer. More data and talented data analysts can supercharge an organization that has a clear decision and product process. Doubling down on Big Data won’t be the miracle that saves a fundamentally slow and plodding organization. It can however do a great magic trick and make millions of dollars disappear.

3. Looking for data to support a decision you already made

It’s frighteningly common to go through the motions of collecting data, analyzing it and coming to a decision when you (or others on the team) have already made up your mind. Needless to say, when this is the case it’s faster and cheaper to not go through the motions and pretend. If you’re working in a place where the Big Boss Man expects everyone to find information to justify their pre-existing bias and ignore any information to the contrary, you should switch gears and research where else you could be working.

4. Fishing for the positive

A subset of looking for data to support a decision, but useful to call out on its own, is looking for data to support a rosy picture. There’s always something trending upwards, even in terminally dying companies. It’s dangerous to go out of your way to find that one bright spot while all the metrics you previously said were important are going south.

5. Expecting too much clarity in results

Many fans of martial arts movies will watch a boxing or fencing match and not be able to make sense of it. If you’re used to fight sequences that are choreographed and shot from perfect angles with perfect lighting and editing, the chaos and speed of a closely matched real fight is bewildering.

A similar effect occurs when people who are used to tidy MBA-coursework problems, or who learned analytics through highly idealized blog posts, encounter quantitative decision making in real life. In the real world, effects can be small, multimodal (see below on the evil that is an Average) and generally messy. Work with the data you have, not the perfect data you imagine is out there.

6. Expecting to A/B test your way to Nirvana

A/B testing can be a source of magical thinking. While carefully planned A/B tests that are run with discipline can be transformative to a company, they also often lead to chasing one’s tail. Make sure you know what a significant result is before you start an A/B test. Don’t stop the test the instant one of the options seems to be performing better, and always, always include a control group. Also, realize that the smaller the effect, the larger the number of users you’ll need. If you only have 10k monthly active users, you would be better off simply putting off any kind of A/B testing until there are more people you can test things against.
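
To make “the smaller the effect, the larger the number of users” concrete, here’s a back-of-the-envelope sample-size sketch using the standard two-proportion z-test approximation (Python with scipy; the conversion rates are invented for illustration, and a real test should be designed around your own baseline numbers):

```python
from scipy.stats import norm

def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a change from p_base
    to p_target with a two-sided two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_power = norm.ppf(power)           # 0.84 for 80% power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return int((z_alpha + z_power) ** 2 * variance / (p_base - p_target) ** 2) + 1

# A small lift (5% -> 5.5% conversion) needs ~31,000 users per arm --
# far more than a 10k-MAU product can feed through a test quickly.
print(sample_size_per_group(0.05, 0.055))
# A large lift (5% -> 10%) needs only a few hundred per arm.
print(sample_size_per_group(0.05, 0.10))
```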

Furthermore, A/B testing won’t automagically determine the best product features or advertising copy for you. The results are only as good as the options that are being tested, and are very sensitive to how good the initial design is. Don’t let “we’ll A/B test that” become a mantra that shuts down the process of deciding what your product actually is, as opposed to adding the last bit of polish.

7. Using the wrong time period, a.k.a. “Everything should be Real-time!”

If your customers purchase on a multi-month time frame, and your product cycle moves in two-week sprints, why are you spending hundreds of thousands of dollars of engineering time on real-time analytics for your feature instrumentation? Likewise, if you are trying to diagnose errors in network operations where the cost of being down is measured in [tens of millions a minute](http://www.bloomberg.com/bw/articles/2012-08-02/knight-shows-how-to-lose-440-million-in-30-minutes), you had better not be looking at hourly charts. It’s important to tie the time period to the natural time period of your decision making. As cool as it is to see real-time counters of how many people are reading your blog posts, if you look at your data over too fine a time period, you’ll end up twitchy, thrashing between decisions. If you use too large a time period, you’ll forever be three moves behind.
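
As a hypothetical sketch of matching granularity to cadence: if your raw events live in a pandas DataFrame, rolling them up to a two-week sprint rhythm is one line (the event log below is invented for illustration):

```python
import pandas as pd

# Hypothetical feature-usage log: one timestamped row per event.
events = pd.DataFrame({
    "used_at": pd.to_datetime([
        "2015-11-02 09:15", "2015-11-02 09:16", "2015-11-16 14:02",
        "2015-11-30 10:45", "2015-12-01 08:30", "2015-12-14 17:20",
    ])
})

# If the product moves in two-week sprints, roll usage up into
# two-week buckets instead of staring at a real-time counter.
per_sprint = events.set_index("used_at").resample("2W").size()
print(per_sprint)
```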

8. Only looking at Averages

Averages are a great place to hide uncomfortable truths. Want to ignore the fact that your paid acquisition channels are getting more and more expensive and unsustainable? Make sure to only use blended averages across organic and paid channels! Is the most important web page for your users getting slower with time? Make sure you can’t tell by only looking at average latency over all your pages. When averages tell you something is getting worse, you should be really really worried. When they tell you things are going just dandy, dig deeper.

Pro-tip: Most average-inspired delusions go away really quickly if you break them out into a histogram. For example, rosy projections of average customer acquisition cost typically break down once you break the costs out by channel.
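
Here’s a toy example of that channel breakdown, with numbers invented for illustration; the blended figure hides a paid channel that costs three times the blended average:

```python
# Invented spend and customer counts, purely for illustration.
channels = {
    "organic": {"spend": 0,     "customers": 400},
    "paid":    {"spend": 60000, "customers": 200},
}

total_spend = sum(c["spend"] for c in channels.values())
total_customers = sum(c["customers"] for c in channels.values())

# The blended average looks perfectly healthy...
print(f"blended CAC: ${total_spend / total_customers:.0f}")  # $100

# ...until you break the cost out by channel.
for name, c in channels.items():
    print(f"{name} CAC: ${c['spend'] / c['customers']:.0f}")
# organic CAC: $0, paid CAC: $300
```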

9. Focusing on totals instead of the rate of change

Everyone loves graphs that go up and to the right no matter what you do. “Total number of signups”, “Cumulative revenue”, and “Total value of goods sold” all sound very impressive and make for great press fodder. However, in most situations you should be looking at the rate of change, and possibly even the growth in that rate. If 95% of the information a number carries happened months or years ago, how does it help you evaluate how you’re doing today, let alone how things will look tomorrow?
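
A toy example with invented signup numbers shows how a cumulative total keeps climbing while the rate underneath it falls apart:

```python
# Invented cumulative signups by month -- always up and to the right.
cumulative = [10000, 10900, 11700, 12400, 13000]

# The same data as new signups per month tells the real story...
new_per_month = [b - a for a, b in zip(cumulative, cumulative[1:])]
print(new_per_month)  # [900, 800, 700, 600] -- growth is slowing

# ...and month-over-month growth in that rate makes the decline explicit.
growth = [b / a - 1 for a, b in zip(new_per_month, new_per_month[1:])]
print([f"{g:.0%}" for g in growth])  # ['-11%', '-12%', '-14%']
```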

10. Not evaluating the results of a decision

It’s common to want to collect lots of information before making a big decision. However, once the decision is made, and the results start to trickle in, it’s common to just assume Things Are Going Well. Everyone makes bad decisions. It’s bad decisions you refuse to accept and correct that end up hurting you the most. No matter how embarrassing it is to make a hard call, it’s vital to look at the results and determine whether you made the right call. Better to wise up to the fact that you were wrong right away than to ignore it, be wrong, and only notice after you’ve been wrong for months.


An opinionated guide to data-driven decision making

Now that we’ve talked about all of the ways you can misstep, we’ll give you some tools to keep these common problems in mind as you are tackling decisions. Here’s our list of questions to ask yourself before, during and after making data-driven decisions.

Before making a decision, ask:

  • What information do I already have?
  • Have we made a decision already?
  • What additional information do we need and when do we need it?
  • How much will it cost me to get this information?
  • How much more accurate will the decision become with this additional data?

While making a decision, ask:

  • Do we need to make this decision?
  • What is the cost of not making this decision?
  • What is the cost of making this decision incorrectly?
  • Do we have all the information we need to make a good decision?
  • If not, can we get additional data in time?
  • What will it look like if we are right or wrong?
  • When and how should we evaluate whether this was the correct call?

After making a decision, ask:

  • Were we right?
  • Are you sure we weren’t? … It seemed like a really good idea.
  • Ok, so we weren’t completely right, what did we get wrong?
  • What should we have asked before we made this decision?
  • Was there anything about the process that went poorly and should be addressed?

With time this checklist will change as you find things that your organization does well, and things you need to stay on top of at each stage.

And now for a shameless plug…

_As part of a balanced data infrastructure, Metabase can provide everyone in your organization easy access to the data they need to make better decisions. It runs on your laptop, your servers, or cloud servers (Amazon Web Services, Heroku, etc.) and can be installed in 5 minutes. Anyone can use Metabase to build dashboards, ask simple ad hoc questions, and set up nightly emails._