Data Can’t Lie?

A statistician is someone who can put their head in a hot oven, and their feet in a bucket of ice, and say, “on the average, I feel fine.”

Before we move completely into a world where people are counseled, “use the data, Luke,” disregarding their own beliefs and feelings, we need to have a little discussion. As an example of what we might get wrong, let’s take a look at some interesting problems in the polling from recent elections. According to one article (which happens to have all the numbers conveniently gathered in one place):

  • On May 7th, in an election in Britain, the pre-election polls showed the Conservatives would win around 280 seats. The exit polls taken during the election showed the Conservatives would win around 316 seats. In the election itself, the Conservatives actually won 330 seats.
  • In 1992, also in Britain, the pre-election polls showed the Conservative and Labour parties in a dead heat. The Conservatives actually won by 7.5 points.
  • In the recent election in Israel, Likud was predicted, through polling, to win 22 seats. Likud actually won 30 seats.
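As a quick check on the seat figures quoted above, the misses can be tallied with a few lines of arithmetic. This is only a sketch: the labels, the `miss` helper, and the variable names are made up for illustration, the numbers are the ones in the list, and the 1992 example is left out because it is quoted in percentage points rather than seats.

```python
# Seat figures as quoted in the list above: (label, predicted, actual).
# The labels are illustrative, not taken from the source article.
polls = [
    ("Britain, pre-election polls", 280, 330),
    ("Britain, exit polls", 316, 330),
    ("Israel, Likud", 22, 30),
]


def miss(predicted, actual):
    """Seats actually won minus seats predicted.

    A positive value means the poll undershot the real result.
    """
    return actual - predicted


# Compute the miss for each quoted poll.
misses = [(label, miss(predicted, actual)) for label, predicted, actual in polls]

for label, seats in misses:
    print(f"{label}: off by {seats:+d} seats")

# True only if every quoted poll undershot in the same direction.
all_undershot = all(seats > 0 for _, seats in misses)
print("All polls undershot:", all_undershot)
```

Three data points prove nothing on their own, but every miss in this small sample points the same way.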

These aren’t random events — they are repeated time and again in elections through the last decade or so. Sociologically, one explanation for the difference between the polls and the results is that in cultures where being conservative is seen as socially unacceptable, people simply tell the pollster what they think the pollster wants to hear — they tell them what they think will make them liked, or at least accepted.

In other words, people are capable of lying. Don’t sit there with a stunned look on your face, as if you’d never thought of this before.

I know polls are one thing, and big data is another. The Internet of Things is, after all, going to put sensors in every home, at every street corner, in every car, and in every piece of electronics you might encounter in your daily life. Then you won’t have the ability to lie, because all sharing will be frictionless.



Ignoring the Big Brother implications (Big Brother doesn’t want you to keep a diary, because keeping diaries will just cause you emotional angst…), there’s an underlying problem with the exuberance over being able to collect all the information in the world and run it through some form of algorithm that will not only predict the future, but also work out how to “nudge” people along the path that someone writing the code (or the laws, as the case might be) wants them to take.

People don’t always like to be “nudged,” and they’re pretty good liars, if you want to know the truth. Kids posting, on social media sites, the names of songs that describe how they feel, knowing their parents don’t know the song and hence won’t get the point. Jokes about squirrels and blue dories. Winston keeping a diary outside the range of the all-seeing eye of Big Brother. People will find a way to communicate and remember no matter what measures you might take.

And if you think the solution to this problem is “just add more sensors,” then I find your line of thinking pretty creepy.

But this is the bottom-line problem with data analytics: we can’t really measure intent, just action. We can try to infer intent from action, but people are pretty good at doing one thing and meaning another, especially once they figure out how you’re measuring them.

So before we go running off into a world of, “ignore your common sense, and use the data, Luke,” we might need to think about this little problem called humanity. Or maybe it’s time to inject a little reality and humility into our way of thinking.

Maybe we really can’t “solve the world’s problems.” Maybe we can’t influence people to do what we think is best all the time, no matter how much data we covet and collect and hoard and analyze. And maybe, by focusing so hard on this, we’re like the statistician who’s ignoring their own head slowly roasting.

At least it’s a dry heat, I guess.