big data

Weekend Reads 011118: Mostly Security and Policy

Traveling is stressful. The last thing you want to worry about is getting scammed by crooks on the street. Your best tool? Knowledge. Know how they work. Know what they’ll do. Prevent it from happening in the first place. —Relatively Interesting

The European Union’s competition chief is zeroing in on how companies stockpile and use so-called big data, or enormous computer files of customer records, industry statistics and other information. The move diverges starkly from a hands-off approach in the U.S., where regulators emphasize the benefits big data brings to innovation. —Natalia Drozdiak @ MarketWatch

The cybersecurity industry has mushroomed in recent years, but the data breaches just keep coming. Almost every day brings news of a new data breach, with millions of records compromised — including payment details, passwords, and other information that makes those customers vulnerable to theft and identity fraud. —Alistair Johnston @ MarketWatch

To break the dominance of Google on Android, Gael Duval, a former Linux developer and creator of now defunct but once hugely popular Mandrake Linux (later known as Mandriva Linux), has developed an open-source version of Android that is not connected to Google. —Kavita Iyer @ TechWorm

China has rarely undertaken a role in developing public international cybersecurity law over the many years the provisions have existed. Only once did it submit a formal proposal — fifteen years ago to the 2002 Plenipotentiary Conference where it introduced a resolution concerning “rapid Internet growth [that] has given rise to new problems in communication security.” Thus, a China formal submission to the upcoming third EG-ITRs meeting on 17-19 January 2018 in Geneva is significant in itself. —Anthony Rutkowski @ CircleID

If all you want is the TL;DR, here’s the headline finding: due to flaws in both Signal and WhatsApp (which I single out because I use them), it’s theoretically possible for strangers to add themselves to an encrypted group chat. However, the caveat is that these attacks are extremely difficult to pull off in practice, so nobody needs to panic. But both issues are very avoidable, and tend to undermine the logic of having an end-to-end encryption protocol in the first place. —Krebs on Security

This past Friday Twitter issued what is perhaps one of the most remarkable statements in modern diplomatic history: it said both that it would not ban a world leader from its platform and that it reserved the right to delete official statements by heads of state of sovereign nations as it saw fit. Have we truly reached a point in human history where private companies now wield absolute authority over what every government on earth may say to their citizens in the online world that has become the de facto modern town square? —Kalev Leetaru @ Forbes

Data Can’t Lie?

A statistician is someone who can put their head in a hot oven, and their feet in a bucket of ice, and say, “on the average, I feel fine.”

Before we move completely into a world where people are counseled, “use the data, Luke,” disregarding their own beliefs and feelings, we need to have a little discussion. As an example of what we might get wrong, let’s take a look at some interesting problems in the polling from recent elections. According to one article (which happens to have all the numbers conveniently gathered in one place):

  • On May 7th, in an election in Britain, the pre-election polls showed the Conservatives would win around 280 seats. The exit polls taken during the election showed the Conservatives would win around 316 seats. In the actual vote, the Conservatives won 330 seats.
  • In 1992, also in Britain, the pre-election polls showed the Conservative and Labour parties in a dead heat. The Conservatives actually won by 7.5 points.
  • In the recent election in Israel, Likud was predicted, through polling, to win 22 seats. Likud actually won 30 seats.
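
To make the size and direction of those misses concrete, here is a minimal sketch in Python that just does the subtraction on the seat counts quoted above. The labels, the `polls` list, and the `seat_misses` helper are made up for illustration, and the 1992 result is left out because it is quoted in points rather than seats.

```python
# Seat counts quoted in the list above: (label, predicted, actual).
# Labels and structure are illustrative assumptions, not from the article.
polls = [
    ("Britain (May 7th), pre-election polls", 280, 330),
    ("Britain (May 7th), exit polls", 316, 330),
    ("Israel, Likud", 22, 30),
]


def seat_misses(rows):
    """Return (label, actual - predicted); a positive miss means the poll undercounted."""
    return [(label, actual - predicted) for label, predicted, actual in rows]


misses = seat_misses(polls)
for label, miss in misses:
    print(f"{label}: off by {miss:+d} seats")

# If polling errors were pure noise, we would expect a mix of signs.
same_direction = all(miss > 0 for _, miss in misses)
print("All misses in the same direction:", same_direction)
```

Three data points prove nothing on their own, but every miss here undercounts the conservative side, which is what you would expect from a systematic bias rather than from random error.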

These aren’t random events — they are repeated time and again in elections through the last decade or so. Sociologically, one explanation for the difference between the polls and the results is that in cultures where being conservative is seen as socially unacceptable, people simply tell the pollster what they think the pollster wants to hear — they tell them what they think will make them liked, or at least accepted.

In other words, people are capable of lying. Don’t sit there with a stunned look on your face, as if you’d never thought of this before.

I know polls are one thing, and big data is another. The Internet of Things is, after all, going to put sensors in every home, at every street corner, in every car, and in every piece of electronics you might encounter in your daily life. Then you won’t have the ability to lie, because all sharing will be frictionless.

Ignoring the Big Brother implications (Big Brother doesn’t want you to keep a diary, because keeping diaries will just cause you emotional angst…), there’s an underlying problem with the exuberance over being able to collect all the information in the world and run it through some form of algorithm that will predict not only the future, but also how to “nudge” people along the path that someone writing the code (or the laws, as the case might be) wants them to take.

People don’t always like to be “nudged,” and they’re pretty good liars, if you want to know the truth. Kids posting, on social media sites, the names of songs that describe how they feel, knowing their parents don’t know the song and hence won’t get the point. Jokes about squirrels and blue dories. Winston keeping a diary outside the range of the all-seeing eye of Big Brother. People will find a way to communicate and remember no matter what measures you might take.

And if you think the solution to this problem is “just add more sensors,” then I find your line of thinking pretty creepy.

But this is the bottom-line problem with data analytics: we can’t really measure intent, just action. We can try to infer intent from action, but people are pretty good at doing one thing and meaning another, especially once they figure out how you’re measuring them.

So before we go running off into a world of, “ignore your common sense, and use the data, Luke,” we might need to think about this little problem called humanity. Or maybe it’s time to inject a little reality and humility into our way of thinking.

Maybe we really can’t “solve the world’s problems.” Maybe we can’t influence people to do what we think is best all the time, no matter how much data we covet and collect and hoard and analyze. And maybe, by focusing so hard on this, we’re like the statistician who’s ignoring his head slowly roasting.

At least it’s a dry heat, I guess.