Saturday, December 01, 2007

There's No Such Thing As An Anonymized Dataset

Techdirt has an interesting report, culled from Slashdot, about an experiment that went in an unanticipated direction. Netflix released a chunk of deidentified data, hoping that researchers could use the data to tweak and improve the company's recommendation algorithm. Other researchers instead used the data to match Netflix reviewers to IMDB reviewers, which identified many of the supposedly anonymous Netflix users. See: Techdirt: There's No Such Thing As An Anonymized Dataset (and thanks to Rob Hyndman for sending me the link).

What's the big deal? Two things. First, those Netflix viewers thought their information would remain private, and some of it could reveal personal attitudes toward sex, violence and other matters. Second, it is a lesson for anyone else who thinks that releasing an "anonymized" dataset would be OK.
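To make the matching idea concrete, here is a minimal sketch of how that kind of linkage can work in principle. It is not the researchers' actual method. The data, the names, and the `overlap` and `link` helpers are all hypothetical, invented for illustration. It assumes that a person who rates the same titles with the same scores on nearly the same dates in both places is probably the same person.

```python
from datetime import date

# Hypothetical "anonymized" release: names replaced by numeric IDs.
anonymized = {
    101: [("Film A", 5, date(2005, 3, 1)),
          ("Film B", 1, date(2005, 3, 9)),
          ("Film C", 4, date(2005, 4, 2))],
    102: [("Film A", 3, date(2005, 6, 1)),
          ("Film D", 2, date(2005, 6, 5))],
}

# Hypothetical public reviews posted under visible usernames.
public = {
    "jane_doe": [("Film A", 5, date(2005, 3, 2)),
                 ("Film C", 4, date(2005, 4, 1))],
    "john_roe": [("Film D", 2, date(2005, 6, 20))],
}


def overlap(anon_ratings, public_ratings, max_days=3):
    """Count public reviews that have a same-title, same-score rating
    within max_days in the anonymized record."""
    count = 0
    for title, stars, when in public_ratings:
        for a_title, a_stars, a_when in anon_ratings:
            close = abs((when - a_when).days) <= max_days
            if title == a_title and stars == a_stars and close:
                count += 1
                break
    return count


def link(anonymized, public, min_matches=2):
    """Map each public username to its best-matching anonymized ID,
    but only when at least min_matches ratings line up."""
    links = {}
    for name, reviews in public.items():
        best_id, best = max(
            ((anon_id, overlap(ratings, reviews))
             for anon_id, ratings in anonymized.items()),
            key=lambda pair: pair[1],
        )
        if best >= min_matches:
            links[name] = best_id
    return links


print(link(anonymized, public))
```

In this toy data only two ratings need to line up before a username is tied to an ID, at which point every other rating in that "anonymous" record (here, the one-star review of Film B) is attached to a name as well.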

1 comment:

  1. Hello all!

    Nice to see I'm not the only one interested in this topic. I've just created a blog in which I talk about international politics. My last post is titled 'On technology, privacy, and other challenges in the XXI century'. It talks about cameras, databases, and how Governments and Corporations should manage that data. Feel free to take a look and comment or post there if you want. It doesn't have any advertising.

    It's here:

    See ya!
