There's No Such Thing As An Anonymized Dataset

Techdirt has an interesting report, culled from Slashdot, about an experiment that went in an unanticipated direction. Netflix released a chunk of deidentified data, hoping that researchers could use it to tweak and improve the company's recommendation algorithm. Other researchers instead used the data to match Netflix reviewers to IMDB reviewers, which identified many of the supposedly anonymous Netflix users. See: Techdirt: There's No Such Thing As An Anonymized Dataset (and thanks to Rob Hyndman for sending me the link).

What's the big deal? Two things. First, those Netflix viewers thought their information would remain private, and some of it could reveal personal attitudes toward sex, violence and other matters. Second, it is a lesson for anyone else who thinks that releasing an "anonymized" dataset would be OK.
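
To make the matching idea concrete, here is a toy sketch in Python of how records in a "deidentified" dataset can be linked to a public one. It is not the researchers' actual method, which the report above does not describe. The function name, the `min_overlap` threshold, and all of the data are invented for illustration, and the sketch simply assumes that the same person leaves identical (title, rating) pairs in both places.

```python
def link_records(anonymous, public, min_overlap=2):
    """Pair each anonymous ID with the public name sharing the most (title, rating) pairs.

    Hypothetical helper for illustration only; real linkage work has to cope
    with noisy ratings, dates, and partial overlap.
    """
    matches = {}
    for anon_id, anon_ratings in anonymous.items():
        best_name, best_score = None, 0
        for name, public_ratings in public.items():
            # Count the (title, rating) pairs the two profiles have in common.
            score = len(anon_ratings & public_ratings)
            if score > best_score:
                best_name, best_score = name, score
        # Only report a match when the overlap reaches the threshold.
        if best_score >= min_overlap:
            matches[anon_id] = best_name
    return matches


# Made-up data: the "anonymized" release has no names, only opaque IDs.
anonymous = {
    "user_1": {("Film A", 5), ("Film B", 2), ("Film C", 4)},
    "user_2": {("Film A", 3), ("Film D", 1)},
}

# Made-up public reviews posted under real names.
public = {
    "alice": {("Film A", 5), ("Film C", 4)},
    "bob": {("Film E", 4)},
}

matches = link_records(anonymous, public)
print(matches)
```

In this toy data, "user_1" shares two ratings with "alice" and is linked to her, which also exposes the rating for "Film B" that she never posted publicly. Removing names from a dataset does not remove the pattern of behaviour that identifies a person.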

