An intrepid reporter from the New York Times has provided a vivid illustration that the supposedly de-identified search data released by AOL is not really anonymous.
A Face Is Exposed for AOL Searcher No. 4417749 - New York Times
Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher’s anonymity, but it was not much of a shield.
No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”
And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”
It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.
What this really illustrates is the risk posed by simply keeping data around. AOL says it keeps the data for a month, and this particular database was used internally for research to optimize the AOL service. The usual risk to consider is that the data will illicitly go out the back door, but in this case it went out the front door.
Now the cat's out of the bag: Someone has put the database online, allowing you to search the searches (http://www.aolsearchdatabase.com/). Many of the searches reveal sad details about the users, and browsing them feels creepily voyeuristic. Thelma's data is out there for good, along with the searches of more than six hundred thousand others.
Thanks to Michael Geist for the link.