Tuesday, January 31, 2006

Don't keep the data that you don't need

The recent controversy over subpoenas of high-profile search engines has spurred a lot of discussion about what search engines know about you. For example, John Battelle was able to get confirmation from Google of what a lot of people have probably always suspected:

1) "Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?"

2) "Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?"

I put these to Google. To its credit, it rapidly replied that the answer in both cases is "yes." Just FYI.

What else does Google know? Given that Google operates

  • one of the most widely used advertising networks,
  • one of the most widely used webmail services,
  • one of the most widely used mapping services,
  • one of the most widely used website statistics services,
  • one of the most widely used browser toolbars,
  • one of the most widely used news aggregators,
  • one of the most widely used online group services,

they know a heck of a lot. Every time you visit a site that uses AdWords, your computer connects to Google and tells them what you're viewing and probably what got you there. And all of this can be matched to your Google cookie or your IP address.

The question is, other than for personalized services, why should a company maintain information that is personally identifiable? Why keep logs that have your IP address down to the last digit when the same value can be obtained from the data by keeping only the first three octets (192.168.168.* rather than the full address)? The level of trust that consumers have in companies like Google is eroding, and businesses should take heed of this. If you don't need the information in personally identifiable form, don't keep it.
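To make the idea concrete, here is a minimal sketch of dropping the last octet of an IPv4 address before a log line is stored. The function names and the log format (address first, then a space, then the rest of the line) are assumptions made up for illustration; nothing here describes how Google or any other company actually handles its logs.

```python
import ipaddress


def truncate_ip(address: str) -> str:
    """Mask the last octet of an IPv4 address, e.g. 192.168.168.*"""
    # Validate first; IPv4Address raises ValueError on malformed input.
    ip = ipaddress.IPv4Address(address)
    first_three = str(ip).split(".")[:3]
    return ".".join(first_three) + ".*"


def anonymize_log_line(line: str) -> str:
    """Rewrite one log line so only the truncated address is kept.

    Assumes a hypothetical format: "<ipv4 address> <rest of line>".
    """
    ip, sep, rest = line.partition(" ")
    return truncate_ip(ip) + sep + rest


if __name__ == "__main__":
    # 192.168.168.42 is a made-up example address.
    print(anonymize_log_line("192.168.168.42 GET /search?q=privacy"))
```

If the truncation happens before the line is written, the full address never reaches storage, so there is nothing more specific to hand over later.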

It will not be long before the cost of keeping this stuff is prohibitive if you have to spend valuable personnel time responding to subpoenas. I can imagine the FBI or some other three-letter agency having a form subpoena that will seek all the records from Google, Yahoo!, DoubleClick and others about the supposed "owner" of a suspicious IP address. What did you search for? What did you read? When were you online? All this info is maintained by a small handful of companies.

UPDATE: While you're thinking about this, check out Google's data minefield by Mark Rasch (via robhyndman.com).
