Monday, May 14, 2007

Why does Google remember information about searches?

Straight from Google's official blog:

Official Google Blog: Why does Google remember information about searches? 5/11/2007 11:21:00 AM Posted by Peter Fleischer, Global Privacy Counsel

We recently announced a new policy to anonymize our server logs after 18–24 months. We’re the only leading search company to have taken this step publicly. We believe it’s an important part of our commitment to respect user privacy while balancing a number of important factors.

In developing this policy, we spoke with various privacy advocates, regulators and others about how long they think the period should be. There is a wide spectrum of views on this – some think data should be preserved for longer, others think it should be anonymized almost immediately. We spent a great deal of time sorting this out and thought we’d explain some of the things that prompted us to decide on 18–24 months.

Three factors were critical. One was maintaining our ability to continue to improve the quality of our search services. Another was to protect our systems and our users from fraud and abuse. The third was complying—and anticipating compliance—with possible data retention requirements. Here’s a bit more about each of these:

  • Improve our services: Search companies like Google are constantly trying to improve the quality of their search services. Analyzing logs data is an important tool to help our engineers refine search quality and build helpful new services. Take the example of Google Spell Checker. Google’s spell checking software automatically looks at your query and checks to see if you are using the most common version of a word’s spelling. If it calculates that you’re likely to generate more relevant search results with an alternative spelling, it will ask “Did you mean: (more common spelling)?” We can offer this service by looking at spelling corrections that people do or do not click on. Similarly, with logs, we can improve our search results: if we know that people are clicking on the #1 result we’re doing something right, and if they’re hitting next page or reformulating their query, we’re doing something wrong. The ability of a search company to continue to improve its services is essential, and represents a normal and expected use of such data.
  • Maintain security and prevent fraud and abuse: It is standard among Internet companies to retain server logs with IP addresses as one of an array of tools to protect the system from security attacks. For example, our computers can analyze logging patterns in order to identify, investigate and defend against malicious access and exploitation attempts. Data protection laws around the world require Internet companies to maintain adequate security measures to protect the personal data of their users. Immediate deletion of IP addresses from our logs would make our systems more vulnerable to security attacks, putting the personal data of our users at greater risk. Historical logs information can also be a useful tool to help us detect and prevent phishing, scripting attacks, and spam, including query click spam and ads click spam.
  • Comply with legal obligations to retain data: Search companies like Google are also subject to laws that sometimes conflict with data protection regulations, like data retention for law enforcement purposes. For example, Google may be subject to the EU Data Retention Directive, which was passed last year, in the wake of the Madrid and London terrorist bombings, to help law enforcement in the investigation and prosecution of “serious crime”. The Directive requires all EU Member States to pass data retention laws by 2009 with retention for periods between 6 and 24 months. Since these laws do not yet exist, and are only now being proposed and debated, it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. It's therefore too early to state whether such laws would apply to particular Google services, and if so, which ones. In the U.S., the Department of Justice and others have similarly called for 24-month data retention laws.
At the same time, regulators in other parts of governments have argued for shorter retention periods, reflecting the conflicts in every country between privacy and data protection objectives on the one hand, and law enforcement objectives on the other. Companies like Google are trying to be responsible corporate citizens, and sometimes we are told to do different things by different government entities, or to follow conflicting legal obligations. It's hard enough to get different government entities to talk to each other inside one country. When you multiply this by all the countries where Google must comply with the laws, the potential conflicts are enormous. Nonetheless, Google is committed to providing its users around the world with one consistent high level of data protection.
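To make the "improve our services" point above concrete, here is a minimal sketch of how click signals might be tallied from logs. It is only an illustration under stated assumptions: the record layout, the action labels (`clicked_result_1`, `next_page`, `reformulated`, `clicked_suggestion`), and the `action_share` helper are all hypothetical and are not taken from Google's post or its actual log schema.

```python
from collections import Counter

# Hypothetical log records as (query, action) pairs.
# The action labels are illustrative assumptions, not a real log format.
LOG = [
    ("recieve", "clicked_suggestion"),
    ("recieve", "clicked_suggestion"),
    ("recieve", "clicked_result_1"),
    ("python tutorial", "clicked_result_1"),
    ("python tutorial", "clicked_result_1"),
    ("python tutorial", "next_page"),
    ("jaguar", "reformulated"),
    ("jaguar", "next_page"),
]


def action_share(log, actions):
    """Return {query: fraction of that query's logged actions found in `actions`}."""
    totals = Counter()
    hits = Counter()
    for query, action in log:
        totals[query] += 1
        if action in actions:
            hits[query] += 1
    return {query: hits[query] / totals[query] for query in totals}


# Share of actions that were a click on the #1 result ("doing something right").
first_click = action_share(LOG, {"clicked_result_1"})

# Share of actions that were a next-page or reformulation ("doing something wrong").
dissatisfied = action_share(LOG, {"next_page", "reformulated"})

# Share of actions that accepted a "Did you mean" spelling suggestion.
suggestion_taken = action_share(LOG, {"clicked_suggestion"})

for query in first_click:
    print(query, first_click[query], dissatisfied[query], suggestion_taken[query])
```

In this toy data, a query whose actions are mostly next-page clicks or reformulations would be a candidate for ranking review, and a query whose spelling suggestion is usually accepted would be evidence that the suggestion is helpful. A real system would need far more data and care than this tally suggests.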

It’s also worth reiterating that we do not ask our users for their names, addresses, or phone numbers to use most of our services. For those who want to see what their logs history looks like, we offer transparent access via a Google Account to their own personal Web History.

Finally, we maintain rigorous internal controls of our logs database. We look forward to an ongoing discussion with privacy stakeholders around the world as we pursue a common goal of improving privacy protections for everyone on the Internet.

1 comment:

Tim Trent said...

Google retains data about searches because it is a vendor of advertising, rather than a search engine company.

As a user of AdWords to drive traffic to sites, I need to see how effective my outbound campaigns have been, which means I need to know broadly where they have been posted, and I need to track their conversion.

Conversion tracking through Google Analytics needs to look at, among other things, the geography of the site visitor, and the keywords through which they arrived. This way I can analyse ways of making my campaign more effective.

Additionally, as a user of AdSense on, for example, Compliance and Privacy, I want to judge how well my content is received so that I can tailor it to the market's need for information, alongside relevant adverts that encourage the visitor to leave by clicking an advert rather than by simply closing the window. The same applies to my own blog, "Marketing by Permission".

Google remembers searches in order to give its customers more detail about the visitors it sends than the trail left by the visitor's browser in the site logs, logs which are often inaccessible to the customer on sites they track with Analytics.