Yahoo on Wednesday announced a new data-retention policy, but on Thursday privacy experts were still scratching their heads about what the policy really means.
Although it appeared Yahoo was setting an industry standard for data retention with a promise to anonymize user log data within 90 days (with limited exceptions for fraud, security and legal obligations), privacy advocates say the announcement isn't clear.
"It's subtle, but it's important. Yahoo is not slashing its data-retention policy to three months," said Marc Rotenberg, executive director for the Electronic Privacy Information Center. "Yahoo is modifying the data at three months and keeping the data. So the real question is what is happening to the information that these search companies are keeping?"
Confusing the Privacy Issue
Yahoo said the heads of its business and engineering units worked with the privacy and data-governance teams to review the company's data needs. The goal was to ensure that Yahoo retains data only long enough to serve its business and user-experience needs while maintaining the ability to fight fraud, secure systems, and meet legal obligations.
"This policy represents Yahoo's assessment of the minimum amount of time we need to retain data in order to respond to the needs of our business while deepening our trusted relationship with users," said Anne Toth, Yahoo's vice president of policy. Yahoo is also expanding its policy to apply not only to search-log data but also page views, page clicks, ad views, and ad clicks.
Rotenberg said he would welcome a retention policy in which data is deleted or destroyed, but that's not what Yahoo has announced. As a consequence, he said, there's more confusion about what search-engine companies are doing with the data they collect.
"Just to give an example, the IP address is a unique identifier that more often than not links a search query to an individual user. Yahoo is not even removing the entire IP address. They are knocking out the last few digits," Rotenberg said. "That's a little like having someone's phone number and taking the last number off of it. That's not deletion. It gets very subtle and very complicated."
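Rotenberg's point can be illustrated with a short sketch. Yahoo has not published the exact method here, so the code below assumes the simplest reading of "knocking out the last few digits": zeroing the final 8 bits of an IPv4 address. The function names and the 24-bit cutoff are hypothetical, chosen only for illustration.

```python
import ipaddress


def truncate_ipv4(address: str, kept_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping the first kept_bits.

    Assumption: this mimics "knocking out the last few digits"; the real
    scheme used by any search company may differ.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{kept_bits}", strict=False)
    return str(network.network_address)


def candidate_count(kept_bits: int = 24) -> int:
    """Number of distinct addresses that map to the same truncated value."""
    return 2 ** (32 - kept_bits)


masked = truncate_ipv4("192.0.2.57")  # 192.0.2.57 is a documentation-only address
remaining = candidate_count()
```

Under these assumptions the masked value still narrows the searcher down to one of 256 possible addresses, which is why critics describe the step as modification rather than deletion.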
What Happens When We Search?
When a user conducts a search on Yahoo, Google, Microsoft or another search engine, the companies collect a large volume of data. Most users aren't aware of how much data is collected because it happens behind the scenes.
For example, when a user searches, the engine records the text of the search, which is referred to as the search query. It also saves the date and time stamp, which shows when the search occurred, down to the second. It records the cookie, which is more accurately called a persistent identifier. It saves the IP address, and there's also a record locator.
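The fields listed above could be pictured as a single log entry, roughly like the sketch below. This is only an illustration: the class name, field names, and every value are hypothetical placeholders, not the actual schema of Yahoo or any other search engine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchLogRecord:
    """Hypothetical shape of one search-log entry, based on the fields above."""
    query: str            # the text of the search
    timestamp: datetime   # date and time of the search, to the second
    cookie_id: str        # the cookie, i.e. a persistent identifier
    ip_address: str       # address the request came from
    record_locator: str   # identifier for the log entry itself


# Placeholder values for illustration only.
record = SearchLogRecord(
    query="example search",
    timestamp=datetime(2000, 1, 1, 9, 30, 15),
    cookie_id="cookie-0001",
    ip_address="192.0.2.57",
    record_locator="rec-0001",
)
```

Seen this way, altering one field, such as the IP address, leaves the query, the time stamp, and the persistent identifier in place, which is the basis for the re-identification concern.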
"When the companies say they are deleting or anonymizing, neither of those statements are true. What they are really doing is modifying the data that they are keeping. Then the interesting question is, how are they actually modifying the data? What is being kept? What isn't being kept?" Rotenberg asked. "And the fairly obvious question is, is it possible to re-identify the person that made the search, because, at least from the privacy perspective, that's what this is all about."