Tribune Company says a “Googlebot” searching for news postings was to blame for a six-year-old story about UAL filing for bankruptcy being identified as new – temporarily sending the airline’s stock crashing on Monday. Tribune claims it identified problems with the Googlebot months earlier and asked Google to stop using it to crawl Tribune sites – while Google denies ever receiving such a request.
We don’t have a dog in this fight, nor are we experts on how Web-searching bots work. So we will let each side tell its tale and you can decide.
“Tribune Company today said the confusion surrounding a 2002 Chicago Tribune article on the Internet this past weekend started with the inability of Google’s automated search agent "Googlebot" to differentiate between breaking news and frequently viewed stories on the websites of its newspapers. The company said it identified problems with Googlebot months ago and asked Google to stop using Googlebot to crawl newspaper websites, including The Sun Sentinel (Ft. Lauderdale), for inclusion in Google News. Despite this request, Google continued using Googlebot to crawl The Sun Sentinel’s website.
Despite the company’s earlier request and the confusion caused by Googlebot and Google News earlier this week, we believe that Googlebot continues to misclassify stories.
Tribune also released a summary of the sequence of events started by Googlebot’s crawling The Sun Sentinel’s website in the late-evening and early-morning hours of September 6 and September 7. The summary is as follows:
The article, headlined "United Airlines Files for Bankruptcy," was originally published in the Chicago Tribune in 2002, and appeared on the newspaper’s website. It then became part of the online database of Tribune’s newspapers. Our records indicate that the Googlebot crawled this story as recently as September 2 and September 3 and apparently treated it as old news.
On September 7, 2008 at 1:00:34 ET (Sept. 6, 2008, 10:00:34 PT), our records indicate that the article received a single visit. Given the fact that it was the middle of the night, traffic to the business section of the Sun Sentinel site was very low at the time. We believe that this single visit resulted in a link to the old article being created on a dynamic portion of the Sun Sentinel’s business section under a tab called "Popular Stories Business: Most Viewed."
Again, no new story was published and the old story was not re-published – a link to the old story was merely created. The URL for the old story did not change when the link appeared.
On September 7, at 1:36:03 ET (Sept. 6, 10:36:03 PT) a user of the Sun Sentinel’s website, viewing a story about airline policies regarding cancelled flights, clicked on the link to the old story under the "Popular Stories Business: Most Viewed" tab. Fifty-two seconds later, at 1:36:57 ET (10:36:57 PT), Googlebot visited the Sun Sentinel’s website again and crawled the story.
This time, despite the fact that the URL to the old story hadn’t changed, despite the fact that Googlebot had seen this story previously, it was apparently treated as though it was breaking news. Shortly thereafter, Google provided a link to the old story on Google News and dated it September 6, 2008. Google’s dating the story on Google News made it appear current to Google News users.
The first referral to the story from the link provided by Google News came just three minutes later, at 1:39:59 ET (10:39:59 PT).
Traffic to the old story increased during the course of the day, Sunday, September 7, with the bulk of it being referrals from Google. On Monday, September 8, traffic increased even more after a summary of the Google News story was made available to subscribers of Bloomberg News.”
Google posted its explanation on its company blog, where you can see not only the text, but also screen shots of the Sun Sentinel as the events unfolded.
RBR/TVBR observation: The two explanations actually line up pretty well, other than the dispute over whether or not Tribune asked Google to quit crawling its news sites with the bot. It is quite clear that a human being looking at the same page that the bot crawled would never have made the same error. A new headline at the bottom of the most popular box located in the far lower right corner of the business news main page would not have been interpreted by a human as breaking news. But bots are not human.
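For readers who want to see the mechanics, the sequence Tribune describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's or Tribune's actual code: the function names, the URLs, the view counts and the "any link I have not seen on this page before is new" rule are all hypothetical, chosen to show how a single overnight visit could put an old story into a most-viewed list and how a simple link-comparing bot could then mistake it for fresh news.

```python
from collections import Counter


def most_viewed(view_counts, top_n=5):
    """Return up to top_n story URLs, ranked by view count (hypothetical widget)."""
    return [url for url, _ in view_counts.most_common(top_n)]


def naive_new_links(previous_links, current_links):
    """Hypothetical bot rule: any link absent from the previous crawl is 'new'."""
    return [url for url in current_links if url not in previous_links]


# Assumed overnight traffic: very few views in the business section.
views = Counter({
    "/business/fuel-prices": 3,      # made-up URL and count
    "/business/retail-sales": 2,     # made-up URL and count
})
links_at_last_crawl = most_viewed(views)

# A single visit to the 2002 article is enough to put it on the list,
# because the list has room and almost nothing else is being read.
views["/archive/2002/ual-bankruptcy"] += 1
links_at_this_crawl = most_viewed(views)

# The story's URL and text have not changed, but the link is new to this page,
# so the naive rule flags it.
flagged = naive_new_links(set(links_at_last_crawl), links_at_this_crawl)
print(flagged)
```

A human reader would check the article itself for a date; the rule sketched here looks only at which links appear on the section page, which is the gap both explanations point to.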