Tuesday, September 11, 2007

1,070,000 search results, give or take 1,069,741

Matt Waite wrote on his blog the other day that the new Web site he created, Politifact, already had "more than 1 million results in Google" after only two weeks online. That number sounded incredible to me, so I Googled Politifact, and found that Google did indeed report, in the blue bar at the top, that it was showing results 1-100 of more than 1 million:

So I scrolled through the results to see which sites these were, and, when I got to the bottom, saw links for only five actual pages of results:

And when I clicked on the link for page 5, lo and behold, suddenly there were links for only three pages of results on the bottom. And at the top, on the blue bar where just minutes before it had reported 1,070,000 results for Politifact, it reported only 259:

You can try almost any search term and get wildly inconsistent counts each time you search. SearchEngineWatch demonstrated it using the nonsensical term djkfdkjfdkjddfdfdd. And a Frenchman demonstrated it using variations of Chirac and Sarkozy.
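To put a number on the gap, here is a minimal Python sketch that compares two hit counts reported for the same query. The function name `describe_discrepancy` and its output fields are my own illustration, not anything Google provides; the two counts are the ones I saw for Politifact a few minutes apart.

```python
def describe_discrepancy(first_count: int, later_count: int) -> dict:
    """Compare two hit counts reported for the same search query.

    Hypothetical helper for illustration only; it just does arithmetic
    on numbers you have already read off the results page.
    """
    high = max(first_count, later_count)
    low = min(first_count, later_count)
    return {
        # How many reported results disappeared between the two counts.
        "difference": high - low,
        # How many times larger the big count is than the small one.
        "ratio": high / low if low else float("inf"),
        # Fraction of the larger count that the smaller one cannot account for.
        "share_unaccounted": (high - low) / high if high else 0.0,
    }


# Counts Google reported for the same Politifact search, minutes apart.
result = describe_discrepancy(1_070_000, 259)
print(f"{result['difference']:,} reported results vanished")
print(f"{result['share_unaccounted']:.2%} of the first count was unaccounted for")
```

The difference works out to 1,069,741, which is where the headline's "give or take" comes from.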

It's been obvious for a long time that Google search result counts are an entirely bogus form of evidence, as this 2004 MediaBistro article pointed out:

Writers crafting trend stories—or, for that matter, profiles, or lifestyle pieces, or reviews, or even news items—are always desperate to prove the popularity of whatever hot new thing they're identifying. They could do some reporting, of course, or find some statistical research, but instead they're technologically smitten, like everyone else. What's a simpler, or faster, way of quantifying a trend than typing a key word or phrase into Google? Type in almost any person, place, or thing, and Google will bounce back to you a neat numerical value that calculates that person, place, or thing's importance to this world. The writer can sit back and let the search engine's brainy algorithms do all the work—and then even pick up some tech-savvy bonus points, too. Google, and not polls or pie charts, has emerged as a journalist's best friend—and best source.

Not that journalists are alone in this. Judges do it too:

A New York federal judge said a Google search had helped him decide that 24 Hour Fitness should not receive an injunction against a competitor that owned 24hourfitness.com. The judge said a search for "fitness industry" on the Internet revealed more than 1.6 million hits, mainly linking to sites related to physical training and conditioning.

Google Fight even purports to pick winners based on which search terms get the most results.

Google isn't the only purveyor of flaky numbers. When Robert Scoble found similarly inconsistent results from MSN and Yahoo, it prompted him to ask, "Why aren’t there any truth in advertising laws for search engines?"

The problem is that we just don't know what's going on behind the scenes, so to cite any number churned out by a search engine as proof of something without knowing how it was generated is specious. It's a particularly egregious form of confirmation bias. The only Web page count we should trust is the one where we've used our own eyeballs to scrutinize each and every page.

After I pointed out the discrepancies, Matt did the search again and updated his blog to show that the Google result count "now stands at 687,000."

"Regardless," he added, "the point remains — your Google hit count is a largely meaningless milestone, except to show that the site has spread widely."

Except it doesn't even show that. When I searched for Politifact again today, Google reported 446,000:

Tomorrow, who knows?
