
Search engines’ top ten lists – What do they really mean?

As December comes to a close, the web once again gets swarmed with “year’s best” or “most popular” lists. I’ll probably be writing my own annual wrap-up post in the near future. Over the past two days I spent some time looking at Google’s and Yahoo’s top ten search terms for the year:

Google’s “Top Searches in 2006” (source):

  1. bebo
  2. myspace
  3. world cup
  4. metacafe
  5. radioblog
  6. wikipedia
  7. video
  8. rebelde
  9. mininova
  10. wiki

Yahoo’s “Top 10 Overall Searches” (source):

  1. Britney Spears
  2. WWE
  3. Shakira
  4. Jessica Simpson
  5. Paris Hilton
  6. American Idol
  7. Beyonce Knowles
  8. Chris Brown
  9. Pamela Anderson
  10. Lindsay Lohan

Now, something obviously doesn’t add up here. There’s no way Google’s and Yahoo’s user bases can be that different.

I started looking into just how each search engine calculates and chooses its top results. Google somewhat vaguely states that “To compile these year-end lists and graphs, we reviewed a variety of the most popular search terms that people typed into Google.”

Looking for more specifics, I ran across this interview with a Google VP. It turns out that the top ten is not based on simple popularity. Instead, it ranks the terms that gained popularity the fastest. This explains why Bebo is ranked higher than Myspace. As a newcomer relative to Myspace, Bebo had more room to grow. If searches for Bebo went from almost nothing to a huge level this year, that’s a larger change than Myspace going from an already huge level to a slightly larger one. And if searches for pornography and other net vices have leveled off, they won’t make the list either, no matter how huge their numbers are. The only terms censored from Google’s list are its own product names.
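To make the difference concrete, here is a minimal sketch comparing the two ranking methods. The term names and counts are made up purely for illustration, and measuring growth as a year-over-year percentage is my assumption; Google doesn't say exactly how it measures a "gain".

```python
# Hypothetical yearly search counts (invented numbers, not real data).
counts = {
    "established_site": {"last_year": 9_000_000, "this_year": 10_000_000},
    "newcomer_site": {"last_year": 100_000, "this_year": 3_000_000},
    "steady_term": {"last_year": 20_000_000, "this_year": 20_000_000},
}


def rank_by_volume(counts):
    """Order terms by this year's raw number of searches."""
    return sorted(counts, key=lambda term: counts[term]["this_year"], reverse=True)


def rank_by_growth(counts):
    """Order terms by relative year-over-year growth (an assumed metric)."""
    def growth(term):
        c = counts[term]
        # Guard against division by zero for terms that are brand new.
        baseline = max(c["last_year"], 1)
        return (c["this_year"] - c["last_year"]) / baseline

    return sorted(counts, key=growth, reverse=True)


print(rank_by_volume(counts))  # steady_term first, newcomer_site last
print(rank_by_growth(counts))  # newcomer_site first, steady_term last
```

The same data yields opposite orderings: the term with the most searches overall drops to the bottom of the growth ranking because its numbers didn't change.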

Google’s computer-generated list contrasts sharply with Yahoo’s policy of heavily editing and paring down its list. Based on the FAQ about Yahoo’s ‘Buzz’ rankings, the underlying ranking process seems to follow Google’s pretty closely: the list is based on the largest increase, not the simple number of searches. What’s more revealing is the list of what’s left out: “Company names (such as Yahoo!), utilities and formats (email, MP3), and general terms (movies, downloads, football)…” This alone explains most of the differences between the two companies’ lists, since seven of Google’s ten qualify as company names.
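As a rough sketch of how that kind of exclusion changes a list: the excluded terms below are the examples quoted from the FAQ, but the function name, the candidate list, and its ordering are hypothetical, and Yahoo’s actual editorial process is surely more involved than a simple lookup.

```python
# Example exclusions taken from the FAQ quote; the real list is not published here.
EXCLUDED = {"yahoo!", "email", "mp3", "movies", "downloads", "football"}


def editorial_filter(ranked_terms, excluded=EXCLUDED, limit=10):
    """Drop excluded terms from an already-ranked list, then keep the top `limit`."""
    kept = [term for term in ranked_terms if term.lower() not in excluded]
    return kept[:limit]


# Hypothetical pre-edit ranking, mixing excluded terms with celebrity names.
candidates = ["email", "Britney Spears", "MP3", "WWE", "movies", "Shakira"]
print(editorial_filter(candidates, limit=3))
```

Even with an identical ranking method underneath, a filter like this is enough to produce a top ten that looks nothing like an unfiltered one.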

Also, “The editors’ goal is to list subjects that are interesting to the broadest possible audience.” It’s hard to be sure, but I’d imagine that in practice this means a focus on the entertainment world, which would explain the celebrities on the list.

So what’s really the most-used search term of the year? Of the two lists, I think Google comes closer to answering the question. But there’s a third option. I couldn’t dig as deeply into AOL’s search rankings, since AOL provides no background on its selection process, but its top ten list rings a little more true to me. ‘Weather’ is number one, and the rest of the list is mostly generic terms like ‘games’ or ‘lyrics’. Not everyone is a power user, after all. But again, how this list was chosen is a mystery. Ultimately, without access to the raw annual data, the ‘real’ number one term probably can’t be known.

And there are probably a million ways of defining how the ‘real’ one should be calculated anyway. These lists are still useful for trend spotting; just take them with a grain of salt.