Tuesday, December 11, 2007

Google Finds Fewer Search Results

Everybody should know that when you use a search engine, the number of search results is just an estimate. You only look at the first 10 or 20 results anyway and, in some cases, the search engine doesn't let you access more than a certain number of results. For example, Google only lets you see the top 1000 results, mostly for efficiency reasons.

"When you perform a search, the results are often displayed with the information: Results 1 - 10 of about XXXX. Google's calculation of the total number of search results is an estimate. We understand that a ballpark figure is valuable, and by providing an estimate rather than an exact account, we can return quality search results faster." (Google help center)

But recently something has changed in Google's algorithm that estimates the number of results. Here's a comparison between the number of results for [Moby] in May (notice the recently-launched bar that used gradients) and today:

Searching for Moby (May 17, 2007)

Searching for Moby (December 10, 2007)

Going from 15 million results to only 2 million is a big drop. For the same query, Yahoo estimates 18,900,000 results, Microsoft finds 7,730,000 results, while Ask only finds 4,089,000 results. Notice that all three of the other major search engines show bigger numbers than Google. You might think that this query is just an exception, but that's not the case. Almost every query shows far fewer results in Google than in other search engines.

A search for [Google] shows (the numbers may vary across different data centers):
* 132,000,000 results - Google (screenshot)
* 1,610,000,000 results - Yahoo
* 244,000,000 results - Windows Live
* 281,620,000 results - Ask.com
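
To put those figures in proportion, here is a small Python sketch that divides each engine's estimate by Google's. The numbers are the ones listed above; the `estimates` dictionary and the `ratios_to` helper are illustrative names, not anything the search engines provide.

```python
# Estimated result counts for the query [Google], copied from the list above.
estimates = {
    "Google": 132_000_000,
    "Yahoo": 1_610_000_000,
    "Windows Live": 244_000_000,
    "Ask.com": 281_620_000,
}


def ratios_to(baseline, counts):
    """Return each other engine's estimate divided by the baseline engine's."""
    base = counts[baseline]
    return {
        name: count / base
        for name, count in counts.items()
        if name != baseline
    }


# Print the engines from the smallest ratio to the largest.
ratios = ratios_to("Google", estimates)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x Google's estimate")
```

For this query, Yahoo's estimate comes out at roughly 12 times Google's, while Windows Live and Ask.com are each around twice as large.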

And even if this estimate has never been reliable, it's strange to see such an obvious inaccuracy. If you use complicated queries (more than 3-4 keywords), the estimates become more accurate and Google starts to show more results than other search engines.

In other related news, Google started to treat subdomains the same as directories for some queries. "For several years Google has used something called host crowding, which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen in the future," explains Matt Cutts.
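
The host crowding idea described in that quote can be sketched in a few lines of Python. This is only an illustration of the general technique (cap the number of results per grouping key), not Google's actual implementation, which isn't public. The function names, the example URLs and especially `naive_domain` are assumptions: taking the last two labels of a hostname is a rough stand-in for a real registered-domain lookup and gives wrong answers for suffixes like `.co.uk`.

```python
from collections import Counter
from urllib.parse import urlsplit


def hostname(url):
    """Group by full hostname, so each subdomain counts separately."""
    return urlsplit(url).hostname or ""


def naive_domain(url):
    """Group by the last two hostname labels (an assumption; wrong for e.g. .co.uk)."""
    return ".".join(hostname(url).split(".")[-2:])


def crowd(ranked_urls, key=hostname, max_per_group=2):
    """Keep at most max_per_group results per group, preserving rank order."""
    seen = Counter()
    kept = []
    for url in ranked_urls:
        group = key(url)
        if seen[group] < max_per_group:
            seen[group] += 1
            kept.append(url)
    return kept


# Hypothetical ranked results, all from one domain but several subdomains.
results = [
    "http://blog.example.com/a",
    "http://blog.example.com/b",
    "http://blog.example.com/c",
    "http://www.example.com/d",
    "http://news.example.com/e",
]

print(crowd(results))                    # per-hostname limit
print(crowd(results, key=naive_domain))  # subdomains treated like directories
```

With the per-hostname key, four of the five results survive even though they all come from example.com, which is the kind of page people complained about. Grouping by domain instead leaves only the first two.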

Google - Yahoo Comparison
Persistent queries (Greasemonkey script)
Index size and estimation (given two search engines, what are the relative sizes of their indexes?)