How can Google be so fast?

Disk accesses kill latency, so it's reasonable to believe that all data used to answer queries is kept in memory. This implies thousands of servers, each holding a replica of one of many shards. It also means the critical path for search is unlikely to hit any of their flagship distributed systems technologies: GFS, MapReduce or BigTable. Those, roughly speaking, will be used offline to process the crawler's results.
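
To put rough numbers on that intuition, here's a back-of-envelope calculation. The latency figures are the usual rules of thumb for spinning disks and RAM, not anything measured from Google: the point is that a handful of random disk seeks exhausts a ~100ms budget, while memory can be referenced about a million times in the same window.

```python
# Rough, illustrative numbers only -- classic rules of thumb,
# not Google's measured figures.

DISK_SEEK_S = 10e-3      # ~10 ms per random seek on a spinning disk
RAM_ACCESS_S = 100e-9    # ~100 ns per main-memory reference

LATENCY_BUDGET_S = 0.1   # a plausible end-to-end target of ~100 ms

seeks_affordable = LATENCY_BUDGET_S / DISK_SEEK_S
ram_refs_affordable = LATENCY_BUDGET_S / RAM_ACCESS_S

print(f"Disk seeks within budget:     {seeks_affordable:,.0f}")      # ~10
print(f"RAM references within budget: {ram_refs_affordable:,.0f}")   # ~1,000,000
```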

The handy thing about search is that results need be neither strongly consistent nor completely up to date, so Google are never prevented from responding to a query just because a more up-to-date search result has become available.
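
As a loose illustration of that freedom, here's a hypothetical snapshot-swapping index (all names are mine, not anything Google has described): reads never wait on fresher data, because a whole new index generation is simply published when the offline pipeline finishes, and queries in flight keep reading the old one.

```python
import threading

class SnapshotIndex:
    """Hypothetical sketch: serve queries from an immutable snapshot and
    swap in new index builds atomically. Staleness is acceptable for
    search, so reads never block on rebuilds."""

    def __init__(self, index):
        self._index = index          # term -> list of doc ids
        self._lock = threading.Lock()

    def lookup(self, term):
        # Grab whatever snapshot is current; no coordination with writers.
        snapshot = self._index
        return snapshot.get(term, [])

    def publish(self, new_index):
        # The offline pipeline (crawler -> MapReduce-style jobs) calls
        # this when a new generation is ready. One pointer swap; queries
        # in flight keep reading the old snapshot.
        with self._lock:
            self._index = new_index
```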

So a possible architecture is quite simple: front-end servers process the query, normalising it (perhaps by stripping out stop words) and then distributing it to whichever subset of replicas owns that part of the query space. (An alternative architecture is to split the data up by web page, in which case one member of every replica set must be contacted for every query.) Many, many replicas are probably queried, and the quickest responses win. Each replica holds an index mapping queries (or individual query terms) to documents, which it can use to look up results in memory very quickly. If different results come back from different sources, the front-end server can rank them as it spits out the HTML.
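
Here's a minimal sketch of that fan-out in Python. Everything in it is assumed for illustration (the stop-word list, the hashing scheme, the replica interface); the point is just the shape: normalise, scatter to the replicas of each relevant shard, keep the quickest answers, then merge and rank.

```python
import concurrent.futures as cf
import hashlib

STOP_WORDS = {"the", "a", "an", "of"}   # illustrative, not a real list

def normalise(query):
    # Lower-case and strip stop words, as the front end might.
    return [t for t in query.lower().split() if t not in STOP_WORDS]

def shard_for(term, n_shards):
    # Hash each term to the shard owning that slice of the query space.
    return int(hashlib.md5(term.encode()).hexdigest(), 16) % n_shards

def search(query, replicas, n_shards, timeout=0.05):
    """Fan the query out to every replica of each relevant shard and keep
    the quickest answer per shard. replicas[s] is a list of callables,
    each standing in for one replica server of shard s; each returns
    (score, doc_id) pairs looked up in its in-memory index."""
    terms = normalise(query)
    shards = {shard_for(t, n_shards) for t in terms}
    hits = []
    with cf.ThreadPoolExecutor() as pool:
        for s in shards:
            futures = [pool.submit(replica, terms) for replica in replicas[s]]
            # First replica to respond wins; slower ones are ignored.
            done, _ = cf.wait(futures, timeout=timeout,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                hits.extend(done.pop().result())
    # The front end ranks the merged hits as it renders the page.
    return sorted(hits, reverse=True)
```

Racing every replica of a shard and taking the first response trades extra compute for lower tail latency: one slow or overloaded machine can't hold up the whole query.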

Note that this is probably a long way from what Google actually do: they will have engineered the life out of this system, so there may be caches in strange places, weird indexes and some kind of funky load-balancing scheme, among other possible differences.
