performance - How can Google be so fast?
What technologies and programming decisions make Google able to serve a query so fast?
Every time I search (one of several times per day), it amazes me how they serve the results in about a second or less. What sort of configuration and algorithms could they have in place that accomplishes this?
Side note: it is kind of overwhelming to think that even if I built a desktop application and ran it on my own machine, it probably would not be half as fast as Google. Keep on learning, I say.
Here are some of the great answers and pointers provided:
- Google Platform
- MapReduce
- Carefully crafted algorithms
- Hardware - cluster farms and a massive number of cheap computers
- Caching and load balancing
- Google File System
Latency is killed by disk accesses. Hence it's reasonable to believe that all the data used to answer queries is kept in memory. This implies thousands of servers, each replicating one of many shards. Therefore the critical path for search is unlikely to hit any of their flagship distributed systems technologies: GFS, MapReduce or BigTable. Crudely speaking, these will be used to process crawler results.
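To put rough numbers on the disk-versus-memory argument, here is a back-of-the-envelope sketch. The figures are commonly quoted order-of-magnitude values (about 10 ms for a random seek on a spinning disk, about 100 ns for a main-memory reference), and the 200 ms budget is a purely hypothetical number chosen for illustration; none of these are measurements of Google's hardware.

```python
# Order-of-magnitude assumptions, expressed in nanoseconds to keep the
# arithmetic exact. These are illustrative, not measured, values.
DISK_SEEK_NS = 10_000_000      # ~10 ms per random seek on a spinning disk
MEMORY_REF_NS = 100            # ~100 ns per main-memory reference
BUDGET_NS = 200_000_000        # hypothetical 200 ms budget for index lookups


def lookups_within_budget(cost_per_lookup_ns, budget_ns=BUDGET_NS):
    """Return how many sequential random lookups fit into the budget."""
    return budget_ns // cost_per_lookup_ns


if __name__ == "__main__":
    print("disk seeks in budget:  ", lookups_within_budget(DISK_SEEK_NS))
    print("memory reads in budget:", lookups_within_budget(MEMORY_REF_NS))
```

Under these assumptions a single server can afford only a couple of dozen disk seeks per query but millions of memory reads, which is why keeping the index in RAM across many machines is the plausible choice.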
The handy thing about search is that there's no need to have either strongly consistent results or completely up-to-date data, so Google is not prevented from responding to a query just because a more up-to-date search result has become available.
So a possible architecture is quite simple: front-end servers process the query, normalising it (possibly by stripping out stop words etc.) and then distributing it to whatever subset of replicas owns that part of the query space (an alternative architecture is to split the data up by web pages, so that one member of every replica set needs to be contacted for every query). Many, many replicas are probably queried, and the quickest responses win. Each replica has an index mapping queries (or individual query terms) to documents, which it can use to look up results in memory very quickly. If different results come back from different sources, the front-end server can rank them as it spits out the HTML.
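The architecture described above can be sketched in a few dozen lines. This is a toy, single-process illustration under stated assumptions, not how Google actually does it: the names `Cluster`, `Replica`, `shard_for` and `normalise`, the stop-word list, the shard and replica counts, the CRC32-based sharding and the term-count scoring are all hypothetical choices made for the example, and threads stand in for separate machines.

```python
import zlib
from collections import defaultdict
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

STOP_WORDS = {"the", "a", "an", "of", "to", "is", "how"}  # tiny illustrative list
NUM_SHARDS = 4            # hypothetical: the query space is split by term
REPLICAS_PER_SHARD = 3    # hypothetical: each shard is held by several replicas


def normalise(query):
    """Lower-case the query, split on whitespace and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def shard_for(term):
    """Stable hash, so a given term always maps to the same shard."""
    return zlib.crc32(term.encode("utf-8")) % NUM_SHARDS


class Replica:
    """One in-memory copy of a shard's index: term -> {doc_id: score}."""

    def __init__(self):
        self.index = defaultdict(dict)

    def add(self, term, doc_id):
        postings = self.index[term]
        postings[doc_id] = postings.get(doc_id, 0) + 1  # score = term count

    def lookup(self, term):
        return dict(self.index.get(term, {}))


class Cluster:
    """Front end plus all replicas, collapsed into one process."""

    def __init__(self):
        self.shards = [[Replica() for _ in range(REPLICAS_PER_SHARD)]
                       for _ in range(NUM_SHARDS)]
        self.pool = ThreadPoolExecutor(max_workers=8)

    def index_document(self, doc_id, text):
        for term in normalise(text):
            for replica in self.shards[shard_for(term)]:
                replica.add(term, doc_id)

    def _first_response(self, term):
        """Ask every replica of the owning shard; the quickest answer wins."""
        futures = [self.pool.submit(replica.lookup, term)
                   for replica in self.shards[shard_for(term)]]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

    def search(self, query, limit=10):
        """Scatter the terms to their shards, then merge and rank the results."""
        scores = defaultdict(int)
        for term in normalise(query):
            for doc_id, score in self._first_response(term).items():
                scores[doc_id] += score
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))[:limit]


if __name__ == "__main__":
    cluster = Cluster()
    cluster.index_document("d1", "how to cache the search index")
    cluster.index_document("d2", "search search ranking")
    print(cluster.search("the search"))
```

Because every replica of a shard holds the same data here, it does not matter which one answers first. In a real deployment replicas could be slightly out of date with respect to each other, which is acceptable for the reason given earlier: search does not need strongly consistent results.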
Note that this is probably a long way from what Google actually does - they will have engineered the life out of this system, so there may be more caches in strange areas, weird indexes and some kind of funky load-balancing scheme, amongst other possible differences.