Wednesday, April 23, 2008

Kai-Fu Lee on Cloud Computing

John Breslin highlights some interesting ideas from Kai-Fu Lee's keynote about cloud computing, presented at the 17th International World Wide Web Conference (Kai-Fu Lee has been the president of Google China since July 2005). Lee describes six properties of cloud computing from Google's perspective:

1. User centric. "If data is all stored in the Cloud - images, messages, whatever - once you're connected to the Cloud, any new PC or mobile device that can access your data becomes yours. Not only is the data yours, but you can share it with others."

2. Task centric. "The applications of the past - spreadsheets, e-mail, calendar - are becoming modules, and can be composed and laid out in a task-specific manner. (...) Google considers communication to be a task," which is why Gmail integrates a chat feature for instant communication.

3. Powerful. "Having lots of computers in the Cloud means that it can do things that your PC cannot do. For example, Google Search is faster than searching in Windows or Outlook or Word" because a Google query hits at least 1000 machines.

4. Accessible. Having your data in the cloud means you can instantly pull information from many repositories at once; Google's universal search, which queries several of them simultaneously, is one example. "Traditional web page search does IR / TF-IDF / page rank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn't necessarily the best option. It's difficult for most people to get to the right vertical search page in the first place, since they usually can't remember where to go. Universal search is basically a single search that will access all of these vertical searches."
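To make the "single search that will access all of these vertical searches" idea concrete, here is a minimal sketch of a fan-out search in Python. It is not Google's implementation: the vertical backends in `VERTICALS` are hypothetical toy functions with hard-coded results, and the sketch assumes every vertical returns scores on a comparable scale so results can simply be merged by score.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A vertical returns (score, title) pairs; universal search adds the vertical name.
VerticalFn = Callable[[str], List[Tuple[float, str]]]
Result = Tuple[float, str, str]  # (score, vertical name, title)

# Hypothetical toy verticals standing in for web, image and local search.
VERTICALS: Dict[str, VerticalFn] = {
    "web": lambda q: [(0.9, f"Web page about {q}"), (0.4, f"Forum thread on {q}")],
    "images": lambda q: [(0.7, f"Photo of {q}")],
    "local": lambda q: [(0.8, f"Restaurant matching {q}")],
}


def universal_search(query: str, verticals: Dict[str, VerticalFn], limit: int = 10) -> List[Result]:
    """Send one query to every vertical in parallel and blend the results by score."""
    with ThreadPoolExecutor(max_workers=max(1, len(verticals))) as pool:
        # Submit the same query to all verticals at once.
        futures = {name: pool.submit(fn, query) for name, fn in verticals.items()}
        merged: List[Result] = []
        for name, future in futures.items():
            for score, title in future.result():
                merged.append((score, name, title))
    # Assumes scores are comparable across verticals (a simplification).
    merged.sort(key=lambda r: r[0], reverse=True)
    return merged[:limit]


if __name__ == "__main__":
    for score, vertical, title in universal_search("pizza", VERTICALS, limit=3):
        print(f"{score:.1f} [{vertical}] {title}")
```

The user types one query and never has to remember which vertical search page to visit; a real system would also need per-vertical timeouts and a smarter blending model than a raw score sort.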

5. Intelligent. "Data mining and massive data analysis are required to give some intelligence to the masses of data available (massive data storage + massive data analysis = Google Intelligence)."

6. Programmable. "For fault tolerance, Google uses GFS or distributed disk storage. Every piece of data is replicated three times. If one machine dies, a master redistributes the data to a new server. There are around 200 clusters (some with over 5 PB of disk space on 500 machines). The Big Table is used for distributed memory. The largest cells in the Big Table are 700 TB, spread over 2000 machines. MapReduce is the solution for new programming paradigms. It cuts a trillion records into a thousand parts on a thousand machines. Each machine will then load a billion records and will run the same program over these records, and then the results are recombined. While in 2005, there were some 72,000 jobs being run on MapReduce, in 2007, there were two million jobs (use seems to be increasing exponentially)." This recent video has more information about Google's infrastructure.
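The MapReduce description above (cut the input into parts, run the same program on every part, recombine the results) can be sketched on a single machine. This is an illustrative toy, not Google's MapReduce API: the function names `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical, threads stand in for the thousand machines, and word counting is used as the example job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def split(records: List[str], num_parts: int) -> List[List[str]]:
    """Cut the input into num_parts roughly equal parts (one per 'machine')."""
    return [records[i::num_parts] for i in range(num_parts)]


def map_reduce(records: List[str],
               mapper: Callable[[str], Iterable[Pair]],
               reducer: Callable[[str, List[int]], int],
               num_workers: int = 4) -> Dict[str, int]:
    def run_map(chunk: List[str]) -> List[Pair]:
        # Every worker runs the same program over its own part of the records.
        pairs: List[Pair] = []
        for record in chunk:
            pairs.extend(mapper(record))
        return pairs

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(run_map, split(records, num_workers)))

    # Shuffle: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: recombine the partial results for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> List[Pair]:
    return [(word, 1) for word in line.split()]


def sum_reducer(key: str, values: List[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    lines = ["the cloud", "the user and the task", "cloud computing"]
    counts = map_reduce(lines, word_mapper, sum_reducer, num_workers=2)
    print(counts["the"], counts["cloud"])  # 3 2
```

The programmer only writes the mapper and reducer; splitting, parallel execution and regrouping are handled by the framework, which is what lets the same pattern scale from this toy to jobs spread over many machines.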

Kai-Fu Lee thinks that outsourcing IT to a "trusted shop" like Google is the key to making computers simple and safe to use. "Entrepreneurs should have new opportunities with this paradigm shift, being freed from monopoly-dominated markets as more cloud-based companies evolve that are powered by open technologies."

There's a shift from the computer to the user, from applications to tasks, from isolated data to data that can be accessed anywhere and shared with anyone.

"Cloud computing liberates the user from having to remember where the data is, enables the user to access information anywhere once created, and makes services fast and powerful through essentially infinite information and computing. People are using cloud services to find, share, create, and organize information. People are also using cloud services to shop, bank, communicate, socialize. By using cloud computing, these capabilities will be accessible not only on PCs but also telephones, automobiles, televisions, and appliances. (...) Google is committed to help bring about the era of cloud computing, which we believe will facilitate services that are convenient, easy-to-learn, people-centric, scalable, and device-ready," writes Kai-Fu Lee in the keynote's abstract.