peter norvig on search as a force for good

peter starts with the well-known globe with the google queries superimposed. “google saves over 9000 man-years of effort every year. So Google saves 9000 lives per year.” :) there are numerous mentions of people making a living from google ads. he shows keyhole. “the computers from star trek are always omniscient but never helpful, they never tell you: don’t do this.”

the spelling checker is not dictionary-based but works from their huge accumulated data, like the 500 spellings of britney. the web is a million times larger than the largest computational linguistics corpora. a billion documents really make the difference, according to banko and brill. humans achieve around 95% accuracy. work is being done on semantic understanding. one program they run extracts categories, and the members of those categories, from their corpus. “done very simply: you take the whole web, break it into sentences, and look for about half a dozen patterns, such as including, as in Software companies, including..” they take an automated approach to machine translation: they look for pages on the web that exist in two languages and derive the translation model from them. this yields a level of translation that is good enough.

“doug cutting was more interested in the crawling and indexing side, so that’s where lucene looks good, not so much in the sorting.” google focuses on the 95%, the easy part, to get more leverage, but will have to go back to the hard part. they found that the feedback button didn’t work. some people wrote in to say that they had been looking for a specific book by typing in “library”, which indicates a deeper problem. the first special google logo was done during burning man 1998 to indicate that “no one is at the office, don’t blame us”.
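
the category extraction described above (break text into sentences, look for a handful of patterns such as “including”) can be sketched roughly as follows. this is only a toy illustration under stated assumptions, not google's actual program: the function name `extract_categories`, the second cue “such as”, and the rule that a category is the one or two words right before the cue are all made up for the example. it also assumes the member list runs to the end of the sentence.

```python
import re
from collections import defaultdict

# Cue phrases that introduce members of a category. The talk mentions
# "about half a dozen" patterns; only "including" appears in the notes,
# so "such as" is an assumption added for illustration.
CUES = ("including", "such as")

# Assumption: the category is the one or two words right before the cue.
# A real system would need proper noun-phrase detection.
PATTERN = re.compile(
    r"\b(?P<category>[a-z]+(?: [a-z]+)?),? (?:"
    + "|".join(CUES)
    + r") (?P<members>[^.;:!?]+)",
    re.IGNORECASE,
)

# Members are separated by commas and/or the words "and" / "or".
MEMBER_SPLIT = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_categories(text):
    """Return a dict mapping each category found to a set of its members."""
    categories = defaultdict(set)
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        match = PATTERN.search(sentence)
        if not match:
            continue
        category = match.group("category").lower()
        for member in MEMBER_SPLIT.split(match.group("members")):
            member = member.strip()
            if member:
                categories[category].add(member)
    return dict(categories)


if __name__ == "__main__":
    sample = (
        "Software companies, including Google, Microsoft and Oracle. "
        "The weather was nice. "
        "Citrus fruits such as lemons, limes, and oranges."
    )
    for category, members in extract_categories(sample).items():
        print(category, "->", sorted(members))
```

the sketch is deliberately crude: it picks up stray words when the list is followed by more text in the same sentence, and it mistakes verbs for categories (“likes fruits such as ...”). the point of the talk is that at web scale, even simple patterns like these find enough clean matches to be useful.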