A free database of the entire Web may spawn the next Google
January 24, 2013

A nonprofit called Common Crawl is running its own Web crawler to build a giant copy of the Web that anyone can access.
The organization offers up over five billion Web pages, available for free so that researchers and entrepreneurs can try things otherwise possible only for those with access to resources on the scale of Google’s, MIT Technology Review reports.
Common Crawl has so far indexed more than five billion pages, adding up to 81 terabytes of data, made available through Amazon’s cloud computing service. For about $25 a programmer could set up an account with Amazon and get to work crunching Common Crawl data, says Lisa Green, Common Crawl’s director.
Common Crawl also has Google’s director of research, Peter Norvig, and MIT Media Lab director Joi Ito on its advisory board.
Hmm, could the entire Web be stored in DNA or graphene molecules? — Editor
Comments (9)
by Cybernettr
How is this different from Alexa?
by Mishka
So we replaced the name of the cheque’s beneficiary, what’s the big deal?
by Steven
Wasn’t Dr. Negroponte a previous (founding?) MIT Media Lab Director?
Anything MIT has credibility with me. I’ll watch with great interest.
Editor’s mention of DNA or graphene molecules for storage may be spot on!
SLH
by Whittaker
The totality of year 2013’s Internet will fit into a $1 flash drive in the mid 2040s.
by Jerry
Yeah but that’s ignoring the dark net, otherwise it’ll be counted in petabytes!
by Marcos Marin
Dunno.. How many “million hours of high-definition video” does the web translate into?
by Editor
Hey, we’ll get you a version without the cat videos. Problem solved!
by matt
too funny!
by Marcos Marin
Would it still be the Web then? : )