January 24th, 2008


Google to Host Terabytes of Open-Source Science Data

Google will soon host terabytes of open-source science data for free, giving everyone complete access to it. Users will also be able to annotate and comment on the data, creating a sort of peer-review system.

One dataset I'm looking forward to seeing is the Hubble Space Telescope data - all 120 terabytes of it!

Besides simply giving everyone access to research, why is this cool? One real value I see is that it will expose "dark data": research that ends in negative conclusions (for example, a pill doesn't do what the researchers hoped). Dark data often doesn't appear in scientific journals because of publication bias, where only positive correlations see publication. This skews science itself, because researchers never learn about the non-connections that earlier researchers discovered, and those might be even more valuable.

So now dark data will have a home. This should save countless dollars and tons of research time, because now researchers can start by seeing what doesn't work simply by digging through the Google archives.