New web-based model for sharing research datasets could have huge benefits

October 15, 2012

A group of researchers has proposed creating a new web-based data network that would help researchers and policymakers worldwide turn existing knowledge into real-world applications and technologies, and improve science and innovation policy.

Researchers around the world have created datasets that, if interlinked with other datasets and made more broadly available, could provide the foundation that policymakers and other decision makers need. But these datasets are spread across countries, scientific disciplines and data providers, and appear in a variety of inconsistent forms.

Writing in the new issue of the journal Science, seven researchers propose a new data network that can help bring this knowledge together and make it available to all.

The benefits to society from such a network are clear, said Bruce Weinberg, co-author of the paper and professor of economics at Ohio State University.

“Such a network could help scientists, policymakers and business people take the knowledge that is now locked in scientific publications and create new technologies and applications,” Weinberg said. “This is a key to economic growth.”

The purpose of this new model is to make data accessible, said Laurel Haak, co-author of the paper and executive director of ORCID, an international, interdisciplinary, open, and not-for-profit organization formed to provide a registry of unique identifiers for researchers.

“Researchers lament the lack of data sharing. But a new data infrastructure has the potential to overcome that problem and potentially transform research practice itself,” Haak said.

In the Science article, the authors say that one key to making the proposed network work is a unified set of standards across databases and platforms. One simple example: databases often identify authors in different ways. One database may list an author as “David A. Smith,” while another lists the same person as “D.A. Smith.” Other researchers would have no way of knowing whether these two records refer to the same author.
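To make the matching problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the proposal in Science: the `AuthorRecord` type, the `name_key` normalization and the identifier values such as "id-001" are all hypothetical. It shows that normalizing names makes “David A. Smith” and “D.A. Smith” look alike, but only a shared identifier, of the kind ORCID registers for researchers, can settle whether they are the same person.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorRecord:
    """One author entry as a (hypothetical) database might store it."""
    display_name: str
    source: str
    author_id: Optional[str] = None  # hypothetical shared identifier, if any


def name_key(name: str) -> str:
    """Reduce a name to lowercase initials plus surname.

    'David A. Smith' and 'D.A. Smith' both become 'd a smith'.
    """
    parts = name.replace(".", " ").split()
    *given, surname = parts
    initials = " ".join(part[0].lower() for part in given)
    return f"{initials} {surname.lower()}".strip()


def compare(a: AuthorRecord, b: AuthorRecord) -> str:
    """Classify two records as 'same', 'different', 'possible match' or 'no match'."""
    # When both records carry a shared identifier, the answer is definitive.
    if a.author_id and b.author_id:
        return "same" if a.author_id == b.author_id else "different"
    # Without one, matching names are only a hint: two people can share them.
    if name_key(a.display_name) == name_key(b.display_name):
        return "possible match"
    return "no match"


if __name__ == "__main__":
    db_a = AuthorRecord("David A. Smith", source="database A")
    db_b = AuthorRecord("D.A. Smith", source="database B")
    print(compare(db_a, db_b))  # possible match

    db_a_id = AuthorRecord("David A. Smith", "database A", author_id="id-001")
    db_b_id = AuthorRecord("D.A. Smith", "database B", author_id="id-001")
    print(compare(db_a_id, db_b_id))  # same
```

The split between “possible match” and “same” reflects the article's point: tidying up names can only flag candidates, while a common identifier standard lets separate databases agree on who is who.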

“We need a coordination of data exchange standards to make this effort work,” said David Baker, co-author and executive director of Consortia Advancing Standards in Research Administration Information (CASRAI), a non-profit standards development organization.

This new data infrastructure must be only a “thin layer” on top of the database structures that already exist, Baker added. “It needs to work seamlessly with the databases and platforms we already have in place. It shouldn’t add another layer of complexity.”

One major issue is achieving broad-based participation in this effort, said co-author Gregg Gordon, president and CEO of the Social Science Research Network.

“We need to have participation from researchers in all fields, whether they work in multinational corporations, non-profits, government agencies or universities,” Gordon said. “We need all the different players to work together to make this effort successful.”

Under the proposal, users could access the public data and tools at no charge, pay for access to private areas and tools, and apply for access to security-sensitive parts of the system.

The authors of the Science paper emphasize that no single organization can manage this infrastructure alone. Governments, non-profits, and for-profits must all collaborate.

They envision a steering committee comprising representatives of the major data providers, including government agencies, standards organizations and private data vendors, as well as members of the research community.

While much work remains to be done, the researchers say the effort will be worth it.

“The model we propose provides tremendous benefits from combining and mining the vast quantities of data that are already available,” the authors conclude.