New computer better than humans at cataloguing science

A new computer system is better than scientists at the complex task of extracting data from scientific publications and placing it in a database that catalogues the results of thousands of individual studies.

"We demonstrated that the system was no worse than people on all the things we measured, and it was better in some categories," said Christopher Ré, who guided the software development for the project while at the University of Wisconsin-Madison.

The development marks a milestone in the quest to rapidly and precisely summarise, collate and index the vast output of scientists around the globe, said first author Shanan Peters, a professor of geoscience at UW-Madison. 

Peters and colleagues set up the faceoff between PaleoDeepDive, their new machine reading system, and the human scientists who had manually entered data into the Paleobiology Database. 

The knowledge produced by paleontologists is fragmented into hundreds of thousands of publications. 

Yet many research questions require what Peters calls a "synthetic approach: For example, how many species were on the planet at any given time?" 

Peters teamed up with Ré, now at Stanford University, and UW-Madison computer sciences professor Miron Livny. The group built on the DeepDive machine reading system and the HTCondor distributed job management system to create PaleoDeepDive.

"Getting started required a million hours of computer time," said Peters. 

PaleoDeepDive mimics the human activities needed to assemble the Paleobiology Database. 

"We extracted the same data from the same documents and put it into the exact same structure as the human researchers, allowing us to rigorously evaluate the quality of our system, and the humans," Peters said.


Copyright © 2013. FRESH LEARNERS - All Rights Reserved