automated aggregation of scientific literature

I am intrigued by this example from Stanford of automated review and synthesis of scientific literature, described here by the DeepDive team:

Over the last few years, we have built applications both for broad domains that read the Web and for specific domains like paleobiology. In collaboration with Shanan Peters (PaleobioDB), we built a system that reads documents with higher accuracy and from larger corpora than expert human volunteers. We find this very exciting, as it demonstrates that trained systems may have the ability to change the way science is conducted.

In a number of research papers we demonstrated the power of DeepDive on NMR data and financial, oil, and gas documents. For example, we showed that DeepDive can understand tabular data. We are using DeepDive to support our own research, exploring how knowledge can be used to build the next generation of data processing systems.

Examples of DeepDive applications include:

  • PaleoDeepDive – A knowledge base for Paleobiologists
  • GeoDeepDive – Extracting dark data from geology journal articles
  • Wisci – Enriching Wikipedia with structured data

The complete code for these examples is available with DeepDive.
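To make the idea concrete, here is a toy sketch of the kind of task these systems automate: pulling structured facts out of free-text sentences and aggregating them into a table. This is plain Python with a single hand-written pattern and made-up example sentences, purely for illustration; it is not DeepDive's actual pipeline, which relies on declarative extraction rules and statistical inference at scale rather than one regular expression.

```python
import re
from collections import Counter

# Toy illustration only: a single hand-written pattern standing in for the
# many features a system like DeepDive learns and weighs statistically.
PATTERN = re.compile(
    r"(?P<taxon>[A-Z][a-z]+ [a-z]+) (?:was|were) (?:found|recovered) "
    r"in the (?P<formation>(?:[A-Z][a-z]+ )+Formation)"
)

def extract_facts(sentences):
    """Yield (taxon, formation) candidate facts from raw sentences."""
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            yield (match.group("taxon"), match.group("formation"))

# Hypothetical example sentences, not real corpus data.
corpus = [
    "Specimens of Tyrannosaurus rex were recovered in the Hell Creek Formation.",
    "Fragments of Triceratops horridus were found in the Lance Formation.",
    "The weather was pleasant during the 1902 field season.",
]

# Count repeated extractions; a real system would score evidence probabilistically.
facts = Counter(extract_facts(corpus))
for (taxon, formation), count in facts.items():
    print(f"{taxon}\t{formation}\t{count}")
```

A real deployment would replace the single pattern with many learned features and would weigh conflicting extractions probabilistically, but the overall shape is the same: unstructured sentences go in, a structured fact table comes out.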

Let’s just say an organization is trying to be more innovative. First it needs to understand where its standard operating procedures are in relation to the leading edge. To do that, it needs to understand where the leading edge is. That means research, which can be very tedious and time-consuming. It means the organization is paying people to spend time reviewing large amounts of information, some or even most of which will not turn out to be useful. So a change in mindset is often necessary. But tools that could jump-start the process and provide shortcuts would be great.

This is my own developing theory of how an organization can become more innovative:

  1. Figure out where the leading edge is.
  2. Figure out how far the various parts of your organization are from the leading edge.
  3. Figure out how you are going to bring a critical mass of your organization up to the leading edge – this is as much a human resources problem as an innovation problem.
  4. Then, and only then, try to advance the leading edge.

I think a lot of organizations have a few people who do #1, but then they skip right to #4. That small group ends up way out past the leading edge while the bulk of the organization is nowhere near it. That’s not a recipe for success.
