Task 3.3 Semantic integration of biodiversity literature
- Part of WP 3 - Scientific content and workflow coordination
- Lead: MFN
- Participants: FUB-BGBM, PENSOFT, Plazi.
- Start: M1 (September 2012), End: M24 (August 2014).
Biodiversity literature is being digitized by many institutions around the world, and the recent eContent plus project Biodiversity Heritage Library for Europe (BHL-Europe) has achieved substantial progress in coordinating and integrating these efforts in the EU. The main focus of this task is the semantic enhancement of digitized literature, which makes it more accessible to researchers and amenable to direct fact finding.
Pro-iBiosphere will coordinate with BHL, BHL-Europe and BHL-Global on analyzing the implementation of web services that enhance the data either at ingest (TaxonFinder) or at search (CoL, PESI, VIAF). To facilitate further mark-up at project level, Plazi and the other partners will analyze the XML schemas currently implemented in their workflows.
Three viable paths for future improvement of semantic mark-up are presently recognized:
- fully automated natural language processing (NLP),
- base mark-up complemented by automated processing and specialist correction, and
- social crowd-sourcing models (citizen involvement).
The purpose of the present coordination task is to align ongoing and forthcoming efforts on semantic mark-up of biodiversity literature and to provide technical and social solutions for their use. A workshop will be organized on the subject (MS12).
See also pro-iBiosphere Deliverables.
- D3.1 Best Practices Guide on editorial policies (estimated at 9 person months)
- D3.2.1 Concept paper for involvement of individual experts, commercial vendors, and citizen scientists (estimated at 4.5 person months)
- D3.2.2 Report on the state and quality of biosystematics documents and survey reports (estimated at 4.5 person months)
- D3.3.1 Report on the state of the art and research horizons of semantic integration of biodiversity literature (estimated at 5.75 person months)
- XML standards in use for taxon treatments have been discussed in File:Pro-iBiosphere WP2 PLAZI D2.1.1 VFF 30062013.pdf
- D3.3.2 Report on progress during the coordination process of partners and non-consortium partners (estimated at 5.75 person months)
See also pro-iBiosphere Milestones.
- MS10 - Workshop on data curation and acquisition of Floras and Faunas
- MS11 - Workshop on semantic mark-up generation, data quality and user-participation infrastructure
- MS12 - Workshop on mark-up of biodiversity literature
Second year report
Summary of progress towards objectives
In year 2, there were two deliverables and one milestone in Task 3.3. All of them matched the provisions in Annex I.
Specifically, D3.3.1 reported on the state of the art and research horizons of semantic integration of biodiversity literature, with special attention to the experiences of Biodiversity Heritage Library (BHL) Europe and of the pro-iBiosphere partners Plazi and Pensoft. It also reported on tools and services available to facilitate semantic integration, and it assessed the feasibility of integrating semantically enhanced information into knowledge management workflows in biodiversity research.
Milestone MS12 was a workshop held to coordinate efforts on semantic integration of biodiversity literature. It brought together specialists active in various areas around the topic, who exchanged experiences and long-term strategic goals along with information on their current activities. It laid the groundwork for D3.3.2, which reviewed use cases for markup of the biodiversity literature, considered several approaches to semantic enrichment, and generated an initial work plan and roadmap for the semantic integration of biodiversity literature. This included recommendations for prioritization, namely to concentrate on semantic enrichment of selected revisionary works, to select the method of enrichment according to concrete use cases, to invest in workflows that automatically produce semantically integrated publication forms, and to integrate past knowledge in a way that is compatible with Web 2.0 approaches to enriching contemporary knowledge.
- Together, D3.3.1 and D3.3.2 provide a comprehensive overview of past, ongoing and planned efforts towards semantic integration of biodiversity literature and position them in the context of the envisioned Open Biodiversity Knowledge Management System.
- The workshop MS12 inspired several of the projects tackled at the Biodiversity Data Enrichment Hackathon co-organized by pro-iBiosphere and Naturalis in March.
- A data mining workshop was organized as a fringe event to OKFest.
- After inconsistencies were discovered in publishers' use of the JATS delivered to PubMed Central, a standardization group (JATS 4 Reuse) was formed to address these issues.