Pilot 4

Revision of a tool (CharaParser) that generates identification keys by reusing morphological characters from published species descriptions

Person leading the pilot: Donat Agosti - Plazi

Institutions involved in the pilot:

University of Arizona (Bob Morris and Hong Cui)
FUB-BGBM (Andreas Müller & L. Morris)
UPMC (Régine Vignes)
MfN & Plazi (Gregor Hagedorn)

Character parsing techniques will be compared. To facilitate this, treatments from legacy literature will be marked up for morphological characters, localities and bibliographic citations.
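One common way to compare parsing techniques against hand-made reference mark-up is to score each parser's output with precision, recall and F1. The sketch below assumes that both the reference mark-up and the parser outputs can be reduced to (structure, character, value) triples. The Annotation type, the score function and all example data are hypothetical and do not describe CharaParser's actual output format.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One parsed statement, e.g. ("leaf", "shape", "ovate")."""
    structure: str
    character: str
    value: str


def score(predicted: set, gold: set) -> dict:
    """Precision, recall and F1 of one parser's output against reference mark-up."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return {"precision": precision, "recall": recall, "f1": 0.0}
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference mark-up for one treatment sentence.
gold = {
    Annotation("leaf", "shape", "ovate"),
    Annotation("leaf", "pubescence", "glabrous"),
    Annotation("petal", "count", "5"),
    Annotation("petal", "colour", "white"),
}

# Hypothetical outputs of two parsing techniques on the same sentence.
parser_a = {
    Annotation("leaf", "shape", "ovate"),
    Annotation("petal", "count", "5"),
    Annotation("petal", "colour", "white"),
}
parser_b = {
    Annotation("leaf", "shape", "ovate"),
    Annotation("leaf", "pubescence", "glabrous"),
    Annotation("petal", "colour", "5"),  # value attached to the wrong character
}

for name, output in [("A", parser_a), ("B", parser_b)]:
    print(name, score(output, gold))
```

Exact triple matching is deliberately strict; a real comparison would probably also need to normalise synonyms and spelling variants before matching.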

Other data types (e.g. phylogenies, ecological data) need to be discussed with other ongoing initiatives/projects.

Key generation functions in the CDM:

Single-access keys. The FUB-BGBM has integrated a Java library from Xper2 (UPMC). This library allows single-access keys to be generated from SDD data.
Multi-access keys. A multi-access key requires (i) highly structured descriptive data and (ii) data in the right format (SDD).
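The single-access keys mentioned above are generated by the Xper2 library, whose algorithm is not described here. As a rough illustration of the general idea only, the sketch below builds a polytomous single-access key from a small taxon-by-character matrix using a simple greedy rule (pick the character that splits the remaining taxa most evenly). It assumes one state per taxon per character and no missing data, and it reads a plain Python dictionary rather than SDD. MATRIX, build_key and print_key are hypothetical names.

```python
from collections import defaultdict

# Hypothetical taxon-by-character matrix (one state per taxon per character).
MATRIX = {
    "Taxon A": {"leaf shape": "ovate", "petal colour": "white", "petal count": "5"},
    "Taxon B": {"leaf shape": "ovate", "petal colour": "yellow", "petal count": "5"},
    "Taxon C": {"leaf shape": "linear", "petal colour": "white", "petal count": "4"},
    "Taxon D": {"leaf shape": "linear", "petal colour": "yellow", "petal count": "4"},
}


def partition(taxa, character, matrix):
    """Group taxa by the state they have for one character."""
    groups = defaultdict(list)
    for taxon in taxa:
        groups[matrix[taxon][character]].append(taxon)
    return dict(groups)


def build_key(taxa, characters, matrix):
    """Recursively build a polytomous single-access key.

    Returns a taxon name (a leaf of the key) or a tuple
    (character, {state: sub-key}). Greedy rule: use the character whose
    largest resulting group is smallest; ties are broken alphabetically.
    """
    if len(taxa) == 1:
        return taxa[0]
    candidates = []
    for character in characters:
        groups = partition(taxa, character, matrix)
        if len(groups) > 1:  # the character separates at least two taxa
            largest = max(len(g) for g in groups.values())
            candidates.append((largest, character, groups))
    if not candidates:
        # The available characters cannot separate these taxa.
        return " / ".join(sorted(taxa))
    _, best, groups = min(candidates, key=lambda c: (c[0], c[1]))
    remaining = [c for c in characters if c != best]
    return (best, {state: build_key(sorted(members), remaining, matrix)
                   for state, members in sorted(groups.items())})


def print_key(node, depth=0):
    """Print the key as an indented tree of 'character = state' lines."""
    if isinstance(node, str):
        print("  " * depth + "-> " + node)
        return
    character, branches = node
    for state, sub_key in branches.items():
        print("  " * depth + f"{character} = {state}")
        print_key(sub_key, depth + 1)


key = build_key(sorted(MATRIX), ["leaf shape", "petal colour", "petal count"], MATRIX)
print_key(key)
```

Greedy splitting does not guarantee the shortest possible key, and real descriptive data with polymorphic or missing states would need extra handling.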

Highly structured descriptive data

This step involves integrating the existing multi-access key user interface for querying the data (developed by UPMC, France) into the EDIT Platform for Cybertaxonomy. Xper is to be integrated into the platform essentially as a direct plug-in for the Taxonomic Editor. This task is not yet complete; work on it will continue through the end of February 2013.
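Unlike a single-access key, a multi-access key lets the user enter observations for any characters in any order, and the tool keeps the taxa that remain compatible. The sketch below illustrates that querying step on hypothetical structured descriptions in which each character maps to a set of states, so that polymorphic taxa can be represented. DESCRIPTIONS, is_compatible and remaining_taxa are illustrative names and do not reflect the Xper user interface or its internal logic.

```python
# Hypothetical structured descriptions: each character maps to a set of states,
# so a polymorphic taxon (e.g. white or pink petals) can be represented.
DESCRIPTIONS = {
    "Taxon A": {"leaf shape": {"ovate"}, "petal colour": {"white", "pink"}},
    "Taxon B": {"leaf shape": {"ovate"}, "petal colour": {"yellow"}},
    "Taxon C": {"leaf shape": {"linear"}, "petal colour": {"white"}},
    "Taxon D": {"leaf shape": {"linear"}},  # petal colour not recorded
}


def is_compatible(states, observed, missing_matches):
    """True if a taxon's recorded states allow the observed state.

    states is None when the character was not recorded for the taxon.
    """
    if states is None:
        return missing_matches
    return observed in states


def remaining_taxa(descriptions, observations, missing_matches=True):
    """Taxa compatible with every observation; the order of entry is irrelevant."""
    return sorted(
        taxon
        for taxon, chars in descriptions.items()
        if all(is_compatible(chars.get(character), state, missing_matches)
               for character, state in observations.items())
    )


print(remaining_taxa(DESCRIPTIONS, {"petal colour": "white"}))
print(remaining_taxa(DESCRIPTIONS, {"petal colour": "white", "leaf shape": "linear"}))
```

Whether taxa with unrecorded characters should be kept is a design choice; keeping them (missing_matches=True) avoids wrongly eliminating the correct taxon when the source description is incomplete.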

Data in the right format (SDD). This will be achieved by integrating a Java applet (also developed by UPMC) into the portal system of the EDIT Platform (Lorna Morris, EU FP7 ViBRANT project). The Java applet for playing keys is nearly ready; it was tested in November 2012.

The applet can run in two modes: as a key player or as an editor. The FUB-BGBM will use the key-player mode, based on SDD. In the context of ViBRANT, Andreas Müller (FUB-BGBM) is working on a solution that allows Xper2 to edit descriptive data directly in the CDM, thereby avoiding synchronization issues. Because the applet requires data in SDD format, marked-up descriptive data must be transformed into a highly atomized structure (SDD or CDM). The resources needed to program these transformations will depend on the form and quality of the description mark-up. For the moment, we do not need to use Xper for the pilots.
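To give an idea of what the transformation into a highly atomized structure could involve, the sketch below converts hypothetical (taxon, character, state) statements extracted from mark-up into a minimal XML document using Python's standard xml.etree.ElementTree module. The element names (Datasets, CategoricalCharacter, StateDefinition, CodedDescription, Categorical, State) are modelled on SDD, but the output omits the SDD namespace and many required parts and has not been validated against the SDD schema; it is a sketch, not a conforming exporter. STATEMENTS, add_labelled and to_sdd_like are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical statements extracted from description mark-up: (taxon, character, state).
STATEMENTS = [
    ("Taxon A", "leaf shape", "ovate"),
    ("Taxon A", "petal colour", "white"),
    ("Taxon A", "petal colour", "pink"),  # polymorphic: two states, one character
    ("Taxon B", "leaf shape", "linear"),
    ("Taxon B", "petal colour", "white"),
]


def add_labelled(parent, tag, label, **attrs):
    """Add <tag ...><Representation><Label>label</Label></Representation></tag>."""
    element = ET.SubElement(parent, tag, attrs)
    representation = ET.SubElement(element, "Representation")
    ET.SubElement(representation, "Label").text = label
    return element


def to_sdd_like(statements):
    """Build a minimal, SDD-inspired XML tree (not schema-validated)."""
    root = ET.Element("Datasets")
    dataset = ET.SubElement(root, "Dataset")
    characters_el = ET.SubElement(dataset, "Characters")
    descriptions_el = ET.SubElement(dataset, "CodedDescriptions")

    char_ids, state_ids = {}, {}  # character -> id, (character, state) -> id
    states_el = {}                # character -> its <States> element
    summary_el = {}               # taxon -> its <SummaryData> element
    categorical_el = {}           # (taxon, character) -> its <Categorical> element

    for taxon, character, state in statements:
        if character not in char_ids:
            char_ids[character] = f"c{len(char_ids) + 1}"
            char = add_labelled(characters_el, "CategoricalCharacter", character,
                                id=char_ids[character])
            states_el[character] = ET.SubElement(char, "States")
        if (character, state) not in state_ids:
            state_ids[(character, state)] = f"s{len(state_ids) + 1}"
            add_labelled(states_el[character], "StateDefinition", state,
                         id=state_ids[(character, state)])
        if taxon not in summary_el:
            desc = add_labelled(descriptions_el, "CodedDescription", taxon,
                                id=f"D{len(summary_el) + 1}")
            summary_el[taxon] = ET.SubElement(desc, "SummaryData")
        if (taxon, character) not in categorical_el:
            categorical_el[(taxon, character)] = ET.SubElement(
                summary_el[taxon], "Categorical", ref=char_ids[character])
        # Each state of the taxon becomes a reference to a shared StateDefinition.
        ET.SubElement(categorical_el[(taxon, character)], "State",
                      ref=state_ids[(character, state)])
    return root


root = to_sdd_like(STATEMENTS)
ET.indent(root)  # pretty-printing; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Only categorical characters are handled; numeric measurements such as leaf length would need SDD's quantitative characters and a different summary element.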

The project may not need Xper to create descriptive data from the mark-up. An alternative is to apply semi-automated methods to the current, semi-structured descriptive mark-up. pro-iBiosphere will need to investigate technologies for (semi-)automatic transformation of descriptive mark-up. The transformation for all six pilot groups will depend on the mark-up of the pilot taxa.
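As one example of a semi-automatic method, the sketch below parses semi-structured description text with a simple rule (structure name first, then comma-separated state terms) and a small glossary, and sets aside any term it cannot classify for manual review rather than guessing. The glossary, the clause rule and parse_description are hypothetical; this is not CharaParser's method.

```python
import re

# Hypothetical glossary mapping state terms to the character they describe.
GLOSSARY = {
    "ovate": "shape", "linear": "shape", "lanceolate": "shape",
    "glabrous": "pubescence", "hairy": "pubescence",
    "white": "colour", "yellow": "colour", "pink": "colour",
}


def parse_description(text, glossary):
    """Split a semi-structured description into (structure, character, state).

    Assumes clauses separated by ';' or '.' that start with a one-word
    structure name followed by comma-separated state terms, e.g.
    'Leaves ovate, glabrous'. Plain integers are treated as counts.
    Terms that cannot be classified are returned for manual review.
    """
    parsed, review = [], []
    for clause in re.split(r"[;.]", text):
        words = clause.strip().split(None, 1)
        if len(words) < 2:
            continue  # empty clause or structure without states
        structure, rest = words[0].lower(), words[1]
        for term in (t.strip().lower() for t in rest.split(",")):
            if not term:
                continue
            if term.isdigit():
                parsed.append((structure, "count", term))
            elif term in glossary:
                parsed.append((structure, glossary[term], term))
            else:
                review.append((structure, term))  # left for a curator to resolve
    return parsed, review


parsed, review = parse_description("Leaves ovate, glabrous; petals 5, white, clawed.", GLOSSARY)
print(parsed)
print("for review:", review)
```

The rule fails on multi-word structures such as "Upper leaves" and on ranges such as "4-6"; both end up in the review list, which is where a curator would resolve them. That manual step is what makes the approach semi-automatic rather than fully automatic.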