Pilot 2

From pro-iBiosphere Wiki

Common query/response model for automated registration of higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank)

Person leading the pilot: Lyubomir Penev - Pensoft

Institutions involved in the pilot:

IPNI (Nicola Nicolson, Alan Paton, Christine Barker, Christopher Hopkins)
ZooBank (Richard Pyle, Robert Whitton)
Index Fungorum (Paul Kirk)
MycoBank (Vincent Roberts)

The present description is a concise summary of a paper in press describing the results of this pilot (Penev et al. in press).

Registration can be implemented in several ways. The options, and their relationship to the publication process, have been extensively reviewed by Pyle and Michel (2008) and Morris et al. (2011). The concept of the automated registration model was first presented by several of the authors of this article at the Sherborn meeting in London in October 2011 and at the Biosystematics 2013 Conference in Vienna in February 2013.

Registering many new nomenclatural acts “by hand” can be tedious and extremely time-consuming, especially for the recently introduced but increasingly common “turbo-taxonomic” papers, which combine molecular data, concise morphological descriptions, and digital imaging (Butcher et al. 2012, Riedel et al. 2013). The number of new taxa described in such papers can run into the hundreds, for example 178 new species of parasitic wasps (Butcher et al. 2012) and 101 new species of Trigonopterus weevils (Riedel et al. 2013). The record is held by Marsh et al. (2013), who described 277 new braconid wasps from Costa Rica. That paper is also remarkable as the first “turbo-taxonomic” paper in which all 277 new species were registered in ZooBank automatically, in just a few seconds, saving a great deal of time for the authors, the publisher and the registry.

In our view, the registration of nomenclatural acts and the quality control of the bibliographic metadata in these registries should be primarily the responsibility of publishers and registry curators and, to a lesser extent, of authors. Registration of a nomenclatural act could be initiated by an author at the pre-submission or pre-acceptance stage. However, we prefer the publisher-initiated model because it spares registry curators from curating data that may never be published according to the rules of the relevant code. Author-initiated registration may lead to “over-saturation” of the registries with names that are not validly published, causing confusion. Focusing on names accepted for publication also gives curators more time for the published acts, and may allow these specialist staff to assist publication by identifying inconsistencies with the relevant code. Moreover, the publisher’s role is essential in checking and correcting the pre-publication registration details against the finally published information. This “journal-centric” registration has already been implemented in Pensoft’s journals ZooKeys and PhytoKeys. The model presented below could easily be adapted for author initiation, though we envisage that this would bring a greater curatorial overhead and a greater likelihood of errors. However, we accept that the model needs to be flexible and allow alternatives if it is to receive community support.

In the “journal-centric” model, the registration of taxonomic and nomenclatural acts involves two main classes of actors: (1) publishers, and (2) registry curators. The publisher takes the responsibility for initiating the registration of nomenclatural acts so that the workflow can be performed following a common stepwise model (see also Fig. 1):

Step 1. On acceptance of the manuscript, the publisher sends the registry an XML message containing the type of act, taxon names, and preliminary bibliographic metadata; the registry stores the data but does not make them publicly available before the final publication date.
Step 2a. The registry returns an XML response containing the unique identifier it has assigned to each act and/or any relevant error messages.
Step 2b. Errors are corrected and duplicates removed manually, by human intervention on the registry’s side, the publisher’s side, or both.
Step 3. The publisher includes the registry-supplied identifiers in the published treatments (protologues, nomenclatural acts).
Step 4. Upon publication, the registry makes the information publicly accessible and links the registry record to the article.
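
The stepwise model above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Registry` and `Act` classes, their fields, the error messages, and the use of random UUIDs as identifiers are all hypothetical and do not reflect the actual interface of IPNI, ZooBank or any other registry.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Act:
    """One nomenclatural act as a registry might store it (illustrative fields)."""
    act_type: str                 # e.g. "new species", "new combination"
    taxon_name: str
    article_title: str            # preliminary bibliographic metadata
    identifier: str = ""
    public: bool = False
    article_link: Optional[str] = None


class Registry:
    """Hypothetical in-memory stand-in for a registry."""

    def __init__(self) -> None:
        self._acts: dict[str, Act] = {}

    def register(self, act: Act) -> tuple[str, list[str]]:
        """Steps 1 and 2a: store the act privately, return identifier and errors."""
        errors = []
        if not act.taxon_name.strip():
            errors.append("missing taxon name")
        if any(a.taxon_name == act.taxon_name for a in self._acts.values()):
            errors.append("possible duplicate: " + act.taxon_name)
        if errors:
            return "", errors     # Step 2b: left for a human to resolve
        act.identifier = str(uuid.uuid4())   # placeholder identifier format
        self._acts[act.identifier] = act
        return act.identifier, []

    def publish(self, identifier: str, article_link: str) -> None:
        """Step 4: make the record public and link it to the article."""
        act = self._acts[identifier]
        act.public = True
        act.article_link = article_link

    def public_records(self) -> list[Act]:
        return [a for a in self._acts.values() if a.public]


# Publisher side, on acceptance of the manuscript (names are made up).
registry = Registry()
identifier, errors = registry.register(
    Act("new species", "Aus bus", "Example article title")
)
# Step 3: the identifier is printed in the published treatment.
treatment = f"Aus bus sp. n. [{identifier}]"
# Step 4: on the day of publication the record becomes public.
registry.publish(identifier, "https://example.org/article/1")
```

In this sketch, records stay invisible to `public_records()` until `publish()` is called, which mirrors the requirement that the registry holds the data privately until the final publication date.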

Figure 1. Automated registration process and validation of finally published data and metadata between publisher and registry. Abbreviations on logos: IPNI - International Plant Names Index, IF - Index Fungorum.

The registration process should be as automated as possible. There are several reasons to maximize automation of registration, the most significant being:

  • Increasing numbers of bulk, “turbo-taxonomic” descriptions of new taxa within a single paper, sometimes counted in hundreds, which create significant overhead in the authoring and editorial process.
  • Decreased risk of errors caused by human intervention (e.g. re-typing).
  • Disambiguation of the dates of acceptance and publication of a manuscript.
  • Efficient and accurate validation of final published data and metadata through automated export from the publisher to the registry on the day of publication.

Automated registration with the International Plant Names Index (IPNI). The pre-publication registration of new plant taxa and nomenclatural acts in IPNI, with inclusion of the IPNI identifiers in the protologues, has been trialled in the journal PhytoKeys since the publication of its first issue in 2010 (Penev et al. 2010). Within the pro-iBiosphere project the workflow has been extended with a pilot automated registration module. The pilot uses a custom XML format, illustrated by the description of the new genus Lettowia and the new combination Lettowia nyassae (Oliv.) H. Rob., comb. nov. in the paper of Robinson and Skvarla (2013) (Supplementary files 1 and 2). The emphasis of the pilot was on understanding the workflow; as it is scaled up to production use with a broader range of partners, IPNI will move to the Taxon Concept Schema standard to encode the data exchanged, which will enable broader adoption. The XML query is submitted to IPNI’s Application Programming Interface (API) through a POST request, and IPNI replies with the IPNI registration identifiers inserted automatically.
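
The query/response exchange described here might look roughly as follows in Python. This is a minimal sketch, not the pilot's actual format: the element names (`registration`, `articleTitle`, `act`, `name`), the `id` attribute, the endpoint URL and the placeholder identifiers are all assumptions made for illustration, and the request is prepared but never sent. Only the two taxon names come from the example in the text.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; this is NOT IPNI's real API address.
ENDPOINT = "https://registry.example.org/register"


def build_query(article_title: str, acts: list[tuple[str, str]]) -> bytes:
    """Serialise (act type, taxon name) pairs into a hypothetical XML query."""
    root = ET.Element("registration")
    ET.SubElement(root, "articleTitle").text = article_title
    for act_type, name in acts:
        act = ET.SubElement(root, "act", {"type": act_type})
        ET.SubElement(act, "name").text = name
    return ET.tostring(root, encoding="utf-8")


def build_request(payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST request carrying the query."""
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


def read_identifiers(response_xml: bytes) -> dict[str, str]:
    """Map each taxon name to the identifier found in the response."""
    root = ET.fromstring(response_xml)
    return {
        act.findtext("name", default=""): act.get("id", "")
        for act in root.iter("act")
    }


payload = build_query(
    "Example article title",
    [("gen. nov.", "Lettowia"), ("comb. nov.", "Lettowia nyassae")],
)
request = build_request(payload)

# Simulated reply: the same XML with placeholder identifiers inserted,
# standing in for what the registry would return.
reply = ET.fromstring(payload)
for number, act in enumerate(reply.iter("act"), start=1):
    act.set("id", f"PLACEHOLDER-{number}")
identifiers = read_identifiers(ET.tostring(reply))
```

Returning the submitted XML with identifiers inserted, rather than a separate list of identifiers, lets the publisher match each identifier to its act without any extra bookkeeping.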

Automated registration with Index Fungorum. Index Fungorum (IF) will adopt the IPNI registration workflow after the IF system has moved to the Royal Botanic Gardens, Kew, to run alongside IPNI.

Automated registration with MycoBank. The following methods of the MycoBank API are enough for a straightforward implementation:

  1. SearchMycoBankWithFilters
  2. InsertUserProfile
  3. UpdateUserProfile
  4. InsertMycobankRecord
  5. UpdateMycobankRecord

Using the combinations (1, 2, 3) and (1, 4, 5), one can implement the upsert (update if exists, insert otherwise) semantics required for the common query/response registration model. As there are multiple fungal registries (MycoBank, Index Fungorum, Fungal Names), another approach would be to register with only one of them and rely on the synchronization mechanisms (currently being built) to propagate the information to the other databases.
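
The upsert built from the combination (1, 4, 5) can be sketched as below. The method names are those listed above, but their parameters and return values are assumptions, and `FakeMycoBankClient` is a hypothetical in-memory stand-in rather than the real MycoBank API. The same pattern would apply to user profiles with methods 1, 2 and 3.

```python
class FakeMycoBankClient:
    """In-memory stand-in; the real API's signatures may differ."""

    def __init__(self) -> None:
        self.records: dict[int, dict] = {}
        self._next_id = 1

    def SearchMycoBankWithFilters(self, name: str) -> list[int]:
        # Assumed behaviour: return the ids of records with this exact name.
        return [rid for rid, rec in self.records.items() if rec["name"] == name]

    def InsertMycobankRecord(self, record: dict) -> int:
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = dict(record)
        return rid

    def UpdateMycobankRecord(self, record_id: int, record: dict) -> None:
        self.records[record_id].update(record)


def upsert_record(client, record: dict) -> tuple[int, str]:
    """Update the record if the name is already registered, insert otherwise."""
    matches = client.SearchMycoBankWithFilters(record["name"])
    if len(matches) > 1:
        # More than one hit cannot be resolved automatically.
        raise ValueError("ambiguous match; needs manual de-duplication")
    if matches:
        client.UpdateMycobankRecord(matches[0], record)
        return matches[0], "updated"
    return client.InsertMycobankRecord(record), "inserted"


client = FakeMycoBankClient()
first = upsert_record(client, {"name": "Aus bus", "year": 2013})
second = upsert_record(client, {"name": "Aus bus", "year": 2014})
```

Because the search runs first, resubmitting a corrected manuscript updates the existing record instead of creating a duplicate, which is what the common registration model needs when metadata change between acceptance and publication.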

Automated registration with ZooBank. Similarly to PhytoKeys, ZooKeys was the first journal to implement mandatory registration of new taxon names in zoology, starting with its first issue in 2008 (Penev et al. 2008). The automated registration with ZooBank takes a slightly different approach from that with IPNI and uses the TaxPub XML schema (Catapano 2010) as its basic standard. Upon acceptance, once the XML version of the manuscript has been produced, we upload it to the ZooBank server through ZooBank’s interface (see Supplementary file 3 for the submitted TaxPub XML format). A software tool at ZooBank then harvests the TaxPub XML and registers the title, authors and new taxon names. The tool also checks whether some or all of the authors have been registered previously and inserts their existing (or newly registered) ZooBank UUIDs. If the ZooBank database contains authors with identical names (homonyms), the interface displays them so that the operator at the editorial office can disambiguate by selecting the right one. The whole TaxPub XML is sent back with inserted UUIDs for the article, authors and new names (Supplementary file 4). If the manuscript XML changes after registration, it can be uploaded again and the new data will replace the previous data. On the day of publication, the names and the bibliographic metadata are made publicly available in ZooBank.
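
The author-matching step can be illustrated with a short sketch. It assumes a simple lookup from author name to previously registered UUIDs and a callback standing in for the operator's choice; the function name, data structure and sample names are hypothetical and say nothing about how the ZooBank tool is actually implemented.

```python
import uuid
from typing import Callable


def resolve_author(
    name: str,
    registered: dict[str, list[str]],
    choose: Callable[[str, list[str]], str],
) -> str:
    """Return the UUID to insert for an author name.

    No match: register the author under a new UUID.
    One match: reuse the existing UUID.
    Several matches (homonyms): defer to the operator via `choose`.
    """
    candidates = registered.setdefault(name, [])
    if not candidates:
        new_id = str(uuid.uuid4())
        candidates.append(new_id)
        return new_id
    if len(candidates) == 1:
        return candidates[0]
    return choose(name, candidates)


# Made-up registry content: one unambiguous author and one pair of homonyms.
registered = {
    "A. Author": ["uuid-a"],
    "B. Homonym": ["uuid-b1", "uuid-b2"],
}


def operator_picks_second(name: str, candidates: list[str]) -> str:
    # Stand-in for the editorial office operator selecting from a displayed list.
    return candidates[1]


known = resolve_author("A. Author", registered, operator_picks_second)
ambiguous = resolve_author("B. Homonym", registered, operator_picks_second)
created = resolve_author("C. Newcomer", registered, operator_picks_second)
```

Only the homonym case requires a human decision; the other two cases can run unattended, which keeps manual intervention to the minimum the workflow allows.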

What should other journal publishers do to use the workflow? The registration workflow published in this article is free to use for anyone who would like to implement it. To ensure broader adoption of the registration model, the data exchanged through the workflow should be encoded in a standard. For zoology, journals should adopt the TaxPub XML schema (Catapano 2010; open source available at: https://github.com/tcatapano/TaxPub/releases/tag/v0.5-beta), which encodes publications as required by the zoological code. For botany, the registration workflow will implement the Taxon Names and Concepts (TCS) XML schema (reference), which encodes names. Supplementary files 1-4 show some data encoded for the pilot project using a custom XML format. Whilst this shows the kind of data that will be exchanged, it should not be used as a template; the TCS and TaxPub standards should be used as reference. Once the editorial workflow is defined and structured data can be produced according to these standards, journal editors should contact the registries for access to their Application Programming Interfaces (APIs).

Supplementary file 1. XML query sent from Pensoft to IPNI on the day of acceptance of the manuscript for publication [exemplified with the paper of Robinson and Skvarla (2013)].

Supplementary file 2. XML response of IPNI to the query in Supplementary file 1. The response is sent back to Pensoft and contains the registration numbers of the new genus name and the new combination [exemplified with the paper of Robinson and Skvarla (2013)].

Supplementary file 3. TaxPub XML of a ready-to-publish manuscript submitted from Pensoft to ZooBank [exemplified with the paper of Morffe and Rodríguez (2013)].

Supplementary file 4. TaxPub XML returned from ZooBank to Pensoft containing UUIDs of the article, authors and new taxon names [exemplified with the paper of Morffe and Rodríguez (2013)].