Demonstrations on outcomes of pro-iBiosphere Data Enrichment Hackathon

Background information

From 17 to 21 March 2014, Naturalis and pro-iBiosphere hosted researchers and programmers from around the world at the "Data Enrichment Hackathon", an event for intensive, collaborative software development to tackle outstanding issues in biodiversity informatics. Short demonstrations (15 min max.) of the Task Groups' outcomes were presented on 10 June 2014 in Meise.

  • The session was chaired by: Rutger Vos & Soraya Sierra (Naturalis)
  • Detailed information on the Task Groups of the Data Enrichment Hackathon is available here
  • People who gave the demonstrations:
  1. Jeremy Miller (Naturalis)
  2. Robert Hoehndorf (Aberystwyth University)
  3. Alan Paton (RBGKew)
  4. Peter Hovenkamp (Naturalis) Presentation: https://drive.google.com/file/d/0Bxm8s21qKIthM21naEp6LTdqWk0/edit?usp=sharing
  5. Quentin Groom (BGM)
  6. Niall Beard (University of Manchester)
  7. Rutger Vos (Naturalis)
  8. Kevin Richards (ex Landcare Research New Zealand)
  9. Aleksandra Pawlik (University of Manchester)
  10. Ross Mounce (Univ. of Bath) http://www.slideshare.net/rossmounce/proibiosphere

Participants

  1. Cherian Mathew - Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)
  2. Chuck Miller - Missouri Botanical Garden
  3. David Patterson - Plazi
  4. Patricia Mergen - Royal Museum for Central Africa
  5. Aaike De Wever - Royal Belgian Institute of Natural Sciences
  6. Marc Reynders - Botanic Garden Meise
  7. Rutger Vos - Naturalis
  8. Kevin Richards - Landcare Research
  9. Jérôme Degreef - Botanic Garden Meise
  10. Ross Mounce - University of Bath
  11. Terry Catapano - Plazi
  12. Siti Munirah binti Mat Yunoh - Forest Research Institute Malaysia / Kew Botanic Garden London
  13. Dimitri Brosens - Belgian Biodiversity Platform
  14. Aleksandra Nenadic - University of Manchester
  15. Niall Beard - University of Manchester
  16. Aleksandra Pawlik - University of Manchester
  17. Donat Agosti - Plazi
  18. Guido Sautter - Plazi
  19. Soraya Sierra - Naturalis
  20. Anton Güntsch - Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)
  21. Lyubomir Penev - Pensoft
  22. Alex Hardisty - Cardiff University
  23. Jonathan Giddy - Cardiff University
  24. Quentin Groom - Botanic Garden, Meise
  25. Rob Guralnick - University of Colorado (VertNet project)
  26. Henry Engledow - Botanic Garden Meise
  27. Leo Vanhecke - Botanic Garden Meise
  28. Nicolas Noé - Belgian Biodiversity Platform
  29. Claus Weiland - Senckenberg Nature Research Institute
  30. Karol Marhold - Institute of Botany, Bratislava
  31. Bart Aelterman - Research Institute for Nature and Forest (INBO)
  32. Anne-Sophie Archambeau - National Museum of Natural History (MNHN)
  33. Paul-Andre Duchesne - Royal Belgian Institute of Natural Sciences
  34. Walter Berendsohn - Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)
  35. Natacha Beau - Botanic Garden Meise
  36. Thierry Bourgoin - National Museum of Natural History (MNHN)
  37. Erik Smets - Naturalis Biodiversity Center
  38. Ann Bogaerts - Botanic Garden Meise
  39. Sabrina Eckert - Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)
  40. Andreas Müller - Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM)
  41. Wouter Addink - SP2000 Secretariat
  42. Ana Casino - Consortium of European Taxonomic Facilities (CETAF)
  43. Robert Hoehndorf - Aberystwyth University
  44. Sylvia Mota de Oliveira - Naturalis Biodiversity Center
  45. Christian Köhler - ZFMK Bonn
  46. Peter Hovenkamp - Naturalis Biodiversity Center
  47. Christina Flann - Species 2000
  48. Régine Vignes - National Museum of Natural History (MNHN)
  49. Claudia C. Soliz Gamboa - Naturalis Biodiversity Center

Demonstration on Data Visualization

Summary of the demonstration:

The following query URI:

http://plazi.cs.umb.edu/GgServer/srsStats/stats?outputFields=bib.author+bib.year+matCit.specimenCount&FP-bib.year=2004-2010&groupingFields=bib.author+bib.year&orderingFields=bib.author&format=json

produces this JSON output:

{ "labels": { "DocCount": "DocCount", "BibAuthor": "Author", "BibYear": "Year", "MatCitSpecimenCount": "SpecimenCount" }, "data": [{ "DocCount": "34", "BibAuthor": "Charles R. Haddad", "BibYear": "2010", "MatCitSpecimenCount": "36" }, …

which we can visualise as:

Person giving the demo:

  1. Jeremy Miller (Naturalis)

Team members:

  1. David King (Open University)
  2. Jeremy Miller* (Naturalis)
  3. Serrano Pereira (Naturalis)
  4. Guido Sautter* (Plazi)

For additional information see our formal Task Group page and informal log page. See also the how-to on extracting statistical data from GoldenGATE and visualising it, and the pro-iBiosphere spider pilot.
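The query URI and JSON output in the summary above can also be built and read programmatically. The following is a minimal sketch and not the Task Group's own code: the function names `build_stats_url` and `parse_stats` are made up for illustration, the sample contains only the single record shown above, and no request is sent to the server.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://plazi.cs.umb.edu/GgServer/srsStats/stats"


def build_stats_url(output_fields, year_range, grouping_fields, ordering_fields):
    """Assemble a query URI like the one above (hypothetical helper)."""
    params = {
        "outputFields": " ".join(output_fields),  # spaces are encoded as '+'
        "FP-bib.year": year_range,
        "groupingFields": " ".join(grouping_fields),
        "orderingFields": " ".join(ordering_fields),
        "format": "json",
    }
    return BASE_URL + "?" + urlencode(params)


def parse_stats(json_text):
    """Return rows keyed by display label, with '...Count' values as integers."""
    payload = json.loads(json_text)
    labels = payload["labels"]
    rows = []
    for record in payload["data"]:
        row = {}
        for key, value in record.items():
            label = labels[key]
            row[label] = int(value) if label.endswith("Count") else value
        rows.append(row)
    return rows


# Only the one record visible in the output above; the rest is elided there.
SAMPLE = """
{ "labels": { "DocCount": "DocCount", "BibAuthor": "Author",
              "BibYear": "Year", "MatCitSpecimenCount": "SpecimenCount" },
  "data": [{ "DocCount": "34", "BibAuthor": "Charles R. Haddad",
             "BibYear": "2010", "MatCitSpecimenCount": "36" }] }
"""

url = build_stats_url(
    ["bib.author", "bib.year", "matCit.specimenCount"],
    "2004-2010",
    ["bib.author", "bib.year"],
    ["bib.author"],
)
rows = parse_stats(SAMPLE)
print(url)
print(rows)
```

Converting the counts to integers matters because the service returns them as strings, which would otherwise sort and plot incorrectly.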

Demonstration on Traits

Summary of the demonstration:

We extracted traits from several floras and represented them using some standard vocabulary. We used a text mining approach to mark up with existing ontology terms, and then we constructed Entity-Quality (EQ) statements from the text and linked them to the taxon. From the EQ statements, we constructed a rough draft of a Flora Phenotype Ontology, which has all the traits found in the floras (Flore du Gabon and Flora Malesiana) and the associated taxa.
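To make the Entity-Quality idea concrete, here is a minimal sketch of how EQ statements might be held and grouped by taxon. It is an illustration only: the class `EQStatement`, the function `traits_by_taxon` and the sample statements are hypothetical, and plain strings stand in for the ontology term identifiers that the real markup would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQStatement:
    taxon: str    # taxon whose description the statement was extracted from
    entity: str   # anatomical entity term, e.g. "leaf"
    quality: str  # quality term, e.g. "ovate"


def traits_by_taxon(statements):
    """Group (entity, quality) traits under the taxon they are linked to."""
    index = defaultdict(set)
    for statement in statements:
        index[statement.taxon].add((statement.entity, statement.quality))
    return dict(index)


# Hypothetical statements standing in for text-mined output.
statements = [
    EQStatement("Taxon A", "leaf", "ovate"),
    EQStatement("Taxon A", "petal", "yellow"),
    EQStatement("Taxon B", "leaf", "ovate"),
]

index = traits_by_taxon(statements)
# The union of all traits is the raw material for a draft trait vocabulary.
all_traits = set().union(*index.values())
print(index)
print(sorted(all_traits))
```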

Results

Person giving the demo:

Robert Hoehndorf (Aberystwyth University)

Team members:

  1. Quentin Groom (BGM)
  2. Robert Hoehndorf (Aberystwyth University)
  3. George Gosline (RBGKew)
  4. Thomas Hamann (Naturalis)
  5. Claus Weiland (Biodiversity and Climate Research Centre/Senckenberg)

For additional information see the Traits Task Group page.

Demonstrations on Links to/from specimens and names

Summary of the demonstrations:

  • Show making links from specimen citations and name references to specimen / name PURLs.

Given a distribution reference like "Doi Chiengdao, 1940, Garrett 1189", we can use regular expressions (e.g. in OpenRefine) to split out the locality, year, collector and collector's field number, and then call the web service: http://kewmatcher-mattblissett.rhcloud.com/match/basicCollEventMatch?recordedBy=Garrett&fieldNumber=1189&locality=Doi+Chiengdao&eventDate=1940 (new location) to retrieve a JSON response:

[{ (Some properties omitted)
       "id": "http://specimens.kew.org/herbarium/K000523528",
       "eventDate": "19400422",
       "fieldNumber": "1189",
       "recordedBy": "Garrett, H.B.G.",
       "locality": "Doi Chiengdao",
}]
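The splitting step and the construction of the request URL can be sketched in a few lines. This is an assumption-laden illustration rather than the code used at the hackathon: the regular expression only covers citations laid out as "locality, year, collector number", the helper names are hypothetical, and the sketch builds the URL without calling the service.

```python
import re
from urllib.parse import urlencode

MATCH_URL = "http://kewmatcher-mattblissett.rhcloud.com/match/basicCollEventMatch"

# Assumed layout: "<locality>, <year>, <collector> <field number>".
CITATION = re.compile(
    r"^(?P<locality>.+?),\s*(?P<eventDate>\d{4}),\s*"
    r"(?P<recordedBy>.+?)\s+(?P<fieldNumber>\d+)$"
)


def split_citation(text):
    """Return the four parts as a dict, or None if the layout does not match."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


def build_match_url(parts):
    """Build the query URL using the parameter names from the example above."""
    order = ("recordedBy", "fieldNumber", "locality", "eventDate")
    return MATCH_URL + "?" + urlencode({key: parts[key] for key in order})


parts = split_citation("Doi Chiengdao, 1940, Garrett 1189")
url = build_match_url(parts)
print(parts)
print(url)
```

Citations that do not fit the assumed layout return `None`, so they can be set aside for manual review instead of producing a malformed query.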

Jordan worked on parsing the data retrieved from the PURL (e.g. http://data.rbge.org.uk/herb/E00571967, which can return RDF) to include in Pensoft's tools.

A name like Crotalaria globifera E.Mey. passed to the name matching service (not yet online) returns http://ipni.org/urn:lsid:ipni.org:names:488179-1, which also returns RDF.

The web service can be queried using Google/Open Refine, by uploading a CSV file, or programmatically. Example of using OpenRefine matching African Flora names to IPNI: Fdac-reconciled.png.

Links have been made for roughly 80% of the names in two(?) floristic datasets to IPNI, from ~35% of the distribution statements in Flora Zambesiaca to specimens (K/E/BR -- ~8000 statements reference one of these herbaria, from a total of over 100000 statements), and most of the Wageningen Herbarium specimens to duplicates in Kew, Edinburgh or Meise (where they exist).

  • Graph database approach to deduplicating collectors

The diagram (Collector-graph.png) shows Edinburgh's herbarium specimen collection event data loaded into Neo4j, with a query for collectors who share an area of interest, a visited region, a family of interest and an active decade. Two pairs of collectors are shown; each pair has been deduplicated through these shared connections rather than through string-based matching.
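The idea behind the graph query can be illustrated without a graph database. The sketch below is a plain-Python stand-in, not the Neo4j query used in the demonstration: the records, the collector strings and the function names are all hypothetical, and only three kinds of shared node (region, family, decade) are modelled.

```python
from itertools import combinations

# Hypothetical collection-event records: (collector string, region, family, decade).
EVENTS = [
    ("Smith, J.", "Region X", "Family P", 1930),
    ("J. Smith", "Region X", "Family P", 1930),
    ("Jones, A.", "Region Y", "Family Q", 1950),
]


def build_profiles(events):
    """Map each collector string to the set of (kind, value) nodes it links to."""
    profiles = {}
    for name, region, family, decade in events:
        nodes = profiles.setdefault(name, set())
        nodes.update({("region", region), ("family", family), ("decade", decade)})
    return profiles


def candidate_duplicates(profiles):
    """Return pairs of collector strings that share a region, a family and a decade."""
    pairs = []
    for first, second in combinations(sorted(profiles), 2):
        shared_kinds = {kind for kind, _ in profiles[first] & profiles[second]}
        if shared_kinds == {"region", "family", "decade"}:
            pairs.append((first, second))
    return pairs


pairs = candidate_duplicates(build_profiles(EVENTS))
print(pairs)
```

The pairing relies only on shared connections, so two spellings of the same collector can be linked even when their name strings have little in common.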

  • Graph database approach to a taxonomic mind mapper

Peter provided a use case, which Nicky worked into a practical example using the graph database Neo4j. See the Neo4j Gist, which includes an explanation. Graph nodes representing specimens are connected according to the thoughts of the taxonomist. On the basis of this example, Nicky and Peter will continue and develop a more formal specification.


Note: Two different demos will be presented.


Persons giving the demos:

Peter Hovenkamp (Naturalis), The Taxonomic Mind Mapper - where next?

Alan Paton (RBGKew), Citing Kew specimens using the persistent URI scheme


Team members:

  1. Nicky Nicolson (RBGKew)
  2. Matthew Blissett (RBGKew)
  3. Jordan Biserkov (Pensoft)
  4. Peter Hovenkamp (Naturalis)
  5. Ayco Holleman (Naturalis)
  6. Kevin Richards (ex Landcare Research New Zealand)
  7. Daniel Mietchen (MfN)

Demonstration on CDM API

Summary of the demonstration:

The presentation of this hackathon task will be given during the session on the Biodiversity Catalogue.

Person giving the demo:

Cherian Mathew (FUB-BGBM)

Team members:

  1. Patricia Kelbert (FUB-BGBM)
  2. Quentin Groom (BGM)

After hackathon improvements:

  1. Cherian Mathew

For additional information see:

Demonstration on SWeDe

Summary of the demonstration:

This demonstration showcased SWeDe - an XML Schema Definition for Scientific Webservice Descriptions.

The 'SWeDe Farmer', an online form to generate a SWeDe document, was used to help visualise the different types of information that SWeDe encapsulates about web services. The demonstration also aimed to show how the Biodiversity Catalogue will make use of SWeDe documents, by checking for changes to a SWeDe document and updating its database records accordingly.

Person giving the demo:

Niall Beard (University of Manchester)

Team members:

  1. Niall Beard (University of Manchester)
  2. Patricia Kelbert (FUB-BGBM)
  3. Bachir Balech (IBBE-CNR)


For additional information see:

Demonstration on NeXML services

Summary of the demonstration:

The demonstration will showcase how data from disparate sources, expressed in a multitude of syntax formats, can be linked together into a single, expressive data format that includes RDFa-compliant annotations. In addition, the demonstration will show how facets of the merged data set can be extracted and serialized into commonly-used file formats. The functionality of this facility is implemented as RESTful web services that can be addressed by the Taverna workflow manager, which makes it available to BioVeL users.

Person giving the demo:

Rutger Vos (Naturalis)

Team members:

  1. Bachir Balech (IBBE-CNR)
  2. Rutger Vos (Naturalis)
  3. Christian Brenninkmeijer (University of Manchester)
  4. Hannes Hetting (Naturalis)


For additional information see The Outcome.

Demonstration on Web interface for correcting OCR text from BHL

Summary of the demonstration(s):

The team produced an engaging editing environment in which users can quickly fix the underlying text of poorly OCR'd pages from the Biodiversity Heritage Library (BHL). The resulting cleaner text files could then be re-introduced into BHL's scientific name-finding and other text mining routines. Frequencies of edit actions are computed, and we hope these might help developers of OCR software improve their products. The interface may be equally useful for specimen label digitization efforts, assuming scans can be transformed into DjVu XML files, the source of the present work.
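One way to compute frequencies of edit actions is to diff the OCR text against the corrected text and count each change. The sketch below assumes plain strings instead of DjVu XML and uses Python's standard `difflib`; it is not the proof-of-concept's own implementation, and `edit_action_counts` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_action_counts(ocr_text, corrected_text):
    """Count character-level edits as (action, original, corrected) triples."""
    counts = Counter()
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete' or 'insert'
            counts[(tag, ocr_text[i1:i2], corrected_text[j1:j2])] += 1
    return counts


# A typical OCR confusion: the digit "1" read in place of the letter "l".
counts = edit_action_counts("Crotalaria g1obifera", "Crotalaria globifera")
for action, frequency in counts.most_common():
    print(action, frequency)
```

Summing such counters over many corrected pages would show which character confusions occur most often.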

Person giving the demo:

Kevin Richards (ex Landcare Research New Zealand)

Team members:

  1. Rod Page (University of Glasgow)
  2. David Shorthouse (Université de Montréal / Canadensys)
  3. Kevin Richards (ex Landcare Research New Zealand)
  4. Marko Tahtinen (University of Eastern Finland, BioVeL)


For additional information see: The Pitch, The Outcome and The Proof-of-Concept

Follow-up for David Shorthouse: contact recipients of the Digging into Data awards who may benefit from this work: Sophia Ananiadou (Director of the National Centre for Text Mining and Professor in the School of Computer Science at the University of Manchester), Anatoliy Gruzd (Director of the SocialMedia Lab and Associate Professor at Dalhousie University), and William Ulate Rodríguez (Technical Director for US/UK of the Biodiversity Heritage Library at the Missouri Botanical Garden).

Demonstration on integration of Taverna Player using IPython Notebook as an example

Summary of the demonstration:

The demonstration showed the use of Taverna Player within an IPython Notebook to establish a connection to a Taverna Player, select a workflow, run the workflow using data from within the Notebook, and return the results of the workflow run to the Notebook.

The demonstration also described the wider usage of Taverna Player, within the BioVeL portal and within Scratchpads sites such as antkey.

Person giving the demo:

Aleksandra Pawlik (University of Manchester)

Team members:

  1. Alan Williams (University of Manchester)
  2. Youri Lammers (Naturalis)
  3. Aleksandra Pawlik (University of Manchester, Software Sustainability Institute)
  4. Ross Mounce (University of Bath)

Outcomes: tavernaPlayerClient package on Python Package Index

For additional information see Taverna Player on GitHub

Static preview of the demo (using ENM) on NBViewer

Demo video

Demonstration on Liberating Open Access figures to Flickr to maximise re-use

Slides from talk given at the Meise wrap-up event: http://www.slideshare.net/rossmounce/proibiosphere

Summary of the demonstration:

High-quality scientific figures are published every day, but many remain trapped inside PDF silos, unable to realise their re-use potential. This project created a set of automated scripts to download, extract and re-upload (with attribution) Open Access figures from Magnolia Press biodiversity journals, to support and maximise their re-use, e.g. on Wikipedia. The impact is both academic -- providing a way to assess the content of hundreds of papers quickly by scanning their figures -- and non-academic -- alerting those with an interest in new species, via social media channels such as Twitter, to where they are published.

Person giving the demo:

Ross Mounce (Univ. of Bath)

Team members:

  1. Ross Mounce (Univ. of Bath)
  2. Youri Lammers (Naturalis)

List of outcomes:

  1. Open media content, made easily shareable & re-usable
  2. Some phylogenetic tree data for re-use & re-analysis
  3. Sample output from 1 paper (10 figures): http://www.flickr.com/photos/79472036@N07/sets/72157642597074643/with/13268597965/
  4. Articles with broken DOI links have been found and reported to CrossRef e.g. http://biotaxa.org/Phytotaxa/article/view/phytotaxa.112.2.1

Links to active instances of this outcome:

  1. MEDIA: on flickr: http://www.flickr.com/photos/79472036@N07/
  2. CODE: on github https://github.com/rossmounce/LeidenPDFhack
  3. Twitter: https://twitter.com/PhytoFigs


For additional information see here