Data enrichment hackathon, March 17-21 2014

From pro-iBiosphere

Background information

Organisers:

Purpose:

  • Enrich structured biodiversity input data with semantic links to related resources and concepts, for example links from named locations to GeoNames, from taxon names to taxon URIs, from characters and traits to trait ontologies, and so on.
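As a rough illustration of what "enrichment" means here, the sketch below attaches links to a record's locality and taxon name. Everything in it is an assumption made for illustration: the field names (`locality`, `taxon_name`), the `enrich` function, and the lookup tables with their example.org URIs are hypothetical stand-ins for real services such as GeoNames or a taxon name resolver, which a real tool would query instead.

```python
# Hypothetical lookup tables standing in for real services; the URIs are
# illustrative placeholders, not real identifiers.
PLACE_URIS = {"Leiden": "http://example.org/place/leiden"}
TAXON_URIS = {"Quercus robur": "http://example.org/taxon/quercus-robur"}


def enrich(record: dict) -> dict:
    """Return a copy of the record with a 'links' entry for every field
    whose value could be matched to a URI."""
    enriched = dict(record)
    links = {}
    place = record.get("locality")
    if place in PLACE_URIS:
        links["locality"] = PLACE_URIS[place]
    taxon = record.get("taxon_name")
    if taxon in TAXON_URIS:
        links["taxon_name"] = TAXON_URIS[taxon]
    enriched["links"] = links
    return enriched


if __name__ == "__main__":
    print(enrich({"taxon_name": "Quercus robur", "locality": "Leiden"}))
```

Values that cannot be matched are left without a link rather than guessed, so the input record is never altered, only extended.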

Expected result:

  • Open source tools and software that will allow experts from different disciplines beyond taxonomy to re-use and enhance the biodiversity knowledge gathered by experts in the past 60 years.

Approach:

  • Suitable use cases will be determined through self-organization using Open Space Technology. During the hackathon there will be 5 to 6 task groups of 5 to 6 people each. The task groups will be formed based on the discussion on the 17th of March. During the day the task groups will be programming. At the end of each afternoon, participants will come together for a short plenary to discuss the progress made.

Template for PowerPoint presentations:

  • Download the project PowerPoint presentation template here.

Social media:

  • For posts on Twitter, please use the following hashtag: #piblei

Logistics

  • 17th of March (Monday) to 19th of March (Wednesday) 2014 - The meeting will take place in the auditorium of Naturalis Biodiversity Center, Leiden, the Netherlands. Directions to the venue can be found here.
  • 20th of March (Thursday) to 21st of March (Friday) 2014 - The meeting will take place at the Holiday Inn Hotel. Directions to the venue can be found here.
  • Restaurants - Restaurants in Leiden. To be announced/updated depending on whether participants would like to have a joint dinner.
  • The organisers of the event will pay for:
    • Tea/coffee, snacks (all week at 10.45 - 11.15; 15.15 - 15.45)
    • Lunch (12.30 - 13.30 on Monday - Friday)
    • Social event (18.00 - 19.00 on Monday)

Reimbursements

If funding for travel (economy ticket, cheapest option) and accommodation has been approved in advance between you and the organisers of the hackathon, please note the following. Overnight stays will be paid from the day before the meeting until Friday. Breakfast is also covered if it is included in the hotel rate. Lunch and coffee are free. Dinner, taxis and additional hotel charges must be paid by participants themselves. Reimbursement claims (form available here) must be sent to soraya.sierra@naturalis.nl and Rutger.vos@naturalis.nl with original receipts (not electronic copies) by March 30th at the latest. This includes boarding passes if you came by plane. Do not forget to add your IBAN and BIC/SWIFT codes (help) under your bank details in the reimbursement form; omitting them will delay your reimbursement.

Preparations prior to the hackathon

A page containing examples of use cases that will be discussed on Monday the 17th of March 2014 has been added to the wiki. To facilitate discussions during the hackathon, we would very much appreciate it if you could take some time to read the information provided by the various participants and follow the ongoing discussions. If you have an additional use case that you would like to discuss during the hackathon, please feel free to add it to the wiki.

We would also like to remind you to:

  • ask for a pro-iBiosphere wiki account: here
  • create an account on some public version control provider (e.g. github.com, sourceforge.net or bitbucket.org)
  • update the online table summarising your expertise: here

If you are presenting a use case, please prepare a presentation, where possible using the template available here, and share the link on the wiki (see agenda day one).

Important links

Participants

Participants of the hackathon are:

  • people with programming knowledge
  • providers of use cases (taxonomists, ecologists, etc.)

We received applications from experts all over the world. We selected 37 participants, aiming to balance the areas of expertise required during the hackathon (see table below). A summary of the expertise of participants is available here.

Name | Twitter | Organization | Role | Repository
Bachir Balech | | Institute of Biomembranes and Bioenergetics - Italian National Research Center | provider of use case, programmer | https://bitbucket.org/bachirb/leidenhackathon_17210314
Jordan Biserkov | @jbiserkov | Pensoft | programmer | https://github.com/pensoft/DataHackLeiden
Niall Beard | | University of Manchester | programmer | https://github.com/myGrid/DataHackLeiden
Matthew Blissett | | Royal Botanic Gardens Kew | programmer | https://github.com/RBGKew
Christian Brenninkmeijer | | University of Manchester | programmer |
David Eades | | University of Illinois | |
George Gosline | | RBGK | |
Quentin Groom | @cabbageleek | BGM | provider of use case | https://github.com/qgroom
Thomas Hamann | | Naturalis | developer | https://github.com/thoha
Hannes Hettling | | Naturalis | | https://github.com/hettling
Robert Hoehndorf | | Aberystwyth University | programmer, provider of use case | https://github.com/leechuck/plantphenotypes
Ayco Holleman | | Naturalis | developer |
Peter Hovenkamp | | Naturalis | provider of use case |
Patricia Kelbert | | FUB-BGBM | programmer | http://dev.e-taxonomy.eu/svn/
David King | @DauvitKing | Open University | developer | https://github.com/Dauvit/Data_enrichment
Don Kirkup | @donkirkup | Royal Botanic Gardens Kew | provider of use case |
Youri Lammers | | Naturalis | programmer | https://github.com/Y-Lammers
Thibaut Meulemeester | | Naturalis | provider of use case |
Daniel Mietchen | @EvoMRI | MfN | provider of use case | https://github.com/Daniel-Mietchen/
Jeremy Miller | @millerjeremya | Naturalis | provider of use case |
Ross Mounce | @rmounce | University of Bath | programmer | https://github.com/rossmounce/LeidenPDFhack
Nicky Nicolson | @nickynicolson | Royal Botanic Gardens Kew | programmer | https://github.com/RBGKew
Rod Page | @rdmpage | University of Glasgow | | https://github.com/rdmpage
Aleksandra Pawlik | @aleksandrana | Software Sustainability Institute, myGrid | |
Serrano Pereira | | Naturalis | programmer | https://github.com/figure002
Lyubomir Penev | | Pensoft | |
Kevin Richards | @richardsk_nz | ex Landcare Research New Zealand | | https://code.google.com/p/biodiversity-software/
Guido Sauter | | Plazi | developer | multiple (both on Google Code and on ViBRANT GIT)
David Shorthouse | @dpsSpiders | Université de Montréal / Canadensys | programmer | https://github.com/dshorthouse
Soraya Sierra | @proibiosphere | Naturalis | co-organiser |
Marko Tähtinen | | University of Eastern Finland, BioVeL | programmer | https://bitbucket.org/mjtahtin/marko_hackatlon
Tom van Doren | | Naturalis | provider of use case |
Rutger Vos | @rvosa | Naturalis | co-organiser | https://github.com/rvosa
Claus Weiland | | Biodiversity and Climate Research Centre / Senckenberg | programmer | https://github.com/cp-weiland
Alan R. Williams | | University of Manchester | programmer | https://github.com/myGrid/DataHackLeiden


Agenda

Day 1: 17th of March 2014

Day 1 will consist mainly of brief presentations (10 min presentation and 5 min discussion) on possible use cases and tangible outcomes. Purpose: to discuss the most interesting, urgent and realistic outcomes to be achieved. Based on the contributions to the use cases page (as of 13 March) three main topics have emerged, which we've used to organize the ideas proposed so far. Following these pitches there will be several "bootcamp" presentations that will introduce relevant fundamental technologies.

08.30 - 09.00 Registration, getting set up

09.00 - 09.15 Introductory remarks - Rutger Vos & Soraya Sierra: background, logistics (food, getting around), outcomes (tools, publication(s) e.g. in SWJ, proposals, presentations at pro-iBiosphere final event, etc.), approach (what is a hackathon?).

09.15 - 10.45 Open Space: use case pitches on literature / natural language processing:

10.45 - 11.00 Tea / coffee break

11.00 - 12.45 Open Space: use case pitches on platforms and tools:

  • Strings to things, things to strings, thing to things – Rod Page, Univ. Glasgow
  • Annotation tools to link specimens with relations expressing identity – Peter Hovenkamp
  • Annotation tools for adding geographic coordinates to records that lack them in the original, and for distinguishing records whose coordinates are quoted in the original from those annotated secondarily (providing totals for either or both, perhaps plotted in different colors) – Jeremy Miller, Naturalis
  • Implementing a webservice to export occurrences from EDIT Platform instances (for example to BioVeL Workflows) – Patricia Kelbert, FUB-BGBM
  • Running workflows from within an IPython Notebook – Alan Williams, University of Manchester & Aleks Pawlik, Software Sustainability Institute, myGrid
  • Contingent on Media Library API deployment: development of image harvesting client – Serrano Pereira, Naturalis
  • Implementing a web service that can add metadata to, and extract metadata from, a NeXML file – Bachir Balech, Institute of Biomembranes and Bioenergetics - Italian National Research Center
  • Improvement of Wikimedia pages with information about the pro-iBiosphere pilot taxa – Daniel Mietchen, MfN (Note: the presentation will consist of a brief announcement. The topic will be further explained/covered in the Wikimedia workshop on Tuesday)

12.45 - 13.25 Lunch

13.25 - 13.30 Group photo

13.30 - 14.45 Open Space: use case pitches on semantic linking:

  • Link names to collections via type citation (slides: http://www.slideshare.net/nickyn/nn-leidendatahack) – Nicky Nicolson, RBGK
  • Link collection events between different systems (as above) – Nicky Nicolson, RBGK
  • Link collections to duplicates held in other herbaria (as above) – Nicky Nicolson, RBGK
  • Link specimens from digitized museum collections – Jordan Biserkov
  • Enriching references using the BHL API – Jordan Biserkov
  • RDF knowledge base of plant phenotypes – Robert Hoehndorf, Aberystwyth University

14.45 - 15.30 Open Space: Self-assembly bazaar

15.30 - 15.45 Tea / coffee break

15.45 - 16.30 Open Space: Self-assembly bazaar

16.30 - 17.00 Biological Collections Ontology (BCO) – Ramona Walls, John Deck, and John Wieczorek, The iPlant Collaborative (Bootcamp via webex / hangouts / skype) - YouTube video

17.00 - 17.30 Charaparser bootcamp – Hong Cui, Univ. Arizona (Bootcamp via webex / hangouts / skype)

17.30 - 18.00 Open Space: Presentation of Task Groups

18.00 - 19.00 Drinks

19.00 Dinner at Rhodos, Turfmarkt 5, Leiden. http://www.eet.nu/leiden/rhodos-grieks-restaurant

Day 2: 18th of March 2014

08.30: Doors open

09.00 - 12.30 Parallel sessions: bootcamps in the Regentenkamer

09.00 - 09.30 Regentenkamer Bootcamp: Digitisation workflow in Digitarium and georeferencing service – Marko Tähtinen, University of Eastern Finland, BioVeL. Presentation: https://www.dropbox.com/sh/2zg7yie6p9d0u6o/ht2syK5Rg7/Digitisation_Leiden.pptx

09.30 - 12.30 Regentenkamer Bootcamp: Wikimedia workshop – Daniel Mietchen, MfN (e.g. how to edit Wikipedia and how it relates to other Wikimedia projects such as Wikimedia Commons, Wikisource, Wikispecies and Wikidata, including hands-on exercises and an introduction to Semantic MediaWiki).

09.00 - 12.30 Parallel sessions: programming in the Auditorium

10.45 - 11.15 Tea / coffee break

12.30 - 13.30 Lunch break

13.30 - 15.30 Programming in Regentenkamer and Auditorium

15.30 - 15.45 Tea / coffee break

15.45 - 17.30 Programming in Regentenkamer and Auditorium

17.30 - 18.00 Plenary (auditorium): brief standups from task groups

18.00 End of activities

Day 3: 19th of March 2014

08.30: Doors open

09.00 - 10.45 Programming in Regentenkamer and Auditorium

10.45 - 11.15 Tea / coffee break

11.15 - 12.30 Programming in Regentenkamer and Auditorium

12.30 - 13.30 Lunch break

13.30 - 15.30 Programming in Regentenkamer and Auditorium

15.30 - 15.45 Tea / coffee break

15.45 - 17.30 Programming in Regentenkamer and Auditorium

17.30 - 18.00 Plenary (auditorium): brief standups from task groups

18.00 End of activities

Day 4: 20th of March 2014

08.30: Doors open

09.00 - 10.45 Programming in "Alkmaar-/Haarlem" and "Katwijk"

10.45 - 11.15 Tea / coffee break

11.15 - 12.30 Programming in "Alkmaar-/Haarlem" and "Katwijk"

12.30 - 13.30 Lunch break

13.30 - 15.15 Programming in "Alkmaar-/Haarlem" and "Katwijk"

15.15 - 15.45 Tea / coffee break

15.45 - 17.30 Programming in "Alkmaar-/Haarlem" and "Katwijk"

17.30 - 18.00 Plenary (Alkmaar-/Haarlem): brief standups from task groups

18.00 End of activities

Day 5: 21st of March 2014

08.30: Doors open

09.00 - 10.45 Programming in "Alkmaar-/Haarlem" and "Katwijk"

10.45 - 11.15 Tea / coffee break

11.15 - 12.30 Programming in "Alkmaar-/Haarlem" and "Katwijk"

12.30 - 13.30 Lunch break

13.30 - 15.15 Programming in "Alkmaar-/Haarlem" and "Katwijk"

15.15 - 15.45 Tea / coffee break

15.45 - 17.30 Programming in "Alkmaar-/Haarlem" and "Katwijk"

17.30 - 18.00 Plenary (Alkmaar-/Haarlem): brief standups from task groups

18.00 End of activities

See also