Workshop Berlin 1: How to improve technical cooperation and interoperability at the e-infrastructure level Minutes

How to improve technical cooperation and interoperability at the e-infrastructure level

October 8, 8:30 – 17:30, Venue: Botanic Garden and Botanical Museum Berlin-Dahlem

The workshop was split into two working groups:

  • Group 1: collection identifiers implementation review and roadmap
  • Group 2: service registries and the BiodiversityCatalogue

Minutes group 1

Chair

  • Roger Hyam - RBGE

Participants

  • Dominik Roepert - FUB-BGBM
  • Gregor Hagedorn - MfN/Plazi
  • Martin Pullan - RBGE
  • Nicky Nicolson - RBGK
  • Simon Chagnoux - MNHN
  • Ayco Holleman - Naturalis
  • Robert Morris - University of Massachusetts / Plazi
  • Terry Catapano - Plazi
  • Petr Danes - SEZNAM, Prague
  • Falko Glöckler - MfN

Agenda

see [1]

Results

State of implementation of collection identifiers at partner institutions

  • BGBM Berlin – Implemented to Level 2 (Linked Data compliant)

Example: http://herbarium.bgbm.org/object/B100026394

  • Harvard University Herbaria – Implemented to Level 1 (Stable URI)
  • Harvard Museum of Comparative Zoology – Implemented to Level 1 (Stable URI)

Example: http://mczbase.mcz.harvard.edu/guid/MCZ:Fish:12233

  • Kew – Implemented to Level 1 (Stable URI)

Example: http://specimens.kew.org/herbarium/K000534727

  • MfN Berlin – Implemented to Level 1 (Stable URI)

Example: http://coll.mfn-berlin.de/u/ZMB_Phasm_D003

  • Naturalis NL – Demonstrated concept with intention of implementation.
  • Paris (MNHN) – Implemented to Level 2 (Linked Data compliant)

Example: http://coldb.mnhn.fr/catalognumber/mnhn/rs/rs8625

  • RBGE – Implemented to Level 2 (Linked Data compliant)

Example: http://data.rbge.org.uk/herb/E00421509

Major discussion points

Recap of Edinburgh workshop: The meeting started with a recap of the Edinburgh workshop on stable identifiers (http://stories.rbge.org.uk/archives/3846). This was necessary because some participants had not been at the Edinburgh meeting and everyone needed to agree on the same basic premises. There was some discussion of case sensitivity and of the use of words in URIs, particularly the word “data” (which some felt should be avoided), but there were no major changes to the outcomes of Edinburgh.

There was also some debate about the three levels proposed in Edinburgh, as they are not theoretically clean: Level 3 (Data Services) sits on top of Level 2 (Linked Data compliant), yet the content negotiation and RDF support that define Level 2 could themselves be considered just another service, and services other than content negotiation and the return of RDF can be built on top of stable HTTP URIs. However, as any such services are highly likely to make use of HTTP content negotiation, it was not felt necessary to change the levels.
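
As an illustration of the distinction, a Level 2 URI can be exercised with plain HTTP content negotiation. A minimal sketch (Python with the requests library; the RBGE URI is one of the Level 2 examples above, and the server is assumed to honour an RDF/XML Accept header):

    import requests

    # One of the Level 2 example URIs listed above.
    URI = "http://data.rbge.org.uk/herb/E00421509"

    # Ask for RDF: a Linked Data compliant server should content-negotiate.
    rdf_response = requests.get(URI, headers={"Accept": "application/rdf+xml"})

    # The same stable URI, negotiated as HTML, yields the human readable page.
    html_response = requests.get(URI, headers={"Accept": "text/html"})

    print(rdf_response.headers.get("Content-Type"))   # e.g. application/rdf+xml
    print(html_response.headers.get("Content-Type"))  # e.g. text/html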

How does a user (e.g. journal publisher) know an HTTP URI is intended to be a stable HTTP URI and not a common (potentially ephemeral) URL? This was seen as a major issue and different mechanisms were discussed.

One approach was to have a register of URI patterns (held as regular expressions); this, combined with the return of an HTTP 200 OK response, would indicate a stable URI. The approach was quickly rejected, not least because it was unclear who would maintain such a registry.

Returning special HTTP header fields was considered and not ruled out, but no clear recommendation was made.

Returning HTML meta tags of the form given below was also considered. As implementations at all levels are likely to return an HTML page by default, inclusion would be simple for everyone to implement and would also be machine readable.

<meta name="Stability" content="Permanent">
<meta name="Stable" content="Yes">
<meta name="Lifetime" content="Forever">
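
If tags along these lines were adopted (the names above are candidates only; see the open points below), detecting them would be straightforward. A minimal sketch using only the Python standard library:

    from html.parser import HTMLParser
    from urllib.request import urlopen

    class StabilityMetaParser(HTMLParser):
        """Collects candidate stability meta tags from an HTML page."""
        def __init__(self):
            super().__init__()
            self.stability = {}

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                # The tag names are the candidates discussed above, not a standard.
                if attrs.get("name") in ("Stability", "Stable", "Lifetime"):
                    self.stability[attrs["name"]] = attrs.get("content")

    html_text = urlopen("http://herbarium.bgbm.org/object/B100026394").read()
    parser = StabilityMetaParser()
    parser.feed(html_text.decode("utf-8", "replace"))
    print(parser.stability)  # empty unless the supplier emits such tags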

Including a link to a human readable statement of stability in the HTML document returned was proposed. It had already been agreed that human readable pages should contain the stable HTTP URI and the text “Cite as:” or something similar. Adding a link to an explanation of what the stable URI was and how it should be used would encourage use and be easy to regulate. There was brief discussion on whether the text should be on a centrally located page or whether each supplier would host their own page. Precise wording was not discussed.

The final mechanism discussed was the inclusion of special properties within the RDF returned by Level 2+ implementations. It was felt that returning RDF which states that the URI represents an object of a certain type is in itself an indication of intended stability, but that specific properties to indicate stability should fall within the on-going discussions of Darwin Core RDF vocabularies.
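
A compact sketch of this kind of check (Python with the rdflib library; the URI is one of the Level 2 examples above):

    from rdflib import Graph, RDF, URIRef

    URI = "http://data.rbge.org.uk/herb/E00421509"

    g = Graph()
    g.parse(URI)  # rdflib content-negotiates for RDF where the server supports it

    # Any rdf:type statement about the URI shows it denotes a described object.
    for rdf_type in g.objects(URIRef(URI), RDF.type):
        print(rdf_type)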

From Human Readable Specimen Citation to HTTP URI: This is an issue both for marking up legacy literature and for current research publishing. OpenURL-like query mechanisms were considered and then rejected, as they would muddy the waters around stability. This is not to say there should not be search interfaces and APIs, but they should be kept separate from stable HTTP URIs for objects.

From URI to Human Readable Specimen Citation: It should be simple to create a tool on top of any Level 2 implementation to support such behaviour. All that is needed is agreement on the properties to be returned in the RDF.
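
A minimal sketch of such a tool (Python with rdflib), assuming purely for illustration that the agreed properties turn out to be the Darwin Core terms institutionCode and catalogNumber; the actual property set was explicitly left open:

    from rdflib import Graph, Namespace, URIRef

    # Illustrative property choice only; the exact terms remain to be agreed.
    DWC = Namespace("http://rs.tdwg.org/dwc/terms/")

    def citation_for(uri):
        """Build a human readable citation string from the RDF behind a stable URI."""
        g = Graph()
        g.parse(uri)
        subject = URIRef(uri)
        institution = g.value(subject, DWC.institutionCode) or "?"
        catalog_number = g.value(subject, DWC.catalogNumber) or "?"
        return "%s %s <%s>" % (institution, catalog_number, uri)

    print(citation_for("http://data.rbge.org.uk/herb/E00421509"))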

What should we do for specimens that haven't been databased yet? Material actively used by researchers often hasn't been databased by curators and therefore has no stable HTTP URI, and legacy material also contains specimens that haven't been databased yet. Notions of placeholder URIs were considered; these discussions overlapped with those on OpenURL-like interfaces. The conclusion was that any such mechanism would confuse the notion of stable URIs and that it was better to look at how priorities are set for the databasing of specimens. If people want to cite specimens, those specimens should be given priority in the databasing workflow of collections; database-on-demand should be considered by curators.

SiteMaps of specimen URIs: An outcome of the Edinburgh workshop was a decision to investigate the use of SiteMaps for sharing complete lists of HTTP URIs published by collections so that they might be made available to harvesters and indexers to build on.

Berlin and Edinburgh both implemented SiteMap files based on the Semantic Web Extension (http://sw.deri.org/2007/07/sitemapextension/) to the SiteMap standard (http://www.sitemaps.org/protocol.html) and showed these at the meeting.
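
For orientation, a semantic SiteMap entry looks roughly like the sketch below (element names as in the DERI extension; the label and prefix values are illustrative, based on the RBGE example above):

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
      <sc:dataset>
        <sc:datasetLabel>RBGE herbarium specimens</sc:datasetLabel>
        <sc:linkedDataPrefix>http://data.rbge.org.uk/herb/</sc:linkedDataPrefix>
        <sc:sampleURI>http://data.rbge.org.uk/herb/E00421509</sc:sampleURI>
      </sc:dataset>
    </urlset>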

The similarity between these RDF files and Darwin Core Archive files was debated: could we use Darwin Core Archive files as our SiteMaps? The RDF properties to include in the semantic SiteMap files were also discussed.

The use cases that might exploit these files were discussed. They included harvesting, although harvesting based on such files could be neither incremental (keeping a cache up to date, for example) nor selective (e.g. indexing to produce geographically or taxonomically focussed subsets).

No clear decision was reached as to whether SiteMaps should be produced, and discussion moved on to the large aggregator projects, which are already harvesting our data and supporting these use cases.

Clear statements were made about the lack of take-up of HTTP URIs (or any persistent identifiers) by big projects; GBIF, EOL and JSTOR are examples.

Where should HTTP URIs be placed in Darwin Core Archive files? There was discussion as to where in Darwin Core Archive files the URIs should be placed; this question is distinct from the on-going Darwin Core RDF discussions.
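
One placement that suggests itself, shown here purely as an illustration (the meeting reached no conclusion), is to carry the stable HTTP URI in the dwc:occurrenceID column declared in the archive's meta.xml:

    <archive xmlns="http://rs.tdwg.org/dwc/text/">
      <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" encoding="UTF-8"
            fieldsTerminatedBy="\t" linesTerminatedBy="\n" ignoreHeaderLines="1">
        <files><location>occurrence.txt</location></files>
        <id index="0"/>
        <!-- Illustrative: the stable HTTP URI used as the occurrence identifier -->
        <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
      </core>
    </archive>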

Things participants agreed on

1.  The general approach proposed at Edinburgh continued to be endorsed.
2.  Human readable pages returned for stable HTTP URIs should include a link to a statement of stability.
3.  There is an urgent need for the larger data aggregators to take HTTP URIs (indeed, any form of supplier-issued identifier) more seriously. This matters for attribution as well as for the implementation of further services, and it should be raised at a high level.
4.  Clear areas for further clarification were agreed.

Things that need to be further explored/discussed

1.  HTML Meta Tags: Potential tags to be included in HTML pages need to be agreed based on some standard – possibly Darwin Core.
2.  Statement of Stability: The wording of a stability statement needs to be drafted, and a decision is needed on whether this statement will be centrally hosted or hosted on collections' own websites. The latter seems preferable, as it avoids any element of centralisation.
3.  Placement of HTTP URIs in Darwin Core Archive files – This may be clarified during TDWG 2013.
4.  How do we recommend to curators that specimens be databased on request, so that as many cited specimens as possible have HTTP URIs?
5.  How do we recommend to the major data aggregators that they make use of and display our stable HTTP URIs?
6.  How do we gather use-cases from data aggregators and publishers to inform decisions about implementation of services?
7.  Implement a proof-of-concept service to go from a stable URI to a specimen citation string, using known fields in the returned RDF.

Minutes group 2

Chair

  • Anton Güntsch - FUB-BGBM

Participants

  • Cherian Mathew - FUB-BGBM
  • Niall Beard - University of Manchester
  • Yde de Jong - University of Finland & Royal Belgian Institute of Natural Sciences
  • Wouter Addink - Naturalis
  • Michael Diepenbroek - Mare
  • Boris Jacob - Royal Museum for Central Africa
  • Markus Weiss - SNSB IT-Center Munich
  • Hanna Koivula - Finnish Museum of Natural History
  • David Fichtmüller - FUB-BGBM
  • Craig Hilton-Taylor - International Union for Conservation of Nature

Agenda

see [2]

Results

The scope of the meeting was to assess the BiodiversityCatalogue ([3]) and its potential role as a global registry for biodiversity-related web services, and to inform participants about i) the use of service registries in the context of data experiments and workflow developments and ii) the present functionality of the BiodiversityCatalogue. The meeting started with two introductory presentations:

  • Introduction to the Biodiversity Catalogue (Niall Beard) (see [4])
  • BiodiversityCatalogue and workflow development (Cherian Mathew) (see [5] for a similar presentation given by Cherian Mathew at TDWG 2013).

The participants then had the opportunity to test a clone of the BiodiversityCatalogue themselves by browsing and searching the existing registrations and registering new services. Observations made during this unstructured testing phase were collected on a flip chart (see [6], [7], [8]).

In a discussion, notes from the unstructured testing were then clustered into three topics to be analysed by break-out groups in a more structured way:

Topic 1: Improving the User Interface

Participants
  • Cherian Mathew - FUB-BGBM
  • Niall Beard - University of Manchester
  • Craig Hilton-Taylor - International Union for Conservation of Nature
Documentation

see [9]

Recommendations
1.1. Change name to "Biodiversity Service Catalogue"

On Home page:

1.2. move Latest Activity, Site Announcements, Latest Services and Top Contributors into a single tabbed panel
1.3. remove Monitor Test Changes and Helpful Links from the catalogue completely.

On Services page:

1.4. move tag cloud to bottom of page
1.5. set 'Simple View' as the default view, with star rating and a one-line description
1.6. the 'current filters applied' display can be cleaner
1.7. move 'current filters applied' to top of page
1.8. fewer examples in filter categories

When registering services:

1.9. Ping URLs immediately and show the result
1.10. Display tip/hint text on the right when entering URLs/text or choosing drop-down items
1.11. Display categories as trees
1.12. Add a mandatory short description
1.13. Add wizard page popups for "License + Cost + Usage Condition", "Publications + Citations" and "Contact"

Displaying the services:

1.14. Service registration status/completion widget (like a progress bar)
1.15. Populate "Add new end point": drop-down for HTTP request type (GET, POST, ...); the base URL is already known; field for the end point relative to the base; automated ping once the cursor leaves the field
1.16. On the Endpoint page, add the possibility to set parameters as optional
1.17. if example data is provided, call the service end point and fill in the result as example output
1.18. On the "monitoring" page: add the possibility for users to ping the service
1.19. add a field for login/key request if required
1.20. feedback mechanism for publishing/subscribing to notifications (e.g. an API change broadcast to users who have subscribed)
1.21. make "Annotations" a link to the list of annotations
1.22. On the "provider" page: provide more possibilities for branding/info

Topic 2: Quality and Policies

Participants
  • David Fichtmüller - FUB-BGBM
  • Michael Diepenbroek - Mare
  • Boris Jacob - Royal Museum for Central Africa
  • Wouter Addink - Naturalis
Documentation

see [10]

Recommendations

Fields for Quality Control in Service Metadata

2.1. checkbox to indicate whether the person entering the data is also the person responsible for the service
2.2. signal the protection level: public, registration, restricted (no automated availability checks for closed services)
2.3. field to indicate: open access or paid access

Continuous Quality Checks:

2.4. check regularly that links still work (not only end points but also links to documentation, ToS, organisation, ...); see the sketch after this list
2.5. Reminders to users asking them to check if the data about their services is still up to date (about once a year)
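
A minimal sketch of such a liveness check (Python with the requests library; the HEAD-then-GET fallback is a common pattern, not something the group specified):

    import requests

    def is_alive(url, timeout=10.0):
        """Return True if the URL answers with a non-error status."""
        try:
            response = requests.head(url, timeout=timeout, allow_redirects=True)
            if response.status_code == 405:  # some servers do not implement HEAD
                response = requests.get(url, timeout=timeout, stream=True)
            return response.status_code < 400
        except requests.RequestException:
            return False

    # Check not only end points but also documentation and ToS links.
    for link in ["http://coldb.mnhn.fr/catalognumber/mnhn/rs/rs8625"]:
        print(link, is_alive(link))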

Compiling user feedback

2.6. Allow user feedback for services (not to be confused with the current option of adding missing information)
2.7. Gather feedback in a structured way, allowing it to be used in a metric for quality control
2.8. Handle feedback and problems via a ticket system

Dissemination

2.9. Suggestion box for new entries on existing services

Long Term Organizational Recommendations

2.10. Develop a Metric to describe the quality of a service
2.11. Develop a content standard for service description (to exchange data with other service registries)

Topic 3: Additional fields for Service Descriptions

Participants
  • Anton Güntsch - FUB-BGBM
  • Markus Weiss - SNSB IT-Center Munich
  • Hanna Koivula - Finnish Museum of Natural History
  • Yde de Jong - University of Finland & Royal Belgian Institute of Natural Sciences
Documentation

see [11]

Recommendations

Technical Parameters

3.1. more structure for describing examples
3.2. field for response examples
3.3. controlled vocabulary for exchange formats
3.4. Uploading exchange format documentation

Content description

3.5. Metadata field for service-related context/background
3.6. Map the tags to an ontology?

Access & Licensing

3.7. Provide controlled vocabulary for licenses?
3.8. Option to express the guaranteed lifetime of a service

User Annotations

3.9. General comment/review field for service users (maybe with [star-based] rating)
3.10. Community Platform features

Other Recommendations

3.11. look at / compare with UDDI
3.12. look at services that are not REST/SOAP (e.g. files)
3.13. look at lessons learned from BioCatalogue on content description
3.14. include service usage statistics in 
3.15. Logos/Icons for services and service providers
3.16. Best practices for service descriptions
3.17. Categories should include project/network names for branding purposes

Summary and major agreements

  • The BiodiversityCatalogue already has the functions necessary for a global registry of biodiversity-related services. Service providers should not wait for more functions but should start registering their services now.
  • Metadata standards and exchange protocols need to be implemented for synchronisation with similar/related existing catalogues and registries.
  • With a growing number of services being registered, the BiodiversityCatalogue will need curation and quality control.