Recommendations for actions to facilitate improved cooperation
The Workshop “Routes towards cooperation” in Berlin, 2013-05-23, aimed at increasing our reciprocal understanding and fostering progress towards multi-institutional action that could improve cooperation. The following recommendations, aimed at bottom-up and mid-level management decisions, are one result of the workshop.
Create Infrastructures that foster collaboration and avoid duplication of efforts
- Establish a multi-institutional group focused on coordinating collaborative software development. The goal is to improve the efficiency of resource use by means of common Open Source based development projects using Open Source methodology.
- New technologies are opportunities for collaborative development right from the start (avoiding the problems with existing and incompatible legacy developments)
- New development methods and frameworks exist that support collaborative development better than previous solutions.
- Establish agreements on specialization in services (example: one institution specializes in geographical analysis, another in visualization tools), providing services to other institutions or projects
- Depending on the trust among institutions, more or less mission-critical services can be shared. With insufficient trust in other institutions' commitment to the longevity of services, only non-mission-critical services can be shared
- Ask institutions which activities they want to strengthen and share as services with others.
- Also ask which activities they would rather abandon and use as services from others (in place of a missing solution, but also replacing an unsatisfying existing solution).
- New crowdsourcing services are a special opportunity to share services (e.g. to clean up data or bibliographic references, or to mark up or annotate scientific names, treatments, material citations etc. in legacy literature). The service would provide a well-publicized platform where various projects can be launched and connected with a community of volunteers. (See also J. Chamberlain, piB Feb meeting)
- Agreement to register all biodiversity web services that are provided to other Biodiversity institutions in the Biodiversity Catalogue (Hosted @ Univ. Manchester) and all workflows that are provided to other Biodiversity institutions in myExperiment (Hosted @ Univ. Manchester)
- Agreement to communicate the expected and planned stability of services by means of a standard vocabulary (e.g.: undecided, experimental, long-term service without fixed API, long-term service with stable and versioned API)
- Build, discuss and share vocabularies on a simple level, especially before fully developed ontologies exist.
- Agreement to collaborate on the development of shared term definitions (glossary-style) with the understanding that new terms can be freely added, but an effort will be made to re-use or improve existing term definitions. Do not wait for ontologies. The increasing atomization of data might lead to an avalanche of vocabulary terms. Do not wait until a theoretical framework has been developed; start with a few terms and then expand.
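The stability vocabulary suggested above could be shared as a very small controlled list long before any ontology exists. The following is a minimal sketch, assuming the four example terms from the recommendation are the whole vocabulary; the function `describe_service` and the service name are hypothetical and serve only as illustration.

```python
from enum import Enum


class Stability(Enum):
    """Stability levels, copied from the example terms in the recommendation."""
    UNDECIDED = "undecided"
    EXPERIMENTAL = "experimental"
    LONG_TERM_NO_FIXED_API = "long-term service without fixed API"
    LONG_TERM_VERSIONED_API = "long-term service with stable and versioned API"


def describe_service(name: str, stability: str) -> dict:
    """Build a registry entry, rejecting stability terms outside the vocabulary."""
    level = Stability(stability)  # raises ValueError for an unknown term
    return {"service": name, "stability": level.value}


# Hypothetical service name, for illustration only.
entry = describe_service("example-name-resolver", "experimental")
print(entry)
```

Rejecting unknown terms keeps the declared stability comparable across institutions, while new terms can still be added to the list by agreement.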
Establish technical means to improve sharing and interconnection
- In the pre-digital world, objects acquire multiple identifiers over time. While it is desirable to have as few identifiers as possible in the digital world, waiting for an ultimate identifier that fulfills all requirements delays any progress. We recommend accepting multiple digital identifiers and integrating identity relations between these identifiers (“a123 same as b123”) into existing discovery systems such as GBIF.
- Institutions need to establish long-term management procedures that define stable web addresses (URIs) for a core set of their digital or non-digital assets. The primary bottleneck is not a lack of technology, but a lack of management decisions about which URIs to maintain as stable and which URIs may break when technologies or needs change. The necessary technology (web server rewrite rules and content negotiation) is generally available.
- All Natural History Collections should assign stable identifiers to their specimens by the end of 2013, at a minimum to those specimens that can be accessed online (images or data).
- To be compatible with the Semantic Web and Linked Open Data, the recommended type of stable identifier is a web address (“http://…”-URI). Two possible implementations are stable http-URIs based on the institutional domain name (preferred in the semantic web) and DOI technology in the form of http://dx.doi.org/..doi (preferred in the publishing industry).
- Current activities to implement Linked Open Data systems: Royal Botanic Garden Edinburgh, Univ. Manchester, Kew, Museum für Naturkunde, BGBM Berlin, iMarine, Plazi, Zoobank, AntWeb (please add missing institutions!). Under discussion in CETAF ISTC.
- Similarly, URI identifiers should be established for names or treatments.
- Work towards providing RDF data for specimens and taxa. However, stable http-identifiers for specimens are a hugely valuable first step, even if not yet accompanied by RDF data, which may follow much later.
- Share experience with the Semantic Web and with the scalability of triple stores for RDF
- Add standardized machine-readable open access licenses (e.g. CC0 for data that may contain minor copyright portions, CC BY or CC BY-SA for text)
- Agreement to communicate the data policies according to the Linked Open Data five star scoring.
- Work towards institutional policy agreements on Open Access
- All harvested and centrally accessible data should have a clear mechanism for
- attributing the responsible scientist (collector, person performing identification, etc.)
- pointing the user back to the source provider (institution or published data set).
- Consider the W3C Provenance (PROV) model to record who created and changed data and where data are located
- providing usage statistics back to sources.
- Systematically providing links to any data used in a publication supports hypothesis testing and verification of results.
- Norman Morrison: work towards specification and adoption of a common mechanism for data publication and citation to increase the rewards of publishing open access data and build trust
- Work towards or support the development and use of a system for citing data usage. Create benchmarks of data usage to be able to interpret usage (like a citation index)
- Define or decide on mechanisms of data citation.
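The identity relations recommended earlier in this section (“a123 same as b123”) can be handled with very simple means. The following is a minimal sketch, assuming that “same as” is treated as symmetric and transitive; the class `IdentifierIndex` and the identifier "c123" are hypothetical, and nothing here reflects how GBIF or any other discovery system actually stores such relations.

```python
class IdentifierIndex:
    """Groups identifiers that have been declared to denote the same object."""

    def __init__(self):
        # Maps each identifier to a parent; a root is its own parent.
        self._parent = {}

    def _find(self, ident):
        # Register unseen identifiers as a group of their own.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def same_as(self, a, b):
        """Record the statement 'a same as b' by merging the two groups."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident):
        """Return all identifiers known to denote the same object as ident."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


index = IdentifierIndex()
index.same_as("a123", "b123")   # relation quoted in the recommendation
index.same_as("b123", "c123")   # "c123" is a hypothetical third identifier
print(index.equivalents("c123"))
```

With such an index, a lookup by any one identifier returns the whole group, so no single identifier has to be declared the ultimate one.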
Notes from working groups
The following are part of the notes of the working groups, as presented on flipcharts and sticky notes.
- Barcode Pipelines
- Voucher Specimens (better access)
- enhance quality
- expand quantity (accelerate digitization)
- 2-D objects are low hanging fruits
- Entomology specimens are difficult in comparison to herbarium sheets
- simplify permissions
- reward & citations
- networks are 90% sociology
- Data: consistent
- Complete accurate
- Make use of mark-up in review stage of publications
- ontology development
- IDs and feedback
- Long term resources
- index of expertise
- networks productive
- commitment beyond MOU signing
- Career development
- Unique resolvable persistent identifiers
- Granular knowledge curation
- include vouchers from GenBank, etc.
- Digital infrastructure for ecology
- statistical data
- biodiversity data
- geospatial data
- Virtual research environment
- User centric
- mashup data across diversity
- Data reuse, repurpose, reanalyze
- data infrastructure
- common services
- self organization
- cooperation on merit
- MOU: do you need one?
- Links with other communities - ecoinformatics
- Observation aggregation
- Essential Biodiversity Variables
- Full open sharing
- Need content
- use cases
- Locating cited specimen
- Data visualization
- fund raising
- A lag time to digitization
- Getting community support
- working groups
- A safe place for images
Culture of Bioinformatics vs. real informatics
- Where are biodiversity data?
- Connected by names
- Global Names Architecture
- Human edited data aggregation
- Taxonomic backbone
- piping tool
To think about: What should be the emphasis? Biota? Isn’t that too narrow? Use Floras as exemplar.