Data enrichment hackathon, March 17-21 2014/RDFKB
RDF knowledge base of plant phenotypes
Phenotypes in plants are recorded for individual specimens, taxa, particular genotypes, etc. in several places (e.g., plant model organism databases). In some cases, the particular genotype and environmental conditions are recorded as well. An integrated knowledge base that contains (1) a description of the sample, (2) the phenotypes observed, and (3) the conditions under which they were observed would allow comparative analyses across multiple plant species and may lead to the identification of traits relevant for food security (e.g., drought resistance). Additionally, mapping to geospatial coordinates (available for many specimens held in museums) would allow comparison of phenotypic characteristics under particular environmental conditions across multiple samples, species, etc. (see LinkedGeoData for an example).
The goal is to (1) devise a simple data model (an ontology) that characterizes the relations among genotype, phenotype, and the environment in which plant samples, mutant genotypes, etc. are observed, and then (2) generate an RDF knowledge base that integrates (or can integrate) this information from multiple sources (databases such as TAIR, sample collections in museums, or text mining).
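As a rough illustration of what such a data model could look like, the following Python sketch serializes one observation (sample, phenotype, optional genotype, environment, and coordinates) as N-Triples. Everything under the `http://example.org/plantkb/` namespace, including the property names `taxon`, `hasPhenotype`, `hasGenotype`, and `observedUnder`, is a placeholder invented for this example and not an agreed ontology. In a real knowledge base the phenotype and environment values would more likely be term IRIs from PATO, the Plant Trait Ontology, or similar, rather than plain strings. Only the W3C Basic Geo properties `geo:lat` and `geo:long` are an existing vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote

# Hypothetical namespace for this sketch; not an existing ontology.
EX = "http://example.org/plantkb/"
# W3C Basic Geo (WGS84) vocabulary, an existing vocabulary for lat/long.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


@dataclass
class Observation:
    """One phenotype observation on a plant sample."""
    sample_id: str
    taxon: str
    phenotype: str
    genotype: Optional[str] = None
    environment: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None


def _literal(value: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslash and quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(obs: Observation) -> List[str]:
    """Serialize an observation as a list of N-Triples lines."""
    subject = f"<{EX}sample/{quote(obs.sample_id, safe='')}>"
    triples = [
        f"{subject} <{EX}taxon> {_literal(obs.taxon)} .",
        f"{subject} <{EX}hasPhenotype> {_literal(obs.phenotype)} .",
    ]
    if obs.genotype is not None:
        triples.append(f"{subject} <{EX}hasGenotype> {_literal(obs.genotype)} .")
    if obs.environment is not None:
        triples.append(f"{subject} <{EX}observedUnder> {_literal(obs.environment)} .")
    if obs.lat is not None and obs.lon is not None:
        triples.append(f'{subject} <{GEO}lat> "{obs.lat}"^^<{XSD_DOUBLE}> .')
        triples.append(f'{subject} <{GEO}long> "{obs.lon}"^^<{XSD_DOUBLE}> .')
    return triples


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    example = Observation(
        sample_id="S 1",
        taxon="Arabidopsis thaliana",
        phenotype="late flowering",
        environment="drought",
        lat=48.0,
        lon=8.2,
    )
    print("\n".join(to_ntriples(example)))
```

Keeping the sample as the subject of every triple means that records from different sources (TAIR, museum collections, text mining) can be loaded into the same store and joined on shared taxon or location values.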
Resources to consider:
- Darwin Core
- Plant Ontology, Plant Trait Ontology and PATO
- Phenotypes: Gramene, TAIR
- GeoNames: 3 million names + geolocation; useful for text mining
Relation to other projects:
- This project aims to extract entities (taxon names, museum identifiers, geolocations, genebank identifiers, etc.); this information can be made available as Linked GeoData in an RDF KB; this would allow integration (and federated queries) with UniProt RDF or EBI's RDF data sources.
- Taxonomic and geographic information from the World Spider Catalogue can be represented as linked geo data in the same store, and geographic distribution visualized using OSM or similar.
- Old literature (1930s and earlier) will not have geolocations for specimens, but usually has some reference to a location ("Madagascar", "Bavaria", "Black Forest"), many of which have a geolocation in DBpedia. These references can be mapped to geolocations through SPARQL queries.
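The place-name lookup in the last item could be sketched as follows. This is a minimal example, assuming that the public DBpedia endpoint at `https://dbpedia.org/sparql` is reachable and that the place has an exact `rdfs:label` match together with `geo:lat` and `geo:long` values. The function names `build_geo_query` and `lookup` are made up for this sketch. Exact label matching will miss spelling variants and historical names, and an ambiguous name may return several candidates, so results would need manual review.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Public DBpedia SPARQL endpoint; assumed to be reachable.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_geo_query(place_name: str, lang: str = "en", limit: int = 5) -> str:
    """Build a SPARQL query for resources whose label equals place_name.

    lang is inserted as-is and is assumed to be a plain language tag such as "en".
    """
    label = place_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?place ?lat ?long WHERE {{
  ?place rdfs:label "{label}"@{lang} ;
         geo:lat ?lat ;
         geo:long ?long .
}} LIMIT {limit}
""".strip()


def lookup(place_name: str) -> List[Tuple[str, float, float]]:
    """Send the query to the endpoint and return (IRI, lat, long) tuples."""
    params = urllib.parse.urlencode({"query": build_geo_query(place_name)})
    request = urllib.request.Request(
        f"{DBPEDIA_ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        (b["place"]["value"], float(b["lat"]["value"]), float(b["long"]["value"]))
        for b in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Print the query only; call lookup("Black Forest") to hit the network.
    print(build_geo_query("Black Forest"))
```

Separating query construction from the network call makes it possible to review or log the generated SPARQL before running it against the endpoint.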