NeXML Services

From pro-iBiosphere

Task Group NeXML/BioVeL services

Participants

Aims

  • To develop command-line tools that merge data in a number of commonly-used phylogenetic file formats and export them as NeXML.
  • To develop command-line tools that extract objects from NeXML data: Taxa, Trees, Character matrices, all with metadata embedded.
  • To wrap these tools inside Taverna-compatible RESTful services.
  • To publish these services on BiodiversityCatalogue.
  • To annotate these services according to BioVeL guidelines.

Activities

  • Development
  • Web service testing and publishing
  • Documentation

Outcomes

Web services with two main functions:

  • Create a NeXML file from three objects: a) a multiple sequence alignment, b) a phylogenetic tree, and c) a taxa list, each with its associated metadata (the Merger service).
  • Read a NeXML file and extract any of the objects listed above, together with their corresponding metadata (the Extractor service).

Links

Implementation details

The Merger service

NeXMLMerger.svg

Inputs

  • Phylogenetic trees, in at least the following formats: Newick, NEXUS, PhyloXML, NeXML. There are two parameters for specifying trees, the location (trees={URL}), and the syntax format (treeformat={Newick|NEXUS|PhyloXML|NeXML}).
  • Alignments, in at least the following formats: PHYLIP, NEXUS, NeXML, FASTA. There are three parameters for each alignment file, the location (data={URL}), the syntax format (dataformat={PHYLIP|NEXUS|NeXML|FASTA}), and, optionally, the data type (datatype={dna|protein|standard}, default is dna).
  • Character sets, in text format: charsets={URL}, charsetformat={nexus|txt}.
  • Metadata, in JSON or TSV syntax: meta={URL}, metaformat={JSON|TSV}. The first column of the metadata identifies which object is annotated. The following object identifiers are distinguished: TaxonID, AlignmentID, TreeID, NodeID, SiteID, CharacterID.
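As a rough illustration of how TSV metadata keyed on its first column could be read, here is a minimal Python sketch. The column layout (a header row, then an object ID, a property, and a value per row), the sample property names, and the function name group_annotations are all assumptions made for the example; the service's actual TSV layout is not specified here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical TSV layout (an assumption, not taken from the service):
# a header row, then one annotation per row. Column 1 holds the ID of the
# annotated object; columns 2 and 3 hold a property name and its value.
SAMPLE_TSV = (
    "TaxonID\tproperty\tvalue\n"
    "t1\tdc:title\tHomo sapiens\n"
    "t1\tdc:source\tGenBank\n"
    "t2\tdc:title\tPan troglodytes\n"
)


def group_annotations(tsv_text):
    """Group annotation rows by the object ID found in the first column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)      # the first header cell names the kind of object
    object_kind = header[0]
    grouped = defaultdict(dict)
    for row in reader:
        if not row:
            continue           # skip blank lines
        object_id, prop, value = row[0], row[1], row[2]
        grouped[object_id][prop] = value
    return object_kind, dict(grouped)


kind, annotations = group_annotations(SAMPLE_TSV)
print(kind, annotations)
```

Grouping by the first column keeps every annotation for one taxon, tree, or site together, which is convenient when the annotations are later attached to the matching object.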

Output

  • A NeXML document.

URL API

  • The service responds to HTTP GET requests, so all parameters are combined in the QUERY_STRING, with all reserved or otherwise "dangerous" characters URL-escaped (percent-encoded).
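A request to the Merger service could be assembled along the following lines. This is only a sketch: the endpoint address and the example.org file locations are placeholders, since the real service URL is not given here. The parameter names are the ones listed under Inputs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a deployed Merger service.
MERGER_ENDPOINT = "http://example.org/nexml/merger"


def build_merger_url(params, endpoint=MERGER_ENDPOINT):
    """Return a GET URL in which every parameter value is URL-escaped."""
    return endpoint + "?" + urlencode(params)


url = build_merger_url({
    "trees": "http://example.org/data/tree.nwk",        # placeholder location
    "treeformat": "Newick",
    "data": "http://example.org/data/alignment.fasta",  # placeholder location
    "dataformat": "FASTA",
    "datatype": "dna",
    "meta": "http://example.org/data/meta.tsv",         # placeholder location
    "metaformat": "TSV",
})
print(url)
```

Because the parameter values are themselves URLs, characters such as ":" and "/" have to be escaped so that they are not mistaken for parts of the outer request URL; urlencode takes care of this.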

The Extractor service

Inputs

  • NeXML file, whose location is specified as a URL, e.g. nexml={URL}
  • A parameter that specifies which objects to extract, e.g. objects={Taxa|Trees|Matrices}
  • A parameter that specifies the output formats, treeformat={NEXUS|Newick|PhyloXML|NeXML}, dataformat={NEXUS|PHYLIP|FASTA|Stockholm}, metaformat={tsv|JSON|csv}, charsetformat={txt}

Output

  • A subset of the NeXML data in the requested format, with a separate download of the metadata, likewise in the requested format.
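An Extractor request can be built in the same way. The sketch below also checks the chosen values against the options listed above before building the URL. The endpoint is a placeholder, the function name build_extractor_url is made up for the example, and the client-side check (including its case sensitivity) is an assumption rather than documented service behaviour.

```python
from urllib.parse import urlencode

# Allowed values, copied from the Extractor inputs listed above.
ALLOWED = {
    "objects": {"Taxa", "Trees", "Matrices"},
    "treeformat": {"NEXUS", "Newick", "PhyloXML", "NeXML"},
    "dataformat": {"NEXUS", "PHYLIP", "FASTA", "Stockholm"},
    "metaformat": {"tsv", "JSON", "csv"},
    "charsetformat": {"txt"},
}

# Hypothetical endpoint; substitute the address of a deployed Extractor service.
EXTRACTOR_ENDPOINT = "http://example.org/nexml/extractor"


def build_extractor_url(nexml_url, endpoint=EXTRACTOR_ENDPOINT, **options):
    """Validate the options, then return a GET URL with escaped parameters."""
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown parameter: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"unsupported value for {name}: {value}")
    query = {"nexml": nexml_url, **options}
    return endpoint + "?" + urlencode(query)


extract_url = build_extractor_url(
    "http://example.org/data/study.xml",  # placeholder NeXML location
    objects="Trees",
    treeformat="Newick",
    metaformat="tsv",
)
print(extract_url)
```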

Service deployment

We deploy the services as mod_perl handlers, which means that for synchronous services (i.e. everything is done in one request/response cycle) no forking is done at all. For asynchronous services, the service class doesn't have to keep track of its session: the superclass takes care of serializing and de-serializing the job object between requests.
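The division of labour for asynchronous services can be illustrated in outline. The sketch below is written in Python rather than Perl and is not the project's code: the class names, the JSON serialization, and the dict used as a job store are all assumptions chosen to show the idea that the superclass saves and restores the job, while the subclass only implements the work.

```python
import json


class AsyncService:
    """Hypothetical base class that persists job state between requests."""

    def __init__(self, store):
        # Any dict-like mapping of job ID -> serialized job (assumption).
        self.store = store

    def handle(self, job_id, request):
        # De-serialize the job from the previous request, or start a new one.
        raw = self.store.get(job_id)
        job = json.loads(raw) if raw is not None else {"id": job_id, "status": "new"}
        job = self.update(job, request)
        # Serialize the job again so the next request can pick it up.
        self.store[job_id] = json.dumps(job)
        return job

    def update(self, job, request):
        raise NotImplementedError


class CountingService(AsyncService):
    """Toy subclass: reports 'done' on the second poll, no session handling."""

    def update(self, job, request):
        job["polls"] = job.get("polls", 0) + 1
        job["status"] = "done" if job["polls"] >= 2 else "running"
        return job


store = {}
service = CountingService(store)
first = service.handle("job-1", request={})
second = service.handle("job-1", request={})
print(first["status"], second["status"])
```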

Virtualization

The web services can be installed on a virtual machine with Vagrant, which takes care of all dependencies required on the server side (web server, Perl libraries, etc.). This makes it straightforward to run the services in cloud computing environments such as OpenStack.

iTOL Upload Service/Workflow

We wanted a service to upload files to iTOL.

Issues

  • iTOL expected a POST in multipart format.
  • http://itol.embl.de/batch_uploader.cgi expected treeFile to be submitted as a file.
  • Taverna did not have a good way of doing POSTs in multipart format.
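For reference, the kind of multipart request described in these issues can be built by hand with the Python standard library. This is a sketch only: it constructs the request without sending it, it covers just the treeFile field named above, and any further fields or file packaging the uploader may require are not shown. The function name and the sample Newick tree are made up for the example.

```python
import uuid
import urllib.request

ITOL_UPLOAD_URL = "http://itol.embl.de/batch_uploader.cgi"  # from the text above


def build_multipart_request(url, field_name, filename, file_bytes):
    """Build (but do not send) a multipart/form-data POST with one file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_bytes + tail
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    request.add_header("Content-Length", str(len(body)))
    return request


# Illustrative tree; the file name and contents are placeholders.
request = build_multipart_request(
    ITOL_UPLOAD_URL, "treeFile", "tree.nwk", b"((A,B),C);"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would actually submit it; that step is left out here so the sketch can be run without contacting the server.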

Implementations

  • Create a Taverna workflow that is able to perform the POST: File:UploadToItolInner.txt (rename to *.t2flow to run)
  • Create a Taverna workflow that is able to download an image: File:ItolDownLoader.txt (rename to *.t2flow to run)
  • Currently, for reasons I do not understand, the download only works with trees that were loaded via a web page.