Three levels of reproducibility: Docker, Galaxy, Linked Data

[Originally posted at LinkedIn]

I have just stumbled upon this thread on why one should use Galaxy (https://www.biostars.org/p/50034/). One of the reasons posted is reproducibility, but Galaxy only solves one level of reproducibility: “functional reproducibility” (what I did with the data). There are at least two other levels, one “below” Galaxy and another “above” it:

  • Below: computational environment: operating system, library dependencies, binaries.
  • Above: semantics. What the data means.
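
To make the “below” level a bit more concrete, here is a minimal sketch of the kind of information that lives there. It is only an illustration and not how Docker works: it records the operating system, the interpreter version and the versions of a few libraries using the Python standard library. The function name `describe_environment` and the package list are hypothetical choices for this example.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the OS, interpreter and selected library versions.

    `packages` is a list of distribution names to look up; the names used
    in the example call below are placeholders, not a recommendation.
    """
    libs = {}
    for name in packages:
        try:
            libs[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            libs[name] = None  # not installed in this environment
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "libraries": libs,
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be stored next to the results.
    print(json.dumps(describe_environment(["pip"]), indent=2))
```

A record like this only describes the environment. A Docker image goes further and ships the environment itself, which is why it sits at the computational level.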

To be completely reproducible, one has to be reproducible at all three levels:

  1. Computational: Docker.
  2. Functional: Galaxy.
  3. Semantics: URIs, RDF, SPARQL, OWL.

And how to do it is described in our GigaScience paper, “Enhanced reproducibility of SADI Web Service Workflows with Galaxy and Docker” 🙂 (http://www.gigasciencejournal.com/content/4/1/59)

Just to emphasize and clarify, the three levels, from top to bottom, are:

  3. Semantics: what the data means.
  2. Functional: what I did with the data.
  1. Computational: how I did it.
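
As a small illustration of the semantic level, the sketch below writes two statements about a data item as RDF triples in N-Triples syntax, using plain Python. Everything under `http://example.org/` (the `sample42` identifier and the `ProteinSequence` class) is a made-up placeholder, and the helper `ntriple` is a hypothetical name for this example. A real workflow would use URIs from published ontologies and most likely an RDF library.

```python
def ntriple(subject, predicate, obj, literal=False):
    """Format one RDF statement as a single N-Triples line.

    Subject and predicate are URIs. The object is a URI unless
    `literal` is True, in which case it is written as a plain string
    literal with basic escaping.
    """
    if literal:
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    # "sample42 is a ProteinSequence": the URI says what kind of thing it is.
    ntriple(EX + "sample42", RDF_TYPE, EX + "ProteinSequence"),
    # A human-readable label for the same resource.
    ntriple(EX + "sample42", RDFS_LABEL, "sample 42", literal=True),
]

if __name__ == "__main__":
    print("\n".join(triples))
```

The point is that the meaning travels with the data: anyone who resolves the URIs can find out what `sample42` is, independently of the tool or machine that produced it.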