Useful docs on URIs, httpRange-14, conneg, etc.

Creating proper URIs for information and non-information resources, and designing a system architecture accordingly, is a messy business. Some pointers for future reference:
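
For context, one common reading of the httpRange-14 resolution is that a URI naming a non-information resource should not answer 200 itself, but 303-redirect to a document about the thing, with content negotiation (conneg) choosing which document. A minimal sketch in Python, assuming a hypothetical /id/, /doc/, /data/ URI layout (the paths and function names are made up for illustration, and the conneg is deliberately naive):

```python
# Hypothetical URI layout (an assumption for illustration, not from any spec):
#   /id/<name>    the thing itself (non-information resource)
#   /doc/<name>   an HTML page about the thing
#   /data/<name>  an RDF document about the thing

RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")


def wants_rdf(accept_header):
    """Naive conneg: True if the first listed media type is an RDF one.

    q-values are ignored here; a real server should parse them properly.
    """
    first = accept_header.split(",")[0].split(";")[0].strip().lower()
    return first in RDF_TYPES


def resolve(path, accept_header="text/html"):
    """Return (status, location) for a request path and Accept header."""
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        target = "/data/" if wants_rdf(accept_header) else "/doc/"
        # 303 See Other: "this is not a document, but here is one about it"
        return 303, target + name
    if path.startswith("/doc/") or path.startswith("/data/"):
        # Documents are information resources, so they can answer 200
        return 200, path
    return 404, None
```

For example, `resolve("/id/bilbao", "text/turtle")` redirects to `/data/bilbao`, while a browser-style Accept header ends up at `/doc/bilbao`.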

Mini-tutorial on Schema.org

I have added the slides of a short tutorial I am going to give on Schema.org and how to use it to add structured data to web pages. Schema.org is interesting because it is the first Semantic Web “ontology” to be massively adopted (hence the talk “Light at the end of the tunnel” by Ramanathan V. Guha, one of the creators of RDF) and it relies heavily on JSON-LD, another “bridge” language between the Semantic Web and the “ordinary” Web (a distinction that makes less sense every day, since we already use the Semantic Web in every search we run on Google). This tutorial is part of the Servicios OpenLinkedData project, one of the largest projects deploying Linked Data in public administrations.
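
To give a flavour of what adding structured data to a page looks like, here is a minimal sketch in Python that builds a Schema.org description as JSON-LD and wraps it in the script element that goes in the HTML. The helper name and the event details are made up for illustration; only the `@context`, `@type` and property names come from Schema.org.

```python
import json


def schema_org_script(data):
    """Wrap a JSON-LD dict in the <script> element used to embed it in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "<" so a string value can never close the script element early
    body = body.replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Illustrative data only (hypothetical event)
talk = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Schema.org mini-tutorial",
    "inLanguage": "es",
    "description": "How to add structured data to web pages",
}

print(schema_org_script(talk))
```

The resulting element can be pasted anywhere in the page; the visible HTML does not need to change.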

Update: the content of this work will be gradually uploaded to the Open Data Euskadi GitHub repository.

GigaScience’s Impact Factor

Even though the editors of GigaScience don’t like Impact Factors (and I agree with them), GigaScience has received a very high Impact Factor, 7.46. I’m quite happy since we published a paper in GigaScience last year, Enhanced reproducibility of SADI web service workflows with Galaxy and Docker.

Transforming CSV data to RDF with Grafter

Part of my work is to develop pipelines to transform existing Open Data (usually CSVs in some data portal, like CKAN) into RDF and, hopefully, Linked Data. If I have to do the transformation myself, interactively, I normally use Google Refine with the RDF plugin. However, what I need now is a batch pipeline that I can plug into a bigger Java platform.

Therefore, I’m looking at Grafter. Even though I have never programmed in Clojure (or any other functional language whatsoever!), Grafter’s approach seems very sensible and intuitive. Additionally, I have always wanted to use Tawny-OWL, so it will probably be easier if I learn a bit of Clojure with Grafter first. Coming from Java/Perl/Python, the functional approach felt a bit weird in the beginning, but it actually makes more sense when defining pipelines to process data.
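
The pipeline idea can be sketched in Python as a chain of small functions, each taking rows in and passing rows (or triples) on. This is not Grafter’s API, just the same shape of computation; the namespace, the column names and the sample row are assumptions made up for the example, and rows are assumed to be well formed.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def read_rows(csv_text):
    """Step 1: parse CSV text into dicts keyed by the header row."""
    return csv.DictReader(io.StringIO(csv_text))


def clean(rows):
    """Step 2: strip stray whitespace from keys and values."""
    for row in rows:
        yield {key.strip(): value.strip() for key, value in row.items()}


def literal(value):
    """Quote a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_triples(rows):
    """Step 3: turn each row into N-Triples lines (assumed columns: name, population)."""
    for row in rows:
        subject = "<" + BASE + "municipality/" + quote(row["name"]) + ">"
        yield subject + " <" + RDFS_LABEL + "> " + literal(row["name"]) + " ."
        yield (subject + " <" + BASE + "def/population> "
               + literal(row["population"]) + "^^<" + XSD_INTEGER + "> .")


def pipeline(csv_text):
    """The whole pipeline is just the composition of the steps."""
    return list(to_triples(clean(read_rows(csv_text))))


# Made-up sample data
for line in pipeline("name,population\n Town A , 1000\n"):
    print(line)
```

Because each step only depends on its input, steps can be tested, swapped or reordered independently, which is where the functional style pays off for batch jobs.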

I have gone through the Grafter guide using Leiningen on Ubuntu 14.04. So far so good (I had to install Leiningen manually though, since Ubuntu’s Leiningen package was very outdated). To run the Grafter example, or any other Clojure program, in Eclipse (Mars), one first needs to install the CounterClockWise plugin. Note that if you also want to use GitHub, like me, there is a bug that prevents the project from being properly cloned when you choose the “New project wizard”. I cloned with the General project wizard, copied the files from another Grafter project, and surprisingly it worked (trying to convert the project to Leiningen/Clojure didn’t work!).

My progress converting data obtained from Gipuzkoa Irekia to RDF can be seen on GitHub. Also, I’m aiming at adding Data Cube SPARQL constraints as Clojure tests, here.
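
The idea of turning a Data Cube integrity constraint into a test can be sketched without a triple store. The example below checks, in plain Python over (subject, predicate, object) tuples, that every qb:Observation has exactly one qb:dataSet (listed as IC-1 in the RDF Data Cube specification). It is not the Clojure/SPARQL version mentioned above; the function name and the sample data are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"


def observations_breaking_unique_dataset(triples):
    """Return the observations that do not have exactly one qb:dataSet."""
    observations = set()
    datasets = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o == QB + "Observation":
            observations.add(s)
        elif p == QB + "dataSet":
            datasets.setdefault(s, set()).add(o)
    return sorted(obs for obs in observations
                  if len(datasets.get(obs, ())) != 1)


# Made-up sample data
EX = "http://example.org/"
sample = [
    (EX + "obs1", RDF_TYPE, QB + "Observation"),
    (EX + "obs1", QB + "dataSet", EX + "ds"),
    (EX + "obs2", RDF_TYPE, QB + "Observation"),  # no dataset: violation
]


def test_unique_dataset():
    """The constraint as a unit test: list the offenders, expect only obs2."""
    assert observations_breaking_unique_dataset(sample) == [EX + "obs2"]


test_unique_dataset()
```

Returning the offending observations, rather than a bare true/false as a SPARQL ASK would, makes a failing test easier to debug.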

 

Servicios OpenLinkedData

We have been awarded the contract to implement part of Open Data Euskadi as Linked Data: Contratación de los Servicios OpenLinkedData.

Three levels of reproducibility: Docker, Galaxy, Linked Data

[Originally posted at LinkedIn]

I have just stumbled upon this thread on why one should use Galaxy (https://www.biostars.org/p/50034/). One of the reasons posted is reproducibility, but Galaxy only solves one level of reproducibility, “functional reproducibility” (what I did with the data). There are at least two other levels, one “below” Galaxy and another one “above” Galaxy:

  • Below: computational environment: Operating System, library dependencies, binaries.
  • Above: semantics. What the data means.

In order to be completely reproducible, one has to be reproducible on the three levels:

  1. Computational: Docker.
  2. Functional: Galaxy.
  3. Semantics: URIs, RDF, SPARQL, OWL.

And how to do it is described in our GigaScience paper, “Enhanced reproducibility of SADI Web Service Workflows with Galaxy and Docker” 🙂 (http://www.gigasciencejournal.com/content/4/1/59)

Just to emphasize and clarify, the 3 levels would be:
3.- Semantics: what the data means.
2.- Functional: what I did with the data.
1.- Computational: how I did it.

TikiTalka talk on Linked Data

On Friday, 12 February, I gave a talk on Linked Data and the Semantic Web (Slides.com), as part of the TikiTalka event organized by VE Interactive Bilbao. The other talks were very interesting and there was free beer and table football. What more could you ask for?