ESDS Update

Using SDMX
The World Development Indicators (WDI) dataset that we are converting to RDF is already prepared in SDMX-ML Compact format. A dataset described in SDMX-ML will have at least two XML files: a data file and a data structure definition (DSD) file. To produce RDF we need to use both of these files.

The observations in the data file are associated with coded values. For example, a series of yearly observations may sit under a series heading where FREQ="A", SERIES="NY_ADJ_NNTY_KD_ZG" and REF_AREA="AFG". Abbreviating these values makes the SDMX XML more compact and efficient, but unless we are familiar with the data we can only guess that "AFG" might stand for Afghanistan, that the frequency is annual, and that the data refers to "Adjusted net national income (annual % growth)". The descriptive text for all the abbreviated fields is held in the DSD file. The DSD also defines the structure of the dataset, including which concepts (dimensions) and observation statuses (footnotes) the dataset uses.
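A hypothetical fragment of a compact-format data file, sketched from the example above, gives a feel for the shape (element and attribute names vary by key family, and the observation values here are invented for illustration, so treat this as a sketch rather than the actual WDI markup):

```xml
<!-- Series-level dimensions are attributes; each Obs carries
     only the time period and value. All values illustrative. -->
<Series FREQ="A" SERIES="NY_ADJ_NNTY_KD_ZG" REF_AREA="AFG">
  <Obs TIME_PERIOD="2008" OBS_VALUE="1.0"/>
  <Obs TIME_PERIOD="2009" OBS_VALUE="2.0"/>
</Series>
```

The compactness comes from hoisting the dimension codes onto the Series element: the DSD is what tells a consumer which attributes are dimensions and what their codes mean.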

Modelling the RDF
Often the most difficult aspect of exposing datasets via RDF is modelling the data.  The Publishing Statistical Data working group (of which Mimas is part) has already created a vocabulary mapping SDMX objects to the corresponding entities in RDF.  The diagram below, taken from the RDF Data Cube vocabulary specification (http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html#outline), shows an outline of the vocabulary.

To utilise this framework we have created RDF classes that reference the SDMX objects for datasets, key-families, observations and so forth.
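As a sketch of the target shape, the observation from the earlier example might come out something like this in Turtle. The qb:, sdmx-dimension: and sdmx-measure: terms come from the Data Cube vocabulary and its companion SDMX vocabularies; all example.org URIs are invented for illustration and are not the URI scheme we actually publish:

```turtle
@prefix qb:             <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical URIs, for illustration only.
<http://example.org/wdi/obs/NY_ADJ_NNTY_KD_ZG/AFG/2009>
    a qb:Observation ;
    qb:dataSet <http://example.org/wdi/dataset/wdi> ;
    sdmx-dimension:refArea <http://example.org/wdi/code/ref-area/afghanistan> ;
    sdmx-dimension:timePeriod "2009"^^xsd:gYear .
    # plus sdmx-measure:obsValue carrying the observed figure
```

Each coded value in the source becomes a link to a resource describing that code, which is where the descriptive text from the DSD ends up.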

Transformation Process
SDMX has been designed to encompass a large diversity of statistical data; the WDI dataset uses only a small subset of the available SDMX. What we aim to do is convert the SDMX objects in the source XML to their equivalent RDF objects. As the SDMX-ML source files are a form of XML and the RDF output we are producing is a number of XML files, we have chosen to use XSLT transformations to do the conversion. XSLT is a standard technique for manipulating XML, providing mechanisms for pattern matching, collecting and rewriting data.
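A minimal sketch of the kind of template involved, assuming a compact-format Series element like the one shown earlier (the match pattern, output URI pattern and example.org namespace are illustrative assumptions, not the production transform):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:qb="http://purl.org/linked-data/cube#">

  <!-- Match each compact-format observation and emit a
       qb:Observation, building its URI from the dimension codes
       on the parent Series via attribute value templates. -->
  <xsl:template match="Series/Obs">
    <qb:Observation
        rdf:about="http://example.org/wdi/obs/{../@SERIES}/{../@REF_AREA}/{@TIME_PERIOD}">
      <!-- copy the dimension codes and observed value across here -->
    </qb:Observation>
  </xsl:template>
</xsl:stylesheet>
```

The real stylesheets do the same job at larger scale: one set of templates walks the data file, another walks the DSD to emit the code lists the observations link to.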

Difficulties
On the face of it the work would seem straightforward; however, we have encountered some difficulties in the process.  SDMX has no requirement for unique field or key-family (code list) names, because each pair of files can be considered in isolation. In a collection of RDF data, unique identifiers are a key principle, so duplicate object names cannot be used. We use URIs to identify objects, and it is good practice to use a human-readable URI rather than an abbreviation or code. This forces us to be consistent in the capitalisation we use for identifiers and prevents us from using characters that are illegal in URIs (or that change their meaning, such as "/", "\" or ":").
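As a minimal sketch of that identifier clean-up, assuming we normalise every label to a lower-case, hyphen-separated slug (the function name and the example.org namespace are hypothetical, not part of the actual transform):

```python
import re

def to_uri_slug(label: str) -> str:
    # Replace runs of characters that are illegal in URIs (or that
    # change their meaning, such as "/", "\" or ":") with hyphens.
    slug = re.sub(r'[^A-Za-z0-9]+', '-', label)
    # Trim stray hyphens and apply one consistent capitalisation
    # (lower case) so "GDP" and "gdp" yield the same identifier.
    return slug.strip('-').lower()

base = "http://example.org/wdi/code/"   # hypothetical namespace
print(base + to_uri_slug("Adjusted net national income (annual % growth)"))
# prints http://example.org/wdi/code/adjusted-net-national-income-annual-growth
```

Normalising to one capitalisation is also what lets us detect genuine name clashes between code lists, rather than accidental ones that differ only in case.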

Output
Having created and tested a transformation process, we are now ready to begin converting the full WDI dataset, and we hope to have this completed within the next week.  The Mimas Linked Data working group is currently looking at implementing a triple store, and we hope the dataset will eventually reside there.  In the short term, however, we are using simple flat files on a web server.

To show how the WDI RDF can be linked out, we will spend the final few weeks of the project building a demonstrator.  This will likely take the form of some kind of visualisation.


Census Aggregate Linked Data

As part of the CAIRD II project our colleagues in the Census Dissemination Unit (CDU) have been working to make census aggregate data and metadata available as RDF.  As with the work we are doing at ESDS, the CAIRD II project is building upon the work of the Publishing Statistical Data working group.

The CDU have published an interesting blog post explaining what they are doing with Linked Data and how they hope to apply it to the 2011 census.


SDMX Global Conference 2011

The third SDMX Global Conference took place in Washington DC this month, and I (Paul Murphy) was there representing Mimas. The conference brought together around 280 delegates from 90 countries to discuss all aspects of Statistical Data and Metadata Exchange.  The World Bank and IMF did a fantastic job of hosting the discussions. With SDMX now in its 10th year, there was a clear shift in the focus of this conference, from developing the initiative towards implementing a now-mature standard and showcasing SDMX tools.  There are some really interesting tools being developed to take advantage of SDMX, particularly in the area of visualisation; the ESDS International team are currently looking at how we can bring some of these to our users.  I was particularly happy to see Mimas’s work with SDMX get a mention during the OECD’s talk on the business case for SDMX.

I’d hoped the keynote on Gov 2.0 might mention Linked Data; unfortunately it didn’t. Indeed, there was very little talk of Linked Data at the conference at all, and in my chats with the various delegates there seemed to be little knowledge of it amongst SDMX practitioners.  It seems that, despite the work of Eurostat and the Publishing Statistical Data Working Group, there is still a lot to be done to raise awareness of, and interest in, linked aggregate statistical data.


Mimas staff involved

Mimasld Principal Co-ordinator: Ross MacIntyre

Landmap

Co-ordination: Kamie Kitmitto

Development: Bharti Gupta

Spatial Data Expertise: Gail Millin-Chalabi & Yin Tun

ESDS International

Co-ordination: Paul Murphy, Jackie Carter

Analyst: Rob Diamond-Green

Development: Yogesh Patel, Will Standring
