Sunday, 23 September 2012

Releasing data really works, Part II

[Oct 2012 update: Here is an example of how much further map stories can be taken]

Here is another simple example of posting data as maps online, this time to help linguists elucidate spatio-temporal relationships in not-insignificant amounts of data.

In my previous post I suggested that downloading public data, putting it in context, QCing it and then re-posting it to the originators is a form of cooperation that improves said datasets for the greater good. That exercise started, however, in trying to help Cambridge University archaeologists use one of their peers' data on the geo-history of East Anglia. This time the starting point is a site that posts linguists' information using beautiful maps to explain complex geo-historical contexts. A debate emerged there around computational geography: on the one hand, ethno-linguistic issues are very complex but rest on sparse data, as the research is far from easy; on the other hand, computer modelling has been attempted with mixed success. So I proposed to them that online GIS may just be the way to post complex data and tease out those elusive relationships, to help put scientific fact ahead of popular myth.

To put maps where my mouth is, I took a dataset posted in the first comment on the article that started the debate. I cleaned it up a bit to show only spatio-temporal data on carbon-14 dates from Eurasian sites, where Neolithic remains help trace the hypothetical origins of agriculture. Follow the dark to light tan from roughly 10,000 to 5,000 years BCE:

I simply went to ArcGIS Online, logged in and created an ArcGIS Explorer map by importing the simplified spreadsheet as a CSV file with type, latitude, longitude, site, region and carbon-14 dating information. I then classified the sites on a simple colour ramp using natural breaks. With four classes, the breaks came close to those between the Neolithic phases around 6,400 and 8,800 BCE. Quite simple really.
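
For readers who would rather see the classification step spelled out than click through it, here is a minimal sketch in Python of what a natural-breaks classification does: it splits sorted values into classes so that values within each class are as close to each other as possible. This is an illustration only, not ArcGIS Online's own implementation. The column names (`site`, `latitude`, `longitude`, `age_bce`) and the handful of rows are made up for the example and are not taken from the real dataset.

```python
import bisect
import csv
import io

# Hypothetical sample rows standing in for the simplified spreadsheet.
SAMPLE_CSV = """site,latitude,longitude,age_bce
Site A,37.0,38.0,9900
Site B,37.5,39.0,9800
Site C,36.5,37.5,9700
Site D,38.0,33.0,8000
Site E,38.5,32.0,7900
Site F,40.0,23.0,6000
Site G,41.0,24.0,5900
Site H,47.0,16.0,5200
Site I,48.0,15.0,5100
"""


def natural_breaks(values, n_classes):
    """Return the upper bound of each class, chosen to minimise the
    within-class sum of squared deviations (exact dynamic programme)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best achievable cost when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the split table to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ages = [float(row["age_bce"]) for row in rows]
bounds = natural_breaks(ages, 4)

# Class 0 holds the youngest dates, class 3 the oldest; each index would
# map to one step of the colour ramp.
classes = {row["site"]: bisect.bisect_left(bounds, float(row["age_bce"]))
           for row in rows}
```

The exact search is cubic in the number of values for each class, which is fine for a few hundred sites; for much larger tables a dedicated classification library would be the more sensible choice.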

This is a tiny example in a vast topic; nonetheless it addresses the prickly issue of non-GIS experts correctly representing complex spatio-temporal relationships. The aim is to help avoid misrepresentations of data that fuel unnecessary debates, and thus help focus on the important issues.