Sunday, 20 October 2013

Releasing data really works, Part III

More and more free data that are quality-controlled and verifiable are becoming available. Guardian Data Blog's @smfrogers (now at Twitter) was quite sanguine about this:
Comment is free, but facts are sacred
This reflects the entire geo-industry's credo: say what you want, but make sure your data earn a Triple-A rating: available, accurate and auditable. Esri's Jack Dangermond stressed data accuracy as well as crowdsourcing. OpenStreetMap's Steve Coast also said that in crowdsourcing "all users have a stake in the accuracy of the data." I recently described a complete round trip with UK Ordnance Survey data: download the data for use in an East Anglian medieval history project, run it through Socium's online validation, and report inaccuracies to the Ordnance Survey, which will incorporate the edits in later releases of the data (more on this later).

Guardian Data published Great Britain's train station data, posting some of it with Google Fusion Tables. I downloaded the data set, matched it against UK postcode data from Doogal UK to place each station at its postcode centrepoint, and classified the stations by year and frequency. UK Ordnance Survey County and District data, plus a subset of the NOAA GSHHS coastal outline, completed the picture. The maps were created in ArcMap for Home Use; a loader for ArcMap data was then used to post them online here and below, with the USGS SRTM web map service (WMS) as a background.
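Placing stations at postcode centrepoints is essentially a table join on the postcode. Here is a minimal Python sketch of that step plus the frequency classification, assuming the station table has a postcode and an annual usage count per year, and that the postcode centrepoints are available as a simple lookup table. The postcodes, coordinates, counts and band thresholds below are made-up placeholders, not values from the Guardian or Doogal data, and the function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical postcode centrepoint lookup: postcode -> (latitude, longitude).
# Placeholder values for illustration only, not real Doogal figures.
POSTCODE_CENTROIDS = {
    "AB1 2CD": (52.20, 0.12),
    "EF3 4GH": (52.63, 1.29),
}


@dataclass
class Station:
    name: str
    postcode: str
    year: int
    usage: int  # hypothetical annual passenger count


def normalise(postcode: str) -> str:
    """Upper-case and collapse whitespace so keys match the lookup table."""
    return " ".join(postcode.upper().split())


def classify(usage: int, bands=(100_000, 1_000_000)) -> str:
    """Assign a frequency band; the thresholds are assumptions, not the post's."""
    if usage < bands[0]:
        return "low"
    if usage < bands[1]:
        return "medium"
    return "high"


def geocode(stations, centroids):
    """Place each station at its postcode centrepoint; list unmatched stations."""
    placed, unmatched = [], []
    for s in stations:
        point = centroids.get(normalise(s.postcode))
        if point is None:
            unmatched.append(s.name)  # no centrepoint found: report, don't guess
            continue
        lat, lon = point
        placed.append({"name": s.name, "year": s.year,
                       "lat": lat, "lon": lon, "band": classify(s.usage)})
    return placed, unmatched


stations = [
    Station("Station A", "ab1 2cd", 2011, 250_000),
    Station("Station B", "EF3  4GH", 2011, 2_500_000),
    Station("Station C", "ZZ9 9ZZ", 2011, 50_000),
]
placed, unmatched = geocode(stations, POSTCODE_CENTROIDS)
for row in placed:
    print(row)
print("Not matched:", unmatched)
```

Keeping the unmatched stations in a separate list, rather than silently dropping them, fits the point about auditable data: anyone re-running the join can see exactly which records could not be placed.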

The caveat here is that stations are not placed at their street address but at postcode centrepoints. This project, however, is not meant to produce, say, accurate routing from one station to another for costing purposes, but rather to show how traffic at those stations has evolved. The key factor is a common geographic reference appropriate to the exercise. My Medieval Fenlands project, for example, benefited from the fact that parishes have been a common geographic entity since the Domesday survey of 1086! H.C. Darby's data were mapped against those parishes, and data historian Julie Bowring (pers. comm.) pointed out that parish boundaries had moved slightly over time. She agreed, however, that for the purpose of comparing agro-economic wealth in East Anglia over almost a millennium, using a common set of parish polygons would not affect the results, provided that the datasets used are disclosed. And, of course, the data source must be acknowledged, as the Ordnance Survey copyright notice requires.

This is yet another example of how publishing data openly can move map making forward through mashups of various data sources. The key proviso, however, is that data sources are acknowledged at every step. Not only does this allow auditing and referral, it also lets others create more of the same according to their particular expertise. Isn't that, after all, what crowdsourcing is all about?