Saturday, 6 September 2014

On joining and merging historic multi-lingual geodata

Earlier posts chronicled the history, and even the beauty, of historic shipping and climate data from CLIWOC. British, Dutch, French and Spanish maritime agencies transferred paper logs to digital records. In doing so, look-up tables made it possible to convert multi-lingual records into quantifiable attributes. Something odd (to me) happened when I mapped these: over 1/4M records doubled to almost 1/2M once the look-ups were joined and the wind and direction tables were then merged to create maps symbolised by wind force and orientation.
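A minimal sketch of one way a join can inflate a record count. It assumes, purely for illustration, that a multi-lingual look-up table holds more than one row for the same term. The records, the look-up rows, and the `inner_join` and `duplicate_keys` helpers are hypothetical and are not taken from the CLIWOC look-ups. The ambiguity itself is real, though: "NO" means north-east (noordoost) in Dutch but north-west (noroeste, nord-ouest) in Spanish and French.

```python
from collections import Counter, defaultdict

# Hypothetical logbook records: wind direction as written in the original log.
records = [
    {"record_id": 1, "wind_dir": "NNO"},  # Dutch log
    {"record_id": 2, "wind_dir": "NNE"},  # British log
    {"record_id": 3, "wind_dir": "NO"},   # Spanish log
]

# Hypothetical multi-lingual look-up: "NO" appears twice because it means
# north-east in Dutch but north-west in Spanish and French.
lookup = [
    {"term": "NNO", "lang": "nl", "degrees": 22.5},
    {"term": "NNE", "lang": "en", "degrees": 22.5},
    {"term": "NO", "lang": "nl", "degrees": 45.0},
    {"term": "NO", "lang": "es", "degrees": 315.0},
]


def inner_join(left, right, left_key, right_key):
    """Inner join two lists of dicts; every matching right row yields one output row."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined


def duplicate_keys(table, key):
    """Return {key_value: count} for key values that occur more than once."""
    counts = Counter(row[key] for row in table)
    return {value: n for value, n in counts.items() if n > 1}


joined = inner_join(records, lookup, "wind_dir", "term")
print(len(records), "->", len(joined))      # 3 -> 4
print(duplicate_keys(lookup, "term"))       # {'NO': 2}
print(duplicate_keys(joined, "record_id"))  # {3: 2}
```

Counting duplicated keys in each look-up before joining is a cheap way to predict whether the joined table will grow, and a second merge against another table with repeated keys multiplies the rows again. Whether this is what actually doubled the CLIWOC records is not established by the sketch.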

The data are posted online for free re-use, so that the geodata-oriented can view them and comment if anything seems odd in the way I treated the CLIWOC data. This is a very rich dataset, and the cleaner it can be made, the easier it will be for climatologists to pick it up and study historic data. Do go ahead, and don't be shy!

[Sat 20 Sep update: Thanks, Hussein, for the comment. Here are my lessons learned:
  • mega datasets need not be reduced, as disk and memory are no longer constraints (more on that later)
  • create large join tables with redundant attributes if need be, as views and relates kill performance
  • feature joins followed by attribute cleanups do create more features, but those features are clean
  • original data are kept intact and table names are retained, to allow auditing against the originals]
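The second and fourth lessons can be sketched as a materialised join table that sits alongside untouched originals. The post does not name the software used, so this uses SQLite through Python's standard `sqlite3` module as a stand-in; the table names `logbook_raw`, `wind_dir_lookup` and `wind_joined`, the `src_table` column and the `audit` helper are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical original tables: loaded once and never modified, so they stay
# available for auditing.
cur.execute("CREATE TABLE logbook_raw (record_id INTEGER PRIMARY KEY, wind_dir TEXT)")
cur.execute("CREATE TABLE wind_dir_lookup (term TEXT, lang TEXT, degrees REAL)")
cur.executemany("INSERT INTO logbook_raw VALUES (?, ?)",
                [(1, "NNO"), (2, "NNE"), (3, "NO")])
cur.executemany("INSERT INTO wind_dir_lookup VALUES (?, ?, ?)",
                [("NNO", "nl", 22.5), ("NNE", "en", 22.5),
                 ("NO", "nl", 45.0), ("NO", "es", 315.0)])

# Materialise the join as a real table (redundant columns included) instead of
# a view, and record which original table each row came from.
cur.execute("""
    CREATE TABLE wind_joined AS
    SELECT r.record_id, r.wind_dir, l.lang, l.degrees,
           'logbook_raw' AS src_table
    FROM logbook_raw AS r
    JOIN wind_dir_lookup AS l ON l.term = r.wind_dir
""")
con.commit()


def audit(cur):
    """Compare the joined table against the untouched original table."""
    raw = cur.execute("SELECT COUNT(*) FROM logbook_raw").fetchone()[0]
    joined = cur.execute("SELECT COUNT(*) FROM wind_joined").fetchone()[0]
    distinct = cur.execute(
        "SELECT COUNT(DISTINCT record_id) FROM wind_joined").fetchone()[0]
    return raw, joined, distinct


# (original rows, joined rows, distinct original ids present in the join)
print(audit(cur))  # (3, 4, 3)
```

A `CREATE TABLE ... AS SELECT` copy costs disk space but is read like any other table, which is the trade the update argues for. Comparing the distinct-id count with the original row count shows whether the join dropped any records, while the joined row count shows how many it added.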