Sunday, 25 September 2016

"They call me the cleaner"

[Update: a mirror article on LinkedIn Pulse expands on historic and business background]

I refer to the character Victor, played by Jean Reno in Luc Besson's 1990 film "Nikita".


While helping a French electoral campaign for 2017, I received an electoral list that was complete, except that its four address fields had been filled in pell-mell by the registering voters. A presentation at Safe Software's 2015 FME World Tour showed an unusual use of FME Workbench: scraping, culling and listing music titles. I did the same here to normalise the address fields for upload to a campaign website.


Now I'm a spatial, not a regex, kinda guy, so Safe's tech support engineer Richard Mosley gave me a hand; kudos to him for his support. The result is a clean list that can be uploaded to nationbuilder.com.

Here are a few more technical details if you're interested:
  • CSV Feature Writer allows non-spatial data to be manipulated
  • a StringSearcher transformer on each address field parses its contents
  • AttributeManager formats the parsed values so they are written out consistently
  • Country is the constant last field and the others are written leftward from it
  • thus addresses are consistently ordered: ADR1, ADR2, ZIP, CITY, CTRY
  • needless to say, there's a lot of clean-up still to do after that...
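
For readers who don't use FME, here is a rough plain-Python sketch of the same idea; it is not the actual Workbench script. It rests on assumptions that are mine, not from the real data: the last non-empty field holds the country, one field holds a five-digit French postal code followed by the city, and whatever is left is street detail. The function name normalise_address is made up for illustration.

```python
import re

# Assumption: French-style "75002 Paris" (five-digit postal code, then city).
ZIP_CITY = re.compile(r"^(\d{5})\s+(.+)$")

def normalise_address(raw_fields):
    """Map free-form address fields onto ADR1, ADR2, ZIP, CITY, CTRY."""
    # Drop empty fields and surrounding whitespace.
    values = [v.strip() for v in raw_fields if v and v.strip()]
    row = {"ADR1": "", "ADR2": "", "ZIP": "", "CITY": "", "CTRY": ""}
    if not values:
        return row

    # Assumption: the last non-empty value is the country.
    row["CTRY"] = values.pop()

    # Work leftward from the country, splitting the first "ZIP CITY" value found.
    street = []
    for value in reversed(values):
        match = ZIP_CITY.match(value)
        if match and not row["ZIP"]:
            row["ZIP"], row["CITY"] = match.group(1), match.group(2)
        else:
            street.insert(0, value)  # keep street lines in their original order

    if street:
        row["ADR1"] = street[0]
        row["ADR2"] = ", ".join(street[1:])
    return row
```

Calling normalise_address(["12 rue de la Paix", "", "75002 Paris", "France"]) fills ADR1, ZIP, CITY and CTRY and leaves ADR2 empty. Anything that doesn't fit the pattern simply ends up in ADR1 or ADR2, which is exactly the kind of leftover that still needs manual clean-up.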

And when updates arrive, run this only on the new data or deltas:
  • another CSV Feature Writer is used with a FeatureMerger transformer
  • use Requestor for the original and Supplier for the new file
  • Merged outputs the common fields and NotMerged the different ones
  • only run the first process against the hopefully shorter difference file
  • this can also be used to find the rejected fields for later QC & reload
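
The delta step can be approximated outside FME as well. The sketch below is only an analogue of the idea, not a description of how FeatureMerger works internally: it assumes every record carries a unique key (the column name voter_id is hypothetical) and splits the new file into rows already present unchanged in the original and rows that are new or different.

```python
def split_delta(original_rows, new_rows, key="voter_id"):
    """Split new_rows into (unchanged, changed_or_new) relative to original_rows.

    Rows are dicts, e.g. as produced by csv.DictReader.
    The key column name is an assumption for illustration.
    """
    known = {row[key]: row for row in original_rows}
    unchanged, delta = [], []
    for row in new_rows:
        if known.get(row[key]) == row:
            unchanged.append(row)   # identical record already processed
        else:
            delta.append(row)       # new voter, or an existing one with edits
    return unchanged, delta


original = [
    {"voter_id": "1", "ADR1": "12 rue de la Paix"},
    {"voter_id": "2", "ADR1": "3 avenue Foch"},
]
update = [
    {"voter_id": "2", "ADR1": "3 avenue Foch"},       # unchanged
    {"voter_id": "1", "ADR1": "14 rue de la Paix"},   # edited
    {"voter_id": "3", "ADR1": "8 quai Voltaire"},     # new
]
unchanged, delta = split_delta(original, update)
```

Only the rows in delta then need to go through the address clean-up again.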

Lessons learned were:
  • metadata is king - no verbose PDF? no clean up of CSV lists!
  • a bit of logic and reasoning went a long way toward crafting an FME clean-up script
  • but there are no shortcuts to QC and final clean-up: "you get out what you put into it"

And thanks again to Safe Software for their help.