I refer to the character Victor played by Jean Reno in Luc Besson's 1990 film "Nikita".
While helping a 2017 French electoral campaign, I received an electoral list that was complete, except that its four address fields had been filled in pell-mell by voter registrants. A 2015 FME World Tour presentation by Safe showed an unusual use of FME Workbench: scraping, culling and listing music titles. I did the same here to normalise the address fields for upload to a voting campaign website.
Now I'm a spatial, not a regex, kinda guy, so Safe's tech support engineer Richard Mosley gave me a hand; kudos to him for his support. The result is a clean list that can be uploaded to nationbuilder.com.
Here are a few more technical details if you're interested:
- CSV Feature Writer lets you manipulate non-spatial data
- a StringSearcher transformer parses each address field
- AttributeManager formats the parsed values so they are written out consistently
- Country is the constant last field and others are written leftward
- thus addresses are consistent ADR1, ADR2, ZIP, CITY, CTRY
- needless to say there's a lot of clean-up still to do after that...
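
For readers who think in code rather than transformers, the parsing steps above can be roughly sketched in plain Python. This is not the FME workspace itself, only an analogue under stated assumptions: postcodes are five digits, as in France; the country list is a tiny illustrative one; and the names normalise, ZIP_CITY and COUNTRIES are hypothetical.

```python
import re

# Assumption: a French-style five-digit postcode, optionally followed by the city.
ZIP_CITY = re.compile(r"^(\d{5})\b\s*(.*)$")
# Assumption: a tiny illustrative country list; a real one would be much longer.
COUNTRIES = {"FRANCE", "SUISSE", "BELGIQUE", "CANADA"}


def normalise(fields):
    """Map free-form address fields onto ADR1, ADR2, ZIP, CITY, CTRY."""
    out = {"ADR1": "", "ADR2": "", "ZIP": "", "CITY": "", "CTRY": ""}
    street = []
    for raw in fields:
        value = " ".join(raw.split())  # collapse stray whitespace
        if not value:
            continue  # skip empty fields
        if value.upper() in COUNTRIES:
            out["CTRY"] = value.upper()  # country always lands in the last column
            continue
        match = ZIP_CITY.match(value)
        if match:
            out["ZIP"] = match.group(1)
            if match.group(2):
                out["CITY"] = match.group(2).upper()
            continue
        street.append(value)  # anything else is treated as a street line
    # Leftover street lines fill ADR1 first, then ADR2.
    out["ADR1"] = street[0] if street else ""
    out["ADR2"] = " ".join(street[1:])
    return out


# The same address typed in two different field orders gives the same columns.
row_a = normalise(["12 rue de la Paix", "", "75002 Paris", "France"])
row_b = normalise(["France", "75002  Paris", "12 rue de la Paix", ""])
print(row_a == row_b)
```

Anything the patterns do not recognise falls through to the street lines, which is exactly where the manual clean-up mentioned above comes in.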
And when updates arrive, run this only on the new data or deltas:
- another CSV Feature Writer is used with a FeatureMerger transformer
- use Requestor for the original and Supplier for the new file
- Merged outputs the common fields and NotMerged the different ones
- only run the first process against the hopefully shorter difference file
- this can also be used to find the rejected fields for later QC & reload
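
The delta idea can be sketched in plain Python too. Again this is only an analogue and does not reproduce FeatureMerger's exact port semantics: rows of the new file whose key values already exist in the original count as merged, and the rest as not merged. The function name split_delta, the column names ID and ADR, and the sample rows are all made up for illustration.

```python
import csv
import io


def split_delta(original_rows, new_rows, key_fields):
    """Split new_rows into (merged, not_merged) relative to original_rows."""
    # Keys already present in the original file.
    seen = {tuple(row[k] for k in key_fields) for row in original_rows}
    merged, not_merged = [], []
    for row in new_rows:
        key = tuple(row[k] for k in key_fields)
        (merged if key in seen else not_merged).append(row)
    return merged, not_merged


# Hypothetical sample data standing in for the original and updated CSV files.
original_csv = "ID,ADR\n1,12 rue de la Paix\n2,3 avenue Foch\n"
new_csv = "ID,ADR\n1,12 rue de la Paix\n2,5 avenue Foch\n3,1 place Bellecour\n"

original = list(csv.DictReader(io.StringIO(original_csv)))
new = list(csv.DictReader(io.StringIO(new_csv)))

# Comparing on every column means a changed address also counts as a difference.
merged, not_merged = split_delta(original, new, ["ID", "ADR"])
print([row["ID"] for row in not_merged])
```

Only the not_merged rows would then need to go through the address clean-up again.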
Lessons learned were:
- metadata is king - no verbose PDF? no clean up of CSV lists!
- a bit of logic and reasoning went a long way toward crafting an FME clean-up script
- but there are no shortcuts to QC and final clean-up, "you get out what you put into it"