Sunday, 25 September 2016

"They call me the cleaner"

[Update: a mirror article on LinkedIn Pulse expands on the historical and business background]

I refer to the character Victor, played by Jean Reno in Luc Besson's 1990 film "Nikita".

While helping with a 2017 French electoral campaign, I received an electoral list that was complete, but whose four address fields had been filled in pell-mell by registering voters. A presentation at Safe's 2015 FME World Tour showed an unusual use of FME Workbench: scraping, culling and listing music titles. I did the same here to normalise the address fields for upload to a voting campaign website.

Now I'm a spatial not a regex kinda guy, so Safe's tech support engineer Richard Mosley gave me a hand; kudos for his support. The result is a clean list that can be uploaded to

Here are a few more technical details if you're interested:
  • CSV Feature Writer allows non-spatial data to be manipulated
  • a StringSearcher transformer parses each address field
  • AttributeManager formats the results so they are written out consistently
  • Country is the constant last field and others are written leftward
  • thus addresses are consistent ADR1, ADR2, ZIP, CITY, CTRY
  • needless to say there's a lot of clean-up still to do after that...
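
For readers without FME, the parsing and leftward ordering described above can be roughly sketched in plain Python. This is not the Workbench script itself, only an illustration under assumptions of mine: the last non-empty field is taken to be the country, the field before it is expected to hold a five-digit postcode followed by the city, and the function name and regex are hypothetical.

```python
import re

# Assumed pattern: a five-digit postcode followed by the city name.
ZIP_CITY = re.compile(r"^(\d{5})\s+(.+)$")


def normalise_address(fields):
    """Map free-form address fields to ADR1, ADR2, ZIP, CITY, CTRY."""
    # Drop empty fields and collapse runs of whitespace.
    parts = [" ".join(f.split()) for f in fields if f and f.strip()]
    out = {"ADR1": "", "ADR2": "", "ZIP": "", "CITY": "", "CTRY": ""}
    if not parts:
        return out
    # Assumption: the last non-empty field is the country.
    out["CTRY"] = parts.pop().upper()
    # Working leftward, the next field should hold "ZIP CITY".
    if parts:
        match = ZIP_CITY.match(parts[-1])
        if match:
            out["ZIP"] = match.group(1)
            out["CITY"] = match.group(2).upper()
            parts.pop()
    # Whatever remains is street-level detail.
    if parts:
        out["ADR1"] = parts[0]
        out["ADR2"] = ", ".join(parts[1:])
    return out
```

Calling `normalise_address(["12 rue de la Paix", "", "75002 Paris", "France"])` fills ADR1, ZIP, CITY and CTRY and leaves ADR2 empty. Rows that do not fit the assumed pattern keep an empty ZIP and CITY, which makes them easy to pick out for manual clean-up.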

And when updates arrive, run this only on the new data or deltas:
  • Another CSV Feature Writer is used with a FeatureMerger transformer
  • use Requestor for the original and Supplier for the new file
  • Merged outputs the records common to both files and NotMerged the ones that differ
  • only run the first process against the hopefully shorter difference file
  • this can also be used to find the rejected fields for later QC & reload
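
The delta idea above can be approximated outside FME along these lines. It is a plain-Python analogue and not FeatureMerger's actual behaviour: rows of the new file are compared with the original on a set of key columns, and the column names in the usage note (ID, NAME, ADR1) as well as both function names are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def split_delta(original_rows, new_rows, key_fields):
    """Split new_rows into (merged, not_merged) against original_rows."""
    # Every key combination already present in the original file.
    seen = {tuple(row[k] for k in key_fields) for row in original_rows}
    merged, not_merged = [], []
    for row in new_rows:
        key = tuple(row[k] for k in key_fields)
        # Known rows are "merged"; new or changed rows are "not merged".
        (merged if key in seen else not_merged).append(row)
    return merged, not_merged
```

With `key_fields` set to every column, for example `["ID", "NAME", "ADR1"]`, a row whose address changed lands in `not_merged` alongside brand-new rows, so only that shorter list needs to go through the clean-up again.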

Lessons learned were:
  • metadata is king - no verbose PDF? no clean up of CSV lists!
  • a bit of logic and reasoning went a long way toward crafting an FME clean-up script  
  • but there are no shortcuts to QC and final clean-up, "you get out what you put into it"
And thanks again to Safe Software for their help.
