> Can you please elaborate more on how to carry out successive matches.
Not too sure what's unclear.
So the course of action would be:
1. Sort the tables by ZIP
2. Merge on ZIP equality and SUBCITY equal to the start of CITY (use scan() to take the first word if it is long enough, or scan() with the parentheses as delimiters, or the =: operator, or all of these successively)
3. Whatever hasn't been matched can be retried with other criteria, including fuzzy ones such as the compged() function. The cost gets higher, but the volumes get smaller.
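Here is a rough illustration of steps 1 and 2 outside SAS. This Python is only a sketch and assumes two hypothetical tables held as lists of dicts with `zip`, `subcity` and `city` keys. `first_word()` mimics `scan(city, 1)`, `before_paren()` mimics `scan()` with the parentheses as delimiters, and `str.startswith()` stands in for the `=:` comparison. SAS's `=:` truncates to the shorter string, so it is a bit more permissive. The `min_len` threshold for "long enough" is an assumed value. A dict keyed on ZIP plays the role of the sort plus merge-by, so the sort is kept only to mirror step 1.

```python
def first_word(city):
    """Rough analogue of scan(city, 1): first blank-delimited word."""
    words = city.split()
    return words[0] if words else ""


def before_paren(city):
    """Rough analogue of scan() with '(' as delimiter: text before the first '('."""
    return city.split("(")[0].strip()


def first_pass_match(subcity, city, min_len=3):
    """Return which criterion matched, or None. min_len is a hypothetical threshold."""
    s, c = subcity.strip().upper(), city.strip().upper()
    if not s:
        return None
    fw = first_word(c)
    if len(fw) >= min_len and s == fw:      # SUBCITY = first word of CITY
        return "first_word"
    if s == before_paren(c):                # SUBCITY = text before "("
        return "before_paren"
    if c.startswith(s):                     # CITY starts with SUBCITY, like =:
        return "prefix"
    return None


def merge_on_zip(left, right):
    # Index the right table by ZIP (stands in for sorting both tables by ZIP)
    by_zip = {}
    for r in right:
        by_zip.setdefault(r["zip"], []).append(r)

    matched, unmatched = [], []
    for row in sorted(left, key=lambda r: r["zip"]):
        for r in by_zip.get(row["zip"], []):
            how = first_pass_match(row["subcity"], r["city"])
            if how:
                matched.append({**row, "city": r["city"], "how": how})
                break
        else:
            unmatched.append(row)           # candidates for the fuzzier step 3
    return matched, unmatched


# Hypothetical sample data
LEFT = [
    {"id": 1, "zip": "75001", "subcity": "Paris"},
    {"id": 2, "zip": "97400", "subcity": "Saint Denis"},
    {"id": 3, "zip": "13100", "subcity": "Aix"},
    {"id": 4, "zip": "69100", "subcity": "Lyon"},
]
RIGHT = [
    {"zip": "75001", "city": "Paris (1er arr)"},
    {"zip": "97400", "city": "Saint Denis (La Reunion)"},
    {"zip": "13100", "city": "Aix-en-Provence"},
    {"zip": "69100", "city": "Villeurbanne"},
]

if __name__ == "__main__":
    matched, unmatched = merge_on_zip(LEFT, RIGHT)
    for m in matched:
        print(m["id"], m["how"])
    print("unmatched:", [u["id"] for u in unmatched])
```

The rows left in `unmatched` are the input for step 3. Note that the prefix test is lenient (a SUBCITY of "PAR" would match "PARIS"), which is why it is tried last.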
1. Join the easy matches first, using an obvious criterion such as ZIP equality and SUBCITY = first word of CITY => function scan()
2. Join the still-unmatched data on a less direct criterion such as ZIP equality and SUBCITY = any word of CITY => function index()
3. Repeat the process for the remaining unmatched data until satisfied: the volume left to match goes down as the criteria become fuzzier.
4. When finished, append the successive matches. It is a good idea to keep track of the match method so the data includes some sort of match-quality score.
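Here is a rough sketch of the successive-pass loop, again in Python and assuming the same hypothetical list-of-dicts layout. Each pass only looks at the rows the previous passes failed to match. Every match is tagged with the pass name and a made-up quality score, so the appended result records how each row was matched. `levenshtein()` is a plain edit distance used as a stand-in for compged(). It is closer to SAS COMPLEV, since COMPGED uses different operation costs. The `max_dist=2` cutoff and the scores 3/2/1 are arbitrary assumptions. `by_substring()` follows index(), which finds SUBCITY anywhere in CITY. INDEXW would restrict that search to whole words.

```python
def levenshtein(a, b):
    """Plain edit distance; a stand-in for COMPGED, whose costs differ."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def by_first_word(sub, city):
    # Like SUBCITY = scan(CITY, 1)
    words = city.split()
    return bool(words) and sub == words[0]


def by_substring(sub, city):
    # Like index(CITY, SUBCITY) > 0
    return sub in city


def by_fuzzy(sub, city, max_dist=2):
    # Hypothetical cutoff on whole-string edit distance
    return levenshtein(sub, city) <= max_dist


# (method name, criterion, hypothetical quality score), strictest first
PASSES = [
    ("first_word", by_first_word, 3),
    ("substring", by_substring, 2),
    ("fuzzy", by_fuzzy, 1),
]


def successive_match(left, right):
    cities_by_zip = {}
    for r in right:
        cities_by_zip.setdefault(r["zip"], []).append(r)

    matched, unmatched = [], list(left)
    for method, criterion, score in PASSES:
        still_unmatched = []
        for row in unmatched:
            sub = row["subcity"].strip().upper()
            hit = next((r for r in cities_by_zip.get(row["zip"], [])
                        if sub and criterion(sub, r["city"].strip().upper())),
                       None)
            if hit is None:
                still_unmatched.append(row)
            else:
                # Append this pass's matches, tagged with how they were found
                matched.append({**row, "city": hit["city"],
                                "method": method, "score": score})
        unmatched = still_unmatched  # next pass works on a smaller set
    return matched, unmatched


# Hypothetical sample data
LEFT = [
    {"id": 1, "zip": "75001", "subcity": "Paris"},
    {"id": 2, "zip": "97400", "subcity": "Denis"},
    {"id": 3, "zip": "13001", "subcity": "Marseile"},
    {"id": 4, "zip": "69001", "subcity": "Grenoble"},
]
RIGHT = [
    {"zip": "75001", "city": "Paris (1er arr)"},
    {"zip": "97400", "city": "Saint Denis"},
    {"zip": "13001", "city": "Marseille"},
    {"zip": "69001", "city": "Lyon"},
]

if __name__ == "__main__":
    matched, unmatched = successive_match(LEFT, RIGHT)
    for m in matched:
        print(m["id"], m["method"], m["score"])
    print("unmatched:", [u["id"] for u in unmatched])
```

Because the strict criteria run first, the expensive fuzzy comparison only runs on the few rows that are left. The `score` column then lets you filter or review the weaker matches afterwards.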