Hello,
I've got a situation where I need to compare whether the addresses for contacts at companies match. Since Joe, Will and Jane are the contacts for company 123456, and all three have the same address, then they should go into the "matching" table so I know only one information pack needs to be put together and sent to them. Due to unwanted spaces and different spellings of Street, I've corrected these problems and created a variable named Cats_Addr. However, Jane has misspelt 'Chicago' so my matching code would still say their addresses don't match and that multiple information packs should be sent, which is a waste.
I need to use COMPGED to calculate the generalised edit distance between the addresses in this list, as that would tell me Jane's address is only one character different to Joe's and Will's, so I know it is most likely the same address. Likewise, Bob and Ping should get the same information pack, except that London is misspelled, and for the last company, Berlin is misspelled.
Everything I read up on COMPGED always compares two columns next to each other, but I can't find anything where COMPGED is used to compare the address or addresses below it (checking vertically instead of horizontally, checking against every address showing for the same companyID). Please note that there are thousands of customers so I can't just transpose to check the columns that way. Any thoughts please? Thanks for your ideas.
data have;
infile datalines dlm=',' dsd truncover;
input CompanyID $ CustID :$20. Name :$12. Address :$15. Address2 :$10.;
datalines;
123456,336,Joe,1 Any Street, Chicago
123456,337,Will,1 Any St, Chicago
123456,338,Jane,1 Any St, Chicgo
99888,222,Bob,2 Other Street, London
99888,222,Ping,2 Other St, Londun
77777,123,George,1 May Street, Berlin
77777,124,Rahul,1 May St, Berlen
;
data have2;
set have;
length Cats_Addr $25.;
Cats_Addr=compress(cats(Address,Address2));
Cats_Addr=tranwrd(Cats_Addr,'Street','St');
run;
data matching no_match;
do until (last.companyID);
set have2;
by CompanyID notsorted;
if first.CompanyID then _test=Cats_Addr;
if Cats_Addr NE _test then _differentAddr=1;
end;
do until (last.companyID);
set have2;
by CompanyID notsorted;
if _differentAddr then output no_match;
else output matching;
end;
drop _:;
run;
... View more