I have a large dataset with addresses where each individual has multiple addresses. I am using a ArcGIS to geocode these addresses but I am first trying to deduplicate the addresses in SAS. Sometime the addresses are actually different addresses and sometime they are the same address repeated, with time indicator for each address. In the example observations below, I would like all three rows to collapse to one row. I was wondering if anyone on here has ever done this (and save me some time and headaches). At the bottom you can see the code I have started for this. I made the dataset horizontal, so that each individual has one row and mulitple addresses (ending in _1 - _17). I was trying to match with the COMPGED function. If a person moved from address A then to B and back to A I would not want the addresses to collapse, as the time variables will show that the individual was at address A from two separate time periods and address B during a time period. Once I have sufficiently deduplicated the addresses, I will make the dataset long again and the export to geocode. Raw data: Row ID Address City State Zipcode DateBegin DateEnd 1 1 123 Northbay Lane Sometown CA 12345 1/1/2015 2/28/2015 2 1 123 Northbay Ln Sometown CA 12345 3/1/2015 11/30/2016 3 1 123 Nrthby Lane Sometown CA 12345 12/1/2016 3/10/2017 Change to: Row ID Address City State Zipcode DateBegin DateEnd 1 1 123 Northbay Lane Sometown CA 12345 1/1/2015 3/10/2017 In this example I just show one example for one person but this could happen several times for one person then the data sets are several hundred thousand participants so I can't manually make the changes. Let me know if I can explain anything. Any advice would be greatly appreciated! - Claire Code I have started: f_add_1 = todedup_add_1; do i = 2 to 17; do j = 2 to 16; if todedup_add_(i) ne ' ' then do; if COMPGED(todedup_add_(i-1),todedup_add_(i), ':') = 0 then continue; /* MATCH */ else if (0 < COMPGED(todedup_add_(i-1),todedup_add_(i), ':') <= 500) and (0 = COMPGED(house_num_(i-1),house_num_(i), ':')) then continue; /* MATCH */ else if (500 < COMPGED(todedup_add_(i-1),todedup_add_(i), ':') <= 600) and (0 = COMPGED(house_num_(i-1),house_num_(i), ':')) and (0 = COMPGED(zip_(i-1),zip_(i), ':')) then continue; /* MATCH, if the house number is identical and the zipcode then it is a match */ else if 600 < COMPGED(todedup_add_(i-1),todedup_add_(i), ':') then do; /* NOT A MATCH; MAKE A NEW FINAL ADDRESS */ f_add_(j) = todedup_add_(i); if i = incoming_row_count then leave; i=i+1; end; end; end; end;
... View more