While there is no substitute for properly cleaning defective data, a shortcut approach would be to take the most common combination of values as the "correct" set and apply it to your data. There is no guarantee that this will be correct, but the "errors" should be spread across the wide variety of ways it is possible to be wrong, so the valid data is expected to be the most populous. In your particular instance it might be more appropriate to handle id1 and var1 separately from id2 and var2, but this might be food for thought:

proc sql noprint;
   /* First create distinct combinations of id1, id2, var1 and var2,
      counting how often each combination occurs */
   create table work.groups as
      select distinct id1, id2, var1, var2, count(*) as counter
      from work.have
      group by 1,2,3,4
   ;

   /* Then select the most popular combination per id1/id2 pair
      (on the assumption that this is the one most likely to be correct) */
   create table work.best_group as
      select id1, id2, var1, var2
      from work.groups
      group by 1,2
      having counter=max(counter)
   ;

   /* Finally join the original dataset back to the "best" grouping */
   create table work.want as
      select h.id1, h.id2, h.Transaction_Date, bg.var1, bg.var2
      from work.have as h, work.best_group as bg
      where h.id1 = bg.id1
        and h.id2 = bg.id2
   ;
quit;
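To see the idea in action, here is a minimal sketch of test data you could run the queries against; the variable names match the code above, but the values, lengths and the Transaction_Date format are pure assumptions, so adjust them to your real layout:

/* Hypothetical test data -- values are invented purely to exercise the queries above */
data work.have;
   input id1 $ id2 $ var1 $ var2 $ Transaction_Date :date9.;
   format Transaction_Date date9.;
   datalines;
A 1 X P 01JAN2020
A 1 X P 02JAN2020
A 1 X Q 03JAN2020
B 2 Y R 01JAN2020
B 2 Z R 02JAN2020
B 2 Y R 03JAN2020
;
run;

With this sample, (X, P) wins for id1=A/id2=1 and (Y, R) wins for id1=B/id2=2, so every transaction row in work.want carries the winning combination. One caveat: if two combinations tie for the highest count within an id1/id2 pair, both survive the HAVING clause and the final join will produce duplicate rows for that pair, so you may want to add a tie-breaking rule before the join.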