My data is identified by a group of variables, let's say 3. Each variable may not be unique on its own, but together the 3 values identify a unique observation. It looks like this:
Var1 Var2 Var3
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
Observations 1 and 4 are duplicates, which raises the question: how can I identify duplicates in this data?
With a single ID variable I can do:
data dups nodups;
  set have;
  by ID;
  if first.ID and last.ID then output nodups;
  else output dups;
run;
But I'm not sure how to do this with several variables. I could check them one variable at a time, but I wonder if there's a more efficient way.
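Just sort by all three vars, then test FIRST. and LAST. on the last BY variable. first.var3 is set whenever var1, var2, or var3 changes, so it marks the start of each three-variable combination:
/* BY-group processing needs the data sorted by the same variables */
proc sort data=have;
  by var1 var2 var3;
run;
data dups nodups;
  set have;
  by var1 var2 var3;
  /* A combination that is both first and last in its group occurs once */
  if first.var3 and last.var3 then output nodups;
  /* Otherwise every copy of the repeated combination goes to dups */
  else output dups;
run;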
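If you would rather keep one copy of each combination and set aside only the extra copies, PROC SORT can do that in a single step. This is a minimal sketch, not a replacement for the data step above: it rebuilds the sample data from the question, and the output names unique_keys and extra_copies are made up for illustration.

```sas
/* Recreate the sample data from the question */
data have;
  input Var1 $ Var2 $ Var3 $;
  datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;

/* NODUPKEY keeps the first row of each Var1/Var2/Var3 combination
   in unique_keys; DUPOUT= writes every later repeat to extra_copies.
   Both data set names are hypothetical. */
proc sort data=have out=unique_keys nodupkey dupout=extra_copies;
  by var1 var2 var3;
run;
```

With the four sample rows, unique_keys should hold three rows and extra_copies one (the second John PHIL PA). Note the difference from the data step: there, both copies of a repeated combination land in dups, while here one copy stays with the unique rows.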
Oh... Wow... Yeah...
I think I'm gonna go home. Well, no : )