My data is identified by a group of variables, say three. Each variable is not necessarily unique on its own, but the three together identify a unique observation. It looks like this:
Var1 Var2 Var3
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
Observations 1 and 4 are duplicates, which raises the question: how can I identify duplicates in this data?
With a single ID variable (and data sorted by ID) I can do:
data dups nodups;
set have;
by ID;
if first.ID and last.ID then output nodups;
else output dups;
run;
But I'm not sure how to do it in this situation. I could sort them out one variable at a time, but I wonder if there's a more efficient way.
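Just sort by all three vars, then test first. and last. on the final BY variable. first.var3 is set whenever var3 or any BY variable before it changes, so it marks the start of each distinct var1/var2/var3 combination: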
proc sort data=have;
by var1 var2 var3;
run;
data dups nodups;
set have;
by var1 var2 var3;
if first.var3 and last.var3 then output nodups;
else output dups;
run;
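To see how this behaves on the sample rows from the question, here is a minimal self-contained sketch. The DATALINES step and the PROC PRINT calls are my additions for illustration; it assumes the input dataset is named have, as in the code above, and that the default character length of 8 is enough for these values.

```sas
/* Build the four sample rows from the question (illustrative setup) */
data have;
  input Var1 $ Var2 $ Var3 $;
  datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;

/* BY-group processing requires the data to be sorted by the same keys */
proc sort data=have;
  by var1 var2 var3;
run;

data dups nodups;
  set have;
  by var1 var2 var3;
  /* Both flags are 1 only when the full three-variable key occurs exactly once */
  if first.var3 and last.var3 then output nodups;
  else output dups;
run;

/* Inspect the two result tables */
proc print data=dups; run;
proc print data=nodups; run;
```

With these four rows, dups should hold both John PHIL PA observations, and nodups should hold John CHIC IL and Mike PHIL PA. Note that every copy of a repeated key goes to dups, not only the second and later ones.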
Oh...Wao...Yeh...
I think I'm gonna go home. Well, no : )