My data identifies each record by a group of variables, say three. No single variable is unique on its own, but the combination of all three identifies a unique observation. It looks like this:
Var1 Var2 Var3
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
Observations 1 and 4 are duplicates, which brings me to my question: how can I identify duplicates in this data?
If the ID were a single variable, I could do:
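/* assumes HAVE is already sorted by ID */
data dups nodups;
  set have;
  by ID;
  if first.ID and last.ID then output nodups; /* ID appears exactly once */
  else output dups;                           /* ID appears more than once */
run;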
But I'm not sure how to do this when the ID is made up of several variables. I could sort them out one by one, but I wonder if there's a more efficient way.
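Just sort by all three variables and apply the same FIRST./LAST. logic to the last BY variable. With several BY variables, FIRST.var3 is set whenever var1, var2 or var3 changes, so it marks the start of each unique combination: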
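proc sort data=have;
  by var1 var2 var3;
run;

data dups nodups;
  set have;
  by var1 var2 var3;
  /* a row that is both first and last in its var1-var2-var3 group
     is a combination that occurs only once */
  if first.var3 and last.var3 then output nodups;
  else output dups;
run;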
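If you only need to split off the extra copies, PROC SORT can do it in one step with the NODUPKEY and DUPOUT= options (a sketch, assuming SAS 9.1 or later, where DUPOUT= is available). Note the different semantics: NODUPKEY keeps the first row of each combination in OUT= and writes only the additional copies to DUPOUT=, whereas the DATA step above sends every copy of a repeated combination to DUPS. The sample data is the table from the question.

```sas
/* Sample data from the question */
data have;
  input Var1 $ Var2 $ Var3 $;
  datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;

/* NODUPKEY keeps the first row of each Var1-Var2-Var3 combination in OUT=;
   DUPOUT= receives the remaining copies. HAVE itself is left unchanged. */
proc sort data=have out=nodups nodupkey dupout=dups;
  by Var1 Var2 Var3;
run;
```

Here NODUPS gets three rows (one per distinct combination) and DUPS gets the single extra John PHIL PA row.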
Oh... wow... yeah...
I think I'm gonna go home. Well, no :)