My data is identified by a group of variables, say three. No single variable is necessarily unique on its own, but together the three variables identify a unique observation. It looks like this:
Var1 Var2 Var3
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
Observations 1 and 4 are duplicates, which raises the question: how can I identify duplicates in this data?
With a single ID variable I can do:
data dups nodups;
set have;
by ID;
if first.ID and last.ID then output nodups;
else output dups;
run;
But I'm not sure how to do this with several variables. I could handle them one by one, but I wonder if there's a more efficient way.
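Just sort by all three vars, then test FIRST./LAST. on the last BY variable. first.var3 and last.var3 are also set whenever var1 or var2 changes, so they mark the boundaries of each full three-variable combination: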
proc sort data=have;
by var1 var2 var3;
run;
data dups nodups;
set have;
by var1 var2 var3;
if first.var3 and last.var3 then output nodups;
else output dups;
run;
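If you would rather keep one copy of each combination and set aside only the extra copies, PROC SORT can do that in a single step with the NODUPKEY and DUPOUT= options. This is only a sketch built on the sample rows from the question; the output data set names unique and extras are made up for illustration.

```sas
/* Sample data from the question */
data have;
    input var1 $ var2 $ var3 $;
    datalines;
John PHIL PA
Mike PHIL PA
John CHIC IL
John PHIL PA
;
run;

/* Keep the first row of each var1/var2/var3 combination in UNIQUE
   and write every later repeat to EXTRAS (both names are hypothetical). */
proc sort data=have out=unique nodupkey dupout=extras;
    by var1 var2 var3;
run;
```

With the four sample rows, unique ends up with three observations and extras with one (the second John PHIL PA). Note the difference from the DATA step above, which sends both copies of a duplicated combination to dups and keeps only the combinations that occur exactly once in nodups.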
Oh... wow... yeah...
I think I'm gonna go home. Well, no : )