i have a dataset that has two ids: id1 and id2. i want to find out if there are any duplicated records for the combined id1 and id2. i created a code like this but it don't work. are there any easy way to do this in sas proc sql?
data combine;
set dataset;
combinedid=catx(id1, id2);
run;
PROC SQL;
SELECT id1, id2,
FREQ(combinedid) AS dupe
FROM dataset
GROUP BY combinedid
HAVING dupe GE 2;
quit;
Run a count and use HAVING:
proc sql;
select id1, id2, count(*) as count
from have
group by id1, id2
having calculated count > 1;
quit;
Or run a proc freq or proc summary and filter out the observations with a count > 1.
Thank you all. Both proc sql and proc freq works with my case:
proc sql;
select id1, id2, count(*) as count
from have
group by id1, id2
having calculated count > 1;
quit;
PROC FREQ data=have;
TABLES id1*id2 / noprint out=duplist;
RUN;
PROC PRINT data=duplist;
WHERE count ge 2;
RUN;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.