i have a dataset that has two ids: id1 and id2. i want to find out if there are any duplicated records for the combined id1 and id2. i created a code like this but it don't work. are there any easy way to do this in sas proc sql?
data combine;
set dataset;
combinedid=catx(id1, id2);
run;
PROC SQL;
SELECT id1, id2,
FREQ(combinedid) AS dupe
FROM dataset
GROUP BY combinedid
HAVING dupe GE 2;
quit;
Run a count and use HAVING:
proc sql;
select id1, id2, count(*) as count
from have
group by id1, id2
having calculated count > 1;
quit;
Or run a proc freq or proc summary and filter out the observations with a count > 1.
Thank you all. Both proc sql and proc freq works with my case:
proc sql;
select id1, id2, count(*) as count
from have
group by id1, id2
having calculated count > 1;
quit;
PROC FREQ data=have;
TABLES id1*id2 / noprint out=duplist;
RUN;
PROC PRINT data=duplist;
WHERE count ge 2;
RUN;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.