Hi folks,
I have a data set on emergency healthcare use by children, so one id can have multiple admission dates to emergency wards. My problem is that the same id appears with the same admission date more than once.
So, I want to find duplicates based on two variables, 'id' and 'admissiondate', and create a data set without them. Before I do that, I need to know how many duplicate entries the data set contains, for verification purposes.
Thanks
S
Hi @sks521
You can use PROC SORT with the NODUPKEY option to do that.
The DUPOUT= option will output a dataset 'duplicates' with the duplicate records, based on the key 'id admissiondate' specified in the BY statement.
The OUT= option will output a dataset 'want' with no duplicate records.
proc sort data=have out = want dupout = duplicates nodupkey;
by id admissiondate;
run;
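For the verification count, one option is to read the number of rows in the DUPOUT= data set once the sort has run. The SAS log for the PROC SORT step should also report how many observations with duplicate key values were deleted. A minimal sketch, assuming the step above has created 'duplicates'; the macro variable name n_dups is only illustrative:

```sas
/* Count the records that NODUPKEY removed (one row per removed record). */
/* n_dups is a hypothetical macro variable name chosen for this example. */
proc sql noprint;
select count(*) into :n_dups trimmed
from duplicates;
quit;

/* Write the count to the log for verification. */
%put NOTE: &n_dups duplicate records were removed.;
```

The number of rows in 'have' should equal the rows in 'want' plus the rows in 'duplicates'.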
The best method now is:
proc sort data=ds1 out =ds2 dupout = dup nodupkey;
by id admissiondate;
run;
Sort your data by ID and AdmissionDate, and then use a data step:
data want;
do NDups=0 by 1 until(last.AdmissionDate);
set have;
by ID AdmissionDate;
end;
run;
The NDups variable should then contain the number of duplicates for each BY group (0 when the ID and AdmissionDate pair occurs only once).
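To get a single figure for verification, the per-group counts can be added up. A minimal sketch, assuming 'want' was created by the data step above and so holds one row per ID and AdmissionDate group; the column aliases are only illustrative:

```sas
/* Summarise the NDups variable produced by the data step above. */
proc sql;
select count(*)       as n_groups,         /* distinct ID/AdmissionDate pairs */
       sum(NDups > 0) as groups_with_dups, /* pairs that occur more than once */
       sum(NDups)     as total_duplicates  /* extra records across all pairs  */
from want;
quit;
```

Here total_duplicates should match the number of records that a NODUPKEY sort on the same key would remove.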
This will create a data set that has each id and admissiondate combination that occurs more than once, with a count of how many records share that combination.
proc freq data=have noprint;
tables id*admissiondate / out=work.counts (drop=percent where=(count>1));
run;