Hi all,
I have a large data set that includes multiple rows for some subjects. Can anyone tell me how to (1) view all of the entries for subjects that have more than one row and (2) remove all entries except the first one for each subject, ideally placing the removed entries in a different data set?
tia for any help,
leslie
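PROC SORT with the NODUPKEY option handles this in one step: OUT= receives the first row for each subject, and DUPOUT= receives the rows that were dropped. Because OUT= is specified, the original data set is left unchanged. Note that dupSubj holds only the second and later rows for each subject, not the first one.
/* Sample data: subjects 1, 2 and 3 appear more than once */
data have;
  do subject=3,2,4,1,1,3,2,2;
    otherVar+1;
    output;
  end;
  stop;
run;
/* Keep the first row per subject in firstSubj; send the rest to dupSubj */
proc sort nodupkey
  data=have
  out=firstSubj
  dupout=dupSubj
  ;
  by subject;
run;
title 'firstSubj';
proc print data=firstSubj;
run;
title 'dupSubj';
proc print data=dupSubj;
run;
/* Clear the title */
title;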
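For part 1, if the goal is to see every row belonging to a subject that appears more than once (including the first row), one possible approach is a PROC SQL query that counts rows per subject. This is only a sketch against the sample data set have from above; the output name allDup is made up for illustration. PROC SQL should remerge the group count back onto the detail rows and write a note about that to the log.

```sas
/* Sketch: list every row for subjects that occur more than once.  */
/* Assumes the sample data set HAVE created above; ALLDUP is a     */
/* hypothetical output name.                                       */
proc sql;
  create table allDup as
  select *
  from have
  group by subject
  having count(*) > 1        /* keep only subjects with 2+ rows */
  order by subject, otherVar;
quit;

title 'allDup';
proc print data=allDup;
run;
title;
```

With the sample data, subject 4 occurs once and so should be left out, while all rows for subjects 1, 2 and 3 are kept.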
For part 2:
/* UNTESTED CODE */
proc sql;
create table want as select * from have
group by subject having count(subject)=1;
quit;
Thank you! I am not sure I need to delete anything yet, so I have not tried your code. Does it remove all entries for subjects where count>1, including the first one?
thanks again,
leslie
You are correct, and so I withdraw my solution.
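For anyone who prefers a DATA step for part 2, a minimal sketch using BY-group processing follows. It assumes the sample data set have from earlier in the thread; the names sorted, firstSubj2 and dupSubj2 are hypothetical. It also relies on PROC SORT preserving the original order of rows within each subject, which I believe is the default (EQUALS) behavior.

```sas
/* Sketch: split HAVE into first rows and remaining rows per subject. */
/* SORTED, FIRSTSUBJ2 and DUPSUBJ2 are hypothetical names.            */
proc sort data=have out=sorted;
  by subject;
run;

data firstSubj2 dupSubj2;
  set sorted;
  by subject;
  /* first.subject is 1 on the first row of each BY group */
  if first.subject then output firstSubj2;
  else output dupSubj2;
run;
```

Neither step modifies have, so nothing is deleted from the original data.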
Thank you!