Hi,
I have a dataset which is supposed to be at the person-level but in reality has some duplicate rows. It has an ID variable, 2 date variables, and several other categorical variables.
ID Date1 Date2 Categ1.......Categ10
1 1/3/2014 4/9/2013 abc xyz
1 2/15/2015 def jkl
2 10/9/2013 abc def
2 11/4/2014 jkl xyz
3 2/28/2012 abc xyz
3 3/15/2013 9/7/2014 def jkl
What I want to do when there are dupes is to take the row with the latest Date1. I know how to do that by itself:
data want;
set have;
by ID Date1;
if last.ID then output;
run;
The problem is that in cases where a person has a value for Date2 in the row with the earlier Date1, that value of Date2 should be filled into the later Date1 row (ie, the row that will be output into the new dataset without duplicates). So for person 1 for example, their second row would be output and would have Date1=2/15/2015 and Date2=4/9/2013. All the other variables should maintain their values from the original Date1=2/15/2015 row). I need to be able to do this without overwriting Date2 in case the later/output row for a person already has a value for Date2--ie, person 3 should have Date1=3/15/2013 and Date2=9/7/2014 in the output dataset, with all the other variables corresponding to the Date1=3/15/2013 row.
Any help is much appreciated.
I would do:
data want;
do until(last.id);
set have(rename=date2=d2); by id;
if not missing(d2) then date2 = d2;
end;
format date2 mmddyy10.;
drop d2;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.