BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have a dataset which is supposed to be at the person-level but in reality has some duplicate rows. It has an ID variable, 2 date variables, and several other categorical variables.

 

ID     Date1        Date2      Categ1.......Categ10

1      1/3/2014    4/9/2013    abc              xyz

1      2/15/2015                   def              jkl

2      10/9/2013                   abc             def

2      11/4/2014                   jkl               xyz

3      2/28/2012                  abc              xyz

3      3/15/2013  9/7/2014    def              jkl

 

What I want to do when there are dupes is to take the row with the latest Date1. I know how to do that by itself:

 

data want;

set have;

by ID Date1;

if last.ID then output;

run;

 

The problem is that in cases where a person has a value for Date2 in the row with the earlier Date1, that value of Date2 should be filled into the later Date1 row (ie, the row that will be output into the new dataset without duplicates). So for person 1 for example, their second row would be output and would have Date1=2/15/2015 and Date2=4/9/2013. All the other variables should maintain their values from the original Date1=2/15/2015 row). I need to be able to do this without overwriting Date2 in case the later/output row for a person already has a value for Date2--ie, person 3 should have Date1=3/15/2013 and Date2=9/7/2014 in the output dataset, with all the other variables corresponding to the Date1=3/15/2013 row.

 

Any help is much appreciated.

 

2 REPLIES 2
Reeza
Super User
Use retain and first with an IF condition.

Some variation of the following should work.

retain date_keep;
IF first.id then date_keep=date2;
if last.id and missing(date2) then date2=date_keep;
if last.id then output;
PGStats
Opal | Level 21

I would do:

 

data want;
do until(last.id);
    set have(rename=date2=d2); by id;
    if not missing(d2) then date2 = d2;
    end;
format date2 mmddyy10.;
drop d2;
run;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1146 views
  • 0 likes
  • 3 in conversation