DATA Step, Macro, Functions and more

Filling a variable across multiple rows conditionally

Frequent Contributor
Posts: 138

Filling a variable across multiple rows conditionally



I have a dataset which is supposed to be at the person-level but in reality has some duplicate rows. It has an ID variable, 2 date variables, and several other categorical variables.


ID     Date1        Date2      Categ1.......Categ10

1      1/3/2014    4/9/2013    abc              xyz

1      2/15/2015                   def              jkl

2      10/9/2013                   abc             def

2      11/4/2014                   jkl               xyz

3      2/28/2012                  abc              xyz

3      3/15/2013  9/7/2014    def              jkl


What I want to do when there are dupes is to take the row with the latest Date1. I know how to do that by itself:


data want;

set have;

by ID Date1;

if last.ID then output;



The problem is that in cases where a person has a value for Date2 in the row with the earlier Date1, that value of Date2 should be filled into the later Date1 row (ie, the row that will be output into the new dataset without duplicates). So for person 1 for example, their second row would be output and would have Date1=2/15/2015 and Date2=4/9/2013. All the other variables should maintain their values from the original Date1=2/15/2015 row). I need to be able to do this without overwriting Date2 in case the later/output row for a person already has a value for Date2--ie, person 3 should have Date1=3/15/2013 and Date2=9/7/2014 in the output dataset, with all the other variables corresponding to the Date1=3/15/2013 row.


Any help is much appreciated.


Super User
Posts: 17,912

Re: Filling a variable across multiple rows conditionally

Use retain and first with an IF condition.

Some variation of the following should work.

retain date_keep;
IF then date_keep=date2;
if and missing(date2) then date2=date_keep;
if then output;
Respected Advisor
Posts: 4,659

Re: Filling a variable across multiple rows conditionally

I would do:


data want;
do until(;
    set have(rename=date2=d2); by id;
    if not missing(d2) then date2 = d2;
format date2 mmddyy10.;
drop d2;
Ask a Question
Discussion stats
  • 2 replies
  • 3 in conversation