DATA Step, Macro, Functions and more

Filling variable from another row with same ID

Reply
Frequent Contributor
Posts: 138

Filling variable from another row with same ID

Hi,

I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).

ID     categ_1     categ_2     date

1             a                         1/2/2015

1             b            x            2/14/2015

2             c            y            10/9/2013

2             c            y             5/4/2012

Basically, my logic for which record to keep for each person is this:

     1. If any of their values of categ_1=a, that record should be kept

     2. If none of their values of categ_1=a, then the record with the most recent date should be kept

So all I did was sort by categ_1 and date (descending), then output the top record.

The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always  be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.

Any help is much appreciated.

Occasional Contributor
Posts: 13

Re: Filling variable from another row with same ID

Posted in reply to Walternate

How about something like this:

/* order your input set */

proc sql;

create table part_output as

select *

from input

order by id, categ_1, categ_2, date desc;

/* select records using by processing and a retained flag (twrot) */

data myoutput;

     set part_output;

     retain twrot;

     by id;

     if first.id and categ_1 = 'a' then do;

           twrot = 0;

           output;

     end;

     else if twrot = 1 then do;

           twrot = 0;

           output;

     end;

     if last.id then do;

           twrot = 1;

     end;

run;

Super User
Super User
Posts: 7,039

Re: Filling variable from another row with same ID

UPDATE statement is useful for this type of problem.

data have ;

  input ID     categ_1  $ categ_2 $    date :mmddyy10. ;

  format date yymmdd10.;

cards;

1             a            .             1/2/2015

1             b            x            2/14/2015

2             c y            10/9/2013

2             c            y             5/4/2012

;;;;

proc sql ;

create view middle as

select *,(categ_1 = 'a') as isa

from have

order by id, isa , date

;

quit;

data want ;

  update middle (obs=0) middle ;

  by id ;

  drop isa ;

run;

proc print; run;


Obs ID categ_1 categ_2 date

1 1 a x 2015-01-02

2 2 c y 2013-10-09

Super User
Posts: 10,023

Re: Filling variable from another row with same ID

Posted in reply to Walternate

Code: Program1.sas

data have ;
  input ID   categ_1  $ categ_2 $   date :mmddyy10. ;
  format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
do until(last.id);
  set have;
  by id;
  if not missing(categ_2) then _categ_2=categ_2;
  if not found then do;
   _categ_1=categ_1;
   _date=date;
  end;
  if categ_1='a' then do;
   _categ_1 =categ_1 ;
   _date=date;
   found=1;
  end;
end;
keep id _:;
format _date mmddyy10.;
run;
Ask a Question
Discussion stats
  • 3 replies
  • 1393 views
  • 0 likes
  • 4 in conversation