Filling variable from another row with same ID

Walternate · Posted 08-06-2015 03:07 PM

Hi,

I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).

ID categ_1 categ_2 date

1 a 1/2/2015

1 b x 2/14/2015

2 c y 10/9/2013

2 c y 5/4/2012

Basically, my logic for which record to keep for each person is this:

1. If any of their values of categ_1=a, that record should be kept

2. If none of their values of categ_1=a, then the record with the most recent date should be kept

So all I did was sort by categ_1 and date (descending), then output the top record.

The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.

Any help is much appreciated.

Sonywell · Posted 08-06-2015 04:51 PM

How about something like this:

/* order your input set */

proc sql;

create table part_output as

select *

from input

order by id, categ_1, categ_2, date desc;

/* select records using by processing and a retained flag (twrot) */

data myoutput;

set part_output;

retain twrot;

by id;

if first.id and categ_1 = 'a' then do;

twrot = 0;

output;

end;

else if twrot = 1 then do;

twrot = 0;

output;

end;

if last.id then do;

twrot = 1;

end;

run;

Tom · Posted 08-06-2015 05:08 PM

UPDATE statement is useful for this type of problem.

data have ;

input ID categ_1 $ categ_2 $ date :mmddyy10. ;

format date yymmdd10.;

cards;

1 a . 1/2/2015

1 b x 2/14/2015

2 c y 10/9/2013

2 c y 5/4/2012

;;;;

proc sql ;

create view middle as

select *,(categ_1 = 'a') as isa

from have

order by id, isa , date

;

quit;

data want ;

update middle (obs=0) middle ;

by id ;

drop isa ;

run;

proc print; run;

Obs ID categ_1 categ_2 date

1 1 a x 2015-01-02

2 2 c y 2013-10-09

Ksharp · Posted 08-07-2015 09:04 AM

Code: Program1.sas

data have ;
  input ID   categ_1  $ categ_2 $   date :mmddyy10. ;
  format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
 do until(last.id);
  set have;
  by id;
  if not missing(categ_2) then _categ_2=categ_2;
  if not found then do;
   _categ_1=categ_1;
   _date=date;
  end;
  if categ_1='a' then do;
   _categ_1 =categ_1 ;
   _date=date;
   found=1;
  end;
 end;
keep id _:;
format _date mmddyy10.;
run;

karanfil · Posted 07-19-2020 07:16 PM

Hello All,

I have file ( attached) .I have data like in first table and I would like to have data looks like second table. If anybody help me code for this I really appreciate it.

Thanks