Solved: Update information from previous row

Joey2 · Posted 02-28-2021 02:17 PM

I have a dataset with first-year visit information for many patients. I want to carry some of that first-year information into the second and third years. For example, for ID=1 the age should be 51 and 52 in the 2nd and 3rd years, while gender stays male.

data have;
input id age visit gender $;
datalines;
1 50 1 male
1 .  2  .
1 .  3  .
2 60 1 female
2 .  2  .
2 .  3  .
;
run;

What I want:

id	age	visit	gender
1	50	1	male
1	51	2	male
1	52	3	male
2	60	1	female
2	61	2	female
2	62	3	female

Thank you.

PeterClemmensen · Posted 02-28-2021 02:31 PM

Try this


data have;
input id age visit gender $;
datalines;
1 50 1 male
1 .  2  .
1 .  3  .
2 60 1 female
2 .  2  .
2 .  3  .
;
run;

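data want(drop = a g);
   set have;
   by id;
   if first.id then do;   /* first visit: remember the recorded values */
      g = gender;
      a = age;
   end;
   else a + 1;            /* later visits: one year older per row */

   gender = g;            /* write the carried values back */
   age    = a;

   retain a g;            /* keep a and g across observations */
run;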


Joey2 · Posted 02-28-2021 03:01 PM

It works! Thank you.

Tom · Posted 02-28-2021 03:10 PM

You can actually trick the UPDATE statement into doing this for you. The purpose of the UPDATE statement is to apply a transaction dataset to an existing dataset. When a variable in the transaction dataset is missing, the current value is not replaced.

You have to have a BY group, and normally the BY variables should uniquely identify the observations in the source dataset. But you can have multiple transactions per BY group. Your data already has a BY variable, ID, but it is not unique. If you treat ALL of the observations as transactions, you can avoid that problem. So use the OBS=0 dataset option to start with an empty source dataset. To output more than one observation per BY group, just add an explicit OUTPUT statement.

data want;
  update have(obs=0) have;
  by id;
  output;
run;

mkeintz · Posted 02-28-2021 03:44 PM

For future reference, consider a strategic combination of DROP= and POINT=:

data have;
input id age visit gender $;
datalines;
1 50 1 male
1 .  2  .
1 .  3  .
2 60 1 female
2 .  2  .
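2 .  3  .
;
run;

data want;
  set have (drop=age gender);            /* read only id and visit on every row */
  by id;
  if first.id then set have point=_n_;   /* first row of each ID: re-read it with age and gender */
run;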

The benefit of this approach is that it avoids the need for an explicit RETAIN statement, because variables read by a SET statement are retained automatically.


Tom · Posted 02-28-2021 03:50 PM

That trick only works when the non-missing values appear only on the first observation in the BY group.
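
To illustrate the difference, here is a minimal sketch using a hypothetical dataset HAVE2 (not from the question) in which WEIGHT is recorded again at visit 3. The dataset and variable names are made up for the example. The UPDATE approach carries each new value forward, while the DROP=/POINT= approach keeps repeating the value from the first observation of the BY group.

```sas
/* Hypothetical data: weight is measured again at visit 3 */
data have2;
  input id visit weight;
datalines;
1 1 70
1 2 .
1 3 72
1 4 .
;
run;

/* UPDATE trick: a non-missing transaction value replaces the current one */
data want_update;
  update have2(obs=0) have2;
  by id;
  output;
run;
/* weight by visit: 70, 70, 72, 72 */

/* DROP=/POINT= trick: weight is read only on the first row of each ID */
data want_point;
  set have2(drop=weight);
  by id;
  if first.id then set have2 point=_n_;
run;
/* weight by visit: 70, 70, 70, 70 */
```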

Joey2 · Posted 02-28-2021 07:28 PM

Your code would not give the updated age for the 2nd and 3rd years:

id	visit	age	gender
1	1	50	male
1	2	50	male
1	3	50	male
2	1	60	female
2	2	60	female
2	3	60	female

We need one more step to correct it:

data want;
set want;
if visit=2 then age=age+1;
if visit=3 then age=age+2;
run;
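
If each visit is exactly one year after the previous one, the two IF statements can probably be collapsed into a single assignment that works for any number of visits. This is only a sketch: it assumes WANT holds the first-year age on every row (as in the output above) and that VISIT is numbered 1, 2, 3, and so on. The dataset name WANT_AGED is made up for illustration.

```sas
data want_aged;
  set want;
  /* visit 1 keeps the recorded age; visit k adds k-1 years */
  age = age + (visit - 1);
run;
```

Writing to a new dataset instead of overwriting WANT means the step can be rerun without adding the years a second time.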
