I have a dataset with many patients first year visit information. Then I want to get some information from the first year data into the second and third year. For example, for ID=1, then his age will be 51, 52 in the 2nd and 3rd year, while gender is still male.
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
what I want
id | age | visit | gender |
1 | 50 | 1 | male |
1 | 51 | 2 | male |
1 | 52 | 3 | male |
2 | 60 | 1 | female |
2 | 61 | 2 | female |
Thank you.
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
You can actually trick the UPDATE statement into doing this for you. The purpose of the UPDATE statement is to allow you to apply a transaction dataset to an existing dataset. When the variable is the transaction dataset is missing then the current value is not replaced.
You have to have a BY group and normally the BY variables should uniquely identify the observations in the source dataset. But you can have multiple transactions per BY group. Your data already has a BY variable, ID, but it is not unique. But if you just treat ALL of the observations as the transactions your can avoid that problem. So use the OBS=0 dataset option to start with an empty source dataset. To output more than one observation per BY group just add an explicit OUTPUT statement.
data want;
update have(obs=0) have;
by id;
output;
run;
For future reference, consider a strategic combination of DROP= and POINT=:
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
run;
data want;
set have (drop=age gender);
by id;
if first.id then set have point=_n_;
run;
The benefit of this approach is that it avoids the need for an explicit RETAIN statement.
That trick will only work when the non-missing values only appear on the first observation in the BY group.
Your code would not give the updated age for 2nd and 3rd year:
1 | 1 | 50 | male |
1 | 2 | 50 | male |
1 | 3 | 50 | male |
2 | 1 | 60 | female |
2 | 2 | 60 | female |
2 | 3 | 60 | female |
We need one more step to correct it:
data want;
set want;
if visit=2 then age=age+1;
if visit=3 then age=age+2;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.