I have a dataset with many patients first year visit information. Then I want to get some information from the first year data into the second and third year. For example, for ID=1, then his age will be 51, 52 in the 2nd and 3rd year, while gender is still male.
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
what I want
id | age | visit | gender |
1 | 50 | 1 | male |
1 | 51 | 2 | male |
1 | 52 | 3 | male |
2 | 60 | 1 | female |
2 | 61 | 2 | female |
Thank you.
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
You can actually trick the UPDATE statement into doing this for you. The purpose of the UPDATE statement is to allow you to apply a transaction dataset to an existing dataset. When the variable is the transaction dataset is missing then the current value is not replaced.
You have to have a BY group and normally the BY variables should uniquely identify the observations in the source dataset. But you can have multiple transactions per BY group. Your data already has a BY variable, ID, but it is not unique. But if you just treat ALL of the observations as the transactions your can avoid that problem. So use the OBS=0 dataset option to start with an empty source dataset. To output more than one observation per BY group just add an explicit OUTPUT statement.
data want;
update have(obs=0) have;
by id;
output;
run;
For future reference, consider a strategic combination of DROP= and POINT=:
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
run;
data want;
set have (drop=age gender);
by id;
if first.id then set have point=_n_;
run;
The benefit of this approach is that it avoids the need for an explicit RETAIN statement.
That trick will only work when the non-missing values only appear on the first observation in the BY group.
Your code would not give the updated age for 2nd and 3rd year:
1 | 1 | 50 | male |
1 | 2 | 50 | male |
1 | 3 | 50 | male |
2 | 1 | 60 | female |
2 | 2 | 60 | female |
2 | 3 | 60 | female |
We need one more step to correct it:
data want;
set want;
if visit=2 then age=age+1;
if visit=3 then age=age+2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.