- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset with many patients first year visit information. Then I want to get some information from the first year data into the second and third year. For example, for ID=1, then his age will be 51, 52 in the 2nd and 3rd year, while gender is still male.
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
what I want
id | age | visit | gender |
1 | 50 | 1 | male |
1 | 51 | 2 | male |
1 | 52 | 3 | male |
2 | 60 | 1 | female |
2 | 61 | 2 | female |
Thank you.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
;
run;
data want(drop = a g);
set have;
by id;
if first.id then do;
g = gender;
a = age;
end;
else a + 1;
gender = g;
age = a;
retain a g;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can actually trick the UPDATE statement into doing this for you. The purpose of the UPDATE statement is to allow you to apply a transaction dataset to an existing dataset. When the variable is the transaction dataset is missing then the current value is not replaced.
You have to have a BY group and normally the BY variables should uniquely identify the observations in the source dataset. But you can have multiple transactions per BY group. Your data already has a BY variable, ID, but it is not unique. But if you just treat ALL of the observations as the transactions your can avoid that problem. So use the OBS=0 dataset option to start with an empty source dataset. To output more than one observation per BY group just add an explicit OUTPUT statement.
data want;
update have(obs=0) have;
by id;
output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For future reference, consider a strategic combination of DROP= and POINT=:
data have;
input id age visit gender $;
datalines;
1 50 1 male
1 . 2 .
1 . 3 .
2 60 1 female
2 . 2 .
2 . 3 .
run;
data want;
set have (drop=age gender);
by id;
if first.id then set have point=_n_;
run;
The benefit of this approach is that it avoids the need for an explicit RETAIN statement.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That trick will only work when the non-missing values only appear on the first observation in the BY group.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your code would not give the updated age for 2nd and 3rd year:
1 | 1 | 50 | male |
1 | 2 | 50 | male |
1 | 3 | 50 | male |
2 | 1 | 60 | female |
2 | 2 | 60 | female |
2 | 3 | 60 | female |
We need one more step to correct it:
data want;
set want;
if visit=2 then age=age+1;
if visit=3 then age=age+2;
run;