HI everyone,
I have a data set that looks like this:
data have;
input nr subject service $ weight encounter :yymmdd10.;
format encounter yymmddd10.;
datalines;
1 1 heart 67 20130101
2 1 lung 67 20130131
3 1 gastro 69 20130302
4 1 heart 68 20130401
5 1 neuro 66 20130501
6 2 neuro 80 20130531
7 2 neuro 82 20130630
8 2 neuro 82 20130730
9 2 neuro 81 20130829
10 2 neuro 83 20130928
11 2 neuro 82 20131028
12 3 lung 55 20131127
13 3 lung 55 20131227
14 3 gastric 57 20140126
15 3 heart 57 20140225
16 3 lung 55 20140327
17 4 neuro 88 20140426
18 4 neuro 88 20140526
19 4 neuro 88 20140625
20 4 lung 86 20140725
21 4 lung 86 20140824
;
run;
I am trying to clean this dataset and use it for survival analysis, basically, I am trying to come up with a data set that contains the variables nr, subject, service,weight and encounter with one row per subject.
I will need to select the row that corresponds to the last encounter with service 'heart' for each subject, for subject that have not had an encounter with service 'heart' I need to select the latest encounter date for that subject.
I also need to have a new variable called 'censored' equal to 0 if the subject had an encounter with hear and 1 if not. SO the output should look like this:
nr | subject | service | weight | encounter | censored |
4 | 1 | heart | 68 | 20130401 | 0 |
11 | 2 | neuro | 82 | 20131028 | 1 |
15 | 3 | heart | 57 | 20140225 | 0 |
21 | 4 | lung | 86 | 20140824 | 1 |
Now this is doable with a few proc sqls but I was wondering if anyone could give an easier solution maybe using a data-- set -- by combined with arrays and a do loop?
Thank you\
Try this
data want(drop=p);
do _N_ = 1 by 1 until (last.subject);
set have;
by subject;
if service = "heart" then p = _N_;
end;
if not p then p=_N_;
do _N_ = 1 to _N_;
set have;
censored = (service = "heart");
if _N_ = p then output;
end;
run;
Try this
data want(drop=p);
do _N_ = 1 by 1 until (last.subject);
set have;
by subject;
if service = "heart" then p = _N_;
end;
if not p then p=_N_;
do _N_ = 1 to _N_;
set have;
censored = (service = "heart");
if _N_ = p then output;
end;
run;
@PeterClemmensen: impressive
Anytime. Thank you for a clear question.
A well-stated question is the fast path to a working answer. Kudos.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.