I am not that advanced in SAS programming and have currently run into some problems regarding how to call a specific value closest to a given value. The problem consits of two "subproblems"
We are looking at outcomes after a time period of 12, 26, 52, 104, 156, 208 and 260 weeks, we have acceptet an interval of +/- 4 weeks for the time points as the visit date is not always the exact date corresponding to the time period. This give rise to the first problem that some patients might multiple visits in the time frame.
Here we want the datapoint closest to the time point. E.g. in the example provided for ID = 1, visit 2 (8 weeks), visit 3 (12 weeks), and visit 4 (14 weeks) all satisfy the condition of 12 weeks +/- 4 weeks. The one we would like to be our 12 weeks visit should be visit 3.
id | visit | Duration_weeks | score1 | score2 |
1 | 1 | 0 | 10 | 10 |
1 | 2 | 8 | 7 | 5 |
1 | 3 | 12 | 7 | 5 |
1 | 4 | 14 | 5 | 5 |
1 | 5 | 23 | 2 | 2 |
1 | 6 | 26 | . | . |
1 | 7 | 31 | 2 | 2 |
2 | 1 | 0 | 10 | 12 |
2 | 2 | 10 | . | . |
2 | 3 | 14 | 5 | 5 |
2 | 4 | 26 | 3 | 3 |
Now the second problem is that we are looking at two different outcomes: score1 and score2. For some visits score1 and score2 is not recorded, in this scenario, if multiple visits are in the time period, we want to call the visit which have both scores recorded. and if only one then score1>score2. E.g. in the example provided for ID = 2 we want visit 3 to be the one called.
I can code this with lead and lag functions (atleast the first part), however this is very time consuming and syntax heavy. Is there a smartere way to accomplish this?
I would start by finding the records in the correct intervals and giving them a rank by scores available and a calculated time difference:
data ordered;
set have;
do scheduled=12, 26, 52, 104, 156, 208,260;
if scheduled-4<=Duration_weeks<=scheduled+4 then do;
rank=missing(score1)*2+missing(score2); /* Score1 has higher priority, high rank is bad */
diff=abs(scheduled-Duration_weeks);
output;
end;
end;
run;
Then, it is very simple:
proc sort;
by id scheduled rank diff;
run;
data want;
set ordered;
by id scheduled rank diff;
if first.scheduled;
run;
If I read your specification correctly, you want to prioritize RANK over DIFF. If not, switch the two in the BY statements.
I need a bit more information than this.. Does this mean you want 7 different data sets? One for each value of 12, 26, 52, 104, 156, 208 and 260?
I would start by finding the records in the correct intervals and giving them a rank by scores available and a calculated time difference:
data ordered;
set have;
do scheduled=12, 26, 52, 104, 156, 208,260;
if scheduled-4<=Duration_weeks<=scheduled+4 then do;
rank=missing(score1)*2+missing(score2); /* Score1 has higher priority, high rank is bad */
diff=abs(scheduled-Duration_weeks);
output;
end;
end;
run;
Then, it is very simple:
proc sort;
by id scheduled rank diff;
run;
data want;
set ordered;
by id scheduled rank diff;
if first.scheduled;
run;
If I read your specification correctly, you want to prioritize RANK over DIFF. If not, switch the two in the BY statements.
Thank you, this is exactly what I needed - it works perfect.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.