Using SAS 9.4
I have a variable C_mos indicated the months from event A to event B and I want to create a new variable and choose the C_v score that is closest to 6 month between 5-7 months and leave the rest blank. However, I want to prioritize 6 months as the closest but then prioritize 7 months over 5 months as they have reached the 6 month threshold. So I envision the data to look like below:
ID | C_MOS | C_V | newvar |
1 | 5.25 | 1 | . |
1 | 6.02 | 2 | 2 |
1 | 7.01 | 3 | . |
1 | 5.98 | 4 | . |
1 | 10 | 5 | . |
1 | 5 | 6 | . |
2 | 5.02 | 1 | . |
2 | 5.5 | 2 | . |
2 | 7.5 | 3 | . |
2 | 7.06 | 4 | 4 |
2 | 8 | 5 | . |
So as this shows the closest to 6.0 is taken as long as 6 is achieved but where there is not a 6 month score month 7 is given the priority over month 5 even though the month 5 (5.02) is closer to 6.0 than then 7 month (7.06). Thank you
Then you need to read each ID series twice. The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found). Note x in the range [6,8) means 6<=x<8.
Take _x67 if it exists, otherwise take _x5 as the optimal _X value.
In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.
data have;
input
ID C_MOS C_V expected_var;
datalines;
1 5.25 1 .
1 6.02 2 2
1 7.01 3 .
1 5.98 4 .
1 10 5 .
1 5 6 .
2 5.02 1 .
2 5.5 2 .
2 7.5 3 .
2 7.06 4 4
2 8 5 .
run;
data want (drop=_:);
do until (last.id);
set have;
by id;
if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
if (5<=c_mos<6) then _x5=max(_x5,c_mos);
end;
_x=coalesce(_x67,_x5);
do until (last.id);
set have;
by id;
newvar=ifn(c_mos=_x,c_v,.);
output;
end;
run;
The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID. The first loop establishes _X5 and _X67. The second loop uses those values to identify the desired record.
If there is a tie, then two records will get a value for NEWVAR. And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.
You want to choose "c_v score that is closest to 6 month", yet for ID=2 you choose 7.06 rather than the much closer 5.5. I don't understand the rule. I presume you would therefore also want 6.9 over 5.5. In what situation would you choose a value 5.5?
Then you need to read each ID series twice. The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found). Note x in the range [6,8) means 6<=x<8.
Take _x67 if it exists, otherwise take _x5 as the optimal _X value.
In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.
data have;
input
ID C_MOS C_V expected_var;
datalines;
1 5.25 1 .
1 6.02 2 2
1 7.01 3 .
1 5.98 4 .
1 10 5 .
1 5 6 .
2 5.02 1 .
2 5.5 2 .
2 7.5 3 .
2 7.06 4 4
2 8 5 .
run;
data want (drop=_:);
do until (last.id);
set have;
by id;
if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
if (5<=c_mos<6) then _x5=max(_x5,c_mos);
end;
_x=coalesce(_x67,_x5);
do until (last.id);
set have;
by id;
newvar=ifn(c_mos=_x,c_v,.);
output;
end;
run;
The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID. The first loop establishes _X5 and _X67. The second loop uses those values to identify the desired record.
If there is a tie, then two records will get a value for NEWVAR. And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.
Something like below could work.
data have;
infile datalines dsd;
input ID C_MOS C_V expected;
datalines;
1,5.25,1,.
1,6.02,2,2
1,7.01,3,.
1,5.98,4,.
1,10,5,.
1,5,6,.
2,5.02,1,.
2,5.5,2,.
2,7.5,3,.
2,7.06,4,4
2,8,5,.
3,9.5,2,.
3,4.5,3,.
3,3.06,4,.
3,8,5,.
;
proc sql;
/* create table want as*/
select
o.*,
i.c_v as derived
from have o
left join
(
select id,c_v
from
(
select
*,
case(floor(c_mos))
when 6 then 10000-c_mos
when 7 then 1000-c_mos
when 5 then 100-c_mos
else .
end as calc
from have
)
group by id
having
max(calc) = calc
and not missing(calc)
) i
on o.id=i.id and o.c_v=i.c_v
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.