Using SAS 9.4
I have a variable C_mos indicated the months from event A to event B and I want to create a new variable and choose the C_v score that is closest to 6 month between 5-7 months and leave the rest blank. However, I want to prioritize 6 months as the closest but then prioritize 7 months over 5 months as they have reached the 6 month threshold. So I envision the data to look like below:
ID | C_MOS | C_V | newvar |
1 | 5.25 | 1 | . |
1 | 6.02 | 2 | 2 |
1 | 7.01 | 3 | . |
1 | 5.98 | 4 | . |
1 | 10 | 5 | . |
1 | 5 | 6 | . |
2 | 5.02 | 1 | . |
2 | 5.5 | 2 | . |
2 | 7.5 | 3 | . |
2 | 7.06 | 4 | 4 |
2 | 8 | 5 | . |
So as this shows the closest to 6.0 is taken as long as 6 is achieved but where there is not a 6 month score month 7 is given the priority over month 5 even though the month 5 (5.02) is closer to 6.0 than then 7 month (7.06). Thank you
Then you need to read each ID series twice. The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found). Note x in the range [6,8) means 6<=x<8.
Take _x67 if it exists, otherwise take _x5 as the optimal _X value.
In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.
data have;
input
ID C_MOS C_V expected_var;
datalines;
1 5.25 1 .
1 6.02 2 2
1 7.01 3 .
1 5.98 4 .
1 10 5 .
1 5 6 .
2 5.02 1 .
2 5.5 2 .
2 7.5 3 .
2 7.06 4 4
2 8 5 .
run;
data want (drop=_:);
do until (last.id);
set have;
by id;
if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
if (5<=c_mos<6) then _x5=max(_x5,c_mos);
end;
_x=coalesce(_x67,_x5);
do until (last.id);
set have;
by id;
newvar=ifn(c_mos=_x,c_v,.);
output;
end;
run;
The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID. The first loop establishes _X5 and _X67. The second loop uses those values to identify the desired record.
If there is a tie, then two records will get a value for NEWVAR. And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.
You want to choose "c_v score that is closest to 6 month", yet for ID=2 you choose 7.06 rather than the much closer 5.5. I don't understand the rule. I presume you would therefore also want 6.9 over 5.5. In what situation would you choose a value 5.5?
Then you need to read each ID series twice. The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found). Note x in the range [6,8) means 6<=x<8.
Take _x67 if it exists, otherwise take _x5 as the optimal _X value.
In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.
data have;
input
ID C_MOS C_V expected_var;
datalines;
1 5.25 1 .
1 6.02 2 2
1 7.01 3 .
1 5.98 4 .
1 10 5 .
1 5 6 .
2 5.02 1 .
2 5.5 2 .
2 7.5 3 .
2 7.06 4 4
2 8 5 .
run;
data want (drop=_:);
do until (last.id);
set have;
by id;
if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
if (5<=c_mos<6) then _x5=max(_x5,c_mos);
end;
_x=coalesce(_x67,_x5);
do until (last.id);
set have;
by id;
newvar=ifn(c_mos=_x,c_v,.);
output;
end;
run;
The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID. The first loop establishes _X5 and _X67. The second loop uses those values to identify the desired record.
If there is a tie, then two records will get a value for NEWVAR. And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.
Something like below could work.
data have;
infile datalines dsd;
input ID C_MOS C_V expected;
datalines;
1,5.25,1,.
1,6.02,2,2
1,7.01,3,.
1,5.98,4,.
1,10,5,.
1,5,6,.
2,5.02,1,.
2,5.5,2,.
2,7.5,3,.
2,7.06,4,4
2,8,5,.
3,9.5,2,.
3,4.5,3,.
3,3.06,4,.
3,8,5,.
;
proc sql;
/* create table want as*/
select
o.*,
i.c_v as derived
from have o
left join
(
select id,c_v
from
(
select
*,
case(floor(c_mos))
when 6 then 10000-c_mos
when 7 then 1000-c_mos
when 5 then 100-c_mos
else .
end as calc
from have
)
group by id
having
max(calc) = calc
and not missing(calc)
) i
on o.id=i.id and o.c_v=i.c_v
;
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.