BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Using SAS 9.4

 

I have a variable C_mos indicated the months from event A to event B and I want to create a new variable and choose the C_v score that is closest to 6 month between 5-7 months and leave the rest blank. However, I want to prioritize 6 months as the closest but then prioritize 7 months over 5 months as they have reached the 6 month threshold. So I envision the data to look like below:

ID C_MOS C_V newvar
1 5.25 1 .
1 6.02 2 2
1 7.01 3 .
1 5.98 4 .
1 10 5 .
1 5 6 .
2 5.02 1 .
2 5.5 2 .
2 7.5 3 .
2 7.06 4 4
2 8 5 .

 

So as this shows the closest to 6.0 is taken as long as 6 is achieved but where there is not a 6 month score month 7 is given the priority over month 5 even though the month 5 (5.02) is closer to 6.0 than then 7 month (7.06). Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Then you need to read each ID series twice.  The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found).     Note x in the range [6,8) means 6<=x<8.

 

Take _x67 if it exists, otherwise take _x5 as the optimal _X value.

 

In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.

 

data have;
input
ID	C_MOS	C_V	expected_var;
datalines;
1	5.25	1	.
1	6.02	2	2
1	7.01	3	.
1	5.98	4	.
1	10	5	.
1	5	6	.
2	5.02	1	.
2	5.5	2	.
2	7.5	3	.
2	7.06	4	4
2	8	5	.
run;

data want (drop=_:);
  do until (last.id);
    set have;
    by id;
    if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
    if (5<=c_mos<6) then _x5=max(_x5,c_mos);
  end;
  _x=coalesce(_x67,_x5);
  do until (last.id);
    set have;
    by id;
    newvar=ifn(c_mos=_x,c_v,.);
    output;
  end;
run;

The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID.  The first loop establishes _X5 and _X67.  The second loop uses those values to identify the desired record.

 

If there is a tie, then two records will get a value for NEWVAR.  And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

You want to choose "c_v score that is closest to 6 month", yet for ID=2 you choose 7.06 rather than the much closer 5.5.  I don't understand the rule.  I presume you would therefore also want 6.9 over 5.5.   In what situation would you choose a value 5.5?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7
I want the the 6 month first, then 7 month and finally 5 month. So, in your example while 5.5 is technically closer it has not reached the 6 month threshold and therefore a 7.06 would be chosen. So it almost a hierarchy where I want the closest C_mos between 6-6.99, if not available then 7-7.99, if not available then 5-5.9. But within each group I would want the closest to 6.0 per ID. Does that make more sense? Thank you
mkeintz
PROC Star

Then you need to read each ID series twice.  The first time, you track the optimal value (call it _X67) in the range [6,8), and also track the optimal value in the range [5,6), call it _x5 (in case no _X67 is found).     Note x in the range [6,8) means 6<=x<8.

 

Take _x67 if it exists, otherwise take _x5 as the optimal _X value.

 

In the second reading of each ID, compare c_mos to _X, and set NEWVAR=C_V when C_MOS=_X.

 

data have;
input
ID	C_MOS	C_V	expected_var;
datalines;
1	5.25	1	.
1	6.02	2	2
1	7.01	3	.
1	5.98	4	.
1	10	5	.
1	5	6	.
2	5.02	1	.
2	5.5	2	.
2	7.5	3	.
2	7.06	4	4
2	8	5	.
run;

data want (drop=_:);
  do until (last.id);
    set have;
    by id;
    if (6<=c_mos<8) then _x67=min(_x67,c_mos) ; else
    if (5<=c_mos<6) then _x5=max(_x5,c_mos);
  end;
  _x=coalesce(_x67,_x5);
  do until (last.id);
    set have;
    by id;
    newvar=ifn(c_mos=_x,c_v,.);
    output;
  end;
run;

The "do until (last.id)" loops, with embedded "set have; by id", each reads a complete series for a single ID.  The first loop establishes _X5 and _X67.  The second loop uses those values to identify the desired record.

 

If there is a tie, then two records will get a value for NEWVAR.  And if there are no values in the range [5,8) then that ID will have only missing values for NEWVAR.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Something like below could work.

data have;
  infile datalines dsd;
  input ID C_MOS C_V expected;
datalines;
1,5.25,1,.
1,6.02,2,2
1,7.01,3,.
1,5.98,4,.
1,10,5,.
1,5,6,.
2,5.02,1,.
2,5.5,2,.
2,7.5,3,.
2,7.06,4,4
2,8,5,.
3,9.5,2,.
3,4.5,3,.
3,3.06,4,.
3,8,5,.
;

proc sql;
/*  create table want as*/
    select 
      o.*,
      i.c_v as derived
    from have o
    left join
    (
      select id,c_v
      from
      (
      select 
        *,
        case(floor(c_mos))
          when 6 then 10000-c_mos
          when 7 then 1000-c_mos
          when 5 then 100-c_mos
          else .
          end as calc
      from have
      )
      group by id
      having 
        max(calc) = calc 
        and not missing(calc)
    ) i
    on o.id=i.id and o.c_v=i.c_v
    ;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 910 views
  • 0 likes
  • 3 in conversation