BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
loredana_cornea
Obsidian | Level 7

I have a database that looks like this:

Var     D1_     D2_     D3_     D4_     D5_     D6_     D7_     D8_     D9_     K2_1_     K2_2_     K2_3_     K2_4_     K2_5_     K2_6_     K2_7_     K2_8_     K2_9_      maxi

123     21     45     56     87     98     100     135     256     125     125          356          654          589          458     213          524          54          653          214          654

..... etc

maxi is max(k2_&i._)

I'm having the following code:

%macro do_list2;

data x;

  set testing;

  %do i=1 %to 9 %by 1;

  if (K2_&i._=maxi and &var <= D&i._) then I1_&var=1; else I1_&var=0;

  if (K2_&i._=maxi and &var > D&i._) then I2_&var=1; else I2_&var=0;

  %end;

run;

%mend;

%do_list2;

My steps in arriving here were the following:

I have calculated the 9 percentiles of the variable "&var", then I have calculated a chisq test for each interval to see which one is more associated with the probability of default. I have recorded the highest value of this test in the "maxi" variable. In my code above I am trying (without succes) to determine the variable K2_&i._ that has the highest value (as you can see, here it corresponds to K2_2_, meaning that the 2nd percentile is the optimum split of the variable "var") and, with this in mind, I tell it that if (&var <= than D2_) then create me a dummy variable I1=1,         ... etc...

I have to perform this operation for lots of variables so I'm trying to automate it in a macro.

The code above isn't working because if the maximum "K2_&i._" variable isn't the 9th one, the results will overwrite and I'm stuck with 2 dummy variables that only take the value 0.

I have tried to save the "&i." in another global variable like this:

if K2_&i._=maxi then %let j=&i.;      /* tried this version too j=&i */

Error. If it worked I would have used j instead of i in the condition &var <= D&j._

Then I thought that maybe if it were possible something like this:

if K2_&i._ NE maxi then DROP K2_&i._;

Unfortunately SAS doesn't allow that.

So, any ideas???

1 ACCEPTED SOLUTION

Accepted Solutions
loredana_cornea
Obsidian | Level 7

THANK YOU ALL

I combined your very usefull advices and the perfect code looks like:

data x;

  set testing;

  array D[9];

  array K2_[9];

  i=whichn(maxi, of K2_

  • );
  •   I1_&var=(&var<=D);

      I2_&var=1-I1_&var;

    run;

    I appreciate your efforts very much. Thank you, you've made my day

    View solution in original post

    12 REPLIES 12
    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Hi,

    I would check out arrays.  For example:

    data want;

         set have;

         array d{9} best.;

         array k{9} best.;

         array l{9} best.;

         do I=1 to 9;

              if d{I}=1 and k{1}=1 then l{I}=1;

         end;

    run;

    loredana_cornea
    Obsidian | Level 7

    Either I didn't undertand the concept or I wasn't able to master it, it's not working. Here is my code adapted to your suggestion:

    data x;

      set testing;

      array D{9} best.;

      array K2{9} best.;

      %do i=1 %to 9 %by 1;

      if (K2{i}=maxi and &var <= D{i}) then I1_&var=1; else I1_&var=0;

      if (K2{i}=maxi and &var <= D{i}) then I2_&var=0; else I2_&var=1;

      %end;

    run;

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Hi,

    Yes, you are mixing datastep and macro code there.  My example uses arrays in a normal datastep so you shouldn't need any macro code - though I am not sure and it doesn't say in your example where &var is defined.  Note also that you don't want the underscores as they then don't match the arrays :

    %let var=SOME_VARIABLE;

    data x;

      set testing;

      array D{9} best.;

      array K2{9} best.;

      array I{9} best.;

      do i=1 to 9 by 1;

      if (K2{i}=maxi and &var <= D{i}) then I{I}=1; else I{I}=0;

      if (K2{i}=maxi and &var <= D{i}) then I{I}=0; else I{I}=1;

      %end;

    run;

    The principal is that you have a set of variables all called the same thing, e.g. d1 d2 d3 - k2_1 k2_2 etc. these can be thought of as arrays.  You want a variable output for each one of these - another array.  Then its simply a matter of do looping over the 9 possibles and checking each one to your output array.

    Tom
    Super User Tom
    Super User

    Once you have defined an array you should no longer need to use macro logic to generate code. Instead the data step can loop over the index to the array and apply the logic.

    Looks like you have two regular variables:

    MAXI and &VAR  (where the real name of the variable is contained in the macro variable VAR)

    Looks like you have two series of variables that we can use ARRAYS to reference:

    ARRAY D  D1_   D2_   D3_   D4_   D5_   D6_   D7_   D8_   D9_   ;

    ARRAY K2 K2_1_ K2_2_ K2_3_ K2_4_ K2_5_ K2_6_ K2_7_ K2_8_ K2_9_ ;

    And you want to create two new series of variables:

    ARRAY i1_ (9) ;

    ARRAY i2_ (9) ;

    Then you want to assign these 0/1 values based on some logic.  It looks like you want the second to just be the complement of the first.

    do i=1 to dim(D);

       I1_(i) =  (k2(i) = MAXI and &var <= D(i)) ;

       I2_(i) = not I1_(i);

    end;

    Note it would be much easier to write the code if the groups of variables did not have underscore appended to their names.  Then you could use variable lists. For example D1-D9 is a variable list that references variables D1, D2, D3, .... , D9.

    loredana_cornea
    Obsidian | Level 7

    So, I've done the following:

    %macro def(var);

    .....

    proc univariate .....     /*   to obtain the D1_   ... D9_  */

    ....

    proc freq data .....      /* to obtain K2_1    ...    K2_9  */

    data x;

      set testing;

      array D{9} best.;

      array K{9} best.;

      array D D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_;

      array K2 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9;

      %do i=1 %to dim(D);

      I1_&var = (K2(i)=maxi and &var <= D(i));

      I2_&var=1-I2_&var;

      %end;

    run;

    %mend;

    And got:

    ERROR 22-322: Syntax error, expecting one of the following: a name, a numeric constant, $, (,

                  ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, _TEMPORARY_.

    ERROR 124-185: The variable D has already been defined.

    ERROR 201-322: The option is not recognized and will be ignored.

    26  ! data x;     set testing;     array D{9} best.;     array K{9} best.;     array D D1_ D2_

    26  ! D3_ D4_ D5_ D6_ D7_ D8_ D9_;     array K2 K2_1 K2_2 K2_3

                                                 --

                                                 124

    ERROR: Required operator not found in expression: dim(D)

    ERROR: The %TO value of the %DO I loop is invalid.

    ERROR: The macro DEFAILLANCE will stop executing.

    ERROR 124-185: The variable K2 has already been defined.

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    First thing I would change is:

    %macro def(var);

         proc univariate .....     /*   to obtain the D1_   ... D9_  */          /* Change these to be Dx - where x=1 to 9 - no                                                                                                          underscores */

              ...

         proc freq data .....      /* to obtain K2_1    ...    K2_9  */

              ...

         data x;    

           set testing;

           array D{9} best.;

           array K2_{9} best.;

           array I{9} best.;

           do i=1 to 9;

                I{I}= (K2_(i)=maxi and &var <= D(i));     /* This will fill the I array with 1 or 0 based on K and D array versus                                                                                      maxin and &var */

                /* This logic does not appear in your original description

                I2_&var=1-I2_&var;        

                   hence I remove it */

           end;

         run;

    %mend def;

    loredana_cornea
    Obsidian | Level 7

    I assume the problem is that the percentiles were created in the univariate procedure and I can't define an array with the same name. So, how exactly i'm to declare the name of the percentiles in the output option of univariate as ARRAY?

    Concerning the other thing I just want a dummy variable I1_&var that goes 1 if &var<=the percentile corresponding to the maximum K2 and 0 otherwise. And I2 that is its complement.

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Could you provide some test input and desired output.

    loredana_cornea
    Obsidian | Level 7

    Of course:

    Input:

    Var   D1   D2   D3   D4   D5   D6   D7   D8   D9   K2_1   K2_2   K2_3   K2_4   K2_5   K2_6   K2_7   K2_8   K2_9  maxi

    100   15    30   45    60   75   100 125  130  132    77        88      99      111       101    98        87      76        70      111

    Output:

    Var    D4   K2_4   I1_var    I2_var

    100    60    111       0           1      

    RW9
    Diamond | Level 26 RW9
    Diamond | Level 26

    Well, with that example the following achieves the output (oh, and I shouldn't have used best. on the format of the array):

    data have;
      attrib Var   D1   D2   D3   D4   D5   D6   D7   D8   D9   K2_1   K2_2   K2_3   K2_4   K2_5   K2_6   K2_7   K2_8   K2_9  maxi format=best.;
      infile datalines;
      input Var   D1   D2   D3   D4   D5   D6   D7   D8   D9   K2_1   K2_2   K2_3   K2_4   K2_5   K2_6   K2_7   K2_8   K2_9  maxi;
    datalines;
    100   15    30   45    60   75   100 125  130  132    77        88      99      111       101    98        87      76        70      111
    run;

    data want (keep=var d_result k2_result l1_var l2_var);

      set have;

      attrib d_result k2_result l1_var l2_var format=best.;

      array K2_{9} 8.;

      array D{9} 8.;

      i=1;

      do while (i<=9 and l1_var=.);

        if K2_{i}=maxi then do;

          d_result=D{i};

          k2_result=K2_{i};

          if var <= D{i} then l1_var=1; else l1_var=0;

          if var > D{i} then l2_var=1; else l2_var=0;

        end;

        i=i+1;

      end;

    run;

    data_null__
    Jade | Level 19

    Looks to me like the only thing you need to know if WHICH K2 item is MAXI and WHICH D item is VAR

    data a;
       input var d1-d9 k2_1-k2_9 maxi;
       array d[9];
       array k2_[9];
       i = whichn(maxi,of k2_
  • );
       j = whichn(var,of d
  • );
       i1_var = i gt j;
       i2_var = i lt j;
      
  • cards;
    100   15    30   45    60   75   100 125  130  132    77        88      99      111       101    98        87      76        70      111
    ;;;;
       run;    
    loredana_cornea
    Obsidian | Level 7

    THANK YOU ALL

    I combined your very usefull advices and the perfect code looks like:

    data x;

      set testing;

      array D[9];

      array K2_[9];

      i=whichn(maxi, of K2_

  • );
  •   I1_&var=(&var<=D);

      I2_&var=1-I1_&var;

    run;

    I appreciate your efforts very much. Thank you, you've made my day

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 12 replies
    • 1940 views
    • 6 likes
    • 4 in conversation