Solved: macro / %do loop / conditional column deleting PROBLEM

loredana_cornea · Posted 07-24-2014 05:40 AM

I have a database that looks like this:

Var D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_ K2_1_ K2_2_ K2_3_ K2_4_ K2_5_ K2_6_ K2_7_ K2_8_ K2_9_ maxi

123 21 45 56 87 98 100 135 256 125 125 356 654 589 458 213 524 54 653 214 654

..... etc

maxi is max(k2_&i._)

I'm having the following code:

%macro do_list2;

data x;

set testing;

%do i=1 %to 9 %by 1;

if (K2_&i._=maxi and &var <= D&i._) then I1_&var=1; else I1_&var=0;

if (K2_&i._=maxi and &var > D&i._) then I2_&var=1; else I2_&var=0;

%end;

run;

%mend;

%do_list2;

My steps in arriving here were the following:

I have calculated the 9 percentiles of the variable "&var", then I have calculated a chisq test for each interval to see which one is more associated with the probability of default. I have recorded the highest value of this test in the "maxi" variable. In my code above I am trying (without succes) to determine the variable K2_&i._ that has the highest value (as you can see, here it corresponds to K2_2_, meaning that the 2nd percentile is the optimum split of the variable "var") and, with this in mind, I tell it that if (&var <= than D2_) then create me a dummy variable I1=1, ... etc...

I have to perform this operation for lots of variables so I'm trying to automate it in a macro.

The code above isn't working because if the maximum "K2_&i._" variable isn't the 9th one, the results will overwrite and I'm stuck with 2 dummy variables that only take the value 0.

I have tried to save the "&i." in another global variable like this:

if K2_&i._=maxi then %let j=&i.; /* tried this version too j=&i */

Error. If it worked I would have used j instead of i in the condition &var <= D&j._

Then I thought that maybe if it were possible something like this:

if K2_&i._ NE maxi then DROP K2_&i._;

Unfortunately SAS doesn't allow that.

So, any ideas???

loredana_cornea · Posted 07-25-2014 10:26 AM

THANK YOU ALL

I combined your very usefull advices and the perfect code looks like:

data x;

set testing;

array D[9];

array K2_[9];

i=whichn(maxi, of K2_

);

I1_&var=(&var<=D);

I2_&var=1-I1_&var;

run;

I appreciate your efforts very much. Thank you, you've made my day

View solution in original post

RW9 · Posted 07-24-2014 05:49 AM

Hi,

I would check out arrays. For example:

data want;

set have;

array d{9} best.;

array k{9} best.;

array l{9} best.;

do I=1 to 9;

if d{I}=1 and k{1}=1 then l{I}=1;

end;

run;

loredana_cornea · Posted 07-25-2014 04:28 AM

Either I didn't undertand the concept or I wasn't able to master it, it's not working. Here is my code adapted to your suggestion:

data x;

set testing;

array D{9} best.;

array K2{9} best.;

%do i=1 %to 9 %by 1;

if (K2{i}=maxi and &var <= D{i}) then I1_&var=1; else I1_&var=0;

if (K2{i}=maxi and &var <= D{i}) then I2_&var=0; else I2_&var=1;

%end;

run;

RW9 · Posted 07-25-2014 04:49 AM

Hi,

Yes, you are mixing datastep and macro code there. My example uses arrays in a normal datastep so you shouldn't need any macro code - though I am not sure and it doesn't say in your example where &var is defined. Note also that you don't want the underscores as they then don't match the arrays :

%let var=SOME_VARIABLE;

data x;

set testing;

array D{9} best.;

array K2{9} best.;

array I{9} best.;

do i=1 to 9 by 1;

if (K2{i}=maxi and &var <= D{i}) then I{I}=1; else I{I}=0;

if (K2{i}=maxi and &var <= D{i}) then I{I}=0; else I{I}=1;

%end;

run;

The principal is that you have a set of variables all called the same thing, e.g. d1 d2 d3 - k2_1 k2_2 etc. these can be thought of as arrays. You want a variable output for each one of these - another array. Then its simply a matter of do looping over the 9 possibles and checking each one to your output array.

Tom · Posted 07-25-2014 04:53 AM

Once you have defined an array you should no longer need to use macro logic to generate code. Instead the data step can loop over the index to the array and apply the logic.

Looks like you have two regular variables:

MAXI and &VAR (where the real name of the variable is contained in the macro variable VAR)

Looks like you have two series of variables that we can use ARRAYS to reference:

ARRAY D D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_ ;

ARRAY K2 K2_1_ K2_2_ K2_3_ K2_4_ K2_5_ K2_6_ K2_7_ K2_8_ K2_9_ ;

And you want to create two new series of variables:

ARRAY i1_ (9) ;

ARRAY i2_ (9) ;

Then you want to assign these 0/1 values based on some logic. It looks like you want the second to just be the complement of the first.

do i=1 to dim(D);

I1_(i) = (k2(i) = MAXI and &var <= D(i)) ;

I2_(i) = not I1_(i);

end;

Note it would be much easier to write the code if the groups of variables did not have underscore appended to their names. Then you could use variable lists. For example D1-D9 is a variable list that references variables D1, D2, D3, .... , D9.

loredana_cornea · Posted 07-25-2014 06:17 AM

So, I've done the following:

%macro def(var);

.....

proc univariate ..... /* to obtain the D1_ ... D9_ */

....

proc freq data ..... /* to obtain K2_1 ... K2_9 */

data x;

set testing;

array D{9} best.;

array K{9} best.;

array D D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_;

array K2 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9;

%do i=1 %to dim(D);

I1_&var = (K2(i)=maxi and &var <= D(i));

I2_&var=1-I2_&var;

%end;

run;

%mend;

And got:

ERROR 22-322: Syntax error, expecting one of the following: a name, a numeric constant, $, (,

;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, _TEMPORARY_.

ERROR 124-185: The variable D has already been defined.

ERROR 201-322: The option is not recognized and will be ignored.

26 ! data x; set testing; array D{9} best.; array K{9} best.; array D D1_ D2_

26 ! D3_ D4_ D5_ D6_ D7_ D8_ D9_; array K2 K2_1 K2_2 K2_3

--

124

ERROR: Required operator not found in expression: dim(D)

ERROR: The %TO value of the %DO I loop is invalid.

ERROR: The macro DEFAILLANCE will stop executing.

ERROR 124-185: The variable K2 has already been defined.

RW9 · Posted 07-25-2014 06:32 AM

First thing I would change is:

%macro def(var);

proc univariate ..... /* to obtain the D1_ ... D9_ */ /* Change these to be Dx - where x=1 to 9 - no underscores */

...

proc freq data ..... /* to obtain K2_1 ... K2_9 */

...

data x;

set testing;

array D{9} best.;

array K2_{9} best.;

array I{9} best.;

do i=1 to 9;

I{I}= (K2_(i)=maxi and &var <= D(i)); /* This will fill the I array with 1 or 0 based on K and D array versus maxin and &var */

/* This logic does not appear in your original description

I2_&var=1-I2_&var;

hence I remove it */

end;

run;

%mend def;

loredana_cornea · Posted 07-25-2014 07:28 AM

I assume the problem is that the percentiles were created in the univariate procedure and I can't define an array with the same name. So, how exactly i'm to declare the name of the percentiles in the output option of univariate as ARRAY?

Concerning the other thing I just want a dummy variable I1_&var that goes 1 if &var<=the percentile corresponding to the maximum K2 and 0 otherwise. And I2 that is its complement.

RW9 · Posted 07-25-2014 07:59 AM

Could you provide some test input and desired output.

loredana_cornea · Posted 07-25-2014 08:06 AM

Of course:

Input:

Var D1 D2 D3 D4 D5 D6 D7 D8 D9 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9 maxi

100 15 30 45 60 75 100 125 130 132 77 88 99 111 101 98 87 76 70 111

Output:

Var D4 K2_4 I1_var I2_var

100 60 111 0 1

RW9 · Posted 07-25-2014 08:27 AM

Well, with that example the following achieves the output (oh, and I shouldn't have used best. on the format of the array):

data have;
attrib Var   D1   D2   D3   D4   D5   D6   D7   D8   D9   K2_1   K2_2   K2_3   K2_4   K2_5   K2_6   K2_7   K2_8   K2_9 maxi format=best.;
infile datalines;
input Var   D1   D2   D3   D4   D5   D6   D7   D8   D9   K2_1   K2_2   K2_3   K2_4   K2_5   K2_6   K2_7   K2_8   K2_9 maxi;
datalines;
100   15    30   45    60   75   100 125 130 132    77        88      99      111       101    98        87      76        70      111
run;

data want (keep=var d_result k2_result l1_var l2_var);

set have;

attrib d_result k2_result l1_var l2_var format=best.;

array K2_{9} 8.;

array D{9} 8.;

i=1;

do while (i<=9 and l1_var=.);

if K2_{i}=maxi then do;

d_result=D{i};

k2_result=K2_{i};

if var <= D{i} then l1_var=1; else l1_var=0;

if var > D{i} then l2_var=1; else l2_var=0;

end;

i=i+1;

end;

run;

data_null__ · Posted 07-25-2014 08:37 AM

Looks to me like the only thing you need to know if WHICH K2 item is MAXI and WHICH D item is VAR

data a;
   input var d1-d9 k2_1-k2_9 maxi;
   array d[9];
   array k2_[9];
   i = whichn(maxi,of k2_);
   j = whichn(var,of d
);
   i1_var = i gt j;
   i2_var = i lt j;
   
cards; 
100   15    30   45    60   75   100 125  130  132    77        88      99      111       101    98        87      76        70      111
;;;;
   run;     

loredana_cornea · Posted 07-25-2014 10:26 AM

THANK YOU ALL

I combined your very usefull advices and the perfect code looks like:

data x;

set testing;

array D[9];

array K2_[9];

i=whichn(maxi, of K2_

);

I1_&var=(&var<=D);

I2_&var=1-I1_&var;

run;

I appreciate your efforts very much. Thank you, you've made my day

macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Re: macro / %do loop / conditional column deleting PROBLEM

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away