I have a database that looks like this:
Var D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_ K2_1_ K2_2_ K2_3_ K2_4_ K2_5_ K2_6_ K2_7_ K2_8_ K2_9_ maxi
123 21 45 56 87 98 100 135 256 125 125 356 654 589 458 213 524 54 653 214 654
..... etc
maxi is max(k2_&i._)
I'm having the following code:
%macro do_list2;
data x;
set testing;
%do i=1 %to 9 %by 1;
if (K2_&i._=maxi and &var <= D&i._) then I1_&var=1; else I1_&var=0;
if (K2_&i._=maxi and &var > D&i._) then I2_&var=1; else I2_&var=0;
%end;
run;
%mend;
%do_list2;
My steps in arriving here were the following:
I have calculated the 9 percentiles of the variable "&var", then I have calculated a chisq test for each interval to see which one is more associated with the probability of default. I have recorded the highest value of this test in the "maxi" variable. In my code above I am trying (without succes) to determine the variable K2_&i._ that has the highest value (as you can see, here it corresponds to K2_2_, meaning that the 2nd percentile is the optimum split of the variable "var") and, with this in mind, I tell it that if (&var <= than D2_) then create me a dummy variable I1=1, ... etc...
I have to perform this operation for lots of variables so I'm trying to automate it in a macro.
The code above isn't working because if the maximum "K2_&i._" variable isn't the 9th one, the results will overwrite and I'm stuck with 2 dummy variables that only take the value 0.
I have tried to save the "&i." in another global variable like this:
if K2_&i._=maxi then %let j=&i.; /* tried this version too j=&i */
Error. If it worked I would have used j instead of i in the condition &var <= D&j._
Then I thought that maybe if it were possible something like this:
if K2_&i._ NE maxi then DROP K2_&i._;
Unfortunately SAS doesn't allow that.
So, any ideas???
THANK YOU ALL
I combined your very usefull advices and the perfect code looks like:
data x;
set testing;
array D[9];
array K2_[9];
i=whichn(maxi, of K2_
I1_&var=(&var<=D);
I2_&var=1-I1_&var;
run;
I appreciate your efforts very much. Thank you, you've made my day
Hi,
I would check out arrays. For example:
data want;
set have;
array d{9} best.;
array k{9} best.;
array l{9} best.;
do I=1 to 9;
if d{I}=1 and k{1}=1 then l{I}=1;
end;
run;
Either I didn't undertand the concept or I wasn't able to master it, it's not working. Here is my code adapted to your suggestion:
data x;
set testing;
array D{9} best.;
array K2{9} best.;
%do i=1 %to 9 %by 1;
if (K2{i}=maxi and &var <= D{i}) then I1_&var=1; else I1_&var=0;
if (K2{i}=maxi and &var <= D{i}) then I2_&var=0; else I2_&var=1;
%end;
run;
Hi,
Yes, you are mixing datastep and macro code there. My example uses arrays in a normal datastep so you shouldn't need any macro code - though I am not sure and it doesn't say in your example where &var is defined. Note also that you don't want the underscores as they then don't match the arrays :
%let var=SOME_VARIABLE;
data x;
set testing;
array D{9} best.;
array K2{9} best.;
array I{9} best.;
do i=1 to 9 by 1;
if (K2{i}=maxi and &var <= D{i}) then I{I}=1; else I{I}=0;
if (K2{i}=maxi and &var <= D{i}) then I{I}=0; else I{I}=1;
%end;
run;
The principal is that you have a set of variables all called the same thing, e.g. d1 d2 d3 - k2_1 k2_2 etc. these can be thought of as arrays. You want a variable output for each one of these - another array. Then its simply a matter of do looping over the 9 possibles and checking each one to your output array.
Once you have defined an array you should no longer need to use macro logic to generate code. Instead the data step can loop over the index to the array and apply the logic.
Looks like you have two regular variables:
MAXI and &VAR (where the real name of the variable is contained in the macro variable VAR)
Looks like you have two series of variables that we can use ARRAYS to reference:
ARRAY D D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_ ;
ARRAY K2 K2_1_ K2_2_ K2_3_ K2_4_ K2_5_ K2_6_ K2_7_ K2_8_ K2_9_ ;
And you want to create two new series of variables:
ARRAY i1_ (9) ;
ARRAY i2_ (9) ;
Then you want to assign these 0/1 values based on some logic. It looks like you want the second to just be the complement of the first.
do i=1 to dim(D);
I1_(i) = (k2(i) = MAXI and &var <= D(i)) ;
I2_(i) = not I1_(i);
end;
Note it would be much easier to write the code if the groups of variables did not have underscore appended to their names. Then you could use variable lists. For example D1-D9 is a variable list that references variables D1, D2, D3, .... , D9.
So, I've done the following:
%macro def(var);
.....
proc univariate ..... /* to obtain the D1_ ... D9_ */
....
proc freq data ..... /* to obtain K2_1 ... K2_9 */
data x;
set testing;
array D{9} best.;
array K{9} best.;
array D D1_ D2_ D3_ D4_ D5_ D6_ D7_ D8_ D9_;
array K2 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9;
%do i=1 %to dim(D);
I1_&var = (K2(i)=maxi and &var <= D(i));
I2_&var=1-I2_&var;
%end;
run;
%mend;
And got:
ERROR 22-322: Syntax error, expecting one of the following: a name, a numeric constant, $, (,
;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, _TEMPORARY_.
ERROR 124-185: The variable D has already been defined.
ERROR 201-322: The option is not recognized and will be ignored.
26 ! data x; set testing; array D{9} best.; array K{9} best.; array D D1_ D2_
26 ! D3_ D4_ D5_ D6_ D7_ D8_ D9_; array K2 K2_1 K2_2 K2_3
--
124
ERROR: Required operator not found in expression: dim(D)
ERROR: The %TO value of the %DO I loop is invalid.
ERROR: The macro DEFAILLANCE will stop executing.
ERROR 124-185: The variable K2 has already been defined.
First thing I would change is:
%macro def(var);
proc univariate ..... /* to obtain the D1_ ... D9_ */ /* Change these to be Dx - where x=1 to 9 - no underscores */
...
proc freq data ..... /* to obtain K2_1 ... K2_9 */
...
data x;
set testing;
array D{9} best.;
array K2_{9} best.;
array I{9} best.;
do i=1 to 9;
I{I}= (K2_(i)=maxi and &var <= D(i)); /* This will fill the I array with 1 or 0 based on K and D array versus maxin and &var */
/* This logic does not appear in your original description
I2_&var=1-I2_&var;
hence I remove it */
end;
run;
%mend def;
I assume the problem is that the percentiles were created in the univariate procedure and I can't define an array with the same name. So, how exactly i'm to declare the name of the percentiles in the output option of univariate as ARRAY?
Concerning the other thing I just want a dummy variable I1_&var that goes 1 if &var<=the percentile corresponding to the maximum K2 and 0 otherwise. And I2 that is its complement.
Could you provide some test input and desired output.
Of course:
Input:
Var D1 D2 D3 D4 D5 D6 D7 D8 D9 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9 maxi
100 15 30 45 60 75 100 125 130 132 77 88 99 111 101 98 87 76 70 111
Output:
Var D4 K2_4 I1_var I2_var
100 60 111 0 1
Well, with that example the following achieves the output (oh, and I shouldn't have used best. on the format of the array):
data have;
attrib Var D1 D2 D3 D4 D5 D6 D7 D8 D9 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9 maxi format=best.;
infile datalines;
input Var D1 D2 D3 D4 D5 D6 D7 D8 D9 K2_1 K2_2 K2_3 K2_4 K2_5 K2_6 K2_7 K2_8 K2_9 maxi;
datalines;
100 15 30 45 60 75 100 125 130 132 77 88 99 111 101 98 87 76 70 111
run;
data want (keep=var d_result k2_result l1_var l2_var);
set have;
attrib d_result k2_result l1_var l2_var format=best.;
array K2_{9} 8.;
array D{9} 8.;
i=1;
do while (i<=9 and l1_var=.);
if K2_{i}=maxi then do;
d_result=D{i};
k2_result=K2_{i};
if var <= D{i} then l1_var=1; else l1_var=0;
if var > D{i} then l2_var=1; else l2_var=0;
end;
i=i+1;
end;
run;
Looks to me like the only thing you need to know if WHICH K2 item is MAXI and WHICH D item is VAR
THANK YOU ALL
I combined your very usefull advices and the perfect code looks like:
data x;
set testing;
array D[9];
array K2_[9];
i=whichn(maxi, of K2_
I1_&var=(&var<=D);
I2_&var=1-I1_&var;
run;
I appreciate your efforts very much. Thank you, you've made my day
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.