BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nstdt
Quartz | Level 8

Dear all, I am looking for solutions to the situation  below.

 

I have two macros that interact as follows: the macro function %ProcessOrigData passes a dataset A  and a variable V as macro parameters  to the macro function %CreateNewData.

The variable V is one of the variables contained in the dataset A. I need the macro %CreateNew Data to create a new dataset from dataset A using a data step , in which I need to manipulate the variable V using an If condition. However, I am unable to use the If statement as I normally would in regular SAS inside a Data Step since the variable &V is a macro variable. I tried using %IF but I get an error (something about unexpected input or expecting a different kind of input).

 

I need to call %CreateNewData several times from inside %ProcessOrigData with the dataset A and a new variable V each time. The dataset A contains variables State,Age,LoanAmt,InterestRate...etc and a variable called rank. The important thing is I need to call %CreateNewData(A, InterestRate)- with a different variable each time- from inside the macro %ProcessOrigData.

 

The only solution I could think of is to replace all the variable names in dataset A with automatic names such as Var1,Var2...etc and have a DO loop handle each variable. However, I already need a DO loop inside %CreateNewData to process the variable V using an IF statement and want to avoid too many loops if possible. 

 

Here is some  sample code(I am using SAS enterprise guide 5.1 :

 

 

%macro CreateNewData ( A, V)

 

data outdata;

set &A;

 

%do i=1 %to &Num; /** Start DO loop **/
     

/*** Note : &Num and &&limit&i are GLOBAL macro variables defined earlier ***/

     high=&&limit&i.;

   

    %if missing(&V) %then
            %do;
                    rank = 0;
            %end;
   %else %do;
            %if (&V < high and rk = -1) %then rk = &i.;
    %end;


%end; /** ends the DO loop **/

 

run; /** end the Data step **/

 

 

%mend; /** end the macro CreateNewData **/

 

/************ macro %ProcessOrigData is below **********************/

 

%macro ProcessOrigData(inputData1, inputData2, V);

 

data A;

set inputData1

 

/** processing steps**/

 

run;

 

%createNewData(A, V);

 

%mend;

 

%mend;

 

 

Thanks for any advice!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Oh it's a binning problem.  My first thought for binning in SAS is to use formats.   This separates the definition of the bin categories from the data step code, and tends to make life easier. 

 

Consider code like below:

 

proc format;
  value bheight
    low-60  ='1'
    60<-high='2'
  ;
  value bweight
    low-100  ='1'
    100<-high='2'
  ;
run;

proc print data=sashelp.class;
  where put(height,bheight.)='1';
run;

data binned;
  set sashelp.class;
  bheight=put(height,bheight.);
  bweight=put(weight,bweight.);
run;

proc print data=binned;
run;

 

As shown with the PROC PRINT step, often you don't need to create a new dataset to apply the binning, you can just use the format. Or you can make separate bin variables for each variable you would want to be able to bin by.

 

To make the code more dynamic, you could make a dataset which stores the cutpoints for each variable, and build the formats from that dataset (using PROC FORMAT CNTLIN) and you could use a macro to generate the  set to assignment statements to create the bin variables.  But I would start without macro code, and just focus on binning approaches.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.

View solution in original post

13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not seeing any need for all that code.  Perhaps post a small bit of test data (form of a datastep) and what you want the output to look like.  A simple datastep should suffice.  I see a few problems with where your going with that code below (other than overcomplicating things), for instance you use missing function in the macro code not datastep, it doesn't do what you think it will do.  You have to remember, macr is simply a text generation tool - it doesn't do anything.  What comes out of the macro pre-processor is text Base SAS code, which then gets fed into the compiler where the actual processing happens.  So you want to focus on getting Base SAS code working, then if you see overlaps, then is the time to whittle it down using some macro code.

Shmuel
Garnet | Level 18

With hope I undrestood your question,

change line:   %createNewData(A, V);

 

into: %createNewData(A, &V);

 

Thats because you suplay the V variable name while executing

    %ProcessOrigData(inputData1, inputData2, V);

nstdt
Quartz | Level 8

Sorry , that was a typo in my post -- I do actually call %createNewData(A,&V) not V. I still get errors. I would like to be able to use the name of the variable inside my data step within %createNewData. That is, suppose &V = Age.  I want to say if ( Age < high) then do....

 

At the moment, SAS does not allow me to execute that IF statement with or without the % sign inside the data step.

 

Kurt_Bremser
Super User

You cannot use the macro to do anything in the data step, you can only use the macro to create code for the data step.

All macro action is over before the data step is compiled and run.

You also cannot use data step functions in macro language conditions unless you wrap them into %sysfunc(). And you cannot use data step variables in macro, because they do not yet exist when the macro executes.

I guess you can accomplish what you want to do by creating a temporary array with your limit values and a simple do loop.

nstdt
Quartz | Level 8

Thanks for the reply. 

What I ended up doing was rename the variable &V to the generic Var1 . This works for my purposes since I only need to pass in one variable along with my dataset A each time. I have also removed the %IF --- now, there are only normal IF statements inside the data step.

 

however, you said there should be no macro statements inside the Data step. I still have the %DO loop which runs fine. Please see below. So it looks like macro functions can be processed inside Data steps, maybe just not inside an If function? Is that right?

 

%macro CreateNewData(A, V);

data outdata.;
set &A;


%do i=1 %to &Num;


        high=&&v&i.;
        if missing(Var1) then
        do;
            rank = 0;
       end;
     else do;
             if (Var1 < high and rank = -1) then rank = &i.;
     end;
%end;

run;

Shmuel
Garnet | Level 18

As said before, a macro program just generates a text.

You can use a macro [by %macro_name()] inside datastep, if generated text makes sence inside that datastep.

 

You cannot use a mcro it if generated text is like:

 

        data a;

          set b;

               .....

                /* generated text: */

               data x;

                 set y;

               ...

              run;

             /* end of generated text */

             ....

       run;

Kurt_Bremser
Super User

Although it might look as if the macro code worked "in" the data step, it didn't. But it created &num repetitions of code for the data step.

The data step worked as if you yourself had typed the code repeatedly.

Tom
Super User Tom
Super User

Why not use an ARRAY instead?

data outdata.;
  set &A;
  array high &V.1 - &V.&NUM ;
  rank=-1;
  if missing(Var1) then rank = 0;
  else do i=1 to dim(high) while (rank = -1);
    if (Var1 < high(i)) then rank = i ;
  end;
run;
Quentin
Super User

Hi,

 

The macro language is tough.  I suggest you re-read @Kurt_Bremser's explanation again.

 

A key point is that the purpose of the macro language is to generate SAS code.  The macro language does not process data, and does not know about data sets or data set variables.  It doesn't even know about SAS code, it just generates text and passes it along.  It's simply a way to generate SAS code instead of typing it yourself. 

 

Beginning macro programmers are also confused about when to use a macro language %IF statement vs a data step IF statement.  If you want to evaluate the value of  a macro variable (or expression) to make a decision about what SAS code to generate, you need to use an %IF statement.  If you want to evaluate the value of a data step variable (or expression) and make a run-time decision about which SAS code to execute, you need an IF statement.

 

I suggest you turn on system options MPRINT , which will show the SAS code that was generated by the macro language, and review it carefully to make sure you macro is generating the code you want.  The logic looks surprising to me, but you haven't given a full example of your problem and inputs, so I can't say with certainty that it is doing the wrong thing.

 

Here's a little example similar to yours.

 

%macro CreateNewData(data= 
                    ,out= 
                    ,var= /*space delimited list of variables*/
                     );
  %local i vari;

  data &out;
    set &data;


    %*loop over the list of variables, generating an if/else block for each variable;
    %do i=1 %to %sysfunc(countw(&var,%str( ))); 
       %let vari=%scan(&var,&i,%str( ));
  
       if &vari > 60 then &vari._over60=1;
       else &vari._over60=0;
    %end;

 run;
 %mend CreateNewData;

 options mprint;
 %CreateNewData(data=sashelp.class,out=want,var=height weight)

 

That macro takes a VAR parameter which accepts a list of dataset variable names.  There is a macro %DO loop which iterates through the list of dataset variable names.  Within the macro %DO loop it generates a SAS language IF/ELSE block.   It is a data step IF statement because I am making a decision about the values stored in a data step variable.

 

Running the macro with MPRINT turned on shows this code was generated:

215   %CreateNewData(data=sashelp.class,out=want,var=height weight)
MPRINT(CREATENEWDATA):   data want;
MPRINT(CREATENEWDATA):   set sashelp.class;
MPRINT(CREATENEWDATA):   if height > 60 then height_over60=1;
MPRINT(CREATENEWDATA):   else height_over60=0;
MPRINT(CREATENEWDATA):   if weight > 60 then weight_over60=1;
MPRINT(CREATENEWDATA):   else weight_over60=0;
MPRINT(CREATENEWDATA):   run;

And looking there, you can see why Kurt suggested you consider your original problem, and think about possible array solutions. 

 

 

Kind Regards,

-Q.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.
nstdt
Quartz | Level 8

Thanks very much for the detailed reply. It comes close to the problem I'm facing. 

I am trying to create bins for different variables ("V") in a dataset ("A"), but each variable is handled seprately and produces an associated binned dataset of its own through %CreateNewData.

My goal is to generate these different "binned" datasets corresponding to each variable(for  further analysis later on). I wanted to have a good structure to handle the many variables in my dataset, so I knew I needed to use macros - though I do agree I have complicated things quite a bit 😞

 

 

The macro %ProcessOrigData -which assigns the number of Bins and the individual break points for each variable- passes the dataset A and the current variable V to be binned to the macro %CreateNewData . 

The number of bins "&Num" and the  exact break points "&&v&i" for each variable are different - they are stored as global macro variables currently inside %ProcessOrigData, though it is a complicated setup.

 

The macro  %CreateNewData bins the variable V according to the breakpoints "&&v&i" and creates a new data set which contains the Bin number ("rank" in my code ). I would really like to figure out a neater structurefor all this ! 

 

In your code, I think I would just use your %DO loop as the outer loop. So it would look like the code below. I'm going to use "&&BrPt&j" instead of  "&&v&i"  and "&numBreaks" instead of "&Num" for clarity.

%macro CreateNewData(data= 
                    ,out= 
                    ,var= /*space delimited list of variables*/
                     );
  %local i vari;

  data &out;
    set &data;

    %*loop over the list of variables, generating an if/else block for each variable;
    %do i=1 %to %sysfunc(countw(&var,%str( ))); 
       %let vari=%scan(&var,&i,%str( ));

/** Loop over the number of Bins for the current variable. Both &numBreaks and &&BrPt&j are global. **/ %do j=1 %to &numBreaks;
high=&&BrPt&j.;

/** Missing values go in Bin 0 **/
if missing(&vari) then
do;
rank = 0;
end;
else do;
if (&vari < high and rank = -1) then rank = &j.;
end;
%end;
/** Values greater than the highest break point are in the highest bin **/
max = &&BrPt&numBreaks.;maxBin = %eval(&numBreaks+1);
if ((&vari >= max) and rank = -1) then rank =maxBin;
%end;
run;
%mend CreateNewData;
options mprint;
%CreateNewData(data=sashelp.class,out=want,var=height weight)

  I'm including a skeleton of the macro %ProcessOrigData below, along with some code I found from the DO loop blog to generate dummy data. I am treating the macro variables &numBreaks and all the possible break points I can have &BrPt1,...etc as global macro variables.

 

/** Get dummy data **/

/* SAS macro that duplicates the Excel RANDBETWEEN function */
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data test;
do i = 1 to 100;
height = %RandBetween(100, 200);
weight = %RandBetween(40, 100);
output;
end;
run;

 

/**  Code for the macro ProcessOrigData **/

/** I just initialize the maximum possible break points I cna have for any variable to zero. This isn't really optimal..... **/

/** I also have a lot of variables in the real problem ...***/

 

%let numBreaks =0; %let BrPt1 =0; %let BrPt2 =0;
%macro processOrig(inputVar);
if &inputVar = height then do;
&numBreaks = 1;
&BrPt1 = 160;
end;

else do;
&numBreaks = 2;
&BrPt1 = 50;&BrPt2 = 70;
end;

 

Please do give your comments on what would be an efficient way to tackle this binning problem...

Thanks for your help!

ballardw
Super User

@nstdt wrote:

 

My goal is to generate these different "binned" datasets corresponding to each variable(for  further analysis later on).


Depending on the analyis later you may be better off leaving all the data in one set and having one or more variables that identifies the sets of records to analyze together. The SAS BY statement will let almost any procedure do the same analysis on all of the groups identified by the variables on a BY statement. You can also use data set options to restrict which ones if needed such as a WHERE statement or dataset option WHERE option.

Proliferation of datasets usually means much more code later on.

Quentin
Super User

Oh it's a binning problem.  My first thought for binning in SAS is to use formats.   This separates the definition of the bin categories from the data step code, and tends to make life easier. 

 

Consider code like below:

 

proc format;
  value bheight
    low-60  ='1'
    60<-high='2'
  ;
  value bweight
    low-100  ='1'
    100<-high='2'
  ;
run;

proc print data=sashelp.class;
  where put(height,bheight.)='1';
run;

data binned;
  set sashelp.class;
  bheight=put(height,bheight.);
  bweight=put(weight,bweight.);
run;

proc print data=binned;
run;

 

As shown with the PROC PRINT step, often you don't need to create a new dataset to apply the binning, you can just use the format. Or you can make separate bin variables for each variable you would want to be able to bin by.

 

To make the code more dynamic, you could make a dataset which stores the cutpoints for each variable, and build the formats from that dataset (using PROC FORMAT CNTLIN) and you could use a macro to generate the  set to assignment statements to create the bin variables.  But I would start without macro code, and just focus on binning approaches.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.
nstdt
Quartz | Level 8

Thanks, I think using proc format is the way to go- didn't know about this option before.

Thanks also for the nice explanation about %If and If statements!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 6125 views
  • 8 likes
  • 7 in conversation