RobertNYC
Obsidian | Level 7

Hi all,

I have a long data reduction program that reduces the number of variables in preparation for predictive modeling. The main procedure I use is PROC VARCLUS. I would like to make this more automated: the data reduction macro should run again and again for one target until the variables rep and first are equal to each other. In other words, it should run over and over until only one variable falls into each cluster.

For example, after one run I may get a data set that looks like this:

_NAME_                  rep   first
POR_COR_AP_W            *     *
POR_KIDS                *     *
MPRTL6                        *
M_PER                         *
FLAG_DRESS_B                  *
FIRST_PROD_W                  *
FLAG_ACC_G                    *
FRAGRANCE_PER                 *
NE_ind                        *
DSLP                          *
PUR_FOOTWEAR_B                *
PUR_Q2                        *
MDOLL76                 *     *
avg_LUX_APPAREL_M_PV    *     *
ROR_KIDS                      *
TOTALP_STORES           *     *
PUR_ACC_B                     *
MKD_OTHER               *     *
sales_ratiod            *     *

I would like the macro to run over and over again until the data set looks like this:

_NAME_                  rep   first
MPRTL6                  *     *
NE_ind                  *     *
DSLP                    *     *
PUR_FOOTWEAR_B          *     *
MDOLL76                 *     *
ROR_KIDS                *     *
TOTALP_STORES           *     *
MKD_OTHER               *     *
sales_ratiod            *     *

I’m having trouble programming the iterative process. Any feedback or help is greatly appreciated.

%macro reduce(n=, t=, b=, outstat=);


proc corr data=mod.master_model_build best=&b. noprob outp=cor_&t. noprint nosimple;
  var /*my vars***/;
  with &t.;
run;

/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/

/******Eventually the code comes to a proc varclus statement*******/

proc varclus data=mod.master_model_build  maxeigen=.9 outstat=&outstat.  noprint short;
var &varlist.;run;


/*******more reduction code........*/
/*******more reduction code........*/

/***last step***/
data t12;
merge t12 t3;
by cluster;
if r2ratio=minrat then rep='*';
else rep='';
if first.cluster then first='*';
drop minrat;
run;

data mod.best_&t.; 
set  t12;
keep _name_ rep first;
/***I always want to take the var that is at the top of the cluster***/
where first='*';
run;

/***and here is where it ends****/
/****I have to run again if first='*' is not equal to rep='*'***/
%mend;
%reduce(outstat=statset1,  t=depvar1,  b=9999);run; 
%reduce(outstat=statset2,  t=depvar2,  b=9999);run; 
%reduce(outstat=statset3,  t=depvar3,  b=9999);run; 
6 REPLIES
PaigeMiller
Diamond | Level 26

Forgive me but I'm not going to try to give a direct answer to your question.

 

However, I point out that if you really want to do predictive modelling, why would you use a method like variable clustering, that doesn't take into account the ability of the X variables to predict the Y? How could you possibly get the best predictive model that way?

 

Wouldn't it make more sense to use a method that determines which variables are most predictive and then uses those variables in the predictive model? If so, if you think that makes more sense, then forget this variable reduction via variable clustering idea, and use Partial Least Squares (PROC PLS in SAS). PLS gives high weights to variables that are predictive and low weights to variables that are not predictive, and this not only overcomes the drawbacks of variable clustering, but also means you don't have to write and debug a macro to do this. And as a side benefit, all of this effort of "selecting the right variables for the predictive model" goes away as well.

--
Paige Miller
RobertNYC
Obsidian | Level 7
The question above isn't about predictive modeling; it's about modifying a macro to make it iterative.
Tom
Super User

I cannot tell what you need help with.

Your posted code doesn't really look like a full program. Is that because you removed some of the details to make the code smaller?

What is your actual question?

 

If you want your macro to loop you probably want to add a %DO loop.  What kind depends on what you actually want to change on each iteration of the loop.

 

RobertNYC
Obsidian | Level 7

Yes, I removed a bunch of the code. My actual question is just how to write an iterative macro that stops processing when the variables rep and first are both equal to '*'. So, do until rep='*' and first='*'.

Tom
Super User

@RobertNYC wrote:

Yes, I removed a bunch of the code. My actual question is just how to write an iterative macro that stops processing when the variables rep and first are both equal to '*'. So, do until rep='*' and first='*'.


Macro code doesn't access data variables directly.  It looks like you want to keep running until ALL observations have the value '*'. 

Here is one way to make such a test and populate a macro variable with either YES or NO to represent the result.

%let done=YES ;
data _null_;
  set have ;
  if rep ne '*' or first ne '*' then do;
    call symputx('done', 'NO');
    stop;
  end;
run;
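
The same test could also be written with PROC SQL by counting the rows that fail it. This is only a sketch: the helper name %check_done, its ds= parameter, and the macro variable n_bad are hypothetical, and it assumes the calling macro has already declared done (for example with %local done) so that the %LET updates the caller's variable.

```sas
/* Hypothetical helper: sets DONE to YES when no row fails the test.   */
/* Assumes DONE already exists in the caller's scope (%local done).    */
%macro check_done(ds=);
  %local n_bad;

  proc sql noprint;
    /* Count rows where rep or first is not '*' */
    select count(*) into :n_bad trimmed
      from &ds.
      where rep ne '*' or first ne '*';
  quit;

  %if &n_bad = 0 %then %let done = YES;
  %else %let done = NO;
%mend check_done;
```

A call such as %check_done(ds=have) would then leave YES or NO in done, just like the DATA step version.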

You could then perhaps include that into a loop in your macro.

%local done ;
...
%do %until (&done = YES);
...
%end;

But you might want to add more logic to make sure that you don't get into an infinite loop.
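
One possible guard is a cap on the number of passes. The following is a minimal sketch, not the original macro: the wrapper name %reduce_until_done, the maxiter= parameter, and the iter counter are hypothetical, and it assumes the reduction steps rebuild mod.best_&t. with rep and first on every pass.

```sas
/* Hypothetical wrapper; maxiter= is an assumed safety cap */
%macro reduce_until_done(t=, maxiter=10);
  %local done iter;
  %let iter = 0;

  %do %until (&done = YES or &iter >= &maxiter);
    %let iter = %eval(&iter + 1);

    /* ... run the reduction steps here so that mod.best_&t. is rebuilt ... */

    /* Assume success, then flip to NO at the first row that fails the test */
    %let done = YES;
    data _null_;
      set mod.best_&t.;
      if rep ne '*' or first ne '*' then do;
        call symputx('done', 'NO');
        stop;
      end;
    run;
  %end;

  /* Report when the cap was reached before every row passed the test */
  %if &done ne YES %then
    %put WARNING: stopped after &iter iterations without convergence for &t..;
%mend reduce_until_done;
```

Because %DO %UNTIL evaluates its condition at the bottom of the loop, the reduction always runs at least once, and the loop ends either when every row passes the test or when the cap is reached.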

RobertNYC
Obsidian | Level 7

Thanks Tom, this is very helpful. I will see if I can figure it out. Thanks!

