Hi all,
I have a long data reduction program that narrows down the set of variables in preparation for predictive modeling. The main procedure I use is PROC VARCLUS. I would like to make this more automated: the data reduction macro should run again and again for one target until the variables rep and first are equal to each other. In other words, it should keep running until only one variable falls into each cluster.
For example, after one run I may get a data set that looks like this:
_NAME_               | rep | first
POR_COR_AP_W         | *   | *
POR_KIDS             | *   | *
MPRTL6               |     | *
M_PER                |     | *
FLAG_DRESS_B         |     | *
FIRST_PROD_W         |     | *
FLAG_ACC_G           |     | *
FRAGRANCE_PER        |     | *
NE_ind               |     | *
DSLP                 |     | *
PUR_FOOTWEAR_B       |     | *
PUR_Q2               |     | *
MDOLL76              | *   | *
avg_LUX_APPAREL_M_PV | *   | *
ROR_KIDS             |     | *
TOTALP_STORES        | *   | *
PUR_ACC_B            |     | *
MKD_OTHER            | *   | *
sales_ratiod         | *   | *
I would like the macro to run over and over again until the data set looks like this, with rep and first both flagged on every row:
_NAME_         | rep | first
MPRTL6         | *   | *
NE_ind         | *   | *
DSLP           | *   | *
PUR_FOOTWEAR_B | *   | *
MDOLL76        | *   | *
ROR_KIDS       | *   | *
TOTALP_STORES  | *   | *
MKD_OTHER      | *   | *
sales_ratiod   | *   | *
I’m having trouble programming the iterative process. Any feedback or help is greatly appreciated.
%macro reduce(n=, t=, b=, outstat=);

proc corr data=mod.master_model_build best=&b. noprob outp=cor_&t. noprint nosimple;
  var /*my vars***/;
  with &t.;
run;

/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/
/*******reduction code........*/

/******Eventually the code comes to a proc varclus statement*******/
proc varclus data=mod.master_model_build maxeigen=.9 outstat=&outstat. noprint short;
  var &varlist.;
run;

/*******more reduction code........*/
/*******more reduction code........*/

/***last step***/
data t12;
  merge t12 t3;
  by cluster;
  if r2ratio=minrat then rep='*';
  else rep='';
  if first.cluster then first='*';
  drop minrat;
run;

data mod.best_&t.;
  set t12;
  keep _name_ rep first;
  /***I always want to take the var that is at the top of the cluster***/
  where first='*';
run;

/***and here is where it ends****/
/****I have to run again if first='*' is not equal to rep='*'***/
%mend;

%reduce(outstat=statset1, t=depvar1, b=9999);
%reduce(outstat=statset2, t=depvar2, b=9999);
%reduce(outstat=statset3, t=depvar3, b=9999);
Forgive me but I'm not going to try to give a direct answer to your question.
However, I point out that if you really want to do predictive modelling, why would you use a method like variable clustering, that doesn't take into account the ability of the X variables to predict the Y? How could you possibly get the best predictive model that way?
Wouldn't it make more sense to use a method that determines which variables are most predictive and then uses those variables in the predictive model? If so, if you think that makes more sense, then forget this variable reduction via variable clustering idea, and use Partial Least Squares (PROC PLS in SAS). PLS gives high weights to variables that are predictive and low weights to variables that are not predictive, and this not only overcomes the drawbacks of variable clustering, but also means you don't have to write and debug a macro to do this. And as a side benefit, all of this effort of "selecting the right variables for the predictive model" goes away as well.
Paige Miller
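For readers who want to try the PLS suggestion, a minimal sketch of a PROC PLS call might look like the following. It assumes the data set and target names from the original post (mod.master_model_build, depvar1) and that &varlist. holds the candidate predictors; the cross-validation settings and seed are illustrative choices, not a recommendation from the thread.

```sas
/* Sketch only: depvar1 and &varlist. are taken from the original post and
   stand in for the real target and candidate predictors.                 */
proc pls data=mod.master_model_build
         method=pls                /* partial least squares factors       */
         cv=split                  /* split-sample cross validation       */
         cvtest(seed=12345);       /* test for the number of factors      */
  model depvar1 = &varlist. / solution;  /* SOLUTION lists the coefficients */
run;
```

The cross validation picks the number of extracted factors, so no separate variable-clustering pass is needed before fitting.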
I cannot tell what you need help with.
Your posted code doesn't really look like a full program. Is that because you removed some of the details to make the code smaller?
What is your actual question?
If you want your macro to loop you probably want to add a %DO loop. What kind depends on what you actually want to change on each iteration of the loop.
Yes, I removed a bunch of the code. My actual question is just how to write an iterative macro that stops processing when the variables rep and first are both equal to '*'. So, do until rep='*' and first='*'.
@RobertNYC wrote:
Yes, I removed a bunch of the code. My actual question is just how to write an iterative macro that stops processing when the variables rep and first are both equal to '*'. So, do until rep='*' and first='*'.
Macro code doesn't access data variables directly. It looks like you want to keep running until ALL observations have the value '*'.
Here is one way to make such a test and populate a macro variable with either YES or NO to represent the result.
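/* Assume we are done; the data step flips this to NO if any row fails. */
%let done=YES ;
data _null_;
set have ;
if rep ne '*' or first ne '*' then do;
/* CALL SYMPUTX takes the macro variable name and the value as two arguments. */
call symputx('done','NO');
stop;
end;
run;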
You could then perhaps include that into a loop in your macro.
%local done ;
...
%do %until (&done = YES);
...
%end;
But you might want to add more logic to make sure that you don't get into an infinite loop.
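Putting the two pieces together, one possible shape for the loop is sketched below. The wrapper name reduce_until_stable and the maxiter parameter are hypothetical, added here only to cap the number of passes. The sketch assumes that %reduce from the original post writes mod.best_&t. on every pass and that each pass builds its variable list from the survivors of the previous pass; that part of the code was not posted, and without it the loop would repeat the same result until the cap is reached.

```sas
/* Sketch only: reduce_until_stable and maxiter are hypothetical names. */
%macro reduce_until_stable(t=, b=9999, maxiter=10);
  %local done iter;
  %let iter = 0;

  %do %until (&done = YES or &iter >= &maxiter);
    %let iter = %eval(&iter + 1);

    /* One pass of the existing reduction macro from the original post. */
    %reduce(outstat=statset_&iter., t=&t., b=&b.);

    /* Assume success, then flip to NO on the first row where
       rep and first are not both flagged.                      */
    %let done = YES;
    data _null_;
      set mod.best_&t.;
      if rep ne '*' or first ne '*' then do;
        call symputx('done', 'NO');
        stop;
      end;
    run;
  %end;

  %put NOTE: reduce_until_stable finished after &iter pass(es), done=&done..;
%mend reduce_until_stable;

%reduce_until_stable(t=depvar1);
```

Because %DO %UNTIL tests its condition at the bottom of the loop, the reduction always runs at least once, and the maxiter cap stops the macro even if the clusters never settle.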
Thanks Tom, this is very helpful. I will see if I can figure it out. thanks!