BookmarkSubscribeRSS Feed
thetwudu
Calcite | Level 5

Hello,

 

I am trying to write a macro that will drop variables from my regression model, one at a time, based on VIF. I want it to run, drop the highest VIF variable, run again, drop the highest, then repeat until all VIFs are 5 or less. Please help. My data set is named "save.dafinal", dependent variable is "gaaverage", and independent variables are saved as a macro variable "%radius1demo". Thanks.

6 REPLIES 6
Reeza
Super User
Your question is coming across as "Do my work" rather than "I need help", In general, if you need someone to do work, you hire a consultant.

What have you tried? Where are you having issues?
thetwudu
Calcite | Level 5

Oh. Sorry. I'm new to posting in a community. Usually I just google my way to an answer from the SAS website, but this one has been rough. This is what I've done thus far.

 

%MACRO MULTICOLLINEARITY(YVAR,FIELDS,MAX_VIF);

ods _all_ close;

%put Running with &fields;

 

PROC REG DATA=save.dafinal;

MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;

ODS OUTPUT PARAMETERESTIMATES=PAREST2;

RUN;

quit;

 

proc sort data=parest2;

by descending varianceinflation;

run;

data _null_;

set parest2(obs=1);

if varianceinflation > &max_vif then do;

fields_run = tranwrd("&fields",trim(variable),' ');

if not missing(fields_run) then do;

call_string = cats('%multicollinearity(',"&yvar.,",fields_run,",&max_vif.)");

call execute(call_string);

end;

end;

else do;

put "Stopped with Max VIF:" variable "=" varianceinflation;

end;

run;

ods preferences;

%MEND MULTICOLLINEARITY;

%MULTICOLLINEARITY(yvar=gaaverage, fields=%radius1demovars,MAX_VIF=5)

phullathi
Obsidian | Level 7

/* Author PRAKASH HULLATHI */
/*CREATING THE DATA FOR MULTICOLLINEARITY*/

%MACRO REMOVE_MULTICOLLINEARITY(DATASET=,YVAR=,VIF_CUTOFF=);

/*taking output of all variables*/
PROC CONTENTS DATA=&DATASET. VARNUM OUT=T;
RUN;

/*filter only numeric variables excluding dependent and date variables*/
DATA T1;
SET T;
KEEP NAME ;
WHERE TYPE=1 AND NAME NOT IN("&yvar.") and FORMAT not in('DATE') ;/* Remove target and date variables*/
RUN;

/*creating macro for independent variables*/
PROC SQL NOPRINT;
SELECT NAME INTO : XVARS SEPARATED BY ' '
FROM T1;
QUIT;

%PUT independent_variables=&XVARS;

/*running the regression model till independent variables vif< specified vif_cutoff*/
%DO %UNTIL (%SYSEVALF(&MAX_VIF.<=&VIF_CUTOFF.) );

/*taking the output of independent variables vif by removing the intercept*/
ODS OUTPUT PARAMETERESTIMATES=PAREST2;

PROC REG DATA=&DATASET. ;

MODEL &YVAR.= &XVARS. / VIF ;

RUN;

QUIT;

/*dropping the independent variables with missing vif value*/


DATA T11;
SET PAREST2;
IF VarianceInflation NOT IN(.) ;
RUN;

/*sorting the vif value by descending order*/
PROC SORT DATA=PAREST2 OUT=PAREST2_SORT ;
BY DESCENDING VarianceInflation;
RUN;

/*considering the highest vif value */
DATA PAREST2_SORT_2;
SET PAREST2_SORT;
IF _N_=1;
RUN;

/*creating the macro for highest vif value*/
PROC SQL NOPRINT;
SELECT VARIABLE INTO: REMOVE_VAR
FROM PAREST2_SORT_2;
QUIT;

/*dropping the highest vif value variable*/

PROC SQL NOPRINT ;
SELECT Variable INTO: XVARS SEPARATED BY ' '
FROM T11
WHERE VARIABLE NOT IN("&REMOVE_VAR.","Intercept") ;
QUIT;

/*getiing the highest vif value*/
PROC SQL NOPRINT;
SELECT MAX(VarianceInflation) INTO: MAX_VIF
FROM PAREST2;
QUIT;

%PUT max_vif=&MAX_VIF. variable_removed=&REMOVE_VAR. ;

%END;

%MEND REMOVE_MULTICOLLINEARITY;


%REMOVE_MULTICOLLINEARITY(DATASET=sashelp.cars,YVAR=MPG_City,VIF_CUTOFF=10)


/* after running the above sas macro code
for multicollinearity then run the below code */

/* final data set with no multicollinearity variables*/
data NO_MULTICOLLINEARITY;
SET PAREST2;
KEEP VARIABLE;
RUN;

PROC SQL NOPRINT;
SELECT VARIABLE INTO : XVARS_final SEPARATED BY ' '
FROM NO_MULTICOLLINEARITY
WHERE VARIABLE NOT IN("Intercept");
QUIT;

PROC REG DATA=SASHELP.CARS;
MODEL MPG_City=&XVARS_final./ VIF ;
RUN;
QUIT;

 

 

 

 

phullathi
Obsidian | Level 7

/* Author PRAKASH HULLATHI */
/*CREATING THE DATA FOR MULTICOLLINEARITY*/

%MACRO REMOVE_MULTICOLLINEARITY(DATASET=,YVAR=,VIF_CUTOFF=);

/*taking output of all variables*/
PROC CONTENTS DATA=&DATASET. VARNUM OUT=T;
RUN;

/*filter only numeric variables excluding dependent and date variables*/
DATA T1;
SET T;
KEEP NAME ;
WHERE TYPE=1 AND NAME NOT IN("&yvar.") and FORMAT not in('DATE') ;/* Remove target and date variables*/
RUN;

/*creating macro for independent variables*/
PROC SQL NOPRINT;
SELECT NAME INTO : XVARS SEPARATED BY ' '
FROM T1;
QUIT;

%PUT independent_variables=&XVARS;

/*running the regression model till independent variables vif< specified vif_cutoff*/
%DO %UNTIL (%SYSEVALF(&MAX_VIF.<=&VIF_CUTOFF.) );

/*taking the output of independent variables vif by removing the intercept*/
ODS OUTPUT PARAMETERESTIMATES=PAREST2;

PROC REG DATA=&DATASET. ;

MODEL &YVAR.= &XVARS. / VIF ;

RUN;

QUIT;

/*dropping the independent variables with missing vif value*/


DATA T11;
SET PAREST2;
IF VarianceInflation NOT IN(.) ;
RUN;

/*sorting the vif value by descending order*/
PROC SORT DATA=PAREST2 OUT=PAREST2_SORT ;
BY DESCENDING VarianceInflation;
RUN;

/*considering the highest vif value */
DATA PAREST2_SORT_2;
SET PAREST2_SORT;
IF _N_=1;
RUN;

/*creating the macro for highest vif value*/
PROC SQL NOPRINT;
SELECT VARIABLE INTO: REMOVE_VAR
FROM PAREST2_SORT_2;
QUIT;

/*dropping the highest vif value variable*/

PROC SQL NOPRINT ;
SELECT Variable INTO: XVARS SEPARATED BY ' '
FROM T11
WHERE VARIABLE NOT IN("&REMOVE_VAR.","Intercept") ;
QUIT;

/*getiing the highest vif value*/
PROC SQL NOPRINT;
SELECT MAX(VarianceInflation) INTO: MAX_VIF
FROM PAREST2;
QUIT;

%PUT max_vif=&MAX_VIF. variable_removed=&REMOVE_VAR. ;

%END;

%MEND REMOVE_MULTICOLLINEARITY;


%REMOVE_MULTICOLLINEARITY(DATASET=sashelp.cars,YVAR=MPG_City,VIF_CUTOFF=10)


/* after running the above sas macro code
for multicollinearity then run the below code */

/* final data set with no multicollinearity variables*/
data NO_MULTICOLLINEARITY;
SET PAREST2;
KEEP VARIABLE;
RUN;

PROC SQL NOPRINT;
SELECT VARIABLE INTO : XVARS_final SEPARATED BY ' '
FROM NO_MULTICOLLINEARITY
WHERE VARIABLE NOT IN("Intercept");
QUIT;

PROC REG DATA=SASHELP.CARS;
MODEL MPG_City=&XVARS_final./ VIF ;
RUN;
QUIT;

 

 

 

 

Reeza
Super User
It's not common to reply to a 4 year old thread. Are you providing an answer here or trying to ask a question?
phullathi
Obsidian | Level 7

Hi Team,

 

I am providing the correct solution for removing multicollinearity sequentially.

 

Regards

Prakash Hullathi

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2719 views
  • 0 likes
  • 3 in conversation