Not applicable
Posts: 0

# Regression modelling problem

Hi,
my problem is as follows:

data new;
input x y z u v;
cards;
1 0 3 4 7
2 3 2 5 9
3 . 0 3 7
4 3 5 1 7
5 6 8 3 5
;
proc reg;
model x = y z u v / vif out=vifmatrix collin out=colinmatrix;
run;

I get two output datasets, vifmatrix and colinmatrix. I have to find the variable with the maximum VIF (say u), then find the variable most collinear with it (say v), then delete that variable (v) from the dataset and run proc reg again on the same dataset without 'v'.

I want this because I have a thousand variables and I don't want to repeat the same SAS code by hand.

Can anyone give me an idea (a macro, or some kind of loop)?

Regards,
Pabitra

## Re: Regression modelling problem

You should use ODS tables (http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/reg_sect49.htm) to write the output to datasets. For example:
/*
data new;
input x y z u v;
cards;
1 0 3 4 7
2 3 2 5 9
3 . 0 3 7
4 3 5 1 7
5 6 8 3 5
6 3 6 9 10
7 2 5 10 20
;
proc reg;
ods output CollinDiag=colinmatrix ParameterEstimates=vifmatrix(keep= variable varianceinflation);
model x = y z u v / vif collin;
run;
*/

I added two more lines of data because, with fewer observations, the model was over-fitted and no collinearity diagnostics were produced.
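If the ODS table names are not known in advance, one way to discover them is to turn on ODS tracing and look at the log. This is only a minimal sketch, assuming the dataset new from the example above; the table names CollinDiag and ParameterEstimates should appear in the trace output when the collin and vif options are requested.

```sas
/* Write the name of every ODS table that PROC REG produces to the log */
ods trace on;

proc reg data=new;
  model x = y z u v / vif collin;
run;
quit;

/* Switch tracing off again so later steps do not clutter the log */
ods trace off;
```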

## Re: Regression modelling problem

This is OK, but my problem is different.

I get two output datasets, vifmatrix and colinmatrix. I have to find the variable with the maximum VIF (say u), then find the variable most collinear with it (say v), then delete that variable (v) from the dataset and run proc reg again on the same dataset without 'v'.

I want this because I have a thousand variables and I don't want to repeat the same SAS code by hand.

## Re: Regression modelling problem

Now that you have your colinmatrix and vifmatrix, just merge them side by side and run two queries. For example:
/*
data together;
merge vifmatrix colinmatrix;
proc sql;
select variable into :u from together
having varianceinflation=max(varianceinflation);
select variable into :v from together
where variable ne "&u"
having &u = max(&u);
quit;
*/
After that, &v holds the variable with the largest "var prop" in the &u column, where &u is the variable with the largest variance inflation.
However, I do not know how to interpret the "var prop" values in colinmatrix.

## Re: Regression modelling problem

Thanks,
but in this example the maximum VIF is for 'v', and its second-highest correlation is with u. I want to remove u from my dataset and then run proc reg again on the modified dataset. Next time I want to check the same things, modify the dataset in the same way, and run proc reg again. I need some kind of loop or macro for this, because I can't repeat the same code manually.

Please advise.

## Re: Regression modelling problem

For example, if you want to end up with 2 selected variables:
/*
data new;
input x y z u v;
cards;
1 0 3 4 7
2 3 2 5 9
3 . 0 3 7
4 3 5 1 7
5 6 8 3 5
6 3 6 9 10
7 2 5 10 20
;

%macro var_select(dsn=new, depvar=x, varlist=y z u v, n=4);
ods _all_ close;
%do %until(&n = 2);
proc reg data=&dsn;
ods output CollinDiag=colinmatrix ParameterEstimates=vifmatrix(keep= variable varianceinflation);
model &depvar = &varlist / vif collin;
run;

data together;
merge vifmatrix colinmatrix;
run;

proc sql noprint;
%* Find the variable name with the max variance inflation and store it in the macro variable u;
select variable into :u from together
having varianceinflation=max(varianceinflation);
%* Find the variable name with the max var prop associated with that variable and store it in the macro variable v (I am not sure whether this is the statistically right thing to do);
select variable into :v from together
where variable ne "&u"
having &u = max(&u);

%* Then run a query to construct the list of independent variables excluding the variable held in v;
select count(*), variable into :n, :varlist separated by ' '
from vifmatrix where variable ne "&v" and lowcase(variable) ne 'intercept';
quit;
%end;
ods listing;

*dump the selected variables into a dataset;
proc sql;
create table var_selection as
select variable
from vifmatrix where variable ne "&v" and lowcase(variable) ne 'intercept';
quit;
%mend var_select;

%var_select;
*/
The selected variables go to the dataset var_selection, and you can change the stop condition "&n = 2" to anything you like.
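As one illustration of a different stop condition, the loop can stop once the largest VIF falls below a threshold, with a cap on the number of passes. The sketch below is an assumption-laden simplification and not the macro above: the macro name vif_select and the parameters maxvif and maxiter are hypothetical, it drops the variable with the largest VIF itself instead of its "var prop" partner, and it does not check the condition index. It also refits the final model with output switched back on so that the PROC REG results are printed.

```sas
/* Hypothetical sketch: drop the highest-VIF variable until max VIF < &maxvif */
%macro vif_select(dsn=new, depvar=x, varlist=y z u v, maxvif=2, maxiter=20);
  %local i stop worst worstvif;
  %let i = 0;
  %let stop = 0;

  ods exclude all;  /* suppress printed output; ODS OUTPUT datasets are still created */

  %do %while(&stop = 0 and &i < &maxiter);
    %let i = %eval(&i + 1);

    proc reg data=&dsn;
      ods output ParameterEstimates=vifmatrix(keep=variable varianceinflation);
      model &depvar = &varlist / vif;
    run;
    quit;

    /* Only the first row is stored, so sorting descending gives the largest VIF */
    proc sql noprint;
      select variable, varianceinflation into :worst, :worstvif
        from vifmatrix
        where lowcase(variable) ne 'intercept'
        order by varianceinflation desc;
    quit;
    %let worst = &worst;  /* reassigning strips leading and trailing blanks */

    %if %sysevalf(&worstvif < &maxvif) %then %let stop = 1;
    %else %if %sysfunc(countw(&varlist, %str( ))) <= 1 %then %let stop = 1;
    %else %do;
      /* Rebuild the regressor list without the highest-VIF variable */
      proc sql noprint;
        select variable into :varlist separated by ' '
          from vifmatrix
          where lowcase(variable) ne 'intercept' and variable ne "&worst";
      quit;
    %end;
  %end;

  ods exclude none;  /* restore printed output */

  /* Final fit on the retained variables, with the regression output shown */
  proc reg data=&dsn;
    model &depvar = &varlist / vif collin;
  run;
  quit;
%mend vif_select;

%vif_select(dsn=new, depvar=x, varlist=y z u v, maxvif=2, maxiter=20);
```

A condition-index check could be added in the same way by also capturing the CollinDiag table and testing its largest condition index, but that part is left out here.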

## Re: Regression modelling problem

Thanks,
now I want to stop this loop when VIF < 2 (say) and the condition index < 20 (say).

I also want the output of proc reg.

Please advise.

Message was edited by: pabitra

## Re: Regression modelling problem

That's way beyond my capacity, unless you can offer me a job.

## Re: Regression modelling problem

I am a student and I want to learn more. How can I offer you a job?