
VARIABLE/DIMENSION Reduction using proc corr

Occasional Contributor
Posts: 6

VARIABLE/DIMENSION Reduction using proc corr

Hi, I am performing logistic regression and I am trying to reduce the number of continuous variables for my model.

I have run PROC CORR and applied a correlation filter of x < -0.3 or x > 0.3 (conditional formatting) in Excel. Now how should I select the variables for the final model?

 

Refer to the attached Excel sheet. Please also share the process for reducing variables when you have lots of correlated variables.

Trusted Advisor
Posts: 1,499

Re: VARIABLE/DIMENSION Reduction using proc corr


Try using either PROC PRINCOMP or better yet PROC PLS to reduce the dimensionality of your predictor variables.

 

Don't use PROC CORR for this purpose.
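
As a rough illustration of the PROC PRINCOMP route, here is a minimal sketch. The data set WORK.TRAIN, the predictors x1-x20, the 0/1 response y, and the choice of five components are all hypothetical placeholders, not taken from the attached sheet.

```sas
/* Hypothetical input: WORK.TRAIN with continuous predictors x1-x20 */
/* and a binary response y coded 0/1.                               */

/* Replace the correlated predictors with a few uncorrelated component scores. */
proc princomp data=work.train out=work.scores n=5;   /* n=5 is an arbitrary choice */
   var x1-x20;
run;

/* Use the component scores (Prin1-Prin5) as predictors in the logistic model. */
proc logistic data=work.scores;
   model y(event='1') = Prin1-Prin5;
run;
```

Note that the components are built without looking at the response, which is one reason PROC PLS may be the better choice here.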

Occasional Contributor
Posts: 6

Re: VARIABLE/DIMENSION Reduction using proc corr

But, Master Miller, it will also help in reducing collinearity. Also, is PROC PLS better for reducing continuous variables?
Trusted Advisor
Posts: 1,499

Re: VARIABLE/DIMENSION Reduction using proc corr


pathakvishal wrote:
But, Master Miller, it will also help in reducing collinearity. Also, is PROC PLS better for reducing continuous variables?

One of the major benefits of PROC PLS is that, in the presence of collinearity among the predictor variables, it provides better estimates of the model coefficients and of the predicted values than ordinary least squares regression does (better in this case meaning lower mean squared error of the estimates).

Occasional Contributor
Posts: 6

Re: VARIABLE/DIMENSION Reduction using proc corr

Hi Master Miller,

 

I assume we use PROC PLS before modelling (PROC LOGISTIC) for continuous variable reduction. Also, I am not able to find any good and easy article on PROC PLS. If possible, can you please share a link to anything like that? Thanks in advance!

Trusted Advisor
Posts: 1,499

Re: VARIABLE/DIMENSION Reduction using proc corr

Maybe we need to take a step back.

 

PLS does not reduce the number of original predictor variables. You let PLS determine which variables have high importance and which have low importance, but they are all in the model. It uses ALL of them. You don't use PLS to select some to use and discard the rest. This is different from what you may have learned about using PROC REG. This is a paradigm shift, and an important and valuable one.
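
To make that concrete, here is a minimal sketch of a PROC PLS call. The data set WORK.TRAIN, the numeric response y, and the predictors x1-x20 are hypothetical names. The VIP (variable importance for projection) plot is just one way to look at importance, and choosing it here is my assumption.

```sas
ods graphics on;

/* Hypothetical names: WORK.TRAIN, numeric response y, predictors x1-x20. */
/* CV=SPLIT uses split-sample cross validation to choose the number of    */
/* PLS factors; PLOTS= requests importance and coefficient profile plots. */
proc pls data=work.train method=pls cv=split plots=(vip parmprofiles);
   model y = x1-x20;   /* every predictor stays in the model */
run;
```

Nothing is dropped from the MODEL statement. The plots only show which predictors carry more or less weight in the fitted model.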

 


 


Also, I am not able to find any good and easy article on PROC PLS.


The documentation is a good place to start. Google finds plenty of introductory articles on Partial Least Squares.

Occasional Contributor
Posts: 6

Re: VARIABLE/DIMENSION Reduction using proc corr

Hi Master Miller,

 

I am doing a case study for the first time, using logistic regression, and I want to remove variables with high correlation, for which you said to use PROC PLS. I went through many articles about PROC PLS, but I still have some confusion.

 

1) I was planning to run PROC CORR (for continuous variables), remove variables with collinearity, and then use factor analysis to get the most effective variables so that they can be used in logistic regression (PROC LOGISTIC). Since you told me in this thread to use PROC PLS, I just want to know whether PLS is a step that replaces PROC CORR or a complete modelling step like PROC LOGISTIC.

 

2) If it is used in place of PROC CORR, do we have to use factor analysis in the next step, or can I go directly to PROC LOGISTIC?

 

3) Which statement inside PROC PLS must we use to get the desired variables (excuse the dumb question)?

Trusted Advisor
Posts: 1,499

Re: VARIABLE/DIMENSION Reduction using proc corr

  1. PLS is a complete modelling method that accounts for the collinearity among your predictor variables.
  2. No factor analysis, no PROC CORR, no logistic regression. If you use PROC PLS, you would need to create dummy variables from your response to simulate a logistic regression model. This is not quite the same as logistic regression from a statistical point of view, but it does let you predict which category a data point belongs to. If you want an actual logistic version of Partial Least Squares, it is described in https://cedric.cnam.fr/fichiers/RC906.pdf, and there is an R package that appears to do logistic partial least squares regression at https://cran.r-project.org/web/packages/plsRglm/plsRglm.pdf. I am not aware of anyone having programmed this in SAS.
  3. PLS does not eliminate variables the way you keep asking. It uses ALL variables and assigns the ones that are least important a loading value close to zero.
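
The dummy-variable approach in point 2 could be sketched roughly as follows. All names are hypothetical (WORK.RAW, a character response outcome with values 'Yes'/'No', predictors x1-x20), and the 0.5 cutoff is an arbitrary assumption that you would want to tune.

```sas
/* Hypothetical input: WORK.RAW with a character response outcome */
/* ('Yes'/'No') and continuous predictors x1-x20.                 */
data work.train;
   set work.raw;
   y01 = (outcome = 'Yes');   /* 1 for the event, 0 otherwise */
run;

/* Fit PLS to the 0/1 dummy response and save the predicted values. */
proc pls data=work.train method=pls cv=split;
   model y01 = x1-x20;
   output out=work.pred predicted=p_hat;
run;

/* Turn the predicted value into a class label. */
data work.classified;
   set work.pred;
   predicted_class = (p_hat >= 0.5);   /* 0.5 cutoff is an assumption */
run;
```

Keep in mind that p_hat is a linear prediction, not a probability, so it can fall below 0 or above 1.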