pathakvishal
Fluorite | Level 6

Hi, I am performing logistic regression and I am trying to reduce the number of continuous variables in my model.

I have run PROC CORR and, in Excel, used conditional formatting to flag correlations with x < -0.3 or x > 0.3. Now, how should I select the variables for the final model?

 

Please refer to the attached Excel sheet. Please also share the process for reducing variables when you have a lot of correlated variables.

7 REPLIES
PaigeMiller
Diamond | Level 26

Try using either PROC PRINCOMP or better yet PROC PLS to reduce the dimensionality of your predictor variables.

 

Don't use PROC CORR for this purpose.
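As a rough illustration of the two procedures named above, here is a minimal sketch. The data set `work.train`, the predictors `x1-x20`, the numeric response `y`, and the choice of 5 components are all placeholder assumptions, not anything taken from the attached sheet.

```sas
/* Hypothetical names: work.train, x1-x20 and y are placeholders. */

/* Principal components: summarizes the predictors without looking at y. */
/* OUT= writes the component scores Prin1-Prin5 to a new data set.       */
proc princomp data=work.train out=work.pc_scores n=5;
   var x1-x20;
run;

/* Partial least squares: extracts factors that also account for y. */
proc pls data=work.train method=pls nfac=5;
   model y = x1-x20;
run;
```

The practical difference is that PROC PRINCOMP builds its components from the predictors alone, while PROC PLS chooses its factors with the response in view.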

--
Paige Miller
pathakvishal
Fluorite | Level 6
But, Master Miller, it will also help in reducing collinearity. Also, is PROC PLS better for reducing continuous variables?
PaigeMiller
Diamond | Level 26

@pathakvishal wrote:
But, Master Miller, it will also help in reducing collinearity. Also, is PROC PLS better for reducing continuous variables?

One of the major benefits of PROC PLS is that, in the presence of collinearity among the predictor variables, it provides better estimates of the model coefficients and better estimates of the predicted values than ordinary least squares regression does (better in this case meaning lower mean squared error of the estimates).
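To see the comparison with ordinary least squares on your own data, something along these lines could be tried. It is only a sketch: `work.train`, `x1-x20`, `y`, and the seed value are hypothetical, and split-sample cross validation is just one of the available CV= choices.

```sas
/* Hypothetical names: work.train, x1-x20 and y are placeholders. */

/* Ordinary least squares with collinearity diagnostics:            */
/* VIF prints variance inflation factors, COLLIN prints condition   */
/* indices and variance proportions.                                */
proc reg data=work.train;
   model y = x1-x20 / vif collin;
run;
quit;

/* PLS on the same predictors. CV=SPLIT requests split-sample cross */
/* validation and CVTEST tests whether fewer factors do as well.    */
/* The seed is arbitrary and only makes the test repeatable.        */
proc pls data=work.train method=pls cv=split cvtest(seed=12345);
   model y = x1-x20;
run;
```

Large variance inflation factors in the PROC REG output indicate the kind of collinearity that PLS is designed to cope with.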

--
Paige Miller
pathakvishal
Fluorite | Level 6

Hi Master Miller,

I assume we use PROC PLS before modelling (PROC LOGISTIC) to reduce the continuous variables. Also, I am not able to find any good and easy article on PROC PLS. If possible, could you please share a link to anything like that? Thanks in advance!

PaigeMiller
Diamond | Level 26

Maybe we need to take a step back.

 

PLS does not reduce the number of original predictor variables. You let PLS determine which variables have high importance, and which have low importance, but they are all in the model. It uses ALL of them. You don't use PLS to select some to use, and discard the rest. This is different than what you may have learned about using PROC REG. This is a paradigm shift, and an important and valuable shift.
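One way to look at which variables PLS treats as important, without dropping any of them, might be the following sketch. The names `work.train`, `x1-x20`, and `y`, and the choice of 3 factors, are placeholders.

```sas
/* Hypothetical names: work.train, x1-x20 and y are placeholders. */
ods graphics on;

/* DETAILS lists the loadings and weights for each extracted factor. */
/* PLOTS=VIP requests the variable importance plot.                  */
/* SOLUTION prints the coefficients of the final predictive model.   */
proc pls data=work.train method=pls nfac=3 details plots=vip;
   model y = x1-x20 / solution;
run;
```

Every predictor listed in the MODEL statement still gets a coefficient in the output; the less important ones simply receive values near zero.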

 


 


Also, I am not able to find any good and easy article on PROC PLS.


The documentation is a good place to start. Google finds plenty of introductory articles on Partial Least Squares.

--
Paige Miller
pathakvishal
Fluorite | Level 6

Hi Master Miller,

I am doing a case study for the first time, using logistic regression, and I want to remove highly correlated variables, for which you suggested PROC PLS. I went through many articles about PROC PLS, but I still have some confusion.

1) I was planning to run PROC CORR (for the continuous variables), remove the collinear variables, and then use factor analysis to get the most effective variables for the logistic regression (PROC LOGISTIC). Since you told me in this thread to use PROC PLS, I want to know whether PLS is a step that replaces PROC CORR, or a complete modelling step like PROC LOGISTIC.

2) If it is used in place of PROC CORR, do we have to use factor analysis in the next step, or can I go straight to PROC LOGISTIC?

3) Which statement inside PROC PLS must we use to get the desired variables (excuse my ignorance)?

PaigeMiller
Diamond | Level 26
  1. PLS is a complete modelling method that accounts for the collinearity among your predictor variables
  2. No factor analysis, no PROC CORR, no logistic regression; if you use PROC PLS, you would need to create dummy variables of your responses to simulate a logistic regression model; it's not quite the same as logistic regression from a statistical point of view, but it does enable you to predict which category the data point belongs in. If you want an actual logistic version of Partial Least Squares, it is described in https://cedric.cnam.fr/fichiers/RC906.pdf, and there is an R package which appears to do logistic partial least squares regression at https://cran.r-project.org/web/packages/plsRglm/plsRglm.pdf. I am not aware of anyone programming this in SAS.
  3. PLS does not eliminate variables the way you keep asking. It uses ALL variables and assigns the ones that are least important a loading value close to zero.
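A minimal sketch of the dummy-variable approach from point 2 might look like this. Everything named here is an assumption for illustration: the data set `work.train`, a character outcome `status` with values 'Yes' and 'No', the predictors `x1-x20`, 3 factors, and the 0.5 cutoff.

```sas
/* Hypothetical names: work.train, status, x1-x20 are placeholders. */

/* Create a 0/1 dummy response: 1 when status is 'Yes', 0 otherwise. */
/* Note that a missing status would also be coded 0 by this rule.    */
data work.train_dummy;
   set work.train;
   y_event = (status = 'Yes');
run;

/* Fit PLS to the dummy response and save the predicted values. */
proc pls data=work.train_dummy method=pls nfac=3;
   model y_event = x1-x20;
   output out=work.pls_pred predicted=p_event;
run;

/* Assign a category using a cutoff; 0.5 is an illustrative choice. */
data work.pls_class;
   set work.pls_pred;
   pred_class = (p_event >= 0.5);
run;

/* Cross-tabulate actual against predicted category. */
proc freq data=work.pls_class;
   tables y_event*pred_class / norow nocol nopercent;
run;
```

The predicted values are not constrained to lie between 0 and 1, which is one reason this is not the same as a logistic regression; they should be read as scores for classification rather than as probabilities.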
--
Paige Miller
