04-07-2017 09:06 AM
Hi, I am performing logistic regression and I am trying to reduce the number of continuous variables in my model.
I ran PROC CORR and then, in Excel, used conditional formatting to flag correlations with x < -0.3 or x > 0.3. How should I now select the variables for the final model?
Please refer to the attached Excel sheet. Please also share the process for reducing variables when you have many correlated variables.
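As a rough illustration of the screening step described in this post, the sketch below computes pairwise Pearson correlations and lists the pairs with |r| > 0.3. It is plain Python rather than PROC CORR output, and the function names (`pearson`, `flag_pairs`) and the toy columns (`income`, `spend`, `age`) are hypothetical, not taken from the attached sheet.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_pairs(columns, threshold=0.3):
    """Return (name_a, name_b, r) for every pair with |r| > threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy data: "spend" is an exact multiple of "income",
# while "age" is constructed to be uncorrelated with both.
data = {
    "income": [1, 2, 3, 4, 5, 6],
    "spend": [2, 4, 6, 8, 10, 12],
    "age": [3, 1, 2, 2, 1, 3],
}

flagged = flag_pairs(data)
print(flagged)
```

A list like this only says which pairs move together; it does not say which member of a pair to keep, which is the open question in this post.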
04-07-2017 09:10 AM - edited 04-07-2017 09:16 AM
Try using either PROC PRINCOMP or better yet PROC PLS to reduce the dimensionality of your predictor variables.
Don't use PROC CORR for this purpose.
04-07-2017 10:01 AM
But, Master Miller, will it also help in reducing collinearity? Also, is PROC PLS better for reducing continuous variables?
One of the major benefits of PROC PLS is that, in the presence of collinearity among the predictor variables, it provides better estimates of the model coefficients and of the predicted values than ordinary least squares regression does (better in this case meaning lower mean squared error of the estimates).
04-07-2017 03:05 PM
Hi Master Miller,
I assume we use PROC PLS before modelling (PROC LOGISTIC) to reduce the continuous variables. Also, I am not able to find any good and easy article on PROC PLS. If possible, can you please share a link to something like that? Thanks in advance!
04-07-2017 03:28 PM
Maybe we need to take a step back.
PLS does not reduce the number of original predictor variables. You let PLS determine which variables have high importance, and which have low importance, but they are all in the model. It uses ALL of them. You don't use PLS to select some to use, and discard the rest. This is different than what you may have learned about using PROC REG. This is a paradigm shift, and an important and valuable shift.
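To make the "it uses ALL of them" point concrete, here is a minimal sketch of how the first PLS component is formed for a single response: every predictor receives a weight proportional to its covariance with the response, and the component score is the weighted sum of all predictors. This is a simplified, centered-only version of the first step of PLS, not a reproduction of PROC PLS; the function names and the toy data are hypothetical.

```python
from math import sqrt


def center(col):
    """Subtract the column mean from every value."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def first_pls_weights(columns, y):
    """First PLS weight vector: w proportional to X'y on centered data."""
    yc = center(y)
    cov = {
        name: sum(a * b for a, b in zip(center(col), yc))
        for name, col in columns.items()
    }
    norm = sqrt(sum(c * c for c in cov.values()))
    return {name: c / norm for name, c in cov.items()}


def first_pls_scores(columns, weights):
    """Component score per observation: weighted sum of ALL centered predictors."""
    centered = {name: center(col) for name, col in columns.items()}
    n = len(next(iter(centered.values())))
    return [
        sum(weights[name] * centered[name][i] for name in centered)
        for i in range(n)
    ]


# Hypothetical toy data: x2 is exactly 2 * x1 (perfect collinearity).
X = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 3, 4, 1, 2],
}
y = [1.1, 1.9, 3.2, 3.9, 5.1]

weights = first_pls_weights(X, y)
scores = first_pls_scores(X, weights)
print(weights)
```

Note that the collinear pair x1 and x2 both stay in the component with nonzero weights; nothing is dropped. Predictors that matter less simply end up with smaller weights.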
"Also, I am not able to find any good and easy article on PROC PLS."
The documentation is a good place to start. Google finds plenty of introductory articles on Partial Least Squares.
04-10-2017 02:31 PM
Hi Master Miller,
I am doing a case study for the first time, using logistic regression, and I want to remove variables with high correlation, for which you said to use PROC PLS. I went through many articles about PROC PLS, but I still have some confusion.
1) I was planning to run PROC CORR (for continuous variables), remove variables with collinearity, and then use factor analysis to get the most effective variables so that they can be used in logistic regression (PROC LOGISTIC). Since you told me in this thread to use PROC PLS, I just want to know: is PLS a step in place of PROC CORR, or is it a complete modelling step like PROC LOGISTIC?
2) If it is used in place of PROC CORR, do we then have to use factor analysis in the next step, or can I go directly to PROC LOGISTIC?
3) Which statement inside PROC PLS must we use to get the desired variables (pardon the dumb question)?
04-10-2017 03:46 PM - edited 04-10-2017 03:52 PM