Hi
I am trying proc quantselect for the first time in SAS, with the following syntax:
proc quantselect data=data;
class classvar1;
model y=scalevar1*classvar1 scalevar1 classvar1 / details=all selection=stepwise (select=sl slentry=0.05 slstay=0.1 choose=adjr1);
run;
The model selected by PROC QUANTSELECT is y=scalevar1*classvar1.
I then ran PROC GLM on four different models:
Model 1 (selected through proc quantselect):
proc glm data=data;class classvar1;model y=scalevar1*classvar1 / effectsize solution;run;
Model 2
proc glm data=data;class classvar1;model y=scalevar1*classvar1 scalevar1 classvar1 / effectsize solution;run;
Model 3
proc glm data=data;class classvar1;model y=scalevar1*classvar1 scalevar1 / effectsize solution;run;
Model 4
proc glm data=data;class classvar1;model y=scalevar1 classvar1 / effectsize solution;run;
The R^2 of Model 1 (0.545252) is lower than that of Model 2 (0.570148), so I am not sure how PROC QUANTSELECT came to select Model 1 over Model 2. Could it be because QUANTSELECT doesn't use R^2? I based the model choice on the adjusted R1 for quantile regression (choose=adjr1), even though I am not sure what that is.
Any explanations would be greatly appreciated.
Thanks!
Neri
Hello,
R-squared is not used for model selection in PROC QUANTREG (PROC QUANTSELECT).
The model selection can be based on the minimization of the average check loss (ACL) computed from the validation data.
As @Ksharp correctly points out, you are not "optimizing" mean prediction (conditional mean of the response), but you are "optimizing" the fit of the entire conditional distribution.
(Although quantile regression is most often used to model specific conditional quantiles of the response, its full potential lies in modeling the entire conditional distribution.)
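For reference, here is a sketch of the quantities involved, in standard quantile-regression notation (the symbols are chosen for illustration and are not SAS output names). For a quantile level τ (as far as I know, QUANTSELECT fits the median, τ = 0.5, unless QUANTILE= is given on the MODEL statement), each residual is scored with the check loss ρ_τ. The ACL is the average of that loss, and R1 compares the fitted model's total check loss with that of an intercept-only model, whose fit is the sample τ-quantile of y, written here as \tilde{q}_\tau. The ADJR1 line assumes the same degrees-of-freedom correction as adjusted R^2, with p counting all parameters including the intercept; please check the PROC QUANTSELECT documentation for the exact definition.

```latex
\rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr)

\mathrm{ACL} = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau\bigl(y_i - \hat{q}_\tau(x_i)\bigr)

R1(\tau) = 1 - \frac{\sum_{i=1}^{n} \rho_\tau\bigl(y_i - \hat{q}_\tau(x_i)\bigr)}
                    {\sum_{i=1}^{n} \rho_\tau\bigl(y_i - \tilde{q}_\tau\bigr)}

\mathrm{ADJR1}(\tau) = 1 - \bigl(1 - R1(\tau)\bigr)\,\frac{n - 1}{n - p}
```

At τ = 0.5 the check loss is half the absolute residual, so R1 is a least-absolute-deviations analogue of R^2 rather than R^2 itself. That is one reason a criterion based on R1 and the R^2 reported by PROC GLM can rank the same models differently.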
Koen
Plenty of good advice has already been given. I do want to point out something about R^2 that is happening when you run GLM on the different models. For a given dataset, adding terms to a model can never decrease R^2 as long as the smaller model is nested within the larger one, and in practice it almost always increases it. Model 1 is nested within Model 2, so I would have been really, really surprised if Model 1 had given you a larger R^2 than Model 2.
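The argument can be written in a few lines. Let X_1 and X_2 be the design matrices of Model 1 and Model 2 (notation introduced here for illustration). Every term of Model 1 also appears in Model 2, so any fit Model 1 can produce, Model 2 can reproduce by setting its extra coefficients to zero. Least squares minimizes SSE over that larger set, so SSE cannot go up, while SST depends only on y and is the same for both models:

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{SSE} = \min_{\beta} \lVert y - X\beta \rVert^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2

\operatorname{col}(X_1) \subseteq \operatorname{col}(X_2)
\;\Longrightarrow\; \mathrm{SSE}_2 \le \mathrm{SSE}_1
\;\Longrightarrow\; R^2_2 \ge R^2_1
```

The values reported above (0.570148 for Model 2 versus 0.545252 for Model 1) are consistent with this.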
SteveDenham
Hi Steve
Thanks for the answer. I would love it if you could go a little deeper into your comment. As suggested, I repeated my analysis using PROC GLMSELECT, and again Model 1 is chosen over the rest, even though Model 2 has a higher R^2. So what you said seems relevant, but I'd appreciate a bit more explanation.
One more piece of information: in PROC GLM, Model 1 gives a significant effect for scalevar1*classvar1, whereas in Model 2 only the main effect of scalevar1 is significant.
Thanks
Neri
Any introductory text on regression analysis will walk you through the algebra proving that adding predictors can never decrease R^2 (and almost always increases it). See this YouTube video for a quick walkthrough: https://www.youtube.com/watch?v=CGQpi580sZM
The video goes on to talk about the adjusted R^2, which penalizes for the number of predictors.
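For reference, the usual adjusted R^2, with n observations and p predictors not counting the intercept, is below. Because the factor (n - 1)/(n - p - 1) grows with p, adjusted R^2 can fall when a term is added; for a single added term it rises only if that term's t statistic exceeds 1 in absolute value.

```latex
\bar{R}^2 = 1 - \bigl(1 - R^2\bigr)\,\frac{n - 1}{n - p - 1}
```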
When it comes to multiple regression and model selection, there is a lot of literature out there. It turns out that almost every model selection algorithm has at least some drawback, but the drawbacks are worse for stepwise and all-possible-subsets methods. Good luck.
SteveDenham