Solved: Model Selection: Quantselect results vs individual R^2 results obtaine...

Neridhren · Posted 01-19-2024 03:38 PM

Hi

I am trying proc quantselect for the first time in SAS, with the following syntax:

proc quantselect data=data;
   class classvar1;
   model y = scalevar1*classvar1 scalevar1 classvar1
         / details=all
           selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=adjr1);
run;

The model selected by proc quantselect is y=scalevar1*classvar1.

Now if I run proc glm on each of the 4 candidate models, i.e.:

Model 1 (selected through proc quantselect):

proc glm data=data;class classvar1;model y=scalevar1*classvar1 / effectsize solution;run;

Model 2

proc glm data=data;class classvar1;model y=scalevar1*classvar1 scalevar1 classvar1 / effectsize solution;run;

Model 3

proc glm data=data;class classvar1;model y=scalevar1*classvar1 scalevar1 / effectsize solution;run;

Model 4

proc glm data=data;class classvar1;model y=scalevar1 classvar1 / effectsize solution;run;

Then the R^2 value of Model 1 (0.545252) is lower than that of Model 2 (0.570148). I am not sure how PROC QUANTSELECT then selected Model 1 over Model 2. Could it be because quantselect doesn't use R^2? I based the model choice on the adjusted R1 for quantile regression (choose=adjr1), even though I am not sure what that is.

Any explanations would be greatly appreciated

Thanks!

Neri

sbxkoenk · Posted 01-21-2024 08:51 AM

Hello,

R-squared is not used for model selection in PROC QUANTREG (PROC QUANTSELECT).

The model selection can be based on the minimization of the average check loss (ACL) computed from the validation data.
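As a rough sketch of what validation-based selection could look like for the model in the question: the PARTITION statement holds out part of the data, and choose=validate picks the selection step with the smallest validation ACL. The 30% holdout fraction is an arbitrary assumption for illustration, not something from the original post.

```sas
/* Sketch only: validate=0.3 is an assumed holdout fraction. */
proc quantselect data=data;
   class classvar1;
   partition fraction(validate=0.3);   /* random 30% validation split */
   model y = scalevar1*classvar1 scalevar1 classvar1
         / quantile=0.5
           selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=validate)
           details=all;
run;
```

Because the split is random, the selected model can differ from run to run.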

As @Ksharp correctly points out, you are not "optimizing" mean prediction (conditional mean of the response), but you are "optimizing" the fit of the entire conditional distribution. (Although quantile regression is most often used to model specific conditional quantiles of the response, its full potential lies in modeling the entire conditional distribution.)
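For reference, the criteria mentioned in this thread can be written out as follows. This is my recollection of the definitions in the PROC QUANTSELECT documentation, so treat it as a sketch and check it against the documentation for your release. Here \tau is the quantile level, n the number of observations, p the number of model parameters including the intercept, and D_0(\tau) the same sum as D(\tau) computed for the intercept-only model.

```latex
\rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}[r < 0]\bigr)
\qquad \text{(check loss)}

D(\tau) = \sum_{i=1}^{n} \rho_\tau\!\bigl(y_i - \mathbf{x}_i^\top \hat{\boldsymbol{\beta}}(\tau)\bigr),
\qquad
\mathrm{ACL}(\tau) = \frac{D(\tau)}{n}

R1(\tau) = 1 - \frac{D(\tau)}{D_0(\tau)},
\qquad
\mathrm{AdjR1}(\tau) = 1 - \frac{(n-1)\,D(\tau)}{(n-p)\,D_0(\tau)}
```

At \tau = 0.5 the check loss is half the absolute residual, so the fit minimizes absolute rather than squared errors. That is why a ranking by ADJR1 need not agree with a ranking by the least-squares R^2 from PROC GLM.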

SAS Global Forum 2017 -- Paper SAS525-2017
Five Things You Should Know about Quantile Regression
Robert N. Rodriguez and Yonggang Yao, SAS Institute Inc.
https://support.sas.com/resources/papers/proceedings17/SAS0525-2017.pdf
Fast Quantile Process Regression
https://communities.sas.com/t5/Research-and-Science-from-SAS/Fast-Quantile-Process-Regression/ta-p/7...

Koen

Ksharp · Posted 01-21-2024 04:28 AM

proc quantselect is based on the MEDIAN (its default quantile is 0.5),
whereas proc glm/glmselect is based on the MEAN. If you want to build a quantile regression, just use proc quantselect.
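The median is only the default. As a hedged sketch, other quantile levels can be requested with the QUANTILE= option on the MODEL statement; the three levels below are arbitrary choices for illustration, and selection is run separately for each level.

```sas
/* Sketch only: the quantile levels 0.25, 0.5, 0.75 are assumed for illustration. */
proc quantselect data=data;
   class classvar1;
   model y = scalevar1*classvar1 scalevar1 classvar1
         / quantile=0.25 0.5 0.75
           selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=adjr1);
run;
```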

gp4 · Posted 01-22-2024 04:37 PM

If means are appropriate, try glmselect. If you want to model the median or other quantile, then quantreg.
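A minimal sketch of both routes, reusing the variable names from the question. The selection settings mirror the original quantselect call, with choose=adjrsq as the assumed mean-model counterpart of choose=adjr1.

```sas
/* Mean model: let GLMSELECT search, judged by adjusted R-square. */
proc glmselect data=data;
   class classvar1;
   model y = scalevar1*classvar1 scalevar1 classvar1
         / selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=adjrsq)
           details=all;
run;

/* Median model: fit one chosen specification with QUANTREG. */
proc quantreg data=data;
   class classvar1;
   model y = scalevar1*classvar1 / quantile=0.5;
run;
```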

SteveDenham · Posted 01-30-2024 10:37 AM

Plenty of good advice has already been given. I do want to point out something about R^2 that is happening when you run GLM on the different models. For a given dataset, adding terms to a model never lowers R^2 and in practice almost always raises it. Since Model 2 contains every term in Model 1, I would have been really, really surprised if Model 1 had given you a larger R^2 than Model 2.
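A small simulation makes this easy to see. The sketch below uses made-up data (the dataset name sim, the seed, the sample size, and the variables noise1 and noise2 are all hypothetical) and fits a model with and without two predictors that are unrelated to the response.

```sas
/* Simulated data: y depends on x only; noise1 and noise2 are pure noise. */
data sim;
   call streaminit(2024);
   do i = 1 to 50;
      x      = rand('normal');
      noise1 = rand('normal');
      noise2 = rand('normal');
      y      = 1 + 2*x + rand('normal');
      output;
   end;
run;

/* Compare R-Square and Adj R-Sq between the two labeled models. */
proc reg data=sim;
   small: model y = x;
   big:   model y = x noise1 noise2;
run;
quit;
```

The R-Square reported for the model labeled big cannot be smaller than the one for small, even though the extra predictors carry no information. Adj R-Sq, printed next to it, is free to go down.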

SteveDenham

Neridhren · Posted 01-30-2024 10:42 AM

Hi Steve

Thanks for the answer, and I would love it if you could go a little deeper into your comment. As suggested, I repeated my analysis using glmselect, and again Model 1 is chosen over the rest, but Model 2 has a higher R^2. So what you said is relevant, but I'd appreciate it if you could explain a bit more.

The other piece of information to add is that glm of model 1 gives a significant effect for scalevar1*classvar1, whereas glm of model 2 is only significant for the main effect of scalevar1.

Thanks

Neri

SteveDenham · Posted 01-30-2024 10:56 AM

Any introductory text on regression analysis will walk you through the algebra to prove that adding predictors can never decrease R^2 (and almost always increases it). See this YouTube video for a quick walk-through: https://www.youtube.com/watch?v=CGQpi580sZM

The video goes on to talk about the adjusted R^2, which penalizes for the number of predictors.
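For reference, the usual textbook form of the adjusted R^2, with n observations and p predictors not counting the intercept, is:

```latex
\bar{R}^2 = 1 - \bigl(1 - R^2\bigr)\,\frac{n - 1}{n - p - 1}
```

The factor (n - 1)/(n - p - 1) grows with p, so a term that raises R^2 only slightly can lower the adjusted value. That is how a selection procedure can prefer a smaller model even though the larger one has the higher raw R^2.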

When it comes to multiple regression and model selection, there is a lot of literature out there. It turns out that almost every algorithm for model selection has at least some drawback, but it is worse for stepwise and all possible subset methods. Good luck.

SteveDenham

Neridhren · Posted 01-30-2024 11:45 AM

Awesome! Great resource. Thanks again
Neri
