Neridhren
Fluorite | Level 6

Hi

I am trying proc quantselect for the first time in SAS, with the following syntax:

proc quantselect data=data;
class classvar1;
model y=scalevar1*classvar1 scalevar1 classvar1 / details=all selection=stepwise (select=sl slentry=0.05 slstay=0.1 choose=adjr1);
run;

The model selected by PROC QUANTSELECT is y=scalevar1*classvar1.

Now if I run PROC GLM on four candidate models, i.e.:

 

Model 1 (selected through proc quantselect):

proc glm data=data;
class classvar1;
model y=scalevar1*classvar1 / effectsize solution;
run;

 

Model 2

proc glm data=data;
class classvar1;
model y=scalevar1*classvar1 scalevar1 classvar1 / effectsize solution;
run;

 

Model 3

proc glm data=data;
class classvar1;
model y=scalevar1*classvar1 scalevar1 / effectsize solution;
run;

 

Model 4

proc glm data=data;
class classvar1;
model y=scalevar1 classvar1 / effectsize solution;
run;

 

Then the R^2 value of Model 1 (0.545252) is lower than that of Model 2 (0.570148). I am not sure how PROC QUANTSELECT then selected Model 1 over Model 2. Could it be because QUANTSELECT doesn't use R^2? I based the model choice on the adjusted R1 for quantile regression (choose=adjr1), even though I am not sure what that is.

Any explanations would be greatly appreciated.

Thanks!

Neri

1 ACCEPTED SOLUTION

sbxkoenk
SAS Super FREQ

Hello,

 

R-squared is not used for model selection in PROC QUANTREG (PROC QUANTSELECT).

The model selection can be based on the minimization of the average check loss (ACL) computed from the validation data. 
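
For readers unfamiliar with these terms, here is a sketch of the quantities involved. The notation is my own ($\tau$ for the quantile level, $n_v$ for the number of validation observations, $n$ and $p$ for the number of training observations and model parameters), and the ADJR1 form is written from memory of the SAS documentation, so please check it against the PROC QUANTSELECT "Details" section before relying on it.

```latex
% Check loss at quantile level \tau (\tau = 0.5 gives half the absolute error)
\rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}\{r < 0\}\bigr)

% Average check loss over the n_v validation observations
\mathrm{ACL}(\tau) = \frac{1}{n_v} \sum_{i \in \text{validation}}
    \rho_\tau\bigl(y_i - \mathbf{x}_i^\top \hat{\boldsymbol\beta}(\tau)\bigr)

% Quantile-regression analogue of R^2 and its adjusted version (CHOOSE=ADJR1).
% D is the summed check loss of the fitted model, D_0 that of the intercept-only model.
R1(\tau) = 1 - \frac{D(\tau)}{D_0(\tau)}, \qquad
\mathrm{ADJR1}(\tau) = 1 - \frac{(n-1)\,D(\tau)}{(n-p)\,D_0(\tau)}
```

Note that the call in the question has no PARTITION statement and uses choose=adjr1, so the chosen model there would be ranked by ADJR1 on the training data rather than by validation ACL. Either way the criterion is built from check loss, not from the squared error behind the R^2 that PROC GLM reports.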

 

As @Ksharp correctly points out, you are not "optimizing" mean prediction (conditional mean of the response),

but you are "optimizing" the fit of the entire conditional distribution.
(Although quantile regression is most often used to model specific conditional quantiles of the response, its full potential
lies in modeling the entire conditional distribution.)
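
A minimal sketch of both ideas, assuming the same dataset and variables as in the question: a PARTITION statement holds out validation data so that choose=validate can use the validation ACL, and a list of quantile levels requests a separate selection at each level. The seed, the 30% validation fraction, and the three quantile levels are placeholder values of mine, not recommendations.

```sas
/* Sketch only: seed, validation fraction and quantile levels are assumed values. */
proc quantselect data=data seed=12345;
  class classvar1;
  /* Randomly reserve 30% of the rows as validation data */
  partition fraction(validate=0.3);
  /* One stepwise selection per quantile level; the final model at each level
     is the step with the smallest validation ACL */
  model y = scalevar1*classvar1 scalevar1 classvar1
        / quantile=0.25 0.5 0.75
          selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=validate);
run;
```

The selected effects may differ from one quantile level to the next, which is one way to see how the predictors act on different parts of the conditional distribution.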

 

Koen

Ksharp
Super User
PROC QUANTSELECT is based on the MEDIAN (the default quantile level is 0.5),
whereas PROC GLM/GLMSELECT is based on the MEAN. If you want to build a quantile regression, just use PROC QUANTSELECT.
gp4
Fluorite | Level 6
If means are appropriate, try glmselect. If you want to model the median or another quantile, then use quantreg.
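
A rough sketch of the two routes, reusing the dataset and variables from the question. The choose=adjrsq criterion and the single-effect model passed to PROC QUANTREG are my own choices for illustration; substitute whatever criterion and candidate model you actually want to examine.

```sas
/* Mean regression: stepwise selection with PROC GLMSELECT.
   choose=adjrsq is an assumed criterion, picked to mirror choose=adjr1. */
proc glmselect data=data;
  class classvar1;
  model y = scalevar1*classvar1 scalevar1 classvar1
        / selection=stepwise(select=sl slentry=0.05 slstay=0.1 choose=adjrsq);
run;

/* Median regression: fit one candidate model with PROC QUANTREG */
proc quantreg data=data;
  class classvar1;
  model y = scalevar1*classvar1 / quantile=0.5;
run;
```

Fitting each candidate with PROC QUANTREG and comparing the reported objective function values puts the models on the same check-loss scale that quantile regression minimizes, which is a fairer comparison than the R^2 from PROC GLM.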
SteveDenham
Jade | Level 19

Plenty of good advice has already been given. I do want to point out something about R^2 that is happening when you run GLM on the different models. For a given dataset, adding terms to a model can never lower the R^2 value, and in practice it almost always raises it. Every term in Model 1 is also in Model 2, so I would have been really, really surprised if Model 1 had given you a larger R^2 than Model 2.
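
The reasoning can be sketched in a few lines. The notation (SSE for the error sum of squares, SST for the corrected total sum of squares, subscripts 1 and 2 for Model 1 and Model 2) is mine.

```latex
% R^2 for a least-squares fit
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}

% Model 1's terms are a subset of Model 2's, so least squares on Model 2
% minimizes the same criterion over a larger set of candidate fits:
\mathrm{SSE}_2 \le \mathrm{SSE}_1
\quad\Longrightarrow\quad
R^2_2 \ge R^2_1

% Consistent with the values reported in the question:
0.570148 \ge 0.545252
```

SST depends only on the response, so it is the same for both models and the inequality carries straight through to R^2.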

 

SteveDenham

Neridhren
Fluorite | Level 6

Hi Steve

Thanks for the answer. I would love it if you could go a little deeper into your comment. As suggested, I repeated my analysis using glmselect, and again Model 1 is chosen over the rest, even though Model 2 has a higher R^2. So what you said is relevant, but I'd appreciate it if you could explain a bit more.

The other piece of information to add is that glm of model 1 gives a significant effect for scalevar1*classvar1, whereas glm of model 2 is only significant for the main effect of scalevar1.

Thanks

Neri

SteveDenham
Jade | Level 19

Any introductory text on regression analysis will walk you through the algebra to prove that increasing the number of predictors cannot decrease the R^2 (and will almost always increase it). See this YouTube video for a quick walk-through: https://www.youtube.com/watch?v=CGQpi580sZM

 

The video goes on to talk about the adjusted R^2, which penalizes for the number of predictors. 
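
For reference, the usual textbook form of the adjusted R^2 is sketched below. Here $n$ is the number of observations and $p$ is the number of estimated parameters excluding the intercept; this notation is my own, and software may count $p$ slightly differently when CLASS effects are involved.

```latex
% Adjusted R^2: the factor (n-1)/(n-p-1) grows with p, so an added term
% only raises the adjusted value if it reduces the error sum of squares enough.
R^2_{\mathrm{adj}} = 1 - \bigl(1 - R^2\bigr)\,\frac{n-1}{n-p-1}
```

This is why a selection method can prefer a smaller model even when a larger one has the higher plain R^2.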

 

When it comes to multiple regression and model selection, there is a lot of literature out there. It turns out that almost every algorithm for model selection has at least some drawback, but it is worse for stepwise and all possible subset methods.  Good luck.

 

SteveDenham
