Jolly
Calcite | Level 5

Hello,

I was wondering whether it is possible to get R-Squared(predicted) values for regression models in SAS 9.4.

Thank you,

Jeff S. O.

Accepted Solutions
PGStats
Opal | Level 21

Here is a way to get the predicted r-squared with proc reg:

 

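/* RSQUARE, PRESS, SSE and ADJRSQ write _RSQ_, _PRESS_, _SSE_ and _ADJRSQ_
   to the OUTEST= data set */
proc reg data=sashelp.class outest=outest plots=none;
model weight = height / rsquare press sse adjrsq;
run;
quit;

/* Predicted R-square = 1 - PRESS/SST, with SST recovered as SSE/(1 - R-square) */
data outestPlus;
set outest;
_PRSQ_ = 1 - _PRESS_ * (1 - _RSQ_)/_SSE_;
label _PRSQ_ = "Predicted r-squared";
run;

/* Ordinary, adjusted and predicted R-square side by side */
proc print data=outestPlus label; 
var _RSQ_ _ADJRSQ_ _PRSQ_;
run;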
PG
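The data step never computes the total sum of squares directly. Taking the usual definition of predicted R-square as 1 - PRESS/SST, where SST is the corrected total sum of squares, the ordinary R-square lets SST be recovered from SSE, and substituting it gives exactly the expression assigned to _PRSQ_:

```latex
\begin{aligned}
R^2 = 1 - \frac{SSE}{SST}
  \;&\Longrightarrow\; SST = \frac{SSE}{1 - R^2},\\
R^2_{\text{pred}} = 1 - \frac{PRESS}{SST}
  \;&=\; 1 - \frac{PRESS\,(1 - R^2)}{SSE}.
\end{aligned}
```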


Reeza
Super User

What proc are you using? I'm fairly sure they're output by default in proc REG. 

Jolly
Calcite | Level 5

I'm willing to use any of the regression procedures for this. R-Squared(predicted) is not to be confused with R-Squared(adj) or ordinary R-Squared; it is based on the PRESS statistic. I am trying to get R-Squared(predicted) values for each model, as you can for the Cp values.

Thank you,

Jeff S. O.
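For reference, PRESS is the sum of squared leave-one-out prediction errors. In the notation below (introduced here, not from the thread), \hat{y}_{(i)} is the prediction for observation i from the model refitted without it, e_i is the ordinary residual and h_{ii} is the leverage (diagonal of the hat matrix); the leverage form is an exact identity for least-squares regression:

```latex
\begin{aligned}
PRESS &= \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2
       = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2,\\
R^2_{\text{pred}} &= 1 - \frac{PRESS}{SST},
  \qquad SST = \sum_{i=1}^{n} \bigl(y_i - \bar{y}\bigr)^2.
\end{aligned}
```

Because 0 <= h_ii < 1, each PRESS term is at least as large as the corresponding squared residual, so PRESS >= SSE and predicted R-square never exceeds ordinary R-square (it can even be negative).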

Ksharp
Super User
Then check PROC GLMSELECT, which reports all the common fit statistics: AIC, BIC, PRESS, R-square, Cp, and more.


Jolly
Calcite | Level 5

Thank you,

I do have another question, however, which may need a new thread. From what I can see so far, Proc PHREG is the only procedure that does best-subset selection to determine which predictors to include in a model. My question is: can you get other values, such as Cp or the different R-Squared values, along with the chi-square score that is reported for each model?

Thank you,
Jeff S. O.

PGStats
Opal | Level 21

Simple answer: No. Best-subset model selection is limited to certain criteria that are "easy" to compute. Unless there is some mathematical shortcut, there are too many candidate subsets (2^p - 1 non-empty subsets for p candidate predictors) for the best-subsets approach to be practical. PHREG is for fitting survival models. Proc reg does best-subset selection when METHOD = RSQUARE, ADJRSQ, or CP. PRESS, and thus predicted r-squared, is expensive to calculate, so I wouldn't expect best-subset model selection based on that criterion.
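A minimal sketch of PROC REG's best-subsets mode, assuming the SASHELP.BASEBALL sample table and an arbitrary choice of six predictors (both illustrative, not from the question). SELECTION=RSQUARE searches all subsets, BEST=2 keeps the two highest-R-square models of each size, and the ADJRSQ, CP and MSE options add those statistics to the printed listing and to the OUTEST= data set (bestSubsets is a hypothetical name):

```sas
/* Best-subsets search: all subsets of the six predictors are evaluated,
   and the two models with the highest R-square are kept for each size.
   ADJRSQ, CP and MSE add those statistics for every reported model. */
proc reg data=sashelp.baseball outest=bestSubsets plots=none;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
         / selection=rsquare best=2 adjrsq cp mse;
run;
quit;

/* One row per reported model: _IN_ is the number of regressors
   (excluding the intercept), followed by the fit statistics */
proc print data=bestSubsets;
   var _IN_ _RSQ_ _ADJRSQ_ _CP_ _MSE_;
run;
```

The square root of _MSE_ is the root mean square error of each subset model.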

 

For modern approaches to variable selection with large (long and wide) datasets, look at proc glmselect.

PG
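As an illustration of the GLMSELECT route (a sketch, assuming the SASHELP.BASEBALL sample table), the CHOOSE=PRESS suboption picks, among the models visited by stepwise selection, the one with the smallest PRESS. Because SST is the same for every candidate fitted to the same observations, that is also the step with the largest predicted R-square. Note that this is a stepwise search, not a best-subsets search:

```sas
/* Stepwise selection: variables enter and leave by significance level
   (SELECT=SL); CHOOSE=PRESS then takes, as the final model, the step
   with the minimum PRESS. STATS=ALL adds PRESS, R-square, adjusted
   R-square, Cp, AIC, BIC, etc. to the selection summary table. */
proc glmselect data=sashelp.baseball plots=none;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor crAtBat crHits crHome crRuns crRbi
                     crBB league division nOuts nAssts nError
         / selection=stepwise(select=SL choose=PRESS) stats=all;
run;
```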
Jolly
Calcite | Level 5

The output I'd like looks like this Minitab output. I do have Minitab, but it is my last resort for various reasons. Here is the output:

 

Response is BP
Predictors: Age, Weight, BSA, Dur, Pulse, Stress

                      Mallows
Vars  R-Sq  R-Sq(adj)      Cp        S
   1  90.3       89.7   312.8   1.7405   X
   1  75.0       73.6   829.1   2.7903   X
   2  99.1       99.0    15.1  0.53269   X X
   2  92.0       91.0   256.6   1.6246   X X
   3  99.5       99.4     6.4  0.43705   X X X
   3  99.2       99.1    14.1  0.52012   X X X
   4  99.5       99.4     6.4  0.42591   X X X X
   4  99.5       99.4     7.1  0.43500   X X X X
   5  99.6       99.4     7.0  0.42142   X X X X X
   5  99.5       99.4     7.7  0.43078   X X X X X
   6  99.6       99.4     7.0  0.40723   X X X X X X

 

 

Thank you,

Jeff S. O.

 

Ksharp
Super User

It sounds like you are using the FORWARD selection method.

Why are there two rows for each number of variables?

 

 

 


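/* Forward selection with significance-level entry (SELECT=SL);
   STATS=ALL adds every available fit statistic to the selection summary,
   and PLOT=CriterionPanel plots the criteria at each step */
proc glmselect data=sashelp.baseball plot=CriterionPanel;
class league division;
model logSalary = nAtBat nHits nHome nRuns nRBI nBB
yrMajor crAtBat crHits crHome crRuns crRbi
crBB league division nOuts nAssts nError
/ selection=forward(select=SL) stats=all;
run;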


Rizki
Calcite | Level 5

But what if we want to get the R-square from a mixed model?
Do we need the same statements as you described before?
Many thanks

My mixed model is:

data Mole;
input study dosage tgp;
datalines;
1 0 21.2
1 25 0.24
1 25 0.75
2 0 7.3
2 50 6.46
2 50 6.42
3 0 8.1
3 54 4.2
3 0 3.9
3 54 0.8
4 0 7.49
4 53 7.63
4 53 6.13
4 53 5.09
4 53 6.06
5 0 1.84
5 100 0.51
5 0 3.94
5 100 0.49
;
/* linear */
proc mixed data=Mole;
class study;
model tgp=dosage/solution;
random study;
run;

/* Quadratic */
proc mixed data=Mole;
class study;
model tgp=dosage|dosage/solution;
random study;
run;

 

Could you please help me get the RMSE and R-square values from this model in SAS?

 Many thanks in advance for your help. 
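One possible approach, not from the thread and only a sketch: PROC MIXED does not report a single R-square, so the block below computes the RMSE of the marginal residuals and, as one of several pseudo-R-square definitions, the squared correlation between observed and predicted tgp. It assumes the Mole data set and the linear model above; the output data set names (predMarg, fitStats, corrOut, pseudoR2) are hypothetical:

```sas
/* Linear model again, writing marginal predictions (fixed effects only)
   to predMarg; OUTPM= adds the variables Pred and Resid (observed - Pred) */
proc mixed data=Mole;
   class study;
   model tgp = dosage / solution outpm=predMarg;
   random study;
run;

/* RMSE of the marginal residuals (divides by n, not by error df) */
proc sql;
   create table fitStats as
   select sqrt(mean(Resid*Resid)) as RMSE_marginal
   from predMarg;
quit;

/* Pseudo R-square: squared Pearson correlation of observed and predicted */
proc corr data=predMarg noprint outp=corrOut;
   var tgp Pred;
run;

data pseudoR2;
   set corrOut;
   where _TYPE_ = 'CORR' and upcase(_NAME_) = 'TGP';
   R2_marginal = Pred**2;
   keep R2_marginal;
run;

proc print data=fitStats; run;
proc print data=pseudoR2; run;
```

Replacing OUTPM= with OUTP= uses conditional predictions that include the study random effects, which usually gives a smaller RMSE and a larger squared correlation; the same steps apply to the quadratic model.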
