ombra3d
Calcite | Level 5

Hi everyone!

I'm running a weighted linear regression with the following code:

proc reg data=mydata;
   /* If VARIANCE holds variances, inverse-variance weighting needs 1/variance here */
   weight variance;
   model sur_estim_y = sur_estim_x;
run;

I want to obtain the 95% CI of the R-square estimate (or the standard error of the R-square estimate).

In the documentation I saw how to compute 95% CIs for the parameter estimates, but not for R-square.

Does anyone know how to do it? 

Thanks a lot

10 replies
PaigeMiller
Diamond | Level 26

This is what I found: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjngf_OgKT9A...

R-squared is simply the square of the correlation coefficient (in a regression with a single predictor), and this paper shows how to get confidence intervals for the correlation coefficient.

--
Paige Miller
ombra3d
Calcite | Level 5

Thanks a lot! 

I used PROC CORR and obtained the same R-square as from PROC REG (the correlation coefficient squared).

However, I'm not sure how to use the standard error of the correlation coefficient to compute a 95% CI for the R-square estimate.

PaigeMiller
Diamond | Level 26

You obtain the confidence limits for the correlation coefficient r as shown in the article. Then you square them to get the confidence limits for R-squared.
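As a sketch of what the article describes, assuming the standard large-sample Fisher z approximation for a sample correlation r from n observations (q is just shorthand introduced here for the normal critical value):

```latex
z = \operatorname{artanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r},
\qquad \operatorname{SE}(z) \approx \frac{1}{\sqrt{n-3}},
\qquad q = \Phi^{-1}(0.975) \approx 1.96

r_L = \tanh\!\left(z - \frac{q}{\sqrt{n-3}}\right),
\qquad r_U = \tanh\!\left(z + \frac{q}{\sqrt{n-3}}\right)

R^2_L = r_L^2, \qquad R^2_U = r_U^2 \qquad (\text{when } 0 \le r_L)
```

Because tanh is nonlinear, the limits are not symmetric about r, and squaring keeps them asymmetric about R-square. If r_L is negative, the lower limit for R-square is 0 rather than r_L squared.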

--
Paige Miller
Ksharp
Super User

Yes. R-square is just the square of the correlation between Y and Yhat (the predicted value of Y).
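Why this holds, sketched for least squares with an intercept (with a WEIGHT statement, read every sum, mean, and correlation below as a weighted one): the residuals sum to zero and are orthogonal to the fitted values, so the mean of the fitted values equals the mean of y, and the cross-product of y with the fitted values reduces to SSR.

```latex
\mathrm{SST} = \sum_i (y_i-\bar y)^2 = \underbrace{\sum_i (\hat y_i-\bar y)^2}_{\mathrm{SSR}} + \underbrace{\sum_i (y_i-\hat y_i)^2}_{\mathrm{SSE}}

\operatorname{corr}(y,\hat y) = \frac{\sum_i (y_i-\bar y)(\hat y_i-\bar y)}{\sqrt{\mathrm{SST}\cdot\mathrm{SSR}}}
= \frac{\mathrm{SSR}}{\sqrt{\mathrm{SST}\cdot\mathrm{SSR}}} = \sqrt{\frac{\mathrm{SSR}}{\mathrm{SST}}} = \sqrt{R^2}
```

With a single predictor and a nonzero slope, the fitted values are a linear function of x, so corr(y, yhat) equals |corr(y, x)|, which is why both approaches in this thread give the same R-square.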

You can use the FISHER option of PROC CORR to get its CI.

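/* Fit the weighted regression and save the predicted values as P */
proc reg data=sashelp.class;
   weight age;
   model weight = height;
   output out=want predicted=p;
quit;

/* Weighted correlation of observed and predicted Y, with */
/* Fisher z confidence limits from the FISHER option       */
ods output FisherPearsonCorr=FisherPearsonCorr;
proc corr data=want outp=outp fisher;
   var weight;
   with p;
   weight age;
run;

/* Square the correlation and its limits to get R-square and its CI */
data FisherPearsonCorr;
   set FisherPearsonCorr;
   RSquare = Corr**2;
   /* If the interval for r includes 0, the lower limit for R-square is 0 */
   if lcl < 0 then lcl_rsquare = 0;
   else lcl_rsquare = lcl**2;
   ucl_rsquare = ucl**2;
run;

proc print data=FisherPearsonCorr noobs;
run;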

JoakimE
Obsidian | Level 7

I like the idea of calculating a confidence interval via the correlation coefficient, as Ksharp explained. However, when I tried it on a concrete example, it did not agree with another method, which computes the CI from a standard error of the R2 estimate, as explained here (for example):

https://agleontyev.netlify.app/post/2019-09-05-calculating-r-squared-confidence-intervals/

Notably, that method gives a CI that is symmetric around the R2 estimate, whereas the method via the correlation coefficient does not. So I am a bit confused about which method should take precedence.
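For comparison, a sketch of the large-sample standard-error formula that, as far as I can tell, the linked post uses (attributed to Olkin & Finn and reproduced in Cohen et al.), for n observations and k predictors; c is a critical value, about 1.96 for 95% under a normal approximation:

```latex
\operatorname{SE}(R^2) = \sqrt{\frac{4R^2\,(1-R^2)^2\,(n-k-1)^2}{(n^2-1)(n+3)}}

\text{CI} = R^2 \pm c \cdot \operatorname{SE}(R^2)
```

An interval of the form estimate plus or minus a fixed amount is symmetric by construction, whereas the correlation route transforms its limits back through tanh and then squares them, so the two need not agree.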

 

If anyone can shed light on this it would be greatly appreciated.

 

KR,

Joakim

ballardw
Super User

Something else that might matter with the PROC CORR approach: weights are treated slightly differently than in PROC REG.

In PROC REG:

Values of the weight variable must be nonnegative. If an observation’s weight is zero, the observation is deleted from the analysis. If a weight is negative or missing, it is set to zero, and the observation is excluded from the analysis.

In PROC CORR:

The observations with missing weights are excluded from the analysis. By default, for observations with nonpositive weights, weights are set to zero and the observations are included in the analysis. You can use the EXCLNPWGT option to exclude observations with negative or zero weights from the analysis.

If you have nonpositive weights, it looks like you need the EXCLNPWGT option in PROC CORR to match the R-square from PROC REG.
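A minimal sketch of where the option goes, assuming the sashelp.class example from earlier in the thread (all ages there are positive, so the option changes nothing for this data; substitute your own data set, variables, and weight):

```sas
/* Weighted regression as in the earlier example; save predicted values */
proc reg data=sashelp.class noprint;
   weight age;
   model weight = height;
   output out=want predicted=p;
quit;

/* EXCLNPWGT drops observations with zero or negative weights, */
/* as PROC REG does, before the correlation is computed.        */
ods output FisherPearsonCorr=FisherPearsonCorr;
proc corr data=want fisher exclnpwgt;
   var weight;
   with p;
   weight age;
run;
```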

Ksharp
Super User

Sorry. I have no idea about it.
If you want a more symmetric CI, you could try the option FISHER(BIASADJ=NO). But that still does not give an exactly symmetric CI.

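/* Weighted regression with three predictors; save predicted values */
proc reg data=sashelp.heart(obs=100);
   weight ageatstart;
   model weight = height Diastolic Systolic;
   output out=want predicted=p;
quit;

/* BIASADJ=NO turns off the bias adjustment of the Fisher z interval */
ods output FisherPearsonCorr=FisherPearsonCorr;
proc corr data=want outp=outp fisher(biasadj=no);
   var weight;
   with p;
   weight ageatstart;
run;

data FisherPearsonCorr;
   set FisherPearsonCorr;
   RSquare = Corr**2;
   if lcl < 0 then lcl_rsquare = 0;
   else lcl_rsquare = lcl**2;
   ucl_rsquare = ucl**2;
   /* Distances of the limits from the estimate, to check symmetry */
   _lcl2 = lcl_rsquare - RSquare;
   _ucl2 = ucl_rsquare - RSquare;
run;

proc print data=FisherPearsonCorr noobs;
run;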
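A sketch of what BIASADJ= controls, assuming the classical result that Fisher's z is slightly biased (its expected value is about artanh(rho) + rho/(2(n-1))) and that PROC CORR by default subtracts an estimate of that bias before forming the limits; q is the normal critical value, about 1.96 for 95%:

```latex
z = \operatorname{artanh}(r), \qquad z_{\text{adj}} = z - \frac{r}{2(n-1)}

(r_L,\ r_U) = \left(\tanh\!\left(z_{\text{adj}} - \frac{q}{\sqrt{n-3}}\right),\ \tanh\!\left(z_{\text{adj}} + \frac{q}{\sqrt{n-3}}\right)\right)
```

With BIASADJ=NO, z is used in place of z_adj. Either way the limits pass through tanh and are then squared, which is why the resulting R-square interval is still not exactly symmetric.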

Or you could try a bootstrap method to get a CI for R-square.
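A minimal sketch of a percentile bootstrap for R-square, assuming the weighted sashelp.class model from earlier in the thread; the data set names BootSamples, BootEst, and BootCI, the seed, and the 1,000 resamples are illustrative choices, not from the thread:

```sas
/* Draw 1000 resamples with replacement, each the size of the data */
proc surveyselect data=sashelp.class out=BootSamples noprint
                  seed=12345 method=urs samprate=1 outhits reps=1000;
run;

/* Refit the weighted regression in every resample; the RSQUARE */
/* option writes each fit's R-square to OUTEST= as _RSQ_.        */
proc reg data=BootSamples outest=BootEst rsquare noprint;
   by Replicate;
   weight age;
   model weight = height;
run;
quit;

/* The 2.5th and 97.5th percentiles of the bootstrap R-squares */
/* form a 95% percentile interval (RSq_2_5 and RSq_97_5).       */
proc univariate data=BootEst noprint;
   var _RSQ_;
   output out=BootCI pctlpts=2.5 97.5 pctlpre=RSq_;
run;

proc print data=BootCI noobs;
run;
```

Each resampled row keeps its own weight, and like the Fisher interval, the percentile interval need not be symmetric around the sample R-square.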

Maybe @Rick_SAS or @StatDave knows more about this.

Rick_SAS
SAS Super FREQ

>  this method will result in a symmetric CI around the R2 estimate, whereas the method via the correlation coefficient will not yield a symmetric CI. So I am a bit confused as to what method should take precedence.

 

Note that R-squared is always in the interval [0, 1], so you would not expect a symmetric CI for the R-squared statistic. The true sampling distribution of R-squared is not symmetric. The formula you quote (in Cohen's book, but actually from Olkin & Finn, 1995, which I have not read) is a large-sample asymptotic approximation that assumes symmetry for the R-squared distribution. I would not use it unless the sample R-squared is far from 0 and 1, and the sample size is large.

 

For the case of one regressor, you can use the connection between the regression R-squared value and the (squared) correlation coefficient. I think KSharp has the right idea.

JoakimE
Obsidian | Level 7

Thank you Rick for a clear and logical explanation. It makes sense to me.

And thanks Ksharp for the neat solution to the problem.

KR,

Joakim


Discussion stats
  • 10 replies
  • 2434 views
  • 3 likes
  • 6 in conversation