Hi all,
Using PROC REG, we can easily obtain confidence intervals (CIs) for the beta parameters. I would now like to obtain a CI for another quantity in the output: the root MSE. Root MSE is an estimate of the standard deviation of the measurement errors.
Proc reg data=lm1 plots=none;
Model y=x/clb;
Run;
Could you tell me how to obtain the CI for it?
Kind regards,
Trung Dung.
I'm not aware of a way to compute the confidence interval for root MSE in SAS. Apparently, Google is not aware of a way to do this either.
You could always use bootstrap or jackknife methods to obtain confidence intervals for any estimate.
Yes, I googled it and could not find anything like that either, but at least now I know that it is not an available option in SAS.
Thank you for your reply @PaigeMiller.
Hi @trungdungtran,
As you say correctly, Root MSE is an estimate for the standard deviation s of the error term in the linear regression model. So, given this point estimate, it's a reasonable question to ask for a confidence interval estimate -- for s, not for Root MSE. (Edit: Note that this is perfectly analogous to the CIs you mentioned. They are for the true parameters, not for the estimates.)
Under the usual normality assumption, i.e., the error term has a normal distribution with mean 0 and standard deviation s, an exact confidence interval for s can be computed easily from the error sum of squares, the corresponding degrees of freedom (see PROC REG output) and quantiles of the corresponding chi-square distribution.
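For readers who want the formula spelled out, here is a short sketch of the derivation. The notation is my own shorthand, not from the PROC REG output: SSE is the error sum of squares, df is the error degrees of freedom, and \chi^2_{p,\,df} denotes the p-quantile of the chi-square distribution with df degrees of freedom (what CINV(p, df) returns in SAS). It assumes the normality condition stated above.

```latex
% Under normality, the scaled error sum of squares is chi-square distributed:
\frac{\mathrm{SSE}}{s^2} \sim \chi^2_{df}
% Hence, with probability 1 - \alpha,
\chi^2_{\alpha/2,\,df} \;\le\; \frac{\mathrm{SSE}}{s^2} \;\le\; \chi^2_{1-\alpha/2,\,df}
% Solving for s gives the exact (1 - \alpha) confidence interval:
\sqrt{\frac{\mathrm{SSE}}{\chi^2_{1-\alpha/2,\,df}}} \;\le\; s \;\le\; \sqrt{\frac{\mathrm{SSE}}{\chi^2_{\alpha/2,\,df}}}
% For comparison, the point estimate is
\text{Root MSE} = \sqrt{\frac{\mathrm{SSE}}{df}}
```

Note that the larger quantile yields the lower limit, because SSE is divided by it.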
Example:
ods output anova=anova(keep=source df ss where=(source='Error'))
           fitstatistics=fs(keep=label1 nvalue1 where=(label1='Root MSE') rename=(nvalue1=Root_MSE));
proc reg data=sashelp.class plots=none;
  model weight=height / clb;
quit;
%let alpha=0.05;
data _null_;
  set anova;
  set fs;
  lcl=sqrt(ss/cinv(1-&alpha/2,df)); /* lower (1-&alpha)*100% confidence limit for sigma */
  ucl=sqrt(ss/cinv(&alpha/2,df));   /* upper (1-&alpha)*100% confidence limit for sigma */
  put (lcl Root_MSE ucl) (=6.3);
run;
Result:
lcl=8.424 Root_MSE=11.226 ucl=16.830
Reference: S.R. Searle, Linear Models (cf. page 414, formula 59).
Still not convinced? Perform a simulation:
%let alpha=0.05;
%let b0=-143; /* (arbitrary) intercept */
%let b1=3.9; /* (arbitrary) slope */
%let sigma=12.34; /* (arbitrary) error standard deviation */
/* Simulate 100000 datasets with WGT values calculated from HEIGHT
using a linear regression model */
data sim;
  call streaminit(27182818);
  set sashelp.class(keep=height);
  do i=1 to 100000;
    wgt=&b0+&b1*height+rand('norm',0,&sigma);
    output;
  end;
run;
proc sort data=sim;
  by i;
run;
/* Perform regression analyses */
ods exclude all;
ods noresults;
ods output anova=anova(keep=i source df ss where=(source='Error'));
proc reg data=sim plots=none;
  by i;
  model wgt=height;
quit;
ods exclude none;
ods results;
/* Compute (1-&alpha)*100% confidence intervals for &sigma
   and determine if true value is covered or not (c=1 | c=0) */
data chk;
  set anova;
  lcl=sqrt(ss/cinv(1-&alpha/2,df));
  ucl=sqrt(ss/cinv(&alpha/2,df));
  c=(lcl<=&sigma<=ucl);
run;
/* Estimate coverage probability */
ods exclude BinomialTest;
proc freq data=chk;
  tables c / binomial(level='1');
run;
Result:
                               Cumulative    Cumulative
c    Frequency     Percent     Frequency      Percent
------------------------------------------------------
0         5025        5.03          5025         5.03
1        94975       94.98        100000       100.00

Binomial Proportion
c = 1
Proportion               0.9498
ASE                      0.0007
95% Lower Conf Limit     0.9484
95% Upper Conf Limit     0.9511

Exact Conf Limits
95% Lower Conf Limit     0.9484
95% Upper Conf Limit     0.9511

Sample Size = 100000
The result is what you'd expect with a true 95% coverage probability.
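To make that check explicit, the figures in the PROC FREQ output can be reproduced by hand. This is only a worked sketch using the standard Wald formula for a binomial proportion; the symbols \hat{p} and n are my own shorthand for the estimated coverage and the number of simulated datasets.

```latex
\hat{p} = \frac{94975}{100000} = 0.94975
% Asymptotic standard error of the estimated coverage:
\mathrm{ASE} = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
             = \sqrt{\frac{0.94975 \times 0.05025}{100000}} \approx 0.00069
% Approximate 95% confidence limits for the coverage probability:
\hat{p} \pm 1.96 \times \mathrm{ASE} \approx 0.94975 \pm 0.00135 = (0.9484,\; 0.9511)
```

The nominal coverage 0.95 lies inside this interval, so the simulation gives no evidence against the claimed 95% coverage.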
Thanks, @FreelanceReinh. You understood my question better than I managed to express it.
Actually, I am learning macro to simulate data and assess the performance of the model. I am starting with a linear regression model with one covariate, which involves three parameters: intercept, slope, and sigma. I could do this for the betas, but for sigma I only knew that root MSE is an estimate of it.
Now your answer helps me obtain the CI for sigma as well. From that, I can compute the coverage probability, as I had done for the betas.
I appreciate your help!
You're welcome, @trungdungtran.
@trungdungtran wrote:
Actually, I am learning macro to simulate data and assess the performance of the model.
Macros and simulation? Make sure you read Rick Wicklin's blog post "Simulation in SAS: The slow way or the BY way", the sooner the better. (Quote: "Never use a macro loop to create a simulation.") There is also a more specific article on Rick's blog: "Simulate data for a linear regression model".
Thank you, @FreelanceReinh, for the suggestion about the blog post.
I am learning macro programming, so I am treating this as a practice exercise.