Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Re: bootstrapping OLS regression confidence intervals and applyto indi...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-28-2023 11:01 AM
(2469 views)

I have adapted the bootstrap example from Oct 24, 2018 Rick Wicklin DO Loop and it successfully runs with my data. However, I am not sure how

to generate the 95% confidence intervals for each predicted observation?

I have attached the program and the dataset. Please advise. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, you can get the bootstrap CLM at each value of the explanatory variable.

First, read about how to perform case resampling in the blog post "Bootstrap regression estimates: Case resampling":

Then, you can read about how to obtain bootstrap estimates of the CLM in the blog post "Bootstrap confidence intervals for the predicted mean in a regression model"

For completeness, the following program implements the bootstrap method to estimate the CLM at a set of points that you can specify:

```
/* choose data to analyze. Optionally, sort. */
data Sample(keep=x y);
/* rename vars to X and Y to make roles easier to understand */
set Sashelp.Class(rename=(Weight=Y Height=X));
run;
/* OPTIONAL: Sort by X variable for future plotting */
proc sort data=Sample; by X; run;
/* create a scoring set */
data ScoreData;
do X = 50 to 75 by 5;
ScoreId + 1; /* include an ID variable */
output;
end;
run;
/* 1. Compute the statistics on the original data. Estimate CLM by using the asymptotic formula.
Concatenate the scoring data and use the "missing response trick"
https://blogs.sas.com/content/iml/2014/02/17/the-missing-value-trick-for-scoring-a-regression-model.html
*/
data SampleAndScore;
set Sample ScoreData(in=S);
IsScore = S;
run;
proc reg data=SampleAndScore plots(only)=FitPlot(nocli);
model Y = X;
/* output the theoretical CI for each scoring observation for later comparison */
output out=RegOut(where=(IsScore=1)) Pred=Pred LCLM=LowerP UCLM=UpperP;
run; quit;
/* 2. Generate many bootstrap samples by using PROC SURVEYSELECT */
%let NumSamples = 5000; /* number of bootstrap resamples */
proc surveyselect data=Sample NOPRINT seed=1
out=BootCases(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
/* OUTHITS (optional) use OUTHITS option to suppress the frequency var */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 3. Compute the statistics for each bootstrap sample. */
proc reg data=BootCases outest=PE noprint; /* fit the model and save the parameter estimates for each sample */
by SampleID;
freq NumberHits;
model Y = X;
run;quit;
proc score data=ScoreData score=PE type=parms predict /* score each model on the scoring data */
out=BootOut(rename=(MODEL1=Pred));
by SampleID;
var X;
run;
/* 4. Compute the bootstrap 95% prediction CL for each row in the score data */
/* Find the 2.5th and 97.5th percentiles for the bootstrap statistics each X value */
proc univariate data=BootOut noprint;
class ScoreID;
var Pred;
output out=BootCL mean=BootMean pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* merge the REG output and the bootstrap CL */
data All;
merge RegOut BootCL;
by ScoreID;
run;
proc print data=All;
var ScoreId X BootMean Pctl_2_5 Pctl_97_5;
run;
/* visualize the bootstrap mean and CLM */
title "Bootstrap CLM";
title2 "&NumSamples Bootstrap Samples";
proc sgplot data=All;
series x=x y=Pred / legendlabel="OLS Pred" lineattrs=(color=gray);
scatter x=x y=BootMean / legendlabel="Boot Pred" markerattrs=(symbol=X);
highlow x=x low=Pctl_2_5 high=Pctl_97_5 / legendlabel="Boot CLM"
lowcap=serif highcap=serif lineattrs=GraphError;
yaxis label="Y";
run;
```

10 REPLIES 10

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

There is no reason to hide programs in attachments unless it is 100's of lines long. Just paste into a text box opened on the forum with the </> icon that appears above the message window.

******************************************************************************************************************************************************************************; **************************************************SAS USER Fluvio1*******SAS program: Fluvio_Reg_Boot_CLM_1a.sas ********************************************************; ****************************************************************Program for SAS support 07282020*****************************************************************************; ************************************************Compute OLS regression and develop upper and lower 95% confidence limits by bootstrapping*********************************; **************************************************Code below adapted from Oct 24, 2018 Rick Wicklin DO Loop ************************************************************; %web_drop_table(WORK.IMPORT); FILENAME REFFILE '/home/u43067642/Irreff_gen/FInal_POLS_digitizing_error/NEW_DATA_1.xlsx'; PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.New_data_1; GETNAMES=YES; RUN; proc contents data=New_data_1; run; /* 1. compute the statistics on the original data */ proc reg data=new_data_1 plot ; model Y = X / CLB covb ; /* original estimates */ run; quit; title "Bootstrap Distribution of Regression Estimates"; title2 "Case Resampling"; %let NumSamples = 5000; /* number of bootstrap resamples */ %let IntEst =0.05675; /* original estimates for later visualization */ %let XEst = 0.99636; /* 2. Generate many bootstrap samples by using PROC SURVEYSELECT */ proc surveyselect data=new_data_1 NOPRINT seed=1 out=BootCases(rename=(Replicate=SampleID)) method=urs /* resample with replacement */ samprate=1 /* each bootstrap sample has N observations */ /* OUTHITS use OUTHITS option to suppress the frequency var */ reps=&NumSamples; /* generate NumSamples bootstrap resamples */ run; /* 3. Compute the statistic for each bootstrap sample */ proc reg data=BootCases outest=PEBoot noprint; by SampleID; freq NumberHits; model y=x ; run;quit; /*Data new_data_1; length EXPboot_x EXPboot_y 8; set bootcases; if y='.' then do; delete; end; EXPboot_x=EXP(x); EXPboot_y=EXP(y); run;*/ /* 4. Visualize bootstrap distribution */ proc sgplot data=PEBoot; label Intercept = "Estimate of Intercept" X = "Estimate of Coefficient of X"; scatter x=Intercept y=X / markerattrs=(Symbol=CircleFilled) transparency=0.7; /* Optional: draw reference lines at estimates for original data */ refline &IntEst / axis=x lineattrs=(color=blue); refline &XEst / axis=y lineattrs=(color=blue); xaxis grid; yaxis grid; run; proc stdize data=PEBoot vardef=N pctlpts=5 95 PctlMtd=ORD_STAT outstat=Pctls; var Intercept X; run; proc report data=Pctls nowd; where _type_ =: 'P'; label _type_ = 'Confidence Limit'; columns ('Bootstrap Confidence Intervals (B=5000)' _ALL_); run;

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Should calling @Rick_SAS

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am not sure what "should calling Rick......" means ??

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

KSharp likes to alert experts that there is a post that might interest them. He uses an "@ mention" syntax to include their name in a thread, which sends an email to the expert. So KSharp has already "called" me by mentioning my name.

I think we need to clarify what you are asking for. You can certainly get a 95% bootstrap CI for the predicted MEAN, but I don't think the bootstrap is the right tool to get a CI for individual responses. Which are you trying to find? In PROC REG, this is the difference between the popular LCLM/UCLM options (CL for the conditional mean) and the rarely used LCLI/UCLI options. See SAS Help Center: Predicted and Residual Values

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, you can get the bootstrap CLM at each value of the explanatory variable.

First, read about how to perform case resampling in the blog post "Bootstrap regression estimates: Case resampling":

Then, you can read about how to obtain bootstrap estimates of the CLM in the blog post "Bootstrap confidence intervals for the predicted mean in a regression model"

For completeness, the following program implements the bootstrap method to estimate the CLM at a set of points that you can specify:

```
/* choose data to analyze. Optionally, sort. */
data Sample(keep=x y);
/* rename vars to X and Y to make roles easier to understand */
set Sashelp.Class(rename=(Weight=Y Height=X));
run;
/* OPTIONAL: Sort by X variable for future plotting */
proc sort data=Sample; by X; run;
/* create a scoring set */
data ScoreData;
do X = 50 to 75 by 5;
ScoreId + 1; /* include an ID variable */
output;
end;
run;
/* 1. Compute the statistics on the original data. Estimate CLM by using the asymptotic formula.
Concatenate the scoring data and use the "missing response trick"
https://blogs.sas.com/content/iml/2014/02/17/the-missing-value-trick-for-scoring-a-regression-model.html
*/
data SampleAndScore;
set Sample ScoreData(in=S);
IsScore = S;
run;
proc reg data=SampleAndScore plots(only)=FitPlot(nocli);
model Y = X;
/* output the theoretical CI for each scoring observation for later comparison */
output out=RegOut(where=(IsScore=1)) Pred=Pred LCLM=LowerP UCLM=UpperP;
run; quit;
/* 2. Generate many bootstrap samples by using PROC SURVEYSELECT */
%let NumSamples = 5000; /* number of bootstrap resamples */
proc surveyselect data=Sample NOPRINT seed=1
out=BootCases(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
/* OUTHITS (optional) use OUTHITS option to suppress the frequency var */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 3. Compute the statistics for each bootstrap sample. */
proc reg data=BootCases outest=PE noprint; /* fit the model and save the parameter estimates for each sample */
by SampleID;
freq NumberHits;
model Y = X;
run;quit;
proc score data=ScoreData score=PE type=parms predict /* score each model on the scoring data */
out=BootOut(rename=(MODEL1=Pred));
by SampleID;
var X;
run;
/* 4. Compute the bootstrap 95% prediction CL for each row in the score data */
/* Find the 2.5th and 97.5th percentiles for the bootstrap statistics each X value */
proc univariate data=BootOut noprint;
class ScoreID;
var Pred;
output out=BootCL mean=BootMean pctlpre=Pctl_ pctlpts=2.5, 97.5;
run;
/* merge the REG output and the bootstrap CL */
data All;
merge RegOut BootCL;
by ScoreID;
run;
proc print data=All;
var ScoreId X BootMean Pctl_2_5 Pctl_97_5;
run;
/* visualize the bootstrap mean and CLM */
title "Bootstrap CLM";
title2 "&NumSamples Bootstrap Samples";
proc sgplot data=All;
series x=x y=Pred / legendlabel="OLS Pred" lineattrs=(color=gray);
scatter x=x y=BootMean / legendlabel="Boot Pred" markerattrs=(symbol=X);
highlow x=x low=Pctl_2_5 high=Pctl_97_5 / legendlabel="Boot CLM"
lowcap=serif highcap=serif lineattrs=GraphError;
yaxis label="Y";
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It doesn't make any difference, but if you want the confidence intervals on the original data scale, exponentiate the estimates.

That is, if

(L(x), U(x)) is the confidence interval for the mean of log(Y) when X=x, then

(exp(L(x)), exp(U(x))) is the confidence interval for the mean of Y.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hello @Fluvio1 ,

Response to your query has been given 40 minutes ago ... by @Rick_SAS .

I am just adding this page in case someone wonders where the tpspline plot in your last post comes from.

It took me a while to figure it out. 😉

But here's the code for that plot :

The TPSPLINE Procedure

Example 125.5 Computing a Bootstrap Confidence Interval

SAS/STAT User's Guide (for SAS Viya 3.5)

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_tpspline_examples05.htm

BR, Koen

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.