Calculating bootstrapped 95% CI for 99th percentile of a variable

☑ This topic is **solved**.
Posted 06-06-2024 11:30 AM
(359 views)

Hello,

I tried using the bootstrapping method described by Rick Wicklin here to calculate the 95% CI around my statistic of interest (the 99th percentile of a variable).

My sample size is about 3200 participants, and I am planning to use 2000 replicates. However, when I ran the code (below), it returned confidence limits that do not contain the original 99th percentile. For example, the 99th percentile of the variable was 0.13, but the CI limits generated were 11.6 and 42.9. Any thoughts on what I did wrong here? I added p99 to step 2, which was not in the original example, because it is my statistic of interest.

Thanks,

Sophie

```
proc means data=mydata p99;
   var myvar;
run;

%let NumSamples = 2000;       /* number of bootstrap resamples */

/* 1. Generate many bootstrap samples */
proc surveyselect data=mydata NOPRINT seed=1
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each bootstrap sample has N observations */
     /*outhits*/             /* OUTHITS option to suppress the frequency var */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples */
run;

/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;

/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
   var Skewness;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5 97.5       /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
```

1 ACCEPTED SOLUTION


Rick's article finds the interval around the SKEWNESS of a variable, and you copied his PROC MEANS code, which asks for that same statistic:

```
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;
```

I think you want P99= on the OUTPUT statement, and then use that variable in PROC UNIVARIATE.


5 REPLIES 5


I tried updating the code as follows, as I think what I was calculating before was the 95% CI for the skewness of the variable and not the 99th percentile.

If this is correct, can I calculate bootstrapped CIs for subgroups of my data (for example, by sex)?

Thanks,

Sophie

```
%let NumSamples = 2000; /* number of bootstrap resamples */
/* 1. Generate many bootstrap samples */
proc surveyselect data=cric NOPRINT seed=1
out=BootSSFreq(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
/*outhits*/ /* leave OUTHITS off: it writes one row per hit, so FREQ NumberHits in step 2 would double-count */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;
/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
by SampleID;
freq NumberHits;
var myvar;
output out=OutStats p99=percentile; /* approx sampling distribution */
run;
/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
var percentile;
output out=Pctl pctlpre =CI95_
pctlpts =2.5 97.5 /* compute 95% bootstrap confidence interval */
pctlname=Lower Upper;
run;
proc print data=Pctl noobs; run;
```
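On the subgroup question, one possible approach is to resample within each subgroup by adding a STRATA statement to PROC SURVEYSELECT and then carrying the subgroup variable through the BY statements. The following is only a sketch under assumptions: the variable name `sex` and the data set names `Sorted`, `BootBySex`, `OutStatsBySex`, and `PctlBySex` are hypothetical, as is the output variable name `Pctl99`.

```sas
%let NumSamples = 2000;                 /* number of bootstrap resamples */

/* STRATA expects the input sorted by the stratum variable */
proc sort data=mydata out=Sorted;
   by sex;                              /* 'sex' is an assumed variable name */
run;

/* 1. Resample with replacement within each level of sex */
proc surveyselect data=Sorted noprint seed=1
     out=BootBySex(rename=(Replicate=SampleID))
     method=urs                         /* resample with replacement */
     samprate=1                         /* each stratum keeps its own size */
     reps=&NumSamples;
   strata sex;
run;

/* Sort explicitly so the BY statements below do not depend on output order */
proc sort data=BootBySex;
   by sex SampleID;
run;

/* 2. 99th percentile per subgroup within each bootstrap sample */
proc means data=BootBySex noprint;
   by sex SampleID;
   freq NumberHits;                     /* valid because OUTHITS is not used */
   var myvar;
   output out=OutStatsBySex p99=Pctl99;
run;

/* 3. 95% percentile interval for each subgroup */
proc univariate data=OutStatsBySex noprint;
   by sex;
   var Pctl99;
   output out=PctlBySex pctlpre=CI95_
          pctlpts=2.5 97.5
          pctlname=Lower Upper;
run;

proc print data=PctlBySex noobs; run;
```

Resampling within strata keeps each subgroup at its original size in every replicate, which should match running the bootstrap separately on each subgroup. With a sample of about 3200 split into subgroups, each subgroup's 99th percentile rests on only a handful of the largest observations, so the resulting intervals may be wide.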


Thank you! I realized that shortly after posting and updated the code (see above). However, I'm finding that the calculated 99th percentile is not centered within the 95% CI that it calculates.

For example, see the following 99th percentiles with 95% CI:

overall sample: 102.6 (55.0, 229.0)

subgroup 1: 55.0 (17.0, 617.7)

subgroup 2: 86.0 (36.7, 252.9)

subgroup 3: 187.4 (57.9, 291.8)

Thanks again!
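A bootstrap percentile interval is not built to be symmetric about the point estimate: its limits are simply the 2.5th and 97.5th percentiles of the bootstrap distribution, so a skewed bootstrap distribution gives an off-center estimate. One way to check this, sketched here on the assumption that the `OutStats` data set and its `percentile` variable from the updated code still exist, is to look at the distribution directly:

```sas
/* Summarize and plot the bootstrap distribution of the 99th percentile. */
/* Assumes OutStats (one row per SampleID) from step 2 is still present. */
proc univariate data=OutStats;
   var percentile;
   histogram percentile;   /* a long right tail would explain the asymmetry */
run;
```

The Moments table in the output reports the skewness of the bootstrap estimates, and the histogram shows whether a few very large observations are driving the upper limit.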
