Obsidian | Level 7

## Calculating bootstrapped 95% CI for 99th percentile of a variable

Hello,

I tried using the bootstrapping method described by Rick Wicklin here to calculate the 95% CI around my statistic of interest (the 99th percentile of a variable).

My sample size is about 3200 participants, and I am planning to use 2000 replicates. However, when I ran the code below, it returned confidence limits that do not surround the original 99th percentile. For example, the 99th percentile of the variable was 0.13, but the CI limits generated were 11.6 and 42.9. Any thoughts on what I did wrong here? I added p99 to step 2, which was not in the original example, because it is my statistic of interest.

Thanks,

Sophie

```
proc means data=mydata p99; var myvar; run;

%let NumSamples = 2000;       /* number of bootstrap resamples */
/* 1. Generate many bootstrap samples */
proc surveyselect data=mydata NOPRINT seed=1
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each bootstrap sample has N observations */
     /*outhits*/             /* OUTHITS option to suppress the frequency var */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples */
run;

/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;

/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
   var Skewness;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5  97.5       /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
```
Accepted Solutions
Super User

## Re: Calculating bootstrapped 95% CI for 99th percentile of a variable

Rick's article computes a confidence interval for the SKEWNESS of a variable. You copied his PROC MEANS code, so the OUTPUT statement still requests skewness rather than the 99th percentile:

```
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;
```

I think you want P99= on the OUTPUT statement, and then use that output variable in PROC UNIVARIATE.


Obsidian | Level 7

## Re: Calculating bootstrapped 95% CI for 99th percentile of a variable

I tried updating the code as follows, as I think what I was calculating before was the 95% CI for the skewness of the variable and not the 99th percentile.

If this is correct, can I calculate bootstrapped CIs for subgroups of my data (for example, by sex)?

Thanks,

Sophie

```
%let NumSamples = 2000; /* number of bootstrap resamples */
/* 1. Generate many bootstrap samples */
proc surveyselect data=cric NOPRINT seed=1
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs        /* resample with replacement */
     samprate=1        /* each bootstrap sample has N observations */
     /*outhits*/       /* keep OUTHITS off: step 2 uses FREQ NumberHits, and with
                          OUTHITS on, repeated selections would be double-counted */
     reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;

/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats p99=percentile; /* approx sampling distribution */
run;

/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
   var percentile;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5 97.5 /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
```
Super User

## Re: Calculating bootstrapped 95% CI for 99th percentile of a variable

Use CLASS Varname; in both PROC MEANS and PROC UNIVARIATE to get subgroups. However, depending on the distribution of values within each subgroup, you may need to change the sample size in PROC SURVEYSELECT so that the samples are large enough.
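As a rough sketch of that suggestion, steps 2 and 3 might look like the following. It assumes a subgroup variable named sex (a hypothetical name) is present in BootSSFreq, and that BootSSFreq was created without OUTHITS so that NumberHits holds the frequency of each observation. The data set names OutStatsBySex and PctlBySex are also made up for illustration.

```sas
/* 2. Compute the 99th percentile for each bootstrap sample and subgroup.  */
/*    NWAY keeps only the rows for individual levels of the CLASS variable */
/*    and drops the overall (_TYPE_=0) row within each SampleID.           */
proc means data=BootSSFreq noprint nway;
   by SampleID;
   class sex;                      /* hypothetical subgroup variable */
   freq NumberHits;                /* assumes OUTHITS was not used in step 1 */
   var myvar;
   output out=OutStatsBySex p99=percentile;
run;

/* 3. Percentile-based 95% interval within each subgroup */
proc univariate data=OutStatsBySex noprint;
   class sex;
   var percentile;
   output out=PctlBySex pctlpre=CI95_
          pctlpts=2.5 97.5
          pctlname=Lower Upper;
run;

proc print data=PctlBySex noobs; run;
```

Because the resampling in step 1 is done on the whole data set, the number of observations from each subgroup varies from one bootstrap sample to the next, which is one reason small subgroups can give unstable limits.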

Obsidian | Level 7

## Re: Calculating bootstrapped 95% CI for 99th percentile of a variable

Thank you! I realized that shortly after posting and updated the code (see above). However, I'm finding that the calculated 99th percentile is not centered within the 95% CI that it calculates.

For example, see the following 99th percentiles with 95% CI:

overall sample: 102.6 (55.0, 229.0)

subgroup 1: 55.0 (17.0, 617.7)

subgroup 2: 86.0 (36.7, 252.9)

subgroup 3: 187.4 (57.9, 291.8)

Thanks again!

SAS Employee

## Re: Calculating bootstrapped 95% CI for 99th percentile of a variable

I can't tell you for sure why you are getting that behavior. However, I will point out that, in general, bootstrapping is known to provide poor estimates of the sampling distribution of extreme order statistics. Textbook examples are the minimum and maximum. I suspect that will also be true for the 99th percentile, so I would encourage you to think carefully about whether to even use a bootstrap here.
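To make the textbook example concrete, here is a short worked calculation for the sample maximum. It assumes a sample of n distinct values; the notation X_(n) for the sample maximum and X*_(n) for the maximum of one bootstrap resample is introduced here only for illustration. A resample misses the observed maximum only if all n draws avoid it, so

$$
P\left(X^{*}_{(n)} = X_{(n)}\right) = 1 - \left(1 - \frac{1}{n}\right)^{n} \;\longrightarrow\; 1 - e^{-1} \approx 0.632 \quad (n \to \infty)
$$

So roughly 63% of bootstrap resamples reproduce the observed maximum exactly, whereas for continuous data the true sampling distribution of the maximum puts no probability on any single value. The 99th percentile is less extreme than the maximum, but with about 3200 observations it is determined by only the largest 32 or so values, and by fewer still within a subgroup.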
