sophiec
Obsidian | Level 7

Hello,

 

I tried using the bootstrapping method described by Rick Wicklin to calculate the 95% CI around my statistic of interest (the 99th percentile of a variable).

My sample size is about 3200 participants. I am planning to use 2000 replicates. However, when I ran the code (below), it returned a CI that does not contain the initial 99th percentile. For example, the 99th percentile of the variable was 0.13 and the CI limits generated were 11.6 and 42.9. Any thoughts on what I did wrong here? I added p99 to step 2, which was not in the original example, but is my statistic of interest.

 

Thanks, 

Sophie 

proc means data=mydata p99;
   var myvar;
run;

%let NumSamples = 2000;       /* number of bootstrap resamples */

/* 1. Generate many bootstrap samples */
proc surveyselect data=mydata NOPRINT seed=1
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each bootstrap sample has N observations */
     /*outhits*/             /* OUTHITS option to suppress the frequency var */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples */
run;

/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;

/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
   var Skewness;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5 97.5  /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
ACCEPTED SOLUTION

ballardw
Super User

Rick's article was about finding the interval around the SKEWNESS of a variable. You copied his Proc Means code, so the OUTPUT statement is still asking for the skewness:

proc means data=BootSSFreq p99 noprint;
   by SampleID;
   freq NumberHits;
   var myvar;
   output out=OutStats skew=Skewness;  /* approx sampling distribution */
run;

I think you want P99= in the OUTPUT statement instead of SKEW=, and then use that output variable in Proc Univariate.

 



5 REPLIES
sophiec
Obsidian | Level 7

I tried updating the code as follows, as I think what I was calculating before was the 95% CI for the skewness of the variable and not the 99th percentile. 

If this is correct, can I calculate bootstrapped CIs for subgroups of my data (for example, by sex)? 

 

Thanks, 

Sophie 


%let NumSamples = 2000; /* number of bootstrap resamples */
/* 1. Generate many bootstrap samples */
proc surveyselect data=cric NOPRINT seed=1
out=BootSSFreq(rename=(Replicate=SampleID))
method=urs /* resample with replacement */
samprate=1 /* each bootstrap sample has N observations */
outhits /* OUTHITS option to suppress the frequency var */
reps=&NumSamples; /* generate NumSamples bootstrap resamples */
run;

/* 2. Compute the statistic for each bootstrap sample */
proc means data=BootSSFreq p99 noprint;
by SampleID;
freq NumberHits;
var myvar;
output out=OutStats p99=percentile; /* approx sampling distribution */
run;

/* 3. Use approx sampling distribution to make statistical inferences */
proc univariate data=OutStats noprint;
var percentile;
output out=Pctl pctlpre =CI95_
pctlpts =2.5 97.5 /* compute 95% bootstrap confidence interval */
pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
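
One thing that may be worth checking in the updated code: step 1 now has OUTHITS switched on, while step 2 still uses FREQ NumberHits. The original example used one or the other, not both. The sketch below is only a sanity check, assuming the BootSSFreq, SampleID, NumberHits, and myvar names from the post; CheckN and N_eff are hypothetical names introduced here. It counts how many observations each replicate effectively contributes under the FREQ statement.

```sas
/* Sanity check (sketch): effective number of observations per bootstrap sample. */
/* CheckN and N_eff are hypothetical names used only for this illustration.      */
proc means data=BootSSFreq noprint;
   by SampleID;
   freq NumberHits;                 /* same weighting as step 2 */
   var myvar;
   output out=CheckN n=N_eff;       /* frequency-weighted count of nonmissing myvar */
run;

/* Summarize the effective counts across all replicates */
proc means data=CheckN min mean max;
   var N_eff;
run;
```

If N_eff comes out close to the original sample size (about 3200 here, less any missing values), the weighting is fine. If it is noticeably larger, repeated rows are probably being counted twice, and the simplest remedy would be to keep either OUTHITS or the FREQ statement, but not both.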
ballardw
Super User

Use CLASS Varname; in both Proc Means and Proc Univariate to get subgroups. However, depending on the distribution of values within each subgroup, you may need to change the sample size in Proc Surveyselect so that the samples for each subgroup are large enough.
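
The CLASS suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: sex is a hypothetical subgroup variable, P99_myvar is a hypothetical output name, and BootSSFreq is assumed to come from step 1 run without OUTHITS so that NumberHits carries the frequencies.

```sas
/* 2. Compute the 99th percentile per bootstrap sample and per subgroup */
proc means data=BootSSFreq noprint nway;   /* NWAY keeps only the subgroup-level rows */
   by SampleID;
   class sex;                              /* hypothetical subgroup variable */
   freq NumberHits;
   var myvar;
   output out=OutStats p99=P99_myvar;      /* hypothetical output variable name */
run;

/* 3. Percentile bootstrap interval within each subgroup */
proc univariate data=OutStats noprint;
   class sex;
   var P99_myvar;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5 97.5
          pctlname=Lower Upper;
run;

proc print data=Pctl noobs; run;
```

The NWAY option is there so that the overall (all-subgroups) row that Proc Means would otherwise add for each SampleID does not end up mixed into OutStats.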

sophiec
Obsidian | Level 7

Thank you! I realized that shortly after posting and updated the code (see above). However, I'm finding that the calculated 99th percentile is not centered within the 95% CI that it calculates. 

 

For example, see the following 99th percentiles with 95% CI: 

overall sample: 102.6 (55.0, 229.0)

subgroup 1: 55.0 (17.0, 617.7)

subgroup 2: 86.0 (36.7, 252.9)

subgroup 3: 187.4 (57.9, 291.8)

 

Thanks again! 
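
A percentile bootstrap interval is not required to be symmetric around the point estimate; when the bootstrap distribution of the statistic is skewed, the limits will be lopsided. A quick way to look at this, assuming the OutStats data set and the percentile variable from the updated code, is to plot the bootstrap distribution with the sample estimate marked. The value 102.6 is the overall-sample estimate quoted above.

```sas
/* Sketch: inspect the bootstrap distribution of the 99th percentile */
proc sgplot data=OutStats;
   histogram percentile;                        /* one P99 per bootstrap sample */
   refline 102.6 / axis=x label="sample P99";   /* overall estimate from the post */
run;
```

A long right tail or a few isolated spikes in this histogram would be consistent with the asymmetric intervals reported.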

Mike_N
SAS Employee

I can't tell you for sure why you are getting that behavior. However, I will point out that, in general, bootstrapping is known to provide poor estimates of the sampling distribution of extreme order statistics. Textbook examples are the minimum and maximum. I suspect that will also be true for the 99th percentile, so I would encourage you to think carefully about whether to even use a bootstrap here. 
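
As one possible point of comparison with the bootstrap interval, Proc Univariate can report distribution-free confidence limits for quantiles, which are based on order statistics rather than resampling. A minimal sketch, assuming the mydata and myvar names from the original post:

```sas
/* Sketch: distribution-free 95% confidence limits for the quantiles of myvar */
ods select Quantiles;                          /* show only the quantiles table */
proc univariate data=mydata cipctldf(alpha=0.05);
   var myvar;
run;
```

The row for the 99% quantile in the resulting table gives limits that can be set beside the bootstrap ones. This is an alternative to weigh, not a replacement endorsed anywhere in this thread.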
