<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: discrepancies in distribution-free confidence interval for percentiles in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534867#M26937</link>
    <description></description>
    <pubDate>Tue, 12 Feb 2019 15:09:59 GMT</pubDate>
    <dc:creator>swimmingfish4</dc:creator>
    <dc:date>2019-02-12T15:09:59Z</dc:date>
    <item>
      <title>discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534605#M26917</link>
      <description>&lt;P&gt;I followed the exact methodology from the SAS 9.4 Procedures Guide: Statistical Procedures, Third Edition, to calculate confidence intervals for percentiles. However, I found a few discrepancies when I compared my calculated results with those returned by SAS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, I want to calculate the CI for the 90th percentile of the results. The percentile rank, computed as floor(np)+1, is 2312.&lt;/P&gt;&lt;P&gt;I start with the pair of order statistics (2311,2313), then (2310,2314), (2309,2315), and so on. The pair my search selects is (2282,2342), with a coverage probability of&amp;nbsp;0.9515646. However, SAS returns the pair (2281,2341). I don't know why SAS produces this pair instead of (2282,2342), which is symmetric around 2312. I am using the CIPCTLDF=(lowerpre=LCL upperpre=UCL) option to get the distribution-free confidence interval. The data I used can be found in the attachments. 
N=2568 and percentile=90% and alpha=0.05.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Method shown in the SAS procedure page:&lt;/P&gt;&lt;P&gt;The two-sided&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0004.png" border="0" alt="$100(1-\alpha )\% $" width="79" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;confidence limits for the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0180.png" border="0" alt="$100p$" width="29" height="14" /&gt;&lt;/SPAN&gt;th percentile are&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class="AAmathobject"&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0186.png" border="0" alt="\[  \begin{array}{lcl} \mbox{lower limit} &amp;amp;  = &amp;amp;  X_{(l)} \\ \mbox{upper limit} &amp;amp;  = &amp;amp;  X_{(u)} \end{array}  \]" width="135" height="36" /&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;where&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0187.png" border="0" alt="$X_{(j)}$" width="23" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;j&lt;/SPAN&gt;th order statistic when the data values are arranged in increasing order:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class="AAmathobject"&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0188.png" border="0" alt="\[  X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}  \]" width="145" height="17" /&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;The lower rank&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" 
AAmathtext"&gt;l&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and upper rank&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;u&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;are integers that are symmetric (or nearly symmetric) around&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0189.png" border="0" alt="$\lfloor np \rfloor +1$" width="49" height="16" /&gt;&lt;/SPAN&gt;, where&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0190.png" border="0" alt="$\lfloor np \rfloor $" width="25" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is the integer part of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0011.png" border="0" alt="$np$" width="15" height="10" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;n&lt;/SPAN&gt;is the sample size. 
Furthermore,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;l&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;u&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;are chosen so that&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0191.png" border="0" alt="$X_{(l)}$" width="22" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0192.png" border="0" alt="$X_{(u)}$" width="25" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;are as close to&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0193.png" border="0" alt="$X_{\lfloor np \rfloor +1}$" width="47" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as possible while satisfying the coverage probability requirement,&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class="AAmathobject"&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0194.png" border="0" alt="\[  Q(u-1;n,p) - Q(l-1;n,p) \geq 1 - \alpha  \]" width="227" height="16" /&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;where&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0195.png" border="0" alt="$Q(k;n,p)$" width="56" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is the cumulative binomial probability&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2019 21:32:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534605#M26917</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-11T21:32:40Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534624#M26919</link>
      <description>&lt;P&gt;It would help to post the code for the procedure you used. Without that, we are guessing about which procedure and options you actually specified.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2019 21:28:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534624#M26919</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-02-11T21:28:56Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534630#M26920</link>
      <description>&lt;P&gt;Sure. Below is the code I used to calculate the distribution-free confidence interval for the 90th percentile.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;***read in csv file;
proc import out=testdata1 datafile="vec.csv" dbms=csv replace;
   getnames=yes;
run;

***use the CIPCTLDF option to request distribution-free confidence limits for percentiles;
proc univariate data=testdata1 cipctldf;
   var x;
   output out=pctl1 pctlpts=90 pctlpre=p
          cipctldf=(lowerpre=LCL upperpre=UCL);
run;

proc print data=pctl1 noobs;
run;
endsas;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The results are shown below:&lt;/P&gt;
&lt;PRE&gt;The SAS System

The UNIVARIATE Procedure
Variable:  x

                            Moments

N                        2568    Sum Weights               2568
Mean               237.121456    Sum Observations      608927.9
Std Deviation      2133.64526    Variance            4552442.11
Skewness           16.3191468    Kurtosis            309.929034
Uncorrected SS     1.18305E10    Corrected SS        1.16861E10
Coeff Variation    899.811133    Std Error Mean      42.1041308

              Basic Statistical Measures

    Location                    Variability

Mean     237.1215     Std Deviation             2134
Median     5.1000     Variance               4552442
Mode       0.0000     Range                    50274
                      Interquartile Range   13.30000

                Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t   5.631786    Pr &amp;gt; |t|    &amp;lt;.0001
Sign           M       1259    Pr &amp;gt;= |M|   &amp;lt;.0001
Signed Rank    S    1585711    Pr &amp;gt;= |S|   &amp;lt;.0001

                              Quantiles (Definition 5)

                           95% Confidence Limits    -------Order Statistics-------
Level           Quantile     Distribution Free       LCL Rank   UCL Rank   Coverage

100% Max         50274.3
99%               4739.8       3266.9       7294.7       2532       2552      95.19
95%                335.0        231.2        529.9       2419       2463      95.29
90%                 85.3         71.4        102.2       2281       2341      95.15
75% Q3              15.8         13.7         18.5       1883       1970      95.26
50% Median           5.1          4.8          5.4       1235       1335      95.15
25% Q1               2.5          2.4          2.6        599        686      95.26
10%                  1.4          1.3          1.5        228        288      95.15
5%                   0.8          0.7          0.9        106        150      95.29
1%                   0.0          0.0          0.0         17         37      95.19
0% Min               0.0

                  Extreme Observations

     ----Lowest----        -----Highest-----

     Value      Obs          Value      Obs

         0     2427        27810.0     1558
         0     2426        35257.3     2338
         0     2425        36921.4     2339
         0     2167        48214.6     2340
         0     2019        50274.3     1666

 p90    LCL90    UCL90

85.3     71.4    102.2&lt;/PRE&gt;</description>
      <pubDate>Mon, 11 Feb 2019 21:37:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534630#M26920</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-11T21:37:40Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534784#M26929</link>
      <description>&lt;P&gt;I assume that you computed the CIs with a program and got a different answer? Please&amp;nbsp;post the program that you used to check the results.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2019 12:34:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534784#M26929</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-02-12T12:34:50Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534867#M26937</link>
      <description>&lt;P&gt;Below is an R program to calculate distribution-free CIs for percentiles. For the 90th percentile, it returns a CI of (71.9,103.6), which corresponds to the pair of order statistics (2282,2342), whereas SAS returns a CI of (71.4,102.2), which corresponds to the pair (2281,2341).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;calc_ci&amp;lt;- function(vec,percentile){&lt;/P&gt;&lt;P&gt;library(tidyverse)&lt;BR /&gt;vec_order&amp;lt;- sort(vec) ### sorted in ascending order&lt;BR /&gt;n&amp;lt;- length(vec)&lt;BR /&gt;&lt;BR /&gt;mid&amp;lt;- floor(n*percentile)+1&lt;BR /&gt;### SAS definition 5 (PCTLDEF=5): empirical distribution function with averaging&lt;BR /&gt;if (n*percentile==floor(n*percentile)) perc&amp;lt;- (vec_order[mid-1]+vec_order[mid])/2&lt;BR /&gt;else perc&amp;lt;- vec_order[mid] # otherwise the percentile is the order statistic itself&lt;BR /&gt;&lt;BR /&gt;len&amp;lt;- min(mid,n-mid)&lt;BR /&gt;list&amp;lt;- data.frame(i=0:len) %&amp;gt;% mutate(lower=mid-i,upper=mid+i) %&amp;gt;%&lt;BR /&gt;mutate(prob=pbinom(q = upper-1,size = n, prob = percentile) - pbinom(q = lower-1,size = n, prob = percentile),&lt;BR /&gt;logic=prob&amp;gt;=1-alpha)&lt;BR /&gt;i_select&amp;lt;- min(list[list$logic,'i'])&lt;BR /&gt;u&amp;lt;- mid + i_select&lt;BR /&gt;l&amp;lt;- mid - i_select&lt;BR /&gt;&lt;BR /&gt;ci_per&amp;lt;- data.frame("p"=perc,"lcl"=vec_order[l],"ucl"=vec_order[u])&lt;BR /&gt;return(ci_per)&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;calc_ci(vec,0.9)&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2019 15:09:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534867#M26937</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-12T15:09:59Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534921#M26940</link>
      <description>&lt;P&gt;I don't speak R yet, but I am never surprised when different programs, SAS vs R, return slightly different answers. There are so many things going into the algorithms in the background regarding precision, rounding, and order of operations.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2019 16:55:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/534921#M26940</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-02-12T16:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535007#M26942</link>
      <description>&lt;P&gt;I get the following error when I try to run your script :&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;vec = 1:100&lt;BR /&gt;&amp;nbsp; &amp;nbsp;calc_ci(vec,0.9)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Error in mutate_impl(.data, dots) : &lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Evaluation error: non-numeric argument to binary operator.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2019 20:58:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535007#M26942</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-02-12T20:58:32Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535022#M26945</link>
      <description>&lt;P&gt;Sorry, I forgot to set alpha to 0.05.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;### set alpha to 0.05 (it must be defined before calc_ci is called)&lt;/P&gt;&lt;P&gt;alpha&amp;lt;- 0.05&lt;/P&gt;&lt;P&gt;calc_ci&amp;lt;- function(vec,percentile){&lt;/P&gt;&lt;P&gt;library(tidyverse)&lt;BR /&gt;vec_order&amp;lt;- sort(vec) ### sorted in ascending order&lt;BR /&gt;n&amp;lt;- length(vec)&lt;BR /&gt;&lt;BR /&gt;mid&amp;lt;- floor(n*percentile)+1&lt;BR /&gt;### SAS definition 5 (PCTLDEF=5): empirical distribution function with averaging&lt;BR /&gt;if (n*percentile==floor(n*percentile)) perc&amp;lt;- (vec_order[mid-1]+vec_order[mid])/2&lt;BR /&gt;else perc&amp;lt;- vec_order[mid] # otherwise the percentile is the order statistic itself&lt;BR /&gt;&lt;BR /&gt;len&amp;lt;- min(mid,n-mid)&lt;BR /&gt;list&amp;lt;- data.frame(i=0:len) %&amp;gt;% mutate(lower=mid-i,upper=mid+i) %&amp;gt;%&lt;BR /&gt;mutate(prob=pbinom(q = upper-1,size = n, prob = percentile) - pbinom(q = lower-1,size = n, prob = percentile),&lt;BR /&gt;logic=prob&amp;gt;=1-alpha)&lt;BR /&gt;i_select&amp;lt;- min(list[list$logic,'i'])&lt;BR /&gt;u&amp;lt;- mid + i_select&lt;BR /&gt;l&amp;lt;- mid - i_select&lt;BR /&gt;&lt;BR /&gt;ci_per&amp;lt;- data.frame("p"=perc,"lcl"=vec_order[l],"ucl"=vec_order[u])&lt;BR /&gt;return(ci_per)&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;calc_ci(vec,0.9)&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2019 21:19:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535022#M26945</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-12T21:19:23Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535211#M26951</link>
      <description>&lt;P&gt;Although&amp;nbsp;I do not have the answer to your question, I looked at this last night and have a few comments. I think there are two main issues:&lt;/P&gt;
&lt;P&gt;1. The choice of the initial value for the lower and upper index.&lt;/P&gt;
&lt;P&gt;2. The method by which the lower index is decremented and the upper index is incremented until the difference in binomial&amp;nbsp;probability exceeds the target coverage probability. See the paragraph that contains Eqn 5.1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have attached a copy of the relevant pages from Hahn and Meeker (1991), which is the reference in the SAS doc. The description of the method of&amp;nbsp;determining the interval is fairly vague, using terms like "nearly symmetrical" and "as close together as possible."&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your program uses floor(np)+1 as the initial index to start the incrementing/decrementing process. However, Hahn and Meeker propose&amp;nbsp;using "integers l and u that are closest to p(n+1)."&amp;nbsp; I interpret that as l=floor(p*(n+1)) and u=ceil(p*(n+1)) although it is not clear to me what to do if l=u, which seems to be forbidden by Eqn 5.1. Do you decrement l? Increment u? I don't know.&lt;/P&gt;
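&lt;P&gt;To make the two starting rules concrete, here is a minimal Python sketch, used only for illustration. The function names are hypothetical. It evaluates floor(np)+1 from the SAS documentation and the floor/ceiling of p(n+1) under the reading of Hahn and Meeker above, for the thread's example n = 2568 and p = 0.9. It uses exact fractions so that an n*p that should be an integer is not rounded just below it.&lt;/P&gt;

```python
import math
from fractions import Fraction


def sas_doc_start(n: int, p: Fraction) -> int:
    """Center rank floor(np) + 1, as named in the PROC UNIVARIATE documentation."""
    return math.floor(n * p) + 1


def hahn_meeker_start(n: int, p: Fraction) -> tuple:
    """Ranks (floor, ceil) of p(n+1): one reading of Hahn and Meeker's rule."""
    center = p * (n + 1)
    return math.floor(center), math.ceil(center)


n, p = 2568, Fraction("0.9")      # exact 9/10, not the binary float 0.9
print("np     =", float(n * p))   # 2311.2
print("p(n+1) =", float(p * (n + 1)))   # 2312.1
print("floor(np)+1           :", sas_doc_start(n, p))
print("floor, ceil of p(n+1) :", hahn_meeker_start(n, p))
```

&lt;P&gt;For this example the sketch should report 2312 for floor(np)+1 and (2312, 2313) for p(n+1). When p(n+1) is an integer, the floor and the ceiling coincide, which is the l = u case the text leaves open. Exact fractions are a design choice here. In binary floating point, 100 * 0.29 evaluates to slightly less than 29, so a float version of floor(np)+1 would be off by one for that input.&lt;/P&gt;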
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your program, you follow&amp;nbsp;Hahn and Meeker's sentence that says&amp;nbsp;"For example, ...you can use l=i^{-} - j and u=i^{+} + j for j=1,2,3..., incrementing j until [Eqn 5.1] is satisfied." But their comment&amp;nbsp;is only one way to&amp;nbsp; implement the&amp;nbsp;goal to "choose the integers l and u symmetrically (or nearly symmetrically) around p*(n+1) and &lt;EM&gt;as &lt;/EM&gt;close together as possible." For example, an alternative is to test at each step whether decrementing l or incrementing u (and leaving the other unchanged) would satisfy Eqn 5.1. If so, that would lead to values that are closer together than changing both bounds in lockstep. I don't know what PROC UNIVARIATE is doing, but I suspect it is doing something more complicated, given that your program gives different answers.&lt;/P&gt;
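&lt;P&gt;The two search strategies can be sketched as follows, in Python for illustration only. Neither is claimed to be what PROC UNIVARIATE does, and all function names are hypothetical.&lt;/P&gt;
&lt;P&gt;lockstep mirrors the R function posted earlier in the thread: both ranks move outward together from floor(np)+1. one_sided_steps is one possible reading of the alternative described above. At each step it first tries lowering only l or raising only u, and it widens both ends only if neither single move reaches 1 - alpha. When both single moves qualify, it keeps the one with the larger coverage; that tie-break is an assumption. Binomial probabilities are computed on the log scale with math.lgamma so that the large coefficients for n = 2568 stay in floating-point range.&lt;/P&gt;

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), evaluated on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def coverage(l: int, u: int, n: int, p: float) -> float:
    """Q(u-1; n, p) - Q(l-1; n, p): the probability that X is one of l, ..., u-1."""
    return sum(binom_pmf(k, n, p) for k in range(l, u))


def lockstep(n: int, p: float, alpha: float) -> tuple:
    """Widen both ranks together around floor(np)+1, as the R function does."""
    mid = math.floor(n * p) + 1
    i = 0
    while 1 - alpha > coverage(mid - i, mid + i, n, p):
        i += 1
        if mid - i == 0 or mid + i == n + 1:
            raise ValueError("no symmetric rank pair reaches the requested coverage")
    return mid - i, mid + i


def one_sided_steps(n: int, p: float, alpha: float) -> tuple:
    """Hypothetical variant: before widening both ends, try moving only one of them.

    Tie-break (an assumption, not from the thread): among qualifying single-sided
    moves, keep the one with the larger coverage.
    """
    target = 1 - alpha
    l = u = math.floor(n * p) + 1
    while target > coverage(l, u, n, p):
        if l == 1 or u == n:
            raise ValueError("ranks ran out before reaching the requested coverage")
        single = [(l - 1, u), (l, u + 1)]
        good = [pair for pair in single if coverage(*pair, n, p) >= target]
        if good:
            return max(good, key=lambda pair: coverage(*pair, n, p))
        l, u = l - 1, u + 1
    return l, u


print("lockstep       :", lockstep(2568, 0.9, 0.05))
print("one_sided_steps:", one_sided_steps(2568, 0.9, 0.05))
print("coverage SAS pair (2281, 2341):", coverage(2281, 2341, 2568, 0.9))
print("coverage R pair   (2282, 2342):", coverage(2282, 2342, 2568, 0.9))
```

&lt;P&gt;With n = 2568, p = 0.9, and alpha = 0.05, lockstep should return (2282, 2342), the pair reported for the R function. The coverage of (2281, 2341) should come out slightly smaller than that of (2282, 2342).&lt;/P&gt;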
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Incidentally, the SAS interval&amp;nbsp;&lt;SPAN&gt;(2281,2341) has a smaller coverage probability than the interval&amp;nbsp; (2282,2342) that your program produces, so I guess that might indicate&amp;nbsp;that UNIVARIATE is using a more sophisticated decrement/increment algorithm.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc iml;
n = 2568;
p = 0.9;
mid = p*(n+1);
print mid;&lt;BR /&gt;
l = 2281; u = 2341;
dQSAS = cdf("binomial", u-1, p, n) - cdf("binomial", l-1, p, n);
print l u dQSAS;&lt;BR /&gt;
l = 2282; u = 2342;
dQAlt = cdf("binomial", u-1, p, n) - cdf("binomial", l-1, p, n);
print l u dQAlt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Unfortunately, I will be out of the office for the next week, but please post any progress that you make. It is an interesting question.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Feb 2019 15:02:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/535211#M26951</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-02-13T15:02:34Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538806#M27069</link>
      <description>&lt;P&gt;Thank you, Rick, for your detailed response. Here are some of my thoughts on the differences identified:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. You mentioned that the choice of the initial value for the lower and upper indices may differ. That's right. I did start with Hahn and Meeker's guidance, choosing &lt;SPAN&gt;l=floor(p*(n+1)) and u=ceil(p*(n+1)),&amp;nbsp;but later I saw the&amp;nbsp;&lt;/SPAN&gt;SAS theory page for calculating percentiles in the UNIVARIATE procedure (&lt;A href="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/viewer.htm#procstat_univariate_details14.htm" target="_blank" rel="noopener"&gt;http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/viewer.htm#procstat_univariate_details14.htm&lt;/A&gt;), which states that&amp;nbsp;&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;The lower rank&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;l&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;and upper rank&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;u&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;are integers that are symmetric (or nearly symmetric) around&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0189.png" border="0" alt="$\lfloor np \rfloor +1$" width="49" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;, where&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0190.png" border="0" alt="$\lfloor np \rfloor $" width="25" height="16" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;is the integer part of&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0011.png" border="0" alt="$np$" width="15" height="10" /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;and&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=" AAmathtext"&gt;n&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;is the sample size.&amp;nbsp;&lt;/SPAN&gt;Furthermore,&amp;nbsp;&lt;SPAN class=" AAmathtext"&gt;l&lt;/SPAN&gt;&amp;nbsp;and&amp;nbsp;&lt;SPAN class=" AAmathtext"&gt;u&lt;/SPAN&gt;&amp;nbsp;are chosen so that&amp;nbsp;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0191.png" border="0" alt="$X_{(l)}$" width="22" height="16" /&gt;&lt;/SPAN&gt;&amp;nbsp;and&amp;nbsp;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0192.png" border="0" alt="$X_{(u)}$" width="25" height="16" /&gt;&lt;/SPAN&gt;&amp;nbsp;are as close to&amp;nbsp;&lt;SPAN&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0193.png" border="0" alt="$X_{\lfloor np \rfloor +1}$" width="47" height="16" /&gt;&lt;/SPAN&gt;&amp;nbsp;as possible while satisfying the coverage probability requirement,&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class="AAmathobject"&gt;&lt;IMG src="http://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/images/procstat_univariate0194.png" border="0" alt="\[  Q(u-1;n,p) - Q(l-1;n,p) \geq 1 - \alpha  \]" width="227" height="16" /&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;
&lt;DIV class=""&gt;&lt;DIV class="AAmathobject"&gt;To align with what SAS does, I decided to use&amp;nbsp;&lt;SPAN&gt;floor(np)+1 as the initial index for the incrementing/decrementing process.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;2. For the example data, yes,&amp;nbsp;&lt;SPAN&gt;the SAS interval&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;(2281,2341) does have a smaller coverage probability than the interval&amp;nbsp;(2282,2342) that the R program produces. Here is my puzzle. 
If SAS implemented a more sophisticated algorithm, then I don't know why the interval (2283,2343) is not chosen, since its coverage probability is the closest to 0.95 of the three pairs listed below.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&lt;SPAN&gt;Pair of order statistics (2282,2342): coverage probability&amp;nbsp;0.9515646 (R function result).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&lt;SPAN&gt;Pair of order statistics (2281,2341): coverage probability 0.9514861 (SAS result).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&lt;SPAN&gt;Pair of order statistics (2283,2343): coverage probability 0.9506712 (the smallest of these coverages that still meets 0.95).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;Below is how I searched around the original symmetric CI in R:&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&lt;PRE&gt;l&amp;lt;- 2282
u&amp;lt;- 2342&lt;BR /&gt;alpha&amp;lt;- 0.05
percentile&amp;lt;- 0.9
n&amp;lt;- 2568
library(tidyverse)
tab&amp;lt;- tibble("l_candidate"=c(l,l+1,l-1,l,l,l+1,l-1),"u_candidate"=c(u,u+1,u-1,u+1,u-1,u,u)) %&amp;gt;%  # the original pair and nearby one-step shifts
  mutate(prob=pbinom(q = u_candidate-1,size = n, prob = percentile) - pbinom(q = l_candidate-1,size = n, prob = percentile),
         logic=prob&amp;gt;=1-alpha) %&amp;gt;% filter(logic) %&amp;gt;% filter(prob==min(prob))
tab
# A tibble: 1 x 4
  l_candidate u_candidate  prob logic
        &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;lgl&amp;gt;
1        2283        2343 0.951 TRUE &lt;/PRE&gt;&lt;/DIV&gt;&lt;DIV class="AAmathobject"&gt;&lt;SPAN&gt;3. I am sure that SAS's algorithm did something different from what is stated in the theory page. It would be greatly appreciated if you could provide any insight regarding the hidden implementation on the SAS end.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 26 Feb 2019 21:00:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538806#M27069</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-26T21:00:50Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538817#M27071</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt; I am sure that SAS's algorithm did something different from what is stated in the theory page.&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;I&amp;nbsp;pointed out in my previous response, the descriptions in the documentation and in Hahn/Meeker are both vague. There are several algorithms that would be consistent with the text.&lt;/P&gt;
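&lt;P&gt;As a hedged illustration of how loosely the text pins down the answer, the hypothetical helper below is a Python brute force, not SAS's algorithm. It lists every rank pair (l, u) that has the smallest possible u - l while still giving Q(u-1; n, p) - Q(l-1; n, p) of at least 1 - alpha. A rule that says only "as close together as possible" would still have to pick one pair from this list.&lt;/P&gt;

```python
import math
from itertools import accumulate


def binomial_cdf(n: int, p: float) -> list:
    """Return [Q(0; n, p), ..., Q(n; n, p)], the cumulative binomial probabilities."""
    pmf = (math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
           for k in range(n + 1))
    return list(accumulate(pmf))


def shortest_rank_pairs(n: int, p: float, alpha: float):
    """Hypothetical brute force: all rank pairs (l, u) with 1 .. n ranks and the
    smallest u - l such that Q(u-1; n, p) - Q(l-1; n, p) is at least 1 - alpha."""
    q = binomial_cdf(n, p)
    for width in range(1, n):
        pairs = [(l, l + width) for l in range(1, n - width + 1)
                 if q[l + width - 1] - q[l - 1] >= 1 - alpha]
        if pairs:
            return width, pairs
    return None


width, pairs = shortest_rank_pairs(2568, 0.9, 0.05)
print("smallest u - l:", width)
for l, u in pairs:
    print(l, u, round(binomial_cdf_cov := 0.0, 0) if False else "")
```

&lt;P&gt;For the thread's data (n = 2568, p = 0.9, alpha = 0.05), the smallest width should be 60. The list should contain the SAS pair (2281,2341), the R pair (2282,2342), and (2283,2343). Choosing among them needs an extra tie-breaking rule that neither description spells out.&lt;/P&gt;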
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I cannot provide any &lt;SPAN&gt;insight regarding the "hidden implementation," but if I think about it further,&amp;nbsp;I will let you know.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Out of curiosity, what are you trying to accomplish? Is this a school project? A research paper?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Feb 2019 21:19:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538817#M27071</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2019-02-26T21:19:07Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538842#M27072</link>
      <description>&lt;P&gt;Thank you so much for your input. I am trying to create an R function that calculates the CI of percentiles, and I want to make sure I get the correct results, so I chose the SAS UNIVARIATE procedure as the gold standard.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Feb 2019 22:19:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/538842#M27072</guid>
      <dc:creator>swimmingfish4</dc:creator>
      <dc:date>2019-02-26T22:19:59Z</dc:date>
    </item>
    <item>
      <title>Re: discrepancies in distribution-free confidence interval for percentiles</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/898598#M44509</link>
      <description>&lt;P&gt;Hi! I am looking into this problem now. Have you found out anything after the discussion in 2019?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am trying to find initial parameters LLIM and ULIM for the method I describe in&lt;/P&gt;
&lt;P&gt;Fast and Accurate Calculation of Descriptive Statistics of Very Large Sets of Data&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Many thanks in advance!&lt;/P&gt;
&lt;P&gt;/Br AndersS&lt;/P&gt;</description>
      <pubDate>Sat, 14 Oct 2023 11:32:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/discrepancies-in-distribution-free-confidence-interval-for/m-p/898598#M44509</guid>
      <dc:creator>AndersS</dc:creator>
      <dc:date>2023-10-14T11:32:19Z</dc:date>
    </item>
  </channel>
</rss>

