<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic 95% CI for Categorical Variables in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245713#M12948</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am wondering how to compute 95% CIs for a categorical variable such as race or income bracket.&lt;/P&gt;
&lt;P&gt;I know how to do it for a binary variable:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc univariate data=data cibasic(alpha=0.05);
   var binary;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But how do I do it for a categorical variable such as race?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So that I get the following&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Race&lt;/P&gt;
&lt;P&gt;White X, 95% CI X1-X2&lt;/P&gt;
&lt;P&gt;Black Y, 95% CI Y1-Y2&lt;/P&gt;
&lt;P&gt;Asian Z, 95% CI Z1-Z2&lt;/P&gt;
&lt;P&gt;and so on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks so much in advance,&lt;/P&gt;
&lt;P&gt;Louise&lt;/P&gt;</description>
    <pubDate>Sun, 24 Jan 2016 18:04:56 GMT</pubDate>
    <dc:creator>LLW</dc:creator>
    <dc:date>2016-01-24T18:04:56Z</dc:date>
    <item>
      <title>95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245713#M12948</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am wondering how to compute 95% CIs for a categorical variable such as race or income bracket.&lt;/P&gt;
&lt;P&gt;I know how to do it for a binary variable:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc univariate data=data cibasic(alpha=0.05);
   var binary;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But how do I do it for a categorical variable such as race?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So that I get the following&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Race&lt;/P&gt;
&lt;P&gt;White X, 95% CI X1-X2&lt;/P&gt;
&lt;P&gt;Black Y, 95% CI Y1-Y2&lt;/P&gt;
&lt;P&gt;Asian Z, 95% CI Z1-Z2&lt;/P&gt;
&lt;P&gt;and so on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks so much in advance,&lt;/P&gt;
&lt;P&gt;Louise&lt;/P&gt;</description>
      <pubDate>Sun, 24 Jan 2016 18:04:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245713#M12948</guid>
      <dc:creator>LLW</dc:creator>
      <dc:date>2016-01-24T18:04:56Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245745#M12949</link>
      <description>&lt;P&gt;PROC UNIVARIATE only analyzes continuous variables.&amp;nbsp; For discrete variables, you have to use other procedures such as FREQ, LOGISTIC, GENMOD, and CATMOD.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It sounds like you want the confidence intervals for a multinomial variable. An internet search reveals the following possibilities:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="http://support.sas.com/kb/32/609.html" target="_self"&gt;Confidence interval for a multinomial proportion&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;If the data are from a survey, &lt;A href="http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_surveyfreq_a0000000221.htm" target="_self"&gt;use PROC SURVEYFREQ&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you describe the source of your data, we might be able to make particular recommendations.&lt;/P&gt;</description>
      <pubDate>Sun, 24 Jan 2016 22:07:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245745#M12949</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-01-24T22:07:56Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245747#M12950</link>
      <description>&lt;P&gt;Hi Louise,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I see, Rick was faster. I also had a look at that &lt;A href="http://support.sas.com/kb/32/609.html" target="_blank"&gt;SAS Usage Note 32609&lt;/A&gt;. Depending on whether you have a SAS/IML license, you can use either the "MULTINOM" module approach or the PROC CATMOD approach.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In either case&amp;nbsp;one could add SAS code to&amp;nbsp;automate the process to some extent, so that you don't have to enter, e.g., the response statement for PROC CATMOD manually.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, I haven't checked yet if these CIs are identical&amp;nbsp;to those in&amp;nbsp;&lt;A href="http://www.springer.com/de/book/9781461381242" target="_blank"&gt;R.G. Miller, Simultaneous Statistical Inference&lt;/A&gt;, p. 217, eqn. 14. (Not sure if this is relevant to you.) Both refer to the Bonferroni adjustment.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Edit:&lt;/STRONG&gt; I've checked this now. No, the definition in that classic reference is very different. Those CIs are not symmetric about the point estimate. Please see my next post for more details.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 23:37:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/245747#M12950</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-27T23:37:52Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246503#M12997</link>
      <description>&lt;P&gt;I have investigated the CIs proposed in &lt;A href="http://support.sas.com/kb/32/609.html" target="_blank" rel="nofollow"&gt;SAS Usage Note 32609&lt;/A&gt;&amp;nbsp;with a primary focus on the PROC CATMOD approach, because I don't have a SAS/IML license.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It turned out that the CIs produced by PROC CATMOD (at least in the example given) are merely the same as those produced by PROC FREQ with the BINOMIAL option of the TABLES statement: the simple, well-known approximate (Wald) CI for a &lt;EM&gt;single&lt;/EM&gt; proportion. I've observed only three minor differences:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;PROC FREQ, unlike PROC CATMOD, limits the confidence bounds to [0, 1].&lt;/LI&gt;
&lt;LI&gt;Both procedures have difficulties with zero frequencies when using default settings.&lt;BR /&gt; PROC FREQ has the ZEROS option of the WEIGHT statement to produce the degenerate interval [0, 0] in this case.&lt;BR /&gt;There doesn't seem to be an equivalent option in PROC CATMOD.&lt;/LI&gt;
&lt;LI&gt;PROC FREQ is much faster (but still appears to be slower than calculating the CI in a DATA step).&lt;/LI&gt;
&lt;/OL&gt;
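&lt;P&gt;As point 3 suggests, the same Wald limits can also be computed directly in a DATA step. The following is a minimal sketch, not code from the Usage Note: the data set COUNTS, its variables CATEGORY and COUNT, and the example counts (chosen to match the proportions 0.1, 0.18, 0.72 of the Usage Note example) are all hypothetical. The limits are truncated to [0, 1] as in PROC FREQ, and a zero count yields the degenerate interval [0, 0].&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical example counts; replace with your own frequencies */
data counts;
  input category $ count;
  datalines;
A 10
B 18
C 72
;

/* Total sample size n */
proc means data=counts noprint;
  var count;
  output out=total(keep=n) sum=n;
run;

/* Unadjusted 95% Wald CI for each category proportion */
data wald_ci;
  if _n_ = 1 then set total;    /* n is retained for all observations */
  set counts;
  alpha = 0.05;
  z   = probit(1 - alpha/2);    /* normal quantile, about 1.96 */
  p   = count / n;              /* point estimate */
  se  = sqrt(p*(1 - p)/n);      /* approximate standard error */
  lcl = max(0, p - z*se);       /* truncate limits to [0, 1] */
  ucl = min(1, p + z*se);
run;

proc print data=wald_ci noobs;
  var category count p lcl ucl;
run;&lt;/CODE&gt;&lt;/PRE&gt;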
&lt;P&gt;Given this comparison, I don't know why that SAS Usage Note suggests PROC CATMOD for a task that is typically handled by PROC FREQ.&lt;/P&gt;
&lt;P&gt;Moreover, there seems to be an error in Example 2 of the document: It says that "Bonferroni adjusted 95% confidence intervals" are calculated using the "MULTINOM" module and they call this module with the appropriate parameter ("S") for simultaneous CIs, but the results presented in the table contain only unadjusted 95% CIs! These are identical to those computed with PROC CATMOD subsequently.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Both PROC FREQ (TABLES statement) and PROC CATMOD (MODEL statement) support the ALPHA= option, so the Bonferroni adjustment can be achieved by increasing the nominal confidence level, i.e., by setting ALPHA= to 0.05/3 = 0.01666... in the example with 3 categories (a confidence level of 98.333...% instead of 95%).&lt;/P&gt;
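&lt;P&gt;For illustration, the Bonferroni arithmetic can be checked in a short DATA step. This is a sketch with hypothetical data set and variable names; c=3 is the number of categories in the example.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data bonferroni;
  alpha     = 0.05;                    /* overall (familywise) error rate */
  c         = 3;                       /* number of categories */
  alpha_adj = alpha / c;               /* value for the ALPHA= option: 0.01666... */
  conf_pct  = 100*(1 - alpha_adj);     /* nominal confidence level: 98.333...% */
  z_unadj   = probit(1 - alpha/2);     /* normal quantile without adjustment */
  z_adj     = probit(1 - alpha_adj/2); /* normal quantile with Bonferroni adjustment */
run;

proc print data=bonferroni noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The adjusted quantile (about 2.39 instead of about 1.96) widens each Wald interval by roughly 22%.&lt;/P&gt;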
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To demonstrate the difference between adjusted and unadjusted CI in terms of coverage probability and to compare these CIs to that (from R.G. Miller's monograph) mentioned in my previous post, I've performed a quick (run time &amp;lt;1 min) simulation. It involves 10 million random samples from one particular multinomial distribution. Please see my next post.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jan 2016 23:50:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246503#M12997</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-27T23:50:57Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246504#M12998</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Simulation to determine coverage probabilities of the following three confidence regions for
   the parameter vector (p1, p2, p3) of a Mult(100, p1, p2, p3) (multinomial) distribution with
   p1=0.1, p2=0.18, p3=0.72 (example parameter values taken from SAS Usage Note 32609)

   (1) 95% Wald confidence intervals for the individual probabilities (as computed by PROC CATMOD
       in SAS Usage Note 32609, but the same as available from PROC FREQ except for the [trivial]
       limitation of confidence bounds to [0, 1] in PROC FREQ)
   (2) 98.333...% Wald confidence intervals as above (Bonferroni adjustment: (1-0.05/3)*100%), as
       also suggested in R.G. Miller, Simultaneous Statistical Inference, p. 216, formula 10
   (3) Bonferroni-adjusted CIs suggested in formula 14, p. 217, ibid., based on chi-square statistics
*/

%let n=100;   /* parameters of multinomial distribution */
%let p1=0.1;
%let p2=0.18;
%let p3=0.72;
%let c=3;     /* number of categories */
%let a=0.05;  /* alpha */

data sim;
call streaminit(271828);
z=probit(1-&amp;amp;a/2);       /* normal quantile without Bonferroni adjustment */ 
b=probit(1-&amp;amp;a/(2*&amp;amp;c));  /* normal quantile with Bonferroni adjustment */
x=cinv(1-&amp;amp;a/&amp;amp;c,1);      /* chi-square quantile, df=1, with Bonferroni adjustment */
array n[3];             /* samples of the multinomial distribution */
array lcl[3,3];         /* lower confidence limits (indices e.g. [2,3] for interval 2, param. p3) */
array ucl[3,3];         /* upper confidence limits (indices as above) */
array y[3];             /* 0-1 indicators for "(p1,p2,p3) contained in conf. region (1, 2, 3)" */
do i=1 to 1e7;          /* number of simulated samples */
  n1=0; n2=0; n3=0;     /* initialization */
  do j=1 to &amp;amp;n;         /* generate sample from multinomial distribution */
    v=rand('table', &amp;amp;p1, &amp;amp;p2, &amp;amp;p3);
    n[v]+1;
  end;
  do k=1 to 3;          /* calculate CIs for p1, p2, p3 */
    p=n[k]/&amp;amp;n;          /* point estimate of probability "pk" (k=1, 2, 3) */
    s=sqrt(p*(1-p)/&amp;amp;n); /* approx. standard error of the above point estimate */
    lcl[1,k]=p-z*s;     /* CI 1 */
    ucl[1,k]=p+z*s;
    lcl[2,k]=p-b*s;     /* CI 2 */
    ucl[2,k]=p+b*s;
    lcl[3,k]=(x+2*n[k]-sqrt(x*(x+4*n[k]*(&amp;amp;n-n[k])/&amp;amp;n)))/(2*(&amp;amp;n+x)); /* CI 3 */
    ucl[3,k]=(x+2*n[k]+sqrt(x*(x+4*n[k]*(&amp;amp;n-n[k])/&amp;amp;n)))/(2*(&amp;amp;n+x));
  end;                  /* all CIs for p1, p2, p3 are now available */
  do m=1 to 3;          /* check if confidence regions cover the true parameter vector */
    y[m]=(lcl[m,1] &amp;lt;= &amp;amp;p1 &amp;lt;= ucl[m,1] &amp;amp;
          lcl[m,2] &amp;lt;= &amp;amp;p2 &amp;lt;= ucl[m,2] &amp;amp;
          lcl[m,3] &amp;lt;= &amp;amp;p3 &amp;lt;= ucl[m,3]);
  end;
  output;
end;
keep y:;
run;

/* Estimate coverage probabilities of the three confidence regions */

ods select BinomialCLs;
proc freq data=sim;
tables y: / binomial(level='1' cl=exact);
run;

/* Results:

     Point est.    Clopper-Pearson 95% CI
                   
 (1) 0.8551        0.8549         0.8554
 (2) 0.9402        0.9401         0.9404
 (3) 0.9633        0.9632         0.9634
*/&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(Actually, this could be coded more elegantly by restricting the calculations to the finitely many [&amp;lt;&amp;lt;1E7] possible outcomes of the particular multinomial distribution and weighting the results by the corresponding probabilities. But the run time of the code above was less than 1 minute anyway.)&lt;/P&gt;
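&lt;P&gt;A minimal sketch of that more elegant alternative, assuming the same parameters as in the simulation (n=100; p1, p2, p3 = 0.1, 0.18, 0.72): instead of drawing 10 million samples, the DATA step below loops over all possible outcomes (n1, n2, n3) with n1+n2+n3=100, computes their multinomial probabilities on the log scale with LGAMMA, and adds up the probabilities of those outcomes for which each confidence region covers the true parameter vector. The data set name EXACT_COV and the variables COV1-COV3 and TOTAL are hypothetical. If the sketch is correct, TOTAL should equal 1 and COV1-COV3 should agree with the simulated coverage probabilities above up to simulation error.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Exact coverage probabilities of the three confidence regions for Mult(100; 0.1, 0.18, 0.72),
   obtained by enumerating all possible outcomes instead of simulating them */
data exact_cov;
  n = 100; alpha = 0.05; c = 3;
  array pr[3] _temporary_ (0.1 0.18 0.72); /* true probabilities p1, p2, p3 */
  array cnt[3] _temporary_;                /* counts of the current outcome */
  array inside[3] _temporary_;             /* 1 if region m covers (p1, p2, p3) */
  array cov[3] cov1-cov3;                  /* coverage probabilities of regions 1-3 */
  z = probit(1 - alpha/2);                 /* unadjusted normal quantile */
  b = probit(1 - alpha/(2*c));             /* Bonferroni-adjusted normal quantile */
  x = cinv(1 - alpha/c, 1);                /* Bonferroni-adjusted chi-square quantile, df=1 */
  total = 0;
  do m = 1 to 3; cov[m] = 0; end;
  do n1 = 0 to n;
    do n2 = 0 to n - n1;
      cnt[1] = n1; cnt[2] = n2; cnt[3] = n - n1 - n2;
      /* log of the multinomial probability of (n1, n2, n3) */
      logprob = lgamma(n + 1);
      do k = 1 to 3;
        logprob = logprob - lgamma(cnt[k] + 1) + cnt[k]*log(pr[k]);
      end;
      prob = exp(logprob);
      total = total + prob;
      do m = 1 to 3; inside[m] = 1; end;
      do k = 1 to 3;
        p = cnt[k] / n;
        s = sqrt(p*(1 - p)/n);
        root = sqrt(x*(x + 4*cnt[k]*(n - cnt[k])/n));
        if not (p - z*s le pr[k] le p + z*s) then inside[1] = 0;   /* CI 1 */
        if not (p - b*s le pr[k] le p + b*s) then inside[2] = 0;   /* CI 2 */
        if not ((x + 2*cnt[k] - root)/(2*(n + x)) le pr[k]
                le (x + 2*cnt[k] + root)/(2*(n + x))) then inside[3] = 0; /* CI 3 */
      end;
      do m = 1 to 3; cov[m] = cov[m] + inside[m]*prob; end;
    end;
  end;
  keep cov1-cov3 total;
run;

proc print data=exact_cov noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;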
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, for the particular multinomial distribution Mult(100, 0.1, 0.18, 0.72), only the third confidence region achieves the nominal 95% confidence level (estimated coverage 96.3%). The second comes close (94.0%), but the first clearly fails (85.5%), as expected.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;Conclusion:&lt;/U&gt; The PROC CATMOD approach presented in SAS Usage Note 32609 does not seem to provide substantial benefits over using PROC FREQ (here &lt;EM&gt;with&lt;/EM&gt; Bonferroni adjustment):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;ods exclude BinomialTest;
proc freq data=a;
weight count / zeros;
tables y / binomial (level='1') alpha=0.01666666666667;
tables y / binomial (level='2') alpha=0.01666666666667;
tables y / binomial (level='3') alpha=0.01666666666667;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;There are alternative confidence regions in the literature which can be implemented easily in a data step.&lt;/P&gt;
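&lt;P&gt;As one example, the chi-square based intervals used as confidence region (3) in the simulation above can be computed for observed counts as follows. This is a sketch under stated assumptions: the data set COUNTS with variables CATEGORY and COUNT and its example counts are hypothetical, and the number of categories c is taken from the number of observations in COUNTS. Unlike the Wald intervals, these limits are not symmetric about the point estimate and always stay within [0, 1].&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical example counts; replace with your own frequencies */
data counts;
  input category $ count;
  datalines;
A 10
B 18
C 72
;

/* Total sample size n */
proc means data=counts noprint;
  var count;
  output out=total(keep=n) sum=n;
run;

/* Bonferroni-adjusted chi-square based CIs, i.e., region (3) of the simulation */
data chisq_ci;
  if _n_ = 1 then set total;
  set counts nobs=c;                      /* c = number of categories */
  alpha = 0.05;
  x    = cinv(1 - alpha/c, 1);            /* chi-square quantile, df=1 */
  p    = count / n;                       /* point estimate */
  root = sqrt(x*(x + 4*count*(n - count)/n));
  lcl  = (x + 2*count - root) / (2*(n + x));
  ucl  = (x + 2*count + root) / (2*(n + x));
run;

proc print data=chisq_ci noobs;
  var category count p lcl ucl;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Algebraically, these limits coincide with Wilson score limits computed with the Bonferroni-adjusted normal quantile (since x is its square), so PROC FREQ with BINOMIAL(CL=WILSON) and ALPHA= set to 0.05/3 should give the same values.&lt;/P&gt;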
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jan 2016 00:07:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246504#M12998</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-28T00:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246580#M12999</link>
      <description>&lt;P&gt;If anyone is curious about how to simulate multinomial data in SAS, there are several options:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="http://blogs.sas.com/content/iml/2013/08/07/alternate-ways-to-simulate-multinomial-data.html" target="_self"&gt;Use the SAS DATA step&lt;/A&gt;, as FreelanceReinhard did&lt;/LI&gt;
&lt;LI&gt;Use&lt;A href="http://blogs.sas.com/content/iml/2013/08/07/alternate-ways-to-simulate-multinomial-data.html" target="_self"&gt; the SURVEYSELECT procedure &lt;/A&gt;with probability proportional to size sampling&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Use &lt;A href="http://blogs.sas.com/content/iml/2013/08/05/simulate-from-multinomial-distribution.html" target="_self"&gt;the RandMultinomial function in SAS/IML software&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 28 Jan 2016 11:13:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/246580#M12999</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2016-01-28T11:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/249849#M13137</link>
      <description>&lt;P&gt;Thank you so much everyone for your help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I adapted the code to cover every value of my race variable, including "missing", so that my 95% CIs account for missing values as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=data;
tables race: / missing binomial(level='1' cl=exact);
tables race: / missing binomial(level='2' cl=exact);
tables race: / missing binomial(level='3' cl=exact);
tables race: / missing binomial(level='4' cl=exact);
tables race: / missing binomial(level='5' cl=exact);
tables race: / missing binomial(level='6' cl=exact);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 13 Feb 2016 04:06:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/249849#M13137</guid>
      <dc:creator>LLW</dc:creator>
      <dc:date>2016-02-13T04:06:20Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/250004#M13163</link>
      <description>&lt;P&gt;This is fine if your focus is on the individual confidence intervals rather than a simultaneous confidence region.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 22:24:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/250004#M13163</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-02-14T22:24:43Z</dc:date>
    </item>
    <item>
      <title>Re: 95% CI for Categorical Variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/333141#M17593</link>
      <description>&lt;P&gt;Adding reference for the future: &lt;A href="http://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html" target="_self"&gt;compute simultaneous confidence intervals for multinomial proportions in SAS.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 19:59:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/95-CI-for-Categorical-Variables/m-p/333141#M17593</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2017-02-15T19:59:57Z</dc:date>
    </item>
  </channel>
</rss>

