Comparing observed to expected values in 2x2 contingency tables

Frequent Contributor
Posts: 81

I'm conducting a chi-squared goodness of fit test comparing the following observed cell counts to the expected count values.

 

Observed
         X            Y          Total
No       209,916      1,191      211,107
Yes      7,645        461        8,106
Total    217,561      1,652      219,213

Expected
         X            Y          Total
No       209,516.100  1,590.913  211,107
Yes      8,044.913    61.087     8,106
Total    217,561      1,652      219,213

Is it possible to find the chi-square p-value for 2x2 or larger multinomial tables with specified expected values using proc freq?

 

I ended up hand-calculating the chi-square test statistic for the example above (2,739.234) and then ran the following code for a chi-square distribution with 1 df:

data _null_;
	pvalue = 1 - PROBCHI(2739.234, 1);
	put pvalue;
run;

The resulting p-value was so small that SAS rounded it down to zero.
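
One way to see how small the p-value is, rather than getting zero, is to work on the log scale. This is a minimal sketch, assuming the hand-calculated statistic above and 1 degree of freedom; the variable names are illustrative only.

```sas
data _null_;
    chisq = 2739.234;   /* hand-calculated statistic from the post */
    df    = 1;
    /* Upper-tail probability computed directly; this can still underflow to 0 */
    pvalue  = sdf('chisquare', chisq, df);
    /* Natural log of the upper-tail probability, then converted to base 10 */
    log_p   = logsdf('chisquare', chisq, df);
    log10_p = log_p / log(10);
    put pvalue= log_p= log10_p=;
run;
```

If log10_p comes out below about -324, the p-value is smaller than any positive double-precision number, so a printed zero reflects underflow rather than an exact zero.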

 



All Replies
Super User
Posts: 19,850

Re: Comparing observed to expected values in 2x2 contingency tables

PROC FREQ does all of that...though maybe not in the exact format you want?

 

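proc freq data=have;
    /* CHISQ requests the chi-square test; EXPECTED adds expected counts under independence */
    /* LIST is omitted: list-format tables are not available with statistical options */
    tables var1*var2 / chisq expected;
    weight freq;
    /* ODS OUTPUT captures the displayed tables as data sets */
    ods output CrossTabFreqs=want1;
    ods output ChiSq=want2;
run;

proc print data=want1;
run;

proc print data=want2;
run;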
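
The step above assumes a data set named have with one row per cell. A possible way to build it from the observed counts in the question is sketched here; the variable names var1, var2, and freq are chosen only to match the code above.

```sas
data have;
    /* var1 = row (No/Yes), var2 = column (X/Y), freq = observed cell count */
    input var1 $ var2 $ freq;
    datalines;
No X 209916
No Y 1191
Yes X 7645
Yes Y 461
;
run;
```

With the WEIGHT statement, each row is counted freq times, so the four rows stand in for all 219,213 observations.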
Frequent Contributor
Posts: 81

Re: Comparing observed to expected values in 2x2 contingency tables

Thanks Reeza -

 

Does the "expected" option allow the user to manually enter the expected values under the null hypothesis, or does it only calculate the expected counts from the 2x2 table's row and column totals through cross-multiplication?

Solution
‎01-19-2016 10:12 AM
SAS Super FREQ
Posts: 3,755

Re: Comparing observed to expected values in 2x2 contingency tables

Although you can use the TESTF= and TESTP= options for one-way tables, these options are not supported for two-way tables. One reason is that the test statistic has an asymptotic chi-square distribution under the null hypothesis of independence between rows and columns. If you plug in your own expected values, there is no reason to think that the (Obs-Expected)**2/Expected statistic is distributed like chi-square. Therefore you can't compute p-values in the usual way.
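
For reference, here is a sketch of TESTP= in the one-way case that it does support. The null proportions (0.96 and 0.04) and the data set name yes_no are made up for illustration; the counts are the row totals from the question.

```sas
data yes_no;
    input response $ count;
    datalines;
No 211107
Yes 8106
;
run;

proc freq data=yes_no;
    /* TESTP= lists null proportions in the order of the table levels (No, Yes) */
    tables response / chisq testp=(0.96 0.04);
    weight count;
run;
```

The proportions must sum to 1 (or to 100 if given as percentages).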

Frequent Contributor
Posts: 81

Re: Comparing observed to expected values in 2x2 contingency tables

Ah - in that case maybe a better idea would be to conduct two binomial tests comparing the % Yes for X and Y between the Observed & Expected?

SAS Super FREQ
Posts: 3,755

Re: Comparing observed to expected values in 2x2 contingency tables

It seems like that is an answer to a different question than you originally asked. But, yes, you could use PROC FREQ and use two TABLES statements to conduct two hypothesis tests for the marginal distributions.
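
A sketch of that approach, assuming the cell counts from the question are stored with one row per cell. The data set and variable names are illustrative, and the null proportions given in P= are placeholders, not values from this thread.

```sas
data have;
    input var1 $ var2 $ freq;
    datalines;
No X 209916
No Y 1191
Yes X 7645
Yes Y 461
;
run;

proc freq data=have;
    weight freq;
    /* Test the marginal proportion of 'Yes' against a hypothetical null value */
    tables var1 / binomial(p=0.04 level='Yes');
    /* Test the marginal proportion of 'Y' against a hypothetical null value */
    tables var2 / binomial(p=0.01 level='Y');
run;
```

Each TABLES statement produces its own one-way table and binomial test, so the two tests are carried out separately rather than jointly.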

Respected Advisor
Posts: 2,655

Re: Comparing observed to expected values in 2x2 contingency tables

I'll follow up on @Rick_SAS's comment.  In a 2x2 table with fixed margins, the expected value is determined by the marginal values--you literally cannot specify other values.  If you loosen this restriction, then it is indeed separate binomial tests against prespecified expected values.  My question would be "Where do you get those values?" and more importantly, "How many observations go into the estimate of the proportion?"  The latter is a prime determinant of both the Type I and Type II errors for what you are going after.

 

Steve Denham
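
To illustrate the point about margins, the expected counts under independence can be recomputed from the totals alone. This is a small sketch using the margins of the observed table in the question; the variable names are illustrative.

```sas
data _null_;
    /* Margins of the observed table in the question */
    n       = 219213;
    row_no  = 211107;  row_yes = 8106;
    col_x   = 217561;  col_y   = 1652;
    /* Expected count under independence = row total * column total / n */
    e_no_x  = row_no  * col_x / n;
    e_no_y  = row_no  * col_y / n;
    e_yes_x = row_yes * col_x / n;
    e_yes_y = row_yes * col_y / n;
    put e_no_x= 12.3 e_no_y= 12.3 e_yes_x= 12.3 e_yes_y= 12.3;
run;
```

These should agree with the Expected table in the question up to rounding, which suggests those expected values are the ones implied by the margins.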

Super User
Posts: 10,041

Re: Comparing observed to expected values in 2x2 contingency tables

Yes, PROC FREQ will do all of this for you.

But it is also very easy to do with IML code.

The p-value is also near zero, which means you reject H0.

Sorry, I was confused about the DF earlier. DF=1.

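/* Observed counts; COMMA32. strips the thousands separators */
data o;
input (X Y) (: comma32.);
cards;
209,916 1,191
7,645 461
;
run;

/* Expected counts */
data e;
input (X Y) (: comma32.);
cards;
209,516.100 1,590.913
8,044.913 61.087
;
run;

proc iml;
/* Read both tables into 2x2 matrices */
use o;
read all var _num_ into o;
close o;
use e;
read all var _num_ into e;
close e;
/* (rows-1)*(cols-1), which is 1 for a 2x2 table */
df=(nrow(o)-1)*(ncol(o)-1);
/* Pearson statistic: sum of (O-E)^2/E over all cells */
chi=sum((o-e)##2/e);
/* Upper-tail probability; SDF avoids the 1-CDF subtraction */
p=sdf('chisq',chi,df);
print chi,df,p[format=pvalue6.4];
quit;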

 
