Solved: Re: Comparing observed to expected values in 2x2 contingency tables

RobF · Posted 01-13-2016 02:48 PM

I'm conducting a chi-squared goodness of fit test comparing the following observed cell counts to the expected count values.

Observed
	X	Y	Total
No	209,916	1,191	211,107
Yes	7,645	461	8,106
Total	217,561	1,652	219,213

Expected
	X	Y	Total
No	209,516.100	1,590.913	211,107
Yes	8,044.913	61.087	8,106
Total	217,561	1,652	219,213

Is it possible to find the chi-square p-value for 2x2 or larger multinomial tables with specified expected values using proc freq?

I ended up hand calculating the chi-square test statistic in the above example (= 2,739.234) and then ran the following code for a chi-square distribution with 1 df:

data _null_;
	pvalue = 1 - PROBCHI(2739.234, 1);
	put pvalue;
run;

The resulting p-value was so small SAS rounded down to zero.
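
For reference, the hand calculation can be reproduced in a DATA step. This is only a sketch, not code from the original post: it assumes the four observed and expected cell values listed in the tables above, the variable names are illustrative, and it uses SDF and LOGSDF instead of 1 - PROBCHI so the upper-tail probability is computed directly.

```sas
data _null_;
    /* Observed and expected counts, in the order No-X, No-Y, Yes-X, Yes-Y */
    array o[4] _temporary_ (209916 1191 7645 461);
    array e[4] _temporary_ (209516.100 1590.913 8044.913 61.087);
    chisq = 0;
    do i = 1 to 4;
        chisq = chisq + (o[i] - e[i])**2 / e[i];   /* Pearson contribution of each cell */
    end;
    df    = 1;                              /* (2-1)*(2-1) for a 2x2 table */
    p     = sdf('CHISQ', chisq, df);        /* upper-tail probability */
    log_p = logsdf('CHISQ', chisq, df);     /* natural log of that probability */
    put chisq= p= log_p=;
run;
```

The statistic should come out close to the 2,739.234 quoted above; small differences are expected because the expected counts are rounded. The tail probability itself is likely still too small to represent as a double, which is why the log of the p-value is printed alongside it.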

Reeza · Posted 01-13-2016 03:48 PM

PROC FREQ does all of that...though maybe not in the exact format you want?

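proc freq data=have;
/* CHISQ requests the test; EXPECTED adds the expected counts under independence. */
/* LIST is dropped because it is not available together with statistical options. */
tables var1*var2 / chisq expected;
weight freq;
/* Save the crosstab and the chi-square results as data sets */
ods output crosstabfreqs=want1 chisq=want2;
run;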

proc print data=want1;
proc print data=want2;
run;
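
The PROC FREQ step above assumes a data set named have with one row per cell and the cell count in freq. A minimal sketch of such a data set, built from the Observed table in the question, might look like this (the names have, var1, var2 and freq are the placeholders from the code above, not names from the original data):

```sas
/* Hypothetical input for the PROC FREQ step: one row per cell,
   with the observed count in FREQ for the WEIGHT statement. */
data have;
    input var1 $ var2 $ freq;
    datalines;
No  X 209916
No  Y 1191
Yes X 7645
Yes Y 461
;
run;
```

The Expected values listed in the question appear to match row total times column total divided by the grand total, up to rounding, so the EXPECTED option should reproduce them from these counts.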

RobF · Posted 01-14-2016 10:17 AM

Thanks Reeza -

Does the "expected" option allow the user to manually enter in the expected null hypothesis values, or does "expected" only calculate the 2x2 table's row and column means through cross-multiplication?

Rick_SAS · Posted 01-14-2016 10:29 AM

Although you can use the TESTF= and TESTP= options for one-way tables, these options are not supported for two-way tables. One reason is that the test statistic has an asymptotic chi-square distribution under the null hypothesis of independence between rows and columns. If you plug in your own expected values, there is no reason to think that the (Obs-Expected)**2/Expected statistic is distributed like chi-square. Therefore you can't compute p-values in the usual way.
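
To illustrate the one-way case where TESTP= does apply, here is a small sketch. The data set name oneway, the variable names, and the null proportions 0.96 and 0.04 are all made up for the example; only the row totals come from the question.

```sas
/* Hypothetical one-way data: totals for the row variable only */
data oneway;
    input resp $ count;
    datalines;
No  211107
Yes 8106
;
run;

proc freq data=oneway;
    /* TESTP= lists the null proportions in the order of the table levels (No, Yes) */
    tables resp / chisq testp=(0.96 0.04);
    weight count;
run;
```

TESTF= works the same way but takes null frequencies rather than proportions.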

RobF · Posted 01-14-2016 01:29 PM

Ah - in that case maybe a better idea would be to conduct two binomial tests comparing the % Yes for X and Y between the Observed & Expected?

Rick_SAS · Posted 01-14-2016 01:59 PM

It seems like that is an answer to a different question than you originally asked. But, yes, you could use PROC FREQ and use two TABLES statements to conduct two hypothesis tests for the marginal distributions.

SteveDenham · Posted 01-14-2016 03:47 PM

I'll follow up on @Rick_SAS's comment. In a 2x2 table with fixed margins, the expected value is determined by the marginal values--you literally cannot specify other values. If you loosen this restriction, then it is indeed separate binomial tests against prespecified expected values. My question would be "Where do you get those values?" and more importantly, "How many observations go into the estimate of the proportion?" The latter is a prime determinant of both the Type I and Type II errors for what you are going after.

Steve Denham
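
As a rough sketch of the separate binomial tests discussed above, PROC FREQ can test the proportion of Yes within each column against a prespecified value. The data set and variable names below are illustrative, and the null value 0.036978 is simply the Yes proportion implied by the Expected table in the question (8,106 / 219,213); whether that is an appropriate null value is exactly the question raised above.

```sas
/* Hypothetical layout: one row per cell, grouped by column (X or Y) */
data cells;
    input group $ resp $ count;
    datalines;
X No  209916
X Yes 7645
Y No  1191
Y Yes 461
;
run;

proc freq data=cells;
    by group;   /* rows are already in group order */
    /* Test H0: P(resp = 'Yes') = 0.036978 separately for X and Y */
    tables resp / binomial(level='Yes' p=0.036978);
    weight count;
run;
```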

Ksharp · Posted 01-13-2016 09:26 PM

Yes, PROC FREQ will do all of this for you.

But if you use IML code, it is also very easy.

The p-value is again near zero, which means you reject H0.

Sorry, I was confused about the DF: DF=1.

data o;
input (X Y) (: comma32.);
cards;
209,916 1,191
7,645 461
;
run;
data e;
input (X Y) (: comma32.);
cards;
209,516.100 1,590.913
8,044.913 61.087
;
run;

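proc iml;
/* Read the observed and expected counts into matrices */
use o;
read all var _num_ into o;
close o;
use e;
read all var _num_ into e;
close e;
/* r*c - r - c + 1 equals (r-1)*(c-1), which is 1 for a 2x2 table */
df=nrow(o)*ncol(o)-nrow(o)-ncol(o)+1;
/* Pearson chi-square: sum over cells of (O-E)^2/E */
chi= sum((o-e)##2/e);
p=1-cdf('chisq',chi,df);
print chi,df,p[f=pvalue.];
quit;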
