This topic is solved and locked.
RobF
Quartz | Level 8

I'm conducting a chi-squared goodness of fit test comparing the following observed cell counts to the expected count values.

 

Observed
          X          Y        Total
No        209,916    1,191    211,107
Yes       7,645      461      8,106
Total     217,561    1,652    219,213

Expected
          X              Y            Total
No        209,516.087    1,590.913    211,107
Yes       8,044.913      61.087       8,106
Total     217,561        1,652        219,213
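For reference, the expected counts above agree (to the displayed rounding) with the counts implied by independence of rows and columns, where R_i, C_j and N are the row totals, column totals and grand total. The hand-calculated statistic is then the usual Pearson sum; in a 2x2 table every |O - E| is the same, here 399.913.

```latex
E_{ij} = \frac{R_i\,C_j}{N}, \qquad
E_{\text{Yes},Y} = \frac{8106 \times 1652}{219213} \approx 61.087

X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 2739.2, \qquad
\text{df} = (2-1)(2-1) = 1
```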

 

Is it possible to find the chi-square p-value for 2x2 or larger multinomial tables with specified expected values using proc freq?

 

I ended up hand-calculating the chi-square test statistic for the example above (= 2,739.234) and then ran the following code for the chi-square distribution with 1 df:

 

data _null_;
	/* upper-tail chi-square probability for the hand-calculated statistic, df = 1 */
	pvalue = 1 - PROBCHI(2739.234, 1);
	put pvalue;
run;

The resulting p-value was so small SAS rounded down to zero.
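Why the result is exactly zero: PROBCHI(2739.234, 1) equals 1 to double precision, so 1 - PROBCHI(...) cancels to 0. A minimal sketch (the data set name pvals is hypothetical) comparing that with SDF, which computes the upper tail directly, and with the large-x approximation for 1 df, p = erfc(sqrt(x/2)) ≈ exp(-x/2) / sqrt(πx/2), evaluated on the log scale. Here even SDF returns 0, because the true tail probability is far below the smallest positive double (about 1e-308). LOGSDF may or may not return a finite value this far into the tail, so treat it as something to check rather than rely on.

```sas
data pvals;
  x = 2739.234;                        /* statistic from the post, df = 1 */
  p_cancel = 1 - probchi(x, 1);        /* PROBCHI(x,1) is 1.0 in double precision, so this is 0 */
  p_sdf    = sdf('chisq', x, 1);       /* upper tail computed directly; still underflows here */
  logp     = logsdf('chisq', x, 1);    /* log of the upper tail; check whether it is finite */
  /* large-x approximation for df = 1: p = erfc(sqrt(x/2)) ~ exp(-x/2) / sqrt(pi*x/2) */
  logp_approx   = -x/2 - 0.5*log(constant('pi')*x/2);
  log10p_approx = logp_approx / log(10);
  put p_cancel= p_sdf= logp= logp_approx= log10p_approx=;
run;
```

The approximation puts the p-value at roughly 10^-597, so reporting it as p < .0001 (as the PVALUE. format does) loses nothing in practice.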

 

Accepted solution: Rick_SAS's first reply below.

Reeza
Super User

PROC FREQ does all of that...though maybe not in the exact format you want?

 

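proc freq data=have;
  tables var1*var2 / chisq expected;   /* Pearson chi-square test plus expected cell counts */
  weight freq;                         /* FREQ holds the count for each cell */
  ods output CrossTabFreqs=want1       /* cell counts and expected values */
             ChiSq=want2;              /* chi-square test results */
run;

proc print data=want1;
run;
proc print data=want2;
run;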
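Applied to the counts posted in the question, a sketch of this approach might look like the following (the data set name have and the variables row, col and count are hypothetical). DEVIATION and CELLCHI2 add each cell's O - E and its contribution to the chi-square statistic, so the per-cell pieces of the hand calculation can be checked; the expected counts PROC FREQ prints are the ones computed from the row and column totals.

```sas
data have;
  input row $ col $ count;
  datalines;
No X 209916
No Y 1191
Yes X 7645
Yes Y 461
;
run;

proc freq data=have;
  tables row*col / chisq expected deviation cellchi2;  /* expected counts, O - E, per-cell chi-square */
  weight count;                                        /* each record carries a cell count */
  ods output ChiSq=chisq_stats;                        /* save the test statistics */
run;
```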
RobF
Quartz | Level 8

Thanks Reeza -

 

Does the EXPECTED option allow the user to manually enter the expected null hypothesis values, or does EXPECTED only calculate the 2x2 table's expected counts from the row and column totals through cross-multiplication?

Rick_SAS
SAS Super FREQ

Although you can use the TESTF= and TESTP= options for one-way tables, these options are not supported for two-way tables. One reason is that the test statistic has an asymptotic chi-square distribution under the null hypothesis of independence between rows and columns. If you plug in your own expected values, there is no reason to think that the (Obs-Expected)**2/Expected statistic is distributed like chi-square. Therefore you can't compute p-values in the usual way.
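A sketch of a one-way workaround, assuming the expected counts are fixed in advance rather than estimated from the data: recode the four cells as levels of a single variable (the names cells, cell and count are hypothetical) and pass the expected counts to TESTF=, in the same order as the levels. The one-way test then has 4 - 1 = 3 degrees of freedom. The expected counts in this thread appear to match those implied by the table's own margins, and for counts estimated that way the df = 1 independence test from CHISQ on the two-way table is the appropriate one. This also assumes TESTF= accepts non-integer counts whose sum matches the total; if PROC FREQ rejects them, TESTP= with proportions is the alternative.

```sas
data cells;
  input cell $ count;
  datalines;
No_X 209916
No_Y 1191
Yes_X 7645
Yes_Y 461
;
run;

/* ORDER=DATA keeps the levels in the order the TESTF= values are listed */
proc freq data=cells order=data;
  tables cell / chisq testf=(209516.087 1590.913 8044.913 61.087);
  weight count;
run;
```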

RobF
Quartz | Level 8

Ah - in that case maybe a better idea would be to conduct two binomial tests comparing the % Yes for X and Y between the Observed & Expected?

Rick_SAS
SAS Super FREQ

It seems like that is an answer to a different question than you originally asked. But, yes, you could use PROC FREQ and use two TABLES statements to conduct two hypothesis tests for the marginal distributions.
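A minimal sketch of separate binomial tests of the proportion of Yes within X and within Y, assuming the data are arranged with one row per group and response (the names counts, group, response and count are hypothetical). A BY statement runs the one-way test once per group instead of writing two TABLES statements. The null proportion p=0.037 is only a placeholder: replace it with your prespecified value, and if X and Y have different hypothesized values, run two PROC FREQ steps with WHERE statements instead.

```sas
data counts;
  input group $ response $ count;
  datalines;
X No 209916
X Yes 7645
Y No 1191
Y Yes 461
;
run;

/* data are already sorted by GROUP, as the BY statement requires */
proc freq data=counts;
  by group;
  tables response / binomial(p=0.037 level='Yes');  /* p=0.037 is a placeholder null proportion */
  weight count;
run;
```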

SteveDenham
Jade | Level 19

I'll follow up on @Rick_SAS's comment. In a 2x2 table with fixed margins, the expected values are determined by the marginal totals; you literally cannot specify other values. If you loosen this restriction, then it is indeed a set of separate binomial tests against prespecified expected values. My question would be "Where do you get those values?" and, more importantly, "How many observations go into the estimate of the proportion?" The latter is a prime determinant of both the Type I and Type II errors for what you are going after.

 

Steve Denham

Ksharp
Super User

Yes. PROC FREQ will do all of this for you.

But if you use IML code, it is also very easy.

The p-value is also near zero, which means you reject H0.

Sorry, I was confused about the DF at first: DF = 1.

 

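data o;                       /* observed counts */
input (X Y) (: comma32.);
cards;
209,916 1,191
7,645 461
;
run;
data e;                       /* expected counts; 209,516.087 makes the first row sum to 211,107 */
input (X Y) (: comma32.);
cards;
209,516.087 1,590.913
8,044.913 61.087
;
run;

proc iml;
use o;
read all var _num_ into o;
close o;
use e;
read all var _num_ into e;
close e;
df = (nrow(o)-1)*(ncol(o)-1);   /* (r-1)(c-1) = 1 for a 2x2 table */
chi = sum((o-e)##2/e);          /* Pearson chi-square statistic */
p = sdf('chisq', chi, df);      /* upper tail directly, avoiding 1 - CDF cancellation */
print chi, df, p[f=pvalue.];
quit;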

 


Discussion stats
  • 7 replies
  • 2572 views
  • 4 likes
  • 5 in conversation