Comparing observed to expected values in 2x2 conti...

01-13-2016 02:48 PM

I'm conducting a chi-squared goodness of fit test comparing the following observed cell counts to the expected count values.

Observed

| | X | Y | Total |
|---|---|---|---|
| No | 209,916 | 1,191 | 211,107 |
| Yes | 7,645 | 461 | 8,106 |
| Total | 217,561 | 1,652 | 219,213 |

Expected

| | X | Y | Total |
|---|---|---|---|
| No | 209,516.100 | 1,590.913 | 211,107 |
| Yes | 8,044.913 | 61.087 | 8,106 |
| Total | 217,561 | 1,652 | 219,213 |

Is it possible to find the chi-square p-value for 2x2 or larger multinomial tables with specified expected values using proc freq?

I ended up hand-calculating the chi-square test statistic for the example above (= 2,739.234) and then ran the following code for a chi-square distribution with 1 df:

```
data _null_;
    /* upper-tail probability of a chi-square(1) variate at the hand-computed statistic */
    pvalue = 1 - PROBCHI(2739.234, 1);
    put pvalue;
run;
```

The resulting p-value was so small that SAS rounded it down to zero.
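For reference, the hand calculation described above is presumably the usual Pearson statistic, summed over the four cells of the Observed and Expected tables. The block below is a sketch of that arithmetic with rounded terms, not a reproduction of the poster's exact working:

```latex
\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{(209916 - 209516.100)^2}{209516.100}
       + \frac{(1191 - 1590.913)^2}{1590.913}
       + \frac{(7645 - 8044.913)^2}{8044.913}
       + \frac{(461 - 61.087)^2}{61.087}
       \approx 0.8 + 100.5 + 19.9 + 2618.1 \approx 2739
```

Nearly all of the statistic comes from the Yes/Y cell (461 observed against about 61 expected). The posted expected counts also appear to equal row total × column total / N (for example, 211,107 × 1,652 / 219,213 ≈ 1,590.9), which is what a standard test of independence would use.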

01-13-2016 03:48 PM

PROC FREQ does all of that...though maybe not in the exact format you want?

```
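proc freq data=have;
    /* CHISQ requests the chi-square tests; EXPECTED prints expected counts under independence */
    /* (LIST dropped: list format is not available together with statistical options) */
    table var1*var2 / chisq expected;
    weight freq;                       /* freq holds the cell counts */
    ods output crosstabfreqs=want1;    /* save the crosstab, including expected counts */
    ods output chisq=want2;            /* save the chi-square statistics */
run;
proc print data=want1;
run;
proc print data=want2;
run;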
```
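The code above assumes a data set `have` with one row per cell and a count variable `freq`. A minimal sketch of building it from the observed counts in the question follows; the variable names match the reply, but the layout of `have` is an assumption, since the original poster did not show their data:

```sas
/* Assumed layout: one row per cell of the 2x2 table, with the cell count in freq */
data have;
    length var1 $3 var2 $1;
    input var1 $ var2 $ freq;
    datalines;
No X 209916
No Y 1191
Yes X 7645
Yes Y 461
;
run;

/* Weighted crosstab with expected counts and chi-square tests */
proc freq data=have;
    tables var1*var2 / chisq expected;
    weight freq;
run;
```

Because the posted expected counts appear to match the values implied by the row and column totals, the Pearson chi-square from this run should be close to the hand-calculated statistic.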

01-14-2016 10:17 AM

Thanks Reeza -

Does the "expected" option allow the user to manually enter the expected values under the null hypothesis, or does "expected" only calculate the expected counts from the 2x2 table's row and column totals through cross-multiplication?

Solution

01-19-2016 10:12 AM

01-14-2016 10:29 AM

01-14-2016 01:29 PM

Ah - in that case maybe a better idea would be to conduct two binomial tests comparing the % Yes for X and Y between the Observed & Expected?

01-14-2016 01:59 PM

It seems like that is an answer to a different question than you originally asked. But, yes, you could use PROC FREQ and use two TABLES statements to conduct two hypothesis tests for the marginal distributions.
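One way to read the two-TABLES suggestion is sketched below, using the `TESTP=` option of the one-way chi-square test to supply null proportions for each margin. The data set layout and the proportions shown are hypothetical placeholders, not values from this thread:

```sas
/* Assumed layout: one row per cell, with the cell count in freq */
data have;
    length var1 $3 var2 $1;
    input var1 $ var2 $ freq;
    datalines;
No X 209916
No Y 1191
Yes X 7645
Yes Y 461
;
run;

proc freq data=have;
    weight freq;
    /* hypothetical null proportions, listed in level order (No, Yes) */
    tables var1 / chisq testp=(0.96 0.04);
    /* hypothetical null proportions, listed in level order (X, Y) */
    tables var2 / chisq testp=(0.99 0.01);
run;
```

The `TESTP=` values must be listed in the same order as the variable levels and sum to 1 (or to 100 if given as percentages). Each TABLES statement produces its own one-way goodness-of-fit test.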

01-14-2016 03:47 PM

I'll follow up on @Rick_SAS's comment. In a 2x2 table with fixed margins, the expected value is determined by the marginal values--you literally cannot specify other values. If you loosen this restriction, then it is indeed separate binomial tests against prespecified expected values. My question would be "Where do you get those values?" and more importantly, "How many observations go into the estimate of the proportion?" The latter is a prime determinant of both the Type I and Type II errors for what you are going after.

Steve Denham

01-13-2016 09:26 PM - edited 01-14-2016 12:44 AM

Yes, PROC FREQ will do all of this for you.

But if you use IML code, it is very easy too.

The p-value is also near zero, which means you reject H0.

Sorry, I was confused about the DF. DF=1.

```
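/* Observed counts */
data o;
    input (X Y) (: comma32.);
    cards;
209,916 1,191
7,645 461
;
run;

/* Expected counts */
data e;
    input (X Y) (: comma32.);
    cards;
209,516.100 1,590.913
8,044.913 61.087
;
run;

proc iml;
    use o;
    read all var _num_ into o;
    close;
    use e;
    read all var _num_ into e;
    close;
    /* (rows-1)*(cols-1) degrees of freedom, written in expanded form */
    df = nrow(o)*ncol(o) - nrow(o) - ncol(o) + 1;
    /* Pearson chi-square: elementwise (O-E)^2/E, summed over all cells */
    chi = sum((o-e)##2 / e);
    p = 1 - cdf('chisq', chi, df);
    print chi, df, p[f=pvalue.];
quit;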
```