Chi squared test for unequal sample sizes

03-22-2016 11:16 AM

Hi,

I’m trying to compare sex distributions among different data collection centres using the chi squared (chisq) option in SAS (9.2) but there are different numbers of observations for every data collection centre, which the resulting cross-tabs can’t account for. I’m still a beginner in statistics in SAS, but the only procedure that I know of which allows you to account for unequal sample sizes is proc anova (lines statement) and this is not the right test for this comparison. Would anyone be kind enough to share their insight on this? Thank you!

The following code creates an example of my data.

    DATA example;
    INFILE datalines;
    INPUT centre_A centre_B centre_C centre_D centre_E;
    DATALINES;
    1 1 0 1 1
    0 0 0 1 0
    0 0 0 1 0
    0 1 1 0 0
    1 1 1 1 1
    1 1 0 1 .
    1 0 0 . .
    1 0 . . .
    1 1 . . .
    0 0 . . .
    0 1 . . .
    ;
    run;

Accepted Solutions

Solution

03-23-2016
04:27 AM

03-22-2016 12:57 PM

Assuming that the 0/1 values encode sex, and not knowing what the individuals in an observation (line) have in common, I suppose you could run the following chi-square test to see whether the sex ratio varies among centers:

```
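/* One column per centre; each value is one individual's sex (0/1), '.' pads shorter centres */
DATA example;
INFILE datalines;
INPUT centre_1 - centre_5;
obs = _n_;  /* row id, used as the BY variable in PROC TRANSPOSE */
DATALINES;
1 1 0 1 1
0 0 0 1 0
0 0 0 1 0
0 1 1 0 0
1 1 1 1 1
1 1 0 1 .
1 0 0 . .
1 0 . . .
1 1 . . .
0 0 . . .
0 1 . . .
;
run;

/* Reshape wide to long: one row per (obs, centre); _NAME_ holds the centre, COL1 the 0/1 value */
proc transpose data=example out=exList;
var centre_1 - centre_5;
by obs;
run;

/* Centre-by-sex table with chi-square tests; rows with missing COL1 are excluded by default */
proc freq data=exList;
table _name_*col1 / chisq;
run;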
```
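A possible variation, sketched under the assumption that the `example` data set above exists with variables `centre_1` to `centre_5`: reshaping in a DATA step lets you drop the padding cells before PROC FREQ, so they are not reported as missing frequencies. The names `exLong`, `centre`, and `sex` are made up for illustration. The test result should be the same either way, since PROC FREQ excludes missing values from the table by default.

```sas
/* Sketch: wide-to-long reshape that keeps only non-missing values. */
/* Assumes data set EXAMPLE with variables centre_1-centre_5.       */
data exLong;
   set example;
   array c{5} centre_1 - centre_5;
   length centre $32;
   do i = 1 to 5;
      if not missing(c{i}) then do;
         centre = vname(c{i});   /* name of the source column, e.g. centre_3 */
         sex    = c{i};          /* the 0/1 value for this individual        */
         output;
      end;
   end;
   keep centre sex;
run;

/* EXPECTED prints the expected count in each cell next to the observed count */
proc freq data=exLong;
   tables centre*sex / chisq expected;
run;
```

The expected counts are computed from each centre's own total, which is how the test takes the unequal sample sizes into account.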

PG

All Replies

03-23-2016 04:24 AM

Thank you very much for your reply. It's exactly what I needed! I do have some final questions about how to interpret the output:

- I imagine that I have to use the results from the Likelihood Ratio Chi-Square rather than the Chi-Square since I’m interested in the ratios, correct? (Although they were really similar.)
- The default null hypothesis is that there is no significant difference between the ratios, right?
- SAS warns me that 63% of my data is missing, which would make sense because of the unequal sample sizes. Is this a problem as far as the reliability of the results is concerned, and how cautious should I be in interpreting the results?

Thanks again for your quick reply. I learned something new which I can apply elsewhere!
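For reference, the two statistics asked about above can be written out with their standard textbook definitions. The notation is an assumption introduced here, not something from the thread: $n_{ij}$ is the observed count for centre $i$ and sex $j$, $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals, and $n$ is the number of non-missing observations.

$$
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}, \qquad
X^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad
G^2 = 2 \sum_{i}\sum_{j} n_{ij} \ln\!\frac{n_{ij}}{E_{ij}}
$$

$X^2$ is the Pearson chi-square and $G^2$ the likelihood ratio chi-square (cells with $n_{ij} = 0$ contribute zero to $G^2$). Both test the same null hypothesis, that the sex distribution is the same in every centre, and both are compared against a chi-square distribution with $(r-1)(c-1)$ degrees of freedom, which is $4$ for five centres and two sexes. Because each $E_{ij}$ is scaled by its own row total $n_{i\cdot}$, centres of different sizes are handled without any extra adjustment.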