Hi,
I’m trying to compare sex distributions among different data collection centres using the chi-squared (chisq) option in SAS (9.2), but each centre has a different number of observations, which the resulting cross-tabs don't seem to account for. I’m still a beginner at statistics in SAS, and the only procedure I know of that allows for unequal sample sizes is proc anova (lines statement), which is not the right test for this comparison. Would anyone be kind enough to share their insight on this? Thank you!
The following code creates an example of my data.
DATA example;
INFILE datalines;
INPUT centre_A centre_B centre_C centre_D centre_E;
DATALINES;
1 1 0 1 1
0 0 0 1 0
0 0 0 1 0
0 1 1 0 0
1 1 1 1 1
1 1 0 1 .
1 0 0 . .
1 0 . . .
1 1 . . .
0 0 . . .
0 1 . . .
;
run;
Assuming that 0-1 values are about sex and not knowing what individuals in an observation (line) have in common, I would suppose you could run the following Chi-square test to see if the sex ratio varies among centers:
DATA example;
INFILE datalines;
INPUT centre_1 - centre_5;
obs = _n_;
DATALINES;
1 1 0 1 1
0 0 0 1 0
0 0 0 1 0
0 1 1 0 0
1 1 1 1 1
1 1 0 1 .
1 0 0 . .
1 0 . . .
1 1 . . .
0 0 . . .
0 1 . . .
;
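/* Reshape from wide (one column per centre) to long: one row per obs/centre pair.
   _name_ holds the centre variable name, col1 holds the 0/1 value. */
proc transpose data=example out=exList;
var centre_1 - centre_5;
by obs;
run;
/* Centre-by-sex cross-tabulation with a chi-square test.
   Missing col1 values are excluded from the table by default. */
proc freq data=exList;
table _name_*col1 / chisq;
run;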
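On the unequal sample sizes: the chi-square test already allows for them, because each expected count is computed from that centre's own total (row total × column total / overall total). Here is a minimal sketch, assuming the exList data set produced by the transpose step above, that prints the expected counts and row percentages next to the test. With so few observations per centre, several expected counts fall below 5, so SAS will likely warn that the chi-square test may not be valid. The FISHER option is added here as one possible fallback; it is my own suggestion rather than part of the reply above.

```sas
/* Assumes exList from the PROC TRANSPOSE step above:
   _name_ = centre, col1 = sex code (0/1), missing where a centre has no value. */
proc freq data=exList;
  /* EXPECTED prints expected cell counts under independence;
     NOCOL and NOPERCENT leave only row percentages, i.e. the sex split within each centre;
     FISHER requests Fisher's exact test, which also works for tables larger than 2x2. */
  tables _name_*col1 / chisq expected fisher nocol nopercent;
run;
```

Comparing the row percentages across centres shows the sex split per centre regardless of how many observations each one has.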
Thank you very much for your reply; it's exactly what I needed! I do have some final questions about how to interpret the output:
Thanks again for your quick reply. I learned something new that I can apply elsewhere!! 🙂