Hi,
I have data with 8 categories and their counts (freq). I am interested in only 2 specific categories, so I want to test the equality of the proportions of just those 2 categories using a chi-square test in PROC FREQ. However, to compute the proportions, I need to use the total of all 8 categories. How can I do this?
I have many such data sets, and some of the categories of interest have counts < 5, so I need to use Fisher's exact test on those data sets. How do I specify the option (FISHER or EXACT) in that case?
My SAS program with data is as follows. When I run it, it considers all the categories, which I don't want:
DATA cfsstudy;
INPUT cfs :$10. count; /* :$10. keeps names longer than 8 characters from being truncated */
DATALINES;
midline 19
general 4
Parietal 1
frontal 1
central 3
occipital 2
TPO 2
multifocal 3
;
RUN;
*this is the data it is saying has invalid data;
proc freq data= cfsstudy;
weight count;
tables cfs /chisq;
* /exact;
run;
I appreciate your feedback.
> this is the data it is saying has invalid data
I do not see any errors with the code you submitted. It correctly performs a chi-square test for the null hypothesis that the proportions are equal across all categories.
You ask how to compare only two categories. It's not clear what you mean. Do you mean that you want to compare two groups such as general and central, ignoring the other observations? If so, you could add
WHERE cfs in ('general' 'central');
and a binary test of proportions to the PROC FREQ call.
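For reference, that suggestion could look roughly like the following. This is only a sketch: it assumes the "binary test of proportions" means the BINOMIAL option on the TABLES statement with a null proportion of 0.5, and it adds an EXACT statement as one possible way to handle small counts. It uses the total of the two selected categories (7 here) as the denominator, not the total of all 8.

```sas
/* Sketch: compare two selected categories, ignoring the others.    */
/* Assumes the cfsstudy data set created in the original post.      */
proc freq data=cfsstudy;
   where cfs in ('general' 'central');   /* keep only the two categories */
   weight count;                         /* rows are counts, not raw obs */
   tables cfs / binomial(p=0.5);         /* H0: first level has proportion 0.5 */
   exact binomial;                       /* exact test, useful for small counts */
run;
```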
Thanks. Yes, I want to compare only 2 categories, but using the total of all 8 categories (i.e., 35 in this case) and not just the total of those 2 categories. PROC FREQ with CHISQ uses the total of only those 2 categories.
Basically, I want to test whether there is a significant difference in the frequencies/counts of the 2 selected categories while taking the total number of observations into account. So, for the “general” and “central” categories, I want to consider the proportions 4/35 and 3/35 (and not 4/7, 3/7). Is there any way to test this?
(Sorry, the error message in my program was old.)
Your 8 categories are the categories of a multinomial variable. A comparison of the proportions in a multinomial variable is a comparison of dependent proportions. This can be done in PROC CATMOD or with specialized code in SAS/IML. Both are discussed and illustrated in this note, which also shows how confidence intervals for the proportions can be obtained.
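The linked note is not reproduced in this thread, so the following is not its code. As a rough hand check of the idea, a large-sample Wald test for two proportions from the same multinomial can be sketched in a DATA step. It relies on the multinomial variance of the difference, Var(p1 - p2) = [p1 + p2 - (p1 - p2)**2] / N, which accounts for the two proportions being dependent. The variable names are made up for illustration, and the normal approximation is questionable with counts this small.

```sas
/* Sketch: Wald test of H0: p1 = p2 for two categories of one multinomial. */
/* Counts are taken from the posted data: general = 4, central = 3, N = 35. */
data _null_;
   n1 = 4;                                /* count for 'general'          */
   n2 = 3;                                /* count for 'central'          */
   N  = 35;                               /* total over all 8 categories  */
   p1 = n1 / N;
   p2 = n2 / N;
   diff = p1 - p2;                        /* estimated difference         */
   se = sqrt((p1 + p2 - diff**2) / N);    /* SE of dependent difference   */
   z = diff / se;                         /* Wald statistic               */
   pvalue = 2 * (1 - probnorm(abs(z)));   /* two-sided normal p-value     */
   put diff= se= z= pvalue=;              /* write results to the log     */
run;
```

The PROC CATMOD and SAS/IML approaches in the note remain the ones to rely on, since they also provide the confidence intervals mentioned above.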
Thank you very much. Both methods work.