Hello,
We are working with survey data and want to know whether the proportions of a dependent variable (Y/N) differ among categories A, B and C. Is there a way to test whether the differences in proportions between A and B, and between A and C, are statistically significant? Or can we subset the data and run a simple chi-square test for A versus B and another for A versus C?
proc surveyfreq data = sample1;
table DV*Categories/col row nostd nowt chisq ;
weight WTS_P;
run;
Thank you!
You will need to subset the data to get the 2x2 comparisons and then run SURVEYFREQ independently for each comparison. The trick however is that in subsetting, you will need to make sure you treat the groups as true subgroups (i.e. domains) in order to make sure the standard errors are correct. To do this, you will need to set all the groups not in the comparison to missing and then use the NOMCAR option on the SURVEYFREQ statement.
Roughly speaking it would look like this, creating a new variable for each comparison.
data new;
set old;
if category in ('A', 'B') then newcatab=category;
else newcatab=' '; /* category is character, so missing is a blank, not a numeric . */
run;
proc surveyfreq data=new nomcar;
tables newcatab*sex ....;
....
run;
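If there are several comparisons against the same reference group, the recode-and-run step can be wrapped in a macro so nothing has to be copied by hand. This is only a rough sketch: the macro name, its parameters, the work data set _pair and the variable cmp are made up for illustration, and the defaults (old, category, sex, WTS_P, groups A to D) are assumptions taken from this thread. Add your STRATA and CLUSTER statements if your design has them.

```sas
/* Sketch: one domain-style comparison per group in OTHERS, each against REF. */
/* All names and defaults here are illustrative assumptions.                  */
%macro pairwise(data=old, var=category, ref=A, others=B C D, by=sex, weight=WTS_P);
  %local i grp;
  %do i = 1 %to %sysfunc(countw(&others));
    %let grp = %scan(&others, &i);

    data _pair;
      set &data;
      /* Keep the two groups being compared; every other group becomes missing */
      if &var in ("&ref", "&grp") then cmp = &var;
      else call missing(cmp);
    run;

    title "Comparison: &ref vs &grp";
    /* NOMCAR treats the missing rows as a domain rather than dropping them */
    proc surveyfreq data=_pair nomcar;
      tables cmp*&by / row col chisq;
      weight &weight;
    run;
  %end;
  title;
%mend pairwise;

/* Runs A vs B, A vs C and A vs D with the default names */
%pairwise()
```

The full data set is read each time, so the observations outside the comparison are still there (with cmp missing) when the variances are computed.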
Thank you very much! It worked.
Sorry, it ran, but I am not sure how to interpret the output. I don't think it is comparing the proportions of A and B; instead, it seems to be comparing A with all the other groups lumped together:
This is the code:
data sample2;
set sample1;
if CAT in ('A', ' B') then CAT2=CAT;
else CAT2=.;
RUN;
ods graphics off;
proc surveyfreq data=sample2 nomcar;
tables CAT2*SEX/col row nostd nowt wchisq chisq;
weight WTS_P;
run;
This is the output:
I would like to compare the row percents of A with B, A with C and A with D. This is the table:
Is it possible to make these comparisons while still getting the correct standard errors, as you mentioned before?
Thank you for your help
There is something odd about the output that you attached. It appears to be for a different table, namely, ANIMALCAT2*EAR_F_PROTEIN. Can you double-check that you are looking at the right output? The other thing I would check is that you have subsetted properly. Here is an example using the table you sent.
data test;
do animalcat='A','B','C','D';
do sex='M','F';
input count;
do i=1 to count;
output;
end;
end;
end;
datalines;
67
51
424
166
1612
155
920
30
;
data subset;
set test;
if animalcat in ('A','B') then animalcat2=animalcat;
run;
proc surveyfreq data=subset;
tables animalcat*sex;
run;
proc surveyfreq data=subset nomcar;
tables animalcat2*sex;
run;
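One quick way to check the recode is to cross-tabulate the original variable against the new one, including missing levels. A minimal sketch using the data set and variable names from the example above:

```sas
/* Show how each original group maps to the recoded variable. */
/* MISSING keeps the blank level in the table; LIST prints    */
/* one row per combination instead of a two-way grid.         */
proc freq data=subset;
  tables animalcat*animalcat2 / missing list;
run;
```

With the example data you should see four rows: A and B keep their values (118 and 590 records), while C and D (1767 and 950 records) have a blank animalcat2. Since the DATA step has no ELSE branch, animalcat2 is simply never assigned for those rows, and an unassigned character variable is blank, which SAS treats as missing.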
Hello,
Sorry, I didn't copy the right output. How can I check that I am subsetting the data correctly? It would be more efficient if I did not have to copy the datalines, since this is a huge data set and I have to run the tests multiple times with different dependent variables. You mentioned before that it is important to set the groups that are not being compared to missing. Why is that step not in your last code?
This is the code that I used:
data sample2;
set sample1;
if ANIMALCAT in ('A', 'B') then ANIMALCAT2=ANIMALCAT;
RUN;
ods graphics off;
proc surveyfreq data=sample1;
tables ANIMALCAT*EAR_F_PROTEIN/col row nostd nowt wchisq chisq;
weight WTS_P;
run;
ods graphics off;
proc surveyfreq data=sample2 nomcar;
tables ANIMALCAT2*EAR_F_PROTEIN/col row nostd nowt wchisq chisq;
weight WTS_P;
run;
This is the output for the last proc.
How can I compare A and B, A and C and A and D?
Thank you in advance
Hi,
I figured out what was wrong. I had left a space between the quotation mark and the category B:
data sample2;
set sample1;
if ANIMALCAT in ('A', ' B') then ANIMALCAT2=ANIMALCAT;
else ANIMALCAT2='.';
RUN;
Now without the space I get the right output:
data sample2;
set sample1;
if ANIMALCAT in ('A', 'B') then ANIMALCAT2=ANIMALCAT;
else ANIMALCAT2=' '; /* blank = missing for a character variable; '.' would be a third, non-missing level */
RUN;
So this is comparing A and B taking into account the other two groups for standard errors, right?
Please confirm I am interpreting this correctly.
Thank you again