Hello,
I have many cross tabulations using proc freq like below:
Variables: city (Ottawa, Montreal, Toronto, Vancouver), sex (male vs. female), and drinking (heavy drinker, moderate drinker, non-drinker)
proc freq data=mydata;
Tables city * sex * drinking / nopercent nocum;
weight = individual_weight ;
run;
I'd like to compare the row percentages across several tables, for example:
whether the difference between males and females in the percentage of heavy drinkers is significant,
or whether the difference between women living in Ottawa and women living in Montreal in the percentage of moderate drinkers is significant.
I can use the CHISQ option on the TABLES statement, but it doesn't give me what I want. I need to compute confidence limits (CLs) for each estimate, but I don't know how.
Thank you,
1. Your syntax for the WEIGHT statement is wrong. Delete the equal sign.
2. With the syntax you've used, the first variable (city) determines the strata or groups. For each group, you get an (r x c) table and tests for the association between the 2nd and 3rd variables (sex and drinking). The chi-square test therefore applies to each sex-by-drinking table, controlling for city. The CL option on the TABLES statement produces confidence limits for the STATISTICS (the measures of association), not for the cell percentages or row percentages. It's not clear which CLs you want.
3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make it easier to interpret the tests. For example, you might run
proc freq data=mydata;
where sex=0 and (city='Ottawa' or city='Montreal');
Tables city * drinking / nopercent nocum chisquare cl;
weight individual_weight ;
run;
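If the confidence limits you want are for the row percentages themselves, one possible route is PROC SURVEYFREQ, whose ROW and CL options on the TABLES statement request row percentages together with their confidence limits. The following is only a sketch: it assumes that individual_weight is a survey (sampling) weight and that the design has no strata or clusters, neither of which is stated in this thread.

```sas
/* Sketch only. Assumptions (not confirmed in the thread):
   - individual_weight is a sampling weight
   - no STRATA or CLUSTER variables are needed */
proc surveyfreq data=mydata;
   /* city defines the layers; each layer is a sex-by-drinking table */
   tables city * sex * drinking / row cl;
   weight individual_weight;   /* note: no equal sign */
run;
```

Comparing two confidence intervals by eye is not a formal test of the difference between two row percentages, so treat this output as descriptive.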
If this doesn't answer your questions, please clarify.
Rick_SAS, thank you for the response. The equal sign in the WEIGHT statement was a typo. As for subsetting the data for each specific significance test, I have loads of tables with many cross-tabulations. That's why I'm looking for a faster, easier way to run those tests.
Hi Rick,
I have another question, please. I did as you suggested, and it seems to be the only way I can run the test. With as many tables as I have, it can be done with a macro. But what if "drinking" has more than 2 values, for example non-drinker, moderate drinker, and heavy drinker:
proc freq data=mydata;
where sex=0 and (city='Ottawa' or city='Montreal');
Tables city * drinking / nopercent nocum chisquare cl;
weight individual_weight ;
run;
I ran the above syntax and the results are significant, but I cannot tell the differences between the group of drinkers. Do I have to subset by drinking as well? Isn't it alarming to have so many subsamples? Thank you!
> I cannot tell the differences between the group of drinkers.
That is correct. A chi-square test tells you that the distribution of the proportions is different from the null distribution, but it does not tell you which categories are responsible for the difference. This is (unfortunately!) the case for many statistical tests: They tell you the model doesn't fit the data but they don't necessarily tell you why.
You can use deviations from the expected values to inspect which cells are most different from their expected values. A nice graphical summary is provided by using the mosaic plot. Try adding
plots=MosaicPlot(colorstat=StdRes)
to your TABLES statement. For more information about mosaic plots and deviations from expected values, see "Color cells in a mosaic plot by deviation from independence."
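As a sketch of where that option goes, the step below adds it to the subset analysis from earlier in this thread. It also adds CROSSLIST(STDRES), which prints the standardized residuals as numbers next to the plot. The WHERE condition and variable names are copied from the earlier example. The use of the documented CHISQ spelling and the ODS GRAPHICS statement are additions here.

```sas
ods graphics on;   /* the mosaic plot is produced through ODS Graphics */

proc freq data=mydata;
   where sex=0 and (city='Ottawa' or city='Montreal');
   /* crosslist(stdres): table of standardized residuals per cell       */
   /* colorstat=stdres : color mosaic tiles by the same residuals       */
   tables city * drinking / chisq crosslist(stdres)
                            plots=mosaicplot(colorstat=stdres);
   weight individual_weight;
run;
```

A common rule of thumb is that cells whose standardized residual is larger than about 2 in absolute value contribute most to a significant chi-square statistic. That is a descriptive guide, not a separate formal test for each cell.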
That's great. Thank you so much!
Hi Rick,
Sorry to bother you again. I ran the code and saved the plot as a PDF file. I worked out how to interpret the plot from the information you sent me. Thank you!
I would still like to know more about interpreting the significance test using the standardized residuals here. I'm particularly interested in whether there is a significant diff in drinking group #2 between city 1 and 2. Thank you.
The mosaic plot shows deviations from expected values under the hypothesis of no association.
I believe I have already answered your question when I said:
3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make it easier to interpret the tests.
So if you have a specific research question such as "is there a significant diff in drinking group #2 between city 1 and 2" (averaged over other factors or conditional on other factors), then you should use the data and code that are relevant to that test.
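One way to set up such a focused comparison is to collapse drinking to a yes/no indicator for the group of interest, which turns the subset into a 2 x 2 table. The RISKDIFF option then reports the difference between the two row proportions with confidence limits. This is a sketch under assumptions: the data set name two_cities, the indicator name moderate, and the coding drinking=2 for "moderate drinker" are hypothetical, and sex=0 is taken from the earlier example.

```sas
/* Hypothetical names and codings: two_cities, moderate, drinking=2 */
data two_cities;
   set mydata;
   where sex=0 and city in ('Ottawa', 'Montreal');
   /* 1 if moderate drinker, 0 otherwise (a missing value also gives 0) */
   moderate = (drinking = 2);
run;

proc freq data=two_cities;
   /* 2 x 2 table: RISKDIFF gives row proportions, their difference,
      and confidence limits for each column of the table */
   tables city * moderate / chisq riskdiff;
   weight individual_weight;
run;
```

PROC FREQ treats the WEIGHT values as frequencies, so if individual_weight is a sampling weight, the confidence limits from this step may be narrower than they should be.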
Ok, I will do that. Thank you so much for your time.