Hello,
I have many cross-tabulations produced with PROC FREQ, like the one below.
Variables: city (Ottawa, Montreal, Toronto, Vancouver), sex (male vs. female), and drinking (heavy drinker, moderate drinker, no drinker)
proc freq data=mydata;
Tables city * sex * drinking / nopercent nocum;
weight = individual_weight ;
run;
I'd like to compare the row percentages among several tables, for example:
whether the difference between males and females in the proportion of perceived heavy drinkers is significant,
or whether the difference between women living in Ottawa and women living in Montreal in the proportion of perceived moderate drinkers is significant, and so on.
I can use the CHISQ option on the TABLES statement, but it doesn't give me what I want. I need to compute confidence limits (CL) for each estimate, but I don't know how.
Thank you,
Accepted Solutions
1. Your syntax for the WEIGHT statement is wrong. Delete the equal sign.
2. With the syntax you've used, the first variable (city) determines the strata or groups. For each group, you get an (r x c) table and tests for the association between the 2nd and 3rd variables (sex and drinking). The chi-square test then applies to the sex-by-drinking tables, controlling for city. The CL option on the TABLES statement produces confidence intervals for the STATISTICS (not for the cell percentages or row percentages... it's not clear what CLs you want).
3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make it easier to interpret the tests. For example, you might run
proc freq data=mydata;
where sex=0 and (city='Ottawa' or city='Montreal');
Tables city * drinking / nopercent nocum chisq cl;
weight individual_weight ;
run;
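When there are many such comparisons, the same step can be wrapped in a macro. The following is only a sketch: the macro name cmp_cities and its parameters are made up for illustration, and it assumes that sex is numeric and city is a character variable, as in the example above.

```sas
/* Hypothetical helper: one chi-square test per (sex, city pair) subset. */
%macro cmp_cities(sexval=, city1=, city2=);
   proc freq data=mydata;
      /* keep one sex and the two cities being compared */
      where sex=&sexval and city in ("&city1", "&city2");
      tables city*drinking / nopercent chisq;
      weight individual_weight;
      title "sex=&sexval: &city1 vs &city2";
   run;
   title;   /* clear the title so it does not carry over */
%mend cmp_cities;

%cmp_cities(sexval=0, city1=Ottawa, city2=Montreal);
%cmp_cities(sexval=0, city1=Ottawa, city2=Toronto);
```

Each call produces one city-by-drinking table and its chi-square test for the chosen subset.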
If this doesn't answer your questions, please clarify.
Rick_SAS, thank you for the response. The error in the WEIGHT syntax was a typo. As for subsetting the data for specific significance tests, I have loads of tables with many cross-tabulations. That's why I'm looking for a faster, easier way to run those tests.
Hi Rick,
I have another question, please. I did as you said, and it seems to be the only way I can do the test; with the many tables I have, it can be done with a macro. But what if "drinking" has more than 2 values, for example, no drinker, moderate drinker, and heavy drinker:
proc freq data=mydata;
where sex=0 and (city='Ottawa' or city='Montreal');
Tables city * drinking / nopercent nocum chisq cl;
weight individual_weight ;
run;
I ran the above code and the results are significant, but I cannot tell the differences between the groups of drinkers. Do I have to subset on drinking as well? Isn't it alarming to have so many subsamples? Thank you!
> I cannot tell the differences between the group of drinkers.
That is correct. A chi-square test tells you that the distribution of the proportions is different from the null distribution, but it does not tell you which categories are responsible for the difference. This is (unfortunately!) the case for many statistical tests: They tell you the model doesn't fit the data but they don't necessarily tell you why.
You can use deviations from the expected values to inspect which cells are most different from their expected values. A nice graphical summary is provided by using the mosaic plot. Try adding
plots=MosaicPlot(colorstat=StdRes)
to your TABLES statement. For more information about mosaic plots and deviations from expected values, see "Color cells in a mosaic plot by deviation from independence."
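For context, here is a minimal sketch of where the option goes, assuming the same subset as in the earlier example. ODS Graphics must be enabled for the plot to appear. The CROSSLIST(STDRES) option is an addition that is not part of the suggestion above; it prints the standardized residuals in a table so they can be read alongside the plot.

```sas
ods graphics on;   /* required for the PLOTS= option to produce graphs */

proc freq data=mydata;
   where sex=0 and city in ('Ottawa', 'Montreal');
   /* CROSSLIST(STDRES): table of standardized residuals per cell      */
   /* MOSAICPLOT(COLORSTAT=STDRES): tiles colored by the same residual */
   tables city*drinking / chisq crosslist(stdres)
                          plots=mosaicplot(colorstat=stdres);
   weight individual_weight;
run;
```

As a rough guide only, cells whose standardized residual is larger than about 2 in absolute value are the ones that deviate most from what independence would predict.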
That's great. Thank you so much!
Hi Rick,
Sorry to bother you again. I ran the code and saved the plot as a PDF file. I worked out how to interpret the plot from the information you sent me. Thank you!
I would still like to know more about interpreting significance tests using the standardized residuals here. I'm particularly interested to see whether there is a significant diff in drinking group #2 between city 1 and 2. Thank you.
The mosaic plot shows deviations from expected values under the hypothesis of no association.
I believe I have already answered your question when I said:
3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make it easier to interpret the tests.
So if you have a specific research question such as "is there a significant diff in drinking group #2 between city 1 and 2" (averaged over other factors or conditional on other factors), then you should use the data and code that are relevant to that test.
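As one possible sketch of such a targeted test: collapse drinking into an indicator for the group of interest, so that the subset becomes a 2 x 2 table, and request the difference in proportions with confidence limits through the RISKDIFF option. The data set name sub, the variable moderate, and the value 'moderate drinker' are assumptions; substitute the actual coding in your data.

```sas
/* Assumed coding: drinking is character, and 'moderate drinker' is group #2. */
data sub;
   set mydata;
   where sex=0 and city in ('Ottawa', 'Montreal');
   /* 1 if in the group of interest, 0 otherwise (missing values also give 0) */
   moderate = (drinking = 'moderate drinker');
run;

proc freq data=sub;
   /* 2 x 2 table: city by indicator.                                   */
   /* RISKDIFF(COLUMN=2) reports the proportion with moderate=1 in each */
   /* city, their difference, and confidence limits.                    */
   tables city*moderate / nopercent chisq riskdiff(column=2);
   weight individual_weight;
run;
```

In the output, the difference is row 1 minus row 2 (Montreal minus Ottawa in the default sorted order). A confidence interval for the difference that excludes 0 points to a difference between the two cities for that drinking group.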
Ok, I will do that. Thank you so much for your time.