Solved: Significance test for the estimation of proportions

mandan414 · Posted 02-11-2021 10:16 PM

Hello,

I have many cross-tabulations produced by PROC FREQ, like the one below.

Variables: city (Ottawa, Montreal, Toronto, Vancouver), sex (male vs. female), and drinking (heavy drinker, moderate drinker, no drinker)

proc freq data=mydata;
   tables city * sex * drinking / nopercent nocum;
   weight = individual_weight;
run;

I'd like to compare row percentages across several tables, for example:

whether the difference between males and females in the percentage of perceived heavy drinkers is significant,

or whether the difference between women living in Ottawa and women living in Montreal in the percentage of perceived moderate drinkers is significant, and so on.

I can use the CHISQ option on the TABLES statement, but it doesn't give me what I want. I need to compute confidence limits (CLs) for each estimate, but I don't know how.
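As a rough illustration of what a CL for one row percentage involves, here is the textbook Wald (normal-approximation) interval, p ± z * sqrt(p(1 - p) / n). This is a Python sketch of the arithmetic only, not SAS code and not a description of any PROC FREQ option. The counts and the `wald_ci` helper are hypothetical, and the survey weights (individual_weight) are ignored, so it treats the data as a simple unweighted sample.

```python
import math


def wald_ci(count: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) for a Wald (normal-approximation) interval.

    Assumes unweighted counts; survey weights are NOT accounted for here.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = count / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a sample proportion
    # Clip to [0, 1] because the Wald interval can spill outside the unit interval.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical row: 40 heavy drinkers out of 200 males in one city.
p, lo, hi = wald_ci(40, 200)
print(f"row percent = {100 * p:.1f}%, 95% CL = ({100 * lo:.1f}%, {100 * hi:.1f}%)")
```

The Wald interval is the simplest choice; for small samples or proportions near 0 or 1, Wilson or exact (Clopper-Pearson) intervals usually behave better.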

Thank you,

Rick_SAS · Posted 02-12-2021 09:31 AM

1. Your syntax for the WEIGHT statement is wrong. Delete the equal sign.

2. With the syntax you've used, the first variable (city) defines the strata (groups). For each stratum, you get an (r x c) table and tests for the association between the 2nd and 3rd variables (sex and drinking). The chi-square tests then apply to each sex-by-drinking table separately, so they control for city by stratifying on it. The CL option on the TABLES statement produces confidence limits for the measures of association (the MEASURES statistics), not for the cell percentages or row percentages, so it's not clear which CLs you want.

3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make the tests easier to interpret. For example, you might run

proc freq data=mydata;
   where sex=0 and city in ('Ottawa', 'Montreal');   /* one sex, two cities only */
   tables city * drinking / nopercent nocum chisq cl;
   weight individual_weight;
run;

If this doesn't answer your questions, please clarify.
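With three drinking levels, a subset like this yields a 2 x 3 table (two cities by three drinking categories). To show what the chi-square test of independence computes on such a table, here is a Python sketch of the Pearson statistic. It is an illustration, not SAS code: the counts are made up, weights are ignored, and the p-value shortcut exp(-X²/2) is exact only for 2 degrees of freedom, which is what a 2 x 3 table has. The helper names are hypothetical.

```python
import math


def pearson_chi2(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_pvalue_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic; exact ONLY when df == 2."""
    return math.exp(-stat / 2)


# Hypothetical counts for one sex: rows = Ottawa, Montreal;
# columns = no drinker, moderate drinker, heavy drinker.
table = [[50, 100, 50],
         [70, 100, 30]]
stat, df = pearson_chi2(table)
print(f"chi-square = {stat:.3f}, df = {df}, p = {chi2_pvalue_df2(stat):.4f}")
```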

mandan414 · Posted 02-16-2021 11:38 AM

Rick_SAS, thank you for the response. The mistake in the WEIGHT syntax was a typo. As for subsetting the data for each specific significance test: I have loads of tables with many cross-tabulations. That's why I'm searching for a faster/easier way to find those differences.
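When the same comparison must be repeated across many tables, the repetition can be scripted instead of written out by hand. As an illustration only (Python, not SAS), the sketch below loops over every pair of cities and applies a pooled two-proportion z-test to one row percentage. The counts, the `counts` dictionary layout, and the `two_prop_ztest` helper are hypothetical, and weights are ignored.

```python
import math
from itertools import combinations


def two_prop_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Unweighted counts assumed. Undefined if the pooled proportion is 0 or 1.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for women: (moderate drinkers, total respondents) per city.
counts = {
    "Ottawa": (100, 200),
    "Montreal": (120, 200),
    "Toronto": (90, 180),
    "Vancouver": (80, 150),
}

# Every unordered pair of cities: 4 cities give 6 comparisons.
for a, b in combinations(counts, 2):
    z, p = two_prop_ztest(*counts[a], *counts[b])
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4f}")
```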

mandan414 · Posted 03-04-2021 07:22 AM

Hi Rick,

I have another question, please. I did as you said, and it seems to be the only way I can do the test; with the many tables I have, it can be done with a macro. But what if "drinking" has more than two values, for example, no drinker, moderate drinker, and heavy drinker:

proc freq data=mydata;
   where sex=0 and city in ('Ottawa', 'Montreal');   /* one sex, two cities only */
   tables city * drinking / nopercent nocum chisq cl;
   weight individual_weight;
run;

I ran the above code and the results are significant, but I cannot tell the differences between the group of drinkers. Do I have to subset on drinking as well? Isn't it alarming to have so many subsamples? Thank you!

Rick_SAS · Posted 03-04-2021 07:46 AM

> I cannot tell the differences between the group of drinkers.

That is correct. A chi-square test tells you whether the observed proportions differ from those expected under the null hypothesis of no association, but it does not tell you which categories are responsible for the difference. This is (unfortunately!) the case for many statistical tests: they tell you the model doesn't fit the data, but they don't necessarily tell you why.

You can use the deviations from the expected counts to identify which cells differ most from what independence predicts. The mosaic plot provides a nice graphical summary. Try adding

plots=MosaicPlot(colorstat=StdRes)

to your TABLES statement. For more information about mosaic plots and deviations from expected values, see "Color cells in a mosaic plot by deviation from independence."
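The StdRes statistic colors each cell by its standardized (adjusted) residual, (n_ij - e_ij) / sqrt(e_ij (1 - n_i./n)(1 - n_.j/n)), which is, as far as I know, the standardized residual PROC FREQ reports. Under independence these residuals are approximately standard normal, so values beyond roughly ±2 are a common rule of thumb for cells that deviate noticeably. The Python sketch below computes them for a made-up, unweighted 2 x 3 table; the helper name is hypothetical.

```python
import math


def standardized_residuals(table: list[list[float]]) -> list[list[float]]:
    """Adjusted (standardized) Pearson residuals for an r x c table of counts.

    r_ij = (n_ij - e_ij) / sqrt(e_ij * (1 - row_i / n) * (1 - col_j / n)),
    where e_ij = row_i * col_j / n is the expected count under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


# Hypothetical counts: rows = Ottawa, Montreal; columns = no, moderate, heavy drinker.
table = [[50, 100, 50],
         [70, 100, 30]]
for city, row in zip(["Ottawa", "Montreal"], standardized_residuals(table)):
    print(city, [round(r, 2) for r in row])
```

In this made-up table the moderate column has a residual of 0, while the no-drinker and heavy-drinker columns deviate in opposite directions in the two cities, which is the kind of pattern the colored mosaic plot makes visible.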

mandan414 · Posted 03-04-2021 07:57 AM

That's great. Thank you so much!

mandan414 · Posted 03-04-2021 09:58 AM

Hi Rick,

Sorry to bother you again. I ran the code and have the plot as a PDF file. I figured out how to interpret the plot from the information you sent me. Thank you!

I would still like to know more about interpreting significance using the standardized residuals here. I'm particularly interested in whether there is a significant difference in drinking group #2 between city 1 and city 2. Thank you.

Rick_SAS · Posted 03-04-2021 10:23 AM

The mosaic plot shows deviations from expected values under the hypothesis of no association.

I believe I have already answered your question when I said:

3. For tests of specific situations, you can use the WHERE statement to subset the data, which will probably make the tests easier to interpret.

So if you have a specific research question such as "is there a significant diff in drinking group #2 between city 1 and 2" (averaged over other factors or conditional on other factors), then you should use the data and code that are relevant to that test.
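For a question like "is drinking group #2 different between city 1 and city 2", one way to build the relevant data is to keep only those two cities and collapse drinking into "group #2" versus "all other groups", which gives a 2 x 2 table and a 1-df test. The Python sketch below shows that collapse and the Pearson chi-square on the result. It is an illustration under assumptions (made-up counts, unweighted data, no continuity correction, hypothetical helper names), not SAS code.

```python
import math


def collapse_to_2x2(table: list[list[int]], col: int) -> list[list[int]]:
    """Collapse each row to [count in column `col`, count in all other columns]."""
    return [[row[col], sum(row) - row[col]] for row in table]


def chi2_2x2(t: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for a 2 x 2 table, df = 1."""
    (a, b), (c, d) = t
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = city 1, city 2; columns = drinking groups #1, #2, #3.
table = [[50, 100, 50],
         [60, 80, 60]]
two_by_two = collapse_to_2x2(table, col=1)  # drinking group #2 is index 1 (0-based)
stat, p = chi2_2x2(two_by_two)
print(two_by_two, f"chi-square = {stat:.3f}, p = {p:.4f}")
```

This 1-df chi-square equals the square of the pooled two-proportion z statistic for the same two row percentages, so either form answers the same question.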

mandan414 · Posted 03-04-2021 10:46 AM

Ok, I will do that. Thank you so much for your time.