Hello,
I have survey data from boys and girls who attended an education series on healthy eating. Based on the chi-square test below, the pattern of change in behavior differs between boys and girls.
We asked the kids how the education affected their use of fruits and vegetables. I want to compare boys and girls within the same response level. For example, is the percentage of boys who said they will use more veggies significantly higher than the percentage of girls who said the same (19.5% vs. 15.6%)? I am not interested in comparisons across different levels, for example whether more boys said "use more" than girls said "use less" (19.5% vs. 8.9%).
Could you please help me with this test?
proc surveyfreq data=&data varheader=namelabel nosummary;
   tables Kids_gender*Fruits / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;
Kids gender | Use of fruits and vegetables | Frequency | Weighted Frequency | CV for Percent | Row Percent | 95% Lower CL for Row Percent | 95% Upper CL for Row Percent | CV for Row Percent |
---|---|---|---|---|---|---|---|---|
Girls | Use less | 748 | 688.4 | 0.044 | 8.9 | 8.1 | 9.6 | 0.0430 |
Girls | Use more | 1490 | 1214.1 | 0.031 | 15.6 | 14.7 | 16.5 | 0.0292 |
Girls | No difference, would use the same amount | 5706 | 4559.3 | 0.015 | 58.7 | 57.4 | 59.9 | 0.0109 |
Girls | Not applicable / I do not eat them | 912 | 860.8 | 0.040 | 11.1 | 10.2 | 11.9 | 0.0386 |
Girls | Don't know | 459 | 445.6 | 0.056 | 5.7 | 5.1 | 6.4 | 0.0547 |
Girls | Total | 9315 | 7768.3 | 0.011 | 100.0 | | | |
Boys | Use less | 647 | 955.1 | 0.045 | 10.7 | 9.8 | 11.6 | 0.0442 |
Boys | Use more | 1211 | 1735.6 | 0.032 | 19.5 | 18.3 | 20.6 | 0.0306 |
Boys | No difference, would use the same amount | 3743 | 4812.7 | 0.016 | 54.0 | 52.5 | 55.4 | 0.0139 |
Boys | Not applicable / I do not eat them | 596 | 884.3 | 0.048 | 9.9 | 9.0 | 10.8 | 0.0474 |
Boys | Don't know | 340 | 532.4 | 0.063 | 6.0 | 5.2 | 6.7 | 0.0622 |
Boys | Total | 6537 | 8920.1 | 0.009 | 100.0 | | | |
Total | Use less | 1395 | 1643.5 | 0.031 | | | | |
Total | Use more | 2701 | 2949.7 | 0.022 | | | | |
Total | No difference, would use the same amount | 9449 | 9372.0 | 0.009 | | | | |
Total | Not applicable / I do not eat them | 1508 | 1745.1 | 0.031 | | | | |
Total | Don't know | 799 | 978.0 | 0.042 | | | | |
Total | Total | 15852 | 16688.4 | | | | | |
Frequency Missing = 3 |
Rao-Scott Chi-Square Test | |
---|---|
Pearson Chi-Square | 67.5914 |
Design Correction | 1.6183 |
Rao-Scott Chi-Square | 41.7675 |
DF | 4 |
Pr > ChiSq | <.0001 |
F Value | 10.4419 |
Num DF | 4 |
Den DF | 63404 |
Pr > F | <.0001 |
Sample Size = 3 |
Thanks
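As a rough check of a single same-level comparison, the row percents and CVs in the output above can be combined by hand. The sketch below is not from the thread. It assumes the standard error of each row percent equals the percent times its CV, treats the girls' and boys' estimates as independent, and uses a normal approximation, so it only approximates a design-based test. All variable names are hypothetical.

```sas
/* Sketch: approximate z-test for "Use less", girls vs. boys,          */
/* using only the row percents and CVs printed in the output above.    */
data _null_;
   p_girls = 8.9;   cv_girls = 0.0430;   /* row percent and CV, girls  */
   p_boys  = 10.7;  cv_boys  = 0.0442;   /* row percent and CV, boys   */

   /* Assumption: standard error = estimate * coefficient of variation */
   se_girls = p_girls * cv_girls;
   se_boys  = p_boys  * cv_boys;

   /* Assumption: the two estimates are independent */
   z       = (p_boys - p_girls) / sqrt(se_girls**2 + se_boys**2);
   p_value = 2 * (1 - probnorm(abs(z)));  /* two-sided normal p-value  */

   put z= 8.3 p_value= 8.4;               /* written to the SAS log    */
run;
```

The same arithmetic can be repeated for the other four response levels by swapping in their row percents and CVs.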
Hello @Emma_at_SAS ,
Can't you solve this with a by-variable?
by pattern_of_consumption
The analysis will be repeated for each level of the by-variable.
You can then use the MULTTEST procedure to address the multiple testing problem (inflation of the type I error).
Kind regards,
Koen
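For reference, a BY-group analysis along these lines might look like the sketch below. It assumes the by-variable is the response variable Fruits from the original TABLES statement (pattern_of_consumption above reads as a placeholder), and work.kids_sorted is a hypothetical data set name. Within each BY group, CHISQ on a one-way table tests whether the gender shares are equal, which is not necessarily the same as comparing row percents between genders.

```sas
/* Sketch: repeat the analysis separately for each response level.     */
/* BY-group processing needs the data sorted by the BY variable.       */
proc sort data=&data out=work.kids_sorted;   /* hypothetical data set  */
   by Fruits;
run;

proc surveyfreq data=work.kids_sorted varheader=namelabel nosummary;
   by Fruits;                         /* one analysis per response level */
   tables Kids_gender / cv chisq;     /* one-way table with Rao-Scott test */
   weight WEIGHT_scale;
run;
```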
Thank you @sbxkoenk for your thoughts. I have a question about your suggestion. If I use the
by pattern_of_consumption
approach to slice the comparison by level of fruit and veggie consumption, then within each level I am comparing gender shares. For example, among the kids who said "use less", I would compare 748 girls with 647 boys, that is, 41.9% girls vs. 58.1% boys. What I wanted instead was to compare 8.9% of girls with 10.7% of boys, where the percentages are taken from the overall sample of girls and boys (9315 girls and 6537 boys).
My question is whether it is correct to use the p-values from the BY analysis below in the MULTTEST procedure to adjust for type I error.
Kids Gender | Frequency | Weighted Frequency | Percent | CV for Percent |
---|---|---|---|---|
Girls | 748 | 688 | 41.9 | 0.0379 |
Boys | 647 | 955 | 58.1 | 0.0273 |
Total | 1395 | 1644 | 100.0 | |
Rao-Scott Chi-Square Test | |
---|---|
Pearson Chi-Square | 36.7227 |
Design Correction | 1.4406 |
Rao-Scott Chi-Square | 25.4908 |
DF | 1 |
Pr > ChiSq | <.0001 |
F Value | 25.4908 |
Num DF | 1 |
Den DF | 1394 |
Pr > F | <.0001 |
Sample Size = 1395 |
Patterns of consumption | Female % (what I want to compare) | Male % (what I want to compare) | Female % (BY analysis) | Male % (BY analysis) | P-value (BY analysis) | MULTTEST adjusted p-value |
---|---|---|---|---|---|---|
Use less | 8.9 | 10.7 | 41.9 | 58.1 | 0.0001 | |
Use more | 15.6 | 19.5 | 41.2 | 58.8 | 0.0001 | |
No difference, would use the same amount | 58.7 | 54.0 | 48.6 | 51.4 | 0.0367 | |
Not applicable / I do not use this | 11.1 | 9.9 | 49.3 | 50.7 | 0.6779 | |
Don't know | 5.7 | 6.0 | 45.6 | 54.4 | 0.038 | |
Thanks
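If the raw p-values are judged appropriate, the empty adjusted p-value column could be filled with PROC MULTTEST, roughly as sketched below. The data set name raw_pvalues is hypothetical. The p-value variable is named raw_p because that is the name PROC MULTTEST looks for by default in an INPVALUES= data set. The two values entered as 0.0001 stand in for "<.0001", so their adjusted values are upper bounds. HOLM and FDR are just two of the available adjustments.

```sas
/* Sketch: adjust the five raw p-values from the table above.          */
data raw_pvalues;                      /* hypothetical data set name   */
   length level $50;
   infile datalines dlm='|';           /* pipe-delimited so labels may */
   input level $ raw_p;                /* contain blanks and commas    */
   datalines;
Use less|0.0001
Use more|0.0001
No difference, would use the same amount|0.0367
Not applicable / I do not use this|0.6779
Don't know|0.038
;
run;

/* Holm step-down Bonferroni and false discovery rate adjustments */
proc multtest inpvalues=raw_pvalues holm fdr;
run;
```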
You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try
proc surveyfreq data=&data varheader=namelabel nosummary;
   tables Fruits*Kids_gender / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;
Then the boy/girl responses for the same level of the question will be closer together and easier to read but the information isn't going to change.
I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.
I would also tend to be a bit concerned about your nearly 6% "Don't know" response. That might indicate that the way this particular data point was collected wasn't very clear to a lot of respondents. I would consider creating another response variable where the "Don't know" responses are set to missing, so you can compare the responses among those respondents who made an actual choice.
The CHI-square test statistic tells you if there is/is not a significant difference in distribution of values overall.
Thank you @ballardw for your thoughts and suggestions. I've added my thoughts to your comments below:
You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try
proc surveyfreq data=&data varheader=namelabel nosummary;
   tables Fruits*Kids_gender / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;
Then the boy/girl responses for the same level of the question will be closer together and easier to read but the information isn't going to change.
I tried this, but I noticed it gives me different percentages from the ones I want. I want the percentage of girls, among all girls, who said they would use less, use more, and so on. If I switch the order of the variables to Fruits * Kids_gender, I instead get the percentage of boys and of girls among those who responded "use less", and so on:
Use of fruits and vegetables | Kids gender | Frequency | Weighted Frequency | Percent | CV for Percent | Row Percent | 95% Lower CL for Row Percent | 95% Upper CL for Row Percent | CV for Row Percent |
---|---|---|---|---|---|---|---|---|---|
Use less | Girls | 748 | 688.4 | 4.1253 | 0.044 | 41.9 | 38.8 | 45.0 | 0.0379 |
Use less | Boys | 647 | 955.1 | 5.7232 | 0.045 | 58.1 | 55.0 | 61.2 | 0.0273 |
Use less | Total | 1395 | | | | | | | |
I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.
At this stage, I am interested in the different patterns of behavior of girls and boys after the intervention/workshop. Your point is a good idea but answers a different question.
I would also tend to be a bit concerned about your nearly 6% "Don't know" response. That might indicate that the way this particular data point was collected wasn't very clear to a lot of respondents. I would consider creating another response variable where the "Don't know" responses are set to missing, so you can compare the responses among those respondents who made an actual choice.
In this survey "Don't know" is a legitimate answer, because the kids are guessing how the workshop will affect their behavior in the future, and some kids are not sure how or whether the workshop will change their behavior in practice.
The CHI-square test statistic tells you if there is/is not a significant difference in the distribution of values overall.
Thanks for confirming. Now that the overall test shows that the patterns of behavior differ between boys and girls, I would like to know whether the 10.7% of boys who replied they will use less is significantly higher than the 8.9% of girls who gave the same response, and likewise for the other levels (for example, whether 19.5% of boys is significantly more than 15.6% of girls).
Thanks
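One possible way to set up the same-level comparison described above is to recode one response level into a 0/1 indicator and cross it with gender, so the Rao-Scott test on the resulting 2x2 table compares the row percents (for example 8.9% vs. 10.7%) directly. This is a sketch, not a solution given in the thread. It assumes the formatted value of Fruits is exactly "Use less", and work.kids_ind and use_less are hypothetical names. The step would be repeated for each of the five levels, with the resulting p-values then adjusted for multiple testing.

```sas
/* Sketch: indicator for one response level, then a 2x2 Rao-Scott test. */
data work.kids_ind;                       /* hypothetical data set name */
   set &data;
   /* Assumption: the formatted value of Fruits is exactly "Use less".  */
   /* Keep missing responses missing so they stay out of the table.     */
   if missing(Fruits) then use_less = .;
   else use_less = (vvalue(Fruits) = "Use less");
run;

proc surveyfreq data=work.kids_ind varheader=namelabel nosummary;
   /* Row percents here are the % of girls / boys who said "Use less"   */
   tables Kids_gender*use_less / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;
```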