Statistical Procedures

Emma_at_SAS · Posted 08-24-2022 01:17 PM

Hello,

I have a survey of boys and girls who attended an education series on healthy eating. Based on the chi-square test below, the pattern of change in behavior differs between boys and girls.

We asked the kids how the education affected their use of fruits and vegetables. I want to compare girls and boys on the same answer category. For example, is the percentage of boys who said they will use more veggies significantly higher than the percentage of girls who said the same (19.5% vs. 15.6%)? I am not interested in comparing different answer categories across boys and girls, for example whether more boys said "use more" than girls said "use less" (19.5% vs. 8.9%).

Could you please help me with this test?

proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
tables Kids_gender*Fruits / nostd nocellpercent row row(cl) cv chisq;
weight WEIGHT_scale;
run;

Kids gender	Use of fruits and vegetables	Frequency	Weighted Frequency	CV for Percent	Row Percent	Lower 95% CL for Row Percent	Upper 95% CL for Row Percent	CV for Row Percent
Girls	Use less	748	688.4	0.044	8.9	8.1	9.6	0.0430
	Use more	1490	1214.1	0.031	15.6	14.7	16.5	0.0292
	No difference, would use the same amount	5706	4559.3	0.015	58.7	57.4	59.9	0.0109
	Not applicable / I do not eat them	912	860.8	0.040	11.1	10.2	11.9	0.0386
	Don't know	459	445.6	0.056	5.7	5.1	6.4	0.0547
	Total	9315	7768.3	0.011	100.0
Boys	Use less	647	955.1	0.045	10.7	9.8	11.6	0.0442
	Use more	1211	1735.6	0.032	19.5	18.3	20.6	0.0306
	No difference, would use the same amount	3743	4812.7	0.016	54.0	52.5	55.4	0.0139
	Not applicable / I do not eat them	596	884.3	0.048	9.9	9.0	10.8	0.0474
	Don't know	340	532.4	0.063	6.0	5.2	6.7	0.0622
	Total	6537	8920.1	0.009	100.0
Total	Use less	1395	1643.5	0.031
	Use more	2701	2949.7	0.022
	No difference, would use the same amount	9449	9372.0	0.009
	Not applicable / I do not eat them	1508	1745.1	0.031
	Don't know	799	978.0	0.042
	Total	15852	16688.4
Frequency Missing = 3

Rao-Scott Chi-Square Test
Pearson Chi-Square	67.5914
Design Correction	1.6183

Rao-Scott Chi-Square	41.7675
DF	4
Pr > ChiSq	<.0001

F Value	10.4419
Num DF	4
Den DF	63404
Pr > F	<.0001
Sample Size = 15852

Thanks

sbxkoenk · Posted 08-27-2022 11:58 AM

Hello @Emma_at_SAS ,

Can't you solve this with a by-variable?
by pattern_of_consumption
The analysis will be repeated for each level of the by-variable.

You can then use the MULTTEST procedure to address the multiple testing problem (inflation of the type I error).
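A minimal sketch of the BY approach described here, assuming the consumption variable is the Fruits variable from the original PROC SURVEYFREQ call (pattern_of_consumption is used above as a placeholder name) and that work.sorted is a hypothetical scratch data set. BY processing needs sorted input, and with a one-way table the CHISQ option requests a Rao-Scott goodness-of-fit test, which by default tests equal proportions across the levels of Kids_gender within each BY group.

```sas
/* Sort by the BY variable first; work.sorted is a hypothetical name */
proc sort data = &data out = work.sorted;
   by Fruits;
run;

/* One analysis per answer category: girls vs. boys within that category */
proc surveyfreq data = work.sorted varheader = namelabel nosummary;
   by Fruits;
   tables Kids_gender / chisq;   /* one-way table: goodness-of-fit test */
   weight WEIGHT_scale;
run;
```

Note that each BY group is analyzed completely separately, so the percentages are taken within each answer category rather than within each gender.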

Kind regards,

Koen

Emma_at_SAS · Posted 08-29-2022 03:37 PM

Thank you @sbxkoenk for your thoughts. I have a question about your suggestion. If I use the

by pattern_of_consumption

approach to slice the comparison by level of fruit and veggie consumption, then for the kids who said "use less", for example, I am comparing the 748 girls with the 647 boys who gave that answer. That compares 41.9% girls with 58.1% boys, whereas I want to compare 8.9% of girls with 10.7% of boys, with each percentage taken from the overall sample of girls and boys (9315 girls and 6537 boys).
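The difference between the two sets of percentages comes down to the denominator. The short check below, using the weighted frequencies from the original crosstab, reproduces both: dividing by each gender's weighted total gives roughly 8.9% and 10.7%, while dividing by the weighted "Use less" total gives roughly 41.9% and 58.1%. The variable names are illustrative only.

```sas
data _null_;
   /* Weighted frequencies taken from the original crosstab */
   girls_less = 688.4;   girls_total = 7768.3;
   boys_less  = 955.1;   boys_total  = 8920.1;
   less_total = girls_less + boys_less;

   /* Denominator = all girls / all boys (the comparison that is wanted) */
   row_pct_girls = 100 * girls_less / girls_total;
   row_pct_boys  = 100 * boys_less  / boys_total;

   /* Denominator = everyone who said "Use less" (what the BY analysis compares) */
   by_pct_girls = 100 * girls_less / less_total;
   by_pct_boys  = 100 * boys_less  / less_total;

   put row_pct_girls= 5.1 row_pct_boys= 5.1;
   put by_pct_girls=  5.1 by_pct_boys=  5.1;
run;
```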

My question is whether it is correct to use the p-values from the BY analysis below in the MULTTEST procedure to adjust for type I error.


Kids Gender	Frequency	Weighted Frequency	Percent	CV for Percent
Girls	748	688	41.9	0.0379
Boys	647	955	58.1	0.0273
Total	1395	1644	100.0

Rao-Scott Chi-Square Test
Pearson Chi-Square	36.7227
Design Correction	1.4406

Rao-Scott Chi-Square	25.4908
DF	1
Pr > ChiSq	<.0001

F Value	25.4908
Num DF	1
Den DF	1394
Pr > F	<.0001
Sample Size = 1395

Patterns of consumption	Female % (the comparison I want)	Male % (the comparison I want)	Female % (BY analysis)	Male % (BY analysis)	P-value (BY analysis)	Adjusted p-value (MULTTEST)
Use less	8.9	10.7	41.9	58.1	0.0001
Use more	15.6	19.5	41.2	58.8	0.0001
No difference, would use the same amount	58.7	54.0	48.6	51.4	0.0367
Not applicable / I do not use this	11.1	9.9	49.3	50.7	0.6779
Don't know	5.7	6.0	45.6	54.4	0.038
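For reference, a minimal sketch of how raw p-values such as those in the table above could be passed to PROC MULTTEST. It assumes the p-values are typed into a small data set (work.raw_pvalues is a hypothetical name); with INPVALUES= the procedure looks for a variable named raw_p by default. Bonferroni and Holm are shown only as examples of adjustment methods, and whether these particular p-values answer the intended question is a separate issue.

```sas
/* Raw p-values copied from the table above; data set name is hypothetical */
data work.raw_pvalues;
   input raw_p level $45.;
   datalines;
0.0001 Use less
0.0001 Use more
0.0367 No difference, would use the same amount
0.6779 Not applicable / I do not use this
0.038 Don't know
;
run;

/* Adjust the five p-values for multiple testing */
proc multtest inpvalues = work.raw_pvalues bonferroni holm;
run;
```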

Thanks

ballardw · Posted 08-24-2022 06:20 PM

You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try

proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
   tables Fruits * Kids_gender / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;

Then the boy/girl responses for the same level of the question will be closer together and easier to read but the information isn't going to change.

I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.

I would also tend to be a bit concerned about your nearly 6% "Don't know" response. That might indicate that the way that particular data point was collected wasn't very clear to a lot of respondents. I would consider creating another response variable where the "Don't know" answers are set to missing, so you can compare the responses among those respondents who made an actual choice.

The CHI-square test statistic tells you if there is/is not a significant difference in distribution of values overall.

Emma_at_SAS · Posted 08-29-2022 03:53 PM

Thank you @ballardw for your thoughts and suggestions. I've added my thoughts to your comments below:

You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try

proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
   tables Fruits * Kids_gender / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;

Then the boy/girl responses for the same level of the question will be closer together and easier to read but the information isn't going to change.

My reply: I tried this, but it gives different percentages from the ones I want. I want the percentage of girls, among all girls, who said they would use less, use more, and so on. If I switch the order of the variables to Fruits * Kids_gender, I instead get the percentages of boys and girls among those who responded "use less", and so on.

Use of fruits and vegetables	Kids gender	Frequency	Weighted Frequency	Percent	CV for Percent	Row Percent	Lower 95% CL for Row Percent	Upper 95% CL for Row Percent	CV for Row Percent
Use less	Girls	748	688.4	4.1253	0.044	41.9	38.8	45.0	0.0379
	Boys	647	955.1	5.7232	0.045	58.1	55.0	61.2	0.0273
	Total	1395

I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.

My reply: At this stage, I am interested in the different patterns of behavior of girls and boys after the intervention/workshop. Your point is a good idea but answers a different question.

I would also tend to be a bit concerned about your nearly 6% "Don't know" response. That might indicate that the way that particular data point was collected wasn't very clear to a lot of respondents. I would consider creating another response variable where the "Don't know" answers are set to missing, so you can compare the responses among those respondents who made an actual choice.

My reply: In this survey, "Don't know" is a legitimate answer, because the kids are guessing how the workshop would affect their behavior in the future, and some kids are not sure how or whether the workshop would change their behavior in practice.

The CHI-square test statistic tells you if there is/is not a significant difference in the distribution of values overall.

My reply: Thanks for confirming. Now that the overall test shows that the patterns of behavior differ between boys and girls, I would like to know whether the 10.7% of boys who said they will use less is significantly higher than the 8.9% of girls who gave the same answer, and likewise for the other levels (for example, whether 19.5% of boys is significantly more than 15.6% of girls).
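One possible way to set up such a per-level comparison, offered only as a sketch and not as something confirmed in this thread: collapse the answer into a 0/1 indicator for one level and cross it with gender, so that the row percentages keep all girls and all boys in the denominator (the 8.9% vs. 10.7% comparison for "Use less"). The code assumes the displayed value of Fruits is exactly 'Use less' (VVALUE returns the formatted value); the data set name work.fruits_flags and the variable use_less are hypothetical.

```sas
/* Flag one answer level; other non-missing answers become 0 */
data work.fruits_flags;
   set &data;
   /* Assumption: the formatted value of Fruits reads exactly 'Use less' */
   if not missing(Fruits) then
      use_less = (strip(vvalue(Fruits)) = 'Use less');
run;

/* 2x2 table: gender by "Use less" yes/no, with the Rao-Scott test */
proc surveyfreq data = work.fruits_flags varheader = namelabel nosummary;
   tables Kids_gender * use_less / nostd nocellpercent row row(cl) chisq;
   weight WEIGHT_scale;
run;
```

Repeating this for each of the five levels would give five p-values, which could then be adjusted for multiple testing, for example with PROC MULTTEST.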

Thanks
