Statistical Procedures

Programming the statistical procedures from SAS
Emma_at_SAS
Lapis Lazuli | Level 10

Hello,

 

I have a survey of boys and girls who attended an education series on healthy eating. Based on the chi-square test below, the pattern of change in behavior differs between boys and girls.

We asked the kids how the education affected their use of fruits and vegetables. I want to compare boys and girls on the same answer category. For example, is the percentage of boys who said they will use more veggies significantly higher than the percentage of girls who said the same (19.5% vs. 15.6%)? I am not interested in comparing different answer categories across genders, for example whether more boys said "use more" than girls said "use less" (19.5% vs. 8.9%).

 

Could you please help me with this test?

 

proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
tables Kids_gender*Fruits / nostd nocellpercent row row(cl) cv chisq;
weight WEIGHT_scale;
run;

 

| Kids gender | Use of fruits and vegetables | Frequency | Weighted Frequency | CV for Percent | Row Percent | 95% CL for Row Percent (lower) | 95% CL for Row Percent (upper) | CV for Row Percent |
|---|---|---|---|---|---|---|---|---|
| Girls | Use less | 748 | 688.4 | 0.044 | 8.9 | 8.1 | 9.6 | 0.0430 |
| | Use more | 1490 | 1214.1 | 0.031 | 15.6 | 14.7 | 16.5 | 0.0292 |
| | No difference, would use the same amount | 5706 | 4559.3 | 0.015 | 58.7 | 57.4 | 59.9 | 0.0109 |
| | Not applicable / I do not eat them | 912 | 860.8 | 0.040 | 11.1 | 10.2 | 11.9 | 0.0386 |
| | Don't know | 459 | 445.6 | 0.056 | 5.7 | 5.1 | 6.4 | 0.0547 |
| | Total | 9315 | 7768.3 | 0.011 | 100.0 | | | |
| Boys | Use less | 647 | 955.1 | 0.045 | 10.7 | 9.8 | 11.6 | 0.0442 |
| | Use more | 1211 | 1735.6 | 0.032 | 19.5 | 18.3 | 20.6 | 0.0306 |
| | No difference, would use the same amount | 3743 | 4812.7 | 0.016 | 54.0 | 52.5 | 55.4 | 0.0139 |
| | Not applicable / I do not eat them | 596 | 884.3 | 0.048 | 9.9 | 9.0 | 10.8 | 0.0474 |
| | Don't know | 340 | 532.4 | 0.063 | 6.0 | 5.2 | 6.7 | 0.0622 |
| | Total | 6537 | 8920.1 | 0.009 | 100.0 | | | |
| Total | Use less | 1395 | 1643.5 | 0.031 | | | | |
| | Use more | 2701 | 2949.7 | 0.022 | | | | |
| | No difference, would use the same amount | 9449 | 9372.0 | 0.009 | | | | |
| | Not applicable / I do not eat them | 1508 | 1745.1 | 0.031 | | | | |
| | Don't know | 799 | 978.0 | 0.042 | | | | |
| | Total | 15852 | 16688.4 | | | | | |

Frequency Missing = 3

 

Rao-Scott Chi-Square Test

| Statistic | Value |
|---|---|
| Pearson Chi-Square | 67.5914 |
| Design Correction | 1.6183 |
| Rao-Scott Chi-Square | 41.7675 |
| DF | 4 |
| Pr > ChiSq | <.0001 |
| F Value | 10.4419 |
| Num DF | 4 |
| Den DF | 63404 |
| Pr > F | <.0001 |

Sample Size = 3

 

Thanks

sbxkoenk
SAS Super FREQ

Hello @Emma_at_SAS ,

 

Can't you solve this with a by-variable?
     by pattern_of_consumption
The analysis will be repeated for each level of the by-variable. 

 

You can then use the MULTTEST procedure to address the multiple testing problem (inflation of the Type I error rate).

 

Kind regards,

Koen

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you @sbxkoenk for your thoughts. I have a question about your suggestion. If I use the

by pattern_of_consumption

approach to slice the comparison by level of fruit and veggie consumption, then for the kids who said "use less", for example, I would be comparing the 748 girls who said "use less" with the 647 boys who said "use less". That compares 41.9% girls with 58.1% boys, whereas I wanted to compare 8.9% of girls with 10.7% of boys, taken from the overall sample of girls and boys (9315 girls and 6537 boys).

My concern is whether it is correct to feed the p-values from the BY analysis below into the MULTTEST procedure to adjust for Type I error.

 
| Kids Gender | Frequency | Weighted Frequency | Percent | CV for Percent |
|---|---|---|---|---|
| Girls | 748 | 688 | 41.9 | 0.0379 |
| Boys | 647 | 955 | 58.1 | 0.0273 |
| Total | 1395 | 1644 | 100.0 | |

Rao-Scott Chi-Square Test

| Statistic | Value |
|---|---|
| Pearson Chi-Square | 36.7227 |
| Design Correction | 1.4406 |
| Rao-Scott Chi-Square | 25.4908 |
| DF | 1 |
| Pr > ChiSq | <.0001 |
| F Value | 25.4908 |
| Num DF | 1 |
| Den DF | 1394 |
| Pr > F | <.0001 |

Sample Size = 1395

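| Patterns of consumption | Percentages I want to compare: Female | Percentages I want to compare: Male | Percentages I compare with BY analysis: Female | Percentages I compare with BY analysis: Male | MULTTEST adjustment: P-value | MULTTEST adjustment: Adjusted p-value |
|---|---|---|---|---|---|---|
| Use less | 8.9 | 10.7 | 41.9 | 58.1 | 0.0001 | |
| Use more | 15.6 | 19.5 | 41.2 | 58.8 | 0.0001 | |
| No difference, would use the same amount | 58.7 | 54.0 | 48.6 | 51.4 | 0.0367 | |
| Not applicable / I do not use this | 11.1 | 9.9 | 49.3 | 50.7 | 0.6779 | |
| Don't know | 5.7 | 6.0 | 45.6 | 54.4 | 0.038 | |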
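
For reference, the mechanics of the MULTTEST step could look like the sketch below. It only shows how raw p-values are passed to the procedure; it does not settle whether the BY-analysis p-values test the right hypothesis. The data set name `pvals` and the `level` labels are made up for illustration. `raw_p` is the variable name that `INPVALUES=` reads by default, and the five values are copied from the P-value column above.

```sas
/* Illustrative only: raw p-values typed in from the table above.      */
/* The data set name (pvals) and the level labels are hypothetical.    */
data pvals;
   input level :$20. raw_p;
   datalines;
Use_less        0.0001
Use_more        0.0001
No_difference   0.0367
Not_applicable  0.6779
Dont_know       0.038
;
run;

/* INPVALUES= reads the variable raw_p by default.                     */
/* BONFERRONI and HOLM request two common adjustments for Type I error. */
proc multtest inpvalues=pvals bonferroni holm;
run;
```

The output lists each raw p-value next to its adjusted values, which would fill in the empty "Adjusted p-value" column.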

Thanks

ballardw
Super User

You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try

 

proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
   tables Fruits * Kids_gender / nostd nocellpercent row row(cl) cv chisq;
   weight WEIGHT_scale;
run;

Then the boy/girl responses for the same level of the question will sit closer together and be easier to read, but the information isn't going to change.

 

I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.

 

I would also be a bit concerned about your nearly 6% "Don't know" responses. That might indicate that the way this particular data point was collected wasn't very clear to a lot of respondents. I would consider creating another response variable where the "Don't know" values are set to missing, so you can compare the responses among those respondents who made an actual choice.
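
A minimal sketch of that recode, under assumptions that need checking against the real data: it assumes `Fruits` is a numeric variable and that the value 5 is the "Don't know" code. The names `work.survey_dk` and `Fruits_nodk` are made up for illustration.

```sas
/* Assumption: Fruits is numeric and 5 is the "Don't know" code.       */
/* Check the actual coding (for example with PROC FREQ) before use.    */
data work.survey_dk;
   set &data;
   Fruits_nodk = Fruits;                    /* hypothetical new variable */
   if Fruits = 5 then Fruits_nodk = .;      /* "Don't know" -> missing   */
   /* Note: Fruits_nodk does not inherit the format attached to Fruits. */
run;

/* Same analysis as before, restricted to respondents who made a choice */
proc surveyfreq data=work.survey_dk varheader=namelabel nosummary;
   tables Kids_gender*Fruits_nodk / row chisq;
   weight WEIGHT_scale;
run;
```

Keeping the original `Fruits` variable alongside the recoded one makes it easy to run both versions and compare them.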

 

The chi-square test statistic tells you whether there is a significant difference in the overall distribution of values.

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you @ballardw for your thoughts and suggestions. I've added my thoughts to your comments below:

 

> You may get output that is easier to read for your purpose if you reverse the order of the variables in the TABLES statement. Try
>
> proc surveyfreq data = &data VARHEADER = NAMELABEL nosummary;
>    tables Fruits * Kids_gender / nostd nocellpercent row row(cl) cv chisq;
>    weight WEIGHT_scale;
> run;
>
> Then the boy/girl responses for the same level of the question will be closer together and easier to read but the information isn't going to change.

I tried this, but it gives me different percentages from the ones I want. I want the percentage of girls, among all girls, who said they would use less, use more, and so on. If I switch the order of the variables to Fruits * Kids_gender, I instead get the percentages of girls and boys among those who responded "use less", and so on for each level.

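| Use of fruits and vegetables | Kids gender | Frequency | Weighted Frequency | Percent | CV for Percent | Row Percent | 95% CL for Row Percent (lower) | 95% CL for Row Percent (upper) | CV for Row Percent |
|---|---|---|---|---|---|---|---|---|---|
| Use less | Girls | 748 | 688.4 | 4.1253 | 0.044 | 41.9 | 38.8 | 45.0 | 0.0379 |
| | Boys | 647 | 955.1 | 5.7232 | 0.045 | 58.1 | 55.0 | 61.2 | 0.0273 |
| | Total | 1395 | | | | | | | |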

 

> I don't see anything related to "change in behavior" though. That would require some sort of Before/After response time indicator.

At this stage, I am interested in how the patterns of behavior differ between girls and boys after the intervention/workshop. Your point is a good one, but it answers a different question.

 

> I would also tend to be a bit concerned over your nearly 6% "Don't know" response. That might indicate the way that particular data point is collected wasn't very clear to a lot of respondents to the survey. I would consider creating another response variable where the "Don't Know" are set to missing so you can compare the responses among those respondents that made an actual choice.

In this survey, "Don't know" is a legitimate answer: the kids are guessing how the workshop will affect their behavior in the future, and some of them are not sure how, or whether, it will change their behavior in practice.

 

> The CHI-square test statistic tells you if there is/is not a significant difference in the distribution of values overall.

Thanks for confirming. Now that the overall test shows that the patterns of behavior differ between boys and girls, I would like to know whether the 10.7% of boys who replied they will use less is significantly higher than the 8.9% of girls who gave the same answer, and likewise for the other levels (whether 19.5% of boys is significantly more than 15.6% of girls, and so on).
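
One possible way to set up such a per-level comparison is to collapse the response into a 0/1 indicator for a single level and cross it with gender, so that the row percents are the within-gender percentages (8.9% vs. 10.7% for "use less"). This is only a sketch under stated assumptions: it assumes `Fruits` is numeric with 1 meaning "Use less", and the names `work.survey_flags` and `use_less` are made up for illustration.

```sas
/* Assumption: Fruits is numeric and 1 is the "Use less" code.         */
/* Check the actual coding before relying on this.                     */
data work.survey_flags;
   set &data;
   /* 1 = answered "Use less", 0 = any other answer, missing stays missing */
   if not missing(Fruits) then use_less = (Fruits = 1);
run;

/* 2x2 table: gender by indicator. ROW gives the percentage of each    */
/* gender in each indicator category; CHISQ gives a 1-DF Rao-Scott test */
proc surveyfreq data=work.survey_flags varheader=namelabel nosummary;
   tables Kids_gender*use_less / row chisq;
   weight WEIGHT_scale;
run;
```

Repeating this for each of the five levels gives five p-values, which is where a multiplicity adjustment such as PROC MULTTEST would come in.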

Thanks
