☑ This topic is solved.
SAS-questioner
Obsidian | Level 7

If I have a dataset like below

School       Group       activity_1_rate  activity_2_rate
   1          1              50%                 60%
   2          1              67%                 23%
   3          1              64%                 60%
   4          2              50%                 30%
   5          2              50%                 60%
   6          2              60%                 60%

And I want to compare the mean difference between the two groups on activity_1_rate and activity_2_rate separately. What kind of test should I use? I thought I might be able to use the T-test, because I don't think the chi-square test would work in this case. But I have some doubts about using the T-test. First, those percentages might not be normally distributed. Second, the percentages might have different denominators; for example, 50% might result from 5/10, but 40% might result from 40/100. In this case, can I still use the T-test? Or are there other tests that I can use instead to compare the two groups?

Could anyone help me with it? Thank you so much!
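
To make the denominator concern concrete, here is a minimal sketch using only the two rates mentioned in the question (5/10 and 40/100). The dataset name example and the variables events and trials are made up for illustration. It contrasts the plain average of the two percentages with the average weighted by the denominators.

```sas
/* Illustration only: the two rates from the question, 5/10 and 40/100. */
data example;
  input school events trials;
  rate = events / trials;
  datalines;
1 5 10
2 40 100
;
run;

title "Unweighted mean of the two rates";
proc means data=example mean;
  var rate;
run;

title "Mean weighted by denominator (pooled events / pooled trials)";
proc means data=example mean;
  var rate;
  weight trials;   /* each rate counts in proportion to its denominator */
run;
title;
```

The unweighted mean is (0.50 + 0.40)/2 = 0.45, while the weighted mean is 45/110, about 0.41, so ignoring the denominators gives the small school the same influence as the large one.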

1 ACCEPTED SOLUTION

SteveDenham
Jade | Level 19

Well, if you know both the numerator and denominator for each estimate, and since the numerators are all > 40, then you could use:

 

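proc genmod data=yourdata;
  class group;
  /* events/trials syntax: numerator1 participants out of denominator1 students per school */
  model numerator1/denominator1 = group / dist=binomial type3;
  /* group estimates and their difference; ILINK reports the group estimates on the proportion scale */
  lsmeans group / diff ilink;
run;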

Where numerator1 and denominator1 are the values for activity_1, and would be replaced by numerator2 and denominator2 for activity_2.  Since it seems from your communications that the denominator would be the same for both activities because it is the enrollment at the school, you could simplify a little bit.
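
As a rough sketch of how the input for this model could be prepared when only the percentage and the enrollment are on file: the activity_1 percentages below are the six from the question's table, but the enrollment values are invented for illustration and must be replaced with the real ones. The variable name enrollment is also an assumption.

```sas
/* Percentages are from the question's table; enrollment values are HYPOTHETICAL. */
data yourdata;
  input school group activity_1_rate enrollment;
  denominator1 = enrollment;
  /* recover the count of participants from the percentage */
  numerator1 = round(activity_1_rate / 100 * enrollment);
  datalines;
1 1 50 100
2 1 67 200
3 1 64 150
4 2 50 400
5 2 50 120
6 2 60 250
;
run;

proc genmod data=yourdata;
  class group;
  /* events/trials syntax: schools with more students carry more information */
  model numerator1/denominator1 = group / dist=binomial type3;
  lsmeans group / diff ilink;
run;
```

If the actual counts of participants are available, it is safer to read them in directly rather than back-calculating them from rounded percentages.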

 

SteveDenham

 


23 REPLIES
PaigeMiller
Diamond | Level 26

Do you know the number of data points in each group and each school?

--
Paige Miller
SAS-questioner
Obsidian | Level 7

There are two groups in total, and each group has 10 schools, so 20 schools in total.

And I think this is how they compute the activity_1_rate and activity_2_rate:

activity_1_rate=(ac_1_a_rate+ac_1_b_rate+ac_1_c_rate+ac_1_d_rate)/4, and I think they have a sample size for each of ac_1_a_rate, ac_1_b_rate, ac_1_c_rate, ac_1_d_rate. (Some sample sizes might be missing.)

And it's the same way to compute the activity_2_rate.

Reeza
Super User
What is the actual number of observations here? If the numbers are small, the methodology doesn't matter: you don't have the statistical power to do the analysis.
SAS-questioner
Obsidian | Level 7

If you mean the schools, each group has 10 schools, so 20 schools in total. I understand the power might be a problem, but it would be nice if you could tell me a method to compare the mean difference that takes both the percentages and the sample sizes (those used to compute the different activity rates) into consideration. Do you have any ideas?

Reeza
Super User

The number of schools is irrelevant; you need to know the numerator/denominator of those rates.

 

If you have an N of 10 versus an N of 6000, the answers differ. In general, if your sample sizes are similar and large, you're fine with the t-test.

SAS-questioner
Obsidian | Level 7

I think I know the denominators of those rates, although one denominator is missing. But the number of observations differs by school, varying from 80 to 400, I think. If I can use the T-test, as you said, how could I reflect the denominators of these rates?

Reeza
Super User
It's important that the groups being compared are also relatively equal. If not, you're likely to get false positives, as the smaller N has higher variability in the estimates.

You can compare just using the percentages, you don't need the denominators. But if you have them, it would be easier to use the raw data.
SAS-questioner
Obsidian | Level 7

OK, if I use the raw data, do you mean I should only use the N instead of the percentage and compare the N for two groups using the T-test?

Reeza
Super User

I think you'd need to explain your experimental design and hypothesis for us to recommend a methodology. 

 

 

SAS-questioner
Obsidian | Level 7

OK, so I have two groups of schools, and each school hosts two activities. And the whole activity has been hosted for a year. The activity_1_rate is the number of students who chose to participate in activity one divided by the total number of students. Now, I want to test whether activity_1_rate has a mean difference between the two groups of schools. That's pretty much the study; I am not sure if I have described it clearly. So it's pretty much either the chi-square test or the T-test, since I only want to compare the mean difference of the two groups. But the problem is that each school has a different number of students, ranging from 80 to 400. So if I only compare the percentages, the result might not be valid. Is there a way to take both the percentage and the number of students into consideration?

Reeza
Super User

@SAS-questioner wrote:

OK, so I have two groups of schools, and each school hosts two activities. And the whole activity has been hosted for a year. The activity_1_rate is the number of students who chose to participate in activity one divided by the total number of students. Now, I want to test whether activity_1_rate has a mean difference between the two groups of schools. That's pretty much the study; I am not sure if I have described it clearly. So it's pretty much either the chi-square test or the T-test, since I only want to compare the mean difference of the two groups. But the problem is that each school has a different number of students, ranging from 80 to 400. So if I only compare the percentages, the result might not be valid. Is there a way to take both the percentage and the number of students into consideration?


Given your experimental design, I don't think that a t-test or chi-square test is appropriate here. You may want to look into @SteveDenham's suggestion. If you were comparing Activity 1 to Activity 2 per school (adjusting for multiple testing), then a t-test would be appropriate, but that isn't what you have here.

PaigeMiller
Diamond | Level 26

@SAS-questioner wrote:

I think I know the denominators of those rates, although one denominator is missing. But the number of observations differs by school, varying from 80 to 400, I think. If I can use the T-test, as you said, how could I reflect the denominators of these rates?


We need the number of observations! (Probably that is the number of students)

 

Not the number of schools. Although perhaps a superior analysis would be to take into account the effect of each school ...

 

OK, if I use the raw data, do you mean I should only use the N instead of the percentage and compare the N for two groups using the T-test?

 

You need both N and the percent.

 

If the raw data is binary for each student, you can use that as well.

--
Paige Miller
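
If student-level binary data were available, one possible way to use it while also accounting for each school is a GEE fit in PROC GENMOD. This is only a sketch under stated assumptions: nobody in this thread specified a GEE, the counts in school_counts are hypothetical, and the names school_counts, students, and participated are made up. Student rows are expanded from the counts here just to have something runnable.

```sas
/* HYPOTHETICAL school-level counts, used only to build example student rows. */
data school_counts;
  input school group numerator1 denominator1;
  datalines;
1 1 50 100
2 1 134 200
3 1 96 150
4 2 200 400
5 2 60 120
6 2 150 250
;
run;

/* One row per student: participated = 1 if the student took part, else 0. */
data students;
  set school_counts;
  do i = 1 to denominator1;
    participated = (i <= numerator1);
    output;
  end;
  keep school group participated;
run;

proc genmod data=students;
  class group school;
  /* model the probability that participated = 1 */
  model participated(event='1') = group / dist=binomial link=logit type3;
  /* students in the same school are treated as correlated (exchangeable) */
  repeated subject=school / type=exch;
  lsmeans group / diff ilink;
run;
```

With real data the expansion step would be unnecessary; the student-level file would be passed to PROC GENMOD directly. With only a handful of schools per group, GEE standard errors can be unreliable, so the results should be read with caution.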
SteveDenham
Jade | Level 19

If the interest here is to compare the two groups, AND you have reason to believe that the percentages are valid estimates for some population (like repeatedly measuring the given schools), you might want to consider using PROC GENMOD to analyze your data.  After converting percentages to proportions by dividing by 100, this could give you a first shot at an analysis:

 

proc genmod data=yourdata;
class group;
model activity_1_rate = group /dist=binomial type3;
lsmeans group/diff ilink;
run;

A separate analysis would be done for activity_2_rate.

@StatDave would likely also recommend following this up with the %NLmeans macro to correctly compare the means on the original scale.

Pay attention to @Reeza's comment about power: an N of 3 schools per group is only going to detect large differences in the dependent variable.

 

SteveDenham

 

 

 

SAS-questioner
Obsidian | Level 7

Thank you for your suggestion, but will the method take the sample size into consideration? As I mentioned above, activity_1_rate is computed as the number of students who participated in activity 1 divided by the total number of students. And the number of students is different for each school, ranging from 80 to 500. If I only compare the percentages without taking the sample size into consideration, will the result be invalid? If I use this method, should I use the numerator (the number of students who participated in activity 1) as a weight? Or should I just replace the percentage with the numerator and use a T-test?
