Statistical Procedures

SAS-questioner · Posted 04-26-2022 05:30 PM

Suppose I have a dataset like the one below:

School       Group       activity_1_rate  activity_2_rate
   1          1              50%                 60%
   2          1              67%                 23%
   3          1              64%                 60%
   4          2              50%                 30%
   5          2              50%                 60%
   6          2              60%                 60%

I want to compare the mean of activity_1_rate and the mean of activity_2_rate between the two groups, separately. What kind of test should I use? I thought I might be able to use a t-test, because I don't think a chi-square test would work in this case, but I have some doubts. First, the percentages might not be normally distributed. Second, the percentages might have different denominators: for example, 50% might result from 5/10, while 40% might result from 40/100. In this case, can I still use a t-test? Or are there other tests I can use to compare the two groups?

Could anyone help me with it? Thank you so much!
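
The concern about unequal denominators can be made concrete with a small sketch. It uses only the two illustrative figures from the post above (5/10 and 40/100); the dataset name example and the variables events and trials are made up for illustration and are not part of the poster's data.

```sas
/* Illustration only: two hypothetical schools built from the
   5/10 and 40/100 figures mentioned in the question */
data example;
   input school events trials;
   rate = events / trials;      /* per-school proportion */
   datalines;
1 5 10
2 40 100
;
run;

proc sql;
   /* unweighted mean of the rates vs. the rate pooled over all students */
   select mean(rate)                as mean_of_rates,
          sum(events) / sum(trials) as pooled_rate
   from example;
quit;
```

Here the mean of the two rates is 0.45, while the pooled rate is 45/110, about 0.41. An analysis of the raw percentages therefore gives a school of 10 students the same weight as a school of 100.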

SteveDenham · Posted 04-27-2022 12:08 PM

Well, if you know both the numerator and denominator for each estimate, and since the numerators are all > 40, then you could use:

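proc genmod data=yourdata;
   class group;
   /* events/trials syntax: numerator1 participants out of denominator1 students per school */
   model numerator1/denominator1 = group / dist=binomial type3;
   /* group estimates and their difference; ILINK reports the group estimates on the proportion scale */
   lsmeans group / diff ilink;
run;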

Where numerator1 and denominator1 are the values for activity_1, and would be replaced by numerator2 and denominator2 for activity_2. Since it seems from your communications that the denominator would be the same for both activities because it is the enrollment at the school, you could simplify a little bit.
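
If only the rates and the enrollment are on file, the counts needed for the events/trials syntax could be rebuilt along these lines. This is a minimal sketch under assumptions: the variable name enrollment is hypothetical, the rates are assumed to be stored as numeric proportions (0.50 rather than '50%'), and rounding recovers the original count only if the rate was computed from that same enrollment figure.

```sas
/* Hypothetical names: enrollment is assumed to hold each school's student count,
   and the rates are assumed to be numeric proportions between 0 and 1 */
data yourdata_counts;
   set yourdata;
   denominator1 = enrollment;
   numerator1   = round(activity_1_rate * enrollment);   /* nearest whole student */
   denominator2 = enrollment;
   numerator2   = round(activity_2_rate * enrollment);
run;
```

The PROC GENMOD step above would then read data=yourdata_counts.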

SteveDenham

PaigeMiller · Posted 04-26-2022 06:03 PM

Do you know the number of data points in each group and each school?

--
Paige Miller

SAS-questioner · Posted 04-26-2022 06:11 PM

There are two groups in total, and each group has 10 schools, so 20 schools in total.

I think this is how they compute activity_1_rate and activity_2_rate:

activity_1_rate = (ac_1_a_rate + ac_1_b_rate + ac_1_c_rate + ac_1_d_rate)/4, and I think they have a sample size for each of ac_1_a_rate, ac_1_b_rate, ac_1_c_rate, and ac_1_d_rate. (Some sample sizes might be missing.)

activity_2_rate is computed the same way.

Reeza · Posted 04-26-2022 06:14 PM

What is the actual number of observations here? If the numbers are small, the methodology doesn't matter: you don't have the statistical power to do the analysis.

SAS-questioner · Posted 04-26-2022 06:18 PM

If you mean the schools, each group has 10 schools, so 20 schools in total. I understand that power might be a problem, but it would be nice if you could tell me a method that compares the means and takes both the percentages and the sample sizes (those used to compute the activity rates) into consideration. Do you have any ideas?

Reeza · Posted 04-26-2022 06:46 PM

The number of schools is irrelevant; you need to know the numerator and denominator of those rates.

If you have an N of 10 versus an N of 6000, the answers differ. In general, if your sample sizes are similar and large, you're fine with the t-test.
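
To see why the answers differ with N, recall the usual large-sample standard error of an observed proportion. The value p = 0.5 is an assumption chosen only for illustration; it does not come from the poster's data.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \qquad
n = 10:\ \sqrt{\frac{0.25}{10}} \approx 0.158, \qquad
n = 6000:\ \sqrt{\frac{0.25}{6000}} \approx 0.0065
```

A rate based on 10 students is thus far noisier than one based on 6000, which a t-test on the bare percentages cannot see.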

SAS-questioner · Posted 04-26-2022 06:54 PM

I think I know the denominators of those rates, although one denominator is missing. But the number of observations differs by school, varying from 80 to 400, I think. If I can use the t-test, as you said, how could I reflect the denominators of these rates?

Reeza · Posted 04-26-2022 06:57 PM

It's important that the groups being compared are also relatively equal in size. If not, you're likely to get false positives, as the estimates from the smaller N have higher variability.

You can compare just using the percentages, you don't need the denominators. But if you have them, it would be easier to use the raw data.

SAS-questioner · Posted 04-26-2022 07:00 PM

OK, if I use the raw data, do you mean I should only use the N instead of the percentage and compare the N for two groups using the T-test?

Reeza · Posted 04-26-2022 07:42 PM

I think you'd need to explain your experimental design and hypothesis for us to recommend a methodology.

SAS-questioner · Posted 04-27-2022 10:15 AM

OK, so I have two groups of schools, and each school hosts two activities. The activities have been hosted for a year. The activity_1_rate is the number of students who chose to participate in activity one divided by the total number of students. Now, I want to test whether the mean activity_1_rate differs between the two groups of schools. That's pretty much the whole study; I am not sure whether I have described it clearly. So it comes down to either a chi-square test or a t-test, since I only want to compare the means of the two groups. But the problem is that the number of students differs by school, ranging from 80 to 400, so if I compare only the percentages, the result might not be valid. Is there a way to take both the percentage and the number of students into consideration?

Reeza · Posted 04-27-2022 10:31 AM

@SAS-questioner wrote:

OK, so I have two groups of schools, and each school hosts two activities. The activities have been hosted for a year. The activity_1_rate is the number of students who chose to participate in activity one divided by the total number of students. Now, I want to test whether the mean activity_1_rate differs between the two groups of schools. That's pretty much the whole study; I am not sure whether I have described it clearly. So it comes down to either a chi-square test or a t-test, since I only want to compare the means of the two groups. But the problem is that the number of students differs by school, ranging from 80 to 400, so if I compare only the percentages, the result might not be valid. Is there a way to take both the percentage and the number of students into consideration?

Given your experimental design, I don't think that a t-test or chi-square test is appropriate here. You may want to look into @SteveDenham's suggestion. If you were comparing Activity 1 to Activity 2 per school (adjusting for multiple testing), then a t-test would be appropriate, but that isn't what you have here.

PaigeMiller · Posted 04-27-2022 06:28 AM

@SAS-questioner wrote:

I think I know the denominators of those rates, although one denominator is missing. But the number of observations differs by school, varying from 80 to 400, I think. If I can use the t-test, as you said, how could I reflect the denominators of these rates?

We need the number of observations! (Probably that is the number of students)

Not the number of schools. Although perhaps a superior analysis would be to take into account the effect of each school ...

OK, if I use the raw data, do you mean I should only use the N instead of the percentage and compare the N for two groups using the T-test?

You need both N and the percent.

If the raw data is binary for each student, you can use that as well.
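
If student-level binary data were available, one possible way to use it while also allowing for a school effect is sketched below. The dataset students and the variable participated1 are hypothetical, and the GEE with an exchangeable working correlation is an assumption swapped in here to account for students being clustered in schools; it is not a method proposed in this thread. It also assumes school identifiers are unique across the two groups.

```sas
/* Hypothetical student-level layout: one row per student, with
   participated1 = 1 if the student took part in activity 1, else 0 */
proc genmod data=students descending;      /* DESCENDING models Pr(participated1 = 1) */
   class group school;
   model participated1 = group / dist=binomial link=logit type3;
   /* GEE: students within the same school are treated as correlated */
   repeated subject=school / type=exch;
   lsmeans group / diff ilink;
run;
```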

--
Paige Miller

SteveDenham · Posted 04-27-2022 08:35 AM

If the interest here is to compare the two groups, AND you have reason to believe that the percentages are valid estimates for some population (like repeatedly measuring the given schools), you might want to consider using PROC GENMOD to analyze your data. After converting percentages to proportions by dividing by 100, this could give you a first shot at an analysis:

proc genmod data=yourdata;
   class group;
   /* activity_1_rate must already be a proportion between 0 and 1 */
   model activity_1_rate = group / dist=binomial type3;
   lsmeans group / diff ilink;
run;

A separate analysis would be done for activity_2_rate.
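
The conversion from percentages to proportions mentioned above might look like the following. This is a sketch that assumes the rates are stored as character values such as '50%'; the dataset name yourdata_prop and the new variable names are hypothetical. If the rates are already numeric percentages, dividing by 100 is enough.

```sas
/* Assumes character rates like '50%': strip the percent sign,
   read the remainder as a number, and rescale to the 0-1 range */
data yourdata_prop;
   set yourdata;
   activity_1_prop = input(compress(activity_1_rate, '%'), best12.) / 100;
   activity_2_prop = input(compress(activity_2_rate, '%'), best12.) / 100;
run;
```

The converted variables would then take the place of activity_1_rate and activity_2_rate in the MODEL statement.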

@StatDave would likely also recommend following this up with the %NLmeans macro to correctly compare the means on the original scale.

Pay attention to @Reeza's comment about power: an N of 3 schools per group is only going to detect large differences in the dependent variable.

SteveDenham

SAS-questioner · Posted 04-27-2022 10:24 AM

Thank you for your suggestion, but will this method take the sample size into consideration? As I mentioned above, activity_1_rate is computed as the number of students who participated in activity 1 divided by the total number of students, and the number of students is different for each school, ranging from 80 to 500. If I only compare the percentages without taking the sample size into consideration, will the result be invalid? If I use this method, should I use the numerator (the number of students who participated in activity 1) as a weight? Or should I replace the percentage with the numerator and use a t-test?

How to compare the mean difference between two groups if the value of each individual is percentage?
