bburugap
Calcite | Level 5

Hi all,

 

I need to test whether there is a statistically significant difference between males and females in the rate of diagnosis. I believe I should use a chi-square test, but is that OK given that the sample is so large? The data are below:

 

                                  males        females
total                        12,340,524     17,104,707
diagnosed                       374,332        438,832
undiagnosed                  11,966,192     16,665,875
rate of diagnosed per 100          3.03           2.57

 

Please help. Any help would be appreciated.

 

Thanks!

6 REPLIES
Reeza
Super User

Because you have such a large N, your test will be significant for almost any difference, so you're beyond that question now.

 

You need to look at the effect size, as well as the specificity and sensitivity, to understand the practical limitations. You also have an issue in that this is a 'base' comparison: there are likely underlying factors that are not considered here but should be, depending on the subject matter area.

PGStats
Opal | Level 21

The chi-square test is inaccurate when the sample is too small. Why would there be a problem when the sample is very large?

PG
Reeza
Super User

Because you can detect differences that are statistically significant but not practically significant. Mostly this is because people interpret 'statistically significant' as being an actual difference, when all it means is that we were able to measure a difference.

 

Maybe this is a better answer:

https://stats.stackexchange.com/questions/125750/sample-size-too-large

 

https://stats.stackexchange.com/questions/2516/are-large-data-sets-inappropriate-for-hypothesis-test...

 

 

Reeza
Super User
Quick clarification: You can do the test; it's how you interpret the results that matters more.
Ksharp
Super User


http://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

StatDave
SAS Super FREQ

With a very large sample size, a statistical test can detect very small differences as significant. As stated earlier, that significant difference might be trivially small and not of practical significance to you. To match what a significance test can detect to what you consider practically important, you need to choose the sample size when planning the study. This can be done using PROC POWER.
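
A minimal sketch of such a sample-size calculation follows. It assumes, purely for illustration, that the smallest difference worth detecting is 2.5 versus 3.5 diagnoses per 100; those proportions, the 0.05 significance level, and the 90% power are hypothetical values, not figures from this thread.

```sas
/* Sketch: solve for the per-group sample size needed to detect a
   chosen difference between two proportions with a Pearson
   chi-square test. All numeric settings below are illustrative
   assumptions, not values taken from the thread. */
proc power;
   twosamplefreq test=pchi
      groupproportions = (0.025 0.035)  /* hypothetical smallest difference of interest */
      alpha            = 0.05           /* assumed significance level */
      power            = 0.90           /* assumed target power */
      npergroup        = .;             /* the missing value asks PROC POWER to solve for n */
run;
```

Replacing the proportions with the smallest difference that matters in your subject area shows how far the available sample exceeds what is needed to detect it.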

 

For these results, you would probably prefer to estimate the difference between the genders and get a confidence interval for that difference. The following code does that. Notice that the tiny difference (0.0047) has a very tight confidence interval (0.0046, 0.0048) because of the enormous sample size. If you add the CHISQ option to test the gender difference, the result is also highly significant because of the huge sample size. The estimate of the difference is probably more useful in this case.

 

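/* Read the four cell counts; the DO loops assign diag and gender to each count */
data a;
do diag='y','n';
do gender='m','f';
input count @@;
output;
end; end;
datalines;
374332 438832
11966192 16665875
;
/* WEIGHT treats each row as a cell count; RISKDIFF gives the difference in proportions with confidence limits */
proc freq data=a;
weight count;
table gender*diag / riskdiff;
run;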
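
As a sketch of how the effect-size advice from earlier in the thread could be combined with this approach, the variant below adds the CHISQ option (the test itself, along with the phi coefficient) and the RELRISK option (relative risk with confidence limits). The data set name b, the explicit row layout, and the use of ORDER=DATA are choices made for this illustration only; ORDER=DATA is used so that males and "diagnosed" form the first row and first column, which is what the column 1 risk estimates refer to.

```sas
/* Sketch: same counts as in the post, entered one cell per line so
   that ORDER=DATA puts gender='m' in row 1 and diag='y' in column 1.
   The data set name b is a hypothetical name for this example. */
data b;
   input gender $ diag $ count;
   datalines;
m y 374332
m n 11966192
f y 438832
f n 16665875
;

proc freq data=b order=data;
   weight count;
   /* CHISQ: chi-square tests and phi coefficient
      RISKDIFF: difference in proportions with confidence limits
      RELRISK: odds ratio and relative risks with confidence limits */
   table gender*diag / chisq riskdiff relrisk;
run;
```

From the posted rates, the column 1 relative risk should come out near 3.03/2.57, or about 1.18, which is the kind of effect-size figure to weigh against what counts as a meaningful difference in the subject area.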
