bburugap
Calcite | Level 5

Hi all,

 

I need to test whether there is a statistically significant difference between males and females in receiving a diagnosis. I believe I should perform a chi-square test, but is that OK given that the sample is so large? Below is the data:

 

                                    males         females
total                          12,340,524      17,104,707
diagnosed                         374,332         438,832
undiagnosed                    11,966,192      16,665,875
rate of diagnosed per 100            3.03            2.57

 

Please help. Any help would be appreciated.

 

Thanks!

6 REPLIES
Reeza
Super User

Because you have a large N, your test will be significant no matter what, so you're beyond that now.

 

You need to look at the effect size, as well as the specificity and sensitivity, to understand the practical limitations. This is also only a 'base' comparison: there are likely underlying factors that are not considered here but should be, depending on the subject matter area.
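
As a rough illustration of what "effect size" could mean for this table, the sketch below computes the risk difference, the risk ratio, and Cohen's h from the counts in the question. The dataset and variable names (effect, p_m, p_f, and so on) are made up for the example, and Cohen's h is just one common choice of effect size for two proportions, not something prescribed in this thread.

```sas
/* Sketch only: effect sizes computed directly from the posted counts. */
/* All dataset and variable names here are illustrative.               */
data effect;
   n_m = 12340524;  d_m = 374332;   /* males: total, diagnosed   */
   n_f = 17104707;  d_f = 438832;   /* females: total, diagnosed */

   p_m = d_m / n_m;                 /* proportion diagnosed, males   */
   p_f = d_f / n_f;                 /* proportion diagnosed, females */

   risk_diff  = p_m - p_f;          /* absolute difference in proportions */
   risk_ratio = p_m / p_f;          /* relative risk, males vs. females   */

   /* Cohen's h: difference of arcsine-transformed proportions */
   cohen_h = 2*arsin(sqrt(p_m)) - 2*arsin(sqrt(p_f));
run;

proc print data=effect noobs;
   var p_m p_f risk_diff risk_ratio cohen_h;
   format p_m p_f risk_diff cohen_h 8.5 risk_ratio 6.3;
run;
```

By the usual rule of thumb, an h below 0.2 counts as a small effect, so the value printed here can be judged against that benchmark regardless of how small the p-value is.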

PGStats
Opal | Level 21

The chi-square test is inaccurate when the sample is too small. Why would there be a problem when the sample is very large?

PG
Reeza
Super User

Because you can detect differences that are statistically significant but not practically significant. Mostly this is because people interpret 'statistically significant' as meaning an actual difference, when all it means is that we were able to measure a difference.

 

Maybe this is a better answer:

https://stats.stackexchange.com/questions/125750/sample-size-too-large

 

https://stats.stackexchange.com/questions/2516/are-large-data-sets-inappropriate-for-hypothesis-test...

 

 

Reeza
Super User
Quick clarification: you can do the test; it's how you interpret the results that matters more.
Ksharp
Super User


http://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

StatDave
SAS Super FREQ

With a very large sample size, a statistical test can detect very small differences with significance. As stated earlier, that significant difference might be trivially small and not of practical significance to you. To make what a significance test can detect the same as what you deem practical, you need to choose the sample size for the study. This can be done using PROC POWER. 
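
A minimal sketch of how that sample-size calculation might look in PROC POWER is shown below. The proportions 0.030 and 0.035 are assumptions chosen only for illustration (they treat a half-percentage-point difference as the smallest one worth detecting), as are the 0.05 alpha and 80% power; substitute whatever difference is practically meaningful in your setting.

```sas
/* Sketch only: solve for the per-group sample size needed to detect */
/* an ASSUMED practically meaningful difference (3.0% vs. 3.5%).     */
proc power;
   twosamplefreq test=pchi              /* Pearson chi-square test of two proportions */
      groupproportions = (0.030 0.035)  /* assumed proportions, illustration only     */
      alpha            = 0.05           /* assumed significance level                 */
      power            = 0.80           /* assumed target power                       */
      npergroup        = .;             /* missing value: solve for n per group       */
run;
```

Comparing the resulting n with the roughly 12 and 17 million observations in the actual data gives a sense of how much smaller a difference the existing sample can flag as significant.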

 

For these results, you would probably prefer to estimate the difference between the genders and get a confidence interval for that difference. The following code does that. Notice that the tiny difference (0.0047) has a very tight confidence interval (0.0046, 0.0048) because of the enormous sample size. If you add the CHISQ option to test the gender difference, the result is also highly significant because of the huge sample size. The estimate of the difference is probably more useful in this case.

 

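/* Read the four cell counts: diagnosed (y) row first, then undiagnosed (n) */
data a;
   do diag = 'y', 'n';
      do gender = 'm', 'f';
         input count @@;
         output;
      end;
   end;
   datalines;
374332 438832
11966192 16665875
;

/* RISKDIFF gives the difference in proportions with confidence limits */
proc freq data=a;
   weight count;
   table gender*diag / riskdiff;
run;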
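
For comparison, here is a sketch of the same table with the CHISQ option mentioned above. The RELRISK option is an addition not discussed in this thread; it is included only because the relative risk is another way to express the size of the difference. The data step is repeated so the example runs on its own.

```sas
/* Sketch only: same counts as above, repeated so this runs standalone */
data a;
   do diag = 'y', 'n';
      do gender = 'm', 'f';
         input count @@;
         output;
      end;
   end;
   datalines;
374332 438832
11966192 16665875
;

proc freq data=a;
   weight count;
   /* CHISQ: chi-square tests of association                */
   /* RELRISK: odds ratio and relative risks (an addition)  */
   table gender*diag / chisq relrisk;
run;
```

By default PROC FREQ orders the levels by their sorted values, so check which gender is row 1 and which diagnosis level is column 1 before reading the relative risk estimates.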
