what Statistics test to choose

bburugap · Posted 09-09-2017 09:14 PM

Hi all,

I need to perform a test that shows whether there is a statistically significant difference between males and females in the rate of diagnosis. I believe I should perform a chi-square test, but is that OK given how large the sample is? Below is the data:

	males	females
total	12,340,524	17,104,707
diagnosed	374,332	438,832
undiagnosed	11,966,192	16,665,875
rate of diagnosed per 100	3.03	2.57

Please help. Any help would be appreciated.

Thanks!

Reeza · Posted 09-09-2017 10:41 PM

Because you have a large N, your test will be significant no matter what, so you're beyond that now.

You need to look at the effect size, as well as the specificity and sensitivity, to understand the practical limitations. This is also only a 'base' comparison: there are likely underlying factors that are not considered here but should be, depending on the subject matter area.
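As a rough illustration of effect size for these counts, the two rates can be compared as a ratio rather than through a p-value. The symbols $p_m$ and $p_f$ (proportion diagnosed among males and females) are notation introduced here, not something from the original post.

```latex
\[
p_m = \frac{374{,}332}{12{,}340{,}524} \approx 0.0303,
\qquad
p_f = \frac{438{,}832}{17{,}104{,}707} \approx 0.0257,
\qquad
\frac{p_m}{p_f} \approx 1.18
\]
```

So males are diagnosed at roughly 1.18 times the female rate. Whether a ratio of that size matters is a subject-matter question, not a statistical one.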

PGStats · Posted 09-09-2017 10:44 PM

The chi-square test is inaccurate when the sample is too small. Why would there be a problem when the sample is very large?

PG

Reeza · Posted 09-09-2017 10:50 PM

Because you can detect differences that are statistically significant but not practically significant. Mostly this is because people interpret 'statistically significant' as meaning an actual difference, when all it means is that we were able to measure a difference.

Maybe this is a better answer:

https://stats.stackexchange.com/questions/125750/sample-size-too-large

https://stats.stackexchange.com/questions/2516/are-large-data-sets-inappropriate-for-hypothesis-test...

Reeza · Posted 09-09-2017 10:51 PM

Quick clarification: You can do the test; it's how you interpret the results that matters more.

Ksharp · Posted 09-10-2017 08:35 AM



http://blogs.sas.com/content/iml/2017/07/05/test-equality-two-proportions-sas.html

StatDave · Posted 09-14-2017 01:30 PM

With a very large sample size, a statistical test can detect very small differences with significance. As stated earlier, that significant difference might be trivially small and not of practical significance to you. To make the smallest difference a significance test can detect match the smallest difference you deem practical, you need to choose the sample size when designing the study. This can be done using PROC POWER.
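As a minimal sketch of that idea, the step below asks PROC POWER for the per-group sample size needed to detect a chosen difference with a Pearson chi-square test. The two proportions (0.030 and 0.025) and the 90% power target are illustrative assumptions, not values from this thread; replace them with the smallest difference you would consider practically meaningful.

```sas
/* Sketch only: the proportions and power below are assumed values. */
proc power;
   twosamplefreq test=pchi
      groupproportions = (0.030 0.025)  /* hypothetical rates to distinguish */
      power            = 0.9            /* assumed target power */
      npergroup        = .;             /* missing value marks the quantity to solve for */
run;
```

The significance level is left at the procedure's default. The required sample size per group can then be compared with the roughly 12 and 17 million subjects in the posted data to see how far beyond the needed size this sample is.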

For these results, you would probably prefer to estimate the difference between the genders and get a confidence interval for that difference. The following code does that. Notice that the tiny difference (0.0047) has a very tight confidence interval (0.0046, 0.0048) because of the enormous sample size. If you add the CHISQ option to test the gender difference, it is highly significant, also because of the huge sample size. The estimate of the difference is probably more useful in this case.

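/* Read the four cell counts of the 2x2 table from the original post. */
data a;
   do diag = 'y', 'n';        /* first data line: diagnosed; second: undiagnosed */
      do gender = 'm', 'f';   /* first value: males; second: females */
         input count @@;      /* @@ holds the line so several counts are read from it */
         output;
      end;
   end;
   datalines;
374332 438832
11966192 16665875
;

/* RISKDIFF reports the difference in proportions with confidence limits. */
proc freq data=a;
   weight count;              /* each record represents COUNT subjects */
   table gender*diag / riskdiff;
run;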
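
To check the estimate by hand, here is a sketch of the usual large-sample (Wald) calculation using the counts from the original post. The notation ($p_m$, $p_f$, $n_m$, $n_f$) is introduced here for illustration, and the assumption is that the RISKDIFF limits quoted above are of this Wald type.

```latex
\[
p_m = \frac{374{,}332}{12{,}340{,}524} \approx 0.03033,
\qquad
p_f = \frac{438{,}832}{17{,}104{,}707} \approx 0.02566,
\qquad
p_m - p_f \approx 0.00468
\]
\[
\mathrm{SE} = \sqrt{\frac{p_m(1-p_m)}{n_m} + \frac{p_f(1-p_f)}{n_f}}
\approx \sqrt{2.38\times10^{-9} + 1.46\times10^{-9}}
\approx 6.2\times10^{-5}
\]
\[
(p_m - p_f) \pm 1.96\,\mathrm{SE}
\approx 0.00468 \pm 0.00012
= (0.00456,\ 0.00480)
\]
```

Rounded to four decimals this gives 0.0047 with limits (0.0046, 0.0048). The interval is narrow only because $n_m$ and $n_f$ are in the millions, which is also why the chi-square test is so strongly significant.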
