Statistical Procedures

Programming the statistical procedures from SAS
jewcyfruit
Calcite | Level 5

Reviving this thread: I'm running a chi-square test with Fisher's exact test, and it is now on hour 4. Sample size is ~3000. Variable 1 is education status (8 categories) and variable 2 is exposure status (2 categories). 3 of the 16 cells have counts below 5. I'm going to let it run overnight. Thoughts?

My code is:

proc freq data=dataset;
tables variable1*variable2 / chisq exact;
run;

4 REPLIES
Ksharp
Super User

Try Monte Carlo estimation of exact p-values instead of direct exact p-value computation.

proc freq data=sashelp.heart(obs=100);
table bp_status*smoking_Status;
exact chisq fisher/seed=123 alpha=0.05 n=1000000 mc maxtime=3600;
run;
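
Applied to the table in the original question, the same idea might look like the sketch below. The names dataset, variable1, and variable2 come from the question; the values chosen for N=, SEED=, and MAXTIME= are illustrative assumptions, not recommendations.

```sas
/* Sketch: Monte Carlo estimate of the Fisher exact p-value for the 8x2 table. */
/* MC       : estimate the exact p-value by simulation instead of enumeration  */
/* N=       : number of Monte Carlo samples (assumed value)                    */
/* SEED=    : fixed seed so the estimate is reproducible                       */
/* MAXTIME= : time limit, in seconds, for the exact computation                */
proc freq data=dataset;
   tables variable1*variable2 / chisq;
   exact fisher / mc n=100000 seed=123 maxtime=600;
run;
```

The Monte Carlo output reports an estimated p-value together with confidence limits, so a larger N= narrows the limits at the cost of a longer run.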
Season
Barite | Level 11

I agree with @Ksharp's solution. Fisher's exact test is extremely computationally intensive. Worse, you may leave the job running unattended for a very long time (e.g., 7 days) only for SAS to report that it ran out of memory, with no result at all.

If you are still curious and want to try the direct Fisher's exact test anyway, you first have to increase the memory available to your SAS session. By default, SAS is assigned 2GB of memory, which is far from enough for a Fisher's exact test like this one.

Therefore, the first thing you should do is open a folder on your computer and type the following command in its address bar:

<path to your SAS executable> -memsize 1T

Like this:

C:\Program Files\SASHome\SASFoundation\9.4\sas.exe -memsize 1T

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" is the path to the SAS executable on my machine. "-memsize 1T" requests 1TB of memory for the SAS session that opens once you press Enter. Run Fisher's exact test in this session rather than in one started by other means (e.g., via the Start Menu), because those sessions are still assigned the default 2GB of memory.
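
To confirm that the session you are working in actually picked up the larger setting, you can query the MEMSIZE option from inside SAS. A minimal check, assuming nothing beyond Base SAS:

```sas
/* Write the MEMSIZE value in effect for the current session to the log. */
proc options option=memsize value;
run;

/* The same setting retrieved through a macro function call. */
%put MEMSIZE = %sysfunc(getoption(memsize));
```

If the log still shows the default value, the session was most likely started some other way and did not receive the -memsize argument.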

A final remark: if your computer does not have much memory or processing power, it is better to use the Monte Carlo estimate of Fisher's exact test, as suggested by @Ksharp.

StatDave
SAS Super FREQ
Practically speaking, it might be unnecessary to use the exact method. For one thing, the usual rule of thumb that the chi-square test is not valid when many expected (not observed) cell counts are less than 5 is considered overly conservative by many statisticians. Further, Stokes, Davis, and Koch (2012, Categorical Data Analysis Using SAS, Third Edition) state that the exact method usually produces more conservative results and recommend using the exact method only when sample sizes are small and the p-values from the usual (asymptotic) tests are less than 0.10. If the usual p-values are larger than 0.15, they suggest that the exact results are likely to be about the same. That being said, the decision is yours and the Monte Carlo approximation is at least worth a try.
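
One way to act on this is to run the asymptotic test first and inspect the expected counts before deciding whether an exact or Monte Carlo p-value is needed at all. A minimal sketch, using the data set and variable names from the original question:

```sas
/* Sketch: asymptotic chi-square test with expected cell counts displayed.   */
/* EXPECTED adds the expected frequency to each cell of the crosstabulation. */
/* NOROW, NOCOL, and NOPERCENT suppress the percentages to keep it readable. */
proc freq data=dataset;
   tables variable1*variable2 / chisq expected norow nocol nopercent;
run;
```

The rule of thumb above concerns the expected counts, which is why they are requested here. The asymptotic p-value from this run can then be compared with the 0.10 and 0.15 guidelines quoted from Stokes, Davis, and Koch.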
Season
Barite | Level 11
Thank you for your explanation of the theoretical and practical details! I have learnt a lot!
