I am trying to calculate the Fisher's score on 4 variables, but it seems to be taking forever. Could it be that the code is wrong? It did not show any error in the log. Here is my code:
proc freq data = &curlib..IRAOut order = data;
tables CurTotal * Basetotal * CurTrans * BaseTrans / fishers;
run;
Is there another way to calculate the Fisher's score? When I used the chisq option, it did not give me the fscore.
Forever, as in minutes, hours, or days?
How many observations are in your data set?
You might want to read the section "Computational Resources" here: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=procstat&docsetTarget=pro...
Paige Miller
It was running for several hours so I terminated the program. My data set has 2268 observations.
How many levels do your 4 variables have? With names like CurTotal, Basetotal, CurTrans and BaseTrans, I would be tempted to think that these really aren't categorical variables.
I would be tempted to run something like:
ods select nlevels;
proc freq data = &curlib..IRAOut nlevels;
tables CurTotal Basetotal CurTrans BaseTrans;
run;
Then look at the product of the numbers of levels reported. Because your code makes a separate CurTrans * BaseTrans table for each combination of CurTotal and Basetotal in the data, you may be seeing the effect of creating a very large amount of output: each table has one row for each value of CurTrans and one column for each value of BaseTrans. If you have 10 levels of CurTrans and 10 of BaseTrans, that is a 10 by 10 table with the associated frequencies and percentages. If, in addition, you have 10 levels of each of your total variables, that is 100 tables of 10 by 10 output. With more levels it gets worse.
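To check how many separate tables the four-way request would actually produce, something like the following might help. It is only a sketch: it assumes the same data set (&curlib..IRAOut) and variable names as the original post, and the column alias n_strata is made up for illustration.

```sas
/* Count the distinct CurTotal/Basetotal combinations present in the data. */
/* PROC FREQ builds one CurTrans * BaseTrans table per combination.        */
proc sql;
  select count(*) as n_strata   /* n_strata is an illustrative name */
  from (select distinct CurTotal, Basetotal
        from &curlib..IRAOut);
quit;
```

If n_strata comes back in the hundreds or thousands, the stratifying variables are probably continuous rather than categorical, and the request is unlikely to finish in a reasonable time.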
And from the PROC FREQ documentation for the FISHER option of the TABLES statement:
Note: PROC FREQ computes exact tests by using fast and efficient algorithms that are superior to direct enumeration. Exact tests are appropriate when a data set is small, sparse, skewed, or heavily tied. For some large problems, computation of exact tests might require a substantial amount of time and memory. Consider using asymptotic tests for such problems. Alternatively, when asymptotic methods might not be sufficient for such large problems, consider using Monte Carlo estimation of exact p-values. You can request Monte Carlo estimation by specifying the MC computation-option in the EXACT statement. See the section Computational Resources for more information.
So you may want to consider the EXACT statement, with its MC computation-option, instead of the TABLES statement option.
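A minimal sketch of what that could look like, assuming the same data set and a single two-way table of CurTrans by BaseTrans. The N=, SEED= and MAXTIME= values are arbitrary choices for illustration, not recommendations.

```sas
/* Monte Carlo estimate of the exact p-value instead of full exact computation. */
proc freq data = &curlib..IRAOut;
  tables CurTrans * BaseTrans / chisq;
  /* MC requests Monte Carlo estimation; N= is the number of samples,    */
  /* SEED= makes the run repeatable, and MAXTIME= (in seconds) stops the */
  /* exact computation if it runs too long.                              */
  exact fisher / mc n=10000 seed=12345 maxtime=300;
run;
```

Keeping CurTotal and Basetotal in the TABLES request would repeat this computation for every stratum, so it may be worth starting with the two-way table to see how long a single test takes.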
Fisher scoring is not the same as Fisher's exact test.
You typically only calculate Fisher's exact test when the number of records in a particular group is low, usually less than 5 or 10. Otherwise, the chi-square test will give nearly the same results.
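One way to see whether the counts are low enough to matter is to print the expected cell frequencies next to the chi-square test. This is a sketch under the assumption that a two-way table of CurTrans by BaseTrans from the original post is the one of interest.

```sas
/* EXPECTED adds the expected frequency to each cell of the crosstab; */
/* CHISQ requests the asymptotic chi-square tests.                    */
proc freq data = &curlib..IRAOut;
  tables CurTrans * BaseTrans / chisq expected;
run;
```

PROC FREQ also writes a warning under the chi-square results when a large share of the cells have expected counts below 5, which is a reasonable cue to consider an exact test.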
If you're trying to calculate the Fisher Score, see https://arxiv.org/abs/1202.3725
That's a very different calculation/problem.
@Stacy1 wrote:
I am trying to calculate the Fisher's score on 4 variables, but it seems to be taking forever. Could it be that the code is wrong? It did not show any error in the log. Here is my code:
proc freq data = &curlib..IRAOut order = data;
tables CurTotal * Basetotal * CurTrans * BaseTrans / fishers;
run;
Is there another way to calculate the Fisher's score? When I used the chisq option, it did not give me the fscore.