I would like to find the distribution that best fits a sample of a variable. The distribution could be normal, gamma, exponential, log-normal, etc. Is there a way to tell SAS to find the distribution and provide its parameters?
- Tags:
- distribution
To my knowledge there is no automatic procedure. But you can pit the distributions against each other by fitting them to your data as a mixture with PROC FMM:
proc fmm data=sashelp.heart plots=none componentinfo gconv=0;
model cholesterol = / dist=normal label="Normal";
model cholesterol = / dist=lognormal label="Lognormal";
model cholesterol = / dist=gamma label="Gamma";
model cholesterol = / dist=exponential label="Exponential";
run;
Component   Mixing Probability   GLogit(Prob)   Standard Error   z Value   Pr > |z|
1           0                    -5.97E13       0                .         .
2           0.9897               6.0324         0.4371           13.80     <.0001
3           0.0079               1.2062         0.4639           2.60      0.0093
4           0.0024               0
PROC SEVERITY in SAS/ETS fits many distributions and uses statistical criteria (AIC, BIC, etc.) to identify the best-fitting distribution. See the Getting Started example in the SEVERITY documentation.
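As a minimal sketch of that approach (not taken from the documentation example), the call below fits the predefined severity distributions to one variable and ranks them by corrected AIC. The data set sashelp.heart and the variable cholesterol are borrowed from the PROC FMM post above purely for illustration; substitute your own data set and variable.

```sas
/* Sketch: fit the predefined PROC SEVERITY distributions and compare them. */
/* sashelp.heart / cholesterol are illustrative; replace with your own data. */
proc severity data=sashelp.heart crit=aicc plots=none;
   loss cholesterol;        /* variable whose distribution is being fitted */
   dist _predefined_;       /* request the predefined candidate distributions */
run;
```

The output should include a model selection table along with the parameter estimates for each candidate. Note that the candidates are severity-type distributions that expect positive values, so the normal distribution is not among them.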
Thanks, but I get a memory error when I use my data. Also, it does not work with a variable that has negative values.
Please post the portion of the SAS log that shows the error.
Error 1:
355 proc severity data=sample2 crit=aicc;
NOTE: Writing HTML Body file: sashtml.htm
356 loss indicativefee_mean;
357 dist _predefined_;
358 run;
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1972257 observations read from the data set
WORK._DOCTMP000000000000000000001.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 1:26.26
cpu time 1:02.89
Error 2:
360 proc severity data=sample2 crit=aicc;
361 loss lnindicativefee;
362 dist _predefined_;
363 run;
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 0.58 seconds
cpu time 0.51 seconds
I do not know what is causing the Java error, but try using PLOTS=NONE to suppress plots.
Regarding the WARNINGS,
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
The warning says that all of the observations are invalid for one of the distributions that you are fitting. Instead of using the _PREDEFINED_ keyword, specify the distributions individually (for example, DIST EXP for the exponential distribution). That restricts the procedure to only the distributions of interest. You can also use PRINT=ALL to get more information about each fit.
Remember that several of these distributions have restrictions on the value of the observations. For example, negative values are invalid for the exponential distribution. Similar restrictions apply for the lognormal and gamma distributions.
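Putting those suggestions together, a revised call might look like the sketch below. The data set and variable names come from the log posted above; the choice of EXP, GAMMA, and LOGN as candidates is only an assumption for illustration, and whether PLOTS=NONE actually avoids the Java error is untested.

```sas
/* Sketch: name the candidate distributions explicitly and suppress graphics. */
/* The three distributions listed are an illustrative choice, not a recommendation. */
proc severity data=sample2 crit=aicc plots=none print=all;
   loss indicativefee_mean;   /* must be positive for these distributions */
   dist exp gamma logn;       /* exponential, gamma, lognormal */
run;
```

This will not help with lnindicativefee, because every distribution listed here requires positive values.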
A simpler, non-modeling approach is to use PROC UNIVARIATE. See this note on distribution testing and parameter estimation.
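A rough sketch of what that could look like follows. It asks the HISTOGRAM statement to fit several candidate distributions and prints only the parameter estimates and goodness-of-fit tables. The data set sashelp.heart and the variable cholesterol are illustrative stand-ins, and this is not necessarily the code the linked note uses.

```sas
/* Sketch: fit several candidate distributions with PROC UNIVARIATE. */
/* sashelp.heart / cholesterol are illustrative; replace with your own data. */
ods select ParameterEstimates GoodnessOfFit;   /* keep only the fit tables */
proc univariate data=sashelp.heart;
   var cholesterol;
   /* Each option requests a fit; parameters are estimated from the data. */
   histogram cholesterol / normal lognormal gamma exponential;
run;
```

The procedure estimates the parameters of each distribution itself, so you compare the goodness-of-fit statistics across distributions rather than trying parameter values by hand.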
- Tags:
- distribution fitting
Does this mean I have to try the parameters to see which one fits best?