I would like to find the distribution that best fits a sample of a variable. The distribution could be normal, gamma, exponential, log-normal, etc. Is there a way to tell SAS to find the distribution and provide the parameters?
To my knowledge there is no automatic procedure. But you can pit the distributions against each other by fitting them to your data as a mixture with PROC FMM:
proc fmm data=sashelp.heart plots=none componentinfo gconv=0;
model cholesterol = / dist=normal label="Normal";
model cholesterol = / dist=lognormal label="Lognormal";
model cholesterol = / dist=gamma label="Gamma";
model cholesterol = / dist=exponential label="Exponential";
run;
            Mixing                     Standard
Component   Probability  GLogit(Prob)  Error     z Value  Pr > |z|
1           0            -5.97E13      0         .        .
2           0.9897       6.0324        0.4371    13.80    <.0001
3           0.0079       1.2062        0.4639    2.60     0.0093
4           0.0024       0
PROC SEVERITY in SAS/ETS fits many distributions and uses statistical criteria (AIC, BIC, etc.) to identify the best-fitting distribution. See the Getting Started example in the SEVERITY documentation.
Thanks, but I got a memory error when I used my data. Also, it does not work with a variable that has negative values.
Please post the portion of the SAS log that shows the error.
Error 1:
355 proc severity data=sample2 crit=aicc;
NOTE: Writing HTML Body file: sashtml.htm
356 loss indicativefee_mean;
357 dist _predefined_;
358 run;
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1972257 observations read from the data set
WORK._DOCTMP000000000000000000001.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 1:26.26
cpu time 1:02.89
Error 2:
360 proc severity data=sample2 crit=aicc;
361 loss lnindicativefee;
362 dist _predefined_;
363 run;
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 0.58 seconds
cpu time 0.51 seconds
I do not know what is causing the Java error, but try using PLOTS=NONE to suppress plots.
Regarding the WARNINGS,
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
The warning says that all of the observations are invalid for one of the distributions that you are fitting. Instead of using the _PREDEFINED_ keyword, specify the distributions individually (for example, DIST Exponential). That will restrict the procedure to only the distributions of interest. You can also use PRINT=ALL to find out more information about each fit.
Remember that several of these distributions have restrictions on the value of the observations. For example, negative values are invalid for the exponential distribution. Similar restrictions apply for the lognormal and gamma distributions.
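As a rough sketch of the suggestions above (PLOTS=NONE, PRINT=ALL, and naming the distributions individually), something like the following might be a starting point. The data set and variable names are taken from the posted log. The particular distributions listed and the WHERE filter that drops nonpositive values are my assumptions, not something the original posts specify, so adjust them to your data.

```sas
/* Sketch only: SAMPLE2 and INDICATIVEFEE_MEAN come from the posted log.
   The list of distributions and the WHERE filter are assumptions. */
proc severity data=sample2 crit=aicc plots=none print=all;
   where indicativefee_mean > 0;   /* keep only values these distributions accept */
   loss indicativefee_mean;
   dist exp gamma logn weibull;    /* named individually instead of _PREDEFINED_ */
run;
```

Note that filtering changes the sample being fitted, so it only makes sense if the nonpositive values are not of interest. A variable that is legitimately negative (such as a log-transformed fee) needs a distribution whose support includes negative values.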
The simpler, nonmodeling approach is using PROC UNIVARIATE. See this note on distribution testing and parameter estimation.
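A minimal sketch of the PROC UNIVARIATE approach, reusing SASHELP.HEART and the CHOLESTEROL variable from the PROC FMM example earlier in the thread (swap in your own data set and variable). Each option on the HISTOGRAM statement requests a fitted distribution, and the procedure reports parameter estimates and goodness-of-fit tests for each one, so the fits can be compared side by side.

```sas
/* Sketch: fit several candidate distributions to one variable and
   compare the parameter estimates and goodness-of-fit tables. */
proc univariate data=sashelp.heart;
   var cholesterol;
   histogram cholesterol / normal lognormal gamma exponential;
run;
```

As with PROC SEVERITY, the lognormal, gamma, and exponential fits assume values above the distribution's threshold, so only the normal fit is directly usable for a variable with negative values.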
Does this mean I have to try the parameters to see which one fits best?