determine the distribution for a sample
Posted 10-17-2018 07:25 PM
(5999 views)

I would like to find the distribution that best fits a sample of a variable. The distribution could be normal, gamma, exponential, log-normal, etc. Is there a way to tell SAS to find the distribution and provide its parameters?

- Tags:
- distribution


To my knowledge, there is no automatic procedure. But you can pit the distributions against each other by fitting them to your data as a mixture with PROC FMM:

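```
/* Fit one mixture in which each MODEL statement contributes a candidate distribution */
proc fmm data=sashelp.heart plots=none componentinfo gconv=0;
model cholesterol = / dist=normal label="Normal";
model cholesterol = / dist=lognormal label="Lognormal";
model cholesterol = / dist=gamma label="Gamma";
model cholesterol = / dist=exponential label="Exponential";
run;
/* Then compare the estimated mixing probabilities of the four components */
```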

| Component | Mixing Probability | GLogit(Prob) | Standard Error | z Value | Pr > \|z\| |
|---|---|---|---|---|---|
| 1 | 0 | -5.97E13 | 0 | . | . |
| 2 | 0.9897 | 6.0324 | 0.4371 | 13.80 | <.0001 |
| 3 | 0.0079 | 1.2062 | 0.4639 | 2.60 | 0.0093 |
| 4 | 0.0024 | 0 | | | |

PG


Please post the portion of the SAS log that shows the error.


Error 1:

```
355 proc severity data=sample2 crit=aicc;
NOTE: Writing HTML Body file: sashtml.htm
356 loss indicativefee_mean;
357 dist _predefined_;
358 run;
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: GC overhead limit
exceeded.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1972257 observations read from the data set
WORK._DOCTMP000000000000000000001.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 1:26.26
cpu time 1:02.89
```

Error 2:

```
360 proc severity data=sample2 crit=aicc;
361 loss lnindicativefee;
362 dist _predefined_;
363 run;
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
NOTE: PROCEDURE SEVERITY used (Total process time):
real time 0.58 seconds
cpu time 0.51 seconds
```


I do not know what is causing the Java error, but try using PLOTS=NONE to suppress plots.

Regarding the warnings:

```
WARNING: For at least one observation, variable lnindicativefee has a negative value.
Ignoring such observations.
WARNING: No valid observations found.
```

The warning says that all of the observations are invalid for one of the distributions that you are fitting. Instead of using the _PREDEFINED_ keyword, specify the distributions individually (for example, DIST EXP). That restricts the procedure to only the distributions of interest. You can also use PRINT=ALL to get more information about each fit.

Remember that several of these distributions restrict the values of the observations. For example, negative values are invalid for the exponential distribution. Similar restrictions apply to the lognormal and gamma distributions.
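As a rough sketch of the suggestions above, the call might look like the following. The data set and variable names are taken from the posted log; the WHERE= filter and the particular list of distributions (EXP, GAMMA, LOGN, WEIBULL) are assumptions for illustration, not a recommendation for this data.

```
/* Sketch only: sample2 and indicativefee_mean come from the posted log.   */
/* The WHERE= filter is an assumption about how to handle nonpositive      */
/* values; adjust or drop it to suit your data.                            */
proc severity data=sample2(where=(indicativefee_mean > 0))
              crit=aicc plots=none print=all;
   loss indicativefee_mean;
   /* Name the candidate distributions instead of using _PREDEFINED_ */
   dist exp gamma logn weibull;
run;
```

With PLOTS=NONE no graphs are requested, and PRINT=ALL displays the full set of tables for each fitted distribution so that the fits can be compared on the CRIT= criterion.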


Does this mean I have to try the parameters to see which one fits best?
