Overlay histogram and distributions

10-28-2016 11:44 AM - edited 10-28-2016 01:14 PM

I've read and studied some of Rick's posts about overlaying histograms/distributions, but I'm still not getting it.

Been trying different things with Univariate, Capability, etc.

I have continuous data (supplement intake, plot below) to which I'm trying to fit various distributions (gamma, beta, lognormal, exponential, invgauss). I'm having trouble specifying mu=, sigma=, etc.

With help from the internet:

```
title 'supplement';
ods graphics on;
ods select Histogram ParameterEstimates GoodnessOfFit FitQuantiles;
proc univariate data=growth;
var suppintake;
histogram / midpoints=0.2 to 0.8 by 0.2
lognormal weibull gamma odstitle=title;
inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
/ pos=ne header='Summary Stats';
run;
```

I got the following to work, but none of the distributions fit the continuous data (feed intake)

```
DATA LAMB; SET grow;
PROC SORT; BY DAY ID JUN UREA;
RUN; QUIT;
ods graphics on;
ods select Histogram ParameterEstimates GoodnessOfFit FitQuantiles;
proc univariate;
var suppDMIkg;
histogram / midpoints=0.2 to 0.8 by 0.2
lognormal
weibull
gamma;
inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
/ pos=ne header='Summary Statistics';
run;
```


Accepted Solutions

Solution

10-28-2016 02:49 PM


Posted in reply to AgReseach7

10-28-2016 02:11 PM

There is nothing wrong with those notes in the log. They are not errors, just information. You can make the first NOTE go away by specifying lognormal(THETA=0).
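As a minimal sketch of that change, assuming the `growth` data set and the `suppDMIkg` variable from the question, the threshold can be stated explicitly in the HISTOGRAM statement:

```sas
proc univariate data=growth;
   var suppDMIkg;
   /* THETA=0 states the zero threshold explicitly instead of leaving it to the default */
   histogram / lognormal(theta=0);
run;
```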

As to the fit, by convention the distributions that we call lognormal, Weibull, and gamma have positive skewness.

However, your data have negative skewness, so the data distribution doesn't look anything like these theoretical distributions.
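One way to check the direction of the skew before choosing a distribution is to print the sample skewness. This is only a sketch, and it assumes the `growth` data set and `suppDMIkg` variable from the question:

```sas
/* Print summary statistics, including the sample skewness */
proc means data=growth n mean std skewness min max;
   var suppDMIkg;
run;
```

A negative value in the Skewness column indicates a longer left tail, which is the situation described here.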

But that's no problem, because you can apply a linear transformation of the form x --> a - b*x. This will "flip" the direction of the tail of the data so that the data distribution can be modeled by the standard distributions.

For example, the following DATA step creates a new variable "OneMinusSuppDMIkg" that has the value 1-SuppDMIkg. This new variable has positive skew and its smallest value is 0.49, so you can model it as follows:

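```
/* Reflect the data so that the long tail points to the right */
data A;
   set growth;
   OneMinusSuppDMIkg = 1 - suppDMIkg;
run;

/* Fit three positively skewed distributions, each with threshold 0.48 */
proc univariate data=A;
   histogram OneMinusSuppDMIkg / midpoints=0.475 to 0.8 by 0.025
      lognormal(theta=0.48)
      gamma(theta=0.48)
      weibull(theta=0.48);
run;
```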

Equivalently, you could define Q = 0.52 - suppDMIkg and then use THETA=0 as the threshold.
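A sketch of that equivalent version follows. The data set name `B` is a hypothetical choice for illustration, and the MIDPOINTS= option is left out so that the procedure picks its own bins:

```sas
/* Shift and reflect in one step: Q = (1 - suppDMIkg) - 0.48 */
data B;
   set growth;
   Q = 0.52 - suppDMIkg;
run;

/* With the shift built into Q, the threshold for each fit is zero */
proc univariate data=B;
   histogram Q / lognormal(theta=0)
                 gamma(theta=0)
                 weibull(theta=0);
run;
```

Because Q differs from OneMinusSuppDMIkg only by the constant 0.48, the fitted curves should have the same shape and simply sit 0.48 units to the left.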

Usually someone with domain knowledge can figure out a nice interpretable transformation. For example, if the measurements are "centimeters for a manufactured part," you might want to change units to "deviations less than the upper specification limit."

All Replies


Posted in reply to AgReseach7

10-28-2016 11:52 AM - edited 10-28-2016 11:53 AM

What is your question?

If you want to create a Q-Q plot, as in your images, then use the QQPLOT statement.
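For instance, a minimal Q-Q plot request might look like the sketch below. It assumes the `growth` data set and `suppDMIkg` variable from the question, and the NORMAL option is only an illustrative choice of reference distribution:

```sas
proc univariate data=growth;
   /* MU=EST and SIGMA=EST draw a reference line from the sample estimates */
   qqplot suppDMIkg / normal(mu=est sigma=est);
run;
```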

If you want to specify values for the parameter, rather than have the software find maximum likelihood estimates, then specify the parameter values in parentheses after the name of the distribution. For example:

HISTOGRAM / lognormal(mu=10 sigma=2) gamma(theta=0) weibull(theta=0 C=EST);


Posted in reply to Rick_SAS

10-28-2016 01:42 PM

Hey Rick.

I initially posted & then edited (last part that I got to work).

I attached the data if needed.

My specific questions:

1. I guess that was my 1st question: how to specify mu sigma theta.

2. Any issues with the following log statements?

NOTE: Since a threshold parameter (THETA) was not specified for the lognormal fit for suppDMIkg, a zero threshold is assumed.

NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.

3. My data are continuous, but I can't get any distribution to fit suppDMIkg (supplement intake), so I'm stuck.

Thanks for your time


Posted in reply to Rick_SAS

11-04-2016 04:54 PM

For additional thoughts, discussion, and an example of "reversing the distribution" when the data have negative skewness, see "Sometimes you need to reverse the data before you fit a distribution."