AgReseach7
Obsidian | Level 7

I've read and studied some of Rick's posts about overlaying histograms/distributions, but I'm still not getting it.

I've been trying different things with PROC UNIVARIATE, PROC CAPABILITY, etc.

I have continuous data (supplement intake, plotted below) to which I'm trying to fit various distributions (gamma, beta, lognormal, exponential, invgauss). I'm having trouble specifying mu=, sigma=, etc.

 

With help from the internet:

title 'supplement';
ods graphics on;
ods select histogram parameterestimates goodnessoffit fitquantiles;

proc univariate data=growth;
   var suppintake;
   histogram / midpoints=0.2 to 0.8 by 0.2
               lognormal weibull gamma
               odstitle=title;
   inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
         / pos=ne header='Summary Stats';
run;

[Image: Supp intake graph.jpg]

[Image: goat serum graph.png]

I got the following to work, but none of the distributions fit the continuous data (feed intake):

/* Copy the data, then sort the copy */
data LAMB;
   set grow;
run;

proc sort data=LAMB;
   by DAY ID JUN UREA;
run;

ods graphics on;
ods select Histogram ParameterEstimates GoodnessOfFit FitQuantiles;

/* Overlay fitted lognormal, Weibull, and gamma curves on the histogram */
proc univariate data=LAMB;
   var suppDMIkg;
   histogram / midpoints=0.2 to 0.8 by 0.2
               lognormal
               weibull
               gamma;
   inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
         / pos=ne header='Summary Statistics';
run;


1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

There is nothing wrong with those notes in the log. They are not errors, just information. You can make the first NOTE go away by specifying lognormal(THETA=0).

 

As to the fit: by convention, the lognormal, Weibull, and gamma distributions are used to model data that have positive skewness (a long right tail).

However, your data have negative skewness, so the data distribution doesn't look anything like these theoretical distributions.

But that's no problem, because you can apply a linear transformation of the form x --> a - b*x. This will "flip" the direction of the tail of the data so that the data distribution can be modeled by the standard distributions.
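
To see why the reflection flips the tail, here is a short derivation. It is only a sketch: it uses the usual moment definition of skewness, writes $\mu$ and $\sigma$ for the mean and standard deviation of $X$, and assumes $b > 0$.

```latex
\[
Y = a - bX,\quad b > 0
\;\Longrightarrow\;
E[Y] = a - b\mu,\qquad \mathrm{SD}(Y) = b\sigma
\]
\[
\operatorname{skew}(Y)
= \frac{E\big[(Y - E[Y])^3\big]}{\mathrm{SD}(Y)^3}
= \frac{E\big[(-b\,(X - \mu))^3\big]}{b^3\sigma^3}
= -\,\frac{E\big[(X - \mu)^3\big]}{\sigma^3}
= -\operatorname{skew}(X)
\]
```

So the reflected variable has the same magnitude of skewness as the original, with the opposite sign; the constants a and b only change location and scale.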

 

For example, the following DATA step creates a new variable, OneMinusSuppDMIkg, that has the value 1 - suppDMIkg. This new variable has positive skewness, and its smallest value is 0.49, so you can set the threshold slightly below that (THETA=0.48) and model it as follows:

 

data A;
   set growth;
   OneMinusSuppDMIkg = 1 - suppDMIkg;   /* reflect the data so the tail points right */
run;

proc univariate data=A;
   histogram OneMinusSuppDMIkg / midpoints=0.475 to 0.8 by 0.025
             lognormal(theta=0.48)
             gamma(theta=0.48)
             weibull(theta=0.48);
run;

Equivalently, you could define Q = 0.52 - suppDMIkg and then use THETA=0 as the threshold.
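
The equivalent zero-threshold version might look like the sketch below. It assumes the same data set (growth) and variable (suppDMIkg) as the example above; the data set name B is only illustrative.

```sas
/* Sketch: shift and reflect in one step so that the threshold is 0.   */
/* Assumes data set GROWTH and variable suppDMIkg, as in the example   */
/* above; the data set name B is illustrative.                         */
data B;
   set growth;
   Q = 0.52 - suppDMIkg;   /* same as (1 - suppDMIkg) - 0.48 */
run;

proc univariate data=B;
   histogram Q / lognormal(theta=0)
                 gamma(theta=0)
                 weibull(theta=0);
run;
```

Because Q is just OneMinusSuppDMIkg shifted by 0.48, the fitted shape and scale estimates should match those from the THETA=0.48 fits; only the threshold differs.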

 

Usually someone with domain knowledge can figure out a nice interpretable transformation. For example, if the measurements are "centimeters for a manufactured part," you might want to change units to "deviations less than the upper specification limit."


4 REPLIES
Rick_SAS
SAS Super FREQ

What is your question?

 

If you want to create a Q-Q plot, as in your images, then use the QQPLOT statement.
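
A minimal sketch of a lognormal Q-Q plot, assuming the data set (growth) and variable (suppDMIkg) mentioned elsewhere in this thread. The EST keyword asks the procedure to estimate a parameter; as far as I know, the shape parameter SIGMA= must be given for a lognormal Q-Q plot, while THETA= and ZETA= are used to draw the reference line.

```sas
/* Sketch: Q-Q plot of the data against a lognormal distribution.      */
/* Assumes data set GROWTH and variable suppDMIkg from this thread.    */
ods graphics on;

proc univariate data=growth;
   qqplot suppDMIkg / lognormal(sigma=est theta=est zeta=est);
run;
```

If the points fall close to the reference line, the lognormal model is plausible; systematic curvature suggests a poor fit.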

 

If you want to specify values for the parameter, rather than have the software find maximum likelihood estimates, then specify the parameter values in parentheses after the name of the distribution. For example:

HISTOGRAM / lognormal(zeta=10 sigma=2) gamma(theta=0) weibull(theta=0 C=EST);

 

 

AgReseach7
Obsidian | Level 7

Hey Rick.

I initially posted & then edited (last part that I got to work).

I attached the data if needed.

 

My specific questions:

1. I guess that was my 1st question: how to specify mu sigma theta.

2. Any issues with the following log statements?

  NOTE: Since a threshold parameter (THETA) was not specified for the lognormal fit for
      suppDMIkg, a zero threshold is assumed.
  NOTE: At least one W.D format was too small for the number to be printed. The decimal may
      be shifted by the "BEST" format.

 

3. My data are continuous, but I can't get any distribution to fit suppDMIkg (supplement intake), so I'm stuck.

 

Thanks for your time

Rick_SAS
SAS Super FREQ

For additional thoughts, discussion, and an example of "reversing the distribution" when the data has negative skewness, see

"Sometimes you need to reverse the data before you fit a distribution."
