star68
Calcite | Level 5

I have a couple of questions regarding the PROC FMM output:

1) How can I convert the histogram (the empirical data) into a line graph?

2) The actual distribution looks like a gamma curve with alpha=3 and beta=0.5, with x from 0.01 to 8 by 0.01. Note that (a) many y values are missing for x values starting at 4.2, and (b) the counts are quite volatile, and I need to capture that volatility in the fitted curve so that my CDF actually approaches 1, not 0.92 or 1.2.

Right now I am using trial and error, changing the number of parameters, the parameter starting values, and the distribution name. What is the best way to get the optimal fitted distribution?

Rick_SAS
SAS Super FREQ

1. You can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to get the histogram information in a SAS data set, which you can then graph:

proc univariate data=sashelp.cars;
   var mpg_city;
   histogram mpg_city / grid vscale=proportion ENDPOINTS OUTHIST=OutHist;
run;

proc sgplot data=OutHist;
   series x=_MINPT_ y=_OBSPCT_;
run;

2. It sounds like you want to renormalize the gamma distribution by truncating the tail and then dividing by the area under the curve on [0, b], where b is the truncation point. (Maybe you want b=4.2? I'm not sure.) This is called a truncated gamma distribution. I have shown how to form the truncated NORMAL distribution, and maybe that article can guide you. The main idea is to divide the PDF and CDF by the area under the PDF up to the cutoff point, which is the value of the CDF at b. For example, here is the computation for b = 4.2:

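/* rescale the PDF so that it is a density on [0, b], where b=4.2 */
data Gamma;
alpha = 3; beta = 0.5;                   /* shape and SCALE: the 4th argument of PDF/CDF is the scale */
Denom = cdf('gamma', 4.2, alpha, beta);  /* scaling factor = AUC on [0, 4.2] */
do i = 0 to 42;                          /* integer loop avoids round-off in a fractional BY step */
   x = i / 10;                           /* x = 0, 0.1, ..., 4.2 */
   PDF = pdf('gamma', x, alpha, beta) / Denom;
   CDF = cdf('gamma', x, alpha, beta) / Denom;
   output;
end;
drop i;
run;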

title "Rescaled PDF and CDF for Gamma(3, 0.5)";
proc sgplot data=Gamma;
   series x=x y=PDF / curvelabel;
   series x=x y=CDF / curvelabel;
   refline 1 / axis=y;
run;
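Because the original question was about getting a CDF that approaches 1 rather than 0.92 or 1.2, a quick numerical check can help. The following is a minimal sketch (not part of the original reply) that approximates the area under the rescaled PDF on [0, 4.2] with the trapezoidal rule. It assumes the same shape (3) and scale (0.5) as the Gamma data step; the 420 trapezoids (step 0.01) are an arbitrary choice. The area should come out very close to 1, and the rescaled CDF equals 1 at the cutoff by construction.

```sas
/* Sketch: check that the rescaled PDF integrates to about 1 on [0, b] */
data Check;
   alpha = 3; beta = 0.5;           /* shape and scale, as in the Gamma data step */
   b = 4.2;                         /* truncation point */
   n = 420;                         /* number of trapezoids (assumed), step h = b/n */
   h = b / n;
   Denom = cdf('gamma', b, alpha, beta);
   Area = 0;
   do i = 1 to n;
      x0 = (i - 1) * h;             /* left end of the i-th subinterval */
      x1 = i * h;                   /* right end of the i-th subinterval */
      Area = Area + 0.5 * h * (pdf('gamma', x0, alpha, beta)
                             + pdf('gamma', x1, alpha, beta)) / Denom;
   end;
   RescaledCDF_b = cdf('gamma', b, alpha, beta) / Denom;   /* 1 by construction */
   keep b Area RescaledCDF_b;
run;

proc print data=Check noobs;
run;
```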
star68
Calcite | Level 5
Thank you very much for your response. I need to clarify and ask you the following questions:
1) How can I put both the actual distribution and the mixture distribution generated by PROC FMM on the same graph?
2) To clarify: even though some y values are missing for x from 4.2 to 8.0, the long right tail generally levels off near 0.0003 with little volatility. How can I adjust the PROC FMM call to account for that?
Rick_SAS
SAS Super FREQ

1. I discuss this in the article "Fit a mixture of Weibull distributions in SAS." See the last section (before the Summary).  If you need the histogram as well, see "How to overlay a custom density curve on a histogram in SAS," which shows how to use GTL instead of PROC SGPLOT.
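A minimal sketch of the overlay idea, under stated assumptions: the data set Have is a simulated stand-in for the empirical (x, y) values, and the mixing probability, shapes, and scales in the %LET statements are hypothetical placeholders, not PROC FMM output. You would replace them with your own estimates after converting them to the shape/scale form that the SAS PDF and CDF functions use (check the PROC FMM documentation for how it parameterizes the gamma distribution).

```sas
/* Simulated stand-in for the empirical data on the grid x = 0.01 to 8 by 0.01 */
data Have;
   call streaminit(123);
   do i = 1 to 800;
      x = i / 100;
      y = max(0, pdf('gamma', x, 3, 0.5) * (1 + 0.2*rand('normal')));  /* noisy values */
      output;
   end;
   drop i;
run;

/* Hypothetical two-component gamma mixture: replace with your own estimates */
%let w1 = 0.7;                 /* mixing probability of component 1 */
%let a1 = 3;  %let s1 = 0.5;   /* shape and scale of component 1 */
%let a2 = 2;  %let s2 = 1.5;   /* shape and scale of component 2 */

/* Evaluate the mixture PDF and CDF on the same grid */
data Fit;
   do i = 1 to 800;
      x = i / 100;
      FitPDF = &w1 * pdf('gamma', x, &a1, &s1) + (1 - &w1) * pdf('gamma', x, &a2, &s2);
      FitCDF = &w1 * cdf('gamma', x, &a1, &s1) + (1 - &w1) * cdf('gamma', x, &a2, &s2);
      output;
   end;
   drop i;
run;

/* Stack the data sets; each plot statement uses only rows where its Y variable is nonmissing */
data Both;
   set Have Fit;
run;

title "Empirical values and fitted gamma mixture";
proc sgplot data=Both;
   scatter x=x y=y / markerattrs=(size=4) legendlabel="Empirical";
   series  x=x y=FitPDF / lineattrs=(thickness=2) legendlabel="Fitted mixture PDF";
run;
```

Stacking with SET (rather than merging BY x) avoids matching on floating-point x values. A separate plot of FitCDF would show how close the fitted CDF gets to 1 at x = 8.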

 

2. When you call PROC FMM, include

WHERE x < 4.2;
just after the PROC FMM statement. This restricts the parameter estimation to the observations of interest. You might still want to rescale the PDF and CDF as I did in my original response so that the total probability on [0, 4.2] is 1.
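A minimal sketch of where the WHERE statement goes. The data set Have here is a simulated stand-in (draws from a gamma distribution with shape 3 and scale 0.5), not the poster's data. Instead of a fixed number of components, this sketch uses the KMIN= and KMAX= options of the MODEL statement so that PROC FMM compares mixtures with one to three gamma components, which may reduce the trial and error described in the question.

```sas
/* Simulated stand-in for raw observations: Gamma(shape=3, scale=0.5) */
data Have;
   call streaminit(1);
   do i = 1 to 2000;
      x = 0.5 * rand('gamma', 3);   /* RAND('GAMMA', a) has scale 1, so multiply by the scale */
      output;
   end;
   drop i;
run;

proc fmm data=Have;
   where x < 4.2;                          /* fit only the observations with x < 4.2 */
   model x = / dist=gamma kmin=1 kmax=3;   /* compare mixtures of 1 to 3 gamma components */
run;
```

If your data set has one row per x value with an integer count instead of one row per observation, a FREQ statement (for example, FREQ Count; where Count is a hypothetical count variable) tells PROC FMM how many observations each row represents.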

star68
Calcite | Level 5
As always, thank you very much, Rick, for all your responses. They are helpful!
