I have a couple of questions about the PROC FMM output:
1) How can I convert the histogram (the empirical data) into a line graph?
2) The actual distribution looks like a gamma curve with alpha=3 and beta=0.5, where x runs from 0.01 to 8 by 0.01. Note that (a) many y values are missing for x values of 4.2 and above, and (b) the counts are very volatile, and I need the fitted curve to capture that so that my CDF actually approaches 1, not 0.92 or 1.2.
Right now I am using trial and error: changing the number of parameters, the parameter starting values, and the distribution name. What is the best way to get the optimal fitted distribution?
1. You can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to get the histogram information in a SAS data set, which you can then graph:
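proc univariate data=sashelp.cars;
   var mpg_city;
   /* ENDPOINTS writes the bin endpoints (_MINPT_, _MAXPT_) to the OUTHIST= data set */
   histogram mpg_city / grid vscale=proportion endpoints outhist=OutHist;
run;
proc sgplot data=OutHist;
   series x=_MINPT_ y=_OBSPCT_;   /* _OBSPCT_ = percentage of observations in each bin */
run;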
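A possible variation, assuming you would rather plot each bin at its center than at its left endpoint: if you omit ENDPOINTS, the OUTHIST= data set stores the bin midpoints in the _MIDPT_ variable instead of _MINPT_ and _MAXPT_. The data set name OutHistMid and the axis labels are arbitrary choices for this sketch.

```sas
/* Same histogram, but binned by midpoints (the default when ENDPOINTS is omitted) */
proc univariate data=sashelp.cars;
   var mpg_city;
   histogram mpg_city / vscale=proportion outhist=OutHistMid;
run;

/* One point per bin, placed at the bin center, joined by line segments */
proc sgplot data=OutHistMid;
   series x=_MIDPT_ y=_OBSPCT_ / markers;
   xaxis label="MPG (City)";
   yaxis label="Percent of observations";
run;
```

Plotting at midpoints keeps the line visually centered over the bars if you later overlay it on the histogram.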
2. It sounds like you want to renormalize the gamma distribution by truncating the tail and then dividing by the area under the curve on [0, b], where b is the truncation point. (Maybe you want b=4.2? I'm not sure.) This is called a truncated gamma distribution. I have shown how to form the truncated NORMAL distribution, and maybe that article can guide you. The main idea is to divide the PDF and CDF by the area under the PDF on [0, b], which equals the untruncated CDF evaluated at the cutoff point b. For example, here is the computation for b = 4.2:
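/* Rescale the PDF so that it is a density on [0, b], where b=4.2 */
data Gamma;
   alpha = 3; beta = 0.5;                  /* shape and scale (SAS gamma parameterization) */
   b = 4.2;                                /* truncation point */
   Denom = cdf('gamma', b, alpha, beta);   /* scaling factor = area under the PDF on [0, b] */
   do i = 0 to 42;                         /* integer loop avoids floating-point drift in x */
      x = i / 10;                          /* x = 0, 0.1, ..., 4.2 */
      PDF = pdf('gamma', x, alpha, beta) / Denom;
      CDF = cdf('gamma', x, alpha, beta) / Denom;
      output;
   end;
   drop i;
run;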
title "Rescaled PDF and CDF for Gamma(3, 0.5)";
proc sgplot data=Gamma;
   series x=x y=PDF / curvelabel;
   series x=x y=CDF / curvelabel;
   refline 1 / axis=y;
run;
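One way to confirm that the rescaling gives a proper density (so the CDF ends at exactly 1 rather than 0.92 or 1.2) is to integrate the rescaled PDF numerically. This sketch uses the trapezoidal rule with an assumed step size h=0.01; the data set name Check is arbitrary. Area should be very close to 1, and CDF_b is 1 by construction.

```sas
data Check;
   alpha = 3; beta = 0.5; b = 4.2;         /* same parameters and cutoff as above */
   h = 0.01;                               /* step size for the trapezoidal rule (assumed) */
   Denom = cdf('gamma', b, alpha, beta);   /* area under the original PDF on [0, b] */
   Area = 0;
   do i = 1 to round(b / h);
      x0 = (i - 1) * h;
      x1 = i * h;
      /* area of one trapezoid under the rescaled PDF on [x0, x1] */
      Area = Area + h * (pdf('gamma', x0, alpha, beta)
                       + pdf('gamma', x1, alpha, beta)) / (2 * Denom);
   end;
   CDF_b = cdf('gamma', b, alpha, beta) / Denom;   /* rescaled CDF at the cutoff */
   keep Denom Area CDF_b;
run;

proc print data=Check noobs;
run;
```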
1. I discuss this in the article "Fit a mixture of Weibull distributions in SAS." See the last section (before the Summary). If you need the histogram as well, see "How to overlay a custom density curve on a histogram in SAS," which shows how to use GTL instead of PROC SGPLOT.
2. When you call PROC FMM, include
   WHERE x < 4.2;
just after the PROC FMM statement. The parameter estimates are then based only on the observations of interest. You might still want to adjust the PDF/CDF as I did in my original response so that the total probability on [0, 4.2] is 1.
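A minimal sketch of where the WHERE statement goes, assuming the data are stored as one row per x value with a count variable. The data set Have and the variable count are hypothetical stand-ins (simulated here with Poisson noise around a Gamma(3, 0.5) shape), and the FREQ statement is an assumption about how the counts are stored.

```sas
/* Hypothetical stand-in for the poster's data: x = 0.01 to 8 by 0.01, noisy counts */
data Have;
   call streaminit(1);
   do i = 1 to 800;
      x = i / 100;
      count = rand('poisson', 10000 * 0.01 * pdf('gamma', x, 3, 0.5));
      output;
   end;
   drop i;
run;

proc fmm data=Have;
   where x < 4.2;            /* fit only the observations of interest */
   model x = / dist=gamma;   /* one gamma component, no covariates */
   freq count;               /* each row stands for COUNT observations */
run;
```

If a single gamma component does not capture the shape, the K= option on the MODEL statement (for example, k=2) requests a mixture with more components.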