I am fitting a distribution to my data and am confused about the code.
I found an example that creates a histogram with a lognormal fit, and I used the following code:
title 'Lognormal dist. ';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=uy2013;
var avg_claim;
histogram / lognormal(w=3 theta=est)
odstitle = title;
inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
pos = ne
header = 'Summary Statistics';
run;
I would like to know: does this code fit a two-parameter lognormal distribution?
If it does, what is theta=est used for?
My data start at -0.100, but the threshold in the result is -772.2.
Why is the threshold that value? (I am using Base SAS 9.4.)
Accepted Solutions
Your statement is not correct. When you specify THETA=EST, you get a three-parameter fit.
The simple form of your call is
proc univariate data=uy2013;
var avg_claim;
histogram / lognormal(theta=est); /* fit three-parameter lognormal distrib */
run;
If you want a two-parameter fit, specify a lower bound for the threshold parameter, or accept the default, which is THETA=0:
proc univariate data=uy2013;
var avg_claim;
histogram / lognormal(theta=1); /* sets THETA=1 as threshold parameter (lower bound) */
run;
The THETA=EST option requests that the maximum likelihood estimate of theta be used as the threshold.
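For reference, the density being fitted can be written as below. This is a sketch that assumes the usual PROC UNIVARIATE parameter names (theta for the threshold, zeta for the scale, sigma for the shape) and leaves out the histogram scaling factor. With THETA=EST all three parameters are estimated; with a fixed theta only zeta and sigma are.

```latex
f(x) =
\begin{cases}
\dfrac{1}{\sigma \sqrt{2\pi}\,(x-\theta)}
\exp\!\left( -\dfrac{\bigl(\log(x-\theta) - \zeta\bigr)^{2}}{2\sigma^{2}} \right), & x > \theta, \\[2ex]
0, & x \le \theta .
\end{cases}
```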
Hi:
From http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...
It says:
Suggest you look at the rest of the UNIVARIATE documentation in detail. The Overview starts here:
http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...
cynthia
Why does this example specify theta=est when the result is a two-parameter lognormal distribution, not a three-parameter one?
Thank you so much, it helps a lot.
May I ask for some more information?
After running the statement as you suggested, I got the distribution output attached.
title 'Lognormal dist.';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=ec.uy2013;
var root_avg_jt;
where root_avg_jt ge 55 and root_avg_jt le 496;
histogram / lognormal(w=3 threshold=46.7)
odstitle = title;
inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
pos = ne
header = 'Summary Statistics';
run;
I am just curious:
1. I want to confirm that the output is a two-parameter lognormal fit (not a three-parameter one).
2. How do we choose the lower bound (theta)? Should I start by looking at the data range (the minimum value)?
Yes, your model is two-parameter when you specify the THRESHOLD= value.
The threshold value comes from using domain knowledge of the data. For example, the lognormal and Weibull distributions are often used to model time-to-failure for some component. The time must always be positive, so threshold=0 for that application. Most two-parameter families implicitly assume that the threshold is zero.
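As a minimal sketch of the time-to-failure case, the code below generates strictly positive data and fits a two-parameter lognormal with the threshold fixed at zero. The data set name failure_times, the variable t, the seed, and the values zeta=2 and sigma=0.5 are all made up for illustration.

```sas
/* Hypothetical time-to-failure data: strictly positive, so THETA=0 is justified */
data failure_times;
   call streaminit(12345);               /* arbitrary seed for reproducibility */
   do i = 1 to 200;
      t = exp(rand("Normal", 2, 0.5));   /* lognormal with zeta=2, sigma=0.5 (made-up values) */
      output;
   end;
   drop i;
run;

proc univariate data=failure_times;
   var t;
   histogram t / lognormal(theta=0);     /* threshold fixed: only zeta and sigma are estimated */
run;
```

Because the threshold is fixed by the nature of the data rather than estimated from the sample, only the scale and shape parameters are fitted.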
it helps a lot.
I am following up on my earlier questions about two- versus three-parameter distributions.
I am still not sure about the output, so could you please clarify the following?
The data I am fitting are loss (claim) data with range [50, 524] after a square-root transformation.
Because I did not know how to set the threshold, I ran SAS with "threshold = est" and got a threshold value of 46.9.
I then specified that threshold value of 46.9 in the HISTOGRAM statement. I think this gives the two-parameter lognormal distribution (with p-value 0.017), as suggested.
Could you please confirm whether this model is valid?
Also, I am not sure about the next step, simulation. How do I simulate from a two-parameter distribution with a specified threshold, as in this case?
Technically you have fit a three-parameter distribution because you are using a threshold parameter that was estimated from the data. A proper two-parameter family would use a threshold parameter that is based on domain-specific knowledge of the population distribution, not on a sample.
However, I don't understand why you are worrying about this subtle aspect of the problem. If your goal is to simulate from a two-parameter lognormal distribution and thereby generate many samples that look like the observed data, then what you have done is perfectly fine.
For simulation, see the article "Simulate lognormal data in SAS."
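A minimal DATA step sketch of that simulation, assuming the model X = theta + exp(zeta + sigma*Z) with Z standard normal. The threshold 46.9 is the value reported in this thread. The zeta and sigma values, the seed, and the names sim_claims, root_claim, and claim are placeholders; substitute the Scale and Shape estimates from your own ParameterEstimates table.

```sas
data sim_claims;
   call streaminit(2013);     /* arbitrary seed for reproducibility */
   theta = 46.9;              /* threshold reported in this thread */
   zeta  = 5.0;               /* hypothetical: replace with your Scale (zeta) estimate */
   sigma = 0.6;               /* hypothetical: replace with your Shape (sigma) estimate */
   do i = 1 to 1000;
      /* shift a lognormal variate by the threshold */
      root_claim = theta + exp(zeta + sigma * rand("Normal"));
      claim = root_claim**2;  /* undo the square-root transform */
      output;
   end;
   keep root_claim claim;
run;
```

Refitting the simulated root_claim values with lognormal(theta=46.9) is one way to check that the estimates come back close to the zeta and sigma you supplied.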