Help using Base SAS procedures

proc univariate

Occasional Contributor
Posts: 11

I am fitting a distribution to my data, and I am confused about the code.

I found an example that creates a histogram displaying a lognormal fit, and I used the following code:

title 'Lognormal dist.';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(w=3 theta=est)
               odstitle = title;
   inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
         pos = ne
         header = 'Summary Statistics';
run;

I would like to know: does this code fit a two-parameter lognormal distribution?

If it does, what is THETA=EST used for?

The data I used start at -0.100, but the threshold in the result is -772.2.

Why is the threshold that low? (I am using Base SAS 9.4.)


Accepted Solutions
04-19-2017 10:11 PM
SAS Super FREQ
Posts: 3,483

Re: proc univariate

That is not correct. When you specify THETA=EST, you get a three-parameter fit: the threshold is estimated from the data along with the scale and shape parameters.

The simple form of your call is:

proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(theta=est);   /* fit a three-parameter lognormal distribution */
run;

If you want a two-parameter fit, specify a fixed value for the threshold parameter (the lower bound of the distribution), or accept the default, which is THETA=0:

proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(theta=1);   /* fix the threshold (lower bound) at THETA=1 */
run;
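
For reference, here is the lognormal density in the θ (threshold), ζ (scale), σ (shape) notation used by the PROC UNIVARIATE documentation. With the default THETA=0, or any fixed THETA= value, only ζ and σ are estimated; THETA=EST estimates θ as well.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}\,(x-\theta)}
       \exp\!\left(-\frac{\bigl(\log(x-\theta)-\zeta\bigr)^{2}}{2\sigma^{2}}\right),
\qquad x > \theta
```

Because the density is zero for x <= θ, an estimated threshold must lie below the smallest observation, but nothing forces it to lie close to it. When the data are nearly symmetric, the likelihood can favor a very negative θ paired with a small σ, because in that limit the lognormal approaches a normal shape. That is one plausible explanation (not checked against the poster's data) for a threshold of -772.2 on data that start at -0.100.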



All Replies
Valued Guide
Posts: 632

Re: proc univariate


The THETA=EST option requests that the maximum likelihood estimate of theta be used as the threshold.
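
A minimal sketch of how to look at that estimate, assuming you want it as a data set: the ODS OUTPUT statement saves the Parameter Estimates table. The data set work.demo and the variable x are hypothetical stand-ins, simulated with a known threshold of 10; substitute uy2013 and avg_claim for real use.

```sas
/* Hypothetical demo data: a lognormal sample shifted by a threshold of 10.
   Replace work.demo and x with your own data set and variable. */
data work.demo;
   call streaminit(12345);
   do i = 1 to 500;
      x = 10 + exp(rand('Normal', 2, 0.5));
      output;
   end;
   drop i;
run;

/* THETA=EST: estimate the threshold by maximum likelihood,
   together with the scale (zeta) and shape (sigma) parameters */
ods select ParameterEstimates;            /* show only the estimates table */
ods output ParameterEstimates=work.pe;    /* and save it as a data set     */
proc univariate data=work.demo;
   histogram x / lognormal(theta=est);
run;

/* Inspect the saved estimates; the exact columns may vary by release */
proc print data=work.pe;
run;
```

The printed threshold row should sit somewhat below the smallest value of x, since the fitted density is zero at and below θ.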

SAS Super FREQ
Posts: 8,744

Re: proc univariate

Occasional Contributor
Posts: 11

Re: proc univariate

http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...

Why does this example specify THETA=EST when the result is a two-parameter lognormal distribution, not a three-parameter one?


Occasional Contributor
Posts: 11

Re: proc univariate

Thank you so much, it helps a lot.
May I ask for more information?

I ran the code as you suggested. My distribution output is attached.

title 'Lognormal dist.';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=ec.uy2013;
   var root_avg_jt;
   where root_avg_jt ge 55 and root_avg_jt le 496;   /* restrict the fit to this range */
   histogram / lognormal(w=3 threshold=46.7)         /* THRESHOLD= is an alias for THETA= */
               odstitle = title;
   inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
         pos = ne
         header = 'Summary Statistics';
run;

I am just curious about two things:

1. I want to make sure that the output is a two-parameter lognormal fit (not three-parameter).

2. How do we choose the lower bound (theta)? Should I start from the data range (the minimum value)?


[Attachment: distribution output.JPG]
SAS Super FREQ
Posts: 3,483

Re: proc univariate

Yes, your model is two-parameter when you specify the THRESHOLD= value.

The threshold value comes from domain knowledge about the data. For example, the lognormal and Weibull distributions are often used to model the time to failure of a component. A time must always be positive, so the threshold is 0 for that application. Most two-parameter families implicitly assume that the threshold is zero.
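
A lognormal density with threshold θ is zero for x <= θ, so whatever value domain knowledge suggests must also be less than the smallest observation, otherwise the curve cannot be fit. A minimal sketch that checks this before fitting; work.demo, x, and the macro variable theta are hypothetical names, and THETA=0 merely stands in for a domain-based choice.

```sas
/* Hypothetical demo data standing in for your analysis variable */
data work.demo;
   call streaminit(2017);
   do i = 1 to 200;
      x = exp(rand('Normal', 3, 0.4));    /* strictly positive values */
      output;
   end;
   drop i;
run;

/* Step 1: look at the range of the data and save the minimum */
proc means data=work.demo n min max;
   var x;
   output out=work.range min=xmin;
run;

/* Step 2: compare the domain-based threshold with the minimum */
%let theta = 0;   /* hypothetical choice, e.g. amounts that cannot be negative */
data _null_;
   set work.range;
   if &theta >= xmin then
      put "WARNING: THETA=&theta is not below the minimum, " xmin=;
   else
      put "NOTE: THETA=&theta is below the minimum, " xmin=;
run;

/* Step 3: two-parameter fit with the threshold held fixed */
ods select ParameterEstimates GoodnessOfFit;
proc univariate data=work.demo;
   histogram x / lognormal(theta=&theta);
run;
```

The data range tells you which thresholds are feasible; it does not tell you which one is right, which is why the value itself should come from knowledge of the population.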

Occasional Contributor
Posts: 11

Re: proc univariate

Thank you so much for your kindness.
It helps a lot.
Occasional Contributor
Posts: 11

Re: proc univariate

This follows up on my earlier questions on the SAS communities about two- versus three-parameter distributions.

I'm still not sure about the output, so could you please clarify the following?

The data I am fitting are loss (claim) data with range [50, 524] after a square-root transformation.

Because I don't know how to set the threshold, I ran the fit with "threshold = est" and got a threshold estimate of 46.9.

I then specified that threshold value, 46.9, in the HISTOGRAM statement. I think this gives the two-parameter lognormal distribution you suggested (with a p-value of 0.017).

Could you please confirm whether this model is valid?

I am also unsure about the next step, simulation. How do I simulate from a two-parameter distribution with a specified threshold, as in this case?

SAS Super FREQ
Posts: 3,483

Re: proc univariate

Technically, you have fit a three-parameter distribution, because the threshold you are using was estimated from the data. A proper two-parameter family would use a threshold based on domain-specific knowledge of the population distribution, not on a sample.

However, I don't understand why you are worrying about this subtle aspect of the problem. If your goal is to simulate from a two-parameter lognormal distribution and thereby generate many samples that look like the observed data, then what you have done is perfectly fine.
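
The two-step approach described in the question (estimate θ with THETA=EST, then refit with that value held fixed) can be scripted so the estimate is not copied by hand. This is a sketch under assumptions: the demo data are hypothetical, and it assumes the saved ParameterEstimates data set has Parameter and Estimate columns with 'Threshold' on the threshold row, which you should confirm with PROC PRINT in your release. The refit is still effectively a three-parameter model, because θ came from the same sample.

```sas
/* Hypothetical demo data; replace work.demo and x with your data set and variable */
data work.demo;
   call streaminit(4321);
   do i = 1 to 500;
      x = 45 + exp(rand('Normal', 4, 0.6));   /* known threshold of 45 */
      output;
   end;
   drop i;
run;

/* Step 1: estimate the threshold from the sample (three-parameter fit) */
ods select ParameterEstimates;
ods output ParameterEstimates=work.pe3;
proc univariate data=work.demo;
   histogram x / lognormal(theta=est);
run;

/* Step 2: copy the threshold estimate into a macro variable.
   ASSUMPTION: work.pe3 has PARAMETER and ESTIMATE columns and the
   threshold row has PARAMETER='Threshold'. */
data _null_;
   set work.pe3;
   if strip(Parameter) = 'Threshold' then call symputx('theta_hat', Estimate);
run;
%put NOTE: estimated threshold is &theta_hat;

/* Step 3: refit with the threshold held fixed; only zeta and sigma are
   estimated now, but theta_hat came from this same sample */
ods select ParameterEstimates GoodnessOfFit;
proc univariate data=work.demo;
   histogram x / lognormal(theta=&theta_hat);
run;
```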

 

For simulation, see the article "Simulate lognormal data in SAS."
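
A minimal DATA step sketch in the same spirit: if Y is normal with mean ζ and standard deviation σ, then θ + exp(Y) follows a lognormal distribution with threshold θ. The threshold 46.9 is the value from this thread, but the ζ and σ values, the sample size, and the number of samples are hypothetical placeholders; replace them with the Scale and Shape estimates from your own Parameter Estimates table. Because the poster fit square-root-transformed claims, the last assignment squares each value to return to the claim scale.

```sas
/* theta comes from the thread; the other values are HYPOTHETICAL placeholders */
%let theta      = 46.9;   /* threshold (on the square-root scale)            */
%let zeta       = 4.5;    /* hypothetical scale: mean of log(x - theta)      */
%let sigma      = 0.7;    /* hypothetical shape: std dev of log(x - theta)   */
%let SampleSize = 1000;   /* hypothetical number of observations per sample  */
%let NumSamples = 100;    /* hypothetical number of simulated samples        */

data work.sim;
   call streaminit(20170419);            /* fix the seed for reproducibility */
   do SampleID = 1 to &NumSamples;
      do i = 1 to &SampleSize;
         /* Y ~ N(zeta, sigma)  implies  theta + exp(Y) is lognormal with threshold theta */
         x = &theta + exp(rand('Normal', &zeta, &sigma));
         claim = x**2;                   /* undo the square-root transformation */
         output;
      end;
   end;
   drop i;
run;
```

Each SampleID identifies one simulated sample, so a BY SampleID statement in a later PROC step analyzes all samples in a single call.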
