proc univariate

Posted 04-17-2017 09:44 PM
I am working on fitting a distribution to my data, and I am confused about the code.

I found an example that creates a histogram displaying a lognormal fit, and I used the code as follows:

title 'Lognormal dist. ';

ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;

proc univariate data=uy2013;

var avg_claim;

histogram / lognormal(w=3 theta=est)

odstitle = title;

inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /

pos = ne

header = 'Summary Statistics';

run;

I would like to know: is this code fitting a two-parameter lognormal distribution?

If it is, what is theta=est used for?

The data I used start from -0.100, but the threshold in the result is -772.2.

Why is the threshold that value? (I am using Base SAS 9.4.)

From http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...

It says:

Suggest you look at the rest of the UNIVARIATE documentation in detail. The Overview starts here:

http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...

cynthia

I specified theta=est, but the result is a two-parameter lognormal distribution, not a three-parameter one.

Your statement is not correct. When you specify THETA=EST, you get a three-parameter fit.

The simple form of your call is

proc univariate data=uy2013;

var avg_claim;

histogram / lognormal(theta=est); /* fit three-parameter lognormal distrib */

run;

If you want a two-parameter fit, specify a lower bound for the threshold parameter, or accept the default, which is THETA=0:

proc univariate data=uy2013;

var avg_claim;

histogram / lognormal(theta=1); /* sets THETA=1 as threshold parameter (lower bound) */

run;

Thank you so much, it helps a lot.

But may I ask you for more information?

I ran the statement as you suggested; please find my distribution output attached.

title 'Lognormal dist.';

ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;

proc univariate data=ec.uy2013;

var root_avg_jt;

where root_avg_jt ge 55 and root_avg_jt le 496;

histogram / lognormal(w=3 threshold=46.7)

odstitle = title;

inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /

pos = ne

header = 'Summary Statistics';

run;

I am just curious:

1. I want to make sure that the output is a two-parameter lognormal fit (not three-parameter).

2. How do we choose the lower bound (theta)? Should I start by considering the data range (the minimum value)?

Yes, your model is two-parameter when you specify the THRESHOLD= value.

The threshold value comes from using domain knowledge of the data. For example, the lognormal and Weibull distributions are often used to model time-to-failure for some component. The time must always be positive, so threshold=0 for that application. Most two-parameter families implicitly assume that the threshold is zero.
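For reference, here is a sketch of the lognormal density in the parameterization that, as far as I understand it, PROC UNIVARIATE uses: threshold θ (THETA=), scale ζ (ZETA=), and shape σ (SIGMA=). Please check the symbols against the HISTOGRAM statement documentation before relying on them.

```latex
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}\,(x-\theta)}
\exp\!\left(-\frac{\bigl(\log(x-\theta)-\zeta\bigr)^{2}}{2\sigma^{2}}\right),
\qquad x > \theta
```

When θ is fixed in advance (for example θ = 0), only ζ and σ are estimated, which gives the two-parameter fit. With THETA=EST, all three are estimated. Because the density is defined only for x > θ, a fixed threshold has to lie below the smallest observation.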

Thank you so much for your kindness.

It helps a lot.

I am following up on my questions on the SAS Communities about two- versus three-parameter distributions.

As I'm still not sure about the output, could you please clarify the following?

The data I am fitting are loss (claim) data with range [50, 524] after a square-root transformation.

Because I don't know how to set the threshold, I ran SAS with "threshold = est" and got a threshold value of 46.9.

I then specified that threshold value of 46.9 in the HISTOGRAM statement. I think I got the two-parameter lognormal distribution (with p-value 0.017), as suggested.

However, could you please confirm whether this model is valid?

Additionally, I am not sure about the next step, simulation. Please advise how to simulate from a two-parameter distribution with a specified threshold, as in this case.

Technically, you have fit a three-parameter distribution because the threshold value you are using was itself estimated from the data. A proper two-parameter family would use a threshold that is based on domain-specific knowledge of the population distribution, not on a sample.

However, I don't understand why you are worrying about this subtle aspect of the problem. If your goal is to simulate from a two-parameter lognormal distribution and thereby generate many samples that look like the observed data, then what you have done is perfectly fine.

For simulation, see the article "Simulate lognormal data in SAS."
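As a minimal sketch of the idea (not taken from the article), a lognormal variate with threshold θ can be generated as θ + exp(Y), where Y is normal with mean ζ and standard deviation σ. In the code below, THETA is the 46.9 mentioned in this thread. The ZETA and SIGMA values are hypothetical placeholders that you would replace with the estimates from your own Lognormal.ParameterEstimates table. The data set name sim_lognormal, the seed, and the sample size are also assumptions.

```sas
/* THETA is the threshold quoted in the thread.                      */
/* ZETA and SIGMA are placeholders (assumptions): substitute the     */
/* scale and shape estimates from your own PROC UNIVARIATE output.   */
%let theta = 46.9;
%let zeta  = 4.5;
%let sigma = 0.6;

data sim_lognormal;
   call streaminit(12345);          /* fixed seed for reproducibility */
   do i = 1 to 1000;
      /* Y ~ Normal(zeta, sigma); theta + exp(Y) is lognormal with threshold theta */
      x = &theta + exp(rand("Normal", &zeta, &sigma));
      output;
   end;
   drop i;
run;

/* Sanity check: refit with the threshold held fixed */
proc univariate data=sim_lognormal;
   var x;
   histogram / lognormal(theta=&theta);
run;
```

The refit should return ZETA and SIGMA estimates close to the values used in the simulation. If the model was fit to square-root-transformed data, as described above, the simulated values are on that transformed scale, and squaring them would presumably return them to the original claim scale.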
