proc univariate
04-17-2017 09:44 PM

I am working on fitting a distribution to my data, and I am confused about the code.

I found an example that creates a histogram to display a lognormal fit, and I used the following code:

```sas
title 'Lognormal dist. ';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(w=3 theta=est)
               odstitle = title;
   inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
         pos = ne
         header = 'Summary Statistics';
run;
```

I would like to know: is this code fitting a two-parameter lognormal distribution?

If it is, what is THETA=EST used for?

My data start from -0.100, but the threshold in the result is -772.2.

Why is the threshold that value? (I am using Base SAS 9.4.)

Accepted Solutions

Solution

04-19-2017
10:11 PM


Posted in reply to Peaw

04-18-2017 10:24 AM

Your statement is not correct. When you specify THETA=EST, you get a three-parameter fit.

The simple form of your call is

```sas
proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(theta=est);   /* fit three-parameter lognormal distribution */
run;
```

If you want a two-parameter fit, specify a lower bound for the threshold parameter, or accept the default, which is THETA=0:

```sas
proc univariate data=uy2013;
   var avg_claim;
   histogram / lognormal(theta=1);   /* sets THETA=1 as threshold parameter (lower bound) */
run;
```
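For reference, here is the lognormal density written with a threshold parameter. The symbols (threshold θ, scale ζ, shape σ) follow the notation used for the LOGNORMAL option in the PROC UNIVARIATE documentation, as far as I recall it; check the "Formulas for Fitted Continuous Distributions" section for your release.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}\,(x-\theta)}
       \exp\!\left(-\frac{\bigl(\ln(x-\theta)-\zeta\bigr)^{2}}{2\sigma^{2}}\right),
       \qquad x > \theta
```

With THETA=EST, all three of θ, ζ and σ are estimated from the data (a three-parameter fit). With THETA=value, or the default θ = 0, only ζ and σ are estimated (a two-parameter fit). Because the density is defined only for x > θ, an estimated threshold always lies below the smallest observation, which is consistent with an estimate of -772.2 for data that start at -0.100.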

All Replies


Posted in reply to Peaw

04-17-2017 10:01 PM - edited 04-17-2017 10:03 PM

The THETA=EST option requests that the maximum likelihood estimate of theta be used as the threshold.


Posted in reply to Peaw

04-17-2017 10:06 PM - edited 04-17-2017 10:07 PM

Hi:

From http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...

It says:

I suggest you look at the rest of the UNIVARIATE documentation in detail. The Overview starts here:

http://support.sas.com/documentation/cdl/en/procstat/70116/HTML/default/viewer.htm#procstat_univaria...

cynthia


Posted in reply to Cynthia_sas

04-18-2017 12:00 AM

Why does this example specify **theta=est** when the result is a two-parameter lognormal distribution, not a three-parameter one?


Posted in reply to Rick_SAS

04-18-2017 11:13 PM

Thank you so much, it helps a lot.

But may I ask you for more information?

I ran the statements as you suggested; my distribution output is attached, and this is the code I used:

```sas
title 'Lognormal dist.';
ods select Histogram Lognormal.ParameterEstimates Lognormal.GoodnessOfFit FitQuantiles;
proc univariate data=ec.uy2013;
   var root_avg_jt;
   where root_avg_jt ge 55 and root_avg_jt le 496;
   histogram / lognormal(w=3 threshold=46.7)
               odstitle = title;
   inset n mean (5.3) std='Std Dev' (5.3) skewness (5.3) /
         pos = ne
         header = 'Summary Statistics';
run;
```

I am just curious:

1. I want to make sure that the output is a two-parameter lognormal distribution (not a three-parameter one).

2. How do we choose the lower bound (theta)? Should I start by considering the data range (the minimum value)?


Posted in reply to Peaw

04-19-2017 05:52 AM

Yes, your model is two-parameter when you specify the THRESHOLD= value.

The threshold value comes from using domain knowledge of the data. For example, the lognormal and Weibull distributions are often used to model time-to-failure for some component. The time must always be positive, so threshold=0 for that application. Most two-parameter families implicitly assume that the threshold is zero.


Posted in reply to Rick_SAS

04-20-2017 03:57 AM

Thank you so much for your kindness.

It helps a lot.


Posted in reply to Rick_SAS

05-01-2017 10:39 PM

This follows up on my earlier questions here about two- versus three-parameter distributions.

As I'm still not sure about the output, could you please clarify the following?

The data that I used for fitting the distribution are loss (claim) data with range [50, 524] (after a square-root transformation).

Because I don't know how to set the threshold, I ran SAS with "threshold = est" and got a threshold value of 46.9.

I then specified that threshold value, 46.9, in the HISTOGRAM statement. I think I got the two-parameter lognormal distribution (with p-value 0.017), as you suggested.

Could you please confirm whether this model is valid?

Additionally, I am not sure about the next step, simulation. Please advise how to simulate the two-parameter distribution with a specified threshold, as in this case.


Posted in reply to Peaw

05-02-2017 07:53 AM - edited 05-10-2017 03:37 PM

Technically you have fit a three-parameter distribution, because you are using a threshold parameter that was estimated from the data. A proper two-parameter family would use a threshold parameter that is based on domain-specific knowledge of the population distribution, not on a sample.

However, I don't understand why you are worrying about this subtle aspect of the problem. If your goal is to simulate from a two-parameter lognormal distribution and thereby generate many samples that look like the observed data, then what you have done is perfectly fine.

For simulation, see the article "Simulate lognormal data in SAS."
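As a minimal sketch (not the code from that article), a DATA step can simulate from a lognormal with a fixed threshold by using the fact that if log(X - θ) is normal with mean ζ and standard deviation σ, then X = θ + exp(N(ζ, σ)). The threshold 46.9 comes from this thread; the ZETA and SIGMA values, the sample size, the seed, and the data set name SimLognormal are hypothetical placeholders. Replace ZETA and SIGMA with the scale and shape estimates in your Lognormal.ParameterEstimates table.

```sas
/* Sketch: simulate a lognormal sample with a fixed threshold.          */
/* THETA is from the thread; ZETA, SIGMA and N are hypothetical values. */
%let theta = 46.9;   /* threshold (lower bound), held fixed               */
%let zeta  = 5.0;    /* hypothetical scale: mean of log(x - theta)        */
%let sigma = 0.5;    /* hypothetical shape: std dev of log(x - theta)     */
%let N     = 1000;   /* hypothetical sample size                          */

data SimLognormal;
   call streaminit(12345);              /* fixed seed for reproducibility */
   do i = 1 to &N;
      /* log(x - theta) ~ N(zeta, sigma), so x = theta + exp(normal draw) */
      x = &theta + exp(rand('Normal', &zeta, &sigma));
      output;
   end;
   drop i;
run;

/* Refit with the same fixed threshold; the scale and shape estimates */
/* should come out close to ZETA and SIGMA for a large sample.        */
proc univariate data=SimLognormal;
   var x;
   histogram x / lognormal(theta=&theta);
run;
```

To generate many samples that look like the observed data, you could wrap the DO loop in an outer loop over a sample ID and set N to the number of observations in your data.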