fit distribution to edf

09-12-2014 10:24 AM

Dear SAS-experts,

I have an empirical distribution function: a couple of income classes and the number of households in each, for example 30% under 1000 USD, 40% under 2000 USD, etc.

I'd like to fit a lognormal distribution to my data and get its parameters.

If I had raw data at the household level, I'd use proc univariate with the histogram option.

But I am at a loss in my situation, where my data is already cumulated.

Accepted Solutions

09-13-2014 06:15 AM

You can treat your data as interval-censored data. Each interval needs a lower bound and an upper bound, except that the upper bound should be missing for the highest interval and the lower bound should be missing for the lowest interval (zero is not allowed, since the lognormal fit works with the log of the bounds).

Interval-censored data can be fitted with proc lifereg: giving two response variables in the model statement, as in model (lower upper), tells the procedure that the data are interval censored. The percent of observations within each interval should be used as the weight.

I made an example where I first simulate data from a lognormal distribution, then compute the frequency of observations within specified intervals. Then I estimate the parameters with lifereg, and I get almost the same parameters as those I used in the simulation (5 and 2). The intercept estimate is the estimated mean of log(income), and scale is the estimated standard deviation of log(income).
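To make the role of the bounds and the weight concrete, here is a sketch of the weighted log-likelihood that an interval-censored lognormal fit maximizes. The notation is mine, not from the lifereg output: class j has lower bound a_j, upper bound b_j and weight w_j, and \Phi is the standard normal distribution function.

```latex
\ell(\mu,\sigma) \;=\; \sum_{j} w_j \,
  \log\!\left[
    \Phi\!\left(\frac{\log b_j - \mu}{\sigma}\right)
    - \Phi\!\left(\frac{\log a_j - \mu}{\sigma}\right)
  \right]
```

For the lowest class (missing lower bound) the second term is taken as 0, and for the highest class (missing upper bound) the first term is taken as 1, which is why those bounds are left missing rather than set to zero or to some large number.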

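%let highestvalue=1000;    /* lower limit of the open-ended top class */
%let intervallength=5;     /* width of each income class */

/* Simulate lognormal incomes: log(y) ~ Normal(mean=5, sd=2) */
data test;
  do i=1 to 100000;
    y=exp(rand('normal',5,2));
    logy=log(y);
    /* lower limit of the class containing y, capped at the top class */
    lower=min(&highestvalue,&intervallength*int(y/&intervallength));
    output;
  end;
  keep lower logy y;
run;

/* Percent of observations in each class, captured as a data set */
ods select none;
ods output onewayfreqs=freqs(keep=lower percent);
proc freq data=test;
  table lower/missing;
run;
ods select all;

/* Build the interval bounds; a missing bound marks an open-ended class */
data freqs;
  set freqs;
  upper=lower+&intervallength.;
  if lower=0 then lower=.;   /* lowest class: left-censored at upper */
  /* top class: right-censored at the highest value */
  if lower>=&highestvalue then do; upper=.; lower=&highestvalue.; end;
run;

/* Fit the lognormal to the interval-censored classes, weighted by percent */
proc lifereg data=freqs;
  model (lower upper)=/distribution=lnormal;
  weight percent;
run;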
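For data that arrive already cumulated, as in the question, the class bounds and weights can be derived from the cumulative percentages before calling lifereg. This is a minimal sketch, assuming the percentages are cumulative shares of households below each limit. Only the first two rows (30% under 1000, 40% under 2000) come from the question; the remaining rows and the names cumulated, classes, cumpct and share are made up for illustration.

```sas
/* Hypothetical input: cumpct = percent of households below 'upper'.  */
/* The last row has a missing upper limit: the open-ended top class.  */
data cumulated;
  input upper cumpct;
  datalines;
1000 30
2000 40
4000 75
8000 95
. 100
;
run;

/* Turn cumulative percentages into one row per income class */
data classes;
  set cumulated;
  lower = lag(upper);     /* previous limit; missing for the first class */
  share = dif(cumpct);    /* percent of households inside this class */
  if _n_ = 1 then share = cumpct;   /* first class: everything below 'upper' */
  drop cumpct;
run;

/* Same interval-censored lognormal fit, weighted by the class share */
proc lifereg data=classes;
  model (lower, upper) = / distribution=lnormal;
  weight share;
run;
```

Because the weights are percentages rather than household counts, the point estimates are the quantities of interest here; the reported standard errors behave as if there were only about 100 observations. If the class counts are known, using them as weights instead should give more meaningful standard errors.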

All Replies


09-15-2014 02:55 AM

Hi Jacob, that's terrific, thanks a lot!