Solved
New Contributor
Posts: 2

fit distribution to edf

Dear SAS-experts,

I have an empirical distribution function: a few income classes and the share of households in each,

like 30% under 1000 USD, 40% under 2000 USD etc.

I'd like to fit a lognormal distribution to my data and get its parameters.

If I had raw data at the household level, I'd use proc univariate with the histogram statement.

But I am at a loss in my situation where my data is cumulated already.

Accepted Solutions
Solution
‎09-13-2014 06:15 AM
Super Contributor
Posts: 301

Re: fit distribution to edf

You can treat your data as interval-censored. Each interval needs a lower bound and an upper bound, except that the upper bound should be missing for the highest interval and the lower bound should be missing for the lowest interval (zero is not allowed as a bound).
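As a rough sketch of that layout, the data step below turns cumulative shares into one row per interval. Only the first two rows (30% under 1000 USD, 40% under 2000 USD) come from the question; the remaining cut-offs and the names cumshares, limit, cumpct and intervals are made up for illustration.

```sas
/* Hypothetical input: upper class limit and cumulative percent of households. */
/* A missing limit marks the open-ended top class.                             */
data cumshares;
   input limit cumpct;
   datalines;
1000 30
2000 40
4000 70
8000 95
. 100
;
run;

/* One row per interval: share = difference of successive cumulative percents. */
data intervals;
   set cumshares;
   lower   = lag(limit);                          /* missing on the first row: left-censored */
   percent = cumpct - coalesce(lag(cumpct), 0);   /* share of households in this class       */
   upper   = limit;                               /* missing on the last row: right-censored */
   keep lower upper percent;
run;
```

With these illustrative numbers the first row has a missing lower bound and upper=1000, and the last row has lower=8000 and a missing upper bound, which matches the convention described above.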

Interval-censored data can be fitted with proc lifereg by giving both bounds in the model statement, which tells the procedure that the data are interval-censored. Use the percentage of observations in each interval as the weight.
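In likelihood terms, this setup can be sketched as the weighted interval-censored log-likelihood below. The notation is introduced here for illustration only: $w_j$ is the percent of observations in class $j$, $a_j$ and $b_j$ are its lower and upper bounds, $\Phi$ is the standard normal CDF, and $\mu$ and $\sigma$ are the mean and standard deviation of log(income).

```latex
\ell(\mu,\sigma) \;=\; \sum_{j} w_j \,
  \log\!\left[
    \Phi\!\left(\frac{\log b_j - \mu}{\sigma}\right)
    - \Phi\!\left(\frac{\log a_j - \mu}{\sigma}\right)
  \right]
```

For the lowest class the second term is taken as 0 (missing lower bound), and for the highest class the first term is taken as 1 (missing upper bound).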

I made an example where I first simulate data from a lognormal distribution and then compute the frequency of observations within specified intervals. Then I estimate the parameters with lifereg and get almost the same parameters as those used in the simulation. The intercept estimate is the estimated mean of log(income), and the scale is the standard deviation of log(income).

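%let highestvalue=1000;   /* start of the open-ended top interval */
%let intervallength=5;    /* width of each income class */

/* Simulate lognormal incomes: mean 5 and standard deviation 2 on the log scale. */
data test;
   do i=1 to 100000;
      y=exp(rand('normal',5,2));
      logy=log(y);
      /* lower edge of the class containing y, capped at &highestvalue */
      lower=min(&highestvalue,&intervallength*int(y/&intervallength));
      output;
   end;
   keep lower logy y;
run;

/* Percent of observations in each class, captured from proc freq. */
ods select none;
ods output onewayfreqs=freqs(keep=lower percent);
proc freq data=test;
   table lower/missing;
run;
ods select all;

/* Add upper bounds; a missing bound marks an open-ended class. */
data freqs;
   set freqs;
   upper=lower+&intervallength.;
   if lower=0 then lower=.;            /* lowest class: left-censored */
   if lower>=&highestvalue then do;    /* highest class: right-censored */
      upper=.;
      lower=&highestvalue.;
   end;
run;

/* Fit a lognormal to the interval-censored classes, weighted by percent. */
proc lifereg data=freqs;
   model (lower upper)=/distribution=lnormal;
   weight percent;
run;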
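To turn the estimates into quantities on the income scale, something like the following could be appended. It is only a sketch: it assumes the lifereg ODS table is named ParameterEstimates and has Parameter and Estimate columns with rows labelled Intercept and Scale, and the data set names est and summary are made up here.

```sas
/* Re-run the fit and capture the parameter table (table name assumed). */
ods output ParameterEstimates=est;
proc lifereg data=freqs;
   model (lower upper)= / distribution=lnormal;
   weight percent;
run;

/* Collect mu and sigma, then derive the lognormal median and mean. */
data summary;
   set est end=last;
   retain mu sigma;
   if Parameter='Intercept' then mu=Estimate;      /* mean of log(income)    */
   if Parameter='Scale'     then sigma=Estimate;   /* std dev of log(income) */
   if last then do;
      median=exp(mu);                /* lognormal median */
      mean=exp(mu + sigma**2/2);     /* lognormal mean   */
      output;
   end;
   keep mu sigma median mean;
run;

proc print data=summary;
run;
```

For reference, the simulation values of 5 and 2 correspond to a median of exp(5), about 148.4, and a mean of exp(7), about 1096.6, so the fitted values should land close to those.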
