NormanNorden
Calcite | Level 5

Dear SAS-experts,

I have an empirical distribution function: a set of income classes and the number of households in them,

for example 30% under 1000 USD, 40% under 2000 USD, etc.

I'd like to fit a lognormal distribution to my data and get its parameters.

If I had raw data at the household level, I'd use proc univariate with the histogram option.

But I am at a loss in my situation, where my data are already cumulated.

Accepted Solutions
JacobSimonsen
Barite | Level 11

You can treat your data as interval-censored. Each interval needs a lower bound and an upper bound, except that the upper bound should be missing for the highest interval and the lower bound should be missing for the lowest interval (zero is not allowed).

Interval-censored data can be fitted with proc lifereg by giving both bounds in the model statement. The percentage of observations within each interval should be used as the weight.
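To make the idea concrete, here is a sketch of the weighted log-likelihood that such a fit maximizes. The notation is my own and not from the post: l_j and u_j are the bounds of class j, w_j is its weight, Φ is the standard normal CDF, and μ and σ are the mean and standard deviation of log(income).

```latex
\ell(\mu,\sigma) \;=\; \sum_{j} w_j \,
  \log\!\left[
    \Phi\!\left(\frac{\log u_j - \mu}{\sigma}\right)
    - \Phi\!\left(\frac{\log l_j - \mu}{\sigma}\right)
  \right]
% Open-ended classes:
%   u_j missing (highest class): the first \Phi term is replaced by 1
%   l_j missing (lowest class):  the second \Phi term is replaced by 0
```

Each class contributes the log of the probability mass the lognormal puts on its interval, multiplied by its weight.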

I made an example where I first simulate data from a lognormal distribution and then find the frequency of observations within specified intervals. Then I estimate the parameters with lifereg, and I get almost the same parameters as those I used in the simulation. The intercept estimate is the estimated mean of log(income), and the scale is the standard deviation of log(income).

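/* Top-coding threshold and class width used in the simulation */
%let highestvalue=1000;
%let invervallength=5;

/* Simulate lognormal incomes: log(y) ~ Normal(mean=5, sd=2) */
data test;
  do i=1 to 100000;
    y=exp(rand('normal',5,2));
    logy=log(y);
    /* lower bound of the class containing y, top-coded at &highestvalue */
    lower=min(&highestvalue,&invervallength*int(y/&invervallength));
    output;
  end;
  keep lower logy y;
run;

/* Percentage of observations in each class */
ods select none;
ods output onewayfreqs=freqs(keep=lower percent);
proc freq data=test;
  table lower/missing;
run;
ods select all;

/* Build the interval bounds; a missing bound marks an open-ended class */
data freqs;
  set freqs;
  upper=lower+&invervallength.;
  /* lowest class: no lower bound (left-censored) */
  if lower=0 then lower=.;
  /* highest class: no upper bound (right-censored) */
  if lower>=&highestvalue then do; upper=.; lower=&highestvalue.; end;
run;

/* Fit a lognormal to the interval-censored classes, weighted by percentage */
proc lifereg data=freqs;
  model (lower upper)=/distribution=lnormal;
  weight percent;
run;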
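For data that are already cumulated, as in the question, the classes first have to be turned into intervals with per-class shares. Below is a minimal sketch of that step. The dataset and variable names (cumulated, classes, bound, cumpct, pct) are hypothetical, and only the first two data rows come from the question; the other rows are made up for illustration.

```sas
/* Hypothetical input: upper class boundary and cumulative percent below it.
   Only the first two rows are from the question; the rest are invented. */
data cumulated;
  input bound cumpct;
  datalines;
1000 30
2000 40
3000 65
5000 90
;
run;

/* Convert cumulative shares into one row per class with interval bounds */
data classes;
  set cumulated end=last;
  lower   = lag(bound);    /* previous boundary; missing on the first row */
  prevpct = lag(cumpct);   /* previous cumulative share; missing on the first row */
  upper   = bound;
  pct     = cumpct - coalesce(prevpct, 0);   /* share inside this class */
  output;
  /* Open-ended top class holding whatever share is left above the last boundary */
  if last and cumpct < 100 then do;
    lower = bound;
    upper = .;
    pct   = 100 - cumpct;
    output;
  end;
  keep lower upper pct;
run;

/* Same interval-censored lognormal fit as above, weighted by class share */
proc lifereg data=classes;
  model (lower, upper) = / distribution=lnormal;
  weight pct;
run;
```

If household counts per class are available, they could be used as the weight instead of percentages. The point estimates should be the same, because rescaling all weights only rescales the log-likelihood, but the reported standard errors depend on the total weight.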


NormanNorden
Calcite | Level 5

Hi Jacob, that's terrific, thanks a lot !
