SColby
Calcite | Level 5

I am trying to fit a lognormal distribution to data that has weights. For instance, a particular observation may represent 2000 households while another one may represent 500 households. The data set contains a variable, "weight," that represents each observation's proportional weighting.

PROC UNIVARIATE does not allow the WEIGHT statement to be used with any of the statements that fit distributions. I need the parameter estimates as well as the test statistics and accompanying p-values.

What is the best way to fit distributions to weighted data?

Thanks,

Scott

5 REPLIES
Ksharp
Super User

But PROC UNIVARIATE has a WEIGHT statement.

Ksharp

SColby
Calcite | Level 5

The Weight statement cannot be used in conjunction with the histogram statement.

Scott

SteveDenham
Jade | Level 19

Do you have access to PROC CAPABILITY? The following code was generated by SAS Enterprise Guide 4.3 (EG4.3). I used anml_no as a weight, and it produced a pretty nice-looking lognormal fit this way. The output included goodness-of-fit tests for a lognormal distribution.

PROC CAPABILITY DATA = WORK.SORTTempTableSorted
      CIBASIC(TYPE=TWOSIDED ALPHA=0.05)   /* two-sided CIs for basic statistics */
      MU0=0;                              /* null value for the location tests */
   WEIGHT anml_no;                        /* weight variable */
   VAR IgG;
   /* Lognormal fit with estimated threshold (THETA), scale (ZETA), and shape (SIGMA) */
   HISTOGRAM IgG / LOGNORMAL(W=1 L=1 COLOR=LIME ZETA=EST THETA=EST SIGMA=EST)
      CAXIS=PURPLE
      CTEXT=BLACK
      CFRAME=WHITE
      CBARLINE=BLACK
      CFILL=GRAY;
   /* End of task code. */
RUN; QUIT;

Good luck.

Steve Denham

Message was edited by: Steve Denham

Unfortunately, after digging a little deeper, I don't think the goodness-of-fit tests are weighted. I get exactly the same values with and without the WEIGHT statement. Only the basic statistics and tests for location appear to actually use the weights.

SColby
Calcite | Level 5

I discovered the same thing running Proc Capability on my data: the parameter estimates are identical whether or not I include a Weight statement.  This is the case for lognormal, Weibull and gamma fits.

Thanks for the suggestion, though. It surprises me that such a simple and commonly needed operation isn't built directly into these procedures.

Not sure where to go from here.

Scott
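If the weights are read as frequency weights (each record stands for w_i identical households), the two-parameter lognormal (threshold fixed at 0) has closed-form maximum likelihood estimates on the log scale: ZETA is the weighted mean of log(x), and SIGMA squared is the weighted mean squared deviation about it. The SAS sketch below assumes that interpretation; the data set WORK.HH, the variables LNX and INCOME, and the toy values are all hypothetical. PROC MEANS with VARDEF=WGT divides by the sum of the weights, which gives the MLE rather than an n-1 version.

```sas
/* Hypothetical toy data: LNX is log(income), WEIGHT is the number of
   households each record represents. Reading log values keeps the
   expected answer easy to check by hand. */
data work.hh;
   input lnx weight;
   income = exp(lnx);          /* analysis variable on the original scale */
   datalines;
0 1
2 1
4 2
;
run;

/* With real data you would start here, from INCOME. */
data work.hh_log;
   set work.hh;
   logincome = log(income);    /* lognormal fit works on the log scale */
run;

/* VARDEF=WGT: variance divisor is sum(weight), i.e. the weighted MLE. */
proc means data=work.hh_log vardef=wgt noprint;
   weight weight;
   var logincome;
   output out=work.lognormal_est mean=zeta var=sigma2;
run;

data work.lognormal_est;
   set work.lognormal_est;
   sigma = sqrt(sigma2);       /* lognormal shape parameter */
run;

proc print data=work.lognormal_est noobs;
   var zeta sigma;
run;
```

For these toy values ZETA is 10/4 = 2.5 and SIGMA squared is 11/4 = 2.75. This gives point estimates only; it does not produce the EDF goodness-of-fit tests, and it is only meaningful if the frequency-weight interpretation matches how the weights were constructed.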

Rick_SAS
SAS Super FREQ

I gave a presentation on this topic a few years ago at the Joint Statistical Meetings, and I disagree that this is a simple operation. The two major difficulties are (1) weighted histograms are not simple to define and understand, and (2) a weighted fit is not well-defined.

The WEIGHT statement in UNIVARIATE has the following meaning (from the doc):

The UNIVARIATE procedure uses the values of the WEIGHT variable to modify the computation of a number of summary statistics by assuming that the variance of the $i$th value of the analysis variable is equal to $\sigma^2/w_i$, where $\sigma^2$ is an unknown parameter.

The implication is that each observation comes from a different distribution! You can't put a single "fitted curve" on top of a histogram because no such curve exists. Even if you solve for the common variance, $\sigma^2$, that doesn't do you much good, because you can't use it to overlay a density estimate or to do a goodness-of-fit (GOF) test.
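In symbols, the documented model and the weighted statistics UNIVARIATE reports under the default VARDEF=DF are summarized below, with $w_i$ the WEIGHT value of observation $i$ and $n$ the number of observations. This is a restatement of the documented formulas, not a fitting method.

```latex
\operatorname{Var}(x_i) = \frac{\sigma^2}{w_i}, \qquad
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} w_i \left( x_i - \bar{x}_w \right)^2
```

Here $s^2$ estimates $\sigma^2$, the variance of a hypothetical observation with unit weight. Every $x_i$ with a different $w_i$ has a different variance, which is why no single density curve describes the whole sample.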

The problem of constructing a weighted statistical graphic is still an area of research. I'll give you the same challenge I gave the statisticians at JSM: find a paper (in a reputable journal) in which weighted histograms are defined and weighted fits are described. Send that paper to me at SAS and I'll pass it on to the UNIVARIATE developer.

As to where you should go from here: if the weights can be interpreted as counts, then perhaps you can use them in a FREQ statement (for example, if the weights are selection probabilities, define FREQ = 1/WEIGHT). Histograms, fits, GOF tests, etc., are well-defined for count data. It sounds like this might be survey data; if so, you should use the SURVEY* procs (SURVEYMEANS, SURVEYFREQ, etc.) to analyze it. These procs make the correct adjustments when computing variances.
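A minimal SAS sketch of both routes, assuming weights that count households as in the original question. WORK.HH, INCOME, HH_COUNT and the toy values are hypothetical. The FREQ statement expects integers (SAS truncates non-integer values and drops observations with a frequency below 1), so the sketch rounds explicitly; if your weights are selection probabilities, use round(1/weight) instead.

```sas
/* Hypothetical toy data: INCOME is the analysis variable, WEIGHT is the
   number of households each record represents. */
data work.hh;
   input income weight;
   datalines;
21000 2000
35500  500
48200 1500
27300  800
61900  300
33100 1200
42800  900
75400  150
;
run;

/* Convert weights to integer counts for the FREQ statement. */
data work.hh_freq;
   set work.hh;
   hh_count = round(weight);
run;

/* FREQ, unlike WEIGHT, is used by HISTOGRAM, so the fitted lognormal
   and its EDF goodness-of-fit tests are based on the counts. */
ods output ParameterEstimates=work.pe GoodnessOfFit=work.gof;
proc univariate data=work.hh_freq;
   freq hh_count;
   var income;
   histogram income / lognormal(theta=0);
run;

/* Design-based estimate of the mean, treating WEIGHT as a sampling weight. */
proc surveymeans data=work.hh mean;
   weight weight;
   var income;
run;
```

One design caveat: FREQ treats the data as sum(hh_count) independent observations (7,350 here), so the GOF tests will have very high power and will reject small departures from lognormality that would not matter in practice.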


Discussion stats
  • 5 replies
  • 1804 views
  • 0 likes
  • 4 in conversation