Programming the statistical procedures from SAS

Distribution Fitting with Weights

Occasional Contributor
Posts: 18

Distribution Fitting with Weights

I am trying to fit a lognormal distribution to weighted data.  For instance, one observation may represent 2000 households while another may represent 500 households.  The data set contains a variable, "weight," that holds each observation's proportional weighting.

Proc Univariate does not allow the Weight statement to be used together with any of the statements that do distribution fitting.  I need the parameter estimates as well as the test statistics and accompanying p-values.

What is the best way to fit distributions to weighted data?

Thanks,

Scott

Super User
Posts: 9,782

Distribution Fitting with Weights

But PROC UNIVARIATE has a WEIGHT statement.

Ksharp

Occasional Contributor
Posts: 18

Re: Distribution Fitting with Weights

The Weight statement cannot be used in conjunction with the histogram statement.

Scott

Respected Advisor
Posts: 2,655

Re: Distribution Fitting with Weights

Do you have access to PROC CAPABILITY? The following code was generated by EG4.3.  I used anml_no as a weight, and it generated a pretty nice looking lognormal dataset this way.  The output had the goodness-of-fit tests for a lognormal distribution.

PROC CAPABILITY DATA=WORK.SORTTempTableSorted
    CIBASIC(TYPE=TWOSIDED ALPHA=0.05)
    MU0=0;
    WEIGHT anml_no;
    VAR IgG;
    HISTOGRAM IgG / LOGNORMAL(W=1 L=1 COLOR=LIME ZETA=EST THETA=EST SIGMA=EST)
        CAXIS=PURPLE
        CTEXT=BLACK
        CFRAME=WHITE
        CBARLINE=BLACK
        CFILL=GRAY;
/* End of task code. */
RUN; QUIT;

Good luck.

Steve Denham

Message was edited by: Steve Denham

Unfortunately, after digging a little deeper, I don't think the goodness-of-fit tests are weighted.  I get exactly the same values with and without the WEIGHT statement.  Only the basic stats and tests for location appear to actually use the weights.

Occasional Contributor
Posts: 18

Re: Distribution Fitting with Weights

I discovered the same thing running Proc Capability on my data: the parameter estimates are identical whether or not I include a Weight statement.  This is the case for lognormal, Weibull and gamma fits.

Thanks for the suggestion though.  It surprises me that such a simple and commonly needed operation isn't built directly into these procedures.

Not sure where to go from here.

Scott

SAS Super FREQ
Posts: 3,559

Re: Distribution Fitting with Weights

I gave a presentation on this topic a few years ago at the Joint Statistical Meetings, and I disagree that this is a simple operation. The two major difficulties are (1) weighted histograms are not simple to define and understand, and (2) a weighted fit is not well-defined.

The WEIGHT statement in UNIVARIATE has the following meaning (from the doc):

The UNIVARIATE procedure uses the values of the WEIGHT variable to modify the computation of a number of summary statistics by assuming that the variance of the $i$th value $x_i$ of the analysis variable is equal to $\sigma^2 / w_i$, where $\sigma$ is an unknown parameter.

The implication of this is that each observation is from a different distribution! You can't put a single "fitted curve" on top of a histogram because no such curve exists. Even if you solve for the common variance, $\sigma^2$, that doesn't do you much good because you can't use it to overlay a density estimate or to do a GOF test.
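To make the point concrete, here is the weighting model written out, with $w_i$ the weight of observation $x_i$ and $\sigma$ the unknown parameter. The weighted mean and variance shown are the usual weighted summary statistics, and the $n-1$ divisor is an assumption (it corresponds to the default VARDEF=DF setting):

```latex
\[
\operatorname{Var}(x_i) = \frac{\sigma^2}{w_i},
\qquad
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s_w^2 = \frac{1}{n-1} \sum_{i=1}^{n} w_i \left( x_i - \bar{x}_w \right)^2
\]
```

For the household example in the question, an observation with weight 2000 is assumed to have variance $\sigma^2/2000$ and one with weight 500 has variance $\sigma^2/500$, four times larger, so the two are not draws from one common distribution.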

The problem of constructing a weighted statistical graphic is still an area of research. I'll give you the same challenge I gave the statisticians at JSM: find a paper (in a reputable journal) in which weighted histograms are defined and weighted fits are described. Send that paper to me at SAS and I'll pass it on to the UNIVARIATE developer.

As to where you should go from here, if the weights are inverse probabilities, then perhaps you can define FREQ = 1/WEIGHT and use a FREQ statement. Histograms, fits, GOF tests, etc., are well-defined for count data.  It sounds like this might be survey data, and if so you should use the SURVEY* procs (SURVEYMEANS, SURVEYFREQ, etc.) to analyze it. These procs make the correct adjustments when computing variances.
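A minimal sketch of the FREQ-statement route, not a definitive recipe. The data set name work.households, the analysis variable income, and the derived variable freq_count are hypothetical. Taking the count as the rounded weight is also an assumption: it matches the question's description of one observation standing for 2000 households, so substitute whatever conversion suits how your weights were constructed.

```sas
/* Hypothetical names: work.households, income, freq_count. */
data work.households_freq;
   set work.households;
   /* Assumption: the weight is already a count of households.        */
   /* FREQ uses only the integer part of a value and skips values < 1, */
   /* so round explicitly rather than relying on truncation.           */
   freq_count = round(weight);
run;

proc univariate data=work.households_freq;
   var income;
   freq freq_count;               /* each row counts as freq_count observations */
   histogram income / lognormal;  /* threshold defaults to 0; scale and shape are estimated */
run;
```

Because FREQ treats each row as freq_count identical observations, the sample size behind the goodness-of-fit tests becomes the sum of the counts. With counts in the hundreds or thousands, expect those tests to reject almost any fitted distribution, so the p-values deserve caution.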
