Distribution Fitting with Weights

08-18-2011 07:39 PM

I am trying to fit a lognormal distribution to data that has weights. For instance, one observation may represent 2000 households while another may represent 500 households. The data set contains a variable, "weight," that holds each observation's proportional weighting.

PROC UNIVARIATE does not allow the WEIGHT statement to be used together with any of the statements that do distribution fitting. I need the parameter estimates as well as the test statistics and accompanying p-values.

What is the best way to fit distributions to weighted data?

Thanks,

Scott

Posted in reply to SColby

08-19-2011 04:29 AM

But PROC UNIVARIATE has a WEIGHT statement.

Ksharp

Posted in reply to Ksharp

08-19-2011 01:16 PM

The WEIGHT statement cannot be used in conjunction with the HISTOGRAM statement.

Scott

Posted in reply to SColby

08-19-2011 06:38 AM

Do you have access to PROC CAPABILITY? The following code was generated by EG4.3. I used anml_no as a weight, and it generated a pretty nice looking lognormal dataset this way. The output had the goodness-of-fit tests for a lognormal distribution.

PROC CAPABILITY DATA=WORK.SORTTempTableSorted
      CIBASIC(TYPE=TWOSIDED ALPHA=0.05)
      MU0=0;
   /* anml_no is used as the weight variable */
   WEIGHT anml_no;
   VAR IgG;
   /* Overlay a three-parameter lognormal fit, estimating all parameters */
   HISTOGRAM IgG / LOGNORMAL(W=1 L=1 COLOR=LIME ZETA=EST THETA=EST SIGMA=EST)
      CAXIS=PURPLE
      CTEXT=BLACK
      CFRAME=WHITE
      CBARLINE=BLACK
      CFILL=GRAY;
/* End of task code. */
RUN; QUIT;

Good luck.

Steve Denham

Message was edited by: Steve Denham

Unfortunately, after digging a little deeper, I don't think the goodness-of-fit tests are weighted. I get exactly the same values with and without the WEIGHT statement. Only the basic statistics and the tests for location appear to actually use the weights.

Posted in reply to SteveDenham

08-19-2011 06:38 PM

I discovered the same thing running PROC CAPABILITY on my data: the parameter estimates are identical whether or not I include a WEIGHT statement. This is the case for lognormal, Weibull, and gamma fits.

Thanks for the suggestion though. It surprises me that such a simple and commonly needed operation isn't built directly into these procedures.

Not sure where to go from here.

Scott

Posted in reply to SColby

08-21-2011 06:24 AM

I gave a presentation on this topic a few years ago at the Joint Statistical Meetings, and I disagree that this is a simple operation. The two major difficulties are (1) weighted histograms are not simple to define and understand, and (2) a weighted fit is not well-defined.

The WEIGHT statement in UNIVARIATE has the following meaning (from the doc):

The UNIVARIATE procedure uses the values $w_i$ of the WEIGHT variable to modify the computation of a number of summary statistics by assuming that the variance of the $i$th value $x_i$ of the analysis variable is equal to $\sigma^2 / w_i$, where $\sigma$ is an unknown parameter.

The implication of this is that *each* observation is from a *different* distribution! You can't put a single "fitted curve" on top of a histogram because no such curve exists. Even if you solve for the common variance, $\sigma^2$, that doesn't do you much good because you can't use it to overlay a density estimate or to do a GOF test.
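
To make the variance assumption concrete, here is a sketch of the weighted summary statistics it leads to. The notation is an assumption for illustration: $x_i$ are the data values, $w_i$ the weights, $n$ the number of observations, and the divisor $n - 1$ corresponds to the default VARDEF=DF setting.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s_w^2 = \frac{1}{n-1} \sum_{i=1}^{n} w_i \left( x_i - \bar{x}_w \right)^2,
\qquad
\widehat{\operatorname{Var}}(x_i) = \frac{s_w^2}{w_i}
```

Here $s_w^2$ estimates the common parameter $\sigma^2$, but the estimated variance of each individual value still depends on its own weight $w_i$, which is why there is no single density to overlay.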

The problem of constructing a weighted statistical graphic is still an area of research. I'll give you the same challenge I gave the statisticians at JSM: find a paper (in a reputable journal) in which weighted histograms are defined and weighted fits are described. Send that paper to me at SAS and I'll pass it on to the UNIVARIATE developer.

As to where you should go from here: if the weights are inverse probabilities, then perhaps you can define FREQ = 1/WEIGHT and use a FREQ statement. Histograms, fits, GOF tests, etc., are well-defined for count data. It sounds like this might be survey data, and if so you should use the SURVEY* procs (SURVEYMEANS, SURVEYFREQ, etc.) to analyze it. These procs make the correct adjustments when computing the variance of variables.
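
A minimal sketch of the FREQ approach might look like the following. The data set WORK.HOUSEHOLDS and the variables income and n_households are hypothetical names, and the sketch assumes you have already derived n_households (the count each observation stands for) from your weights; how to do that depends on how the weights were constructed.

```sas
/* Sketch only: WORK.HOUSEHOLDS, income, and n_households are hypothetical. */
data work.households_freq;
   set work.households;
   /* FREQ values must be positive integers; UNIVARIATE truncates         */
   /* non-integers and drops values below 1, so round explicitly here.    */
   freq_n = round(n_households);
   if freq_n >= 1;
run;

proc univariate data=work.households_freq;
   var income;
   /* Each observation is treated as if it occurred freq_n times */
   freq freq_n;
   /* Lognormal fit with default settings; goodness-of-fit tests are      */
   /* reported along with the parameter estimates                         */
   histogram income / lognormal;
run;
```

Keep in mind that with a FREQ statement the procedure treats the sum of the frequencies as the sample size, so the goodness-of-fit p-values reflect that much larger n rather than the number of rows in the data set.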