Weight Statement in Proc Logistic

Posted 06-08-2018 04:00 PM

Hello All,

I am working on a data set that has 800,000 records, of which only 80 are Events (Target = 1) and all the remaining records are non-events.

I do not want to oversample (that is, take all the event observations and match them with an equal number of non-events), because I would be left with just 160 records.

So I decided to use weighting instead: I weighted up all the events and weighted down all the non-events so that the weighted proportion of events to non-events is 50:50. The weight variable is called good_bad_wgt, and I used it in my logistic regression.

proc logistic data = dummies outest = est;

model Target (event = '1') = %goodvariables / selection = stepwise slstay = 0.05 slentry = 0.05;

weight good_bad_wgt;

run;

What I want to know is:

1. Are the resulting probabilities over-estimated?

2. If so, how do I adjust the probabilities?

If someone can help me better understand how the weight statement in Proc logistic works, I would really appreciate it.

Thanks.

---

Does not the documentation for the WEIGHT statement explain all of this?

--

Paige Miller

---

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

Is this correct?

Does it mean the probabilities obtained from proc logistic are the True probabilities and need not be adjusted (i.e. offset)?

Thanks.

---

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

Is this correct?

If that's what SAS says in the documentation, then I believe it is correct. Why would I believe anything else?

Does it mean the probabilities obtained from proc logistic are the True probabilities and need not be adjusted (i.e. offset)?

Well, um ...

"true probabilities" and "need not be adjusted (i.e. offset)" are terms whose meaning I really don't know.

They are estimates of the probabilities, given the model (and some simple assumptions). Estimates are never "true probabilities" the way I would use the phrase, but perhaps you are using "true probabilities" differently than I would use it.

--

Paige Miller

---

When we oversample the data, the probabilities obtained are overestimated. To bring the probabilities back to their original scale, we adjust the intercept. (I refer to the probabilities obtained after the adjustment as true probabilities.)
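
The intercept adjustment described here is usually written as a prior correction. What follows is a sketch of that standard result rather than something stated in the thread: the symbols $\rho_1$ (event proportion in the oversampled or weighted data) and $\pi_1$ (event proportion in the original data) are notation introduced for illustration, and the result assumes the logistic model is correctly specified.

$$
\hat\beta_0^{\text{adj}} = \hat\beta_0 - \ln\!\left(\frac{\rho_1}{1-\rho_1}\cdot\frac{1-\pi_1}{\pi_1}\right),
\qquad
p^{\text{adj}} = \frac{p\,\pi_1/\rho_1}{p\,\pi_1/\rho_1 + (1-p)\,(1-\pi_1)/(1-\rho_1)}
$$

With the counts from the original post, $\pi_1 = 80/800{,}000 = 0.0001$ and $\rho_1 = 0.5$, so the offset is $\ln(9999) \approx 9.21$.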

---

@praneeth09m248 wrote:

When we oversample the data, the probabilities obtained are overestimated. To bring the probabilities back to their original scale, we adjust the intercept. (I refer to the probabilities obtained after the adjustment as true probabilities.)

Wait! Now you are oversampling? I don't see where that comes from; you certainly haven't explained that. Furthermore, the weights you provide in the WEIGHT statement ought to eliminate the oversampling.

Furthermore, it is highly likely that SAS does any adjustments needed under the hood, so you, the human user, don't have to take the results and adjust them further. But do I know that for sure? No, because I have never dug into it, although that's a very good thing for a statistical analysis package to do, and SAS is a very highly regarded statistical analysis package.

--

Paige Miller

---

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

Is this correct?

No, your interpretation is not correct. The paragraph that you quote from begins "If a SCORE statement is specified, then...." The entire paragraph is talking about the SCORE statement and the statement that you quote refers ONLY to data specified on the SCORE statement, not to the data used to fit the model. During the fitting of the model, the weights determine the parameter estimates and therefore affect the predicted probabilities and CLs. However, the sentence that you quote indicates that **after** the model is fit, then the procedure scores the model based only on the values of the explanatory variables.

---

If you have weights, shouldn't you be using PROC SURVEYLOGISTIC?

---

I am not familiar with PROC SURVEYLOGISTIC, but I have used PROC LOGISTIC with the WEIGHT statement in the past, when I was interested less in the true probabilities than in the rank ordering of the probabilities.

---

Since you’ve weighted the observations, your odds ratios and estimates may not reflect the actual probabilities. Your weighting approach sounds a bit like setting up prior probabilities.

Setting prior probabilities directly may be a more intuitive approach.

https://support.sas.com/resources/papers/proceedings14/SAS400-2014.pdf

---

Both would get the same parameter estimates but different standard errors, with the weight variable or not.

I remember that @Rick_SAS wrote a blog post about this for PROC REG.

If the probability of the event is too small, you have two choices:

1) Oversample; otherwise your model cannot be trusted.

2) Try another distribution, such as the Poisson or negative binomial distribution.

---

@Ksharp is probably referring to the article

"The difference between frequencies and weights in regression analysis"

Another relevant article is

"How to understand weight variables in statistical analyses"

which explains the differences between the analytical weights that PROC LOGISTIC uses and the survey weights that PROC SURVEYLOGISTIC uses.

---

Thanks, Rick. The article was very articulate.

So the probabilities that PROC LOGISTIC produces when I use a WEIGHT statement are for the data sample after applying the weights.
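
If probabilities on the original (unweighted) scale are needed, one option is to rescale the predicted odds by the ratio of the event weight to the non-event weight. The following is a minimal sketch of that idea, not a method confirmed anywhere in this thread: x1 and x2 are placeholder predictors, the data set names scored and scored_adj and the variables p_weighted, wgt_ratio, and p_adj are hypothetical, and the rescaling assumes the logistic model is correctly specified.

```sas
/* Sketch only: x1 and x2 stand in for the real predictors. */
proc logistic data=dummies;
   model Target(event='1') = x1 x2;
   weight good_bad_wgt;
   /* p_weighted is the predicted probability on the weighted (50:50) scale */
   output out=scored p=p_weighted;
run;

data scored_adj;
   set scored;
   /* Assumed ratio of event weight to non-event weight.
      9999 = 799,920 / 80 balances the classes for the counts
      given in the original post. */
   wgt_ratio = 9999;
   /* Equivalent to dividing the weighted odds by wgt_ratio and
      converting the result back to a probability. */
   p_adj = p_weighted / (p_weighted + wgt_ratio * (1 - p_weighted));
run;
```

As a check on the arithmetic, a weighted-scale probability of 0.5 maps to 0.5 / (0.5 + 9999 × 0.5) = 0.0001, which is the event rate in the original data (80 of 800,000). The rank ordering of the observations is unchanged by this rescaling.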
