Programming the statistical procedures from SAS

Weight Statement in Proc Logistic

Occasional Contributor
Posts: 8

Hello All,

 

I am working on a data set that has 800,000 records, of which only 80 are Events (Target = 1) and all the remaining records are non-events.

 

I do not want to oversample (that is, take all the event observations and match them with an equal number of non-events), because I would be left with just 160 records.

 

So I decided to use weighting instead. That is, I weighted up all the events and weighted down all the non-events to make the ratio of events to non-events 50:50, using a weight variable called good_bad_wgt, which I used in my logistic regression.
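For readers following along, here is a minimal sketch of one way such a weight variable could be built. It assumes the counts given in the post (80 events out of 800,000 records); the input data set name raw is hypothetical, and the post does not show how good_bad_wgt was actually computed.

```sas
/* Sketch only: raw is a hypothetical unweighted input data set. */
data dummies;
  set raw;
  /* Each class should carry half of the total weight. */
  if Target = 1 then good_bad_wgt = 0.5 / (80 / 800000);      /* = 5000 per event */
  else good_bad_wgt = 0.5 / (799920 / 800000);                /* about 0.50005 per non-event */
run;
```

With these values the 80 events and the 799,920 non-events each sum to a weight of 400,000, which gives the 50:50 split described above.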

 

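proc logistic data = dummies outest = est;
  /* %goodvariables expands to the list of candidate predictors */
  model Target (event = '1') = %goodvariables / selection = stepwise slstay = 0.05 slentry = 0.05;
  /* WEIGHT takes the variable name directly, with no equals sign */
  weight good_bad_wgt;
run;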

 

What I want to know is:

 

1. Are the resulting probabilities over-estimated?

2. If so, how do I adjust the probabilities?

 

If someone can help me better understand how the weight statement in Proc logistic works, I would really appreciate it.

 

Thanks.


Accepted Solutions
Solution
‎06-10-2018 03:46 PM
SAS Super FREQ
Posts: 4,240

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

When you use a WEIGHT statement, the weights are used to determine the parameter estimates (="fit the model"). After the parameters are estimated, the procedure does not need or use any additional weights to score data (="evaluate the model based on the values of the explanatory variables"). The predicted probabilities, CIs, etc, are determined solely by the parameter estimates. 
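The distinction between fitting and scoring can be sketched in formulas. The notation is an assumption introduced here and does not appear in the thread: w_i is the weight, y_i the 0/1 response, and x_i the vector of explanatory variables (including the intercept term) for observation i.

```latex
% Fitting: the weights enter the log-likelihood that is maximized over \beta
\ell(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad
p_i = \frac{1}{1 + \exp(-x_i^{\top} \beta)}

% Scoring: only the fitted \hat{\beta} and the explanatory variables appear
\hat{p}(x) = \frac{1}{1 + \exp(-x^{\top} \hat{\beta})}
```

The weights shape the estimate of beta, so they influence the predicted probabilities indirectly, but no weight appears in the scoring formula itself.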



All Replies
Respected Advisor
Posts: 3,003

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

Does not the documentation for the WEIGHT statement explain all of this?

http://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_log...

--
Paige Miller
Occasional Contributor
Posts: 8

Re: Weight Statement in Proc Logistic

Posted in reply to PaigeMiller

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level."

 

Is this correct?

 

Does it mean the probabilities obtained from proc logistic are the True probabilities and need not be adjusted (i.e. offset)?

 

Thanks.

Respected Advisor
Posts: 3,003

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level."

 

Is this correct?

 


If that's what SAS says in the documentation, then I believe it is correct. Why would I believe anything else?

 

Does it mean the probabilities obtained from proc logistic are the True probabilities and need not be adjusted (i.e. offset)?

 

Well, um ...

 

I really don't know what the terms "true probabilities" and "need not be adjusted (i.e. offset)" mean.

 

They are estimates of the probabilities, given the model (and some simple assumptions). Estimates are never "true probabilities" the way I would use the phrase, but perhaps you are using "true probabilities" differently than I would use it.

--
Paige Miller
Occasional Contributor
Posts: 8

Re: Weight Statement in Proc Logistic

Posted in reply to PaigeMiller
When we oversample the data, the resulting probabilities are overestimated. To bring them back to the original scale, we adjust the intercept. (I refer to the probabilities obtained after this adjustment as the True probabilities.)
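The adjustment described here can be sketched as a DATA step. This is an illustration under stated assumptions, not a method confirmed in the thread: it assumes the model is correctly specified, that the fit behaves as if events made up 50% of the data, and that the population event rate is 80 / 800,000 as in the original post. The data set names scored and adjusted are hypothetical, and P_1 is assumed to hold the predicted event probability from the fitted model.

```sas
/* Sketch only: scored and adjusted are hypothetical data set names. */
data adjusted;
  set scored;
  pi1  = 80 / 800000;   /* event rate in the population (from the post) */
  rho1 = 0.5;           /* event share seen by the model after 50:50 weighting */
  /* Rescale the event and non-event parts of the odds, then renormalize. */
  num   = P_1 * (pi1 / rho1);
  den   = num + (1 - P_1) * ((1 - pi1) / (1 - rho1));
  p_adj = num / den;    /* probability on the population scale */
run;
```

On the logit scale this is the same as subtracting log(9999), about 9.21, from the intercept. A record that the 50:50 model scores at 0.5 maps back to 0.0001, the population event rate.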
Respected Advisor
Posts: 3,003

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

@praneeth09m248 wrote:
When we oversample the data, the resulting probabilities are overestimated. To bring them back to the original scale, we adjust the intercept. (I refer to the probabilities obtained after this adjustment as the True probabilities.)

Wait! Now you are oversampling? I don't see where that comes from, you certainly haven't explained that. Furthermore, the weights you provide in the WEIGHT statement ought to eliminate the oversampling.

 

Furthermore, it is highly likely that SAS does any adjustments needed under the hood, so you the human user don't have to take the results and adjust them further. But do I know that for sure? No, because I have never dug into it; although that's a very good thing for a statistical analysis package to do, and SAS is a very highly regarded statistical analysis package.

--
Paige Miller
SAS Super FREQ
Posts: 4,240

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level."

 

Is this correct?

 


No, your interpretation is not correct. The paragraph that you quote from begins "If a SCORE statement is specified, then...." The entire paragraph is talking about the SCORE statement and the statement that you quote refers ONLY to data specified on the SCORE statement, not to the data used to fit the model. During the fitting of the model, the weights determine the parameter estimates and therefore affect the predicted probabilities and CLs. However, the sentence that you quote indicates that after the model is fit, then the procedure scores the model based only on the values of the explanatory variables.
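A minimal sketch of the fit-then-score pattern described above. The predictors x1 and x2 and the data set names new_records and scored are placeholders, not names from the thread.

```sas
/* Sketch only: x1, x2, new_records, and scored are hypothetical names. */
proc logistic data=dummies;
  model Target(event='1') = x1 x2;
  weight good_bad_wgt;                 /* used while estimating the parameters */
  score data=new_records out=scored;   /* new_records needs x1 and x2 only */
run;
```

The data set on the SCORE statement is evaluated from the fitted parameters and its explanatory variables, so it does not need to carry a weight variable.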

Super User
Posts: 23,705

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

If you have weights shouldn't you be using PROC SURVEYLOGISTIC?

Occasional Contributor
Posts: 8

Re: Weight Statement in Proc Logistic

I am not familiar with proc surveylogistic but I have used proc logistic with weight options in the past when I was not interested in the True probabilities but was more interested in the rank ordering of the probabilities.
Super User
Posts: 23,705

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

Since you've weighted the observations, your odds ratios and estimates may not reflect the actual probabilities. Your weighting approach sounds a bit like setting up prior probabilities.

That may be a more intuitive approach. 

 

 

https://support.sas.com/resources/papers/proceedings14/SAS400-2014.pdf
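As a hedged illustration of the prior-probability idea, the SCORE statement in PROC LOGISTIC accepts a PRIOREVENT= option. The sketch below assumes a model fit to a data set that was physically oversampled rather than weighted; oversampled, full_population, scored_prior, x1, and x2 are hypothetical names. How this option interacts with a WEIGHT statement is not established in this thread and should be checked against the documentation.

```sas
/* Sketch only: all data set and predictor names are hypothetical. */
proc logistic data=oversampled;
  model Target(event='1') = x1 x2;
  /* PRIOREVENT= gives the event rate of the population being scored;
     0.0001 is 80 / 800000 from the original post. */
  score data=full_population out=scored_prior priorevent=0.0001;
run;
```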

Super User
Posts: 10,778

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248

With or without the weight variable, both models would give the same parameter estimates but different standard errors.

I remember that @Rick_SAS wrote a blog post about this for PROC REG.

If the probability of the event is too small, you have two choices:

1) Oversample; otherwise the model cannot be trusted.

2) Try another distribution, such as the Poisson or negative binomial distribution.

SAS Super FREQ
Posts: 4,240

Re: Weight Statement in Proc Logistic

@Ksharp is probably referring to the article

"The difference between frequencies and weights in regression analysis"

Another relevant article is 

"How to understand weight variables in statistical analyses"

which explains the differences between the analytical weights that PROC LOGISTIC uses and the survey weights that PROC SURVEYLOGISTIC uses.
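For comparison, a minimal sketch of the survey-weight version. The predictors x1 and x2 are placeholders, and treating good_bad_wgt as a sampling weight is an assumption made only for illustration; the weights in the original post were constructed for class balance, not from a sampling design.

```sas
/* Sketch only: x1 and x2 are hypothetical predictors. */
proc surveylogistic data=dummies;
  model Target(event='1') = x1 x2;
  weight good_bad_wgt;   /* interpreted here as a sampling weight */
run;
```

The point estimates should generally agree with a weighted PROC LOGISTIC fit on the same data, while the standard errors are computed with design-based methods and can differ.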

Occasional Contributor
Posts: 8

Re: Weight Statement in Proc Logistic

Posted in reply to praneeth09m248
Thanks Rick, the article was very articulate.
So, the probabilities that PROC LOGISTIC produces when I use a WEIGHT statement are for the data sample after applying the weights.
