praneeth09m248
Calcite | Level 5

Hello All,

 

I am working on a data set that has 800,000 records, of which only 80 are events (Target = 1); all the remaining records are non-events.

I do not want to oversample (that is, take all the event observations and match them with an equal number of non-events), because I would be left with only 160 records.

So I decided to use weighting instead: I weighted up all the events and weighted down all the non-events so that the weighted proportion of events to non-events is 50:50, using a weight variable called good_bad_wgt, which I used in my logistic regression:
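For illustration, here is one way such weights could be computed. The post does not say how good_bad_wgt was defined, so this is a minimal Python sketch under an assumed scheme (the helper `balancing_weights` is hypothetical): each class receives half of the total weight, and the weights still add up to the number of records.

```python
def balancing_weights(n_total, n_events):
    """Hypothetical scheme: return (w_event, w_nonevent) so that events and
    non-events each carry half of the total weight, and the weights still
    sum to n_total."""
    n_nonevents = n_total - n_events
    w_event = n_total / (2 * n_events)
    w_nonevent = n_total / (2 * n_nonevents)
    return w_event, w_nonevent


w1, w0 = balancing_weights(800_000, 80)
print(w1)                     # 5000.0
print(80 * w1, 799_920 * w0)  # both about 400000: a 50:50 weighted split
```

For a fixed set of predictors, multiplying every weight by the same constant leaves the maximizer of the weighted log-likelihood unchanged. Standard errors and the chi-square statistics that drive stepwise selection are a different matter and can change with the overall scale of the weights.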

 

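proc logistic data=dummies outest=est;
   /* %goodvariables is assumed to expand to the list of candidate predictors */
   model Target(event='1') = %goodvariables / selection=stepwise slstay=0.05 slentry=0.05;
   /* WEIGHT takes the variable name directly (no equals sign) */
   weight good_bad_wgt;
run;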

 

What I want to know is:

 

1. Are the resulting probabilities over-estimated?

2. If so, how do I adjust them?

 

I would really appreciate it if someone could help me better understand how the WEIGHT statement in PROC LOGISTIC works.

 

Thanks.


Accepted Solutions
Rick_SAS
SAS Super FREQ

When you use a WEIGHT statement, the weights are used to determine the parameter estimates (="fit the model"). After the parameters are estimated, the procedure does not need or use any additional weights to score data (="evaluate the model based on the values of the explanatory variables"). The predicted probabilities, CIs, etc, are determined solely by the parameter estimates. 
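A small pure-Python sketch can make the fit/score split concrete. It is not PROC LOGISTIC's implementation, and the function names and toy data are made up: `fit_weighted_logistic` maximizes the weighted log-likelihood by Newton-Raphson, so the weights act only through the estimates, while `score` receives nothing but the estimates and a new x.

```python
import math


def fit_weighted_logistic(xs, ys, ws, iters=25):
    """Maximize sum_i w_i * [y_i*log(p_i) + (1 - y_i)*log(1 - p_i)],
    where logit(p_i) = b0 + b1*x_i, by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # weighted gradient
        h00 = h01 = h11 = 0.0  # weighted information (negative Hessian)
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - p)
            g0 += r
            g1 += r * x
            v = w * p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def score(b0, b1, x):
    """Predicted probability for a new x: only the estimates are used, no weights."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
event_wts = [5 if y == 1 else 1 for y in ys]  # hypothetical up-weighting of events

b0, b1 = fit_weighted_logistic(xs, ys, [1] * len(xs))
b0w, b1w = fit_weighted_logistic(xs, ys, event_wts)
print(score(b0, b1, 2.5))    # about 0.5 (this toy data set is symmetric around 2.5)
print(score(b0w, b1w, 2.5))  # above 0.5: the weights changed the estimates
```

Because `score` never sees a weight, a model fitted with weights that rebalance the classes produces probabilities that refer to the rebalanced data, at least as far as this sketch goes.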


PaigeMiller
Diamond | Level 26

Does not the documentation for the WEIGHT statement explain all of this?

http://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_log...

--
Paige Miller
praneeth09m248
Calcite | Level 5

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

 

Is this correct?

 

Does this mean that the probabilities obtained from PROC LOGISTIC are the true probabilities and need not be adjusted (i.e., offset)?

 

Thanks.

PaigeMiller
Diamond | Level 26

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

 

Is this correct?

 


If that's what SAS says in the documentation, then I believe it is correct. Why would I believe anything else?

 

Does this mean that the probabilities obtained from PROC LOGISTIC are the true probabilities and need not be adjusted (i.e., offset)?

 

Well, um ...

"True probabilities" and "need not be adjusted (i.e. offset)" are terms whose meaning I really don't know.

 

They are estimates of the probabilities, given the model (and some simple assumptions). Estimates are never "true probabilities" the way I would use the phrase, but perhaps you are using "true probabilities" differently than I would use it.

--
Paige Miller
praneeth09m248
Calcite | Level 5
When we oversample the data, the probabilities obtained are overestimated. To bring the probabilities back to their original scale, we adjust the intercept. (I refer to the probabilities obtained after this adjustment as the true probabilities.)
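The intercept adjustment described here is usually called prior correction. A minimal Python sketch, assuming that oversampling (or class-balancing weights) changed only the proportion of events and not the relationship between the predictors and the outcome within each class; the function names are hypothetical. Only the intercept moves, by the log of the sample odds minus the log of the population odds; the slopes are left alone.

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def corrected_intercept(b0, sample_event_rate, population_event_rate):
    """Prior correction: shift the intercept of a model fitted on data with
    event rate sample_event_rate so that predictions refer to a population
    with event rate population_event_rate. Slopes are not changed."""
    rho1, pi1 = sample_event_rate, population_event_rate
    offset = math.log(rho1 / (1 - rho1)) - math.log(pi1 / (1 - pi1))
    return b0 - offset


# Hypothetical example: training data balanced to 50:50, population rate 80 / 800,000.
b0_adj = corrected_intercept(0.0, 0.5, 80 / 800_000)
print(b0_adj)            # about -9.21
print(logistic(b0_adj))  # about 0.0001: a 0.5 on the balanced scale maps to the population rate
```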
PaigeMiller
Diamond | Level 26

@praneeth09m248 wrote:
When we oversample the data, the probabilities obtained are overestimated. To bring the probabilities back to their original scale, we adjust the intercept. (I refer to the probabilities obtained after this adjustment as the true probabilities.)

Wait! Now you are oversampling? I don't see where that comes from; you certainly haven't explained it. Furthermore, the weights you provide in the WEIGHT statement ought to eliminate the oversampling.

 

Furthermore, it is highly likely that SAS makes any needed adjustments under the hood, so you, the human user, don't have to take the results and adjust them further. But do I know that for sure? No, because I have never dug into it, although that would be a very good thing for a statistical analysis package to do, and SAS is a very highly regarded statistical analysis package.

--
Paige Miller
Rick_SAS
SAS Super FREQ

@praneeth09m248 wrote:

In the link you shared, it says "Weights do not affect the computation of predicted probabilities, their confidence limits, or the predicted response level"

 

Is this correct?

 


No, your interpretation is not correct. The paragraph that you quote from begins "If a SCORE statement is specified, then...." The entire paragraph is talking about the SCORE statement and the statement that you quote refers ONLY to data specified on the SCORE statement, not to the data used to fit the model. During the fitting of the model, the weights determine the parameter estimates and therefore affect the predicted probabilities and CLs. However, the sentence that you quote indicates that after the model is fit, then the procedure scores the model based only on the values of the explanatory variables.

Reeza
Super User

If you have weights shouldn't you be using PROC SURVEYLOGISTIC?

praneeth09m248
Calcite | Level 5
I am not familiar with PROC SURVEYLOGISTIC, but I have used PROC LOGISTIC with a WEIGHT statement in the past, when I was not interested in the true probabilities but rather in the rank ordering of the probabilities.
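That earlier use is unaffected for a simple reason: under the usual prior-correction assumption, correcting for the 50:50 weighting subtracts the same constant from every linear predictor, and the logistic function is increasing, so the ordering of the scores (and any rank-based measure such as the c statistic) stays the same; only the probability scale changes. A Python sketch with made-up linear predictors:

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def rank_order(scores):
    """Indices of the observations sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical linear predictors from a model fitted on 50:50-weighted data.
etas = [1.7, -0.4, 0.9, -2.2, 0.1]
# Assumed prior-correction offset: log(sample odds) - log(population odds).
offset = math.log(0.5 / 0.5) - math.log(1e-4 / (1 - 1e-4))

weighted_scale = [logistic(e) for e in etas]
population_scale = [logistic(e - offset) for e in etas]

print(rank_order(weighted_scale))    # [0, 2, 4, 1, 3]
print(rank_order(population_scale))  # same order, much smaller probabilities
```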
Reeza
Super User

Since you've weighted the observations, your odds ratios and estimates may not reflect the actual probabilities. Your weighting approach sounds a bit like setting up prior probabilities.

That may be a more intuitive approach. 

 

 

https://support.sas.com/resources/papers/proceedings14/SAS400-2014.pdf
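Adjusting to prior probabilities can also be written directly on the probability scale with Bayes' rule: each class's predicted probability is multiplied by (prior proportion / training proportion) and the result is renormalized. For a binary response this is algebraically the same as shifting the intercept. A hedged Python sketch (the function name is hypothetical); PROC LOGISTIC's SCORE statement also has PRIOR= and PRIOREVENT= options for scoring with prior probabilities, so check its documentation for the exact computation rather than relying on this sketch.

```python
import math


def adjust_to_prior(p, sample_event_rate, prior_event_rate):
    """Bayes-rule adjustment of a predicted event probability p from a model
    trained with event rate sample_event_rate to a population whose event
    rate is prior_event_rate: reweight each class by prior / training
    proportion, then renormalize."""
    r1, q1 = sample_event_rate, prior_event_rate
    event_part = p * q1 / r1
    nonevent_part = (1 - p) * (1 - q1) / (1 - r1)
    return event_part / (event_part + nonevent_part)


def logit(p):
    return math.log(p / (1 - p))


p = 0.8  # hypothetical score from a model fitted on 50:50-weighted data
adjusted = adjust_to_prior(p, 0.5, 1e-4)

# Same result as shifting the logit by log(sample odds) - log(prior odds).
shifted = 1 / (1 + math.exp(-(logit(p) - (logit(0.5) - logit(1e-4)))))
print(adjusted, shifted)  # both about 0.0004
```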

Ksharp
Super User

Both would give the same parameter estimates but different standard errors, with a weight variable or not.

I remember that @Rick_SAS wrote a blog post about this for PROC REG.

 

If the probability of an event is very small, you have two choices:

1) Oversample; otherwise your model cannot be trusted.

2) Try another distribution, such as the Poisson or negative binomial distribution.
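One reason a count distribution such as the Poisson is a plausible choice for rare events is the Poisson approximation to the binomial: with n = 800,000 records and event probability p = 0.0001, the number of events is very close to Poisson with mean np = 80. A Python sketch comparing the two probability mass functions, computed on the log scale to avoid overflow; it illustrates the approximation only, not how such a model would be set up in SAS.

```python
import math


def binomial_pmf(k, n, p):
    # Log scale, so that n = 800,000 does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


n, p = 800_000, 80 / 800_000
for k in (60, 80, 100):
    # The two columns agree to within about 0.1% in relative terms.
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, n * p))
```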

Rick_SAS
SAS Super FREQ

@Ksharp is probably referring to the article

"The difference between frequencies and weights in regression analysis"

Another relevant article is 

"How to understand weight variables in statistical analyses"

which explains the differences between the analytical weights that PROC LOGISTIC uses and the survey weights that PROC SURVEYLOGISTIC uses.
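Those articles use PROC REG. For logistic regression, the analogous starting point can be sketched in Python: with integer weights, the weighted log-likelihood is identical to the ordinary log-likelihood of a data set in which each row is repeated w times, so both lead to the same parameter estimates. The toy data and weights are made up, and the sketch says nothing about how PROC LOGISTIC or PROC SURVEYLOGISTIC compute standard errors.

```python
import math


def weighted_loglik(b0, b1, xs, ys, ws):
    """Weighted Bernoulli log-likelihood for logit(p) = b0 + b1*x:
    each observation's contribution is multiplied by its weight."""
    total = 0.0
    for x, y, w in zip(xs, ys, ws):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 1, 0, 1]
ws = [3, 1, 2, 1]  # hypothetical integer weights

# Replicating each row w times is what a frequency (FREQ) variable means.
rep_x = [x for x, w in zip(xs, ws) for _ in range(w)]
rep_y = [y for y, w in zip(ys, ws) for _ in range(w)]
ones = [1] * len(rep_x)

for b0, b1 in [(0.0, 0.0), (-1.0, 1.5), (0.3, -0.2)]:
    # The two values agree (up to rounding) for every (b0, b1).
    print(weighted_loglik(b0, b1, xs, ys, ws), weighted_loglik(b0, b1, rep_x, rep_y, ones))
```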

praneeth09m248
Calcite | Level 5
Thanks, Rick. The article was very well written.
So the probabilities that PROC LOGISTIC produces when I use a WEIGHT statement are for the data sample after the weights are applied.
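The simplest case agrees with this reading. In an intercept-only model, setting the weighted score sum of w_i * (y_i - p) to zero gives p = weighted event share, so balancing weights push the fitted probability to about 0.5 while the actual event rate is 0.0001. A Python sketch, assuming balancing weights with the total weight kept at 800,000 (the thread does not say how good_bad_wgt was built; the function name is hypothetical):

```python
import math


def weighted_intercept_only(n_events, n_nonevents, w_event, w_nonevent):
    """MLE of b0 in logit(p) = b0 with case weights.
    The score equation sum_i w_i * (y_i - p) = 0 gives p = weighted event share,
    so b0 = log(total event weight / total non-event weight)."""
    return math.log((n_events * w_event) / (n_nonevents * w_nonevent))


n_events, n_nonevents = 80, 799_920
w_event = 800_000 / (2 * n_events)  # assumed balancing weights
w_nonevent = 800_000 / (2 * n_nonevents)

b0 = weighted_intercept_only(n_events, n_nonevents, w_event, w_nonevent)
p_fitted = 1 / (1 + math.exp(-b0))
p_actual = n_events / (n_events + n_nonevents)
print(p_fitted)  # about 0.5: the event rate of the weighted sample
print(p_actual)  # 0.0001: the event rate of the unweighted data
```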


Discussion stats
  • 13 replies
  • 16477 views
  • 1 like
  • 5 in conversation