apolitical
Obsidian | Level 7

When the ratio of success (event of interest occurring) in the regressed dataset is extremely low, one could upsample it, i.e. choose all successes and only a portion of the non-successes, to increase the ratio of successes in the regression sample. To do logistic regression on the new sample, a weight variable or the ‘offset=’ option should be used; a regular logistic regression on the upsampled dataset will give incorrect estimates. My questions regarding this are:

 

1. When using an offset variable, this webpage describes how the variable should be defined. Is this the only ‘correct’ way to specify it?

http://support.sas.com/kb/22/601.html

‘Or, to adjust by using an offset, add a variable to your data set defined as log[(r1*(1-p1)) / ((1-r1)*p1)], where log represents the natural logarithm. Specify this variable in the OFFSET= option of the MODEL statement in PROC LOGISTIC.’
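For illustration, the quoted formula can be evaluated directly. The following is a minimal sketch, assuming r1 is the event proportion in the sample and p1 is the event proportion in the population; the function names are hypothetical and are not part of SAS.

```python
import math


def oversampling_offset(r1, p1):
    """Offset from the quoted formula: log[(r1*(1-p1)) / ((1-r1)*p1)].

    r1: event proportion in the sample (assumed meaning).
    p1: event proportion in the population (assumed meaning).
    """
    if not (0 < r1 < 1 and 0 < p1 < 1):
        raise ValueError("r1 and p1 must lie strictly between 0 and 1")
    return math.log((r1 * (1 - p1)) / ((1 - r1) * p1))


def to_population_probability(sample_prob, r1, p1):
    """Shift a sample-scale probability back to the population scale.

    Subtracting the offset on the logit scale undoes the change in odds
    caused by keeping all events and only part of the non-events.
    """
    sample_logit = math.log(sample_prob / (1 - sample_prob))
    population_logit = sample_logit - oversampling_offset(r1, p1)
    return 1 / (1 + math.exp(-population_logit))
```

With r1 = 0.5 and p1 = 0.01 the offset is log(99), and a sample-scale probability equal to r1 maps back to p1, which is a quick sanity check on the sign of the adjustment.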

 

Also, if the offset variable is a constant (as is the case above), isn’t this regression equivalent to one without an offset variable? The only nominal difference would be that, in the model with the offset, the intercept plus the offset (whose coefficient is forced to be 1) equals the intercept of the plain logistic regression without an offset. The coefficients on the independent variables would still be the same, as would the overall fit. I tried it in SAS and it seems to be the case. If this is so, what’s the point of using an offset variable?
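The equivalence described in this question can be written out. This is a sketch, where c denotes the constant offset, x_i the covariates, and the superscripts "off" and "plain" are labels introduced here for the two fits:

```latex
\begin{aligned}
\text{with offset } c:\quad
  & \operatorname{logit}(\pi_i) = c + \beta_0^{\text{off}} + x_i^{\top}\beta \\
\text{without offset}:\quad
  & \operatorname{logit}(\pi_i) = \beta_0^{\text{plain}} + x_i^{\top}\beta \\
\Rightarrow\quad
  & \beta_0^{\text{plain}} = \beta_0^{\text{off}} + c
\end{aligned}
```

Both forms describe the same set of fitted probabilities, so the slope estimates and the maximized likelihood agree, and only the reported intercept differs by c.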

 

2. When using ‘weight’, how is the weight variable specified exactly? What’s the consequence of using a weight variable in logistic regression on the original dataset that’s not upsampled?

 

3. I haven’t seen any example that uses both weight and offset together. What happens if you use them both? Does it render the regression incorrect, or is it just unnecessary?

 

4. Suppose I run logistic regression with weight or offset and save the regression results with the ‘outmodel=mymodel’ option. If I then score a new dataset (untreated, no upsampling, etc.) with ‘proc logistic inmodel=mymodel’, what precautions should be taken to produce the correct probability scores?

 

I am sorry that’s a lot of questions. Thank you so much.

Accepted Solutions
Ksharp
Super User
"When the ratio of success (event of interest occurring) in the regressed dataset is extremely low"
That would lead you to a Poisson or negative binomial regression model.

Logistic link function: log(p/(1-p))
When p ~ 0, 1-p ~ 1
==>
log(p/(1-p)) ~ log(p), which is the link function of the Poisson regression model.
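The approximation behind this argument can be spelled out as a short derivation. It is a sketch of the small-p limit only, not a statement that the two models give identical fits:

```latex
\log\frac{p}{1-p} \;=\; \log p \;-\; \log(1-p),
\qquad
\log(1-p) \;=\; -p - \frac{p^{2}}{2} - \cdots \;\longrightarrow\; 0
\quad\text{as } p \to 0,
\qquad\text{so}\qquad
\log\frac{p}{1-p} \;\approx\; \log p .
```

The difference between the logit and the log link is therefore of order p, which is why the two links nearly coincide when the event is rare.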

Check this :

http://support.sas.com/kb/24/188.html 


