How to determine logistic regression formula from estimates output

Posted 02-25-2013 11:56 AM
(15010 views)

Greetings all. I am trying to create my regression formula from the estimates output of proc logistic. Thinking back to multiple regression (and it was several years ago), I could simply take the intercept + (estimate1*variable1) + (estimateN*variableN). However, if I use this methodology, I get some results that seem counterintuitive. My question is, how does the 'Exp(Est)' affect the parameter estimate with respect to putting it in the regression formula? I have attached a copy of my log, and would appreciate it if anyone would be willing to put some experienced eyes on it.

For example, variable 'bad_debt_at_connection' indicates a customer opened a new account when they had an account that at some point in the past went to a collection agency for non-payment. If I am trying to predict accounts that will go to collections, it seems to me that a customer for whom this condition is true would be more likely to go to collections, having already done so in the past. However, the parameter estimate of -0.1582 seems to indicate this is not the case.

The two class variables are binary, having a value of either 1 or 0. The rest of the variables are numeric, and should be treated as such. Thank you.

Greg.
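On the 'Exp(Est)' question: that column is just e raised to the parameter estimate, and it does not enter the prediction formula. When a variable is coded 0/1 (reference coding), it can be read as the factor by which the odds are multiplied when the variable goes from 0 to 1. A minimal Python sketch of that relationship, using the -0.1582 estimate from the post; the intercept of -2.0 is a made-up value for illustration, not something taken from the log:

```python
import math

def inv_logit(xbeta):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xbeta))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

estimate = -0.1582             # parameter estimate quoted in the post
exp_est = math.exp(estimate)   # the value an Exp(Est) column would show

intercept = -2.0               # hypothetical, for illustration only
p_design0 = inv_logit(intercept)              # design value 0
p_design1 = inv_logit(intercept + estimate)   # design value 1

# The ratio of the two odds equals exp(estimate).
odds_ratio = odds(p_design1) / odds(p_design0)
print(exp_est, odds_ratio)
```

The probabilities themselves still come from the intercept plus the estimates times the design values, passed through the inverse logit.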

1 ACCEPTED SOLUTION

The intercept does matter.

I pulled the exact estimates from the model instead of typing them in. But if you type them in, the result is pretty close.

proc logistic data=Neuralgia2 outest=sample;

class sex (ref='0') / param=ref ;

model Pain (event='1') = age sex ;

output out=pred p=phat

predprob=(individual crossvalidate) ;

run ;

/* the formula I am trying to replicate in Excel is the 'myformula' variable in the data set below */

data logformula (keep= age sex pain ip_1 myformula difference);

set pred;

if _n_=1 then set sample (keep = intercept age sex1 rename = (age=age_estimate sex1=sex_estimate));

length sex_estimate age_estimate intercept myformula 8. ;

myformula = 1/(1+exp(-1*(intercept+(sex_estimate*sex) + (age_estimate*age)))); * inverse logit of the full linear predictor, intercept included;

difference=phat-myformula;

format difference 12.8;

run;

proc print data=logformula (obs=25) ;

var sex age pain ip_1 myformula ; * myformula should reproduce ip_1 from the MLE values;

run ;
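For anyone replicating this outside SAS, the same calculation can be sketched in Python. The estimates below are hypothetical placeholders standing in for the Intercept, age, and sex1 columns of the OUTEST= data set; the real values come from the fitted model. The sketch assumes sex is coded 0/1 with 0 as the reference level, as in the class statement above.

```python
import math

# Hypothetical stand-ins for the OUTEST= columns (not real model output).
estimates = {"Intercept": -15.0, "age": 0.2, "sex1": 1.0}

def design_row(age, sex):
    """Design values for one observation: constant 1, age, dummy for sex = 1."""
    return {"Intercept": 1.0, "age": float(age), "sex1": 1.0 if sex == 1 else 0.0}

def myformula(age, sex):
    """Inverse logit of the sum of estimate * design value over all terms."""
    row = design_row(age, sex)
    xbeta = sum(estimates[name] * row[name] for name in estimates)
    return 1.0 / (1.0 + math.exp(-xbeta))

for age, sex in [(70, 0), (70, 1), (80, 1)]:
    print(age, sex, myformula(age, sex))
```

Keying the estimates by name makes it harder to pair an estimate with the wrong column, which is an easy mistake in a spreadsheet.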

17 Replies

You didn't fit your model properly. You need to specify that you want your class variables to use reference coding, not effect coding.

Try referring to this site to interpret your output and learn the basics of logistic regression. A logistic regression isn't linear, so the way you're trying to write the equation isn't correct. If you can, find a statistician to help you out.
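To illustrate why the coding matters, here is a small Python sketch. It assumes the -0.1582 estimate was reported for level 0 of the class variable under PROC LOGISTIC's default effect coding, where the last level (1) is the reference and gets a design value of -1. The intercept is a made-up number, since the log is not shown in the thread.

```python
import math

def inv_logit(xbeta):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xbeta))

a_eff = -2.0      # hypothetical intercept (assumption, not from the log)
b_eff = -0.1582   # estimate quoted in the original post

# Effect coding: level 0 gets +1, the reference level 1 gets -1.
logit_level0 = a_eff + b_eff * (+1)
logit_level1 = a_eff + b_eff * (-1)

# Same model rewritten with a 0/1 dummy that equals 1 for level 1.
a_ref = a_eff + b_eff
b_ref = -2.0 * b_eff

p1_effect = inv_logit(logit_level1)
p1_reference = inv_logit(a_ref + b_ref * 1)
p0_effect = inv_logit(logit_level0)

print(b_ref, p0_effect, p1_effect, p1_reference)
```

Under these assumptions the 0/1 coefficient for level 1 works out to +0.3164, so the negative estimate would not mean that a prior bad debt lowers the predicted risk; the two codings describe the same fitted probabilities.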

Thank you Reeza. Unfortunately, I do not have access to an in-house statistician, so I have been trying to figure it out on my own for several days now. Below is what I am trying.

proc logistic data = credit.crnp_logreg;

class bad_debt_at_conn dep_unpaid;

model chg_off (event='1') = bad_debt_at_conn dep_unpaid cr_scor arrears qy_fc_c qy_fc_b qy_st_c qy_st_b / selection=none expb;

quit;

Would you be able to show me how to run it so it performs as you suggest, or point me to a good reference? I have tried googling everything I can think of. Thank you.

Greg

Statistical Computing Seminars: Introduction to SAS proc logistic

If I'm working on a new proc, what I like to do is first try it out with the example data and make sure I understand how to interpret the results. Then I go back and make sure I understand it for my data.

SAS has a bunch of examples in the documentation that are pretty good. The link above from UCLA is good. You can also try googling proc logistic at lexjansen.com.

My main suggestion is to add /param=ref to your class statement, as below. You may also want to specify which reference levels you want, but that's up to you.

proc logistic data = credit.crnp_logreg;

class bad_debt_at_conn dep_unpaid/param=ref;

model chg_off (event='1') = bad_debt_at_conn dep_unpaid cr_scor arrears qy_fc_c qy_fc_b qy_st_c qy_st_b / selection=none expb;

quit;

Reeza, thank you so, so much for that link. That is *exactly* what I need.

Greg

Ok, I am still missing something. I used / rsq lackfit to see if my model even made sense, and indeed it is a good fit. I guess what I am missing is that SAS must be using some kind of formula to determine the probability of each observation having a response of either 0 or 1. I am still unsure how to do this. Do I need to look at the probability of each variable independently, then put them all together? Thank you.

Greg
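For reference, the standard binary logistic model answers this directly: the variables are not converted to probabilities one at a time. They are combined into a single linear predictor on the log-odds scale, and that sum is transformed once. The symbols below are generic notation rather than anything from the SAS output: \beta_0 is the intercept, \beta_j the estimate for the j-th design variable, and x_{ij} its value for observation i.

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
\qquad
p_i = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})\}}
```

Here p_i is the probability of the event level named in the model statement (event='1'), and the probability of the other level is 1 - p_i.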

I'm confused as to what your question is. What do you need to put them together for?

You can see the basic calculations P(x=0) and P(x=1) in the output dataset if you're looking for that. Take a look at the 'Pred' dataset. If you want to figure out how to calculate those by hand, it's also possible, and a good exercise when starting out.

Using the remission data in the first example in the SAS documentation:

proc logistic data=Remission outest=betas covout;

model remiss(event='1')=cell smear infil li blast temp ;

output out=pred p=phat lower=lcl upper=ucl

predprob=(individual crossvalidate);

run;

proc print data=pred;

run;
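The by-hand calculation can be sketched in Python as follows. The estimates are placeholders, not output from the remission example; in practice they would be copied from the outest= data set in model order (cell, smear, infil, li, blast, temp). The two returned values correspond to what the thread calls the individual probabilities (IP_1 is the name used later in the thread for the event=1 column).

```python
import math

# Placeholder estimates: intercept first, then one slope per predictor in
# model order (cell smear infil li blast temp). All values are made up.
betas = [0.5, 1.0, -0.5, 0.25, 2.0, -1.0, -0.75]

def event_probabilities(x):
    """Return (P(response = 0), P(response = 1)) for one row of predictors."""
    xbeta = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    p1 = 1.0 / (1.0 + math.exp(-xbeta))
    return 1.0 - p1, p1

row = [0.8, 0.6, 0.5, 1.2, 0.3, 1.0]   # hypothetical observation
p0, p1 = event_probabilities(row)
print(p0, p1)
```

Comparing a handful of rows computed this way against the values in the Pred data set is a quick way to confirm that the estimates and the column order have been transcribed correctly.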

Reeza, thank you for the link. That is *exactly* what I am trying to do. I've gotten close, but can't seem to get it exactly right. I have my spreadsheet set up just like the poster in the other post has done, with the MLE estimates populated above the variable names. This is the closest I could get to calculating the individual probability of any one observation, attempting to use the formula you suggested to the other poster...

1/(1 + EXP(-1*(var_1*estimate_1)+(var_2*estimate_2)+(varN*estimate_N)))

The value calculated with this formula is slightly different (with an absolute difference less than .01) from the 'Individual Probability of event=1' value calculated by SAS. I'm also not sure where the intercept fits in. I found this document... http://support.sas.com/resources/papers/proceedings12/317-2012.pdf and it seems to imply the intercept need not be taken into consideration in some cases. Do you see anything wrong with my formula? Again, thank you so much for your help.

Greg
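One thing worth checking in the formula as typed: the -1 multiplies only the first product, because the closing parenthesis comes right after var_1*estimate_1. If the spreadsheet formula was entered the same way (an assumption, since the sheet itself is not shown), the remaining terms enter with the wrong sign. A small Python sketch with made-up numbers shows the difference:

```python
import math

# Made-up estimates and variable values, for illustration only.
est = [0.4, -0.3, 0.2]
var = [2.0, 1.0, 3.0]

# As typed in the post: -1 applies to the first product only.
exponent_as_typed = -1 * (var[0] * est[0]) + (var[1] * est[1]) + (var[2] * est[2])

# Intended form: -1 applies to the whole sum.
exponent_intended = -1 * ((var[0] * est[0]) + (var[1] * est[1]) + (var[2] * est[2]))

p_as_typed = 1 / (1 + math.exp(exponent_as_typed))
p_intended = 1 / (1 + math.exp(exponent_intended))
print(p_as_typed, p_intended)
```

The intercept is left out here only to mirror the posted formula; it belongs inside the same parenthesized sum.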

How are you getting your estimates from SAS to Excel? A difference of .01 may be due to rounding.
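To get a feel for how much rounding alone can matter, here is a rough Python sketch. The full-precision estimates and the predictor value are invented for illustration; the point is only that a slope rounded to two decimals can move the probability noticeably when it multiplies a large predictor value.

```python
import math

def inv_logit(xbeta):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xbeta))

# Hypothetical full-precision estimates (assumed, not from the thread).
intercept, slope = -3.456789, 0.0123456
x = 650.0   # a large-valued predictor, e.g. something on a credit-score scale

p_full = inv_logit(intercept + slope * x)
p_2dp = inv_logit(round(intercept, 2) + round(slope, 2) * x)
p_4dp = inv_logit(round(intercept, 4) + round(slope, 4) * x)

gap_2dp = abs(p_full - p_2dp)
gap_4dp = abs(p_full - p_4dp)
print(gap_2dp, gap_4dp)
```

Exporting the estimates with full precision, for example from an outest= data set rather than retyping the printed table, avoids this source of error.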

When I check one, the probabilities I get in Excel agree to within .0001.

Close enough for me. However, you may want to ensure that you've handled the parameterization correctly when implementing the equation. That's the only thing I can think of without seeing the results and equations.

Reeza, I took your advice, and started over with a documented example I understood. I looked at the Neuralgia example 51.2 available at http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sec...

I started by altering the data a bit to match my scenario by changing 'M' and 'F' to 0 and 1, and also changing 'No' and 'Yes' to 0 and 1. Other than that, it pretty much seems the same as my situation, with the exception that I have 3 binary class predictors and 3 numeric predictors. I tried using the same formula that got me close yesterday on my own data, and now I am not even coming close, as all of the results of my formula are .9999 when rounded. I have attached the SAS code for the example, including my changes, the proc logistic, and a final dataset with the current iteration of the formula I am trying to get to work. It just seems like I should be able to replicate the predicted probabilities given the estimates and intercept, but after trying everything I can think of, I just can't get it to work. I have referenced my stats book, and I don't see anything contrary to what you have suggested in the other post, and what I am trying. Thank you.

Greg
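One possible explanation for results that all round to .9999 is a linear predictor that is missing a large negative intercept. This is a guess, since the attached code is not shown. With a predictor like age in years, even a modest positive slope produces a large sum, and the inverse logit of a large number is essentially 1. A Python sketch with made-up estimates:

```python
import math

def inv_logit(xbeta):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xbeta))

# Hypothetical estimates for a model with age in years (not real output).
intercept, age_estimate = -15.0, 0.2

ages = [60, 70, 80]
with_intercept = [inv_logit(intercept + age_estimate * a) for a in ages]
without_intercept = [inv_logit(age_estimate * a) for a in ages]

print(with_intercept)
print(without_intercept)
```

In this sketch the probabilities without the intercept are all above .9999, while the ones with it spread across a sensible range.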
