Barite | Level 11

Interpreting 'Predicted' in Proc Logistic -- What's your take?

[Image: Predicted]

This table was created by Proc Logistic.  Model is to predict i_50505_Z.  There are around 100 independent variables (not shown).

The 'Probability' has to do with 'Odds Ratio' -- the odds of arriving at i_50505_Z = 1, versus i_50505_Z = 0.  The column is sorted with highest Probability at top.
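
For reference, here is a minimal sketch of how a predicted probability relates to odds and log odds. It is written in Python rather than SAS, purely for illustration, and the function names are made up for this example. The only number taken from the thread is the 0.922 probability used in the example further down.

```python
import math


def odds_from_probability(p: float) -> float:
    """Odds of the event (Y = 1) versus the non-event (Y = 0)."""
    return p / (1.0 - p)


def log_odds_from_probability(p: float) -> float:
    """Log odds (logit), the scale on which logistic regression is linear."""
    return math.log(odds_from_probability(p))


def probability_from_log_odds(eta: float) -> float:
    """Inverse logit: map a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


p = 0.922  # example probability quoted in this thread
print(f"odds     = {odds_from_probability(p):.2f}")
print(f"log odds = {log_odds_from_probability(p):.3f}")
```

A probability of 0.5 corresponds to odds of 1 (even odds); probabilities above 0.5 give odds above 1.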

I find this table to be fascinating, if I'm interpreting it correctly.  It 'scores' every single observation in the entire dataset.

Looking across the independent variables, IFF a particular variable is found to be 'significant' (from other not shown tables), and the Probability is shown to be high, the value for that independent variable observation is the best to choose for arriving at the desired dependent variable target.

Example:

You're targeting i_50505_Z to be in the top 10% (i.e., i_50505_Z = 1).  Probability is 0.922 for a particular observation.  Independent variable X1 (p<.001) for that observation is (say) 4.7.  Then X1=4.7 is a pretty darn good guess for arriving at your objective.

In other words, given a circumstance where you see X1=4.7 you are highly likely to find high values of i_50505_Z.

Discovering this is the precise purpose of the statistical analysis.  If the program can't do it, ask for a refund.

Nicholas Kormanik

p.s. -- a side issue.  Notice that there is a 0.39 in the i_50505_Z column.  Such a value is certainly not among the top.  I'm really curious how that got to be included.

12 REPLIES
Diamond | Level 26

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@NKormanik wrote:

Looking across the independent variables, IFF a particular variable is found to be 'significant' (from other not shown tables), and the Probability is shown to be high, the value for that independent variable observation is the best to choose for arriving at the desired dependent variable target.

I think this is way off the mark. Probabilities shown refer to how the model "scores" (or predicts) a single observation, using all independent variables. It doesn't tell you anything about what variables are most predictive or what variables are most significant (which is not the same as most predictive), because each independent variable has an effect on the predictions of ALL observations. Probabilities tell you nothing about variables.
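
To make the point concrete, here is a rough sketch (Python for illustration, not SAS) of how one observation gets scored. The intercept, coefficients, and observations are all invented for this example; they are not from the model in the question. Both observations share X1 = 4.7, yet they receive very different probabilities because the other variables differ.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Score one observation: inverse logit of the linear predictor."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted model with three predictors (X1, X2, X3).
intercept = -1.0
coefficients = [0.5, 1.2, -0.8]

# Two hypothetical observations with the same X1 = 4.7 but different X2, X3.
obs_a = [4.7, 1.0, 0.0]
obs_b = [4.7, -2.0, 3.0]

p_a = predicted_probability(intercept, coefficients, obs_a)
p_b = predicted_probability(intercept, coefficients, obs_b)
print(f"obs_a: {p_a:.3f}")
print(f"obs_b: {p_b:.3f}")
```

So a high probability on one row says something about that row's whole combination of values, not about X1 = 4.7 on its own.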

@NKormanik wrote:

There are around 100 independent variables (not shown).

It's generally not a good idea to put 100 independent variables into a logistic regression model. Doing so invites multi-collinearity (in other words, the independent variables are correlated with each other), which can give the regression coefficients huge variances (meaning the model can be quite unstable) and even the wrong sign. There's plenty of reading on the internet about multi-collinearity.
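
One common way to gauge this is the variance inflation factor (VIF), a diagnostic borrowed from linear regression. As a minimal sketch (Python for illustration, made-up data, hypothetical function names): with only two predictors, the VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. With 100 predictors you would instead regress each predictor on all the others, which this sketch does not do.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up data: x2_correlated tracks x1 closely, x2_weak barely does.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_correlated = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x2_weak = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

print(f"VIF, strongly correlated pair: {vif_two_predictors(x1, x2_correlated):.1f}")
print(f"VIF, weakly correlated pair:   {vif_two_predictors(x1, x2_weak):.2f}")
```

A VIF near 1 means little inflation; a frequently quoted rule of thumb treats values above about 10 as a warning sign.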

--
Paige Miller
Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

Thanks @PaigeMiller.  Well, then, given the information in the table at top, particularly Probability, how might one use this information?

The challenge is:  We seek a high Y outcome, we have Xi - Xn, how can the table above help us achieve our objective?

(note:  Multicollinearity is supposed to be less of a problem with Logistic Regression, than it is with Linear Regression.)

Diamond | Level 26

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@NKormanik wrote:

Well, then, given the information in the table at top, particularly Probability, how might one use this information?

The challenge is:  We seek a high Y outcome, we have Xi - Xn, how can the table above help us achieve our objective?

How can we answer these questions? We don't know what your objective is. What does "high Y outcome" mean? Is it highest predicted value?

(note:  Multicollinearity is supposed to be less of a problem with Logistic Regression, than it is with Linear Regression.)

I disagree. I have seen logistic regression report signs on the coefficients that are opposite what a univariate regression would show.

--
Paige Miller
Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

Hate it when the procedure outputs disagree.

I'm certain my datasets would violate every caution and assumption OLS requires.

Quite likely Logistic will end up a bust as well.

That is, unless you and others can shed some light.....

Diamond | Level 26

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@NKormanik wrote:

Hate it when the procedure outputs disagree.

I'm certain my datasets would violate every caution and assumption OLS requires.

Quite likely Logistic will end up a bust as well.

That is, unless you and others can shed some light.....

Unknown what you mean by any of the above.

I don't know what you mean by "procedure outputs disagree". Be specific.

I don't know what assumptions are violated. Be specific.

Logistic regression models the log odds of the outcome. You haven't really stated the goal of this modeling effort. Do you want to determine which variables are important? Do you want to determine which observations have high (or low) predicted probabilities? Both? Neither? Something else? How do you want to use this logistic regression?

But you want us to "shed some light" without first clearly stating your goal and what it means. I can't do that.

--
Paige Miller
Super User

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

I concur with @PaigeMiller.

I would suggest using a random forest if you want to make those types of statements; logistic regression doesn't really provide that type of interpretation easily.

Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@Reeza, well, that there are other statistical tools available that can help solve the problem is hugely encouraging.  Absolutely for sure.

Presently, however, I'm attempting to gain something from Proc Logistic.  Like, anything.

When it becomes totally apparent that Proc Logistic is a bust, then I'll move on.

Super User

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

IMO it isn't common to focus on the scoring output very much. Instead, you typically look at the parameter estimates to see which ones affect your outcome the most - typically using odds ratio plots.

https://blogs.sas.com/content/iml/2015/07/29/or-plots-log-scale.html

I'd probably recommend starting there, and then pruning your model by removing variables that are not significant or don't seem to have much of an impact (significance and effect size are different things). Then you can make rules such as: if X2, X3, X4 are high this person is likely to be in the high performing group.
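
As a rough illustration of reading parameter estimates (a Python sketch, not SAS output): an odds ratio is exp(estimate), and a Wald 95% interval is exp(estimate ± 1.96 × standard error). The estimates and standard errors below are invented for this example, and `odds_ratio` is a hypothetical helper name.

```python
import math


def odds_ratio(beta, std_err, z=1.96):
    """Return (odds ratio, lower, upper) from a Wald interval on the log scale."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical (estimate, standard error) pairs, not from the poster's model.
estimates = {
    "X2": (0.69, 0.20),   # sizeable effect, interval excludes 1
    "X3": (0.05, 0.01),   # interval excludes 1, but the effect is tiny
    "X4": (-0.40, 0.35),  # interval includes 1
}

for name, (beta, std_err) in estimates.items():
    point, lower, upper = odds_ratio(beta, std_err)
    print(f"{name}: OR = {point:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The made-up X3 shows why significance and effect size are different things: its interval excludes 1, yet the odds change by only about 5% per unit.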
Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@Reeza wrote:
make rules such as: if X2, X3, X4 are high this person is likely to be in the high performing group.

Precisely.  That's what I've been trying to do.  Stepwise regression is supposed to eliminate the variables that are not meaningful.

Super User

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@NKormanik wrote:

@Reeza wrote:
make rules such as: if X2, X3, X4 are high this person is likely to be in the high performing group.

Precisely.  That's what I've been trying to do.  Stepwise regression is supposed to eliminate the variables that are not meaningful.

Then you need to look at the parameter estimates and the odds ratios, not the output you're currently examining.

Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

This is an after-the-fact table that scores the observations according to the specified model. Every observation is scored by the same set of rules prescribed by the model; none gets different treatment. The observation with 92% correctness of prediction had to satisfy the criteria of many of the 100 predictors, not just X1. As for the seemingly out-of-place value 0.39: the values do not have to be in order of magnitude; the probability depends on their circumstances.

Barite | Level 11

Re: Interpreting 'Predicted' in Proc Logistic -- What's your take?

@pink_poodle wrote:

The observation with 92% correctness of prediction had to satisfy the criteria of many of the 100 predictors, not just X1.

I'm interpreting that single observation to be golden -- like a nugget of solid gold among the rock rubble.

True, not only X1 to take note of, but all the 'significant' variables Proc Logistic has come up with.  Say:  X1, X7, X33, X49, X81.

Those all make up the Super Team.  Take extra special note of 'em.

When those babies line up, you're pretty safe to bet big.
