09-08-2012 02:07 PM
I am new to SAS and advanced statistics and am having problems understanding the result that a logistic regression is showing me.
I am doing this work for a job interview and thus desperately need some advice promptly.
I am analysing NFL trends in qualifying for the Playoffs.
I would expect a stat called Combined EPA to be heavily correlated with qualifying for the NFL Playoffs.
When I plot Qualifying on the Y axis and Combined EPA on the X axis, I get the following result:
This appears to imply that as Combined EPA increases, the probability of not qualifying increases, which is the opposite of my hypothesis.
However, I then plotted the graph with the axes switched and got the following result:
Clearly, from this graph, teams that qualify have on average a higher Combined EPA than teams that didn't qualify.
Given these two graphs, could someone please explain the apparent correlation in the first graph between not qualifying and a higher Combined EPA?
Thank you in advance for any assistance.
09-08-2012 09:02 PM
You are correct to be confused. I understand the second graph. However, the first graph has some unknown continuous variable plotted on the vertical axis. Notice that there are points with vertical coordinates 0.3 and 0.25, so the vertical axis is not showing a dichotomous variable.
Is this output from JMP? Perhaps the JMP user's guide would shed some light on what is being plotted. There is also a dedicated JMP discussion forum at https://communities.sas.com/community/support-communities/jmp_software
09-10-2012 02:07 PM
If you were using the LOGISTIC procedure in SAS, I would ask to see your code. But it looks like you are using JMP (???). In LOGISTIC, the default is to model the probability of "0" (the lowest-ordered response level), not the probability of "1". This confuses many people, because everything can appear reversed. There are ways to change this in LOGISTIC (including using the DESCENDING option in the procedure statement). I don't know about JMP. Your first graph is likely a plot of the estimated probability (a continuous variable between 0 and 1) versus the predictor.
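To make the reversal concrete, here is a minimal sketch in Python rather than SAS or JMP, so it runs without either product. The data are made up for illustration and are not the NFL data; `fit_logistic` is a hypothetical helper that fits a one-predictor logistic model by Newton-Raphson. Fitting the same data twice, once treating "1" as the event and once treating "0" as the event, should give slopes of equal size and opposite sign, which is the kind of flip described above.

```python
import math

# Made-up illustration data (not the poster's NFL data): larger x tends to go with y = 1.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1]


def fit_logistic(xs, ys, iterations=25):
    """Fit P(event) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson.

    Returns (a, b). The "event" is whichever level is coded 1 in ys.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of the information matrix
        for xi, yi in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        # Solve the 2x2 Newton system by hand.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Event = "1" (what most people expect to be modeled).
intercept_1, slope_1 = fit_logistic(x, y)
# Event = "0" (recode so that the original 0 becomes the modeled level).
intercept_0, slope_0 = fit_logistic(x, [1 - yi for yi in y])

print("modeling P(y = 1): slope =", slope_1)
print("modeling P(y = 0): slope =", slope_0)
```

The two fits describe the same relationship. Only the level being modeled differs, so a curve that rises for one choice falls for the other.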
09-10-2012 04:27 PM
The Parameter Estimates table only shows one regressor. Are you showing us the whole table?
LVM: if there were multiple covariates, I would say your guess is correct, but for 1-variable regression, isn't the predicted probability equal to the sigmoidal curve that is shown? How can several observations that have the same value of X get mapped to distinct predicted probabilities? I'm not doubting your explanation about the interpretation of the scatter plot, but it seems like there is more going on here than has been revealed.
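The one-variable point can be sketched in a few lines of Python. The intercept and slope below are hypothetical values chosen only for illustration, not estimates from the poster's model. With a single regressor the predicted probability is a function of X alone, so repeated X values cannot produce different predicted probabilities.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT = -0.5
SLOPE = 1.2


def predicted_probability(x):
    """Predicted probability from a one-regressor logistic model."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * x)))


# Three observations share the same X value (0.3).
x_values = [-1.0, 0.3, 0.3, 0.3, 2.0]
probabilities = [predicted_probability(x) for x in x_values]

# Collect the distinct predicted probabilities among the tied observations.
distinct_at_tie = {p for x, p in zip(x_values, probabilities) if x == 0.3}
print(len(distinct_at_tie))  # prints 1
```

If a scatter plot shows several different vertical values at one X, those values are therefore unlikely to be predicted probabilities from a model with only that regressor.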