C statistic vs Chi Square

Occasional Contributor
Posts: 17

C statistic vs Chi Square

I ran a logistic regression model with only one independent variable. The c statistic was 0.5, which means the model discriminates no better than random chance. When I looked at the cross-tabulation of the dependent and independent variables and ran a chi-square test, I got a chi-square statistic of 931 against a critical value of 3.84, so the association was significant. The odds ratio was 2.83.

My question is: if the association is significant, why does the model report a c statistic of 0.5, which would seem to be no better than chance? Is the model not appropriate for that specific independent variable, or is the c statistic not that important?

 

There are 61,000 rows of data. I am looking at 10 variables; I have run them all together in the regression analysis and have also looked at each one independently to see whether the effects change when the others are added, which they do. Odds ratios change slightly when other variables are added.

All independent variables are binary.
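To make the three quantities in the question concrete, here is a minimal Python sketch that computes the odds ratio, the Pearson chi-square statistic, and the c statistic from a single 2x2 table. The counts are hypothetical (chosen only so that the total is 61,000 and the odds ratio is about 2.83); they are not the poster's data, and the function name `two_by_two_summary` is made up for illustration. The c statistic is computed as concordant pairs plus half the tied pairs, divided by all case/non-case pairs, assuming the odds ratio is above 1.

```python
def two_by_two_summary(a, b, c, d):
    """Summaries for a 2x2 table with cell counts
    a: x=1, y=1   b: x=1, y=0
    c: x=0, y=1   d: x=0, y=0
    """
    n = a + b + c + d

    # Cross-product odds ratio.
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction).
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # c statistic for a model with this one binary predictor (assumes odds_ratio > 1):
    # a pair (one y=1 row, one y=0 row) is concordant when the y=1 row has x=1
    # and the y=0 row has x=0; pairs with equal x are ties and count one half.
    pairs = (a + c) * (b + d)
    concordant = a * d
    tied = a * b + c * d
    c_stat = (concordant + 0.5 * tied) / pairs

    return odds_ratio, chi_square, c_stat


# Hypothetical counts: x=1 is rare (600 of 61,000 rows).
odds_ratio, chi_square, c_stat = two_by_two_summary(300, 300, 15770, 44630)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"chi-square = {chi_square:.1f}")
print(f"c statistic = {c_stat:.3f}")
```

With these made-up counts the chi-square statistic is far above 3.84 and the odds ratio is about 2.83, yet the c statistic stays within 0.01 of 0.5, because almost every case/non-case pair is tied on x when x=1 is rare.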

Super User
Posts: 10,500

Re: C statistic vs Chi Square

The logistic model shares a common feature with a more general class of linear models: a function of the mean of the response variable is assumed to be linearly related to the explanatory variables.

 

Note the LINEAR part. A c statistic like the one you report may only be telling you that the relationship on the scale of the link function isn't linear, not that there is "no relation".

 

A classic example is to take data generated by y = x*x and fit the linear regression model y = x, with x spread symmetrically over (-a, a). The linear fit will show a slope of 0 and report "not significant", even though you know there is a very strong (deterministic) relationship. So you may have to transform the data or use a different approach.
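The y = x*x example can be checked with a few lines of Python. This is only a sketch: the grid of x values (-5 to 5) and the helper name `ols_slope` are arbitrary choices for illustration.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a simple linear regression."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# x is symmetric around 0, and y is a deterministic function of x.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

slope = ols_slope(xs, ys)
print(f"fitted slope = {slope}")
```

The fitted slope is zero because the positive and negative halves of x cancel, even though y is completely determined by x.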

 

Chi-square doesn't assume any linearity; it only checks whether the proportions of each value are distributed in a similar fashion across the groups.

Occasional Contributor
Posts: 17

Re: C statistic vs Chi Square

So if the c statistic shows that there is no linear relationship, I should still be able to report the odds ratio. The odds ratio doesn't depend on linearity at all, correct? In that case, would it be better not to run a logistic regression model and just calculate the odds ratio for each variable manually?

Super User
Posts: 10,500

Re: C statistic vs Chi Square

If the model is not "good" I would be leery of reporting any details based on that model.

Do you get the same odds-ratios from Proc Freq?

 

Data and code go a long way toward providing more specific answers.

Occasional Contributor
Posts: 17

Re: C statistic vs Chi Square

Using only one variable, I obtained a c statistic of 0.5. When I included the other 14 variables in my logistic regression model, the c statistic increased to 0.74, which seems to be "good" in terms of the model. The only issue is that some odds ratios changed when I did this, but I'm guessing that comes from taking the other variables in the model into account?

 

Example: the picture shows an odds ratio of 2.83, but when the variable is entered together with all the other variables it decreases to 1.12. Should both numbers be reported, one (2.83) as the odds ratio when you look at this one variable alone and the other (1.12) as the odds ratio when you look at all variables in the model?


info.jpg
Super User
Posts: 10,500

Re: C statistic vs Chi Square

Different sets of variables in a model will often (almost always) give different results.

You may also need to consider your diagnostics. If any of the model variables are missing for some records, then a reduced set of records was likely used for the model, which would also affect such statistics.

 

What needs to be reported depends on the initial research question. Ideally, the analysis plan should specify the needed summaries, statistics and so on before anyone looks at the data.

Occasional Contributor
Posts: 17

Re: C statistic vs Chi Square

Luckily, there are no missing values in the variables.

So the initial research question was which factors contribute to donating more or not. First I compared each independent variable with the dependent variable and recorded the odds ratio. Then I loaded all of the independent variables into the model, and the odds ratios differed from the ones obtained when the variables were looked at individually (attached). The difference in the odds ratios must be due to all the other variables being in the equation. But I guess what I'm hung up on is which odds ratio is "right", since there are odds ratios from the logistic regression analysis and odds ratios from comparing each individual independent variable with the dependent variable.

 

Would it be fair to say that the model is more appropriate because it takes into account 13 other variables, which is more "real life", since one should look at many variables when drawing conclusions? Is looking at just one variable too limiting?
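The gap between a one-variable odds ratio and the odds ratio from a model with other variables can be illustrated with a small Python sketch. It does not refit the logistic regression from this thread; instead it swaps in the Mantel-Haenszel estimator, which adjusts for a single extra binary variable by stratifying on it. All counts and the names `odds_ratio` and `mantel_haenszel_or` are hypothetical, chosen so that the second variable is strongly related to both x and y.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for cells (x=1,y=1), (x=1,y=0), (x=0,y=1), (x=0,y=0)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts, one 2x2 table per level of a second binary variable z.
strata = [
    (10, 90, 100, 900),   # z = 0: x has no effect on y within this stratum
    (450, 450, 50, 50),   # z = 1: x has no effect on y within this stratum
]

# Collapsing over z gives the table you would see when looking at x alone.
crude_table = tuple(sum(cells) for cells in zip(*strata))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)
print(f"crude odds ratio    = {crude_or:.2f}")
print(f"adjusted odds ratio = {adjusted_or:.2f}")
```

In this made-up example the crude odds ratio is above 5 while the adjusted one is 1, because x=1 is much more common in the stratum where y=1 is also more common. The two numbers answer different questions: the crude one describes the association of x with y ignoring everything else, and the adjusted one describes it with z held fixed.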


differences.jpg