mmagnuson
Quartz | Level 8

I ran a logistic regression model with only one independent variable. The c statistic was .5, which means that the model discriminates no better than random chance. When I looked at the cross tabulation of the dependent and independent variables and ran a chi-square test, I got a chi-square value of 931 against a critical value of 3.84, so the association was significant. The odds ratio was 2.83.

My question is: if the association is significant, why does the model report a c statistic of .5, which seems to mean it does no better than chance? Is the model not appropriate for that specific independent variable, or is the c statistic not that important?

 

There are 61000 rows of data. I have 10 variables that I'm looking at. I have run them all together in the regression analysis and have also looked at them independently to see whether the effects change when others are added, which they do: the odds ratios change slightly when other variables are added.

All independent variables are binary.

6 REPLIES
ballardw
Super User

The logistic model shares a common feature with a more general class of linear models: a function of the mean of the response variable is assumed to be linearly related to the explanatory variables.

 

Note the LINEAR part. So when the statistic is as you report, it may only be telling you that the relationship on the scale of the link function isn't linear, not that there is "no relation".

 

A classic example is to take data from y = x*x and fit a linear regression model y = x where x ranges over (-a, a). The linear fit will show a slope of 0 and report "not significant", but you know that there is a very strong (deterministic) relationship. So you may have to transform the data or use a different approach.
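The y = x*x example can be checked with a few lines of code. This is only a minimal sketch in plain Python rather than SAS; the helper name `ols_slope_intercept` and the half-width `a = 5` are made up for illustration.

```python
# Fit y = b0 + b1*x by ordinary least squares to points that lie exactly
# on y = x*x, with x placed symmetrically around 0.

def ols_slope_intercept(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

a = 5                          # hypothetical half-width of the x range
xs = list(range(-a, a + 1))    # -5, -4, ..., 5
ys = [x * x for x in xs]       # deterministic, but not linear in x

slope, intercept = ols_slope_intercept(xs, ys)
print("slope:", slope, "intercept:", intercept)
```

The fitted line is flat even though y is completely determined by x, so a zero slope here says "not linear", not "unrelated".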

 

Chi-square doesn't assume any linearity; it only checks whether the proportion of each value is distributed in a similar fashion.

mmagnuson
Quartz | Level 8

So if the c statistic shows that there is no linear relationship, I should still be able to report the odds ratio. The odds ratio doesn't need to worry about linearity at all, correct? In that case, would it be better not to run a logistic regression model and just manually calculate the odds ratios for each variable?

ballardw
Super User

If the model is not "good" I would be leery of reporting any details based on that model.

Do you get the same odds-ratios from Proc Freq?
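For reference, the cross-tabulation quantities can also be worked out by hand from a single 2x2 table. The sketch below is plain Python, not SAS, and the counts are hypothetical (they are not the poster's data). It computes the odds ratio, the Pearson chi-square without continuity correction, and the c statistic that a one-predictor model would have for a binary x, counting tied pairs as half.

```python
# Hypothetical 2x2 table (NOT the data from this thread):
#              y = 1     y = 0
#   x = 1        a         b
#   x = 0        c         d
a, b, c, d = 300, 200, 20000, 40500
n = a + b + c + d

# Odds ratio from the cross tabulation.
odds_ratio = (a * d) / (b * c)

# Pearson chi-square for a 2x2 table (1 degree of freedom,
# no continuity correction); compare with the 3.84 critical value.
chi_square = n * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

# c statistic for a model whose only predictor is the binary x.
# Assumes odds_ratio > 1, so x = 1 gets the higher predicted probability.
# Pair every y = 1 record with every y = 0 record:
#   concordant: the y = 1 record has x = 1 and the y = 0 record has x = 0
#   tied:       both records have the same x (counted as half)
concordant = a * d
tied = a * b + c * d
pairs = (a + c) * (b + d)
c_statistic = (concordant + 0.5 * tied) / pairs

print("odds ratio :", round(odds_ratio, 4))
print("chi-square :", round(chi_square, 1))
print("c statistic:", round(c_statistic, 3))
```

With these made-up counts only 500 of the 61000 records have x = 1, so almost every pair is tied on x. Under that assumption the c statistic stays close to .5 even though the odds ratio is well above 1 and the chi-square is far beyond 3.84. Whether that is what is happening in the real data can only be judged from the actual table.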

 

Data and code go a long way toward providing more specific answers.

mmagnuson
Quartz | Level 8

Using only one variable I obtained a c statistic of .5. When I included the other 14 variables in my logistic regression model, the c statistic increased to .74, which seems to be "good" in terms of the model. The only issue is that when I did this some odds ratios fluctuated, but I'm guessing that has to do with taking the other variables in the model into account?

 

Example: the picture shows an odds ratio of 2.83, but when the variable is entered with all the other variables it decreases to 1.12. Should both numbers be reported? One number being 2.83 (if you look only at this one variable) and the other 1.12 (when you look at all variables in the model)?


info.jpg
ballardw
Super User

Different sets of variables in a model will often (almost always) give different results.
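A small numeric illustration of how an odds ratio can shrink once another variable is taken into account. This is a sketch in plain Python with hypothetical counts, and it uses a Mantel-Haenszel pooled odds ratio over two strata of a second binary variable z as a simple stand-in for adjustment; it is not the multivariable logistic regression itself.

```python
# Each stratum of the hypothetical second variable z holds a 2x2 table
# (a, b, c, d) laid out as:
#              y = 1     y = 0
#   x = 1        a         b
#   x = 0        c         d
strata = {
    "z=0": (10, 90, 50, 450),
    "z=1": (200, 200, 50, 50),
}

def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)

# Crude (unadjusted) odds ratio: collapse the strata into one table.
crude_table = tuple(sum(cell) for cell in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

# Odds ratio of x within each level of z.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled odds ratio, adjusting for z.
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = mh_num / mh_den

print("crude OR      :", round(crude_or, 2))
print("per-stratum OR:", stratum_ors)
print("adjusted OR   :", round(mh_or, 2))
```

In this made-up table x and z are strongly related, so the crude odds ratio partly reflects z. The crude and adjusted numbers answer different questions: one describes x alone, the other describes x with z held fixed.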

You may also need to consider your diagnostics. If any of the model variables are missing for some records, then likely a reduced set of records was used for the model, which would also affect such statistics.

 

What needs to be reported is based on the initial research question. Ideally the analysis plan should include the needed summaries, statistics and such before the data is looked at.

mmagnuson
Quartz | Level 8

Luckily all info in the variables is accounted for.

So the initial research question was which factors contribute to donating more or not. First I looked at each independent variable against the dependent variable and recorded the odds ratio. Then I loaded all of the independent variables into the model, and the odds ratios changed from when they were looked at individually versus as a group (attached). The difference in the odds ratios must be due to all the other variables being in the equation. But I guess what I'm hung up on is which odds ratio is "right", since there are odds ratios from the logistic regression analysis and odds ratios from comparing individual independent variables with the dependent variable.

 

Would it be fair to say that the full model is more appropriate because it takes into account the 13 other variables, which is more "real life", since one should look at many variables when drawing conclusions? Is looking at just one variable limiting?


differences.jpg
